Hi Stan
Thanks for your interest. PM on its way.
From what I have Googled, it's near impossible to block the Baidu bot. One guy even has China totally blocked from his server, and he said it then arrived via a Japanese IP too.
We have blocked the requested Baidu IP range on our server with the firewall.
Hello Dave,
Thanks for your PM. We have taken the necessary action, as can be seen in the reply from Tux.
Please get back to us if you face the same issue again; we'd be glad to help sort it out.
Thank you.
Best Regards,
Stan
Web Hosting UK | Cloud Servers | SSL Certificate
Best Paying Affiliate Programs offered by Eukhost Ltd.
Sorry for this belated comment. I also had a lot of problems with Baidu, and it was all the more annoying since I am currently in Beijing, P.R.China anyway, and staying reasonably close to their main office. I was tempted at times to go around and try to "have it out" with them in person, but that wouldn't have helped, and it might have made trouble for me anyway, given the way things are sometimes done here.
Despite what they claim, they visited my site excessively and ignored robots.txt. Other robots and crawlers originating within China are almost as bad. A complication is that I have seen it alleged, with some evidence, that traffic presenting itself as Baidu's crawler may actually be some form of automated scraping disguised as Baidu. I looked at the good advice given on Stop Forum Spam, which may well be useful for others facing similar or related problems. The difficulty with blocking entire countries, which is what some advocate, is that you make it difficult for perfectly legitimate users to access sites they want to use (for example, most or nearly all expats in China probably have no nefarious intentions when surfing, but they are locked out when people blanket-block China).
DDS - China and UK.
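One way to test the "disguised as Baidu" suspicion is a reverse-then-forward DNS check on the visiting IP. The sketch below is only that, a sketch: the hostname suffixes in TRUSTED_SUFFIXES are an assumption to confirm against Baidu's own current webmaster documentation, the function name is made up for illustration, and the hostnames in the demo are hypothetical stand-ins so that the example runs without touching the network.

```python
import socket

# ASSUMPTION: genuine Baiduspider hosts reverse-resolve to one of these
# suffixes. Verify against Baidu's current documentation before relying on it.
TRUSTED_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_genuine_baiduspider(ip, reverse=None, forward=None):
    """Return True only if ip reverse-resolves to a trusted hostname
    AND that hostname resolves forward to the same ip again."""
    # Default to real DNS lookups; callers may pass stubs for testing.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        hostname = reverse(ip)
    except OSError:  # no PTR record or lookup failure
        return False
    if not hostname.lower().endswith(TRUSTED_SUFFIXES):
        return False
    try:
        # The forward check stops someone faking their own PTR record.
        return ip in forward(hostname)
    except OSError:
        return False


# Offline demo with stub resolvers (hostnames are hypothetical).
print(is_genuine_baiduspider(
    "180.76.5.60",
    reverse=lambda addr: "spider-180-76-5-60.crawl.baidu.com",
    forward=lambda name: ["180.76.5.60"]))
print(is_genuine_baiduspider(
    "203.0.113.9",
    reverse=lambda addr: "host.example.net",
    forward=lambda name: ["203.0.113.9"]))
```

Called without the stub arguments, the function performs real DNS lookups, so it suits occasional checks on suspicious log entries better than per-request use.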
I'm personally not too sure what to suggest. Of course the easy solution is blocking the entire country from accessing your site, but again, that may block legitimate users from your site. If most of your visitors are not from China, and it wouldn't affect your site revenue or business too much, you may wish to consider simply blocking the entire country altogether.
Domains Registrations @ £7.49pa! - Download the official eUKhost Android and iOS App!
eNlight Cloud Hosting - Cost-Effective, Autoscaling Cloud Hosting
How does eNlight work? What differences and benefits are there to VPS and Dedicated Servers?
Chat to us on Twitter!
Join our incentive affiliate program now - and earn generous commission with each sale!
How do I contact eUKhost?
Support: Client Area - 0808 262 0455
Sales: sales[@]eukhost.com - 0800 862 0380
Contact eUKhost Management
Customer Relations:
feedback@eukhost.com - 0808 262 0255
Contact:
ben@eukhost.com
Skype: euk_ben
Windows Live Messenger: ben@eukhost.com
Don't ever let other people's thoughts, feelings, perceptions and/or opinions drown over yours. You know yourself the best. Go with what you think is right. Everyone else's opinions or statements about you or others are secondary.
I agree that it is essentially a cost-benefit trade-off: is the cost of blocking an entire country repaid by the bandwidth (and worse) you save from badly-behaved robots apparently originating from within that country?
In some cases a blanket ban on a country may not be worthwhile if you already take other routine measures. For example, if you regularly monitor accesses to your site anyway, you could use .htaccess to ban the already-known IP addresses of the rogue robots, and then reactively ban any new IP address the robot pops up on. That would have dealt with the Japanese incarnation of Baidu that a user reported (on Stop Forum Spam, I think, if I recall, as well as being mentioned here). The problem is that the accesses may be so frequent that you would have to monitor more often than you are prepared to, which could then tip the cost-benefit decision in favour of a blanket ban.
For that to really work, there needs to be a centralized list somewhere of the known IP addresses that search engines' robots use, ideally with a subsection covering non-standard robots, scrapers, and so on. Another site that could help with this is Project Honey Pot. Using that, one might have a big job initially editing .htaccess and then a smaller job keeping it updated.
However, I imagine also that there may be other cost-benefit trade-offs that come into play if or when one has a very large .htaccess file.
Last edited by ddstretch; 02-01-2012 at 14:34. Reason: grmmar improvement
DDS - China and UK.
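On the worry about a very large .htaccess file: a list of known-bad addresses can be collapsed into the fewest covering networks before it is written out. This is a minimal sketch using Python's standard ipaddress module. The function name and the sample list are made up for illustration, and it relies on Apache's Deny directive accepting CIDR notation, which is worth confirming for your server version.

```python
import ipaddress


def deny_lines(addresses):
    """Collapse IPs/networks into the fewest CIDR blocks and return
    Apache 2.2-style 'deny from' lines for them."""
    # strict=False tolerates entries written with host bits set.
    nets = [ipaddress.ip_network(a, strict=False) for a in addresses]
    # collapse_addresses drops entries already covered by a wider
    # network and merges adjacent networks where possible.
    return ["deny from {}".format(n)
            for n in ipaddress.collapse_addresses(nets)]


# Hypothetical list: two ranges from this thread plus one address that
# is already covered by the first range.
known_bad = ["180.76.5.0/24", "180.76.6.0/24", "180.76.5.60"]
for line in deny_lines(known_bad):
    print(line)
```

Here the single address 180.76.5.60 disappears because 180.76.5.0/24 already covers it, so a reactively grown list need not grow the file line for line.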
I think you've answered your own question somewhat then. If you don't make a lot of money or business from China, blocking the entire country may be the most effective solution for you. If you have advertisements on your website, you may want to check, if possible, how much money you make through AdSense from clicks or impressions originating from Chinese traffic. Again, consider how much traffic is coming specifically from China, and whether blocking it is a better trade-off than accepting the extra traffic for the advertising revenue that genuine Chinese visitors bring.
Still getting hammered by Baidu. It would seem it can't be stopped.
Have numerous blocks in robots file and have blocked complete IP ranges as below:
Baidu IP ranges trying to block:
180.76.5.0 to 180.76.5.255
and
180.76.6.0 to 180.76.6.255
...yet it's still spidering away with 200 codes!
e.g. from latest visitors
Host: 180.76.5.60
/product.php?p=51167777&c=464
Http Code: 200 Date: Mar 07 09:23:34 Http Version: HTTP/1.1 Size in Bytes: 2313
Referer: -
Agent: Mozilla/5.0 (compatible; Baiduspider/2.0; +???????????Baiduspider)
Surely that shouldn't be happening with IP blocks in place (let alone all the robots stuff)?
Can't anything be done about Baidu???
Complete bandwidth sucking nightmare
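A quick sanity check on the post above is whether the logged host really does fall inside the ranges being blocked. This is a rough sketch with Python's standard ipaddress module, using only the addresses quoted in the post; the helper names are made up for illustration.

```python
import ipaddress


def range_to_networks(first, last):
    """Convert a first-to-last address range into CIDR networks."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))


# The two ranges quoted in the post above.
blocked = (range_to_networks("180.76.5.0", "180.76.5.255")
           + range_to_networks("180.76.6.0", "180.76.6.255"))


def is_blocked(host):
    """True if host falls inside any of the blocked networks."""
    addr = ipaddress.ip_address(host)
    return any(addr in net for net in blocked)


print([str(net) for net in blocked])
print(is_blocked("180.76.5.60"))  # the host from the log excerpt
```

The logged host is inside the first range, so a 200 response suggests the block is not being applied at all, rather than the range itself being wrong.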
Baidu do seem to be on the march again...
Don't waste time with robots.txt.
On a shared server, all you can do is apply rules to your top-level .htaccess
On a VPS, CSF rules are the best bet.
I'm putting together a list, to distribute across all my VPSs, 'cos I'm sick to death of these automated scrapers. Granted, many do originate from China but they do come from around the world, the U.S.A. being the 2nd largest in my experience.
To illustrate how bad the situation is, literally within seconds of a new eNlight server being commissioned a China bot had attempted to crack it!
Bandwidth usage is not the real issue (though it is a pain); it's the additional Load placed on servers and with Pay-As-You-Use this will become an even bigger cost problem.
Nuke em all!
EJ
Yeah, gotta admit that they are a pain. I've / they've managed to avoid me recently (touch wood!).
As EJ has said, CSF rules are probably the best chance of decreasing the amount of traffic you get from them - though it is unlikely to completely eliminate it. But it's a step in the right direction.
David Smith
DPS Computing
http://www.dpscomputing.com (Computing, Reviews, News) - We're still plodding on adding new content and features (August 2011)
http://www.djdavid.co.uk - Massive update! (September 2011) - It's now not neglected!!
http://davidsmith.dpscomputing.com (My Personal Website) - New Site (10/2009)
Thanks Guys, nice to have a sympathetic ear even if I can't fix the problem (on reseller)
Does anyone have any idea why IP blocking in cPanel has no effect at all on Baidu?
Site now down - monthly bandwidth used!
Should be able to send Baidu the bill
As a Reseller, you're going to have to add IP blocks on a per account basis. I suggest putting the blocks at /home/user_account/public_html/.htaccess
Code:
order allow,deny
deny from 180.76.5.
deny from 180.76.6.
allow from all
.... rest of file.
Note the format: no CIDR or address range, and the last octet is dropped. You could get really mean and just use "180.76."
Once you have sites back up of course. A week into the month and all the bandwidth used up on Reseller - not good! Are you sure it's just from Baidu and not other sites/scrapers/hotlinks?
Thanks ej, already have this in htaccess though and I'm afraid it doesn't work.
Sorry Ben - I'm not sure I fully understand you. Are you suggesting that the bandwidth sucking isn't coming from Baidu???
I was suggesting that it isn't all Baidu (and certainly not just restricted to those IP addresses) - Ben, I think, was suggesting otherwise. If the inclusion of the code above, in the place that I've indicated, doesn't block that IP range, then it appears the server needs tweaking (example: in CSF setting .htaccess checking to three levels deep).
Do you have Webalizer/Awstats to show where the bandwidth is being consumed? These should detail specific IP addresses.
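If Webalizer/Awstats aren't to hand, the same per-IP question can be answered roughly from the raw access log. This is a minimal sketch assuming Apache's common/combined log format. The sample lines are hypothetical (only the first request and its size are taken from the log excerpt earlier in the thread), and the function name is made up for illustration.

```python
import re
from collections import Counter

# host ident user [time] "request" status bytes ...
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) (\d+|-)')


def bytes_by_host(lines):
    """Sum response sizes per client host; skip unparseable lines."""
    totals = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        host, _status, size = m.groups()
        # Apache logs "-" when no body was sent.
        totals[host] += 0 if size == "-" else int(size)
    return totals


# Hypothetical sample lines for illustration.
sample = [
    '180.76.5.60 - - [07/Mar/2012:09:23:34 +0000] '
    '"GET /product.php?p=51167777&c=464 HTTP/1.1" 200 2313 "-" "Baiduspider/2.0"',
    '180.76.5.60 - - [07/Mar/2012:09:23:40 +0000] '
    '"GET /product.php?p=1 HTTP/1.1" 200 1000 "-" "Baiduspider/2.0"',
    '203.0.113.9 - - [07/Mar/2012:09:24:01 +0000] '
    '"GET / HTTP/1.1" 200 500 "-" "ExampleBrowser/1.0"',
]
for host, total in bytes_by_host(sample).most_common():
    print(host, total)
```

On a real account you would feed it the lines of the site's raw access log instead of the sample list; the heaviest consumers come out first.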
It is Baidu and those IPs are accessing despite everything:
From cPanel:
[Screenshot: IPs Blocked, by racinefan, 2012-03-08]
[Screenshots: Visits, pages 1 through to 10, by racinefan, 2012-03-08]
... CONTINUOUS (I.E. NOT JUST TODAY - THIS IS WHAT IS SUCKING MY BANDWIDTH)
[Screenshot: More detail from Legacy, by racinefan, 2012-03-08]
Pretty conclusive I would have thought
Just can't stop Baidu
From an outsider's perspective it does appear that your server is ignoring the .htaccess directives. Support should be able to assist with this.
Could it be that the "env=bad_bot" entry is affecting things?
Also, I have seen .htaccess become corrupt; you could rename it to .htaccess.old, then create a new one from scratch, based on the old one.