  1. #21

    Hi Stan

    Thanks for your interest. PM on its way.


  2. #22
    Join Date
    Nov 2005
    Location
    Nuevo Mexico
    Posts
    1,411

    From what I have Googled, it's nearly impossible to block the Baidu bot. One guy even has China totally blocked from his server, and he said it then arrived via a Japanese IP too.

  3. #23
    Join Date
    Apr 2007
    Posts
    94

    We have blocked the requested Baidu IP range on our server with the firewall.

  4. #24
    Join Date
    Sep 2011
    Posts
    139

    Hello Dave,

    Thanks for your PM. We have taken the necessary action, as you can see from Tux's reply.

    Please get back to us if you face the same issue again; we'd be glad to help sort it out.

    Thank you.

    Best Regards,
    Stan
    Web Hosting UK | Cloud Servers | SSL Certificate
    Best Paying Affiliate Programs offered by Eukhost Ltd.

  5. #25
    Join Date
    May 2011
    Location
    Beijing, P.R. China and Stoke-on-Trent, UK
    Posts
    17

    Belated comment

    Sorry for this belated comment. I also had a lot of problems with Baidu, and it was all the more annoying since I am currently in Beijing, P.R.China anyway, and staying reasonably close to their main office. I was tempted at times to go around and try to "have it out" with them in person, but that wouldn't have helped, and it might have made trouble for me anyway, given the way things are sometimes done here.

    Despite what they claim, they visited my site excessively and ignored robots.txt. There are other, similar robots and crawlers originating within China that are almost as bad. The problem is that I have seen it alleged that some so-called robots or crawlers identifying as Baidu may actually be automated scraping attempts disguised as Baidu. I looked at the good advice given on Stop Forum Spam, which may well be useful for others facing similar problems. The difficulty with blocking entire countries, which is what some advocate, is that you can make it difficult for perfectly legitimate users to access sites they want to use (for example, most expats in China probably have no nefarious intentions when surfing, but get blocked when people blanket-block China).
    DDS - China and UK.

  6. #26
    Join Date
    Jan 2007
    Location
    United Kingdom
    Posts
    3,011

    Quote Originally Posted by ddstretch View Post
    Sorry for this belated comment. I also had a lot of problems with Baidu, and it was all the more annoying since I am currently in Beijing, P.R.China anyway, and staying reasonably close to their main office. I was tempted at times to go around and try to "have it out" with them in person, but that wouldn't have helped, and it might have made trouble for me anyway, given the way things are sometimes done here.

    Despite what they claim, they visited my site excessively and ignored robots.txt. There are other, similar robots and crawlers originating within China that are almost as bad. The problem is that I have seen it alleged that some so-called robots or crawlers identifying as Baidu may actually be automated scraping attempts disguised as Baidu. I looked at the good advice given on Stop Forum Spam, which may well be useful for others facing similar problems. The difficulty with blocking entire countries, which is what some advocate, is that you can make it difficult for perfectly legitimate users to access sites they want to use (for example, most expats in China probably have no nefarious intentions when surfing, but get blocked when people blanket-block China).
    I'm personally not too sure what to suggest. Of course the easy solution is blocking the entire country from accessing your site, but again, that may block legitimate users from your site. If most of your visitors are not from China, and it wouldn't affect your site revenue or business too much, you may wish to consider simply blocking the entire country altogether.
    Domains Registrations @ £7.49pa! - Download the official eUKhost Android and iOS App!

    eNlight Cloud Hosting - Cost-Effective, Autoscaling Cloud Hosting
    How does eNlight work? What differences and benefits are there to VPS and Dedicated Servers?

    Chat to us on Twitter!
    Join our incentive affiliate program now - and earn generous commission with each sale!

    How do I contact eUKhost?
    Support: Client Area - 0808 262 0455
    Sales: sales[@]eukhost.com - 0800 862 0380
    Contact eUKhost Management

    Customer Relations:
    feedback@eukhost.com - 0808 262 0255

    Contact:
    ben@eukhost.com
    Skype: euk_ben
    Windows Live Messenger: ben@eukhost.com


    Don't ever let other people's thoughts, feelings, perceptions and/or opinions drown over yours. You know yourself the best. Go with what you think is right. Everyone else's opinions or statements about you or others are secondary.

  7. #27
    Join Date
    May 2011
    Location
    Beijing, P.R. China and Stoke-on-Trent, UK
    Posts
    17

    Quote Originally Posted by Ben View Post
    I'm personally not too sure what to suggest. Of course the easy solution is blocking the entire country from accessing your site, but again, that may block legitimate users from your site. If most of your visitors are not from China, and it wouldn't affect your site revenue or business too much, you may wish to consider simply blocking the entire country altogether.
    I agree that it is essentially a cost-benefit trade-off: is the cost of blocking an entire country repaid by the saving from no longer having your bandwidth eaten into, and worse, by badly-behaved robots apparently originating from within that country?

    In some cases it may not be worth blanket-banning a country if other actions are already routine. For example, if you regularly monitor accesses to your site anyway, then, depending on how frequently you check, you could just use .htaccess to ban the already-known IP addresses of the rogue robots, and then reactively ban any new IP address that the robot pops up using. That would have dealt with the Japanese incarnation of Baidu that a user reported (I think on Stop Forum Spam, if I recall, as well as being mentioned here). The problem is that the accesses may be so frequent that one would have to monitor more often than one is prepared to, which could then tip the cost-benefit decision in favour of a blanket ban.

    For that to really work, there needs to be a centralized list somewhere of the known IP addresses that search engines' robots use, and it might as well have a subsection dealing with non-standard robots, scrapers, and so on. Another site that could help with this is Project Honey Pot. Using that, one might have a big job of initially editing .htaccess and then a smaller job of keeping it updated.

    However, I imagine also that there may be other cost-benefit trade-offs that come into play if or when one has a very large .htaccess file.
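
    As a rough illustration of that reactive approach, the ban list in .htaccess could look something like the sketch below. The addresses are documentation placeholders rather than real robot addresses, and the comments are just one assumed way of recording why each entry was added.

    ```apache
    # With "Order Allow,Deny", any matching Deny overrides "Allow from all"
    Order Allow,Deny
    Allow from all

    # Already-known rogue robot addresses (placeholders, not real data)
    Deny from 192.0.2.10
    Deny from 192.0.2.27

    # Added reactively after the robot reappeared on a new range (placeholder)
    Deny from 198.51.100.0/24
    ```

    Every new sighting adds a line, which is where the concern about a very large .htaccess file comes in.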
    Last edited by ddstretch; 02-01-2012 at 14:34. Reason: grmmar improvement

  8. #28
    Join Date
    Jan 2007
    Location
    United Kingdom
    Posts
    3,011

    Quote Originally Posted by ddstretch View Post
    I agree that it is essentially a cost-benefit trade-off: is the cost of blocking an entire country repaid by the saving from no longer having your bandwidth eaten into, and worse, by badly-behaved robots apparently originating from within that country?

    In some cases it may not be worth blanket-banning a country if other actions are already routine. For example, if you regularly monitor accesses to your site anyway, then, depending on how frequently you check, you could just use .htaccess to ban the already-known IP addresses of the rogue robots, and then reactively ban any new IP address that the robot pops up using. That would have dealt with the Japanese incarnation of Baidu that a user reported (I think on Stop Forum Spam, if I recall, as well as being mentioned here). The problem is that the accesses may be so frequent that one would have to monitor more often than one is prepared to, which could then tip the cost-benefit decision in favour of a blanket ban.

    For that to really work, there needs to be a centralized list somewhere of the known IP addresses that search engines' robots use, and it might as well have a subsection dealing with non-standard robots, scrapers, and so on. Another site that could help with this is Project Honey Pot. Using that, one might have a big job of initially editing .htaccess and then a smaller job of keeping it updated.

    However, I imagine also that there may be other cost-benefit trade-offs that come into play if or when one has a very large .htaccess file.
    I think you've somewhat answered your own question then. If you don't get a lot of money or business from China, blocking the entire country may be the most effective solution for you. If you have advertisements on your website, you may want to check, if possible, how much money you make through AdSense from clicks or impressions originating from Chinese traffic. Again, consider how much traffic is coming specifically from China, and whether blocking it is a better trade-off than keeping that traffic for the advertising revenue that genuine Chinese visitors bring.

  9. #29

    Still getting hammered by Baidu. It would seem it can't be stopped.

    Have numerous blocks in robots file and have blocked complete IP ranges as below:

    Baidu IP ranges trying to block:

    180.76.5.0 to 180.76.5.255
    and
    180.76.6.0 to 180.76.6.255

    ...yet it's still spidering away with 200 codes!

    e.g. from latest visitors

    Host: 180.76.5.60
    /product.php?p=51167777&c=464
    Http Code: 200 Date: Mar 07 09:23:34 Http Version: HTTP/1.1 Size in Bytes: 2313
    Referer: -
    Agent: Mozilla/5.0 (compatible; Baiduspider/2.0; +???????????Baiduspider)


    Surely that shouldn't be happening with IP blocks in place (let alone all the robots stuff)?

    Can't anything be done about Baidu???
    Complete bandwidth sucking nightmare

  10. #30
    Join Date
    Jan 2011
    Location
    Haggisland
    Posts
    228

    Baidu do seem to be on the march again...

    Don't waste time with robots.txt.
    On a shared server, all you can do is apply rules to your top-level .htaccess
    On a VPS, CSF rules are the best bet.
    I'm putting together a list, to distribute across all my VPSs, 'cos I'm sick to death of these automated scrapers. Granted, many do originate from China but they do come from around the world, the U.S.A. being the 2nd largest in my experience.
    To illustrate how bad the situation is, literally within seconds of a new eNlight server being commissioned a China bot had attempted to crack it!


    Bandwidth usage is not the real issue (though it is a pain); it's the additional load placed on servers, and with Pay-As-You-Use this will become an even bigger cost problem.

    Nuke em all!

    EJ

  11. #31
    Join Date
    Apr 2007
    Location
    Manchester, United Kingdom
    Posts
    8,440

    Yeah, gotta admit that they are a pain. I've managed to avoid them, or they've avoided me, recently (touch wood!).

    As EJ has said, CSF rules are probably the best chance of decreasing the amount of traffic you get from them, though they are unlikely to eliminate it completely. But it's a step in the right direction.
    David Smith
    DPS Computing
    http://www.dpscomputing.com (Computing, Reviews, News) - We're still plodding on adding new content and features (August 2011)
    http://www.djdavid.co.uk - Massive update! (September 2011) - It's now not neglected!!
    http://davidsmith.dpscomputing.com (My Personal Website) - New Site (10/2009)

  12. #32

    Thanks Guys, nice to have a sympathetic ear even if I can't fix the problem (on reseller)

    Does anyone have any idea why IP blocking in cPanel has no effect at all on Baidu?

  13. #33

    Site now down - monthly bandwidth used!

    Should be able to send Baidu the bill

  14. #34
    Join Date
    Jan 2011
    Location
    Haggisland
    Posts
    228

    As a Reseller, you're going to have to add IP blocks on a per account basis. I suggest putting the blocks at /home/user_account/public_html/.htaccess

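    Code:
    # Allow everyone except the two Baidu ranges. With "order allow,deny",
    # a matching deny overrides "allow from all".
    order allow,deny
    deny from 180.76.5.
    deny from 180.76.6.
    allow from all
    
    .... rest of file.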
    Note the format, no CIDR or address range and the dropping of the last octet. You could get really mean and just use "180.76."
    Once you have sites back up of course. A week into the month and all the bandwidth used up on Reseller - not good! Are you sure it's just from Baidu and not other sites/scrapers/hotlinks?
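
    For comparison, the partial addresses above correspond to the CIDR ranges 180.76.5.0/24 and 180.76.6.0/24, and on Apache 2.4 the same block would normally be expressed with Require directives rather than order/deny. The following is a sketch only, assuming the server really is running 2.4 and that .htaccess overrides are enabled for the account; it is not taken from anyone's actual server.

    ```apache
    # Apache 2.4 style (assumption: mod_authz_core, not the older order/deny syntax)
    # Grant access to everyone except the two Baidu /24 ranges.
    <RequireAll>
        Require all granted
        Require not ip 180.76.5.0/24
        Require not ip 180.76.6.0/24
    </RequireAll>
    ```

    Whichever syntax is used, requests from a blocked range should then appear in the logs with a 403 code rather than 200.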

  15. #35
    Join Date
    Jan 2007
    Location
    United Kingdom
    Posts
    3,011

    Quote Originally Posted by ejsolutions View Post
    As a Reseller, you're going to have to add IP blocks on a per account basis. I suggest putting the blocks at /home/user_account/public_html/.htaccess

    Code:
    order allow,deny
    deny from 180.76.5.
    deny from 180.76.6.
    allow from all
    
    .... rest of file.
    Note the format, no CIDR or address range and the dropping of the last octet. You could get really mean and just use "180.76."
    Once you have sites back up of course. A week into the month and all the bandwidth used up on Reseller - not good! Are you sure it's just from Baidu and not other sites/scrapers/hotlinks?
    I really wouldn't be surprised if it is just Baidu. I'm willing to be proven otherwise.

  16. #36

    Thanks ej, I already have this in .htaccess though, and I'm afraid it doesn't work.

  17. #37

    Sorry Ben - I'm not sure I fully understand you. Are you suggesting that the bandwidth sucking isn't coming from Baidu???

  18. #38
    Join Date
    Jan 2011
    Location
    Haggisland
    Posts
    228

    Quote Originally Posted by Bud Boy View Post
    Sorry Ben - I'm not sure I fully understand you. Are you suggesting that the bandwidth sucking isn't coming from Baidu???
    I was suggesting that it isn't all Baidu (and certainly not just restricted to those IP addresses) - Ben, I think, was suggesting otherwise. If the inclusion of the code above, in the place that I've indicated, doesn't block that IP range, then it appears the server needs tweaking (example: in CSF setting .htaccess checking to three levels deep).
    Do you have Webalizer/Awstats to show where the bandwidth is being consumed? These should detail specific IP addresses.

  19. #39

    It is Baidu and those IPs are accessing despite everything:

    From cPanel

    IPs Blocked
    [image by racinefan, 2012-03-08]

    Visits
    [three images by racinefan, 2012-03-08]

    ...more pages
    [image by racinefan, 2012-03-08]

    ...and more pages
    [image by racinefan, 2012-03-08]

    ...same through to page 10
    [image by racinefan, 2012-03-08]

    ... CONTINUOUS (I.E. NOT JUST TODAY - THIS IS WHAT IS SUCKING MY BANDWIDTH)

    More detail from Legacy
    [image by racinefan, 2012-03-08]


    Pretty conclusive I would have thought

    Just can't stop Baidu

  20. #40
    Join Date
    Jan 2011
    Location
    Haggisland
    Posts
    228

    From an outsider's perspective it does appear that your server is ignoring the .htaccess directives. Support should be able to assist with this.
    Could it be that the "env=bad_bot" entry is affecting things?
    Also, I have seen .htaccess files become corrupt; you could rename yours to .htaccess.old, then create a new one from scratch, based on the old one.
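
    An "env=bad_bot" entry normally comes from a user-agent block of the shape below. This is a sketch of the common pattern, not the contents of the actual file, and the "Baiduspider" match string is an assumption based on the agent shown in the log entry earlier in the thread. One thing worth checking is whether the file ends up with more than one order line: as far as I know, the last one in the file governs all the allow and deny lines, so a later "order deny,allow" combined with "allow from all" would let everything through.

    ```apache
    # Tag requests whose User-Agent contains "Baiduspider" (case-insensitive)
    SetEnvIfNoCase User-Agent "Baiduspider" bad_bot

    # A single Order line covering both the agent block and the IP blocks
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    Deny from 180.76.5.
    Deny from 180.76.6.
    ```

    Keeping every deny under one order line makes it easier to see which rule is supposed to catch a given request.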
