Banning Bots with Apache
From Shrubbery
- How to ban bots and jerks so they don't pummel your web server.
- This article on how to block spambots shows an interesting way to use mod_rewrite to block clients based on User-Agent or IP address.
- The more comprehensive approach
The reason I was looking for a way to stop bots is that I found that a particular web crawler was making 19K requests per day and I thought that this might interfere with real users. I added the following to my httpd.conf file to block this spider:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "^sogou" [NC]
RewriteRule .* - [F,L]
The [F] flag on the RewriteRule sends the client a 403 Forbidden response. You can also ban specific IP addresses using the same approach, for example:
RewriteCond %{REMOTE_ADDR} "^220\.181\.19\.194$"
RewriteRule .* - [F,L]
Conditions are ANDed together by default. To block a client that matches any one of several conditions, chain them with the [OR] flag:
RewriteCond %{HTTP_USER_AGENT} sogou [NC,OR]
RewriteCond %{REMOTE_ADDR} "^220\.181\.19\.194$"
RewriteRule .* - [F,L]
For persistent bad robots that ignore robots.txt and don't take the hint from all the 403 responses, you can also use iptables to drop traffic from specific IP addresses before it ever reaches Apache.
The 'Jerk' list
This is my own personal list of "jerks":
- 'sogou' user agent - This is some jerky web crawler from China that ignores robots.txt. Nice.
- 220.181.19.194 - IP address often used by sogou.
- 64.15.69.3 - a.k.a. Glumper. www.glumper.com recently started beating my server up pretty badly. This guy had the same problem. Freakin' jerks!
- 94.102.60.15 - Some kind of jerky spammer.
http://www.projecthoneypot.org/ip_94.102.60.15
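As a rough sketch, the whole list above could be folded into one rule set like this. It assumes mod_rewrite is loaded and that the rules live in the same place as the earlier examples (httpd.conf); it uses only the user agent and addresses already listed, and I haven't tuned it beyond that.

```apache
RewriteEngine On
# 'sogou' anywhere in the User-Agent, matched case-insensitively
RewriteCond %{HTTP_USER_AGENT} sogou [NC,OR]
# IP address often used by sogou
RewriteCond %{REMOTE_ADDR} "^220\.181\.19\.194$" [OR]
# Glumper
RewriteCond %{REMOTE_ADDR} "^64\.15\.69\.3$" [OR]
# Spammer; the last condition carries no [OR] flag
RewriteCond %{REMOTE_ADDR} "^94\.102\.60\.15$"
# Any matching client gets a 403 Forbidden
RewriteRule .* - [F,L]
```

Every condition except the last needs [OR]; leave it off one of the earlier lines and that condition gets ANDed with the next one, so the ban may quietly stop matching.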
See Also
- iptables - Block 'em before they even get to apache.

