Banning Bots with Apache

From Shrubbery


How to ban bots and jerks so they don't pummel your web server.

I went looking for a way to stop bots after finding that one particular web crawler was making about 19,000 requests per day, which I thought might interfere with real users. I added the following to my httpd.conf file to block this spider:

# Rewrite rules are ignored unless the engine is switched on.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "^sogou" [NC]
RewriteRule .* - [F,L]

The RewriteRule in this case leaves the URL untouched (the "-") and, through the F flag, sends the client a 403 Forbidden response. You can also ban specific IP addresses using the same approach, for example:

RewriteCond %{REMOTE_ADDR} "^220\.181\.19\.194$"
RewriteRule .* - [F,L]

Multiple conditions can be combined using the [OR] flag, so that the rule fires when any one of them matches:

RewriteCond %{HTTP_USER_AGENT} sogou [NC,OR]
RewriteCond %{REMOTE_ADDR} "^220\.181\.19\.194$"
RewriteRule .* - [F,L]

For persistent bad robots that ignore robots.txt and don't take the hint from all the 403 responses, you can also use iptables to filter out specific IP addresses before the requests ever reach Apache.

The 'Jerk' list

This is my own personal list of "jerks":

  1. 'sogou' user agent - This is some jerky web crawler from China that ignores robots.txt. Nice.
  2. 220.181.19.194 - IP address often used by sogou.
  3. 64.15.69.3 - a.k.a. Glumper. www.glumper.com recently started beating my server up pretty badly. This guy had the same problem. Freakin' jerks!
  4. 94.102.60.15 - Some kind of jerky spammer.
    http://www.projecthoneypot.org/ip_94.102.60.15
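
Here is a minimal sketch of how the whole list could be folded into one rule set. It assumes mod_rewrite is loaded and that the lines sit in httpd.conf at server or virtual-host level. The user agent and addresses come straight from the list above; the layout is just one way to arrange them, not necessarily what is running on this server:

```apache
# Sketch only: assumes mod_rewrite is loaded.
RewriteEngine On

# 1. Any user agent containing "sogou", matched case-insensitively.
RewriteCond %{HTTP_USER_AGENT} sogou [NC,OR]
# 2-4. The three addresses from the list, dots escaped and anchored
#      so that each pattern matches exactly one address.
RewriteCond %{REMOTE_ADDR} "^220\.181\.19\.194$" [OR]
RewriteCond %{REMOTE_ADDR} "^64\.15\.69\.3$" [OR]
RewriteCond %{REMOTE_ADDR} "^94\.102\.60\.15$"
# "-" leaves the URL untouched; F answers with 403 Forbidden.
RewriteRule .* - [F]
```

Every condition except the last carries [OR], so a request is refused as soon as any one of them matches. Keep in mind that rewrite rules in the main server config are not inherited by virtual hosts unless RewriteOptions Inherit is set in the virtual host.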

See Also

  • iptables - Block 'em before they even get to Apache.