Cleaning up the web server logs

Please keep in mind that this post is about 4 years old.
Technology may have changed in the meantime.

I’m currently between two books, author-wise, so I’m using this time to do the server housekeeping I’ve been putting off for the past few months. Today, I had a look at my web server logs. And I almost regretted it instantly.

The logs were full of requests clearly made by script kiddies. Half of the error-triggering requests were for the file wp-login.php, even for my non-WordPress sites, and what’s worse: even for non-PHP sites. And then there were requests for the usual 1337w0rm, indoxploit, adminer, etcetera. I could have left it, of course, since it doesn’t really hurt to have all these idle requests, but all this noise makes it difficult to distil the log messages that really do need my attention.

In this post, I’ll share what I did to keep my Apache log files cleaner in the future.
At the same time, these rules help protect the web server against a number of known vulnerabilities and exploits.

Mind you: this applies to Apache 2.4 and later (but you really shouldn’t be using older versions anymore).

Let’s start with the simplest case. I have a few websites that consist of nothing more than a placeholder page named index.html, so I added the following lines to those VirtualHost configurations:

<LocationMatch "^(?!/(index\.html)?$)">
  Require all denied
</LocationMatch>

Those lines will make sure that all requests other than https://www.example.com/ and https://www.example.com/index.html are rejected.
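If you want to check which paths that negative lookahead lets through before reloading Apache, you can try the pattern outside the web server. Here’s a minimal Python sketch, assuming that Python’s re module treats this simple pattern the same way Apache’s PCRE engine does (both support the (?!…) lookahead); the helper name is_denied is made up for this example.

```python
import re

# The pattern from the <LocationMatch> above. Apache tests it against the
# URL path; Python's re supports the same (?!...) negative lookahead.
DENY = re.compile(r"^(?!/(index\.html)?$)")


def is_denied(path: str) -> bool:
    """Hypothetical helper: True if the <LocationMatch> block would apply."""
    return DENY.search(path) is not None


for path in ["/", "/index.html", "/wp-login.php", "/index.html.bak"]:
    print(f"{path:16} {'denied' if is_denied(path) else 'allowed'}")
```

Only / and /index.html come out as allowed. Note that /index.html.bak is denied too, because the $ anchor requires the path to end right after index.html.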

If you want a custom error page for the resulting Error 403 (Forbidden), create a page 403.html in the document root, and modify the above as follows, so that the error page itself is exempt from the block:

ErrorDocument 403 /403.html
<LocationMatch "^(?!/((index|403)\.html)?$)">
  Require all denied
</LocationMatch>

But don’t put too much effort into the error page: it only generates extra work and extra network traffic, and since most attacks are scripted, nobody will enjoy the fruits of your labour.

Then, there are some sites that were not created in PHP. So, for those VirtualHosts, we can reject all requests for PHP files.

<Files "*.php">
  Require all denied
</Files>

Here, I used Files instead of LocationMatch, so that files in deeper directories are also matched. Also, using <FilesMatch/> would be overkill, since we don’t need any advanced matching.
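To illustrate why <Files> catches PHP files at any depth: the wildcard is compared with the last component of the file path only. The sketch below approximates that in Python with fnmatch. It assumes a plain one-to-one mapping from URL path to file path, and files_section_applies is a hypothetical helper, not part of Apache.

```python
import fnmatch
import posixpath


def files_section_applies(pattern: str, path: str) -> bool:
    """Rough model of <Files "pattern">: only the basename is tested,
    so the directory depth of the requested file doesn't matter."""
    return fnmatch.fnmatchcase(posixpath.basename(path), pattern)


for path in ["/wp-login.php", "/old/site/admin/config.php", "/about.html"]:
    print(path, files_section_applies("*.php", path))
```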

Other sites are written in PHP, but are not WordPress sites. Almost all WordPress URLs start with /wp-, and another WordPress URL that is popular with script kiddies is /xmlrpc.php; on those sites, these WordPress-specific URLs can all be blocked.

<LocationMatch "^/(wp-[a-z]+|xmlrpc)">
  Require all denied
</LocationMatch>

I deliberately left out the .php extension here, because the wp-[a-z]+ part should also match WordPress directories such as /wp-admin/ and /wp-content/.
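A quick way to see what that pattern covers is to run a few typical probe paths through it. Again a Python sketch, assuming Python’s re matches this pattern the way Apache does; the list of paths is just an example.

```python
import re

# The pattern from the <LocationMatch> above. It is anchored to the start
# of the URL path, so everything below /wp-admin/, /wp-content/ etc. matches.
WP_ONLY = re.compile(r"^/(wp-[a-z]+|xmlrpc)")

paths = [
    "/wp-login.php",
    "/wp-admin/install.php",
    "/wp-content/uploads/logo.png",
    "/xmlrpc.php",
    "/blog/wp-login.php",
    "/contact.php",
]
for path in paths:
    print(f"{path:30} {'denied' if WP_ONLY.match(path) else 'allowed'}")
```

Because of the ^/ anchor, a probe for WordPress in a subdirectory, such as /blog/wp-login.php, is not matched; it falls through to the normal request handling (most likely a 404).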

Next, a rule specifically for WordPress sites. If you don’t use remote publishing (from a smartphone, for instance), you may wish to block access to the file xmlrpc.php. This file is commonly abused in brute-force and DDoS attacks, so if you don’t need it, block it.

<Files "xmlrpc.php">
  Require all denied
</Files>

And another one for WordPress sites. If you see probes for WP plugins you don’t have installed, try something like this:

<LocationMatch "^/wp-content/plugins/(db-backup|videowhisper-video-presentation|zingiri-web-shop)/">
  Require all denied
</LocationMatch>

(If you see probes for WP plugins you do have installed, they may have known vulnerabilities. Investigate, and replace those plugins if necessary.)

The following rules can be added to all sites, including WordPress sites.

Reject any filename starting with a dot.

<Files ".*">
  Require all denied
</Files>

There’s no need for <FilesMatch/> here: a regular expression would be overkill, since the wildcard simply says “a dot followed by any sequence of characters”. And there are two reasons not to use <Location/>: we don’t want to block access to the .well-known directory, and we want to match filenames in deeper directories as well.
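The .well-known argument is easiest to see with a few example paths. As before, this Python sketch models <Files ".*"> by testing only the basename with fnmatch, assuming a plain URL-to-file mapping; dotfile_denied is a hypothetical name.

```python
import fnmatch
import posixpath


def dotfile_denied(path: str) -> bool:
    """Rough model of <Files ".*">: deny if the last path component
    starts with a dot, no matter which directory it is in."""
    return fnmatch.fnmatchcase(posixpath.basename(path), ".*")


for path in ["/.env", "/site/old/.htpasswd",
             "/.well-known/acme-challenge/abc123", "/.git/config"]:
    print(f"{path:36} {'denied' if dotfile_denied(path) else 'allowed'}")
```

The ACME challenge file is allowed, as intended, but so is /.git/config: its basename doesn’t start with a dot. That’s what the next rule takes care of.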

Next, reject all URLs whose path starts with a dot, except for the .well-known directory (I’ve seen requests for /.git/ and /.ssh/, for example).

<LocationMatch "^/\.(?!well-known/)">
  Require all denied
</LocationMatch>
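The same kind of Python check for this pattern, again assuming Python’s re handles the (?!…) lookahead like Apache’s PCRE does. It shows that only the first path segment is inspected.

```python
import re

# The pattern from the <LocationMatch> above: a dot right after the leading
# slash, unless it is followed by "well-known/".
DOT_DIRS = re.compile(r"^/\.(?!well-known/)")

for path in ["/.git/config", "/.ssh/id_rsa", "/.env",
             "/.well-known/acme-challenge/abc123", "/img/.hidden/x.png"]:
    print(f"{path:36} {'denied' if DOT_DIRS.match(path) else 'allowed'}")
```

/img/.hidden/x.png comes out as allowed, because only a dot directly after the leading slash is matched.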

Reject all requests for files with extensions .bak, .php5, .php7, .phtml and .sql. Feel free to adjust this regular expression to taste; make sure you don’t list any extensions that you actually wish to host. Keep the regular expression as small as possible, and only list extensions that you find in your Apache log files.

<FilesMatch ".+\.(bak|php5|php7|phtml|sql)$">
  Require all denied
</FilesMatch>
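<FilesMatch>, like <Files>, is tested against the filename only, not the full path. A short Python sketch of which names this pattern catches, assuming Python’s re behaves like PCRE here; the filenames are examples.

```python
import re

# The pattern from the <FilesMatch> above, applied to a bare filename.
BAD_EXT = re.compile(r".+\.(bak|php5|php7|phtml|sql)$")

for name in ["backup.sql", "index.php.bak", "shell.phtml", "index.php", "dump.sql.gz"]:
    print(f"{name:16} {'denied' if BAD_EXT.search(name) else 'allowed'}")
```

Note that dump.sql.gz is not matched: because of the $ anchor, only the final extension counts.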

Reject requests for the URL /cgi-bin/test-cgi (I had many of these for some reason), and for PHP files in the cgi-bin (who does that?).

<LocationMatch "^/cgi-bin/(test-cgi|.+\.php)$">
  Require all denied
</LocationMatch>
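And the cgi-bin pattern in the same kind of Python check (same assumption about Python’s re and PCRE). Since .+ also matches slashes, PHP files in subdirectories of cgi-bin are caught as well.

```python
import re

# The pattern from the <LocationMatch> above.
CGI = re.compile(r"^/cgi-bin/(test-cgi|.+\.php)$")

for path in ["/cgi-bin/test-cgi", "/cgi-bin/shell.php",
             "/cgi-bin/sub/dir/x.php", "/cgi-bin/test-cgi.old", "/cgi-bin/printenv"]:
    print(f"{path:26} {'denied' if CGI.match(path) else 'allowed'}")
```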

ThinkPHP had a vulnerability in 2018, and script kiddies are still probing for it. These requests go to /index.php, with the string HelloThinkPHP in the query string.

<If "%{QUERY_STRING} =~ /HelloThinkPHP/">
  Require all denied
</If>

This last one is risky, and you should not copy it blindly. It is a list of filenames (all PHP) compiled into a single regular expression. I strongly advise you to learn regular expressions first, and then to verify for each filename whether you host a file with that name: the list may contain filenames that I can safely reject, but you can’t (default.php, sql.php, …). And obviously, here too, you should only include filenames that you actually encounter in your web server logs, and filenames for known exploits you want to protect yourself against, to keep the regular expression as small as possible.

<FilesMatch "^(01|098|1|1337w0rm|1ndex|33|404-wp|a|acadmin|accesson|ad(m)?|(_|mysql-)?adminer(-.*)?|ajax-index|al|bak|bigdump|blackhat|boardData(10(2|3)|JP|NA|WW)|by|cache_|card_scan_decoder|cdg|cfq|code|command|conns|contaco|content-po|conweb|cr|csspwn|data2|dec|default|deleteme\.[a-z]+|diagnostic|dl|Dwsonv|eg|elrekt|error_log|fantversion|fe(al(11)?)?|fgertreyersd|gaestebuch|gastenboek|gb|get_config|(jax_)?(g|G)uestbook|handle_iscsi|hello|html-wp-404|idb|indoxploit|installer-backup|login_handler|lpfi8|master|miNuS|mos|mudxizxnc|murikha|mysql|new_license|neter|news(f|l)eter|(n)?ewsrsss|newsslide|ninja|noname|o22opo|olux|phpminiadmin|pic(tur)?|pluggable|pma|portal|qdbk|raiz0|rdpl|replace|res|roots|router|rulkszqcey|sasasas|scarbook|searchreplacedb2|seo_script|seter|SetSmarcardSettings|sh8541|shell(manger)?|siteindex|sql|ss|swin|system_api|thfncjyhea|titan|tmp\.sys|u3p|upgrade_handle|unzip(per)?|up__vofqx|up(el|load|x)?|V5|vveb|webconfig\.txt|wolfm|wp_[a-z]+|wso|xfwpnyukyv|xkl|xo|xpwd|XxX|xyz|yt|zaz|zeb)\.php$">
  Require all denied
</FilesMatch>

(Yes, that’s quite a scroll. 🙂 )
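Following the advice above, you can check your own document root for files that this pattern would block before you enable it. Below is a minimal Python sketch: PATTERN holds only a short excerpt of the alternatives (paste in the full expression for real use), and the script name and document root in the usage comment are placeholders.

```python
import os
import re
import sys

# Short excerpt of the <FilesMatch> pattern above; replace it with the full
# expression you actually plan to use.
PATTERN = re.compile(r"^(1337w0rm|(_|mysql-)?adminer(-.*)?|default|sql|wso)\.php$")


def hosted_matches(docroot: str) -> list[str]:
    """Return every file under docroot whose basename matches PATTERN,
    i.e. files that the <FilesMatch> section would make unreachable."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(docroot):
        for name in filenames:
            if PATTERN.search(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Placeholder usage: python check_filesmatch.py /var/www/example
    docroot = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in hosted_matches(docroot):
        print("would be denied:", path)
```

An empty result means none of the listed names are in use under that document root.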

And in the same fashion, I reject some directories I encountered multiple times.

<LocationMatch "^/((php)?(m|M)y(a|A)dmin|polycom|provisioning)/">
  Require all denied
</LocationMatch>

In that regular expression, the part that comes before ‘|polycom’ matches:

  • phpMyAdmin
  • phpmyadmin
  • phpMyadmin
  • phpmyAdmin
  • myadmin
  • MyAdmin
  • Myadmin
  • myAdmin
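To double-check that list, here’s a Python sketch that runs each variant through the pattern, assuming Python’s re treats this pattern like Apache’s PCRE. It also shows two cases that are not matched.

```python
import re

# The pattern from the <LocationMatch> above.
DIRS = re.compile(r"^/((php)?(m|M)y(a|A)dmin|polycom|provisioning)/")

variants = ["phpMyAdmin", "phpmyadmin", "phpMyadmin", "phpmyAdmin",
            "myadmin", "MyAdmin", "Myadmin", "myAdmin"]
for name in variants:
    assert DIRS.match(f"/{name}/"), name
print(f"all {len(variants)} variants match")

# Not matched: a capital P, and a request without the trailing slash.
for path in ["/PhpMyAdmin/", "/phpMyAdmin"]:
    print(path, "matches" if DIRS.match(path) else "does not match")
```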

To make my life even easier, I did not insert those lines directly into the VirtualHost sections of the Apache config. Instead, I created a separate file for each of the snippets above (some can logically be joined in a single file, like filenames starting with a dot and URLs starting with a dot), and then included those files where I needed them.

<VirtualHost *:443>
  ServerName placeholder.example.com
  Include /etc/apache2/skiddie.indexhtml.inc
  …
</VirtualHost>
<VirtualHost *:443>
  ServerName static.example.com
  Include /etc/apache2/skiddie.nophp.inc
  Include /etc/apache2/skiddie.dot.inc 
  Include /etc/apache2/skiddie.extensions.inc
  …
</VirtualHost>
<VirtualHost *:443>
  ServerName wp.example.com
  Include /etc/apache2/skiddie.dot.inc 
  Include /etc/apache2/skiddie.extensions.inc
  Include /etc/apache2/skiddie.cgi-bin.inc
  Include /etc/apache2/skiddie.thinkphp.inc
  Include /etc/apache2/skiddie.files.inc
  …
</VirtualHost>
…

This way, when I want to modify a collection, I only have to do it in one place, instead of editing it in one VirtualHost and then copying the changes to all the others.

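Think carefully about how these rules combine. Apache does not stop at the first matching section: it processes sections in a fixed order by type (<Directory> sections first, then <Files> and <FilesMatch>, then <Location> and <LocationMatch>, and finally <If>), within each type in the order they appear in the configuration, and it merges every section that matches. So each pattern is evaluated regardless of where you put it, which is another reason to keep the expensive regular expressions as small as possible.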

And finally, to complete the whole thing, I enabled mod_allowmethods in httpd.conf, and I added this line to the sites that do not have forms:

AllowMethods GET OPTIONS

And this line to the sites that do have forms:

AllowMethods GET POST OPTIONS

None of the other HTTP request methods are needed for regular websites, so there is no reason to enable them.

The error code for Method Not Allowed is 405; to create a custom error page, see the explanation above for 403 (Forbidden).

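Attention: AllowMethods is only valid in directory context, i.e. inside <Directory/>, <Location/> and similar sections (for example <Location "/">), not directly in a VirtualHost. Also, the method names are case sensitive.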

This should keep my Apache logs a bit more readable from now on.

I could have done all of the above with mod_rewrite, but I suspect this approach requires less memory and processing power.
That’s just a gut feeling, though; I haven’t benchmarked it.

REPUBLISHING TERMS

You may republish this article online or in print under our Creative Commons license. You may not edit or shorten the text, you must attribute the article to OhReally.nl and you must include the author’s name in your republication.

If you have any questions, please email rob@ohreally.nl

License

Creative Commons Attribution