Apache tips & tricks
Introduction
I think I do not really need to explain what Apache is (a web server; for those who expected something else: yes, there's also the Apache Foundation, standing behind this and other projects). I will also not explain here how to install it (if you're on Linux, it's probably already installed; or you can easily do so using your favorite package manager).
As Apache is one of the most used web servers, and comes almost out-of-the-box with most Linux distributions, there is plenty of documentation and there are many how-tos available on the net – so a basic setup should be no problem. That leaves the question:
What's this article about then?
If you're administering your own Apache install not only in your little local intranet, but serving your homepage "in the wild", you might soon face issues not always discussed in the above-mentioned documentation and how-tos. Sure, the features I will use here are documented; but often it's a question of putting things together.
I picked some tricks here and there. What I found useful, I implemented. Time to document what I did, so I can repeat it on a different server should the need arise. At the same time, I'm sharing my little pieces with you, in the hope you will find them useful and can pick the one or other idea from them.
What this article is NOT about
This article is neither a complete reference nor a complete guide or manual. As described, you will find some pieces of a larger puzzle here, not the full picture. So please don't complain about missing parts (but feel free to share them as well). If you're stuck somewhere, or need additional information, a really good place to ask your question will certainly be ServerFault.
Table of contents
- Introduction
- Logging Tricks
- Securing your Apache installation with mod_security
- Adjusting things with mod_rewrite
- Lock out stubborn bots/hackers with fail2ban
Logging Tricks
How basic logging is done is described in the Apache documentation (see the project's homepage to pick the documentation for the version of Apache you are running; for Apache 2.4 you can find the logging documentation e.g. here). But a few things I missed there, at least in this context – so those you'll find here:
Conditional logging
If your log files get too large, you get the feeling that "something has to be done about that" – especially when browsing them and seeing a lot of stuff you don't want there. As it happened to me:
Running a tiny little repository for Debian and RPM packages (to be found, with instructions for its usage, at apt.izzysoft.de) I suddenly noticed: the size of its Apache logs almost exceeds the amount of data transferred! How can that be? All those package managers (like yum, apt-get, aptitude, etc.) are constantly requesting things my repository does not contain, such as Spanish translations of the package descriptions, the package list compressed in formats I don't use/offer (lzma, xz), and other things.
Of course those requests caused an entry in the server's access_log ("404") and in the error_log ("File does not exist"). But those entries are neither helpful (I do know those files don't exist, and that's pretty fine with me) nor useful (except for filling the disk). So I wanted to get rid of them. For the access_log, a way is described in the documentation (look at the very end of the Access Log section, and you'll see a subsection titled "Conditional Logs"). So this was quite easy to establish, using the built-in mod_setenvif module. Picking the example from the aforementioned documentation:
# Mark requests for the robots.txt file, then log what remains
SetEnvIf Request_URI "^/robots\.txt$" dontlog
CustomLog logs/access_log common env=!dontlog
Unfortunately, this only works for the access_log – the ErrorLog directive does not accept the env=!dontlog parameter.
Conditional error-logging
This caused me a little headache, as the error_log was by far the larger of the two. I didn't want to adjust the LogLevel to suppress all 404 errors – I just wanted to get rid of those described above.
What finally did the trick was abusing mod_rewrite for the task: "If a requested file does not exist, and the URL matches a given pattern – send a 404 and stop processing" (i.e. catch the error before it's logged).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.+(/i18n|Packages\.(lzma|xz|bz2)|Packages.diff/Index)$ - [R=404,L]
This keeps non-existing files out of the error log, but only if they match the regular expression specified by RewriteRule. At the same time, should any of those files be added somewhere in the future, the RewriteCond would no longer apply to them – so no danger from that side. But as this fires only after the request was made (well, it hardly can do so before), it does not apply to the access_log. So if you want to keep things out of both files, you need to combine these approaches.
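To make that concrete, here is a minimal sketch combining both techniques for the repository example above (log path and the dontlog variable name are just placeholders):

# keep the noise out of the access_log (mod_setenvif)
SetEnvIf Request_URI "(/i18n|Packages\.(lzma|xz|bz2)|Packages\.diff/Index)$" dontlog
CustomLog /var/log/apache2/access_log common env=!dontlog

# keep the same requests out of the error_log (mod_rewrite)
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.+(/i18n|Packages\.(lzma|xz|bz2)|Packages\.diff/Index)$ - [R=404,L]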
Securing your Apache installation with mod_security
Investigating your log files, you will notice a lot of …
- Web Spiders (aka Crawlers; bots of search engines building up search indexes). Some you might welcome (as your pages should be found), which probably include Google and Bing – others you will find "less useful", as they are unlikely to bring you any benefit (e.g. localized Chinese search engines like Baidu and Sogou).
- More spiders which do not belong to the above-mentioned "search engines", but rather to companies who want to make a profit with your data (mostly SEO).
- A bunch of strange user agents not fitting in any of the two above categories.
- Obvious hacking attempts.
- Requests for URLs you never had (e.g. /_vti_bin or /wp-login.php – more hackers looking for vulnerable installations).
- Other strange things.
While the spiders might be just an annoyance eating your bandwidth, at least the hackers pose a security risk – for sure if you use one of the targeted applications on your server. So you should take precautions and secure your installation. One possibility to do so is using mod_security; a second is fail2ban (both can be used in combination; the latter will be discussed on the next page).
Installing mod_security
On a Linux system, that's a one-liner. On Debian, simply issue an
apt-get install libapache2-modsecurity
and the module is installed. To enable it, you can use Apache's a2enmod (and a2dismod to disable it) – or you simply symlink the two related files manually:
cd /etc/apache2/mods-enabled
ln -s ../mods-available/mod-security* .
With the next restart of your Apache server, mod_security will be enabled. (A more detailed installation description can be found e.g. at DigitalOcean.com.)
Configuring mod_security
mod_security already ships with a collection of rules which are available to you (you can find them in /usr/share/modsecurity-crs), but not enabled by default (for good reasons). Also by default, mod_security does not perform any actions (except for logging), so you can first analyze the impact it would have on your services.
You can find the documentation at the project site, explaining all configuration directives. I will not explain everything in depth here, but just give a few hints for a "fast start".
- In /etc/apache2/mods-available/mod-security.conf you see which config files are used (check for the Include clause). This usually includes /usr/share/modsecurity-crs/modsecurity_crs_10_setup.conf.
- To keep the original files as they are, create a directory within the Apache config location (e.g. /etc/apache2/mod_security), and copy the second file there. Replace the entry in mod-security.conf accordingly. Now we adjust the copied file:
- Without any of the CRS rule files activated, it should be safe to switch SecRuleEngine On. Take a look at the SecAuditEngine setting (it should be set to RelevantOnly first; as that still generates a huge amount of log data, you might want to switch it to Off once you see everything is working).
- You can play with the SecAuditLogParts setting. The first letters you probably want to remove are E and I, as they log the complete response and request data and thus produce the most log data. To investigate problems, those data might be helpful, though.
- Now you may carefully start including some of the core rulesets. This can be done either by including /usr/share/modsecurity-crs/activated_rules/*.conf and linking the files there – or by using your own directories in the above created /etc/apache2/mod_security location and copying the files there (so you could also modify them without losing the originals). A sketch of the resulting mod-security.conf follows below.
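As a rough orientation – paths and the module name follow the Debian layout described above, so treat this as a sketch rather than a drop-in file – the adjusted mod-security.conf could end up looking like this:

<IfModule security2_module>
    # our copy of the CRS setup file instead of the original
    Include /etc/apache2/mod_security/modsecurity_crs_10_setup.conf
    # the selected core rules, copied (or symlinked) into our own directory
    Include /etc/apache2/mod_security/base_rules/*.conf
    # local additions (bad URLs, spider bots, ...) – introduced further down
    Include /etc/apache2/mod_security/local/*.conf
</IfModule>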
Which of the core rules you will activate might very much depend on the web apps you are using, and several other criteria. I found even some of the base_rules problematic with some of my servers (too many false positives). But I can recommend including at least the rulesets number 20 (protocol violations; matches a lot of bad bots), 21 (protocol anomalies, similarly), 35 (known "bad bots"), and 45 (trojans). Number 60 (correlation) is purely informational and just produces more log; it might be interesting to play with, but that can be kept for (much) later. You might instead wish to include some of the optional rules (e.g. #42 against "comment spam", if you run a forum, serve "guest books", or the like), but for now you should keep your fingers off the "experimental rules".
Dealing with "false positives"
Especially in the beginning, you might encounter "false positives" – where regular (and legit) users are being blocked. Consult the error_log to see which rules are causing this (most of them log their ID). For a start, you can exclude troublesome rules via the SecRuleRemoveById directive, followed by one or more IDs. If the list gets longer, you might want to check whether some of them are easier excluded with SecRuleRemoveByTag or SecRuleRemoveByMsg. Keep track of what you exclude this way; once you've excluded (almost) all rules of a given file, you might rather prefer to no longer include that file instead.
An example could look like:
SecRuleRemoveById 960020 # blocks FooBar; crs_20, tag:PROTOCOL_VIOLATION/INVALID_HREQ
SecRuleRemoveById 960015 # blocks Baz; crs_21, tag:PROTOCOL_VIOLATION/MISSING_HEADER_ACCEPT
<LocationMatch "^/forum/sqlarea">
SecRuleRemoveByTag WEB_ATTACK/SQL_INJECTION # users post SQL examples here
</LocationMatch>
Those SecRuleRemoveBy* statements can be placed globally, per virtual host, or even inside a <LocationMatch> block, which offers a lot of flexibility.
A helpful practice is using the unique_id module (pre-installed with Apache) in connection with a custom error page (see also my post at StackOverflow). Set up Apache to use such a page for 403 errors via ErrorDocument 403 /error/403.php, and place a small PHP page at /usr/share/apache2/error/403.php.
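A minimal sketch of such a page – assuming mod_unique_id is active, so each request's unique id is available via $_SERVER['UNIQUE_ID']; the wording and markup here are just an example – could look like this:

<?php
// minimal 403 page – adjust wording and styling for your site
// mod_unique_id provides a unique id for each request via $_SERVER['UNIQUE_ID']
$msg = isset($_SERVER['UNIQUE_ID']) ? $_SERVER['UNIQUE_ID'] : 'n/a';
?>
<html>
 <head><title>Access denied</title></head>
 <body>
  <h1>Access denied</h1>
  <p>Your request was blocked by our security rules. If you feel this happened
     in error, please contact the webmaster and include the following code:</p>
  <p><strong><?php echo htmlspecialchars($msg); ?></strong></p>
 </body>
</html>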
Of course you should adjust the text (I kept it simple for easier understanding here) and polish it using some stylesheets. This is how it could look then:

The $msg (shown in the black box in the above screenshot) will display the unique id of the failed request, so you can easily find it in your logs. If possible, we would also like to include the id of the rule that was triggered – but recent versions of mod_security seem to no longer expose it to the PHP $_SERVER[] array. Have your users report those "cryptic codes", and it shouldn't be too hard to find out what happened (and maybe remove or adjust the rule).
Adding your own rules
Now we come to the interesting part: adding things specific to your installation. If you e.g. have no WordPress installation, you might want to kick those trying to access it – or simply have some fun with them. Let's see some possibilities:
First, we create a directory to hold our local adjustments. Corresponding to the above examples, let's choose /etc/apache2/mod_security/local. All the following files will be placed there – and of course we need to include this directory in our mod-security.conf.
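A single Include line should do it – assuming the paths chosen above:

# in /etc/apache2/mods-available/mod-security.conf
Include /etc/apache2/mod_security/local/*.conf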
Bad URLs
Make a file named bad_urls.data. Here you place all the bad URLs you've found being accessed, one per line. The file could look like:
/wp-content
/wp-login.php
/register.aspx
/tiki-register.php
Next, we need the matching rule. Let's put that into bad_urls.conf, to keep things clear:
SecRule REQUEST_URI "@pmFromFile bad_urls.data" "msg:'hacker blocked',severity:2"
Which basically means: whenever a requested URL contains one of our configured names, the client will be blocked, and a corresponding entry will occur in the error log (with severity 2 – this we will need on the next page for fail2ban). Instead of blocking, we could also have some fun:
SecRule REQUEST_URI "/_vti_bin" "redirect:http://www.microsoft.com/frontpage/"
In this example, somebody wanted to hack our FrontPage. Running a Linux server, one rarely uses FrontPage. But we want to please our visitors – so if he loves FrontPage that much, let's send him there …
Keep bad bots at bay
Now we go for the misbehaving spiders, which do not obey the robots.txt and overload our server with excessive requests. Same principle as with the bad URLs, so we first need a file containing strings from their user agents. Let's call this spiderbots.data; an extract might look like this:
360Spider
AhrefsBot
CareerBot
Ezooms
MacInroy Privacy Auditors
And as with the bad URLs, our corresponding spiderbots.conf basically can live with a single line:
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile spiderbots.data" "deny,log,ctl:auditEngine=Off,severity:2,msg:'Spiderbot blocked',status:403"
You might wonder about the ctl:auditEngine=Off part here: wouldn't noauditlog be enough? Basically yes, but only concerning this one rule. If the client triggered other rules additionally, those could still cause an audit entry to be generated. As we know whom we hit here, we are not interested in further details – so this directive takes care that no audit entry will be written under any circumstances when this rule was triggered.
Add your own bad bots to the spiderbots.data as you encounter them. A good source to check bots against is Wetena.Com (German), which includes some background and recommendations for the bots it knows about.
Deal with referrer spam
Guess for yourself:
SecRule REQUEST_HEADERS:REFERER "lease-a-seo.com" "msg:'Referer Spam blocked',ctl:auditEngine=Off,severity:2"
Of course you can use the above method with an external .data file to handle larger collections.
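For completeness, the file-based variant could be sketched like this (referer_spam.data is a made-up name; fill it with one spam domain per line, analogous to bad_urls.data):

SecRule REQUEST_HEADERS:REFERER "@pmFromFile referer_spam.data" "msg:'Referer Spam blocked',ctl:auditEngine=Off,severity:2"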
Adjusting things with mod_rewrite
Next to mod_security, which we [just dealt with](#modsec), mod_rewrite is another mighty enhancement to your Apache setup. You already saw some examples when I introduced some logging tricks; but on this page we want to deal with the module directly.
General things
The two probably most used directives of mod_rewrite are RewriteCond and RewriteRule. Both use regular expressions (don't worry too much if you're not familiar with those; always keep in mind that a normal string is the simplest RegExp, so you can do it!), and both take (at least) two parameters. While with RewriteCond they serve comparison purposes ("on the condition that X matches Y …"), for RewriteRule it's rather "replace X by Y". Both are intended to go together ("on the condition … apply this rule"), but RewriteRule can also be used stand-alone (no condition, always do).
With regular expressions, you can even use so-called "back references" – that is, re-using parts of the matched string in your replacement string, the second parameter to RewriteRule. Very useful for dynamic actions, and we will see this with the examples further down this page. Then, there are even variables you can use to e.g. find out where your visitor came from (%{HTTP_REFERER}), or which of your virtual hosts he was targeting (%{HTTP_HOST}) – but that's sometimes the first pitfall when writing your first rules.
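To make back-references concrete before the bigger examples below, a minimal sketch (with made-up paths): whatever the parentheses match can be re-used as $1 in the substitution.

# redirect /old/anything to /new/anything, keeping the matched part via $1
RewriteRule ^/old/(.+)$ /new/$1 [R=301,L]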
Some pitfalls
I already mentioned there are some when it comes to variables: don't try to directly compare them (to e.g. see whether your HTTP_HOST is part of the HTTP_REFERER) – this doesn't work. Those variables can only be used on the "left side of the equation". There are work-arounds for that, though, and I will show some.
Another trap might happen when combining RewriteCond with multiple RewriteRule statements. Again, that doesn't work. While you can combine multiple conditions (as we will see), those will be "forgotten" after the next executed RewriteRule. Again, I will show a work-around for that.
Some basic examples
Now let's take a look at some examples to get you started with the basic syntax:
RewriteRule ^/old_location /new/location
That looks easy, doesn't it? A good use case when you moved something to a different place in your web tree. The only "strange looking" part for beginners might be the ^, which simply states "the start of the requested URL" here. So if your visitor requests http://your.server/old_location (which no longer exists), he wouldn't see an error page, but instead end up as if he had requested http://your.server/new/location. Guess I do not need to explain the usefulness.
Now, let's apply some additional magic:
RewriteRule ^/old_location /new/location [R=301]
With our example of a permanently moved location, this would be the more proper approach. Now, mod_rewrite no longer "silently rewrites" the requested URL, but rather sends a 301 response code to the client, telling him that the resource has been moved permanently. What good is this for? Search engines forget the old URL, and use the new one instead. Browsers should act similarly, no longer requesting the old URL but directly going to the new one.
An example of combining RewriteRule with RewriteCond we already discussed with conditional error logging:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.+(/i18n|Packages\.(lzma|xz|bz2)|Packages.diff/Index)$ - [R=404,L]
Even if you possibly skipped the logging page, you will still recognize the [R=404] part from the previous example, this time combined with an additional L marking this as the Last rule to be processed for this request: on the condition that the requested file does not exist (RewriteCond %{REQUEST_FILENAME} !-f, with !-f saying "it's not a file we have"), and the requested name matches the expression in our RewriteRule, issue a 404 response and cancel the request (so the error doesn't make it into Apache's error log).
Deal with DeepLinks
The entire web is based on links, and that's a good thing. Even if the links don't stay on your site, they might hint to further information, or reference a source you've used, giving it credit. But links can also be used in a completely different manner:
Ever heard about DeepLinking? In short and easy words: there's a really nice website with beautiful pictures and useful resources, making the impression its owner did a really good job. They all seem to be served on that site – that must be gigabytes of storage used! But in fact, that server sums up to a few hundred kilobytes, if at all. The content is neither theirs, nor really served by them.
As this may affect us, let's construct an example to visualize the facts: there is your.server with the real stuff created by your hard work, and there is their.server. Taking a look at their HTML sources, you will notice the IMG tags look like <IMG SRC='http://your.server/your.jpg'>, and the other good resources are downloadable via http://your.server/your.zip in the easy cases – or download-123.html (they then use mod_rewrite themselves to have the visitor load the file from your site, while he still thinks it is served by them), or download.php?id=123 to do the same via scripts. So you not only had the hard work, but you also pay for their traffic, while they save on storage and CPU power at the same time!
Now you might think that's a very bad thing (it's bad behavior, true). And you want to punish them. You might even have good reasons to do so, thinking of copyright and competition law. But what a headache! You've got to run for a lawyer. It will take time, costs a lot of nerves, and will prove difficult especially with "the others" being in a different country. And chances are you won't see any compensation.
Deeplinked images
But you would approach the issue in a totally wrong way! Are you doing business? Even if not, you know that business consists of making and accepting offers. Well: you just got an offer. By loading pictures directly from your site, they invited you to provide them with pictures. Take the offer – and serve what they deserve! No, I don't mean offending material. But isn't it nice of them to offer you free advertising? Your banners on their site, everywhere, totally for free! Can you reject such an offer? I couldn't resist it, and so I did:
RewriteCond %{HTTP_REFERER} their\.site
RewriteCond %{REQUEST_URI} !^/special_stuff
RewriteRule \.(jpg|png|gif)$ /special_stuff/visit_my_site.jpg [L]
Short explanation necessary?
- RewriteCond %{HTTP_REFERER} their\.site: if the referer contains their server name
- RewriteCond %{REQUEST_URI} !^/special_stuff: (and) the requested URL does not start with where my banners reside
- RewriteRule \.(jpg|png|gif)$ /special_stuff/visit_my_site.jpg [L]: rewrite all requests for images (URL ending with .jpg, .png, or .gif) to the specified banner image
That's pretty cool! With that, I started loving such deep-linkers. Now, what will happen? Won't they update their site pretty fast? Chances are it will rather take months before they realize. As you saw, that code fires only if their site is in the referer – which it was not when they saw the image on your site, and decided to use it on theirs. After that, when they checked the result, the browser didn't load it again (it was already in its cache). So all looks fine to them.
Detecting DeepLinks
So how do you figure out whether you're affected by DeepLinks? You can see that in your Apache logs. Again, an example:
1.2.3.4 - [07/Nov/2013:17:04:59 +0100] "GET /some/picture.jpg HTTP/1.1" 200 78116 64852 "http://their.site/page.html" "Mozilla..."
92.76.238.176 - [07/Nov/2013:17:04:59 +0100] "GET /some/picture.jpg HTTP/1.1" 200 78116 64852 "https://www.google.de/" "Mozilla..."
You see, the first line shows a DeepLink from their site. And in the second line you will notice even Google does that (probably on their image search pages); if you don't mind people finding your images via that search provider, what better free advertisement place than that! But if you don't care about such collateral damage, and only want to show your pictures to your own visitors (and to direct requests, as anonymizer addons might clear the referer), feel free to adjust your RewriteCond accordingly. But first a little helper for you, so you don't have to check tons of log files manually:
#!/bin/bash
# Check access_log for Deeplinks
function syntax() {
  echo
  echo "Syntax:"
  echo "  $0 <linkterm> <HTTP_HOST>"
  echo "Example:"
  echo "  $0 jpg your.site"
  echo
  exit
}

[ -z "$2" ] && syntax

egrep "$1.+\"\s+[[:digit:]]+\s+[[:digit:]]+\s+[[:digit:]]+\s+\"http" access_log | grep -v $2
Called without any parameters, this shell script will explain itself: it expects two parameters. The first is the link term you're looking for, the second the host name of your Apache (virtual) host. The regular expression then matches all log file entries for requests coming from other websites (referer starts with http), and filters out your own server's host name. Which should result in a listing of all deeplinked requests.
Deeplinked downloads
Don't be afraid, I didn't forget the second candidate: your archives. Let's use a catch-'em-all here:
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1/.*
RewriteCond %{QUERY_STRING} download\.php?category=([a-z]+)&id=([0-9]+)
RewriteRule /files /mystuff/%1/index.php#id_%2 [NE,R,L]
Oh, that was wicked of me: I've loaded that example with tricky stuff. But the good guy I am, I'm going to explain:
- RewriteCond %{HTTP_REFERER} !^$: ignore empty referers (direct downloads; remember anonymizers)
- RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1/.*: the referer must match your host (see pitfalls: as variables can only be used on the left-hand side, we trick it with a back-referencing RegExp)
- RewriteCond %{QUERY_STRING} download\.php?category=([a-z]+)&id=([0-9]+): the request matches our downloader script
- RewriteRule /files /mystuff/%1/index.php#id_%2 [NE,R,L]: assuming download.php resides somewhere in a /files directory, this will rewrite the URL to the category's index
A few more words might be required on the second line's RegExp: what does !^([^@]*)@@https?://\1/.* do? The ^ we already know to stand for the start of the string. ([^@]*) means "any sequence of characters not containing the @ sign" (i.e. our %{HTTP_HOST}), and https? matches "http", optionally followed by an "s". Finally, \1 is a back-reference to the first term in parentheses (again, our %{HTTP_HOST}). And the ! in front inverts the condition: "all requests where the referer host does not match our host".
The third line also "groups" parts with parentheses for back-referencing: the category (consisting of one or more letters), and the file id (made up of one or more digits). These we back-reference in the last line by %1 and %2. My example assumes each category has a corresponding directory directly below /mystuff, where an index.php resides. In this index, each file's entry is anchored using something like <A NAME="id_123">. You got the idea. But we needed some flags here: [NE] to deal with the # literally (otherwise it would get URL-escaped; [NE] means "do Not Escape"), the other two we already dealt with in previous examples.
If your setup is different, the last two lines might simply look like
RewriteCond %{REQUEST_URI} ^/files/.+\.zip$
RewriteRule . /files/index.html [R,L]
In other words: rewrite all requests to /files/*.zip to /files/index.html using a [R] redirect, and consider this your [L]ast act.
Creating IF-THEN-ELSE sets
In pitfalls I explained that a RewriteCond is "forgotten" once the first RewriteRule was processed – and promised a work-around. You find the following explained in the mod_rewrite documentation itself: the [S]kip flag can be used to construct IF-THEN-ELSE sets:
# Does the file exist?
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Create an if-then-else construct by skipping 3 lines if we meant to go to the "else" stanza.
RewriteRule .? - [S=3]
# IF the file exists, then:
RewriteRule (.*\.gif) images.php?$1
RewriteRule (.*\.html) docs.php?$1
# Skip past the "else" stanza.
RewriteRule .? - [S=1]
# ELSE...
RewriteRule (.*) 404.php?file=$1
# END
Testing and debugging
Now you've created your set of rewrite rules, and surely want to know whether they work as expected. Sure, you could wait for the first visitor to hit one of your definitions. But that might take a while; and if you have to adjust, this might take weeks to finish. You could use your browser to see for yourself: but the deeplinked image is probably already in your browser's cache as well (permanently deleting the cache is not that convenient), and how to deal with referer-based rules? There must be a better way.
Testing
The tool of my choice for this is Wget, which comes by default with every Linux distribution I've seen. You can see the man page (or check the online manual) to explore its full flexibility; but I will give you some examples here for the things we've discussed above:
# Is your moved page accessible via its old location?
wget http://your.server/old_location
# Are your non-existing files kept out of your error_log?
wget http://your.server/Packages.xz
# Try the deeplinked image
wget --referer=http://deep.linkers.com/deeplink.html http://your.server/image.jpg
# We had no example for that, but you might have user-agent based rules
wget --user-agent=Foobar http://your.server/some_page.html
Debugging
But what if you cannot figure out what goes wrong? How to debug? For this, mod_rewrite offers its own logging. But unlike with mod_security, you cannot switch it per rule: logging is either enabled or disabled, and the log level is fixed to whatever you set it to. So first a few general rules:
- As rewrite logging produces noticeable overhead, you should not enable it globally, but rather per VHost, and only if needed
- Keep the log level as low as possible; the higher you set it, the bigger the generated log files, and the larger the overhead
- Log levels above 2 should better not be enabled on production systems, as performance will decrease noticeably
- Switch rewrite logging off as soon as you no longer need it
You can define the log file to write to using the RewriteLog keyword, passing it the file name – e.g. RewriteLog /var/log/apache2/rewrite.log. This way it's kept separate from your regular Apache log files, and you can simply delete it when done. The log level is defined in a similar manner, using the RewriteLogLevel keyword followed by the log level, which is an integer between 0 and 9 (currently only levels 0-4 are used). Each log level includes its lower siblings; so log level 2 would include everything from 0 and 1 as well. To give you an idea of what to expect:
Level | Amount of log | Content |
---|---|---|
0 | Nothing | Nothing (logging switched off) |
1 | 1 line per request | Result of rewrite processing ¹ |
2 | 0 or multiple lines per request | Rule steps processed ² |
3 | 0 or multiple lines | Intermediate steps, depending on rule |
4 | Multiple lines per request | Each rule is listed, and marked whether it matched or not ³ |
You can see that log level 4 would be perfect for debugging – on a server only you yourself access. On a heavily used production server, it would not only degrade performance, but the resulting log would be hard to catch up with.
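Putting it together, a temporary debugging setup inside a VHost could look like the sketch below (file name and level are arbitrary). Note that RewriteLog and RewriteLogLevel exist up to Apache 2.2; on Apache 2.4 the rewrite log was merged into the error log and is enabled with something like LogLevel rewrite:trace2 instead.

<VirtualHost *:80>
    ServerName your.server
    # ... regular VHost configuration ...

    RewriteEngine On
    # temporary, for debugging only – remove (or comment out) when done
    RewriteLog /var/log/apache2/rewrite.log
    RewriteLogLevel 2
</VirtualHost>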
Lock out stubborn bots/hackers with fail2ban
Fail2Ban helps to keep attackers away. As the name suggests: too many "fails", and they get "banned". To accomplish this, iptables is used: the attacker's IP gets blocked at IP level, so he can no longer reach the service. The admin can configure how many fails are "too many", and how long the block should be upheld.
Fail2Ban can be installed on Debian systems by simply invoking apt-get install fail2ban, and ships with several ready-to-use "jails". The first thing to enable should be the SSH jail: 6 failed login attempts from the same IP, and that IP is banned for 10 minutes (of course you can adjust those values).
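For illustration, the SSH jail section shipped in /etc/fail2ban/jail.conf looked roughly like this on my Debian system (section name, log path and defaults may differ between fail2ban versions):

[ssh]
enabled  = true
port     = ssh
filter   = sshd
logpath  = /var/log/auth.log
maxretry = 6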
Pre-configured apache jails
Fail2Ban also ships with some pre-configured jails for Apache:
- apache-auth: parses the error_log for failed HTTP AUTH attempts
- apache-badbots: blocks known bad bots. Add your own in /etc/fail2ban/filter.d/apache-badbots.conf
- apache-overflows: same for too long URLs
Adding your own jails
To establish your own custom jails, two things are needed: a filter definition in the /etc/fail2ban/filter.d directory, and a corresponding entry in /etc/fail2ban/jail.conf. I will show you some examples here:
Bots abusing the robots.txt
In your robots.txt you can define which URLs a robot should not visit. Some hackers use that to find out where the "interesting stuff" might be, and explicitly request those URLs. If they are not linked from any of your pages, you could use this as a trap:
User-agent: *
Disallow: /here_are_dragons
Needless to say: I don't have that URL on any of my servers, which is fine. So let's make use of that trap. First, we need our /etc/fail2ban/filter.d/apache-robots-txt.conf:
[Definition]
failregex = ^<HOST> -.*"(GET|POST) /here_are_dragons
ignoreregex =
Now we add the jail:
[apache-robots-txt]
# Crawlers taking the robots.txt and then just crawl the disallowed URLs
enabled = true
port = http,https
filter = apache-robots-txt
maxretry = 1
logpath = /var/log/apache2/access_log
This definition is quite self-explanatory: the jail is enabled for the http and https ports, uses our filter, and bans any access attempt at its first occurrence in Apache's access_log (note: you can use wildcards for the log file if you use separate logs per vhost). Blocking time uses the default values (see the top of jail.conf); if you want different values, just add them here.
Blocking bad guys via mod_security
In our mod_security examples, we established our own rules with severity:2 (aka CRITICAL). Several other rules use this as well for "real bad things"; for even worse things, there are also severities 1 (ALERT) and 0 (EMERGENCY). So let's jail those attackers as well. We start with /etc/fail2ban/filter.d/mod-security.conf:
[Definition]
failregex = \[client <HOST>\] .* \[severity \"ALERT\"\]
\[client <HOST>\] .* \[severity \"EMERGENCY\"\]
\[client <HOST>\] .* \[severity \"CRITICAL\"\]
ignoreregex =
Next, we add the corresponding jail to /etc/fail2ban/jail.conf:
[apache-mod-security]
# ban IP if mod_security issues an EMERGENCY, ALERT, or CRITICAL (severity 0..2)
enabled = true
port = http,https
filter = mod-security
logpath = /var/log/apache2/error_log
maxretry = 1
bantime = 1800
So if someone triggered such "intrusion alert", he cannot access our web service for the next 1800 seconds (30 minutes).
Did we succeed?
A tool I can really recommend to monitor your server is Monitorix. It can not only monitor your system's health status (system load, CPU usage, disk activities, and more), but also your Apache, MySQL, and – here it comes – Fail2Ban activities. So you could use it to check whether your rules and jails had some hits, which could look like this:

¹ e.g. "pass through <REQUEST_URI>" when no rule hit, "go-ahead with <NEW_URI>" otherwise
² on a simple rewrite, this could be 3 lines: "rewrite old -> new", "local path result <URI>", "prefixed with document_root to <full file path>"
³ see the mod_rewrite Wiki for details