Apache tips & tricks
Introduction
I think I do not really need to explain what Apache is (a web server; for those who expected something else: yes, there's also the Apache Foundation, standing behind this and other projects). I will also not explain here how to install it (if you're on Linux, it's probably already installed; or you can easily do so using your favorite package manager).
As Apache is one of the most used web servers, and comes almost out-of-the-box with most Linux distributions, there is plenty of documentation and there are many how-tos available on the net – so a basic setup should be no problem. That leaves the question:
What's this article about then?
If you're administering your own Apache install not only in your little local intranet, but serving your homepage "in the wild", you might soon face issues not always discussed in the above-mentioned documentation and how-tos. Sure, the features I will use here are documented; but often it's a question of putting things together.
I picked some tricks here and there. What I found useful, I implemented. Time to document what I did, so I can repeat it on a different server should the need arise. At the same time, I'm sharing my little pieces with you, in the hope you will find them useful and can pick the one or other idea from them.
What this article is NOT about
This article is neither a complete reference nor a complete guide or manual. As described, you will find some pieces of a larger puzzle here, not the full picture. So please don't complain about missing parts (but feel free to share them as well). If you're stuck somewhere, or need additional information, a really good place to ask your question will certainly be ServerFault.
Table of contents
- Introduction
- Logging Tricks
- Securing your Apache installation with mod_security
- Adjusting things with mod_rewrite
- Lock out stubborn bots/hackers with fail2ban
Logging Tricks
How basic logging is done is described in the Apache documentation (see the project's homepage to pick the documentation for the version of Apache you are running; for Apache 2.4 you can find the logging documentation e.g. here). But a few things I missed there, at least in this context – so those you'll find here:
Conditional logging
If your log files get too large, you get the feeling that "something has to be done about that" – especially when browsing them and seeing a lot of stuff you don't want there. As it happened to me:
Running a tiny little repository for Debian and RPM packages (to be found, with instructions for its usage, at apt.izzysoft.de) I suddenly noticed: the size of its Apache logs almost exceeds the amount of data transferred! How can that be? All those package managers (like yum, apt-get, aptitude, etc.) are constantly requesting things my repository does not contain, such as Spanish translations of the package descriptions, the package list compressed in formats I don't use/offer (lzma, xz), and other things.
Of course those requests caused an entry in the server's access_log ("404") and in the error_log ("File does not exist"). But those entries are neither helpful (I do know those files don't exist, and that's pretty fine with me) nor useful (except for filling the disk). So I wanted to get rid of them. For the access_log, a way is described in the documentation (look at the very end of the Access Log section, and you'll see a subsection titled "Conditional Logs"). So this was quite easy to establish, using the built-in mod_setenvif module. Picking the example from the aforementioned documentation:
# Mark requests for the robots.txt file, then log what remains
SetEnvIf Request_URI "^/robots\.txt$" dontlog
CustomLog logs/access_log common env=!dontlog
Unfortunately, this only works for the access_log – the ErrorLog directive does not accept the env=!dontlog parameter.
Conditional error-logging
This caused me a little headache, as the error_log was by far the larger of the two. I didn't want to adjust the LogLevel to suppress all 404 errors – I just wanted to get rid of those described above.
What finally did the trick was abusing mod_rewrite for the task: "If a requested file does not exist, and the URL matches a given pattern – send a 404 and stop processing" (i.e. catch the error before it's logged).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.+(/i18n|Packages\.(lzma|xz|bz2)|Packages.diff/Index)$ - [R=404,L]
This keeps non-existing files out of the error log, but only if they match the regular expression specified by RewriteRule. At the same time, should any of those files be added somewhere in the future, the RewriteCond would no longer apply to them – so no danger from that side. But as this fires only after the request was made (well, it hardly can do so before), it does not apply to the access_log. So if you want to keep things out of both files, you need to combine these approaches.
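To make that concrete, here is a minimal sketch combining both techniques for the repository example above (log path and the dontlog variable name are just placeholders):

# keep the noise out of the access_log (mod_setenvif)
SetEnvIf Request_URI "(/i18n|Packages\.(lzma|xz|bz2)|Packages\.diff/Index)$" dontlog
CustomLog /var/log/apache2/access_log common env=!dontlog

# keep the same requests out of the error_log (mod_rewrite)
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.+(/i18n|Packages\.(lzma|xz|bz2)|Packages\.diff/Index)$ - [R=404,L]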
Securing your Apache installation with mod_security
Investigating your log files, you will notice a lot of …
- Web Spiders (aka Crawlers; bots of search engines building up search indexes). Some you might welcome (as your pages should be found), which probably include Google and Bing – others you will find "less useful", as they are unlikely to bring you any benefit (e.g. localized Chinese search engines like Baidu and Sogou).
- More spiders which do not belong to the above-mentioned "search engines", but rather to companies who want to make a profit with your data (mostly SEO).
- A bunch of strange user agents not fitting in any of the two above categories.
- Obvious hacking attempts.
- Requests for URLs you never had (e.g. /_vti_bin or /wp-login.php – more hackers looking for vulnerable installations).
- Other strange things.
While the spiders might be just an annoyance eating your bandwidth, at least the hackers pose a security risk – for sure if you use one of the targeted applications on your server. So you should take precautions and secure your installation. One possibility to do so is using mod_security; a second is fail2ban (both can be used in combination; the latter will be discussed on the next page).
Installing mod_security
On a Linux system, that's a one-liner. On Debian, simply issue an
apt-get install libapache2-modsecurity
and the module is installed. To enable it, you can use Apache's a2enmod (and a2dismod to disable it) – or you simply symlink the two related files manually:
cd /etc/apache2/mods-enabled
ln -s ../mods-available/mod-security* .
With the next restart of your Apache server, mod_security will be enabled. (A more detailed installation description can be found e.g. at DigitalOcean.com.)
Configuring mod_security
mod_security already ships with a collection of rules which are available to you (you can find them in /usr/share/modsecurity-crs), but not enabled by default (for good reasons). Also by default, mod_security does not perform any actions (except for logging), so you can first analyze the impact it would have on your services.
You can find the documentation at the project site, explaining all configuration directives. I will not explain everything in depth here, but just give a few hints for a "fast start".
- In /etc/apache2/mods-available/mod-security.conf you see which config files are used (check for the Include clause). This usually includes /usr/share/modsecurity-crs/modsecurity_crs_10_setup.conf.
- To keep the original files as they are, create a directory within the Apache config location (e.g. /etc/apache2/mod_security), and copy the second file there. Replace the entry in mod-security.conf accordingly. Now we adjust the copied file:
- Without any of the CRS rule files activated, it should be safe to switch SecRuleEngine On. Take a look at the SecAuditEngine setting (it should be set to RelevantOnly first; as that still generates a huge amount of log data, you might want to switch it to Off once you see everything is working).
- You can play with the SecAuditLogParts setting. The first letters you probably want to remove are E and I, as they log the complete response and request data and thus produce the most log data. To investigate problems, those data might be helpful, though.
- Now you may carefully start including some of the core rulesets. This can be done either by including /usr/share/modsecurity-crs/activated_rules/*.conf and linking the files there – or by using your own directories in the above created /etc/apache2/mod_security location and copying the files there (so you could also modify them without losing the originals). A sketch of the resulting mod-security.conf follows below.
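As a rough orientation – paths and the module name follow the Debian layout described above, so treat this as a sketch rather than a drop-in file – the adjusted mod-security.conf could end up looking like this:

<IfModule security2_module>
    # our copy of the CRS setup file instead of the original
    Include /etc/apache2/mod_security/modsecurity_crs_10_setup.conf
    # the selected core rules, copied (or symlinked) into our own directory
    Include /etc/apache2/mod_security/base_rules/*.conf
    # local additions (bad URLs, spider bots, ...) – introduced further down
    Include /etc/apache2/mod_security/local/*.conf
</IfModule>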
Which of the core rules you will activate might very much depend on the web apps you are using, and several other criteria. I found even some of the base_rules problematic with some of my servers (too many false positives). But I can recommend including at least the rulesets number 20 (protocol violations; matches a lot of bad bots), 21 (protocol anomalies, similarly), 35 (known "bad bots"), and 45 (trojans). Number 60 (correlation) is purely informational and just produces more log; it might be interesting to play with, but that can be kept for (much) later. You might instead wish to include some of the optional rules (e.g. #42 against "comment spam", if you run a forum, serve "guest books", or the like), but for now you should keep your fingers off the "experimental rules".
Dealing with "false positives"
Especially in the beginning, you might encounter "false positives" – where regular (and legit) users are being blocked. Consult the error_log to see which rules are causing this (most of them log their ID). For a start, you can exclude troublesome rules via the SecRuleRemoveById directive, followed by one or more IDs. If the list gets longer, you might want to check whether some of them are easier excluded with SecRuleRemoveByTag or SecRuleRemoveByMsg. Keep track of what you exclude this way; once you've excluded (almost) all rules of a given file, you might rather prefer to no longer include that file instead.
An example could look like:
SecRuleRemoveById 960020 # blocks FooBar; crs_20, tag:PROTOCOL_VIOLATION/INVALID_HREQ
SecRuleRemoveById 960015 # blocks Baz; crs_21, tag:PROTOCOL_VIOLATION/MISSING_HEADER_ACCEPT
<LocationMatch "^/forum/sqlarea">
SecRuleRemoveByTag WEB_ATTACK/SQL_INJECTION # users post SQL examples here
</LocationMatch>
Those SecRuleRemoveBy* statements can be placed globally, per virtual host, or even inside a <LocationMatch> block, which offers a lot of flexibility.
A helpful practice is using the unique_id module (pre-installed with Apache) in connection with a custom error page (see also my post at StackOverflow). Set up Apache to use such a page for 403 errors via ErrorDocument 403 /error/403.php, and place a small PHP page at /usr/share/apache2/error/403.php.
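A minimal sketch of such a page – assuming mod_unique_id is active, so each request's unique id is available via $_SERVER['UNIQUE_ID']; the wording and markup here are just an example – could look like this:

<?php
// minimal 403 page – adjust wording and styling for your site
// mod_unique_id provides a unique id for each request via $_SERVER['UNIQUE_ID']
$msg = isset($_SERVER['UNIQUE_ID']) ? $_SERVER['UNIQUE_ID'] : 'n/a';
?>
<html>
 <head><title>Access denied</title></head>
 <body>
  <h1>Access denied</h1>
  <p>Your request was blocked by our security rules. If you feel this happened
     in error, please contact the webmaster and include the following code:</p>
  <p><strong><?php echo htmlspecialchars($msg); ?></strong></p>
 </body>
</html>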
Of course you should adjust the text (I kept it simple for easier understanding here) and polish it using some stylesheets. This is how it could look then:

The $msg (shown in the black box in the above screenshot) will display the unique id of the failed request, so you can easily find it in your logs. If possible, we would also like to include the id of the rule that was triggered – but recent versions of mod_security seem to no longer expose it to the PHP $_SERVER[] array. Have your users report those "cryptic codes", and it shouldn't be too hard to find out what happened (and maybe remove or adjust the rule).
Adding your own rules
Now we come to the interesting part: adding things specific to your installation. If you e.g. have no WordPress installation, you might want to kick those trying to access it – or simply have some fun with them. Let's see some possibilities:
First, we create a directory to hold our local adjustments. Corresponding to the above examples, let's choose /etc/apache2/mod_security/local. All the following files will be placed there – and of course we need to include this directory in our mod-security.conf.
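A single Include line should do it – assuming the paths chosen above:

# in /etc/apache2/mods-available/mod-security.conf
Include /etc/apache2/mod_security/local/*.conf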
Bad URLs
Make a file named bad_urls.data. Here you place all the bad URLs you've found being accessed, one per line. The file could look like:
/wp-content
/wp-login.php
/register.aspx
/tiki-register.php
Next, we need the matching rule. Let's put that into bad_urls.conf, to keep things clear:
SecRule REQUEST_URI "@pmFromFile bad_urls.data" "msg:'hacker blocked',severity:2"
Which basically means: whenever a requested URL contains one of our configured names, the client will be blocked, and a corresponding entry will occur in the error log (with severity 2 – this we will need on the next page for fail2ban). Instead of blocking, we could also have some fun:
SecRule REQUEST_URI "/_vti_bin" "redirect:http://www.microsoft.com/frontpage/"
In this example, somebody wanted to hack our FrontPage. Running a Linux server, one rarely uses FrontPage. But we want to please our visitors – so if he loves FrontPage that much, let's send him there …
Keep bad bots at bay
Now we go for the misbehaving spiders, which do not obey the robots.txt and overload our server with excessive requests. Same principle as with the bad URLs, so we first need a file containing strings from their user agents. Let's call this spiderbots.data; an extract might look like this:
360Spider
AhrefsBot
CareerBot
Ezooms
MacInroy Privacy Auditors
And as with the bad URLs, our corresponding spiderbots.conf basically can live with a single line:
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile spiderbots.data" "deny,log,ctl:auditEngine=Off,severity:2,msg:'Spiderbot blocked',status:403"
You might wonder about the ctl:auditEngine=Off part here: wouldn't noauditlog be enough? Basically yes, but only concerning this one rule. If the client triggered other rules additionally, those could still cause an audit entry to be generated. As we know whom we hit here, we are not interested in further details – so this directive takes care that no audit entry will be written under any circumstances when this rule was triggered.
Add your own bad bots to the spiderbots.data as you encounter them. A good source to check bots against is Wetena.Com (German), which includes some background and recommendations for the bots it knows about.
Deal with referrer spam
Guess for yourself:
SecRule REQUEST_HEADERS:REFERER "lease-a-seo.com" "msg:'Referer Spam blocked',ctl:auditEngine=Off,severity:2"
Of course you can use the above method with an external .data file to handle larger collections.
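For completeness, the file-based variant could be sketched like this (referer_spam.data is a made-up name; fill it with one spam domain per line, analogous to bad_urls.data):

SecRule REQUEST_HEADERS:REFERER "@pmFromFile referer_spam.data" "msg:'Referer Spam blocked',ctl:auditEngine=Off,severity:2"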
Adjusting things with mod_rewrite
Next to mod_security, which we [just dealt with](#modsec), mod_rewrite is another mighty enhancement to your Apache setup. You already saw some examples when I introduced some logging tricks; but on this page we want to deal with the module directly.
General things
The two probably most used directives of mod_rewrite are RewriteCond and RewriteRule. Both use regular expressions (don't worry too much if you're not familiar with those; always keep in mind that a normal string is the simplest RegExp, so you can do it!), and both take (at least) two parameters. While with RewriteCond they serve comparison purposes ("on the condition that X matches Y …"), for RewriteRule it's rather "replace X by Y". Both are intended to go together ("on the condition … apply this rule"), but RewriteRule can also be used stand-alone (no condition, always do).
With regular expressions, you can even use so-called "back references" – that is, re-using parts of the matched string in your replacement string, the second parameter to RewriteRule. Very useful for dynamic actions, and we will see this with the examples further down this page. Then, there are even variables you can use to e.g. find out where your visitor came from (%{HTTP_REFERER}), or which of your virtual hosts he was targeting (%{HTTP_HOST}) – but that's sometimes the first pitfall when writing your first rules.
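To make back-references concrete before the bigger examples below, a minimal sketch (with made-up paths): whatever the parentheses match can be re-used as $1 in the substitution.

# redirect /old/anything to /new/anything, keeping the matched part via $1
RewriteRule ^/old/(.+)$ /new/$1 [R=301,L]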
Some pitfalls
I already mentioned there are some when it comes to variables: don't try to directly compare them (to e.g. see whether your HTTP_HOST is part of the HTTP_REFERER) – this doesn't work. Those variables can only be used on the "left side of the equation". There are work-arounds for that, though, and I will show some.
Another trap might happen when combining RewriteCond with multiple RewriteRule statements. Again, that doesn't work. While you can combine multiple conditions (as we will see), those will be "forgotten" after the next executed RewriteRule. Again, I will show a work-around for that.
Some basic examples
Now let's take a look at some examples to get you started with the basic syntax:
RewriteRule ^/old_location /new/location
That looks easy, doesn't it? A good use case when you moved something to a different place in your web tree. The only "strange looking" part for beginners might be the ^, which simply states "the start of the requested URL" here. So if your visitor requests http://your.server/old_location (which no longer exists), he wouldn't see an error page, but instead end up as if he had requested http://your.server/new/location. Guess I do not need to explain the usefulness.
Now, let's apply some additional magic:
RewriteRule ^/old_location /new/location [R=301]
With our example of a permanently moved location, this would be the more proper approach. Now, mod_rewrite no longer "silently rewrites" the requested URL, but rather sends a 301 response code to the client, telling him that the resource has been moved permanently. What good is this for? Search engines forget the old URL, and use the new one instead. Browsers should act similarly, no longer requesting the old URL but directly going to the new one.
An example of combining RewriteRule with RewriteCond we already discussed with conditional error logging:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.+(/i18n|Packages\.(lzma|xz|bz2)|Packages.diff/Index)$ - [R=404,L]
Even if you possibly skipped the logging page, you will still recognize the [R=404] part from the previous example, this time combined with an additional L marking this as the Last rule to be processed for this request: on the condition that the requested file does not exist (RewriteCond %{REQUEST_FILENAME} !-f, with !-f saying "it's not a file we have"), and the requested name matches the expression in our RewriteRule, issue a 404 response and cancel the request (so the error doesn't make it into Apache's error log).
Deal with DeepLinks
The entire web is based on links, and that's a good thing. Even if the links don't stay on your site, they might hint to further information, or reference a source you've used, giving it credit. But links can also be used in a completely different manner:
Ever heard about DeepLinking? In short and easy words: there's a really nice website with beautiful pictures and useful resources, making the impression its owner did a really good job. They all seem to be served on that site – that must be gigabytes of storage used! But in fact, that server sums up to a few hundred kilobytes, if at all. The content is neither theirs, nor really served by them.
As this may affect us, let's construct an example to visualize the facts: there is your.server with the real stuff created by your hard work, and there is their.server. Taking a look at their HTML sources, you will notice the IMG tags look like <IMG SRC='http://your.server/your.jpg'>, and the other good resources are downloadable via http://your.server/your.zip in the easy cases – or download-123.html (they then use mod_rewrite themselves to have the visitor load the file from your site, while he still thinks it is served by them), or download.php?id=123 to do the same via scripts. So you not only had the hard work, but you also pay for their traffic, while they save on storage and CPU power at the same time!
Now you might think that's a very bad thing (it's bad behavior, true). And you want to punish them. You might even have good reasons to do so, thinking of copyright and competition law. But what a headache! You've got to run for a lawyer. It will take time, costs a lot of nerves, and will prove difficult especially with "the others" being in a different country. And chances are you won't see any compensation.
Deeplinked images
But you would approach the issue in a totally wrong way! Are you doing business? Even if not, you know that business consists of making and accepting offers. Well: you just got an offer. By loading pictures directly from your site, they invited you to provide them with pictures. Take the offer – and serve what they deserve! No, I don't mean offending material. But isn't it nice of them to offer you free advertising? Your banners on their site, everywhere, totally for free! Can you reject such an offer? I couldn't resist it, and so I did:
RewriteCond %{HTTP_REFERER} their\.site
RewriteCond %{REQUEST_URI} !^/special_stuff
RewriteRule \.(jpg|png|gif)$ /special_stuff/visit_my_site.jpg [L]
Short explanation necessary?
- RewriteCond %{HTTP_REFERER} their\.site: if the referer contains their server name
- RewriteCond %{REQUEST_URI} !^/special_stuff: (and) the requested URL does not start with where my banners reside
- RewriteRule \.(jpg|png|gif)$ /special_stuff/visit_my_site.jpg [L]: rewrite all requests for images (URL ending with .jpg, .png, or .gif) to the specified banner image
That's pretty cool! With that, I started loving such deep-linkers. Now, what will happen? Won't they update their site pretty fast? Chances are it will rather take months before they realize. As you saw, that code fires only if their site is in the referer – which it was not when they saw the image on your site, and decided to use it on theirs. After that, when they checked the result, the browser didn't load it again (it was already in its cache). So all looks fine to them.
Detecting DeepLinks
So how do you figure out whether you're affected by DeepLinks? You can see that in your Apache logs. Again, an example:
1.2.3.4 - [07/Nov/2013:17:04:59 +0100] "GET /some/picture.jpg HTTP/1.1" 200 78116 64852 "http://their.site/page.html" "Mozilla..."
92.76.238.176 - [07/Nov/2013:17:04:59 +0100] "GET /some/picture.jpg HTTP/1.1" 200 78116 64852 "https://www.google.de/" "Mozilla..."
You see, the first line shows a DeepLink from their site. And in the second line you will notice even Google does that (probably on their image search pages); if you don't mind people finding your images via that search provider, what better free advertisement place than that! But if you don't care about such collateral damage, and only want to show your pictures to your own visitors (and to direct requests, as anonymizer addons might clear the referer), feel free to adjust your RewriteCond accordingly. But first a little helper for you, so you don't have to check tons of log files manually:
#!/bin/bash
# Check access_log for Deeplinks
function syntax() {
  echo
  echo "Syntax:"
  echo "  $0 <linkterm> <HTTP_HOST>"
  echo "Example:"
  echo "  $0 jpg your.site"
  echo
  exit
}

[ -z "$2" ] && syntax

egrep "$1.+\"\s+[[:digit:]]+\s+[[:digit:]]+\s+[[:digit:]]+\s+\"http" access_log | grep -v $2
Called without any parameters, this shell script will explain itself: it expects two parameters. The first is the link term you're looking for, the second the host name of your Apache (virtual) host. The regular expression then matches all log file entries for requests coming from other websites (referer starts with http), and filters out your own server's host name. Which should result in a listing of all deeplinked requests.
Deeplinked downloads
Don't be afraid, I didn't forget the second candidate: your archives. Let's use a catch-'em-all here:
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1/.*
RewriteCond %{QUERY_STRING} download\.php?category=([a-z]+)&id=([0-9]+)
RewriteRule /files /mystuff/%1/index.php#id_%2 [NE,R,L]
Oh, that was wicked of me: I've loaded that example with tricky stuff. But the good guy I am, I'm going to explain:
- RewriteCond %{HTTP_REFERER} !^$: ignore empty referers (direct downloads; remember anonymizers)
- RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1/.*: the referer must match your host (see pitfalls: as variables can only be used on the left-hand side, we trick it with a back-referencing RegExp)
- RewriteCond %{QUERY_STRING} download\.php?category=([a-z]+)&id=([0-9]+): the request matches our downloader script
- RewriteRule /files /mystuff/%1/index.php#id_%2 [NE,R,L]: assuming download.php resides somewhere in a /files directory, this will rewrite the URL to the category's index
A few more words might be required on the second line's RegExp: what does !^([^@]*)@@https?://\1/.* do? The ^ we already know to stand for the start of the string. ([^@]*) means "any sequence of characters not containing the @ sign" (i.e. our %{HTTP_HOST}), and https? matches "http", optionally followed by an "s". Finally, \1 is a back-reference to the first term in parentheses (again, our %{HTTP_HOST}). And the ! in front inverts the condition: "all requests where the referer host does not match our host".
The third line also "groups" parts with parentheses for back-referencing: the category (consisting of one or more letters), and the file id (made up of one or more digits). These we back-reference in the last line by %1 and %2. My example assumes each category has a corresponding directory directly below /mystuff, where an index.php resides. In this index, each file's entry is anchored using something like <A NAME="id_123">. You got the idea. But we needed some flags here: [NE] to deal with the # literally (otherwise it would get URL-escaped; [NE] means "do Not Escape"), the other two we already dealt with in previous examples.
If your setup is different, the last two lines might simply look like
RewriteCond %{REQUEST_URI} ^/files/.+\.zip$
RewriteRule . /files/index.html [R,L]
In other words: rewrite all requests to /files/*.zip to /files/index.html using a [R] redirect, and consider this your [L]ast act.
Creating IF-THEN-ELSE sets
In pitfalls I explained that a RewriteCond is "forgotten" once the first RewriteRule was processed – and promised a work-around. You find the following explained in the mod_rewrite documentation itself: the [S]kip flag can be used to construct IF-THEN-ELSE sets:
# Does the file exist?
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Create an if-then-else construct by skipping 3 lines if we meant to go to the "else" stanza.
RewriteRule .? - [S=3]
# IF the file exists, then:
RewriteRule (.*\.gif) images.php?$1
RewriteRule (.*\.html) docs.php?$1
# Skip past the "else" stanza.
RewriteRule .? - [S=1]
# ELSE...
RewriteRule (.*) 404.php?file=$1
# END
Testing and debugging
Now you've created your set of rewrite rules, and surely want to know whether they work as expected. Sure, you could wait for the first visitor to hit one of your definitions. But that might take a while; and if you have to adjust, this might take weeks to finish. You could use your browser to see for yourself: but the deeplinked image is probably already in your browser's cache as well (permanently deleting the cache is not that convenient), and how to deal with referer-based rules? There must be a better way.
Testing
The tool of my choice for this is Wget, which comes by default with every Linux distribution I've seen. You can see the man page (or check the online manual) to explore its full flexibility; but I will give you some examples here for the things we've discussed above:
# Is your moved page accessible via its old location?
wget http://your.server/old_location
# Are your non-existing files kept out of your error_log?
wget http://your.server/Packages.xz
# Try the deeplinked image
wget --referer=http://deep.linkers.com/deeplink.html http://your.server/image.jpg
# We had no example for that, but you might have user-agent based rules
wget --user-agent=Foobar http://your.server/some_page.html
Debugging
But what if you cannot figure out what goes wrong? How to debug? For this, mod_rewrite offers its own logging. But unlike with mod_security, you cannot switch it per rule: logging is either enabled or disabled, and the log level is fixed to whatever you set it to. So first a few general rules:
- As rewrite logging produces noticeable overhead, you should not enable it globally, but rather per VHost, and only if needed
- Keep the log level as low as possible; the higher you set it, the bigger the generated log files, and the larger the overhead
- Log levels above 2 should better not be enabled on production systems, as performance will decrease noticeably
- Switch rewrite logging off as soon as you no longer need it
You can define the log file to write to using the RewriteLog keyword, passing it the file name – e.g. RewriteLog /var/log/apache2/rewrite.log. This way it's kept separate from your regular Apache log files, and you can simply delete it when done. The log level is defined in a similar manner, using the RewriteLogLevel keyword followed by the log level, which is an integer between 0 and 9 (currently only levels 0-4 are used). Each log level includes its lower siblings; so log level 2 would include everything from 0 and 1 as well. To give you an idea of what to expect:
Level | Amount of log | Content |
---|---|---|
0 | Nothing | Nothing (logging switched off) |
1 | 1 line per request | Result of rewrite processing ¹ |
2 | 0 or multiple lines per request | Rule steps processed ² |
3 | 0 or multiple lines | Intermediate steps, depending on rule |
4 | Multiple lines per request | Each rule is listed, and marked whether it matched or not ³ |
You can see that log level 4 would be perfect for debugging – on a server only you yourself access. On a heavily used production server, it would not only degrade performance, but the resulting log would be hard to catch up with.
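Putting it together, a temporary debugging setup inside a VHost could look like the sketch below (file name and level are arbitrary). Note that RewriteLog and RewriteLogLevel exist up to Apache 2.2; on Apache 2.4 the rewrite log was merged into the error log and is enabled with something like LogLevel rewrite:trace2 instead.

<VirtualHost *:80>
    ServerName your.server
    # ... regular VHost configuration ...

    RewriteEngine On
    # temporary, for debugging only – remove (or comment out) when done
    RewriteLog /var/log/apache2/rewrite.log
    RewriteLogLevel 2
</VirtualHost>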
Lock out stubborn bots/hackers with fail2ban
Fail2Ban helps to keep attackers away. As the name suggests: too many "fails", and they get "banned". To accomplish this, iptables is used: the attacker's IP gets blocked at IP level, so he can no longer reach the service. The admin can configure how many fails are "too many", and how long the block should be upheld.
Fail2Ban can be installed on Debian systems by simply invoking apt-get install fail2ban, and ships with several ready-to-use "jails". The first thing to enable should be the SSH jail: 6 failed login attempts from the same IP, and that IP is banned for 10 minutes (of course you can adjust those values).
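For illustration, the SSH jail section shipped in /etc/fail2ban/jail.conf looked roughly like this on my Debian system (section name, log path and defaults may differ between fail2ban versions):

[ssh]
enabled  = true
port     = ssh
filter   = sshd
logpath  = /var/log/auth.log
maxretry = 6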
Pre-configured apache jails
Fail2Ban also ships with some pre-configured jails for Apache:
- apache-auth: parses the error_log for failed HTTP AUTH attempts
- apache-badbots: blocks known bad bots. Add your own in /etc/fail2ban/filter.d/apache-badbots.conf
- apache-overflows: same for too long URLs
Adding your own jails
To establish your own custom jails, two things are needed: a filter definition in the /etc/fail2ban/filter.d directory, and a corresponding entry in /etc/fail2ban/jail.conf. I will show you some examples here:
Bots abusing the robots.txt
In your robots.txt you can define which URLs a robot should not visit. Some hackers use that to find out where the "interesting stuff" might be, and explicitly request those URLs. If they are not linked from any of your pages, you could use this as a trap:
User-agent: *
Disallow: /here_are_dragons
Needless to say: I don't have that URL on any of my servers, which is fine. So let's make use of that trap. First, we need our /etc/fail2ban/filter.d/apache-robots-txt.conf:
[Definition]
failregex = ^<HOST> -.*"(GET|POST) /here_are_dragons
ignoreregex =
Now we add the jail:
[apache-robots-txt]
# Crawlers taking the robots.txt and then just crawl the disallowed URLs
enabled = true
port = http,https
filter = apache-robots-txt
maxretry = 1
logpath = /var/log/apache2/access_log
This definition is quite self-explanatory: the jail is enabled for the http and https ports, uses our filter, and bans any access attempt at its first occurrence in Apache's access_log (note: you can use wildcards for the log file if you use separate logs per vhost). Blocking time uses the default values (see the top of jail.conf); if you want different values, just add them here.
Blocking bad guys via mod_security
In our mod_security examples, we established our own rules with severity:2 (aka CRITICAL). Several other rules use this as well for "real bad things"; for even worse things, there are also severities 1 (ALERT) and 0 (EMERGENCY). So let's jail those attackers as well. We start with /etc/fail2ban/filter.d/mod-security.conf:
[Definition]
failregex = \[client <HOST>\] .* \[severity \"ALERT\"\]
\[client <HOST>\] .* \[severity \"EMERGENCY\"\]
\[client <HOST>\] .* \[severity \"CRITICAL\"\]
ignoreregex =
Next, we add the corresponding jail to /etc/fail2ban/jail.conf:
[apache-mod-security]
# ban IP if mod_security issues an EMERGENCY, ALERT, or CRITICAL (severity 0..2)
enabled = true
port = http,https
filter = mod-security
logpath = /var/log/apache2/error_log
maxretry = 1
bantime = 1800
So if someone triggered such "intrusion alert", he cannot access our web service for the next 1800 seconds (30 minutes).
Did we succeed?
A tool I can really recommend to monitor your server is Monitorix. It can not only monitor your system's health status (system load, CPU usage, disk activities, and more), but also your Apache, MySQL, and – here it comes – Fail2Ban activities. So you could use it to check whether your rules and jails had some hits, which could look like this:

¹ e.g. "pass through <REQUEST_URI>" when no rule hit, "go-ahead with <NEW_URI>" otherwise
² on a simple rewrite, this could be 3 lines: "rewrite old -> new", "local path result <URI>", "prefixed with document_root to <full file path>"
³ see the mod_rewrite Wiki for details