Using GeoIP information together with ModSecurity


Introduction

In my tutorial’s webserver logfile configuration, there is a position reserved for the country code of an IP address. I have never explained how I get the information into the environment variable that is then used to fill said position. There are several other guides around, but I think it’s time to provide my own tutorial.

First, let’s look at why this information is worth putting in the logs.

Look at the following offending IPv4 addresses from a production log:

95.214.52.187
109.206.241.123
185.243.218.41
199.195.254.123
45.66.35.35
185.220.101.21
94.140.114.216
95.214.52.156
205.185.116.162
107.189.14.89

If you are like me, this is just a list of numbers. And it does not get any better with IPv6 addresses.

Now check out the same numeric addresses accompanied by the country code of the IP address.

95.214.52.187 PL
109.206.241.123 US
185.243.218.41 NO
199.195.254.123 US
45.66.35.35 NL
185.220.101.21 DE
94.140.114.216 LV
95.214.52.156 PL
205.185.116.162 US
107.189.14.89 LU

The addresses take shape now and a pattern emerges: the country of origin plays an important role in these attacks. With this information available in the logs, it’s easy to extract a list of the origins of the IP addresses that attacked netnea.com in the last 24 hours. I do this with the alcountry and sucspercent aliases from my tutorials as follows (notice how I filter for status code 403).

$ cat access.log | grep "\" 403 " | alcountry | sucspercent    
             Entry        Count Percent
---------------------------------------------------
                GR            1   0.06%
                LT            1   0.06%
                ...
                LU           19   1.66%
                AT           20   1.75%
                CA           23   2.01%
                SC           25   2.19%
                GB           48   4.20%
                NL           56   4.90%
                FR           68   5.94%
                DE          155  13.55%
                US          437  38.20%
---------------------------------------------------
             Total         1144 100.00%

(Please note there is a group of IPs whose country of origin could not be determined. I’ve filtered them away for this report.)
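If you do not have my aliases at hand: alcountry simply prints the country column, and sucspercent counts and ranks the values. Here is a minimal stand-in for the counting part, built from sort, uniq and awk. The function name country_percent is hypothetical, and the column layout only approximates the report above.

```shell
# Hypothetical stand-in for the sucspercent alias: count identical input lines
# and print each entry with its count and its share of the total, sorted by count.
country_percent() {
  sort | uniq -c | sort -n | awk '
    { count[NR] = $1; entry[NR] = $2; total += $1 }
    END {
      printf "%18s %12s %7s\n", "Entry", "Count", "Percent"
      for (i = 1; i <= NR; i++)
        printf "%18s %12d %6.2f%%\n", entry[i], count[i], 100 * count[i] / total
      printf "%18s %12d %6.2f%%\n", "Total", total, 100
    }'
}

# Usage (alcountry prints the country column of each log line):
#   grep '" 403 ' access.log | alcountry | country_percent
```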

This is useful: an IP address becomes a lot more tangible once you know where the client resides.

Getting the country code

So how do we know where these IP addresses are located?

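We know because of the databases of the regional internet registries, RIPE NCC being the one responsible for Europe, the Middle East and parts of Central Asia. These databases hold the IP address ranges of their region, organized in networks, and every network is associated with a country.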

Looking up the country code in the RIPE database for every request is very expensive. But if you are in the shell and want to learn about an individual IP address, then ipinfo.io gives you easy access.

Here is an example of what I do:

$ curl -s https://ipinfo.io/5.9.158.195
{
  "ip": "5.9.158.195",
  "hostname": "static.195.158.9.5.clients.your-server.de",
  "city": "Koserow",
  "region": "Mecklenburg-Vorpommern",
  "country": "DE",
  "loc": "54.0519,14.0020",
  "org": "AS24940 Hetzner Online GmbH",
  "postal": "17459",
  "timezone": "Europe/Berlin",
  "readme": "https://ipinfo.io/missingauth"
}

That’s a serial offender at netnea.com by the way.

I know people who run a separate DNS server for RIPE lookups. They instruct the webserver to do a reverse lookup on the IP address and the caching DNS will respond with the country code. This is neat, but I have not done this myself, so I’ll stick to a road more often travelled below.

Getting the country code from a local database

Webservers usually use the free MaxMind GeoIP database or subscribe to MaxMind’s paid service, which gives better results. The MaxMind database comes in its own binary format (MMDB), so you need a library that can read it. MaxMind has also changed the format and abandoned the old legacy format a few years back. ModSecurity 2.9 supports the old format directly, but unfortunately it does not support the new one. The situation is better for the ModSecurity 3.0 release line, where the new MaxMind database format is supported natively.

There are a few hacks where people on the ModSec 2.9 release line take the new MaxMind database and convert it into the old format.

Here is one such script: https://github.com/emphazer/GeoIP_convert-v2-v1

Personally, I do not really like having ModSecurity do the GeoIP lookups itself. Sure, it sounds neat, but MaxMind actually maintains an Apache module for this purpose, and it comes with additional features beyond the country code. That’s why this module is my preferred way. So how do we do this?

First, let’s get the database and create the cronjob that performs the regular updates.

You need to register for a free license key at
https://dev.maxmind.com/geoip/geolite2-free-geolocation-data

Navigating the MaxMind website is not easy, but at the time of this writing, this link is where I was able to obtain a free license: I registered an account first, then generated a license key.

For more serious commercial usage, you are well advised to invest in a commercial license. I think they are not very expensive.

Once we have the license key, we can install the geoipupdate package (available under this name with Debian and friends).

The package installs a cron job at /etc/cron.d/geoipupdate, which checks for database updates once a day.

In /etc/GeoIP.conf, you configure the download of the databases. I’ve set the following options in the file:

AccountID <MaxMind Account ID>
LicenseKey <MaxMind License Key>
EditionIDs GeoLite2-Country GeoLite2-ASN

This configures the download of the GeoIP Country database and the database with the network information, the Autonomous System Numbers (ASN). If you are not interested in the ASNs, then remove that part.

Once configured, we can trigger the download by hand:

$ geoipupdate -v
Using config file /etc/GeoIP.conf
Using database directory /usr/share/GeoIP
Performing get filename request to https://updates.maxmind.com/app/update_getfilename?product_id=GeoLite2-Country
Acquired lock file lock (/usr/share/GeoIP/.geoipupdate.lock)
Performing update request to https://updates.maxmind.com/geoip/databases/GeoLite2-Country/update?db_md5=00000000000000000000000000000000
...

You should then find your databases under /usr/share/GeoIP:

-rw-r--r-- 1 root root 5.5M Oct 6 14:15 GeoLite2-Country.mmdb
-rw-r--r-- 1 root root 7.6M Oct 6 14:15 GeoLite2-ASN.mmdb
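Since the cron job is supposed to refresh these files regularly, a quick sanity check is to look for databases that have not been modified for a while. A minimal sketch, assuming the default database directory above; the function name stale_geoip_dbs and the 7-day threshold are arbitrary assumptions, not something geoipupdate prescribes:

```shell
# Hypothetical helper: list MaxMind databases in a directory that have not
# been modified for more than the given number of days (default: 7).
# No output means every database is recent enough.
stale_geoip_dbs() {
  local dir=${1:-/usr/share/GeoIP}
  local days=${2:-7}
  find "$dir" -maxdepth 1 -name '*.mmdb' -mtime +"$days" -print
}

# Usage:
#   stale_geoip_dbs /usr/share/GeoIP 7
```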

Now let’s check whether we are able to look up entries in this database. We install the mmdb-bin package and call its mmdblookup tool as follows:

$ mmdblookup --file /usr/share/GeoIP/GeoLite2-Country.mmdb --ip 5.9.158.195 country iso_code

  "DE" <utf8_string>

This looks up the IP address of the serial offender named above and prints its ISO country code. I’ve wrapped this into the following script, which I called geoip and which prints only the two-letter code:

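#!/bin/bash
#
# GeoIP Country Lookup Script
# Prints the ISO country code of the IP address given as the first argument.
#

if [ -z "$1" ]; then
	echo "usage: $ geoip <IP address>" >&2
	exit 1
fi

# mmdblookup prints the value as '  "DE" <utf8_string>' between blank lines;
# keep only the two letters between the double quotes.
mmdblookup --file /usr/share/GeoIP/GeoLite2-Country.mmdb --ip "$1" country iso_code \
	| sed -n -e 's/.*"\(..\)".*/\1/p'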
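With the geoip script in place, it is easy to reproduce the annotated IP list from the beginning of this article for a whole logfile. A sketch, assuming the client IP is the first field of each log line and geoip is on your PATH; annotate_ips is a hypothetical name:

```shell
# Print every distinct client IP from the log lines on stdin together with its
# country code. Assumes the client IP is the first space-separated field and
# that the geoip script (or a function of that name) is available.
annotate_ips() {
  cut -d' ' -f1 | sort -u | while read -r ip; do
    printf '%s %s\n' "$ip" "$(geoip "$ip")"
  done
}

# Usage:
#   grep '" 403 ' access.log | annotate_ips
```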

Integrating the MaxMind GeoLite2 database into Apache

In order to use the database in Apache, we either need to run ModSecurity 3, or we need the mod_maxminddb Apache module. Surprisingly, there is no binary distribution of this package and we need to compile it ourselves.

This is easy to do, though. Here are my instructions, building on top of my Apache Tutorial 1, where Apache is installed under /apache:

$ sudo apt-get install libmaxminddb0 libmaxminddb-dev
$ wget https://github.com/maxmind/mod_maxminddb/releases/download/1.2.0/mod_maxminddb-1.2.0.tar.gz
$ tar xvzf mod_maxminddb-1.2.0.tar.gz
$ cd mod_maxminddb-1.2.0/
$ ./configure --with-apxs=/apache/bin/apxs
$ make
$ sudo make install
$ ls -l /apache/modules/mod_maxminddb.so
-rw-r--r-- 1 root root 58600 October 4 2022 /apache/modules/mod_maxminddb.so

Time for the configuration as part of the Apache config:

LoadModule      maxminddb_module   modules/mod_maxminddb.so

MaxMindDBEnable On

MaxMindDBFile   COUNTRY_DB         /usr/share/GeoIP/GeoLite2-Country.mmdb
MaxMindDBEnv    GEOIP_COUNTRY_CODE COUNTRY_DB/country/iso_code

MaxMindDBFile   ASN_DB             /usr/share/GeoIP/GeoLite2-ASN.mmdb
MaxMindDBEnv    ASN                ASN_DB/autonomous_system_number
MaxMindDBEnv    ASORG              ASN_DB/autonomous_system_organization

In this snippet, I enable the module. Then I set the path to the country database and instruct mod_maxminddb to make the GeoIP country code available in the GEOIP_COUNTRY_CODE environment variable.

In the next block, I do the same for the autonomous system number (ASN) and the AS organization (ASORG), based on the ASN database.

And that’s all we need to display %{GEOIP_COUNTRY_CODE}e in our access log format. If you are familiar with my extended format, then you know I use the position right after the IP address.

In the original combined log format, this position is reserved for the so-called logname (%l). It was supposed to be filled by a lookup using the identd service based on RFC 1413. This service fell out of fashion in the late 20th century, so I have never seen this field filled out in a logfile. That means we can use it for our GeoIP information without losing anything and without moving away from the combined log format for the first few columns.

This will then bring us something like the following in combined:

195.154.61.123 FR - [2022-10-06 14:59:49.270679] "GET /wp-login.php?action=register HTTP/1.1" 403 199 "https://www.netnea.com/" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"

And this in my preferred extended format:

195.154.61.123 FR - [2022-10-06 14:59:49.270679] "GET /wp-login.php?action=register HTTP/1.1" 403 199 "https://www.netnea.com/" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-" 60922 www.netnea.com 192.168.3.7 443 application/x-httpd-php - - "-" Yz7RRblzHfE3nPzmtDbBWgAAAAc TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 1072 5831 -% 5848 3247 0 0 5-0-0-0 0-0-0-0 5 0

Perfect.

Integration with ModSecurity

Having the GeoIP country code in an environment variable makes it very easy to test for it with ModSecurity. The following rule denies access to clients from Switzerland and its neighbouring countries via the parallel match operator @pm.

SecRule ENV:GEOIP_COUNTRY_CODE "@pm AT CH DE FR IT LI" \
   "id:10000,\
   phase:2,\
   deny,\
   log,\
   logdata:'%{MATCHED_VAR}',\
   msg:'Denying access for IP from GeoIP country %{MATCHED_VAR}'"

Testing this is a wee bit tricky. What worked for me is configuring an additional virtual IP address from one of these countries on the server itself, adding a listener for said IP address and then using curl to access the server locally. That makes sure the additional IP address is used for the connection, so the GeoIP lookup happens on that address and the rule triggers. There may be other ways to test this, though.

But back to ModSecurity: I am not particularly fond of denying access to individual countries, yet it has its uses in certain application level denial of service situations. Here is a variant where you deny access to everybody unless the client IP address is associated with Ukraine or Poland.

SecRule ENV:GEOIP_COUNTRY_CODE "!@pm PL UA" \
   "id:10000,\
   phase:2,\
   deny,\
   log,\
   logdata:'%{MATCHED_VAR}',\
   msg:'Denying access for IP from GeoIP country %{MATCHED_VAR}'"

If you are working with bigger allow-lists or deny-lists, then it may be worth looking into the @pmFromFile operator, which allows you to list the country codes in a separate file. That way, longer lists become easier to read and write.

Now are there any other real world use cases for GeoIP within ModSecurity?

Here is a recipe that defines a different CRS paranoia level and anomaly threshold for Danish IP addresses:

CRS 3.x:

SecRule ENV:GEOIP_COUNTRY_CODE "@streq DK" \
   "id:10000,\
   phase:2,\
   pass,\
   nolog,\
   setvar:tx.inbound_anomaly_score_threshold=3,\
   setvar:tx.outbound_anomaly_score_threshold=3,\
   setvar:tx.paranoia_level=2"

CRS 4.x (soon to be released):

SecRule ENV:GEOIP_COUNTRY_CODE "@streq DK" \
   "id:10000,\
   phase:2,\
   pass,\
   nolog,\
   setvar:tx.inbound_anomaly_score_threshold=3,\
   setvar:tx.outbound_anomaly_score_threshold=3,\
   setvar:tx.blocking_paranoia_level=2"

By default, CRS runs at paranoia level 1 with an inbound anomaly threshold of 5 (and an outbound threshold of 4). With this recipe, Danish IPs face the paranoia level 2 rules and tighter anomaly thresholds than the rest of the clients.
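To see what the tighter threshold means in numbers: CRS adds up the severity scores of all matching rules (by default 5 for critical, 4 for error, 3 for warning and 2 for notice) and blocks once the total reaches the threshold. A small shell sketch of that comparison; crs_verdict is a hypothetical helper, not part of CRS:

```shell
# Sketch of the CRS inbound blocking decision: a request is blocked once its
# anomaly score reaches (is greater than or equal to) the threshold.
crs_verdict() {
  local score=$1 threshold=$2
  if (( score >= threshold )); then echo blocked; else echo passed; fi
}

# Compare a single warning (3) and a single critical (5) match
# under the default threshold (5) and the Danish recipe (3).
for threshold in 5 3; do
  for score in 3 5; do
    echo "threshold=$threshold score=$score: $(crs_verdict "$score" "$threshold")"
  done
done
```

A single warning-level match (3 points) passes at the default threshold of 5, but it is enough to block a Danish client under this recipe.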

Please make sure you place this recipe before the CRS rules include or it won’t work. And please note the paranoia level variable will be called tx.blocking_paranoia_level for CRS v4.

Now this may look attractive. Yet please think twice before you push this into production, and not only because Danish clients are usually nice. The problem is that you introduce two classes of clients: your VIP clients and the second-class citizens, the Danish clients in this case. The latter face more CRS rules, and these happen to be the rules that are more likely to trigger false positives. Unless you pay special attention to the false positives that your second-class clients will be facing, you are simply degrading the experience of your site for the 95% of legitimate Danes. And I think that’s not what you really want.

On top of that, it makes reading the logs more complicated. In a standard setup, you look at the anomaly score and you know whether the request was blocked or not, and, if it was blocked, whether ModSecurity / CRS played a role. With a stricter anomaly threshold for certain source IPs, you would also have to check the GeoIP country code to know for sure.

So I think this is troubling. But here is a recipe that makes some sense: it raises the detection paranoia level for Swiss clients. The idea is to run paranoia level 1 while seeing the alerts / false positives that a move to paranoia level 2 would bring. If you are not familiar with the concept of detection and blocking paranoia levels, then please look it up in the documentation. The brief version: you can run the rules of a higher paranoia level for detection / reconnaissance purposes without counting their scores toward the blocking decision, while you continue to block on the lower paranoia level rules.
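The split between the two paranoia levels can be sketched as follows. Rules up to the blocking paranoia level contribute to the score that decides about blocking; rules above it, up to the detection paranoia level, still run and get logged, but their score does not count toward blocking. The PL:SCORE notation and the split_scores function are hypothetical illustrations, not CRS internals:

```shell
# Each argument after the blocking PL is one rule match, written PL:SCORE
# (hypothetical notation). Only rules up to the detection PL run at all, so
# every match passed in is assumed to come from an executed rule.
split_scores() {
  local blocking_pl=$1; shift
  local blocking=0 detection_only=0 match pl score
  for match in "$@"; do
    pl=${match%%:*}
    score=${match##*:}
    if (( pl <= blocking_pl )); then
      blocking=$(( blocking + score ))           # counts toward blocking
    else
      detection_only=$(( detection_only + score ))  # logged only
    fi
  done
  echo "blocking=$blocking detection_only=$detection_only"
}

# Blocking PL 1, detection PL 2: one PL 1 critical match, two PL 2 matches.
split_scores 1 1:5 2:3 2:5
```

Only the PL 1 critical match counts toward blocking here; the two PL 2 matches show up in the logs as detection-only findings.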

CRS 3.x:

SecRule ENV:GEOIP_COUNTRY_CODE "@streq CH" \
   "id:10000,\
   phase:2,\
   pass,\
   nolog,\
   setvar:tx.executing_paranoia_level=2"

CRS 4.x:

SecRule ENV:GEOIP_COUNTRY_CODE "@streq CH" \
   "id:10000,\
   phase:2,\
   pass,\
   nolog,\
   setvar:tx.detection_paranoia_level=2"

Why is this useful? While it does not change a thing on the blocking level, it runs higher level rules for a small group of your clients. In my case, clients from Switzerland are close to me – other netnea employees for example. So I know what they are doing on the site and looking at their false positives gives me an idea what would happen if we were to raise the blocking paranoia level for everybody.

Making sure the GeoIP info is available in ModSecurity phase 1

Apache modules are called via so-called hooks. As a request moves through its lifecycle within the webserver, the various hooks are called one after the other, and the modules registered for a given hook are executed. As it happens, the MaxMind module and ModSecurity’s phase 1 run on the same hook, but unfortunately ModSecurity runs before the GeoIP lookup takes place. That means the lookup comes too late for ModSecurity’s phase 1 and you need to wait until phase 2.

As we are compiling mod_maxminddb ourselves, we can fix this behavior and make sure the module runs before ModSecurity.

$ diff -c -r mod_maxminddb-1.2.0/src/mod_maxminddb.c mod_maxminddb-1.2.0-mod-security2-patch/src/mod_maxminddb.c
*** mod_maxminddb-1.2.0/src/mod_maxminddb.c     2020-02-03 23:04:14.000000000 +0100
--- mod_maxminddb-1.2.0-mod-security2-patch/src/mod_maxminddb.c 2022-10-10 22:43:36.314817231 +0200
***************
*** 180,186 ****
  static void maxminddb_register_hooks(apr_pool_t *UNUSED(p)) {
      /* make sure we run before mod_rewrite's handler */
      static const char *const asz_succ[] = {
!         "mod_setenvif.c", "mod_rewrite.c", NULL};

      ap_hook_header_parser(export_env_for_dir, NULL, asz_succ, APR_HOOK_MIDDLE);
      ap_hook_post_read_request(
--- 180,186 ----
  static void maxminddb_register_hooks(apr_pool_t *UNUSED(p)) {
      /* make sure we run before mod_rewrite's handler */
      static const char *const asz_succ[] = {
!         "mod_setenvif.c", "mod_rewrite.c", "mod_security2.c", NULL};

      ap_hook_header_parser(export_env_for_dir, NULL, asz_succ, APR_HOOK_MIDDLE);
      ap_hook_post_read_request(
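If you would rather not carry a diff file around, the same one-line change can be applied with sed before running ./configure. This sketch assumes the asz_succ line in src/mod_maxminddb.c reads exactly as in the 1.2.0 source shown in the diff; the function name is hypothetical, and sed -i as used here is GNU sed syntax:

```shell
# Add "mod_security2.c" to the modules that must run after mod_maxminddb.
# Assumes the asz_succ line looks exactly like in mod_maxminddb 1.2.0.
patch_maxminddb_hook_order() {
  local src=${1:-src/mod_maxminddb.c}
  sed -i -e 's/"mod_setenvif.c", "mod_rewrite.c", NULL}/"mod_setenvif.c", "mod_rewrite.c", "mod_security2.c", NULL}/' "$src"
  # Fail if the patched line is not present (e.g. a different release).
  grep -q '"mod_security2.c", NULL}' "$src"
}

# Usage, from the directory where the tarball was extracted:
#   patch_maxminddb_hook_order mod_maxminddb-1.2.0/src/mod_maxminddb.c
```

Running it twice does no harm: once the line is patched, the search pattern no longer matches.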

Of course you can also do it the other way around and patch ModSecurity before the compilation. I have submitted that as a pull request to the ModSecurity project.

Both patches have the same effect. They allow you to do the following:

SecRule ENV:GEOIP_COUNTRY_CODE "@pm AT CH DE FR IT LI" \
   "id:10000,\
   phase:1,\
   deny,\
   log,\
   logdata:'%{MATCHED_VAR}',\
   msg:'Denying access for IP from GeoIP country %{MATCHED_VAR}'"

So the GeoIP information is now available in phase 1, where it should be, obviously.

Adding the ASN information to the logfile

We’re slowly coming to the end, but we have not yet written the ASN information to the access log. I’ve fallen in love with ASN information in the access log lately. It makes a big difference whether a user agent that claims to be a browser comes from an IP range used for landline customers or from a cheap server hosting network.

A first step in this direction is to record ASN information into the access log. The difficult question is where to put this information in the access log.

Adding it to the end would be simple, but I prefer to have it next to the client IP, together with the GeoIP country code. I also want to make sure I am not breaking the combined log format, so adding spaces is out of the question, as is adding quotes around the strings.

I am not entirely sure my solution is the best, but here is what I do:

I combine GeoIP + ";" + ASN + ";" + ASORG and write it in the second position of the access log line. This is simply an extension of the GeoIP information introduced above. Here is the extended LogFormat with this definition:

LogFormat "%h %{GEOIP_COUNTRY_CODE}e;%{ASN}e;%{ASORG}e %u [%{%Y-%m-%d %H:%M:%S}t.%{usec_frac}t] \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Content-Type}i\" %{remote}p %v %A %p %R %{BALANCER_WORKER_ROUTE}e %X \"ReqID-%{ReqID}i\" %{UNIQUE_ID}e %{SSL_PROTOCOL}x %{SSL_CIPHER}x %I %O %{ratio}n%% %D %{ModSecTimeIn}e %{ApplicationTime}e %{ModSecTimeOut}e %{ModSecAnomalyScoreInPLs}e %{ModSecAnomalyScoreOutPLs}e %{ModSecAnomalyScoreIn}e %{ModSecAnomalyScoreOut}e" extended

Now the problem is that the ASORG values in the MaxMind database contain spaces. Left as they are, they shift all the following columns, and if we introduce double quotes, it breaks the format and all the aliases that I distribute with my tutorials.

So we need to replace all spaces. Replacing them with underscores is my preferred approach.

Here is how I pulled this off in the server context:

RewriteEngine              On
RewriteOptions             InheritDownBefore

RewriteCond "%{ENV:ASORG}" "(.*?)\ (.*)"
RewriteRule / -            [ENV=ASORG:%1_%2,next]

In fact I did not pull this off all by myself. Max Leske pointed out the next flag to me. Let’s examine this rewrite rule!

The rewrite condition looks for a space. Via the two capture groups (the brackets), it saves the entire string before the first space and the entire string after it for later use. The lazy (.*?) is what makes it stop at the first space.

The rewrite rule does not rewrite the URL at all. Instead it sets an environment variable with the two captured strings. But instead of the space, it uses the underscore for concatenation.

So this replaces only the first space character. Enter next. The next flag turns the rule into a loop: whenever the rule is applied, the rewriting process starts over with the very first rewrite rule (!) and works its way down to here again, until the rewrite condition no longer matches. Or in other words: repeat until there are no space characters left.

There are two things to consider when using next. The first is that this could launch an infinite loop. I think we have everything under control, but if you run a loop on user-supplied input, you are asking for a denial of service.

The second is that it will not loop over the two lines of code above. Instead, it will loop over the entire list of rewrite rules. So it’s probably best to put the recipe at the very beginning of the rewrite rules, hence the InheritDownBefore, which applies the server-level rules before those defined in the virtual hosts.
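To convince yourself that the loop terminates and produces the expected string, here is a shell sketch that mimics the condition/rule pair: replace the first space, start over, stop when no space is left. Bash’s =~ uses POSIX extended regular expressions without the lazy *?, so [^ ]* stands in for (.*?); the function name is hypothetical:

```shell
# Mimic RewriteCond "(.*?)\ (.*)" + RewriteRule [next]:
# replace the first space with an underscore, then start over.
underscore_spaces() {
  local s=$1
  local re='^([^ ]*) (.*)$'   # [^ ]* plays the role of the lazy (.*?)
  while [[ $s =~ $re ]]; do
    s="${BASH_REMATCH[1]}_${BASH_REMATCH[2]}"
  done
  printf '%s\n' "$s"
}

underscore_spaces "Hetzner Online GmbH"   # prints Hetzner_Online_GmbH
```

Every pass removes exactly one space, which is why the loop ends. In plain Bash, ${s// /_} does the same in one step; the loop is only there to mirror the rewrite logic.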

So what does this look like in the access log? Here is one of the serial offenders trying to register on our WordPress site under a URL that does not exist. Note the ASN, the well-known organization and the user agent that does not really match the server zone used to make the request.

116.203.213.175 DE;24940;Hetzner_Online_GmbH - [2022-10-10 17:47:40.171690] "GET /wp-login.php?action=register HTTP/1.1" 403 199 "https://www.netnea.com/" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-" 60977 www.netnea.com 192.168.3.7 443 application/x-httpd-php - - "-" Y0Q@nIL@4OLFph9kJAAdZAAAAAo TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 1072 5831 -% 14005 10221 0 0 5-0-0-0 0-0-0-0 5 0

Neat, isn’t it?

As you probably know, I have published a series of aliases that you can use to extract information from the extended access log format.

I have now expanded these to extract the additional information available with this extension, and I also made sure the alcountry alias remains compatible with both the simple GeoIP country code and this extended version:

alias alcountry='cut -d\  -f2 | cut -d\; -f1'
alias alasn='cut -d\  -f2 | sed -e "s/.*;\([0-9-]*\);.*/\1/" -e "s/^[A-Z][A-Z]$/-/" -e "s/-;-;-/-/"'
alias alasorg='cut -d\  -f2 | sed -e "s/.*;//" -e "s/^[A-Z][A-Z]$/-/"'

Practical use:

$ grep Y0Q@nIL@4OLFph9kJAAdZAAAAAo access.log | alcountry
DE
$ grep Y0Q@nIL@4OLFph9kJAAdZAAAAAo access.log | alasn
24940
$ grep Y0Q@nIL@4OLFph9kJAAdZAAAAAo access.log | alasorg
Hetzner_Online_GmbH
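Bash does not expand aliases in non-interactive shells unless you set shopt -s expand_aliases, so in scripts, functions are the safer choice. A sketch of the same three filters as functions; the names geo_country, geo_asn and geo_asorg are hypothetical and chosen so they do not collide with the aliases:

```shell
# Function versions of the alcountry, alasn and alasorg aliases for scripts.
# They read log lines on stdin and look at the second column, which holds
# either "CC" (simple format) or "CC;ASN;ASORG" (extended format).
geo_country() { cut -d' ' -f2 | cut -d';' -f1; }
geo_asn()     { cut -d' ' -f2 | sed -e 's/.*;\([0-9-]*\);.*/\1/' -e 's/^[A-Z][A-Z]$/-/' -e 's/-;-;-/-/'; }
geo_asorg()   { cut -d' ' -f2 | sed -e 's/.*;//' -e 's/^[A-Z][A-Z]$/-/'; }

# Usage:
#   grep Y0Q@nIL@4OLFph9kJAAdZAAAAAo access.log | geo_asn
```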

Summary

GeoIP and ASN information add a lot of perspective to your logfile, and they allow you to filter traffic effectively if the need arises. ModSecurity brings basic GeoIP support, but mod_maxminddb adds more flexibility and the option to introduce ASN information into the log, which adds a lot of context to attack campaigns.