{"id":948,"date":"2016-10-11T09:20:24","date_gmt":"2016-10-11T07:20:24","guid":{"rendered":"http:\/\/www.netnea.com\/cms\/?page_id=948"},"modified":"2025-11-07T10:48:43","modified_gmt":"2025-11-07T09:48:43","slug":"apache-tutorial-5_extending-access-log","status":"publish","type":"page","link":"https:\/\/www.netnea.com\/cms\/apache-tutorial-5_extending-access-log\/","title":{"rendered":"Extending and Analyzing the Access Log"},"content":{"rendered":"\n<h2 id=\"extending-and-analyzing-the-access-log\">Extending and analyzing the access log<\/h2>\n<h3 id=\"what-are-we-doing\">What are we doing?<\/h3>\n<p>We are defining a greatly extended log format in order to better monitor traffic.<\/p>\n<h3 id=\"why-are-we-doing-this\">Why are we doing this?<\/h3>\n<p>In the usual configuration of the Apache web server a log format is used that logs only the most necessary information about access from different clients. In practice, additional information is often required, which can easily be recorded in the server\u2019s access log.<\/p>\n<h3 id=\"requirements\">Requirements<\/h3>\n<ul>\n<li>An Apache web server, ideally one created using the file structure shown in <a href=\"https:\/\/www.netnea.com\/cms\/apache-tutorial-1_compiling-apache\/\">Tutorial 1 (Compiling Apache)<\/a>.<\/li>\n<li>Understanding of the minimal configuration in <a href=\"https:\/\/www.netnea.com\/cms\/apache-tutorial-2_minimal-apache-configuration\/\">Tutorial 2 (Configuring a Minimal Apache Web Server)<\/a>.<\/li>\n<li>An Apache web server with SSL\/TLS support as in <a href=\"https:\/\/www.netnea.com\/cms\/apache-tutorial-4_configuring-ssl-tls\/\">Tutorial 4 (Enabling Encryption with SSL\/TLS)<\/a>.<\/li>\n<\/ul>\n<h3 id=\"step-1-understanding-the-common-log-format\">Step 1: Understanding the common log format<\/h3>\n<p>The <em>common<\/em> log format is a very simple format that is hardly ever used any more. It has the advantage of being space-saving and hardly ever writing unnecessary information.<\/p>\n<div class=\"sourceCode\" id=\"cb1\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb1-1\"><a href=\"#cb1-1\" aria-hidden=\"true\"><\/a><span class=\"ex\">LogFormat<\/span> <span class=\"st\">&quot;%h %l %u %t <\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\">%r<\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\"> %&gt;s %b&quot;<\/span> common<\/span>\n<span id=\"cb1-2\"><a href=\"#cb1-2\" aria-hidden=\"true\"><\/a><span class=\"ex\">...<\/span><\/span>\n<span id=\"cb1-3\"><a href=\"#cb1-3\" aria-hidden=\"true\"><\/a><span class=\"ex\">CustomLog<\/span> logs\/access.log common<\/span><\/code><\/pre><\/div>\n<p>We use the <em>LogFormat<\/em> directive to define a format and give it a name, <em>common<\/em> in this case.<\/p>\n<p>We invoke this name in the definition of the log file using the <em>CustomLog<\/em> directive. We can use these two directives multiple times in the configuration. Thus, multiple log formats with several name abbreviations can be defined next to one another and log files written in different formats. It\u2019s possible for different services to write to separate log files on the same server.<\/p>\n<p>The individual elements of the <em>common<\/em> log format are as follows:<\/p>\n<p><em>%h<\/em> designates the <em>remote host<\/em>, normally the IP address of the client making the request. But if the client is behind a proxy server, then we\u2019ll see the IP address of the proxy server here. So, if multiple clients share the proxy server then they will have the same <em>remote host<\/em> entry. It\u2019s also possible to retranslate the IP addresses using DNS reverse lookup on our server. If we configure this (which is not recommended), then the host name determined for the client would be entered here.<\/p>\n<p><em>%l<\/em> represents the <em>remote log name<\/em>. It is usually empty and output as a hyphen (\u201c-\u201c). In fact, this is an attempt to identify the client via <em>ident<\/em> access to the client. This has little client support and results in the biggest performance bottlenecks which is why <em>%l<\/em> is an artifact from the early 1990s.<\/p>\n<p><em>%u<\/em> is more commonly used and designates the user name of an authenticated user. The name is set by an authentication module and remains empty (thus the \u201d-\u201d), for as long as access without authentication on the server takes.<\/p>\n<p><em>%t<\/em> means the time of access. For big, slow requests the time means the moment the server receives the request line. Since Apache writes a request in the log file only after completing the response, it may occur that a slower request with an earlier time may appear several entries below a short request started later. Up to now this has resulted in confusion when reading the log file.<\/p>\n<p>By default, the time is output between square brackets. It is normally the local time including the deviation from standard time. For example:<\/p>\n<div class=\"sourceCode\" id=\"cb2\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb2-1\"><a href=\"#cb2-1\" aria-hidden=\"true\"><\/a>[<span class=\"ex\">25\/Nov<\/span>\/2014:<span class=\"ex\">08<\/span>:51:22 +0100]<\/span><\/code><\/pre><\/div>\n<p>This means November 25, 2014, 8:51 am, 1 hour before standard time. The format of the time can also be changed if necessary. This is done using the <em>%{format}t<\/em> pattern, where <em>format<\/em> follows the specification of <em>strftime(3)<\/em>. We have already made use of this option in Tutorial 2. But let\u2019s use an example to take a closer look:<\/p>\n<div class=\"sourceCode\" id=\"cb3\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb3-1\"><a href=\"#cb3-1\" aria-hidden=\"true\"><\/a><span class=\"ex\">%<\/span>{[%Y-%m-%d %H:%M:%S %z (%s)]}<span class=\"ex\">t<\/span><\/span><\/code><\/pre><\/div>\n<p>In this example we put the date in the order <em>Year-Month-Day<\/em>, to make it sortable. And after the deviation from standard time we add the time in seconds since the start of the Unix age in January 1970. This format is more easily read and interpreted via a script.<\/p>\n<p>This example gives us entries using the following pattern:<\/p>\n<div class=\"sourceCode\" id=\"cb4\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb4-1\"><a href=\"#cb4-1\" aria-hidden=\"true\"><\/a>[<span class=\"ex\">2014-11-25<\/span> 09:34:33 +0100 (1322210073)]<\/span><\/code><\/pre><\/div>\n<p>So much for <em>%t<\/em>. This brings us to <em>%r<\/em> and the request line. This is the first line of the HTTP request as it was sent from the client to the server. Strictly speaking, the request line does not belong in the group of request headers, but it is normally subsumed along with them. In any case, in the request line the client transmits the identification of the resource it is demanding.<\/p>\n<p>Specifically, the line follows this pattern:<\/p>\n<div class=\"sourceCode\" id=\"cb5\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb5-1\"><a href=\"#cb5-1\" aria-hidden=\"true\"><\/a><span class=\"ex\">Method<\/span> URI Protocol<\/span><\/code><\/pre><\/div>\n<p>In practice, it\u2019s a simple example such as this:<\/p>\n<div class=\"sourceCode\" id=\"cb6\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb6-1\"><a href=\"#cb6-1\" aria-hidden=\"true\"><\/a><span class=\"ex\">GET<\/span> \/index.html HTTP\/1.1<\/span><\/code><\/pre><\/div>\n<p>The <em>GET<\/em> method is being used. This is followed by a space, then the absolute path of the resource on the server. The index file in this case. Optionally, the client can, as we are aware, add a <em>query string<\/em> to the path. This <em>query string<\/em> normally begins with a question mark and comes with a number of parameter value pairs. The <em>query string<\/em> is also output in the log file. Finally, the protocol that is most likely to be HTTP version 1.1. Version 1.0 still continues to be used by some agents (automated scripts). The new HTTP\/2 protocol does not appear in the request line of the initial request. In HTTP\/2 an update from HTTP\/1.1 to HTTP\/2 takes place during the request. The start follows the pattern above.<\/p>\n<p>The following format element follows a somewhat different pattern: <em>%&gt;s<\/em>. This means the status of the response, such as <em>200<\/em> for a successfully completed request. The angled bracket indicates that we are interesting in the final status. It may occur that a request is passed off within the server. In this case what we are interested in is not the status that passing it off triggered, but the status of the response for the final internal request.<\/p>\n<p>One typical example would be a request that causes an error on the server (Status 500). But if the associated error page is unavailable, this results in status 404 for the internal transfer. Using the angled bracket means that in this case we want 404 to be written to the log file. If we reverse the direction of the angled bracket, then Status 500 would be logged. Just to be certain, it may be advisable to log both values using the following entry (which is not usual in practice):<\/p>\n<div class=\"sourceCode\" id=\"cb7\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb7-1\"><a href=\"#cb7-1\" aria-hidden=\"true\"><\/a><span class=\"ex\">%<\/span><span class=\"op\">&lt;<\/span>s %<span class=\"op\">&gt;<\/span>s<\/span><\/code><\/pre><\/div>\n<p><em>%b<\/em> is the last element of the <em>common<\/em> log format. It shows the number of bytes announced in the content-length response headers. In a request for <em>http:\/\/www.example.com\/index.html<\/em> this value is the size of the <em>index.html<\/em> file. The <em>response headers<\/em> also transmitted are not counted. And there is an additional problem with this number: It is a calculation by the webserver and not an account of the bytes that have actually been sent in the response.<\/p>\n<h3 id=\"step-2-understanding-the-combined-log-format\">Step 2: Understanding the combined log format<\/h3>\n<p>The most widespread log format, <em>combined<\/em>, is based on the <em>common<\/em> log format, extending it by two items.<\/p>\n<div class=\"sourceCode\" id=\"cb8\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb8-1\"><a href=\"#cb8-1\" aria-hidden=\"true\"><\/a><span class=\"ex\">LogFormat<\/span> <span class=\"st\">&quot;%h %l %u %t <\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\">%r<\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\"> %&gt;s %b <\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\">%{Referer}i<\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\"> <\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\">%{User-Agent}i<\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\">&quot;<\/span> combined<\/span>\n<span id=\"cb8-2\"><a href=\"#cb8-2\" aria-hidden=\"true\"><\/a><span class=\"ex\">...<\/span><\/span>\n<span id=\"cb8-3\"><a href=\"#cb8-3\" aria-hidden=\"true\"><\/a><span class=\"ex\">CustomLog<\/span> logs\/access.log combined<\/span><\/code><\/pre><\/div>\n<p><em>\u201c%{Referer}i\u201d<\/em> is used for the referrer. It is output in quotes. The referrer means any resource from which the request that just occurred was originally initiated. This complicated paraphrasing can best be illustrated by an example. Suppose you click a link at a search engine to get to <em>www.example.com<\/em>. But when you send your request, you are automatically redirected to <em>shop.example.com<\/em>. The log entry for <em>shop.example.com<\/em> will include the search engine as the referrer and not the link to <em>www.example.com<\/em>. If however a CSS file dependent on <em>shop.example.com<\/em> is loaded, the referer would normally be attributed to <em>shop.example.com<\/em>. However, despite all of this, the referrer is part of the client\u2019s request. The client is required to follow the protocol and conventions, but can in fact send any kind of information, which is why you cannot rely on headers like these when security is an issue.<\/p>\n<p>Finally, <em>\u201c%{User-Agent}i\u201d<\/em> means the client user agent, which is also placed in quotes. This is also a value controlled by the client and which we should not rely on too much. The user agent is the client browser software, normally including the version, the rendering engine, information about compatibility with other browsers and various installed plugins. This results in very long user agent entries which can in some cases include so much information that an individual client can be uniquely identified, because they feature a particular combination of different add-ons of specific versions.<\/p>\n<h3 id=\"step-3-enabling-the-logio-and-unique-id-modules\">Step 3: Enabling the Logio and Unique-ID modules<\/h3>\n<p>We have become familiar with the <em>combined<\/em> format, the most widespread Apache log format. However, to simplify day-to-day work, the information shown is just not enough. Additional useful information has to be included in the log file.<\/p>\n<p>It is advisable to use the same log format on all servers. Now, instead of just propagating one or two additional values, these instructions describe a comprehensive log format that has proven useful in a variety of scenarios.<\/p>\n<p>However, in order to be able to configure the log format below, we first have to enable the <em>Logio<\/em> module. And on top if the the Unique-ID module, which useful from the start.<\/p>\n<p>If the server has been compiled as described in Tutorial 1, then these modules are already present and only have to be added to the list of modules being loaded in the server\u2019s configuration file.<\/p>\n<div class=\"sourceCode\" id=\"cb9\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb9-1\"><a href=\"#cb9-1\" aria-hidden=\"true\"><\/a><span class=\"ex\">LoadModule<\/span>              logio_module            modules\/mod_logio.so<\/span>\n<span id=\"cb9-2\"><a href=\"#cb9-2\" aria-hidden=\"true\"><\/a><span class=\"ex\">LoadModule<\/span>              unique_id_module        modules\/mod_unique_id.so<\/span><\/code><\/pre><\/div>\n<p>We need this module to be able to write two values. <em>IO In<\/em> and <em>IO Out<\/em>. This means the total number of bytes of the HTTP request including header lines and the total number of bytes in the response, also including header lines. The Unique-ID module is calculating a unique identifier for every request. We\u2019ll return to this later on.<\/p>\n<h3 id=\"step-4-configuring-the-new-extended-log-format\">Step 4: Configuring the new, extended log format<\/h3>\n<p>We are now ready for a new, comprehensive log format. The format also includes values that the server is as of yet unaware of with the modules defined up to now. It will leave them empty or show them as a hyphen <em>\u201c-\u201d<\/em>. Only with the <em>Logio<\/em> module just enabled this won\u2019t work. The server will crash if we request these values without them being present.<\/p>\n<p>We will gradually be filling in these values in the instructions below. But, as explained above, because it is useful to use the same log format everywhere, we will now be getting a bit ahead of ourselves in the configuration below.<\/p>\n<p>We\u2019ll be starting from the <em>combined<\/em> format and will be extending it to the right. The advantage of this is that the extended log files will continue to be readable in many standard tools, because the additional values are simply ignored. Furthermore, it is very easy to convert the extended log file back into the basic version and then still end up with a <em>combined<\/em> log format.<\/p>\n<p>We define the log format as follows:<\/p>\n<div class=\"sourceCode\" id=\"cb10\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb10-1\"><a href=\"#cb10-1\" aria-hidden=\"true\"><\/a><\/span>\n<span id=\"cb10-2\"><a href=\"#cb10-2\" aria-hidden=\"true\"><\/a><span class=\"ex\">LogFormat<\/span> <span class=\"st\">&quot;%h %{GEOIP_COUNTRY_CODE}e %u [%{%Y-%m-%d %H:%M:%S}t.%{usec_frac}t] <\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\">%r<\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\"> %&gt;s %b \\<\/span><\/span>\n<span id=\"cb10-3\"><a href=\"#cb10-3\" aria-hidden=\"true\"><\/a><span class=\"dt\">\\&quot;<\/span><span class=\"st\">%{Referer}i<\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\"> <\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\">%{User-Agent}i<\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\"> <\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\">%{Content-Type}i<\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\"> %{remote}p %v %A %p %R \\<\/span><\/span>\n<span id=\"cb10-4\"><a href=\"#cb10-4\" aria-hidden=\"true\"><\/a><span class=\"st\">%{BALANCER_WORKER_ROUTE}e %X <\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\">%{cookie}n<\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\"> %{UNIQUE_ID}e %{SSL_PROTOCOL}x %{SSL_CIPHER}x \\<\/span><\/span>\n<span id=\"cb10-5\"><a href=\"#cb10-5\" aria-hidden=\"true\"><\/a><span class=\"st\">%I %O %{ratio}n%% %D %{ModSecTimeIn}e %{ApplicationTime}e %{ModSecTimeOut}e \\<\/span><\/span>\n<span id=\"cb10-6\"><a href=\"#cb10-6\" aria-hidden=\"true\"><\/a><span class=\"st\">%{ModSecAnomalyScoreInPLs}e %{ModSecAnomalyScoreOutPLs}e \\<\/span><\/span>\n<span id=\"cb10-7\"><a href=\"#cb10-7\" aria-hidden=\"true\"><\/a><span class=\"st\">%{ModSecAnomalyScoreIn}e %{ModSecAnomalyScoreOut}e&quot;<\/span> extended<\/span>\n<span id=\"cb10-8\"><a href=\"#cb10-8\" aria-hidden=\"true\"><\/a><\/span>\n<span id=\"cb10-9\"><a href=\"#cb10-9\" aria-hidden=\"true\"><\/a><span class=\"ex\">...<\/span><\/span>\n<span id=\"cb10-10\"><a href=\"#cb10-10\" aria-hidden=\"true\"><\/a><\/span>\n<span id=\"cb10-11\"><a href=\"#cb10-11\" aria-hidden=\"true\"><\/a><span class=\"ex\">CustomLog<\/span>       logs\/access.log extended<\/span><\/code><\/pre><\/div>\n<h3 id=\"step-5-understanding-the-new-extended-log-format\">Step 5: Understanding the new, extended log format<\/h3>\n<p>The new log format adds 23 values to the access log. This may seem excessive at first glance, but there are in fact good reasons for all of them and having these values available in day-to-day work makes it a lot easier to track down errors.<\/p>\n<p>Let\u2019s have a look at the values in order.<\/p>\n<p>In the description of the <em>common<\/em> log format we saw that the second value, the <em>logname<\/em> entry, displays an unused artifact right after the client IP address. We\u2019ll replace this item in the log file with the country code for the client IP address. This is useful, because this country code is strongly characteristic of an IP address. (In many cases there is a big difference whether the request originates nationally or from the South Pacific). It is now practical to place it right next to the IP address and have it add more in-depth information to the meaningless number.<\/p>\n<p>After this comes the time format defined in Tutorial 2, which is oriented to the time format of the error log and is now congruent with it. We are also keeping track of microseconds, giving us precise timing information. We are familiar with the next values.<\/p>\n<p>_&#8221;%Content-Type}i&#8221; describes the Content-Type Request header. This is usually empty, however requests with the HTTP method POST that are used to submit forms or upload data make use of this header. We are ultimately planning to add ModSecurity to our service. Together with that module, the information about the content type becomes really important as the behavior of the ModSecurity engine changes a lot depending on this header.<\/p>\n<p><em>%{remote}p<\/em> is the next addition. It stands for port number of the client. So whenever a client opens a TCP connection to a server address and port, he uses one of his own ports. Information about this port can tell us just how many connections a client opens to our server. And in a situation where multiple clients connect using the same IP address (thanks to Network Address Translation NAT), it can help to tell the clients apart.<\/p>\n<p><em>%v<\/em> refers to the canonical host name of the server that handled the request. If we talk to the server via an alias, the actual name of the server will be written here, not the alias. In a virtual host setup the virtual host server names are also canonical. They will thus also show up here and we can distinguish among them in the log file.<\/p>\n<p><em>%A<\/em> is the IP address of the server that received the request. This value helps us to distinguish among servers if multiple log files are combined or multiple servers are writing to the same log file.<\/p>\n<p><em>%p<\/em> then describes the port number on which the request was received. This is also important to be able to keep some entries apart if we combine different log files (such as those for port 80 and those for port 443).<\/p>\n<p><em>%R<\/em> shows the handler that generated the response to a request. This value may also be empty (\u201c-\u201c) if a static file was sent. Or it uses <em>proxy<\/em> to indicate that the request was forwarded to another server.<\/p>\n<p><em>%{BALANCER_WORKER_ROUTE}e<\/em> also has to do with forwarding requests. If we alternate among target servers this value represents where the request was sent.<\/p>\n<p><em>%X<\/em> shows the status of the TCP connection after the request has been completed. There are three possible values: The connection is closed (<em>&#8211;<\/em>), the connection is being kept open using <em>Keep-Alive<\/em> (<em>+<\/em>) or the connection was lost before the request could be completed (<em>X<\/em>).<\/p>\n<p><em>\u201c%{cookie}n\u201d<\/em> is a value employed by user tracking. This enables us to use a cookie to identify a client and recognize it at a later point in time, provided it still has the cookie. If we set the cookie for the whole domain, e.g.\u00a0to example.com and not limited to www.example.com, then we are even able to track a client across multiple hosts. Ideally, this would also be possible from the client\u2019s IP address, but this may change over the course of a session and multiple clients may be sharing a single IP address.<\/p>\n<p><em>%{UNIQUE_ID}e<\/em> is a very helpful value. A unique ID is created on the server for every request. When we output this value on an error page for instance, then a request in the log file can be easily identified using a screenshot, and ideally the entire session can be reproduced on the basis of the user tracking cookies.<\/p>\n<p>Now come two values made available by <em>mod_ssl<\/em>. The encryption module provides the log module values in its own name space, indicated by <em>x<\/em>. The individual values are explained in the <em>mod_ssl<\/em> documentation. For the operation of a server the protocol and encryption used are of primary interest. These two values, referenced by <em>%{SSL_PROTOCOL}x<\/em> and <em>%{SSL_CIPHER}x<\/em> help us get an overview of encryption use. Sooner or later there will come a time when we have to disable the <em>TLSv1<\/em> protocol. But first we want to be certain that is it no longer playing a significant role in practice. The log file will help us do that. It is similar to the encryption algorithm that tells us about the <em>ciphers<\/em> actually being used and helps us make a statement about which ciphers are no longer being used. The information is important. If, for example, vulnerabilities in individual versions of protocols or individual encryption methods become known, then we can assess the effect of our measures by referring to the log file. In spring 2015, these log files proved to be extremely valuable and allowed us to quickly assess the impact of disabling SSLv3 as follows: \u201cImmediately disabling the SSLv3 protocol as a reaction to the POODLE vulnerability will cause an error in approx. 0.8% of requests. Extrapolated to our customer base, xx number of customers will be impacted.\u201d Based on these numbers, the risk and the effect of the measures were predictable.<\/p>\n<p><em>%I<\/em> and <em>%O<\/em> are used to define the values used by the <em>Logio<\/em> module. It is the total number of bytes in the request and the total number of bytes in the response. We are already familiar with <em>%b<\/em> for the total number of bytes in the response body. <em>%O<\/em> is a bit more precise here and helps us recognize when the request or its response violates size limits.<\/p>\n<p><em>%{ratio}n%%<\/em> means the percentage by which the transferred data were able to be compressed by using the <em>Deflate<\/em> module. This is of no concern for the moment, but will provide us interesting performance data in the future.<\/p>\n<p><em>%D<\/em> specifies the complete duration of the request in microseconds. Measurement takes place from the time the request line is received until the last part of the response leaves the server.<\/p>\n<p>We\u2019ll continue with performance data. In the future we will be using a stopwatch to separately measure the request on its way to the server, onward to the application and while processing the response. The values for this are set in the <em>ModSecTimeIn<\/em>, <em>ApplicationTime<\/em> and <em>ModSecTimeOut<\/em> environment variables.<\/p>\n<p>And, last but not least, there are other values provided to us by the <em>OWASP ModSecurity Core Rule Set<\/em> (to be handled in a subsequent tutorial), specifically the anomaly scores of the request and the response. For the moment it\u2019s not important to know all of this. What\u2019s important is that this highly extended log format gives us a foundation upon which we can build without having to adjust the log format again.<\/p>\n<h3 id=\"step-6-writing-other-request-and-response-headers-to-an-additional-log-file\">Step 6: Writing other request and response headers to an additional log file<\/h3>\n<p>In day-to-day work you are often looking for specific requests or you are unsure of which requests are causing an error. It has often been shown to be useful to have specific additional values written to the log file. Any request and response headers or environment variables can be easily written. Our log format makes extensive use of it.<\/p>\n<p>The <em>&#8220;%{Referer}i&#8221;<\/em> and <em>&#8220;%{User-Agent}i&#8221;<\/em> values are request header fields. The balancer route in <em>%{BALANCER_WORKER_ROUTE}e<\/em> is an environment variable. The pattern is clear: <em>%{Header\/Variable}<Domain><\/em>. Request headers are assigned to the <em>i<\/em> domain. Environment variables to domain <em>e<\/em>, the response headers to domain <em>o<\/em> and the variables of the <em>SSL<\/em> modules to the <em>x<\/em> domain.<\/p>\n<p>So, for debugging purposes we will be writing an additional log file. We will no longer be using the <em>LogFormat<\/em> directive, but instead defining the format together with the file on one line. This is a shortcut, if you want to use a specific format one time only.<\/p>\n<div class=\"sourceCode\" id=\"cb11\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb11-1\"><a href=\"#cb11-1\" aria-hidden=\"true\"><\/a><span class=\"ex\">CustomLog<\/span> logs\/access-debug.log <span class=\"st\">&quot;[%{%Y-%m-%d %H:%M:%S}t.%{usec_frac}t] %{UNIQUE_ID}e \\<\/span><\/span>\n<span id=\"cb11-2\"><a href=\"#cb11-2\" aria-hidden=\"true\"><\/a><span class=\"dt\">\\&quot;<\/span><span class=\"st\">%r<\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\"> %{Accept}i %{Content-Type}o&quot;<\/span><\/span><\/code><\/pre><\/div>\n<p>With this additional log file we see the wishes expressed by the client in terms of the content type and what the server actually delivered. Normally this interplay between client and server works very well. But in practice there are sometimes inconsistencies, which is why an additional log file of this kind can be useful for debugging. The result could then look something like this:<\/p>\n<div class=\"sourceCode\" id=\"cb12\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb12-1\"><a href=\"#cb12-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">cat<\/span> logs\/access-debug.log<\/span>\n<span id=\"cb12-2\"><a href=\"#cb12-2\" aria-hidden=\"true\"><\/a><span class=\"ex\">2015-09-02<\/span> 11:58:35.654011 VebITcCoAwcAADRophsAAAAX <span class=\"st\">&quot;GET \/ HTTP\/1.1&quot;<\/span> *\/* text\/html<\/span>\n<span id=\"cb12-3\"><a href=\"#cb12-3\" aria-hidden=\"true\"><\/a><span class=\"ex\">2015-09-02<\/span> 11:58:37.486603 VebIT8CoAwcAADRophwAAAAX <span class=\"st\">&quot;GET \/cms\/feed\/ HTTP\/1.1&quot;<\/span> text\/html,application\/xhtml+xml\u2026 <\/span>\n<span id=\"cb12-4\"><a href=\"#cb12-4\" aria-hidden=\"true\"><\/a><span class=\"ex\">2015-09-02<\/span> 11:58:39.253209 VebIUMCoAwcAADRoph0AAAAX <span class=\"st\">&quot;GET \/cms\/2014\/04\/17\/ubuntu-14-04\/ HTTP\/1.1&quot;<\/span> *\/* text\/html<\/span>\n<span id=\"cb12-5\"><a href=\"#cb12-5\" aria-hidden=\"true\"><\/a><span class=\"ex\">2015-09-02<\/span> 11:58:40.893992 VebIU8CoAwcAADRbdGkAAAAD <span class=\"st\">&quot;GET \/cms\/2014\/05\/13\/download-softfiles HTTP\/1.1&quot;<\/span> *\/* \u2026 <\/span>\n<span id=\"cb12-6\"><a href=\"#cb12-6\" aria-hidden=\"true\"><\/a><span class=\"ex\">2015-09-02<\/span> 11:58:43.558478 VebIVcCoAwcAADRbdGoAAAAD <span class=\"st\">&quot;GET \/cms\/2014\/08\/25\/netcapture-sshargs HTTP\/1.1&quot;<\/span> *\/* \u2026 <\/span>\n<span id=\"cb12-7\"><a href=\"#cb12-7\" aria-hidden=\"true\"><\/a><span class=\"ex\">...<\/span><\/span><\/code><\/pre><\/div>\n<p>This is how log files can be very freely defined in Apache. What\u2019s more interesting is analyzing the data. But we\u2019ll need some data first.<\/p>\n<h3 id=\"step-7-trying-it-out-and-filling-the-log-file\">Step 7: Trying it out and filling the log file<\/h3>\n<p>Let\u2019s configure the extended access log in the <em>extended<\/em> format as described above and work a bit with the server.<\/p>\n<p>We could use <em>ApacheBench<\/em> as described in the second tutorial for this, but that would result in a very uniform log file. We can change things up a bit with the following two one-liners (note the <em>insecure<\/em> flag, that does away with certificate warnings in curl).<\/p>\n<div class=\"sourceCode\" id=\"cb13\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb13-1\"><a href=\"#cb13-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"kw\">for<\/span> <span class=\"ex\">N<\/span> in <span class=\"dt\">{1..100}<\/span><span class=\"kw\">;<\/span> <span class=\"kw\">do<\/span> <span class=\"ex\">curl<\/span> --silent --insecure https:\/\/localhost\/index.html?n=<span class=\"va\">${N}<\/span>a <span class=\"op\">&gt;<\/span>\/dev\/null<span class=\"kw\">;<\/span> <span class=\"kw\">done<\/span><\/span>\n<span id=\"cb13-2\"><a href=\"#cb13-2\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"kw\">for<\/span> <span class=\"ex\">N<\/span> in <span class=\"dt\">{1..100}<\/span><span class=\"kw\">;<\/span> <span class=\"kw\">do<\/span> <span class=\"va\">PAYLOAD=$(<\/span><span class=\"ex\">uuid<\/span> -n <span class=\"va\">$N<\/span> <span class=\"kw\">|<\/span> <span class=\"fu\">xargs<\/span><span class=\"va\">)<\/span>; <span class=\"kw\">\\<\/span><\/span>\n<span id=\"cb13-3\"><a href=\"#cb13-3\" aria-hidden=\"true\"><\/a>   <span class=\"ex\">curl<\/span> --silent --data <span class=\"st\">&quot;payload=<\/span><span class=\"va\">$PAYLOAD<\/span><span class=\"st\">&quot;<\/span> --insecure https:\/\/localhost\/index.html?n=<span class=\"va\">${N}<\/span>b <span class=\"op\">&gt;<\/span>\/dev\/null<span class=\"kw\">;<\/span> <span class=\"kw\">\\<\/span><\/span>\n<span id=\"cb13-4\"><a href=\"#cb13-4\" aria-hidden=\"true\"><\/a>   <span class=\"kw\">done<\/span><\/span><\/code><\/pre><\/div>\n<p>On the first line we simply make one hundred requests, numbered in the <em>query string<\/em>. Then comes the interesting idea on the second line: We again make one hundred requests. But this time we want to send the data using a POST request in the body of the request. We are dynamically creating this payload in such a way that it gets bigger every time it is called. We use <em>uuidgen<\/em> to generate the data we need. This is a command that generates an <em>ascii ID<\/em>. Stringed together, we get a lot of data. (If there is an error message, this could be because the <em>uuidgen<\/em> command is not present. In this case, the <em>uuid<\/em> package should be installed).<\/p>\n<p>It may take a moment to process this line. As a result we see the following entries in the log file:<\/p>\n<div class=\"sourceCode\" id=\"cb14\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb14-1\"><a href=\"#cb14-1\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:35.594159] <span class=\"st\">&quot;GET \/index.html?n=1a HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026 <\/span>\n<span id=\"cb14-2\"><a href=\"#cb14-2\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">53252<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAt7evwRxnTvzP--AjEAAAAAI TLSv1.2 \u2026 <\/span>\n<span id=\"cb14-3\"><a href=\"#cb14-3\" aria-hidden=\"true\"><\/a><span class=\"ex\">ECDHE-RSA-AES256-GCM-SHA384<\/span> 422 1463 -% 97 - - - - - - -<\/span>\n<span id=\"cb14-4\"><a href=\"#cb14-4\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:35.612331] <span class=\"st\">&quot;GET \/index.html?n=2a HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-5\"><a href=\"#cb14-5\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">53254<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAt7evwRxnTvzP--AjEQAAAAg TLSv1.2 \u2026<\/span>\n<span id=\"cb14-6\"><a href=\"#cb14-6\" aria-hidden=\"true\"><\/a><span class=\"ex\">ECDHE-RSA-AES256-GCM-SHA384<\/span> 422 1463 -% 123 - - - - - - -<\/span>\n<span id=\"cb14-7\"><a href=\"#cb14-7\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:35.634044] <span class=\"st\">&quot;GET \/index.html?n=3a HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-8\"><a href=\"#cb14-8\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">53256<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAt7evwRxnTvzP--AjEgAAAAc TLSv1.2 \u2026<\/span>\n<span id=\"cb14-9\"><a href=\"#cb14-9\" aria-hidden=\"true\"><\/a><span class=\"ex\">ECDHE-RSA-AES256-GCM-SHA384<\/span> 422 1463 -% 136 - - - - - - -<\/span>\n<span id=\"cb14-10\"><a href=\"#cb14-10\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:35.652333] <span class=\"st\">&quot;GET \/index.html?n=4a HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-11\"><a href=\"#cb14-11\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">53258<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAt7evwRxnTvzP--AjEwAAAAs TLSv1.2 \u2026<\/span>\n<span id=\"cb14-12\"><a href=\"#cb14-12\" aria-hidden=\"true\"><\/a><span class=\"ex\">ECDHE-RSA-AES256-GCM-SHA384<\/span> 422 1463 -% 100 - - - - - - -<\/span>\n<span id=\"cb14-13\"><a href=\"#cb14-13\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:35.669342] <span class=\"st\">&quot;GET \/index.html?n=5a HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-14\"><a href=\"#cb14-14\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">53260<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAt7evwRxnTvzP--AjFAAAAA4 TLSv1.2 \u2026<\/span>\n<span id=\"cb14-15\"><a href=\"#cb14-15\" aria-hidden=\"true\"><\/a><span class=\"ex\">ECDHE-RSA-AES256-GCM-SHA384<\/span> 422 1463 -% 101 - - - - - - -<\/span>\n<span id=\"cb14-16\"><a href=\"#cb14-16\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:35.686449] <span class=\"st\">&quot;GET \/index.html?n=6a HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-17\"><a href=\"#cb14-17\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">53262<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAt7evwRxnTvzP--AjFQAAABA TLSv1.2 \u2026<\/span>\n<span id=\"cb14-18\"><a href=\"#cb14-18\" aria-hidden=\"true\"><\/a><span class=\"ex\">ECDHE-RSA-AES256-GCM-SHA384<\/span> 422 1463 -% 102 - - - - - - -<\/span>\n<span id=\"cb14-19\"><a href=\"#cb14-19\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:35.703428] <span class=\"st\">&quot;GET \/index.html?n=7a HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-20\"><a href=\"#cb14-20\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">53264<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAt7evwRxnTvzP--AjFgAAABQ TLSv1.2 \u2026<\/span>\n<span id=\"cb14-21\"><a href=\"#cb14-21\" aria-hidden=\"true\"><\/a><span class=\"ex\">ECDHE-RSA-AES256-GCM-SHA384<\/span> 422 1463 -% 101 - - - - - - -<\/span>\n<span id=\"cb14-22\"><a href=\"#cb14-22\" aria-hidden=\"true\"><\/a><span class=\"ex\">...<\/span><\/span>\n<span id=\"cb14-23\"><a href=\"#cb14-23\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:48.060573] <span class=\"st\">&quot;POST \/index.html?n=1b HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-24\"><a href=\"#cb14-24\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;application\/x-www-form-urlencoded&quot;<\/span> <span class=\"ex\">53452<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAxLevwRxnTvzP--AjdAAAAAE \u2026<\/span>\n<span id=\"cb14-25\"><a href=\"#cb14-25\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 536 1463 -% 97 - - - - - - -<\/span>\n<span id=\"cb14-26\"><a href=\"#cb14-26\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:48.080029] <span class=\"st\">&quot;POST \/index.html?n=2b HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-27\"><a href=\"#cb14-27\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;application\/x-www-form-urlencoded&quot;<\/span> <span class=\"ex\">53454<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAxLevwRxnTvzP--AjdQAAAAU \u2026<\/span>\n<span id=\"cb14-28\"><a href=\"#cb14-28\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 573 1463 -% 113 - - - - - - -<\/span>\n<span id=\"cb14-29\"><a href=\"#cb14-29\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:48.100103] <span class=\"st\">&quot;POST \/index.html?n=3b HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-30\"><a href=\"#cb14-30\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;application\/x-www-form-urlencoded&quot;<\/span> <span class=\"ex\">53456<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAxLevwRxnTvzP--AjdgAAAAQ \u2026<\/span>\n<span id=\"cb14-31\"><a href=\"#cb14-31\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 611 1463 -% 105 - - - - - - -<\/span>\n<span id=\"cb14-32\"><a href=\"#cb14-32\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:48.119907] <span class=\"st\">&quot;POST \/index.html?n=4b HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-33\"><a href=\"#cb14-33\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;application\/x-www-form-urlencoded&quot;<\/span> <span class=\"ex\">53458<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAxLevwRxnTvzP--AjdwAAAAo \u2026<\/span>\n<span id=\"cb14-34\"><a href=\"#cb14-34\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 648 1463 -% 93 - - - - - - -<\/span>\n<span id=\"cb14-35\"><a href=\"#cb14-35\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:48.137620] <span class=\"st\">&quot;POST \/index.html?n=5b HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-36\"><a href=\"#cb14-36\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;application\/x-www-form-urlencoded&quot;<\/span> <span class=\"ex\">53460<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAxLevwRxnTvzP--AjeAAAAA0 \u2026<\/span>\n<span id=\"cb14-37\"><a href=\"#cb14-37\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 685 1463 -% 98 - - - - - - -<\/span>\n<span id=\"cb14-38\"><a href=\"#cb14-38\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:48.155453] <span class=\"st\">&quot;POST \/index.html?n=6b HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-39\"><a href=\"#cb14-39\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;application\/x-www-form-urlencoded&quot;<\/span> <span class=\"ex\">53462<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAxLevwRxnTvzP--AjeQAAABE \u2026<\/span>\n<span id=\"cb14-40\"><a href=\"#cb14-40\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 722 1463 -% 97 - - - - - - -<\/span>\n<span id=\"cb14-41\"><a href=\"#cb14-41\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:48.174826] <span class=\"st\">&quot;POST \/index.html?n=7b HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-42\"><a href=\"#cb14-42\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;application\/x-www-form-urlencoded&quot;<\/span> <span class=\"ex\">53464<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAxLevwRxnTvzP--AjegAAABM \u2026<\/span>\n<span id=\"cb14-43\"><a href=\"#cb14-43\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 759 1463 -% 95 - - - - - - -<\/span>\n<span id=\"cb14-44\"><a href=\"#cb14-44\" aria-hidden=\"true\"><\/a><span class=\"ex\">...<\/span><\/span>\n<span id=\"cb14-45\"><a href=\"#cb14-45\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:50.533651] <span class=\"st\">&quot;POST \/index.html?n=99b HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-46\"><a href=\"#cb14-46\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;application\/x-www-form-urlencoded&quot;<\/span> <span class=\"ex\">53648<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAxrevwRxnTvzP--Aj1gAAABM \u2026<\/span>\n<span id=\"cb14-47\"><a href=\"#cb14-47\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 4216 1517 -% 1740 - - - - - - -<\/span>\n<span id=\"cb14-48\"><a href=\"#cb14-48\" aria-hidden=\"true\"><\/a><span class=\"ex\">127.0.0.1<\/span> - - [2019-01-31 05:59:50.555242] <span class=\"st\">&quot;POST \/index.html?n=100b HTTP\/1.1&quot;<\/span> 200 45 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;curl\/7.58.0&quot;<\/span> \u2026<\/span>\n<span id=\"cb14-49\"><a href=\"#cb14-49\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;application\/x-www-form-urlencoded&quot;<\/span> <span class=\"ex\">53650<\/span> localhost 127.0.0.1 443 - - + <span class=\"st\">&quot;-&quot;<\/span> XFKAxrevwRxnTvzP--Aj1wAAABc \u2026<\/span>\n<span id=\"cb14-50\"><a href=\"#cb14-50\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 4254 1517 -% 264 - - - - - - -<\/span><\/code><\/pre><\/div>\n<p>As predicted above, a lot of values are still empty or indicated by <em>&#8211;<\/em>. But we see that we talked to server <em>www.example.com<\/em> on port 443 and that the size of the request increased with every <em>POST<\/em> request, with it being over 4K, or 4096 bytes, in the end. Simple analyses can already be performed with this simple log file.<\/p>\n<h3 id=\"step-8-performing-simple-analyses-using-the-extended-log-format\">Step 8: Performing simple analyses using the extended log format<\/h3>\n<p>If you take a close look at the example log output above you will see that the duration of the requests are not evenly distributed and that there is a single outlier. We can identify the outlier as follows:<\/p>\n<div class=\"sourceCode\" id=\"cb15\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb15-1\"><a href=\"#cb15-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">egrep<\/span> -o <span class=\"st\">&quot;\\% [0-9]+ &quot;<\/span> logs\/access.log <span class=\"kw\">|<\/span> <span class=\"fu\">cut<\/span> -b3- <span class=\"kw\">|<\/span> <span class=\"fu\">tr<\/span> -d <span class=\"st\">&quot; &quot;<\/span> <span class=\"kw\">|<\/span> <span class=\"fu\">sort<\/span> -n<\/span><\/code><\/pre><\/div>\n<p>Using this one-liner we cut out the value that specifies the duration of a request from the log file. We use the percent sign of the Deflate value as an anchor for a simple regular expression and take the number following it. <em>egrep<\/em> makes sense here, because we want to work with regex, the <em>-o<\/em> option results in only the match itself being output, not the entire line. This is very helpful. One detail that will help us to avoid errors in the future is the space following the plus sign. It only accepts values that have a space following the number. The problem is the user agent that also appears in our log format and which has up to now also included percent signs. We assume here that percent signs can be followed by a space and a whole number. But this is not followed by another space and this combination only appears at the end of the log file line after the <em>Deflate space savings<\/em> percent sign. We then use <em>cut<\/em> so that only the third and subsequent characters are output and finally we use <em>tr<\/em> to separate the closing space (see regex). We are then ready for numerical sorting. This delivers the following result for me (your mileage will vary):<\/p>\n<div class=\"sourceCode\" id=\"cb16\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb16-1\"><a href=\"#cb16-1\" aria-hidden=\"true\"><\/a><span class=\"ex\">...<\/span><\/span>\n<span id=\"cb16-2\"><a href=\"#cb16-2\" aria-hidden=\"true\"><\/a><span class=\"ex\">354<\/span><\/span>\n<span id=\"cb16-3\"><a href=\"#cb16-3\" aria-hidden=\"true\"><\/a><span class=\"ex\">355<\/span><\/span>\n<span id=\"cb16-4\"><a href=\"#cb16-4\" aria-hidden=\"true\"><\/a><span class=\"ex\">357<\/span><\/span>\n<span id=\"cb16-5\"><a href=\"#cb16-5\" aria-hidden=\"true\"><\/a><span class=\"ex\">363<\/span><\/span>\n<span id=\"cb16-6\"><a href=\"#cb16-6\" aria-hidden=\"true\"><\/a><span class=\"ex\">363<\/span><\/span>\n<span id=\"cb16-7\"><a href=\"#cb16-7\" aria-hidden=\"true\"><\/a><span class=\"ex\">363<\/span><\/span>\n<span id=\"cb16-8\"><a href=\"#cb16-8\" aria-hidden=\"true\"><\/a><span class=\"ex\">1740<\/span><\/span><\/code><\/pre><\/div>\n<p>In our example, almost all of the requests have been handled very fast. Yet, there is a single one with a duration of over 1,000 microseconds, or more than one millisecond. This is still within reason, but interesting to see how this request is setting itself apart from the other values as a statistical outlier.<\/p>\n<p>We know that we made 100 GET and 100 POST requests. But for the sake of practice, let\u2019s count them again:<\/p>\n<div class=\"sourceCode\" id=\"cb17\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb17-1\"><a href=\"#cb17-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">egrep<\/span> -c <span class=\"st\">&quot;<\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\">GET &quot;<\/span> logs\/access.log <\/span><\/code><\/pre><\/div>\n<p>This should result in at least 100 GET requests. More if you did not start with an empty file:<\/p>\n<div class=\"sourceCode\" id=\"cb18\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb18-1\"><a href=\"#cb18-1\" aria-hidden=\"true\"><\/a><span class=\"ex\">100<\/span><\/span><\/code><\/pre><\/div>\n<p>We can also compare GET and POST with one another. We do this as follows:<\/p>\n<div class=\"sourceCode\" id=\"cb19\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb19-1\"><a href=\"#cb19-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">egrep<\/span> -o <span class=\"st\">&#39;&quot;(GET|POST)&#39;<\/span> logs\/access.log <span class=\"kw\">|<\/span> <span class=\"fu\">cut<\/span> -b2- <span class=\"kw\">|<\/span> <span class=\"fu\">sort<\/span> <span class=\"kw\">|<\/span> <span class=\"fu\">uniq<\/span> -c<\/span><\/code><\/pre><\/div>\n<p>Here, we filter out the GET and the POST requests using the method that follows a quote mark. We then cut out the quote mark, sort and count grouped:<\/p>\n<div class=\"sourceCode\" id=\"cb20\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb20-1\"><a href=\"#cb20-1\" aria-hidden=\"true\"><\/a>    <span class=\"ex\">100<\/span> GET <\/span>\n<span id=\"cb20-2\"><a href=\"#cb20-2\" aria-hidden=\"true\"><\/a>    <span class=\"ex\">100<\/span> POST <\/span><\/code><\/pre><\/div>\n<p>So much for these first finger exercises. On the basis of this self-filled log file this is unfortunately not yet very exciting. So let\u2019s try it with a real log file from a production server.<\/p>\n<h3 id=\"step-9-greater-in-depth-analysis-of-an-example-log-file\">Step 9: Greater in-depth analysis of an example log file<\/h3>\n<p>Analyses using a real log file from a production server are much more exciting. Here\u2019s one with 10,000 requests:<\/p>\n<p><a href=\"https:\/\/www.netnea.com\/files\/tutorial-5-example-access.log\">tutorial-5-example-access.log<\/a><\/p>\n<div class=\"sourceCode\" id=\"cb21\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb21-1\"><a href=\"#cb21-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">head<\/span> tutorial-5-example-access.log<\/span>\n<span id=\"cb21-2\"><a href=\"#cb21-2\" aria-hidden=\"true\"><\/a><span class=\"ex\">192.31.242.0<\/span> US - [2019-01-21 23:51:44.365656] <span class=\"st\">&quot;GET \u2026<\/span><\/span>\n<span id=\"cb21-3\"><a href=\"#cb21-3\" aria-hidden=\"true\"><\/a><span class=\"st\">\/cds\/2016\/10\/16\/using-ansible-to-fetch-information-from-ios-devices\/ HTTP\/1.1&quot;<\/span> 200 10273 \u2026<\/span>\n<span id=\"cb21-4\"><a href=\"#cb21-4\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;https:\/\/www.google.com\/&quot;<\/span> <span class=\"st\">&quot;Mozilla\/5.0 (Windows NT 10.0; WOW64) AppleWebKit\/537.36 (KHTML, \u2026<\/span><\/span>\n<span id=\"cb21-5\"><a href=\"#cb21-5\" aria-hidden=\"true\"><\/a><span class=\"st\">like Gecko) Chrome\/71.0.3578.98 Safari\/537.36&quot;<\/span> <span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">13258<\/span> www.example.com 192.168.3.7 443 \u2026<\/span>\n<span id=\"cb21-6\"><a href=\"#cb21-6\" aria-hidden=\"true\"><\/a><span class=\"ex\">redirect-handler<\/span> - + <span class=\"st\">&quot;ReqID--&quot;<\/span> XEZNAMCoAwcAAAxAiBgAAAAA TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 \u2026<\/span>\n<span id=\"cb21-7\"><a href=\"#cb21-7\" aria-hidden=\"true\"><\/a><span class=\"ex\">776<\/span> 14527 -% 360398 2128 0 0 0-0-0-0 0-0-0-0 0 0<\/span>\n<span id=\"cb21-8\"><a href=\"#cb21-8\" aria-hidden=\"true\"><\/a><span class=\"ex\">192.31.242.0<\/span> US - [2019-01-21 23:51:44.943942] <span class=\"st\">&quot;GET \u2026<\/span><\/span>\n<span id=\"cb21-9\"><a href=\"#cb21-9\" aria-hidden=\"true\"><\/a><span class=\"st\">\/cds\/snippets\/themes\/customizr\/assets\/shared\/fonts\/fa\/css\/fontawesome-all.min.css?ver=4.1.12 \u2026<\/span><\/span>\n<span id=\"cb21-10\"><a href=\"#cb21-10\" aria-hidden=\"true\"><\/a><span class=\"st\">HTTP\/1.1&quot;<\/span> 200 7439 \u2026<\/span>\n<span id=\"cb21-11\"><a href=\"#cb21-11\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;https:\/\/www.example.com\/cds\/2016\/10\/16\/using-ansible-to-fetch-information-from-ios-devices\/&quot;<\/span> \u2026<\/span>\n<span id=\"cb21-12\"><a href=\"#cb21-12\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;Mozilla\/5.0 (Windows NT 10.0; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/71.0.3578.98 \u2026<\/span><\/span>\n<span id=\"cb21-13\"><a href=\"#cb21-13\" aria-hidden=\"true\"><\/a><span class=\"st\">Safari\/537.36&quot;<\/span> <span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">13258<\/span> www.example.com 192.168.3.7 443 - - + <span class=\"st\">&quot;ReqID--&quot;<\/span> XEZNAMCoAwcAAAxAiBkAAAAA \u2026<\/span>\n<span id=\"cb21-14\"><a href=\"#cb21-14\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 507 7950 -% 2509 1039 0 164 0-0-0-0 0-0-0-0 0 0<\/span>\n<span id=\"cb21-15\"><a href=\"#cb21-15\" aria-hidden=\"true\"><\/a><span class=\"ex\">192.31.242.0<\/span> US - [2019-01-21 23:51:44.947470] <span class=\"st\">&quot;GET \u2026<\/span><\/span>\n<span id=\"cb21-16\"><a href=\"#cb21-16\" aria-hidden=\"true\"><\/a><span class=\"st\">\/cds\/snippets\/themes\/customizr\/inc\/assets\/css\/tc_common.min.css?ver=4.1.12 HTTP\/1.1&quot;<\/span> 200 28225 \u2026<\/span>\n<span id=\"cb21-17\"><a href=\"#cb21-17\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;https:\/\/www.example.com\/cds\/2016\/10\/16\/using-ansible-to-fetch-information-from-ios-devices\/&quot;<\/span> \u2026<\/span>\n<span id=\"cb21-18\"><a href=\"#cb21-18\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;Mozilla\/5.0 (Windows NT 10.0; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/71.0.3578.98 \u2026<\/span><\/span>\n<span id=\"cb21-19\"><a href=\"#cb21-19\" aria-hidden=\"true\"><\/a><span class=\"st\">Safari\/537.36&quot;<\/span> <span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">32554<\/span> www.example.com 192.168.3.7 443 - - + <span class=\"st\">&quot;ReqID--&quot;<\/span> XEZNAMCoAwcAAAxDkkUAAAAD \u2026<\/span>\n<span id=\"cb21-20\"><a href=\"#cb21-20\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 754 32406 -% 6714 1338 0 168 0-0-0-0 0-0-0-0 0 0<\/span>\n<span id=\"cb21-21\"><a href=\"#cb21-21\" aria-hidden=\"true\"><\/a><span class=\"ex\">212.147.59.0<\/span> CH - [2019-01-21 23:51:44.849966] <span class=\"st\">&quot;GET \/cds\/ HTTP\/1.1&quot;<\/span> 200 31332 <span class=\"st\">&quot;-&quot;<\/span> <span class=\"st\">&quot;check_http\/v2.1.4 \u2026<\/span><\/span>\n<span id=\"cb21-22\"><a href=\"#cb21-22\" aria-hidden=\"true\"><\/a><span class=\"st\">(nagios-plugins 2.1.4)&quot;<\/span> <span class=\"st\">&quot;-&quot;<\/span> 16291 www.example.com 192.168.3.7 443 application\/x-httpd-php - - \u2026<\/span>\n<span id=\"cb21-23\"><a href=\"#cb21-23\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;ReqID--&quot;<\/span> <span class=\"ex\">XEZNAMCoAwcAAAxEZFIAAAAE<\/span> TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 586 35548 -% 144762 1180 0 \u2026<\/span>\n<span id=\"cb21-24\"><a href=\"#cb21-24\" aria-hidden=\"true\"><\/a><span class=\"ex\">13771<\/span> 0-0-0-0 0-0-0-0 0 0<\/span>\n<span id=\"cb21-25\"><a href=\"#cb21-25\" aria-hidden=\"true\"><\/a><span class=\"ex\">192.31.242.0<\/span> US - [2019-01-21 23:51:45.143819] <span class=\"st\">&quot;GET \u2026<\/span><\/span>\n<span id=\"cb21-26\"><a href=\"#cb21-26\" aria-hidden=\"true\"><\/a><span class=\"st\">\/cds\/snippets\/themes\/customizr-child\/inc\/assets\/css\/neagreen.min.css?ver=4.1.12 HTTP\/1.1&quot;<\/span> 200 2458 \u2026<\/span>\n<span id=\"cb21-27\"><a href=\"#cb21-27\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;https:\/\/www.example.com\/cds\/2016\/10\/16\/using-ansible-to-fetch-information-from-ios-devices\/&quot;<\/span> \u2026<\/span>\n<span id=\"cb21-28\"><a href=\"#cb21-28\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;Mozilla\/5.0 (Windows NT 10.0; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/71.0.3578.98 \u2026<\/span><\/span>\n<span id=\"cb21-29\"><a href=\"#cb21-29\" aria-hidden=\"true\"><\/a><span class=\"st\">Safari\/537.36&quot;<\/span> <span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">13258<\/span> www.example.com 192.168.3.7 443 - - + <span class=\"st\">&quot;ReqID--&quot;<\/span> XEZNAcCoAwcAAAxAiBoAAAAA \u2026<\/span>\n<span id=\"cb21-30\"><a href=\"#cb21-30\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 494 2969 -% 1946 1070 0 141 0-0-0-0 0-0-0-0 0 0<\/span>\n<span id=\"cb21-31\"><a href=\"#cb21-31\" aria-hidden=\"true\"><\/a><span class=\"ex\">192.31.242.0<\/span> US - [2019-01-21 23:51:45.269615] <span class=\"st\">&quot;GET \u2026<\/span><\/span>\n<span id=\"cb21-32\"><a href=\"#cb21-32\" aria-hidden=\"true\"><\/a><span class=\"st\">\/cds\/snippets\/themes\/customizr-child\/style.css?ver=4.1.12 HTTP\/1.1&quot;<\/span> 200 483 \u2026<\/span>\n<span id=\"cb21-33\"><a href=\"#cb21-33\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;https:\/\/www.example.com\/cds\/2016\/10\/16\/using-ansible-to-fetch-information-from-ios-devices\/&quot;<\/span> \u2026<\/span>\n<span id=\"cb21-34\"><a href=\"#cb21-34\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;Mozilla\/5.0 (Windows NT 10.0; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/71.0.3578.98 \u2026<\/span><\/span>\n<span id=\"cb21-35\"><a href=\"#cb21-35\" aria-hidden=\"true\"><\/a><span class=\"st\">Safari\/537.36&quot;<\/span> <span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">22078<\/span> www.example.com 192.168.3.7 443 - - + <span class=\"st\">&quot;ReqID--&quot;<\/span> XEZNAcCoAwcAAAxBiDkAAAAB \u2026<\/span>\n<span id=\"cb21-36\"><a href=\"#cb21-36\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 737 4544 -% 4237 2221 0 241 0-0-0-0 0-0-0-0 0 0<\/span>\n<span id=\"cb21-37\"><a href=\"#cb21-37\" aria-hidden=\"true\"><\/a><span class=\"ex\">192.31.242.0<\/span> US - [2019-01-21 23:51:45.309680] <span class=\"st\">&quot;GET \u2026<\/span><\/span>\n<span id=\"cb21-38\"><a href=\"#cb21-38\" aria-hidden=\"true\"><\/a><span class=\"st\">\/cds\/snippets\/themes\/customizr\/assets\/front\/js\/libs\/fancybox\/jquery.fancybox-1.3.4.min.css?ver=4.9.8 \u2026<\/span><\/span>\n<span id=\"cb21-39\"><a href=\"#cb21-39\" aria-hidden=\"true\"><\/a><span class=\"st\">HTTP\/1.1&quot;<\/span> 200 981 \u2026<\/span>\n<span id=\"cb21-40\"><a href=\"#cb21-40\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;https:\/\/www.example.com\/cds\/2016\/10\/16\/using-ansible-to-fetch-information-from-ios-devices\/&quot;<\/span> \u2026<\/span>\n<span id=\"cb21-41\"><a href=\"#cb21-41\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;Mozilla\/5.0 (Windows NT 10.0; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/71.0.3578.98 \u2026<\/span><\/span>\n<span id=\"cb21-42\"><a href=\"#cb21-42\" aria-hidden=\"true\"><\/a><span class=\"st\">Safari\/537.36&quot;<\/span> <span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">32554<\/span> www.example.com 192.168.3.7 443 - - + <span class=\"st\">&quot;ReqID--&quot;<\/span> XEZNAcCoAwcAAAxDkkYAAAAD \u2026<\/span>\n<span id=\"cb21-43\"><a href=\"#cb21-43\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 515 1461 -% 2830 1656 0 200 0-0-0-0 0-0-0-0 0 0<\/span>\n<span id=\"cb21-44\"><a href=\"#cb21-44\" aria-hidden=\"true\"><\/a><span class=\"ex\">192.31.242.0<\/span> US - [2019-01-21 23:51:45.467686] <span class=\"st\">&quot;GET \u2026<\/span><\/span>\n<span id=\"cb21-45\"><a href=\"#cb21-45\" aria-hidden=\"true\"><\/a><span class=\"st\">\/cds\/includes\/js\/jquery\/jquery-migrate.min.js?ver=1.4.1 HTTP\/1.1&quot;<\/span> 200 4014 \u2026<\/span>\n<span id=\"cb21-46\"><a href=\"#cb21-46\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;https:\/\/www.example.com\/cds\/2016\/10\/16\/using-ansible-to-fetch-information-from-ios-devices\/&quot;<\/span> \u2026<\/span>\n<span id=\"cb21-47\"><a href=\"#cb21-47\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;Mozilla\/5.0 (Windows NT 10.0; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/71.0.3578.98 \u2026<\/span><\/span>\n<span id=\"cb21-48\"><a href=\"#cb21-48\" aria-hidden=\"true\"><\/a><span class=\"st\">Safari\/537.36&quot;<\/span> <span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">35193<\/span> www.example.com 192.168.3.7 443 - - + <span class=\"st\">&quot;ReqID--&quot;<\/span> XEZNAcCoAwcAAAyCROcAAAAF \u2026<\/span>\n<span id=\"cb21-49\"><a href=\"#cb21-49\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 721 8120 -% 3238 1536 0 142 0-0-0-0 0-0-0-0 0 0<\/span>\n<span id=\"cb21-50\"><a href=\"#cb21-50\" aria-hidden=\"true\"><\/a><span class=\"ex\">192.31.242.0<\/span> US - [2019-01-21 23:51:45.469159] <span class=\"st\">&quot;GET \u2026<\/span><\/span>\n<span id=\"cb21-51\"><a href=\"#cb21-51\" aria-hidden=\"true\"><\/a><span class=\"st\">\/cds\/snippets\/themes\/customizr\/assets\/front\/js\/libs\/fancybox\/jquery.fancybox-1.3.4.min.js?ver=4.1.12 \u2026<\/span><\/span>\n<span id=\"cb21-52\"><a href=\"#cb21-52\" aria-hidden=\"true\"><\/a><span class=\"st\">HTTP\/1.1&quot;<\/span> 200 5209 \u2026<\/span>\n<span id=\"cb21-53\"><a href=\"#cb21-53\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;https:\/\/www.example.com\/cds\/2016\/10\/16\/using-ansible-to-fetch-information-from-ios-devices\/&quot;<\/span> \u2026<\/span>\n<span id=\"cb21-54\"><a href=\"#cb21-54\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;Mozilla\/5.0 (Windows NT 10.0; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/71.0.3578.98 \u2026<\/span><\/span>\n<span id=\"cb21-55\"><a href=\"#cb21-55\" aria-hidden=\"true\"><\/a><span class=\"st\">Safari\/537.36&quot;<\/span> <span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">46005<\/span> www.example.com 192.168.3.7 443 - - + <span class=\"st\">&quot;ReqID--&quot;<\/span> XEZNAcCoAwcAAAxCwlQAAAAC \u2026<\/span>\n<span id=\"cb21-56\"><a href=\"#cb21-56\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 765 9315 -% 2938 1095 0 246 0-0-0-0 0-0-0-0 0 0<\/span>\n<span id=\"cb21-57\"><a href=\"#cb21-57\" aria-hidden=\"true\"><\/a><span class=\"ex\">192.31.242.0<\/span> US - [2019-01-21 23:51:45.469177] <span class=\"st\">&quot;GET \u2026<\/span><\/span>\n<span id=\"cb21-58\"><a href=\"#cb21-58\" aria-hidden=\"true\"><\/a><span class=\"st\">\/cds\/snippets\/themes\/customizr\/assets\/front\/js\/libs\/modernizr.min.js?ver=4.1.12 HTTP\/1.1&quot;<\/span> 200 5926 \u2026<\/span>\n<span id=\"cb21-59\"><a href=\"#cb21-59\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;https:\/\/www.example.com\/cds\/2016\/10\/16\/using-ansible-to-fetch-information-from-ios-devices\/&quot;<\/span> \u2026<\/span>\n<span id=\"cb21-60\"><a href=\"#cb21-60\" aria-hidden=\"true\"><\/a><span class=\"st\">&quot;Mozilla\/5.0 (Windows NT 10.0; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/71.0.3578.98 \u2026<\/span><\/span>\n<span id=\"cb21-61\"><a href=\"#cb21-61\" aria-hidden=\"true\"><\/a><span class=\"st\">Safari\/537.36&quot;<\/span> <span class=\"st\">&quot;-&quot;<\/span> <span class=\"ex\">42316<\/span> www.example.com 192.168.3.7 443 - - + <span class=\"st\">&quot;ReqID--&quot;<\/span> XEZNAcCoAwcAAAyDuEIAAAAG \u2026<\/span>\n<span id=\"cb21-62\"><a href=\"#cb21-62\" aria-hidden=\"true\"><\/a><span class=\"ex\">TLSv1.2<\/span> ECDHE-RSA-AES256-GCM-SHA384 744 10032 -% 3022 1094 0 245 0-0-0-0 0-0-0-0 0 0<\/span><\/code><\/pre><\/div>\n<p>Let\u2019s have a look at the distribution of <em>GET<\/em> and <em>POST<\/em> requests here:<\/p>\n<div class=\"sourceCode\" id=\"cb22\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb22-1\"><a href=\"#cb22-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">cat<\/span> tutorial-5-example-access.log  <span class=\"kw\">|<\/span> <span class=\"fu\">egrep<\/span> -o <span class=\"st\">&#39;&quot;(GET|POST)&#39;<\/span>  <span class=\"kw\">|<\/span> <span class=\"fu\">cut<\/span> -b2- <span class=\"kw\">|<\/span> <span class=\"fu\">sort<\/span> <span class=\"kw\">|<\/span> <span class=\"fu\">uniq<\/span> -c<\/span>\n<span id=\"cb22-2\"><a href=\"#cb22-2\" aria-hidden=\"true\"><\/a>   <span class=\"ex\">9466<\/span> GET<\/span>\n<span id=\"cb22-3\"><a href=\"#cb22-3\" aria-hidden=\"true\"><\/a>    <span class=\"ex\">115<\/span> POST<\/span><\/code><\/pre><\/div>\n<p>This is a clear result. Do we actually see many errors? Or requests answered with an HTTP error code?<\/p>\n<div class=\"sourceCode\" id=\"cb23\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb23-1\"><a href=\"#cb23-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">cat<\/span> tutorial-5-example-access.log <span class=\"kw\">|<\/span> <span class=\"fu\">cut<\/span> -d<span class=\"dt\">\\&quot;<\/span> -f3 <span class=\"kw\">|<\/span> <span class=\"fu\">cut<\/span> -d<span class=\"dt\">\\ <\/span> -f2 <span class=\"kw\">|<\/span> <span class=\"fu\">sort<\/span> <span class=\"kw\">|<\/span> <span class=\"fu\">uniq<\/span> -c<\/span>\n<span id=\"cb23-2\"><a href=\"#cb23-2\" aria-hidden=\"true\"><\/a>   <span class=\"ex\">9397<\/span> 200<\/span>\n<span id=\"cb23-3\"><a href=\"#cb23-3\" aria-hidden=\"true\"><\/a>      <span class=\"ex\">3<\/span> 206<\/span>\n<span id=\"cb23-4\"><a href=\"#cb23-4\" aria-hidden=\"true\"><\/a>    <span class=\"ex\">233<\/span> 301<\/span>\n<span id=\"cb23-5\"><a href=\"#cb23-5\" aria-hidden=\"true\"><\/a>     <span class=\"ex\">67<\/span> 304<\/span>\n<span id=\"cb23-6\"><a href=\"#cb23-6\" aria-hidden=\"true\"><\/a>      <span class=\"ex\">9<\/span> 400<\/span>\n<span id=\"cb23-7\"><a href=\"#cb23-7\" aria-hidden=\"true\"><\/a>     <span class=\"ex\">58<\/span> 403<\/span>\n<span id=\"cb23-8\"><a href=\"#cb23-8\" aria-hidden=\"true\"><\/a>    <span class=\"ex\">233<\/span> 404<\/span><\/code><\/pre><\/div>\n<p>Besides the nine requests with the \u201c400 Bad Request\u201d HTTP response there is a large number of 404s (\u201c404 Not Found\u201d). HTTP status 400 means a protocol error. As is commonly known, 404 is a page not found. This is where we should have a look at the permissions. But before we continue, a note about the request using the <em>cut<\/em> command. We have subdivided the log line using the <em>\u201d<\/em>-delimiter, extracted the third field with this subdivision and then further subdivided the content, but this time with a space (note the __ character) as the delimiter and extracted the second field, which is now the status. Afterwards this was sorted and the <em>uniq function<\/em> used in count mode. We will be seeing that this type of access to the data is a pattern that repeats itself. Let\u2019s take a closer look at the log file.<\/p>\n<p>Further above we discussed encryption protocols and how their analyses was a foundation for deciding on an appropriate reaction to the <em>POODLE<\/em> vulnerability. In practice, which encryption protocols are actually on the server since then:<\/p>\n<div class=\"sourceCode\" id=\"cb24\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb24-1\"><a href=\"#cb24-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">cat<\/span> tutorial-5-example-access.log <span class=\"kw\">|<\/span> <span class=\"fu\">cut<\/span> -d<span class=\"dt\">\\&quot;<\/span> -f11 <span class=\"kw\">|<\/span> <span class=\"fu\">cut<\/span> -d<span class=\"dt\">\\ <\/span> -f3 <span class=\"kw\">|<\/span> <span class=\"fu\">sort<\/span> <span class=\"kw\">|<\/span> <span class=\"fu\">uniq<\/span> -c <span class=\"kw\">|<\/span> <span class=\"fu\">sort<\/span> -n<\/span>\n<span id=\"cb24-2\"><a href=\"#cb24-2\" aria-hidden=\"true\"><\/a>      <span class=\"ex\">4<\/span> -<\/span>\n<span id=\"cb24-3\"><a href=\"#cb24-3\" aria-hidden=\"true\"><\/a>    <span class=\"ex\">155<\/span> TLSv1<\/span>\n<span id=\"cb24-4\"><a href=\"#cb24-4\" aria-hidden=\"true\"><\/a>   <span class=\"ex\">9841<\/span> TLSv1.2<\/span><\/code><\/pre><\/div>\n<p>It appears that Apache is not always recording an encryption protocol. This is a bit strange, but because it is a very rare case, we won\u2019t be pursuing it for the moment. What\u2019s more important are the numerical ratios between the TLS protocols. After disabling <em>SSLv3<\/em>, the <em>TLSv1.2<\/em> protocol is dominant and <em>TLSv1.0<\/em> is slowly being phased out.<\/p>\n<p>We again got to the desired result by a series of <em>cut<\/em> commands. It would actually be advisable to take note of these commands, because will be needing them again and again. It would then be an alias list as follows:<\/p>\n<div class=\"sourceCode\" id=\"cb25\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb25-1\"><a href=\"#cb25-1\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> alip=<span class=\"st\">&#39;cut -d\\  -f1&#39;<\/span><\/span>\n<span id=\"cb25-2\"><a href=\"#cb25-2\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> alcountry=<span class=\"st\">&#39;cut -d\\  -f2&#39;<\/span><\/span>\n<span id=\"cb25-3\"><a href=\"#cb25-3\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> aluser=<span class=\"st\">&#39;cut -d\\  -f3&#39;<\/span><\/span>\n<span id=\"cb25-4\"><a href=\"#cb25-4\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> altimestamp=<span class=\"st\">&#39;cut -d\\  -f4,5 | tr -d &quot;[]&quot;&#39;<\/span><\/span>\n<span id=\"cb25-5\"><a href=\"#cb25-5\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> alrequestline=<span class=\"st\">&#39;cut -d\\&quot; -f2&#39;<\/span><\/span>\n<span id=\"cb25-6\"><a href=\"#cb25-6\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> almethod=<span class=\"st\">&#39;cut -d\\&quot; -f2 | cut -d\\  -f1 | sed &quot;s\/^-$\/**NONE**\/&quot;&#39;<\/span><\/span>\n<span id=\"cb25-7\"><a href=\"#cb25-7\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> aluri=<span class=\"st\">&#39;cut -d\\&quot; -f2 | cut -d\\  -f2 | sed &quot;s\/^-$\/**NONE**\/&quot;&#39;<\/span><\/span>\n<span id=\"cb25-8\"><a href=\"#cb25-8\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> alprotocol=<span class=\"st\">&#39;cut -d\\&quot; -f2 | cut -d\\  -f3 | sed &quot;s\/^-$\/**NONE**\/&quot;&#39;<\/span><\/span>\n<span id=\"cb25-9\"><a href=\"#cb25-9\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> alstatus=<span class=\"st\">&#39;cut -d\\&quot; -f3 | cut -d\\  -f2&#39;<\/span><\/span>\n<span id=\"cb25-10\"><a href=\"#cb25-10\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> alresponsebodysize=<span class=\"st\">&#39;cut -d\\&quot; -f3 | cut -d\\  -f3&#39;<\/span><\/span>\n<span id=\"cb25-11\"><a href=\"#cb25-11\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> alreferer=<span class=\"st\">&#39;cut -d\\&quot; -f4 | sed &quot;s\/^-$\/**NONE**\/&quot;&#39;<\/span><\/span>\n<span id=\"cb25-12\"><a href=\"#cb25-12\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> alreferrer=<span class=\"st\">&#39;cut -d\\&quot; -f4 | sed &quot;s\/^-$\/**NONE**\/&quot;&#39;<\/span><\/span>\n<span id=\"cb25-13\"><a href=\"#cb25-13\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> aluseragent=<span class=\"st\">&#39;cut -d\\&quot; -f6 | sed &quot;s\/^-$\/**NONE**\/&quot;&#39;<\/span><\/span>\n<span id=\"cb25-14\"><a href=\"#cb25-14\" aria-hidden=\"true\"><\/a><span class=\"bu\">alias<\/span> alcontenttype=<span class=\"st\">&#39;cut -d\\&quot; -f8&#39;<\/span><\/span>\n<span id=\"cb25-15\"><a href=\"#cb25-15\" aria-hidden=\"true\"><\/a><span class=\"ex\">...<\/span><\/span><\/code><\/pre><\/div>\n<p>All of the aliases begin with <em>al<\/em>. This stands for <em>ApacheLog<\/em> or <em>AccessLog<\/em>. This is followed by the field name. The individual aliases are not sorted alphabetically. They instead follow the sequence of the fields in the format of the log file.<\/p>\n<p>This list with alias definitions is available in the file <a href=\"https:\/\/raw.githubusercontent.com\/Apache-Labor\/labor\/master\/bin\/.apache-modsec.alias\">.apache-modsec.alias<\/a>. They have been put together there with a few additional aliases that we will be defining in subsequent tutorials. If you often work with Apache and its log files, then it is advisable to place these alias definitions in the home directory and to load them when logging in. By using the following entry in the <em>.bashrc<\/em> file or via another related mechanism. (The entry also has a 2nd functionality: it adds the <code>bin<\/code> subfolder of your home folder to the PATH if it is not there already. This is often not the case and we are using this folder to place additional custom scripts that play with the apache-modsec file. So it\u2019s a good moment to prepare this immediately.)<\/p>\n<div class=\"sourceCode\" id=\"cb26\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb26-1\"><a href=\"#cb26-1\" aria-hidden=\"true\"><\/a><span class=\"co\"># Load apache \/ modsecurity aliases if file exists<\/span><\/span>\n<span id=\"cb26-2\"><a href=\"#cb26-2\" aria-hidden=\"true\"><\/a><span class=\"bu\">test<\/span> -e <span class=\"va\">$HOME<\/span>\/.apache-modsec.alias <span class=\"kw\">&amp;&amp;<\/span> <span class=\"bu\">.<\/span> <span class=\"va\">$HOME<\/span><span class=\"ex\">\/.apache-modsec.alias<\/span><\/span>\n<span id=\"cb26-3\"><a href=\"#cb26-3\" aria-hidden=\"true\"><\/a><\/span>\n<span id=\"cb26-4\"><a href=\"#cb26-4\" aria-hidden=\"true\"><\/a><span class=\"co\"># Add $HOME\/bin to PATH<\/span><\/span>\n<span id=\"cb26-5\"><a href=\"#cb26-5\" aria-hidden=\"true\"><\/a><span class=\"kw\">[[<\/span> <span class=\"st\">&quot;:<\/span><span class=\"va\">$PATH<\/span><span class=\"st\">:&quot;<\/span> <span class=\"ot\">!=<\/span> *<span class=\"st\">&quot;:<\/span><span class=\"va\">$HOME<\/span><span class=\"st\">\/bin:&quot;<\/span>*<span class=\"kw\"> ]]<\/span> <span class=\"kw\">&amp;&amp;<\/span> <span class=\"va\">PATH=<\/span><span class=\"st\">&quot;<\/span><span class=\"va\">$HOME<\/span><span class=\"st\">\/bin:<\/span><span class=\"va\">${PATH}<\/span><span class=\"st\">&quot;<\/span><\/span><\/code><\/pre><\/div>\n<p>Restart the shell by loggin in anew and then let\u2019s use the new alias right away:<\/p>\n<div class=\"sourceCode\" id=\"cb27\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb27-1\"><a href=\"#cb27-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"bu\">cd<\/span> \/apache\/logs<\/span>\n<span id=\"cb27-2\"><a href=\"#cb27-2\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">cat<\/span> tutorial-5-example-access.log <span class=\"kw\">|<\/span> <span class=\"ex\">alsslprotocol<\/span> <span class=\"kw\">|<\/span> <span class=\"fu\">sort<\/span> <span class=\"kw\">|<\/span> <span class=\"fu\">uniq<\/span> -c <span class=\"kw\">|<\/span> <span class=\"fu\">sort<\/span> -n<\/span>\n<span id=\"cb27-3\"><a href=\"#cb27-3\" aria-hidden=\"true\"><\/a>      <span class=\"ex\">4<\/span> -<\/span>\n<span id=\"cb27-4\"><a href=\"#cb27-4\" aria-hidden=\"true\"><\/a>    <span class=\"ex\">155<\/span> TLSv1<\/span>\n<span id=\"cb27-5\"><a href=\"#cb27-5\" aria-hidden=\"true\"><\/a>   <span class=\"ex\">9841<\/span> TLSv1.2<\/span><\/code><\/pre><\/div>\n<p>This is a bit easier. But the repeated typing of <em>sort<\/em> followed by <em>uniq -c<\/em> and then a numerical <em>sort<\/em> yet again is tiresome. Because it is again a repeating pattern, an alias is also worthwhile here, which can be abbreviated to <em>sucs<\/em>: a merger of the beginning letters and the <em>c<\/em> from <em>uniq -c<\/em>.<\/p>\n<div class=\"sourceCode\" id=\"cb28\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb28-1\"><a href=\"#cb28-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"bu\">alias<\/span> sucs=<span class=\"st\">&#39;sort | uniq -c | sort -n&#39;<\/span><\/span><\/code><\/pre><\/div>\n<p>This then enables us to do the following:<\/p>\n<div class=\"sourceCode\" id=\"cb29\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb29-1\"><a href=\"#cb29-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">cat<\/span> tutorial-5-example-access.log <span class=\"kw\">|<\/span> <span class=\"ex\">alsslprotocol<\/span> <span class=\"kw\">|<\/span> <span class=\"ex\">sucs<\/span><\/span>\n<span id=\"cb29-2\"><a href=\"#cb29-2\" aria-hidden=\"true\"><\/a>      <span class=\"ex\">4<\/span> -<\/span>\n<span id=\"cb29-3\"><a href=\"#cb29-3\" aria-hidden=\"true\"><\/a>    <span class=\"ex\">155<\/span> TLSv1<\/span>\n<span id=\"cb29-4\"><a href=\"#cb29-4\" aria-hidden=\"true\"><\/a>   <span class=\"ex\">9841<\/span> TLSv1.2<\/span><\/code><\/pre><\/div>\n<p>This is now a simple command that is easy to remember and easy to write. We now have a look at the numerical ratio of 1764 to 8150. We have a total of exactly 10,000 requests; the percentage values can be derived by looking at it. In practice however log files may not counted so easily, we will thus be needing help calculating the percentages.<\/p>\n<h3 id=\"step-10-analyses-using-percentages-and-simple-statistics\">Step 10: Analyses using percentages and simple statistics<\/h3>\n<p>What we are lacking is a command that works similar to the <em>sucs<\/em> alias, but converts the number values into percentages in the same pass: <em>sucspercent<\/em>. It\u2019s important to know that this script is based on an expanded <em>awk<\/em> implementation (yes, there are several). The package is normally named <em>gawk<\/em> and it makes sure that the <code>awk<\/code> command uses the Gnu awk implementation.<\/p>\n<div class=\"sourceCode\" id=\"cb30\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb30-1\"><a href=\"#cb30-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"bu\">alias<\/span> sucspercent=<span class=\"st\">&#39;sort | uniq -c | sort -n | $HOME\/bin\/percent.awk&#39;<\/span><\/span><\/code><\/pre><\/div>\n<p>Traditionally, <em>awk<\/em> is used for quick calculations in Linux. In addition to the above linked <em>alias<\/em> file, which also includes the <em>sucspercent<\/em>, the <em>awk<\/em> script <em>percent.awk<\/em> is also available. It is ideally placed in the <em>bin<\/em> directory of your home directory. The <em>sucspercent<\/em> alias above then assumes this setup. The <em>awk<\/em> script is available via this link: <a href=\"https:\/\/raw.githubusercontent.com\/Apache-Labor\/labor\/master\/bin\/percent.awk\">percent.awk<\/a>. Please make sure it is executable or you will get a permission denied.<\/p>\n<div class=\"sourceCode\" id=\"cb31\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb31-1\"><a href=\"#cb31-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">cat<\/span> tutorial-5-example-access.log <span class=\"kw\">|<\/span> <span class=\"ex\">alsslprotocol<\/span> <span class=\"kw\">|<\/span> <span class=\"ex\">sucspercent<\/span> <\/span>\n<span id=\"cb31-2\"><a href=\"#cb31-2\" aria-hidden=\"true\"><\/a>                         <span class=\"ex\">Entry<\/span>        Count Percent<\/span>\n<span id=\"cb31-3\"><a href=\"#cb31-3\" aria-hidden=\"true\"><\/a><span class=\"ex\">---------------------------------------------------<\/span><\/span>\n<span id=\"cb31-4\"><a href=\"#cb31-4\" aria-hidden=\"true\"><\/a>                             <span class=\"ex\">-<\/span>            4   0.04%<\/span>\n<span id=\"cb31-5\"><a href=\"#cb31-5\" aria-hidden=\"true\"><\/a>                         <span class=\"ex\">TLSv1<\/span>          155   1.55%<\/span>\n<span id=\"cb31-6\"><a href=\"#cb31-6\" aria-hidden=\"true\"><\/a>                       <span class=\"ex\">TLSv1.2<\/span>        9,841  98.41%<\/span>\n<span id=\"cb31-7\"><a href=\"#cb31-7\" aria-hidden=\"true\"><\/a><span class=\"ex\">---------------------------------------------------<\/span><\/span>\n<span id=\"cb31-8\"><a href=\"#cb31-8\" aria-hidden=\"true\"><\/a>                         <span class=\"ex\">Total<\/span>        10000 100.00%<\/span><\/code><\/pre><\/div>\n<p>Wonderful. We are now able to output the numerical ratios for any repeating values. How does it look, for example, with the encryption method used?<\/p>\n<div class=\"sourceCode\" id=\"cb32\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb32-1\"><a href=\"#cb32-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">cat<\/span> tutorial-5-example-access.log <span class=\"kw\">|<\/span> <span class=\"ex\">alsslcipher<\/span> <span class=\"kw\">|<\/span> <span class=\"ex\">sucspercent<\/span> <\/span>\n<span id=\"cb32-2\"><a href=\"#cb32-2\" aria-hidden=\"true\"><\/a>                         <span class=\"ex\">Entry<\/span>        Count Percent<\/span>\n<span id=\"cb32-3\"><a href=\"#cb32-3\" aria-hidden=\"true\"><\/a><span class=\"ex\">---------------------------------------------------<\/span><\/span>\n<span id=\"cb32-4\"><a href=\"#cb32-4\" aria-hidden=\"true\"><\/a>                             <span class=\"ex\">-<\/span>            4   0.04%<\/span>\n<span id=\"cb32-5\"><a href=\"#cb32-5\" aria-hidden=\"true\"><\/a>                    <span class=\"ex\">AES256-SHA<\/span>           23   0.23%<\/span>\n<span id=\"cb32-6\"><a href=\"#cb32-6\" aria-hidden=\"true\"><\/a>     <span class=\"ex\">DHE-RSA-AES256-GCM-SHA384<\/span>           42   0.42%<\/span>\n<span id=\"cb32-7\"><a href=\"#cb32-7\" aria-hidden=\"true\"><\/a>       <span class=\"ex\">ECDHE-RSA-AES128-SHA256<\/span>           50   0.50%<\/span>\n<span id=\"cb32-8\"><a href=\"#cb32-8\" aria-hidden=\"true\"><\/a>          <span class=\"ex\">ECDHE-RSA-AES256-SHA<\/span>          156   1.56%<\/span>\n<span id=\"cb32-9\"><a href=\"#cb32-9\" aria-hidden=\"true\"><\/a>   <span class=\"ex\">ECDHE-RSA-AES128-GCM-SHA256<\/span>          171   1.71%<\/span>\n<span id=\"cb32-10\"><a href=\"#cb32-10\" aria-hidden=\"true\"><\/a>       <span class=\"ex\">ECDHE-RSA-AES256-SHA384<\/span>          234   2.34%<\/span>\n<span id=\"cb32-11\"><a href=\"#cb32-11\" aria-hidden=\"true\"><\/a>   <span class=\"ex\">ECDHE-RSA-AES256-GCM-SHA384<\/span>        9,320  93.20%<\/span>\n<span id=\"cb32-12\"><a href=\"#cb32-12\" aria-hidden=\"true\"><\/a><span class=\"ex\">---------------------------------------------------<\/span><\/span>\n<span id=\"cb32-13\"><a href=\"#cb32-13\" aria-hidden=\"true\"><\/a>                         <span class=\"ex\">Total<\/span>        10000 100.00%<\/span><\/code><\/pre><\/div>\n<p>A good overview on the fly. We can be satisfied with this for the moment. Is there anything to say about the HTTP protocol versions?<\/p>\n<div class=\"sourceCode\" id=\"cb33\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb33-1\"><a href=\"#cb33-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">cat<\/span> tutorial-5-example-access.log <span class=\"kw\">|<\/span> <span class=\"ex\">alprotocol<\/span> <span class=\"kw\">|<\/span> <span class=\"ex\">sucspercent<\/span> <\/span>\n<span id=\"cb33-2\"><a href=\"#cb33-2\" aria-hidden=\"true\"><\/a>                         <span class=\"ex\">Entry<\/span>        Count Percent<\/span>\n<span id=\"cb33-3\"><a href=\"#cb33-3\" aria-hidden=\"true\"><\/a><span class=\"ex\">---------------------------------------------------<\/span><\/span>\n<span id=\"cb33-4\"><a href=\"#cb33-4\" aria-hidden=\"true\"><\/a>                      <span class=\"ex\">HTTP\/1.0<\/span>           66   0.66%<\/span>\n<span id=\"cb33-5\"><a href=\"#cb33-5\" aria-hidden=\"true\"><\/a>                      <span class=\"ex\">HTTP\/1.1<\/span>        9,934  99.34%<\/span>\n<span id=\"cb33-6\"><a href=\"#cb33-6\" aria-hidden=\"true\"><\/a><span class=\"ex\">---------------------------------------------------<\/span><\/span>\n<span id=\"cb33-7\"><a href=\"#cb33-7\" aria-hidden=\"true\"><\/a>                         <span class=\"ex\">Total<\/span>        10000 100.00%<\/span><\/code><\/pre><\/div>\n<p>The obsolete <em>HTTP\/1.0<\/em> still appears, but <em>HTTP\/1.1<\/em> is clearly dominant.<\/p>\n<p>With the different aliases for the extraction of values from the log file and the two <em>sucs<\/em> and <em>sucspercent<\/em> aliases we have come up with a handy tool enabling us to simply answer questions about the relative frequency of repeating values using the same pattern of commands.<\/p>\n<p>For measurements that no longer repeat, such as the duration of a request or the size of the response, these percentages are not very useful. What we need is a simple statistical analysis. What are needed are the mean, perhaps the median, information about the outliers and, for logical reasons, the standard deviation.<\/p>\n<p>Such a script is also available for download: <a href=\"https:\/\/raw.githubusercontent.com\/Apache-Labor\/labor\/master\/bin\/basicstats.awk\">basicstats.awk<\/a>. Similar to percent.awk, it is advisable to place this script in your private <em>bin<\/em> directory.<\/p>\n<div class=\"sourceCode\" id=\"cb34\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb34-1\"><a href=\"#cb34-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">cat<\/span> tutorial-5-example-access.log <span class=\"kw\">|<\/span> <span class=\"ex\">alioout<\/span> <span class=\"kw\">|<\/span> <span class=\"ex\">basicstats.awk<\/span><\/span>\n<span id=\"cb34-2\"><a href=\"#cb34-2\" aria-hidden=\"true\"><\/a><span class=\"ex\">Num<\/span> of values:          10,000.00<\/span>\n<span id=\"cb34-3\"><a href=\"#cb34-3\" aria-hidden=\"true\"><\/a>         <span class=\"ex\">Mean<\/span>:          19,360.91<\/span>\n<span id=\"cb34-4\"><a href=\"#cb34-4\" aria-hidden=\"true\"><\/a>       <span class=\"ex\">Median<\/span>:           7,942.00<\/span>\n<span id=\"cb34-5\"><a href=\"#cb34-5\" aria-hidden=\"true\"><\/a>          <span class=\"ex\">Min<\/span>:               0.00<\/span>\n<span id=\"cb34-6\"><a href=\"#cb34-6\" aria-hidden=\"true\"><\/a>          <span class=\"ex\">Max<\/span>:       3,920,007.00<\/span>\n<span id=\"cb34-7\"><a href=\"#cb34-7\" aria-hidden=\"true\"><\/a>        <span class=\"ex\">Range<\/span>:       3,920,007.00<\/span>\n<span id=\"cb34-8\"><a href=\"#cb34-8\" aria-hidden=\"true\"><\/a><span class=\"ex\">Std<\/span> deviation:          52,480.41<\/span><\/code><\/pre><\/div>\n<p>These numbers give a clear picture of the service. With a mean response size of 19 KB and a median of 7.9 KB we have a typical web service. Specifically, the median means that half of the responses were smaller than 7.9 KB. The largest response came in at almost 4 MB, the standard deviation of just over 52 KB means that the large values were less frequent overall.<\/p>\n<p>How does the duration of the requests look? Do we have a similar homogeneous picture?<\/p>\n<div class=\"sourceCode\" id=\"cb35\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb35-1\"><a href=\"#cb35-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">cat<\/span> tutorial-5-example-access.log <span class=\"kw\">|<\/span> <span class=\"ex\">alduration<\/span> <span class=\"kw\">|<\/span> <span class=\"ex\">basicstats.awk<\/span><\/span>\n<span id=\"cb35-2\"><a href=\"#cb35-2\" aria-hidden=\"true\"><\/a><span class=\"ex\">Num<\/span> of values:          10,000.00<\/span>\n<span id=\"cb35-3\"><a href=\"#cb35-3\" aria-hidden=\"true\"><\/a>         <span class=\"ex\">Mean<\/span>:          74,852.49<\/span>\n<span id=\"cb35-4\"><a href=\"#cb35-4\" aria-hidden=\"true\"><\/a>       <span class=\"ex\">Median<\/span>:           3,684.50<\/span>\n<span id=\"cb35-5\"><a href=\"#cb35-5\" aria-hidden=\"true\"><\/a>          <span class=\"ex\">Min<\/span>:             641.00<\/span>\n<span id=\"cb35-6\"><a href=\"#cb35-6\" aria-hidden=\"true\"><\/a>          <span class=\"ex\">Max<\/span>:      31,360,516.00<\/span>\n<span id=\"cb35-7\"><a href=\"#cb35-7\" aria-hidden=\"true\"><\/a>        <span class=\"ex\">Range<\/span>:      31,359,875.00<\/span>\n<span id=\"cb35-8\"><a href=\"#cb35-8\" aria-hidden=\"true\"><\/a><span class=\"ex\">Std<\/span> deviation:         695,283.17<\/span><\/code><\/pre><\/div>\n<p>It\u2019s important to remember that we are dealing in microseconds here. The median was 3,684 microseconds, which is just over 3 milliseconds. At 74 milliseconds, the mean is much larger. We obviously have a lot of outliers which have pushed up the mean. In fact, we have a maximum value of 31 seconds and less surprisingly a standard deviation of 695 milliseconds. The picture is thus less homogeneous and we have at least some requests that should be investigated. But this is now getting a bit more complicated. The suggested method is only one of many possible and is included here as a suggestion and inspiration for further work with the log file:<\/p>\n<div class=\"sourceCode\" id=\"cb36\"><pre class=\"sourceCode bash\"><code class=\"sourceCode bash\"><span id=\"cb36-1\"><a href=\"#cb36-1\" aria-hidden=\"true\"><\/a>$<span class=\"op\">&gt;<\/span> <span class=\"fu\">cat<\/span> tutorial-5-example-access.log <span class=\"kw\">|<\/span> <span class=\"fu\">grep<\/span> <span class=\"st\">&quot;<\/span><span class=\"dt\">\\&quot;<\/span><span class=\"st\">GET &quot;<\/span> <span class=\"kw\">|<\/span> <span class=\"ex\">aluri<\/span> <span class=\"kw\">|<\/span> <span class=\"fu\">cut<\/span> -d\\\/ -f1,2,3 <span class=\"kw\">|<\/span> <span class=\"fu\">sort<\/span> <span class=\"kw\">|<\/span> <span class=\"fu\">uniq<\/span> <span class=\"kw\">\\<\/span><\/span>\n<span id=\"cb36-2\"><a href=\"#cb36-2\" aria-hidden=\"true\"><\/a><span class=\"kw\">|<\/span> <span class=\"kw\">while<\/span> <span class=\"bu\">read<\/span> <span class=\"va\">P<\/span>; <span class=\"kw\">do<\/span> <span class=\"va\">MEAN=$(<\/span><span class=\"fu\">grep<\/span> <span class=\"st\">&quot;GET <\/span><span class=\"va\">$P<\/span><span class=\"st\">&quot;<\/span> tutorial-5-example-access.log <span class=\"kw\">|<\/span> <span class=\"ex\">alduration<\/span> <span class=\"kw\">|<\/span> <span class=\"ex\">basicstats.awk<\/span> <span class=\"kw\">\\<\/span><\/span>\n<span id=\"cb36-3\"><a href=\"#cb36-3\" aria-hidden=\"true\"><\/a><span class=\"kw\">|<\/span> <span class=\"fu\">grep<\/span> Mean <span class=\"kw\">|<\/span> <span class=\"fu\">sed<\/span> <span class=\"st\">&#39;s\/.*: \/\/&#39;<\/span><span class=\"va\">)<\/span>; <span class=\"bu\">echo<\/span> <span class=\"st\">&quot;<\/span><span class=\"va\">$MEAN<\/span><span class=\"st\"> <\/span><span class=\"va\">$P<\/span><span class=\"st\">&quot;<\/span><span class=\"kw\">;<\/span> <span class=\"kw\">done<\/span> <span class=\"kw\">\\<\/span><\/span>\n<span id=\"cb36-4\"><a href=\"#cb36-4\" aria-hidden=\"true\"><\/a><span class=\"kw\">|<\/span> <span class=\"fu\">sort<\/span> -n<\/span>\n<span id=\"cb36-5\"><a href=\"#cb36-5\" aria-hidden=\"true\"><\/a><span class=\"ex\">...<\/span><\/span>\n<span id=\"cb36-6\"><a href=\"#cb36-6\" aria-hidden=\"true\"><\/a>    <span class=\"ex\">...<\/span><\/span>\n<span id=\"cb36-7\"><a href=\"#cb36-7\" aria-hidden=\"true\"><\/a>        <span class=\"ex\">122<\/span>,309.45 \/cds\/holiday-planner-download<\/span>\n<span id=\"cb36-8\"><a href=\"#cb36-8\" aria-hidden=\"true\"><\/a>        <span class=\"ex\">124<\/span>,395.18 \/cds\/tools-inventory<\/span>\n<span id=\"cb36-9\"><a href=\"#cb36-9\" aria-hidden=\"true\"><\/a>        <span class=\"ex\">137<\/span>,230.18 \/cds\/weather-app<\/span>\n<span id=\"cb36-10\"><a href=\"#cb36-10\" aria-hidden=\"true\"><\/a>        <span class=\"ex\">143<\/span>,830.30 \/cds\/2016<\/span>\n<span id=\"cb36-11\"><a href=\"#cb36-11\" aria-hidden=\"true\"><\/a>        <span class=\"ex\">146<\/span>,114.83 \/cds\/2015<\/span>\n<span id=\"cb36-12\"><a href=\"#cb36-12\" aria-hidden=\"true\"><\/a>        <span class=\"ex\">163<\/span>,269.89 \/cds\/category<\/span>\n<span id=\"cb36-13\"><a href=\"#cb36-13\" aria-hidden=\"true\"><\/a>        <span class=\"ex\">216<\/span>,229.88 \/cds\/tutorials<\/span>\n<span id=\"cb36-14\"><a href=\"#cb36-14\" aria-hidden=\"true\"><\/a>        <span class=\"ex\">576<\/span>,129.71 \/storage\/static<\/span><\/code><\/pre><\/div>\n<p>What happens here in order? We use <em>grep<\/em> to filter <em>GET<\/em> requests. We extract the <em>URI<\/em> and use <em>cut<\/em> to cut it. We are only interested in the first part of the path. We limit ourselves here in order to get a reasonable grouping, because too many different paths will add little value. The path list we get is then sorted alphabetically and reduced by using <em>uniq<\/em>. This is half the work.<\/p>\n<p>We now sequentially place the paths into variable <em>P<\/em> and use <em>while<\/em> to make a loop. In the loop we calculate the basic statistics for the path saved in <em>P<\/em> and filter the output for the mean. In doing so, we use <em>sed<\/em> to filter in such a way that the <em>MEAN variable includes only a number and not the <em>Mean<\/em> name itself. We now output this average value and the path names. End of the loop. Last, but not least, we sort everything numerically and get an overview of which paths resulted in requests with longer response times. A path named <\/em>\/storage\/static_ apparently comes out on top. The keyword <em>storage<\/em> makes this appear plausible.<\/p>\n<p>This brings us to the end of this tutorial. The goal was to introduce an expanded log format and to demonstrate working with the log files. In doing so, we repeatedly used a series of aliases and two <em>awk<\/em> scripts, which can be chained in different ways. With these tools and the necessary experience in their handling you will be able to quickly get at the information available in the log files.<\/p>\n<h3 id=\"references\">References<\/h3>\n<ul>\n<li><a href=\"http:\/\/httpd.apache.org\/docs\/current\/mod\/mod_log_config.html\">Apache Module mod_log_config documentation<\/a><\/li>\n<li><a href=\"http:\/\/httpd.apache.org\/docs\/current\/mod\/mod_ssl.html\">Apache Module mod_ssl documentation<\/a><\/li>\n<li><a href=\"https:\/\/www.netnea.com\/files\/tutorial-5-example-access.log\">tutorial-5-example-access.log<\/a><\/li>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/Apache-Labor\/labor\/master\/bin\/.apache-modsec.alias\">.apache-modsec.alias<\/a><\/li>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/Apache-Labor\/labor\/master\/bin\/percent.awk\">percent.awk<\/a><\/li>\n<li><a href=\"https:\/\/raw.githubusercontent.com\/Apache-Labor\/labor\/master\/bin\/basicstats.awk\">basicstats.awk<\/a><\/li>\n<\/ul>\n<h3 id=\"license-copying-further-use\">License \/ Copying \/ Further use<\/h3>\n<p><a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\"><img decoding=\"async\" alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https:\/\/i.creativecommons.org\/l\/by-nc-sa\/4.0\/80x15.png\" \/><\/a><br \/>This work is licensed under a <a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/\">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License<\/a>.<\/p>\n<h5 id=\"changelog\">Changelog<\/h5>\n<ul>\n<li>2022-11-25: Updated link name in step 10<\/li>\n<li>2022-11-14: Minor fixes<\/li>\n<li>2020-07-22: Typo<\/li>\n<li>2020-02-04: Remark about making percent.awk executable<\/li>\n<li>2019-11-04: Replaced ~ with $HOME in .bashrc recipe<\/li>\n<li>2018-04-13: Update title format (markdown); rewordings (Simon Studer)<\/li>\n<li>2017-12-17: Fixed several links, replaced Average with Mean<\/li>\n<li>2017-03-04: Typo User-agent -&gt; User-Agent<\/li>\n<li>2017-02-25: Introducing CRS in Step 5<\/li>\n<li>2017-02-16: Reformatting<\/li>\n<li>2017-02-15: Updated curl call in loop from http to https<\/li>\n<li>2016-12-28: Bugfix with filename<\/li>\n<li>2016-12-04: Typo, reduce example output<\/li>\n<li>2016-10-10: Fixing small issues<\/li>\n<li>2016-07-15: Apache 2.4.20 -&gt; 2.4.23<\/li>\n<li>2016-04-18: Fixing small issues<\/li>\n<li>2016-03-10: Translated to English<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Extending and analyzing the access log What are we doing? We are defining a greatly extended log format in order to better monitor traffic. Why are we doing this? In the usual configuration of the Apache web server a log format is used that logs only the most necessary information about access from different clients. [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-948","page","type-page","status-publish","czr-hentry"],"_links":{"self":[{"href":"https:\/\/www.netnea.com\/cms\/wp-json\/wp\/v2\/pages\/948","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.netnea.com\/cms\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.netnea.com\/cms\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.netnea.com\/cms\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.netnea.com\/cms\/wp-json\/wp\/v2\/comments?post=948"}],"version-history":[{"count":2,"href":"https:\/\/www.netnea.com\/cms\/wp-json\/wp\/v2\/pages\/948\/revisions"}],"predecessor-version":[{"id":2058,"href":"https:\/\/www.netnea.com\/cms\/wp-json\/wp\/v2\/pages\/948\/revisions\/2058"}],"wp:attachment":[{"href":"https:\/\/www.netnea.com\/cms\/wp-json\/wp\/v2\/media?parent=948"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}