Adjusting the Apache Log Format on Server 5 and Updating Bot Exclusions

Photo by Roberto Verzo - http://flic.kr/p/dRUwc3

The change to a proxied architecture in version 5 of the Server app brings with it a log format designed for virtual hosts in a proxy environment. Unfortunately, it fails to log the visitor’s IP address correctly and slightly complicates the task of excluding selected bot visits and loopback requests.

(Update, 25 March 2016: See “Server 5.1 Brings TLS 1.2 at Last” for details of log changes in Server 5.1.)

As I described in “Improve OS X Performance by Tweaking Apache Logging, And Options for Rotating Logs”, it’s straightforward to replace the Server app’s default common format logging with something more useful for virtual hosts, such as combinedvhost. As of version 5, the main Apache configuration also includes an alternative called combinedvhostproxy, which is intended to log the same information in a proxied environment.

The main Apache configuration file, which lives at

    /Library/Server/Web/Config/apache2/httpd_server_app.conf

now defines the following:

    LogFormat "%{last-x-forwarded-host}e %{last-x-forwarded-for}e %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combinedvhostproxy

However, as of Server 5.0.15, I haven’t been able to find any occasion on which last-x-forwarded-for is actually populated correctly, except when testing connections to a local machine. In fact, on a remote server the X-Forwarded-For header, which I’ll come to in just a moment, appears never to be set at all, as can be verified with a simple PHP script that calls var_dump( $_SERVER ). This means that although the log format was intended to provide the original visitor’s address in virtual host logs, it doesn’t. So, to preserve original visitor IP addresses in the logs, we need to replace %{last-x-forwarded-for}e with %a, as follows:

    LogFormat "%{last-x-forwarded-host}e %a %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combinedvhostproxy
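If you’d rather check for yourself whether the proxy is passing these headers through, without resorting to a PHP script, a temporary diagnostic log in httpd_server_app.conf along these lines may help. This is a minimal sketch: the format nickname xffdebug and the log file name are hypothetical. %{...}i records the named request header (or a dash if it is absent) next to the client address that %a reports. Requests for a vhost that still has a CustomLog of its own will be logged there instead.

```apache
# Hypothetical diagnostic format (the nickname "xffdebug" is made up):
# the client address Apache sees, plus the raw X-Forwarded-For and
# X-Forwarded-Host request headers; missing headers are logged as "-"
LogFormat "%a \"%{X-Forwarded-For}i\" \"%{X-Forwarded-Host}i\" %t \"%r\"" xffdebug

# Hypothetical separate log file, so the main access_log is left alone;
# remove both lines once you've seen what actually arrives
CustomLog "/var/log/apache2/xff_debug_log" xffdebug
```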

As for making Apache actually use the combinedvhostproxy format, rather than the default common or commonproxy, it’s now slightly trickier than I described in the earlier article, at least if we want to keep the flexibility to skip logging certain types of request. It’s still necessary to make the changes directly in httpd_server_app.conf, where by default we’ll find:

    CustomLog "/var/log/apache2/access_log" common env=!forwarded
    CustomLog "/var/log/apache2/access_log" commonproxy env=forwarded

The forwarded environment variable is set a few lines earlier, along with the two last-x-forwarded-* variables that the log format uses:

    #X-Forwarded headers except for the last ',' delimited value can contain injected text.
    SetEnvIf X-Forwarded-For "\s*([^,]*$)" last-x-forwarded-for=$1
    SetEnvIf X-Forwarded-Host "\s*([^,]*$)" last-x-forwarded-host=$1
    SetEnvIf last-x-forwarded-host "\s*(.+)" forwarded

What’s happening here is that the configuration sets the forwarded environment variable when it detects that the request has arrived via the proxy, which triggers the commonproxy format when appropriate. This works fine, but because a CustomLog directive’s env= clause can test only a single variable, and the default lines already use it for forwarded, we can’t use any other environment variable to control logging with the default setup (such as skipping internal dummy connections, as described in the earlier article). It also means that if we ever do log any traffic without proxying, we’ll commingle two different formats in the same log file; to avoid that, it’s preferable to log non-proxied traffic somewhere other than the main log. So something like this will do the job, with the commented-out commonproxy line included for reference:

    CustomLog "/var/log/apache2/access_log_not_forwarded" common env=!forwarded
    #CustomLog "/var/log/apache2/access_log" commonproxy env=forwarded

    # First we assume dolog if forwarded, then unset dolog if also donotlog:
    SetEnvIf forwarded 1 dolog
    SetEnvIf donotlog 1 !dolog
    # Then log only if definitely dolog:
    CustomLog "/var/log/apache2/access_log" combinedvhostproxy env=dolog

(Alternatively, if you have no need for further control over logging and just want to use combinedvhostproxy, you can simply replace commonproxy with combinedvhostproxy in the configuration and call it good; there’s no need to mess with any environment variables beyond those that are there by default. As I described in the earlier article, you’ll still need to comment out any CustomLog directives in vhost files to avoid logging visits twice.)
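For that simpler route, the two default lines in httpd_server_app.conf would end up like this, with only the format nickname on the second line changed. Bear in mind that this keeps both formats writing to the same access_log whenever non-proxied traffic is logged.

```apache
# Non-proxied requests still use the common format
CustomLog "/var/log/apache2/access_log" common env=!forwarded
# Proxied requests now use the (edited) combinedvhostproxy format
CustomLog "/var/log/apache2/access_log" combinedvhostproxy env=forwarded
```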

The revised configuration lets us use the value of another environment variable, donotlog, which can be set by directives elsewhere, including vhost configs and bad bot exclusion lists, and makes the logging decision depend on both donotlog and the original forwarded variable. To do this with the least fiddliness, we actually use a third variable, dolog, which we always unset whenever we set donotlog.

Specifically, after any chunk of code where we set the donotlog environment variable, we’re going to switch off the dolog variable, like so:

    SetEnvIf donotlog 1 !dolog

It’s this three-way checking that enables the final logging choice to be dependent upon the value of two different environment variables.
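As an illustration of setting donotlog elsewhere, a vhost config could skip logging for a particular URL. This is a hypothetical sketch: the /health-check path is a placeholder, and the point is simply that the line setting donotlog is always followed by the line that clears dolog. Since SetEnvIf directives are evaluated in order, having that clearing line both here and after SetEnvIf forwarded 1 dolog in httpd_server_app.conf means the result shouldn’t depend on which block Apache evaluates first.

```apache
# Hypothetical vhost snippet: don't log requests for a monitoring URL
# ("/health-check" is a placeholder path)
SetEnvIf Request_URI "^/health-check$" donotlog
# Always follow a donotlog block with this, so dolog is cleared too
SetEnvIf donotlog 1 !dolog
```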

Therefore, we can update the original logging tweak intended to exclude loopbacks like so:

    SetEnvIf Remote_Addr "127\.0\.0\.1" donotlog
    SetEnvIf Remote_Addr "::1" donotlog
    SetEnvIfNoCase User-Agent ".*internal dummy connection.*" donotlog
    SetEnvIf donotlog 1 !dolog

For these logging exclusions to work, you’ll need to make sure that the server’s default site config does not have a CustomLog line enabled, since such a line would log every request for that site regardless of dolog. Comment out any active CustomLog directives there, and the exclusions should work as intended.
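One thing to watch with the loopback patterns: SetEnvIf treats them as unanchored regular expressions, so "::1" also matches any address that merely contains ::1, such as 2001:db8::1, which could hide genuine visitors on an IPv6 network. If that matters to you, anchoring the patterns restricts them to the loopback addresses themselves. This is a suggested tightening, not part of the original tweak.

```apache
# Anchored variants: match only the exact loopback addresses
SetEnvIf Remote_Addr "^127\.0\.0\.1$" donotlog
SetEnvIf Remote_Addr "^::1$" donotlog
# The .* wrappers aren't needed in an unanchored pattern
SetEnvIfNoCase User-Agent "internal dummy connection" donotlog
SetEnvIf donotlog 1 !dolog
```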

We can add the same extra line to bad bot lists, which I described in “Improve OS X Server Performance With a Server-Wide Ban List”. We can also take the opportunity to update to Apache 2.4 syntax, although this isn’t strictly necessary since Apple enabled backward compatibility with the Apache 2.2 syntax when it moved to Apache 2.4 in Server 4. The global bad bot exclusion config then ends up like this:

    # Update our flag to prevent logging
    SetEnvIf donotlog 1 !dolog

    # Updated for Apache 2.4 by dropping Order deny,allow...
    # and replacing Allow from all with Require all granted

    # Now deny access to bad bots
    <Files "*">
        Deny from env=badbot
    </Files>

    # Allow everyone to see a 403 error
    <Files "403.shtml">
        Require all granted
    </Files>

    # And allow everyone to see the robots.txt
    <Files "robots.txt">
        Require all granted
    </Files>

Note that in the case of the bad bot exclusion list, the logging line could just as easily have been the following, since that original bot exclusion list set both the badbot and the donotlog environment variables:

    SetEnvIf badbot 1 !dolog
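For reference, an entry in that kind of list can set both variables in a single directive, because SetEnvIf and SetEnvIfNoCase accept more than one variable to set. The user-agent strings below are placeholders, not real bot names:

```apache
# Hypothetical bad bot entries: each match sets both badbot and donotlog
SetEnvIfNoCase User-Agent "ExampleScraperBot" badbot donotlog
SetEnvIfNoCase User-Agent "AnotherBadCrawler" badbot donotlog

# Clear the logging flag for anything matched above
SetEnvIf donotlog 1 !dolog
```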

As described in the original two articles, the additional log tweaks outlined above — for excluding internal dummy connections and for excluding bad bots — should be incorporated via include files referenced in the main httpd_server_app.conf, rather than being added directly to the file itself.
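In httpd_server_app.conf, that might look something like the following. The directory and file names are hypothetical, so substitute wherever you actually keep the exclusion lists. Include stops Apache from starting if a named file is missing, so it’s worth running apachectl configtest after adding the lines.

```apache
# Hypothetical paths: point these at your own exclusion files
Include "/Library/Server/Web/Config/apache2/other/log_exclusions.conf"
Include "/Library/Server/Web/Config/apache2/other/bad_bots.conf"
```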

All material on this site is carefully reviewed, but its accuracy cannot be guaranteed, and some suggestions offered here might just be silly ideas. For best results, please do your own checking and verifying. This specific article was last reviewed or updated by Greg on .

This site is provided for informational and entertainment purposes only. It is not intended to provide advice of any kind.