Logfeed

Logfeed generates custom-filtered, templated Atom feeds from Apache access logs.

1. Overview

Feeds are defined by simple configuration files which contain metadata, filters, and an entry template, each described in the Configuration section.

Filtering is performed by including or excluding log lines that match user-defined regular expressions written in plain Perl. Entry templates define the content of entries and are simple XML+XHTML fragments containing template variables. The easiest way to understand how Logfeed works is through a few simple examples.

Logfeed is very similar in nature to Blosxom both in the way config files are defined and how templates are interpolated. In fact, I used several bits of code from Blosxom itself and the Blosxom config plugin. If you have used Blosxom before then logfeed’s behavior will probably seem natural.

2. Dependencies

Logfeed requires the File::ReadBackwards Perl module. This module is not part of the core Perl distribution and may need to be installed separately. On Debian-based Linux distributions, this is as easy as:

sudo apt-get install libfile-readbackwards-perl

If there is no similar package available for your operating system you can download the module from CPAN.

3. Download

You can either browse the repository, download a snapshot, or clone the repository using Git:

git clone git://jblevins.org/git/logfeed.git/

4. Configuration

Logfeed config files are actually just Perl fragments. They are evaluated each time logfeed runs. Each configuration variable is described below with examples.
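To illustrate what "evaluated as Perl" can mean in practice, here is a minimal sketch of a loader. It is not Logfeed's actual code: the load_config helper is hypothetical, and the variable names are borrowed from the Filters and Templates sections.

```perl
use strict;
use warnings;

# Package variables that a config fragment may set. The names come from
# the Filters and Templates sections; the loader itself is hypothetical.
our (%match, %ignore, $entry);

sub load_config {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $code = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    # Evaluate the fragment as Perl code in the current package.
    eval "no strict 'vars';\n$code\n; 1" or die "Error in $path: $@";
    return;
}

# Usage: perl this-script.pl bar.conf
if (@ARGV) {
    load_config($ARGV[0]);
    print "$_ => $match{$_}\n" for sort keys %match;
}
```

Because the fragment is executed as code, a syntax error in a config file is a Perl error, and config files should only come from trusted sources.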

Metadata

The following metadata variables are required:

All of the following are optional.

Filters

You can match or ignore lines using the %match and %ignore hashes with the following keys:

Values in these hashes should consist of regular expressions. Lines that match at least one of the %ignore rules will be excluded. Remaining lines that match all of the %match rules for each key will be included. This is perhaps best illustrated with an example.

The following rules will create a feed of all requests with referring URLs containing (‘google’ OR ‘yahoo’) AND result in a 404 code:

$match{'ref'} = 'google|yahoo';
$match{'code'} = '404';
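The include/exclude logic described above can be sketched as follows. This is an illustration of the stated rules only, not Logfeed's own implementation; the keep_line helper and the sample log fields are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the rules described above.
# $fields is a hash ref of parsed log fields for one line, keyed
# the same way as %match and %ignore.
sub keep_line {
    my ($fields, $match, $ignore) = @_;

    # Any matching %ignore rule excludes the line.
    for my $key (keys %$ignore) {
        my $value = defined $fields->{$key} ? $fields->{$key} : '';
        return 0 if $value =~ /$ignore->{$key}/;
    }

    # Every %match rule must match for the line to be included.
    for my $key (keys %$match) {
        my $value = defined $fields->{$key} ? $fields->{$key} : '';
        return 0 unless $value =~ /$match->{$key}/;
    }
    return 1;
}

my %match  = (ref => 'google|yahoo', code => '404');
my %ignore = ();

# Two made-up log lines, already split into fields.
my %hit  = (ref => 'http://www.google.com/search?q=foo', code => '404');
my %miss = (ref => 'http://www.google.com/search?q=foo', code => '200');

print keep_line(\%hit,  \%match, \%ignore) ? "kept\n" : "dropped\n";
print keep_line(\%miss, \%match, \%ignore) ? "kept\n" : "dropped\n";
```

The first line is kept because both rules match. The second is dropped because its status code is not 404.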

Below are some more examples:

Templates

The body of feed entries can be completely customized using a template, a string stored in the variable $entry. This template tells logfeed how to generate <entry> items in the Atom feed. If you do not define this variable, the default template will be used. You need to use single quotes (or the q operator) so that Perl does not interpolate the variables when the config file is evaluated.

If you modify the default template, make sure the body of the <content> element is valid XHTML and that the required elements, <id>, <title>, and <updated> are all included. It is very important that the IDs are unique.

The following variables will be interpolated using information from the log file:

And the following will be interpolated using the metadata defined above:

Here is the default template:

$entry = '<entry>
    <id>tag$colon$id_domain,$id_year$colon$feed_path/$id_time/$ip$req</id>
    <title>$host: $req</title>
    <author>
      <name>$author_name</name>
      $author_uri$author_email
    </author>
    <updated>$utc_date</updated>
    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">
    <ul>
      <li><strong>Date:</strong> $utc_date</li>
      <li><strong>User:</strong> $user</li>
      <li><strong>Host:</strong> $host</li>
      <li><strong>User Agent:</strong> $ua</li>
      <li><strong>Referrer:</strong> <a href="$ref">$ref</a></li>
      <li><strong>File:</strong> <a href="$base_url$req">$base_url$req</a></li>
      <li><strong>Size:</strong> $sz</li>
      <li><strong>Status:</strong> $code</li>
    </ul>
    </div>
    </content>
    <link rel="alternate" href="$ref"/>
  </entry>
';
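As a rough sketch of how such a template might be filled in, the following replaces each $name with a value from a hash. Logfeed borrows its interpolation from Blosxom, so its real code may differ; the interpolate helper and the sample values are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each $name in the template with the
# corresponding value from %$vars; unknown names become empty strings.
sub interpolate {
    my ($template, $vars) = @_;
    $template =~ s/\$(\w+)/defined $vars->{$1} ? $vars->{$1} : ''/ge;
    return $template;
}

# Single quotes keep $host and $req literal until interpolate() runs.
my $entry = '<title>$host: $req</title>';

print interpolate($entry, { host => 'example.org', req => '/index.html' }), "\n";
```

This sketch does no XML escaping, so values containing characters such as & or < would need to be escaped before they are substituted into a real feed.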

5. Usage

Config files can be named anything. For the following examples, let’s assume files have the .conf extension. This is completely optional.

Logfeed can be run from the command line:

perl log-feed.pl conf=bar.conf

This command can, for example, be called periodically via a cron job. It can also run as a CGI script:

http://foo.net/feeds/log-feed.pl?conf=bar.conf
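Both invocations pass the same conf=bar.conf parameter, once as a command-line argument and once in the query string. A minimal sketch of reading it from either source might look like this; get_param is a hypothetical helper, not part of Logfeed, and it performs no URL decoding.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value of "name=value" from the CGI
# query string if one is present, otherwise from the argument list.
sub get_param {
    my ($name, $query_string, @args) = @_;
    my @pairs = (defined($query_string) && length($query_string))
        ? split(/[&;]/, $query_string)
        : @args;
    for my $pair (@pairs) {
        my ($key, $value) = split /=/, $pair, 2;
        return $value if defined $key && $key eq $name;
    }
    return;
}

my $conf = get_param('conf', $ENV{QUERY_STRING}, @ARGV);
print defined $conf ? "Using config: $conf\n" : "No conf parameter given\n";
```

When running as a CGI script, the config name comes from an untrusted request, so it should be validated before it is used as a file path.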

Optionally, one could use Apache’s mod_rewrite to help clean up the URLs when running in CGI mode. For example, the following rewrite rule could be placed in .htaccess:

RewriteRule ^feeds/(.*)\.atom$ feeds/log-feed.pl?conf=$1.conf

Then, given a config file called bar.conf, the feed would be made available at http://foo.net/feeds/bar.atom.

6. Notes