logfeed 1.15

logfeed is a Perl script that generates custom-filtered, templated Atom feeds from Apache access logs. Feeds are defined by simple configuration files, which are fragments of Perl code that specify the metadata (the title, URI, etc.), the filtering rules, and (optionally) a template for the resulting feed entries. You can filter on anything you can write a regular expression for, and the results are simple to format however you like, so it’s pretty easy to get creative.

I’ve been working on this for a few days and I think it’s in fairly good shape to release now. There are a few examples below. You can also browse the source or visit the homepage to read more.

Defining the metadata for a feed is straightforward:

$log_file = '/home/jrblevin/access-logs/jblevins.org';
$base_url = 'http://jblevins.org';
$feed_path = '/feeds/referrers.atom';
$feed_title = 'jblevins.org: Recent Referrers';
$author_name = 'Jason Blevins';
$author_email = 'jrblevin@sdf.lonestar.org';
$reverse_dns = 1;

You then define regular expressions to match or ignore certain log entries. If you want to see hits that result in 404s, match on the status code:

$match{'code'} = '404';
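
Since the value is a regular expression rather than a literal string, one rule should be able to cover a whole family of status codes. The line below is a hypothetical variation, not taken from the logfeed documentation. It assumes logfeed tests the pattern against the status field alone; the anchors guard against accidental partial matches, and whether they are needed depends on how logfeed applies the pattern.

```perl
# Hypothetical variation: match any client error (400-499), not just 404.
# Single quotes keep the backslash intact, so \d reaches the regex engine.
$match{'code'} = '^4\d\d$';
```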

To see hits on your feeds, match the atom and rss extensions ('req' is the request filename):

$match{'req'} = 'atom$|rss$';
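
Before putting a pattern like this in a config file, it can help to try it against a few request strings in plain Perl. The sketch below does not use logfeed itself. The sample paths are made up, and it assumes logfeed applies the pattern as an ordinary Perl regular expression to the request filename.

```perl
use strict;
use warnings;

# Same pattern as in the config line above.
my $pattern = 'atom$|rss$';

# Hypothetical request filenames, for illustration only.
my @requests = ('/feeds/referrers.atom', '/index.rss', '/about.html', '/atom.html');

# Keep only the requests the pattern matches.
my @hits = grep { /$pattern/ } @requests;

print "$_\n" for @hits;
```

Note that the pattern matches any filename ending in the letters atom or rss, with or without a dot before them; something like `\.(atom|rss)$` would be stricter.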

A more concrete example: I use a cron job to generate a feed of my recent referrers. Since config files are just Perl code, you can do things like this:

my @temp = qw! ^-$
               ^http://www\.duke\.edu/~jrb11 !;
$ignore{'ref'} = join '|', @temp;

This defines a regular expression to ignore certain referrers ('ref'). The first piece (^-$) filters out log hits with no referrer. The remaining lines filter out hits from Google, Yahoo!, and my own sites (the listing above shows just one of these patterns). I then join the individual regexes using | to define an ignore rule which matches any one of them.
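
As a rough illustration of how a longer list might look, the sketch below adds patterns for Google, Yahoo!, and a second site. Those three patterns are assumptions made up for this example, not the ones from my actual config, and the `our %ignore` declaration is only there so the snippet runs on its own (in a real config file, logfeed presumably supplies that hash). The loop at the end checks the joined pattern against a few sample referrers, which are also invented.

```perl
use strict;
use warnings;

our %ignore;  # stand-in declaration so this runs outside logfeed

# One pattern per line; qw!...! splits on whitespace and does not interpolate.
my @temp = qw! ^-$
               ^http://www\.google\.
               ^http://search\.yahoo\.com
               ^http://www\.duke\.edu/~jrb11
               ^http://jblevins\.org !;

# Join the pieces with | so the rule matches if any one of them does.
$ignore{'ref'} = join '|', @temp;

# Try the rule against a few hypothetical referrers.
for my $ref ('-', 'http://www.google.com/search?q=logfeed', 'http://example.com/blog/') {
    printf "%-45s %s\n", $ref, ($ref =~ /$ignore{'ref'}/ ? 'ignored' : 'kept');
}
```

Keeping each pattern on its own line makes the list easy to extend, and anchoring every piece with ^ avoids ignoring a referrer that merely contains one of these URLs in its query string.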