Dec 5, 2016

Parsing IIS server logs with Tx.Windows and CsvHelper

IIS, C#, logs

A little while ago I put together a quick and dirty IIS log parser and I want to capture some of the things learned.

A couple of libs came in really handy, so here’s a shout-out to the wonderful folks who contributed them: Tx.Windows and CsvHelper.

The first one, Tx.Windows, was used to enumerate through the IIS log files and parse out some of the interesting information within. I know what you’re thinking - Google Analytics is great for that! Unfortunately it wasn’t an option on my project; it was quickly shot down, citing vague security risks. I wasn’t going to fight that battle, and a simpler solution was enough for the time being.

The customer wanted some insight into usage patterns and into which tools and application features were used the most. A lot of the back end was REST based, so parsing out the interesting endpoints was easy, and they mapped broadly to usage patterns in the application. Looking at the logs, we could get a decent picture of what users were focused on, based on the endpoints they hit on the back end.

In the end the solution was pretty simple and, with the help of the two libs, quite concise. I set up some parsers to look for some very specific endpoints in the log files and then aggregate some data on top of that. Tx.Windows is used to expose the data in the logs in a way that makes it trivial to consume, and the reports are pumped out with the help of CsvHelper:

    using System.Text.RegularExpressions;
    using Tx.Windows;

    public interface IParser
    {
        bool IsMatch(W3CEvent item);
        string GetText(W3CEvent item);
    }

    public class SimpleRegExParser : IParser
    {
        protected readonly Regex Pattern;

        // Public so the parser can be created directly (the test below does this).
        public SimpleRegExParser(string pattern)
        {
            Pattern = new Regex(pattern);
        }

        public virtual bool IsMatch(W3CEvent item)
        {
            return Pattern.IsMatch(item.cs_uri_stem);
        }

        // Joins the values of all capture groups with " - ".
        public virtual string GetText(W3CEvent item)
        {
            var i = 2;
            var match = Pattern.Match(item.cs_uri_stem);
            var text = match.Groups[1].Value; // Groups[0] is the whole match

            while (i < match.Groups.Count)
            {
                text = string.Format("{0} - {1}", text, match.Groups[i].Value);
                i++;
            }

            return text;
        }
    }

I have set up some unit tests around this to assert that the parsers are actually parsing out the right stuff.

    [TestClass]
    public class When_simple_regex_parser_is_given_compact_view_text
    {
        private readonly W3CEvent item = new W3CEvent
        {
            cs_uri_stem = "/application/api/log/series"
        };

        [TestMethod]
        public void it_matches()
        {
            var parser = new SimpleRegExParser(@"\/application\/api\/log\/(?<Name>.*)");

            var text = parser.GetText(item);

            Assert.AreEqual("series", text);
        }
    }
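The test above covers a pattern with a single capture group. When a pattern has several groups, GetText joins their values with " - ". Here is a minimal standalone sketch of that joining step, working on a plain string so it runs without Tx.Windows. The `GroupJoiner` class, its `Join` method and the sample pattern are made up for illustration and are not part of the original parser.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper that mirrors the group-joining logic of GetText
// on a plain string instead of a W3CEvent.
public static class GroupJoiner
{
    public static string Join(string pattern, string uriStem)
    {
        var match = Regex.Match(uriStem, pattern);
        if (!match.Success)
        {
            return string.Empty;
        }

        // Groups[0] is the whole match, so skip it and join the captures.
        var values = match.Groups.Cast<Group>().Skip(1).Select(g => g.Value);
        return string.Join(" - ", values);
    }
}
```

For example, `GroupJoiner.Join(@"/api/(\w+)/(\w+)", "/application/api/log/series")` returns `"log - series"`.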

There are a few more specific parsers built on top of this, with some pretty interesting regular expressions, but you get the idea. One thing worth mentioning is that this one only matches on the uri_stem, but you could have more complex implementations where several W3CEvent fields are looked at before determining whether the log entry is a match.
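As a rough illustration of a parser that checks more than the URI stem, the sketch below also looks at the HTTP method and the status code. It is not one of the parsers from the project. `LogEntry` is a stand-in type so the example compiles on its own, with properties named after the W3C log fields, and `SuccessfulPostParser` is a hypothetical name. In the real thing the values would come from W3CEvent, assuming it exposes the corresponding fields.

```csharp
using System;
using System.Text.RegularExpressions;

// Stand-in for W3CEvent so the sketch compiles without Tx.Windows.
// The property names follow the W3C log field names.
public class LogEntry
{
    public string cs_method { get; set; }
    public string cs_uri_stem { get; set; }
    public string sc_status { get; set; }
}

// Hypothetical parser: matches only successful POSTs to a given endpoint.
public class SuccessfulPostParser
{
    private readonly Regex pattern;

    public SuccessfulPostParser(string pattern)
    {
        this.pattern = new Regex(pattern);
    }

    public bool IsMatch(LogEntry item)
    {
        return item.cs_method == "POST"
            && item.sc_status != null
            && item.sc_status.StartsWith("2", StringComparison.Ordinal) // 2xx only
            && item.cs_uri_stem != null
            && pattern.IsMatch(item.cs_uri_stem);
    }
}
```

With this shape, a GET to the same endpoint, or a POST that failed with a 500, is simply not counted as usage.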

In the end the program looks similar to this (some sections omitted), and is invoked every week:

    var path = args[0];      // root folder containing the IIS logs
    var fileName = args[1];  // output .csv file
    var excludes = new Excludes("\\.js$", "\\.css$", "\\.gif$", "\\.jpg$", "\\.png$", @"^\/arcgis\/rest\/services");

    var files = Directory.GetFiles(path, "*.log", SearchOption.AllDirectories);
    var logs = W3CEnumerable.FromFiles(files);

    // 'parsers' (the set of IParser instances) is created in an omitted section.
    var byUserActivity = logs
        .Where(item => !string.IsNullOrEmpty(item.cs_username) && !excludes.Match(item.cs_uri_stem))
        .Where(item => parsers.Match(item))
        .GroupBy(item => new
        {
            Date = item.dateTime.ToString("g"), // "g" keeps date and time to the minute
            Item = parsers.GetText(item),
            User = item.cs_username
        })
        .Select(group => new
        {
            group.Key.Date,
            UserName = group.Key.User,
            ItemName = group.Key.Item,
            Hits = group.Count()
        });

    // Newer CsvHelper releases also require a CultureInfo argument here.
    using (var writer = new CsvWriter(File.CreateText(fileName)))
    {
        writer.WriteRecords(byUserActivity);
    }
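The `Excludes` helper used above sits in one of the omitted sections. A guess at what such a helper could look like, going only by how it is called (a constructor taking patterns and a `Match` method taking the URI stem), is sketched below. The case-insensitive matching is my assumption and not something the original code confirms.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical reconstruction of the omitted Excludes helper:
// a URI stem is excluded when any of the configured patterns matches it.
public class Excludes
{
    private readonly Regex[] patterns;

    public Excludes(params string[] patterns)
    {
        // IgnoreCase is an assumption, so that ".JS" and ".js" are treated alike.
        this.patterns = patterns
            .Select(p => new Regex(p, RegexOptions.IgnoreCase))
            .ToArray();
    }

    public bool Match(string uriStem)
    {
        return !string.IsNullOrEmpty(uriStem)
            && patterns.Any(p => p.IsMatch(uriStem));
    }
}
```

Filtering out static assets this way keeps script, stylesheet and image requests from drowning out the API calls that actually reflect user activity.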

The exported .csv data can be analyzed in Excel and pivoted to your heart’s content. You can build a pretty robust analytics solution on top of this, and the daily log files we get, 50-100 MB each, are processed without a hitch. That said, I wouldn’t go down this path by choice - you’d be much better served by an off-the-shelf solution like Google Analytics if your project allows it.