A little while ago I put together a quick and dirty IIS log parser, and I want to capture some of the things I learned.
A couple of libs came in really handy, so here's a shout out to the wonderful folks that contributed those:

- Tx.Windows, part of the Tx (LINQ to Logs and Traces) project: https://github.com/microsoft/Tx
- CsvHelper: https://github.com/JoshClose/CsvHelper
The first one was used to enumerate through the IIS log files and parse out some of the interesting information within. I know what you're thinking - Google Analytics is great for that! Unfortunately it wasn't an option on my project; it was quickly shot down citing vague security risks, etc. I wasn't going to fight that battle, and a simpler solution was enough for the time being.
The customer wanted to get some insight into usage patterns and the tools/application features used the most. A lot of the backend was REST-based, so parsing out interesting endpoints was easy, and they mapped broadly to usage patterns in the application. Looking at the logs, we could get a decent picture of what users were focused on based on the endpoints/access patterns on the backend.
In the end the solution was pretty simple and, with the help of the two libs, quite concise. I set up some parsers to look for some very specific endpoints in the log files and then aggregate some data on top of that. Tx.Windows is used to expose the data in the logs in a way that makes it trivial to consume, and the reports are pumped out with the help of CsvHelper:
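A minimal sketch of the shape of one of these parsers, not the original code: the `IEndpointParser`/`EndpointHit` names and the `/api/search` route are illustrative. Tx.Windows exposes each log line as a `W3CEvent`, whose fields follow the W3C log headers with dashes swapped for underscores (`cs_uri_stem`, `c_ip`, and so on).

```csharp
using System;
using Tx.Windows;

// Illustrative result type; not from the original code.
public class EndpointHit
{
    public DateTime Timestamp { get; set; }
    public string Endpoint { get; set; }
    public string ClientIp { get; set; }
}

// Illustrative parser contract: can this log entry be claimed, and if so,
// what aggregatable record does it turn into?
public interface IEndpointParser
{
    bool IsMatch(W3CEvent e);
    EndpointHit Parse(W3CEvent e);
}

// Matches one specific REST endpoint purely by its uri_stem.
public class SearchEndpointParser : IEndpointParser
{
    public bool IsMatch(W3CEvent e) =>
        e.cs_uri_stem != null &&
        e.cs_uri_stem.StartsWith("/api/search", StringComparison.OrdinalIgnoreCase);

    public EndpointHit Parse(W3CEvent e) => new EndpointHit
    {
        Timestamp = e.dateTime, // Tx exposes the W3C date/time as a DateTime
        Endpoint = e.cs_uri_stem,
        ClientIp = e.c_ip
    };
}
```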
I have set up some unit tests around this to assert that the parsers are actually parsing out the right stuff.
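For illustration, a test along these lines - using xUnit and the hypothetical `SearchEndpointParser` sketched above, and assuming `W3CEvent`'s properties are settable so an event can be faked in memory (if not, a thin adapter over the event would do):

```csharp
using Tx.Windows;
using Xunit;

public class SearchEndpointParserTests
{
    [Fact]
    public void Matches_only_the_search_endpoint()
    {
        var parser = new SearchEndpointParser();

        // Fake log entries built directly, without touching a log file.
        var hit  = new W3CEvent { cs_uri_stem = "/api/search" };
        var miss = new W3CEvent { cs_uri_stem = "/api/orders" };

        Assert.True(parser.IsMatch(hit));
        Assert.False(parser.IsMatch(miss));
    }
}
```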
There are a few more specific parsers built on top of this with some pretty interesting regular expressions, but you get the idea. One thing worth mentioning is that this one only matches on the `uri_stem`, but you could have more complex implementations where several `W3CEvent` fields are looked at before determining if the log entry is a match.
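A sketch of what one of those regex-based parsers might look like, reusing the illustrative types from above - the route and capture group are made up:

```csharp
using System.Text.RegularExpressions;
using Tx.Windows;

// Pulls a report id out of the uri_stem; the route is invented for the example.
public class ReportExportParser : IEndpointParser
{
    // e.g. /api/reports/1234/export
    private static readonly Regex Route = new Regex(
        @"^/api/reports/(?<id>\d+)/export$",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public bool IsMatch(W3CEvent e) =>
        e.cs_uri_stem != null && Route.IsMatch(e.cs_uri_stem);

    public EndpointHit Parse(W3CEvent e) => new EndpointHit
    {
        Timestamp = e.dateTime,
        // Fold the captured id into the endpoint label for later pivoting.
        Endpoint = "report-export/" + Route.Match(e.cs_uri_stem).Groups["id"].Value,
        ClientIp = e.c_ip
    };
}
```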
In the end the program looks similar to this (some sections omitted), and is invoked every week:
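A reconstructed sketch of the overall shape, not the original code: the paths and parser names are placeholders, `W3CEnumerable.FromFile` is Tx.Windows' entry point for W3C logs, and the `CsvWriter` constructor shown is the one from recent CsvHelper versions (older ones took just the `TextWriter`):

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using Tx.Windows;

public static class Program
{
    public static void Main()
    {
        var parsers = new IEndpointParser[]
        {
            new SearchEndpointParser(),
            new ReportExportParser()
        };

        // Every W3C event from every log file in the drop folder.
        IEnumerable<W3CEvent> events = Directory
            .GetFiles(@"C:\logs\iis", "*.log")
            .SelectMany(W3CEnumerable.FromFile);

        var hits = new List<EndpointHit>();
        foreach (var e in events)
        {
            foreach (var parser in parsers)
            {
                if (parser.IsMatch(e))
                {
                    hits.Add(parser.Parse(e));
                    break; // first matching parser wins
                }
            }
        }

        // Dump the aggregated hits to .csv for Excel.
        using (var writer = new StreamWriter(@"C:\reports\usage.csv"))
        using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
        {
            csv.WriteRecords(hits);
        }
    }
}
```

The weekly invocation was just a matter of scheduling; a Windows scheduled task pointed at the executable is one straightforward way to do it.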
The exported .csv data can be analyzed in Excel and pivoted to your heart's content. You can build a pretty robust analytics solution on this - the 50-100 MB log files we get every day are processed without a hitch. That said, I wouldn't go down this path again; you'd be much better served by an off-the-shelf solution like Google Analytics if your project allows it.