Apache Log Analysis

I’ve just had to do some Apache log analysis for the various Find Locally domains. As all the sites run off one server, the log files contain data from every domain mixed together. This makes it difficult to get any meaningful information without analysing each domain one at a time, but before getting to that, I had to get all the data into a format I could do something with.

After downloading the 140 MB of log files, I used M$ Access to import the delimited files and then used the MySQL ODBC Connector to export the tables into a MySQL database on my laptop (it’s a Core Duo and pretty much the fastest machine I have).

I then decided to get rid of the bot traffic from the 5 million hits so I would be left with something usable. As Apache had only stored the IP address of the remote machine, I sent PHP on a 12-hour mission to use gethostbyaddr() to look up and store the hostname for every remote address. This made it a lot easier to filter out known search engines and the like.
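
For anyone wanting to try the same trick, here is a rough sketch of the lookup-and-filter step. It is written in Python rather than the PHP I actually used, and the function names and the list of crawler hostname suffixes are made up for illustration, so treat it as a starting point rather than my real script. Caching the lookups matters because the same IP address turns up many times in the logs.

```python
import socket
from functools import lru_cache

# Illustrative crawler hostname suffixes (an assumption, not a complete list).
BOT_SUFFIXES = (".googlebot.com", ".search.msn.com", ".crawl.yahoo.net")


@lru_cache(maxsize=None)
def reverse_lookup(ip):
    """Return the hostname for ip, or None if it has no reverse DNS record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None


def is_bot(hostname, suffixes=BOT_SUFFIXES):
    """True if hostname ends with one of the known crawler suffixes."""
    return hostname is not None and hostname.lower().endswith(suffixes)


def drop_bots(ips, resolve=reverse_lookup):
    """Keep only the IPs whose resolved hostname is not a known crawler."""
    return [ip for ip in ips if not is_bot(resolve(ip))]
```

Addresses with no reverse DNS record are kept here, since a missing hostname doesn't prove the visitor was a bot. The resolve argument is only there so the filter can be tried out with a plain dictionary instead of live DNS.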

With the junk data gone and everything in a decent database, analysing the remaining logs was easy. As our domain names all follow the same format, a little PHP was enough to dump out a few tables of results showing exactly what I need. All I need now is to turn it into a set of automated scripts so I don’t have to do it by hand next month.
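
As a first step towards automating it, the per-domain tables boil down to something like the sketch below. Again this is Python rather than my PHP, and it assumes each cleaned log row can be reduced to a (domain, path) pair. The function names and that row shape are assumptions for the example, not what my database tables actually look like.

```python
from collections import Counter


def hits_per_domain(rows):
    """Count hits per domain from an iterable of (domain, path) rows."""
    return Counter(domain.lower() for domain, _path in rows)


def format_table(counts):
    """Return one line per domain, busiest first, as a plain-text table."""
    return [f"{domain:<30} {hits:>8}" for domain, hits in counts.most_common()]
```

Calling hits_per_domain() on the filtered rows and printing each line from format_table() gives a quick hit count per site, which is the sort of thing that could be run from a scheduled job each month.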

About the Author

I'm a web developer based in the East Midlands, UK, and if I keep up the current rate, I might have developed 3 million sites by the time I retire.