PHP and SQL ouput caching
Friday, April 1st, 2005One of the largest projects I´ve worked on up till now is the Find Locally suite of sites. This is an internal project and as such I´m also in charge of managing all the hosting stuff. Since it consists of about 1300 sites which are all pretty database heavy (even the "static" pages). This means that as they start to get populated and generate more interest the server load is going to shoot up.
This is not a problem at the moment but I do like to think about what the options will be down the line. Quite far down the line will be getting a big meaty internet connection to hook up to my 12 server database cluster and 6 web servers (drool, drool - it´s a geek thing), but until I´ve had the chance to get our address on enough junk mailing lists so that the boss doesn´t notice the internet bill in the pile, I´ll have to get the most out of our current setup.
That´s where output caching comes in. The main bottleneck comes from the database so trying to minimise the amount of queries sent to the database is a good idea. The idea is to save certain parts of a given dynamic page in a file which is read by the PHP interpreter (or directly by the web server as HTML if possible) and insert it into the rest of the page. Example time…
Take a job advert on a job advertising website. This will contain a lot of text pulled from the database but won´t change very often. The most likely change is that it will be removed when the job is filled. Best idea is to save the body of the page in a file named with that adverts ID number. When the page is requested, a short query is sent to the database to see if that ID number is there and if so use the content in the file. If not, it´s been removed so do whatever you would normally do in this situation. There´s extra stuff you can do like adding the "last updated" time string to the file name and checking this with the DB to see if it has been modified and if so create a new page and file. You get the idea.
There are other tricks you can do as well. A lot of these sites contain lists (e.g. of the towns in a county) which don´t change very often at all (unless I get elected Prime Minister and rename London to Andyville) but it would take ages to update all the pages if changes do occur. This is a trickier one since you can´t be sure they haven´t changed unless you check it with the database - which is pointless. Answer: leave it alone and just check it once a week!
There´s bound to be more I can do and it´ll depend on the individual page. In the mean time I might set up the servers with a fake DB and a load of clients on the network to fire as many requests at it as possible to see what effect the output chaching has on the load.