All the problems are solved! After spending all day poring over JavaScript, Googling the forums to death and non-stop testing, I've finally cracked it! Some users were having trouble entering data on our site and I finally managed to figure out that most of them were using IE5.2 on the Mac. Turns out that the features of the Rich Text Editor I was using, along with similar editors, are powered by the browser. I kind of knew this but didn't know how it was done.
Turns out that it's implemented in IE in versions 5 and above (but not on the Mac) using something called MSHTML and in the various Mozilla browsers through Midas (I think). These are pretty much compatible, but if the browser doesn't support one of them then your users have no way of using one of these editors.
I've just had to go through and change the sites so that those who haven't got up-to-date browsers can stick text in using standard textareas. Just to make it a bit neater I've tucked them inside hidden divs until they click on a link to show them. Not too bad considering.
Linux Archive
Every time I have a few days off…
… something gets blown up! Last year I went camping in France for two weeks. We managed to get through the first weekend before I got a phone call saying the building had lost power during a thunderstorm and the internet connection had gone down. Back then all internet traffic was routed through the main server, which had lost power and needed to be booted again. Since I'm the only technical person in the company and the Linux box needed the eth0 interface bringing up, I was the one who got the call. After 5 minutes of my boss not being able to log in to the box because the keyboard he was tapping away at wasn't plugged into anything, it was up again and I was off to the beach.
This week I had another few days off and on my last night off we had another thunderstorm. The lights flickered a bit in the pub but apart from that I didn't think that anything would be wrong, especially now that we have separate routers, UPSes on the main servers and most things set to boot up again automatically after a power failure. I got to work though and it turned out we had had a few more lightning problems, but all that had happened was that a few fuses in the power sockets had blown. They managed to sort this out by the time I got there, so I just had to check what had been going on with the automatic invoicing system and get it to run off the invoices that were lost while a print server was down.
Next time I want some time off we're getting a 3-day forecast first.
Ever more complex backup scripts
My server requirements are fairly simple compared to some setups as I only use Apache, PHP and MySQL on my actual web servers. All the other stuff like DNS and email is handled by other servers managed by our hosting company. This means that if I keep a copy of the httpd.conf and cron job files then all I need to do is take regular copies of all the web files and the MySQL database store. That way, even if I get a complete hard drive failure, I can get back up and running within a few hours using the original server image. Sorted.
The only problem is that I want a nice automated set of scripts that will take daily backups of the database, weekly backups of the web files and do it in a completely redundant way.
After a fair bit of messing around I've managed to get two identical Linux boxes that I'm going to use as development servers, one for each of the dedicated servers we're getting. The backups are taken using FTP from protected directories and mysqldump run by cron, with plenty of error checking and reporting by email. The web files get the same treatment using tar and gzip. All these files are then copied by a third server onto our Windows file server. All these servers perform error checking on each other and if something hasn't been done then they report it by email. The final stage is to FTP the weekly backup to an offsite backup server, which also performs error checking on the main office servers. This way, if something is not working, it gets reported, so everything from individual server crashes up to complete power failure or loss of internet connection on one of the sites is flagged up.
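As a rough illustration of the "error checking and reporting" part for the web files, here is a minimal sketch in Python. The real scripts are not shown in this post, so the function names, the paths and the wording of the report are all assumptions. It archives a directory with tar and gzip, re-opens the archive to confirm it is readable and not empty, and returns a one-line report that a cron job could email.

```python
import tarfile
from pathlib import Path


def verify_archive(archive_path: str) -> int:
    """Re-read the archive so a corrupt or empty file is caught now,
    not on the day it is needed. Returns the number of entries."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
    if not names:
        raise ValueError(f"{archive_path} is empty")
    return len(names)


def backup_web_files(source_dir: str, archive_path: str) -> int:
    """Write source_dir into a gzip-compressed tar archive and return the
    number of entries stored (hypothetical helper, not the real script)."""
    source = Path(source_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"nothing to back up at {source}")
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps paths inside the archive relative to the site folder
        archive.add(str(source), arcname=source.name)
    return verify_archive(archive_path)


def run_backup(source_dir: str, archive_path: str) -> str:
    """Return a one-line report suitable for emailing from cron."""
    try:
        count = backup_web_files(source_dir, archive_path)
    except (OSError, EOFError, tarfile.TarError, ValueError) as error:
        return f"BACKUP FAILED: {error}"
    return f"backup ok: {count} entries in {archive_path}"
```

The same report line could be checked by the other servers, so a missing "backup ok" is what triggers the email rather than relying on the failing box to complain about itself.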
Just got to wait for a week or so till we get through the hassle of moving the servers over and then I can test this whole lot.
New Servers on the way!
We went out for lunch on Friday to KFC and after stuffing my face with a nice Fillet Tower Meal I was just resting my head on the table thinking about things while the others were talking about… something. The MD knows me well enough by now to know that this means there's some impending technical doom on the way. It's also a dead giveaway when he asks me if something is possible and I respond with "Can't be done".
Anyway, what was bothering me was that a series of sites had started to run slower and slower over the last few weeks. This is mostly because they are starting to get more and more hits from search engines; the SEO is kicking in a bit and so the traffic is rising fairly sharply. I'd spent all morning tinkering with the code in the templates and got the PHP execution time down to about half what it was, but it hadn't had much effect on the non-static pages. Only one answer to this – DB overload. There was no way I could optimise the DB queries any more so the only option was an upgrade. Luckily the current hosting package is a "Professional" package on a shared server, so there is room to move up.
I talked to the boss about it and we decided to upgrade to a dedicated server since the extra cost was worth it. But it got better than that when we looked at the options available. We currently have two "Professional" packages and a "Business" (Pfff.. right) package with the various domains spread across them. Since there are basically two 'main' projects (ones that'll cause high server load) he said I could have two dedicated servers! Ace! We're now gonna get a nice top-of-the-range server and a mid-range server and spread the load across them. Plus they're solely ours so I get root access to them and everything. Sweet, I'm happy!
Just got to go and plan the switch across without causing more than a few hours of downtime and without losing any database information… Easy Peasy!?!
Finally stopped Apache redirecting to similar file names
I've got this membership section for one of the sites that I've done. It allows you to create a profile along with an image. The profile also gets displayed on another site on another server that shares the database. Instead of keeping a copy of the image on both web servers I just went and linked one to the other.
As usual I named the images based on the ID number of the profile in the database. I then noticed a problem with a few of the profiles. Since the images are optional, some of them didn't have one on the server. The problem was that when the second server checked whether a profile image existed on the first server and there wasn't one, the image for a profile with a similar ID number was getting displayed instead.
Very strange problem that needed sorting out pronto… I did some checking and, by firing a request for the non-existent image at the server with Wget, I found out it was giving me a "301 Moved Permanently" response. I tried looking through some of the server logs and Apache httpd.conf files but couldn't find out much more.
Next step, load up Google and go into research mode. After a couple of hours I found out that it was the Apache module mod_speling which very helpfully (although not in this case) rewrites misspelled URLs. After finding this the solution was simple: just create a .htaccess file in the image directory that contains the line "CheckSpelling off". Sorted.
Awstats killed my server
Well, I've given up on trying to get Awstats working on 1and1 since the bloody thing went and killed my server. I got in on Monday morning and all the sites were down with the old "CGI limits reached, please try again later", which usually means something is gobbling up all the memory.
Since I couldn't do anything, as even SSH access was just throwing up errors, I got on to 1and1 support and asked them to find out what the hell was going on. After a couple of minutes on hold the tech guy managed to find out that it was Awstats that wasn't terminating its threads and so was eating up the memory.
The tech guy killed all the processes for me and I could then get in and remove Awstats. Since it wasn't doing what I needed anyway I couldn't be bothered to go in and try and fix it, so getting rid was the best option. That just leaves job number 4,698 for this week: write my own weblog software.
Awstats on 1and1.co.uk is so not worth it
The bog-standard logs with 1and1 hosting, and the web pages they get displayed on, are pretty crap. Especially when we have up to 1500 domains in one package which all get bundled into the same set of stats. Most of these are just holding pages and I'm not interested in them, but they do screw up the logs enough.
Couple this with the fact that the domains I am interested in are generally PHP and MySQL driven beasts which don't actually get included in the access logs. This is because they use the Apache ErrorDocument 404 trick to turn nasty variable-ridden URLs into nice user-friendly ones, and as such get treated as errors by Apache and not logged. Any user agent gets told everything is OK by the PHP script but this doesn't help with logging.
So I decided to install Awstats, which I have used on a single site on my own server and think is quite excellent. I decided to get it up and running before I tried it with the error logs as well. This in itself can be a bit of a pain on 1and1 servers but Ravi's awstats and 1and1.com tutorial is a good place to start. It's still not working properly but I'm gonna wait till next week before I have another crack.
I'll post more when I either get it figured out or give up…
Another interesting way to crash a server
I was setting up another site with a paid-for membership system and since I'm reeeally lazy I just get the server to do all the admin stuff through cron jobs and PHP scripts. One of these does a daily check to find the users that signed up a year ago, then generates a new invoice and prints it out in the office.
Since I'd done this before on other projects I just went and copied the file over and then made a few changes to the database details and the folder info in it. As I was checking through the code I noticed that the way it checked for the users that needed renewing was, well… stupid.
I wrote the code ages ago when I were but knee-high to a grasshopper type thing, plus I might have been hung over. Instead of sending a MySQL query asking for a list of users whose sign-up times fall between two times (i.e. a 24-hour window, a year back) and then dealing with them, I was asking for ALL the users, seeing if each sign-up time fell between the two values and then, if it did, doing the whatever.
Since this code gets used on about 10 sites which are all hosted on the same server, and all the scripts get run at the same time daily, it puts a bit of a strain on the server. Luckily I found this out while most of the sites only had about 500 members so the server could take it. Could have been fun when they were up in the thousands though.
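The better approach described above, where the database does the filtering, can be sketched roughly like this. It is in Python with SQLite standing in for PHP and MySQL so that it runs on its own; the `members` table, the `signed_up` column and the function name are made up for illustration and are not the real schema.

```python
import sqlite3
from datetime import datetime, timedelta


def members_due_for_renewal(conn, now):
    """Return the ids of members whose sign-up time falls in the 24-hour
    window starting exactly 365 days before `now`."""
    window_start = now - timedelta(days=365)
    window_end = window_start + timedelta(hours=24)
    rows = conn.execute(
        # The WHERE clause does the filtering, so only matching rows
        # ever leave the database.
        "SELECT id FROM members"
        " WHERE signed_up >= ? AND signed_up < ?"
        " ORDER BY id",
        (window_start.isoformat(sep=" "), window_end.isoformat(sep=" ")),
    ).fetchall()
    return [row[0] for row in rows]
```

With the old approach every row crosses the wire and gets compared in the script; here the script only ever sees the handful that need an invoice. An index on the sign-up column should help the database answer the range query without reading the whole table, though that would need checking against the real data.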
More on output caching
I've written a bit before about PHP and SQL output caching but it was based around caching within the server/network of the host. This is a good thing for the project I was considering it for, since it was necessary for the pages that change regularly.
There are, however, caching servers (such as Squid) that take copies of pages to save bandwidth and server load. These servers are generally set up to ignore .php pages since they are generally dynamic by nature, which is a good thing.
It did occur to me that I could reduce server load on some sites that don't change that often by getting these servers to cache the sites. All I would need to do is send the content type header as html instead of php and the servers should keep a copy. It could be done through the server configuration, or something like
header('Content-Type: text/html');
should do it. Only thinking out loud really, might give it a try and see what happens.
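A caveat worth testing alongside this idea: shared caches such as Squid generally work out whether a response can be stored from freshness headers like Cache-Control and Expires, so those are probably worth sending as well as the content type. A minimal sketch of building them, in Python rather than PHP purely so it can be run on its own (the function name and the lifetime passed in are assumptions):

```python
from email.utils import formatdate


def cache_headers(max_age_seconds, now_timestamp):
    """Build response headers that tell shared caches the page may be
    stored and reused for max_age_seconds."""
    return {
        "Content-Type": "text/html",
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Expires is the older HTTP/1.0 equivalent, written as an HTTP date
        "Expires": formatdate(now_timestamp + max_age_seconds, usegmt=True),
    }
```

In a PHP page each of these pairs would go out through its own header() call before any output is sent.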
PHP and SQL output caching
One of the largest projects I've worked on up till now is the Find Locally suite of sites. This is an internal project and as such I'm also in charge of managing all the hosting stuff. It consists of about 1300 sites which are all pretty database heavy (even the "static" pages). This means that as they start to get populated and generate more interest the server load is going to shoot up.
This is not a problem at the moment but I do like to think about what the options will be down the line. Quite far down the line will be getting a big meaty internet connection to hook up to my 12-server database cluster and 6 web servers (drool, drool – it's a geek thing), but until I've had the chance to get our address on enough junk mailing lists so that the boss doesn't notice the internet bill in the pile, I'll have to get the most out of our current setup.
That's where output caching comes in. The main bottleneck comes from the database, so trying to minimise the number of queries sent to the database is a good idea. The idea is to save certain parts of a given dynamic page in a file, which is then read by the PHP interpreter (or directly by the web server as HTML if possible) and inserted into the rest of the page. Example time…
Take a job advert on a job advertising website. This will contain a lot of text pulled from the database but won't change very often. The most likely change is that it will be removed when the job is filled. Best idea is to save the body of the page in a file named with that advert's ID number. When the page is requested, a short query is sent to the database to see if that ID number is there and, if so, the content in the file is used. If not, it's been removed so do whatever you would normally do in this situation. There's extra stuff you can do like adding the "last updated" time string to the file name and checking this with the DB to see if it has been modified and, if so, creating a new page and file. You get the idea.
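The job advert example can be sketched as below. This is Python rather than the PHP the sites run on, and only a sketch under assumptions: `render` is a hypothetical callable standing in for the expensive queries and templating, `last_updated` is assumed to be the integer timestamp returned by the short existence query (or None if the advert has gone), and the file naming is invented.

```python
import os


def cache_path(cache_dir, advert_id, last_updated):
    """The file name carries the advert ID and its last-updated stamp, so
    an edited advert automatically misses the old file."""
    return os.path.join(cache_dir, f"advert-{advert_id}-{last_updated}.html")


def get_advert_body(cache_dir, advert_id, last_updated, render):
    """Return the advert body, rendering it only on a cache miss."""
    if last_updated is None:
        return None  # advert removed: the caller handles that as usual
    path = cache_path(cache_dir, advert_id, last_updated)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as cached:
            return cached.read()
    # Cache miss: do the expensive work once and keep the result on disk
    body = render(advert_id)
    with open(path, "w", encoding="utf-8") as fresh:
        fresh.write(body)
    return body
```

One design consequence of putting the timestamp in the file name is that files for old versions are never overwritten, so something would need to clear them out now and then.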
There are other tricks you can do as well. A lot of these sites contain lists (e.g. of the towns in a county) which don't change very often at all (unless I get elected Prime Minister and rename London to Andyville) but it would take ages to update all the pages if changes do occur. This is a trickier one since you can't be sure they haven't changed unless you check it with the database – which is pointless. Answer: leave it alone and just check it once a week!
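The "just check it once a week" rule can be done without touching the database at all, by looking at the age of the cached file. A minimal sketch in Python, assuming the cached list lives in a file whose modification time records when it was last rebuilt (the function name and the one-week default are illustrative):

```python
import os
import time

ONE_WEEK = 7 * 24 * 60 * 60  # seconds


def is_stale(path, max_age=ONE_WEEK, now=None):
    """True if the cached file is missing or older than max_age seconds."""
    if now is None:
        now = time.time()
    try:
        modified = os.path.getmtime(path)
    except OSError:
        return True  # no file yet counts as stale
    return now - modified > max_age
```

The page would rebuild the list from the database only when this returns True, and serve the file untouched the rest of the week.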
There's bound to be more I can do and it'll depend on the individual page. In the meantime I might set up the servers with a fake DB and a load of clients on the network to fire as many requests at it as possible to see what effect the output caching has on the load.