Setting up JISCPress on Amazon Web Services

I’ve spent the last couple of days (about 13 hours altogether) setting up JISCPress on Amazon Web Services (AWS). Prior to yesterday, I’d not really used AWS except for setting up the command line tools and starting and stopping a server, known as an instance, which is launched from an Amazon Machine Image (AMI).

It’s gone pretty well and I’ve documented the outline of the process I went through on the JISCPress wiki. I used a combination of the AWS Management Console and the command line tools to work on the Elastic Compute Cloud (EC2). To create ‘buckets’ on the Simple Storage Service (S3), I used the S3 Firefox plugin. There are a lot of third-party tools to interface with both EC2 and S3.

I work on virtual servers both at the university and on Slicehost, where WriteToReply is hosted. A server on AWS is a different kind of virtual server, which takes a little while to understand, but the documentation is good and I pretty much followed the suggested workflow.

In addition to using EC2 and S3, I am also using an Elastic IP address and the Elastic Block Store (EBS). The Elastic IP address is convenient in that it allows you to ‘own’ a static IP address that you can bind your DNS A record to, in our case jiscpress.org. Without the Elastic IP address, the IP address of the instance is lost when you terminate the machine (turn it off) and so you have to change the DNS record and wait for DNS to re-propagate. It’s ‘elastic’ because while it is a persistent, external IP address, you can hot swap the machines that use the IP address. That was my first lesson.

My second lesson was to understand that there are a number of alternative, trusted public AMIs available to choose from. I don’t really mind which flavour of Linux I use and when looking at the AMIs that Amazon provide, I saw that only Fedora 8 was available so I chose that. As it happens, there are also Ubuntu AMIs from Canonical, which are more up-to-date, and had I known, I’d have just chosen one of those. Instead, I upgraded Fedora 8 to Fedora 11, which wasn’t too much trouble. I also learned that despite upgrading the machine, the kernel remains the same. Amazon build kernels for their service and you can build kernel modules from the sources which they provide if you need to.

My third lesson was that although you can reboot an instance just as you can reboot any virtual server, if you ‘terminate’ or turn off the instance, you appear to lose all data that has been created since you created the AMI it was launched from. In my case, that wasn’t much as I had wondered about the persistence of data and had created my AMI after I’d got a basic web server with WordPress MU set up. But it’s really worth noting this as not only might you want to turn the machine off to save on running costs, but there’s also the chance that it might unexpectedly go down and you’d have lost your work. This underlines how AWS is meant to be used: machines are cheap and replicable, so use S3 and EBS for data you care about. Using a single machine for both production and development is never the right way to go about working long-term, but with a decent back-up strategy, it should work fine for us.

This led me to sort out backups and I set up rsync to back up /var, /home, /root and /etc to an Elastic Block Store volume. An EBS volume is a virtual block device (i.e. hard disk) which you can attach to your instance, format and then mount. So I’ve got rsync backing up to /mnt/data.
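The backup step above can be sketched in Python as follows. This is a minimal sketch, not the actual job running on the server: only the four source directories and the /mnt/data mount point come from the setup described here, while the `-a` (archive) flag, the function names and the idea of wrapping rsync in a script are assumptions for illustration.

```python
import subprocess

# Directories named in the post; /mnt/data is where the EBS volume is mounted.
SOURCES = ["/var", "/home", "/root", "/etc"]
DESTINATION = "/mnt/data/"


def build_rsync_command(sources, destination):
    """Return the rsync argument list for copying sources into destination.

    -a (archive) recurses and preserves permissions, ownership and timestamps.
    With no trailing slash on a source, rsync recreates the directory itself
    under the destination, so /var ends up as /mnt/data/var.
    """
    return ["rsync", "-a", *sources, destination]


def run_backup(sources=SOURCES, destination=DESTINATION):
    """Run rsync and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_rsync_command(sources, destination), check=True)
```

Calling `run_backup()` as root on the instance would perform one backup pass. Keeping the command construction separate from the call that runs it makes it easy to check the arguments before anything touches the disk.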

Getting the domain name and DNS sorted out was very simple. I registered jiscpress.org via Dreamhost ($10/yr) and then used a free UK-based DNS host to host the record. WordPress MU can run using either sub-directories (e.g. http://jiscpress.org/site1) or sub-domains (e.g. http://site1.jiscpress.org). On the whole, it’s better to set up wildcard DNS and go with sub-domains, which I did by simply adding an A record entry of *.jiscpress.org against the Elastic IP address and ‘ServerAlias *.jiscpress.org’ in the VirtualHost section of the Apache config file.
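To make the difference between the two schemes concrete, here is a small Python sketch of how a URL maps to a site name in each case. It is an illustration only, not how WordPress MU resolves sites internally; the function name `site_from_url` and its parameters are hypothetical, and the shell-style pattern match simply mirrors the *.jiscpress.org wildcard used in the A record and ServerAlias.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

BASE_DOMAIN = "jiscpress.org"


def site_from_url(url, base_domain=BASE_DOMAIN, use_subdomains=True):
    """Return the site name a URL refers to, or None for the main site."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if use_subdomains:
        # Sub-domain scheme: anything matching *.jiscpress.org is a site.
        if fnmatch(host, "*." + base_domain):
            # Strip the trailing ".jiscpress.org" to leave the site name.
            return host[: -len(base_domain) - 1]
        return None
    # Sub-directory scheme: the first path segment names the site.
    if host != base_domain:
        return None
    segments = [s for s in parts.path.split("/") if s]
    return segments[0] if segments else None
```

With sub-domains, `site_from_url("http://site1.jiscpress.org")` gives `"site1"`; with `use_subdomains=False`, `site_from_url("http://jiscpress.org/site1", use_subdomains=False)` gives the same name from the path instead.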

Finally, it looks like sending mail from an instance is not as simple as you might expect. This is because the hostname for your machine is provided dynamically when you activate it and can’t be changed. This means that you can’t add a PTR record in your DNS and therefore can’t set up reverse DNS. Without this, most mail hosts, such as Hotmail or Yahoo, will treat mail from your server as spam. So far, Google is treating mail from the server as ‘neutral’ and letting it through. The simplest way around this is to relay your mail through an external mail relay, which a lot of AWS users appear to be doing. For the time being, I’m not too worried about this but I may have to do more work on it if we find that mail is regularly failing to get through.
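The reverse DNS problem can be illustrated with a short Python sketch of the kind of check a receiving mail host might make: look up the PTR name for the sending IP address, then confirm that name resolves back to the same address. This is an assumption about how such checks typically work, not a description of what Hotmail, Yahoo or Google actually do. The function name and the injectable lookup parameters are hypothetical; the default lookups use the standard library’s `socket.gethostbyaddr` and `socket.gethostbyname_ex`.

```python
import socket


def has_matching_reverse_dns(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip has a PTR name that resolves back to the same ip."""
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses).
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, addresses).
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]
    try:
        hostname = reverse_lookup(ip)
        addresses = forward_lookup(hostname)
    except OSError:
        # No PTR record, or the PTR name does not resolve at all.
        return False
    return ip in addresses
```

Run on any machine with network access, `has_matching_reverse_dns("<your Elastic IP>")` gives a quick indication of whether the address passes this kind of check. The lookups are passed in as parameters so the logic can be exercised without touching real DNS.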

I’ve enjoyed the learning process of setting JISCPress up on AWS. I’ve only really scratched the surface of what the platform offers but now that I’ve got my head around how the different services work together, it seems pretty straightforward. The basic machine (1.2 GHz 2007 Opteron or 2007 Xeon processor, 1.7 GB RAM, 160 GB storage) feels very, very fast, as does the network it’s running on. WPMU can be pretty resource hungry, but for the purposes of our project, I think this will be sufficient.

Anyhow, http://jiscpress.org is now live and running a bare WPMU install. I’ll refine it over the next week in preparation for Eddie and Alex to begin work in early July.

If you’ve got experience working on AWS and can clarify or correct any of my assumptions, please do. I get the feeling that now JISCPress is in the cloud, I need to relax a bit and enjoy the flexibility of the platform and learn more about what it has to offer.