Amazon S3 Backup via Rails Plugin

03 Jan 2007

I think Amazon S3 is awesome. I was looking into building a RAID NAS (Network-Attached Storage) for backing up all my important data and I nearly bought a setup that would have run into the hundreds of dollars - but then I did a little fancy multiplication and addition and realized S3 would cost me less than one one-hundredth of what the NAS would have cost.

In case you're as in the dark about S3 as I recently was, here's a little rundown: it's a very simple, super fast, extremely large storage system that Amazon uses for its own storage needs. Amazon has opened the service up to the public on a pay-as-you-go basis.

Costs:

* $0.15 per GB-month of storage used
* $0.20 per GB of data transferred

In other words, it's damn cheap. Say you want to upload 500 MS Word documents of around 45KB each - how much would it cost to store those in an easy-to-access, highly secure, permanent backup location? Less than one cent per month. Eight years later you'd only be out about a third of a dollar ($0.32, to be precise).
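If you want to check my math, here's the back-of-the-envelope calculation in Ruby, using decimal gigabytes and the prices quoted above:

    # Rough S3 cost for 500 Word documents at ~45KB each
    gb_stored = 500 * 45 / 1_000_000.0        # 22,500KB, or about 0.0225 GB

    monthly     = gb_stored * 0.15            # $0.15 per GB-month of storage
    eight_years = monthly * 96                # 96 months

    puts format('$%.4f per month', monthly)        # => $0.0034 per month
    puts format('$%.2f over 8 years', eight_years) # => $0.32 over 8 years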

So S3 is my new storage/backup location of choice. The one difficulty of using it is that I need to figure out some way of automating the backups, so that they're actually useful and can easily be recovered if necessary. In particular, I need some way to automatically back up the website data that is so crucial to me.

So I made a plugin.

The S3 plugin lets you back up your crucial website data to S3 via a handy Rake task (written by the talented Adam Greene).

Amazon has been an excellent supporter of Ruby/Rails lately (they fund 43things.com, among other things) and they've made sure to release a Ruby library for S3. I've combined that library with Adam's S3 rake task into a handy S3 backup plugin.

You can install it via the following two commands:

    ruby script/plugin source http://svn.6brand.com/projects/plugins
    ruby script/plugin install -x s3
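The plugin keeps its settings in config/s3.yml (you'll see it mentioned in the comments below). The exact keys depend on the version you install, so treat this as an illustrative example - the key names here are my guesses, and bucket_prefix is the option discussed in the comments:

    # config/s3.yml - illustrative example; actual key names may differ
    access_key_id: YOUR_AWS_ACCESS_KEY_ID
    secret_access_key: YOUR_AWS_SECRET_ACCESS_KEY
    bucket_prefix: myapp    # prepended to bucket names to avoid collisions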

Then backing up is as easy as:

    rake s3:backup:db
    rake s3:backup:code
    rake s3:backup:scm

or, to run them all at once:

    rake s3:backup
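Under the hood, each task boils down to tarring up the relevant files (or a database dump) and pushing the archive to a bucket. Here's a minimal sketch of that idea using the AWS::S3 gem that comes up in the comments below - the task name mirrors the plugin's, but the body is my own illustration rather than the plugin's actual code, and the bucket name is made up:

    # lib/tasks/s3_backup_sketch.rake - illustrative sketch, not the plugin source
    require 'aws/s3'

    namespace :s3 do
      namespace :backup do
        desc 'Tar up the app and push the archive to S3 (sketch)'
        task :code do
          AWS::S3::Base.establish_connection!(
            :access_key_id     => ENV['AMAZON_ACCESS_KEY_ID'],
            :secret_access_key => ENV['AMAZON_SECRET_ACCESS_KEY']
          )

          archive = "code-#{Time.now.strftime('%Y%m%d%H%M%S')}.tar.gz"
          system("tar -czf /tmp/#{archive} .") or raise 'tar failed'

          bucket = 'myapp-backup-code'   # hypothetical bucket name
          AWS::S3::Bucket.create(bucket) # raises if someone else owns the name
          AWS::S3::S3Object.store(archive, open("/tmp/#{archive}"), bucket)
        end
      end
    end

And to get the automation I was after, a nightly cron entry along the lines of cd /path/to/app && rake s3:backup finishes the job.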
  • adam greene said: wow! I was just starting to do the same thing. But this looks great. The next version I was working on uses the AWS::S3 gem, as it significantly cuts down on the amount of code. Plus I need to integrate pushing my static files to S3, so I was going to add that in as well. I'm wondering if we should combine 'forces' ;) thanks for this, Adam
  • Danger said: Hey, I'm glad you like it - it truthfully is all your work. I'd be interested in switching this over to the AWS::S3 gem. Do you think it would be too big a problem for folks, or is gem installation pretty much a given now? I like the idea of all the code being in the plugin, but I do prefer the AWS::S3 gem. Feel free to do whatever the heck you want with this plugin - consider it a stepping stone!
  • Aryk said: Hi, this seems like a great plugin. I was wondering, though: what happens if the bucket your plugin wants to create already exists? I looked at your code and it seems like there's no fallback for that case.
  • Danger said: Hmmm, that's a good point. Do you happen to know what the status code S3 returns when you try to create an existing bucket? I don't have the time to check it myself right now but if you knew what it was then I could build the fallback into the rake task.
  • Aryk said: I actually don't know; I'm just going to go into the plugin and hard-code the bucket I want to use. Hopefully no one has used it. I think it would be helpful and easier on your side if you had the user provide those kinds of preferences from the beginning, and also paths to their svn repositories, instead of trying to figure it all out in Ruby. Maybe it could go in a configuration file, like a yml or something. Also, the *.rb files under lib weren't copied to the main /lib folder. Do they have to be? The s3.yml wasn't copied either, even though the code in install.rb ran a few times and said it was copied over. I might be the only one experiencing these problems. All in all, great plugin...
  • Aryk said: Sorry about all the messages, I kept getting a Rails error and didn't think it was going through...
  • Danger said: It's cool about the multi-comments - it happens. No, the *.rb files don't need to be moved to the main lib folder. The plugin's lib folder will work. Yes, the s3.yml file *should* be copied to config. I'll have to test it again to see if it's still working but it worked fine the last time I checked. Thanks for the feedback! I think it might be a good idea to add a bucket_prefix configuration option in the s3.yml file.
  • Ben said: Nice work! After sorting out a little trouble with server times, it worked a treat. I'm in the process of adding tasks to handle backup and retrieve of any assets in the 'shared' directory (as used by the standard Capistrano deploys). It'd be good to get a bit more automation on the retrieve stage as well. I'll forward you the code when I've finished.
  • Ben said: Hm. There's a problem. The tar.gz file for code is being created properly and seems to be stored correctly, but the retrieve task isn't finding it. It's probably looking in the wrong bucket. I'll see if I can fix it and upload the changes.
  • Ben said: By the way, the retrieve fails silently, leaving an XML error in place of the tar.gz file. If you're using this, check that retrieve works properly. Currently there's no automation for this task, so you'd have to untar and unzip the file manually in any case.
  • Ben said: OK, as I suspected, the retrieve_file function used a hard-coded 'db' instead of the passed-in name. The fix is to change a line in the retrieve_file function: data = conn.get(bucket_name(name), entry_key).object.data (name was 'db' in the original). I've changed the tasks to add a prefix (specified in the s3.yml file) to the bucket names to help avoid conflicts. Seems to be working so far... I'll put the code up on the web somewhere once I've integrated it fully and run some more tests.

  Next on the agenda: automated retrieve. I'm thinking about the best way to do this - anything to shave vital seconds off a retrieve in a catastrophic-failure situation is a bonus. If you have any ideas, let me know. A DB restore would be good, and a more general rake task that can recreate the whole site, put files in the right places, re-point 'current' to the new code directory, and run a cap deploy:restart...

  I have a bit of a problem with the scm functions. They seem to require that the live directory has been deployed directly from a Subversion check-out. Unless you take care to disallow web access to files in the .svn directories, this is probably bad practice, as it could potentially make your repository insecure. If you use an svn export to deploy cleanly (or the default Capistrano method, which leaves no .svn directories), the backup tasks can't find the SVN login information. I'll put in more config options to allow the repository and user details to be specified directly in due course.
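For anyone patching by hand before Ben posts his updated code, the fix he describes is a one-line change in the plugin's retrieve_file function (reconstructed here from his comment):

    # in retrieve_file:
    data = conn.get(bucket_name('db'), entry_key).object.data  # before: bucket hard-coded to 'db'
    data = conn.get(bucket_name(name), entry_key).object.data  # after: use the passed-in name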

Please leave a comment if you found this post helpful or have questions.