Backups using rsync and ssh

I have a server (which just handed you this web page, in fact) located in downtown Orlando, Florida. I also live in the Orlando area, but the server is not in my house. For the most part, the server just sits in the rack, serving up web pages and handling email, without any problems at all.

However, as we all know, sometimes bad things happen. Within the last year, a hard drive in the server physically failed (i.e. it wouldn't spin at all), and the air conditioning failed at the facility in Melbourne where my server was until a month ago. Luckily my server survived without any permanent damage, but it could have been a catastrophic failure. When I couldn't reach the server, and couldn't reach the colo provider on the phone, I drove down to Melbourne and found the temperature in the room to be about 95F (or 35C), with the windows open and a four-foot box fan trying to pull the heat out of the room (but not having much luck, since there was no air "input" on the other side of the building.)

To prevent this from being a "total loss" event, I have a machine at my house (a "Buffalo Linkstation" NAS device, which has been "hacked" and is now running Debian) which runs a cron job every four hours, which uses ssh to connect to the server, and rsync to copy any files which have changed since the previous backup.

And by running multiple instances of this cron job, I am able to back up multiple servers on the same "backup server".

Background

The way it works is this:

On the backup server, cron runs a script called "backup-blah" (in this case, "backup-phineas", because "phineas" is the server's name. It could be anything you like.)
This script runs rsync, using an environment variable to cause it to use ssh to connect to the server, and an ssh key to authenticate to the server as root.
The rsync command backs up the entire system (with certain exceptions) to a single directory on the backup server. For example, the server's "/" is backed up to "/backup/phineas/", the server's "/usr/local" is backed up to "/backup/phineas/usr/local", and so forth. The permissions, timestamps, and ownerships are all preserved.
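
As a rough illustration of the three steps above, a single pull might look like the following sketch. It is not the actual backup script: the key path, the exclude list, and the choice of rsync options are assumptions made for this example, and the function only prints the command instead of running it. RSYNC_RSH is the environment variable rsync consults to decide which remote shell to use.

```shell
#!/bin/sh
# Sketch only. BACKUPDIR and KEY are illustrative values, not taken
# from the real script.
BACKUPDIR=/backup
KEY=/root/.ssh/id_dsa_backup

# Print the rsync command that would pull one server's "/" into
# $BACKUPDIR/<server>/ over ssh, authenticating as root with the key.
# -a preserves permissions, timestamps and ownership.
backup_command() {
    server=$1
    echo "RSYNC_RSH='ssh -i $KEY' rsync -a --numeric-ids" \
         "--exclude=/proc --exclude=/sys" \
         "root@${server}:/ ${BACKUPDIR}/${server}/"
}

backup_command phineas
```

Printing the command first makes it easy to review what would be copied, and where, before letting cron run it unattended.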

Creating the ssh key

The first step is to create an ssh key pair. When the script connects to the server, it will use this key to authenticate instead of using a password. This will allow it to access the server without somebody having to type in a password every time the backup runs.

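Creating the key pair involves using the "ssh-keygen" command on the backup server. (OpenSSH 7.0 and later disable DSA keys by default. On those systems, pick another key type, such as "-t ed25519", and adjust the file names used below to match.) The process looks like this: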

# cd ~/.ssh
# ssh-keygen -t dsa -b 1024 -f id_dsa_backup -C 'rsync backups'
Generating public/private dsa key pair.
Enter passphrase (empty for no passphrase):     [just hit ENTER]
Enter same passphrase again:                    [again, just hit ENTER]
Your identification has been saved in id_dsa_backup.
Your public key has been saved in id_dsa_backup.pub.
The key fingerprint is:
08:35:3d:bb:94:bf:71:fe:8d:7e:cd:23:52:4f:4b:5a rsync backups     [the fingerprint will be different]

As you can see, we are creating a key with no passphrase. You will need to make sure that the "id_dsa_backup" file (or whatever you named it) is not allowed to fall into the hands of anybody who shouldn't have root-level access to the backup server AND to the servers you're backing up.
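
One small safeguard is to confirm that the private key file is readable only by its owner. The sketch below is an illustration, not part of the original procedure: the function name and the default key path are assumptions, and the check simply looks at the mode string printed by "ls -l".

```shell
#!/bin/sh
# Succeeds only if the file grants no permissions to group or other
# (for example mode 600 or 400), judged from the "ls -l" mode string.
key_is_private() {
    perms=$(ls -ld "$1" | cut -c5-10)
    [ "$perms" = "------" ]
}

# Assumed location of the private key generated above.
KEY="${KEY:-$HOME/.ssh/id_dsa_backup}"

if [ -f "$KEY" ]; then
    if key_is_private "$KEY"; then
        echo "$KEY: permissions look fine"
    else
        echo "$KEY: accessible by group or others, tightening to 600"
        chmod 600 "$KEY"
    fi
fi
```

ssh-keygen normally creates the private key with owner-only permissions already, so this mostly matters if the file has been copied or restored from somewhere else.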

Installing the key on the server

Configuring sshd

Before we install the key itself, we need to make sure that sshd on the server (i.e. the machine from which we're pulling the backup) is configured to allow key-based authentication.

On the server, find your "sshd_config" file. This will usually be in an "/etc/ssh" directory, although some systems use "/etc" instead. We need to check the following lines in the file:

"PubkeyAuthentication" should be set to "yes". If the line is commented out (i.e. has a "#" in front of it) or is missing, the default value is "yes" anyway... so what you're really looking for is to ensure that somebody didn't change it to say "no" for some reason.
"AuthorizedKeysFile" will normally be set to ".ssh/authorized_keys", although some older versions of sshd used the name ".ssh/authorized_keys2" instead. Either value is correct, but you do need to know which one your system uses.
"PermitRootLogin" should be set to either "yes", "without-password", or "forced-commands-only". (OpenSSH 7.0 and later call the second option "prohibit-password", and accept "without-password" as a deprecated alias.)

Note that "without-password" doesn't mean what it appears to mean at first glance. What it means is that root is allowed to log in, but only if they're using an authentication method other than "password". This way, people can ssh in using a key, but not using a password. If somebody manages to find out your root password, they can't just ssh into your server as root- they would have to get your ssh private key first (which is why you need to safeguard the "id_dsa_backup" you generated above.)

The "forced-commands-only" setting is relatively new. It allows clients using a valid key to log in as root, but only if the key has a forced command attached to it (which, in this case, it will.) Technically this is more secure, but I also need to log in as root and get a shell from time to time, so it is more restrictive than I can use.

If you're curious, I use "without-password" on my own servers.

If you had to change any of the options in the "sshd_config" file, you should restart your sshd process. This is normally done using a command like "service sshd restart" or "/etc/init.d/sshd restart".
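
To review the three settings discussed above in one go, a small helper along these lines may be handy. It is only a sketch: the function name and the default config path are assumptions, and it does a simple text scan that ignores "Match" blocks, included files, and the "keyword=value" spelling.

```shell
#!/bin/sh
# Print the value of the first uncommented occurrence of a keyword in an
# sshd_config-style file (sshd uses the first value it finds for a keyword).
# Prints nothing if the keyword is absent or commented out.
sshd_option() {
    awk -v key="$2" 'tolower($1) == tolower(key) { print $2; exit }' "$1"
}

CONFIG="${CONFIG:-/etc/ssh/sshd_config}"   # assumed path; adjust if needed

if [ -r "$CONFIG" ]; then
    for keyword in PubkeyAuthentication AuthorizedKeysFile PermitRootLogin; do
        value=$(sshd_option "$CONFIG" "$keyword")
        echo "$keyword: ${value:-(not set, default applies)}"
    done
fi
```

If you are root on the server, "sshd -T" prints the effective configuration as sshd itself sees it, which is more authoritative than scanning the file.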

Installing the key and forced command

The next step is to install the public key in root's ".ssh/authorized_keys" file on the server. This tells sshd that the corresponding private key is allowed to log in as root. In addition, we will also restrict that key's rights to running one single specific command.

The first step is to copy the "id_dsa_backup.pub" file to the server. This can be done with any mechanism you like- you can write it to a USB stick or a floppy on the backup server and then read it on the main server, you can use FTP or scp to transfer it, or you can even cat the file on the backup server and copy/paste it into the main server (as long as you're careful to ensure that the file is not modified in transit- it's supposed to be one very long line of text.)

I'll leave the mechanics of copying the file up to you. Just make sure that you ONLY copy the "id_dsa_backup.pub" file. You DO NOT NEED the "id_dsa_backup" file (the secret key) on the servers from which you will be pulling data, and it probably should not be there at all.

Once the file is copied to the main server (the one from which you will be pulling the backups), move it to root's ".ssh" directory. Then, we could add it to the "authorized_keys" file, with a command like "cat id_dsa_backup.pub >> authorized_keys", however that would give the key full root access to the server- meaning that not only could it be used to pull backups, but it could be used to get a shell with root access as well.

Obviously this isn't a very good idea- a private key without a passphrase having unrestricted access to a root shell is very unsafe. If somebody ever got a copy of the secret key file, they would have full root-shell access to the server.

We can prevent this by attaching a "forced command" to the key, so that when the key is used to authenticate, it will always run the forced command, and cannot be used to run any other command.

Of course, in order to do this, we need a suitable command which can be attached to the key. And as it happens, I've already written one, which I call "allow-backup". You are welcome to use it, or of course you can write your own script which does the same thing.

File:	allow-backup
Size:	2,585 bytes
Date:	2008-01-13 04:42:41 +0000
MD5:	`e6e3d1eb2f198cbd55421fae6549cd5c`
SHA-1:	`842cfcf265f5affd36dbd4e215d8290d7400565d`
RIPEMD-160:	`8c33ee2546ee17a3acd9a17320653851301cf928`
PGP Signature:	allow-backup.asc
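
If you would rather write your own, the general shape of a forced-command script can be sketched as follows. To be clear, this is NOT the allow-backup script, whose contents are not reproduced here. It is a simplified, hypothetical stand-in that assumes the only legitimate request is the server side of an rsync pull. sshd passes the command the client asked for in the SSH_ORIGINAL_COMMAND environment variable.

```shell
#!/bin/sh
# Hypothetical forced-command wrapper, for illustration only.
# This is NOT the allow-backup script described in this article.

# Succeeds only if the requested command looks like the server side of an
# rsync pull ("rsync --server --sender ..."), i.e. rsync sending files
# back to the client.
is_rsync_pull() {
    case "$1" in
        "rsync --server --sender "*) return 0 ;;
        *) return 1 ;;
    esac
}

# sshd stores the command the client asked for in SSH_ORIGINAL_COMMAND.
# It is unset or empty when the client asked for an interactive shell,
# in which case this script does nothing and the session ends.
if [ -n "${SSH_ORIGINAL_COMMAND:-}" ]; then
    if is_rsync_pull "$SSH_ORIGINAL_COMMAND"; then
        set -f                          # no filename globbing on the arguments
        exec $SSH_ORIGINAL_COMMAND      # run rsync directly, without a shell
    else
        echo "rejected: $SSH_ORIGINAL_COMMAND" >&2
        exit 1
    fi
fi
```

A production script would likely want to be stricter about which rsync options and paths it accepts, and to log or report rejected attempts.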

On my server, I have saved this script (with the appropriate addresses for emailing my cell phone if needed) as "/root/bin/allow-backup". To attach the command to the key, I first built a work file with the forced command attached to the beginning of the public key:

# cd ~/.ssh
# /bin/echo -n 'command="/root/bin/allow-backup" ' > z
# cat id_dsa_backup.pub >> z
# cat z
command="/root/bin/allow-backup" ssh-dss AAAAB3NzaC1kc3MAAACBAP...AIcn1bnVulBbkdAEZhen rsync backups

The "z" file should be one really long line of text. You can verify using this command:

# wc -l z
1 z

Once you have verified that the "z" file is correct, you can add it to the "authorized_keys" file using this command:

# cat z >> authorized_keys

The name "authorized_keys" may be different on your system. This is why I told you to check the "AuthorizedKeysFile" line in the sshd_config file.

Now, anytime somebody logs into the server and uses that key to authenticate, the server will run that script instead of whatever command they were trying to run.

Oh, and you can delete the "z" file, it is no longer needed.
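
The same steps can also be wrapped in a small function, which skips the work file and refuses to continue if the public key is not exactly one line. This is a sketch under a few assumptions: the function name is made up, and the extra "no-pty" and "no-...-forwarding" key options are an optional tightening that is not part of the steps above.

```shell
#!/bin/sh
# Append a public key to an authorized_keys file with a forced command.
# Usage: add_forced_key PUBKEY_FILE FORCED_COMMAND AUTHORIZED_KEYS_FILE
add_forced_key() {
    pubfile=$1
    forced=$2
    authfile=$3

    # The public key file should contain exactly one line.
    if [ "$(grep -c '' "$pubfile")" -ne 1 ]; then
        echo "$pubfile: expected exactly one line" >&2
        return 1
    fi

    # Optional extra restrictions (an addition for this sketch).
    options='no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding'

    printf 'command="%s",%s %s\n' "$forced" "$options" "$(cat "$pubfile")" >> "$authfile"
    chmod 600 "$authfile"
}

# Example (run in root's ~/.ssh directory on the server):
#   add_forced_key id_dsa_backup.pub /root/bin/allow-backup authorized_keys
```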

Installing the script on the backup server

The last part, of course, is the script on the backup server which pulls the data from the main server(s).

File:	backup-servers
Size:	3,227 bytes
Date:	2008-01-13 05:40:26 +0000
MD5:	`9254aec07899881b1baa98622bf60e43`
SHA-1:	`54c3eb9bf9dce945e751fe4e7d024f27aa670427`
RIPEMD-160:	`184de07f424638e07684e1f5b51ca77594e0a6e1`
PGP Signature:	backup-servers.asc

Again, on my backup server, this script is installed in the "/root/bin" directory. Of course it has the actual server names, and I have a disk mounted as "/backup" which is the repository for the backups. You may need to customize this- follow the BACKUPDIR and TARGET variables.

I'm also limiting the bandwidth used by the backup process to 1 Mbit per second (roughly two-thirds of a T-1 line.) This is done using the "--bwlimit=" parameter on the actual rsync command line. You may wish to change that, or remove it altogether.

Of course you'll need to customize the SERVERS variable. It lists the server names you will be backing up. For example, the actual script on my backup box has four server names in it. Just list the names, one after another, separated by spaces.

You may also wish to modify the list of files or directories which will be excluded from the backup. This is the EXCLUDES variable. You will see how to build a longer list without making one line of text fall off the edge of the screen.
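
To give a feel for these settings, here is a sketch of how such a configuration section could be laid out. The variable names BACKUPDIR, SERVERS and EXCLUDES come from the script, but every value below is invented for the example ("ferb" is a made-up second server), TARGET is left out, and the loop only prints the commands. The bandwidth figure assumes rsync's default unit for "--bwlimit" of 1024 bytes per second, so 1 Mbit per second works out to 1000000 / 8 / 1024, or about 122.

```shell
#!/bin/sh
# Illustrative values only; not copied from the real backup-servers script.
BACKUPDIR=/backup
SERVERS="phineas ferb"

# A long exclude list can be built up a few entries at a time.
EXCLUDES="/proc /sys"
EXCLUDES="$EXCLUDES /tmp /dev"

# 1 Mbit/s expressed in units of 1024 bytes per second.
BWLIMIT=$((1000000 / 8 / 1024))

# Turn the space-separated EXCLUDES list into --exclude options.
exclude_args() {
    for path in $EXCLUDES; do
        printf ' --exclude=%s' "$path"
    done
}

# Print (rather than run) one rsync command per server.
for server in $SERVERS; do
    echo "rsync -a --bwlimit=${BWLIMIT}$(exclude_args) root@${server}:/ ${BACKUPDIR}/${server}/"
done
```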

Running the script

The first time you back up a server, it will be copying ALL files on the server. This means, depending on how much data is involved, it will probably take a long time to finish. You may want to run the script with the "-v" parameter (i.e. "./backup-servers -v") so that you can see the filenames as they're being copied.

Once you have the script configured the way you need it, you will probably want to set up a cron job to run it on a regular basis. How often you run it will depend on your needs- I pull backups from my own server every four hours, and from my clients' servers once a day.
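
For example, the two schedules mentioned above could be expressed with crontab entries like the ones printed by this sketch. The script paths assume the scripts live in "/root/bin" under the names used in this article, and the exact minute and hour values are arbitrary choices for the example.

```shell
#!/bin/sh
# Print example crontab lines for root on the backup server.
cron_lines() {
    printf '%s\n' \
        '# minute hour day-of-month month day-of-week command' \
        '0 */4 * * * /root/bin/backup-phineas' \
        '30 3 * * * /root/bin/backup-clients'
}

cron_lines
```

The first entry runs every four hours on the hour, and the second runs once a day at 03:30. The lines can be pasted into root's crontab with "crontab -e".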

Notes

If you will be backing up multiple servers on the same schedule (i.e. all at the same time) you will need to do the steps in the "Installing the key on the server" section on each server, and add the servers' names to the SERVERS variable within the "backup-servers" script on the backup machine.
If you are backing up multiple servers on different schedules, or need to use different keys for different servers, you should copy the "backup-servers" script with a different name, and set up a separate cron job for each one. For example, my own backup server actually has two scripts- one called backup-phineas which runs every four hours, and one called backup-clients which runs once a day.

Contributions

Ingo Claro modified the script, so that instead of backing up "/" with a list of exceptions on each server, it backs up just certain directories from each server (basically, a list of what TO back up, rather than a list of what NOT to back up.) I haven't tried his version (it's not something I need myself) but his change looks logical, and of course it works for him.
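
The general idea of an include list can be sketched like this. It is not Ingo's actual change, which is not reproduced here: the INCLUDES variable name and the directories in it are hypothetical, and the function only prints the commands. The "--relative" option tells rsync to keep the full source path, so "/etc" on the server ends up in "/backup/phineas/etc".

```shell
#!/bin/sh
# Hypothetical include-list variant; names and values are illustrative.
BACKUPDIR=/backup
INCLUDES="/etc /home /var/www"

# Print one rsync command per directory to back up from the given server.
include_commands() {
    server=$1
    for dir in $INCLUDES; do
        echo "rsync -a --relative root@${server}:${dir} ${BACKUPDIR}/${server}/"
    done
}

include_commands phineas
```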

Here is the message on my qmail-patch mailing list where he told me about it.