Powerful Remote Incremental Backup with rdiff-backup
For the last few days I have been testing backup software to automatically back up my Ubuntu desktop system. I’ve only just got it set up, but rdiff-backup is exactly what I was looking for and seems to be working very well.
rdiff-backup tries to “combine the best features of a mirror and an incremental backup”. It’s a command-line utility that not only gives you a plain mirror of your files, but also allows you to retrieve previous versions of your files using the extra difference data it keeps. This means you can quickly copy and paste to restore a file from your most recent backup, or retrieve the contents of files as they were at the time of any previous backup. rdiff-backup has built-in support for network backups over SSH and is network-efficient and fast thanks to its incremental nature. It’s also possible to run rdiff-backup on Windows; soon I’ll be investigating whether it will work as a backup solution for the Windows systems on my network.
If you’re looking for a simpler graphical backup tool, check out A Guide to System Backup and Restore in Ubuntu. The rest of this post will go through how I’ve set up rdiff-backup to back up my home directory to a hard drive connected to another Ubuntu system on my network.
SSH public key authentication
If you want to schedule automatic rdiff-backup runs over the network, you will need to use public keys with SSH so rdiff-backup can log into the remote system without a password. This assumes that the remote system already has an SSH server installed and the local system has an SSH client.
On your local system, create a new key pair for your user with no passphrase (just press Enter when asked for one):
ssh-keygen -t rsa
Use the ssh-copy-id tool to give the new public key to the remote backup system:
ssh-copy-id -i ~/.ssh/id_rsa.pub -p 2222 backups@192.168.1.40
Finally, test logging in to the remote system without a password:
ssh -p 2222 backups@192.168.1.40
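Because cron can’t answer a password prompt, it’s worth confirming the key is really being used before you automate anything. Here’s a small sketch of a check; the function name check_key_login and the SSH override variable are my own inventions, not part of any tool. It relies on ssh’s standard BatchMode=yes option, which makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if public key login works without any prompt.
# BatchMode=yes tells ssh to fail rather than ask for a password or passphrase.
# SSH can be overridden (e.g. for testing); it defaults to the real ssh client.
SSH=${SSH:-ssh}

check_key_login() {
    port=$1
    target=$2
    # Run the no-op command "true" remotely; ssh's exit status is the result.
    "$SSH" -o BatchMode=yes -o ConnectTimeout=10 -p "$port" "$target" true
}
```

For example, check_key_login 2222 backups@192.168.1.40 && echo "key login OK" should print the message only if the key works.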
Install rdiff-backup
When operating over the network, rdiff-backup must be installed on both systems. Ideally both copies of rdiff-backup will be exactly the same version. If your local and remote systems are both running the same version of Ubuntu, you can install it from the repositories. If you’ve got different versions of Ubuntu, there is a PPA available with the latest version of rdiff-backup for every supported version of Ubuntu except Dapper. Unfortunately my remote backup server is still running Dapper, but I didn’t have any trouble installing the latest version of rdiff-backup from source.
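Since the two copies should match, a quick comparison of the versions on each end can save some head-scratching. This is a rough sketch: versions_match and the RDIFF/SSH override variables are hypothetical names, and it assumes rdiff-backup --version prints a single line such as “rdiff-backup 1.2.8” and that rdiff-backup is on the remote user’s PATH.

```shell
#!/bin/sh
# Hypothetical check: compare the rdiff-backup version on both ends.
# RDIFF and SSH default to the real commands but can be overridden (e.g. for testing).
RDIFF=${RDIFF:-rdiff-backup}
SSH=${SSH:-ssh}

versions_match() {
    port=$1
    target=$2
    # Capture stderr too, in case a version prints its banner there.
    local_v=$("$RDIFF" --version 2>&1)
    remote_v=$("$SSH" -p "$port" "$target" rdiff-backup --version 2>&1)
    echo "local:  $local_v"
    echo "remote: $remote_v"
    # Exit status 0 only when both strings are identical.
    [ "$local_v" = "$remote_v" ]
}
```

Running versions_match 2222 backups@192.168.1.40 prints both versions and returns a non-zero status if they differ.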
Write your backup script
rdiff-backup’s options are pretty easy to configure. Be sure to read the page of examples as well as the manpage as you write your backup command. Here’s my backup.sh script file for running a backup of my home directory:
#!/bin/sh
rdiff-backup --print-statistics --remote-schema 'ssh -p 2222 %s rdiff-backup --server' \
  --exclude '/home/tom/Virtual Machines' --exclude /home/tom/Videos \
  --exclude /home/tom/.gvfs --exclude /home/tom/.local/share/Trash \
  /home/tom backups@192.168.1.40::/media/backups/backups/tom-rdiff
rdiff-backup --remove-older-than 1M --remote-schema 'ssh -p 2222 %s rdiff-backup --server' backups@192.168.1.40::/media/backups/backups/tom-rdiff
The first command connects to my backup server with SSH on port 2222 and backs up my home directory, excluding some directories with files I won’t mind losing (be sure to exclude your Trash and .gvfs folders). The second command removes increments older than one month to save disk space.
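One thing to be aware of in the script above: the pruning command runs even if the backup itself fails. If you’d rather only prune after a successful run, here’s a hypothetical variant that keeps the same two rdiff-backup commands but pulls the settings into variables. The names RDIFF, SCHEMA, DEST and run_backup are my own, and RDIFF exists only so the command can be swapped out for testing.

```shell
#!/bin/sh
# Hypothetical variant of backup.sh: same two rdiff-backup commands, with the
# settings in variables and pruning skipped if the backup step fails.
RDIFF=${RDIFF:-rdiff-backup}
SCHEMA='ssh -p 2222 %s rdiff-backup --server'
DEST='backups@192.168.1.40::/media/backups/backups/tom-rdiff'

run_backup() {
    "$RDIFF" --print-statistics --remote-schema "$SCHEMA" \
        --exclude '/home/tom/Virtual Machines' --exclude /home/tom/Videos \
        --exclude /home/tom/.gvfs --exclude /home/tom/.local/share/Trash \
        /home/tom "$DEST" || return 1
    # Only remove old increments once a new backup has completed successfully.
    "$RDIFF" --remove-older-than 1M --remote-schema "$SCHEMA" "$DEST"
}
```

To use it as backup.sh, add a final line that calls run_backup; the script’s exit status then reflects whichever step failed.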
Restoring files
rdiff-backup doesn’t require anything special to restore files from the latest backup; just browse to the remote folder and all your files will be there. If you need to, you can use rdiff-backup to recover files as they were on dates in the past. Check the links I posted in the previous section for more on how to do this.
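As a starting point, here’s a sketch of two helpers built on rdiff-backup’s classic --list-increments and -r (--restore-as-of) options. The function names, the RDIFF override and the example file Documents/notes.txt are hypothetical, and the option syntax may differ between rdiff-backup releases, so check the manpage for the version you have installed. Restoring into a path that doesn’t exist yet is safest, since rdiff-backup may refuse to overwrite an existing file.

```shell
#!/bin/sh
# Hypothetical restore helpers using rdiff-backup's --list-increments and
# -r/--restore-as-of options. RDIFF can be overridden (e.g. for testing).
RDIFF=${RDIFF:-rdiff-backup}
SCHEMA='ssh -p 2222 %s rdiff-backup --server'
REPO='backups@192.168.1.40::/media/backups/backups/tom-rdiff'

# Show the dates of every increment stored in the backup repository.
list_increments() {
    "$RDIFF" --remote-schema "$SCHEMA" --list-increments "$REPO"
}

# restore_file PATH_IN_HOME AGE TARGET
# AGE uses rdiff-backup's time format, e.g. 3D (three days ago) or 2W.
restore_file() {
    "$RDIFF" --remote-schema "$SCHEMA" -r "$2" "$REPO/$1" "$3"
}
```

For example, restore_file Documents/notes.txt 3D /tmp/notes.txt would fetch that file as it was three days ago.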
Automate it
I’ve used cron to schedule my backup to run at 2 a.m. every Sunday, when I’m not going to be on the computer. Edit your user’s cron file with this command:
crontab -e
Here’s my cron line for backing up. See the Ubuntu documentation page for help with writing your own. I’ve also redirected stdout from the backup script to a log file so I can watch the rdiff-backup statistics.
0 2 * * 0 /home/tom/backup.sh >> backup.log
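The cron line above only captures stdout, so rdiff-backup error messages won’t end up in the log, and there’s no record of when each run happened. A small wrapper can add both. This is a hypothetical sketch: the wrapper, the logged_backup function and the LOG/BACKUP_SCRIPT variables are my own names, and the default paths assume the setup described in this post.

```shell
#!/bin/sh
# Hypothetical cron wrapper: stamps each run in the log and records stdout,
# stderr and the exit status of backup.sh. Adjust the paths to your setup.
LOG=${LOG:-/home/tom/backup.log}
BACKUP_SCRIPT=${BACKUP_SCRIPT:-/home/tom/backup.sh}

logged_backup() {
    echo "=== backup started $(date '+%Y-%m-%d %H:%M:%S') ===" >> "$LOG"
    # Send stderr to the log as well, so failures are visible later.
    "$BACKUP_SCRIPT" >> "$LOG" 2>&1
    status=$?
    echo "=== backup finished with exit status $status ===" >> "$LOG"
    return "$status"
}
```

Saved as an executable script that ends with a call to logged_backup, it could be scheduled in place of backup.sh with no redirection needed on the cron line.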
I’ve just finished setting up this new backup system, so I’ll update this page if I find that I need to make changes. What software are you backing up your Linux systems with? I’d be interested to hear how many of you are using rdiff-backup as well.
Archived Comments
Chris Peplin
I switched my rsync-based backup system to rdiff-backup about 3 months ago for the features you’ve described, but I’ve just had to switch back. I’m wondering if you will have any of these same issues:
rdiff-backup was extremely slow on large numbers of files. The initial backup of my music directory (not even copying any files over the network, just setting up the rdiff-directory for around 35k files) took over 48 hours.
rdiff-backup metadata tended to get corrupted, and I have no idea why. I would then have to delete the entire metadata folder and repeat that painful 48 hour initial backup.
rdiff-backup didn’t handle a connection dropping in the middle of a backup very well. It doesn’t save the progress at all, and has to completely wipe the partial backup and start again the next time around.
Tom
rdiff-backup is reporting 68k files and 20 GB for my home directory backup. The initial backup took only about two hours. Perhaps there’s something wrong with your filesystem.
Did you try --check-destination-dir for repairing a failed backup?
Tommy
This is a great guide. I’m a real fan of rdiff-backup, and so it’s nice to have a systematically put-together tutorial to recommend to people.
You might also be interested in Duplicity, rdiff-backup’s non-identical twin brother. And if you like Duplicity, there’s a new front end out there for it: http://www.oak-tree.us/blog/index.php/science-and-technology/time-drive
Tom
I also tried out Duplicity. I liked its integration with GVFS, but I didn’t like how it stored data in hundreds of archives rather than a simple mirror like rdiff-backup.
time-drive does look very slick.
Tommy
I totally agree with that: a mirror is much easier to work with, provided you have diff increments. But that requires rdiff-backup on both the source and the destination, which is a luxury some of us have and others don’t. The advantage of Duplicity is that it can do the math without a version of itself running on another machine.
jimcooncat
I’ve been using rsnapshot for several years now. I used this in favor of rdiff-backup mainly because it was so simple to dig into the archives – no special command needed. Truth is, though, I seldom venture beyond the most recent archive.
It’s saved us several times, especially since what it’s backing up doesn’t have a recycle bin. About 90% of the time I’m just restoring a deleted file, and about 10% of the time I’m resurrecting a file that’s been overwritten by unwanted text.
I run it in the background on my work machine every four hours, and never notice a slowdown on my other work when it’s running.
Tom
Thanks, I’ll have to try rsnapshot as well.
Matt
I found rsnapshot (http://rsnapshot.org/) to be an excellent backup program; it does something very similar to rdiff-backup.
I’ve been running rsnapshot now across about a dozen machines for over a year – and it’s saved my bacon a number of times!
You can read a comparison of the two at:
http://www.saltycrane.com/blog/2008/02/backup-on-linux-rsnapshot-vs-rdiff/
Mosh
I picked rdiff-backup for an office system I was setting up 2 years ago and it’s worked a treat ever since. The only downside is that file restoration isn’t a simple “drag/drop” or copy and there’s nobody on site there to recover things if it all goes wrong.
However, that’s why they pay me a retainer and I have remote access to the system ;)
kyio
http://backuppc.sourceforge.net/
I find BackupPC quite useful because you install it on your backup machine. That PC connects to each system to back up its data, so your desktop does not have high resource usage.
Rick
A few months ago, I waded through all of the above rsync options and ended up using rsnapshot. I don’t know that it is necessarily better than any other rsync option, but I find that it is simple and works great! Before that, I needed to do a simple network backup from a Windows machine to a Linux server and found that DeltaCopy worked well.
Now that I need to back up from Linux to Linux, rsnapshot does the trick!
lucidsystems
Great introduction to using rdiff-backup.
You may also be interested in LBackup: http://www.lbackup.org.
Harper04
I would recommend reading this too: http://www.mikerubel.org/computers/rsync_snapshots/