In previous articles, I’ve described a basic backup strategy for running an OS X-based server; now it’s time to take a look at backing up important information to another server in a geographically distinct location.
Backing Up the Whole Server
In principle, there’s nothing to stop you from backing up your entire server to another location. It’s possible to set up Time Machine, for example, so that your server is backed up continuously to another machine in some other data centre. You can do this by establishing a VPN connection from your server to another network and backing up to a Time Capsule on that network, or by forwarding port 548 (for file sharing) on the remote network so as to expose a Time Capsule directly to the internet.
However, in my view, Time Machine over the internet does not make a particularly good solution. In addition to problems of speed and security during ordinary use, the issue of speed during a restore is a real killer. Time Machine is slow enough to restore from a local hard drive; do you really want to attempt the feat over an internet connection?
At a certain point of potential downtime, it becomes better to rebuild the server environment from scratch and restore only the irreplaceable data, rather than attempting to restore a whole bunch of stuff which does not have to be restored and can instead be rebuilt. This is the case whether you’re considering Time Machine specifically, or rsync, or any other option for backing up the entire server. The only thing that really matters is where you might judge that crucial point of downtime: are you happy with a 12-hour restore, or a 5-hour restore, or 1 hour, or 15 minutes, or…?
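To put rough numbers on that judgement, here is a minimal sketch of the arithmetic, assuming the restore is limited only by the sustained download speed of the link. The function name restore_hours and the sizes and speeds below are illustrative assumptions, not measurements from any real server.

```shell
#!/bin/bash
# restore_hours SIZE_GB SPEED_MBPS
# Rough estimate only: hours needed to pull SIZE_GB gigabytes over a link that
# sustains SPEED_MBPS megabits per second (1 GB is taken as 8000 megabits).
# Ignores protocol overhead, latency, and disk speed.
restore_hours() {
    awk -v gb="$1" -v mbps="$2" 'BEGIN { printf "%.1f\n", gb * 8000 / mbps / 3600 }'
}

restore_hours 100 20    # a hypothetical 100 GB full-server restore at 20 Mbit/s
restore_hours 10 20     # a hypothetical 10 GB of irreplaceable data, same link
```

Under those assumptions, the full restore comes out at roughly 11 hours, while the irreplaceable data alone takes a little over an hour, which is the kind of gap that makes rebuilding the rest from scratch attractive.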
Backing Up Only What Cannot Be Rebuilt
So, if you decide against backing up the entire server remotely, you’re left with a choice of what really needs to be backed up as essentially irreplaceable data. It obviously depends very much on your particular setup, but some of the things you might consider backing up remotely could include:
- the whole of
- main configuration files, like
- PHP includes
- MySQL databases
- customised web apps or Apache includes (if not backing up
- and more…
Remote Backup Tools and Destinations
In terms of where to put your backups and how to get them there (and back), there are plenty of options. One obvious candidate is Amazon S3, while another is any other server which you might already have access to for other purposes; another is a dedicated storage service like BQ Backup, which I’ve used myself.
Unfortunately, what seems at first glance like an obviously great option, Amazon S3, is actually not so great at all, in my view. The problem is that you cannot simply rsync or SFTP your data to Amazon S3: you need a client which speaks the Amazon protocol, and in every case I am aware of, these tools do not quite meet the needs of a server environment. For example, I would love to be able to recommend Arq, with its on-the-fly encryption and intelligent quota management. I think it is a terrific remote backup tool, and I use it myself on my main work machine. Unfortunately, in its present incarnation it cannot run without a user being logged into the GUI; since it is good practice not to leave someone continuously logged into a GUI on a server, this renders it entirely unsuitable for server use.
You might think it would be easy enough just to set up one of the main Mac SFTP clients to sync regularly to Amazon S3, but of the three leading Mac clients, Yummy FTP does not speak the Amazon protocols, Interarchy speaks the protocols but has problems even getting connected to S3, and Transmit has significant bugs in its current handling of S3 connections (some of which are potentially data-destroying) that render it entirely unusable for the purpose. As with Arq, these clients also run into trouble when you don’t want to leave someone logged into the server on a continuous basis.
Several rsync-like tools attempt to implement incremental backups to S3 (I’m thinking of duplicity and the like), and a range of MacFUSE-based tools make it possible to mount S3 buckets as volumes, which you might think would make it easy to rsync to them. Unfortunately, in my limited testing, each brings with it significant hassles, risks, or outright bugs that are sufficiently worrying that I would not want to trust them as my last lifeline in case of catastrophic server failure.
This leaves another server of your own or a dedicated storage service as the main options, or perhaps an Amazon cloud-based server instantiated for the purpose of acting as an rsync-speaking front end to Amazon’s Elastic Block Store. (This latter option seems worth exploring in more detail; I just haven’t done it yet. I’m already taking advantage of Amazon’s free tier to run a micro instance, but if I still had a free one to work with, rsync to it would be top of my list.) And how to get the backups there? Good old-fashioned (and rock-solid)
For my own uses, rsync is the best option right now, but even here, there are some important caveats to keep in mind. The biggest of these is not to be lulled into a false sense of security by the fact that rsync can be set to preserve permissions. Yes, it is true that rsync can preserve permissions, but unless you are running as root on two different machines with exactly the same set of users and groups, you will not be able to rsync to the backup destination and then rsync back to restore and have everything be ready to roll. When you’re connecting to a dedicated storage service, for example, you are connecting as a specific user, and your rsync-ed files are going to be owned by that user, regardless of what you do; when you rsync them back, they’re not going to magically start being owned by root or owned by the web process or owned by other individual users on your machine again.
You can get around this problem by packaging things up first (for example, into compressed encrypted archives) before placing them on the remote server. Of course, this does eliminate the advantages of incremental copying made possible by rsync in the first place, which is not by any means a trivial tradeoff.
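As a minimal sketch of that packaging step, the following assumes tar for the archive and OpenSSL 1.1.1 or later for the encryption (older builds lack the -pbkdf2 option and would need a different invocation). The function names, the temporary paths, the passphrase file, and the archives/ folder on the remote side are all hypothetical; this is one possible approach, not a tested recipe.

```shell
#!/bin/bash
# make_archive SRC_DIR OUT_FILE
# Bundle SRC_DIR into a gzip-compressed tar archive. Ownership and permissions
# are recorded inside the archive, so they no longer depend on which user owns
# the file on the remote server. They are reapplied when the archive is later
# extracted as root (e.g. with tar -xzpf).
make_archive() {
    local src="$1" out="$2"
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")"
}

# encrypt_archive IN_FILE OUT_FILE PASSPHRASE_FILE
# Assumption: OpenSSL 1.1.1 or later, which supports -pbkdf2.
encrypt_archive() {
    openssl enc -aes-256-cbc -salt -pbkdf2 -in "$1" -out "$2" -pass "file:$3"
}

# Hypothetical usage (paths and remote folder are placeholders):
#   make_archive /Library/Server /tmp/Server.tar.gz
#   encrypt_archive /tmp/Server.tar.gz /tmp/Server.tar.gz.enc /path/to/passphrase.txt
#   rsync -aq -e ssh /tmp/Server.tar.gz.enc firstname.lastname@example.org:archives/
```

Because the encrypted archive changes wholesale from one run to the next, rsync ends up transferring the whole file each time, which is the tradeoff described above.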
A Simple Rsync Example for Rotating Incremental Backups
If you do use rsync, you can make it happen with a shell script along the following lines:
    #!/bin/bash

    ## Note the day of week both in full form, e.g., Sunday, and in digit form, e.g. 0
    _dow="$(date +'%A')"
    _now="$(date +'%w')"

    ## Upload server dir to remote server folder with day of week appended, except for exclusions, using compression
    rsync -aqz --exclude-from '/path/to/exclusions.txt' --delete -e ssh /Library/Server/ firstname.lastname@example.org:Server-"$_now-$_dow"/
(Note that the rsync command goes all on one line, even though it may be broken onto two or more lines when displayed here.)
This will create a seven-day rotation of /Library/Server backups, minus any exclusions you’ve listed in exclusions.txt (such as cache files you won’t need in the event of a restore), and labelled like so: Server-0-Sunday, Server-1-Monday, and so on through Server-6-Saturday.
Such a shell script can be set to trigger daily using launchd (see “Replacing Cron Jobs With Launchd on OS X”), providing an ongoing incremental backup of the entire /Library/Server directory that is retained for seven days. The shell script can be modified to keep backups for longer, to keep an additional weekly backup, etc.
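One way such a modification might look is sketched below: the daily folder is chosen exactly as before, and on Sundays a second copy goes to one of four rotating weekly folders. The function names, the Server-weekly-N naming, and the four-week cycle are my own assumptions for illustration; the rsync options are the same ones used in the script above.

```shell
#!/bin/bash
# backup_targets DOW_DIGIT DOW_NAME ISO_WEEK
# Print the destination folder name(s) for one run: always the daily folder,
# plus a weekly folder when the day-of-week digit is 0 (Sunday).
backup_targets() {
    local now="$1" dow="$2" week="$3"
    echo "Server-$now-$dow"
    if [ "$now" -eq 0 ]; then
        # 10# forces base 10 so that "08" and "09" are not read as octal
        echo "Server-weekly-$(( 10#$week % 4 ))"
    fi
}

# run_backup: upload /Library/Server to each target for today's date.
# The folder names contain no whitespace, so plain word splitting is safe here.
run_backup() {
    local target
    for target in $(backup_targets "$(date +'%w')" "$(date +'%A')" "$(date +'%V')"); do
        rsync -aqz --exclude-from '/path/to/exclusions.txt' --delete -e ssh /Library/Server/ firstname.lastname@example.org:"$target"/
    done
}

# To perform the backup, call run_backup at the end of the script.
```

Keeping the naming logic in its own function means the rotation can be checked by hand (for example, backup_targets 0 Sunday 08) without touching the remote server.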
All material on this site is carefully reviewed, but its accuracy cannot be guaranteed, and some suggestions offered here might just be silly ideas. For best results, please do your own checking and verifying. This specific article was last reviewed or updated by Greg on .