backups wisdom

Monitor Your Backups!

In which I find out things have gone awry

Hey you! Yes, you! The one reading this. You have backups, right? Go and check that they (i) actually exist, (ii) are being taken at the right frequency, and (iii) actually work. This is important; I’ll wait.

borg: Great for Backing Up

I’ve been using borg for backups for a couple of years now. It’s great: it does deduplication (saving tons of space!), only backs up what has changed (efficient! incremental!), and is somehow fun to use while doing so.

I wrote a script to take the backups, run as a systemd service each hour. All was well; it did error detection and emailed me when a backup failed.
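The script and units themselves aren’t shown here, but a minimal sketch of an hourly service/timer pair looks something like this (the unit names match the output below; the script path is hypothetical):

```ini
# periodic-backup.service (sketch)
[Unit]
Description=Take a periodic backup of directories

[Service]
Type=oneshot
ExecStart=/usr/local/bin/periodic-backup.sh

# periodic-backup.timer (sketch)
[Unit]
Description=Run periodic-backup every hour

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
```

The timer, not the service, is what gets enabled (`systemctl enable --now periodic-backup.timer`).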

But I had occasion to check on the backups a couple of days ago, and the latest one was from January. My first thought was disk space, but there was enough (albeit getting close to the limit). So I then checked the systemd output:

$ systemctl status periodic-backup
● periodic-backup.service - Take a periodic backup of directories
     Loaded: loaded (/usr/lib/systemd/system/periodic-backup.service; enabled; vendor preset: disabled)
     Active: inactive (dead) since Wed 2020-02-12 12:03:06 GMT; 45min ago
TriggeredBy: ● periodic-backup.timer
   Main PID: 1168530 (code=exited, status=0/SUCCESS)

Feb 12 12:03:02 zeus systemd[1]: Started Take a periodic backup of directories.
Feb 12 12:03:06 zeus systemd[1]: periodic-backup.service: Succeeded.

So the job was running and… succeeding, but not backing up?

The next step in diagnosis was to run the script manually and make sure it still worked. The script didn’t error, but it took a long time to complete; longer than a straightforward case of “large increment to back up since January” would explain.

So I broke it down even further, and ran the borg command as written in the script. I got a prompt:

Warning: The repository at location ssh://bertieb@pandora/home/bertieb/backups/borg/zeus was previously located at ssh://pandora/~/backups/borg/zeus

Aha! It was waiting on input to proceed. One form is how the script accesses the repo; the other is how it is accessed from the command line. It’s a bit strange, as the repo clearly didn’t move, and I’m not sure why borg started treating the two differently.

Fortunately, borg has an environment var for just such an occasion: BORG_RELOCATED_REPO_ACCESS_IS_OK=yes
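Exporting it in the backup script lets borg answer the prompt itself instead of blocking forever under systemd. A sketch (the borg invocation is illustrative and commented out):

```shell
#!/bin/sh
# Export the variable so borg doesn't stop to ask about the
# "previously located at" warning when run non-interactively.
export BORG_RELOCATED_REPO_ACCESS_IS_OK=yes

# Illustrative invocation (repo path as in the warning above):
# borg create "ssh://bertieb@pandora/home/bertieb/backups/borg/zeus::{hostname}-{now}" "$HOME"
```

(An `Environment=` line in the service unit would work just as well.)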


I asked in #borgbackup on Freenode about the issue, and folks said they had used a few things for independently monitoring backups:

  • Prometheus
  • Zabbix
  • Healthchecks

I am indebted to Armageddon for mentioning the last one. While full-on monitoring with Prometheus looks interesting (especially in conjunction with Grafana), it’s way overkill for my needs. Ditto Zabbix.

Healthchecks is a relatively simple tool which implements the concept: “we expect a ping/health-check at such-and-such a frequency; if we don’t get it, alert”.
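As a sketch of the idea: Healthchecks gives each check a ping URL, which also accepts /start and /fail suffixes, so a wrapper around any backup command might look like this (the URL is a placeholder):

```shell
#!/bin/sh
# Wrap any backup command with Healthchecks pings: signal the start,
# run the command, then report success or failure.
# HC_URL is a placeholder for the check's real ping URL.
HC_URL="${HC_URL:-https://hc-ping.com/your-check-uuid}"

run_with_ping() {
    curl -fsS -m 10 --retry 3 "$HC_URL/start" > /dev/null || true
    if "$@"; then
        curl -fsS -m 10 --retry 3 "$HC_URL" > /dev/null || true
    else
        curl -fsS -m 10 --retry 3 "$HC_URL/fail" > /dev/null || true
        return 1
    fi
}
```

The useful property: if the script stops being run at all (as mine did), no pings arrive and Healthchecks alerts anyway.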

Armageddon/Lazkani’s blog has a worked example of setting up Healthchecks to work with borgmatic (a tool to simplify borg backups). The official borgmatic ‘getting started’ guide is pretty good too.
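For flavour, a minimal borgmatic config with the Healthchecks hook might look something like this (source directory, repo and ping URL are illustrative; check the borgmatic docs for the current schema):

```yaml
location:
    source_directories:
        - /home/bertieb
    repositories:
        - ssh://bertieb@pandora/home/bertieb/backups/borg/zeus

retention:
    keep_daily: 7
    keep_weekly: 4
    keep_monthly: 6

hooks:
    healthchecks: https://hc-ping.com/your-check-uuid
```

borgmatic then pings the check for you around each backup run.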


I set up Healthchecks using the linuxserver Docker image (big note: the env vars listed there are used only on creation; after that, they can be changed in the data volume / directory. That one held me up for a bit when I was trying to sort out email integration.) I have added both my pre-existing scripts and some new borgmatic backups.

Looking good!

If you use the helpful ‘crontab’ format for the period, make sure to match the timezone, or you’ll get periodic emails saying the backup has failed. Ask me how I know…


Better Backups: Decide What You’re Going To Back Up

tl;dr: Deciding what you are going to back up helps to (i) keep backup space usage minimal and (ii) inform the choice of backup program

Following on from picking a backup system in the backups series, now that you’ve picked a system, what exactly should you back up?

You could make the argument that really, what you’re going to back up is part of your requirements gathering. Frequently-changing data (eg documents) is different from a snapshot of a Windows installation is different from an archive of the family photos.

In my case, I want to back up my home directory, which is a mix of things:

  • documents of all sorts
  • code (some mine, some open source tools)
  • application configuration data
  • browser history etc
  • miscellaneous downloads

It totals less than 20 GB, most of which is split between downloads, browser data and code (around 3:1:1, according to ncdu). Some things, like documents, code and browser data, will change semi-frequently and old versions are useful; others (like downloads) will stay relatively static, and version history is not so important.

Some downloads were for a one-off specific purpose and have been removed. It would be possible to pare down further by removing some downloads and some code (wine is the largest directory in ~/code/, and I don’t remember the last time I used it) but the savings aren’t enough for me to feel it’s a priority.

Is there anything in this set of data that doesn’t need to be kept? Frequently-changing-but-low-utility files like the browser cache are worth excluding, as they will cause the (incremental) backups to grow in size. Incidentally, cache was the next largest item in the ratio above!

Some of the files will change relatively frequently, and I’d like to keep their history. I have decided that I want to keep my entire home directory, minus browser cache. This helps to inform what I need my backup program to do, and what to do with it once I decide.
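With borg (which I use elsewhere in this series), “everything in home, minus browser cache” can be expressed with exclude patterns. A sketch, wrapped in a function; the repo path and the Firefox cache location are illustrative, not canonical:

```shell
#!/bin/sh
# Back up the whole home directory, excluding browser cache.
# Repo path and cache patterns are illustrative.
backup_home() {
    borg create \
        --exclude "$HOME/.cache" \
        --exclude "$HOME/.mozilla/firefox/*/cache2" \
        "ssh://bertieb@pandora/home/bertieb/backups/borg/zeus::{hostname}-{now}" \
        "$HOME"
}
```

Other backup programs have equivalent exclude mechanisms, so the decision carries over whichever tool wins.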

backups linux

Better Backups: Pick a System

You have backups, right?

— SuperUser’s chat room motto

This started out as an intro to bup. Somewhere along the way it underwent a philosophical metamorphosis.

I’m certainly not the first person to say that backups are like insurance: they’re a bit of a hassle while you figure out which one will work best, you set them up and forget about them, and hopefully you won’t need them*.

Many moons ago, I had backups taken care of by a simple shell script. Later, this got promoted to a Python script which handled hourly, daily, weekly and monthly rotation, and saved space by using hard links (cp -al ...). It even differentiated between local and remote backups. That was probably my backup zenith, at least when time and effort are factored in.

Really, the more sensible approach is to use an existing, tried-and-tested solution rather than reinvent the wheel. So I moved to rdiff-backup, and it was good; being simple, it meant I could set up ‘fire-and-forget’ backups via cron. I was able to restore files from backups that I had set up and then forgotten about.

With the recent expansion of the fileserver ongoing, now is a good time to take stock and re-evaluate options. I have created a Xen DomU dedicated to backups (aptly called pandora) with its own dedicated logical volume. From here, I need to decide:

1) whether to keep going with rdiff-backup or switch to eg bup or borg
2) figure out if different machines could use different schedules or approaches (answer: probably); and if so, what those would be (answer: …)

I don’t want to spend too long on this — premature optimisation being the root of all evil — but the aim is to create a backup system which is:

  • robust
  • reliable
  • maintenance-minimal

*: If you do use your backups or insurance a lot, it’s probably a sign that something is going wrong somewhere