
Reinstate your backups!

(when you move them)

Context

A while back I moved my home server from Xen (as the host ‘OS’) with mdadm + LVM (for RAID 6 storage) to Proxmox with ZFS (for RAID-Z2). One of the Xen guests on the server was called pandora, which was my one-stop target for backups. I like the name: if I have to open Pandora’s box to get a backup, something has gone wrong. The transfer of many terabytes of data went fine, but I procrastinated on redoing pandora and getting my backup target back, which is something of a cardinal sin IT-wise.

You, person reading this: you have backups, right?

I created a Debian container to run the new pandora, installed borg (verifying the binary as per the README), and decided to check the repo for my main PC.
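For reference, the verification step looks roughly like this (a sketch assuming the standalone Linux binary from the GitHub releases page; the release version and file names are illustrative):

# fetch the standalone binary and its detached GPG signature
# (illustrative release; adjust for the current version)
wget https://github.com/borgbackup/borg/releases/download/1.2.7/borg-linux64
wget https://github.com/borgbackup/borg/releases/download/1.2.7/borg-linux64.asc

# verify the signature against the binary, then install it
# (assumes the borg signing key is already in your keyring)
gpg --verify borg-linux64.asc borg-linux64
chmod +x borg-linux64
mv borg-linux64 /usr/local/bin/borg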

Permissions on a local directory bind mount in an unprivileged container on Proxmox

On running borg info, I was greeted with:

borg.locking.LockFailed: Failed to create/acquire the lock /home/robert/backups/borg/zeus/lock.exclusive ([Errno 13] Permission denied: '/home/robert/backups/borg/zeus/lock.exclusive.k1f9gcrd.tmp').

Oh right. No permission. But why? I run pandora, the backup container, as unprivileged with a local directory bind mount, as I figure that gets as close to native performance as possible. However, there is a caveat:

However you will soon realise that every file and directory will be mapped to “nobody” (uid 65534), which is fine as long as:

– you do not have restricted permissions set (only group / user readable files, or accessed directories), and
– you do not want to write files using a specific uid/gid, since all files will be created using the high-mapped (100000+) uids.
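In practice that looks something like this from inside the container (output illustrative; 65534 is the kernel’s overflow “nobody” id):

# host-owned files on the bind mount show up as uid/gid 65534
root@pandora:~# ls -ln /home/robert/backups
total 4
drwxr-xr-x 3 65534 65534 4096 Feb 20 16:00 borg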

That’s not going to be useful for a read/write scenario like backups. I run borg as an unprivileged user, uid 1000, so I needed to create a mapping using lxc.idmap as described there. I ran into some problems (as others have done) trying to set up the mapping, getting errors like Feb 20 16:27:52 pandora sshd[284]: fatal: setgroups: Invalid argument [preauth] when trying to authenticate over ssh, or even problems creating the container: lxc_map_ids: 3668 newuidmap failed to write mapping "newuidmap: uid range [0-1000) -> [100000-101000) not allowed".

The thread there links to a useful script called proxmox-lxc-idmapper, written by Doug Dimick. Running it with the mappings you want to create gives you the entries you should add, for example:

robert@zeus:~/scripts$ python3 run.py 1000

# Add to /etc/pve/lxc/<container_id>.conf:
lxc.idmap: u 0 100000 1000
lxc.idmap: g 0 100000 1000
lxc.idmap: u 1000 1000 1
lxc.idmap: g 1000 1000 1
lxc.idmap: u 1001 101001 64535
lxc.idmap: g 1001 101001 64535

# Add to /etc/subuid:
root:1000:1

# Add to /etc/subgid:
root:1000:1

I wanted to map uid/gid 1000, so this is what I had to add and where I had to add it.
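Putting it together, the relevant bits of the container config end up looking something like this (a sketch; the mp0 host path is my guess at how the bind mount would be declared here):

# /etc/pve/lxc/101.conf (excerpt)
# bind-mount the backups dataset from tank into the container
mp0: /tank/backups,mp=/home/robert/backups
# ...plus the six lxc.idmap lines generated above

The container then needs a stop/start (pct stop 101 && pct start 101) for the new mapping to take effect.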

‘Cache is newer than repository’

After getting pandora up and able to read/write the backups area of the storage pool (tank), I retried borg info. However, it complained: Cache is newer than repository - do you have multiple, independently updated repos with same ID?. I don’t think I do, but the always-helpful Thomas Waldmann suggested that I might have backed up to the old location after moving the data to the new ZFS storage array. I don’t think I did: data integrity is important, and I believe I had shut down all access to the old mdadm/LVM array during and after the copy; but I cannot definitively rule it out.
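For reference, the usual way out of that state, if you are certain there is only one live copy of the repo, is to drop the client-side cache and let borg rebuild it (borg 1.x):

# delete only the local cache for this repo; the repository
# itself is untouched, and the cache is rebuilt on next access
borg delete --cache-only /home/robert/backups/borg/zeus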

Watching a check

So instead of running the command on the PC that I back up, I ran borg info on pandora itself (herself?). It seemed to do nothing, but…

root@pandora:~# du -sh /home/robert/.cache/borg/
8.3G	/home/robert/.cache/borg/
root@pandora:~# du -sh /home/robert/.cache/borg/
9.2G	/home/robert/.cache/borg/
root@pandora:~# du -sh /home/robert/.cache/borg/
12G	/home/robert/.cache/borg/
root@pandora:~# du -sh /home/robert/.cache/borg/
12G	/home/robert/.cache/borg/
root@pandora:~# du -sh /home/robert/.cache/borg/
13G	/home/robert/.cache/borg/
root@pandora:~# du -sh /home/robert/.cache/borg/
14G	/home/robert/.cache/borg/
root@pandora:~# du -sh /home/robert/.cache/borg/
14G	/home/robert/.cache/borg/
root@pandora:~# du -sh /home/robert/.cache/borg/
14G	/home/robert/.cache/borg/

The cache was growing!
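(Rather than re-running du by hand, watch would have done the polling for me:)

# re-run the size check every 10 seconds
watch -n 10 du -sh /home/robert/.cache/borg/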


Quick tip: Live-growing an LXC/Proxmox container’s storage

The cache was eating the storage, so I quickly increased it with pct:

root@artemis # pct resize 101 rootfs 20G
             # ^^^
             # proxmox container toolkit
             #     ^^^^^^
             #     resize a container mount point
             #            ^^^
             #            container id
             #                ^^^^^^
             #                mount point to resize
             #                       ^^^
             #                       new size

You can read the pct manpage online. Thanks to fabian on the Proxmox forums for the tip.
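A quick df from inside the container confirms the extra space arrived (output illustrative; the filesystem name depends on the underlying storage):

root@pandora:~# df -h /
Filesystem                    Size  Used Avail Use% Mounted on
rpool/data/subvol-101-disk-0   20G   14G  6.0G  70% /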


Back to the check! As Thomas Waldmann suggested there might be some repo corruption, I interrupted the borg info command and ran borg check instead. This also ran without reporting anything, but… I had a way to check on it.

A while back I ported a Grafana dashboard for ZFS iostats to InfluxDB v2; I realised I could use this to see whether borg check was actually doing anything IO-wise.

Let’s see:

[image: Grafana dashboard of ZFS iostats during the check]

Oh, oh yeah it is! Hitting the disks at ~400 MB/s.
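(The same numbers are available without a dashboard, straight from the host shell; zpool iostat takes an optional pool name and a sampling interval in seconds:)

# per-pool I/O statistics for tank, refreshed every 5 seconds
root@artemis:~# zpool iostat tank 5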

We can actually observe what it does:

20:09 <ThomasWaldmann> bertieb: repo check phase (1st phase) does a lot of I/O locally on the server. archives check has a lot of client activity and a bit less server activity.
20:10 <ThomasWaldmann> archives check is the 2nd phase.

and the visualisation:

[image: the two phases of borg check visible in the iostats]

Neat!
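Handily, the two phases can also be run separately in borg 1.x, which is useful if you only want the server-side part:

# phase 1 only: repository consistency check (heavy server I/O)
borg check --repository-only /home/robert/backups/borg/zeus
# phase 2 only: archives check (more client activity)
borg check --archives-only /home/robert/backups/borg/zeus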

The bottom line is: if you update your backup system, reinstate it soon after!

Comments


Yeah, that would have been interesting! Anecdotally, I never saw much above 150 MB/s on the previous array; but then I wasn’t monitoring them (does a similar thing exist for mdadm/LVM?) and didn’t have access to visualisations.

I’ve found that keeping an eye on the iostats can, if you’re lucky, catch problems before they occur:

[image: unexpected IO]

Seeing that level of activity when there should have been none caught my attention!
