Better Backups: Decide What You’re Going To Back Up

tl;dr: Picking what you are going to back up helps to (i) keep backup space usage minimal and (ii) inform the choice of backup program

Following on from picking a backup system earlier in the backups series: now that you have a system, what exactly should you back up?

You could make the argument that really, what you’re going to back up is part of your requirements gathering. Frequently-changing data (eg documents) is different from a snapshot of a Windows installation is different from an archive of the family photos.

In my case, I want to back up my home directory, which is a mix of things:

  • documents of all sorts
  • code (some mine, some open source tools)
  • application configuration data
  • browser history etc
  • miscellaneous downloads

It totals less than 20 GB, most of which is split between downloads, browser data and code (around 3:1:1, according to ncdu). Some things like documents, code and browser data will change semi-frequently, and old versions are useful; others, like downloads, will stay relatively static, and version history is not so important.
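
If you want to run a similar audit yourself, ncdu gives an interactive breakdown; a rough non-interactive equivalent with plain du looks like this (a sketch; adjust the path to taste):

# show the ten largest top-level directories under $HOME, human-readable
du -h --max-depth=1 ~ | sort -rh | head -n 10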

Some downloads were for a one-off purpose and have since been removed. It would be possible to pare down further by removing more downloads and some code (wine is the largest directory in ~/code/, and I don’t remember the last time I used it), but the potential savings aren’t large enough for me to treat it as a priority.

Is there anything in this set of data that doesn’t need to be kept? Frequently-changing-but-low-utility files like browser cache are worth excluding, as they will cause the (incremental) backups to grow in size. Incidentally, cache was the next largest item in the ratio above!

Some of the files will change relatively frequently, and I’d like to keep their history. I have decided that I want to keep my entire home directory, minus browser cache. This helps to inform what I need my backup program to do, and what to point it at once I’ve decided.
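
As a rough sketch of that exclusion, assuming an rsync-style tool and that the browser keeps its cache under ~/.cache (true for Firefox and Chromium on most Linux setups; paths illustrative):

# copy the home directory, skipping the cache directory entirely
rsync -a --exclude='.cache/' ~/ /mnt/backup/home/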

Quick Hacks: A script to import photos to month-based directories (like Lightroom)

tl;dr: A bash script written in 15 minutes imports files as expected!

I was clearing photos off an SD card so that I have space to photograph a friend’s event this evening. Back on Windows, I would let Lightroom handle imports. Darktable is my photo management software of choice, but it leaves files where they are during import:

Importing a folder does not mean that darktable copies your images into another folder. It just means that the images are visible in lighttable and thus can be developed.

I had photos ranging from July last year until this month, so I needed to put them in directories from 2017/07 to 2018/02. Looking up metadata, then copying and pasting by hand, seemed like a tedious misuse of my time*, so I wrote a little script instead. It is not robust, due to some assumptions (eg that the ‘year’ directory already exists), but it got the job done.

#!/bin/bash
# - import from (mounted) sd card to directories based on date

# (values illustrative; adjust to your card's mount point and layout)
CARD_BASEDIR="/media/sdcard"
PHOTO_PATH="DCIM"
TARGET_BASEDIR="$HOME/photos"

function copy_file_to_dir() {
    if [ ! -d "$2" ]; then
        echo "$2 does not exist!"
        mkdir "$2"
    fi
    cp "$1" "$2"
}

function determine_import_year_month() {
    #echo "exiftool -d "%Y-%m-%d" -S -s -DateTimeOriginal $1"
    yearmonth=$(exiftool -d "%Y/%m/" -S -s -DateTimeOriginal "$1")
    echo "$yearmonth"
}

printf "%s/%s\n" "$CARD_BASEDIR" "$PHOTO_PATH"

i=0
find "$CARD_BASEDIR/$PHOTO_PATH" -type f | while read -r file; do
    ym=$(determine_import_year_month "$file")
    copy_file_to_dir "$file" "$TARGET_BASEDIR/$ym"
    if (( i % 10 == 0 )); then
        echo "Processed file $i ($file)"
    fi
    i=$((i + 1))
done


This uses exiftool to extract the year and month (in the form YYYY/MM), and that is used to give a target to cp.

The enclosing function checks whether the directory exists ([ ! -d "$2" ]) before copying. Using rsync would have achieved the effect of auto-creating a directory if needed, but that i) involves another tool, ii) probably slows things down slightly due to invocation time, and iii) writing it this way let me remind myself of how to check for directory existence.
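
Incidentally, a small tweak to copy_file_to_dir would remove the ‘year directory must already exist’ assumption entirely (a sketch, not what the script above does):

# mkdir -p creates the directory and any missing parents, and does nothing if it already exists
mkdir -p "$2" && cp "$1" "$2"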

I still occasionally glance at how to iterate over files in bash, even though there are other ways of doing so!
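
For reference, the pattern I keep looking up, in a form that also copes with spaces in filenames (a generic sketch, not tied to this script):

dir="${1:-.}"   # directory to scan (defaults to the current directory)
# -print0 and read -d '' delimit filenames with NUL, so whitespace is harmless
find "$dir" -type f -print0 | while IFS= read -r -d '' f; do
    echo "$f"
done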

There is also a little use of modulo in there to print some status output.

Not pretty, glamorous or robust but it got the job done!

*: Golden rule: leave computers to do things that they are good at

QA: How can I stop the on-screen volume indicator from showing every ~30s?

Note: This is a backup of a QA over at Super User which was auto-deleted, so is preserved here for posterity. Users with >10k rep can see the QA using the link.

This question was one of mine, which didn’t receive any attention, sadly!

I am using crunchbangplusplus (#!++, cbpp) Linux, a Debian-based lightweight distro which uses xfce4-notifyd to provide desktop notifications. One such notification is the humble volume indicator:

the notification for volume, isn't it lovely?

The indicator pops up in response to changes in volume and muting/unmuting. This is grand, but when I have vlc running, the volume indicator pops up every 30-45 seconds, which is rather distracting.

Some searching led me to a crunchbang forum thread about disabling the volume indicator, but I don’t fancy losing all notifications just to rid myself of this turbulent volume display.

It did however bring me to xfce4-notifyd-config:

xfce4-notifyd-config dialog: lovely, but useless here

but unfortunately it doesn’t have an option to configure individual notifications. I also checked the volume mixer (PNmixer) preferences:

a triptych of volume preferences

but nothing of help there.

Interestingly, I have observed that when the volume indicator shows, it jumps from one volume (vlc’s?) to another (the system volume?). It also doesn’t happen if both vlc and the system volume match at 100%. Since under Linux vlc can set the system volume, I wonder if there’s a conflict here.


Volume notification appears every ~30s when vlc is running: why, and how can I stop it?

A long gif for the patient:

a very long gif of the notification, captured using silentcast

QA: Identifying multimedia connectors

Note: This is a backup of a QA over at Super User which was auto-deleted, so is preserved here for posterity. Users with >10k rep can see the QA using the link.

pentix asked:

I need some help identifying a connector which at first I thought was an HDMI Type A connector.

The unknown connector is black on both sides, but has a small arrow pointing to the plug on one side. It is a short cable converting SCART to Unknown.

Mystery Connector

Looks like DFP (VESA Digital FlatPanel).



QA: Is there a way to make a backup of the difference with DD or PV?

Note: This is a backup of a QA over at Super User which was auto-deleted, so is preserved here for posterity. Users with >10k rep can see the QA using the link.

So this is an odd question, I know. Here is what I need: I ran a dd backup in Linux Mint last weekend to back up over 1.8TB of data from my server to an external 4TB HDD. This week, people have been using the server again and I haven’t had the time to upload that data to the new drives (they are server-specific and in a RAID-10 configuration, so I am unable to load the backup onto them without using the server). What I need to do is figure out how to use dd, or possibly pv, to back up the difference in the data. I want it to skip over the existing data from the backup and then back up the new data from this last week, without taking the entire weekend to do so. Is there any easy method to do this?

Continuing a dd transfer

If and only if you are confident that what you have is a contiguous piece of data that has been appended to, you can use seek= to start dd from an offset.

just use seek= to resume.

dd if=/dev/urandom of=/dev/disk/by-uuid/etc bs=512 seek=464938971

GNU dd also supports seeking in bytes, so you can resume exactly, regardless of blocksize:

dd if=/dev/urandom of=/dev/disk/by-uuid/etc bs=1M seek=238048755782 oflag=seek_bytes

Credit to frostschutz for his answer over on U&L.

But to be clear, this only really works if the data has been appended to, as that is akin to a resume. If you think people have written data like the following:

before: ABCDEF
after:  ABCDEFGHIJKLM

and you want to get the GHIJKLM bit, great, dd will do that for you. But dd operates in terms of bytes (well, records) in and out. What you likely have is analogous to:

before: ABCDEF
after:  AXCDEFGHIJKLM

and dd will not help you here!
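
If you want to test whether the append-only assumption actually holds before seeking, one simple (if slow) check is to compare the existing image against the start of the source. Paths here are illustrative:

# compare the source against the image, up to the image's length;
# any reported difference means already-imaged data has changed and seek= is unsafe
cmp -n "$(stat -c %s /mnt/backup/disk.img)" /mnt/backup/disk.img /dev/sdX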

Frame challenge: dd isn’t the right tool to accomplish what you are trying to do

You want to copy only changed data*? rsync (or something based on the rsync protocol, like rdiff-backup or rsnapshot) is what you want:

Rsync finds files that need to be transferred using a “quick check” algorithm (by default) that looks for files that have changed in size or in last-modified time. Any changes in the other preserved attributes (as requested by options) are made on the destination file directly when the quick check indicates that the file’s data does not need to be updated.

from the rsync man page
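
As a minimal sketch of what that looks like here, assuming the data is a directory tree rather than a raw block image (paths illustrative):

# the first run copies everything; later runs transfer only new or changed files
rsync -aH --delete /srv/data/ /mnt/backup4tb/data/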

Further Considerations on Backups

Without knowing exactly what you are trying to accomplish, I am hesitant to suggest you do something else; but for anyone else looking at this question and thinking ‘man, it would be useful to have a backup program that could pick up new changes easily and space-efficiently’, it’s definitely worth having a look through the backup program comparison page on the Arch wiki (the general principles are system-agnostic).

In particular, BorgBackup, bup and obnam (among others) have favourable characteristics, being generally intelligent about backups and quite space-efficient thanks to block-level deduplication. In brief, a full 1.8TB backup need not read and write the full 1.8TB of data each time, unlike with dd.
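
For illustration, a minimal BorgBackup workflow might look like the following; the repository path and options are assumptions rather than a recommendation for this exact setup:

# one-time repository setup on the external drive
borg init --encryption=repokey /mnt/backup4tb/borg-repo

# each run stores only blocks not already present in the repository
borg create --stats /mnt/backup4tb/borg-repo::backup-{now} /srv/data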

*: Some further reading on synchronisation and comparisons:

[solved] MySQL: Can’t find file ‘./db/table.frm’ (errno: 13)

tl;dr: If you’re seeing this and the table does exist, check (and fix) permissions!

I was searching my backups for a database file which contained the entries for an old ranty blog I used to have before I cancelled the domain.

Lo and behold, I had a named, dated .tgz file. Unusually, it contained a backup of the MySQL directory structure rather than a mysqldump’d set of SQL queries to reconstruct the databases and tables. No matter; I copied the database directory into /var/lib/mysql/db.

Browsing via the command-line interface indicated the database was present (SHOW DATABASES) and usable (USE db). So I tried to SELECT post_title FROM ranty_posts LIMIT 5. But no can do:

Can’t find file: ‘./db/ranty_posts.frm’ (errno: 13)

The problem is permissions: the file is there, the slightly misleading error message notwithstanding. Fortunately, it’s an easy fix. Give mysqld the ability to read the files, eg:

# chown mysql:mysql /var/lib/mysql/db/ -R

This recursively changes the user and group ownership to mysql.
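
If you want to confirm the diagnosis first: ‘errno: 13’ is the operating system’s permission-denied error, and MySQL ships a small utility, perror, to decode such codes. The ls is just a sanity check on ownership:

# decode the errno from the error message
perror 13

# inspect ownership before and after the chown
ls -l /var/lib/mysql/db/ | head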

Database and table names changed to protect the innocent

[solved] ‘Because of billing requirements, it is currently not possible to disable auto-renewal’ (1and1)

tl;dr I was within 24 hours of domain renewal, so had to phone +44 333 336 5691 to cancel

I have a bunch of domains that I used at one point but no longer have a need for. I used to handle my domains with 1and1, and they sent me a reminder about an .eu domain which was expiring soon (tomorrow). I looked at what it had been used for (a ranty blog, if you’re wondering) and decided that since it hadn’t been used since ca. 2008 it probably wasn’t worth keeping.

Side note: Getting rid of things like domains is tricky for me, but something I’m just biting the bullet and doing. Tracking my expenses has helped as well, and is worth a bit of exposition at some point.

I went to cancel via the 1and1 admin page, but couldn’t do so through either the domain itself or the contract options:

Because of billing requirements, it is currently not possible to disable auto-renewal

Clicking around in an increasingly-frustrated manner didn’t seem to be getting me anywhere, so I phoned their tech support. Their rep told me that, helpfully, since it was within 24 hours of renewal I was unable to change the option (?!); but he was able to manually cancel the domain itself.

Which he did:


Better Learning With Anki

Having recently gone through the rigmarole of yet more exams, I’m in a good position to talk about learning things*.

Spaced repetition has been around for ages, as has the wonderful program Anki. Based on the SM2 algorithm, it’s a valuable tool in the would-be reviser’s arsenal. I’m not here to convince you to use Anki in the first place, as that has been well-discussed elsewhere (eg this Reddit discussion or this recap of 10 000 flashcards).

The ’20 rules’

It’s a good idea to start off with SuperMemo’s Twenty Rules of formulating knowledge; this turned me on to cloze deletion (which can be achieved in Anki with a dedicated card type and a shortcut for adding them) and better card formulation.

Recognition and two-way connections

I use reversed cards extensively now, wherever possible, basically. They help solidify connections, eg:

Front: What is the Warburg Effect?
Back: What is the name for the process whereby malignant cells gain energy by glycolysis (rather than oxidative phosphorylation)?

So if a discussion comes up about the Warburg Effect, you know that it’s very roughly about cancer cell metabolism (unless the people discussing it are plant scientists); and if you’re thinking “hey, what’s that thing called where cancer cells get energy differently”, you can easily recall the name too.


Timeboxing

When you see hundreds of cards in the ‘due’ column, finding the motivation to sit down and work all the way through can be challenging. So, instead, set a time limit.

You can do that via the ‘timeboxing’ setting in preferences:

Knowing that you’ll only be at it for ten minutes (or less) helps you stay focused. I’ve found myself trying to get as many done in the time limit as possible…

Well, see how many you can lick in an hour. Then try to break that record.

*: Check back in a couple weeks to find out if I did indeed actually pass

Why Won’t My GIMP Python Plug-in Show Up Under Filters?

tl;dr: Did you put it in ~/.gimp-2.8/plug-ins and set the executable bit?

It’s been a while since I developed a script to automate tasks in GIMP. I figured I would do one for the repetitive tasks for creating a custom YouTube Thumbnail (more on that later perhaps). But my script wasn’t showing up in the Filters menu.

I had found the preference for setting the directory: Edit → Preferences → Folders → Plug-Ins (note that GIMP treats Python scripts as plug-ins, not scripts); with the default user folder being ~/.gimp-2.8/plug-ins. But the plug-in didn’t show up.

Restart GIMP. Still nothing.

Ask on IRC. Double check the documentation (always a good idea). Aha!

Scheme and Python plug-ins are readable text files. C-language and Python plug-in files must have permissions set to allow execution.

chmod +x later, and it registered!
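
In concrete terms (the filename here is a made-up example):

# GIMP only registers Python plug-ins that have the executable bit set
chmod +x ~/.gimp-2.8/plug-ins/youtube_thumbnail.py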

Hope this saves someone the twenty or thirty minutes it took me to find this out!