QA: Is there a way to make a backup of the difference with DD or PV?

Note: This is a backup of a QA over at Super User which was auto-deleted, so is preserved here for posterity. Users with >10k rep can see the QA using the link.

So this is an odd question, I know. Here is what I need. I ran a dd backup in Linux Mint last weekend to back up over 1.8TB of data to an external 4TB HDD from my server. This week, people have been using the server again and I haven't had the time to upload that data to the new drives (they are server-specific and in a RAID-10 configuration, so I am unable to load the backup onto them without using the server). What I need to do is figure out how to use dd, or possibly pv, to back up the difference in the data. I want it to skip over existing data from the backup and then back up the new data from this last week without taking the entire weekend to do so. Is there any easy method to do this?

Continuing a dd transfer

If and only if you are confident that what you have is a contiguous piece of data that has been appended to, you can use seek= to start dd from an offset.

Just use seek= to resume:

dd if=/dev/urandom of=/dev/disk/by-uuid/etc bs=512 seek=464938971

GNU dd also supports seeking in bytes, so you can resume exactly, regardless of blocksize:

dd if=/dev/urandom of=/dev/disk/by-uuid/etc bs=1M seek=238048755782 oflag=seek_bytes
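To make that pattern concrete, here is a self-contained sketch of a byte-exact resume using throwaway files under /tmp (all paths and sizes are invented for the demo, not from the question):

```shell
SRC=/tmp/dd-src.bin
DST=/tmp/dd-dst.bin

# Make a 1 MiB source and copy only the first 256 KiB, simulating an interrupted run.
dd if=/dev/urandom of="$SRC" bs=1K count=1024 2>/dev/null
dd if="$SRC" of="$DST" bs=1K count=256 2>/dev/null

# Measure how much already made it across, then resume from exactly that byte.
DONE=$(stat -c %s "$DST")
dd if="$SRC" of="$DST" bs=1M \
   skip="$DONE" seek="$DONE" iflag=skip_bytes oflag=seek_bytes conv=notrunc 2>/dev/null

cmp "$SRC" "$DST" && echo "resumed copy matches source"
```

The skip_bytes/seek_bytes flags let you keep a large block size for speed while still resuming at an arbitrary byte offset; conv=notrunc ensures the already-copied portion of the output is left alone.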

Credit to frostschutz for his answer over on U&L.

But to be clear, this only really works if the data has been appended to, as that is akin to a resume. If you think people have written data like the following:

before: ABCDEF
after: ABCDEFGHIJKLM

and you want to get the GHIJKLM bit, great, dd will do that for you. But dd operates in terms of bytes (well, records) in and out. What you likely have is analogous to:

before: ABCDEF
after: AXCDEFGHIJKLM

and dd will not help you here! Resuming from the old end of the data would pick up GHIJKLM but silently miss the changed byte in the middle.
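The difference can be demonstrated end-to-end with a few throwaway files (all paths and contents here are invented for the demo):

```shell
printf 'ABCDEF' > /tmp/orig.txt
cp /tmp/orig.txt /tmp/backup.txt            # the "weekend backup"

# Case 1: data was only appended -- resume from the old length and all is well.
printf 'ABCDEFGHIJKLM' > /tmp/appended.txt
OLD=$(stat -c %s /tmp/backup.txt)
dd if=/tmp/appended.txt of=/tmp/backup.txt bs=1 \
   skip="$OLD" seek="$OLD" conv=notrunc 2>/dev/null
cmp /tmp/appended.txt /tmp/backup.txt && echo "append case: backup matches"

# Case 2: data also changed in the middle -- the resumed copy keeps stale bytes.
cp /tmp/orig.txt /tmp/backup2.txt
printf 'AXCDEFGHIJKLM' > /tmp/changed.txt   # note the B -> X edit
dd if=/tmp/changed.txt of=/tmp/backup2.txt bs=1 \
   skip="$OLD" seek="$OLD" conv=notrunc 2>/dev/null
cmp /tmp/changed.txt /tmp/backup2.txt || echo "changed case: backup is stale"
```

The second cmp fails because the X at offset 1 sits before the resume point, so dd never revisits it.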

Frame challenge: dd isn’t the right tool to accomplish what you are trying to do

You want to copy only changed data*? rsync (or something based on the rsync protocol, like rdiff-backup or rsnapshot) is what you want:

Rsync finds files that need to be transferred using a "quick check" algorithm (by default) that looks for files that have changed in size or in last-modified time. Any changes in the other preserved attributes (as requested by options) are made on the destination file directly when the quick check indicates that the file's data does not need to be updated.

from the rsync man page
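In practice the incremental workflow looks like this; the /tmp demo directories below are stand-ins for the real source tree and the mounted external drive (neither is named in the question):

```shell
mkdir -p /tmp/rsync-demo/src /tmp/rsync-demo/dst
echo "weekend data" > /tmp/rsync-demo/src/old.txt

# Initial full copy; -a (archive) preserves permissions, times, symlinks, etc.
rsync -a /tmp/rsync-demo/src/ /tmp/rsync-demo/dst/

# A week later, new data appears...
echo "this week's data" > /tmp/rsync-demo/src/new.txt

# ...and the second run transfers only new.txt; old.txt passes the quick
# check (same size and mtime) and is skipped entirely.
rsync -a --delete /tmp/rsync-demo/src/ /tmp/rsync-demo/dst/
ls /tmp/rsync-demo/dst
```

Note the trailing slashes: `src/` means "the contents of src", not the directory itself. On a real multi-terabyte run you would likely add `--info=progress2` for an overall progress readout.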

Further Considerations on Backups

Without knowing what you are trying to accomplish I am reluctant to suggest you do something else, but for anyone else looking at this question and thinking 'man, it would be useful to have a backup program that could easily capture new changes in a space-efficient manner', it's definitely worth having a look through the backup program comparison page on the Arch wiki (the general principles are system-agnostic).

In particular, BorgBackup, bup and Obnam (among others) have favourable characteristics, being generally intelligent about backups and quite efficient in terms of disk space thanks to block-level deduplication. In brief, a full 1.8TB backup need not read and write the full 1.8TB of data each time, unlike with dd.
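As a rough sketch of what that deduplicated workflow looks like with BorgBackup (repository path, archive names and data are all invented here; the block bails out quietly if borg isn't installed):

```shell
command -v borg >/dev/null || { echo "borg not installed"; exit 0; }

mkdir -p /tmp/borg-demo/data
echo "weekend data" > /tmp/borg-demo/data/file.txt

# One-time repository setup (no encryption, for the demo only).
borg init --encryption=none /tmp/borg-demo/repo

# First archive: the full dataset is read and stored.
borg create /tmp/borg-demo/repo::monday /tmp/borg-demo/data

# A week later: only chunks that changed are stored; unchanged
# chunks are deduplicated against the monday archive.
echo "this week's data" >> /tmp/borg-demo/data/file.txt
borg create /tmp/borg-demo/repo::friday /tmp/borg-demo/data

borg list /tmp/borg-demo/repo
```

Each archive behaves like an independent full backup when you restore from it, but on disk the repository stores shared blocks only once.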

*: For further reading on synchronisation and tool comparisons, the Arch wiki page mentioned above is a good starting point.

Show Progress Bar During dd Copy

There are a number of ways of showing the progress of a dd copy. The easiest is sending the USR1 signal (SIGUSR1) to the dd process, like:

dd if=FILE1 of=FILE2 &
pkill -USR1 dd

But that only gives a one-off status, e.g. 12345678 bytes transferred (11.77MB) … [8.56MB/s]. Not that helpful if you want an ongoing update. You can make it periodic by using the watch command:

watch -n 10 pkill -USR1 dd
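Putting the two pieces together, here is a self-contained run you can try (the copy from /dev/zero to /dev/null and the log path are invented for the demo; GNU dd prints its status lines to stderr):

```shell
# Start a dd in the background, capturing its stderr.
dd if=/dev/zero of=/dev/null bs=1M count=2048 2>/tmp/dd-status.log &
DD_PID=$!

# Poke it for a status line; harmless if dd has already finished.
kill -USR1 "$DD_PID" 2>/dev/null || true

wait "$DD_PID"
cat /tmp/dd-status.log   # USR1 status line(s), plus dd's final summary
```

Note that signalling by PID, as here, is safer than `pkill -USR1 dd` on a busy machine, since pkill matches every process named dd.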

Still not perfect. If you want a progress bar, ETA and so forth, you're best off using pv, a utility that monitors the progress of data through a pipe. If you don't know what that means, I'd recommend reading up on the UNIX philosophy and pipes, but basically it means you can tell what dd is doing. Use it like so:

pv FILE1 | dd of=FILE2
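One refinement: when pv can't determine the total size on its own (e.g. reading a block device), telling it the size with -s enables the percentage and ETA display. A runnable sketch with a throwaway 8 MiB "image" (the paths are invented; the block exits quietly if pv isn't installed):

```shell
command -v pv >/dev/null || { echo "pv not installed"; exit 0; }

# Fake source "image" standing in for a partition.
dd if=/dev/zero of=/tmp/pv-src.img bs=1M count=8 2>/dev/null

# -s gives pv the total size, so it can show percentage and ETA.
# For a real block device you would get the size with: blockdev --getsize64 /dev/sdX
pv -s "$(stat -c %s /tmp/pv-src.img)" /tmp/pv-src.img | dd of=/tmp/pv-dst.img 2>/dev/null

cmp /tmp/pv-src.img /tmp/pv-dst.img && echo "copy verified"
```

pv writes its progress display to stderr, so it doesn't interfere with the data flowing down the pipe to dd.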

For my use, backing up a 500GB hard drive that I use with a NAS (an NSLU2 I’ve probably mentioned before), I used the following command:

sudo pv /dev/sde3 | dd of=~/tera/nslu2.img

/dev/sde3 is the data partition on the drive
~/tera/nslu2.img is the image file I want written to a terabyte-sized hard drive, mounted at ~/tera/.

As a post-script, the reason I'm backing this drive up is that my nslu2 is failing weirdly. It ran without a hitch for years, then, without apparent warning, it dismounted the drive in slot 1 and reverted to running from flash. I only figured that out because I had to log in with an old password. Trying to start it up results in it beeping once every minute or so, with the ready status lamp flashing orange; it seems to be stuck in a loop. I thought it was a temporary glitch, as when I plugged the drive into my desktop to check everything was intact the partitions showed up, and when I plugged it back into the nslu2 and turned it on it worked fine.

That was 3 nights ago. This evening it's done the same thing, except repeating those steps didn't sort it out. Running the drive through some brief SMART diagnostics and a partition check shows no problems, so I'm inclined to believe the problem lies in the nslu2 itself. I'll post again if I can sort it, but at this stage I think a re-flashing is in order. Only problem is I can't remember what firmware I flashed onto the nslu2 in the first place. D'oh!