
QA: Is there a way to make a backup of the difference with DD or PV?


Note: This is a backup of a QA over at Super User which was auto-deleted, so is preserved here for posterity. Users with >10k rep can see the QA using the link.


So this is an odd question, I know. Here is what I need. Last weekend I ran a dd backup in Linux Mint to back up over 1.8TB of data from my server to an external 4TB HDD. This week, people have been using the server again, and I haven't had the time to load that data onto the new drives (they are server-specific and in a RAID-10 configuration, so I am unable to load the backup onto them without using the server). What I need to do is figure out how to use dd, or possibly pv, to back up the difference in the data. I want it to skip over the data already in the backup and copy only the new data from this last week, without taking the entire weekend to do so. Is there any easy method to do this?


Continuing a dd transfer

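If and only if you are confident that what you have is a contiguous piece of data that has only been appended to, you can start dd from an offset: seek= skips blocks at the start of the output, and skip= does the same on the input. When copying from a disk you need both, set to the same offset; the example below writes from /dev/urandom, so seek= alone is enough.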

just use seek= to resume.

dd if=/dev/urandom of=/dev/disk/by-uuid/etc bs=512 seek=464938971

GNU dd also supports seeking in bytes, so you can resume exactly, regardless of blocksize:

dd if=/dev/urandom of=/dev/disk/by-uuid/etc bs=1M seek=238048755782 oflag=seek_bytes

Credit to frostschutz for his answer over on U&L.
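For a disk-to-image copy like the one in the question, a resume would need the same offset on both sides. The following is a minimal sketch, not the original answer's method: it uses small temporary files as stand-ins for the server disk and the external drive (the names source.img and backup.img are made up for illustration), and it assumes the interrupted copy ended on a block boundary.

```shell
# Hypothetical stand-ins for the real devices: small files in a temp directory.
work=$(mktemp -d)
src="$work/source.img"
dst="$work/backup.img"

# A "disk" of 8 blocks of 4096 bytes each.
dd if=/dev/urandom of="$src" bs=4096 count=8 2>/dev/null

# Simulate an interrupted backup: only the first 5 blocks were copied.
dd if="$src" of="$dst" bs=4096 count=5 2>/dev/null

# Work out how many whole blocks are already in the backup.
done_blocks=$(( $(wc -c < "$dst") / 4096 ))

# Resume: skip= moves the read position, seek= moves the write position.
# conv=notrunc keeps dd from truncating what is already in the backup.
dd if="$src" of="$dst" bs=4096 skip="$done_blocks" seek="$done_blocks" conv=notrunc 2>/dev/null

cmp "$src" "$dst" && echo "backup matches source"
```

With real devices, the offset would have to come from the record count dd reported when the first run stopped, since a block device has no "file size" to measure.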

But to be clear, this only really works if the data has been appended to, as that is akin to a resume. If you think people have written data like the following:

before: ABCDEF

after: ABCDEFGHIJKLM

and you want to get the GHIJKLM bit, great, dd will do that for you. But dd operates in terms of bytes (well, records) in and out. What you likely have is analogous to:

before: ABCDEF

after: AGCDMLEKFHI

and dd will not help you here!
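The two cases can be tried out on tiny files. This is only an illustrative sketch using the letters from the example above; the file names are invented, and bs=1 is used purely so that offsets can be given in single bytes.

```shell
work=$(mktemp -d)

# Last weekend's backup, and the source after a week of in-place changes.
printf 'ABCDEF' > "$work/backup.img"
printf 'AGCDMLEKFHI' > "$work/source.img"

# "Resume" from the end of the existing backup (byte offset 6).
offset=$(( $(wc -c < "$work/backup.img") ))
dd if="$work/source.img" of="$work/backup.img" bs=1 \
   skip="$offset" seek="$offset" conv=notrunc 2>/dev/null

# The first six bytes are still the stale ones from last weekend.
result=$(cat "$work/backup.img")
echo "$result"   # prints ABCDEFEKFHI

cmp -s "$work/source.img" "$work/backup.img" || echo "backup does not match source"
```

The resumed copy only picks up whatever sits past the old end of the data; every change made before that offset is silently missed.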

Frame challenge: dd isn’t the right tool to accomplish what you are trying to do

You want to copy only changed data*? rsync (or something built on rsync or its algorithm, like rdiff-backup or rsnapshot) is what you want:

Rsync finds files that need to be transferred using a “quick check” algorithm (by default) that looks for files that have changed in size or in last-modified time. Any changes in the other preserved attributes (as requested by options) are made on the destination file directly when the quick check indicates that the file’s data does not need to be updated.

from the rsync man page

Further Considerations on Backups

Without knowing what you are trying to accomplish I am reluctant to suggest you do something else. But for anyone else looking at this question and thinking ‘man, it would be useful to have a backup program I could easily get new changes with in a space-efficient manner’, it’s definitely worth having a look through the backup program comparison page on the Arch wiki (the general principles are system-agnostic).

In particular, BorgBackup, bup and Obnam (among others) have favourable characteristics: they are generally intelligent about backups and quite efficient in terms of disk space thanks to block-level deduplication. In brief, a full 1.8TB backup need not necessarily read and write the full 1.8TB of data each time, unlike with dd.
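The idea behind block-level deduplication can be sketched in a few lines of shell. This is a toy illustration and not how any of the tools above are implemented: it uses fixed 4-byte chunks and cksum as the chunk identifier, whereas real backup programs use much larger chunks and cryptographic hashes. All file and function names are invented.

```shell
work=$(mktemp -d)
mkdir "$work/store"

# Two "backups" of the same data; only the third 4-byte block differs.
printf 'AAAABBBBCCCCDDDD' > "$work/monday.img"
printf 'AAAABBBBXXXXDDDD' > "$work/tuesday.img"

# Split a file into fixed-size chunks and store each chunk under a name
# derived from its checksum, skipping chunks that are already stored.
store_chunks() {
    dir=$(mktemp -d)
    split -b 4 "$1" "$dir/chunk."
    for c in "$dir"/chunk.*; do
        id=$(cksum < "$c" | tr ' ' '-')
        [ -e "$work/store/$id" ] || cp "$c" "$work/store/$id"
    done
    rm -r "$dir"
}

store_chunks "$work/monday.img"
store_chunks "$work/tuesday.img"

stored=$(ls "$work/store" | wc -l | tr -d ' ')
echo "unique chunks stored: $stored"   # prints 5, not 8
```

The second backup adds only the one chunk that changed, which is why repeated backups of mostly unchanged data take far less space than repeated dd images.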

*: Some further reading on synchronisation and comparisons: https://wiki.archlinux.org/index.php/Synchronization_and_backup_programs
