rsync with --link-dest: Question about interrupted rsync session

bvz · 09-04-2016, 06:11 PM

Edit: Sorry for the title of this post... I can't change it... but it should read more like Looking for strategies and tips on handling interrupted, incremental rsync backups

I have a script that uses rsync to generate backups on a remote server. Here is an example of an rsync command it generates:

Code:

rsync --password-file /path/to/rsync.pwd --stats -ahi --link-dest ../backup--2016-08-05__16_21_50 --link-dest ../backup--2016-08-03__13_01_27  /Users/me --exclude Library/ --exclude .Trash/ rsync://rsync@000.000.000.000/backups/iMac_me/backup--2016-09-04__2016__14_34_30

My question is what happens when an rsync session is interrupted (and whether anyone has any strategic suggestions on how to handle that).

Specifically, if a session is interrupted I might get a backup directory that is incomplete, right? Then, if I run another rsync command (which writes to its own directory) and it ONLY looks back at the interrupted session's directory using --link-dest... it will think that it has a bunch of new files. In fact these new files are actually already backed up, they just didn't show up in the most recent interrupted session.

In my script, to try to handle that, I actually supply a bunch of --link-dest paths (up to 20 in total) to make sure it isn't backing up more than it should. I just assume (and hope) that within the last 20 backups, that one was able to run to completion.

But is this the best way of doing this? What happens if there are more than 20 interrupted backups? Should I be issuing a cp -al of the most recent backup (via ssh) first, and then having it rsync with the --delete-during option instead of the --link-dest option? And what does that imply for an interrupted rsync session using that technique?

Then there is the question of whether a particular backup directory created this way is complete. For example, if I need to restore the contents of the latest backup and it has fewer files in it than the previous one... is there a way I can find out whether it has fewer files because there were fewer files to back up (i.e. the source has had some files deleted) or that it has fewer files because the rsync session simply didn't finish.

Are there other strategies out there that people would recommend for how they use rsync to perform incremental backups?

Just looking for opinions on and/or links to different rsync strategies, and how you all handle them.

Thanks!

agillator · 09-05-2016, 12:30 PM

See if this helps:
http://stackoverflow.com/questions/1...nsfer#19804114

After the original response was posted I found this which may be an even more complete and understandable reference.
http://unix.stackexchange.com/questi...ng-interrupted

bvz · 09-05-2016, 01:06 PM

Quote:

Originally Posted by agillator

See if this helps:
http://stackoverflow.com/questions/1...nsfer#19804114

After the original response was posted I found this which may be an even more complete and understandable reference.
http://unix.stackexchange.com/questi...ng-interrupted

Thanks for the links!

I think my original question was really badly formed. I should have re-written it instead of just adding the addendum at the top.

What I was really after is what sort of strategy people use for interrupted, incremental backups that are using --link-dest (i.e. a new backup directory each session with hard links to the previous sessions). Would you re-start the backup during the next cycle, writing to the same destination? Even thought that backup was technically an "older" backup and the newer files you are writing technically should be in the newer backup directory? Or would you leave the interrupted session interrupted, and then just start a new backup but provide several --link-dest paths so that files already existing on the server are not re-copied? If you use this second method, how do you indicate that a particular backup folder is actually incomplete?

That said, I am also just now starting to deal with this situation (technically) so the links you provided will be super helpful with that!

Thanks.

agillator · 09-05-2016, 02:01 PM

I'm afraid I can't help you too much. I backup using rsnapshot which apparently uses a slightly different strategy and I honestly don't know what happens if a backup is interrupted. I'll have to investigate and find out. However, rsnapshot's method is to keep a number of directories, hourly.0 through hourly.6 for example. At backup time hourly.6 is dropped, hourly.5 is moved to hourly.6, and so on until hourly.0 is copied (with hard links) to hourly.1. All copying is done with hard links if the file is already on the backup and has not changed. Then the new backup is rsynced to hourly.0 so that only changed files are transferred. I am backing up a desktop which doesn't have as many changes as it would have in a busy office, but I am backing up client files which currently amount to something over 75GB every four hours and it normally takes only a minute or two. Now the INITIAL backup took several hours, of course, since everything was a change. But since . . . . You might take a look at rsnapshot and see if it gives you some ideas. It may deal with interruptions behind the scenes and I just don't realize it. I have had no trouble and am quite satisfied with it, and it has saved my neck a few times. I will have to investigate, though, because I am a firm believer in Murphy, know him all too well, and perhaps I have just been lucky. But then you know what they say, the Lord looks out for dumb animals and damn fools.

bvz · 09-05-2016, 10:02 PM

Quote:

Originally Posted by agillator

I'm afraid I can't help you too much. I backup using rsnapshot which apparently uses a slightly different strategy and I honestly don't know what happens if a backup is interrupted. I'll have to investigate and find out. However, rsnapshot's method is to keep a number of directories, hourly.0 through hourly.6 for example. At backup time hourly.6 is dropped, hourly.5 is moved to hourly.6, and so on until hourly.0 is copied (with hard links) to hourly.1. All copying is done with hard links if the file is already on the backup and has not changed. Then the new backup is rsynced to hourly.0 so that only changed files are transferred. I am backing up a desktop which doesn't have as many changes as it would have in a busy office, but I am backing up client files which currently amount to something over 75GB every four hours and it normally takes only a minute or two. Now the INITIAL backup took several hours, of course, since everything was a change. But since . . . . You might take a look at rsnapshot and see if it gives you some ideas. It may deal with interruptions behind the scenes and I just don't realize it. I have had no trouble and am quite satisfied with it, and it has saved my neck a few times. I will have to investigate, though, because I am a firm believer in Murphy, know him all too well, and perhaps I have just been lucky. But then you know what they say, the Lord looks out for dumb animals and damn fools.

Double checking your backups is important! I remember a time a long time ago (we are talking Windows 95 days) when I did a lot of backing up but never checked my backups. Then one day I had a disk failure and it turned out my backups were unreadable. Now, a part of my script that does backups will also periodically grab some random files off of my local machine and try to restore the same files from the backup. It will then do a checksum of each set to ensure that they are identical. (This bit has not been written yet, but it is coming). This is no guarantee of course, but it will give me a little peace of mind to know that there is a process that is double checking my backups to ensure that in some small measure there is consistency.

Thanks for the hints, and thanks for the rsnapshot suggestion. I'll look into that as well.

Beryllos · 09-08-2016, 11:06 PM

How often is your rsync interrupted? If it happens only rarely, test the exit value, which will be zero if the rsync was a complete success. If the exit value is not zero, you could try repeating the same rsync command, but not an unlimited number of times. Log the exit values so you can examine them later.

If interruptions are rare and you are able to complete the transfer on the second attempt, then the rsync command would only require one --link-dest, pointing to the most recent backup which is known to be complete.

bvz · 09-09-2016, 01:07 AM

Quote:

Originally Posted by Beryllos

How often is your rsync interrupted? If it happens only rarely, test the exit value, which will be zero if the rsync was a complete success. If the exit value is not zero, you could try repeating the same rsync command, but not an unlimited number of times. Log the exit values so you can examine them later.

If interruptions are rare and you are able to complete the transfer on the second attempt, then the rsync command would only require one --link-dest, pointing to the most recent backup which is known to be complete.

I'm not sure as I am just setting this up. I just want it to be as robust as possible.

I finally decided to do something similar to your suggestion. My script drops a small file in a known location that has the name of the backup job that it is running (it reads from a config file and can run multiple different backup jobs on differing schedules) and which contains the actual rsync command it will be running. Then it runs the rsync command. When it is done (and gets a zero exit value) it deletes this file again.

The next time it runs it first looks to see if this small file is there. If it is, that means the last rsync job was interrupted. It then just reads in the rsync command from that file and runs it again - which SHOULD just complete the backup. Only when that rsync completes with a value of 0 will it remove the file.

In this way, with the exception of the unlikely event that the script fails after it finishes, but before it deletes this file, I will always be able to complete any incomplete backups. (and even in that case, it would just run again anyway - no big deal). The only issue here is a tiny one: My backups are named for the date and time they were initiated. If a backup fails and then is re-launched during the next backup window it will be backing up "current" files to an "older" backup folder. I.e. September 10's backup might be backing up to a directory that includes the name "September 9". Ultimately not a big deal at all.

Thanks for the suggestion. If anyone is interested I will be posting a link to the script up on this forum somewhere once I have finished refactoring it (some ugly coding conventions in there at the moment). It is nice in that it is completely self contained (i.e. if you have python and rsync installed, you have all the dependencies you need), it will run any number of different backup jobs at a time (i.e. each backup job describes what directories to back up to which server), and each backup job can run on its own schedule (i.e. I have my Applications Folder on my OSX machine back up once a week, but my home directory backs up twice a day).

Beryllos · 09-09-2016, 10:35 PM

Quote:

Originally Posted by bvz

I'm not sure as I am just setting this up. I just want it to be as robust as possible...

Some things that you worried about will rarely if ever happen, and other problems will appear that you hadn't thought of, but you'll figure it out as you go.

Quote:

My script drops a small file in a known location that has the name of the backup job that it is running (it reads from a config file and can run multiple different backup jobs on differing schedules) and which contains the actual rsync command it will be running. Then it runs the rsync command. When it is done (and gets a zero exit value) it deletes this file again.

The next time it runs it first looks to see if this small file is there. If it is, that means the last rsync job was interrupted. It then just reads in the rsync command from that file and runs it again - which SHOULD just complete the backup. Only when that rsync completes with a value of 0 will it remove the file.

In this way, with the exception of the unlikely event that the script fails after it finishes, but before it deletes this file, I will always be able to complete any incomplete backups. (and even in that case, it would just run again anyway - no big deal). The only issue here is a tiny one: My backups are named for the date and time they were initiated. If a backup fails and then is re-launched during the next backup window it will be backing up "current" files to an "older" backup folder. I.e. September 10's backup might be backing up to a directory that includes the name "September 9". Ultimately not a big deal at all.

A serious weakness of this method is that it assumes too much about a non-zero exit value. If you get an exit value of zero, you know your backup is good. If it's non-zero, you know very little. You may have a partial backup, or more likely no backup at all. You should not expect it to succeed on the next attempt. Perhaps you should assume that the next attempt will fail in the same way.

The best thing to do with an rsync non-zero exit value might be to email it along with any stderr output to the administrator.

bvz · 09-09-2016, 10:47 PM

Quote:

Originally Posted by Beryllos

Some things that you worried about will rarely if ever happen, and other problems will appear that you hadn't thought of, but you'll figure it out as you go.

A serious weakness of this method is that it assumes too much about a non-zero exit value. If you get an exit value of zero, you know your backup is good. If it's non-zero, you know very little. You may have a partial backup, or more likely no backup at all. You should not expect it to succeed on the next attempt. Perhaps you should assume that the next attempt will fail in the same way.

The best thing to do with an rsync non-zero exit value might be to email it along with any stderr output to the administrator.

That makes a lot of sense.

Right now my script will potentially email when it exits with a non-zero value. I test the exit value and for some I won't send an email unless the user specifies they want emails in those cases - i.e. exit code 23 or 24 are usually not so serious - mostly permission issues.

The script will also re-try the backup a limited number of times (a value that can be specified by the user) before giving up completely. In those cases it will also email the address supplied.

Ultimately, though, like you said it will be running like this for a while and I will get a sense of what works and what doesn't. Eventually I want to publish the script up on github for others to use if they so desire.

Beryllos · 09-10-2016, 11:31 AM

Sounds good.