When a backup script runs for too long, the next scheduled run may start while the previous backup is still in progress. This increases pressure on the database, slows the server down, can trigger a chain of overlapping backup processes, and in some cases may even break backup integrity.
The simplest solution is to avoid this situation by adding locking to the backup script, so it refuses to start a second time while it is already running.
Here is a working sample. You will need to replace the "sleep 10" string with your actual backup command:
#!/bin/bash

LOCK_NAME="/tmp/my.lock"

if [[ -e $LOCK_NAME ]] ; then
    echo "re-entry, exiting"
    exit 1
fi

### Placing lock file
touch $LOCK_NAME

echo -n "Started..."
### Performing required work
sleep 10

### Removing lock
rm -f $LOCK_NAME
echo "Done."
It works perfectly most of the time. The problem is that you could still, in theory, start two copies of the script at the same moment, so both would pass the lock file check and run together. To avoid that, you would place a unique lock file just before the check and make sure no other process has done the same.
Here is an improved version:
#!/bin/bash

UNIQSTR=$$
LOCK_PREFIX="/tmp/my.lock."
LOCK_NAME="$LOCK_PREFIX$UNIQSTR"

### Placing lock file
touch $LOCK_NAME

if [[ -e $LOCK_NAME && `ls -la $LOCK_PREFIX* | wc -l` == 1 ]] ; then
    echo -n "Started..."
    ### Performing required work
    sleep 10

    ### Removing lock
    rm -f $LOCK_NAME
    echo "Done."
else
    ### another process is running, removing lock
    echo "re-entry, exiting"
    rm -f $LOCK_NAME
    exit 1
fi
Now even if you manage to run two scripts at the same time, only one of them can actually start the backup. In a very rare situation both scripts will refuse to start (because two lock files exist at the same time), but you can catch this by simply monitoring the script's exit code. In any case, as soon as you receive a backup exit code other than zero it is time to review your backup setup and make sure it works as desired.
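As a simple illustration of that monitoring idea (the script path, alert address and the mail command here are just placeholders, adjust them to your environment), the exit code could be checked by a small cron-driven wrapper like this:

#!/bin/bash
### Hypothetical monitoring wrapper: run the backup wrapper and raise an alert on a non-zero exit code
./backup-wrapper.sh
RC=$?
if [[ $RC -ne 0 ]] ; then
    echo "backup wrapper exited with code $RC" | mail -s "Backup alert" admin@example.com
fi
exit $RC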
Please note: if you terminate this script manually, you will also need to remove the lock file so the script can pass the check on the next startup. You can also use this wrapper for any other periodic tasks you have, such as Sphinx indexing, merging, or index consistency checking.
For your convenience, this script is available for download directly or using wget:
wget http://astellar.com/downloads/backup-wrapper.sh
You could also find more about MySQL backup solutions here.
Keep your data safe and have a nice day!
Sergei says:
A simpler solution is to use mkdir instead of touch in the first script. Or ln -s /dev/null $LOCK_NAME. Or any other command that fails if the destination exists.
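For example, a minimal sketch of the mkdir approach (the lock path is my own placeholder, and sleep 10 stands in for the real backup command as in the article):

#!/bin/bash
LOCK_DIR="/tmp/my.lock.d"

### mkdir fails if the directory already exists, so the check and the creation
### of the lock happen in a single atomic step
if ! mkdir "$LOCK_DIR" 2>/dev/null ; then
    echo "re-entry, exiting"
    exit 1
fi

echo -n "Started..."
### Performing required work
sleep 10

### Removing lock
rmdir "$LOCK_DIR"
echo "Done."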
vlad says:
Indeed, directory-based locking seems more reliable, thanks for the advice!
ketan patel says:
Why can’t you just use flock?
http://linux.die.net/man/2/flock
vlad says:
Using flock, or even a mutex inside C/C++ code, is generally a better idea. A bash script is just a more convenient way to handle periodic tasks like backups, MySQL maintenance, log rotation, Sphinx indexing, etc. run by the cron daemon.
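For reference, here is a rough sketch of how the flock(1) utility (from util-linux, assuming it is installed) could wrap the same kind of cron job; the lock path and the sleep 10 placeholder follow the article's example:

#!/bin/bash
LOCK_FILE="/tmp/my.lock"

### Open the lock file on descriptor 200 and try to take an exclusive, non-blocking lock.
### The kernel releases the lock automatically when the process exits, even if it is killed.
exec 200>"$LOCK_FILE"
if ! flock -n 200 ; then
    echo "re-entry, exiting"
    exit 1
fi

echo -n "Started..."
### Performing required work
sleep 10
echo "Done."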
Uli Stärk says:
I think this is not a good solution, because touch is not atomic and can lead to errors.
You'd be better off using a perl/php/python/… script calling flock with LOCK_EX to get an exclusive lock on a file. It's even better to get a MySQL lock (GET_LOCK), because you could theoretically run the job from two distinct hosts.
Rob Smith says:
You really should be using the lock style that can be found at http://www.davidpashley.com/articles/writing-robust-shell-scripts.html under Race conditions:
“It’s worth pointing out that there is a slight race condition in the above lock example between the time we test for the lockfile and the time we create it. A possible solution to this is to use IO redirection and bash’s noclobber mode, which won’t redirect to an existing file.”
It also shows how to use traps to catch signals and remove the lock file after the script gets killed/terminated/etc., which is important for backup scripts so they clean up after themselves when they can.
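A rough sketch of that noclobber-plus-trap pattern applied to the article's example (same lock path, sleep 10 standing in for the backup command):

#!/bin/bash
LOCK_FILE="/tmp/my.lock"

### With noclobber the redirection fails if the lock file already exists,
### so checking and creating the lock happen in one atomic step
if ( set -o noclobber; echo "$$" > "$LOCK_FILE" ) 2>/dev/null ; then
    ### Remove the lock if the script exits normally or is interrupted/terminated
    trap 'rm -f "$LOCK_FILE"; exit' INT TERM EXIT

    echo -n "Started..."
    ### Performing required work
    sleep 10
    echo "Done."

    rm -f "$LOCK_FILE"
    trap - INT TERM EXIT
else
    echo "re-entry, exiting"
    exit 1
fi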
vlad says:
Rob, thanks for the link, it’s a great guide!
mike says:
Seems like this could lead to a race condition; you might want to use set -o noclobber or instead use mktemp -d, since mkdir is atomic. Another common approach is to 'kill -0' the pid to verify the other job did not fail and neglect to clean up the lock file with a trap. (kill -9 is still a potential pitfall with traps.)
Look here for some ideas:
http://lists.baseurl.org/pipermail/yum-devel/2011-August/008547.html
http://wiki.bash-hackers.org/howto/mutex
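For illustration, a rough sketch of that 'kill -0' staleness check, building on the PID-in-lock-file idea (same lock path and sleep 10 placeholder; note the check-then-create step here is itself not atomic, it only shows how a stale lock from a crashed run could be detected and reclaimed):

#!/bin/bash
LOCK_FILE="/tmp/my.lock"

### If a lock file exists, check whether the PID stored in it is still alive;
### kill -0 sends no signal, it only tests for process existence
if [[ -e $LOCK_FILE ]] && kill -0 "$(cat "$LOCK_FILE")" 2>/dev/null ; then
    echo "re-entry, exiting"
    exit 1
fi

### Stale or missing lock: (re)create it with our own PID
echo "$$" > "$LOCK_FILE"

echo -n "Started..."
### Performing required work
sleep 10

rm -f "$LOCK_FILE"
echo "Done."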
vlad says:
Indeed, using directory-based locking seems a better idea! Thank you for the guides! I’ve also replied about race conditions below.
vlad says:
You are all absolutely right about the possible race conditions and drawbacks. The file system is an additional, relatively slow layer; locking behavior may vary depending on the FS type and may not be atomic or thread safe. So if we’re talking about race condition prevention in a parallel execution environment, I would consider using a much faster and more reliable in-memory mutex inside C/C++/Java/Python/etc. code (as mentioned by Ketan and Uli) instead of file-based locking.
At the same time, backup scripts and other periodic tasks are mostly started by a cron job once in a while and are unlikely to cause a race condition in the first place. In this case, having unique lock names with the process id attached is a convenient way to implement an external process monitor.