From time to time we have to work with live environments and production databases. For some of us this is day-to-day job. And most of the time cost of a mistake is way higher than expected improvement especially on the databases. Because issue on the database side will affect everything else.
I heard enough war stories about ruined productions and can imagine well enough speed of DROP DATABASE command replicating across the cluster. So I’m scared to make changes in production. The more loss expected if things go wrong the more I’m going to be scared planning every change. But I still love to make improvements so the only question is how to make them safer.
This post is not intended to be a guide or best practices on how to avoid issues at all, it’s more invitation to discussion that started between me and @randomsurfer in twitter on how to avoid production failures. It was hard for me to fit to 150 characters so I’m switching to more comfortable environment.
So here is a few practices that I use while working with remote servers:
- Do not use rm -rf with leading wildcards. Ever. If you need to delete all subdirectories go with rm -rf dir1 dir1 dir2 etc. I’m also trying to avoid even trailing slashes near to rm -rf to be honest, just because.
- Commandline history is convenient but could be dangerous if your connection is slow (it happens often when you work coast to coast or trans-Atlantic) so you could press enter on wrong command. To avoid this I’m hitting <space> at the end of command line before hit enter and wait for terminal response to make sure that command appearing on the command line will not scroll away.
- I do my best to not make changes while sick. Sometimes it’s hard to reschedule a change, but most of the time it’s worth to move it to later time (even few hours could help sometimes) or hand off to someone else than put customer’s environment to risk.
- Work with a single server at a time. If you have several open SSH windows with different server there is possibility to mess up the window and issue right command in wrong place. Especially when one window is a master and another is slave – same DB names, same tables, easy to mess up. If you still need to work on different hosts at the same time – change terminal’s background. I usually use red color (128,0,0) for master SSH window.
If you have your own tricks on how to not run DROP DATABASE in wrong place (yeah, I know pt-slave-delay helps in this case, but still!) please feel free to share them!