Recommendations I make to save critical data

First off, your data is the most valuable part of any server. There are many many hour of very hard if not impossible to replace work involved in setting up even a fairly basic web site. This doesn’t even include things like client information, orders etc. that directly cost you money if you lose them.

Not all backup methods are for everyone. The reason is that there are widely variable needs for data security as well as a wide variety of budgets. Someone with a page that is doing e-commerce transactions will likely need a lot more in regards to backups than someone with a bi-weekly blog for instance.

First off, there are two different modes of failure one will encounter as a sysadmin. The first is a “hard” failure. This includes drives or RAID arrays (yes it does happen) going bad. I love RAID, I think it’s a great measure to ensuring data protection but it’s not fool proof by any means and is no substitute for backups.

The second type of failure is the “soft” failure. With this failure mode for whatever reason data on the system is gone. This can be anything from a user deleting off their public_html directory to data corruption because the drive is heavily over run. Commonly this is someone running an FS check on a machine and having it dump a few thousand files to lost&found. I have seen my fair share of machines come up after this and run fine, and have seen plenty that didn’t too. This can also be the result of hackers etc. messing around on your system. Something I will warn of is if you use a secondary drive in the same server for backups, it can be something that is deleted by hackers as well. If you leave the drive mounted after backups are done and they do rm -rf /* it will be erased. Be sure to unmount your backup drive if you use this method. In general I do not advise relying on it for this reason, however it makes for a great way to have backups on a system without waiting for them to transfer.

The first rule I have is no matter what you should have minimum three copies of your data, at least one of which is totally off site and not within the same company as your server/colocation/shared host etc. This gives you options if something happens, and you’re not relying on one group of people to ensure your data is in tact.This can be as simple as having your system upload the files to a home or office computer via DynDNS and back mapping the port, then burning the images on to a CD weekly. On a higher level it can be storage by a company offering cloud storage such as Amazon.

How often you should back your data up and retain it is another question that is fairly common. This is largely subjective, and is a compromise between how much data you can afford to lose versus how much space you can afford. If you’re running a streaming video site, this can get quite pricey very quickly. Even to the point it may be best to try and get a low end server and put big drives in it to back up to. Afterall if you pay .50/gb and need a 1TB of backup space $500 buys a good bit of server!

What to back up is another good question. If you’re running a forum or something like that where there aren’t really all that many changes made to the underlying software, doing a single full backup and then backing the user upload directories (eg images) and the database may be enough. If the site is undergoing constant development, full backups would be a great deal more prudent.

The last thing to consider is how these backups are going to be made. I have done backups before with shell scripts, and used both Plesk’s and CPanel’s backup mechanisms. When doing a shell script for backups, you gain a ton of versatility in how and what you back up, at the price of being a lot more tedious to configure. These sort of backups are really nice if you’re wanting to make it so that your system backs up only certain things on varying interval. The panel based backups are so easy to configure, there is little to no reason you shouldn’t set them up. You just specify how often you want backups, where they will be stored and what will be backed up. The caveat I will warn about using a panel based backup system is that even with CPU level tweaks in the config files these can heavily load a system so my advice is to run them off hours.

More useful commands to write about

  • most – Epic app for displaying 1 window at a time of output
  • aria2 – High speed download utility with resuming and segmented downloading
  • htop – htop is an interactive text-mode process viewer for Linux, similar to “top”
  • netcat – A featured networking utility which reads and writes data across network connections, using the TCP/IP protocol.
  • tee – In computing, tee is a command in various command-line interpreters (shells) such as Unix shells, 4DOS/4NT and Windows PowerShell,
  • heh – The heh program allows the user to open files with their preferred applications through a database of filename extension/application pairs.
  • speedometer – Speedometer shows a graph of your current and past network speed in your console