Tartarus - A flexible yet simple backup software
Version 0.7.1

by Stefan Tomanek (stefan.tomanek@wertarbyte.de)
http://wertarbyte.de/tartarus.shtml

Tartarus reads its options from a configuration file specified at the
command line. This file is in fact a shell script whose job is to set
the variables that control the behaviour of the backup script. Each
configuration file is called a profile.

== Configuration options

NAME
    The identifier of the profile; it will be used in the backup filename.

DIRECTORY
    The directory to be backed up; only a single directory name is
    allowed here.

STAY_IN_FILESYSTEM
    When set to "yes", the backup process will not traverse into
    directories that reside on a different file system. This is useful
    when backing up /, since you do not want to traverse into /proc or
    /sys.
    Valid options:
        * yes
        * no

EXCLUDE
    A list of directories you wish to exclude from the backup. Tartarus
    will not descend into these directories, but the directories
    themselves will be included in the backup (without their contents).
    Filename globbing is applied to this list when the script runs.

EXCLUDE_FILES
    A list of directories from which no files will be saved, while
    their subdirectories will still be stored in the backup. This is
    useful if you want to preserve the directory structure while
    discarding the file content.

EXCLUDE_FILENAME_PATTERNS
    A list of patterns; filenames matching one of them are excluded
    from the backup.

CREATE_LVM_SNAPSHOT
    If this is set to "yes", Tartarus will try to freeze the content of
    the LVM volume specified with LVM_VOLUME_NAME. The snapshot will
    then be mounted and used as the backup source. Once this option is
    enabled, specifying LVM_VOLUME_NAME becomes mandatory.
    Valid options:
        * yes
        * no

LVM_VOLUME_NAME
    The LVM logical volume to take a snapshot of before backing up. Be
    sure to specify the correct volume your DIRECTORY is on, otherwise
    the results are unpredictable (mandatory if CREATE_LVM_SNAPSHOT is
    enabled).

LVM_MOUNT_DIR
    The directory your volume is usually mounted on; please specify an
    absolute path here (mandatory if CREATE_LVM_SNAPSHOT is enabled).

LVM_SNAPSHOT_SIZE
    The amount of disk space to allocate for snapshot differences. Make
    sure your volume group has enough free space for this (default
    value is 200 MB).

ASSEMBLY_METHOD
    The method used to combine your file system into a coherent data
    archive. The default method is "tar", but Tartarus also supports
    the more modern "afio" format. You must have the corresponding
    archive program installed.
    Valid options:
        * tar
        * afio

STORAGE_METHOD
    Specifies the way the backup data should be stored.
    Valid options:
        * FILE
            Store the backup archive as a file on the local system.
        * FTP
            Save the backup archive to an FTP server.
        * SIMULATE
            Don't actually save the file, but send it to /dev/null.
        * CUSTOM
            Allows you to specify a custom storage method by defining a
            shell function named "TARTARUS_CUSTOM_STORAGE_METHOD". See
            below for an example that uses SSH to transmit the backup
            to a remote location.

STORAGE_FILE_DIR
    If STORAGE_METHOD is set to "FILE", this variable specifies the
    directory Tartarus places the backup archive in.

STORAGE_FTP_SERVER
    The FTP server you wish to store your backup on.

STORAGE_FTP_USER
    The username used to log into the FTP server.

STORAGE_FTP_PASSWORD
    The password used to log into the FTP server.

STORAGE_FTP_DIR
    The directory on the FTP server the backup is stored in.

STORAGE_FTP_USE_SSL
    Specifies whether to use SSL when connecting to the FTP host.
    Valid options:
        * yes
        * no

STORAGE_FTP_SSL_INSECURE
    Ignore problems regarding the server certificate.
    Valid options:
        * yes
        * no

STORAGE_CHUNK_SIZE
    The maximum file size (in MiB) the storage medium can handle. If
    this is set, the backup archive will be split into several files.
    It can be used to circumvent limitations in old FTP servers or file
    systems that cannot handle files larger than 2 GiB. To restore the
    data, the files have to be concatenated.

COMPRESSION_METHOD
    The compression method you want to apply to your backup stream.
    Valid options:
        * none
        * gzip
        * bzip2
        * pbzip2

ENCRYPT_SYMMETRICALLY
    If enabled, the backup data will be encrypted using a password read
    from the file specified by ENCRYPT_PASSPHRASE_FILE.
    Valid options:
        * yes
        * no

ENCRYPT_PASSPHRASE_FILE
    The file the password for backup encryption is read from. The
    password is needed to restore the backup, so you had better write
    it down.

ENCRYPT_ASYMMETRICALLY
    If enabled, the backup data will be encrypted using the public key
    specified by ENCRYPT_KEY_ID. If both ENCRYPT_SYMMETRICALLY and
    ENCRYPT_ASYMMETRICALLY are enabled, decryption will be possible
    with the private key or the supplied passphrase (one of them is
    sufficient).
    Valid options:
        * yes
        * no

ENCRYPT_KEY_ID
    The key id you wish to encrypt your backup for. Check
    "gpg --list-keys" for valid key ids.

ENCRYPT_KEYRING
    The keyring file you wish to read. If unspecified, gnupg uses the
    user keyring.

INCREMENTAL_TIMESTAMP_FILE
    A timestamp file that is updated on each successful backup run and
    used as a reference point for future incremental backups.

INCREMENTAL_BACKUP
    Don't create a full backup but only save files that have been
    modified after the file set by INCREMENTAL_TIMESTAMP_FILE was last
    touched. Instead of enabling this option in the configuration file,
    you can also call tartarus with the option "-i".
    Valid options:
        * yes
        * no

INCREMENTAL_STACKING
    Controls whether a successful incremental backup updates the
    timestamp file. Setting this option to "yes" creates _real_
    incremental backups instead of the differential ones used by
    default: each incremental backup is then based on the one created
    before it. Setting this option to "no" (the default value) bases
    each incremental backup on the last full backup run.
    Valid options:
        * yes
        * no

LIMIT_DISK_IO
    When set to "yes", Tartarus uses "ionice" to change the I/O
    scheduling class of the backup run.
    The backup process will only get disk time when no other program is
    requesting it.
    Valid options:
        * yes
        * no

CHECK_FOR_UPDATE
    Tartarus checks whether a new version of the script is available.
    If so, it prints a message about it and continues with the backup.
    It also downloads the changelog and enumerates the differences
    between the current version and the most recent one available. To
    disable this behaviour, set this variable to "no".
    Valid options:
        * yes
        * no

== Basic configuration examples

Suppose you want to back up your home directories on a regular basis;
the compressed archive will be stored on an FTP server. This can be
achieved easily with just a few lines of Tartarus configuration.

Let's call the profile definition »/etc/tartarus/homedirs.conf«:

    # That's the profile name
    NAME="homedirs"
    DIRECTORY="/home"
    # We store it using FTP, on the fly
    STORAGE_METHOD="FTP"
    STORAGE_FTP_SERVER="ftpbackup.hostingcompany.com"
    STORAGE_FTP_USER="johndoe"
    STORAGE_FTP_PASSWORD="verysecret"
    COMPRESSION_METHOD="bzip2"

By calling »tartarus /etc/tartarus/homedirs.conf« the script will
gather all files below /home, compress them using bzip2 and store the
archive on the FTP server ftpbackup.hostingcompany.com.

== LVM snapshots

Backing up a partition that is in use can lead to inconsistent backups.
To avoid this, Tartarus supports the use of LVM snapshots to "freeze"
the block device and operate on that static copy. The real volume can
still be used, while changes made to the file system are not reflected
on the "frozen" block device.

To use this feature, the file system you wish to back up has to reside
on an LVM volume, and the volume group needs some free space to store
the differences between the snapshot and the real volume that
accumulate during the backup run. You also have to make sure that the
directory /snap exists, since Tartarus mounts the created snapshot
volume below that directory.

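The prerequisites above can be checked by hand before enabling
snapshots. The following is only a sketch: "ensure_snapshot_dir" is a
hypothetical helper that is not part of Tartarus, and "volumegroup0" is
just the example volume group name used in this document.

```shell
# Hypothetical helper (not part of Tartarus): make sure the directory
# below which the snapshot gets mounted exists; defaults to /snap.
ensure_snapshot_dir() {
    dir="${1:-/snap}"
    [ -d "$dir" ] || mkdir -p "$dir"
}

# As root, one might then run:
#   ensure_snapshot_dir /snap
#   vgs -o vg_name,vg_free volumegroup0   # free space left for snapshots
```

The free space reported by "vgs" should be at least as large as the
value you plan to use for LVM_SNAPSHOT_SIZE.
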
A few additional lines instruct Tartarus to use the snapshot
functionality:

    # Users keep on working
    CREATE_LVM_SNAPSHOT="yes"
    LVM_VOLUME_NAME="/dev/volumegroup0/home"
    LVM_MOUNT_DIR="/home"
    # Allocate enough space for any changes during the backup run
    LVM_SNAPSHOT_SIZE="1000m"

== Incremental backups

Storing a full backup takes a lot of disk space; often it is more
desirable to store only the files that changed since the last backup.
This is called an incremental backup. Tartarus can create a flag file
on your system that is used as a reference point when doing the next
incremental backup. To do this, just add the following line to your
config:

    INCREMENTAL_TIMESTAMP_FILE="/var/spool/tartarus/homedirs"

Every time a full backup run succeeds, this file is "touched" by
Tartarus. To create an incremental backup based on that file, just add
these lines to a profile:

    INCREMENTAL_BACKUP="yes"
    INCREMENTAL_TIMESTAMP_FILE="/var/spool/tartarus/homedirs"

Instead of copying the profile file and adding the lines, you can also
reuse the existing configuration profile and start Tartarus with the
option "-i": 'tartarus -i /etc/tartarus/homedirs.conf' will create an
incremental backup based on the flag file deposited by the last full
run.

As already said, incremental backups are (normally) based on the last
full backup; usually, this is called a "differential" backup:

    [F1]->[D1]              [F2]->[D4]
       \------>[D2]            \------->[D5]
        `--------->[D3]         `---------->[D6]

While this backup strategy simplifies recovery (since only the most
recent full and the most recent differential archive have to be
extracted, e.g. F2 and D6), it can waste backup space in some cases. If
a large file is added to the system after the full backup has been
created, this file will appear in every partial backup afterwards.

Another strategy is a "real" incremental backup, which is called a
"stacked incremental backup" in Tartarus terminology.
Instead of basing the partial backup on the last full run, it is based
on the last successful run, be it a full or a partial one.

    [F1]->[I1]->[I2]->[I3]   [F2]->[I4]->[I5]->[I6]

This behaviour saves space, since a new file that does not change
afterwards appears in only one archive. However, restoring a filesystem
requires all archives to be extracted (F2 _and_ I4 _and_ I5 _and_ I6).

Setting INCREMENTAL_STACKING to "yes" enables this behaviour and makes
Tartarus update the timestamp file after every backup run, not only
after full backups.

== Encryption

Tartarus supports symmetric encryption through gpg (GNU Privacy Guard).
To utilize it, write your passphrase into a file, for example
»/etc/tartarus/backups.sec«, and keep a copy at a safe location: you
might need it one day to restore your precious backup data. Now tell
Tartarus where to find the secret passphrase by adding the following
lines to your profile:

    ENCRYPT_SYMMETRICALLY="yes"
    ENCRYPT_PASSPHRASE_FILE="/etc/tartarus/backups.sec"

Also make sure that the passphrase file is only readable by root;
otherwise anyone with access to that file can decrypt your backups.

Asymmetric encryption is also possible. Just specify a key id to
encrypt the backup archive using that public key:

    ENCRYPT_ASYMMETRICALLY="yes"
    ENCRYPT_KEY_ID="ABC12345"

The resulting backup archive can only be decrypted using the matching
private key.

Symmetric and asymmetric encryption can also be combined: then one
credential, either the private key or the passphrase, is sufficient to
decrypt the backup archive.

== Restoring a backup

Even more important than creating a backup is restoring it. Since
Tartarus is largely based on standard unix tools, you won't have to
install special software - even a basic rescue system will suffice to
retrieve your lost data.

Given that the backup is stored on an FTP server, compressed and
encrypted, we need the following tools to restore it:

    - curl, wget or any other FTP client
    - gpg to decrypt the backup stream
    - gzip or bzip2, depending on the compression method used
    - tar to extract the archive
    - afio (or cpio) to extract the archive when using this file format

This enumeration is also the order in which to apply these programs:
first download the tar archive to your system, then use "gpg --decrypt"
to, well, decrypt it. After that you can expand the file by using
"gzip -d" (or the equivalent of bzip2) and retrieve the "naked" tar
archive, which can then be manipulated by the usual tar commands.

If you do not have enough disk space to store the entire backup, you
can also restore it on the fly; just use the "pipe" feature of any unix
shell:

    # curl ftp://USER:PASS@YOURSERVER/home-20080411-1349.tar.bz2.gpg \
        | gpg --decrypt \
        | bzip2 -d \
        | tar tpvf -

The tar options "tpv" list the archive's contents ("t") verbosely
("v"); "p" preserves the stored permissions once you extract, and the
trailing "f -" makes tar read the archive from standard input. If you
really want to extract the archive, replace "t" with an "x" (eXtract).
When restoring from a rescue system whose user database differs from
the original one, GNU tar's "--numeric-owner" option makes tar use the
numeric UID/GID values stored in the archive instead of matching user
and group names.

If you are using the afio file format, compression is not applied to
the entire stream, but is handled by afio itself on a per-file basis.
The command line for listing such a backup might look like this:

    # curl ftp://USER:PASS@YOURSERVER/home-20080411-1349.tar.bz2.gpg \
        | gpg --decrypt \
        | afio -Z -P bzip2 -t -

To restore incremental backups, restore the last full backup and then
the most recent incremental one (or, when stacked incremental backups
are used, every incremental archive created since that full backup, in
order).

== Defining a custom storage method

Tartarus supports the creation of custom storage methods. No changes to
the program are necessary to achieve this: simply set the storage
method in the configuration file to "CUSTOM":

    STORAGE_METHOD="CUSTOM"

Then define a shell function with the name
"TARTARUS_CUSTOM_STORAGE_METHOD".
The method should read the backup data from STDIN, while the proposed
archive filename is stored in the shell variable "$FILENAME". The
following example uses the secure shell to transmit the archive to a
remote location:

    TARTARUS_CUSTOM_STORAGE_METHOD() {
        local USER="stefan"
        local HOST="zirkel.wertarbyte.de"
        debug "Sending backup to $USER@$HOST:~/$FILENAME through SSH..."
        ssh "$USER@$HOST" "cat > ~/$FILENAME"
    }

Any exit code except 0 is considered an error and will abort the backup
process. If the archive is to be split into multiple chunks, the
storage method might be called more than once.

== Tartarus processing hooks

For special configuration purposes, the Tartarus script offers hooks
where user-supplied code can be placed and executed during the backup
procedure. The following hooks are called during the run of the
program:

TARTARUS_PRE_PROCESS_HOOK
    Called right after the config file has been read and the program
    starts

TARTARUS_POST_PROCESS_HOOK
    Called right before the program terminates gracefully, before the
    cleanup procedure

TARTARUS_PRE_CONFIGVERIFY_HOOK
    Called before the configuration gets verified (after
    TARTARUS_PRE_PROCESS_HOOK)

TARTARUS_POST_CONFIGVERIFY_HOOK
    Called after all configuration options and command line arguments
    have been inspected

TARTARUS_PRE_CLEANUP_HOOK
    Called before the cleanup procedure runs; the variable ABORT
    indicates whether the program terminated gracefully

TARTARUS_POST_CLEANUP_HOOK
    Called at the end of the cleanup procedure

TARTARUS_PRE_FREEZE_HOOK
    Called right before an LVM snapshot is created

TARTARUS_POST_FREEZE_HOOK
    Called right after an LVM snapshot has been created

TARTARUS_PRE_STORE_HOOK
    Called right before the backup data is gathered and stored

TARTARUS_POST_STORE_HOOK
    Called right after the backup has been stored

TARTARUS_DEBUG_HOOK
    Called whenever a debug message (contained in the variable
    DEBUGMSG) is printed

Each segment of the backup procedure - gathering, bundling,
compression, encryption and storage -
is also enclosed by its own pair of hooks. Those functions, however,
are integrated into the pipeline that transports your backup data, so
writing to STDOUT or reading from STDIN in such a hook might destroy
your data. Only do so if you know exactly what you are doing.

TARTARUS_PRE_FIND_HOOK / TARTARUS_POST_FIND_HOOK
    Executed before/after the find process gathers the files to be
    saved

TARTARUS_PRE_TAR_HOOK / TARTARUS_POST_TAR_HOOK
    Executed before/after tar bundles the files to an archive stream

TARTARUS_PRE_COMPRESSION_HOOK / TARTARUS_POST_COMPRESSION_HOOK
    Executed before/after the data stream is handled by the compression
    software

TARTARUS_PRE_ENCRYPTION_HOOK / TARTARUS_POST_ENCRYPTION_HOOK
    Executed before/after the data stream is processed by the
    encryption software

TARTARUS_PRE_STORAGE_HOOK / TARTARUS_POST_STORAGE_HOOK
    Executed before/after the stream is handed over to the storage
    function

To use a hook, define a shell function with that name in your config
file. As an example, this hook function transfers all debug messages to
your syslog system:

    TARTARUS_DEBUG_HOOK() {
        echo "$DEBUGMSG" | logger
    }

Hooks can also increase the reliability of the snapshot functionality.
LVM snapshots can lead to slightly inconsistent file systems, since
they do not freeze the file system, but the underlying block device.
This is why Tartarus calls 'sync' right before creating the snapshot
volume. Most filesystems can cope with that issue. But if you want to
make sure that the snapshot file system is valid, hooks can be used to
run a file system check on the snapshot volume before mounting it:

    TARTARUS_PRE_FREEZE_HOOK() {
        # make sure everything is synced to disk
        # before snapshotting
        sync
    }

    TARTARUS_POST_FREEZE_HOOK() {
        # we can access the internal variables
        # of the tartarus process, but take care!
        #
        # $SNAPDEV should contain the volume we are
        # about to mount, try auto-repair
        /sbin/fsck -y "$SNAPDEV"
    }

== Removing obsolete backups from an FTP server

Tartarus is accompanied by the perl script "charon.pl", which roams an
FTP server and removes any backup archives that match a certain pattern
and exceed a specified age. Configuration is done via command line
arguments; check the output of "charon.pl --help" for a detailed list.

    # charon.pl --host safehaven --user john --password SECRET \
          --dir / --maxage 7 --profile home --dry-run

This command line will try to log into the server "safehaven" using the
user name "john" and his password "SECRET" and remove backup files of
the profile "home" which are more than 7 days old. Due to the command
line switch "--dry-run", no files are actually deleted - the script
will only explain its potential actions in its output.

Charon will never remove a full backup as long as an incremental one
depends on it; the full backup is deleted once all backups depending on
it have expired.

The script does not read Tartarus backup profiles; by using hooks,
however, it can be called from Tartarus after a successful backup run.
This way, Tartarus can pass the configuration variables to Charon:

    # Hook in Charon
    TARTARUS_POST_PROCESS_HOOK() {
        # pass configuration variables to charon
        # transmit the password through stdin to hide it from "ps ax"
        local CHARON="/usr/local/sbin/charon.pl"
        local MAX_AGE_IN_DAYS="7"
        printf '%s' "$STORAGE_FTP_PASSWORD" | "$CHARON" \
            --host "$STORAGE_FTP_SERVER" --user "$STORAGE_FTP_USER" \
            --readpassword \
            --dir "$STORAGE_FTP_DIR" \
            --maxage "$MAX_AGE_IN_DAYS" \
            --profile "$NAME"
    }

TARTARUS_POST_PROCESS_HOOK will only be executed in case of a
successful backup, so no files will be removed if Tartarus encounters
an error.
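
The retention rule described in this section (a full backup is kept as
long as an incremental one depends on it) can be illustrated with a
small sketch. This is not Charon's actual implementation: the function
name "expired_backups" and the input format (one line per archive in
chronological order, consisting of age in days, type "full" or "inc",
and a name) are assumptions made for this example, and every
incremental archive is assumed to depend on the closest preceding full
backup.

```shell
# Illustration only (not Charon's code): print the names of archives
# that may be removed, given a maximum age in days as first argument.
# Input on stdin, oldest archive first: "<age_days> <full|inc> <name>"
expired_backups() {
    awk -v max_age="$1" '
        # Print the pending full backup if it is too old and no
        # incremental archive that depends on it is still alive.
        function emit_full() {
            if (full != "" && full_age + 0 > max_age + 0 && live == 0) {
                print full
            }
        }
        $2 == "full" {
            emit_full()
            full = $3; full_age = $1; live = 0
            next
        }
        $2 == "inc" {
            if ($1 + 0 > max_age + 0) {
                print $3
            } else {
                live++
            }
        }
        END { emit_full() }
    '
}
```

Fed the lines "20 full F1", "15 inc D1", "10 inc D2", "6 full F2" and
"3 inc D4" with a maximum age of 7, the sketch prints D1, D2 and F1:
both archives depending on F1 have expired, so F1 may go as well, while
F2 and D4 are still young enough to be kept.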