Are Your Database Backups Safe?

Using archecker to make sure your database backups work properly

Everyone backs up their databases. Most of the time we do not spend much effort worrying about our backups until the day we suddenly need them. Management is screaming to get the database back online; the pressure is on the DBA; will the restore work?

I have had a few clients who were surprised when the restore did not work as they had expected or did not work at all. One client had been faithfully backing up for years but never checked the tapes. On the one day when the company needed to restore, the DBA found out the tapes could not be read because of a malfunctioning tape drive.

Another client, whose very large data warehouse took 18 hours to back up over the network, started a restore after loading corrupt data. The organization waited and waited. It took four days to restore the data warehouse because a network change that no one had anticipated slowed the process.

I have had three disks fail this past year, more than in any other year. If you assume a constant rate of disk failures per megabyte, then the more data you have, the more failed disks you will encounter.

Checking and testing your backups is one of the most important parts of a DBA's job. In my Informix DBA classes, I stress that students need to practice restoring their database servers so they are familiar with the prompts and procedures. When a real emergency happens, they will be prepared, knowing what to expect and how to complete tasks without looking them up in the manuals. The second key is to actually restore the production databases on a backup server to make sure the backup process and media work.

IBM Informix includes the archecker utility so you can check your backup media. Archecker provides a way to check an Informix backup and verify that the tape or media is usable for a restore. Archecker first shipped with Informix version 7.3. Before this utility was introduced, the only way to verify a backup was to conduct a full restore. Archecker allows you to verify every backup right after it has been made. You can even check the backup on another system so it does not impact your production systems. On a critical production system, I suggest taking each backup tape to another system and verifying that the backup was successful.

The following is a quick introduction on how to use the archecker utility.

The archecker utility is designed to validate a level 0 archive with little impact on a production system. It ensures that all data required to restore a system exists on the archive tapes or media in the correct format. It detects pages that are missing or unreadable from the media and identifies which tables are affected. It can also verify data in a similar fashion to the command oncheck -cd. It also has an option to write a dot on the screen after reading every 1 GB of data from the tape, which lets you know that the program is doing something. As a rule, if it took two hours to make your backup, it will take archecker about two hours to verify the media.

To use archecker, you need to set up a configuration file in $INFORMIXDIR/etc using the file called ac_config.std. The listing that follows is the configuration I use for our training classes. In older versions of Informix, the AC_TAPEDEV and AC_TAPEBLOCK parameters must also be set to match the backup settings in your ONCONFIG file.

This is what the archecker configuration file looks like:

#*****************************************************
#
#  Licensed Material - Property Of IBM
#
#  "Restricted Materials of IBM"
#
#  IBM Informix Dynamic Server
#  (c) Copyright IBM Corporation 1996, 2004 All rights reserved.
#
#  Title:      ac_config.std
#  Description:
#              Default ac_config.std for archecker archive utility
#
#*****************************************************
AC_MSGPATH   /tmp/ac_msg.log # archecker message log
AC_STORAGE   /tmp            # Directory used for temp storage
AC_VERBOSE   1               # 1 verbose messages 0 terse messages
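If you want a script to confirm which settings archecker will pick up, the file is simple enough to read programmatically. The following is a minimal Python sketch, not an Informix tool: parse_ac_config is a hypothetical helper, and it assumes every setting sits on one line as a name, whitespace, and a value, with # starting a comment (so a value that itself contains # would be cut short).

```python
def parse_ac_config(text):
    """Return {parameter: value} for 'NAME value  # comment' lines.

    Assumed layout only: one setting per line, '#' starts a comment.
    """
    params = {}
    for raw_line in text.splitlines():
        # Drop full-line comments and trailing comments alike.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        name = parts[0]
        # A parameter listed with no value is stored as an empty string.
        value = parts[1].strip() if len(parts) > 1 else ""
        params[name] = value
    return params


sample = """\
#  Title:      ac_config.std
AC_MSGPATH   /tmp/ac_msg.log # archecker message log
AC_STORAGE   /tmp            # Directory used for temp storage
AC_VERBOSE   1               # 1 verbose messages 0 terse messages
"""

config = parse_ac_config(sample)
for name, value in config.items():
    print(name, "=", value)
```

In practice you would pass in the contents of your own configuration file instead of the sample string.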

 

The configuration parameters are:

  • AC_STORAGE: This is the name of the directory where archecker keeps its temporary files. The number of chunks and tables on your server determines the amount of space required, so you will need enough free space in this file system. As an estimate, I recommend having 1 MB of free space for every 2 GB of dbspace on your system. If this directory is not set, it will default to your current directory.
  • AC_MSGPATH: This is the location and pathname of archecker's message log. All error and status messages are placed in this file.
  • AC_VERBOSE: Set this to 1 for verbose messages or 0 for terse messages.
  • AC_TAPEDEV: This is the name of the tape device to be used for reading and checking the archive.
  • AC_TAPEBLOCK: This parameter specifies the size of the tape block in KB. It must match the block size from the ONCONFIG file used for the archive. If it does not match, you will get an error that indicates the correct block size to use.
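The AC_STORAGE rule of thumb (1 MB of free space for every 2 GB of dbspace) is easy to turn into a quick calculation. This is only a sketch of that estimate in Python: the function name and the sample sizes are made up for illustration, and rounding up to a whole megabyte is my own choice.

```python
import math


def estimate_ac_storage_mb(total_dbspace_gb):
    """Suggested free space in MB for AC_STORAGE: 1 MB per 2 GB of dbspace."""
    if total_dbspace_gb < 0:
        raise ValueError("dbspace size cannot be negative")
    # Round up so a partial megabyte still counts as a whole one.
    return math.ceil(total_dbspace_gb / 2)


# Hypothetical server sizes, in GB of dbspace.
for size_gb in (10, 500, 4096):
    needed = estimate_ac_storage_mb(size_gb)
    print(size_gb, "GB of dbspace ->", needed, "MB free in AC_STORAGE")
```

Treat the result as a starting point and leave some headroom, since the real requirement depends on the number of chunks and tables.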

The basic command to run the archecker utility is:

archecker -tdsv

The -tdsv options mean:

  • -t  Use OnTape driver
  • -d  Delete old archecker metadata files and continue with new verification
  • -s  Print status message to screen
  • -v  Verbose output

 

These are the basic command line options I use. They tell archecker to read a tape, delete any old files from a previous run, print status messages, and add dots to indicate progress. Archecker also writes all of this information to its message log, which is /tmp/ac_msg.log in the configuration above.

One word of warning: you need enough free space in the directory specified by AC_STORAGE because archecker copies parts of your tape to disk while it works. Archecker is best run on another machine, not your production machine, so it does not slow things down. This way you can start a test as soon as you finish a backup.
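A quick pre-flight check of that directory can save a failed run. The sketch below uses Python's standard shutil.disk_usage call; the helper names, the use of the system temp directory as a stand-in for AC_STORAGE, and the 250 MB requirement are all assumptions for illustration.

```python
import shutil
import tempfile


def free_space_mb(directory):
    """Free space, in whole MB, on the file system that holds directory."""
    return shutil.disk_usage(directory).free // (1024 * 1024)


def has_enough_space(directory, required_mb):
    """True if directory has at least required_mb megabytes free."""
    return free_space_mb(directory) >= required_mb


# Stand-in for the AC_STORAGE directory; replace with your own setting.
storage_dir = tempfile.gettempdir()
required_mb = 250  # hypothetical requirement for this example

if has_enough_space(storage_dir, required_mb):
    print("Enough free space in", storage_dir)
else:
    print("Not enough free space in", storage_dir)
```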

Here is the output from running archecker:

odin:informixbackup informix$ archecker -tdsv
IBM Informix Dynamic Server Version 11.70.FC4
Program Name: archecker
Version:      8.0
Released:     2011-10-12 21:56:17
CSDK:         IBM Informix CSDK Version 3.50
ESQL:         IBM Informix-ESQL Version 3.50.FC4
Compiled:     10/12/11 22:53  on Darwin 9.2.0
Darwin Kernel Version 9.2.0: Fri Jan 25 12:12:20
PST 2008; root:xnu-1228.3.12~1/RELEASE_I386

AC_STORAGE            /tmp
AC_MSGPATH            /tmp/ac_msg.log
AC_VERBOSE            on
AC_TAPEDEV            /Volumes/OdinHD2/Work/informixbackup/
AC_TAPEBLOCK          32 KB
AC_LTAPEDEV           /dev/null
AC_LTAPEBLOCK         32 KB
AC_TIMEOUT            300
AC_SESSION

Archive file /Volumes/OdinHD2/Work/informixbackup/odin.local_1_L0
Tape type:      Archive Backup Tape
OnLine version: IBM Informix Dynamic Server Version 11.70.FC4
Archive date:   Tue Oct  2 15:58:37 2012
Archive level:  0
Tape blocksize:  32768
Tape size:  2147483647
Tape number in series:  1

………………………………….

Scan PASSED
Control page checks PASSED
Reserve page validation PASSED
Checking rootdbs:TBLSpace
Checking sysmaster:sysdatabases
Checking system:syslicenseinfo
<.. Archecker displays all the table names as it
checks them, not shown here to save space .. >
Table checks PASSED
Tables/Fragments validated:  608
Archive Validation PASSED.

odin:informixbackup informix$
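If you verify backups every night, you may prefer a script to read this output for you. Here is a minimal Python sketch that looks for the result lines shown above. It relies only on the wording in this sample run; I have not shown what a failing run prints, so the sketch treats a missing "Archive Validation PASSED" line as "not verified" rather than guessing at failure messages. The function name is hypothetical.

```python
def summarize_archecker_output(text):
    """Return (archive_ok, passed_lines) for captured archecker output.

    archive_ok is True only when the final 'Archive Validation PASSED'
    line is present; passed_lines lists every line ending in PASSED.
    """
    passed_lines = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # The final summary line ends with a period, the others do not.
        if line.rstrip(".").endswith("PASSED"):
            passed_lines.append(line)
    archive_ok = any(
        line.startswith("Archive Validation PASSED") for line in passed_lines
    )
    return archive_ok, passed_lines


sample_output = """\
Scan PASSED
Control page checks PASSED
Reserve page validation PASSED
Checking rootdbs:TBLSpace
Table checks PASSED
Tables/Fragments validated:  608
Archive Validation PASSED.
"""

archive_ok, passed = summarize_archecker_output(sample_output)
print("Backup verified" if archive_ok else "Backup NOT verified")
for line in passed:
    print("  ", line)
```

The text passed in could be the screen output captured from a run or the contents of the message log, provided the same result lines appear there.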

 

Archecker is an easy way to make sure your backups are working. It gives you the reassurance of knowing that your data is protected. There is also a lot more archecker can do, but I will leave that for you to explore.

Questions? Feedback? Please let me know your thoughts in the comments.
 

 
Lester Knutsen

Lester Knutsen is president of Advanced Data Tools Corporation, an IBM Informix consulting and training partner specializing in data warehouse development, database design, performance tuning, and Informix training and support. He is president of the Washington D.C. Area Informix User Group, a founding member of IIUG, an IBM Gold Consultant, and an IBM Data Champion.

  • Sandor Olah

    This is a very important article for all DBAs. I once saw the last six hours of data lost at a company running a system with 1,000 transactions per second. The main cause was that the head of the recording unit was broken, so the data was damaged as well.

    I recommend that everybody check all of their data from time to time to avoid such a big problem.

  • http://www.bancoppel.com Cesar Cruz

    It's a very useful procedure, needed for many production database instances to ensure that a restore from level 0 tape backups will work.

    Just imagine a restore of 15 Ultrium IV tapes taking about 35 hours and then failing on the last tape (that has happened to us once).