[Sks-devel] more database corruption
From: Dan Egli
Subject: [Sks-devel] more database corruption
Date: Sun, 2 Nov 2003 12:52:47 -0700 (MST)
This is getting annoying. I looked over the server today and saw a lot of
messages (literally thousands) in the failed_messages dir. That made no
sense, so I moved some of them into the messages folder. They came right
back. That was strange, so I looked at the log. The database is corrupted
AGAIN. It seems to have happened at 3:00 am this morning. Observe:
2003-11-02 02:48:14 1 keys found
2003-11-02 02:48:41 Adding list of 1 keys from file
./messages/msg-38760868.ready
2003-11-02 02:48:41 Applying 0 changes
2003-11-02 02:49:11 Adding list of 1 keys from file
./messages/msg-10140124.ready
2003-11-02 02:49:11 Applying 0 changes
2003-11-02 02:59:52 Adding list of 1 keys from file
./messages/msg-64306034.ready
2003-11-02 02:59:52 Applying 2 changes
2003-11-02 02:59:52 Adding hash 7B669B52ADB3D241956246551256B1F0
2003-11-02 02:59:52 Del'ng hash C9D6D9AC14E0AA17B32726433E2EEA32
2003-11-02 02:59:56 Sending LogResp size 2
2003-11-02 03:00:00 Calculating DB stats
2003-11-02 03:00:05 eventloop: Bdb.DBError("fatal region error detected;
run recovery")
2003-11-02 03:00:05 <command handler> error in callback.:
Bdb.DBError("fatal region error detected; run recovery")
2003-11-02 03:00:09 <mail transmit keys> error in callback.:
Bdb.DBError("fatal region error detected; run recovery")
2003-11-02 03:00:10 <command handler> error in callback.:
Bdb.DBError("fatal region error detected; run recovery")
2003-11-02 03:00:13 Error fetching key from hash
7B669B52ADB3D241956246551256B1F0: Bdb.DBError("fatal region error
detected; ru\n recovery")
2003-11-02 03:00:13 0 keys found
I tried to think what could be happening at 3:00 am that could corrupt the
database. The only thing I can come up with is my automatic backup and
keydump script. But I cannot see how it would affect the main database,
because the only operations that touch the main database are db_archive
and db_archive -s, followed by some cp commands.
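To line the failure up against the backup run, it can help to pull the first DBError line out of the log together with the line just before it. The following is only a minimal sketch: the function name first_db_error and the log path in the usage note are placeholders, not part of the setup described here, and it assumes a grep that supports -m and -B (GNU and BSD grep both do).

```shell
#!/bin/bash
# Print the first log line that mentions Bdb.DBError, preceded by the
# line logged immediately before it, to show what the server was doing
# when the error first appeared.
first_db_error() {
    local logfile="$1"
    # -m 1: stop after the first match; -B 1: include one line of leading context
    grep -m 1 -B 1 'Bdb.DBError' "$logfile"
}
```

Usage would look like `first_db_error /path/to/sks/db.log` (hypothetical path). On the excerpt above it would print the "Calculating DB stats" line followed by the first eventloop error line.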
If it's of any help, here's my backup script.
#!/bin/bash
function errorabort {
echo "NON-ZERO exit status! Aborting Keydump! Deleting failed backup! Alerting SysAdmin"
# create the marker file that the mail below refers to; the check at the
# top of the script refuses to run while it exists
touch ${HOME}/BAD_DB
echo "The SKS Keyserver automated database backup and keyring dump sequence encountered a fatal" > msg
echo "error on "`date`". This should be investigated immediately. Until then, no further automated" >> msg
echo "backups or keydumps will take place. A file called BAD_DB was created in the sks home" >> msg
echo "directory. The automated script will not run while this file exists. When the database" >> msg
echo "problem has been corrected, remove this file to re-enable the automatic backup and dumps." >> msg
mail dan -s "SKS Backup routine failure!!" < msg
rm -f msg
# throw away the partial backup
rm -f ${TEST}/PTree/*
rm -f ${TEST}/KDB/*
# exit non-zero so the caller can see that the run failed
exit 1
}
PATH=$PATH:/usr/local/bin
# before we do anything, check to see if BAD_DB exists. If so consider the
# database unusable. Abort.
if [ -f ${HOME}/BAD_DB ] ; then
exit 2;
fi
# step 1 - define environment variables
TEST=${HOME}/test_backup
DB=${HOME}/backup_db
NEW=${HOME}/newdump
OLD=${HOME}/olddump
WORK=${HOME}/workdump
# step 2 - backup existing databases
cd $HOME/KDB
# step 2.1 - remove old KDB logs
rm -f `db_archive`
[ $? -ne 0 ] && errorabort
# step 2.2 - copy database files across
cp `db_archive -s` ${TEST}/KDB
[ $? -ne 0 ] && errorabort
# step 2.3 - copy KDB logs across
cp log.* ${TEST}/KDB
# step 2.4 - remove old PTree logs
cd ../PTree
rm -f `db_archive`
# step 2.5 - copy PTree databases across
cp `db_archive -s` ${TEST}/PTree
[ $? -ne 0 ] && errorabort
# step 2.6 - copy PTree logs across
cp log.* ${TEST}/PTree
# step 3 - validate DB files
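cd ${TEST}/KDB
# use DBFILE as the loop variable so that DB (the backup directory set in
# step 1) is not overwritten before it is used at the end of the script
for DBFILE in `db_archive -s` ; do
db_verify $DBFILE
if [ $? -ne 0 ]; then
errorabort
fi;
done
cd ${TEST}/PTree
for DBFILE in `db_archive -s` ; do
db_verify $DBFILE
if [ $? -ne 0 ]; then
errorabort
fi;
done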
# step 4 - make keydump
rm -f ${WORK}/*
sks dump 50000 $WORK
[ $? -ne 0 ] && errorabort
cd ${WORK}
for FILE in *.pgp; do
mv $FILE dungeon${FILE##sks-dump};
done
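cd ..
# step 5 - rotate the keydumps: new becomes old, work becomes new
rm -f ${OLD}/*
mv ${NEW}/* ${OLD}
mv ${WORK}/* ${NEW}
# step 6 - replace the previous backup with the verified copy
rm -f ${DB}/PTree/*
rm -f ${DB}/KDB/*
mv ${TEST}/KDB/* ${DB}/KDB
mv ${TEST}/PTree/* ${DB}/PTree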
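
A note on the validation loops in step 3: in bash, a for loop variable is an ordinary variable and is not scoped to the loop. The loop variable therefore must not share a name with something that is needed later, such as the DB directory defined in step 1 and used in the final rm and mv commands. The sketch below shows the effect with made-up values; the function names, the path, and the file names are all hypothetical.

```shell
#!/bin/bash
# Reusing an existing variable name as the loop variable overwrites it.
clobbered() {
    DB=/tmp/backup_db                # hypothetical backup directory
    for DB in key.db word.db; do     # hypothetical database file names
        :                            # stand-in for db_verify
    done
    echo "$DB"                       # no longer the backup directory
}

# A separate loop variable leaves the directory setting intact.
safe() {
    DB=/tmp/backup_db
    local dbfile
    for dbfile in key.db word.db; do
        :
    done
    echo "$DB"
}
```

Calling clobbered prints the last loop value (word.db), while safe still prints /tmp/backup_db.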