checksum woes

From: Tod Oace
Subject: checksum woes
Date: Wed, 24 Dec 2003 15:05:51 -0800

A couple of weeks ago I posted a message about trouble I'm having with type=checksum network copies occasionally firing off when files have not changed on the server. In that post I wondered whether having cfagent and cfservd share a common checksum database was the source of the problem, and said I'd try separating them. I did that and it didn't help. I finally learned what AddInstallable is for and added AddInstallable definitions for my define=actions, and that didn't help either. So today I dug into it in more detail.

I captured cfservd debugging output into a file until one of my network copies went awry. Here's the output:

Received: [MD5 /var/cfengine/cvsexport/usr/local/etc/] on socket 5
CompareLocalChecksums(/var/cfengine/cvsexport/usr/local/etc/,MD5=865e7d51f8b89ae442566225ebe723a2)
ChecksumChanged: key /var/cfengine/cvsexport/usr/local/etc/ with data MD5=865e7d51f8b89ae442566225ebe723a2
Checksum up to date..
Storing checksum for /var/cfengine/cvsexport/usr/local/etc/ in database MD5=865e7d51f8b89ae442566225ebe723a2
Checksums didn't match

Then I matched this output against the 2.1.0p1 code. If I'm reading it correctly, the checksums are the same on both sides, but cfservd could not find the checksum in its database. So cfservd stored the checksum in its database and triggered a copy, even though the file has not changed since yesterday morning and the checksums on both sides are apparently the same.

So it seems like I have two problems:

1. cfservd should compare the local and remote checksums and give a response based on that even when it can't find a checksum in its database.

2. Entries in my checksum databases seem to keep disappearing. The checksum lookups in the surrounding debug output succeed, so it's not as if the whole database is failing all at once. My cfservd instances are using Berkeley DB 4.1.25 on Red Hat 7.1.2 (ewww) with Cfengine 2.1.0p1.
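
To make the behavior asked for in point 1 concrete, here is a minimal sketch in Python. It is not the cfservd code; the function names and the plain dict standing in for the checksum database are hypothetical. The idea is only that the reply depends on a fresh comparison of the local and remote digests, while a missing database entry is repaired as a side effect.

```python
import hashlib


def file_md5(path):
    """Return the hex MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_match(path, remote_digest, cache):
    """Hypothetical server-side check.

    `cache` is a dict standing in for the checksum database. The return
    value depends only on the local and remote digests, never on whether
    the cache already held an entry for `path`.
    """
    local_digest = file_md5(path)
    if cache.get(path) != local_digest:
        # Missing or stale entry: refresh it as bookkeeping only.
        cache[path] = local_digest
    return local_digest == remote_digest
```

With this shape, a vanished database entry costs one extra store but does not by itself trigger a copy.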

Maybe I'll try running without the checksum database for a while to confirm that it's a database problem.
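
Another way to narrow it down might be to scan the captured debug output for paths that get stored more than once. The sketch below assumes the debug output has one message per line and that the "Storing checksum for ... in database" wording matches the output quoted above; the function names are made up for illustration. A path that shows up repeatedly is only a candidate, since a file that really changed would also be stored again.

```python
import re
from collections import Counter

# Matches lines such as:
#   Storing checksum for /some/path in database MD5=...
STORE_LINE = re.compile(r"^Storing checksum for (.+?) in database\b")


def count_store_events(lines):
    """Count how many times each path was stored in the database."""
    counts = Counter()
    for line in lines:
        match = STORE_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def repeatedly_stored(lines):
    """Return the sorted paths that were stored more than once."""
    counts = count_store_events(lines)
    return sorted(path for path, n in counts.items() if n > 1)
```

It could be run over a saved log with something like `repeatedly_stored(open("cfservd-debug.log"))`, where the file name is just a placeholder.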

Does anyone have any other ideas? Thanks...

Tod Oace, Intel Corporation

