Re: checksum woes

From: Tod Oace
Subject: Re: checksum woes
Date: Fri, 30 Jan 2004 10:47:12 -0800

This reminds me of a bug that was in an old version. Are you
up to date with upgrades?

Frank's problem seems different from the one I was experiencing. Mine was just copies sporadically misfiring. To follow up: I had reported that this was still happening even after I disabled the checksum databases (client and server). What I actually found was that once I killed and restarted all my cfservd's, the problem disappeared completely.

So my problem was that sometimes the checksum database lookups would not find data when they should have. I've been meaning to try BerkeleyDB 4.2 and see if that helps. The change list between 4.1 and 4.2 looked pretty long. I was and still am using db-4.1.25 with Cfengine 2.1.0p1.

I have an objection to how cfservd reacts to the lookup failure. When the database lookup fails, cfservd tells cfagent that the checksum has changed, and cfagent goes ahead with its copy even though the file may be exactly the same. Ideally BerkeleyDB wouldn't ever fail, but if it does, or if you blow away your checksum database, then cfservd causes unnecessary copies because it's not comparing the local and remote checksums.

If cfservd can't do the database lookup it should compute and compare the checksum before stating that it is different. It looks like misc.c:ChecksumChanged already computes and stores a checksum on the cfservd side. ChecksumChanged could compute the checksum a bit earlier on and then use that result for a comparison. If the checksums are equal then it should stash the checksum in the database and report the checksums as equal.
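
A rough sketch of the fallback described above, written in Python rather than cfengine's C. It is an illustration under assumptions, not the code in misc.c: a plain dict stands in for the BerkeleyDB checksum database, MD5 is assumed as the digest, and the names file_md5 and checksum_changed are made up for this example.

```python
import hashlib


def file_md5(path):
    """Return the hex MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_changed(path, client_digest, db):
    """Return True only if the server's file differs from the client's copy.

    db is a dict standing in for the checksum database (hypothetical).
    Refreshing stale entries is out of scope for this sketch.
    """
    stored = db.get(path)
    if stored is not None:
        # Normal case: the lookup worked, so compare against the stored value.
        return stored != client_digest

    # Lookup failed (or the database was wiped): compute the checksum now
    # instead of assuming the file has changed.
    current = file_md5(path)
    if current == client_digest:
        # Same content on both sides: remember the checksum, report "equal".
        db[path] = current
        return False
    return True
```

With this shape, an empty or freshly deleted database costs one checksum computation per file on the server, but it no longer triggers a copy of a file that is identical on both ends.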

Again, I'm looking at 2.1.0p1. My apologies if you've already reworked this in 2.1.1. See my 2003-Dec-24 post for more details, including debug output: -8&

Hopefully Frank's problem can be solved with an upgrade.   -Tod


Tod Oace wrote:
A couple weeks ago I posted a message about trouble I'm having with
type=checksum network copies occasionally firing off when files have
not changed on the server.

I'd be VERY interested to hear if you solve this one. I'm having the
EXACT same issue on one of my servers.  The difference in my case is
that I'm not using a checksum database of any kind. All the checksums
get computed in real-time (server-side AND client-side).

Well that's disturbing/interesting. Yesterday I tried disabling the
checksum database on the server side and have still been seeing the
problems. So earlier today I disabled it on the client side and have
seen a couple more cases of it since then. I'm not sure if it's slowed
down any, but I'll know for sure tomorrow. I've been tracking one
particular problem for the past couple weeks and have a good baseline.

I've been beating my head against the wall on this for a while.

I'm glad I'm not the only one. I guess.  :)

I'll try and capture and analyze more cfservd debug output soon.

I am having the same problem. However it is happening every time on some
copies. Not only that, it then tries to save the file in
/var/spool/cfengine, and finds an entry already there. It then
recursively moves the saved files, and after a while I get files with
multiple instances of _var_spool_cfengine at the beginning and umpteen
.cfsaved extensions on the end.

I haven't looked into the problem yet. I only found it because 'locate' was segfaulting. Doing `locate '*' | tail` showed the segfault occurring
after printing some of the overlong cfengine spool files.

It is interesting that the extraneous copies occur regardless of the
checksum database. I suspected that the problem was related to the
unsafe concurrent access to the checksum DB. It appears not. One of the
files that gets copied every time is nedit. The destination is
/usr/local/bin. There is definitely an entry for nedit in the checksum
database. The database can be examined using db_dump with the -p option
to show human readable output instead of hexified text.
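
For anyone who wants to script that check, here is a minimal Python sketch that parses the text produced by db_dump -p. It assumes the usual layout: header lines up to HEADER=END, then alternating key and data lines each indented by one space, ending at DATA=END. The function name parse_db_dump and the sample text are invented for illustration; the sample is not real output, and the assumption that the key is the file path is mine. Backslash escapes for non-printable bytes are left undecoded.

```python
def parse_db_dump(text):
    """Parse `db_dump -p` style text into a {key: value} dict of strings."""
    records = {}
    in_data = False
    pending_key = None
    for line in text.splitlines():
        if line == "HEADER=END":
            in_data = True
            continue
        if line == "DATA=END":
            break
        if not in_data:
            continue  # skip header lines such as VERSION= and type=
        # Key and data lines carry a single leading space.
        item = line[1:] if line.startswith(" ") else line
        if pending_key is None:
            pending_key = item
        else:
            records[pending_key] = item
            pending_key = None
    return records


# Made-up sample in the assumed layout (not real db_dump output).
sample = """VERSION=3
format=print
type=hash
HEADER=END
 /usr/local/bin/nedit
 placeholder-digest
DATA=END
"""

entries = parse_db_dump(sample)
print("/usr/local/bin/nedit" in entries)
```

Feeding it the saved output of db_dump -p on the checksum database would give a quick way to confirm whether a given path has an entry.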

Since the problem is solid for me I will try and duplicate it with the
smallest config file I can manage. Then I should be able to do full
debugging, trussing, network snooping, etc.

Frank Ranner

Help-cfengine mailing list

Work: +47 22453272            Email:  address@hidden
Fax : +47 22453205            WWW  :

Tod Oace, Intel Corporation <address@hidden>