From: Tod Oace
Subject: Re: checksum woes
Date: Fri, 30 Jan 2004 10:47:12 -0800
This reminds me of a bug that was in an old version. Are you up to date with upgrades?
Frank's problem seems different from the one I was experiencing. I was just seeing copies sporadically misfire. And to follow up: I had reported that this was still happening even after I disabled the checksum databases (client and server). What I actually found was that after I killed and restarted all my cfservd processes, the problem completely disappeared.
So my problem was that sometimes the checksum database lookups would not find data when they should have. I've been meaning to try BerkeleyDB 4.2 and see if that helps. The change list between 4.1 and 4.2 looked pretty long. I was and still am using db-4.1.25 with Cfengine 2.1.0p1.
I have an objection to how cfservd reacts to the lookup failure. When the database lookup fails, cfservd tells cfagent that the checksum has changed and cfagent goes ahead with its copy, even though the file may be exactly the same. Ideally BerkeleyDB wouldn't ever fail, but if it does, or if you blow away your checksum database, then cfservd causes unnecessary copies because it's not comparing the local and remote checksums.
If cfservd can't do the database lookup it should compute and compare the checksum before stating that it is different. It looks like misc.c:ChecksumChanged already computes and stores a checksum on the cfservd side. ChecksumChanged could compute the checksum a bit earlier on and then use that result for a comparison. If the checksums are equal then it should stash the checksum in the database and report the checksums as equal.
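To make the suggestion concrete, here is a minimal sketch in Python rather than the C in misc.c. It only models the behaviour as I've described it above and is not the actual Cfengine code. The function names, the plain dict standing in for the BerkeleyDB checksum database, and the use of MD5 are all assumptions for illustration.

```python
import hashlib


def file_checksum(data: bytes) -> str:
    """Checksum of the server-side file contents (MD5 is an assumption here)."""
    return hashlib.md5(data).hexdigest()


def checksum_changed_current(db: dict, filename: str, data: bytes,
                             client_checksum: str) -> bool:
    """Model of the behaviour described above: a failed lookup counts as a change."""
    stored = db.get(filename)
    if stored is None:
        # Lookup failed: store a fresh checksum, but still report "changed",
        # which triggers a copy even when the file is identical.
        db[filename] = file_checksum(data)
        return True
    return stored != client_checksum


def checksum_changed_proposed(db: dict, filename: str, data: bytes,
                              client_checksum: str) -> bool:
    """Proposed behaviour: on a failed lookup, compute first, then compare."""
    stored = db.get(filename)
    if stored is None:
        # Lookup failed: compute the checksum now and stash it in the database.
        stored = file_checksum(data)
        db[filename] = stored
    # Only report a change if the checksums really differ.
    return stored != client_checksum
```

With an empty database and an identical file on both sides, the first function reports a change and the second does not, which is the unnecessary copy I'm objecting to.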
Again, I'm looking at 2.1.0p1. My apologies if you've already reworked this in 2.1.1. See my 2003-Dec-24 post for more details, including debug output:
http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=mailman.1599.1075460058.928.help-cfengine%40gnu.org&prev=/groups%3Fgroup%3Dgnu.cfengine.help
Hopefully Frank's problem can be solved with an upgrade. -Tod
Mark

Tod Oace wrote:

A couple weeks ago I posted a message about trouble I'm having with type=checksum network copies occasionally firing off when files have not changed on the server.

I'd be VERY interested to hear if you solve this one. I'm having the EXACT same issue on one of my servers. The difference in my case is that I'm not using a checksum database of any kind. All the checksums get computed in real-time (server-side AND client-side).

Well that's disturbing/interesting. Yesterday I tried disabling the checksum database on the server side and have still been seeing the problems. So earlier today I disabled it on the client side and have seen a couple more cases of it since then. I'm not sure if it's slowed down any, but I'll know for sure tomorrow. I've been tracking one particular problem for the past couple weeks and have a good baseline.

I've been beating my head against the wall on this for a while.

I'm glad I'm not the only one. I guess. :) I'll try and capture and analyze more cfservd debug output soon.

I am having the same problem. However it is happening every time on some copies. Not only that, it then tries to save the file in /var/spool/cfengine, and finds an entry already there. It then recursively moves the saved files, and after a while I get files with multiple instances of _var_spool_cfengine at the beginning and umpteen .cfsaved extensions on the end.

I haven't looked into the problem yet. I only found it because 'locate' was segfaulting. Doing `locate '*' | tail` showed the segfault occurring after printing some of the overlong cfengine spool files.

It is interesting that the extraneous copies occur regardless of the checksum database. I suspected that the problem was related to the unsafe concurrent access to the checksum DB. It appears not. One of the files that gets copied every time is nedit. The destination is /usr/local/bin. There is definitely an entry for nedit in the checksum database. The database can be examined using db_dump with the -p option to show human readable output instead of hexified text.

Since the problem is solid for me I will try and duplicate it with the smallest config file I can manage. Then I should be able to do full debugging, trussing, network snooping, etc.

Regards,
Frank Ranner

_______________________________________________
Help-cfengine mailing list
Help-cfengine@gnu.org
http://mail.gnu.org/mailman/listinfo/help-cfengine

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Work: +47 22453272   Email: Mark.Burgess@iu.hio.no
Fax : +47 22453205   WWW : http://www.iu.hio.no/~mark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- Tod Oace, Intel Corporation <tod@intel.com>