From: Frank Ranner
Subject: Re: checksum woes
Date: Sat, 31 Jan 2004 18:23:22 +1100
User-agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)

Tod Oace wrote:

I created a mini-config that duplicated the problem and then ran cfservd and cfagent in debug/verbose mode. It was reporting an MD5 mismatch, even though the source and dest files were the same. I used a standalone md5 program to compute the checksum and verified that that was what was reported in the trace. I also db_dumped the database and verified that the filepath was present and the checksum matched.

This reminds me of a bug that was in an old version. Are you up to date with upgrades?

Frank's problem seems different than the one I was experiencing. I was just experiencing copies sporadically misfiring. And to follow up... I had reported that this was still happening even after I disabled checksum databases (client and server). Actually what I found was that after I killed and restarted all my cfservd's the problem completely disappeared.

So my problem was that sometimes the checksum database lookups would not find data when they should have. I've been meaning to try BerkeleyDB 4.2 and see if that helps. The change list between 4.1 and 4.2 looked pretty long. I was and still am using db-4.1.25 with Cfengine 2.1.0p1.

I have an objection to how cfservd reacts to the lookup failure. When the database lookup fails cfservd tells cfagent that the checksum has changed and cfagent goes ahead with its copy, even though the file may be exactly the same. Ideally BerkeleyDB wouldn't ever fail, but if it does, or if you blow away your checksum database, then cfservd causes unnecessary copies because it's not comparing the local and remote checksums.

If cfservd can't do the database lookup it should compute and compare the checksum before stating that it is different. It looks like misc.c:ChecksumChanged already computes and stores a checksum on the cfservd side. ChecksumChanged could compute the checksum a bit earlier on and then use that result for a comparison. If the checksums are equal then it should stash the checksum in the database and report the checksums as equal.

Again, I'm looking at 2.1.0p1. My apologies if you've already reworked this in 2.1.1. See my 2003-Dec-24 post for more details, including debug output:

http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&threadm=mailman.1599.1075460058.928.help-cfengine%40gnu.org&prev=/groups%3Fgroup%3Dgnu.cfengine.help

Hopefully Frank's problem can be solved with an upgrade.

-Tod

Mark

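Tod's suggestion above (compute and compare the checksum when the database lookup fails, rather than reporting a change straight away) can be sketched roughly as follows. This is Python, not cfengine's C, and it is only an illustration of the idea: the names `md5_of` and `checksum_changed` and the plain dict standing in for the checksum database are hypothetical, not taken from misc.c.

```python
import hashlib


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


def checksum_changed(path: str, local_data: bytes, remote_digest: str, db: dict) -> bool:
    """Report a change only when the digests really differ.

    `db` is a stand-in for the checksum database (hypothetical: a dict
    mapping file path to hex digest).
    """
    local_digest = db.get(path)
    if local_digest is None:
        # Lookup failed (or the database was wiped): compute the digest
        # now instead of assuming the file has changed, and stash it.
        local_digest = md5_of(local_data)
        db[path] = local_digest
    return local_digest != remote_digest


# Usage: an empty database no longer forces a copy of an identical file.
db = {}
data = b"same contents on both sides"
print(checksum_changed("/usr/local/bin/nedit", data, md5_of(data), db))
```

With this shape a missing database entry costs one extra digest computation instead of an unnecessary copy.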
While looking at syslog I noticed a lot of cfenvd errors complaining about the database. This led me to the conclusion that I had mixed up db versions. The cf programs were linked with db-3.3, but somewhere along the way I had done a test version linked with db-4.2 (while trying to solve the database corruption/crash problem). Of course cfengine treated my test version as damage and replaced it with the old versions, which then didn't like the database.
I have since compiled and relinked the programs against db-4.2 and put them into the distribution. The extraneous copies appear to have stopped.
However I still believe that the checksum database access needs work. Sleepycat documentation states that you need to set up an environment element and provide that environment to all instances of db_create, if you want to use multi-reader/single-writer operation. That will be a bit of work to set up. In the meantime I may just put a big pthread lock around the call to ChecksumChanged.
Regards, Frank Ranner
Tod Oace wrote:

A couple weeks ago I posted a message about trouble I'm having with type=checksum network copies occasionally firing off when files have not changed on the server.

I'd be VERY interested to hear if you solve this one. I'm having the EXACT same issue on one of my servers. The difference in my case is that I'm not using a checksum database of any kind. All the checksums get computed in real-time (server-side AND client-side).

Well that's disturbing/interesting. Yesterday I tried disabling the checksum database on the server side and have still been seeing the problems. So earlier today I disabled it on the client side and have seen a couple more cases of it since then. I'm not sure if it's slowed down any, but I'll know for sure tomorrow. I've been tracking one particular problem for the past couple weeks and have a good baseline.

I've been beating my head against the wall on this for a while.

I'm glad I'm not the only one. I guess. :) I'll try and capture and analyze more cfservd debug output soon.

I am having the same problem. However it is happening every time on some copies. Not only that, it then tries to save the file in /var/spool/cfengine, and finds an entry already there. It then recursively moves the saved files, and after a while I get files with multiple instances of _var_spool_cfengine at the beginning and umpteen .cfsaved extensions on the end. I haven't looked into the problem yet. I only found it because 'locate' was segfaulting. Doing `locate '*' | tail` showed the segfault occurring after printing some of the overlong cfengine spool files.

It is interesting that the extraneous copies occur regardless of the checksum database. I suspected that the problem was related to the unsafe concurrent access to the checksum DB. It appears not. One of the files that gets copied every time is nedit. The destination is /usr/local/bin. There is definitely an entry for nedit in the checksum database. The database can be examined using db_dump with the -p option to show human readable output instead of hexified text.

Since the problem is solid for me I will try and duplicate it with the smallest config file I can manage. Then I should be able to do full debugging, trussing, network snooping, etc.

Regards, Frank Ranner

_______________________________________________
Help-cfengine mailing list
Help-cfengine@gnu.org
http://mail.gnu.org/mailman/listinfo/help-cfengine

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Work: +47 22453272 Email: Mark.Burgess@iu.hio.no
Fax : +47 22453205 WWW : http://www.iu.hio.no/~mark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~