help-cfengine

Client and server locking up on network copies


From: Jason Cater
Subject: Client and server locking up on network copies
Date: Fri, 23 Jul 2004 16:57:58 -0500
User-agent: KMail/1.6.2

Greetings, 

I've been using cfengine for about 6 months now. In the last few months 
(I don't really recall when), I've started having trouble with the client
machines not being able to copy files. The client and server both just 
hang on the first copy attempt, and don't ever time out. 

However, if I killall -9 cfservd on the server, then restart it, sometimes 
(but not usually) one or two clients will be able to copy all their files. 
But after a few attempts, even that client will start to hang. Sometimes 
the unlucky client that initiates the hang will get part of the way into 
copying, and at some arbitrary point it starts hanging.  In other words, 
there's no predictability to when it will start, only that once it starts, 
it's next to impossible to get it working again. 

Also, I literally have to kill -9 the cfservd processes -- a simple kill, 
or an /etc/init.d/cfengine stop, does nothing. 
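
The stop sequence described above (try a plain kill first, fall back to
kill -9 only when that is ignored) can be sketched as follows. This is only
an illustration, assuming the process was started from the same script so
that a Popen handle is available; stop_process and grace_seconds are
hypothetical names, not part of cfengine.

```python
import signal
import subprocess


def stop_process(proc, grace_seconds=5.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    proc is a subprocess.Popen handle. Returns the name of the signal
    that actually ended the process.
    """
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # SIGKILL cannot be caught or ignored, so this always ends the process.
        proc.kill()
        proc.wait()
        return "SIGKILL"
```

A return value of "SIGKILL" corresponds to what I am seeing with cfservd:
the process does not react to the ordinary termination signal at all.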

Also, as best I can tell, the *first* time I run cfagent on a new server, 
it copies all files.  Subsequent runs risk a lockup, however.  I haven't 
set up a lot of new servers lately, so this could be coincidence. 

Before getting into program output, here's my configuration: 

  Clients are running either Debian Sarge (mostly), Debian Woody, or 
  SuSE Enterprise 8 on x86 hardware. This behavior is consistent across
  all my clients, so it's not distro-specific. Originally, all the Debian 
  machines were using the cfengine 2.1.0 .debs, but since this started, 
  I have upgraded them to the latest 2.1.7p1 to see if that made a 
  difference (it didn't).
  
  Server is a Debian sarge machine running a compiled cfengine 2.1.7p1. 
  It originally ran the .debs, but I upgraded to see if it fixed this 
  problem (it didn't.)
  
  Clients and servers are linked against OpenSSL 0.9.7 and BerkeleyDB 4.1.

The cfengine server (thumper) is also a client, and displays the same 
behavior. I've tried cfagent with the -K option, but that didn't make a 
difference. 

What's so frustrating is that the clients will work about 10% of the time 
after I kill all the daemons and try fresh. Then one will just randomly lock 
up, and after that nothing works for any client.  So I think my configuration 
settings (grants, AllowUsers, etc.), file permissions, etc., are OK. 

I'm at a loss. 

-- Jason   

-------------------------------------------------------------------

cfagent -v on a client machine returns: 

  <snip>
  *********************************************************************
   Update Sched: copy pass 1 @ Fri Jul 23 16:26:54 2004
  *********************************************************************

  Checking copy from \
     cfengine.ncsmags.com:/usr/local/checkouts/cfengine/inputs  \
     to /var/lib/cfengine2/inputs
  Connect to cfengine.ncsmags.com = 192.168.0.254 on port cfengine
  Loaded /var/lib/cfengine2/ppkeys/root-192.168.0.254.pub

And just hangs there. 
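
Since the verbose output gets as far as "Connect to ..." and loading the
server's public key, the TCP connection itself appears to succeed and the
hang comes later. A small sketch for confirming that separately from
cfagent is below. tcp_port_open is a hypothetical helper; it only tests
whether a TCP connect completes within a timeout and knows nothing about
the cfengine protocol.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

For example, tcp_port_open("cfengine.ncsmags.com", 5308) would probe the
server, assuming the "cfengine" service name maps to port 5308 in
/etc/services (the usual registration, but check the local file).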

-------------------------------------------------------------------
  
The server will hang at this point (from /usr/sbin/cfservd -d -v -F): 
  
  <snip>
  *** New socket [5]
  New connection...(from 192.168.2.253/5)
  Spawning new thread...
  Checking file updates on /var/cfengine/inputs/cfservd.conf (41018293/41018297)
  RecvSocketStream(8)
      (Concatenated 8 from stream)
  Transaction Receive [t 46][]
  RecvSocketStream(46)
      (Concatenated 46 from stream)
  Received: [CAUTH 192.168.2.253 backups.ncsmags.com root 0] on socket 5
  Connecting host identifies itself as 192.168.2.253 backups.ncsmags.com root 0
  (ipstring=[192.168.2.253],fqname=[backups.ncsmags.com],username=[root],socket=[192.168.2.253])
  cfservd: Allowing 192.168.2.253 to connect without (re)checking ID
  Non-verified Host ID is backups.ncsmags.com (Using skipverify)
  Non-verified User ID seems to be root (Using skipverify)
  IPV4 address
  sockaddr_ntop(192.168.2.253)
  Found address (192.168.2.253) for host backups.ncsmags.com
  Updating last-seen time for backups.ncsmags.com
  RecvSocketStream(8)
      (Concatenated 8 from stream)
  Transaction Receive [t 280][]
  RecvSocketStream(280)
      (Concatenated 280 from stream)
  Received: [SAUTH y 256 37] on socket 5
  Challenge encryption = y, nonce = 37, buf = 256

-------------------------------------------------------------------
  
If I run the client as "cfagent -d1 -v", this is what is returned: 

  *********************************************************************
  Update Sched: copy pass 1 @ Fri Jul 23 16:36:01 2004
  *********************************************************************
  
  (BuildClassEnvironment)
  Actionsequence item copy
  New server connection...
  Checking copy from cfengine.ncsmags.com:/usr/local/checkouts/cfengine/inputs to /var/lib/cfengine2/inputs
  Opening server connnection to cfengine.ncsmags.com
  IPV4 address
  sockaddr_ntop(192.168.0.254)
  Connect to cfengine.ncsmags.com = 192.168.0.254 on port cfengine
  IPV4 address
  sockaddr_ntop(192.168.0.254)
  IPV4 address
  sockaddr_ntop(192.168.2.253)
  Identifying this agent as 192.168.2.253 i.e. backups.ncsmags.com, with signature 0
  SENT:::CAUTH 192.168.2.253 backups.ncsmags.com root 0
  Transaction Send[t 46][Packed text]
  Attempting to send 54 bytes
  SendSocketStream, sent 54
  KeyAuthentication()
  Havekey(root-192.168.0.254)
  Loaded /var/lib/cfengine2/ppkeys/root-192.168.0.254.pub
  Transaction Send[t 280][Packed text]
  Attempting to send 288 bytes
  SendSocketStream, sent 288
  Transaction Send[t 261][Packed text]
  Attempting to send 269 bytes
  SendSocketStream, sent 269
  Transaction Send[t 5][Packed text]
  Attempting to send 13 bytes
  SendSocketStream, sent 13
  RecvSocketStream(8)
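
Reading the two logs side by side, every "Transaction Send[t N]" is
followed by a send of N + 8 bytes (54 = 46 + 8, 288 = 280 + 8, 269 = 261 + 8,
13 = 5 + 8), and the server reads 8 bytes before each payload. So each
message looks like an 8-byte header followed by the payload, and the client
above is blocked waiting for the 8-byte header of a reply that never
arrives. A minimal sketch of that kind of framing, with a receive timeout
so a stalled peer raises an error instead of blocking forever, is below.
The header layout (ASCII "t <length>", NUL-padded to 8 bytes) is an
assumption inferred from the "[t 46]" log text, and all function names are
hypothetical; this is not the cfengine source.

```python
import socket

HEADER_LEN = 8  # inferred from the log: bytes sent = payload length + 8


def recv_exact(sock, n):
    """Read exactly n bytes; raise if the peer closes or the socket times out."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_transaction(sock, payload, status=b"t"):
    """Send one framed message: fixed-size header, then the payload."""
    # Assumed header layout: ASCII "<status> <length>", NUL-padded to 8 bytes.
    header = status + b" " + str(len(payload)).encode("ascii")
    if len(header) > HEADER_LEN:
        raise ValueError("payload too large for this header layout")
    sock.sendall(header.ljust(HEADER_LEN, b"\0") + payload)


def recv_transaction(sock, timeout=10.0):
    """Receive one framed message; socket.timeout is raised if the peer stalls."""
    sock.settimeout(timeout)
    header = recv_exact(sock, HEADER_LEN).rstrip(b"\0")
    status, length = header.split(b" ", 1)
    return status, recv_exact(sock, int(length))
```

With a timeout like this on both ends, the symptom would be an error after
a few seconds rather than two processes waiting on each other
indefinitely, which is what the logs above look like.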


-------------------------------------------------------------------
Also, some other tidbits: 

----------------------------------
Master machine:
----------------------------------

  thumper:/etc/cfengine# ldd /usr/sbin/cfservd
        libdb-4.1.so => /usr/lib/libdb-4.1.so (0x40023000)
        libcrypto.so.0.9.7 => /usr/lib/i686/cmov/libcrypto.so.0.9.7 \
            (0x400e4000)
        libnss_nis.so.2 => /lib/libnss_nis.so.2 (0x401d5000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x401df000)
        libm.so.6 => /lib/libm.so.6 (0x40230000)
        libc.so.6 => /lib/libc.so.6 (0x40252000)
        libdl.so.2 => /lib/libdl.so.2 (0x40384000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x40387000)
        libnss_files.so.2 => /lib/libnss_files.so.2 (0x4039c000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

- - - - - - - - - - - - - - -

thumper:/etc/cfengine# cat /var/lib/cfengine2/inputs/cfservd.conf
  control:
    domain = ( ncsmags.com )
    sysadm = ( tech@ncsmags.com )
    solaris::
      cfrunCommand = ( "/usr/local/bin/cfagent" )
    linux::
      cfrunCommand  = ( "/usr/sbin/cfagent" )
    any::
    IfElapsed = ( 1 )
    MaxConnections = ( 100 )
    AllowUsers    = ( root )
    TrustKeysFrom = ( 192.168. )
  grant:
    /usr/local/checkouts/cfengine/inputs/            192.168.
    /usr/local/checkouts/cfengine/helpers/           192.168.
    /usr/local/checkouts/cfengine/masterfiles/       192.168.
    /usr/local/checkouts/cfengine-test/inputs/       192.168.
    /usr/local/checkouts/cfengine-test/helpers/      192.168.
    /usr/local/checkouts/cfengine-test/masterfiles/  192.168.
    /usr/sbin/cfagent                                thumper.ncsmags.com



-------------------------------------------------------------------

----------------------------------
Client machine: 
----------------------------------
backups:~# ldd /usr/sbin/cfagent
        libdb-4.1.so => /usr/lib/libdb-4.1.so (0x4001b000)
        libcrypto.so.0.9.7 => /usr/lib/i686/cmov/libcrypto.so.0.9.7 (0x400c9000)
        libnss_nis.so.2 => /lib/libnss_nis.so.2 (0x401c6000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x401d0000)
        libm.so.6 => /lib/libm.so.6 (0x40221000)
        libc.so.6 => /lib/libc.so.6 (0x40243000)
        libdl.so.2 => /lib/libdl.so.2 (0x40376000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x40379000)
        libnss_files.so.2 => /lib/libnss_files.so.2 (0x4038e000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

- - - - - - - - - - - - - - -

backups:~# cat /var/lib/cfengine2/inputs/cfagent.conf
  import:
    any::             cf.groups
                      cf.main
                      cf.printing
                      cf.firewall
                      cf.programs
                      cf.configuration
                      cf.mail
                      cf.backups
                      cf.workstations
                      cf.dns
                      cf.ltsp
                      cf.oracle
                      cf.workstations
                      cf.autofiles
    debian::          cf.apt

- - - - - - - - - - - - - - -

backups:~# cat /var/lib/cfengine2/inputs/cf.main
ignore:
   CVS
   .svn
control:
   site = ( ncsmags )
   netmask = ( 255.255.255.0 )
   timezone = ( CST6DST )
   domain = ( ncsmags.com )
   sysadm = ( sysadmin@ncsmags.com )
   smtpserver = ( smtp.ncsmags.com )
   masterhost  = ( cfengine.ncsmags.com )
   editfilesize = ( 200000 )
   # Support for TESTING vs PRODUCTION scripting
   TESTING::
     masterfiles = ( /usr/local/checkouts/cfengine-test/masterfiles )
   !TESTING::
     masterfiles = ( /usr/local/checkouts/cfengine/masterfiles )
   # Support for debian dpkg testing
   MySession = ( RandomInt(0,100000) )
   DebTmpFile = ( /tmp/cfengine.debs.$(MySession) )
   actionsequence =
         (
         resolve
         directories
         shellcommands.firstpass
         links
         copy
         editfiles
         files.Prepare
         required
         tidy
         disable
         files.Rest
         processes
         shellcommands.secondpass
         )

- - - - - - - - - - - - - - -



