Client and server locking up on network copies
From: Jason Cater
Subject: Client and server locking up on network copies
Date: Fri, 23 Jul 2004 16:57:58 -0500
User-agent: KMail/1.6.2
Greetings,
I've been using cfengine for about 6 months now. At some point in the last
few months (I don't recall exactly when), the client machines stopped being
able to copy files. The client and server both hang on the first copy
attempt and never time out.
However, if I killall -9 cfservd on the server, then restart it, sometimes
(but not usually) one or two clients will be able to copy all their files.
But after a few attempts, even that client will start to hang. Sometimes
the unlucky client that initiates the hang will get part of the way into
copying, and at some arbitrary point it starts hanging. In other words,
there's no predictability to when it will start, only that once it starts,
it's next to impossible to get it working again.
Also, I literally have to kill -9 the cfservd processes; a plain kill,
or an /etc/init.d/cfengine stop, does nothing.
Also, as best I can tell, the *first* time I run cfagent on a new server,
it copies all files. Subsequent runs, however, risk a lockup. I haven't
set up a lot of new servers lately, so this could be coincidence.
Before getting into program output, here's my configuration:
Clients are running either Debian Sarge (mostly), Debian Woody, or
SuSE Enterprise 8 on x86 hardware. This behavior is consistent across
all my clients, so it's not distro-specific. Originally, all the Debian
machines were using the cfengine 2.1.0 .debs, but since this started,
I have upgraded them to the latest 2.1.7p1 to see if that made a
difference (it didn't).
Server is a Debian sarge machine running a compiled cfengine 2.1.7p1.
It originally ran the .debs, but I upgraded to see if it fixed this
problem (it didn't).
Clients and servers are linked against OpenSSL 0.9.7 and BerkeleyDB 4.1.
The cfengine server (thumper) is also a client, and displays the same
behavior. I've tried cfagent with the -K option, but that didn't make a
difference.
What's so frustrating is that the clients work about 10% of the time after
killing all daemons and starting fresh. Then everything randomly locks up,
and after that it won't work for any client. So I think my configuration
settings (grants, AllowUsers, etc.), file permissions, etc., are OK.
I'm at a loss.
-- Jason
-------------------------------------------------------------------
cfagent -v on a client machine returns:
<snip>
*********************************************************************
Update Sched: copy pass 1 @ Fri Jul 23 16:26:54 2004
*********************************************************************
Checking copy from \
cfengine.ncsmags.com:/usr/local/checkouts/cfengine/inputs \
to /var/lib/cfengine2/inputs
Connect to cfengine.ncsmags.com = 192.168.0.254 on port cfengine
Loaded /var/lib/cfengine2/ppkeys/root-192.168.0.254.pub
And just hangs there.
-------------------------------------------------------------------
The server will hang at this point (from /usr/sbin/cfservd -d -v -F):
<snip>
*** New socket [5]
New connection...(from 192.168.2.253/5)
Spawning new thread...
Checking file updates on /var/cfengine/inputs/cfservd.conf (41018293/41018297)
RecvSocketStream(8)
(Concatenated 8 from stream)
Transaction Receive [t 46][]
RecvSocketStream(46)
(Concatenated 46 from stream)
Received: [CAUTH 192.168.2.253 backups.ncsmags.com root 0] on socket 5
Connecting host identifies itself as 192.168.2.253 backups.ncsmags.com root 0
(ipstring=[192.168.2.253],fqname=[backups.ncsmags.com],username=[root],socket=[192.168.2.253])
cfservd: Allowing 192.168.2.253 to connect without (re)checking ID
Non-verified Host ID is backups.ncsmags.com (Using skipverify)
Non-verified User ID seems to be root (Using skipverify)
IPV4 address
sockaddr_ntop(192.168.2.253)
Found address (192.168.2.253) for host backups.ncsmags.com
Updating last-seen time for backups.ncsmags.com
RecvSocketStream(8)
(Concatenated 8 from stream)
Transaction Receive [t 280][]
RecvSocketStream(280)
(Concatenated 280 from stream)
Received: [SAUTH y 256 37] on socket 5
Challenge encryption = y, nonce = 37, buf = 256
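The server stops logging right after it parses the SAUTH header, and the byte count suggests it had already received the whole challenge at that point. A minimal sketch, assuming (inferred from the logged sizes, not from cfengine documentation) that the SAUTH text fields sit in a fixed, NUL-padded 24-byte region ahead of the encrypted challenge; `parse_sauth` and `TEXT_REGION` are hypothetical names. Under that assumption the 280-byte transaction is exactly 24 + 256, so the hang would come after the read, not during it.

```python
# Hypothetical helper: split an SAUTH transaction payload into its text
# fields and the encrypted challenge that follows them.
# ASSUMPTION: the text fields are NUL-padded to a fixed 24-byte region;
# this is inferred from the logged sizes (24 + 256 = 280), not from docs.
TEXT_REGION = 24


def parse_sauth(payload: bytes) -> dict:
    """Parse b'SAUTH <enc> <buf> <nonce>' followed by the encrypted bytes."""
    text = payload[:TEXT_REGION].split(b"\0", 1)[0].decode("ascii")
    keyword, enc, buflen, nonce = text.split()
    if keyword != "SAUTH":
        raise ValueError("not an SAUTH transaction: %r" % keyword)
    challenge = payload[TEXT_REGION:]
    if len(challenge) != int(buflen):
        raise ValueError("expected %s challenge bytes, got %d" % (buflen, len(challenge)))
    return {"encryption": enc, "buf": int(buflen), "nonce": int(nonce),
            "challenge": challenge}


# Rebuild a payload shaped like the one in the server log: [t 280].
header = b"SAUTH y 256 37".ljust(TEXT_REGION, b"\0")
payload = header + bytes(256)  # zero bytes stand in for the real ciphertext
fields = parse_sauth(payload)
print(len(payload), fields["encryption"], fields["buf"], fields["nonce"])
# prints: 280 y 256 37
```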
-------------------------------------------------------------------
If I run the client as "cfagent -d1 -v", this is what is returned:
*********************************************************************
Update Sched: copy pass 1 @ Fri Jul 23 16:36:01 2004
*********************************************************************
(BuildClassEnvironment)
Actionsequence item copy
New server connection...
Checking copy from cfengine.ncsmags.com:/usr/local/checkouts/cfengine/inputs to /var/lib/cfengine2/inputs
Opening server connnection to cfengine.ncsmags.com
IPV4 address
sockaddr_ntop(192.168.0.254)
Connect to cfengine.ncsmags.com = 192.168.0.254 on port cfengine
IPV4 address
sockaddr_ntop(192.168.0.254)
IPV4 address
sockaddr_ntop(192.168.2.253)
Identifying this agent as 192.168.2.253 i.e. backups.ncsmags.com, with signature 0
SENT:::CAUTH 192.168.2.253 backups.ncsmags.com root 0
Transaction Send[t 46][Packed text]
Attempting to send 54 bytes
SendSocketStream, sent 54
KeyAuthentication()
Havekey(root-192.168.0.254)
Loaded /var/lib/cfengine2/ppkeys/root-192.168.0.254.pub
Transaction Send[t 280][Packed text]
Attempting to send 288 bytes
SendSocketStream, sent 288
Transaction Send[t 261][Packed text]
Attempting to send 269 bytes
SendSocketStream, sent 269
Transaction Send[t 5][Packed text]
Attempting to send 13 bytes
SendSocketStream, sent 13
RecvSocketStream(8)
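Every `Transaction Send[t N]` line is followed by `Attempting to send N+8 bytes`, and each receive starts with `RecvSocketStream(8)`. This suggests each transaction is an 8-byte header followed by an N-byte payload. A minimal sketch of that framing, assuming the header holds the status character and the decimal length separated by a space and NUL-padded to 8 bytes (a layout inferred from the `[t 46]` notation, not taken from cfengine's source); `frame`, `unframe` and `HEADER_LEN` are illustrative names. It reproduces the four sends in the client log. Read this way, the client's final `RecvSocketStream(8)` is a wait for the header of a server reply that never arrives.

```python
HEADER_LEN = 8  # inferred from the repeated "RecvSocketStream(8)" lines


def frame(status: str, payload: bytes) -> bytes:
    """Prefix payload with a NUL-padded '<status> <length>' header (assumed layout)."""
    header = ("%s %d" % (status, len(payload))).encode("ascii")
    if len(header) > HEADER_LEN:
        raise ValueError("payload too large for an 8-byte header")
    return header.ljust(HEADER_LEN, b"\0") + payload


def unframe(data: bytes) -> tuple:
    """Split one framed transaction back into (status, payload)."""
    text = data[:HEADER_LEN].split(b"\0", 1)[0].decode("ascii")
    status, length = text.split()
    return status, data[HEADER_LEN:HEADER_LEN + int(length)]


# The CAUTH line from the "SENT:::" log entry is exactly 46 bytes long.
cauth = b"CAUTH 192.168.2.253 backups.ncsmags.com root 0"
print(len(cauth), len(frame("t", cauth)))  # prints: 46 54

# Payload sizes from the client's "Transaction Send[t N]" lines.
for n in (46, 280, 261, 5):
    print("t %d -> %d bytes on the wire" % (n, len(frame("t", bytes(n)))))
```

The per-size loop prints 54, 288, 269 and 13 bytes, matching the client's `Attempting to send` lines.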
-------------------------------------------------------------------
Also, some other tidbits:
----------------------------------
Master machine:
----------------------------------
thumper:/etc/cfengine# ldd /usr/sbin/cfservd
libdb-4.1.so => /usr/lib/libdb-4.1.so (0x40023000)
libcrypto.so.0.9.7 => /usr/lib/i686/cmov/libcrypto.so.0.9.7 (0x400e4000)
libnss_nis.so.2 => /lib/libnss_nis.so.2 (0x401d5000)
libpthread.so.0 => /lib/libpthread.so.0 (0x401df000)
libm.so.6 => /lib/libm.so.6 (0x40230000)
libc.so.6 => /lib/libc.so.6 (0x40252000)
libdl.so.2 => /lib/libdl.so.2 (0x40384000)
libnsl.so.1 => /lib/libnsl.so.1 (0x40387000)
libnss_files.so.2 => /lib/libnss_files.so.2 (0x4039c000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
- - - - - - - - - - - - - - -
thumper:/etc/cfengine# cat /var/lib/cfengine2/inputs/cfservd.conf
control:
domain = ( ncsmags.com )
sysadm = ( tech@ncsmags.com )
solaris::
cfrunCommand = ( "/usr/local/bin/cfagent" )
linux::
cfrunCommand = ( "/usr/sbin/cfagent" )
any::
IfElapsed = ( 1 )
MaxConnections = ( 100 )
AllowUsers = ( root )
TrustKeysFrom = ( 192.168. )
grant:
/usr/local/checkouts/cfengine/inputs/ 192.168.
/usr/local/checkouts/cfengine/helpers/ 192.168.
/usr/local/checkouts/cfengine/masterfiles/ 192.168.
/usr/local/checkouts/cfengine-test/inputs/ 192.168.
/usr/local/checkouts/cfengine-test/helpers/ 192.168.
/usr/local/checkouts/cfengine-test/masterfiles/ 192.168.
/usr/sbin/cfagent thumper.ncsmags.com
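Both addresses in the failing exchange (client 192.168.2.253, server 192.168.0.254) fall inside the `192.168.` entries used for `grant:` and `TrustKeysFrom`. A minimal sketch of that check, assuming a partial address ending in `.` is matched as a plain string prefix of the connecting IP. That is an assumption about cfservd's matching rules, not something shown in this thread, and `host_matches` is a hypothetical name.

```python
def host_matches(pattern: str, ip: str) -> bool:
    """Assumed rule: a pattern ending in '.' matches any IP starting with it;
    anything else must match exactly."""
    if pattern.endswith("."):
        return ip.startswith(pattern)
    return ip == pattern


# Addresses taken from the client and server logs.
for ip in ("192.168.2.253", "192.168.0.254"):
    print(ip, host_matches("192.168.", ip))
```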
-------------------------------------------------------------------
----------------------------------
Client machine:
----------------------------------
backups:~# ldd /usr/sbin/cfagent
libdb-4.1.so => /usr/lib/libdb-4.1.so (0x4001b000)
libcrypto.so.0.9.7 => /usr/lib/i686/cmov/libcrypto.so.0.9.7 (0x400c9000)
libnss_nis.so.2 => /lib/libnss_nis.so.2 (0x401c6000)
libpthread.so.0 => /lib/libpthread.so.0 (0x401d0000)
libm.so.6 => /lib/libm.so.6 (0x40221000)
libc.so.6 => /lib/libc.so.6 (0x40243000)
libdl.so.2 => /lib/libdl.so.2 (0x40376000)
libnsl.so.1 => /lib/libnsl.so.1 (0x40379000)
libnss_files.so.2 => /lib/libnss_files.so.2 (0x4038e000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
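The two `ldd` listings differ only in their load addresses. A small sketch that strips the addresses and compares the resolved library paths, using the output pasted above with the wrapped libcrypto lines rejoined; `resolved_libs` is a hypothetical helper. This only confirms that both binaries resolve to the same paths. It does not show that the files at those paths are identical on both machines.

```python
import re

# ldd output for cfservd on thumper, as pasted in the message.
SERVER = """
libdb-4.1.so => /usr/lib/libdb-4.1.so (0x40023000)
libcrypto.so.0.9.7 => /usr/lib/i686/cmov/libcrypto.so.0.9.7 (0x400e4000)
libnss_nis.so.2 => /lib/libnss_nis.so.2 (0x401d5000)
libpthread.so.0 => /lib/libpthread.so.0 (0x401df000)
libm.so.6 => /lib/libm.so.6 (0x40230000)
libc.so.6 => /lib/libc.so.6 (0x40252000)
libdl.so.2 => /lib/libdl.so.2 (0x40384000)
libnsl.so.1 => /lib/libnsl.so.1 (0x40387000)
libnss_files.so.2 => /lib/libnss_files.so.2 (0x4039c000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
"""

# ldd output for cfagent on backups, as pasted in the message.
CLIENT = """
libdb-4.1.so => /usr/lib/libdb-4.1.so (0x4001b000)
libcrypto.so.0.9.7 => /usr/lib/i686/cmov/libcrypto.so.0.9.7 (0x400c9000)
libnss_nis.so.2 => /lib/libnss_nis.so.2 (0x401c6000)
libpthread.so.0 => /lib/libpthread.so.0 (0x401d0000)
libm.so.6 => /lib/libm.so.6 (0x40221000)
libc.so.6 => /lib/libc.so.6 (0x40243000)
libdl.so.2 => /lib/libdl.so.2 (0x40376000)
libnsl.so.1 => /lib/libnsl.so.1 (0x40379000)
libnss_files.so.2 => /lib/libnss_files.so.2 (0x4038e000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
"""

# "<name> => <path> (0x<load address>)"
LINE = re.compile(r"^(\S+) => (\S+) \(0x[0-9a-f]+\)$")


def resolved_libs(listing: str) -> dict:
    """Map each library name to its resolved path, ignoring load addresses."""
    libs = {}
    for line in listing.strip().splitlines():
        m = LINE.match(line.strip())
        if m:
            libs[m.group(1)] = m.group(2)
    return libs


server, client = resolved_libs(SERVER), resolved_libs(CLIENT)
print(server == client, len(server))  # prints: True 10
```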
- - - - - - - - - - - - - - -
backups:~# cat /var/lib/cfengine2/inputs/cfagent.conf
import:
any:: cf.groups
cf.main
cf.printing
cf.firewall
cf.programs
cf.configuration
cf.mail
cf.backups
cf.workstations
cf.dns
cf.ltsp
cf.oracle
cf.workstations
cf.autofiles
debian:: cf.apt
- - - - - - - - - - - - - - -
backups:~# cat /var/lib/cfengine2/inputs/cf.main
ignore:
CVS
.svn
control:
site = ( ncsmags )
netmask = ( 255.255.255.0 )
timezone = ( CST6DST )
domain = ( ncsmags.com )
sysadm = ( sysadmin@ncsmags.com )
smtpserver = ( smtp.ncsmags.com )
masterhost = ( cfengine.ncsmags.com )
editfilesize = ( 200000 )
# Support for TESTING vs PRODUCTION scripting
TESTING::
masterfiles = ( /usr/local/checkouts/cfengine-test/masterfiles )
!TESTING::
masterfiles = ( /usr/local/checkouts/cfengine/masterfiles )
# Support for debian dpkg testing
MySession = ( RandomInt(0,100000) )
DebTmpFile = ( /tmp/cfengine.debs.$(MySession) )
actionsequence =
(
resolve
directories
shellcommands.firstpass
links
copy
editfiles
files.Prepare
required
tidy
disable
files.Rest
processes
shellcommands.secondpass
)
- - - - - - - - - - - - - - -