
[rdiff-backup-users] Client dying "randomly" on 1.1.16


From: Oliver Hookins
Subject: [rdiff-backup-users] Client dying "randomly" on 1.1.16
Date: Fri, 22 Aug 2008 08:46:10 +1000
User-agent: Mutt/1.5.17+20080114 (2008-01-14)

Hi, I posted about this issue a while ago, but I'm having a hard time
diagnosing the problem because there is no logging on the client side. We
initiate backups from the backup server, and for some reason the client
process dies during the night, though not every night: sometimes only once
a week or less.

The only message we get from the operation is the following, printed when
the cron job ends:

Read from remote host exampleclient.backup: Connection timed out
Fatal Error: Lost connection to the remote system

I have put a small "wrapper" around /usr/bin/rdiff-backup on the client that
launches it through strace. Since a successful backup would generate a
massive strace log, I'm being very selective and only tracing "open" calls
at the moment. Here is the end of the output from last night:

open("/data/var.lib.pgsql/data/base/727661/727757", O_RDONLY|O_LARGEFILE) = 3
open("/data/var.lib.pgsql/data/base/727661/727759", O_RDONLY|O_LARGEFILE) = 3
--- SIGPIPE (Broken pipe) @ 0 (0) ---
--- SIGPIPE (Broken pipe) @ 0 (0) ---
--- SIGPIPE (Broken pipe) @ 0 (0) ---
open("/usr/bin/rdiff-backup.real", O_RDONLY|O_LARGEFILE) = 3
--- SIGPIPE (Broken pipe) @ 0 (0) ---
--- SIGPIPE (Broken pipe) @ 0 (0) ---
--- SIGPIPE (Broken pipe) @ 0 (0) ---
open("/usr/lib/python2.3/site-packages/rdiff_backup/Main.py", O_RDONLY|O_LARGEFILE) = 3
--- SIGPIPE (Broken pipe) @ 0 (0) ---
--- SIGPIPE (Broken pipe) @ 0 (0) ---
--- SIGPIPE (Broken pipe) @ 0 (0) ---
open("/usr/lib/python2.3/site-packages/rdiff_backup/log.py", O_RDONLY|O_LARGEFILE) = 3
--- SIGPIPE (Broken pipe) @ 0 (0) ---
--- SIGPIPE (Broken pipe) @ 0 (0) ---
--- SIGPIPE (Broken pipe) @ 0 (0) ---
open("/usr/lib/python2.3/site-packages/rdiff_backup/log.py", O_RDONLY|O_LARGEFILE) = 3
--- SIGPIPE (Broken pipe) @ 0 (0) ---
--- SIGPIPE (Broken pipe) @ 0 (0) ---
--- SIGPIPE (Broken pipe) @ 0 (0) ---
--- SIGPIPE (Broken pipe) @ 0 (0) ---
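
For reference, a wrapper of the kind described above could look roughly like
the sketch below. It is an illustration, not the script actually in use: the
path /usr/bin/rdiff-backup.real is taken from the trace, while the log file
location and the function names are assumptions made for the example.

```python
import os
import sys

# The real binary name appears in the trace above.
REAL_BINARY = "/usr/bin/rdiff-backup.real"
# Hypothetical log location, chosen only for this example.
TRACE_LOG = "/var/log/rdiff-backup-strace.log"


def build_strace_argv(args, real_binary=REAL_BINARY, trace_log=TRACE_LOG):
    """Return the argv that runs the real binary under strace."""
    return [
        "strace",
        "-e", "trace=open",  # log only open() calls to keep the trace small
        "-o", trace_log,     # write the trace to a file instead of stderr
        real_binary,
    ] + list(args)


def main():
    argv = build_strace_argv(sys.argv[1:])
    # Replace the wrapper process so stdin/stdout stay attached to the
    # connection from the backup server.
    os.execvp(argv[0], argv)
```

Installed in place of /usr/bin/rdiff-backup, such a script would end with a
call to main(). Writing the trace to a file with -o keeps strace's output
away from the pipe that rdiff-backup uses to talk to the server.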

The last two data files it accesses are postgres data files, one of 8KB and
one of 70MB. They don't seem out of the ordinary. From this trace I can't
figure out what is wrong. Maybe something here will trigger someone's memory.
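
Picking out those last data files from a long trace can be automated. The
sketch below is only an illustration and assumes the trace has the same line
format as the excerpt above. It stops at the first SIGPIPE line, because the
files opened after that point are rdiff-backup's own source files rather
than backup data. The function name is made up for the example.

```python
import re

# Matches the start of an strace open() line and captures the path.
OPEN_RE = re.compile(r'^open\("([^"]+)"')


def opens_before_first_sigpipe(lines, count=2):
    """Return the last `count` paths opened before the first SIGPIPE line."""
    paths = []
    for line in lines:
        if line.startswith("--- SIGPIPE"):
            break
        match = OPEN_RE.match(line)
        if match:
            paths.append(match.group(1))
    return paths[-count:]


sample = [
    'open("/data/var.lib.pgsql/data/base/727661/727757", O_RDONLY|O_LARGEFILE) = 3',
    'open("/data/var.lib.pgsql/data/base/727661/727759", O_RDONLY|O_LARGEFILE) = 3',
    '--- SIGPIPE (Broken pipe) @ 0 (0) ---',
    'open("/usr/bin/rdiff-backup.real", O_RDONLY|O_LARGEFILE) = 3',
]
for path in opens_before_first_sigpipe(sample):
    print(path)
```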

If there is a better syscall to trace to figure out why it is dying I'd
appreciate any hints. Of course it's quite likely the process is not dying
in a syscall, in which case perhaps I should run it through the Python
tracer... another scenario likely to cause a lot of output. Again any hints
are welcome.
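
One way to keep the Python tracer's output manageable might be to log only
function calls, and only those made inside rdiff-backup's own modules. The
following is a rough sketch using the standard sys.settrace hook. The
"rdiff_backup" filter string comes from the paths in the trace above; the
function name and the log handling are assumptions. It avoids newer syntax
but has not been tested on Python 2.3.

```python
import sys


def make_tracer(log, package_marker="rdiff_backup"):
    """Build a trace function that logs calls made in matching source files."""
    def tracer(frame, event, arg):
        code = frame.f_code
        if event == "call" and package_marker in code.co_filename:
            log.write("%s:%d %s\n"
                      % (code.co_filename, frame.f_lineno, code.co_name))
            log.flush()  # flush so the last entry survives an abrupt exit
        # Returning None disables per-line tracing inside the called
        # function, which keeps the log to one entry per call.
        return None
    return tracer
```

It would be enabled early in the program with something like
sys.settrace(make_tracer(open(path, "a"))), where path is a log file of
your choosing, and disabled again with sys.settrace(None).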

-- 
Regards,
Oliver Hookins
Anchor Systems



