rdiff-backup-users

Re: [rdiff-backup-users] rdiff-backup 'hangs' on certain file - update


From: Maarten Bezemer
Subject: Re: [rdiff-backup-users] rdiff-backup 'hangs' on certain file - update
Date: Sun, 16 May 2010 07:58:05 +0200 (CEST)

Hi,

Could you try making a new repo containing only the directory that includes the 'broken' file, and see if that fails too?
(How large is that file anyway?)

If that setup also fails, create a tar or use rsync to copy it to the backup server, and recreate the source tree there, to test if it works when running rdiff-backup locally.
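The two isolation tests described above could be run roughly as follows. This is only a sketch: the host name, the directory path and the scratch location under /tmp are placeholders, not values from this thread, and it assumes ssh access from the backup server to the source server.

```shell
#!/bin/sh
# Placeholders (assumptions, not from the thread): adjust to the real setup.
SRC_HOST="${SRC_HOST:-source.example.com}"
SRC_DIR="${SRC_DIR:-/path/to/dir-with-broken-file}"
SCRATCH="${SCRATCH:-/tmp/rdiff-test}"

# Test 1: back up only the suspect directory into a fresh repository,
# still going over the network (host::path is rdiff-backup's remote syntax).
remote_test() {
    rdiff-backup "$SRC_HOST::$SRC_DIR" "$SCRATCH/repo-remote"
}

# Test 2: copy the directory over with tar, then back it up locally,
# so that no rdiff-backup traffic crosses the network.
local_test() {
    mkdir -p "$SCRATCH/src" &&
    ssh "$SRC_HOST" "tar -C '$SRC_DIR' -cf - ." | tar -C "$SCRATCH/src" -xf - &&
    rdiff-backup "$SCRATCH/src" "$SCRATCH/repo-local"
}
```

Run remote_test first; if it hangs as well, run local_test. If the local run completes, the file itself is probably fine and the problem is more likely in the connection between the two hosts.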

I've seen a few cases (not with rdiff-backup but with other software) where the network hardware somewhere between the hosts was malfunctioning. Smaller packets went through fine, but larger packets got mangled and subsequently dropped. Interactive ssh worked fine, while scp of a big file hung.
Maybe setting the MTU to a smaller value can be a workaround for your problem.
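One way to check whether large packets survive the path is to send pings that may not be fragmented. The sketch below assumes IPv4 and the Linux iputils ping; the host name is a placeholder and the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical host name; replace with the server being backed up.
HOST="${HOST:-source.example.com}"

# ICMP payload that yields an IPv4 packet of exactly $1 bytes:
# subtract the 20-byte IPv4 header and the 8-byte ICMP header.
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send three pings sized to fill a packet of $1 bytes.
# -M do (Linux iputils ping) forbids fragmentation, so an oversized
# packet fails instead of being silently split.
probe_mtu() {
    ping -M do -c 3 -s "$(payload_for_mtu "$1")" "$HOST"
}
```

For example, compare probe_mtu 1500 with probe_mtu 1400. If only the smaller size gets replies, lowering the interface MTU as root with something like "ip link set dev eth0 mtu 1400" would be the workaround to try (eth0 is an assumed interface name).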

On the other hand, your strace results (I/O error) are not what I would expect. Did you try rebooting? Did you try memtest86+ to test your RAM?


HTH,

Maarten


On Sun, 16 May 2010, Danilo Godec wrote:

I removed 'rdiff-backup-data' for this host and tried the 'rdiff-backup'
again - it still stops and 'hangs' on the same file.

I interrupted the 'rdiff-backup' process (CTRL-C), removed all backed-up
files for this host and 'rsync'-ed it - without any problems.

Regards, Danilo


On 15.5.2010 12:42, Danilo Godec wrote:
Hi,

Recently my 'rdiff-backup' developed a weird problem: it 'hangs'
when backing up a certain file on one server. The other 40+ servers are OK;
it's just that one, and even there it has only been happening since May 3rd.

I can 'cat' the file on the originating server, and I can also 'scp' it to
the backup server without any problem or error. However, when
'rdiff-backup' gets to this file, it just 'hangs' and does nothing.

On the backup server I see the file 'rdiff-backup.tmp.22397', which seems
to be a partially transferred copy of the original file (524288 bytes vs.
785592 bytes for the original).
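A quick calculation on the two sizes quoted above shows that the partial file is exactly 512 KiB. That round number may hint that the transfer stalled at a buffer or chunk boundary rather than at a point determined by the file's content, but that is only an inference and nothing in the thread confirms it. The function name below is made up for illustration.

```shell
#!/bin/sh
# Report how the partial size relates to KiB boundaries and how much is missing.
size_report() {
    partial=$1
    original=$2
    echo "partial: $(( partial / 1024 )) KiB, remainder $(( partial % 1024 )) bytes"
    echo "missing: $(( original - partial )) bytes"
}

# Sizes taken from the message above.
size_report 524288 785592
# prints:
#   partial: 512 KiB, remainder 0 bytes
#   missing: 261304 bytes
```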

If I 'strace' the 'python' process on the backup server, I get this:


# strace -p 7343
Process 7343 attached - interrupt to quit
read(5, ^C <unfinished ...>
Process 7343 detached

If I 'strace' the 'ssh' process, I get this:


# strace -p 7344
Process 7344 attached - interrupt to quit
select(7, [3 4], [], NULL, NULL^C <unfinished ...>
Process 7344 detached

And that's all, there is nothing else going on even if I leave 'strace'
open for 30 minutes...

And if I 'strace' the 'python' process on the originating server, I get
this:


# strace -p 20518
Process 20518 attached - interrupt to quit
read(3,  <unfinished ...>
strace: ptrace(PTRACE_CONT,1,133): Input/output error
Process 20518 detached

After that the process state in 'ps' changes from 'Ss' to 'Ts'
(stopped). I can change it back to 'Ss' with 'kill -CONT', but it still
doesn't do anything.

The weird thing is that it ALWAYS happens on the same file, but there is
seemingly nothing wrong with that particular file...

Any ideas? What else is there to try and get more clues?

   D.

PS: OS of the backup server is OpenSuSE 11.1 (32 bit), OS of the
'backed-up' server is CentOS release 5.2 (64 bit). Rdiff-backup version
on both is 1.2.8. I also tried removing 'rdiff-backup-data' to start all
over, but it didn't help.


_______________________________________________
rdiff-backup-users mailing list at address@hidden
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki


