rdiff-backup-users

Re: [rdiff-backup-users] ridff-backup 'hangs' on certain file - update


From: Danilo Godec
Subject: Re: [rdiff-backup-users] ridff-backup 'hangs' on certain file - update
Date: Sun, 16 May 2010 08:37:34 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4

On 16.5.2010 7:58, Maarten Bezemer wrote:
> Hi,
>
> Could you try making a new repo containing only the directory that
> includes the 'broken' file, and see if that fails too?

You mean try to back up that file alone? Will do right away.
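A minimal sketch of such an isolated test run follows. It assumes that the backup server pulls from the client over ssh, which is what the ssh child process in the strace output further down suggests. The host name, paths and the helper name `isolated_backup_cmd` are hypothetical placeholders. `-v5` raises rdiff-backup's log verbosity (the scale runs from 0 to 9), which may give more clues about where it stops.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def isolated_backup_cmd(client: str, suspect_file: str, test_repo: str,
                        verbosity: int = 5) -> List[str]:
    """Build an rdiff-backup command that backs up only the directory
    containing suspect_file into a fresh, empty test repository.

    Hypothetical helper: client, suspect_file and test_repo stand in
    for the real host and paths.
    """
    if not 0 <= verbosity <= 9:
        raise ValueError("rdiff-backup verbosity ranges from 0 to 9")
    source_dir = PurePosixPath(suspect_file).parent
    # "host::/path" is rdiff-backup's syntax for a remote location reached via ssh
    return ["rdiff-backup", f"-v{verbosity}", f"{client}::{source_dir}", test_repo]


if __name__ == "__main__":
    cmd = isolated_backup_cmd("client.example.com",
                              "/srv/data/some/dir/broken.file",
                              "/backup/test-repo")
    # Print a shell-ready command line to paste on the backup server
    print(shlex.join(cmd))
```

Starting from an empty `test_repo` keeps the experiment independent of the existing repository and its 'rdiff-backup-data'.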

> (How large is that file anyway?)

It's about 770 kbytes (785592 bytes).

>
> If that setup also fails, create a tar or use rsync to copy it to the
> backup server, and recreate the source tree there, to test if it works
> when running rdiff-backup locally.

Nice idea.
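The copy-and-run-locally test could look roughly like the sketch below, which uses Python's standard `tarfile` module in place of the `tar` command. Function names and paths are hypothetical. The idea is to create the archive on the client, copy it to the backup server (scp already works), unpack it there, and then run rdiff-backup with two local paths so that no ssh connection is involved.

```python
import tarfile
from pathlib import Path
from typing import List


def pack_tree(src_dir: str, archive: str) -> None:
    """Archive src_dir (run on the client) under its own directory name."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src_dir, arcname=Path(src_dir).name)


def unpack_tree(archive: str, dest_dir: str) -> Path:
    """Recreate the tree under dest_dir (run on the backup server)."""
    with tarfile.open(archive, "r:gz") as tar:
        # The first member is the top-level directory added by pack_tree
        top = tar.getmembers()[0].name
        # The archive is our own, so trust it fully; the filter argument
        # only exists on Python versions that ship extraction filters.
        if hasattr(tarfile, "fully_trusted_filter"):
            tar.extractall(dest_dir, filter="fully_trusted")
        else:
            tar.extractall(dest_dir)
    return Path(dest_dir) / top


def local_backup_cmd(tree: Path, test_repo: str) -> List[str]:
    """Both paths are local, so rdiff-backup runs without ssh."""
    return ["rdiff-backup", str(tree), test_repo]
```

If the local run succeeds while the remote one hangs, that points at the connection between the two hosts rather than at the file itself.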

>
> I've seen a few cases (not with rdiff-backup but other software) where
> the network hardware somewhere between the hosts was malfunctioning.
> Smaller packets went through fine, larger packets got mangled and
> subsequently dropped. Ssh worked fine, but scp of a big file hung.
> Maybe setting MTU to a smaller value can be a workaround for your
> problem.

I don't think that's the problem, as both 'scp' and 'rsync' worked fine,
as do the 40+ other hosts that I back up.

>
> On the other hand, your strace results (I/O error) are not what's to
> be expected... did you try rebooting? Did you try memtest86+ to test
> your RAM?

Yes, I/O errors are somewhat weird and I don't know what to make of them...

I did reboot both this machine and the backup server; however, I haven't
run memtest yet, as the machine seems to be functioning properly (apart
from the backup problem).

Thanks,
   Danilo



>
>
> HTH,
>
> Maarten
>
>
> On Sun, 16 May 2010, Danilo Godec wrote:
>
>> I removed 'rdiff-backup-data' for this host and tried 'rdiff-backup'
>> again; it still stops and 'hangs' on the same file.
>>
>> I interrupted the 'rdiff-backup' process (CTRL-C), removed all backed-up
>> files for this host and 'rsync'-ed it - without any problems.
>>
>> Regards, Danilo
>>
>>
>> On 15.5.2010 12:42, Danilo Godec wrote:
>>> Hi,
>>>
>>> Recently my 'rdiff-backup' developed a weird problem: it 'hangs'
>>> when backing up a certain file on one server. The other 40+ servers are
>>> OK; it's just that one, and even that has only been happening since
>>> May 3rd...
>>>
>>> I can 'cat' the file on the originating server, and I can also 'scp' it
>>> to the backup server; there is no problem and no error with that.
>>> However, when 'rdiff-backup' gets to this file, it just 'hangs' and
>>> does nothing.
>>>
>>> On the backup server I see the file 'rdiff-backup.tmp.22397', which
>>> seems to be a partially transferred copy of the original file (524288
>>> bytes vs. 785592 bytes for the original).
>>>
>>> If I 'strace' the 'python' process on the backup server, I get this:
>>>
>>>
>>>> # strace -p 7343
>>>> Process 7343 attached - interrupt to quit
>>>> read(5, ^C <unfinished ...>
>>>> Process 7343 detached
>>>>
>>> If I strace the 'ssh' process, I get this:
>>>
>>>
>>>> # strace -p 7344
>>>> Process 7344 attached - interrupt to quit
>>>> select(7, [3 4], [], NULL, NULL^C <unfinished ...>
>>>> Process 7344 detached
>>>>
>>> And that's all; nothing else happens even if I leave 'strace'
>>> running for 30 minutes...
>>>
>>> And if I 'strace' the 'python' process on the originating server, I get
>>> this:
>>>
>>>
>>>> # strace -p 20518
>>>> Process 20518 attached - interrupt to quit
>>>> read(3,  <unfinished ...>
>>>> strace: ptrace(PTRACE_CONT,1,133): Input/output error
>>>> Process 20518 detached
>>>>
>>> After that the process state in 'ps' changes from 'Ss' to 'Ts'
>>> (stopped). I can change it back to 'Ss' with 'kill -CONT', but it still
>>> doesn't do anything.
>>>
>>> The weird thing is that it ALWAYS happens on the same file, but there
>>> is seemingly nothing wrong with that particular file...
>>>
>>> Any ideas? What else is there to try and get more clues?
>>>
>>>    D.
>>>
>>> PS: The backup server runs OpenSuSE 11.1 (32-bit) and the backed-up
>>> server runs CentOS release 5.2 (64-bit). The rdiff-backup version on
>>> both is 1.2.8. I also tried removing 'rdiff-backup-data' to start all
>>> over, but it didn't help.
>>>
>>>
>>> _______________________________________________
>>> rdiff-backup-users mailing list at address@hidden
>>> http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
>>> Wiki URL:
>>> http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
>>>
>>
>>
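To confirm that the file really is identical on both ends, and to check whether the partial 'rdiff-backup.tmp.22397' file from the original report is simply a truncated but otherwise correct copy, something like the sketch below could be run after copying both files to one host. Note that 524288 bytes is exactly 2^19 bytes (512 KiB), a round size that might point at a block or buffer boundary. The function names are hypothetical, and SHA-256 is just one reasonable checksum choice; running `sha1sum` or `md5sum` on both hosts would do the same job.

```python
import hashlib
import os
from typing import Dict


def file_digest(path: str, chunk_size: int = 64 * 1024) -> str:
    """SHA-256 of a file, read in chunks so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def partial_copy_report(original: str, partial: str) -> Dict[str, object]:
    """Compare a partially transferred copy against the original file."""
    orig_size = os.path.getsize(original)
    part_size = os.path.getsize(partial)
    with open(original, "rb") as f:
        prefix = f.read(part_size)
    with open(partial, "rb") as f:
        partial_data = f.read()
    return {
        "bytes_copied": part_size,
        "bytes_missing": orig_size - part_size,
        # True if the partial file is an exact prefix of the original
        "prefix_matches": prefix == partial_data,
        # True for sizes such as 524288 == 2**19
        "power_of_two": part_size > 0 and (part_size & (part_size - 1)) == 0,
    }
```

A matching prefix would suggest the data that did arrive is intact and the transfer simply stalled; a mismatch would point at corruption somewhere along the way.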

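On Linux, a little more can be learned about the stuck processes from /proc without attaching strace, which in the report above ended with a ptrace I/O error and left the process stopped. The sketch below (helper names hypothetical) resolves what file descriptors such as 5 (from `read(5, ...)`) and 3 and 4 (from the `select`) point to, reads the one-letter process state, and can send SIGCONT, the equivalent of `kill -CONT`. In `ps` output such as 'Ss' or 'Ts', the trailing 's' only marks a session leader; /proc reports just the letter (`S` sleeping, `T` stopped, `D` uninterruptible sleep). `lsof -p PID` shows similar descriptor information. It needs to run as root or as the process owner, with the PIDs from the strace output.

```python
import os
import signal
from typing import Dict, Iterable


def describe_fds(pid: int, fds: Iterable[int]) -> Dict[int, str]:
    """Map fd numbers to their targets, e.g. 'pipe:[1234]' or 'socket:[5678]'."""
    result = {}
    for fd in fds:
        try:
            result[fd] = os.readlink(f"/proc/{pid}/fd/{fd}")
        except OSError as exc:  # fd not open, or no permission
            result[fd] = f"unavailable ({exc.strerror})"
    return result


def process_state(pid: int) -> str:
    """One-letter state from /proc/PID/stat (R, S, D, T, Z, ...)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name is in parentheses and may contain spaces, so the
    # state is the field right after the last closing parenthesis.
    return stat[stat.rindex(")") + 2]


def resume(pid: int) -> None:
    """Same effect as 'kill -CONT PID'."""
    os.kill(pid, signal.SIGCONT)


if __name__ == "__main__":
    import sys
    pid = int(sys.argv[1])
    print(process_state(pid), describe_fds(pid, [3, 4, 5]))
```

If fd 5 of the python process and one of fds 3/4 of the ssh process turn out to be the two ends of the same pipe, both sides are simply waiting for data that never arrives from the remote end.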


