bug-gnu-utils

Re: ALL my tar.bz2-backups unreadable, CRC-error!


From: Hans-Bernhard Broeker
Subject: Re: ALL my tar.bz2-backups unreadable, CRC-error!
Date: 8 Aug 2002 11:06:48 GMT

Ralph Corderoy <address@hidden> wrote:

> If that was the problem, then it might be possible to speculatively turn
> the dodgy characters pairs back into a single linefeed and see if bzip2
> gets further.

That's quite a can of worms you're about to open.  The problem being
that, as you find a CR+LF sequence in the file, there's no way of
knowing, offhand, whether that was a lone LF in the unmutilated
original, or a CR+LF, too.  

In a compressed file format, the bytes should be essentially random,
so there's a chance of 1 in 256 that an LF would be preceded by a
random CR in the original.  For each 64K of original file length this
would mean you'd expect 255 LFs without a CR in front of them, and
one random CR+LF coincidence in the input.  But you don't know which
of the 256 CR+LF pairs you're left with it was.  You'd need up to 256
tries to find it.  Not to mention there could have been several
"real" CR+LFs.  If there happen to be 3 in one 64K CRC-checked block,
you'd have to check 255*254*253 / 3!, i.e. about 2.7 million cases.
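
The arithmetic above can be checked with a short Python sketch.  It
assumes uniformly random bytes and a 64K block; the function names are
made up for illustration and are not part of bzip2 or any other tool.

```python
from math import comb


def expected_counts(block_size=64 * 1024):
    """Expected LF statistics for a block of uniformly random bytes."""
    # Each byte is an LF with probability 1/256.
    total_lf = block_size / 256
    # Each LF is preceded by a CR with probability 1/256.
    real_crlf = total_lf / 256
    return total_lf, real_crlf


def candidate_cases(pairs, real):
    """Ways to choose which `real` of `pairs` CR+LF pairs were original."""
    return comb(pairs, real)


total_lf, real_crlf = expected_counts()
print(total_lf, real_crlf)        # expected LFs, expected genuine CR+LFs
print(candidate_cases(256, 1))    # one genuine CR+LF among 256 pairs
print(candidate_cases(255, 3))    # the 255*254*253 / 3! figure
print(candidate_cases(256, 3))    # three genuine CR+LFs among 256 pairs
```

Whether you count 255 or 256 candidate pairs, three genuine CR+LFs in
a block still leave about 2.7 million combinations to try.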

It also depends on how "clever" the line end conversion routine was
trying to be --- some would have converted an incoming CR+LF to
CR+CR+LF, but the cleverer ones would try to avoid such doubled CRs
and return a CR+LF, causing the hard problem mentioned above.
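
To illustrate the difference, here is a rough Python sketch of the two
behaviours.  Both routines are hypothetical reconstructions of what
such a converter might do, not the code of any particular program.

```python
def naive_to_crlf(data: bytes) -> bytes:
    """Put a CR in front of every LF, even one that already has a CR."""
    return data.replace(b"\n", b"\r\n")


def clever_to_crlf(data: bytes) -> bytes:
    """Put a CR in front of an LF only if it is not already preceded by one."""
    out = bytearray()
    prev = None
    for byte in data:
        if byte == 0x0A and prev != 0x0D:
            out.append(0x0D)
        out.append(byte)
        prev = byte
    return bytes(out)


def undo_crlf(data: bytes) -> bytes:
    """Turn every CR+LF back into a lone LF."""
    return data.replace(b"\r\n", b"\n")


original = b"A\nB\r\nC"   # one lone LF and one genuine CR+LF

# The naive conversion is reversible: the genuine CR+LF became CR+CR+LF.
assert undo_crlf(naive_to_crlf(original)) == original

# The clever conversion is not: two different inputs give the same output.
assert clever_to_crlf(original) == clever_to_crlf(b"A\nB\nC")
assert undo_crlf(clever_to_crlf(original)) != original
```

With the naive routine a blanket CR+LF to LF replacement recovers the
file exactly; with the clever one you are back to guessing which pairs
were genuine.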

-- 
Hans-Bernhard Broeker (address@hidden)
Even if all the snow were burnt, ashes would remain.


