bug-wget



From: Tim Ruehsen
Subject: Re: [Bug-wget] Invalid Content-Length header in WARC files, on some platforms
Date: Tue, 13 Nov 2012 09:48:37 +0100
User-agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; )

Hello Gijs,

Just out of curiosity:

What about setting the compiler option -D_FILE_OFFSET_BITS=64 on these
systems?
Since off_t is used in many places for file lengths, I would expect many more
problems with large files. I just wonder how large files are handled in
general on these PowerPC and ARM systems. If there is no general way, using
off_t wouldn't make sense (unless these systems can't handle large files at
all, but then your patch doesn't make sense).
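
To make the question concrete, here is a minimal sketch of how one could check
the width of off_t on such a system. It assumes a libc that honors
_FILE_OFFSET_BITS (glibc does). The function names are made up for this
example and are not part of wget.

```c
/* Must be defined before any system header is included, otherwise it
 * has no effect on the definition of off_t. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <limits.h>
#include <sys/types.h>

/* Hypothetical helper: width of off_t in bits on this build. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Hypothetical startup check: abort early if off_t cannot represent
 * file lengths beyond 2 GiB. */
void require_large_file_support(void)
{
    assert(off_t_bits() >= 64);
}
```

If off_t_bits() still returns 32 with the macro defined, the platform
presumably has no large file support through off_t at all.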

Maybe you could shed some light on this.

Regards, Tim

On Monday 12 November 2012, Gijs van Tulder wrote:
> Hi,
> 
> There's a somewhat serious issue in the WARC-generating code: on some
> platforms (presumably the ones where off_t is not a 64-bit number) the
> Content-Length header at the top of each WARC record has an incorrect
> length. On these platforms it is sometimes 0, sometimes 1, but never the
> correct length. This makes the whole WARC file unreadable.
> 
> The code works fine on many platforms, but it is apparently a problem on
> some PowerPC and ARM systems, and maybe other systems as well.
> 
> Existing WARC files with this problem can be repaired by replacing the
> value of the Content-Length header with the correct value, for each WARC
> record in the file. The content of the WARC records is there, it's just
> the Content-Length header that is wrong.
> 
> The attached patch fixes the problem in warc.c. It replaces off_t by
> wgint and uses the number_to_static_string function from util.c.
> 
> Regards,
> 
> Gijs
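
For reference, the approach described in the quoted message (format the length
from an integer type of known width) can be sketched roughly as follows. This
is not the patch and does not use wgint or number_to_static_string; it is a
standalone illustration with a hypothetical helper, format_content_length,
that assumes the length fits in an int64_t and that the header line ends in
CRLF as in the WARC format.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of wget: write a Content-Length header
 * line into buf. The value is passed as int64_t and printed with the
 * matching PRId64 conversion, so the result does not depend on how wide
 * off_t happens to be on the platform.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_content_length(char *buf, size_t size, int64_t length)
{
    assert(buf != NULL && size > 0);
    int n = snprintf(buf, size, "Content-Length: %" PRId64 "\r\n", length);
    return (n >= 0 && (size_t)n < size) ? n : -1;
}
```

A caller that holds the length in an off_t would cast it explicitly, for
example format_content_length(buf, sizeof buf, (int64_t) len), so that the
argument type and the format conversion always agree.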


