zutils-bug

RE: Skippable Frames in zstd files


From: Jamboretz, Chris
Subject: RE: Skippable Frames in zstd files
Date: Mon, 22 Aug 2022 20:34:08 +0000

Hi Antonio,

Thanks Antonio. This has been a learning process for me.
Starting with a skippable frame doesn't violate the zstd format specification. 
The zstd format specification talks about compatibility with lz4 skippable 
frames, but not conformity to the lz4 frame ordering. And the zstd program 
works with skippable frames at the beginning. I agree pzstd made the wrong 
decision here, and they should both fix the program and tighten the format 
specification. Allowing skippable frames up front requires jumping deeper into 
the file to figure out whether it is lz4, zstd, or neither, possibly many 
times, because there's no limit on the number of skippable frames allowed. You 
or I could file an issue against the zstd format specification and the pzstd 
design.
But that still leaves us with many legacy files already compressed and we have 
to deal with them somehow. From a pragmatic point of view it is more helpful 
for zutils to deal with the files as they are. 
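
To make the "jumping deeper into the file" point concrete, here is a rough 
sketch of what such a check could look like. It is not code from zutils; the 
names Format, le32, identify_after_skippable and the max_frames cap are all 
made up for illustration. It assumes the layout given in the format documents: 
a 4-byte little-endian magic number (0x184D2A50 to 0x184D2A5F for skippable 
frames), a 4-byte little-endian frame size, then that many bytes of user data.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type; not part of zutils.
enum class Format { zstd, lz4, unknown };

// Read a 32-bit little-endian value from 4 bytes.
static std::uint32_t le32(const unsigned char* p)
{
  return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
         (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Skip leading skippable frames, then classify the first regular frame.
// max_frames is an arbitrary cap (an assumption of this sketch), needed
// because the specification sets no limit on the number of such frames.
Format identify_after_skippable(const std::vector<unsigned char>& data,
                                std::size_t max_frames = 16)
{
  std::size_t pos = 0;  // invariant: pos <= data.size()
  for (std::size_t skipped = 0; skipped <= max_frames; ++skipped) {
    if (data.size() - pos < 4) return Format::unknown;
    const std::uint32_t magic = le32(&data[pos]);
    if (magic == 0xFD2FB528u) return Format::zstd;  // zstd frame
    if (magic == 0x184D2204u) return Format::lz4;   // lz4 frame
    if ((magic & 0xFFFFFFF0u) != 0x184D2A50u) return Format::unknown;
    // Skippable frame: read its size and jump past the user data.
    if (data.size() - pos < 8) return Format::unknown;
    const std::uint32_t frame_size = le32(&data[pos + 4]);
    if (frame_size > data.size() - pos - 8) return Format::unknown;
    pos += 8 + std::size_t(frame_size);
  }
  return Format::unknown;  // too many skippable frames; give up
}
```

On a real file this would mean one seek (or read-and-discard on a pipe) per 
skippable frame before the format is known, which is the cost described above.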

Thank you,
Chris

-----Original Message-----
From: Zutils-bug <zutils-bug-bounces+chris.jamboretz=intel.com@nongnu.org> On 
Behalf Of Antonio Diaz Diaz
Sent: Saturday, August 20, 2022 9:30 AM
To: zutils-bug@nongnu.org
Subject: Re: Skippable Frames in zstd files

Hi Chris,

Jamboretz, Chris wrote:
> The zstd compression format allows for skippable frames 
> https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#skippable-frames
>
> pzstd uses this and puts the skippable frame at the beginning of the file.

Thank you for reporting this.

I think pzstd is making an improper use of skippable frames.

The link above states that "Skippable frames defined in this specification are 
compatible with LZ4 ones".

So it can't be known from the magic bytes if a file starting with a skippable 
frame is in zstd format or in lz4 format.

In fact,
https://github.com/lz4/lz4/blob/dev/doc/lz4_Frame_format.md#skippable-frames
advises not to do what pzstd is doing:

"For the purpose of facilitating identification, it is discouraged to start a 
flow of concatenated frames with a skippable frame. If there is a need to start 
such a flow with some user data encapsulated into a skippable frame, it's 
recommended to start with a zero-byte LZ4 frame followed by a skippable frame. 
This will make it easier for file type identifiers."

> The function test_format in zutils.cc doesn't take this magic_number into 
> account and so reports the file as uncompressed.

Detecting the 16 (!) magic numbers of skippable frames in test_format would be 
wrong because they do not really identify the zstd format. IMHO pzstd should 
follow the documentation of lz4 and start its files with a zero-byte zstd frame.
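
As an illustration only (this is not the test_format code, and the names Magic, 
read_le32 and classify_magic are invented for the example), the ambiguity can 
be sketched as follows. The 16 skippable magic numbers 0x184D2A50 to 
0x184D2A5F collapse into one masked comparison, but a match says nothing about 
whether zstd or lz4 data follows:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical classification of the first 4 bytes of a file.
enum class Magic { zstd, lz4, skippable, unknown };

// Read a 32-bit little-endian value from 4 bytes.
inline std::uint32_t read_le32(const unsigned char* p)
{
  return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
         (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

inline Magic classify_magic(const unsigned char* buf, std::size_t size)
{
  if (size < 4) return Magic::unknown;
  const std::uint32_t m = read_le32(buf);
  if (m == 0xFD2FB528u) return Magic::zstd;  // zstd frame magic number
  if (m == 0x184D2204u) return Magic::lz4;   // lz4 frame magic number
  // All 16 skippable magic numbers share the upper 28 bits.
  // A match here identifies neither zstd nor lz4.
  if ((m & 0xFFFFFFF0u) == 0x184D2A50u) return Magic::skippable;
  return Magic::unknown;
}
```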

Best regards,
Antonio.



