bug-gnu-emacs

bug#67926: 29.1; fail to extract ZIP subfile named with [...]


From: awrhygty
Subject: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Thu, 04 Jan 2024 04:53:26 +0900
User-agent: Gnus/5.13 (Gnus v5.13)

Eli Zaretskii <eliz@gnu.org> writes:

>> My interest is how to avoid naming problems.
>> There are more difficulties in Japanese.
>> Japanese characters in file names are normally encoded in cp932.
>> Encoded characters may have '[', '\' or ']' as a second byte.
>>   (encode-coding-string "ゼソゾ" 'cp932)
>>   => "\203[\203\\\203]"
>> Subfiles of such names can not be extracted normally.
>
> I don't think we can solve this in Emacs: non-ASCII file names in zip
> archives are a mess, even before you consider the fact that zip
> archives are frequently moved between systems.  For starters, how can
> one know in advance what is the encoding of file names in an arbitrary
> zip archive?  This will bite you even if we do everything in Emacs,
> and even if someone does submit patches to implement all the
> compression methods.

So what I need is an extractor that does not depend on subfile names.
Being able to extract the contents under broken names is more useful than
not being able to extract them at all.
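The quoted cp932 problem can be reproduced outside Emacs. In this Python sketch, Python's fnmatch module stands in for the wildcard matching that an unzip-style extractor applies to a member name given on its command line. That is an assumption: unzip's own matcher may treat backslashes differently depending on the platform. Read byte for byte, the encoded name contains a bracket expression, so used as a pattern it no longer matches itself.

```python
import fnmatch

# Encode the katakana from the quoted example in cp932.
raw = "ゼソゾ".encode("cp932")
print(raw)        # b'\x83[\x83\\\x83]'
# Every character's second (trail) byte is '[', '\\' or ']'.
print(raw[1::2])  # b'[\\]'

# Treat each byte as one character, roughly what a byte-oriented tool sees.
pattern = raw.decode("latin-1")

# As a glob pattern, "[\x83\\\x83]" collapses into a one-character class,
# so the pattern matches two-byte names instead of the real six-byte name.
print(fnmatch.fnmatchcase(pattern, pattern))   # False
print(fnmatch.fnmatchcase("\x83\\", pattern))  # True
```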

I also found that my unzip.exe cannot extract BZIP2- or LZMA-compressed
subfiles created by Python's zipfile module, so I suspect unzip.exe does
not support every compression method.
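Python's zipfile module records BZIP2 and LZMA members under ZIP method IDs 12 and 14, and some unzip builds may not support those methods. The Python sketch below uses only the standard library. It writes one member per method and reads the method field back from each local file header at offset 8, which is the same field the Emacs Lisp code in this message reads. The helper name is illustrative only.

```python
import io
import struct
import zipfile

# Member name -> zipfile compression constant (ZIP method ID in comments).
METHODS = {
    "stored.txt": zipfile.ZIP_STORED,      # 0
    "deflated.txt": zipfile.ZIP_DEFLATED,  # 8
    "bzip2.txt": zipfile.ZIP_BZIP2,        # 12
    "lzma.txt": zipfile.ZIP_LZMA,          # 14
}

def local_header_methods(payload: bytes) -> dict:
    """Write PAYLOAD once per method; return {name: method ID} as read
    from each member's local file header."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, method in METHODS.items():
            zf.writestr(name, payload, compress_type=method)
    data = buf.getvalue()
    result = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            start = info.header_offset
            if data[start:start + 4] != b"PK\x03\x04":
                raise ValueError("local file header signature not found")
            # Compression method: 2-byte little-endian field at offset 8.
            (method,) = struct.unpack_from("<H", data, start + 8)
            result[info.filename] = method
    return result

print(local_header_methods(b"sample text " * 50))
# {'stored.txt': 0, 'deflated.txt': 8, 'bzip2.txt': 12, 'lzma.txt': 14}
```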

By the way, I had not known about the zlib-decompress-region function.
With it, subfiles compressed with the deflate method can be extracted
using Emacs Lisp alone (bzip2 subfiles still go through the external
bzip2 program):

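(require 'arc-mode)

(defun archive-zip-decompress-content (_archive _name)
  "Extract the current ZIP subfile without an external unzip program.
The member is located through its central directory entry and
decompressed in Lisp (or by bzip2), so its file name is never
passed to another program.  Meant as `:override' advice for
`archive-zip-extract'; the ARCHIVE and NAME arguments are ignored."
  (let ((desc archive-subfile-mode)
        (buf (current-buffer))
        (superior archive-superior-buffer))
    (with-current-buffer superior
      (save-restriction
        (widen)
        (let* ((file-beg archive-proper-file-start)
               ;; Central directory entry of this member.
               (p0 (+ file-beg (archive--file-desc-pos desc)))
               ;; Local file header; its offset is stored at CD+42.
               (p  (+ file-beg (archive-l-e (+ p0 42) 4)))
               (bitflags (archive-l-e (+ p  6) 2))
               (method   (archive-l-e (+ p  8) 2))
               (compsize (archive-l-e (+ p0 20) 4))
               (fn-len   (archive-l-e (+ p 26) 2))
               (ex-len   (archive-l-e (+ p 28) 2))
               ;; Data follows the 30-byte fixed header, the file name
               ;; and the extra field.
               (data-beg (+ p 30 fn-len ex-len))
               (data-end (+ data-beg compsize))
               (coding-system-for-read  'no-conversion)
               (coding-system-for-write 'no-conversion)
               (default-directory temporary-file-directory))
          (cond ((/= 0 (logand bitflags 1))
                 (error "Subfile is encrypted"))
                ((= method 0)           ; stored
                 (with-current-buffer buf
                   (insert-buffer-substring superior data-beg data-end)))
                ((= method 8)           ; deflate
                 ;; Wrap the raw deflate data as a gzip stream: a minimal
                 ;; 10-byte header, then CRC-32 and uncompressed size from
                 ;; the central directory as the trailer.
                 (let ((crc-32    (buffer-substring (+ p0 16) (+ p0 20)))
                       (orig-size (buffer-substring (+ p0 24) (+ p0 28)))
                       (header "\x1f\x8b\x08\0\0\0\0\0\0\0"))
                   (with-current-buffer buf
                     (set-buffer-multibyte nil)
                     (insert header)
                     (insert-buffer-substring superior data-beg data-end)
                     (insert crc-32 orig-size)
                     (unless (zlib-decompress-region (point-min) (point-max))
                       (error "Failed to decompress subfile"))
                     (set-buffer-multibyte 'to))))
                ((= method 12)          ; bzip2
                 (let ((status (call-process-region data-beg data-end "bzip2"
                                                    nil buf nil "-cd")))
                   (unless (eq status 0)
                     (error "bzip2 failed with status %s" status))))
                (t (error "Unsupported compression method %d" method))))))
    ;; `archive-extract' treats a nil return value as failure.
    t))

(advice-add #'archive-zip-extract :override
            #'archive-zip-decompress-content)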
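The deflate branch works because zlib-decompress-region accepts gzip streams. A 10-byte gzip header, the raw deflate data, and an 8-byte trailer together form a valid gzip stream. The trailer holds the CRC-32 and the uncompressed size, both copied from the central directory. The Python sketch below is a cross-check outside Emacs that uses the same local-header offsets and the same wrapping. It assumes a single-disk, non-ZIP64 archive. The name inflate_member is hypothetical. zipfile's ZipInfo supplies the central-directory CRC and sizes that the Lisp code reads at offsets 16, 20 and 24.

```python
import gzip
import io
import struct
import zipfile

# ID1 ID2, CM=8 (deflate), FLG=0, MTIME=0, XFL=0, OS=0: same bytes as the Lisp header.
GZIP_HEADER = b"\x1f\x8b\x08\x00" + b"\x00" * 6

def inflate_member(data: bytes, info: zipfile.ZipInfo) -> bytes:
    """Hypothetical helper: decompress a deflated member by gzip-wrapping it."""
    p = info.header_offset
    bitflags, method = struct.unpack_from("<HH", data, p + 6)
    fn_len, ex_len = struct.unpack_from("<HH", data, p + 26)
    if bitflags & 1:
        raise ValueError("Subfile is encrypted")
    if method != 8:
        raise ValueError(f"not deflated: method {method}")
    data_beg = p + 30 + fn_len + ex_len
    raw = data[data_beg:data_beg + info.compress_size]
    # gzip trailer: CRC-32, then uncompressed size mod 2**32, little-endian.
    trailer = struct.pack("<II", info.CRC, info.file_size & 0xFFFFFFFF)
    # gzip.decompress also verifies the CRC and size from the trailer.
    return gzip.decompress(GZIP_HEADER + raw + trailer)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("notes.txt", "deflate me " * 200)
data = buf.getvalue()
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    info = zf.getinfo("notes.txt")
    print(inflate_member(data, info) == zf.read("notes.txt"))  # True
```

A stored (method 0) member is rejected here instead of being copied, to keep the sketch focused on the deflate case.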
