bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM

bug-gnu-emacs

From:	Lars Ingebrigtsen
Subject:	bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date:	Sun, 03 Jul 2022 13:08:04 +0200
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)

Eli Zaretskii <eliz@gnu.org> writes:

> The problem is not just with BOM.  The problem will happen with any
> coding-system that produces prefix and/or suffix bytes when it encodes
> strings.  The FIXME I added mentions ISO-2022 7-bit encodings as
> another example.
>
> And then there are coding systems with pre-write-conversion, and
> those can produce any additions they like.
>
>> If we had both, then we could strip the BOM from the individual chars,
>> and add one to the front.
>
> AFAIR, what we have now already handles the BOM in coding systems that
> are known to produce a BOM.  See encode-coding-char.

Ah, OK, it uses (coding-system-get coding-system :bom) and then
special-cases UTF-8 and UTF-16 to strip the BOM.
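For illustration only, here is a minimal sketch of the failure mode, using Python's codecs as a stand-in for Emacs coding systems (an assumption: `utf-8-sig` plays the role of a BOM-producing coding system such as `utf-8-with-signature`, and `iso2022_jp` the role of a 7-bit ISO-2022 one). Encoding character by character repeats the BOM.  Stripping a known BOM from each piece, in the spirit of the UTF-8/UTF-16 special case, repairs that case, but there is no fixed prefix to strip for the ISO-2022 encoding, which wraps every separately encoded string in escape sequences.  The helper names are hypothetical.

```python
import codecs


def encode_per_char(text: str, encoding: str) -> bytes:
    """Naive approach: encode each character on its own and concatenate."""
    return b"".join(ch.encode(encoding) for ch in text)


def encode_per_char_strip_bom(text: str, encoding: str, bom: bytes) -> bytes:
    """Hypothetical fix: drop a known BOM from every per-character chunk,
    then prepend a single BOM to the whole result."""
    body = b""
    for ch in text:
        chunk = ch.encode(encoding)
        if chunk.startswith(bom):
            chunk = chunk[len(bom):]
        body += chunk
    return bom + body


text = "abc"
whole = text.encode("utf-8-sig")
naive = encode_per_char(text, "utf-8-sig")
fixed = encode_per_char_strip_bom(text, "utf-8-sig", codecs.BOM_UTF8)
print(whole)           # b'\xef\xbb\xbfabc'
print(naive)           # b'\xef\xbb\xbfa\xef\xbb\xbfb\xef\xbb\xbfc' (BOM repeated)
print(fixed == whole)  # True

# A stateful 7-bit encoding switches charsets with escape sequences and
# switches back at the end of each string, so encoding per character
# produces more bytes than encoding the string once.
jp = "\u3042\u3044"  # two hiragana characters
print(len(encode_per_char(jp, "iso2022_jp")) > len(jp.encode("iso2022_jp")))  # True
```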

Hm...  I guess the only reliable solution across all coding systems is
(as your comment in the code says) to drop the encode-every-char
approach, encode whole strings instead, and then check whether the
result is short enough.  That could be done reasonably efficiently with
a binary search.  I'll have a go at it...
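A rough sketch of the binary-search idea, again in Python rather than Emacs Lisp and with hypothetical names: `longest_prefix_fitting` returns the largest n for which text[:n], encoded as a single string, fits in `max_bytes`.  Because each candidate prefix is encoded in one go, any BOM or trailing escape sequence is counted exactly once.  It assumes the encoded length never decreases as characters are appended; that holds for the simple codecs used here, but it is an assumption, not something guaranteed for an arbitrary coding system (for instance one with a pre-write-conversion).

```python
def encoded_len(text: str, encoding: str) -> int:
    """Byte length of text encoded as one string (BOM/escapes included)."""
    return len(text.encode(encoding))


def longest_prefix_fitting(text: str, encoding: str, max_bytes: int) -> int:
    """Largest n such that text[:n], encoded in one go, fits in max_bytes.

    Assumes encoded_len(text[:n]) is non-decreasing in n.  Returns 0 if
    not even a single character fits.
    """
    lo, hi = 0, len(text)  # invariant: the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if encoded_len(text[:mid], encoding) <= max_bytes:
            lo = mid              # text[:mid] fits; try a longer prefix
        else:
            hi = mid - 1          # too long; the answer is shorter
    return lo


# BOM (3 bytes) + "héll" (5 bytes) = 8 bytes; adding "o" would need 9.
print(longest_prefix_fitting("héllo wörld", "utf-8-sig", 8))  # 4
```

Compared with re-encoding every prefix in turn, this needs only about log2(n) encodings of a prefix.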

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

