bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date:	Sat, 02 Jul 2022 19:37:07 +0300

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Glenn Morris <rgm@gnu.org>,  schwab@linux-m68k.org,  48324@debbugs.gnu.org
> Date: Sat, 02 Jul 2022 18:14:39 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > This actually reveals a design flaw in string-limit: we cannot simply
> > use encode-coding-char to encode the characters one by one.  I added a
> > FIXME comment to explain why, as I don't currently have any clever
> > ideas for how to implement it more correctly, except by iterations,
> > which is inelegant.  Ideas welcome.
> 
> Hm...  do we have some way of knowing that the coding system we're using
> is one that should have a BOM?  And a function to remove the BOM?

The problem is not just with BOM.  The problem will happen with any
coding-system that produces prefix and/or suffix bytes when it encodes
strings.  The FIXME I added mentions ISO-2022 7-bit encodings as
another example.
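
To make the prefix/suffix problem concrete, here is a minimal sketch in
Python rather than Emacs Lisp.  It assumes that Python's "utf-8-sig" and
"iso2022_jp" codecs behave analogously to the Emacs coding-systems under
discussion; encode_per_char is a hypothetical helper written for this
illustration, not anything in Emacs.

```python
import codecs


def encode_per_char(text, encoding):
    """Encode each character on its own and join the results."""
    return b"".join(ch.encode(encoding) for ch in text)


# UTF-8 with signature: every separate encode call emits its own BOM.
bom_whole = "aé".encode("utf-8-sig")
bom_per_char = encode_per_char("aé", "utf-8-sig")

# ISO-2022-JP: every separate call emits the escape sequence that
# switches to JIS X 0208 and the one that switches back to ASCII.
iso_whole = "日本".encode("iso2022_jp")
iso_per_char = encode_per_char("日本", "iso2022_jp")

# Compare how many BOMs and how many bytes each approach produces.
print(bom_whole.count(codecs.BOM_UTF8), bom_per_char.count(codecs.BOM_UTF8))
print(len(iso_whole), len(iso_per_char))
```

In both cases the per-character result is longer than the encoding of the
whole string, so summing per-character byte counts overestimates the
encoded length.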

And then there are coding-systems with a pre-write-conversion function,
and those can add whatever bytes they like.

> If we had both, then we could strip the BOM from the individual chars,
> and add one to the front.

AFAIR, what we have now already handles the BOM in coding-systems that
are known to produce one.  See encode-coding-char.
