bug-gnu-emacs

bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos


From: Eli Zaretskii
Subject: bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos
Date: Sun, 21 Mar 2021 17:27:45 +0200

> Date: Thu, 04 Mar 2021 21:21:24 +0000
> From: Gregory Heytings <gregory@heytings.org>
> 
> (Disclaimer: I have no knowledge whatsoever about the ISO-2022-JP 
> encoding, and although this looks like a bug, I'm not sure this is 
> actually a bug; I report this at the suggestion of Eli in bug#46859.)
> 
> I downloaded the file [1], and converted it to the ISO-2022-JP encoding 
> with iconv -t iso-2022-jp one.txt > iso-2022-jp.txt.  The resulting file 
> is attached to this bug report.  It ends with two CRLFs, at byte offsets 
> 2993 and 2995.  However, after emacs -Q iso-2022-jp.txt, with M-: 
> (goto-char (filepos-to-bufferpos POS 'exact)) we get:
> 
> POS = 2991, 2992: last but one visible character (HIRAGANA LETTER RU)
> POS = 2993, 2994: last visible character (IDEOGRAPHIC FULL STOP)
> POS = 2995, 2996: first CRLF
> POS = 2997: second CRLF
> POS = 2998: point-max
> POS = 2999: first CRLF
> POS = 3000, 3001: second CRLF
> POS >= 3002: point-max
> 
> I would have expected:
> 
> POS = 2989, 2990: last but one visible character (HIRAGANA LETTER RU)
> POS = 2991, 2992: last visible character (IDEOGRAPHIC FULL STOP)
> POS = 2993, 2994: first CRLF
> POS = 2995, 2996: second CRLF
> POS >= 2997: point-max
> 
> The opposite operation M-: (bufferpos-to-filepos (- (point) POS) 'exact) 
> apparently also has bugs; its return values are not coherent with the 
> above ones:
> 
> POS = 0: 3003
> POS = 1: 3001
> POS = 2: 2999
> POS = 3 (IDEOGRAPHIC FULL STOP): 2997
> POS = 4 (HIRAGANA LETTER RU): 2995
> 
> I would have expected:
> 
> POS = 0: 2997
> POS = 1: 2995
> POS = 2: 2993
> POS = 3 (IDEOGRAPHIC FULL STOP): 2991
> POS = 4 (HIRAGANA LETTER RU): 2989
> 
> [1] 
> https://darza.com/ecbackend/vendor/symfony/mime/Tests/Fixtures/samples/charsets/iso-2022-jp/one.txt

There's something strange going on here with encoding of the buffer
using iso-2022-jp-dos: near the end of the encoded bytestream, between
the encoded HIRAGANA LETTER KO (こ) and HIRAGANA LETTER TO (と), we
get 6 extra bytes: "ESC ( B ESC $ B".  AFAIU, this sequence means
"switch to ASCII and then switch back to Japanese".  So together these
6 bytes are a no-op as far as their effect on the text is concerned,
but they disrupt the logic of filepos-to-bufferpos because they
introduce extra bytes that aren't there in the original file.
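
To illustrate the point outside Emacs, here is a minimal sketch in
Python, assuming its standard iso2022_jp codec behaves like any
conforming ISO-2022-JP decoder.  It is not what Emacs does internally;
the two-character sample text and the variable names are made up for
the illustration.  It inserts the same 6 bytes between KO and TO and
checks that the decoded text is unchanged while the byte count grows
by 6.  It also compares the bufferpos-to-filepos values quoted above
with the expected ones.

```python
# Escape sequences used by ISO-2022-JP.
TO_ASCII = b"\x1b(B"      # ESC ( B : switch to ASCII
TO_JISX0208 = b"\x1b$B"   # ESC $ B : switch to JIS X 0208 (Japanese)

# Hypothetical two-character sample: HIRAGANA LETTER KO + HIRAGANA LETTER TO.
text = "こと"
plain = text.encode("iso2022_jp")

# The encoded stream starts with a 3-byte escape, followed by 2 bytes
# for KO; insert the redundant escape pair right after KO.
split = len(TO_JISX0208) + 2
padded = plain[:split] + TO_ASCII + TO_JISX0208 + plain[split:]

# Both byte streams decode to the same text ...
same_text = plain.decode("iso2022_jp") == padded.decode("iso2022_jp") == text
# ... but the padded one is longer by the size of the escape pair.
extra = len(padded) - len(plain)

# bufferpos-to-filepos values quoted in the report (POS = 0..4).
observed = [3003, 3001, 2999, 2997, 2995]
expected = [2997, 2995, 2993, 2991, 2989]
shifts = {o - e for o, e in zip(observed, expected)}

print(same_text, extra, shifts)
```

Every observed bufferpos-to-filepos value is off from the expected one
by the same amount as the length of the extra escape pair, which is
consistent with those 6 bytes being the cause, although the sketch
does not prove it.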

Kenichi, why are these 6 bytes inserted by encode-coding-region, but
not when we encode the same text as part of saving the buffer to its
file?  And why does it happen near the end of the text, between those
2 particular letters?
