Re: uppercase german umlaut
From: Dave Kemper
Subject: Re: uppercase german umlaut
Date: Tue, 9 Jan 2024 01:13:45 -0600

On 1/8/24, hohe72@posteo.de <hohe72@posteo.de> wrote:
> On Tue, 2 Jan 2024 11:04:25 -0600
> Dave Kemper <saint.snit@gmail.com> wrote:
>
>> > ECMA-48 says for 0x84:
>>
>> Also irrelevant to groff, as it doesn't use ECMA-48. Groff tools
>> (including gpic) take input in Latin-1, period.
>
> I don't think so. ECMA-48 may be interpreted by terminals.

In the message to which I was replying, you were speaking of the
sequence of bytes that were part of the input to gpic; in this realm,
ECMA-48 is irrelevant. And in any case, the 0x84 byte in question is
part of the UTF-8 encoding of Unicode character U+00C4 LATIN CAPITAL
LETTER A WITH DIAERESIS; if it's being interpreted by a terminal
somewhere as ECMA-48, something is going wrong.

What seems to be going wrong in this instance is that you're passing
UTF-8 directly to gpic without first running it through preconv or
iconv, resulting in a byte sequence gpic doesn't recognize. You
haven't said whether you've tried converting the input before sending
it to gpic, or why you're avoiding preconv.

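For illustration, here is a minimal sketch of the conversion step, assuming an iconv that knows the names UTF-8 and ISO-8859-1. The helper names utf8_sample and to_latin1 are made up for this example, and the file name input.pic in the comments is a placeholder. The sample bytes are written as octal escapes so the result does not depend on the encoding of the script itself.

```shell
# UTF-8 encoding of U+00C4 plus a newline, as octal escapes
# (\303 = 0xC3, \204 = 0x84). Hypothetical helper for this sketch.
utf8_sample() { printf '\303\204\n'; }

# Re-encode UTF-8 input as Latin-1, the encoding gpic reads.
to_latin1() { iconv -f UTF-8 -t ISO-8859-1; }

# Before conversion: the two-byte UTF-8 sequence and the newline.
utf8_sample | od -An -t x1

# After conversion: the single Latin-1 byte 0xC4 and the newline.
utf8_sample | to_latin1 | od -An -t x1

# With groff installed, either conversion can sit in front of pic
# (input.pic is a placeholder file name):
#   iconv -f UTF-8 -t ISO-8859-1 input.pic | pic
#   preconv -e utf-8 input.pic | pic
```

After the iconv step the 0x84 byte no longer appears in what gpic reads, so there is nothing for it to reject.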
> In the case of terminal output, those characters if interpreted as
> control sequences would thrown the output into disarray. Therefore,
> if I'm right, it's rejected as invalid but not passed through.

Correct, gpic won't pass through bytes it considers invalid.

$ echo Ä | od -t x1
0000000 c3 84 0a
0000003
$ echo Ä | pic | grep -av '^\.' | od -t x1
pic:<standard input>:1: invalid input character code 132
0000000 c3 0a
0000002

gpic strips the 0x84 (decimal 132) byte, leaving you with invalid
UTF-8, or valid but erroneous Latin-1.
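
The two readings of the leftover bytes can be checked with iconv alone. This is a sketch, not part of the original transcript: the helper name leftover is made up, and the exact error behavior on malformed input is an assumption about the local iconv (the common implementations exit non-zero).

```shell
# The bytes gpic leaves behind in the transcript above: 0xC3, newline.
# Hypothetical helper for this sketch.
leftover() { printf '\303\n'; }

# Read as Latin-1, 0xC3 is U+00C3 (A with tilde), not the intended
# U+00C4; re-encoded as UTF-8 it becomes the bytes c3 83.
leftover | iconv -f ISO-8859-1 -t UTF-8 | od -An -t x1

# Read as UTF-8, a lead byte 0xC3 with no continuation byte after it
# is malformed, so iconv is expected to fail here.
if leftover | iconv -f UTF-8 -t ISO-8859-1 >/dev/null 2>&1; then
    echo "accepted as UTF-8"
else
    echo "rejected as UTF-8"
fi
```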