help-gnu-emacs

Re: 26.1.92, 26.1-mac-7.4; unrecognised escaped chars in *Help*


From: Van L
Subject: Re: 26.1.92, 26.1-mac-7.4; unrecognised escaped chars in *Help*
Date: Wed, 06 Mar 2019 11:47:29 +1100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (berkeley-unix)

Eli writes:

>> From the *scratch* buffer, I look up the keybinding possibilities by
>> 
>>   C-h b
>> 
>> Under the Global Bindings section, the two lines under SPC look to be
>> encoded in Latin-1. I guess Emacs assumes UTF-8.
>
> No, this has nothing to do with encoding.  This text is produced by
> Emacs itself … the internal representation of characters in Emacs
> buffers and strings

>> \200 .. 3FF_F7F      self-insert-command
>> \200 .. \377 self-insert-command
>
> Yes.  This is admittedly confusing, although 100% correct.

But. But. But. Less than 100% beautiful. The row for the non-ASCII
range, whose end points are unprintable, would look and feel nicer with
both ends shown as visually balanced hex values in a box.

> To start
> digging into what happens here, go to each of the 2 \200's and type
> "C-u C-x =".  You will see that these two look identical on display,
> but are actually two very different beasts: the former is a Unicode
> character whose codepoint happens to be 200 octal (0x80 in hex), the
> latter is a raw byte of the same value.

They are born digital homonyms.
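
A rough Python sketch of the distinction (not Emacs code) may help.
It assumes the internal convention that a raw byte b in the range
0x80..0xFF is stored as character code 0x3FFF00 + b, just above the
largest ordinary character code 0x3FFF7F. The function and constant
names are made up for this illustration.

```python
# Sketch only: models how a raw byte and a Unicode character with the
# same numeric value can be kept apart internally.
# Assumption: raw bytes 0x80..0xFF live at codes 0x3FFF80..0x3FFFFF.
RAW_BYTE_OFFSET = 0x3FFF00      # hypothetical name for the offset
MAX_ORDINARY_CHAR = 0x3FFF7F    # upper end of the first *Help* row


def raw_byte_to_char(byte):
    """Return the internal code assumed for raw byte `byte`."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF get a separate raw-byte code")
    return RAW_BYTE_OFFSET + byte


def is_raw_byte_char(code):
    """True if `code` falls in the assumed raw-byte range."""
    return MAX_ORDINARY_CHAR < code <= RAW_BYTE_OFFSET + 0xFF


def octal_escape(code):
    """Octal escape for an 8-bit character or a raw-byte code."""
    byte = code - RAW_BYTE_OFFSET if is_raw_byte_char(code) else code
    return "\\%o" % byte


unicode_char = 0x80                   # the character U+0080
raw_byte = raw_byte_to_char(0x80)     # the raw byte 0x80

# Two different codes, one identical escape on display.
print(unicode_char, raw_byte)
print(octal_escape(unicode_char), octal_escape(raw_byte))
```

Under that assumption the two codes differ, yet both render as the
same octal escape, which is the homonym effect seen in *Help*.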

> Emacs distinguishes between
> them.  The confusing bit here is that they are by default both
> displayed identically, 

"C-u C-x =" or M-x describe-char RET puts them in

    category: l:Latin
    category: L:Left-to-right (strong)

> for dull historical reasons (once upon a time,
> Emacs didn't distinguish between them).  (Perhaps there's no longer a
> reason to use this confusing display nowadays.)

Wouldn't it be funny to pull on that string? Tied all the way at the
bottom is a boat anchor in the shape of a first-of-its-kind 1950s
Chinese electric computer keyboard, invented and made in the U.S.A.,
which the Ike Admin was considering as a gift to China.

> So the first of the above 2 lines stands for all the non-ASCII Unicode
> characters, all of which are bound to self-insert-command by default.

> By contrast, the second row shows all the raw bytes, which are also
> bound to self-insert-command by default.

> IOW, unlike the case with EWW showing incorrectly decoded text, here
> the issue is with how characters are _displayed_, 

> And now to your question:
>
>> I know what to do for this kind of situation in EWW, type "E latin-1 RET".
>> 
>> What goes here?
>
> Type
>
>   M-x customize-variable RET glyphless-char-display-control RET
>

Thank you.

Should I file a bug report for a copy-and-paste inconsistency? I was
trying to collect the `M-x describe-char' output for the above two
characters in one buffer.

Highlighting the region and then typing M-w C-y fails, whereas pasting
with the middle mouse button works.

Having done that, attempting to save the buffer produces the following
warning about the problematic characters, which makes sense given the
above explanation:

-- quote
These default coding systems were tried to encode text
in the buffer ‘x’:
  (utf-8 (845 . 4194176) (861 . 4194176) (1376 . 4194176))
However, each of them encountered characters it couldn’t encode:
  utf-8 cannot encode these: \200 \200 \200

Click on a character (or switch to this window by ‘C-x o’
and select the characters by RET) to jump to the place it appears,
where ‘C-u C-x =’ will give information about it.

Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
   to remove or modify the problematic characters,
or specify any other coding system (and risk losing
   the problematic characters).

  raw-text no-conversion

-- quote ends
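
The number 4194176 in that report fits the raw-byte explanation. Here
is a small Python check (not Emacs code), assuming Emacs stores a raw
byte b in 0x80..0xFF as character code 0x3FFF00 + b. The helper name
utf8_can_decode is made up for this illustration.

```python
# The code reported at buffer positions 845, 861 and 1376.
reported = 4194176

# Assumption: a raw byte b (0x80..0xFF) is stored as code 0x3FFF00 + b,
# so subtracting the offset recovers the byte itself.
raw_byte = reported - 0x3FFF00


def utf8_can_decode(data):
    """Return True if `data` is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The Unicode character U+0080 has a two-byte UTF-8 encoding ...
as_character = "\u0080".encode("utf-8")
# ... whereas a lone byte of the same value is not valid UTF-8.
lone_byte = bytes([raw_byte])

print(hex(raw_byte), oct(raw_byte))
print(as_character, utf8_can_decode(as_character))
print(lone_byte, utf8_can_decode(lone_byte))
```

That is in line with the offer of raw-text and no-conversion as the
safe choices: both write the bytes out as they are.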

-- 
© 2019 Van L
gpg using EEF2 37E9 3840 0D5D 9183  251E 9830 384E 9683 B835
"What's so strange when you know that you're a Wizard at 3?" -Joni Mitchell



