emacs-devel

Re: default charset for text/html selection in X11


From: Po Lu
Subject: Re: default charset for text/html selection in X11
Date: Thu, 22 Jun 2023 08:56:49 +0800
User-agent: Gnus/5.13 (Gnus v5.13)

Robert Pluim <rpluim@gmail.com> writes:

> Hi,
>
> Iʼve been playing around with the `yank-media' stuff Lars added, and
> Iʼve noticed that when yanking a selection with mime-type text/html
> from Chromium, what Iʼm getting is a utf-8 encoded string, which makes
> this:
>
> (defun html-mode--html-yank-handler (_type html)
>   (save-restriction
>     (insert html)
>     (ignore-errors
>       (sgml-pretty-print (point-min) (point-max)))))
>
> insert any code points > 127 as their constituent raw bytes
> instead, e.g. U+A0 ends up as \xc2\xa0 in the buffer.
>
> I *think* it should be OK to assume utf-8 here, and thus do:
>
> (defun html-mode--html-yank-handler (_type html)
>   (save-restriction
>     (insert (decode-coding-string html 'utf-8 t))
>     (ignore-errors
>       (sgml-pretty-print (point-min) (point-max)))))
>
> but I canʼt find a normative reference for that (if this was http, the
> default charset would be iso-8859-1, but this isnʼt http).
>
> Robert

What is the type of the string?  IOW, what does

  (get-text-property 0 'foreign-selection html)

return?

This should be one of the usual X11 string formats: STRING
(iso-latin-1), COMPOUND_TEXT (compound-text-with-extensions), or
UTF8_STRING (utf-8).

If it's anything else, Emacs should try to detect the encoding
automatically, and fall back to Latin-1 if that fails.
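To make that dispatch concrete, here is a rough sketch.  It is not
Emacs's actual code, and `my-decode-selection-string' is a
hypothetical helper.  It assumes the raw data carries the
`foreign-selection' property on its first character.  It also assumes
that a detection result of `raw-text' or `no-conversion' counts as
"detection failed":

```elisp
(defun my-decode-selection-string (string)
  "Decode raw selection data STRING according to its X target.
Hypothetical helper: the target is read from the `foreign-selection'
text property, which is assumed to be set on the first character."
  (let* ((target (get-text-property 0 'foreign-selection string))
         (coding
          (pcase target
            ;; The three standard X11 string targets.
            ('STRING 'iso-latin-1)
            ('COMPOUND_TEXT 'compound-text-with-extensions)
            ('UTF8_STRING 'utf-8)
            ;; Anything else (e.g. text/html): try to detect the
            ;; encoding.  Treating raw-text/no-conversion as a failed
            ;; detection, and falling back to Latin-1, is an assumption.
            (_ (let ((detected (coding-system-base
                                (detect-coding-string string t))))
                 (if (memq detected '(raw-text no-conversion))
                     'iso-latin-1
                   detected))))))
    (decode-coding-string string coding)))
```

A yank handler could then call (insert (my-decode-selection-string
html)) instead of hard-coding utf-8.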

Thanks.

