Re: Parsing of multibyte strings from process output
From: Michael Albinus
Subject: Re: Parsing of multibyte strings from process output
Date: Tue, 08 May 2018 14:01:22 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
Helmut Eller <eller.helmut@gmail.com> writes:
Hi Helmut,
>> However, I don't know how to parse it so that I can retrieve it.
>> Everything I have tried always returns the *two* characters ?\xc2 ?\x9a,
>> multibyte encoded. How could I get just the multibyte character ?\x9a
>> from this?
>
> You could use (set-process-coding-system <proc> 'utf-8) if you know that
> all output of the process is indeed utf-8 encoded.
I've done this already, for other purposes. But it doesn't help: the
string /home/albinus/tmp/\xc2\x9abung is written literally into the
output buffer.
> Alternatively, you could use 'binary as coding system and manually call
> decode-coding-string on the parts that are utf-8 encoded. However, keep
> in mind that "raw bytes" in multibyte strings have char codes in the
> range #x3FFF00..#x3FFFFF.
I tried that, with no luck. But I didn't know that "raw" bytes are in
that range.
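
For illustration, the mapping between a byte and its "raw byte" character
can be checked directly. This is only a minimal sketch; it assumes the
built-in functions unibyte-char-to-multibyte and multibyte-char-to-unibyte,
which were not mentioned in this thread:

```elisp
;; A byte >= #x80 corresponds to the "raw byte" character #x3FFF00 + byte.
(unibyte-char-to-multibyte #x9a)               ; raw-byte character for byte #x9a
(= (unibyte-char-to-multibyte #x9a) #x3FFF9a)  ; compare with the range quoted above
(multibyte-char-to-unibyte #x3FFF9a)           ; convert back to the plain byte
```
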
> (decode-coding-string (string #x3FFFc3 #x3FFF9c) 'utf-8) => "Ü"
That's it! The following code works for me (res-symlink-target holds the
file name from the process output, as shown above):
--8<---------------cut here---------------start------------->8---
(setq res-symlink-target
      ;; Parse multibyte codings.
      (decode-coding-string
       (replace-regexp-in-string
        "\\\\x\\([[:xdigit:]]\\{2\\}\\)"
        (lambda (x)
          (string
           (string-to-number (concat "3FFF" (match-string 1 x)) 16)))
        res-symlink-target)
       'utf-8))
--8<---------------cut here---------------end--------------->8---
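
As a usage sketch, the same steps can be wrapped in a function and tried on
a sample string. The function name my-decode-escaped-utf8 and the sample
file name are hypothetical and serve only as illustration; the body is the
code above with the variable replaced by an argument:

```elisp
(defun my-decode-escaped-utf8 (string)
  "Decode literal \\xNN escapes in STRING as UTF-8 encoded bytes.
Hypothetical helper, for illustration only."
  (decode-coding-string
   ;; Turn every literal \xNN into the raw-byte character #x3FFFNN ...
   (replace-regexp-in-string
    "\\\\x\\([[:xdigit:]]\\{2\\}\\)"
    (lambda (x)
      (string
       (string-to-number (concat "3FFF" (match-string 1 x)) 16)))
    string)
   ;; ... then let the raw bytes be decoded as UTF-8.
   'utf-8))

;; The bytes C3 9C are the UTF-8 encoding of "Ü" (see Helmut's example).
(my-decode-escaped-utf8 "/home/albinus/tmp/\\xc3\\x9cbung")
;; => "/home/albinus/tmp/Übung"
```
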
Thanks a lot!
> Helmut
Best regards, Michael.