help-gnu-emacs

Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)


From: Garjola Dindi
Subject: Re: Incorrect rendering of accented characters in HTML e-mail (Gnus)
Date: Sat, 10 Oct 2020 16:35:05 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

On Sat 10-Oct-2020 at 16:00:54 +02, Eli Zaretskii <eliz@gnu.org> wrote: 
>> From: Garjola Dindi <garjola@garjola.net>
>> Date: Sat, 10 Oct 2020 15:34:02 +0200
>> 
>> If I use describe-char to inspect the characters, I get this before
>> «washing»:
>> 
>> ,----
>> | position: 470 of 867 (54%), column: 30                                     
>> | character: i (displayed as i) (codepoint 105, #o151, #x69)                 
>> | charset: ascii (ASCII (ISO646 IRV))                                        
>> | code point in charset: 0x69                                                
>> | script: latin                                                              
>> | syntax: w  which means: word
>> | category: .:Base, L:Left-to-right (strong), a:ASCII, l:Latin, r:Roman      
>> | to input: type "C-x 8 RET 69" or "C-x 8 RET LATIN SMALL LETTER I"          
>> | buffer code: #x69                                                          
>> | file code: #x69 (encoded by coding system utf-8-unix)                      
>> | display: by this font (glyph code)                                         
>> | ftcrhb:-GOOG-Noto Sans-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1 (#x4C
>> |                                                                            
>> | Character code properties: customize what to show                          
>> | name: LATIN SMALL LETTER I                                                 
>> | general-category: Ll (Letter, Lowercase)                                   
>> | decomposition: (105) ('i')                                                 
>> |                                                                            
>> | There is an overlay here:                                                  
>> | From 440 to 520                                                            
>> | face                 hl-line                                               
>> | priority             -50                                                   
>> | window               #<window 141 on *Article nnmaildir+RSSFeeds:ABlog*>   
>> |                                                                            
>> |                                                                            
>> | There are text properties here:                                            
>> | face                 variable-pitch                                        
>> `----
>> 
>> And this after «washing»
>> 
>> ,----
>> | position: 472 of 871 (54%), column: 30
>> | character: é (displayed as é) (codepoint 233, #o351, #xe9)
>> | charset: unicode (Unicode (ISO10646))
>> | code point in charset: 0xE9
>> | script: latin
>> | syntax: w  which means: word
>> | category: .:Base, L:Left-to-right (strong), c:Chinese, j:Japanese, l:Latin, v:Viet
>> | to input: type "C-x 8 RET e9" or "C-x 8 RET LATIN SMALL LETTER E WITH ACUTE"
>> | buffer code: #xC3 #xA9
>> | file code: #xC3 #xA9 (encoded by coding system utf-8-unix)
>> | display: by this font (glyph code)
>> | ftcrhb:-GOOG-Noto Sans-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1 (#xAB)
>> |
>> | Character code properties: customize what to show
>> | name: LATIN SMALL LETTER E WITH ACUTE
>> | old-name: LATIN SMALL LETTER E ACUTE
>> | general-category: Ll (Letter, Lowercase)
>> | decomposition: (101 769) ('e' '́')
>> |
>> | There is an overlay here:
>> | From 442 to 523
>> | face                 hl-line
>> | priority             -50
>> | window               #<window 155 on *Article nnmaildir+RSSFeeds:ABlog*>
>> |
>> |
>> | There are text properties here:
>> | face                 variable-pitch
>> `----
>> 
>> The html part of the e-mails contains
>> 
>> ,----
>> | < #part type=text/plain format="flowed" charset="utf-8"
>> | disposition=inline nofile=yes>
>> `----
>> 
>> so I guess that the html renderer should pick it up. I have tested shr,
>> gnus-w3m and w3m and I always get the same result.
>> 
>> I would be grateful if somebody could help me understand what happens.
>
> How does the character appear in the original HTML?

Thanks for your quick response.

I don't know if I am inspecting the message correctly, because when I
enter edit mode, all characters appear OK. Therefore, I am not sure
whether I am seeing the original HTML.

I have also noticed that I have the same issue with non-HTML
e-mails. I thought they were HTML, but they are just multipart.

For instance, here is what I see in the article buffer:

,----
| \311lodie, qui a rejoint l'\351quipe podcast, me dit que sa soeur, qui a une
| formation th\351\342trale, serait disponible ponctuellement pour faire des
| voix pour des lectures. Pour le moment on a jamais eu ce besoin mais \347a
| peut ouvrir des perspectives. 
`----

(I have replaced the non-printable characters with \xxx escapes.) And here
is what I see in edit mode:

,----
| 
| 
| 
| Élodie, qui a rejoint l'équipe podcast, me dit que sa soeur, qui a une
| formation théâtrale, serait disponible ponctuellement pour faire des
| voix pour des lectures. Pour le moment on a jamais eu ce besoin mais ça
| peut ouvrir des perspectives. 
| 
| Pour connaître la configuration de la liste, gérer votre abonnement à la 
| liste et vos informations personnelles :
| https://listes.april.org/wws/info/libreavous
| 
`----
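For reference, the \xxx escapes in the article-buffer excerpt are octal byte
values, and each one is the ISO-8859-1 (Latin-1) code of the character that
edit mode shows: \311 is É, \351 is é, \342 is â and \347 is ç. The small
Python sketch below decodes them that way (the helper name
decode_octal_latin1 is hypothetical, for illustration only). It also shows
that the same é takes two bytes in UTF-8, which matches the "buffer code:
#xC3 #xA9" line in the describe-char output quoted above. It only
illustrates the byte values; it does not show where in Gnus the decoding
goes wrong.

```python
import re

# Excerpt as it appears in the article buffer, with octal escapes
# standing in for the non-printable bytes.
shown = r"\311lodie, qui a rejoint l'\351quipe podcast"

def decode_octal_latin1(text: str) -> str:
    """Replace each backslash + 3 octal digits with the Latin-1 character for that byte."""
    return re.sub(
        r"\\([0-7]{3})",
        lambda m: bytes([int(m.group(1), 8)]).decode("latin-1"),
        text,
    )

print(decode_octal_latin1(shown))  # Élodie, qui a rejoint l'équipe podcast

# In Latin-1, é is the single byte 0xE9 (octal 351) ...
print("é".encode("latin-1").hex())  # e9
# ... while in UTF-8 it is the two bytes C3 A9.
print("é".encode("utf-8").hex(" "))  # c3 a9
```

So text that still shows single bytes like \351 in the article buffer looks
like Latin-1 data that was never decoded into characters.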


Again, when I quit the edit mode, the article buffer displays things correctly.

In the case of HTML, I have, for instance, this in the article buffer:


,----
| Les groupes suprimacistes blancs ont profiti du mandat de Donald Trump et des 
...
`----


and this in the edit-mode buffer:

,----
| 
| 
`----

So now I think this is not due to HTML but to multipart MIME.

Thanks again for your help.
