From: Axel Beckert
Subject: Re: [screen-devel] [bug #60030] Screen segfaults by displaying some UTF-8 character combination
Date: Thu, 11 Feb 2021 16:57:14 +0100
User-agent: NeoMutt/20170113 (1.7.2)

Hi again,

On Wed, Feb 10, 2021 at 11:01:58PM -0000, Tavis Ormandy wrote:
> On 2021-02-10, Axel Beckert wrote:
> > +  else if (i < sizeof combchars / sizeof *combchars) {
> 
> This doesn't seem right, I think it should be compared against the
> calloc param at the top of utf8_handle_comb(), but I don't really
> understand enough about unicode to know where that 0x802 comes from!

Ack, I seem to have missed at least one level of dereference, so the
calculated size is probably too small.
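
To illustrate the point, here is a minimal sketch, assuming the table
is a pointer to a heap-allocated array of pointers with 0x802 slots
(the calloc parameter mentioned in the quoted message). The names
comb_table, struct combchar_sketch, COMB_TABLE_SLOTS and the helper
functions are hypothetical and are not taken from encoding.c.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real table entry type. */
struct combchar_sketch { unsigned int c1, c2, next, prev; };

/* Assumption: the slot count matches the calloc parameter quoted above. */
#define COMB_TABLE_SLOTS 0x802

/* A pointer to a heap-allocated array, NOT an array object. */
static struct combchar_sketch **comb_table;

/* The sizeof idiom applied to a pointer divides one pointer size by
   another, so it does not report the number of allocated slots. */
static size_t slots_via_sizeof(void)
{
  return sizeof comb_table / sizeof *comb_table;
}

/* Allocate the table; returns 0 on success, -1 on failure. */
static int comb_table_init(void)
{
  comb_table = calloc(COMB_TABLE_SLOTS, sizeof *comb_table);
  return comb_table ? 0 : -1;
}

/* Bounds check against the slot count that was actually allocated. */
static int comb_index_ok(size_t i)
{
  return comb_table != NULL && i < COMB_TABLE_SLOTS;
}
```

On common platforms, where all object pointers have the same size,
slots_via_sizeof() evaluates to 1, which would explain why a check
built on that expression rejects nearly every index.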

> --- encoding.c        2020-02-05 12:09:38.000000000 -0800
> +++ encoding.c        2021-02-10 15:00:05.000000000 -0800
> @@ -1357,6 +1357,9 @@
>    int root, i, c1;
>    int isdouble;
>  
> +  if (c > 0x801)
> +    return;
> +
>    c1 = mc->image | (mc->font << 8) | mc->fontx << 16;
>    isdouble = c1 >= 0x1100 && utf8_isdouble(c1);
>    if (!combchars)

While that fix does indeed fix the crash, as did mine (probably by
accident :-), I have in the meantime found a rendering issue with both:

I currently assume that this code handles combining diacriticals, i.e.
Unicode characters which modify the previous character. Since they can
be stacked, and Tavis mentioned that he thinks the code only handles
UTF-8 characters of at most two bytes, I toyed around with multiple
combining diacriticals in a row. (Yes, I'm aware that these are not
"more than two bytes" with regard to the limit mentioned above.)

I found that without any patch, screen rendered the combination of
e.g. "e̤̒"

* the ASCII letter "e",
* U+0324 COMBINING DIAERESIS BELOW (two bytes in UTF-8), and
* U+0312 COMBINING TURNED COMMA ABOVE (two bytes in UTF-8)

correctly. With either patch, Tavis's or mine, only U+0324 COMBINING
DIAERESIS BELOW is rendered and U+0312 COMBINING TURNED COMMA ABOVE is
not shown.

Then again, "l᪼" (ASCII "l" + U+1ABC COMBINING DOUBLE PARENTHESES
ABOVE, which has a three-byte representation in UTF-8 and is clearly
above 0x800) is rendered correctly without a patch as well as with
your patch and mine.

The same goes for "e𝆫" (ASCII "e" and U+1D1AB MUSICAL SYMBOL
COMBINING UP BOW, which has a four-byte representation in UTF-8).
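
For reference, the byte counts quoted above follow directly from the
UTF-8 code point ranges. A minimal sketch in C; utf8_encoded_len is a
hypothetical helper written for this mail, not a function from screen:

```c
/* Number of bytes UTF-8 needs for code point cp, or 0 if cp is not
   encodable (a surrogate, or beyond U+10FFFF). */
static int utf8_encoded_len(unsigned long cp)
{
  if (cp < 0x80)
    return 1;                   /* ASCII, e.g. "e" (U+0065) */
  if (cp < 0x800)
    return 2;                   /* e.g. U+0324, U+0312 */
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* surrogates are not valid scalar values */
  if (cp < 0x10000)
    return 3;                   /* e.g. U+1ABC */
  if (cp <= 0x10FFFF)
    return 4;                   /* e.g. U+1D1AB */
  return 0;
}
```

So utf8_encoded_len(0x0324) and utf8_encoded_len(0x0312) both give 2,
utf8_encoded_len(0x1ABC) gives 3, and utf8_encoded_len(0x1D1AB) gives 4.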

I will test Michael's second patch proposal later today. Looking
forward to that very much. :-)

                Kind regards, Axel
-- 
PGP: 2FF9CD59612616B5      /~\  Plain Text Ribbon Campaign, http://arc.pasp.de/
Mail: abe@deuxchevaux.org  \ /  Say No to HTML in E-Mail and Usenet
Mail+Jabber: abe@noone.org  X
https://axel.beckert.ch/   / \  I love long mails: https://email.is-not-s.ms/