Re: [Libcdio-devel] Vulnerable use of strcpy in iso9660_fs.c


From: Thomas Schmitt
Subject: Re: [Libcdio-devel] Vulnerable use of strcpy in iso9660_fs.c
Date: Tue, 09 Apr 2024 09:00:18 +0200

Hi,

Pete Batard wrote:
> Or maybe there's a mathematical proof that
> a UTF-8 glyph byte encoding can never be larger than 1.5 the UTF-16 glyph
> byte encoding

I thought I had given one. Let me try again:

  https://datatracker.ietf.org/doc/html/rfc3629
  "In UTF-8, characters from the U+0000..U+10FFFF range (the UTF-16
   accessible range) are encoded using sequences of 1 to 4 octets."
The table after this statement shows that 21 bits can be encoded that
way.
The older FSS-UTF proposal of 1992 allowed up to 6 octets for up to 31
bits, but the above RFC restricted this to 21 bits in 2003. The same limit
is defined in ISO/IEC 10646:2014 through ISO/IEC 10646:2020.

My proof is that UCS-2 encodes each of the Unicode code points U+0000 to
U+FFFF in 2 bytes, and UTF-8 encodes each of them in at most 3 bytes.

If the producer of the ISO uses UTF-16 instead of the older UCS-2,
then the input Unicode range is the same as with UTF-8: U+0000..U+10FFFF.
Characters which do not fit into 2 bytes (and thus possibly not into
3 UTF-8 bytes) get represented as 4 bytes in UTF-16. Given that UTF-8
cannot exceed 4 bytes, the number of bytes cannot grow when such a
character is converted.
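
The argument can also be checked by brute force over all code points.
The following is only a minimal sketch, not libcdio code. The function
names utf8_len(), utf16_len() and bound_holds() are made up for this
illustration, and the byte counts follow the table in RFC 3629 and the
UTF-16 surrogate pair scheme:

```c
#include <stdint.h>

/* Hypothetical helper: bytes UTF-8 needs for one code point (RFC 3629). */
static int utf8_len(uint32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Hypothetical helper: bytes UTF-16 needs for one code point.
   2 bytes up to U+FFFF, a 4-byte surrogate pair above that. */
static int utf16_len(uint32_t cp)
{
    return cp < 0x10000 ? 2 : 4;
}

/* Returns 1 if utf8_len <= 1.5 * utf16_len holds for every code point. */
static int bound_holds(void)
{
    uint32_t cp;

    for (cp = 0; cp <= 0x10FFFF; cp++) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            continue; /* surrogate values are not characters */
        if (2 * utf8_len(cp) > 3 * utf16_len(cp))
            return 0;
    }
    return 1;
}
```

The worst case is the range U+0800..U+FFFF, where 2 UTF-16 bytes become
3 UTF-8 bytes. Everything above U+FFFF goes from 4 bytes to 4 bytes.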

(My proposal would accommodate up to 6 UTF-8 bytes for 4 UTF-16 bytes
and thus would even suffice for FSS-UTF.)


> So I'm going to stick to i_fname for length, with the expectation that we're
> unlikely to see realistic truncations outside of images designed to trigger
> one,

I try to obey specs and to avoid speculating about which of their
provisions might not occur in practice.
In my experience this pays off in the long run.


> I'm not
> sure I like the idea of trying to be too smart about or expecting specs not
> to change the deal.

My proposal, with a name allocation of 3*i_fname/2 and a result size
parameter for _iso9660_recname_to_cstring(), would be as safe against
result overflow as yours.
It would additionally guarantee that all valid UCS-2 names lead to valid
and untruncated UTF-8 names.
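
A rough sketch of that idea, under stated assumptions: this is not the
code of _iso9660_recname_to_cstring() and does not use libcdio's
character conversion. The functions utf16be_to_utf8() and name_to_utf8()
are hypothetical, the input is assumed to be big-endian UTF-16, the extra
byte for the terminating NUL is an assumption of this sketch, and lone
surrogates are replaced by U+FFFD as one possible policy:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not libcdio API: convert big-endian UTF-16 to a
   NUL-terminated UTF-8 string, never writing more than dst_size bytes.
   Returns 0 on success, -1 if the result had to be truncated. */
static int utf16be_to_utf8(const uint8_t *src, size_t src_len,
                           char *dst, size_t dst_size)
{
    size_t i = 0, o = 0;

    if (dst_size == 0)
        return -1;
    while (i + 1 < src_len) {
        uint32_t cp = ((uint32_t) src[i] << 8) | src[i + 1];
        size_t need;

        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src_len) {
            uint32_t lo = ((uint32_t) src[i] << 8) | src[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) { /* valid surrogate pair */
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD; /* lone surrogate: assumed replacement policy */
        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= dst_size) { /* keep room for the NUL */
            dst[o] = '\0';
            return -1;
        }
        if (need == 1) {
            dst[o++] = (char) cp;
        } else if (need == 2) {
            dst[o++] = (char) (0xC0 | (cp >> 6));
            dst[o++] = (char) (0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[o++] = (char) (0xE0 | (cp >> 12));
            dst[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char) (0x80 | (cp & 0x3F));
        } else {
            dst[o++] = (char) (0xF0 | (cp >> 18));
            dst[o++] = (char) (0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char) (0x80 | (cp & 0x3F));
        }
    }
    dst[o] = '\0';
    return 0;
}

/* Allocate 3 * i_fname / 2 bytes plus one for the NUL (the extra byte
   is an assumption of this sketch), then convert with a size limit. */
static char *name_to_utf8(const uint8_t *name, size_t i_fname)
{
    size_t size = 3 * i_fname / 2 + 1;
    char *out = malloc(size);

    if (out == NULL)
        return NULL;
    utf16be_to_utf8(name, i_fname, out, size);
    return out;
}
```

With this buffer size the truncation branch cannot be reached from
name_to_utf8(). It only guards callers that pass a smaller buffer.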

(One would separately have to check what the character conversion in
libcdio makes of invalid UTF-16 byte sequences. In any case, the
proposed size check would prevent memory corruption in
_iso9660_recname_to_cstring().)


Have a nice day :)

Thomas



