Re: Fwd: probable bug(s) in do_conversion() in iri.c

bug-wget

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Fwd: probable bug(s) in do_conversion() in iri.c

From:	Derek Martin
Subject:	Re: Fwd: probable bug(s) in do_conversion() in iri.c
Date:	Mon, 26 Apr 2021 12:33:18 -0400
User-agent:	Mutt/1.9.4 (2018-02-28)

On Sun, Apr 25, 2021 at 04:37:45PM +0200, Tim Rühsen wrote:
> Hey Derek :-)

Hey! :)

> Thanks for going deep into this issue.
> 
> Your approach of extracting the code and reducing the initial size of the
> buffer is sensible and allows to test this code much easier then by invoking
> wget.

Yeah--I don't love the approach because it means you're technically
not testing the same code, but I think the changes are simple enough
that it is possible to reason out that it's functionally equivalent,
aside from the initial intentional too-small allocation to force
triggering the realloc code path.

> If I get it right, you found two issues
> 1. The trailing 0 byte is always 1 byte in size. This doesn't work if the
> destination encoding has a different sized 0 code point, e.g. UTF-16 or
> UCS-4.

Correct (or I believe that is the/a problem).  If you just run the
test program it likely should still work, because the initial
allocation gives you a "fresh" buffer that's effectively zeroed.  But
since malloc doesn't initialize the memory it returns, as the program
runs and the heap is filled with user-supplied data, then when the
single '\0' character is appended it is not a full 32-bit character so
whatever data comes after that would compose the full character.  What
happens after that might be platform- and/or architecture-specific.

[I suppose the test program could be modified to allocate a big chunk,
fill it with non-zero garbage, and then release it so that it is
reused before the conversion takes place... I also suppose whether or
not the same block is actually reused might be platform-dependent?  You
would likely still not see a segfault though unless access leaked into
an unallocated page...]

> 2. The buffer reallocation is calculated incorrect so that a buffer overflow
> can happen.

No, not quite.  It appeared to me that the calls to iconv and
(x)realloc were done correctly; however after the final call to iconv,
the calculation of where to put the terminating null was incorrect,
resulting in it truncating the output buffer somewhere in the middle,
quite possibly mid-character on non-8-bit character sets.

I don't *think* there's a buffer overflow, though this is one thing
I'm not completely clear on, as I could not find details of the
original buffer overflow from CVE-2019-5953.  I was hoping that you
folks had a PoC from that CVE or corresponding test that you could run
against it (and my version of it) to determine that...

> The questions that arise and that I don't have the time to answer right now:
> 
> a) How can we reproduce this by invoking wget ?

That's a very good question... I am unfamiliar with any real-world
IRIs being used, or how they would likely be used.  That further
complicated the testing for me as well, which is a large part of why I
extracted the function.

> I really would like to have a test that reproduces this issue.
> A first step would we to lower the initial buffer size in wget and to modify
> an existing test. Once we have a failing test, the patch can be applied and
> the test protects against regressions once and for all.

So I think the conditions for the code to fail are converting from a
16-bit encoding (like EUC-KR) to UTF-8, as in the case I provided.
Setting the initial buffer size to something intentionally too small
to accommodate the conversion should trigger it.  As for any
real-world IRIs that meet those criteria, I'm not aware of any.  But
if you need someone to provide a string in EUC-KR and its
corresponding UTF-8 conversion, I can do that. ;-)

> b) Did you take a look into Wget2 to compare the code ?
> There is a similar function to basically do the same that was written as
> library function and which is fuzzed continuously (code for the fuzzers is
> in /fuzz).
> https://gitlab.com/gnuwget/wget2/-/blob/master/libwget/encoding.c#L65

I haven't...

> Do you think there is a similar issue, or can we possibly just take that
> code ? (Or maybe rework wget to use libwget in the long run).

I can see if I can find some time to review it...  Not sure when I can
get to that.

> Thank you for helping !

Thanks for your response!

-- 
Derek Martin
Senior System Software Engineer II Lead
Akamai Technologies
demartin@akamai.com

[Prev in Thread]

Current Thread

[Next in Thread]

Fwd: probable bug(s) in do_conversion() in iri.c, Derek Martin, 2021/04/20
- Re: Fwd: probable bug(s) in do_conversion() in iri.c, Derek Martin, 2021/04/20
  - Re: Fwd: probable bug(s) in do_conversion() in iri.c, Derek Martin, 2021/04/21
    - Re: Fwd: probable bug(s) in do_conversion() in iri.c, Tim Rühsen, 2021/04/25
    - Re: Fwd: probable bug(s) in do_conversion() in iri.c, Derek Martin <=
    - Re: Fwd: probable bug(s) in do_conversion() in iri.c, Derek Martin, 2021/04/26
- probable bug(s) in do_conversion() in iri.c, Derek Martin, 2021/04/20

Prev by Date: Re: Fwd: probable bug(s) in do_conversion() in iri.c
Next by Date: bug report - ivo from brazil - how can i fix it Sir?
Previous by thread: Re: Fwd: probable bug(s) in do_conversion() in iri.c
Next by thread: Re: Fwd: probable bug(s) in do_conversion() in iri.c
Index(es):
- Date
- Thread