bug-wget
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Fwd: probable bug(s) in do_conversion() in iri.c


From: Tim Rühsen
Subject: Re: Fwd: probable bug(s) in do_conversion() in iri.c
Date: Sun, 25 Apr 2021 16:37:45 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1

Hey Derek :-)

Thanks for going deep into this issue.

Your approach of extracting the code and reducing the initial size of the buffer is sensible and allows to test this code much easier then by invoking wget.

If I get it right, you found two issues
1. The trailing 0 byte is always 1 byte in size. This doesn't work if the destination encoding has a different sized 0 code point, e.g. UTF-16 or UCS-4. 2. The buffer reallocation is calculated incorrect so that a buffer overflow can happen.

The questions that arise and that I don't have the time to answer right now:

a) How can we reproduce this by invoking wget ?
I really would like to have a test that reproduces this issue.
A first step would we to lower the initial buffer size in wget and to modify an existing test. Once we have a failing test, the patch can be applied and the test protects against regressions once and for all.

b) Did you take a look into Wget2 to compare the code ?
There is a similar function to basically do the same that was written as library function and which is fuzzed continuously (code for the fuzzers is in /fuzz).
https://gitlab.com/gnuwget/wget2/-/blob/master/libwget/encoding.c#L65

Do you think there is a similar issue, or can we possibly just take that code ? (Or maybe rework wget to use libwget in the long run).

Thank you for helping !

Regards, Tim

On 21.04.21 16:07, Derek Martin wrote:
Actually my previous patch had a bug, in the case where there is one
character remaining to be converted in the input buffer, and the size
of the output buffer need to convert it is greater than in_remain * 2
(2 bytes).  This patch addresses that as well.

diff -ur wget-1.21.1.orig/src/iri.c wget-1.21.1/src/iri.c
--- wget-1.21.1.orig/src/iri.c  2021-01-08 17:51:43.000000000 -0500
+++ wget-1.21.1/src/iri.c       2021-04-21 10:05:28.557549874 -0400
@@ -130,9 +130,9 @@
  {
    iconv_t cd;
    /* sXXXav : hummm hard to guess... */
-  size_t len, done, outlen;
+  size_t outbuf_len = 0, out_remain = 0, converted = 0;
    int invalid = 0, tooshort = 0;
-  char *s, *in, *in_save;
+  char *outcur, *in, *in_save;

    cd = iconv_open (tocode, fromcode);
    if (cd == (iconv_t)(-1))
@@ -148,17 +148,16 @@
    url_unescape_except_reserved (in);
    inlen = strlen(in);

-  len = outlen = inlen * 2;
-  *out = s = xmalloc (outlen + 1);
-  done = 0;
+  /* Leave 4 bytes for null for e.g. UTF-32 */
+  outbuf_len = out_remain = inlen * 2;
+  *out = outcur = xmalloc (outlen + 4);

    for (;;)
      {
-      if (iconv (cd, (ICONV_CONST char **) &in, &inlen, out, &outlen) != (size_t)(-1) 
&&
+      if (iconv (cd, (ICONV_CONST char **) &in, &inlen, &outcur, &outlen) != 
(size_t)(-1) &&
            iconv (cd, NULL, NULL, out, &outlen) != (size_t)(-1))
          {
-          *out = s;
-          *(s + len - outlen - done) = '\0';
+          for (int i = 0; i < 4; ++i) *(outcur + i) = '\0';
            xfree(in_save);
            iconv_close(cd);
            IF_DEBUG
@@ -188,12 +187,23 @@
          }
        else if (errno == E2BIG) /* Output buffer full */
          {
-          tooshort++;
-          done = len;
-          len = done + inlen * 2;
-          s = xrealloc (s, len + 1);
-          *out = s + done - outlen;
-          outlen += inlen * 2;
+          converted = outbuf_len - out_remain;
+          /* If we merely allocate converted + in_remain * 2 chars for the new
+             output buffer, then any case where the input buffer contains 1 
char
+             and requires more than 2 bytes (in_remain * 2 == 1 * 2) to convert
+             it will result in an infinite loop:  In this case no conversion
+             will be done so converted will not change, and the buffer will
+             remain too small to complete the conversion on subsequent passes,
+             resulting in an infinite loop.  This happens e.g. when converting
+             ASCII to UTF-32.  Thus, we need to instead add in_remain * 2
+             characters to the current outbuf_len.  That way next time through
+             the loop the buffer size will actually increase, eventually being
+             large enough to hold the conversion. */
+          outbuf_len += in_remain * 2;
+          /* again, leave 4 bytes for null */
+          *out = xrealloc (*out, outbuf_len + 4);
+          outcur = *out + converted;
+          out_remain = outbuf_len - converted;
          }
        else /* Weird, we got an unspecified error */
          {



Attachment: OpenPGP_signature
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]