bug-wget
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-wget] Submitting a patch fixing the --content-disposition optio


From: Vladimír Pýcha
Subject: Re: [Bug-wget] Submitting a patch fixing the --content-disposition option of wget
Date: Fri, 14 Feb 2014 12:59:47 +0100

Hi Darshit,
Thanks for your suggestions, I followed them. Please find attached the
improved patch, created with `git format-patch -1`.

Cheers,
Vlad



On Wed, Feb 12, 2014 at 4:33 PM, Darshit Shah <address@hidden> wrote:

>
> On Wed, Feb 12, 2014 at 9:48 AM, Vladimír Pýcha <address@hidden> wrote:
>
>> Hello,
>> I have created a small patch for wget 1.15. It is related to the
>> experimental --content-disposition option. The patch is attached.
>>
>> This patch ensures that if the filename parameter of Content-Disposition
>> header is url-encoded as described in RFC 2231, then wget decodes it (the
>> character set is ignored).
>>
>> Besides using the --content-disposition option, there may be also needed
>> to
>> use the --restrict-file-names=nocontrol option to avoid wget from escaping
>> some characters of the filename.
>>
>> I have created a testing URL:
>> http://www.vladpride.cz/res/content-disposition-with-unicode.php
>>
>> The source code of the PHP script at that URL is as follows:
>>
>> <?php
>> $str = 'aácč dďsš zžrř iínň';
>> header('Content-Type: text/plain; charset=UTF-8');
>> header("Content-Disposition: attachment;
>> filename*=UTF-8''".rawurlencode($str.'.txt'));
>> echo $str;
>> ?>
>>
>> You can use the following command to test it:
>>
>> wget --content-disposition --restrict-file-names=nocontrol "
>> http://www.vladpride.cz/res/content-disposition-with-unicode.php";
>>
>> Using the above command, my patched version of wget correctly saves the
>> file as:
>>
>> aácč dďsš zžrř iínň.txt
>>
>> Using the same command with unpatched wget, the file is incorrectly saved
>> as:
>>
>> a%C3%A1c%C4%8D%20d%C4%8Fs%C5%A1%20z%C5%BEr%C5%99%20i%C3%ADn%C5%88.txt
>>
>> It would be great if my patch could be incorporated into the next release
>> of wget.
>>
>> I have created the patch file with the following command:
>>
>> diff -rupN wget-1.15/src/ wget-1.15-custom/src/ >
>> content-disposition.patch
>>
>> You can apply the patch with the following command, while in the directory
>> where the source code tarball was extracted to:
>>
>> patch -p1 < ../content-disposition.patch
>>
>> Then the output will be like this:
>>
>> patching file src/http.c
>> patching file src/http.h
>> patching file src/url.c
>> patching file src/url.h
>>
>> Note that in my system, Ubuntu 12.04, I had to install package
>> libgnutls-dev to be able to compile wget.
>>
>> Cheers,
>> Vlad
>>
>
> Hi,
>
> Thanks for your patch! However, it would be great if you could add an
> entry to the ChangeLog in the git tree and recreate the patch file. Also,
> it would be easier to simply create the patch using `git format-patch -1`
>
> You can find the git repository for Wget at:
> http://git.savannah.gnu.org/cgit/wget.git
>
> Some comments about the code inline. (Gmail sometimes mangles lines. So
> I'm sorry if this looks ugly)
>
> iff -rupN wget-1.15/src/http.c wget-1.15-custom/src/http.c
> --- wget-1.15/src/http.c        2014-01-07 15:58:49.000000000 +0100
> +++ wget-1.15-custom/src/http.c 2014-02-12 08:27:28.713919909 +0100
> @@ -1066,7 +1066,20 @@ bool
>  extract_param (const char **source, param_token *name, param_token *value,
>                 char separator)
>  {
> +  return extract_param_new (source, name, value, separator, (bool *) 0);
> +}
> +
> +/* Like extract_param, but with the addition of parameter is_url_encoded
> that is set to true if the value is url-encoded (see RFC 2231 for details).
> */
>
> Your comment exceeds the 80 characters per line convention. It would be
> great if you could fix this.
>
> extract_param_new is *not* like extract_param. Because extract_param
> simply calls extract_param_new. It is like the old version of
> extract_param, but a new reader does not know that. Hence, simply move the
> old comment to your new function.
>
> +
> +bool
> +extract_param_new (const char **source, param_token *name, param_token
> *value,
> +               char separator, bool *is_url_encoded)
>
> I am not sure why you need to define extract_param_new. Why is not editing
> the former extract_param() method a better idea? Sure, you would be
> required to edit all invokations of the method in the source code, but such
> refactoring isn't difficult.
>
> +{
>    const char *p = *source;
> +  if (is_url_encoded)
> +    {
> +      *is_url_encoded = false;
> +    }
>
> Am I correct in interpreting that you are simply forcing the value of
> is_url_encoded to be false? Why have a parameter in that case? I might be
> wrong, but please explain this.
>
>    while (c_isspace (*p)) ++p;
>    if (!*p)
> @@ -1125,6 +1138,9 @@ extract_param (const char **source, para
>    int param_type = modify_param_name(name);
>    if (NOT_RFC2231 != param_type)
>      {
> +      if (RFC2231_ENCODING == param_type && is_url_encoded) {
> +        *is_url_encoded = true;
> +      }
>        modify_param_value(value, param_type);
>      }
>    return true;
> @@ -1137,13 +1153,17 @@ extract_param (const char **source, para
>  /* Appends the string represented by VALUE to FILENAME */
>
>  static void
> -append_value_to_filename (char **filename, param_token const * const
> value)
> +append_value_to_filename (char **filename, param_token const * const
> value, bool is_url_encoded)
>  {
>    int original_length = strlen(*filename);
>    int new_length = strlen(*filename) + (value->e - value->b);
>    *filename = xrealloc (*filename, new_length+1);
>    memcpy (*filename + original_length, value->b, (value->e - value->b));
>    (*filename)[new_length] = '\0';
> +  if (is_url_encoded)
> +    {
> +      url_unescape(*filename + original_length);
> +    }
>  }
>
>  #undef MAX
> @@ -1176,7 +1196,8 @@ parse_content_disposition (const char *h
>  {
>    param_token name, value;
>    *filename = NULL;
> -  while (extract_param (&hdr, &name, &value, ';'))
> +  bool is_url_encoded;
> +  for (is_url_encoded = false; extract_param_new (&hdr, &name, &value,
> ';', &is_url_encoded); is_url_encoded = false)
>      {
>        int isFilename = BOUNDED_EQUAL_NO_CASE ( name.b, name.e, "filename"
> );
>        if ( isFilename && value.b != NULL)
> @@ -1192,9 +1213,15 @@ parse_content_disposition (const char *h
>              continue;
>
>            if (*filename)
> -            append_value_to_filename (filename, &value);
> +            append_value_to_filename (filename, &value, is_url_encoded);
>            else
> -            *filename = strdupdelim (value.b, value.e);
> +            {
> +              *filename = strdupdelim (value.b, value.e);
> +              if (is_url_encoded)
> +                {
> +                  url_unescape (*filename);
> +                }
> +            }
>          }
>      }
>
> diff -rupN wget-1.15/src/http.h wget-1.15-custom/src/http.h
> --- wget-1.15/src/http.h        2014-01-04 13:49:47.000000000 +0100
> +++ wget-1.15-custom/src/http.h 2014-02-12 08:27:28.725919927 +0100
> @@ -44,6 +44,7 @@ typedef struct {
>    const char *b, *e;
>  } param_token;
>  bool extract_param (const char **, param_token *, param_token *, char);
> +bool extract_param_new (const char **, param_token *, param_token *,
> char, bool *);
>
>
>  #endif /* HTTP_H */
> diff -rupN wget-1.15/src/url.c wget-1.15-custom/src/url.c
> --- wget-1.15/src/url.c 2014-01-04 13:49:47.000000000 +0100
> +++ wget-1.15-custom/src/url.c  2014-02-12 08:27:28.725919927 +0100
> @@ -169,7 +169,7 @@ static const unsigned char urlchr_table[
>     The transformation is done in place.  If you need the original
>     string intact, make a copy before calling this function.  */
>
> -static void
> +void
>  url_unescape (char *s)
>  {
>    char *t = s;                  /* t - tortoise */
> diff -rupN wget-1.15/src/url.h wget-1.15-custom/src/url.h
> --- wget-1.15/src/url.h 2013-10-21 16:50:12.000000000 +0200
> +++ wget-1.15-custom/src/url.h  2014-02-12 08:27:28.725919927 +0100
> @@ -100,6 +100,7 @@ struct url
>  /* Function declarations */
>
>  char *url_escape (const char *);
> +void url_unescape (char *);
>  char *url_escape_unsafe_and_reserved (const char *);
>
>  struct url *url_parse (const char *, int *, struct iri *iri, bool
> percent_encode);
>
>
>
> --
> Thanking You,
> Darshit Shah
>
>

Attachment: 0001-URL-decode-the-filename-parameter-of-Content-Disposi.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]