bug-wget
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-wget] Submitting a patch fixing the --content-disposition optio


From: Darshit Shah
Subject: Re: [Bug-wget] Submitting a patch fixing the --content-disposition option of wget
Date: Fri, 14 Feb 2014 13:12:51 +0100

That seems like a much cleaner patch thanks!!

We usually mention every edit in the ChangeLog. Those missing are trivial.
It would be great if you could those too. Specifically, cookies.c, and
other files which you have changed, even of to change the function header.

Else, Giuseppe while applying it may do it.


On Fri, Feb 14, 2014 at 12:59 PM, Vladimír Pýcha <address@hidden> wrote:

> Hi Darshit,
> Thanks for your suggestions, I followed them. Please find attached the
> improved patch, created with `git format-patch -1`.
>
> Cheers,
> Vlad
>
>
>
> On Wed, Feb 12, 2014 at 4:33 PM, Darshit Shah <address@hidden> wrote:
>
>>
>> On Wed, Feb 12, 2014 at 9:48 AM, Vladimír Pýcha <address@hidden> wrote:
>>
>>> Hello,
>>> I have created a small patch for wget 1.15. It is related to the
>>> experimental --content-disposition option. The patch is attached.
>>>
>>> This patch ensures that if the filename parameter of Content-Disposition
>>> header is url-encoded as described in RFC 2231, then wget decodes it (the
>>> character set is ignored).
>>>
>>> Besides using the --content-disposition option, there may be also needed
>>> to
>>> use the --restrict-file-names=nocontrol option to avoid wget from
>>> escaping
>>> some characters of the filename.
>>>
>>> I have created a testing URL:
>>> http://www.vladpride.cz/res/content-disposition-with-unicode.php
>>>
>>> The source code of the PHP script at that URL is as follows:
>>>
>>> <?php
>>> $str = 'aácč dďsš zžrř iínň';
>>> header('Content-Type: text/plain; charset=UTF-8');
>>> header("Content-Disposition: attachment;
>>> filename*=UTF-8''".rawurlencode($str.'.txt'));
>>> echo $str;
>>> ?>
>>>
>>> You can use the following command to test it:
>>>
>>> wget --content-disposition --restrict-file-names=nocontrol "
>>> http://www.vladpride.cz/res/content-disposition-with-unicode.php";
>>>
>>> Using the above command, my patched version of wget correctly saves the
>>> file as:
>>>
>>> aácč dďsš zžrř iínň.txt
>>>
>>> Using the same command with unpatched wget, the file is incorrectly saved
>>> as:
>>>
>>> a%C3%A1c%C4%8D%20d%C4%8Fs%C5%A1%20z%C5%BEr%C5%99%20i%C3%ADn%C5%88.txt
>>>
>>> It would be great if my patch could be incorporated into the next release
>>> of wget.
>>>
>>> I have created the patch file with the following command:
>>>
>>> diff -rupN wget-1.15/src/ wget-1.15-custom/src/ >
>>> content-disposition.patch
>>>
>>> You can apply the patch with the following command, while in the
>>> directory
>>> where the source code tarball was extracted to:
>>>
>>> patch -p1 < ../content-disposition.patch
>>>
>>> Then the output will be like this:
>>>
>>> patching file src/http.c
>>> patching file src/http.h
>>> patching file src/url.c
>>> patching file src/url.h
>>>
>>> Note that in my system, Ubuntu 12.04, I had to install package
>>> libgnutls-dev to be able to compile wget.
>>>
>>> Cheers,
>>> Vlad
>>>
>>
>> Hi,
>>
>> Thanks for your patch! However, it would be great if you could add an
>> entry to the ChangeLog in the git tree and recreate the patch file. Also,
>> it would be easier to simply create the patch using `git format-patch -1`
>>
>> You can find the git repository for Wget at:
>> http://git.savannah.gnu.org/cgit/wget.git
>>
>> Some comments about the code inline. (Gmail sometimes mangles lines. So
>> I'm sorry if this looks ugly)
>>
>> iff -rupN wget-1.15/src/http.c wget-1.15-custom/src/http.c
>> --- wget-1.15/src/http.c        2014-01-07 15:58:49.000000000 +0100
>> +++ wget-1.15-custom/src/http.c 2014-02-12 08:27:28.713919909 +0100
>> @@ -1066,7 +1066,20 @@ bool
>>  extract_param (const char **source, param_token *name, param_token
>> *value,
>>                 char separator)
>>  {
>> +  return extract_param_new (source, name, value, separator, (bool *) 0);
>> +}
>> +
>> +/* Like extract_param, but with the addition of parameter is_url_encoded
>> that is set to true if the value is url-encoded (see RFC 2231 for details).
>> */
>>
>> Your comment exceeds the 80 characters per line convention. It would be
>> great if you could fix this.
>>
>> extract_param_new is *not* like extract_param. Because extract_param
>> simply calls extract_param_new. It is like the old version of
>> extract_param, but a new reader does not know that. Hence, simply move the
>> old comment to your new function.
>>
>> +
>> +bool
>> +extract_param_new (const char **source, param_token *name, param_token
>> *value,
>> +               char separator, bool *is_url_encoded)
>>
>> I am not sure why you need to define extract_param_new. Why is not
>> editing the former extract_param() method a better idea? Sure, you would be
>> required to edit all invokations of the method in the source code, but such
>> refactoring isn't difficult.
>>
>> +{
>>    const char *p = *source;
>> +  if (is_url_encoded)
>> +    {
>> +      *is_url_encoded = false;
>> +    }
>>
>> Am I correct in interpreting that you are simply forcing the value of
>> is_url_encoded to be false? Why have a parameter in that case? I might be
>> wrong, but please explain this.
>>
>>    while (c_isspace (*p)) ++p;
>>    if (!*p)
>> @@ -1125,6 +1138,9 @@ extract_param (const char **source, para
>>    int param_type = modify_param_name(name);
>>    if (NOT_RFC2231 != param_type)
>>      {
>> +      if (RFC2231_ENCODING == param_type && is_url_encoded) {
>> +        *is_url_encoded = true;
>> +      }
>>        modify_param_value(value, param_type);
>>      }
>>    return true;
>> @@ -1137,13 +1153,17 @@ extract_param (const char **source, para
>>  /* Appends the string represented by VALUE to FILENAME */
>>
>>  static void
>> -append_value_to_filename (char **filename, param_token const * const
>> value)
>> +append_value_to_filename (char **filename, param_token const * const
>> value, bool is_url_encoded)
>>  {
>>    int original_length = strlen(*filename);
>>    int new_length = strlen(*filename) + (value->e - value->b);
>>    *filename = xrealloc (*filename, new_length+1);
>>    memcpy (*filename + original_length, value->b, (value->e - value->b));
>>    (*filename)[new_length] = '\0';
>> +  if (is_url_encoded)
>> +    {
>> +      url_unescape(*filename + original_length);
>> +    }
>>  }
>>
>>  #undef MAX
>> @@ -1176,7 +1196,8 @@ parse_content_disposition (const char *h
>>  {
>>    param_token name, value;
>>    *filename = NULL;
>> -  while (extract_param (&hdr, &name, &value, ';'))
>> +  bool is_url_encoded;
>> +  for (is_url_encoded = false; extract_param_new (&hdr, &name, &value,
>> ';', &is_url_encoded); is_url_encoded = false)
>>      {
>>        int isFilename = BOUNDED_EQUAL_NO_CASE ( name.b, name.e,
>> "filename" );
>>        if ( isFilename && value.b != NULL)
>> @@ -1192,9 +1213,15 @@ parse_content_disposition (const char *h
>>              continue;
>>
>>            if (*filename)
>> -            append_value_to_filename (filename, &value);
>> +            append_value_to_filename (filename, &value, is_url_encoded);
>>            else
>> -            *filename = strdupdelim (value.b, value.e);
>> +            {
>> +              *filename = strdupdelim (value.b, value.e);
>> +              if (is_url_encoded)
>> +                {
>> +                  url_unescape (*filename);
>> +                }
>> +            }
>>          }
>>      }
>>
>> diff -rupN wget-1.15/src/http.h wget-1.15-custom/src/http.h
>> --- wget-1.15/src/http.h        2014-01-04 13:49:47.000000000 +0100
>> +++ wget-1.15-custom/src/http.h 2014-02-12 08:27:28.725919927 +0100
>> @@ -44,6 +44,7 @@ typedef struct {
>>    const char *b, *e;
>>  } param_token;
>>  bool extract_param (const char **, param_token *, param_token *, char);
>> +bool extract_param_new (const char **, param_token *, param_token *,
>> char, bool *);
>>
>>
>>  #endif /* HTTP_H */
>> diff -rupN wget-1.15/src/url.c wget-1.15-custom/src/url.c
>> --- wget-1.15/src/url.c 2014-01-04 13:49:47.000000000 +0100
>> +++ wget-1.15-custom/src/url.c  2014-02-12 08:27:28.725919927 +0100
>> @@ -169,7 +169,7 @@ static const unsigned char urlchr_table[
>>     The transformation is done in place.  If you need the original
>>     string intact, make a copy before calling this function.  */
>>
>> -static void
>> +void
>>  url_unescape (char *s)
>>  {
>>    char *t = s;                  /* t - tortoise */
>> diff -rupN wget-1.15/src/url.h wget-1.15-custom/src/url.h
>> --- wget-1.15/src/url.h 2013-10-21 16:50:12.000000000 +0200
>> +++ wget-1.15-custom/src/url.h  2014-02-12 08:27:28.725919927 +0100
>> @@ -100,6 +100,7 @@ struct url
>>  /* Function declarations */
>>
>>  char *url_escape (const char *);
>> +void url_unescape (char *);
>>  char *url_escape_unsafe_and_reserved (const char *);
>>
>>  struct url *url_parse (const char *, int *, struct iri *iri, bool
>> percent_encode);
>>
>>
>>
>> --
>> Thanking You,
>> Darshit Shah
>>
>>
>


-- 
Thanking You,
Darshit Shah


reply via email to

[Prev in Thread] Current Thread [Next in Thread]