Re: [Bug-wget] [Bug-Wget] Misc. patches


From: Darshit Shah
Subject: Re: [Bug-wget] [Bug-Wget] Misc. patches
Date: Mon, 21 Jul 2014 13:38:45 +0530

Hi Tim,

As you state, maybe we can wait a little before making all these
changes to Wget for lowercase conversions.

Meanwhile, please look at the attached patch. It fixes a potential
memory leak in Wget that occurs when the lowercase conversion of exactly
one of cookie_domain and host fails.
The patch also fixes the configure file to correctly detect libpsl.
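
To illustrate the kind of leak described above, here is a minimal sketch. It is
not the attached patch. lower_copy() is a hypothetical stand-in for
psl_str_to_utf8lower(), assumed to allocate its output on success and to leave
it untouched on failure, and domains_match() is an illustrative name, not a
Wget function. Both pointers start as NULL and are freed on every path, so a
conversion that succeeded before the other one failed is still released.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for psl_str_to_utf8lower(): on success it stores a
   newly allocated lowercase copy in *out and returns 0; on failure it
   returns -1 and leaves *out untouched.  In this sketch any non-ASCII
   byte counts as a failure. */
static int lower_copy(const char *s, char **out)
{
    assert(s != NULL && out != NULL);
    size_t n = strlen(s);
    for (size_t i = 0; i < n; i++)
        if ((unsigned char) s[i] > 127)
            return -1;
    char *copy = malloc(n + 1);
    if (copy == NULL)
        return -1;
    /* i <= n also copies the terminating NUL byte. */
    for (size_t i = 0; i <= n; i++)
        copy[i] = (char) tolower((unsigned char) s[i]);
    *out = copy;
    return 0;
}

/* Returns 1 if both names lowercase to the same string, 0 if they differ,
   and -1 if either conversion failed (a caller would then fall back to
   simple heuristics). */
static int domains_match(const char *cookie_domain, const char *host)
{
    char *cookie_domain_lower = NULL;
    char *host_lower = NULL;
    int result = -1;

    if (lower_copy(cookie_domain, &cookie_domain_lower) == 0
        && lower_copy(host, &host_lower) == 0)
        result = strcmp(cookie_domain_lower, host_lower) == 0;

    /* Runs on every path; free(NULL) is a no-op, so a half-finished
       conversion is released as well. */
    free(cookie_domain_lower);
    free(host_lower);
    return result;
}
```

Calling domains_match("Example.COM", "example.com") returns 1, while a host
containing a non-ASCII byte returns -1 without leaking the copy already made
for cookie_domain.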

On Mon, Jul 21, 2014 at 2:28 AM, Tim Rühsen <address@hidden> wrote:
> Am Montag, 21. Juli 2014, 00:58:49 schrieb Darshit Shah:
>> On Mon, Jul 7, 2014 at 8:14 PM, Tim Ruehsen <address@hidden> wrote:
>> > One more comment / idea.
>> >
>> > The 'cookie_domain' comes from an HTTP Set-Cookie response header and thus
>> > is (must be) toASCII() encoded (= punycode). Of course this has to be
>> > checked when normalizing the incoming cookie data. A cookie domain having
>> > non-ASCII characters should simply be dropped.
>> >
>> > The whole check only works when 'host' is also in toASCII() (punycode)
>> > form.
>> >
>> > Assuming this, psl_str_to_utf8lower() just reduces to an ASCII lowercase
>> > converter.
>> >
>> > If Wget converted every domain name input to punycode + lowercase, many
>> > conversions would fall away and case-insensitive functions would not be
>> > needed (e.g. calling strcmp instead of strcasecmp, the need to call
>> > psl_str_to_utf8lower() would fall away, etc.).
>> >
>> > What do you think ?
>>
>> Sounds like an interesting idea to me. Although, how do you suggest we
>> go about converting the domain names to lowercase?
>> I'm not sure about this, so let me confirm first: after running the input
>> domain names through toASCII(), can we simply pass the string to
>> tolower() to get the lowercase version?
>
> That depends on the library you use.
>
> libidn's toASCII() has a built-in lowercase conversion. So the input case does
> not matter, the output is always lowercase ASCII.
>
> Using libidn2, you have to convert to lowercase first yourself (e.g. using
> libunistring). The output is of course lowercase ASCII.
>
> Using libicu, you have to convert to lowercase first yourself (but libicu is
> able to do that). The output is of course lowercase ASCII.
>
>
> What I had in mind (and what I did in Mget) is to 'normalize' every domain
> name before further processing/comparing. 'Normalizing' means trimming,
> percent-decoding, charset transcoding to UTF-8, and toASCII() conversion
> (with or without prior lowercasing, depending on the IDN library used).
>
> Having that, Wget's code just needs strcmp() to compare domains and
> $ wget übel.de Übel.de xn--bel-goa.de
> should reduce to a download of a single file (xn--bel-goa.de/index.html)
> (but maybe it is Wget's policy to explicitly download every URL given on the
> command line, even if it is always the same!?)
>
> There is domain name input from the command line (URLs and a few options like
> -D/--domains), from local files (-i/--input-file) and from remote files.
>
> But Darshit, maybe this should have low priority. It is more a kind of 'code
> polishing'. I am looking forward to starting a Wget version based on a libwget
> in the next 6-12 months. Most of the code is already working in the Mget
> project, but everything needs polishing (e.g. API docs and more of Wget
> functionality, -k/convert-links implemented last week ;-) And then the day
> comes to merge Wget and Mget... if that finds any friends ;-)
>
>>
>> > Tim
>> >
>> > On Monday 07 July 2014 17:08:48 Darshit Shah wrote:
>> >> +  if (psl_str_to_utf8lower (cookie_domain, NULL, NULL, &cookie_domain_lower) == PSL_SUCCESS &&
>> >> +      psl_str_to_utf8lower (host, NULL, NULL, &host_lower) == PSL_SUCCESS)
>> >> +    {
>> >> +      is_acceptable = psl_is_cookie_domain_acceptable (psl, host_lower, cookie_domain_lower);
>> >> +    }
>> >> +  else
>> >> +    {
>> >> +        DEBUGP (("libpsl unable to parse domain name. "
>> >> +                 "Falling back to simple heuristics.\n"));
>> >> +        goto no_psl;
>> >> +    }
>
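
The normalization idea quoted above (normalize once, then compare with plain
strcmp()) can be sketched in a reduced form. This is only an illustration under
stated assumptions: normalize_ascii_domain() is a hypothetical helper, not part
of Wget, Mget or libpsl, and it assumes the name is already in toASCII()
(punycode) form. It covers only trimming and ASCII lowercasing. Percent-decoding,
charset transcoding and the toASCII() step itself would need an IDN library and
are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: normalizes a domain name in place, assuming it is
   already in toASCII() (punycode) form.  Trims surrounding whitespace and
   lowercases ASCII letters.  Returns 0 on success, or -1 if a non-ASCII
   byte is found (such a name would simply be dropped). */
static int normalize_ascii_domain(char *name)
{
    assert(name != NULL);

    /* Trim leading whitespace by shifting the string to the left. */
    char *start = name;
    while (*start != '\0' && isspace((unsigned char) *start))
        start++;
    size_t len = strlen(start);
    memmove(name, start, len + 1);

    /* Trim trailing whitespace. */
    while (len > 0 && isspace((unsigned char) name[len - 1]))
        name[--len] = '\0';

    /* Lowercase, rejecting anything outside the ASCII range. */
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) name[i];
        if (c > 127)
            return -1;
        name[i] = (char) tolower(c);
    }
    return 0;
}
```

Once every name has gone through such a step on input, two spellings like
"  XN--Bel-GOA.de" and "xn--bel-goa.de" compare equal with strcmp(), and no
case-insensitive comparison is needed later on.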



-- 
Thanking You,
Darshit Shah

Attachment: 0001-Fix-potential-memory-leak-and-libpsl-configure.patch
Description: Text Data

