Re: [Bug-wget] Shouldn't wget strip leading spaces from a URL?


From: Tim Rühsen
Subject: Re: [Bug-wget] Shouldn't wget strip leading spaces from a URL?
Date: Tue, 6 Jun 2017 09:37:57 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 06/06/2017 05:31 AM, L A Walsh wrote:
> 
> 
> Dale R. Worley wrote:
>> L A Walsh <address@hidden> writes:
>>> if wget gets leading spaces in a URL, it complains:
>>>   "  http://www.kernel.org/pub/linux/utils/util-linux/v2.30: Scheme
>>> missing."
>>>
>>> Isn't it required for a web client to strip leading spaces from
>>> URLs?
>>
>> Strictly speaking, no,
> ---
> You might want to read this web requirements doc:
> 
> https://www.w3.org/TR/2014/REC-html5-20141028/infrastructure.html#strip-leading-and-trailing-whitespace

That is plain HTML5 parsing; it has nothing to do with how to handle URLs
from the user interface (CLI, GUI).
Anyway, Wget 1.x is based on RFCs, not on recommendations from w3.org.

Wget2 skips leading spaces from URLs given on the CLI.
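
To make the difference concrete, here is a minimal Python sketch (not Wget or Wget2 code). The helper names `has_scheme` and `normalize_cli_url` are made up for illustration, and the set of stripped characters is an assumption taken from the HTML5 "space characters" definition. It only shows why a leading space hides the scheme from a strict RFC 3986 style check, and how skipping leading whitespace first avoids that.

```python
import re

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":"
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")

# Assumption: strip the HTML5 "space characters" (space, tab, LF, FF, CR).
_SPACE_CHARS = " \t\n\f\r"


def has_scheme(url: str) -> bool:
    """Return True if the string starts with a syntactically valid scheme."""
    return _SCHEME_RE.match(url) is not None


def normalize_cli_url(arg: str) -> str:
    """Hypothetical helper: drop leading whitespace from a CLI URL argument."""
    return arg.lstrip(_SPACE_CHARS)


raw = "  http://www.kernel.org/pub/linux/utils/util-linux/v2.30"
print(has_scheme(raw))                      # False: the string starts with a space
print(has_scheme(normalize_cli_url(raw)))   # True: the scheme is now at offset 0
```

Stripping only on the CLI path keeps the strict parser untouched for URLs that come from other sources.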

> Especially this sentence:
> 
>    When a user agent is to strip leading and trailing whitespace from
>    a string, the user agent must remove all space characters that are
>    at the start or end of the string.
> 
> As part of URL parsing, a user-agent(like wget) is required to strip
> leading and trailing whitespace.

Strictly: when it comes to HTML5 parsing.

But anyway, there would be no objection to a patch :-)
It is a good beginner's task.
If you don't want to or can't work on that, consider opening an issue (or
do both).

Please also consider removing white space from all command line input, e.g.
        wget " --quiet"
outputs
--2017-06-06 09:33:42--  http://%20--quiet/
Resolving  --quiet ( --quiet)... failed: Name or service not known.
wget: unable to resolve host address ‘ --quiet’

And don't forget other commands as well, e.g.
        rm " -rf" xxx
says
rm: cannot remove ' -rf': No such file or directory
rm: cannot remove 'xxx': Is a directory

Maybe file a bug against bash !? ;-)
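
For illustration, the two commands above can be tokenized the way a POSIX shell would split them, using Python's `shlex` module. This is only a sketch of the word-splitting step; `looks_like_option` is a hypothetical stand-in for an option parser's first check, not code from wget or rm.

```python
import shlex


def looks_like_option(arg: str) -> bool:
    """Hypothetical check: option parsers typically test for a leading '-'."""
    return arg.startswith("-")


for cmd in ('wget " --quiet"', 'rm " -rf" xxx'):
    argv = shlex.split(cmd)  # POSIX-style quoting rules
    print(argv)
    for arg in argv[1:]:
        # The quoted argument keeps its leading space, so it is not an option.
        print(repr(arg), looks_like_option(arg), looks_like_option(arg.lstrip()))
```

The quotes make the leading space part of the argument itself, so the program receives it as an ordinary operand (a host name or a file name) and not as an option.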


With Best Regards, Tim
