
Re: [Bug-wget] unexpected behaviour of wget on some long links


From: Darshit Shah
Subject: Re: [Bug-wget] unexpected behaviour of wget on some long links
Date: Thu, 13 Jun 2013 21:02:06 +0530

Bykov's suggestion is spot on.

The issue you are facing is that the ampersand (&) is a special character
in the Bash shell: it tells the shell to run the command in the background
and return control of the shell to the user.
The shell is reading the & characters in your URL and sending the command
to the background, which is exactly the expected behaviour. You should quote
your URL with either single ( ' ) or double ( " ) quotes to prevent the
shell from processing that character.
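
Since you mention that you feed the URLs to wget from a C program, another
way to avoid the problem entirely is to bypass the shell and exec wget
directly, so the & is never interpreted at all. The following is only a
minimal sketch under the assumption that you currently build a command line
for system(); the fetch() helper and its fixed argument list are
illustrative, mirroring the command from your mail:

    /* Sketch: run wget on one URL without involving the shell.         */
    /* The URL is passed as a single argv element, so '&' is just data. */
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int fetch(const char *url)
    {
        pid_t pid = fork();
        if (pid == 0) {
            execlp("wget", "wget", "-p", "-np", "-nc", "-nd",
                   "--delete-after", "-t", "1", "-T", "20",
                   "-P", "somefolder", url, (char *)NULL);
            _exit(127);                   /* only reached if exec fails */
        }
        int status;
        waitpid(pid, &status, 0);         /* wget exits non-zero on errors */
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }

Checking the value returned by fetch() also lets your program log and skip
URLs that produced HTTP errors instead of aborting, since wget reports
server errors through its exit status.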

On Thu, Jun 13, 2013 at 8:52 PM, Bykov Aleksey <address@hidden> wrote:

> Greetings, Yiwei Yang
> Sorry for the stupid question, but did you try using quotes to escape the URL?
>
> wget -p -np -nc -nd --delete-after -t 1 -T 20 -P somefolder "<url>"
> or
> wget -p -np -nc -nd --delete-after -t 1 -T 20 -P somefolder '<url>'
> The shell can interpret the ampersand as a command separator...
>
> --
> Best regards, Alex
>
>
>  Hi,
>>     I wrote a C program that reads a list of URLs and feeds them into wget
>> one by one with the following command:
>>
>>    wget -p -np -nc -nd --delete-after -t 1 -T 20 -P somefolder <url>
>>
>> However, with some long links, like:
>> http://www.linkedin.com/nhome/nus-redirect?url=http%3A%2F%2Fwww%2Elinkedin%2Ecom%2Fprofile%2Fview%3Fid%3D86239627%26snapshotID%3D%26authType%3Dname%26authToken%3DRWgi%26ref%3DNUS%26goback%3D%252Enmp_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1%26trk%3DNUS-body-member-name&urlhash=-U2e&trkToken=action%3DviewMember%26pageKey%3Dmember-home%26contextId%3Dbf3a735f-6394-4304-a98b-3ee6fa4b6515%26distanceFromViewer%3D1%26aggregationType%3Dnone%26isPublic%3Dfalse%26verbType%3Dlinkedin%3Aconnect%26activityId%3Dactivity%3A5730101464181772288%26isDigested%3Dfalse%26isFolloweeOfPoster%3Dfalse%26actorType%3Dlinkedin%3Amember%26feedPosition%3D15%26actorId%3Dmember%3A86239627%26objectId%3Dmember%3A129413241%26rowPosition%3D1%26objectType%3Dlinkedin%3Amember
>>
>> wget shows that the fetching finished, but then it just blocks there until
>> I hit Enter, at which point the whole program exits without proceeding to
>> the next link.
>>
>> Another situation is that I might get an HTTP 404 error, for example, from:
>>
>> https://www.google.com/url?url=https://plus.google.com/118428821259931683184/about%3Fhl%3Den%26socfid%3Dweb:lu:result:writeareviewplusurl%26socpid%3D1&rct=j&sa=X&ei=u11sUee0B9Pa2wWe-YGQAQ&ved=0CHAQ4gkwBw&q=usps&usg=AFQjCNFEjQ3SZNRXD6VNDQAjvOS2gXBYbw
>>
>> or from
>> https://maps.google.com/maps?client=ubuntu&channel=fs&oe=utf-8&ie=UTF-8&q=usps&fb=1&gl=us&hq=usps&hnear=0x880cd7968484428f:0xf48dcbad390c6541,Urbana,+IL&ei=2_a1UaDNOsnDqQHD8oHwCg&ved=0CMABELYD
>>
>> Also, with -p wget will fetch from some other links, and sometimes I get
>> HTTP 400 or HTTP 500 errors (this happens more often if I add -H to the
>> command).
>>
>> So my questions are:
>> Are there any restrictions on what kind of links I can use wget on? Also,
>> if I use -p, wget will try to fetch other links that I don't have control
>> over, so is there a way to avoid fetching links that will return HTTP
>> errors, so that my program won't crash?
>>
>> Thank you very much!
>>
>> Lucy
>>
>
>


-- 
Thanking You,
Darshit Shah

