bug-wget

Re: [Bug-wget] Using wget command to download from dynamic urls


From: Pratap Kumar Das
Subject: Re: [Bug-wget] Using wget command to download from dynamic urls
Date: Sun, 26 Jul 2009 23:44:04 +0530

Hi,
You are right: it only works from a specific domain. Our institute has an
IEEE subscription, so I can download the papers from machines within the
institute.
One of my friends suggested the following approach:

   1. Make sure that http_proxy is set to the proper proxy in the shell.
   2. wget "http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5173739&isnumber=5173690" -O /tmp/ieee_paper.html
   3. wget `cat /tmp/ieee_paper.html | tr '\"' '\n' | grep 'http://ieeexplore.ieee.org/stampPDF/getPDF.jsp'` -O ieee_paper.pdf

With this I can now download the ultimate PDF file from IEEE.
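
Steps 2 and 3 above could also be wrapped in a small shell script. The
following is only a sketch: the function names extract_pdf_url and
fetch_ieee_pdf are made up for illustration, and it assumes that the
wrapper page carries the getPDF.jsp link inside a double-quoted attribute
and that http_proxy is already set. The sed step, which turns &amp; back
into &, is an extra precaution that is not part of the original commands.

```shell
#!/bin/sh

# Print the first getPDF.jsp URL found in the HTML file given as $1.
# Splitting on double quotes puts each attribute value on its own line,
# so grep can pick out the line that is the PDF link.
extract_pdf_url() {
    tr '"' '\n' < "$1" \
        | grep 'ieeexplore\.ieee\.org/stampPDF/getPDF\.jsp' \
        | head -n 1 \
        | sed 's/&amp;/\&/g'
}

# Hypothetical helper: fetch the wrapper page at $1, then save the PDF
# it points to as $2.
fetch_ieee_pdf() {
    page=$(mktemp) || return 1
    if ! wget -q "$1" -O "$page"; then
        rm -f "$page"
        return 1
    fi
    url=$(extract_pdf_url "$page")
    rm -f "$page"
    if [ -z "$url" ]; then
        echo "no PDF link found in the wrapper page" >&2
        return 1
    fi
    wget "$url" -O "$2"
}
```

After loading these functions into the shell, a call might look like:

   fetch_ieee_pdf "http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5173739&isnumber=5173690" ieee_paper.pdf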

However, I was not able to download the Mozilla exe from the dynamic link.
I would appreciate it if anybody could suggest how to inspect the source of
the intermediate HTML file and then work out the final command to download
the actual file.
Thanks,
Pratap


On Sun, Jul 26, 2009 at 8:43 PM, Tony Lewis <address@hidden> wrote:

> Pratap Kumar Das wrote:
>
> > The ieee_paper.pdf is expected to be some MBs and a valid pdf
> > file...where as the downloaded one is a normal text file.
>
> When I follow the link you provided to wget, I get a page with an abstract
> of the paper and instructions on how to buy it. Try renaming the file you
> downloaded to ieee_paper.html and opening it in your browser to see what
> the website provided to wget.
>
> You will need to provide wget with the link to the PDF file in order to
> save it. Also, it looks like this site requires authentication so you need
> to figure out how the browser remembers its session state (most often this
> uses cookies) and "log in" to the site saving whatever session state is
> necessary (probably from the URL sessionid=string, cookies, or both) and
> then use that session information to grab the PDF file.
>
> If you have to purchase IEEE papers individually, it's probably simpler to
> just save the files from your browser.
>
> Good luck.
>
> Tony
>
>


-- 
******************************************************
PRATAP KUMAR DAS
Research Scholar
VLSI Circuits and Systems Lab
Dept of Electrical Communication Engineering
I.I.Sc, Bangalore
560012
Mobile: +919449974942
******************************************************

