Re: [Bug-wget] wget


From: Ángel González
Subject: Re: [Bug-wget] wget
Date: Sat, 28 Apr 2012 16:01:40 +0200
User-agent: Thunderbird

On 27/04/12 06:25, Howard Bryden wrote:
> Folks,
>
> I'm using wget 1.13.4 to attempt to recursively download a Sharepoint site.  
> The commandline is just the wget command verb; the contents of ~/.wgetrc are:
>
>
>
> Initially all appeared to work as expected yet it turns out I'm receiving 
> only a subset of the filespace, namely
>
> a) only the first 100 directories are visited, and
> b) only the first 100 files from each directory are actually downloaded.
>
> This pretty much corresponds to the Internet Explorer view, which presents 
> the site in pages of 100 items (directories and files within directories).

How are the next pages accessed?
Can you view those "next pages" if you disable JavaScript in your
browser? (wget doesn't parse JavaScript.)

I think the problem lies in the way those next pages are linked, so the
HTML of one such page would be more helpful than the full list of files.
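
As a rough way to check this on a saved copy of one listing page, the
sketch below splits the page's links into plain URLs (which a crawler
that ignores JavaScript can follow) and links that only work through
script. The sample markup and the NextPage() name are hypothetical; the
real SharePoint page may look quite different, and this only
approximates what wget extracts.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, roughly what a non-JS crawler sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def classify_links(html):
    """Split links into ones a crawler can follow and ones that need JavaScript."""
    collector = LinkCollector()
    collector.feed(html)
    followable, script_only = [], []
    for href in collector.hrefs:
        stripped = href.strip()
        # javascript: URLs and empty/fragment-only hrefs lead nowhere without a JS engine.
        if stripped.lower().startswith("javascript:") or stripped in ("", "#"):
            script_only.append(href)
        else:
            followable.append(href)
    return followable, script_only


# Hypothetical listing fragment; the real SharePoint markup may differ.
SAMPLE = """
<a href="/docs/file001.pdf">file001.pdf</a>
<a href="javascript:NextPage()">Next</a>
<a href="#" onclick="NextPage()">Next</a>
"""

if __name__ == "__main__":
    followable, script_only = classify_links(SAMPLE)
    print("followable:", followable)
    print("needs JavaScript:", script_only)
```

If the "next" link ends up in the second list, that would be consistent
with wget stopping after the first 100 items.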

Also, if you can view the full site as a mounted share on your computer,
do you really need to crawl it with wget?
You could make a similar mount on the Unix server (e.g. if the site is
available through SMB), or simply zip everything locally and transfer
the archive to the HP server.
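
For the "zip everything locally" route, a minimal sketch, assuming the
site is already visible as an ordinary directory on the local machine.
The zip_tree name and both paths are placeholders, not anything taken
from the original setup.

```python
import shutil
from pathlib import Path


def zip_tree(source_dir, archive_base):
    """Pack source_dir into archive_base + '.zip' and return the archive path."""
    source = Path(source_dir)
    if not source.is_dir():
        raise NotADirectoryError(str(source_dir))
    # Entries are stored relative to source_dir, so the archive unpacks cleanly.
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(source))
```

Keep the archive outside the directory being packed, so it does not end
up trying to include itself. The resulting file can then be copied to
the server with whatever transfer tool is already in use.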




