[Bug-wget] WARC File Creation - Scope Issues


From: McFate, Mark
Subject: [Bug-wget] WARC File Creation - Scope Issues
Date: Thu, 11 Apr 2013 15:13:57 +0000

This is not a 'bug' by any means, but I could find no better place to post
this, so please forgive me...

I've used 'wget' for years but am only now discovering the real power it has.
I recently upgraded to v1.14 so that I can take advantage of WARC file
creation, but I need to learn a lot more.  In particular, I'm having trouble
controlling the scope of the content returned by wget when using the
--warc-file option (or even when not).  The --mirror option is nice, but in
many circumstances it returns far too much, and limiting the retrieval with
the -l option requires trial and error, as I am never sure how deep to set it.

For example, I would like to retrieve the following set of pages as a WARC, but
don't really want anything else from this domain:

https://webarchive.jira.com/wiki/display/wayback/Wayback+Installation+and+Configuration+Guide#WaybackInstallationandConfigurationGuide-URLsandWebApplications

Is it even possible using wget to capture a complete WARC containing only
this document?
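
To make the question concrete, here is a rough sketch of the kind of invocation I have in mind. It assumes that leaving out recursion entirely and adding --page-requisites is a reasonable way to limit the capture to one page plus the images and stylesheets it needs; I have not confirmed that this yields a complete WARC. The output name "wayback-guide" and the helper function warc_cmd are placeholders of my own, and the URL fragment is dropped since it is never sent to the server. The script only prints the command so the options can be reviewed first.

```shell
#!/bin/sh
# Placeholder output name (my own choice); wget appends .warc.gz to it.
WARC_NAME='wayback-guide'
URL='https://webarchive.jira.com/wiki/display/wayback/Wayback+Installation+and+Configuration+Guide'

# Hypothetical helper: print the wget command instead of running it.
#   --page-requisites  fetch the images, CSS, etc. needed to display the page
#   --warc-file=NAME   record every request and response in NAME.warc.gz
# No -r or --mirror is given, so no links are followed beyond the requisites.
warc_cmd() {
    printf 'wget --page-requisites --warc-file=%s %s\n' "$1" "$2"
}

warc_cmd "$WARC_NAME" "$URL"
```

Piping the printed line to sh would actually run the download. Whether the requisites that live on other hosts get captured without also adding --span-hosts is part of what I am unsure about.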

So, I'm looking for guidance that might be pertinent to using wget for WARC 
retrieval.  Please point me to anything you think might be helpful.  Thanks.

Mark A. McFate
Digital Library Applications Developer
Burling Library, Grinnell College
Grinnell, IA  50112-1690