bug-wget

Re: [Bug-wget] wget2 Feature Suggestion - Triston Line


From: Triston Line
Subject: Re: [Bug-wget] wget2 Feature Suggestion - Triston Line
Date: Fri, 13 Jul 2018 09:52:34 -0700

Hi Tim,

Excellent answer, thank you very much for this info. "-N" or
"--timestamping" sounds like a much better way to go. However, if I'm
converting links with wget (1), I think I've read somewhere (and noticed)
that two separate commands run in series can't continue from each other
because of the links from the previous session/command-instance. More
clearly, I've read that the primary reason continuing from a fault is
impossible is that converting links for a mirror isn't something that can
be resumed; the converted links are only valid for that session. Sounds
silly to me, because from my understanding you're just rewriting <a href>
tags, but there's probably a bit more to it.
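For what it's worth, here is a rough sketch of the two-pass idea I have in
mind: crawl incrementally with -N, and save --convert-links for a final
pass once the crawl completes, since conversion runs after the download
phase. The URL is a placeholder, and the commands are echoed instead of
executed so the sketch stands on its own without network access:

```shell
# Sketch only: https://example.com/ is a placeholder, and commands are
# echoed rather than run so nothing is actually downloaded.
run() { echo "would run: $*"; }

# Pass 1: incremental crawl. Safe to re-run after a fault, since -N
# (timestamping) skips files that are already up to date.
run wget -r -p -N https://example.com/

# Pass 2: once the crawl finishes cleanly, convert links in a final run.
run wget -r -p -N --convert-links https://example.com/
```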

I have used max-threads in the past, and I've tried an xargs suggestion
from one of the Stack Exchange forums, so I do toy with those settings
while testing my friend's servers at UBC. With government servers, on the
other hand, I might get in a bit of trouble if I load them during working
hours (gosh knows I don't want to come in at some ungodly hour, e.g. 3 am,
with the network-services team to toy around with their stuff at different
sites, or perform intranet backups across sites from my local machine).
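For the record, the xargs pattern I tried looks roughly like this
(urls.txt is a made-up input file, and echo is left in front of wget so
the sketch is a dry run that never touches a server):

```shell
# Hypothetical list of URLs, one per line.
printf '%s\n' \
  'https://example.com/a' \
  'https://example.com/b' > urls.txt

# -n 1 passes one URL per invocation; -P 4 keeps up to four wget
# processes in flight at once. Drop the echo to actually download.
xargs -n 1 -P 4 echo wget -N -x < urls.txt
```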

"The server then only sends payload/data if it has a newer version of that
document; otherwise it responds with 304 Not Modified." So it costs about
400 bytes to respond with the last modification date of a file? I'm aware
FTP exposes open timestamps on files, but do most Apache/Nginx servers?
I'll query the gov about their policy, but I somehow doubt it's
complicated (or maybe a security auditor came in and messed it up; that
seems to be the way around here: "it's all default, or it's specialized to
the point of gibberish").
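Just to check my understanding of the exchange, I picture the conditional
request looking roughly like this (path, host, and dates are made up, and
headers are trimmed to the relevant ones):

```
GET /reports/2018-07.csv HTTP/1.1
Host: example.com
If-Modified-Since: Thu, 12 Jul 2018 18:00:00 GMT

HTTP/1.1 304 Not Modified
Date: Fri, 13 Jul 2018 09:52:34 GMT
```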

4 MB of extra download is very little to me; I've filled a few TB on a
home server just looping different loaded thread counts and
remote-form/cloud-application queries (with packages separate and apart
from wget). Actually, I was working on one of our old FTP servers, which
had a backup of our local reports to Environment Canada and NOAA, and the
log file for wget was well over 350 MB of text. I can't remember how big
the provincial and federal legislative "backup" logs were, but holy crap,
that log was longer than the circumference of the earth at 3-point font. I
say "backups" because there's a better way than mirroring the site, but
"no, that would take too much paperwork and this is a better workaround".

Thanks Tim, I will toy with these new options over the weekend; I was
actually wondering about updates to site mapping and site-probing. In the
meantime I have to make phone apps for emergency preparedness within
community health services *eyeroll* :P I really appreciate your work on
this package, by the way; as you can tell, wget has helped me in many
endeavors and has clearly improved how the government and much of society
operate :)

Triston



On Fri, Jul 13, 2018 at 2:34 AM, Tim Rühsen <address@hidden> wrote:

> On 07/12/2018 08:12 PM, Triston Line wrote:
> > If that's possible that would help immensely. I "review" sites for my
> > friends at UBC and we look at geographic performance on their apache and
> > nginx servers, the only problem is they encounter minor errors from time
> to
> > time while recursively downloading (server-side errors nothing to do with
> > wget) so the session ends.
>
> Just forgot: Check out Wget2's --stats-site option. It gives you
> statistical information about all pages downloaded, including parent
> (linked from), status, size, compression, timing, encoding and a few
> more. You can visualize with graphviz or put the data into a database
> for easy analysis.
>
> Example:
> $ wget2 --stats-site=csv:site.csv -r -p https://www.google.com
> $ cat site.csv
> ID,ParentID,URL,Status,Link,Method,Size,SizeDecompressed,TransferTime,ResponseTime,Encoding,Verification
> 1,0,https://www.google.com/robots.txt,200,1,1,1842,6955,33,33,1,0
> 2,0,https://www.google.com,200,1,1,4637,10661,83,83,1,0
> 4,2,https://www.google.com/images/branding/product/ico/googleg_lodp.ico,200,1,1,1494,5430,32,31,1,0
> 5,2,https://www.google.com/images/branding/googlelogo/1x/googlelogo_white_background_color_272x92dp.png,200,1,1,5482,5482,36,36,0,0
> 3,2,https://www.google.com/images/nav_logo229.png,200,1,1,12263,12263,59,58,0,0
>
> Regards, Tim
>
>

