From: Tim Ruehsen
Subject: [Bug-wget] [bug #50935] TEXTHTML not properly set if page is already downloaded
Date: Fri, 12 May 2017 04:02:25 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0

Update of bug #50935 (project wget):

                  Status:               Need Info => Confirmed              

    _______________________________________________________

Follow-up Comment #3:

Sorry, my stupidity :-)
I only ran the first command, and since everything looked fine I didn't really
check the second one :-(

You are right: if the file already exists, the -p -nc combination says
'File ... already there; not retrieving.' and does nothing else.

Instead, it should read and parse that file (after checking that it really is
an HTML or CSS file). Wget currently has no heuristic for this, so it should
make a HEAD request to check the Content-Type. What Wget actually does is look
at the file name extension.

So you can work around it by using -E on the first download, which saves HTML
pages with an .html extension:


wget -xHE -nc 'https://news.ycombinator.com/item?id=14245538'
wget -pH -nc 'https://news.ycombinator.com/item?id=14245538'


I will add this issue as a reference in Wget2 development, where we will do it
correctly (using a HEAD request).
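
To illustrate the difference between the two checks discussed above, here is a
minimal shell sketch. It is not Wget's code: the function names are
hypothetical, the suffix list is an assumption (the exact list Wget uses is not
given in this thread), and head_content_type relies on `wget --spider -S`,
which as far as I know checks the URL without saving it and prints the server
headers on stderr.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Wget.

# Extension-based guess, roughly the behaviour described above.
# The suffix list is an assumption made for illustration.
looks_parseable_by_name() {
  case "$1" in
    *.html|*.htm|*.css) return 0 ;;
    *) return 1 ;;
  esac
}

# Content-Type based check. Accepts a raw header value such as
# "text/html; charset=utf-8" and compares it case-insensitively.
is_parseable_type() {
  ctype=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$ctype" in
    text/html*|application/xhtml+xml*|text/css*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask the server for the Content-Type (needs network access).
# tail -n 1 keeps the header of the final response after redirects.
head_content_type() {
  wget --spider -S "$1" 2>&1 |
    sed -n 's/^ *[Cc]ontent-[Tt]ype: *//p' | tail -n 1
}
```

With these helpers, a file named 'item?id=14245538' fails the name-based check
while 'item?id=14245538.html' passes it, which is why -E helps. A caller could
instead run `is_parseable_type "$(head_content_type "$url")"` to decide
independently of the local file name.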

Thanks for your report!


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?50935>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



