From: anonymous
Subject: [Bug-wget] [bug #50935] TEXTHTML not properly set if page is already downloaded
Date: Wed, 3 May 2017 20:08:06 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0

URL:
  <http://savannah.gnu.org/bugs/?50935>

                 Summary: TEXTHTML not properly set if page is already downloaded
                 Project: GNU Wget
            Submitted by: None
            Submitted on: Thu 04 May 2017 12:08:05 AM UTC
                Category: Program Logic
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
                 Release: trunk
        Operating System: GNU/Linux
         Reproducibility: Every Time
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None

    _______________________________________________________

Details:

Running (for example):

wget -xH -nc 'https://news.ycombinator.com/item?id=14245538'
wget -pH -nc 'https://news.ycombinator.com/item?id=14245538'

results in wget not scanning the resulting HTML file for links when the page
has already been downloaded.  wget saves the file without an HTML suffix, and
for a file that already exists it looks only at the file extension to decide
whether the file is HTML (this check is even marked "#### Bogusness alert."
in the source).  This could possibly be fixed by checking whether the file
starts with a "<!DOCTYPE html" declaration or an "<html>" tag.
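
As a rough illustration of the content check suggested above, here is a
minimal sketch in Python.  It is not wget's code (wget is written in C), and
the names looks_like_html, file_looks_like_html and sniff_bytes are
hypothetical; it also assumes that skipping a UTF-8 BOM and leading
whitespace and matching case-insensitively is acceptable.

```python
import re

# Hypothetical heuristic, not wget's actual code: treat the data as HTML if,
# after an optional UTF-8 BOM and leading whitespace, it starts with a
# doctype declaration or an <html> tag (matched case-insensitively).
_HTML_START = re.compile(rb"\s*(?:<!doctype\s+html|<html[\s>])", re.IGNORECASE)
_UTF8_BOM = b"\xef\xbb\xbf"


def looks_like_html(head: bytes) -> bool:
    """Return True if the first bytes of a file look like an HTML document."""
    if head.startswith(_UTF8_BOM):
        head = head[len(_UTF8_BOM):]
    return _HTML_START.match(head) is not None


def file_looks_like_html(path: str, sniff_bytes: int = 512) -> bool:
    """Read only the first sniff_bytes of path and apply the heuristic."""
    with open(path, "rb") as f:
        return looks_like_html(f.read(sniff_bytes))
```

A check along these lines would only be a fallback for files without a
recognizable suffix; pages that begin with comments or other markup before
the doctype would still be missed.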





_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



