
[Bug-wget] [bug #54596] wget gets a lot of file named "index.html?......


From: anonymous
Subject: [Bug-wget] [bug #54596] wget gets a lot of file named "index.html?............" and other strange file names
Date: Thu, 30 Aug 2018 05:12:34 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36

URL:
  <http://savannah.gnu.org/bugs/?54596>

                 Summary: wget gets a lot of file named
"index.html?............" and other strange file names
                 Project: GNU Wget
            Submitted by: None
            Submitted on: Thu 30 Aug 2018 09:12:33 AM UTC
                Category: Program Logic
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: Gabriel Popescu
        Originator Email: address@hidden
             Open/Closed: Open
         Discussion Lock: Any
                 Release: 1.12
        Operating System: GNU/Linux
         Reproducibility: Every Time
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None

    _______________________________________________________

Details:

When I use wget recursively (wget -r -l 20 ...) to fetch the whole content of a
web site, many files named "index.html?....." appear in the root directory that
wget creates to save the site content.
To reproduce this behaviour, run
wget -r -l 20 notizie.lottoland.it
and look inside the notizie.lottoland.it directory created under the directory
where you ran wget: you will find a single index.html file and many
"index.html?....." files, where the dots stand for query-string parameters such
as "p=203".
There are also many files named "wp-login.php?....." and other files with
similarly strange names in the underlying directories.

For example:

# ls
amp
author
category
come-vincere-a-eurojackpot-leggi-i-nostri-4-suggerimenti
comments
feed
index.html
index.html?p=309
index.html?p=322
index.html?p=330
index.html?p=334
index.html?p=354
index.html?p=433
index.html?p=436
index.html?tm=1535446062
index.html?tm=1535619035
i-numeri-fortunati-alla-lotteria
le-probabilita-di-vincere-alla-lotteria
pago-delle-tasse-sulle-vincite-alle-lotterie
quale-lotteria-conviene-giocare
quale-lotteria-ha-piu-probabilita-di-vincere
quando-e-la-prossima-estrazione
robots.txt
wp-admin
wp-content
wp-includes
wp-json
wp-login.php
wp-login.php?action=lostpassword
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fcome-vincere-a-eurojackpot-leggi-i-nostri-4-suggerimenti%2F
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fi-numeri-fortunati-alla-lotteria%2F
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fle-probabilita-di-vincere-alla-lotteria%2F
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fpago-delle-tasse-sulle-vincite-alle-lotterie%2F
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fquale-lotteria-conviene-giocare%2F
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fquale-lotteria-ha-piu-probabilita-di-vincere%2F
wp-login.php?redirect_to=https:%2F%2Fnotizie.lottoland.it%2Fquando-e-la-prossima-estrazione%2F
xmlrpc.php
xmlrpc.php?rsd
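
The names in the listing above are consistent with a simple rule: each distinct URL is saved under its last path component (or index.html when the URL names a directory), with the query string appended after a "?". The following is a minimal Python sketch of that rule, not wget's actual code. The function name local_name and the example URLs are hypothetical; the URLs are inferred from the file names in the listing, and their scheme is a guess.

```python
from urllib.parse import urlsplit


def local_name(url: str, default_page: str = "index.html") -> str:
    """Approximate the local file name for a URL (simplified model, not wget's code)."""
    parts = urlsplit(url)
    # Last path component; empty when the URL names a directory or the site root.
    name = parts.path.rsplit("/", 1)[-1] or default_page
    if parts.query:
        # The query string becomes part of the name. "/" cannot appear in a
        # file name, so this sketch percent-encodes it.
        name += "?" + parts.query.replace("/", "%2F")
    return name


# Hypothetical URLs, inferred from the file names in the listing.
for url in (
    "http://notizie.lottoland.it/",
    "http://notizie.lottoland.it/?p=309",
    "http://notizie.lottoland.it/wp-login.php?action=lostpassword",
    "http://notizie.lottoland.it/xmlrpc.php?rsd",
):
    print(local_name(url))
```

Under this model, every link that differs only in its query string (?p=309, ?p=322, ...) is treated as a separate document and therefore gets its own file, which would explain the many index.html?... entries.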

# wget --version
GNU Wget 1.12 built on linux-gnu.

Linux version 2.6.32-504.el6.x86_64 (address@hidden) (gcc
version 4.4.7 20120313 (Red Hat 4.4.7-11)




    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?54596>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



