[bug #66468] wget --no-clobber sometimes overwrites existing files
From: anonymous
Subject: [bug #66468] wget --no-clobber sometimes overwrites existing files
Date: Wed, 20 Nov 2024 07:04:42 -0500 (EST)
URL:
<https://savannah.gnu.org/bugs/?66468>
Summary: wget --no-clobber sometimes overwrites existing
files
Group: GNU Wget
Submitter: None
Submitted: Wed 20 Nov 2024 12:04:38 PM UTC
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name: Raven
Originator Email: ravenmobile13@gmail.com
Open/Closed: Open
Discussion Lock: Any
Release: 1.20
Operating System: GNU/Linux
Reproducibility: Intermittent
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None
_______________________________________________________
Follow-up Comments:
-------------------------------------------------------
Date: Wed 20 Nov 2024 12:04:38 PM UTC By: Anonymous
I have a Bash script that runs wget v1.21.3 on Debian 12. The script passes
wget a file containing URLs to download, using --directory-prefix= to set the
output folder and --no-clobber to skip already downloaded files. This script
worked beautifully: I downloaded approximately 20,000 files without issue, and
rerunning the script afterwards skipped all 20,000 existing files. Great.
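The exact command is not included in this report. As a rough sketch of the
kind of invocation described above, assuming a hypothetical wrapper function
download_all that takes the output directory and the URL list file as
arguments, it might look like this:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; the function name and its arguments are illustrative
# and do not come from the original script.
download_all() {
    local out_dir=$1 url_list=$2
    # --no-clobber:       skip URLs whose local file already exists
    # --directory-prefix: save downloaded files under out_dir
    # --input-file:       read the URLs to fetch from url_list
    wget --no-clobber --directory-prefix="$out_dir" --input-file="$url_list"
}
```

It would be called as, for example, download_all 2.Download_Pages_Data
urls.txt, where urls.txt is a placeholder name for the URL list.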
A month later I rewrote parts of the script and slightly changed the wget
command to use a different output folder with --directory-prefix. Suddenly
the script started ignoring --no-clobber for around 5% of the 20,000 files,
redownloading them every time I ran it, fully reproducibly. I ran it dozens
of times, and each time it redownloaded the same group of URLs one after the
other (a sequential run of URLs in the middle of the list was failing, while
the other thousands of URLs were fine).
I then picked one URL from this group of URLs that were always failing, and
tried to isolate the cause of the issue. I tried changing --directory-prefix
to /tmp, and it worked fine, skipping the existing file. Then I changed it
back to "2.Download_Pages_Data" and it failed again, overwriting the existing
file every time (tried dozens of times).
I enabled debug output with -d for the working and failing output directories,
but it provided no new information. When downloading to /tmp it says the file
exists, whereas when downloading to "2.Download_Pages_Data" it acts as if the
output file does not exist, and overwrites it every time.
I thought it might be a bug with the --directory-prefix parameter, so I opted
to try "cd 2.Download_Pages_Data && wget ..." instead, but that also failed in
exactly the same manner. It worked fine doing "cd /tmp && wget ...".
I then wondered if wget might be having issues with there being a period "."
in the output directory name "2.Download_Pages_Data", so I started trying
other output directory names like "2_Download_Pages_Data" (failed), then
"2_Download_Data" (failed), then "2_Data" (WORKED!), then I went back to
"2.Download_Pages_Data" (ALSO WORKED!). Now each time I run the wget command
it works, skipping the existing output file.
I'm not sure how it's even possible to have wget break intermittently like
this. Changing literally nothing but the output directory determines whether
wget --no-clobber overwrites files or not.
It's clearly a bug, because even if --no-clobber were not used, wget should
produce output_file.html.1, then output_file.html.2, etc., rather than
overwriting existing files.
In case it's relevant, the URLs I was downloading all ended in .html, similar
to "Some_Random_Page_Name.html".
How does one even go about trying to debug this? Debug output with -d showed
nothing relevant whether it was working or failing. It goes from working to
not working when changing nothing but the output directory.
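
One possible starting point, offered only as a sketch and not as a diagnosis,
is to record the state of the expected output file before and after a wget
run, so that an overwrite can be confirmed independently of wget's own
messages. The helper names expected_path and snapshot below are hypothetical.
The sketch assumes GNU stat (as on Debian) and assumes the local file name is
simply the last path component of the URL, which ignores any escaping or
renaming wget may apply:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking whether a file was really rewritten.

# Approximate the path wget would write to: directory prefix plus the last
# path component of the URL, with any query string dropped. This is an
# assumption and does not reproduce wget's full file-naming rules.
expected_path() {
    local prefix=$1 url=$2
    local name=${url##*/}   # strip everything up to the last slash
    name=${name%%\?*}       # drop a query string, if present
    printf '%s/%s\n' "$prefix" "$name"
}

# Print inode, size, and status-change time (ctime) of a file, or "missing".
# ctime is used rather than mtime because wget normally sets the
# modification time from the server's timestamp, so mtime alone may not
# change when the file is rewritten.
snapshot() {
    stat -c '%i %s %z' -- "$1" 2>/dev/null || echo "missing"
}
```

Taking a snapshot of the expected path before and after a single-URL wget run
and comparing the two lines shows whether the file was left alone (identical
lines), rewritten (different lines), or never found under that name
("missing" beforehand).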
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?66468>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/