Re: Is there a possibility for wget to handle long(er) paths?
From: D. Wróblewski
Subject: Re: Is there a possibility for wget to handle long(er) paths?
Date: Tue, 12 Nov 2024 18:30:02 +0100
Hi Darshit, thanks for the reply.
Here is the full output of wget --version:
# wget --version
GNU Wget 1.21 built on linux-gnueabihf.
-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls
+ntlm +opie +psl +ssl/gnutls
Wgetrc:
/etc/wgetrc (system)
Locale:
/usr/share/locale
Compile:
gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
-DLOCALEDIR="/usr/share/locale" -I. -I../../src -I../lib
-I../../lib -Wdate-time -D_FORTIFY_SOURCE=2
-I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS -DNDEBUG -g -O2
-ffile-prefix-map=/build/wget-0DbRiJ/wget-1.21=.
-fstack-protector-strong -Wformat -Werror=format-security
-DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall
Link:
gcc -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS -DNDEBUG -g -O2
-ffile-prefix-map=/build/wget-0DbRiJ/wget-1.21=.
-fstack-protector-strong -Wformat -Werror=format-security
-DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall -Wl,-z,relro -Wl,-z,now
-lpcre2-8 -luuid -lidn2 -lnettle -lgnutls -lz -lpsl ftp-opie.o
gnutls.o http-ntlm.o ../lib/libgnu.a
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Originally written by Hrvoje Niksic <hniksic@xemacs.org>.
Please send bug reports and questions to <bug-wget@gnu.org>.
Here is a filtered getconf:
# getconf -a | grep NAME
NAME_MAX 255
_POSIX_NAME_MAX 255
LOGNAME_MAX 256
TTY_NAME_MAX 32
TZNAME_MAX
_POSIX_TZNAME_MAX
CHARCLASS_NAME_MAX 2048
HOST_NAME_MAX 64
LOGIN_NAME_MAX 256
# getconf -a | grep PATH
PATH_MAX 4096
_POSIX_PATH_MAX 4096
PATH /bin:/usr/bin
CS_PATH /bin:/usr/bin
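Since you mentioned that wget derives its limits from pathconf(3), here is a
small standalone C check (my own throwaway helper, not part of wget) that
queries the same limits directly on the download directory, for comparison
with the getconf values above:

/* pclimits.c - print NAME_MAX and PATH_MAX as pathconf(3) reports them
 * for a given directory.  Build: gcc -o pclimits pclimits.c
 * Run:   ./pclimits /llll/ll/lll/LL
 */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

static void report (const char *dir, const char *label, int name)
{
  errno = 0;
  long val = pathconf (dir, name);
  if (val == -1 && errno == 0)
    printf ("%-10s no limit reported\n", label);
  else if (val == -1)
    perror (label);
  else
    printf ("%-10s %ld\n", label, val);
}

int main (int argc, char **argv)
{
  const char *dir = argc > 1 ? argv[1] : ".";
  report (dir, "NAME_MAX", _PC_NAME_MAX);
  report (dir, "PATH_MAX", _PC_PATH_MAX);
  return 0;
}

On a typical ext4 filesystem I would expect this to print 255 and 4096,
matching getconf; if it does, the shortening cannot be coming from the
pathconf values themselves.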
I prepared a test case for the wget output.
Setup:
All the content to be downloaded is on a server, call it
hh.hhhhhhhhhhhhh.hhh (the server name length is preserved in this
anonymisation).
It is all stored in a content root directory rrrr (length and case are
preserved).
Everything is downloaded to a local directory /llll/ll/lll/LL (lengths and
case preserved).
I created a remote directory _zzz under rrrr.
Inside _zzz I created a remote directory
67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234
and 5 files (full paths, starting at the content root):
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/6_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23456.240
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/a_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_.235
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/b_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_2.236
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/c_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23.237
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/d_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_234.238
You probably get the naming idea: the *full paths* (including the '/'
separators) are 240, 235, 236, 237 and 238 chars long, respectively.
Please note that the file names themselves are much shorter: *max 115
characters* long!
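In case anyone wants to reproduce this setup without retyping the names, a
sketch along the following lines can generate a relative path of any
requested total length, with that length recorded in the extension. It only
illustrates the scheme; the directory below is a short placeholder, not the
real 120-character one, and this is not the exact generator behind the
listing above:

/* makename.c - build "dir/012345....NNN" so that the relative path is
 * exactly `target' characters long, the length encoded in the extension. */
#include <stdio.h>
#include <string.h>

static int make_test_path (const char *dir, int target, char *out, size_t outsz)
{
  char ext[16];
  int extlen = snprintf (ext, sizeof ext, ".%d", target);
  int prefix = snprintf (out, outsz, "%s/", dir);
  int namelen = target - prefix - extlen;   /* room left for the base name */

  if (namelen < 1 || (size_t) target + 1 > outsz)
    return -1;                              /* target too short or buffer too small */

  for (int i = 0; i < namelen; i++)         /* filler: repeating digits */
    out[prefix + i] = '0' + (i % 10);
  memcpy (out + prefix + namelen, ext, (size_t) extlen + 1);
  return 0;
}

int main (void)
{
  char path[512];
  if (make_test_path ("_zzz/placeholder_directory", 240, path, sizeof path) == 0)
    printf ("%s  (%zu chars)\n", path, strlen (path));
  return 0;
}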
So here is the wget command and output (I excluded other files/dirs inside
rrrr):
# wget --no-dns-cache -nv -r -c -l inf -P/llll/ll/lll/LL -nH --cut-dirs=1
--restrict-file-names=windows --progress=dot:mega --ftps-implicit
--ftp-user=USER --ftp-password=**** ftps://hh.hhhhhhhhhhhhh.hhh/rrrr/
Resuming SSL session in data connection.
2024-11-12 03:00:02 URL: ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/ [1392] ->
"/llll/ll/lll/LL/.listing" [1]
(...)
2024-11-12 17:16:12 URL: ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/ [296]
-> "/llll/ll/lll/LL/_zzz/.listing" [1]
Resuming SSL session in data connection.
2024-11-12 17:16:13 URL:
ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/
[970] ->
"/llll/ll/lll/LL/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/.listing"
[1]
The name is too long, 240 chars total.
Trying to shorten...
New name is
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/6_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23456.
The name is too long, 240 chars total.
Trying to shorten...
New name is
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/6_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23456.
Resuming SSL session in data connection.
2024-11-12 17:16:13 URL:
ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/6_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23456.240
[33] ->
"/llll/ll/lll/LL/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/6_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23456"
[1]
Resuming SSL session in data connection.
2024-11-12 17:16:14 URL:
ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/a_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_.235
[33] ->
"/llll/ll/lll/LL/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/a_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_.235"
[1]
Resuming SSL session in data connection.
2024-11-12 17:16:14 URL:
ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/b_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_2.236
[33] ->
"/llll/ll/lll/LL/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/b_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_2.236"
[1]
The name is too long, 237 chars total.
Trying to shorten...
New name is
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/c_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23.23.
The name is too long, 237 chars total.
Trying to shorten...
New name is
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/c_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23.23.
Resuming SSL session in data connection.
2024-11-12 17:16:15 URL:
ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/c_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23.237
[33] ->
"/llll/ll/lll/LL/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/c_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23.23"
[1]
The name is too long, 238 chars total.
Trying to shorten...
New name is
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/d_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_234.2.
The name is too long, 238 chars total.
Trying to shorten...
New name is
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/d_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_234.2.
Resuming SSL session in data connection.
2024-11-12 17:16:15 URL:
ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/d_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_234.238
[33] ->
"/llll/ll/lll/LL/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/d_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_234.2"
[1]
FINISHED --2024-11-12 17:16:15--
Total wall clock time: 11m 19s
Downloaded: 1420 files, 54G in 7.3s (7.44 GB/s)
As you can see, the *.235 and *.236 files are downloaded without their
names being shortened; the shortening starts at a total length of 237
characters, which is a rather strange threshold.
I also ran some tests with lftp, and it transfers all the files in both
directions without any name shortening. In fact, I checked much longer
paths, over 400 characters.
So the problem does not seem to be system-related; it appears to be a
"feature" of wget.
I have described the case here, but did not get a satisfying answer (see
the comments as well):
https://superuser.com/questions/1857575/is-there-a-possibility-for-wget-to-handle-longer-paths
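For completeness, here is a minimal standalone version of the "can the local
filesystem even hold such paths" check (the names and lengths are arbitrary;
run it from inside the download directory, e.g. /llll/ll/lll/LL):

/* longpath.c - create a directory and a file whose relative path is over
 * 500 characters, far beyond the point where wget starts shortening. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main (void)
{
  char dir[256], file[520];

  memset (dir, 'd', 255);          /* a 255-char directory name (NAME_MAX) */
  dir[255] = '\0';
  if (mkdir (dir, 0755) != 0 && errno != EEXIST)
    { perror ("mkdir"); return 1; }

  snprintf (file, sizeof file, "%s/", dir);
  memset (file + 256, 'f', 255);   /* a 255-char file name inside it */
  file[511] = '\0';

  int fd = open (file, O_CREAT | O_WRONLY, 0644);
  if (fd < 0)
    { perror ("open"); return 1; }
  close (fd);

  printf ("created a %zu-character relative path without trouble\n",
          strlen (file));
  return 0;
}

This mirrors the mkdir/cp check mentioned in my original message below.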
Best,
Glorifyday
On Mon, Nov 11, 2024 at 4:58 PM Darshit Shah <darnir@gnu.org> wrote:
> Hi,
>
> Sorry for the delayed response.
>
> Would it be possible for you to please share the full output of Wget?
> Also, the output of `wget --version`.
>
> GNU Wget does not usually impose arbitrary limits on the file name / path
> length. The limits that are imposed come from querying pathconf(3).
> In case pathconf is not available, a conservative limit on the path length
> is enforced, which seems to be the case on your system. Given that
> pathconf(3) is part of POSIX.1-2008, I'm at a loss trying to explain why
> it wasn't found on your system.
>
> I will try to see if I can replicate your environment. But until then,
> there isn't much we can do about this. I would not add a runtime parameter
> for this. If no solution can be found, I am willing to add a compile-time
> knob to modify the max path lengths. But that will still require you to
> rebuild Wget for your target platform manually.
>
> On Wed, Oct 2, 2024, at 01:59, D. Wróblewski wrote:
> > I am using wget for a local backup of a remote ftp/s directory using the
> > following syntax:
> >
> > wget --no-dns-cache -nv -r -c -l inf -P/local_dir -nH --cut-dirs=1
> > --restrict-file-names=windows --progress=dot:mega --ftps-implicit (...)
> >
> > This is wget 1.21 on Raspbian GNU/Linux 11 (bullseye)
> > # wget --version
> > GNU Wget 1.21 built on linux-gnueabihf.
> > I am restricting file names to windows because the local directory is
> > shared via smb to a Windows PC.
> >
> > The problem I encountered is that some files inside the remote directory
> > tree have long paths.
> >
> > In such a case, wget prints:
> > The name is too long, 240 chars total.
> > Trying to shorten...
> > and it downloads the file with its name shortened.
> >
> > If a local filesystem has no support for long paths, this behaviour is a
> > necessary workaround and is appreciated.
> >
> > However, in my case, the local filesystem does support long paths. I can
> > create paths much longer than 240 characters with mkdir, cp, mc, and so
> > on. So the name length restriction introduced by wget is unnecessary and
> > causes problems: some files end up without extensions, and some names are
> > cut at the dot ('.') character and then do not show up properly via smb.
> >
> > Is there a parameter enabling wget to use longer path names?
> > If not, is it possible to introduce it in the next wget version?
> >
> > Best regards,
> > Glorifyday
> >
> > --
>
>
--
(P.S.
and greetings to the sad gentlemen from the services of Poland and of the
whole world who are unlawfully reading this message)