bug-wget

Re: Is there a possibility for wget to handle long(er) paths?


From: D . Wróblewski
Subject: Re: Is there a possibility for wget to handle long(er) paths?
Date: Tue, 12 Nov 2024 18:30:02 +0100

Hi Darshit, thanks for the reply.

Here is the full output of wget --version:

# wget --version
GNU Wget 1.21 built on linux-gnueabihf.

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls
+ntlm +opie +psl +ssl/gnutls

Wgetrc:
    /etc/wgetrc (system)
Locale:
    /usr/share/locale
Compile:
    gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
    -DLOCALEDIR="/usr/share/locale" -I. -I../../src -I../lib
    -I../../lib -Wdate-time -D_FORTIFY_SOURCE=2
    -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS -DNDEBUG -g -O2
    -ffile-prefix-map=/build/wget-0DbRiJ/wget-1.21=.
    -fstack-protector-strong -Wformat -Werror=format-security
    -DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall
Link:
    gcc -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS -DNDEBUG -g -O2
    -ffile-prefix-map=/build/wget-0DbRiJ/wget-1.21=.
    -fstack-protector-strong -Wformat -Werror=format-security
    -DNO_SSLv2 -D_FILE_OFFSET_BITS=64 -g -Wall -Wl,-z,relro -Wl,-z,now
    -lpcre2-8 -luuid -lidn2 -lnettle -lgnutls -lz -lpsl ftp-opie.o
    gnutls.o http-ntlm.o ../lib/libgnu.a

Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Originally written by Hrvoje Niksic <hniksic@xemacs.org>.
Please send bug reports and questions to <bug-wget@gnu.org>.


Here is the filtered getconf output:

# getconf -a | grep NAME
NAME_MAX                           255
_POSIX_NAME_MAX                    255
LOGNAME_MAX                        256
TTY_NAME_MAX                       32
TZNAME_MAX
_POSIX_TZNAME_MAX
CHARCLASS_NAME_MAX                 2048
HOST_NAME_MAX                      64
LOGIN_NAME_MAX                     256

# getconf -a | grep PATH
PATH_MAX                           4096
_POSIX_PATH_MAX                    4096
PATH                               /bin:/usr/bin
CS_PATH                            /bin:/usr/bin


I prepared a test case for the wget output.

Setup:
All the content to be downloaded is on a server, call it
hh.hhhhhhhhhhhhh.hhh (this anonymisation preserves the server name length).
It is all stored in a content root directory rrrr (length and case are
preserved).
Everything is downloaded to a local directory /llll/ll/lll/LL (lengths and
case preserved).

I created a remote directory _zzz under rrrr.
Inside _zzz I created a remote directory
67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234
and 5 files (full paths relative to the content root):
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/6_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23456.240
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/a_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_.235
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/b_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_2.236
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/c_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23.237
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/d_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_234.238

As the naming suggests, the *full paths* (including '/') are 240,
235, 236, 237 and 238 chars long, respectively.
Please note that the file names themselves are much shorter: *max 115
characters* long!
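
The lengths quoted above can be re-derived mechanically. The sketch below rebuilds the directory and file names from their repeating pattern and measures them; the helper names (block, stem) are only illustrative:

```python
def block(digits, numbers):
    """Build a run like '_234567_20_234567_30...' for the given position markers."""
    return "".join(f"_{digits}_{n}" for n in numbers)


# Directory name: position markers 10..90 use 6-digit filler, 100..120 use 5-digit filler.
DIR = ("67_10" + block("234567", range(20, 100, 10))
       + block("23456", range(100, 130, 10)) + "_234")


def stem(first):
    """Common part of every file name, up to and including the '_230' marker."""
    return first + "_130" + block("23456", range(140, 240, 10))


# First character of each file name mapped to its distinguishing tail.
TAILS = {"6": "_23456.240", "a": "_.235", "b": "_2.236",
         "c": "_23.237", "d": "_234.238"}
FILES = [stem(first) + tail for first, tail in TAILS.items()]
PATHS = [f"_zzz/{DIR}/{name}" for name in FILES]

if __name__ == "__main__":
    for path, name in zip(PATHS, FILES):
        print(f"path length {len(path)}, file name length {len(name)}")
```

Each full path is measured from _zzz, i.e. relative to the content root, which is also the string wget prints in its "New name is" messages.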

So here is the wget command and output (I excluded other files/dirs inside
rrrr):


# wget --no-dns-cache -nv -r -c -l inf -P/llll/ll/lll/LL -nH --cut-dirs=1
--restrict-file-names=windows --progress=dot:mega --ftps-implicit
--ftp-user=USER --ftp-password=**** ftps://hh.hhhhhhhhhhhhh.hhh/rrrr/
Resuming SSL session in data connection.
2024-11-12 03:00:02 URL: ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/ [1392] ->
"/llll/ll/lll/LL/.listing" [1]
(...)
2024-11-12 17:16:12 URL: ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/ [296]
-> "/llll/ll/lll/LL/_zzz/.listing" [1]
Resuming SSL session in data connection.
2024-11-12 17:16:13 URL:
ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/
[970] ->
"/llll/ll/lll/LL/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/.listing"
[1]
The name is too long, 240 chars total.
Trying to shorten...
New name is
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/6_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23456.
The name is too long, 240 chars total.
Trying to shorten...
New name is
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/6_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23456.
Resuming SSL session in data connection.
2024-11-12 17:16:13 URL:
ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/6_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23456.240
[33] ->
"/llll/ll/lll/LL/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/6_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23456"
[1]
Resuming SSL session in data connection.
2024-11-12 17:16:14 URL:
ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/a_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_.235
[33] ->
"/llll/ll/lll/LL/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/a_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_.235"
[1]
Resuming SSL session in data connection.
2024-11-12 17:16:14 URL:
ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/b_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_2.236
[33] ->
"/llll/ll/lll/LL/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/b_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_2.236"
[1]
The name is too long, 237 chars total.
Trying to shorten...
New name is
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/c_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23.23.
The name is too long, 237 chars total.
Trying to shorten...
New name is
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/c_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23.23.
Resuming SSL session in data connection.
2024-11-12 17:16:15 URL:
ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/c_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23.237
[33] ->
"/llll/ll/lll/LL/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/c_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_23.23"
[1]
The name is too long, 238 chars total.
Trying to shorten...
New name is
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/d_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_234.2.
The name is too long, 238 chars total.
Trying to shorten...
New name is
_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/d_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_234.2.
Resuming SSL session in data connection.
2024-11-12 17:16:15 URL:
ftps://hh.hhhhhhhhhhhhh.hhh:990/rrrr/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/d_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_234.238
[33] ->
"/llll/ll/lll/LL/_zzz/67_10_234567_20_234567_30_234567_40_234567_50_234567_60_234567_70_234567_80_234567_90_23456_100_23456_110_23456_120_234/d_130_23456_140_23456_150_23456_160_23456_170_23456_180_23456_190_23456_200_23456_210_23456_220_23456_230_234.2"
[1]
FINISHED --2024-11-12 17:16:15--
Total wall clock time: 11m 19s
Downloaded: 1420 files, 54G in 7.3s (7.44 GB/s)


As you can see, the *.235 and *.236 files are downloaded with their names
intact; the shortening starts at a path length of 237. This threshold is
quite strange.
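
One reading of the log that fits all five results is a fixed limit of 236 characters applied to the whole relative path rather than to the file name alone. The sketch below tests only that hypothesis against the names in the log. The limit is inferred from the output; it happens to equal NAME_MAX (255) minus 19, but I have not confirmed from the wget source that this is how it is computed:

```python
NAME_MAX = 255              # from the getconf output
RESERVE = 19                # ASSUMPTION: chosen so the limit matches the log (255 - 236)
LIMIT = NAME_MAX - RESERVE  # 236


def shorten(relative_path, limit=LIMIT):
    """Cut the whole relative path, not just its last component, at `limit` chars."""
    return relative_path[:limit]


def sample_path(first, tail):
    """Rebuild one of the test paths from its first character and its tail."""
    directory = ("67_10" + "".join(f"_234567_{n}" for n in range(20, 100, 10))
                 + "".join(f"_23456_{n}" for n in range(100, 130, 10)) + "_234")
    name = (first + "_130"
            + "".join(f"_23456_{n}" for n in range(140, 240, 10)) + tail)
    return f"_zzz/{directory}/{name}"


if __name__ == "__main__":
    cases = [("6", "_23456.240"), ("a", "_.235"), ("b", "_2.236"),
             ("c", "_23.237"), ("d", "_234.238")]
    for first, tail in cases:
        path = sample_path(first, tail)
        result = shorten(path)
        status = "unchanged" if result == path else "cut, now ends in " + result[-10:]
        print(len(path), status)
```

Under this assumption the 235 and 236 character paths pass untouched, and the other three are cut to exactly the endings seen in the log, even though no single file name comes close to NAME_MAX.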

I also made some tests with lftp and it transfers all the files in both
directions without name shortening. In fact, I checked much longer paths,
over 400 chars long.
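
The filesystem side of that check can be reproduced without any FTP client. This is a minimal sketch, assuming a writable temporary directory on the filesystem of interest; the helper name and the component sizes are arbitrary choices of mine:

```python
import os
import tempfile


def create_long_path(root, component_len=100, depth=5):
    """Create nested directories under `root` plus one file; return the file path."""
    # Each component stays well below NAME_MAX (255); only the total path is long.
    parts = [chr(ord("a") + i) * component_len for i in range(depth)]
    directory = os.path.join(root, *parts)
    os.makedirs(directory)
    file_path = os.path.join(directory, "f" * component_len + ".txt")
    with open(file_path, "w") as handle:
        handle.write("ok\n")
    return file_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = create_long_path(root)
        print(len(os.path.relpath(path, root)), os.path.isfile(path))
```

To test the actual download target rather than the default temporary location, pass dir="/llll/ll/lll/LL" (or whichever directory applies) to tempfile.TemporaryDirectory.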
So this problem does not seem to be system-related. It seems to be a
"feature" of wget.

I have also described the case here, but did not get a satisfying answer
(see the comments, too):
https://superuser.com/questions/1857575/is-there-a-possibility-for-wget-to-handle-longer-paths

Best,
Glorifyday


On Mon, Nov 11, 2024 at 4:58 PM Darshit Shah <darnir@gnu.org> wrote:

> Hi,
>
> Sorry for the delayed response.
>
> Would it be possible for you to please share the full output of Wget?
> Also, the output of `wget --version`.
>
> GNU Wget does not usually impose arbitrary limits on the file name / path
> length. The limits that are imposed are determined by querying pathconf(3).
> In case pathconf is not available, a conservative limit on the path length
> is enforced, which seems to be the case on your system.
> Given that pathconf(3) is part of POSIX.1-2008, I'm at a loss trying to
> explain why it wasn't found on your system.
>
> I will try to see if I can replicate your environment. But until then,
> there isn't much we can do about this. I would not add a runtime parameter
> for this. If no solution can be found, I am willing to add a compile-time
> knob to modify the max path lengths. But that will still require you to
> rebuild Wget for your target platform manually.
>
> On Wed, Oct 2, 2024, at 01:59, D. Wróblewski wrote:
> > I am using wget for a local backup of a remote ftp/s directory using the
> > following syntax:
> >
> >    wget --no-dns-cache -nv -r -c -l inf -P/local_dir -nH --cut-dirs=1
> > --restrict-file-names=windows --progress=dot:mega --ftps-implicit (...)
> >
> > This is wget 1.21 on Raspbian GNU/Linux 11 (bullseye)
> >    # wget --version
> >    GNU Wget 1.21 built on linux-gnueabihf.
> > I am restricting file names to windows because the local directory is
> > shared via smb to a Windows PC.
> >
> > The problem I encountered is that some files inside the remote directory
> > tree have long paths.
> >
> > In such a case, wget prints:
> >    The name is too long, 240 chars total.
> >    Trying to shorten...
> > and it downloads the file with its name shortened.
> >
> > If a local filesystem has no support for long paths, this behaviour is
> > a necessary workaround and is appreciated.
> >
> > However, in my case, the local filesystem does support long paths. I can
> > create paths much longer than 240 characters with mkdir, cp, mc, and so on.
> > So the name length restriction introduced by wget is unnecessary and
> > causes problems: some files end up without extensions, some names are cut
> > at the dot ('.') character and then their names are not showing properly
> > via smb.
> >
> > Is there a parameter enabling wget to use longer path names?
> > If not, is it possible to introduce it in the next wget version?
> >
> > Best regards,
> > Glorifyday
> >
> > --
>
>

-- 


(P.S.
Greetings also to the sad gentlemen from the services of Poland and the
whole world who are reading this email unlawfully.)

