Re: [Bug-wget] [PATCH] Implement --keep-badhash to keep Metalink's files


From: Matthew White
Subject: Re: [Bug-wget] [PATCH] Implement --keep-badhash to keep Metalink's files with a bad hash
Date: Fri, 19 Aug 2016 15:21:42 +0200

On Fri, 19 Aug 2016 12:16:15 +0200
Tim Rühsen <address@hidden> wrote:

> On Freitag, 19. August 2016 07:34:42 CEST Matthew White wrote:
> > On Thu, 18 Aug 2016 12:50:48 +0200
> > 
> > Tim Rühsen <address@hidden> wrote:
> > > On Mittwoch, 17. August 2016 17:55:47 CEST Matthew White wrote:
> > > > On Thu, 04 Aug 2016 17:08:20 +0200
> > > > 
> > > > Tim Ruehsen <address@hidden> wrote:
> > > > > On Thursday, August 4, 2016 12:40:49 PM CEST Matthew White wrote:
> > > > > > On Thu, 04 Aug 2016 12:08:50 +0200
> > > > > > 
> > > > > > Tim Ruehsen <address@hidden> wrote:
> > > > > > > On Wednesday, August 3, 2016 2:40:14 PM CEST Matthew White wrote:
> > > > > > > > On Wed, 03 Aug 2016 14:05:06 +0200
> > > > > > > > 
> > > > > > > > Tim Ruehsen <address@hidden> wrote:
> > > > > > > > > > On Tuesday, August 2, 2016 11:27:28 AM CEST Matthew White wrote:
> > > > > > > > > > On Mon, 01 Aug 2016 16:32:30 +0200
> > > > > > > > > > 
> > > > > > > > > > Tim Ruehsen <address@hidden> wrote:
> > > > > > > > > > > On Saturday, July 30, 2016 9:41:48 PM CEST Matthew White wrote:
> > > > > > > > > > > > Hello!
> > > > > > > > > > > > 
> > > > > > > > > > > > I think that sometimes it could help to keep
> > > > > > > > > > > > downloaded Metalink's files which have a bad hash.
> > > > > > > > > > > > 
> > > > > > > > > > > > The default wget behaviour is to delete such files.
> > > > > > > > > > > > 
> > > > > > > > > > > > This patch provides a way to keep files which have a
> > > > > > > > > > > > bad hash through the option --keep-badhash. It
> > > > > > > > > > > > appends the suffix .badhash to the file name, except
> > > > > > > > > > > > without overwriting existing files. In the latter
> > > > > > > > > > > > case, a unique suffix is appended after .badhash.
> > > > > > > > > > > > 
> > > > > > > > > > > > I made this patch working on the following branch:
> > > > > > > > > > > > master (latest 20cac2c5ab3d63aacfba35fb10878a2d490e2377)
> > > > > > > > > > > > git://git.savannah.gnu.org/wget.git
> > > > > > > > > > > > 
> > > > > > > > > > > > What do you think?
> > > > > > > > > > > 
> > > > > > > > > > > Hi Matthew,
> > > > > > > > > > > 
> > > > > > > > > > > good work !
> > > > > > > > > > > 
> > > > > > > > > > > While your FSF assignment is underway (PM), we can
> > > > > > > > > > > continue polishing.
> > > > > > > > > > > 
> > > > > > > > > > > Didn't test your code yet, but from what I see, there
> > > > > > > > > > > are just these peanuts:
> > > > > > > > > > > 
> > > > > > > > > > > 1.
> > > > > > > > > > > +  bhash = malloc (strlen (name) + strlen (".badhash") + 1);
> > > > > > > > > > > +  strcat (strcpy (bhash, name), ".badhash");
> > > > > > > > > > > 
> > > > > > > > > > > Please consider using concat_strings() from util.h. And
> > > > > > > > > > > please leave an empty line between var declaration and
> > > > > > > > > > > code - just for readability.
> > > > > > > > > > > 
> > > > > > > > > > > 2.
> > > > > > > > > > > Please add 'link' and 'unlink' to bootstrap.conf. Gnulib
> > > > > > > > > > > will emulate these on platforms where they are not
> > > > > > > > > > > available (or have a slightly different behavior). I
> > > > > > > > > > > guess we simply forgot 'unlink' when we switched to
> > > > > > > > > > > gnulib.
> > > > > > > > > > > 
> > > > > > > > > > > 3.
> > > > > > > > > > > The (GNU) commit messages should ideally look like:
> > > > > > > > > > > 
> > > > > > > > > > > one line of short description
> > > > > > > > > > > <empty line>
> > > > > > > > > > > file changes
> > > > > > > > > > > <empty line>
> > > > > > > > > > > long description
> > > > > > > > > > > 
> > > > > > > > > > > For example:
> > > > > > > > > > > Add new option --keep-badhash
> > > > > > > > > > > 
> > > > > > > > > > > * src/init.c: Add keepbadhash
> > > > > > > > > > > * src/main.c: Add keep-badhash
> > > > > > > > > > > * src/options.h: Add keep_badhash
> > > > > > > > > > > * doc/wget.texi: Add docs for --keep-badhash
> > > > > > > > > > > * src/metalink.h: Add prototypes badhash_suffix(),
> > > > > > > > > > >   badhash_or_remove()
> > > > > > > > > > > * src/metalink.c: New functions badhash_suffix(),
> > > > > > > > > > >   badhash_or_remove().
> > > > > > > > > > >   (retrieve_from_metalink): Call append .badhash()
> > > > > > > > > > > 
> > > > > > > > > > > With --keep-badhash, append .badhash to Metalink's
> > > > > > > > > > > files with checksum mismatch, except without
> > > > > > > > > > > overwriting existing files.
> > > > > > > > > > > 
> > > > > > > > > > > Without --keep-badhash, remove downloaded files with
> > > > > > > > > > > checksum mismatch (this conforms to the old behaviour).
> > > > > > > > > > > 
> > > > > > > > > > > [This also applies to your other patches]
> > > > > > > > > > > 
> > > > > > > > > > > 4.
> > > > > > > > > > > Please not only write 'keepbadhash', but also what you
> > > > > > > > > > > did (e.g. remove, rename, add, ...), see my example
> > > > > > > > > > > above.
> > > > > > > > > > > 
> > > > > > > > > > > Those ChangeLog entries should allow finding changes /
> > > > > > > > > > > faults / regressions etc. even when a versioning system
> > > > > > > > > > > is not at hand, e.g. when unpacking the sources from a
> > > > > > > > > > > tarball. (Not debatable that consulting commit messages
> > > > > > > > > > > directly is much more powerful.)
> > > > > > > > > > > 
> > > > > > > > > > > 5.
> > > > > > > > > > > You write "except without overwriting existing files",
> > > > > > > > > > > maybe you should mention appending a counter instead of
> > > > > > > > > > > overwriting existent files !?
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Regards, Tim
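
The malloc/strcat pair quoted in point 1 above can be factored into a small helper. The sketch below is only an illustration of the idea; append_suffix() is a hypothetical name, not wget's concat_strings() from util.h, which is the suggested replacement:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative only: a stand-alone equivalent of the quoted
 * malloc/strcat pair.  In wget itself, concat_strings() from util.h
 * is the suggested replacement; append_suffix() is a hypothetical
 * name used here for the sketch. */
char *
append_suffix (const char *name, const char *suffix)
{
  char *result = malloc (strlen (name) + strlen (suffix) + 1);

  if (!result)
    return NULL;  /* out of memory */

  strcat (strcpy (result, name), suffix);
  return result;
}
```

For example, append_suffix ("File1", ".badhash") yields a freshly allocated "File1.badhash" that the caller must free().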
> > > > > > > > > > 
> > > > > > > > > > Thanks Tim!
> > > > > > > > > > 
> > > > > > > > > > I really needed your guidance. I sent the modified
> > > > > > > > > > patches to address@hidden
> > > > > > > > > > 
> > > > > > > > > > I believe there are more things to fix.
> > > > > > > > > > 
> > > > > > > > > > Consider the following after applying the attached patch:
> > > > > > > > > > * src/metalink.c (retrieve_from_metalink): line 124: If
> > > > > > > > > >   the download is interrupted (output_stream isn't NULL),
> > > > > > > > > >   and there are more urls for the same file (we are still
> > > > > > > > > >   in the download loop), switching to the next url should
> > > > > > > > > >   resume the download (instead of starting it over).
> > > > > > > > > > 
> > > > > > > > > > In this patch I added a fix to rename/remove the fully
> > > > > > > > > > downloaded file (output_stream is NULL), but there was an
> > > > > > > > > > error (probably checksum) and we are still in the
> > > > > > > > > > download loop (more urls for the same file). But as said
> > > > > > > > > > before, switching to the next url should continue the
> > > > > > > > > > download:
> > > > > > > > > > * src/metalink.c (retrieve_from_metalink): line 131:
> > > > > > > > > >   Rename/remove fully downloaded file on error
> > > > > > > > > > 
> > > > > > > > > > I still have to investigate the problem...
> > > > > > > > > 
> > > > > > > > > So, I wait with 0001-... !?
> > > > > > > > > 
> > > > > > > > > 0002-... has been pushed (extended with 'symlink').
> > > > > > > > > 
> > > > > > > > > Thanks for your contribution.
> > > > > > > > > 
> > > > > > > > > Tim
> > > > > > > > 
> > > > > > > > Ok for 0002. Thanks.
> > > > > > > > 
> > > > > > > > There's no problem applying the 0001.
> > > > > > > 
> > > > > > > Applied and pushed !
> > > > > > > 
> > > > > > > > The "if the stream got interrupted, then restart the
> > > > > > > > download with the next url" (output_stream isn't NULL) was
> > > > > > > > already there before the patch 0001.
> > > > > > > > 
> > > > > > > > [debug is required to know when output_stream isn't NULL]
> > > > > > > > 
> > > > > > > > commit 7fad76db4cdb7a0fe7e5aa0dd88f5faaf8f4cdc8
> > > > > > > > * src/metalink.c (retrieve_from_metalink): line 124: 'if
> > > > > > > >   (output_stream)' remove file and restart download with
> > > > > > > >   next url
> > > > > > > > 
> > > > > > > > With the 0001, if --keep-badhash is used the file is renamed
> > > > > > > > instead of removed, even when "output_stream is NULL".
> > > > > > > > 
> > > > > > > > I have to look through the "stream interrupted" situation.
> > > > > > > > 
> > > > > > > > I'm guessing that the download is not resumed.
> > > > > > > > 
> > > > > > > > What do you suggest?
> > > > > > > 
> > > > > > > We need a document where we define the wanted behavior, create
> > > > > > > tests, and amend the code.
> > > > > > > 
> > > > > > > Regards, Tim
> > > > > > 
> > > > > > Ok, I just documented a test. See the attached tarball.
> > > > > > 
> > > > > > Also there is a patch attached (there's a copy in the tarball too),
> > > > > > providing the suggested bugfix.
> > > > > 
> > > > > I made that test without your patch => crash due to
> > > > > badhash_or_remove(NULL). With your patch, valgrind says:
> > > > > 
> > > > > ==15068== Syscall param open(filename) points to unaddressable byte(s)
> > > > > ==15068==    at 0x6848F40: __open_nocancel (syscall-template.S:84)
> > > > > ==15068==    by 0x67DFF7D: _IO_file_open (fileops.c:221)
> > > > > ==15068==    by 0x67E009F: _IO_file_fopen@@GLIBC_2.2.5 (fileops.c:328)
> > > > > ==15068==    by 0x67D4F03: __fopen_internal (iofopen.c:86)
> > > > > ==15068==    by 0x434FAE: retrieve_from_metalink (metalink.c:184)
> > > > > ==15068==    by 0x406C63: main (main.c:2025)
> > > > > ==15068==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> > > > > ==15068==
> > > > 
> > > > I think I nailed the above bug and the SIGSEGV mentioned here:
> > > > http://lists.gnu.org/archive/html/bug-wget/2016-08/msg00052.html
> > > > http://lists.gnu.org/archive/html/bug-wget/2016-08/msg00056.html
> > > > 
> > > > I just pushed to https://github.com/mehw/wget/tree/metalink, and further
> > > > explanations are in the commit fb3cdf5b0fd9e08426682b8d7c84be1f7e343805.
> > > > 
> > > > For the sake of clarity, below there's an excerpt of the notes
> > > > added to the commit message.
> > > > 
> > > > * src/metalink.c (retrieve_from_metalink):
> > > >   commit a0ff0952d6fbd01a509797eb6ba31923ba6752af
> > > >   [latest presenting the SIGSEGV problem]
> > > >   
> > > >   commit e3fb4c3859d2705fb2de80801b0bba2b64bea1a1
> > > >   [discussed in the mailing list about the SIGSEGV]
> > > >   
> > > >   If unique_create() cannot create/open the destination file, filename
> > > >   and output_stream remain NULL. If fopen() is used instead, filename
> > > >   always remains NULL. Both functions cannot create "path/file" trees.
> > > >   
> > > >   Setting filename to the right value is sufficient to prevent SIGSEGV
> > > >   generating from testing a NULL value. This also allows retrieve_url()
> > > >   to create a "path/file" tree through opt.output_document.
> > > >   
> > > >   Reading NULL as output_stream, when it shall not be, leads to wrong
> > > >   results. For instance, a non-NULL output_stream tells when a stream
> > > >   was interrupted, reading NULL instead means to assume the contrary.
> > > 
> > > Maybe you could fix these warnings:
> > > 
> > > metalink.c: In function 'retrieve_from_metalink':
> > > metalink.c:121:18: warning: implicit declaration of function
> > > 'metalink_check_safe_path' [-Wimplicit-function-declaration]
> > > 
> > >        safename = metalink_check_safe_path (filename) ? filename : basename;
> > >                   ^~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > metalink.c: In function 'get_metalink_basename':
> > > metalink.c:683:3: warning: suggest parentheses around assignment used as
> > > truth value [-Wparentheses]
> > > 
> > >    while (name = strstr (basename, "/"))
> > >    ^~~~~
> > > 
> > > And set the X flags for
> > > testenv/Test-metalink-xml-homepath.py
> > 
> > Thanks. Warnings fixed and 'x' flags set. Should I leave them as separate
> > commits, or merge them with the commits requiring a fix?
> > > I think it is debatable if Test-metalink-xml-abspath.py and
> > > Test-metalink-xml-homepath.py really should do nothing. You said that
> > > libmetalink returns NULL
> > Test-metalink-xml-abspath.py, Test-metalink-xml-relpath.py, and
> > Test-metalink-xml-homepath.py shall verify that only metalink:file elements
> > with safe names are processed/downloaded/saved. If unsafe metalink:file
> > elements are saved, such tests shall fail.
> > 
> > As I stated in doc/metalink.txt ('2. Notes'), saving a file to an absolute
> > path poses a security problem. We must ensure that Wget's automated tests
> > never modify the root.
> > Can you confirm the above?
> 
> ACK. As I said, we could strip the path and use the basename instead of 
> ignoring the file completely. But if path components are forbidden anyway, we 
> can choose to simply ignore such files (as you suggest).

Paths coming from Metalink/HTTP (--metalink-over-http) aren't verified by 
libmetalink, so it is possible to use their basename (read below).

When a Metalink/XML (--input-metalink) file is parsed, the following call is 
made:
* src/main.c (main): Call libmetalink's metalink_parse_file()

An unsafe metalink:file "name" field (path) is forbidden by:
* lib/libexpat_metalink_parser.c (metalink_parse_file)
* lib/libxml2_metalink_parser.c (metalink_parse_file)

As you can see, unless the parse engine is changed, there's no choice but to 
accept the result of libmetalink's metalink_parse_file(), which through a call 
to metalink_check_safe_path() forbids unsafe metalink:file elements found in 
Metalink/XML files.
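For illustration, a check of this kind might look as follows. This is a hedged sketch of the sort of test metalink_check_safe_path() performs, not libmetalink's actual implementation (which lives in lib/metalink_helper.c and may differ in detail):

```c
#include <stdbool.h>
#include <string.h>

/* Hedged sketch: reject empty names, absolute paths, home-relative
 * paths, and ".." components, so "/dir/A/File1" is refused while
 * "dir/B/File2" is accepted.  Not libmetalink's real code. */
bool
is_safe_path (const char *name)
{
  const char *p;

  if (!name || !*name)
    return false;

  if (name[0] == '/' || name[0] == '~')  /* absolute or home path */
    return false;

  for (p = name; *p;)
    {
      size_t seg = strcspn (p, "/");  /* length of this path component */

      if (seg == 2 && p[0] == '.' && p[1] == '.')  /* ".." component */
        return false;

      p += seg;
      if (*p == '/')
        p++;
    }

  return true;
}
```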

At your request, I implemented 'Enforce Metalink file name verification, 
strip directory if necessary' 
https://github.com/mehw/wget/commit/3a0bda59481c7a344f9944884e7e0b332290acfc. 
As a side effect, --directory-prefix with absolute, relative, and home paths 
triggers the directory stripping, and the file is saved in the working 
directory under its basename (doc/metalink.txt).

WDYT?
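
The stripping itself can be sketched like this. metalink_basename() is a hypothetical stand-in for the patch's get_metalink_basename(); note the doubled parentheses around the assignment, which also avoid the -Wparentheses warning quoted earlier:

```c
#include <string.h>

/* Hypothetical stand-in for get_metalink_basename(): keep only the
 * last path component, so "dir/B/File2" is saved as "File2" in the
 * working directory. */
const char *
metalink_basename (const char *name)
{
  const char *slash;

  while ((slash = strstr (name, "/")))
    name = slash + 1;  /* skip past each '/' */

  return name;
}
```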

> 
> > > or an empty string as file name if the file name contains a path.
> > > But shouldn't we (by default) use the file name from the URL ? And
> > > even with --trust-server-names we should take this name in case
> > > libmetalink doesn't give us a name ?
> > > 
> > > Regards, Tim
> > 
> > By design, the behaviour is the following.
> > 
> > 
> > Metalink/XML
> > ------------
> > 
> > bogus.meta4 is a metalink xml file (see attachment):
> > 
> > <?xml version="1.0" encoding="UTF-8"?>
> > <metalink xmlns="urn:ietf:params:xml:ns:metalink">
> >   <file name="/dir/A/File1">
> >     <size>1617</size>
> >     <hash type="sha256">ecb3dff2648667513e31554b3ad054ccd89fce38e33367c9459ac3a285153742</hash>
> >     <url>http://another.url/common_name</url>
> >     <url>http://ftpmirror.gnu.org/bash/bash-4.3-patches/bash43-001</url>
> >   </file>
> >   <file name="dir/B/File2">
> >     <size>1594</size>
> >     <hash type="sha256">eee7cd7062ab29a9e4f02924d9c367264dcb8b162703f74ff6eb8f175a91502b</hash>
> >     <url>http://another.url/again/common_name</url>
> >     <url>http://ftpmirror.gnu.org/bash/bash-4.3-patches/bash43-002</url>
> >   </file>
> > </metalink>
> > 
> > $ wget --input-metalink=bogus.meta4
> 
> Should be --input-file=bogus.meta4 --force-metalink. --input-metalink is a 
> 'shortcut' not conforming to the --force options that wget has. Just saying 
> this for completeness, but for this discussion it might not be relevant.

I see that it has been "corrected" in wget2...

> 
> The main question is: where did this file come from ?
> It looks like downloading a file via metalink requires two command lines -
> one to retrieve the .meta4 file and another to start downloading from it.
> This seems basically wrong and is a very bad user experience.

The metalink xml file could be a user's "hand made" file, so there should be 
(almost) no security concerns (user's responsibility).

If the metalink xml file is downloaded, the process could be automated on 
user request:
$ wget [--metalink-over-http] --force-metalink http://safe.url/file.ext[.meta]

Of course the security measures you are suggesting should take place (I'm on 
it).

WDYT?

> 
> > Wget will eventually execute the following line:
> > * src/main.c (main): 'meta_err = metalink_parse_file (opt.input_metalink,
> > &metalink);'
> > 
> > [see bottom for type declarations]
> > 
> > By design, libmetalink rejects metalink:file elements with unsafe names,
> > see https://github.com/metalink-dev/libmetalink:
> > * lib/metalink_helper.c (metalink_check_safe_path)
> > 
> > At this point, the variable metalink contains data for "dir/B/File2" (i.e.
> > *metalink->files[0]), but not for "/dir/A/File1" (no more
> > *metalink->files).
> > 
> > There will never be a *metalink->files entry with an empty or NULL name;
> > therefore there could be no *metalink->files at all.
> > 
> > Even when all metalink:file elements are rejected, i.e. no *metalink->files,
> > if metalink_parse_file() parsed the xml file without error, wget exits
> > without error.
> > 
> > Wget is expected to save each metalink:file element using its unique "name"
> > field as file name: https://tools.ietf.org/html/rfc5854#section-4.1.2.1
> > 
> > How do you suggest to implement --trust-server-names in the above scenario?
> 
> Starting from a local .meta4 file, we could use the original name without 
> .meta4 extension. E.g. wget --input-metalink archive.gz.meta4 creates 
> 'archive.gz'. --trust-server-names would use the names from the metalink file.

Got it! If more files are described by the xml, apply .1 .2 suffixes to 
archive.gz (--trust-server-names means to use the names in the xml 
descriptor)?! I'm on it.

> 
> Starting with a remote URL that returns 'application/metalink4+xml'
> Content-type, that content should be parsed as metalink file and the
> described file(s) downloaded. In this case we should take the file name
> from the original URL - if two or more files are described, they should
> get the .1 .2 etc extensions. This is for safety, e.g. not to implicitly
> download executable files/scripts. In case --trust-server-names is given,
> the user says we should use the names from the metalink description,
> overriding the default security mechanism.

Quite intuitive! Basically, --content-disposition and --metalink-over-http 
(if an xml is described) should apply the same processing to the downloaded 
metalink xml file: use the cli url's file name, unless --trust-server-names, 
and add .1 .2 suffixes accordingly. Right? I'm on this one too.
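
The .1 .2 suffix scheme could be sketched as follows. numbered_name() is hypothetical, and a real implementation would also have to check the filesystem for collisions:

```c
#include <stdio.h>

/* Hypothetical sketch of the ".1", ".2", ... naming discussed above:
 * the first file keeps the base name, later files with the same name
 * get a numeric suffix. */
void
numbered_name (const char *name, unsigned n, char *out, size_t outlen)
{
  if (n == 0)
    snprintf (out, outlen, "%s", name);        /* first file: plain name */
  else
    snprintf (out, outlen, "%s.%u", name, n);  /* later files: name.N */
}
```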

> 
> > 
> > Let's say that --trust-server-names saves the metalink:file elements using
> > their url's file name.
> > 
> > What should happen when different metalink:file elements, pointing to
> > different locations/contents, end up all having the same url's file name,
> > i.e. common_name in bogus.meta4?
> 
> They should have extensions .1 .2 .3 etc.

This should apply only if the metalink:file urls are used to name the file, 
right? Do you suggest the possibility of using a metalink:file url's file name 
to write the file?

> 
> > Also, which file name should be written when a metalink:file has urls
> > (mirrors) with different file names, and one of those urls fails (and
> > the next one is used)?
> 
> The URLs of the mirrors should not change the file name at all. They are just 
> mirrors.
> 
> > How do you choose the destination file name when a metalink:file uses its
> > url sources in parallel? https://tools.ietf.org/html/rfc5854#section-1
> > 
> > I see the risk of making --trust-server-names/--input-metalink to operate in
> > the same way of --input-file, ending up interpreting a metalink xml file as
> > a list of urls, each one to be saved using its own url's file name.
> 
> Basically yes, but see above.
> 
> > 
> > WDYT?
> > 
> > 
> > Metalink/HTTP
> > -------------
> > 
> > $ wget --metalink-over-http http://127.0.0.1/dir/sillyname.ext
> > 
> > [I'm using a local server]
> > 
> > Suppose the server answers with a header containing the following:
> > Link: http://ftpmirror.gnu.org/bash/bash-4.3-patches/bash43-001; rel=duplicate; pref; pri=2
> > Link: http://another.url/common_name; rel=duplicate; pref; pri=1
> > Digest: SHA-256=7LPf8mSGZ1E+MVVLOtBUzNifzjjjM2fJRZrDooUVN0I=
> > 
> > When opt.output_document and opt.content_disposition aren't specified,
> > the following line is reached, and 'u' and 'original_url' are the same:
> > * src/http.c (http_loop): 'url_file_name (opt.trustservernames ? u : original_url, NULL);'
> > 
> > The function url_file_name() combines the url's file name with any
> > "Directory Options".
> > 
> > By design, the cli's url is used, i.e. the file name is 'sillyname.ext', and
> > the "Directory Options" apply to http://127.0.0.1/dir/.
> 
> That seems correct (the user gets what she requested).
> 
> > How do you suggest to implement --trust-server-names in the above scenario?
> 
> It's just mirrors - the name should IMO stay sillyname.ext.
> 
> But this is not a real world example - a real metalink server would either 
> redirect or answer with 200 + application/metalink4+xml + metalink body.
> 
> Example:
> wget -d http://openoffice.mirrorbrain.org/stable/3.3.0/OOo_3.3.0_src_extensions.tar.bz2
> 
> HTTP/1.1 302 Found
> Date: Fri, 19 Aug 2016 09:26:29 GMT
> Server: Apache/2.2.22 (Linux/SUSE) mod_ssl/2.2.22 OpenSSL/1.0.0k DAV/2 SVN/1.7.5 mod_wsgi/3.3 Python/2.7.2 mod_asn/1.6 mod_mirrorbrain/2.18.0 mod_fcgid/2.3.6
> X-Prefix: 91.0.0.0/10
> X-AS: 3320
> X-MirrorBrain-Mirror: ftp5.gwdg.de
> X-MirrorBrain-Realm: country
> Link: <http://openoffice.mirrorbrain.org/stable/3.3.0/OOo_3.3.0_src_extensions.tar.bz2.meta4>; rel=describedby; type="application/metalink4+xml"
> Link: <http://openoffice.mirrorbrain.org/stable/3.3.0/OOo_3.3.0_src_extensions.tar.bz2.torrent>; rel=describedby; type="application/x-bittorrent"
> Link: <http://ftp5.gwdg.de/pub/openoffice/stable/3.3.0/OOo_3.3.0_src_extensions.tar.bz2>; rel=duplicate; pri=1; geo=de
> Link: <http://ftp.uasw.edu/pub/openoffice.org/stable/3.3.0/OOo_3.3.0_src_extensions.tar.bz2>; rel=duplicate; pri=2; geo=de
> Link: <http://ftp-stud.hs-esslingen.de/pub/Mirrors/ftp.openoffice.org/stable/3.3.0/OOo_3.3.0_src_extensions.tar.bz2>; rel=duplicate; pri=3; geo=de
> Link: <http://sunsite.informatik.rwth-aachen.de/ftp/pub/mirror/OpenOffice/stable/3.3.0/OOo_3.3.0_src_extensions.tar.bz2>; rel=duplicate; pri=4; geo=de
> Link: <http://ftp.tu-chemnitz.de/pub/openoffice-extended/stable/3.3.0/OOo_3.3.0_src_extensions.tar.bz2>; rel=duplicate; pri=5; geo=de
> Digest: MD5=onY2YSLrIbXofSyX14J61Q==
> Digest: SHA=Zkg18VkCplq7L+x+tImnMyET/tc=
> Digest: SHA-256=ftMOLpU/hW43Xcb8LqoZgHmCnL0iGM8C/HIo6KXqgkY=
> Location: http://ftp5.gwdg.de/pub/openoffice/stable/3.3.0/OOo_3.3.0_src_extensions.tar.bz2
> Content-Length: 264
> Keep-Alive: timeout=2, max=100
> Connection: Keep-Alive
> Content-Type: text/html; charset=iso-8859-1
> 
> Now it is the application's choice to download in one chunk (and use the
> Digest header to check for correctness) or to download the .meta4
> description first - to take full advantage of chunked download. (That is
> how wget2 works - the single chunks are even downloaded from different
> mirrors in parallel to distribute the load.)
> So, if possible, wget should use the HTTP answer just as a 'trampoline' to
> fetch the metalink description. This allows for much better error control
> and automatic continuation of stopped downloads.

Ok, thanks for the explanation.

> 
> > 
> > Let's say that --trust-server-names saves the file using one of its Link
> > url's file name.
> > 
> > To which url should the "Directory Options" apply? To the original
> > cli's url, or to the current mirror's url?
> > 
> > Also, if a Link fails and the download switches to the next mirror, should
> > the file name be changed accordingly?
> > 
> > And what if a Link redirects to another url with a different file name?
> > 
> > How do you choose the destination file name when mirror Link header fields
> > are used in parallel? https://tools.ietf.org/html/rfc6249#section-7
> > 
> > WDYT?
> 
> Tim

I pushed doc/metalink-standard.txt. I will update the document as soon as more 
features are implemented/changed.

Later,
Matthew

-- 
Matthew White <address@hidden>

Attachment: pgpeEO3pagT7Q.pgp
Description: PGP signature

