bug-wget
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-wget] [PATCH] Implement --keep-badhash to keep Metalink's files


From: Tim Rühsen
Subject: Re: [Bug-wget] [PATCH] Implement --keep-badhash to keep Metalink's files with a bad hash
Date: Thu, 18 Aug 2016 12:50:48 +0200
User-agent: KMail/5.2.3 (Linux/4.6.0-1-amd64; KDE/5.23.0; x86_64; ; )

On Mittwoch, 17. August 2016 17:55:47 CEST Matthew White wrote:
> On Thu, 04 Aug 2016 17:08:20 +0200
> 
> Tim Ruehsen <address@hidden> wrote:
> > On Thursday, August 4, 2016 12:40:49 PM CEST Matthew White wrote:
> > > On Thu, 04 Aug 2016 12:08:50 +0200
> > > 
> > > Tim Ruehsen <address@hidden> wrote:
> > > > On Wednesday, August 3, 2016 2:40:14 PM CEST Matthew White wrote:
> > > > > On Wed, 03 Aug 2016 14:05:06 +0200
> > > > > 
> > > > > Tim Ruehsen <address@hidden> wrote:
> > > > > > On Tuesday, August 2, 2016 11:27:28 AM CEST Matthew White wrote:
> > > > > > > On Mon, 01 Aug 2016 16:32:30 +0200
> > > > > > > 
> > > > > > > Tim Ruehsen <address@hidden> wrote:
> > > > > > > > On Saturday, July 30, 2016 9:41:48 PM CEST Matthew White 
wrote:
> > > > > > > > > Hello!
> > > > > > > > > 
> > > > > > > > > I think that sometimes it could help to keep downloaded
> > > > > > > > > Metalink's
> > > > > > > > > files
> > > > > > > > > which have a bad hash.
> > > > > > > > > 
> > > > > > > > > The default wget behaviour is to delete such files.
> > > > > > > > > 
> > > > > > > > > This patch provides a way to keep files which have a bad
> > > > > > > > > hash
> > > > > > > > > through
> > > > > > > > > the
> > > > > > > > > option --keep-badhash. It appends the suffix .badhash to the
> > > > > > > > > file
> > > > > > > > > name,
> > > > > > > > > except without overwriting existing files. In the latter
> > > > > > > > > case,
> > > > > > > > > an
> > > > > > > > > unique
> > > > > > > > > suffix is appended after .badhash.
> > > > > > > > > 
> > > > > > > > > I made this patch working on the following branch:
> > > > > > > > > master (latest 20cac2c5ab3d63aacfba35fb10878a2d490e2377)
> > > > > > > > > git://git.savannah.gnu.org/wget.git
> > > > > > > > > 
> > > > > > > > > What do you think?
> > > > > > > > 
> > > > > > > > Hi Matthew,
> > > > > > > > 
> > > > > > > > good work !
> > > > > > > > 
> > > > > > > > While your FSF assignment is underway (PM), we can continue
> > > > > > > > polishing.
> > > > > > > > 
> > > > > > > > Didn't test your code yet,  but from what I see, there are
> > > > > > > > just
> > > > > > > > these
> > > > > > > > peanuts:
> > > > > > > > 
> > > > > > > > 1.
> > > > > > > > +  bhash = malloc (strlen (name) + strlen (".badhash") + 1);
> > > > > > > > +  strcat (strcpy (bhash, name), ".badhash");
> > > > > > > > 
> > > > > > > > Please consider using concat_strings() from util.h.
> > > > > > > > And please leave an empty line between var declaration and
> > > > > > > > code -
> > > > > > > > just
> > > > > > > > for
> > > > > > > > readability.
> > > > > > > > 
> > > > > > > > 2.
> > > > > > > > Please add 'link' and 'unlink' to bootstrap.conf. Gnulib will
> > > > > > > > emulate
> > > > > > > > these on platforms where they are not available (or have a
> > > > > > > > slightly
> > > > > > > > different behavior). I guess we simply forgot 'unlink' when we
> > > > > > > > switched
> > > > > > > > to gnulib.
> > > > > > > > 
> > > > > > > > 3.
> > > > > > > > The (GNU) commit messages should ideally look like:
> > > > > > > > 
> > > > > > > > one line of short description
> > > > > > > > <empty line>
> > > > > > > > file changes
> > > > > > > > <empty line>
> > > > > > > > long description
> > > > > > > > 
> > > > > > > > For example:
> > > > > > > > Add new option --keep-badhash
> > > > > > > > 
> > > > > > > > * src/init.c: Add keepbadhash
> > > > > > > > * src/main.c: Add keep-badhash
> > > > > > > > * src/options.h: Add keep_badhash
> > > > > > > > * doc/wget.texi: Add docs for --keep-badhash
> > > > > > > > * src/metalink.h: Add prototypes badhash_suffix(),
> > > > > > > > badhash_or_remove()
> > > > > > > > * src/metalink.c: New functions badhash_suffix(),
> > > > > > > > badhash_or_remove().
> > > > > > > > 
> > > > > > > >   (retrieve_from_metalink): Call append .badhash()
> > > > > > > > 
> > > > > > > > With --keep-badhash, append .badhash to Metalink's files with
> > > > > > > > checksum
> > > > > > > > mismatch, except without overwriting existing files.
> > > > > > > > 
> > > > > > > > Without --keep-badhash, remove downloaded files with checksum
> > > > > > > > mismatch
> > > > > > > > (this conforms to the old behaviour).
> > > > > > > > 
> > > > > > > > [This also applies to to your other patches]
> > > > > > > > 
> > > > > > > > 4.
> > > > > > > > Please not only write 'keepbadhash', but also what you did
> > > > > > > > (e.g.
> > > > > > > > remove,
> > > > > > > > rename, add, ...), see my example above.
> > > > > > > > 
> > > > > > > > Those ChangeLog entries should allow finding changes / faults
> > > > > > > > /
> > > > > > > > regressions
> > > > > > > > etc. even when a versioning system is not at hand, e.g. when
> > > > > > > > unpacking
> > > > > > > > the
> > > > > > > > sources from a tarball. (Not debatable that consulting commit
> > > > > > > > messages
> > > > > > > > directly is much more powerful.)
> > > > > > > > 
> > > > > > > > 5.
> > > > > > > > You write "except without overwriting existing files", maybe
> > > > > > > > you
> > > > > > > > should
> > > > > > > > mention appending a counter instead of overwriting existent
> > > > > > > > files
> > > > > > > > !?
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Regards, Tim
> > > > > > > 
> > > > > > > Thanks Tim!
> > > > > > > 
> > > > > > > I really needed your guidance. I sent the modified patches to
> > > > > > > address@hidden
> > > > > > > 
> > > > > > > I believe there are more things to fix.
> > > > > > > 
> > > > > > > Consider the following after applying the attached patch:
> > > > > > > * src/metalink.c (retrieve_from_metalink): line 124: If the
> > > > > > > download
> > > > > > > is
> > > > > > > interrupted (output_stream isn't NULL), and there are more urls
> > > > > > > for
> > > > > > > the
> > > > > > > same file (we are still in the download loop), switching to the
> > > > > > > next
> > > > > > > url
> > > > > > > should resume the download (instead than start it over).
> > > > > > > 
> > > > > > > In this patch I added a fix to rename/remove the fully
> > > > > > > downloaded
> > > > > > > file
> > > > > > > (output_stream is NULL), but there was an error (probably
> > > > > > > checksum)
> > > > > > > and
> > > > > > > we
> > > > > > > are still in the download loop (more urls for the same file).
> > > > > > > But as
> > > > > > > said
> > > > > > > before, switching to the next url should continue the download:
> > > > > > > *
> > > > > > > src/metalink.c (retrieve_from_metalink): line 131: Rename/remove
> > > > > > > fully
> > > > > > > downloaded file on error
> > > > > > > 
> > > > > > > I still have to investigate the problem...
> > > > > > 
> > > > > > So, I wait with 0001-... !?
> > > > > > 
> > > > > > 0002-... has been pushed (extended with 'symlink').
> > > > > > 
> > > > > > Thanks for your contribution.
> > > > > > 
> > > > > > Tim
> > > > > 
> > > > > Ok for 0002. Thanks.
> > > > > 
> > > > > There's no problem applying the 0001.
> > > > 
> > > > Applied and pushed !
> > > > 
> > > > > The "if the stream got interrupted, then restart the download with
> > > > > the
> > > > > next
> > > > > url" (output_stream isn't NULL) was already there before the patch
> > > > > 0001.
> > > > > 
> > > > > [debug is required to know when output_stream isn't NULL]
> > > > > 
> > > > > commit 7fad76db4cdb7a0fe7e5aa0dd88f5faaf8f4cdc8
> > > > > * src/metalink.c (retrieve_from_metalink): line 124: 'if
> > > > > (output_stream)'
> > > > > remove file and restart download with next url
> > > > > 
> > > > > With the 0001, if --keep-badhash is used the file is renamed instead
> > > > > than
> > > > > removed, even when "output_stream is NULL".
> > > > > 
> > > > > I have to look through the "stream interrupted" situation.
> > > > > 
> > > > > I'm guessing that the download is not resumed.
> > > > > 
> > > > > What do you suggest?
> > > > 
> > > > We need a  document where we define wanted behavior, create tests and
> > > > amend the code.
> > > > 
> > > > Regards, Tim
> > > 
> > > Ok, I just documented a test. See the attached tarball.
> > > 
> > > Also there is a patch attached (there's a copy in the tarball too),
> > > providing the suggested bugfix.
> > 
> > I made that test without your patch => crash due to
> > badhash_or_remove(NULL). With your patch, valgrind says:
> > 
> > ==15068== Syscall param open(filename) points to unaddressable byte(s)
> > ==15068==    at 0x6848F40: __open_nocancel (syscall-template.S:84)
> > ==15068==    by 0x67DFF7D: _IO_file_open (fileops.c:221)
> > ==15068==    by 0x67E009F: _IO_file_fopen@@GLIBC_2.2.5 (fileops.c:328)
> > ==15068==    by 0x67D4F03: __fopen_internal (iofopen.c:86)
> > ==15068==    by 0x434FAE: retrieve_from_metalink (metalink.c:184)
> > ==15068==    by 0x406C63: main (main.c:2025)
> > ==15068==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> > ==15068==
> 
> I think I nailed the above bug and the SIGSEGV mentioned here:
> http://lists.gnu.org/archive/html/bug-wget/2016-08/msg00052.html
> http://lists.gnu.org/archive/html/bug-wget/2016-08/msg00056.html
> 
> I just pushed to https://github.com/mehw/wget/tree/metalink, and further
> explanations are in the commit fb3cdf5b0fd9e08426682b8d7c84be1f7e343805.
> 
> For the sake of clarity, below there's an extemp of the notes added to the
> commit message.
> 
> * src/metalink.c (retrieve_from_metalink):
> 
>   commit a0ff0952d6fbd01a509797eb6ba31923ba6752af
>   [latest presenting the SIGSEGV problem]
> 
>   commit e3fb4c3859d2705fb2de80801b0bba2b64bea1a1
>   [discussed in the mailing list about the SIGSEGV]
> 
>   If unique_create() cannot create/open the destination file, filename
>   and output_stream remain NULL. If fopen() is used instead, filename
>   always remains NULL. Both functions cannot create "path/file" trees.
> 
>   Setting filename to the right value is sufficient to prevent SIGSEGV
>   generating from testing a NULL value. This also allows retrieve_url()
>   to create a "path/file" tree through opt.output_document.
> 
>   Reading NULL as output_stream, when it shall not be, leads to wrong
>   results. For instance, a non-NULL output_stream tells when a stream
>   was interrupted, reading NULL instead means to assume the contrary.

Maybe you could fix these warnings:

metalink.c: In function 'retrieve_from_metalink':
metalink.c:121:18: warning: implicit declaration of function 
'metalink_check_safe_path' [-Wimplicit-function-declaration]
       safename = metalink_check_safe_path (filename) ? filename : basename;
                  ^~~~~~~~~~~~~~~~~~~~~~~~
metalink.c: In function 'get_metalink_basename':
metalink.c:683:3: warning: suggest parentheses around assignment used as truth 
value [-Wparentheses]
   while (name = strstr (basename, "/"))
   ^~~~~

And set the X flags for
testenv/Test-metalink-xml-homepath.py

I think it is debatable if Test-metalink-xml-abspath.py and Test-metalink-xml-
homepath.py really should do nothing. You said that libmetalink returns NULL 
or an empty string as file name if the file name contains a path.
But shouldn't we (by default) use the file name from the URL ? And even with --
trust-server-names we should take this name in case libmetalink doesn't give 
us a name ?

Regards, Tim

Attachment: signature.asc
Description: This is a digitally signed message part.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]