bug-wget
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Bug-wget] [PATCH] Bug 40426 follow-up


From: Ander Juaristi
Subject: [Bug-wget] [PATCH] Bug 40426 follow-up
Date: Thu, 12 Mar 2015 12:23:15 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

This is my first attempt to fix 40426. I'm expecting to continue to work on it 
until it's fixed, unless someone has objections.

Source of the problem:

When -r and -O - are specified together, Wget hangs. Well, it really doesn't 
hang, it's actually waiting for input. When Wget downloads a file it saves it 
to the disk first and then reads it again from there to parse it and get more 
URLs, if any. But when '-O -' was specified, Wget reads from stdin. This is 
done in wget_read_file(). As a funny game, when this happens, type some HTML, 
and then hit Ctrl-D or any other key sequence equivalent to EOF, and you'll 
effectively trick Wget into inserting arbitrary HTML.

Solution:

A patch was already proposed that simply avoided '-r' and '-O -' to be set together. But that would 
void a clause in the documentation that states that "wget -O file http://foo"; is intended to 
work like "wget -O - http://foo > file". So, my approach is to maintain that.

Initially I thought of redirecting stdout to stdin, but that would be an ugly 
hack that would probably make things difficult in the future. What's more, 
there are options that rely on stdin, such as '--input-file=-'.

So, what I did is to write to a regular file in /tmp/.wget.stdout instead of in 
stdout when '-O -' is passed, and dump everything in the end. This, apart from 
fixing the bug, maintains the documented behavior.

Although this is a good workaround for the short term, IMO the best approach is 
to keep everything in memory and write stuff out in the end. Something similar 
was already reported at 20714.

I don't expect this patch to be definite. This is just a follow-up to my work. 
I'm looking forward to your feedback as I sure haven't taken into account all 
the consequences.

So far, the following are still to be done:
  - /tmp/.wget.stdout won't work on Windows, so port it.
  - FOPEN_OPT_ARGS is not taken into account.
  - I only took into account HTTP. The same workaround should be applied to FTP 
too.

Here goes:

diff --git a/src/http.c b/src/http.c
index b7020ef..bc7c1e9 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3080,7 +3080,7 @@ http_loop (struct url *u, struct url *original_url, char 
**newloc,
/* Set LOCAL_FILE parameter. */
   if (local_file && opt.output_document)
-    *local_file = HYPHENP (opt.output_document) ? NULL : xstrdup 
(opt.output_document);
+    *local_file = HYPHENP (opt.output_document) ? xstrdup (TMP_OUTFILE) : 
xstrdup (opt.output_document);
/* Reset NEWLOC parameter. */
   *newloc = NULL;
@@ -3101,7 +3101,7 @@ http_loop (struct url *u, struct url *original_url, char 
**newloc,
if (opt.output_document)
     {
-      hstat.local_file = xstrdup (opt.output_document);
+      hstat.local_file = HYPHENP (opt.output_document) ? xstrdup (TMP_OUTFILE) 
: xstrdup (opt.output_document);
       got_name = true;
     }
   else if (!opt.content_disposition)
diff --git a/src/main.c b/src/main.c
index b23967b..1bf7565 100644
--- a/src/main.c
+++ b/src/main.c
@@ -1582,7 +1582,17 @@ for details.\n\n"));
 #ifdef WINDOWS
           _setmode (_fileno (stdout), _O_BINARY);
 #endif
-          output_stream = stdout;
+          // TODO We should take care of FOPEN_OPT_ARGS
+          output_stream = fopen (TMP_OUTFILE, "wb+");
+          if (output_stream == NULL)
+            {
+              perror (TMP_OUTFILE);
+              exit (WGET_EXIT_GENERIC_ERROR);
+            }
+          /*
+           * We know it's a regular file. No need to check.
+           */
+          output_stream_regular = true;
         }
       else
         {
@@ -1762,8 +1772,24 @@ outputting to a regular file.\n"));
   if (opt.convert_links && !opt.delete_after)
     convert_all_links ();
+ /* If output file is stdout (`-O -' was specified), obey.
+     Print stuff out.  */
+  if (HYPHENP (opt.output_document) && output_stream && (total_downloaded_bytes 
> 0))
+    {
+      struct file_memory *fm = wget_read_file (TMP_OUTFILE);
+      if (fm)
+        {
+          write (fileno (stdout), fm->content, fm->length);
+          wget_read_file_free (fm);
+        }
+    }
+
   cleanup ();
+ /* Delete the temporary output file */
+  if (HYPHENP (opt.output_document) && unlink (TMP_OUTFILE))
+    logprintf (LOG_NOTQUIET, "Temporary output file was not deleted. Should be done 
by hand. %s\n", strerror (errno));
+
   exit (get_exit_status ());
 }
diff --git a/src/wget.h b/src/wget.h
index 8d2b0f1..9d40255 100644
--- a/src/wget.h
+++ b/src/wget.h
@@ -126,6 +126,12 @@ as that of the covered work.  */
#define DEBUGP(args) do { IF_DEBUG { debug_logprintf args; } } while (0) +/* If we're outputting to stdout ('-O -' was specified), we'd rather
+   write to a temporary file and output everything in the end, since
+   Wget still needs the downloaded files to be regular in order to
+   parse them.  */
+#define TMP_OUTFILE "/tmp/.wget.stdout"
+
 /* Pick an integer type large enough for file sizes, content lengths,
    and such.  Because today's files can be very large, it should be a
    signed integer at least 64 bits wide.  This can't be typedeffed to



--
Regards,
- AJ



reply via email to

[Prev in Thread] Current Thread [Next in Thread]