lynx-dev GIFs corrupted with mime headers option


From: David Woolley
Subject: lynx-dev GIFs corrupted with mime headers option
Date: Wed, 2 Jun 1999 00:17:01 +0100 (BST)

Lynx 2.8 (Slackware 3.5 (Linux, kernel 2.0.34))

html2ps is a Perl utility that formats valid HTML, including some
well-behaved tables and images, into PostScript for printing or for the
creation of PDF.  It was used to create the official PostScript and
hyperlinked PDF versions of the HTML 4.0 specification.

Although it is not its first choice, one of the methods html2ps can use to
fetch URLs is to run Lynx.  It does this with the mime headers option
(-mime_header).  The version of Lynx included in Slackware 3.5 (July 1998)
fetches corrupt GIF images when run from html2ps.  Problem GIFs include
*the* W3C logo; the system is in the office, so I don't have the exact
URL to hand.

My provisional diagnosis is that the problem is related to this code
in HTTP.c (this version taken from 2.7.2, which is the latest for which
I have exploded source):

    /*
    **  Set up the stream stack to handle the body of the message.
    */
    if (do_head || keep_mime_headers) {
        /*
        **  It was a HEAD request, or we want the headers and source.
        */
        start_of_data = line_kept_clean;
        length = strlen(start_of_data);
        format_in = HTAtom_for("text/plain");
    }

My guess is that setting format_in to text/plain causes newlines to be
canonicalised, even in the binary part of the response.  I suspect that,
once the end of the headers is reached, format_in needs to be reset from
the Content-Type given in those headers.
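
To illustrate the suspected mechanism, here is a minimal, standalone
sketch.  It is not Lynx code: canonicalise_newlines(), find_body() and
headers_declare_text() are hypothetical helpers invented for this example,
and the CR LF to LF rewrite is only an assumption about what a text/plain
stream might do.  It shows that such a rewrite changes the bytes of a
binary body, and how the Content-Type in the headers could be used to
decide whether the body should be treated as text at all.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical text-mode filter (an assumption, not Lynx's actual code):
 * collapse each CR LF pair to a single LF, in place; return the new length.
 */
size_t canonicalise_newlines(unsigned char *buf, size_t len)
{
    size_t in, out = 0;

    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;           /* drop the CR, keep the LF that follows */
        buf[out++] = buf[in];
    }
    return out;
}

/*
 * Return a pointer to the first body byte of a raw HTTP response, i.e. the
 * byte after the blank line that ends the headers, or NULL if none is found.
 */
const unsigned char *find_body(const unsigned char *resp, size_t len)
{
    size_t i;

    for (i = 0; i + 3 < len; i++) {
        if (memcmp(resp + i, "\r\n\r\n", 4) == 0)
            return resp + i + 4;
    }
    return NULL;
}

/* Case-insensitive test that the n bytes at s begin with prefix. */
static int has_prefix_ci(const unsigned char *s, size_t n, const char *prefix)
{
    size_t i;

    for (i = 0; prefix[i] != '\0'; i++) {
        if (i >= n || tolower(s[i]) != tolower((unsigned char) prefix[i]))
            return 0;
    }
    return 1;
}

/*
 * Return 1 if the header block declares a Content-Type whose major type
 * is "text", otherwise 0 (including when no Content-Type is present).
 */
int headers_declare_text(const unsigned char *hdr, size_t hdr_len)
{
    size_t i = 0;

    while (i < hdr_len) {
        /* i is at the start of a header line here */
        if (has_prefix_ci(hdr + i, hdr_len - i, "content-type:")) {
            size_t j = i + strlen("content-type:");

            while (j < hdr_len && (hdr[j] == ' ' || hdr[j] == '\t'))
                j++;
            return has_prefix_ci(hdr + j, hdr_len - j, "text/");
        }
        while (i < hdr_len && hdr[i] != '\n')
            i++;                /* skip to the end of this line */
        i++;                    /* step over the LF */
    }
    return 0;
}
```

A caller would locate the body with find_body(), pass the header part to
headers_declare_text(), and only run a text filter such as
canonicalise_newlines() when that returns 1.  For an image/gif response the
body would then be passed through untouched.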

I will probably try to use wget to do the fetches to get round this one,
rather than personally diagnosing the problem further in Lynx.

For those interested in html2ps, it:

- has some CSS knowledge, and uses a CSS subset for its own 
  configuration;
- attempts to format tables which are real tables, not just for
  layout;
- can follow links to assemble documents from parts;
- can generate tables of contents, including PDF marks to allow the
  generation of an outline for Acrobat;
- generates PDF marks for links to preserve hyperlinks in PDF;
- generates page numbers for internal links in the printed text;
- can generate DSC PostScript, allowing the use of programs like
  psutils to make booklets from documents;
- can include images, with some restrictions on positioning;
- is free (GPL).

One negative point is that it uses large amounts of virtual memory.

Some of these functions require Ghostscript, and PDF generation requires
a modern PostScript, or Acrobat Distiller.  I'm afraid I don't have
the canonical html2ps site to hand.

CSS is Cascading Style Sheets.  PDF is Adobe's openly specified Portable
Document Format.  DSC is the Document Structuring Conventions: the %%
comment lines in PostScript which allow tools to split the PostScript
file without interpreting the actual PostScript.
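
As a rough illustration of why DSC comments make this possible, the sketch
below counts pages in a PostScript buffer purely by looking for lines that
begin with the "%%Page:" comment, without interpreting any PostScript.
The function name count_dsc_pages() is made up for this example, and the
sketch assumes the file really is DSC-conforming; it is not how psutils
is implemented.

```c
#include <stddef.h>
#include <string.h>

/* Count "%%Page:" DSC comments that start a line in a PostScript buffer. */
size_t count_dsc_pages(const char *ps, size_t len)
{
    static const char marker[] = "%%Page:";
    const size_t mlen = sizeof marker - 1;
    size_t i, pages = 0;
    int at_line_start = 1;

    for (i = 0; i < len; i++) {
        if (at_line_start && len - i >= mlen
            && memcmp(ps + i, marker, mlen) == 0)
            pages++;
        /* the next byte starts a line if this one ends a line */
        at_line_start = (ps[i] == '\n' || ps[i] == '\r');
    }
    return pages;
}
```

A tool that rearranges pages would record the offset of each such line
rather than just counting them, and then copy the byte ranges between
offsets in the new order.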
