lynx-dev
Re: LYNX-DEV BUG? Problem with suggested name for downloaded compressed


From: Klaus Weide
Subject: Re: LYNX-DEV BUG? Problem with suggested name for downloaded compressed files
Date: Tue, 23 Sep 1997 19:23:25 -0500 (CDT)

On Wed, 24 Sep 1997, Dick Wesseling wrote:

> address@hidden said:

(Actually I didn't write any of the lines you quote)

> > On Mon, 22 Sep 1997, WWW server manager wrote: [...]
> > > While it is reasonable to make use of Content-Encoding to uncompress a
> > > file for display,
> Yes.
> 
> > and then to strip the suffix implying compression from the
> > > default file name for saving the results via p(rint),
> 
> I don't agree. The way I interpret the content-encoding header is that 
> it indicates that an extra level of compression is applied to the 
> content either by the HTTP server or by an intermediate gateway and 
> that the user agent should always decompress the content. Notice that I 
> am talking about *content*, not filenames. 

In other words, a different topic.

> In other words, 
> content-encoding is a transport option.

That sounds more like Transfer-Encoding.

> For a real-world example try http://www.nwsbank.nl/. When that server 
> (my server) recognizes accept-encoding: gzip it attempts to compress 
> the contents with gzip.

Nice.

> This works fine when *viewing* a file with 
> Lynx. When *downloading* the file Lynx always dumps the compressed data 
> to disk, not what I had in mind when I did the server-side compression.

What's bad about it, and why does it matter to you?  If a user explicitly
'd'ownloads something, user probably wants a file on his/her disk.
It shouldn't be any concern of yours in which form a user stores his/her
downloaded files.

> Now consider the following scenario: a Lynx user requests a .gz file. 
> Now the server attempts to compress a file that is already gzipped. Of 
> course it is smart enough to use the original file when it can't be 
> compressed any further, but for the sake of the argument assume that it 
> always sends the compressed contents together with a Content-encoding: 
> gzip header and a Content-type: application/gzip header. After 
> ungzipping the content Lynx will still have a .gz file.

1. You admit that this is an unrealistic example without practical
   use, right?
2. There is no such thing as "Content-type: application/gzip", at least
   not officially (and you seem to take the specs seriously); the correct
   way (theoretically) to label an HTTP message with this content would be

      Content-type: <a-content-type>
      Content-encoding: gzip, gzip

   meaning that the content of type <a-content-type> has been gzipped
   twice.  I doubt whether any HTTP client understands this (Lynx
   certainly doesn't), but then it's a useless example anyway.

   Or you might say

      Content-type: application/octet-stream
      Content-encoding: gzip

   Or write a draft about what "application/gzip" should mean and submit it
   to IANA.  But I guess you would have to make a good argument why it is
   worthwhile to introduce a way of labelling gzip'd files with

      Content-type: application/gzip

   when there already is a way to label it and say _more_ about the
   content, with (e.g.)

      Content-type: text/html
      Content-encoding: gzip

3. In practice I believe this is all irrelevant.  There are far too many
   (misconfigured) servers which send

      Content-type: application/x-gzip
      Content-encoding: gzip

   or similar for files which have been encoded only once (of course),
   and I don't think they will all go away anytime soon.  I think the
   only reasonable thing to do here is what Lynx does: offer to download
   without trying to unravel the mess.
   (If you explicitly define a VIEWER for application/x-gzip you may get
   different results, I haven't tried it...)
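
   The labelling cases in points 2 and 3 can be sketched in a few lines.
   This is only an illustration, not Lynx's code: the function names
   parse_content_codings and looks_double_labelled are hypothetical, and
   treating "x-gzip" as an alias for "gzip" and "application/gzip" as a
   gzip media type are assumptions made for the example.

```python
# Hypothetical helpers, not taken from any real HTTP client.

# Assumption: "x-gzip" is treated as equivalent to "gzip".
GZIP_CODINGS = {"gzip", "x-gzip"}
# Assumption: media types that themselves just mean "a gzip file".
GZIP_MEDIA_TYPES = {"application/x-gzip", "application/gzip"}


def parse_content_codings(header_value):
    """Split a Content-Encoding value such as "gzip, gzip" into a list of
    lower-case codings, in the order they were applied."""
    return [c.strip().lower() for c in header_value.split(",") if c.strip()]


def looks_double_labelled(content_type, content_encoding):
    """Return True when gzip is named in both headers, i.e. the case where
    a file encoded once is probably being described twice."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    codings = parse_content_codings(content_encoding or "")
    return media_type in GZIP_MEDIA_TYPES and any(
        c in GZIP_CODINGS for c in codings
    )
```

   A client using such a check could simply offer the body for download
   as received whenever looks_double_labelled(...) is true, instead of
   guessing how many times the data was really compressed.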

> If my reading of the specs is correct then Content-encoding is a 
> transport option only and it should be handled in a transparent way by 
> the user agent. 

Again, this sounds more like Transfer-Encoding.

> So, modifying file names based on content-encoding 
> headers is not a good idea.

Now you are (briefly) back to the original topic, but it isn't clear what
one has to do with the other.

I think you are trying to read (much) too much into the specs:

> However, I'm not sure whether my interpretation of the specs is 
> correct. Section 14.5 of draft-ietf-http-v11-spec-07 (I know, expired 
> half a year ago), says:
> 
> > The Content-Encoding entity-header field is used as a modifier to
> > the media-type. When present, its value indicates what additional
> > content codings have been applied to the entity-body, and thus what
> > decoding mechanisms MUST be applied in order to obtain the
> > media-type referenced by the Content-Type header field.
> > Content-Encoding is primarily used to allow a document to be
> > compressed without losing the identity of its underlying media type.
> 
> This supports what I claimed above, I read: "additional content coding" 
> as: "done by the server or gateway". If the underlying media type is 
> application/gzip then Lynx must decode to obtain application/gzip.

Don't put too much value on that MUST.  The HTTP spec doesn't say anything
about what a client has to do with a given message content - there is no
requirement that text/html has to be parsed or image/gif has to be
displayed etc., a client can do what the user or implementer chooses with
the content (including storing to disk, printing, whatever you like).  
IF you WANT to "obtain the media-type referenced by the Content-Type
header field", THEN the spec says how to do it.  But it doesn't say you
have to want to.  (And in practice this fails with servers which send gzip
in both c-t and c-e headers, as noted above.)
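
For a client that does want to obtain the media type, the decoding step
might look like the sketch below.  It assumes the codings are listed in
the order they were applied, so they are undone in reverse; decode_body
is a hypothetical name, and only gzip and deflate are handled here.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Undo the listed content codings, last-applied first, and return
    the bytes of the underlying media type."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        if coding in ("gzip", "x-gzip"):
            body = gzip.decompress(body)
        elif coding == "deflate":
            # Assumption: "deflate" bodies use the zlib container format.
            body = zlib.decompress(body)
        else:
            raise ValueError("unsupported content coding: " + coding)
    return body


# Usage: content that really was gzipped twice, labelled "gzip, gzip".
original = b"<html>hello</html>"
twice = gzip.compress(gzip.compress(original))
assert decode_body(twice, "gzip, gzip") == original
```

Note that this only works when the headers are truthful; with the
double-labelled responses mentioned above, the result is not what the
Content-Type promises.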

Whether an encoding has been applied on the fly by the origin server or
before a file was stored on disk is irrelevant from the point of view of
the HTTP protocol.  The word "additional" doesn't imply anything about
that.

(Well, these are my opinions; you may want to ask the authors instead.)

> However, a few lines later the same document says:
> 
> > The Content-Encoding is a characteristic of the entity identified by
> > the Request-URI. Typically, the entity-body is stored with this
> > encoding and is only decoded before rendering or analogous usage.
> 
> This seems a contradiction. First they are talking about "additional" 
> coding and next they're talking encoding being a characteristic (i.e. 
> not something that is added later) of the entity. Now what am I to 
> believe?

Maybe the word additional is unfortunate.  But I don't think it means what
you think.

Basically IMO a client is free to uncompress or not, as far as HTTP is
concerned, according to what seems to make more sense.
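
One possible policy along those lines, purely as a sketch: decode when
the user views a document (and drop the compression suffix from the
suggested name for saving the result), but store the bytes as received
on an explicit download.  The function plan_transfer, the action names
"view" and "download", and the suffix list are all assumptions for the
example, not a description of Lynx internals.

```python
import os

# Assumption: file name suffixes that imply compression.
COMPRESSION_SUFFIXES = (".gz", ".Z")


def plan_transfer(action, filename, content_encoding):
    """Return (decode, suggested_name) for a hypothetical client.

    action is "view" or "download"; content_encoding is the header
    value, or an empty string when the header is absent.
    """
    encoded = bool(content_encoding and content_encoding.strip())
    if action == "view" and encoded:
        root, ext = os.path.splitext(filename)
        # The decoded document no longer matches a ".gz"-style name.
        name = root if ext in COMPRESSION_SUFFIXES else filename
        return True, name
    # Download, or nothing to decode: keep the bytes and the name as-is.
    return False, filename
```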

   Klaus

