lynx-dev

Re: wget, & and lynx (was Re: lynx-dev bug report)


From: Klaus Weide
Subject: Re: wget, & and lynx (was Re: lynx-dev bug report)
Date: Tue, 26 Oct 1999 00:16:19 -0500 (CDT)

On Mon, 25 Oct 1999, Doug Kaufman wrote:

> On Mon, 25 Oct 1999, Klaus Weide wrote:
> 
> > Lynx does not generate URLs for local files with unescaped "&" characters,
> > as far as I know.  At least not on Unix.  I don't trust what folks are
> > doing on Windows in this respect, but then I don't have to use it...
> 
> I didn't realize that handling was different in the Windows ports. To
> which code are you referring? If it is wrong, it should be fixed.

Well, I get the feeling that confusion is rampant, judging from all the
ad hoc patches regarding filenames scattered all over the place...  (not
really a new thing).  I don't know how "wrong" the resulting behavior is,
but I wouldn't trust it either.  A good measure would be how well lynx
deals with browsing and other operations on unusual filenames.

Here is one case where the Windows ports do it their own way, without any
technical reason afaics; it seems someone just didn't like to see escaped
characters.  In LYConvertToURL (LYUtils.c):

#if defined (DOSPATH) || defined (__EMX__)
                /* Don't want to see DOS local paths like c: escaped  */
                /* especially when we really have file://localhost/   */
                /* at the beginning.  To avoid any confusion we allow */
                /* escaping the path if URL specials % or # present.  */
                if (strchr(temp, '#') == NULL && strchr(temp, '%') == NULL)
                    StrAllocCopy(cp, temp);
                else
                    cp = HTEscape(temp, URL_PATH);
#else
                cp = HTEscape(temp, URL_PATH);
#endif /* DOSPATH */


A section a bit below that shows something is clearly wrong, in the
WIN_EX line:
                HTUnEscape(cp);   /* unescape given path without fragment */
                StrAllocCat(temp2, cp);         /* append to current dir */
                StrAllocCopy(cp2, temp2);       /* keep a copy in cp2 */
                LYTrimRelFromAbsPath(temp2);
#ifdef WIN_EX   /* 1998/07/31 (Fri) 09:09:03 */
                HTUnEscape(temp2);      /* for LFN */
#endif

The string that ends up in temp2 gets HTUnEscaped twice.  That cannot
be right.  (Well, except if it somehow got escaped twice before that, but
that doesn't seem to be the case and wouldn't make sense.)
Escaping and unescaping are not idempotent operations.  One has to keep
track of what a given string is at a given point (escaped or not).  For
most characters it isn't strictly necessary to always escape them (even
when theoretically they should be), but think of '%' itself.  (And if
that's a character that cannot occur in DOS/Windows filenames - I don't
know - that shouldn't really matter.)
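
To make the '%' case concrete, here is a minimal sketch.  The function
sketch_unescape is a hypothetical stand-in written for this illustration;
it is not Lynx's HTUnEscape, and it only assumes the usual %XX decoding.

```c
#include <ctype.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/*
 * Hypothetical stand-in for an unescape routine (NOT Lynx's HTUnEscape):
 * decodes every %XX sequence in place and returns its argument.
 */
static char *sketch_unescape(char *s)
{
    char *in = s;
    char *out = s;

    while (*in != '\0') {
        if (in[0] == '%'
            && isxdigit((unsigned char) in[1])
            && isxdigit((unsigned char) in[2])) {
            *out++ = (char) (hex_val((unsigned char) in[1]) * 16
                             + hex_val((unsigned char) in[2]));
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return s;
}
```

Take a file whose name literally contains the five characters a%20b.
Its escaped form is a%2520b.  One call to sketch_unescape gives back
a%20b, the real name.  A second call turns that into "a b", which names
a different file.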

Another thing that caught my eye is in HTDOS_wwwName:
(This is ifdef'd with SH_EX)

        case '\\':
        /* convert dos backslash to unix-style */
            *cp_url++ = '/';
            break;
        case ' ':
            *cp_url++ = '%';
            *cp_url++ = '2';
            *cp_url++ = '0';
            break;

So for some reason one character (space) gets URL-escaped here while
others don't.  What is the meaning of '%' characters in the resulting
mixed-nature string?  Then consider that the result of wwwName gets
fed to HTEscape in various places...
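
A small sketch of why the mixed string is a problem.  Both functions are
hypothetical stand-ins: sketch_www_name copies only the two cases quoted
above, and sketch_escape uses a simplified safe-character set that I made
up for the example, not HTEscape's actual table.

```c
#include <ctype.h>
#include <string.h>

/*
 * Stand-in for the quoted excerpt: backslash becomes '/', space becomes
 * "%20", everything else (including '%') is copied unchanged.
 * The url buffer must hold at least 3 * strlen(dos) + 1 bytes.
 */
static void sketch_www_name(const char *dos, char *url)
{
    for (; *dos != '\0'; dos++) {
        switch (*dos) {
        case '\\':
            *url++ = '/';
            break;
        case ' ':
            *url++ = '%';
            *url++ = '2';
            *url++ = '0';
            break;
        default:
            *url++ = *dos;
            break;
        }
    }
    *url = '\0';
}

/*
 * Simplified escaper (assumed safe set: alphanumerics and "/:._-").
 * Everything else, '%' included, becomes %XX.
 * The out buffer must hold at least 3 * strlen(in) + 1 bytes.
 */
static void sketch_escape(const char *in, char *out)
{
    static const char hex[] = "0123456789ABCDEF";

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char) *in;

        if (isalnum(c) || strchr("/:._-", c) != NULL) {
            *out++ = (char) c;
        } else {
            *out++ = '%';
            *out++ = hex[c >> 4];
            *out++ = hex[c & 0x0F];
        }
    }
    *out = '\0';
}
```

With these, the two distinct names dir\a b.txt and dir\a%20b.txt both
come out of sketch_www_name as dir/a%20b.txt, so the '%' no longer tells
you which file was meant.  Feeding that result to sketch_escape gives
dir/a%2520b.txt, which unescapes to a name containing a literal %20
rather than a space.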

Note that these are all changes that have been deliberately introduced.
It doesn't look like an accident.  The code for Unix is basically working
right and consistent afaik.  If Windows people prefer to go their own way
I can only hope they know what they're doing.

   Klaus

