Re: lynx-dev lynx2.8.2dev.19 patch #6 (em dash = --)


From: Klaus Weide
Subject: Re: lynx-dev lynx2.8.2dev.19 patch #6 (em dash = --)
Date: Tue, 16 Mar 1999 12:18:04 -0600 (CST)

On Mon, 15 Mar 1999, Leonid Pauzner wrote:

> 15-Mar-99 05:12 Klaus Weide wrote:
> > On Mon, 15 Mar 1999, Leonid Pauzner wrote:
> 
> >> * &mdash; (&#x2014) now display as "--" (popular requests) - LP
> >> * &emsp; now display as two &ensp; (popular requests),
> >>   previous definition of HT_EM_SPACE now renamed to HT_EN_SPACE. - LP
> 
> > I remember seeing requests for the "--", but not for the &emsp; change.
> > Is this really a good idea?   Maybe the &mdash; should be changed, but
> > &emsp; left as it is?
> 
> > The change introduces "  " in some places, but in some contexts this might
> > be collapsed later into one space anyway (possibly in attributes?).
> 
> The current &emsp change affect only one line in 
> SGML.c/put_special_unicodes().

But the changes you made were more extensive...

> No problem to change it back but hope &emsp should be consistent with &mdash.

That's one way to see it.  The other is that &emsp should be consistent with
the principle that a character is a character, and a space is a space.

> Attributes were not changed last year but occasionally they expand
> &emsp to two spaces from def7_uni.tbl - try test/spaces.html.
> I am not sure whether there is a context where _this_ string
> might be collapsed (and require fixing),
> but probably we had even less requests addressed for this problem :)

Ok, I remember that I saw that a long time ago, and never bothered fixing
it.  Since I put the "  " in def7_uni.tbl, I guess I shouldn't complain
about it now. :)
Well I never claimed the original tables were more than samples, that
people should improve to make them more useful... (which you are doing).

So, as far as I can reconstruct,
- I put the "  " (two spaces) as replacement string for U+2003 (EM SPACE)
  in def7_uni.tbl.
- That didn't really get used by lynx in most common situations, but only
  for attribute handling (somewhere else, maybe?).  Instead, the lynx
  code recognized &emsp; (and other forms for it) specially, and
  translated it to special char value '\002' HT_EM_SPACE.
- You now made things more consistent between the def7_uni table and what
  normally occurs, by changing the special handling, such that '\002' is
  called HT_EN_SPACE, and &emsp; gets specially translated to two of
  those.
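
To make the before/after concrete, here is a minimal sketch in C.  It is
not the actual lynx code: the function names expand_emsp_old() and
expand_emsp_new() and the buffer handling are made up for illustration.
Only the value '\002' and the names HT_EM_SPACE / HT_EN_SPACE come from
the reconstruction above.

```c
#include <stddef.h>

/* Value taken from the discussion above; everything else is hypothetical. */
#define HT_EN_SPACE '\002'   /* this value was called HT_EM_SPACE before */

/* Old behaviour (sketch): &emsp; becomes ONE special char. */
size_t expand_emsp_old(char *out)
{
    out[0] = '\002';         /* then named HT_EM_SPACE */
    out[1] = '\0';
    return 1;                /* number of chars written */
}

/* New behaviour (sketch): &emsp; becomes TWO HT_EN_SPACE chars. */
size_t expand_emsp_new(char *out)
{
    out[0] = HT_EN_SPACE;
    out[1] = HT_EN_SPACE;
    out[2] = '\0';
    return 2;                /* number of chars written */
}
```

The point of the sketch is only that the same byte value now carries a
different name, and that one input character now yields two output chars.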

The whole process is still a bit of a mess - to not lose information
(and enable lossless back-translation, say for form submissions), we'd
need HT_EM_SPACE and HT_EN_SPACE and a bunch of other specials.
(Since &emsp; seems to be more often used, there may be some lossage
from now treating &ensp; as the primitive instead of &emsp;.  Well that's
all very theoretical.)
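
A rough sketch of that lossage, again in C and again not lynx code:
to_internal() and from_internal() are hypothetical helpers, and I assume
for the sketch that the only special char available is HT_EN_SPACE
('\002') and that all other input is plain ASCII.

```c
#include <stddef.h>

#define HT_EN_SPACE '\002'   /* the only special char in this sketch */

/* Forward: Unicode code points -> internal chars (hypothetical helper). */
size_t to_internal(const unsigned *cp, size_t n, char *out)
{
    size_t i, k = 0;

    for (i = 0; i < n; i++) {
        if (cp[i] == 0x2003) {          /* EM SPACE: two en-space specials */
            out[k++] = HT_EN_SPACE;
            out[k++] = HT_EN_SPACE;
        } else if (cp[i] == 0x2002) {   /* EN SPACE: one special */
            out[k++] = HT_EN_SPACE;
        } else {
            out[k++] = (char) cp[i];    /* sketch only: assumes ASCII */
        }
    }
    return k;                           /* number of chars written */
}

/* Backward: internal chars -> code points (hypothetical helper). */
size_t from_internal(const char *in, size_t n, unsigned *cp)
{
    size_t i;

    for (i = 0; i < n; i++)
        cp[i] = (in[i] == HT_EN_SPACE) ? 0x2002u : (unsigned char) in[i];
    return n;                           /* number of code points written */
}
```

Feeding "a", U+2003, "b" through both steps gives back "a", U+2002,
U+2002, "b": the back-translation cannot tell one EM SPACE from two
EN SPACEs, which is why a separate HT_EM_SPACE would be needed to stay
lossless.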

Anyway, my reservation about always &emsp -> "  " (two spaces) is mostly
that this is another step in a direction I don't want: to give authors
more detailed layout control than is good for us.  Some author thinks
an &emsp; makes something "look better" in their GUI browser; should
lynx try to follow along, or should it just try to render the
"meaning" (whatever *that* means) of documents?

I see &emsp; here as different from &mdash;.  &mdash; is a different
character from '-' whose meaning could be confused.  &emsp; isn't
usually used to convey a different meaning from a single space, it's
just a formatting tweak.

Besides, the character translations are also used within PRE etc., and
even in translated plain text, so there is a good reason to leave
one fixed-width char translated to one fixed-width char as often
as possible.

Anyway, do what you want with this case, as you'll do anyway, :)
since the .tbl file already had two spaces anyway...

Btw., 

   U+2001:  
   U+2003:  

(there are two significant spaces at the end of both lines)
should probably become

   U+2001 "  "
   U+2003 "  "

for better survival chances in mailed patches etc.


   Klaus
