lynx-dev

Re: LYNX-DEV ISO-8859-2 HTML entities for HTMLDTD.c


From: Klaus Weide
Subject: Re: LYNX-DEV ISO-8859-2 HTML entities for HTMLDTD.c
Date: Sun, 9 Mar 1997 16:31:01 -0600 (CST)

On Sun, 9 Mar 1997, Hynek Med wrote:

> I have added ISO-8859-2 entities like &ccaron; to Klaus' charset patches. 
> I hope I got it all right - is &udie; an 'u' with diaeresis and is 'a'
> with double accents &adouble; ? Where can I get these standards? 

I don't know what &adouble; is.  
&udie; should be the same as &uuml;, and similarly
&adie; <-> &auml;, &die; <-> &uml;, etc., but usually the &Xuml; form is used
instead of &Xdie;.

Some pointers to sites where you can find info on SGML character entities,
including tables (but usually without cross-reference to Unicodes):
  <URL: http://www.bbsinc.com/iso8859.html>  
  <URL: http://ppewww.ph.gla.ac.uk/%7Eflavell/iso8859/iso8859-pointers.html>
  <URL: ftp://ftp.ifi.uio.no/pub/SGML/ENTITIES>

Have you encountered _any_ Web pages yet that use entities for (non-Latin1)
characters from iso-8859-2?

> It's strange anyway. &cacute; works, while &tacute; doesn't. &Aogon; 
> works, &aogon; doesn't. Either I forgot something (like sort it) or
> there's something wrong with the code.. 

The entity names in that table have to be sorted.  Lynx does a binary
lookup there for entities it encounters.  It will not be able to find
them if the order is wrong.  (My handful of examples also has a mistake
in that respect, see the last two entries..)

Piping your table through `sort' should probably do it...
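
To illustrate why an out-of-order table produces exactly this kind of
symptom, here is a minimal sketch of a binary lookup over a small name
table. It is not the actual Lynx code: the table contents, find_entity()
and sort_entities() are hypothetical names made up for the example, and
it assumes the lookup compares names byte by byte as strcmp() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entity name table, deliberately NOT in sorted order. */
static const char *names[] = {
    "Aogon", "tacute", "cacute", "Ccaron", "aogon"
};
#define N_NAMES (sizeof names / sizeof names[0])

/* Comparison callback for qsort(): plain byte-wise strcmp() order. */
static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort the table in place into strcmp() order. */
static void sort_entities(const char **table, size_t n)
{
    qsort(table, n, sizeof table[0], cmp_names);
}

/*
 * Binary lookup over table[0..n-1].  Returns the index of name, or -1.
 * The result is only reliable if the table is in strcmp() order.
 */
static int find_entity(const char *const *table, size_t n, const char *name)
{
    size_t lo = 0, hi = n;      /* search the half-open range [lo, hi) */

    assert(table != NULL && name != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(name, table[mid]);

        if (c == 0)
            return (int)mid;
        if (c < 0)
            hi = mid;           /* name would sort before table[mid] */
        else
            lo = mid + 1;       /* name would sort after table[mid] */
    }
    return -1;
}
```

With the table as written, find_entity() finds "cacute" and "Aogon" but
misses "tacute" and "aogon", even though both are present; after
sort_entities() all five names are found. If the real lookup also uses
byte order, the table should be sorted the same way (for example with
LC_ALL=C sort), since a locale-aware sort may order upper and lower case
differently from strcmp().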

[...]
> Trace output shows: 
> 
> SGML: Unknown entity Aogon so far, checking extra...
> SGML: Unknown entity Ccaron so far, checking extra...
> SGML: Unknown entity Ccaron
> SGML: Unknown entity uring so far, checking extra...
> SGML: Unknown entity uring
> SGML: Unknown entity udie so far, checking extra... 

Well those trace messages are probably not very useful.. :)
At least they show that the entities are unknown "so far", i.e. Lynx
couldn't figure out what to do with them in the "usual" way.

As far as I know, these entities (from Latin 2 etc.) are not in any
"official" HTML standard;  but Peter Flynn's HTML Pro (see Lynx
Links) "includes the whole of ISOlat1, ISOlat2, ISOnum, ISOpub and
ISOtech".  So a validator, given the right DOCTYPE, should be able to
validate text containing them.

Of course, if no other browser understands these entities, it doesn't make
much sense to create Web pages with them.  It may also not make sense to
have them in Lynx at all.  One potential problem is the interference with
URL query strings, which becomes more likely if there are more entities
recognized.
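
As an illustration of the query string problem: in a link such as
search?q=lynx&copy=2, a parser that accepts entity names without the
closing semicolon may read "&copy" as the copyright sign. Page authors
can avoid the clash by writing each "&" in a URL as "&amp;" when the URL
is placed in HTML. The following is only a sketch of that escaping step;
escape_ampersands() is a hypothetical helper, not something from Lynx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy url into out, writing every '&' as "&amp;".
 * Returns 0 on success, or -1 if out (outsize bytes, including the
 * terminating NUL) is too small; out is then left unterminated.
 */
static int escape_ampersands(const char *url, char *out, size_t outsize)
{
    size_t used = 0;

    assert(url != NULL && out != NULL);
    for (; *url != '\0'; url++) {
        size_t len = (*url == '&') ? 5 : 1;   /* strlen("&amp;") == 5 */

        if (used + len + 1 > outsize)         /* keep room for the NUL */
            return -1;
        if (*url == '&')
            memcpy(out + used, "&amp;", len);
        else
            out[used] = *url;
        used += len;
    }
    if (used + 1 > outsize)                   /* empty url, zero-size out */
        return -1;
    out[used] = '\0';
    return 0;
}
```

Escaping search?q=lynx&copy=2 this way gives search?q=lynx&amp;copy=2,
which no longer contains anything that looks like an entity name after a
bare ampersand.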

  Klaus

