[Groff] non-ASCII chars and grohtml
From: Werner LEMBERG
Subject: [Groff] non-ASCII chars and grohtml
Date: Wed, 24 Nov 2004 09:47:42 +0100 (CET)
Gaius,
if I say
\X'html:ü'
I get
x X html:ü
in the intermediate troff output file. In other words, the \X
escape passes `ü' through unmodified. This is a problem, since grohtml
expects ASCII input only. GNU troff has no way to
convert `ü' to `\[:u]' in the `mouth' (to use TeX's terminology), so I
suggest that you add a warning to grohtml, something like this:
Charset `US-ASCII' doesn't contain character code 0xFC (`ü')
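To make the suggested check concrete, here is a minimal sketch in Python (not grohtml's actual C++ code). The function name `ascii_warnings` and its interface are hypothetical; only the wording of the message is taken from the example above.

```python
def ascii_warnings(payload, charset="US-ASCII"):
    """Return one warning string per character >= 0x80 found in payload.

    `payload` stands for the text of an `x X html:...' device command.
    The function name and signature are assumptions for illustration.
    """
    warnings = []
    for ch in payload:
        code = ord(ch)
        if code >= 0x80:  # outside the 7-bit ASCII range
            warnings.append(
                "Charset `%s' doesn't contain character code 0x%02X (`%s')"
                % (charset, code, ch)
            )
    return warnings


if __name__ == "__main__":
    for message in ascii_warnings("html:ü"):
        print(message)
```

A pure-ASCII payload yields an empty list, so the caller only has to print whatever comes back.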
Additionally, we need a new tag `html:charset' which sets the
`charset' attribute in the <meta> element. Then a string
`.input-encoding' (the leading dot indicates that this string is
meant to be read-only) should be added to the latinX.tmac files;
www.tmac can then use it to set the tag automatically:
.tag "html:charset \*[.input-encoding]
The whole issue is a bit tricky; for example, I suggest allowing at
most one call to `.tag html:charset...' for simplicity. Another
problem is how to determine the valid character ranges -- should this
be built into grohtml? Or should my proposed html:charset tag look
like this:
html:charset <name> <start1> <end1> <start2> <end2> ...
so that grohtml can be dumb, and the latinX.tmac define the proper
ranges via \*[.input-encoding]?
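As a rough illustration of the `dumb grohtml' variant, the following Python sketch parses such a tag and tests a character code against the supplied ranges. The function names, the acceptance of both decimal and 0x-prefixed hexadecimal bounds, and the sample ranges in the usage lines are all assumptions; the proposal above does not fix a number format.

```python
def parse_charset_tag(arg):
    """Parse `html:charset <name> <start1> <end1> ...' into (name, ranges).

    Hypothetical helper; the bounds are assumed to be decimal or
    0x-prefixed hexadecimal integers, given as start/end pairs.
    """
    fields = arg.split()
    if len(fields) < 2 or fields[0] != "html:charset":
        raise ValueError("not an html:charset tag: %r" % arg)
    name, bounds = fields[1], fields[2:]
    if len(bounds) % 2 != 0:
        raise ValueError("range bounds must come in start/end pairs")
    # int(x, 0) accepts decimal as well as 0x-prefixed hexadecimal.
    values = [int(b, 0) for b in bounds]
    ranges = list(zip(values[0::2], values[1::2]))
    return name, ranges


def is_valid(code, ranges):
    """True if the character code lies inside any inclusive range."""
    return any(start <= code <= end for start, end in ranges)


if __name__ == "__main__":
    # Sample ranges chosen for illustration only.
    name, ranges = parse_charset_tag(
        "html:charset ISO-8859-1 0x20 0x7E 0xA0 0xFF")
    print(name, is_valid(0xFC, ranges), is_valid(0x80, ranges))
```

With this layout grohtml needs no built-in knowledge of any encoding; everything comes from the string that the latinX.tmac file defines.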
Of course, the simplest solution is to disallow characters >= 0x80
completely in the `html:...' tag, but a user may wonder why she can
use `ü' everywhere in the document except in .URL and friends (and
switching to UTF-8 in the future would need additional changes).
Werner