RE: [Groff] illegal input characters in troff
From: Ted Harding
Date: Thu, 09 Mar 2000 00:12:52 -0000 (GMT)
On 08-Mar-00 Werner LEMBERG wrote:
>
> The following input characters are illegal and will be ignored,
> causing a warning message: 0x00, 0x0B, 0x0D-0x1F, 0x80-0x9F.
>
> Does this make sense at all? At least for some Asian encodings (like
> SJIS for Japanese or Big 5 for Chinese), and of course for UTF-8, we
> need the full 0x80-0xFF range.
>
> Which characters are allowed in UNIX troff?
>
> I suggest to change this and make all characters legal if not in
> compatibility mode.
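Here is a minimal Python sketch of the rule as Werner states it, and of his proposal. The names `is_illegal` and `filter_input` and the `compat` flag are hypothetical; this is not groff's code. The example shows why dropping 0x80-0x9F hurts UTF-8: the euro sign (U+20AC) encodes as 0xE2 0x82 0xAC, and its middle byte falls in the rejected range.

```python
def is_illegal(byte: int, compat: bool = True) -> bool:
    """Hypothetical check for one input byte.

    compat=True applies the current rule: 0x00, 0x0B, 0x0D-0x1F and
    0x80-0x9F are illegal.  compat=False models the proposal that
    every byte is legal outside compatibility mode.
    """
    if not compat:
        return False
    return byte in (0x00, 0x0B) or 0x0D <= byte <= 0x1F or 0x80 <= byte <= 0x9F


def filter_input(data: bytes, compat: bool = True) -> bytes:
    """Drop illegal bytes (troff would also emit a warning for each)."""
    return bytes(b for b in data if not is_illegal(b, compat))


euro = "\u20ac".encode("utf-8")                  # b'\xe2\x82\xac'
print(filter_input(euro))                        # b'\xe2\xac' (broken UTF-8)
print(filter_input(euro, compat=False) == euro)  # True
```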
In my view there could be a few slightly delicate issues here,
though as a general principle I'm in favour of allowing everything
(largely for the reasons Werner gives).
As far as I know, UNIX troff has always excluded 0x80-0xFF,
and has always documented the range 0x00-0x1F as follows:
  The ASCII control characters horizontal tab [0x09] and SOH [0x01]
  [tabs and leaders] and backspace [0x08] are discussed elsewhere.
  The newline [0x0A] delimits input lines. In addition, STX [0x02],
  ETX [0x03], ENQ [0x05], ACK [0x06] and BEL [0x07] are accepted,
  and may be used as delimiters or translated into a graphic with
  'tr'. All others are ignored.
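The documented UNIX troff treatment of these bytes can be summarised as a small classification, sketched here in Python. The function name and the category labels are hypothetical, not troff's; printable ASCII is assumed to be "ordinary", and DEL (0x7F), which the quoted text does not mention, is left "unspecified".

```python
# Control characters singled out by the quoted UNIX troff documentation.
SPECIAL = {0x01: "SOH (leader)", 0x08: "backspace", 0x09: "tab", 0x0A: "newline"}
DELIMITERS = {0x02: "STX", 0x03: "ETX", 0x05: "ENQ", 0x06: "ACK", 0x07: "BEL"}


def unix_troff_class(byte: int) -> str:
    """Classify one input byte as the quoted documentation describes."""
    if byte in SPECIAL:
        return "special"       # tabs, leaders, backspace, newline
    if byte in DELIMITERS:
        return "delimiter"     # accepted: usable as delimiter or via .tr
    if byte < 0x20 or byte >= 0x80:
        return "ignored"       # all other control codes, and 0x80-0xFF
    if byte == 0x7F:
        return "unspecified"   # DEL: not covered by the quoted text
    return "ordinary"          # printable ASCII (assumption)


# Low control codes that UNIX troff silently drops.
ignored = [hex(b) for b in range(0x20) if unix_troff_class(b) == "ignored"]
print(ignored)
```

Note that 0x04 (EOT) and 0x0C (form feed) are ignored here but are not in Werner's list of bytes groff currently rejects.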
The first delicate point is what (if anything) to do with \r [0x0D].
At present, in groff, eqn reports the error
   illegal input character code `13'
whereas troff ignores it (with a warning, per Werner's note above);
I think all components of groff should treat it the same way. So:
should it be ignored, so that \n, \r\n and \n\r produce the same
result, as at present? Should it (perhaps as an option) be treated
as equivalent to \n, as on the Mac and in some other software, which
could make file interchange easier? Or should \n, \r\n, \n\r and \r
all be equivalent, as in PostScript, and all be translated internally
to \n?
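The three options for \r can be compared with a short Python sketch operating on raw input bytes. `normalize_eol` and its mode names are hypothetical. For the "equivalent" mode it assumes that a CR+LF or LF+CR pair is folded into a single newline while scanning left to right, so a blank line written as \r\n\r\n still yields two newlines.

```python
def normalize_eol(data: bytes, mode: str = "equivalent") -> bytes:
    """Hypothetical line-end handling for CR (0x0D).

    "ignore":     drop every CR (the present behaviour, minus the warning).
    "newline":    treat each CR as a LF, as in classic Mac text files.
    "equivalent": LF, CR+LF, LF+CR and a lone CR each become one LF.
    """
    if mode == "ignore":
        return data.replace(b"\r", b"")
    if mode == "newline":
        return data.replace(b"\r", b"\n")
    if mode != "equivalent":
        raise ValueError(f"unknown mode: {mode!r}")

    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c in (0x0A, 0x0D):
            out.append(0x0A)
            # Fold a mixed pair (CR+LF or LF+CR) into a single line end;
            # two identical line-end bytes remain two line ends.
            if i + 1 < n and data[i + 1] in (0x0A, 0x0D) and data[i + 1] != c:
                i += 2
                continue
        else:
            out.append(c)
        i += 1
    return bytes(out)


print(normalize_eol(b"a\r\nb"))             # b'a\nb'
print(normalize_eol(b"a\r\nb", "newline"))  # b'a\n\nb'
```

The "newline" mode shows the cost of the second option: a DOS-style \r\n file would gain an extra blank line at every line end, which is what the third option avoids.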
Next: should ALL the currently ignored characters be translatable?
That is, by default troff would "see" them but they would have no
effect; however, if translated by ".tr", defined by ".char", or present
at the appropriate numbered position in the device font files, they
could be assigned to a graphic. I could be in favour of this option.
However, I think that the characters not currently ignored
(STX [0x02], ETX [0x03], ENQ [0x05], ACK [0x06] and BEL [0x07])
should retain their current status (i.e. usable as delimiters or
translatable), to avoid breaking software which depends on it:
"tbl" uses them, for instance, and I think "refer" may do so as well.
In any case, I'm definitely in favour of ALL input in the range
0x80-0x9F being available to troff on the same basis as just
described (i.e. with null effect unless defined by ".char",
translated by ".tr", or present in the device font files with
the appropriate numerical code).
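The proposed treatment (a byte is seen but inert unless something gives it a meaning) can be sketched as a lookup. This models the proposal only, not groff's implementation: `resolve_input_byte` and the three plain dictionaries standing in for ".tr" translations, ".char" definitions and device-font entries are hypothetical, and the order in which they are consulted is an assumption, since the proposal does not specify one.

```python
from typing import Mapping, Optional


def resolve_input_byte(
    byte: int,
    tr_table: Mapping[int, str],
    char_defs: Mapping[int, str],
    font_codes: Mapping[int, str],
) -> Optional[str]:
    """Return the glyph an otherwise-ignored byte produces, or None.

    Applies equally to ignored control codes and to 0x80-0x9F.
    Lookup order (.tr, then .char, then font file) is an assumption.
    """
    if byte in tr_table:      # translated with .tr
        return tr_table[byte]
    if byte in char_defs:     # defined with .char
        return char_defs[byte]
    if byte in font_codes:    # listed at that code in the device font file
        return font_codes[byte]
    return None               # "seen" by troff, but with null effect


print(resolve_input_byte(0x0B, {}, {}, {}))            # None
print(resolve_input_byte(0x85, {0x85: "bu"}, {}, {}))  # bu
```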
For what it's worth ...
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <address@hidden>
Date: 09-Mar-00 Time: 00:12:51
------------------------------ XFMail ------------------------------