[Groff] Re: groff: radical re-implementation
From: Tomohiro KUBOTA
Subject: [Groff] Re: groff: radical re-implementation
Date: Tue, 17 Oct 2000 10:27:12 +0900
User-agent: Wanderlust/1.0.3 (Notorious) SEMI/1.12.1 ([JR] Nonoichi) FLIM/1.12.7 (Yūzaki) Emacs/20.7 (i386-debian-linux-gnu) MULE/4.1 (AOI)
Hi,
At Mon, 16 Oct 2000 16:41:35 +0200 (CEST),
Werner LEMBERG <address@hidden> wrote:
> [I'm CC'ing this mail to the groff@ mailing list. May I ask to move
> the discussion about improvments/changings of groff to this list?]
OK, I have joined the address@hidden mailing list. I am also sending
this message to the debian-i18n list to let people there know that I
agreed to move.
>> The ideal implementation will be using 'wchar_t' for reading.
> But this will fail for some compilers...
wchar_t is now supported by many systems. It is mandatory for
internationalization.
The merit of wchar_t is that you write the code once and it works for
every encoding, including UTF-8. Otherwise, you have to write
similar source code many times, for Latin-1, EBCDIC, UTF-8,
and so on. In particular, I insist that Groff should
support the EUC-* multibyte encodings for CJK languages. This is
something the current Groff cannot handle at all. (CJK people
also use ISO-2022-* encodings.)
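As a rough sketch of what "write once" means here, the loop below uses
the standard C function mbrtowc() to turn input in whatever encoding the
current locale specifies into wchar_t values. The helper name
decode_locale_string is hypothetical, not anything in Groff, and the
sketch assumes the caller has already run setlocale(LC_CTYPE, "") so
that LANG takes effect.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode a NUL-terminated multibyte string in the
   current LC_CTYPE encoding into wide characters.
   Stores at most cap wide characters in out.
   Returns the number stored, or (size_t)-1 on an invalid or
   truncated multibyte sequence. */
size_t decode_locale_string(const char *in, wchar_t *out, size_t cap)
{
    mbstate_t st;
    size_t left, n = 0;

    assert(in != NULL && out != NULL);
    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    left = strlen(in);

    while (left > 0 && n < cap) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, in, left, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete sequence */
        if (r == 0)
            break;                      /* decoded a NUL character */
        out[n++] = wc;
        in += r;
        left -= r;
    }
    return n;
}
```

The same loop serves Latin-1, EUC-JP, or UTF-8 input without change;
only the locale differs. Whether a given system's locale data covers a
particular encoding is a separate question that this sketch does not
address.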
The other merit of wchar_t is user-friendliness. Once a user
sets the LANG variable, all software works in the specified
encoding. Otherwise, you have to specify the encoding for every program.
We don't want to have ~/.groffrc, ~/.greprc, ~/.bashrc, ~/.xtermrc,
and so on, each specifying 'encoding=ISO8859-1' or 'encoding=UTF-8'.
>> Abolish device types of 'ascii', 'ascii8', 'latin1', 'nippon', and
>> 'utf8' and introduce a new device type such as 'tty'.
I suppose you don't know about the 'ascii8' device. It is a local
patch in Debian's Groff that is 8-bit clean (like latin1) but
doesn't assume that the 8-bit part is in the Latin-1 encoding. For example,
'-' is used for hyphenation and '\(co' is converted into '(C)'.
It is meant for 8-bit encodings other than Latin-1, i.e., ISO8859-2, -3, ...,
and KOI8-R (not for CJK multibyte encodings).
> Please bear in mind that groff shall work on non-GNU systems also! My
> idea is to only accept UTF8, ascii, latin1, and ebcdic as input
> encodings (the latter three for historical reasons only).
I wrote about Glibc because the message was addressed to a Debian
mailing list. Of course I am thinking of portability. wchar_t is
portable. I recommend implementing wchar_t as the new architecture and
keeping ascii, latin-1, and ebcdic as historical encodings. (We may add
'UTF8' as a historical one.)
I think what is 'historical' is the systems that don't support wchar_t.
> Maybe on systems with a recent glibc, iconv() and friends can be used
> to do more, but generally I prefer an iconv-preprocessor so that groff
> itself has not to deal with encoding conversions.
I think this works well. However, who invokes the iconv preprocessor?
The user or a wrapper program? What determines the command-line options
for iconv?
>> - Groff assumes the input as the encoding of current locale.
> This is probably not correctly set everywhere.
How can a user configure a system other than through the locale?
A user who wants to specify his/her language and encoding will set the
LANG variable. The alternative is having a ~/.foobarrc for every program,
or specifying --encoding=foobar every time (s)he invokes a program.
I think setting LANG is a reasonable way.
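To make the LANG argument concrete, here is a minimal sketch of how a
program could learn its input encoding from the locale. setlocale() is
ISO C and nl_langinfo(CODESET) is POSIX, so the latter is not available
on every system. The function name input_encoding, the idea of passing
in an explicit override string, and the ISO-8859-1 fallback are all
assumptions made for illustration.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: pick the input encoding name.
   An explicit option (e.g. from the command line) wins; otherwise the
   encoding of the user's LC_CTYPE locale is used, as set through LANG
   or LC_ALL.  Falls back to ISO-8859-1 (an assumption) when the locale
   cannot be set or reports no codeset. */
const char *input_encoding(const char *option)
{
    const char *codeset;

    if (option != NULL && option[0] != '\0')
        return option;

    /* "" means: take the locale from the environment. */
    if (setlocale(LC_CTYPE, "") == NULL)
        return "ISO-8859-1";

    codeset = nl_langinfo(CODESET);
    assert(codeset != NULL);    /* POSIX returns a string, possibly empty */
    return codeset[0] != '\0' ? codeset : "ISO-8859-1";
}
```

With LANG=ja_JP.eucJP a call such as input_encoding(NULL) would be
expected to report an EUC-JP codeset name, though the exact spelling of
that name varies between systems.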
One compromise would be:
- to use UCS-4 for internal processing, not wchar_t.
- to make only a small part of input and output encoding-sensitive.
- to add command options for the encodings of input and output.
- to introduce a compile-time option I18N.
- when I18N is off, the default input is latin-1 and the default
  output is also latin-1.
- when I18N is on, the default input and default output follow the
  LC_CTYPE locale.
- Of course these default encodings can be overridden by command
  options.
- Groff can be compiled with I18N off for systems without
  internationalization functions such as setlocale().
- to use iconv(3), if available (I18N=true), for converting between
  the input/output encodings and the internal UCS-4 encoding.
- if I18N is false, to hard-code the conversion process for
  Latin-1, EBCDIC, and UTF-8.
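The hard-coded path in the last item could look roughly like the sketch
below, which converts Latin-1 and UTF-8 input to UCS-4 without any
locale or iconv support. The function names are hypothetical, EBCDIC is
left out because it needs a code-page table, and the UTF-8 decoder
assumes the modern four-byte limit (code points up to U+10FFFF), which
is a choice made here rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Latin-1 bytes map one-to-one onto UCS-4 code points 0..255.
   out must have room for len entries.  Returns len. */
size_t latin1_to_ucs4(const unsigned char *in, size_t len, uint32_t *out)
{
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++)
        out[i] = in[i];
    return len;
}

/* Decode UTF-8 into UCS-4.  out must have room for len entries.
   Returns the number of code points, or (size_t)-1 on malformed,
   truncated, overlong, or surrogate input. */
size_t utf8_to_ucs4(const unsigned char *in, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;

    assert(in != NULL && out != NULL);
    while (i < len) {
        unsigned char b = in[i];
        uint32_t cp, min;
        size_t extra, k;

        /* The lead byte gives the sequence length and payload bits. */
        if (b < 0x80)                { cp = b;        extra = 0; min = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; min = 0x10000; }
        else return (size_t)-1;

        if (extra > len - i - 1)
            return (size_t)-1;              /* sequence runs past the end */

        for (k = 1; k <= extra; k++) {
            unsigned char c = in[i + k];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;          /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        /* Reject overlong forms, out-of-range values, and surrogates. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        out[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

With I18N on, the same UCS-4 buffer would instead be filled by
iconv(3), so the rest of the program would not need to know which path
produced it.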
Do you think this can be achieved?
---
Tomohiro KUBOTA <address@hidden>
http://surfchem0.riken.go.jp/~kubota/