From: Dave
Subject: [bug #42870] `.hcode' and `.hw' are limited to raw 8bit characters but should accept any characters entities.
Date: Wed, 20 Mar 2019 12:28:47 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux i686; rv:45.0) Gecko/20100101 Firefox/45.0

Follow-up Comment #3, bug #42870 (project groff):
To address the point raised in comment #2, init_charset_table() in
src/roff/troff/input.cpp appears to be what defines the default hcode values,
in particular the lines:

    for (int i = 0; i < 256; i++) {
      ...
      if (csalpha(i))
        charset_table[i]->set_hyphenation_code(cmlower(i));
    }

So csalpha() must be returning false for any character that is alphabetic in
ISO-8859-1 (a.k.a. Latin-1) but lies outside the ASCII range.
Indeed, a peek into cset_init::cset_init() in src/libs/libgroff/cset.cpp
supports this:

    for (int i = 0; i <= UCHAR_MAX; i++) {
      csalpha.v[i] = ISASCII(i) && isalpha(i);
      ...
    }

isalpha() is declared in the C standard library's <ctype.h>, and its return
value depends on the current locale. groff works internally in ISO-8859-1, so
it's undesirable for this function's behavior to change with the user's
environment; it's for this reason, I presume, that the additional ISASCII()
test is imposed: it forces non-ASCII characters to yield 0 regardless of what
isalpha() returns. Within the ASCII range, isalpha() should behave the same
no matter the current locale.
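
To make the effect of that guard concrete, here is a minimal sketch of the
same test in isolation. It is not groff code: is_ascii() and
ascii_only_alpha() are hypothetical names of mine, and is_ascii() is only my
assumption of what ISASCII() amounts to (a 0..127 range check).

```cpp
#include <cassert>
#include <cctype>
#include <climits>

// Hypothetical stand-in for groff's ISASCII() macro: true only for 0..127.
static bool is_ascii(int c)
{
  return c >= 0 && c <= 127;
}

// Hypothetical helper mirroring the csalpha.v[i] assignment quoted above.
// The ASCII test runs first, so isalpha() is never consulted for 128..255
// and the result there cannot depend on the user's locale.
static bool ascii_only_alpha(int c)
{
  assert(c >= 0 && c <= UCHAR_MAX);
  return is_ascii(c) && std::isalpha(c) != 0;
}
```

With this, ascii_only_alpha(0xE9) (Latin-1 e-acute) is false by construction,
which matches the missing default hcode values described above.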
But a more robust solution may be to call isalpha_l() instead. This is a
POSIX.1-2008 function rather than standard C; it is also declared in
<ctype.h> and takes an explicit locale argument, so that an ISO-8859-1 locale
can be enforced. By doing this and removing the ISASCII() test (from the
csalpha.v[i] line and all the following lines setting other attributes), the
character attributes set in cset_init::cset_init() would be accurate for all
ISO-8859-1 characters, not just ASCII ones.
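
A rough sketch of that idea follows; it is untested against groff itself.
fill_alpha_table() is a hypothetical helper, not an existing groff function.
It assumes a POSIX.1-2008 system where <locale.h> provides newlocale() and
freelocale() (glibc does; some other platforms want <xlocale.h> instead).

```cpp
#include <cassert>
#include <climits>
#include <ctype.h>   // isalpha_l() (POSIX.1-2008)
#include <locale.h>  // newlocale(), freelocale(), locale_t

// Hypothetical helper: sets table[i] to 1 for every i in 0..UCHAR_MAX that
// is alphabetic in the named locale, and to 0 otherwise.  Returns false and
// leaves the table untouched if the locale cannot be created.
static bool fill_alpha_table(const char *locale_name, unsigned char *table)
{
  assert(locale_name != nullptr && table != nullptr);
  // Build a locale object that affects only character classification.
  locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t)0);
  if (loc == (locale_t)0)
    return false;
  for (int i = 0; i <= UCHAR_MAX; i++)
    table[i] = isalpha_l(i, loc) != 0;
  freelocale(loc);
  return true;
}
```

A caller would pass the name of an installed Latin-1 locale, for example
"en_US.ISO-8859-1" on glibc. That name is an assumption on my part: locale
names vary between systems and the locale may not be installed at all, so the
false return needs handling, perhaps by falling back to the ASCII-only
behavior.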
This could have implications beyond the hcode values, of course, and I confess
I'm not familiar enough with groff's internals to determine what they might
be.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?42870>