[groff] 16/23: Support CJK fonts encoded in UTF-16 (2/6).
From: G. Branden Robinson
Subject: [groff] 16/23: Support CJK fonts encoded in UTF-16 (2/6).
Date: Thu, 21 Nov 2024 14:47:49 -0500 (EST)
gbranden pushed a commit to branch master
in repository groff.
commit 6692471f0a31f00b052cec9b223ed963a130edc1
Author: TANAKA Takuji <ttk@t-lab.opal.ne.jp>
AuthorDate: Fri Dec 29 13:56:37 2023 +0000
Support CJK fonts encoded in UTF-16 (2/6).
* src/include/font.h (class font): Declare private member variable
`wch`, a pointer to the head of a linked list of (existing type)
`font_char_metric` entries. Declare private member function
`get_font_wchar_metric()` to search it.
* src/libs/libgroff/font.cpp (struct font_char_metric): Add members
`next` (a pointer to the struct's own type) and `end_code` of type
`int`.
(glyph_to_ucs_codepoint): New function returns the UCS code point of a
(non-composite) `glyph` object, or -1 if the glyph's name is absent,
composite, or not a valid Unicode code sequence.
(font::font): Constructor initializes `wch` member variable to null
pointer.
(font::~font): Destructor frees each node of the `wch` list, along with
the `special_device_coding` string that `font::load()` allocated for
it.
(font::contains): If `glyph_to_ucs_codepoint()` returns a valid value
for the glyph and a wide character metric covers that code point,
return true.
(font::get_font_wchar_metric): New function returns the first entry in
the `wch` list whose range contains the given Unicode code point, or a
null pointer if none does.
(font::get_width, font::get_height, font::get_depth)
(font::get_italic_correction, font::get_left_italic_correction)
(font::get_subscript_correction, font::get_character_type)
(font::get_code, font::get_special_device_encoding): If
`glyph_to_ucs_codepoint()` returns a valid value for the glyph and a
wide character metric covers it, return the corresponding parameter
from that metric (`font::get_code` returns the code point itself).
(font::get_width): Add a conditional guard when computing the width of
a glyph from a "Unicode font": multiply the default width by the
glyph's `wcwidth()` value only if the device description file ("DESC")
didn't declare "unscaled_charwidths".
(font::load): Recognize new directive in font description files:
"charset-range", which works like the existing "charset" directive
except that the glyph descriptions use a `name` of the form
"uFFFF..uFFFF" (where "FFFF" is a hexadecimal digit sequence), and
apply the metrics identically to all glyphs in the designated range.
(font::load): When processing glyph descriptions in "charset" section
and the device has declared the "unicode" directive, stop scaling the
width of the glyph by what `wcwidth()` returns for it. (Does this fix
Savannah #44018?)
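To make the job of `glyph_to_ucs_codepoint()` concrete, here is a minimal sketch of the same name-to-code-point conversion. It is illustrative only: `name_to_ucs_codepoint()` and `is_simple_ucs_name()` are hypothetical, they take a glyph name string rather than a groff `glyph` object, and the validity check is a simplified assumption that is far looser than groff's own `valid_unicode_code_sequence()`.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical, simplified validity test (NOT groff's
// valid_unicode_code_sequence()): "u" followed by 4 to 6 uppercase hex
// digits.
static bool is_simple_ucs_name(const char *nm)
{
  if (nm[0] != 'u')
    return false;
  std::size_t ndigits = std::strlen(nm + 1);
  if (ndigits < 4 || ndigits > 6)  // "uXXXX" through "uXXXXXX"
    return false;
  for (const char *p = nm + 1; *p != '\0'; p++)
    if (!((*p >= '0' && *p <= '9') || (*p >= 'A' && *p <= 'F')))
      return false;  // assumption: names use uppercase hex digits
  return true;
}

// Return the code point spelled by a name such as "u4E00", or -1 for a
// missing, composite, or otherwise unsuitable name.
int name_to_ucs_codepoint(const char *nm)
{
  if (nm == nullptr)
    return -1;
  if (std::strchr(nm, '_') != nullptr)
    return -1;  // composite glyph, e.g. "u0041_0301"
  if (!is_simple_ucs_name(nm))
    return -1;
  long cp = std::strtol(nm + 1, nullptr, 16);
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return -1;  // beyond Unicode, or a surrogate
  return static_cast<int>(cp);
}
```

As in the patch, the underscore test is what excludes composite glyphs; the surrogate and upper-bound checks are additions of this sketch.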
---
ChangeLog | 45 +++++++++++
src/include/font.h | 4 +
src/libs/libgroff/font.cpp | 194 ++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 233 insertions(+), 10 deletions(-)
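The guard added to `font::get_width()` for glyphs that a "Unicode font" does not enumerate can be sketched as below. This is a hedged approximation, not the groff code: `fallback_unicode_width()` and `scale_width()` are hypothetical names, the `columns` parameter stands in for `wcwidth()` (whose result depends on the locale), and `scale_width()` only approximates `font::scale()` with round-half-up integer arithmetic. The default of 24 is the hard-wired value visible in the patch.

```cpp
// Hypothetical approximation of font::scale(): convert a width given
// at `unitwidth` to `point_size`, rounding half up.
static int scale_width(int w, int point_size, int unitwidth)
{
  if (point_size == unitwidth)
    return w;
  return (w * point_size + unitwidth / 2) / unitwidth;
}

// Sketch of the fallback width for an unlisted glyph in a Unicode font.
// `columns` stands in for wcwidth(code) (an assumption of this sketch).
int fallback_unicode_width(int columns, int real_size, int unitwidth,
                           bool use_unscaled_charwidths)
{
  int width = 24;  // default hard-wired in the patch
  // Widen multi-column glyphs (e.g. CJK) only when the device scales
  // character widths; with "unscaled_charwidths" this is skipped.
  if (columns > 1 && !use_unscaled_charwidths)
    width *= columns;
  if (real_size == unitwidth || use_unscaled_charwidths)
    return width;
  return scale_width(width, real_size, unitwidth);
}
```

For example, a two-column glyph at the unit width yields 48 normally, but stays at 24 when the device declares "unscaled_charwidths".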
diff --git a/ChangeLog b/ChangeLog
index 7a7da1f62..cb309aead 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,48 @@
+2024-11-20 TANAKA Takuji <ttk@t-lab.opal.ne.jp>
+
+ Support CJK fonts encoded in UTF-16 (2/6).
+
+ * src/include/font.h (class font): Declare private member
+ variable `wch`, a pointer to an existing list type
+ `font_char_metric`. Declare private member function
+ `get_font_wchar_metric()` to access it.
+ * src/libs/libgroff/font.cpp (struct font_char_metric): Add
+ members `next` (a pointer to the struct's own type) and
+ `end_code` of type `int`.
+ (glyph_to_ucs_codepoint): New function returns UCS code point
+ from a (non-composite) `glyph` object, or -1 if invalid.
+ (font::font): Constructor initializes `wch` member variable to
+ null pointer.
+ (font::~font): Destructor frees storage allocated in
+ `font::load()` for `special_device_coding` member of `wcp`
+ struct, and that of `wcp` itself.
+ (font::contains): If `glyph_to_ucs_codepoint()` returns a valid
+ value for the glyph, populate its wide character metrics and
+ return true.
+ (font::get_font_wchar_metric): New function obtains font metrics
+ of input character by Unicode code point.
+ (font::get_width, font::get_height, font::get_depth)
+ (font::get_italic_correction, font::get_left_italic_correction)
+ (font::get_subscript_correction, font::get_character_type)
+ (font::get_code, font::get_special_device_encoding): If
+ `glyph_to_ucs_codepoint()` returns a valid value for the glyph,
+ populate its wide character metrics and return the appropriate
+ parameter based on them.
+ (font::get_width): Add conditional guard when computing width
+ for a glyph from a "Unicode font"; use the computation only if
+ the device description file ("DESC") didn't declare
+ "unscaled_charwidths".
+ (font::load): Recognize new directive in font description files:
+ "charset-range", which works like the existing "charset"
+ directive except that the glyph descriptions use a `name` of the
+ form "uFFFF..uFFFF" (where "FFFF" is a hexadecimal digit
+ sequence), and apply the metrics identically to all glyphs in
+ the designated range.
+ (font::load): When processing glyph descriptions in "charset"
+ section and the device has declared the "unicode" directive,
+ stop scaling the width of the glyph by what `wcwidth()` returns
+ for it. (Does this fix Savannah #44018?)
+
2024-11-20 TANAKA Takuji <ttk@t-lab.opal.ne.jp>
Support CJK fonts encoded in UTF-16 (1/6).
diff --git a/src/include/font.h b/src/include/font.h
index 9742a383a..e2537ef12 100644
--- a/src/include/font.h
+++ b/src/include/font.h
@@ -295,6 +295,7 @@ private:
// font (if !is_unicode) or for just some characters
// (if is_unicode). The indices of this array are
// font-specific, found as values in ch_index[].
+ font_char_metric *wch;// Metrics for wide characters.
int ch_used;
int ch_size;
font_widths_cache *widths_cache; // A cache of scaled character
@@ -334,6 +335,9 @@ private:
const char *, // file
int); // lineno
+ // Get font metric for wide characters indexed by Unicode code point.
+ font_char_metric *get_font_wchar_metric(int);
+
protected:
font(const char *); // Initialize a font with the given name.
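The new `get_font_wchar_metric()` member does a linear search over the `wch` list for a node whose inclusive range `[code, end_code]` contains the requested code point. The sketch below mirrors that behavior with a trimmed-down, hypothetical node type (`ranged_metric`) and helper names of its own; it is not the groff struct, which carries more metrics.

```cpp
// Hypothetical, trimmed-down stand-in for font_char_metric: `code` is
// the first code point of the range, `end_code` the last (inclusive).
struct ranged_metric {
  int code;
  int end_code;
  int width;
  ranged_metric *next;
};

// Linear search like font::get_font_wchar_metric(): return the first
// node whose range contains `uc`, or nullptr if none does.
ranged_metric *find_ranged_metric(ranged_metric *head, int uc)
{
  for (ranged_metric *p = head; p != nullptr; p = p->next)
    if (p->code <= uc && uc <= p->end_code)
      return p;
  return nullptr;
}

// Prepend a node, as font::load() does with `wch`.
ranged_metric *push_range(ranged_metric *head, int code, int end_code,
                          int width)
{
  return new ranged_metric{code, end_code, width, head};
}

// Free every node, as font::~font() does.
void free_ranges(ranged_metric *head)
{
  while (head != nullptr) {
    ranged_metric *next = head->next;
    delete head;
    head = next;
  }
}
```

Because `font::load()` prepends each node, a later `charset-range` line takes precedence where ranges overlap. Every accessor repeats this O(n) walk; for fonts with many ranges, a sorted array searched by bisection would be one alternative.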
diff --git a/src/libs/libgroff/font.cpp b/src/libs/libgroff/font.cpp
index 4ec4f19db..27c213209 100644
--- a/src/libs/libgroff/font.cpp
+++ b/src/libs/libgroff/font.cpp
@@ -1,4 +1,4 @@
-/* Copyright (C) 1989-2021 Free Software Foundation, Inc.
+/* Copyright (C) 1989-2024 Free Software Foundation, Inc.
Written by James Clark (jjc@jclark.com)
This file is part of groff.
@@ -47,6 +47,8 @@ struct font_char_metric {
int italic_correction;
int subscript_correction;
char *special_device_coding;
+ struct font_char_metric *next;
+ int end_code;
};
struct font_kern_list {
@@ -163,6 +165,18 @@ void text_file::fatal(const char *format,
fatal_with_file_and_line(path, lineno, format, arg1, arg2, arg3);
}
+static int glyph_to_ucs_codepoint(glyph *g)
+{
+ const char *nm = glyph_to_name(g);
+ if (nm != 0 /* nullptr */) {
+ if (valid_unicode_code_sequence(nm) && (strchr(nm, '_') == 0)) {
+ char *ignore;
+ return static_cast<int>(strtol(nm + 1, &ignore, 16));
+ }
+ }
+ return -1;
+}
+
int glyph_to_unicode(glyph *g)
{
const char *nm = glyph_to_name(g);
@@ -212,7 +226,7 @@ font::font(const char *s) : ligatures(0),
kern_hash_table(0 /* nullptr */),
space_width(0), special(false), internalname(0 /* nullptr */),
slant(0.0), zoom(0), ch_index(0 /* nullptr */), nindices(0),
- ch(0 /* nullptr */), ch_used(0), ch_size(0),
+ ch(0 /* nullptr */), wch(0 /* nullptr */), ch_used(0), ch_size(0),
widths_cache(0 /* nullptr */)
{
name = new char[strlen(s) + 1];
@@ -244,6 +258,13 @@ font::~font()
widths_cache = widths_cache->next;
delete tem;
}
+ struct font_char_metric *wcp, *nwcp;
+ for (wcp = wch; wcp != 0 /* nullptr */; wcp = nwcp) {
+ nwcp = wcp->next;
+ if (wcp->special_device_coding)
+ delete[] wcp->special_device_coding;
+ delete wcp;
+ }
}
static int scale_round(int n, int x, int y)
@@ -326,6 +347,12 @@ bool font::contains(glyph *g)
// Explicitly enumerated glyph?
if (idx < nindices && ch_index[idx] >= 0)
return true;
+ int uc = glyph_to_ucs_codepoint(g);
+ if (uc > 0) {
+ font_char_metric *wcp = get_font_wchar_metric(uc);
+ if (wcp != 0 /* nullptr */)
+ return true;
+ }
if (is_unicode) {
// Unicode font
// ASCII or Unicode character, or groff glyph name that maps to Unicode?
@@ -357,6 +384,17 @@ font_widths_cache::~font_widths_cache()
delete[] width;
}
+struct font_char_metric *font::get_font_wchar_metric(int uc)
+{
+ struct font_char_metric *wcp;
+ for (wcp = wch; wcp != 0 /* nullptr */; wcp = wcp->next) {
+ if (wcp->code <= uc && uc <= wcp->end_code) {
+ return wcp;
+ }
+ }
+ return 0 /* nullptr */;
+}
+
int font::get_width(glyph *g, int point_size)
{
int idx = glyph_to_index(g);
@@ -371,6 +409,13 @@ int font::get_width(glyph *g, int point_size)
else
real_size = int(point_size * double(zoom) / 1000.0 + .5);
}
+ int uc = glyph_to_ucs_codepoint(g);
+ font_char_metric *wcp = 0 /* nullptr */;
+ if (uc > 0)
+ wcp = get_font_wchar_metric(uc);
+ if (wcp != 0 && !(idx < nindices && ch_index[idx] >= 0)) {
+ return scale(wcp->width, point_size);
+ }
if (idx < nindices && ch_index[idx] >= 0) {
// Explicitly enumerated glyph
int i = ch_index[idx];
@@ -403,7 +448,7 @@ int font::get_width(glyph *g, int point_size)
// Unicode font
int width = 24; // XXX: Add a request to override this.
int w = wcwidth(get_code(g));
- if (w > 1)
+ if (w > 1 && !font::use_unscaled_charwidths)
width *= w;
if (real_size == unitwidth || font::use_unscaled_charwidths)
return width;
@@ -422,6 +467,13 @@ int font::get_height(glyph *g, int point_size)
// Explicitly enumerated glyph
return scale(ch[ch_index[idx]].height, point_size);
}
+ int uc = glyph_to_ucs_codepoint(g);
+ font_char_metric *wcp = 0 /* nullptr */;
+ if (uc > 0)
+ wcp = get_font_wchar_metric(uc);
+ if (wcp != 0 /* nullptr */) {
+ return scale(wcp->height, point_size);
+ }
if (is_unicode) {
// Unicode font
return 0;
@@ -438,6 +490,13 @@ int font::get_depth(glyph *g, int point_size)
// Explicitly enumerated glyph
return scale(ch[ch_index[idx]].depth, point_size);
}
+ int uc = glyph_to_ucs_codepoint(g);
+ font_char_metric *wcp = 0 /* nullptr */;
+ if (uc > 0)
+ wcp = get_font_wchar_metric(uc);
+ if (wcp != 0 /* nullptr */) {
+ return scale(wcp->depth, point_size);
+ }
if (is_unicode) {
// Unicode font
return 0;
@@ -454,6 +513,13 @@ int font::get_italic_correction(glyph *g, int point_size)
// Explicitly enumerated glyph
return scale(ch[ch_index[idx]].italic_correction, point_size);
}
+ int uc = glyph_to_ucs_codepoint(g);
+ font_char_metric *wcp = 0 /* nullptr */;
+ if (uc > 0)
+ wcp = get_font_wchar_metric(uc);
+ if (wcp != 0 /* nullptr */) {
+ return scale(wcp->italic_correction, point_size);
+ }
if (is_unicode) {
// Unicode font
return 0;
@@ -465,11 +531,18 @@ int font::get_italic_correction(glyph *g, int point_size)
int font::get_left_italic_correction(glyph *g, int point_size)
{
int idx = glyph_to_index(g);
- assert(idx >= 0);
+ assert(idx >= 0 /* nullptr */);
if (idx < nindices && ch_index[idx] >= 0) {
// Explicitly enumerated glyph
return scale(ch[ch_index[idx]].pre_math_space, point_size);
}
+ int uc = glyph_to_ucs_codepoint(g);
+ font_char_metric *wcp = 0 /* nullptr */;
+ if (uc > 0 )
+ wcp = get_font_wchar_metric(uc);
+ if (wcp != 0 /* nullptr */) {
+ return scale(wcp->pre_math_space, point_size);
+ }
if (is_unicode) {
// Unicode font
return 0;
@@ -486,6 +559,13 @@ int font::get_subscript_correction(glyph *g, int point_size)
// Explicitly enumerated glyph
return scale(ch[ch_index[idx]].subscript_correction, point_size);
}
+ int uc = glyph_to_ucs_codepoint(g);
+ font_char_metric *wcp = 0 /* nullptr */;
+ if (uc > 0)
+ wcp = get_font_wchar_metric(uc);
+ if (wcp != 0 /* nullptr */) {
+ return scale(wcp->subscript_correction, point_size);
+ }
if (is_unicode) {
// Unicode font
return 0;
@@ -560,6 +640,13 @@ int font::get_character_type(glyph *g)
// Explicitly enumerated glyph
return ch[ch_index[idx]].type;
}
+ int uc = glyph_to_ucs_codepoint(g);
+ font_char_metric *wcp = 0 /* nullptr */;
+ if (uc > 0)
+ wcp = get_font_wchar_metric(uc);
+ if (wcp != 0 /* nullptr */) {
+ return wcp->type;
+ }
if (is_unicode) {
// Unicode font
return 0;
@@ -576,6 +663,13 @@ int font::get_code(glyph *g)
// Explicitly enumerated glyph
return ch[ch_index[idx]].code;
}
+ int uc = glyph_to_ucs_codepoint(g);
+ font_char_metric *wcp = 0 /* nullptr */;
+ if (uc > 0)
+ wcp = get_font_wchar_metric(uc);
+ if (wcp != 0 /* nullptr */) {
+ return uc;
+ }
if (is_unicode) {
// Unicode font
// ASCII or Unicode character, or groff glyph name that maps to Unicode?
@@ -610,6 +704,12 @@ const char *font::get_special_device_encoding(glyph *g)
// Explicitly enumerated glyph
return ch[ch_index[idx]].special_device_coding;
}
+ int uc = glyph_to_ucs_codepoint(g);
+ font_char_metric *wcp = 0 /* nullptr */;
+ if (uc > 0)
+ wcp = get_font_wchar_metric(uc);
+ if (wcp != 0 /* nullptr */)
+ return wcp->special_device_coding;
if (is_unicode) {
// Unicode font
return 0;
@@ -877,7 +977,8 @@ bool font::load(bool load_header_only)
else if (strcmp(p, "special") == 0) {
special = true;
}
- else if (strcmp(p, "kernpairs") != 0 && strcmp(p, "charset") != 0) {
+ else if (strcmp(p, "kernpairs") != 0 && strcmp(p, "charset") != 0 &&
+ strcmp(p, "charset-range") != 0) {
char *directive = p;
p = strtok(0 /* nullptr */, "\n");
handle_unknown_font_command(directive, trim_arg(p), t.path,
@@ -923,6 +1024,84 @@ bool font::load(bool load_header_only)
add_kern(g1, g2, n);
}
}
+ // TODO: Rename this directive to "ranged-charset".
+ else if (strcmp(directive, "charset-range") == 0) {
+ if (load_header_only)
+ return true;
+ saw_charset_directive = true;
+ bool had_range = false;
+ for (;;) {
+ if (!t.next_line()) {
+ directive = 0 /* nullptr */;
+ break;
+ }
+ char *nm = strtok(t.buf, WS);
+ assert(nm != 0 /* nullptr */);
+ p = strtok(0 /* nullptr */, WS);
+ if (0 /* nullptr */ == p) {
+ directive = nm;
+ break;
+ }
+ int start_code = 0;
+ int end_code = 0;
+ int nrange = sscanf(nm, "u%X..u%X", &start_code, &end_code);
+ // TODO: Check for backwards range: end_code < start_code.
+ if (2 == nrange) {
+ had_range = true;
+ font_char_metric *wcp = new font_char_metric;
+ wcp->code = start_code;
+ wcp->end_code = end_code;
+ wcp->height = 0;
+ wcp->depth = 0;
+ wcp->pre_math_space = 0;
+ wcp->italic_correction = 0;
+ wcp->subscript_correction = 0;
+ int nparms = sscanf(p, "%d,%d,%d,%d,%d,%d",
+ &wcp->width, &wcp->height, &wcp->depth,
+ &wcp->italic_correction,
+ &wcp->pre_math_space,
+ &wcp->subscript_correction);
+ if (nparms < 1) {
+ t.error("missing or invalid width for character range '%1'",
+ nm);
+ return false;
+ }
+ p = strtok(0 /* nullptr */, WS);
+ if (0 /* nullptr */ == p) {
+ t.error("missing character type for '%1'", nm);
+ return false;
+ }
+ int type;
+ if (sscanf(p, "%d", &type) != 1) {
+ t.error("invalid character type for '%1'", nm);
+ return false;
+ }
+ if ((type < 0) || (type > 255)) {
+ t.error("character type '%1' out of range for '%2'", type,
+ nm);
+ return false;
+ }
+ wcp->type = type;
+
+ p = strtok(0 /* nullptr */, WS);
+ if ((0 /* nullptr */ == p) || (strcmp(p, "--") == 0)) {
+ wcp->special_device_coding = 0 /* nullptr */;
+ }
+ else {
+ wcp->special_device_coding = new char[strlen(p) + 1];
+ strcpy(wcp->special_device_coding, p);
+ }
+ wcp->next = wch;
+ wch = wcp;
+ p = 0 /* nullptr */;
+ }
+ }
+ // TODO: Parallelize wording of "charset"'s diagnostic.
+ if (!had_range) {
+ t.error("no glyphs described after 'charset-range' directive");
+ return false;
+ }
+ }
else if (strcmp(directive, "charset") == 0) {
if (load_header_only)
return true;
@@ -997,11 +1176,6 @@ bool font::load(bool load_header_only)
t.error("invalid code '%1' for character '%2'", p, nm);
return false;
}
- if (is_unicode) {
- int w = wcwidth(metric.code);
- if (w > 1)
- metric.width *= w;
- }
p = strtok(0 /* nullptr */, WS);
if ((0 /* nullptr */ == p) || (strcmp(p, "--") == 0)) {
metric.special_device_coding = 0;
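The `charset-range` parsing added to `font::load()` above reads, per line, a range name `uSTART..uEND`, a comma-separated metrics field (width, then optionally height, depth, italic correction, left italic correction, and subscript correction), a character type from 0 to 255, and an optional special device encoding (`--` meaning none). Unlike a `charset` line, there is no separate code field. The sketch below parses one such line under those assumptions; `range_line` and `parse_charset_range_line()` are hypothetical, the fixed buffer sizes are assumptions, and it additionally rejects a backward range, which the patch leaves as a TODO.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical parsed form of one "charset-range" line.
struct range_line {
  unsigned start, end;
  int width, height, depth, italic, left_italic, subscript;
  int type;
  char encoding[64];  // empty when absent or "--" (size is an assumption)
};

bool parse_charset_range_line(const char *line, range_line *out)
{
  char nm[64], parms[128], typ[16], enc[64];
  int nfields = std::sscanf(line, "%63s %127s %15s %63s",
                            nm, parms, typ, enc);
  if (nfields < 3)
    return false;  // need range, metrics, and type
  if (std::sscanf(nm, "u%X..u%X", &out->start, &out->end) != 2)
    return false;
  if (out->end < out->start)
    return false;  // backward range (a TODO in the patch)
  out->height = out->depth = out->italic = 0;
  out->left_italic = out->subscript = 0;
  // Same field order as the patch's sscanf() call.
  if (std::sscanf(parms, "%d,%d,%d,%d,%d,%d", &out->width, &out->height,
                  &out->depth, &out->italic, &out->left_italic,
                  &out->subscript) < 1)
    return false;  // missing or invalid width
  if (std::sscanf(typ, "%d", &out->type) != 1
      || out->type < 0 || out->type > 255)
    return false;  // invalid or out-of-range character type
  if (nfields < 4 || std::strcmp(enc, "--") == 0)
    out->encoding[0] = '\0';
  else
    std::strcpy(out->encoding, enc);  // both buffers hold 64 bytes
  return true;
}
```

A line such as `u4E00..u9FFF 1000,880,120 3` would thus apply width 1000, height 880, and depth 120 to every code point from U+4E00 through U+9FFF.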