From: Robert Pluim
Subject: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F) where appropriate
Date: Thu, 01 Jun 2023 15:30:18 +0200
>>>>> On Thu, 01 Jun 2023 15:43:26 +0300, Eli Zaretskii <eliz@gnu.org> said:
>> Cc: 63731@debbugs.gnu.org, steven@stebalien.com
>> Date: Wed, 31 May 2023 19:18:22 +0300
>> From: Eli Zaretskii <eliz@gnu.org>
>>
>> > From: Robert Pluim <rpluim@gmail.com>
>> > Cc: 63731@debbugs.gnu.org, steven@stebalien.com
>> > Date: Wed, 31 May 2023 18:11:36 +0200
>> >
>> > Eli> So there are two issues here: (a) why there's no composition in the
>> > Eli> first case, and (b) why "C-u C-x =" says there is one when there
>> > Eli> isn't.
>> >
>> > OK. I can poke around in gdb if you give me some idea of what I should
>> > be looking at.
>>
>> I don't really know. I plan to just step through the code in
>> composite.c tomorrow, unless you beat me to it. Once we understand
>> issue (a), I think we will also understand issue (b).
Eli> OK, the issue is quite clear even without stepping with a debugger.
Eli> Bottom line: we cannot support a situation where the same character
Eli> can be composed by more than one slot in composition-function-table.
Eli> If there is more than one slot for the same character, one of
Eli> them will be tried, and the rest will be ignored (not even tried).
Eli> In particular, if a character CH has a "forward" composition rule that
Eli> starts with itself, and also has a "backward" rule (one with non-zero
Eli> look-back parameter) triggered by a different character (which should
Eli> follow CH), the latter rule will never be tried.
OK, that makes sense. Where would be a good place to document this?
Eli> This is what happens in this case: the character #x1F44D has several
Eli> rules that start with itself in emoji-zwj.el:
Eli> (#x1F44D .
Eli> ,(eval-when-compile (regexp-opt
Eli> '(
Eli> "\N{U+1F44D}\N{U+1F3FB}"
Eli> "\N{U+1F44D}\N{U+1F3FC}"
Eli> "\N{U+1F44D}\N{U+1F3FD}"
Eli> "\N{U+1F44D}\N{U+1F3FE}"
Eli> "\N{U+1F44D}\N{U+1F3FF}"
Eli> ))))
Eli> and it also has a "backward" rule:
Eli> (set-char-table-range
Eli> composition-function-table
Eli> #xFE0F '(["\\c.\ufe0f" 1 font-shape-gstring]))
Eli> The latter is triggered by #xFE0F and has a 1-character look-back,
Eli> which will match #x1F44D, since its category is '.' (it's a "base
Eli> character"). This latter rule is never tried. Why? Because the
Eli> former rules, anchored at #x1F44D, are tried first (Emacs redisplay
Eli> examines characters in the order of their buffer positions), and fail
Eli> to match. When those rules fail to match, due to how the
Eli> composition-related functions called by the display engine are
Eli> factored, we never again consider compositions triggered by a later
Eli> character which also "cover" #x1F44D: once that position was examined
Eli> and the attempted composition failed, we move to the next character.
Eli> IOW, we assume that the first set of composition rules we find for a
Eli> given character is the only one that could possibly be relevant for
Eli> that character.
Eli> Which means that to have #xFE0F compose correctly with Emoji
Eli> codepoints, we should include #xFE0F in the sequences in emoji-zwj.el.
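To make the failure mode concrete, here is a toy model of the dispatch rule described above, written as a shell function wrapping awk. It is a hypothetical sketch under stated assumptions, not a transcription of composite.c: the name compose_scan, the codepoint-token input, and the "old"/"new" rule tables are invented for illustration. Only the U+1F44D skin-tone rules and a one-character look-back rule for U+FE0F are modelled, and U+263A stands in for a character that has no forward rules in the toy table. Once a character's own forward rules have been tried and failed, the scanner moves on and never applies the U+FE0F look-back rule to that position; adding a "U+1F44D U+FE0F" alternative to the forward rules (the "new" table) is what lets the sequence compose.

```shell
#!/bin/sh
# Toy model (hypothetical; NOT the logic of composite.c) of the rule above:
# only the first rule set found for a character is ever tried.
# Usage: compose_scan old|new CODEPOINT...
#   old: U+1F44D has only its skin-tone rules, as before the patch
#   new: U+1F44D also has a "U+1F44D U+FE0F" alternative
# Any character followed by U+FE0F stands in for the "\\c.\ufe0f" look-back
# rule (the toy does not check the "." base-character category).
compose_scan() {
  mode=$1; shift
  printf '%s\n' "$@" | awk -v mode="$mode" '
    { tok[NR] = $0 }
    END {
      fwd["1F44D"] = "1F44D 1F3FB|1F44D 1F3FC|1F44D 1F3FD|1F44D 1F3FE|1F44D 1F3FF"
      if (mode == "new") fwd["1F44D"] = fwd["1F44D"] "|1F44D FE0F"
      out = ""; pos = 1
      while (pos <= NR) {
        c = tok[pos]; len = 0
        if (c in fwd) {
          # Try only the forward rules anchored at c.
          m = split(fwd[c], alts, "|")
          for (i = 1; i <= m && !len; i++) {
            k = split(alts[i], seq, " ")
            ok = (pos + k - 1 <= NR)
            for (j = 1; j <= k && ok; j++)
              if (tok[pos + j - 1] != seq[j]) ok = 0
            if (ok) len = k
          }
          # If none matched, len stays 0: the FE0F look-back rule is skipped.
        } else if (tok[pos + 1] == "FE0F") {
          len = 2   # look-back rule triggered by the following U+FE0F
        }
        if (len) {
          grp = tok[pos]
          for (j = 1; j < len; j++) grp = grp "+" tok[pos + j]
          out = out " [" grp "]"; pos += len
        } else {
          out = out " " c; pos++
        }
      }
      print substr(out, 2)
    }'
}

compose_scan old 1F44D FE0F   # prints: 1F44D FE0F  (not composed)
compose_scan old 263A FE0F    # prints: [263A+FE0F]
compose_scan new 1F44D FE0F   # prints: [1F44D+FE0F]
```

The toy only encodes the behaviour as described in this message; it says nothing about how the real display engine finds look-back triggers.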
Thatʼs easy enough:
diff --git a/admin/unidata/emoji-zwj.awk b/admin/unidata/emoji-zwj.awk
index 7d2ff6cb900..d1195ebbad8 100644
--- a/admin/unidata/emoji-zwj.awk
+++ b/admin/unidata/emoji-zwj.awk
@@ -106,7 +106,8 @@ END {
for (elt in ch)
{
- printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, vec[elt])
+ entries = sprintf("%s\n\"\\N{U+%s}\\N{U+FE0F}\"", vec[elt], elt)
+ printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, entries)
}
print "))"
print " (set-char-table-range composition-function-table"
That makes all the VS-16 sequences in
admin/unidata/emoji-variation-sequences.txt display with the emoji
font for me.
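A quick way to see what the patched printf emits, without running the whole admin/unidata build, is to push a single entry through the same format strings. This is a hypothetical sketch: gen_entry is not part of Emacs, and its second argument stands in for vec[elt], i.e. the sequences already collected for that base character. The existing sequences are passed through the environment rather than awk -v, because -v assignments undergo escape processing and would mangle the backslash in \N. The single quote in '( is written as \047 only because the awk program sits inside shell single quotes.

```shell
#!/bin/sh
# gen_entry CODEPOINT EXISTING  (hypothetical helper, not part of Emacs)
# Mimics the patched printf in admin/unidata/emoji-zwj.awk for one base
# character. EXISTING plays the role of vec[elt]; it is passed via the
# environment because awk -v would treat the backslash in \N as an escape.
gen_entry() {
  ELT="$1" EXISTING="$2" awk 'BEGIN {
    elt = ENVIRON["ELT"]
    # Append the "<base> U+FE0F" alternative, as the + sprintf line does.
    entries = sprintf("%s\n\"\\N{U+%s}\\N{U+FE0F}\"", ENVIRON["EXISTING"], elt)
    # \047 is a single quote (the program is inside shell single quotes).
    printf("(#x%s .\n,(eval-when-compile (regexp-opt\n\047(\n%s\n))))\n", elt, entries)
  }'
}

gen_entry 1F44D '"\N{U+1F44D}\N{U+1F3FB}"'
```

The result is the regexp-opt form that would land in emoji-zwj.el for that character, with "\N{U+1F44D}\N{U+FE0F}" as the last alternative.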
Eli> The reason why "C-u C-x =" lies to us saying there's a composition
Eli> where really there isn't is because descr-text.el uses the
Eli> find-composition primitive, whose implementation is parallel and
Eli> separate from that of the display-engine routines, and is structured
Eli> differently. So find-composition does succeed in detecting the second
Eli> rule, the one triggered by #xFE0F, which the display engine ignores.
Eli> I will think about whether this can be fixed, to avoid such false positives,
Eli> but if we accept that there can be only one set of composition rules
Eli> for a character, then we basically invoked undefined behavior here,
Eli> and we got what we deserved.
If find-composition DTRT, could we not use it in the display engine?
Robert