bug#11309: 24.1.50; Case problems with [:upper:] and Cyrillic, Greek
From: Mattias Engdegård
Subject: bug#11309: 24.1.50; Case problems with [:upper:] and Cyrillic, Greek
Date: Wed, 9 Dec 2020 15:37:19 +0100
Eli, thanks for looking at the patch, now pushed to master (with Basil's
suggested tweak).
> Why is it wrong, and what practical problems does this cause?
ß is a lower case letter so lowercasep(ß)=false is wrong. As a consequence,
matching ß with [:lower:] and [:upper:] doesn't work correctly: ß should be
matched by [:lower:] when case-fold-search is nil, and by both [:lower:] and
[:upper:] when case-fold-search is non-nil.
The problem stems from the fact that uppercasep and lowercasep don't use the
Unicode case information directly (which perhaps they should) but derive the
case indirectly from the upcase and downcase tables, and there is no way to
state that a char is lower case but cannot be upcased or downcased. (Below I'm
going to use the notation T[C] for the table T indexed by character C.)
Currently, characters missing from or self-mapping in the upcase and downcase
tables are considered to be caseless. For instance, upcase[*]=downcase[*]=* and
upcase[中]=downcase[中]=nil. However, we also have upcase[ß]=downcase[ß]=ß,
causing the incorrect lowercasep result.
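
To make the table-derived classification concrete, here is a minimal Python model of it. This is only a sketch: the dictionaries are toy stand-ins for the upcase and downcase char-tables, and the two predicates (upper case if downcasing changes the character; lower case if not upper case and upcasing changes it) are an assumption about how the derivation behaves, not a copy of the actual C code.

```python
# Toy stand-ins for the upcase and downcase tables (assumed contents).
# A missing key plays the role of a nil entry, as for 中.
UPCASE = {"a": "A", "A": "A", "*": "*", "ß": "ß"}
DOWNCASE = {"a": "a", "A": "a", "*": "*", "ß": "ß"}


def lookup(table, ch):
    """Return table[ch], treating a missing (nil) entry as a self-mapping."""
    return table.get(ch, ch)


def uppercasep(ch):
    # Assumed rule: a character is upper case if downcasing changes it.
    return lookup(DOWNCASE, ch) != ch


def lowercasep(ch):
    # Assumed rule: not upper case, and upcasing changes it.
    return not uppercasep(ch) and lookup(UPCASE, ch) != ch


def caseless(ch):
    return not uppercasep(ch) and not lowercasep(ch)


for ch in "aA*中ß":
    print(ch, "upper" if uppercasep(ch) else
              "lower" if lowercasep(ch) else "caseless")
```

In this model a self-mapping ß is indistinguishable from * or 中, which is the misclassification described above.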
The solution that I ended up applying was the simplest possible: set
upcase[ß]=ẞ (U+1E9E). The special-uppercase properties ensure that (upcase "ß")
=> "SS", and now all tests pass.
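
The effect of that change can be illustrated with a small Python model. It is a sketch under stated assumptions: the dictionaries are toy stand-ins for the char-tables, SPECIAL_UPPERCASE is a hypothetical stand-in for the special-uppercase property, the downcase entry for ẞ is assumed from the Unicode lower-case mapping, and the predicates are an assumed form of the table-derived rule rather than the real implementation.

```python
CAPITAL_SHARP_S = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S

# Toy char-level tables after the change (assumed contents).
UPCASE = {"a": "A", "ß": CAPITAL_SHARP_S}
DOWNCASE = {"A": "a", CAPITAL_SHARP_S: "ß"}

# Hypothetical stand-in for the special-uppercase property.
SPECIAL_UPPERCASE = {"ß": "SS"}


def lookup(table, ch):
    """Return table[ch], treating a missing entry as a self-mapping."""
    return table.get(ch, ch)


def uppercasep(ch):
    # Assumed rule: upper case if downcasing changes the character.
    return lookup(DOWNCASE, ch) != ch


def lowercasep(ch):
    # Assumed rule: not upper case, and upcasing changes the character.
    return not uppercasep(ch) and lookup(UPCASE, ch) != ch


def upcase_string(s):
    # String upcasing consults the special mapping before the char table.
    return "".join(SPECIAL_UPPERCASE.get(ch, lookup(UPCASE, ch)) for ch in s)


print(lowercasep("ß"))     # True: upcase[ß] is no longer a self-mapping
print(upcase_string("ß"))  # SS: the special mapping takes precedence
```

The table entry only affects classification here; string upcasing still yields "SS" because the special mapping is checked first.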
(An acceptable alternative would have been to set upcase[ß]=nil and adapt
lowercasep accordingly. I tried that and it works as well, but involves
slightly more changes.)
And that concludes the resolution of this bug.