emacs-bug-tracker

bug#11309: closed (24.1.50; Case problems with [:upper:] and Cyrillic, Greek)


From: GNU bug Tracking System
Subject: bug#11309: closed (24.1.50; Case problems with [:upper:] and Cyrillic, Greek)
Date: Wed, 09 Dec 2020 14:38:01 +0000

Your message dated Wed, 9 Dec 2020 15:37:19 +0100
with message-id <28B85957-B8DB-431D-A120-F17D8AE4693F@acm.org>
and subject line Re: bug#11309: 24.1.50; Case problems with [:upper:] and 
Cyrillic,  Greek
has caused the debbugs.gnu.org bug report #11309,
regarding 24.1.50; Case problems with [:upper:] and Cyrillic, Greek
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs@gnu.org.)


-- 
11309: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=11309
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message ---
Subject: 24.1.50; Case problems with [:upper:] and Cyrillic, Greek
Date: Sun, 22 Apr 2012 11:11:30 +0100


The Lisp manual says this when describing character classes:

  `[:lower:]'
       This matches any lower-case letter, as determined by the current
       case table (*note Case Tables::).  If `case-fold-search' is
       non-`nil', this also matches any upper-case letter.

And:

  `[:upper:]'
       This matches any upper-case letter, as determined by the current
       case table (*note Case Tables::).  If `case-fold-search' is
       non-`nil', this also matches any lower-case letter.
  
OK, so let's test this:

(let ((case-fold-search t))
  (string-match "[[:upper:]]" "a\u0686"))
=> 0 ;; As documented

(upcase "\u0430") ;; CYRILLIC SMALL LETTER A
=> "А" ;; "\u0410", so it's in the case table

(let ((case-fold-search t))
  (string-match "[[:upper:]]" "\u0430\u0686"))
=> nil ;; Ah, this is unexpected.

(let ((case-fold-search t))
  (string-match "[[:lower:]]" "\u0410\u0686"))
=> 0 ;; But this works as documented. 

(upcase "\u03b2") ;; GREEK SMALL LETTER BETA
=> "Β" ;; "\u0392", it's in the case table

(let ((case-fold-search t))
  (string-match "[[:upper:]]" "\u03b2\u5357"))
=> nil ;; Oops

(let ((case-fold-search t))
  (string-match "[[:lower:]]" "\u0392\u5357"))
=> 0 ;; But this works, again. 



In GNU Emacs 24.1.50.1 (i386-apple-darwin10.8.0, NS apple-appkit-1038.36)
 of 2012-04-22 on bonbon
Windowing system distributor `Apple', version 10.3.1038
Configured using:
 `configure '--with-ns''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: de_DE.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Info

Minor modes in effect:
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
C-b C-b C-b C-b C-b C-b C-b C-f SPC \ x 7 f C-e C-j 
C-p C-f C-f C-f C-x = C-a ( SPC C-f C-x = C-a C-f s 
t <backspace> <backspace> m u l t <backspace> i b y 
t e - s t r i n g - p C-a C-f C-f C-f C-f t C-e ) C-j 
C-p C-p C-p C-n C-f C-f C-f C-f C-f C-f C-f C-f C-f 
C-f C-f C-f C-f C-f C-f C-f C-f C-f C-f C-f C-f C-f 
C-f C-f C-f C-b C-b C-b C-f C-x = C-x 1 C-f C-f C-f 
C-b C-k <escape> b <left> C-k C-p C-p C-p C-p C-p C-p 
C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p 
C-p C-e C-b C-b C-b C-y C-k ) C-j C-p C-p C-e C-b C-b 
C-b C-b C-d C-e C-j C-p C-p C-e C-b C-b C-b C-t C-e 
C-j C-p C-p C-e C-x C-b C-x o C-n C-n C-n RET C-x 1 
C-x b <return> C-x b * s c <tab> <return> C-n C-p C-n 
C-n e n a b l e - m u l t i b y t e - c h a r a c t 
e r s C-j C-x b <return> C-p C-n RET C-v l C-a C-n 
C-n C-n C-e C-x 2 C-x o C-x b * s c <backspace> <backspace> 
<backspace> C-g C-x C-b C-x o C-n C-n C-n C-n RET C-p 
C-p C-p C-x o C-p C-p C-a C-n C-SPC C-n C-n C-n C-n 
<escape> w <escape> x r e p o r t - e m a c s - b u 
g s <tab> C-g <escape> x r e p o r t - e m a c s - 
b u g <return>

Recent messages:
insert-file-contents-literally: Opening input file: no such file or directory, 
/Sources/emacs/nextstep/Emacs.app/Contents/Resources/etc/DOC-24.1.50.1
Mark set
Char: ä (228, #o344, #xe4, file ...) point=499 of 612 (81%) column=1 [2 times]
Char: DEL (127, #o177, #x7f) point=466 of 623 (75%) column=3
Char: ä (228, #o344, #xe4, file ...) point=466 of 625 (74%) column=3
Char: DEL (127, #o177, #x7f) point=486 of 647 (75%) column=23
Mark set
Quit
byte-code: Beginning of buffer [2 times]
Mark set
Quit

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils find-func vc-git cc-mode cc-fonts cc-guess
cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars cc-defs mule-util
multi-isearch info help-mode easymenu view help-fns byte-opt warnings cl
compile comint ansi-color ring bytecomp byte-compile cconv macroexp
vc-hg time-date tooltip ediff-hook vc-hooks lisp-float-type mwheel
ns-win tool-bar dnd fontset image regexp-opt fringe lisp-mode register
page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core frame cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew
greek romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer loaddefs
button faces cus-face files text-properties overlay sha1 md5 base64
format env code-pages mule custom widget hashtable-print-readable
backquote make-network-process dbusbind ns multi-tty emacs)

-- 
‘Iodine deficiency was endemic in parts of the UK until, through what has been
described as “an unplanned and accidental public health triumph”, iodine was
added to cattle feed to improve milk production in the 1930s.’
(EN Pearce, Lancet, June 2011)



--- End Message ---
--- Begin Message ---
Subject: Re: bug#11309: 24.1.50; Case problems with [:upper:] and Cyrillic, Greek
Date: Wed, 9 Dec 2020 15:37:19 +0100

Eli, thanks for looking at the patch, now pushed to master (with Basil's 
suggested tweak).

> Why is it wrong, and what practical problems does this cause?

ß is a lower-case letter, so lowercasep(ß)=false is wrong. As a consequence, 
matching ß with [:lower:] and [:upper:] doesn't work correctly: ß should be 
matched by [:lower:] when case-fold-search is nil, and by both [:lower:] and 
[:upper:] when case-fold-search is non-nil.

The problem stems from the fact that uppercasep and lowercasep don't use the 
Unicode case information directly (which perhaps they should) but derive the 
case indirectly from the upcase and downcase tables, and there is no way to 
state that a char is lower case but cannot be upcased or downcased. (Below I'm 
going to use the notation T[C] for the table T indexed by character C.)

Currently, characters missing from or self-mapping in the upcase and downcase 
tables are considered to be caseless. For instance, upcase[*]=downcase[*]=* and 
upcase[中]=downcase[中]=nil. However, we also have upcase[ß]=downcase[ß]=ß, 
causing the incorrect lowercasep result.

The solution that I ended up applying was the simplest possible: set 
upcase[ß]=ẞ (U+1E9E). The special-uppercase properties ensure that (upcase "ß") 
=> "SS", and now all tests pass.
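
To make the table reasoning concrete, here is a toy model in Python. It is not Emacs code. The dictionaries stand in for the upcase and downcase tables, a missing key plays the role of nil, and the two predicates are an assumed reading of "derive the case indirectly from the tables": a character is upper case if downcasing changes it, and lower case if it is not upper case and upcasing changes it. The table contents are illustrative only, and the special-uppercase ("SS") handling is deliberately left out.

```python
# Toy model of table-derived case predicates (illustrative, not Emacs source).
# A missing key models a nil table entry.

def lookup(table, ch):
    """Return the table mapping for ch, or ch itself if the entry is missing."""
    return table.get(ch) or ch

def uppercasep(ch, up, down):
    # Assumed rule: upper case means downcasing changes the character.
    return lookup(down, ch) != ch

def lowercasep(ch, up, down):
    # Assumed rule: lower case means not upper case, and upcasing changes it.
    return not uppercasep(ch, up, down) and lookup(up, ch) != ch

# Hypothetical table contents: "*" maps to itself, "中" is absent (nil).
DOWN = {"A": "a", "a": "a", "*": "*", "ß": "ß", "ẞ": "ß"}
UP_BEFORE = {"a": "A", "A": "A", "*": "*", "ß": "ß"}   # upcase[ß]=ß
UP_AFTER = {**UP_BEFORE, "ß": "ẞ"}                     # upcase[ß]=ẞ (U+1E9E)

if __name__ == "__main__":
    for label, up in (("before", UP_BEFORE), ("after", UP_AFTER)):
        result = {ch: lowercasep(ch, up, DOWN) for ch in "a*中ß"}
        print(label, result)
```

In this model the only entry that differs between the two upcase tables is the one for ß, and that single change is what flips lowercasep(ß) from false to true while leaving the caseless characters unaffected.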

(An acceptable alternative would have been to set upcase[ß]=nil and adapt 
lowercasep accordingly. I tried that and it works flawlessly, but involves 
slightly more changes.)

And that concludes the resolution of this bug.



--- End Message ---
