bug#51292: 27.2; Reversing strings with unicode combining characters


From: Eli Zaretskii
Subject: bug#51292: 27.2; Reversing strings with unicode combining characters
Date: Wed, 20 Oct 2021 14:45:46 +0300

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Tue, 19 Oct 2021 21:26:31 +0200
> Cc: 51292@debbugs.gnu.org
> 
> Howard Melman <hmelman@gmail.com> writes:
> 
> > Reversing a string fails to account for unicode combining characters
> >
> >     (reverse "nai\u0308ve")
> >     "ev̈ian"
> >
> > Note the diaeresis is now on the v and not the i.  s-reverse gets it right:
> >
> >     (s-reverse "nai\u0308ve")
> >     "evïan"
> 
> So I wondered what s-reverse did, and indeed:
> 
> (defun s-reverse (s)
>   "Return the reverse of S."
>   (declare (pure t) (side-effect-free t))
>   (save-match-data
>     (if (multibyte-string-p s)
>         (let ((input (string-to-list s))
>               output)
>           (require 'ucs-normalize)
>           (while input
>             ;; Handle entire grapheme cluster as a single unit
>             (let ((grapheme (list (pop input))))
>               (while (memql (car input) ucs-normalize-combining-chars)
>                 (push (pop input) grapheme))
>               (setq output (nconc (nreverse grapheme) output))))
>           (concat output))
>       (concat (nreverse (string-to-list s))))))
> 
> Emacs has string-reverse, obsolete since 25.1.  Perhaps we should
> reintroduce it and use the definition from s?

I don't understand the use case(s) where this could be useful.  If
this is for display, then displaying text needs much more than just
combining accents with the base characters.  E.g., what if the accent
should not combine when the order is reversed, i.e. the composition
rules depend on the following characters as well?  And what if
character composition is not due to normalization rules?  Or what if
the text includes bidirectional scripts, whose reversal rules are
either very complex or simply undefined?
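
As a rough illustration of composition that is not driven by
normalization data, here is a minimal sketch.  The names
my-combining-p and my-cluster-reverse are hypothetical, and the
predicate tests the canonical combining class directly, on the
assumption that this approximates what the list used by s-reverse
contains.  ZERO WIDTH JOINER has combining class 0, so a joined
sequence is reversed code point by code point:

```elisp
;; Sketch only: `my-combining-p' and `my-cluster-reverse' are
;; hypothetical names, not part of Emacs or s.el.
(defun my-combining-p (ch)
  "Return non-nil if CH has a non-zero canonical combining class."
  (let ((ccc (and ch (get-char-code-property
                      ch 'canonical-combining-class))))
    (and ccc (/= ccc 0))))

(defun my-cluster-reverse (s)
  "Reverse S, keeping each base character with its combining marks."
  (let ((input (string-to-list s))
        output)
    (while input
      ;; Collect a base character plus any following combining marks.
      (let ((cluster (list (pop input))))
        (while (my-combining-p (car input))
          (push (pop input) cluster))
        (setq output (nconc (nreverse cluster) output))))
    (concat output)))

;; COMBINING DIAERESIS (class 230) stays attached to the "i".
(equal (my-cluster-reverse "nai\u0308ve") "evi\u0308an")

;; WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER: the joiner has
;; combining class 0, so the three code points are simply reversed.
(equal (my-cluster-reverse "\U0001F469\u200D\U0001F4BB")
       "\U0001F4BB\u200D\U0001F469")
```

Both comparisons should evaluate to t.  The second result is no
longer the sequence the writer intended, which is the kind of case
the combining-class test alone does not cover.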

If this is not for display, then where is this useful and why?

If someone can describe real-life use cases, we could reason whether
doing something like that could be useful enough.  Without that, the
code in s-reverse seems like an incomplete semi-feature which supports
some limited use cases that someone needed in some specific situation,
not a useful general feature that handles the issue anywhere close to
completeness.




