emacs-devel

Re: case-insensitive string comparison


From: Eli Zaretskii
Subject: Re: case-insensitive string comparison
Date: Wed, 20 Jul 2022 20:50:16 +0300

> From: Roland Winkler <winkler@gnu.org>
> Cc: monnier@iro.umontreal.ca,  emacs-devel@gnu.org
> Date: Wed, 20 Jul 2022 12:37:29 -0500
> 
> > I hear you, but your request is impossible to fulfill in practice.
> > That's because the collation rules used by this function are
> > implemented in the C library, and even if we know the locale,
> > different implementations of libc use different collation rules (in
> > addition, collation rules for some locales change with time).
> 
> Even mentioning the difficulties could be useful here.

I'm not sure I agree.  To describe all the important aspects of this
would take too long, and it isn't the job of our manual to document
this stuff.  Read this if you want to know:

  https://unicode.org/reports/tr10/

> The elisp manual is used by people who want to develop code that
> works for a wide range of users.  So even if string comparison is a
> slippery terrain these elisp hackers need to make design choices
> that work best for most users.

Luckily, Emacs Lisp programs rarely need this.

> What usage scenarios in elisp packages might benefit from
> string-collate-equalp even if this function depends on details that can
> be quite different for different users?

For example, sorting file names.  If you want to get anything similar
to what GNU 'ls' does on GNU/Linux (in particular, with punctuation
characters in file names), you need to use the locale's collation
rules as implemented by glibc.  Which is what string-collate-lessp
does.
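
A minimal sketch of such a sort, for illustration only.  The helper
name my-sort-file-names is hypothetical and not part of Emacs; the
only real API used is string-collate-lessp with its optional LOCALE
and IGNORE-CASE arguments.  The resulting order depends on the libc
and on which locales are installed, so treat any particular ordering
as system-specific.

```elisp
;; Hypothetical helper (not part of Emacs): sort file names with the
;; collation rules of LOCALE, or of the current locale when LOCALE is nil.
(defun my-sort-file-names (names &optional locale ignore-case)
  "Return a sorted copy of NAMES, compared with `string-collate-lessp'.
LOCALE and IGNORE-CASE are passed through to that function unchanged."
  ;; `sort' may be destructive on lists, so work on a copy.
  (sort (copy-sequence names)
        (lambda (a b)
          (string-collate-lessp a b locale ignore-case))))

;; Usage: compare the current locale's order with plain code-point order.
;; (my-sort-file-names (directory-files "." nil nil t))
;; (my-sort-file-names (directory-files "." nil nil t) "POSIX")
```

On systems without locale support, string-collate-lessp falls back to
the behavior of string-lessp, so the sketch still runs there; it just
stops matching what 'ls' shows.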

> >> - BBDB needs to know whether a name is already present in the database
> >>   or not, ignoring case.  The function bbdb-string= is again what Sam
> >>   suggests to put into subr.el.  The function string-collate-equalp
> >>   might be better suited for this.  But which locale should it use?  The
> >>   records in my BBDB cover larger parts of the world and I do not even
> >>   know which locale(s) might work best for each of them, not to mention
> >>   that BBDB needs to loop over all records.  Is there a "universal
> >>   default locale"?
> >
> > That "universal default locale" is what Emacs uses, modulo the few
> > problematic characters like the dotless I etc.  For 100% predictable
> > results, build your own case table, bind the buffer's case table to
> > it, and then call case-insensitive comparison.
> 
> I am not sure I can follow your argument.  Do you suggest that, likely,
> BBDB will work best if it compares names using compare-strings?

Yes.  But in addition, you should set up the case table of the current
buffer when you do so, because otherwise special cases with the likes
of the Turkish language's dotless I could in rare cases screw you.

> (I'd be glad to hear that.)  This code should work for users who do not
> want to build their own case table and stuff like that.

It is not the users who should build the case table; BBDB (or whatever
Lisp program needs the comparison) should.  It's not that hard,
really: if you only need ASCII, use ascii-case-table; otherwise copy
the standard case table and modify it to make sure I downcases to i,
and similarly for a few other exceptional letters.
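
A rough sketch of that approach, under some assumptions.  The names
my-make-name-case-table and my-names-equal-p are hypothetical, not
BBDB or Emacs functions, and only the I/i pair is pinned here; which
other letters need the same treatment is left open.  The Emacs pieces
used are copy-case-table, standard-case-table, set-case-syntax-pair,
with-case-table, ascii-case-table and compare-strings.

```elisp
;; Hypothetical helper: a private case table that does not depend on
;; what the user's language environment did to the standard one.
(defun my-make-name-case-table ()
  "Return a copy of the standard case table with I and i paired."
  (let ((table (copy-case-table (standard-case-table))))
    ;; Make sure I downcases to i and i upcases to I in TABLE.
    ;; Other exceptional letters would be pinned the same way.
    (set-case-syntax-pair ?I ?i table)
    table))

;; Hypothetical helper: case-insensitive equality under a known table.
(defun my-names-equal-p (name1 name2 &optional table)
  "Return non-nil if NAME1 and NAME2 are equal, ignoring case.
TABLE is the case table to use; it defaults to `ascii-case-table'."
  ;; `with-case-table' installs TABLE in the current buffer for the
  ;; duration of the body and restores the old table afterwards.
  (with-case-table (or table ascii-case-table)
    ;; `compare-strings' returns t only when the strings match.
    (eq t (compare-strings name1 nil nil name2 nil nil t))))
```

With this, (my-names-equal-p "Roland Winkler" "ROLAND WINKLER")
returns t.  A program comparing non-ASCII names would build the table
once with my-make-name-case-table and pass it as the third argument,
rather than rebuilding it for every record.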


