Re: [bug-libunistring] toCasefold?


From: Bruno Haible
Subject: Re: [bug-libunistring] toCasefold?
Date: Fri, 27 May 2011 21:31:14 +0200
User-agent: KMail/1.9.9

Hi Simon,

> I'm looking for an implementation of the toCasefold(X) operation defined
> in Unicode 6.0 section 3.13 page 114 [1] like this:
> 
>   R4 toCasefold(X): Map each character C in X to Case_Folding(C).
> 
>   • Case_Folding(C) uses the mappings with the status field value “C” or
>     “F” in the data file CaseFolding.txt in the Unicode Character
>     Database.

This function maps a string X to a string.

> Reading the manual I found this function:
> 
>  -- Function: uint32_t * u32_casefold (const uint32_t *S, size_t N,
>           const char *ISO639_LANGUAGE, uninorm_t NF, uint32_t
>           *RESULTBUF, size_t *LENGTHP)
>      Returns the case folded string.
> 
> but I'm not sure what to use for ISO639_LANGUAGE

If you want locale-independent case folding, you can pass the empty string
as ISO639_LANGUAGE.

> After reading the u32_casefold code, I found the seemingly appropriate
> function uc_tocasefold:
> 
> /* Return the casefold mapping of a Unicode character.  */
> extern ucs4_t
>        uc_tocasefold (ucs4_t uc);
> 
> However it doesn't seem to produce the right output, since
> uc_tocasefold(U+0130) returns U+0130.

No, this function is not appropriate, because it maps a single character
to a single character only. It cannot do the mapping
  <U+0130>  -->  <U+0069><U+0307>
that you find in Unicode's CaseFolding.txt file.
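
To illustrate why a character-to-character function cannot express this
mapping, here is a minimal sketch of a string-to-string fold. It is not
libunistring code: the name toy_casefold and the three-entry table are
hypothetical, with the entries copied by hand from the "C" and "F" lines of
CaseFolding.txt. A real implementation needs the whole data file.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mini-table: a few "C" and "F" entries copied from
   CaseFolding.txt.  Not the libunistring data structure.  */
struct fold_entry {
    uint32_t from;      /* source code point */
    uint32_t to[3];     /* folded sequence, at most 3 code points */
    size_t   to_len;    /* number of code points used in to[] */
};

static const struct fold_entry fold_table[] = {
    { 0x0041, { 0x0061 },         1 },  /* C: A -> a */
    { 0x00DF, { 0x0073, 0x0073 }, 2 },  /* F: sharp s -> s s */
    { 0x0130, { 0x0069, 0x0307 }, 2 },  /* F: I with dot above -> i, combining dot above */
};

/* Fold the N code points in S into OUT, which has room for OUT_CAP
   code points.  Returns the folded length, or (size_t) -1 if OUT is
   too small.  Characters not in the table map to themselves.  */
size_t
toy_casefold (const uint32_t *s, size_t n, uint32_t *out, size_t out_cap)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        const uint32_t *seq = &s[i];   /* default: identity mapping */
        size_t seq_len = 1;
        for (size_t k = 0; k < sizeof fold_table / sizeof fold_table[0]; k++) {
            if (fold_table[k].from == s[i]) {
                seq = fold_table[k].to;
                seq_len = fold_table[k].to_len;
                break;
            }
        }
        if (seq_len > out_cap - len)
            return (size_t) -1;        /* result would not fit */
        for (size_t j = 0; j < seq_len; j++)
            out[len++] = seq[j];
    }
    return len;
}
```

Folding the one-element string <U+0130> with this sketch yields a result of
length 2, which is exactly what a function returning a single ucs4_t cannot
represent.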

> looking at the 
> implementation I'm not sure it really corresponds to the toCasefold
> algorithm since it seems quite complex whereas Unicode toCasefold seems
> just like a property lookup function.

The u32_casefold function also handles locale-dependent casing (file
SpecialCasing.txt), which toCasefold does not do. That explains the complexity.

Bruno


