bug-libunistring

Re: [bug-libunistring] toCasefold?


From: Simon Josefsson
Subject: Re: [bug-libunistring] toCasefold?
Date: Mon, 30 May 2011 14:19:59 +0200
User-agent: Gnus/5.110018 (No Gnus v0.18) Emacs/23.2 (gnu/linux)

Bruno Haible <address@hidden> writes:

> Simon Josefsson wrote:
>> >> I'm looking for an implementation of the toCasefold(X) operation defined
>> >> in Unicode 6.0 section 3.13 page 114 [1] like this:
>> >> 
>> >>   R4 toCasefold(X): Map each character C in X to Case_Folding(C).
>> >> 
>> >>   • Case_Folding(C) uses the mappings with the status field value “C” or
>> >>     “F” in the data file CaseFolding.txt in the Unicode Character
>> >>     Database.
>> ...
>> But does u32_casefold match Unicode toCasefold?  Is it possible to
>> disable the SpecialCasing stuff?
>
> SpecialCasing.txt applies to toUpper, toLower, toTitle mappings. For
> toCasefold, all mappings are given in CaseFolding.txt, namely:
>   - the locale independent mappings (type 'C' and 'F'),
>   - the locale dependent mappings (type 'T') - this is similar to
>     SpecialCasing.txt.
>
> u32_casefold uses all of these mappings. And when you pass an empty string
> as ISO639_LANGUAGE, it uses only the locale independent mappings (type
> 'C' and 'F'), hence it matches what toCasefold does.
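
A minimal sketch of R4 as described above, assuming input in the CaseFolding.txt row format (`code; status; mapping; # name`). Only status C and F rows go into the table. The Turkic T rows and the simple-folding S rows are skipped, which corresponds to calling u32_casefold with an empty ISO639_LANGUAGE. The helper names `load_full_folding` and `to_casefold` are hypothetical, and `SAMPLE` is a hand-picked excerpt, not the full data file.

```python
# Hand-picked excerpt in CaseFolding.txt format; the real file has many more rows.
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
0049; C; 0069; # LATIN CAPITAL LETTER I
0049; T; 0131; # LATIN CAPITAL LETTER I
00B5; C; 03BC; # MICRO SIGN
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def load_full_folding(lines):
    """Build Case_Folding(C) from CaseFolding.txt rows with status C or F."""
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        code, status, mapping = (f.strip() for f in data.split(";")[:3])
        if status in ("C", "F"):  # skip S (simple) and T (Turkic) rows
            table[int(code, 16)] = "".join(chr(int(cp, 16)) for cp in mapping.split())
    return table


def to_casefold(text, table):
    """R4 toCasefold(X): map each character C in X to Case_Folding(C)."""
    # Characters without a C/F row map to themselves.
    return "".join(table.get(ord(ch), ch) for ch in text)


table = load_full_folding(SAMPLE.splitlines())
print(ascii(to_casefold("\u0130\u00df\u00b5A", table)))  # prints 'i\u0307ss\u03bca'
```

Against the real data one would pass the opened file instead, e.g. `with open("CaseFolding.txt", encoding="utf-8") as f: table = load_full_folding(f)`, where the local path is an assumption. Note that U+0049 folds to U+0069 here, not to the Turkic U+0131, because its T row is ignored.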

Thanks -- meanwhile, I manually implemented a simple toCasefold based on
the tables directly, and now when I compare its output with the output
from u32_casefold it matches for all code points.  So indeed it seems
correct.  Just a data point to boost confidence in your implementation.
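
The cross-check described above can be sketched as a differential test over every Unicode scalar value. To keep the example self-contained, the reference here is an assumption: Python's `str.casefold()`, whose documentation points to the same Unicode section 3.13, stands in for u32_casefold. Its results depend on the Unicode version reported by `unicodedata.unidata_version`. The helper names `scalar_values` and `compare_with_reference` are hypothetical.

```python
import unicodedata


def scalar_values():
    """Every Unicode scalar value: U+0000..U+10FFFF minus the surrogates."""
    return (cp for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF)


def compare_with_reference(fold, reference, code_points):
    """Return the code points on which fold and reference disagree."""
    return [cp for cp in code_points if fold(chr(cp)) != reference(chr(cp))]


print("Unicode data version:", unicodedata.unidata_version)
# Plain lowercasing is not case folding: in Latin-1 it misses U+00B5 and U+00DF.
print([hex(cp) for cp in compare_with_reference(str.lower, str.casefold, range(0x100))])
```

To mirror the check in the message, pass your own single-character fold function as `fold` and `scalar_values()` as `code_points`. An empty result means the two implementations agree on every code point.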

/Simon


