Re: [bug-libunistring] toCasefold?
From: Simon Josefsson
Subject: Re: [bug-libunistring] toCasefold?
Date: Mon, 30 May 2011 14:19:59 +0200
User-agent: Gnus/5.110018 (No Gnus v0.18) Emacs/23.2 (gnu/linux)

Bruno Haible <address@hidden> writes:
> Simon Josefsson wrote:
>> >> I'm looking for an implementation of the toCasefold(X) operation defined
>> >> in Unicode 6.0 section 3.13 page 114 [1] like this:
>> >>
>> >> R4 toCasefold(X): Map each character C in X to Case_Folding(C).
>> >>
>> >> • Case_Folding(C) uses the mappings with the status field value “C” or
>> >> “F” in the data file CaseFolding.txt in the Unicode Character
>> >> Database.
>> ...
>> But does u32_casefold match Unicode toCasefold? Is it possible to
>> disable the SpecialCasing stuff?
>
> SpecialCasing.txt applies to toUpper, toLower, toTitle mappings. For
> toCasefold, all mappings are given in CaseFolding.txt, namely:
> - the locale independent mappings (type 'C' and 'F'),
> - the locale dependent mappings (type 'T') - this is similar to
> SpecialCasing.txt.
>
> u32_casefold uses all of these mappings. And when you pass an empty string
> as ISO639_LANGUAGE, it uses only the locale independent mappings (type
> 'C' and 'F'), hence it matches what toCasefold does.

Thanks -- meanwhile, I manually implemented a simple toCasefold based on
the tables directly, and when I compare its output with the output from
u32_casefold, it matches for all code points. So indeed it seems
correct. Just a data point to boost confidence in your implementation.
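
For readers following the thread, a table-driven toCasefold of the kind
described above could look roughly like the sketch below. It is an
illustration only, not the code used for the comparison and not
libunistring's implementation. The names fold_entry, fold_table and
sketch_to_casefold are hypothetical, and the table is a tiny hand-picked
excerpt of CaseFolding.txt, where a real implementation would load the
whole file. Only rows with status 'C' or 'F' are applied; 'S' and 'T'
rows are skipped, as rule R4 requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of CaseFolding.txt. Hypothetical layout, for this sketch only. */
struct fold_entry {
    uint32_t code;        /* source code point */
    char status;          /* 'C', 'F', 'S' or 'T' */
    uint32_t mapping[3];  /* folded code points */
    size_t len;           /* number of code points used in mapping */
};

/* Small excerpt of CaseFolding.txt; the real file has many more rows. */
const struct fold_entry fold_table[] = {
    { 0x0041, 'C', { 0x0061 },         1 },  /* A -> a */
    { 0x0049, 'C', { 0x0069 },         1 },  /* I -> i */
    { 0x0049, 'T', { 0x0131 },         1 },  /* Turkic only, skipped */
    { 0x00DF, 'F', { 0x0073, 0x0073 }, 2 },  /* sharp s -> ss */
    { 0x0130, 'F', { 0x0069, 0x0307 }, 2 },  /* I with dot -> i + dot above */
    { 0x0130, 'T', { 0x0069 },         1 },  /* Turkic only, skipped */
    { 0x1E9E, 'F', { 0x0073, 0x0073 }, 2 },  /* capital sharp s -> ss */
    { 0x1E9E, 'S', { 0x00DF },         1 },  /* simple folding, skipped */
};

/* Map each code point in in[0..n) through its 'C' or 'F' row, if any.
   Writes to out (capacity cap) and returns the number of code points
   written. Code points without a matching row are copied unchanged. */
size_t sketch_to_casefold(const uint32_t *in, size_t n,
                          uint32_t *out, size_t cap)
{
    const size_t rows = sizeof fold_table / sizeof fold_table[0];
    size_t o = 0;

    for (size_t i = 0; i < n; i++) {
        const struct fold_entry *hit = NULL;

        /* Linear search keeps the sketch short; a real table would be
           sorted and searched by bisection or indexed directly. */
        for (size_t k = 0; k < rows; k++) {
            const struct fold_entry *e = &fold_table[k];
            if (e->code == in[i] && (e->status == 'C' || e->status == 'F')) {
                hit = e;
                break;
            }
        }

        if (hit != NULL) {
            for (size_t j = 0; j < hit->len; j++) {
                assert(o < cap);  /* caller must supply enough room */
                out[o++] = hit->mapping[j];
            }
        } else {
            assert(o < cap);
            out[o++] = in[i];
        }
    }
    return o;
}
```

Since an 'F' mapping can expand one code point into up to three, the
output buffer should hold at least 3 * n code points. The result for a
given input can then be compared against what u32_casefold returns when
called with an empty language string.
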
/Simon