gnustep-dev

Hash computation and TFB


From: Luboš Doležel
Subject: Hash computation and TFB
Date: Tue, 06 Aug 2013 15:14:58 +0200
User-agent: Roundcube Webmail/0.5

Hello,

hash computation with Toll-Free Bridging is a tricky subject. Do it wrong and you'll get all sorts of trouble, especially with dictionaries, which use hashes a lot.

The code in corebase currently dispatches all CFHash() calls on ObjC objects to -hash, which is bad. The following expectation breaks due to this dispatch:

CFHash(@"string") == CFHash(CFSTR("string"))

because NSString uses a different hashing algorithm than CFString.
My suggestion is to do away with the ObjC dispatch in CFHash() and alter all the CF*Hash() functions to support ObjC types.
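To illustrate the difference between the two approaches, here is a minimal, self-contained sketch. It is not corebase code: `DemoObject` and all `demo_*` functions are hypothetical names, and the two toy algorithms (multipliers 31 and 33) merely stand in for whatever CFString and NSString actually use. The strings are assumed to be plain ASCII.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an object that is either a native CF string
   or a bridged ObjC string; both expose the same characters. */
typedef struct {
    int is_objc;          /* nonzero for a bridged ObjC object */
    const char *chars;    /* NUL-terminated ASCII, for simplicity */
} DemoObject;

/* Stand-in for the CF-side algorithm (toy multiplier 31, an assumption). */
unsigned long demo_cf_string_hash(const DemoObject *obj)
{
    assert(obj != NULL && obj->chars != NULL);
    unsigned long h = 0;
    for (const char *p = obj->chars; *p != '\0'; p++)
        h = h * 31u + (unsigned char)*p;
    return h;
}

/* Stand-in for -hash on the ObjC side: a deliberately different algorithm. */
unsigned long demo_objc_hash(const DemoObject *obj)
{
    assert(obj != NULL && obj->chars != NULL);
    unsigned long h = 0;
    for (const char *p = obj->chars; *p != '\0'; p++)
        h = h * 33u + (unsigned char)*p;
    return h;
}

/* Behaviour as described above: ObjC objects are dispatched to -hash. */
unsigned long demo_hash_with_dispatch(const DemoObject *obj)
{
    return obj->is_objc ? demo_objc_hash(obj) : demo_cf_string_hash(obj);
}

/* Suggested behaviour: one algorithm that accepts both kinds of object. */
unsigned long demo_hash_without_dispatch(const DemoObject *obj)
{
    return demo_cf_string_hash(obj);
}
```

With the dispatch in place, two objects holding "ab" hash differently depending on their origin. Without it, they hash the same. How a real CF*Hash() function would read the contents of an ObjC object is left out of this sketch.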

While looking at CFStringHash(), I've also noticed that either 8-bit or 16-bit raw character data is used for hashing based on what is available. I believe this breaks the following case:

===
CFStringRef str1 = CFSTR("str");
const UniChar chars[] = { 's', 't', 'r' }; // "str" as UTF-16 code units, independent of byte order
CFStringRef str2 = CFStringCreateWithCharacters(NULL, chars, 3);

CFHash(str1) == CFHash(str2); // expected to be true
===

While the two strings are obviously identical, different bytes are used to generate the hash in each case.

This problem can be solved by converting the character data to Unicode first, which has a performance impact, but only once for every CFString.
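The conversion step can be sketched roughly as follows. This is a self-contained illustration, not the code from the patch: `DemoString` and the `demo_*` functions are hypothetical, FNV-1a is only a stand-in for the real CFStringHash() algorithm, and the 8-bit storage is assumed to hold ASCII so that each byte widens directly to one UTF-16 code unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string that stores either 8-bit or 16-bit character data. */
typedef struct {
    const void *data;   /* uint8_t[] if is_wide == 0, uint16_t[] otherwise */
    size_t length;      /* number of characters, not bytes */
    int is_wide;
} DemoString;

/* Mix one 16-bit code unit into the hash, low byte first.
   FNV-1a is a stand-in here, not the algorithm corebase uses. */
uint32_t demo_hash_step(uint32_t h, uint16_t unit)
{
    h ^= (uint32_t)(unit & 0xFFu);
    h *= 16777619u;
    h ^= (uint32_t)(unit >> 8);
    h *= 16777619u;
    return h;
}

/* Hash the string as a sequence of UTF-16 code units, whatever the storage. */
uint32_t demo_string_hash(const DemoString *s)
{
    assert(s != NULL && (s->data != NULL || s->length == 0));
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < s->length; i++) {
        uint16_t unit = s->is_wide
            ? ((const uint16_t *)s->data)[i]
            : (uint16_t)((const uint8_t *)s->data)[i]; /* widen ASCII byte */
        h = demo_hash_step(h, unit);
    }
    return h;
}
```

Because every character is widened to a 16-bit unit before it is mixed in, an 8-bit "str" and a 16-bit "str" produce the same value. Splitting each unit into low and high bytes by hand also keeps the result independent of the host byte order.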

The situation with CFHash() calls on NSStrings is worse, since corebase has nowhere to save the calculated hash, so it must be recalculated every time. But I think it's better to be slow than to be wrong. Please review the attached patch and let me know if you have any observations.
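The cost difference between the two cases can be pictured with a small sketch. Everything here is hypothetical (`DemoCachedString`, `demo_compute_hash`, `demo_cached_hash`), and FNV-1a again stands in for the real algorithm. The `has_hash_slot` flag models a native CFString that can store its hash, while a cleared flag models a bridged NSString with nowhere to keep it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string with an optional slot for a cached hash. */
typedef struct {
    const char *chars;      /* NUL-terminated ASCII, for simplicity */
    int has_hash_slot;      /* 1: native-like, 0: bridged-like (no storage) */
    int hash_valid;         /* set once cached_hash holds a real value */
    uint32_t cached_hash;
    unsigned computations;  /* counts full hashing passes, for demonstration */
} DemoCachedString;

/* Full pass over the characters (FNV-1a as a stand-in algorithm). */
uint32_t demo_compute_hash(DemoCachedString *s)
{
    assert(s != NULL && s->chars != NULL);
    uint32_t h = 2166136261u;
    for (const char *p = s->chars; *p != '\0'; p++) {
        h ^= (unsigned char)*p;
        h *= 16777619u;
    }
    s->computations++;
    return h;
}

/* Return the cached hash if there is one; otherwise compute it and
   store it only when the object has somewhere to put it. */
uint32_t demo_cached_hash(DemoCachedString *s)
{
    if (s->has_hash_slot && s->hash_valid)
        return s->cached_hash;

    uint32_t h = demo_compute_hash(s);
    if (s->has_hash_slot) {
        s->cached_hash = h;
        s->hash_valid = 1;
    }
    return h;
}
```

Calling `demo_cached_hash()` twice on a native-like string performs one hashing pass, while the bridged-like string pays for a pass on every call. Both still return the same value, which is the trade-off described above: slower, but correct.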

--
Luboš Doležel

Attachment: corebase-hash-fix.patch
Description: Text Data

