Re: uniq i18n implementation
From: Paul Eggert
Subject: Re: uniq i18n implementation
Date: Thu, 10 Aug 2006 16:21:49 -0700
User-agent: Gnus/5.1008 (Gnus v5.10.8) Emacs/21.4 (gnu/linux)
Pádraig Brady <address@hidden> writes:
> I was also using the string length comparison
> shortcut on the wide string. I'm unsure whether
> this is valid (on all platforms).
Me too, which is why the current code is cautious about this sort of thing.
>> Sorry, I'm not familiar with the ICU code. Is it free software and is
>> it well maintained? Where else is it being used, outside ICU itself?
>
> I am not familiar with it myself, but note
> it's used for various things in python, mozilla, openoffice, ...
OK, well, when we know more about it perhaps we can consider using it.
>> we might have "X" < "Y" < "Z" (using C-locale comparison), but "Z"
>> < "X" (using some other locale's comparison). This will lead to
>> inconsistencies, which will be hard to document and will confuse
>> users.
>
> Garbage In Garbage Out.
Subject to memory limits, programs like "sort" and "uniq" should work
on all inputs, not just the "nice" ones.
> As for confusing users my solution was to print
> a warning indicating the invalid input.
If that is the best we can do (and it is done in some places already)
then we'll do that. But I'd prefer a more-general approach.
>> Worse, it can
>> even lead to buffer overruns: e.g., qsort has undefined behavior if
>> you pass it a comparison function that is not a total order.
>
> Thanks for pointing that out.
> I'll look into that.
The current coreutils code avoids the problem by using 'exit' or
'longjmp' to break out of 'qsort'/etc. when strcoll reports an error;
this avoids the undefined behavior. It's kind of ugly. This is partly
why I'd like the cleaner solution.
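
To illustrate the 'longjmp' variant, here is a minimal sketch. It is not the actual coreutils code: the names collate_failure, compare_lines and sort_lines are made up for this example. It assumes the common convention of clearing errno before calling strcoll and checking it afterwards, since strcoll has no error return value of its own.

```c
#include <errno.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical jump target used to abandon the sort.  */
static jmp_buf collate_failure;

/* qsort comparison function for an array of 'char const *'.
   If strcoll reports an error, jump out instead of returning
   a result that might not be part of a total order.  */
static int
compare_lines (void const *a, void const *b)
{
  char const *const *sa = a;
  char const *const *sb = b;

  errno = 0;
  int diff = strcoll (*sa, *sb);
  if (errno != 0)
    longjmp (collate_failure, 1);
  return diff;
}

/* Sort LINES[0..N-1] with the current locale's collation.
   Return 0 on success, -1 if strcoll failed; in the latter
   case the array may be left partially reordered.  */
static int
sort_lines (char const **lines, size_t n)
{
  if (setjmp (collate_failure) != 0)
    return -1;
  qsort (lines, n, sizeof *lines, compare_lines);
  return 0;
}
```

A caller would treat a -1 return as "collation failed on this input" and report it. One reason this remains ugly is that jumping out of qsort skips any cleanup the library's qsort might have wanted to do internally.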