bug-guile

Re: guile 1.8.5 test failure: srfi-14.test


From: Bruno Haible
Subject: Re: guile 1.8.5 test failure: srfi-14.test
Date: Tue, 27 May 2008 02:33:02 +0200
User-agent: KMail/1.5.4

Ludovic Courtès wrote in
<http://lists.gnu.org/archive/html/bug-guile/2008-05/msg00014.html>:
> > So the notion of "letters" in a Latin1 locale may depend on the libc.
> > It might be safer to change the test code from
> >
> >     (= (char-set-size char-set:letter) 117)
> >
> > to
> >
> >     (>= (char-set-size char-set:letter) 100)
> 
> The cardinals of these char sets were taken from SRFI-14:
> 
>   http://srfi.schemers.org/srfi-14/srfi-14.html#StandardCharsetDefs
> 
> This indicates that we should fix our SRFI-14 implementation, not the
> test.  ;-)

I don't think it's appropriate to take these numbers (117 etc.) as precise
expectations. Unicode is a moving target: with every Unicode version, new
characters are added, and sometimes the classification of characters into
"letters" vs. "non-letters" changes as well.

The SRFI-14 text to which you point says at various places "... in Unicode 3.0".
This matches the date of origin (1999/2000) of that text.

Note also that the text talking about the Unicode letters and 117 is outside
the section "Specification", which makes me think that it is not normative.
Even if it is normative, it nowhere says that you have to use *exactly*
Unicode 3.0.

So you have a choice between three alternatives:

  1) Provide an implementation of char-set:letter that is tied to a particular
     Unicode version and will not evolve. Then you can hardwire specific
     letter counts in the test suite.

  2) Provide an implementation that does not rely on the libc locale system
     but still upgrades to new Unicode versions now and then. Then you have to
     update the letter count in the tests when you upgrade the library.

  3) Provide an implementation that relies on the libc locale system, and
     thus upgrades to new Unicode versions when the libc does. Then you can
     only expect approximate letter counts.
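
For alternative 3, a test could check invariants instead of an exact count.
The following is only a sketch, assuming a Latin1 locale: the name
letters-plausible? is made up for illustration, and the lower bound 100 is the
one suggested earlier in this thread, not a value taken from SRFI-14.

```scheme
(use-modules (srfi srfi-14))

;; The ASCII letters are letters in every Unicode version, so any
;; reasonable char-set:letter should contain them.
(define ascii-letters
  (string->char-set
   "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"))

;; Hypothetical helper (not part of Guile or SRFI-14): checks properties
;; that should survive a libc or Unicode upgrade, rather than an exact size.
(define (letters-plausible? cs)
  (and (char-set<= ascii-letters cs)   ; ASCII letters are a subset of CS
       (>= (char-set-size cs) 100)))   ; approximate lower bound only

;; Intended use in the test suite:
(letters-plausible? char-set:letter)
```

Such a test would keep passing when the libc adds letters, while still
catching a char-set:letter that is badly truncated.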

Bruno