Re: [Gcl-devel] utf8 and emacs text/string multibyte representation

From:	Raymond Toy
Subject:	Re: [Gcl-devel] utf8 and emacs text/string multibyte representation
Date:	Sat, 01 Nov 2014 09:45:47 -0700
User-agent:	Gnus/5.101 (Gnus v5.10.10) XEmacs/21.5-b34 (darwin)

>>>>> "Matt" == Matt Kaufmann <address@hidden> writes:

    Matt> I saw your question and was curious, so I looked into it a bit:
    >>> To your knowledge, is there any objection to defining alpha-char-p as
    >>> including code-char's >= 128?

    Matt> I see that SBCL 1.2.2 is OK with that, for example:

    Matt> * (code-char 232)

    Matt> #\LATIN_SMALL_LETTER_E_WITH_GRAVE
    Matt> * (alpha-char-p (code-char 232))

    Matt> T
    Matt> * 

    Matt> In fact, that alpha-char-p call also returns T in (versions of)
    Matt> Allegro CL, CCL, CLISP, CMU CL, LispWorks, and SBCL.

Try (code-char #xa0). This is the Unicode character NO-BREAK SPACE.
It has no case and would presumably not be alpha-char-p. I think
quite a few characters in that range would not be (from cmucl):

(count nil (loop for k from 128 upto 255 collect (alpha-char-p (code-char k))))
63

I think there is some confusion here, at least for me. If gcl uses
8-bit code-units and utf-8 strings, what exactly is (code-char 232)?
You can store that into a utf-8 string but it won't be
#\latin_small_letter_e_with_grave because that would be encoded as two
octets in a utf-8 string: 195 168.
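
A minimal sketch of that encoding step, using only standard Common
Lisp. The name utf8-octets is hypothetical and is not part of gcl or of
any of the implementations mentioned above; the sketch also assumes the
argument is a valid Unicode code point and does not reject surrogates.

```lisp
;; Hypothetical helper for illustration only; not a gcl function.
(defun utf8-octets (code)
  "Return the list of UTF-8 octets that encode the code point CODE."
  (cond ((< code #x80)
         ;; 7-bit ASCII is stored as a single octet.
         (list code))
        ((< code #x800)
         ;; Two octets: 110xxxxx 10xxxxxx
         (list (logior #xC0 (ldb (byte 5 6) code))
               (logior #x80 (ldb (byte 6 0) code))))
        ((< code #x10000)
         ;; Three octets: 1110xxxx 10xxxxxx 10xxxxxx
         (list (logior #xE0 (ldb (byte 4 12) code))
               (logior #x80 (ldb (byte 6 6) code))
               (logior #x80 (ldb (byte 6 0) code))))
        (t
         ;; Four octets: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
         (list (logior #xF0 (ldb (byte 3 18) code))
               (logior #x80 (ldb (byte 6 12) code))
               (logior #x80 (ldb (byte 6 6) code))
               (logior #x80 (ldb (byte 6 0) code))))))

(utf8-octets 232)   ; => (195 168)
(utf8-octets #xa0)  ; => (194 160)
```

So with 8-bit code units, neither of the two stored elements is the
character with code 232.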

I think it's perfectly legal for gcl to say everything at or above 128
is alpha-char-p. I think, however, that people will just get confused
when they find that no such character can be stored into a string and
processed correctly as utf-8 without a bit of work.
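
To give a rough idea of the "bit of work" involved, here is a sketch of
decoding octets back into code points in standard Common Lisp. The name
utf8-decode is hypothetical (not a gcl function), and the sketch
assumes well-formed input: it does no validation of continuation
octets, overlong forms, or truncated sequences.

```lisp
;; Hypothetical helper for illustration only; assumes well-formed UTF-8.
(defun utf8-decode (octets)
  "Decode the list OCTETS (UTF-8) into a list of code points."
  (let ((result '()))
    (loop while octets
          do (let* ((lead (pop octets))
                    ;; Number of continuation octets implied by the lead octet.
                    (extra (cond ((< lead #x80) 0)
                                 ((< lead #xE0) 1)
                                 ((< lead #xF0) 2)
                                 (t 3)))
                    ;; Payload bits carried by the lead octet itself.
                    (code (if (zerop extra)
                              lead
                              (ldb (byte (- 6 extra) 0) lead))))
               ;; Each continuation octet contributes its low 6 bits.
               (dotimes (i extra)
                 (setf code (logior (ash code 6)
                                    (ldb (byte 6 0) (pop octets)))))
               (push code result)))
    (nreverse result)))

(utf8-decode '(104 195 168))  ; => (104 232)
```

The three octets decode to two code points, so anything that works per
8-bit element (length, char, alpha-char-p) sees three items where the
text has two characters.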

But perhaps this is just how 8-bit chars and utf-8 strings have to
work.

I think 16-bit chars with utf-16 or 32-bit chars with utf-32 are far
easier to explain.

K.I.S.S?

--
Ray
