help-gnu-emacs

Re: Manually parsing char-tables


From: Richard Wordingham
Subject: Re: Manually parsing char-tables
Date: Mon, 21 Feb 2022 01:39:41 +0000

On Sun, 20 Feb 2022 14:50:54 +0200
Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Sun, 20 Feb 2022 11:09:26 +0000
> > From: Richard Wordingham <richard.wordingham@ntlworld.com>
> > 
> > I am trying to understand how Arabic script rendering works in Emacs
> > 28.0.90, as it seems to be using a different mechanism to that used
> > for Indic or European scripts.  (There seems to be more to it than
> > just the asymmetries between right-to-left and left-to-right.)  To
> > that end, I am trying to understand the contents of the variable
> > composition-function-table.  
> 
> I think it is easier to just look at how the Arabic part of this table
> is populated.  See lisp/language/misc-lang.el starting from line 105.

I first wanted to check whether it was overwritten somewhere else.

> >       #^^[3 1152 nil nil nil #1# #1# #1# #1# #1# #1# #1# nil nil nil
> >       nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> >       nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> >       nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> >       nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil]
> > 
> > (I've converted lines to paragraphs and abbreviated leading white
> > space.)
> > 
> >  I'm guessing that #1# is a macro invocation; when I invoke (print
> >  composition-function-table), I get something similar, but with #1#
> >  expanded and the '#1=' in the apparent macro definition omitted.  
> 
> #1# is a backreference to the value indicated by #1=.
> 
> > Where is this syntax explained?  I've looked in the elisp manual,
> > but not found it, though I may simply have failed to guess where
> > such a description was.  
> 
> See the node "Circular Objects" there.

That was reassuring - but I'm wondering why it was not familiar.  Had I
forgotten it?  Perhaps it's later than Emacs 19, when I last came close
to reading the lisp reference manual cover to cover.

Even the read syntax of a char-table is poorly documented. Using the
hint of an unexpanded reference to a 'sub-char-table', I've discovered
that the first key to understanding it is in lisp.h, and I may have to
delve into the .c files for the finer details.  It looks full of tricks
to reduce the storage requirement, which are reflected in the read
syntax. Perhaps it's not been documented because someone hopes it will
be cleaned up, but it is a useful syntax for dumping the table if
someone suspects the structure has been corrupted.  I will now present
my analysis in the hope that someone will find it useful.

Basically the data is stored in 64 blocks (of 'depth' 1) each for 2^16
characters, which in turn are composed of 16 blocks (of 'depth' 2) each
for 2^12 characters, which in turn are composed of 32 blocks (of 'depth'
3) each for 128 characters.  These blocks are the 'sub-char-tables', and
are introduced as a vector with two prepended items - the depth and the
first character code.  If all the data in a block is the same, that
same value replaces its sub-char-table.  (That happens with the
Unicode Arabic Block, which is covered by two sub-char-tables.)  This
structure is, eminently sensibly, hidden from the lisp interfaces.  The
sub-char-tables' syntax is basically

#^^[depth min_char ...]

where the ellipsis is the values at the lower level.
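
The block arithmetic described above can be sketched in a few lines of Python.  This is only an illustration of the layout as analysed in this message, not code taken from Emacs: the names LEVEL_BITS, chartab_path and block_min_char are made up for the example, and the bit widths (6, 4, 5, 7) are simply the base-2 logarithms of the block counts 64, 16, 32 and 128.

```python
# Bits consumed at each level, derived from the block counts described
# above: 64 top-level entries, then 16, then 32, then 128 leaf values.
# (Hypothetical names; this is a sketch, not the Emacs implementation.)
LEVEL_BITS = (6, 4, 5, 7)
TOTAL_BITS = sum(LEVEL_BITS)  # 22 bits, i.e. 64 blocks of 2^16 characters


def chartab_path(char_code):
    """Return the index used at each level to reach char_code."""
    if not 0 <= char_code < (1 << TOTAL_BITS):
        raise ValueError("outside the 22-bit code space assumed here")
    path = []
    shift = TOTAL_BITS
    for bits in LEVEL_BITS:
        shift -= bits
        path.append((char_code >> shift) & ((1 << bits) - 1))
    return tuple(path)


def block_min_char(char_code, depth):
    """First character code of the depth-`depth` block holding char_code."""
    covered_bits = sum(LEVEL_BITS[depth:])  # 16, 12 or 7 for depth 1, 2, 3
    return (char_code >> covered_bits) << covered_bits


if __name__ == "__main__":
    # U+0483 sits in the depth-3 block that starts at 1152 (0x480), which
    # is the min_char shown in the quoted "#^^[3 1152 ...]" dump.
    print(chartab_path(0x483), block_min_char(0x483, 3))
```

Under these assumptions, U+0483 lands at offset 3 in the block starting at 1152, which agrees with the first #1# entry following three nils in the quoted dump.  The Arabic block U+0600..U+06FF likewise splits into the two depth-3 blocks starting at 0x600 and 0x680.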

I suspect that the char-table syntax is basically

#^[default parent purpose ascii_block ...]

but I haven't verified the order of those first four values, and indeed
I may have them wrong.

(In case anyone is wondering, the Emacs code space consists of 64
planes, rather than Unicode's 'measly' 17.)
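
As a quick check of that comparison, here is the arithmetic in Python.  It assumes a 'plane' means a block of 2^16 code points in both cases; the variable names are invented for the illustration.

```python
# A plane is taken to be 2^16 code points (an assumption of this sketch).
CHARS_PER_PLANE = 1 << 16

emacs_code_space = 64 * CHARS_PER_PLANE    # 64 planes, per the note above
unicode_code_space = 17 * CHARS_PER_PLANE  # Unicode's 17 planes

# Highest code point in each space.
print(hex(emacs_code_space - 1))    # 0x3fffff
print(hex(unicode_code_space - 1))  # 0x10ffff
```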

Richard.


