Re: Dealing with different character map formats when mapping glyph indi

From:

Hin-Tak Leung

Subject:

Re: Dealing with different character map formats when mapping glyph indicies to character codes

Date:

Tue, 23 May 2023 16:40:07 +0000 (UTC)

On Tuesday, 23 May 2023, 17:19:46 BST, Craig White <gerzytet@gmail.com> wrote:

> I was looking into how freetype maps character codes to glyph indices, and learned that there are many different formats the character map can be in, not to mention the one-to-many and many-to-one mappings that Werner mentioned.
> Will it be necessary to implement the reverse mapping separately for every cmap format?

I'm not sure why you need or want to implement it in FreeType. A glyph ID is unique per glyph, but some glyphs are not mapped in any character encoding at all, e.g. in "symbol fonts with custom encoding vectors" (there is even a name for such fonts).

Perhaps it is best to STOP thinking about (Unicode) characters. Glyphs are shaped drawings with a glyph ID. Some of them, for example ligatures ("combo characters" like "ff", etc.), correspond to two (Unicode) characters. And in Arabic, almost every character has 2 to 4 glyph shapes, called the isolated form and the init/medi/fini forms.

I think I actually have a Python program which does the reverse map (for the purpose of dropping some glyphs in the many-to-one scenario): examples/cjk-multi-fix.py in my freetype-py fork (https://github.com/HinTak/freetype-py/; you might need to switch to the font-diag branch to see it if that is not the default branch).

The OpenType spec and font technology were designed to make lookups in the most frequently used direction (from character code to glyph ID) fast and easy.
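
Since the tables only favour the forward direction, one way to get a reverse map without touching each cmap format is to enumerate the forward map once and invert it. The following is only a minimal sketch of that idea, not the cjk-multi-fix.py script mentioned above. The helper names invert_cmap and reverse_map_for_face are made up for illustration, and the second one assumes freetype-py is installed and that Face.get_chars() yields (character code, glyph index) pairs for the currently selected charmap.

```python
from collections import defaultdict


def invert_cmap(pairs):
    """Invert (charcode, glyph_index) pairs into {glyph_index: [charcodes]}.

    Hypothetical helper for illustration. Glyph index 0 (the missing
    glyph) is skipped. Glyphs that no character code maps to simply
    never appear in the result.
    """
    reverse = defaultdict(list)
    for charcode, glyph_index in pairs:
        if glyph_index == 0:
            continue
        reverse[glyph_index].append(charcode)
    # Many-to-one mappings show up as lists with more than one entry.
    return {gid: sorted(codes) for gid, codes in reverse.items()}


def reverse_map_for_face(font_path):
    """Build the reverse map for the active charmap of a font file.

    Assumption: freetype-py is available; the import is kept local so
    that invert_cmap can be used without it.
    """
    import freetype

    face = freetype.Face(font_path)
    return invert_cmap(face.get_chars())


# Toy data (made-up glyph indices): two character codes share glyph 36.
toy_pairs = [(0x41, 36), (0x391, 36), (0x42, 37), (0xFFFF, 0)]
toy_reverse = invert_cmap(toy_pairs)
```

Because FreeType does the format-specific decoding during enumeration, this approach does not care which cmap format the font uses. It cannot say anything about glyphs reached only through substitution (ligatures, Arabic positional forms), since those are not in the cmap at all.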