Re: column numbers for non-ASCII characters in error messages
From: John Cowan
Subject: Re: column numbers for non-ASCII characters in error messages
Date: Sat, 18 Dec 2010 17:59:10 -0500
User-agent: Mutt/1.5.18 (2008-05-17)
Ben Pfaff wrote:
> * Byte offset from beginning of line.
Definitely no.
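
To illustrate why a byte offset makes a poor column number, here is a
minimal sketch, assuming the source line is encoded as UTF-8 (the thread
does not specify an encoding). The helper name byte_offset is
hypothetical and only serves the illustration.

```python
def byte_offset(line: str, index: int, encoding: str = "utf-8") -> int:
    """Number of bytes that precede the character at `index`.

    Hypothetical helper; the UTF-8 default is an assumption.
    """
    return len(line[:index].encode(encoding))


# "na\u00efve" is "naive" with a precomposed i-diaeresis (2 bytes in UTF-8).
line = "na\u00efve = 1 +"
index = line.index("=")          # position counted in code points

chars_before = index             # what a reader would count on screen
bytes_before = byte_offset(line, index)
```

Here chars_before and bytes_before disagree by one, so an error column
reported in bytes would point one position to the right of where the
reader sees the "=" sign.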
> * Display width from beginning of line, with double-wide
> characters counting as two positions and combining
> characters (e.g. combining accents) counting as zero
> positions.
This is only an approximation to true display width, but it's a pretty
good one. The only thing I would add is to count conjoining initial jamo
(U+1100..U+115F) as double-width and the other conjoining jamo
(U+1160..U+11FF) as zero-width, thus making the resulting assembled hangul
syllable always double-width rather than varying between double- and
triple-width.
The only downside is that in old Korean script a syllable sometimes has
more than one initial jamo, but I think that can be lived with.
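
The rule described above could be sketched roughly as follows. This is
only an approximation under stated assumptions: it uses Python's
standard unicodedata module, treats East Asian Width classes W and F as
double-width, and treats general categories Mn and Me as zero-width
combining characters. The function names char_width and display_column
are hypothetical.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate display width of one code point (0, 1, or 2)."""
    cp = ord(ch)
    # Conjoining initial jamo: count as double-width.
    if 0x1100 <= cp <= 0x115F:
        return 2
    # Other conjoining jamo: count as zero-width, so an assembled
    # hangul syllable always comes out as width 2.
    if 0x1160 <= cp <= 0x11FF:
        return 0
    # Assumption: nonspacing and enclosing marks take no column.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # Assumption: East Asian Wide and Fullwidth take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def display_column(line: str, index: int) -> int:
    """Approximate display width of the text before line[index]."""
    return sum(char_width(ch) for ch in line[:index])
```

For example, display_column("e\u0301x", 2) counts the "e" plus combining
acute accent as one position, and the conjoining sequence
"\u1112\u1161\u11ab" counts as two positions, the same as the precomposed
syllable "\ud55c".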
> * Grapheme clusters (user-visible characters) from
> beginning of line, as specified in Unicode Standard
> Annex #29 "Unicode Text Segmentation".
This is close to what I describe above, but doesn't distinguish between
single- and double-width characters, which I think is a mistake.
--
I marvel at the creature: so secret and John Cowan
so sly as he is, to come sporting in the pool address@hidden
before our very window. Does he think that http://www.ccil.org/~cowan
Men sleep without watch all night?