monotone-devel
Re: [Monotone-devel] rfc: small simplification to paths.cc/constants.cc


From: Nathaniel Smith
Subject: Re: [Monotone-devel] rfc: small simplification to paths.cc/constants.cc
Date: Sun, 16 Jul 2006 22:52:46 -0700
User-agent: Mutt/1.5.11+cvs20060403

On Sun, Jul 16, 2006 at 01:49:14PM -0700, Zack Weinberg wrote:
> On 7/14/06, Nathaniel Smith <address@hidden> wrote:
> >> +// ??? Ensure use of UTF8 encoding internally, validate encoding here.
> >
> >^^ Hmm?
> 
> I have gotten lost in the conversions and the wrappers, and cannot
> tell what encoding (if any) can be relied upon at this point in the
> code.  The exclusion of characters 00-1f and 7f, but none in the 80-ff
> range, makes me think it's supposed to be utf8 (it's clearly not a
> fixed-width 16- or 32-bit encoding; if it were any single-byte 8859.n
> encoding, we should also exclude 80-9f; any other variable-width
> encoding that I know of requires rather more smarts to find bad
> characters in...)

file_paths are always utf8 internally.

> But if it _is_ guaranteed to be utf8 at this point, there are a number
> of invalid byte sequences that we ought to be weeding out: notably ED
> A0 xx .. ED BF xx and overlength encodings like E0 9F 80; unless we
> have a guarantee from elsewhere that we're not going to get them.  I
> have code (from libcpp) that I can adapt to do this.

See utf8_validate, and the call to it at the top of the file_path
constructor.  (utf8_validate is itself stolen from glib.)
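
For readers following the thread, here is a minimal sketch of the checks
discussed above: rejecting the surrogate range (ED A0 xx .. ED BF xx),
overlength encodings such as E0 9F 80, truncated sequences, and code points
above U+10FFFF. It is an illustration only. It is not monotone's
utf8_validate and not the glib or libcpp code, and the name
is_valid_utf8_sketch is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not monotone's utf8_validate): returns true if `s`
// is well-formed UTF-8, i.e. no overlong forms, no surrogates, no code
// points above U+10FFFF, and no truncated or stray bytes.
bool is_valid_utf8_sketch(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len;       // total bytes in this sequence
        unsigned long cp;      // code point being decoded
        unsigned long min_cp;  // smallest code point legal for this length

        if (b0 < 0x80)                { len = 1; cp = b0;        min_cp = 0x0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i < len) return false;  // sequence truncated at end of input

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min_cp) return false;                   // overlength, e.g. E0 9F 80
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogates, ED A0 xx .. ED BF xx
        if (cp > 0x10FFFF) return false;                 // beyond the Unicode range

        i += len;
    }
    return true;
}
```

Decoding the code point and comparing it against a per-length minimum
catches every overlength form with one test, at the cost of doing the full
decode even when only validity is wanted.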

-- Nathaniel

-- 
Sentience can be such a burden.



