[Monotone-devel] Re: non utf-8 filenames
From: Graydon Hoare
Subject: [Monotone-devel] Re: non utf-8 filenames
Date: Thu, 05 Oct 2006 13:59:38 -0700
User-agent: Thunderbird 1.5.0.7 (Windows/20060909)
Markus Schiltknecht wrote:
> Hi,
>
> Recently, I've stumbled across the following invariant when doing
> 'mtn ls unknown':
>
>   mtn: fatal: std::logic_error: paths.cc:255: invariant
>   'I(utf8_validate(path))' violated
>
> It's not that throwing a warning would not be good, but it's certainly
> not a bug in monotone. For some reason, I just happen to have files in
> my working copy which have names that are not UTF-8 encoded. Could we
> have a nicer warning here, instead of failing on an invariant? Maybe
> even saying, that such files can not be added to the repository?
There's an important difference to examine here:
- There are file names that, while not presently encoded as UTF-8,
  can be faithfully transformed to and from Unicode (and thus UTF-8).
  These are very common -- several EUC, KOI, 8859-x, GB and JIS
  standards fall in this category -- and they are all supposed to
  map bijectively to Unicode codepoints. We support these.
- There are file names that cannot be faithfully transformed to and
  from Unicode. These are very rare -- possibly some 2022-x standards
  -- and we've decided not to support these.
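
To make the distinction concrete, here is a rough sketch in Python (not monotone's own code, which is C++). The helper name round_trips is made up for illustration, and the sample file name is an assumption. It only checks whether a given byte string survives a trip through Unicode under a given charset:

```python
def round_trips(raw: bytes, charset: str) -> bool:
    """Return True if `raw` decodes under `charset` and re-encodes
    to exactly the same bytes (hypothetical helper, for illustration)."""
    try:
        text = raw.decode(charset)
    except UnicodeDecodeError:
        # The bytes are not valid in this charset at all.
        return False
    try:
        return text.encode(charset) == raw
    except UnicodeEncodeError:
        return False


# "café.txt" as it would be stored on an ISO 8859-1 host: é is byte 0xE9.
name = b"caf\xe9.txt"

print(round_trips(name, "iso-8859-1"))  # True
print(round_trips(name, "utf-8"))       # False
```

A name of the first sort passes this check under its own charset even though it is not valid UTF-8 as stored.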
If you have filenames of the latter sort, you are out of luck: our
rosters (internal data structures) only deal in Unicode, so before
monotone can do anything with your filename it tries to convert it to
Unicode.
If you have filenames of the former sort, we should be able to deal with
them. Monotone is supposed to transform host character sets to Unicode
while reading them from disk, and transform back from Unicode to the
host character set when writing back to disk.
If it's not doing so, there's a bug.
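
As an illustration of the intended behaviour at the disk boundary, a minimal sketch in Python follows. The function names (host_to_utf8, utf8_to_host, is_valid_utf8) are hypothetical and do not correspond to monotone's actual functions, and the host charset is assumed to be ISO 8859-1 purely for the example:

```python
def host_to_utf8(raw: bytes, host_charset: str) -> bytes:
    """Convert a file name read from disk into UTF-8 for internal use."""
    return raw.decode(host_charset).encode("utf-8")


def utf8_to_host(stored: bytes, host_charset: str) -> bytes:
    """Convert an internal UTF-8 file name back to the host charset."""
    return stored.decode("utf-8").encode(host_charset)


def is_valid_utf8(data: bytes) -> bool:
    """Simple UTF-8 validity check, standing in for a real validator."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


on_disk = b"caf\xe9.txt"                  # ISO 8859-1 bytes on the host
internal = host_to_utf8(on_disk, "iso-8859-1")

print(is_valid_utf8(on_disk))             # False: raw host bytes
print(is_valid_utf8(internal))            # True: after conversion
print(utf8_to_host(internal, "iso-8859-1") == on_disk)  # True
```

In this sketch the raw host bytes would fail a UTF-8 validity check if the conversion step were skipped, while the converted name passes and maps back to the original bytes.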
-graydon