Re: [Monotone-devel] rfc: small simplification to paths.cc/constants.cc
From: Nathaniel Smith
Subject: Re: [Monotone-devel] rfc: small simplification to paths.cc/constants.cc
Date: Fri, 14 Jul 2006 23:42:06 -0700
User-agent: Mutt/1.5.11+cvs20060403
On Thu, Jul 13, 2006 at 01:17:39PM -0700, Zack Weinberg wrote:
> Currently the knowledge of which characters are not allowed in a
> pathname is split between paths.cc and constants.cc.
> paths.cc:has_bad_chars is the sole user of
> constants.cc:illegal_path_bytes, but adds more to the set (notably
> backslash). I note also that this code is all marked as "must be
> super fast" but has_bad_chars uses a relatively inefficient algorithm.
> This patch deletes illegal_path_bytes and reduces has_bad_chars to a
> simple loop with the forbidden bytes expressed in code, rather than
> looked up in a table. The LIKELY and UNLIKELY coerce gcc 4.1 into
> generating code which is, um, not actively stupid (bug filed).
Seems fine to me.
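
For readers without the full patch, the "simple loop with the forbidden bytes expressed in code" might look roughly like the sketch below. It is not the actual patch: the function name has_bad_chars_sketch, its std::string signature, and the LIKELY/UNLIKELY definitions (built on GCC's __builtin_expect) are assumptions for illustration. Only the byte test itself is taken from the hunk quoted later in this message.

```cpp
#include <string>

// Assumed definitions: branch-prediction hints on GCC-compatible
// compilers, plain pass-through everywhere else.
#if defined(__GNUC__)
#define LIKELY(expr) __builtin_expect(!!(expr), 1)
#define UNLIKELY(expr) __builtin_expect(!!(expr), 0)
#else
#define LIKELY(expr) (expr)
#define UNLIKELY(expr) (expr)
#endif

typedef unsigned char u8;  // assumption: u8 is an unsigned 8-bit type

// Hypothetical stand-in for has_bad_chars: returns true if the path
// contains an ASCII control byte, a backslash, or DEL.
inline bool has_bad_chars_sketch(std::string const & path)
{
  for (std::string::const_iterator c = path.begin();
       LIKELY(c != path.end()); ++c)
    {
      u8 x = (u8)*c;
      // 0x5c is '\\'; hex constants keep the ASCII dependency explicit.
      if (UNLIKELY(x <= 0x1f || x == 0x5c || x == 0x7f))
        return true;
    }
  return false;
}
```

With this sketch, "src/paths.cc" passes and "src\paths.cc" is rejected. No table lookup is involved, so the forbidden set lives in one place.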
> +// ??? Ensure use of UTF8 encoding internally, validate encoding here.
^^ Hmm?
> u8 x = (u8)*c;
> - if (x < sizeof(bad_table) && bad_table[x])
> - return true;
> + // 0x5c is '\\'; we use the hex constant to make the dependency on
> + // ASCII encoding explicit.
> + if (UNLIKELY(x <= 0x1f || x == 0x5c || x == 0x7f))
This could do with a comment about how the innocent looking "u8" there
is critical to the "<=" doing the right thing on machines with signed
chars...
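
To make the signed-char point concrete, here is a small sketch with two hypothetical helpers (neither name is from the patch). One applies the test after the cast to u8, and the other applies it to a signed char directly. It assumes u8 is an unsigned 8-bit type.

```cpp
typedef unsigned char u8;  // assumption: u8 is an unsigned 8-bit type

// Hypothetical helper: the test as written in the patch. The cast to
// u8 maps every byte into 0..255 before the "<=" comparison.
inline bool is_bad_byte(char c)
{
  u8 x = (u8)c;
  return x <= 0x1f || x == 0x5c || x == 0x7f;
}

// Hypothetical counter-example: the same test without the cast, on a
// signed char. Bytes 0x80..0xff are negative here, so "<= 0x1f" is
// true for all of them and every non-ASCII byte gets rejected.
inline bool is_bad_byte_signed(signed char c)
{
  return c <= 0x1f || c == 0x5c || c == 0x7f;
}
```

For example, the UTF-8 lead byte 0xc3 is -61 as a signed char. is_bad_byte accepts it because it sees 195, while is_bad_byte_signed rejects it because -61 <= 0x1f.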
-- Nathaniel
--
"Of course, the entire effort is to put oneself
Outside the ordinary range
Of what are called statistics."
-- Stephan Spender