[Monotone-devel] rfc: small simplification to paths.cc/constants.cc
From: Zack Weinberg
Subject: [Monotone-devel] rfc: small simplification to paths.cc/constants.cc
Date: Thu, 13 Jul 2006 13:17:39 -0700
Currently the knowledge of which characters are not allowed in a
pathname is split between paths.cc and constants.cc.
paths.cc:has_bad_chars is the sole user of
constants.cc:illegal_path_bytes, but adds more to the set (notably
backslash). I note also that this code is all marked as "must be
super fast", but has_bad_chars builds a lookup table on first use and
then does a bounds check plus a table lookup for every byte.
This patch deletes illegal_path_bytes and reduces has_bad_chars to a
simple loop with the forbidden bytes expressed in code, rather than
looked up in a table. The LIKELY and UNLIKELY annotations coerce gcc 4.1
into generating code which is, um, not actively stupid (bug filed).
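
For readers who want to try the new loop outside the tree, here is a minimal standalone sketch. It assumes LIKELY and UNLIKELY are branch-prediction hints built on gcc's __builtin_expect; the macro definitions below are illustrative and not copied from monotone, and unsigned char stands in for monotone's u8 typedef.

```cpp
#include <string>

// Assumption: LIKELY/UNLIKELY are branch-prediction hints. These
// definitions are illustrative, not taken from the monotone sources.
#if defined(__GNUC__)
#  define LIKELY(x)   (__builtin_expect(!!(x), 1))
#  define UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

// Returns true if path contains an ASCII control byte (0x00-0x1f),
// a backslash (0x5c), or DEL (0x7f).
static inline bool
has_bad_chars(std::string const & path)
{
  for (std::string::const_iterator c = path.begin();
       LIKELY(c != path.end()); ++c)
    {
      // unsigned char stands in for monotone's u8 here.
      unsigned char x = static_cast<unsigned char>(*c);
      if (UNLIKELY(x <= 0x1f || x == 0x5c || x == 0x7f))
        return true;
    }
  return false;
}
```

Bytes at or above 0x80 pass through untouched, so UTF-8 encoded names are not rejected by this check.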
Thoughts?
zw
* constants.cc (illegal_path_bytes_arr, illegal_path_bytes): Delete.
* constants.hh (illegal_path_bytes): Delete.
* paths.cc (has_bad_chars): Code set of forbidden characters
explicitly here.
#
# old_revision [17ed988d5665a99c943bfcc810c1ec9accdcd8d5]
#
# patch "constants.cc"
# from [942d3eebad05095d859d2641150968f01f37c95e]
# to [b812f3fff900905f174e164024e18c52ff8ffdad]
#
# patch "constants.hh"
# from [7812034aa4a4a35decd8018d849102c06623bcd4]
# to [c5fe8274ac31f96c9cc610e6f6ee8cbed2079aa7]
#
# patch "paths.cc"
# from [4c98560ebccf3c70cfa26b985403a0a3fd66fb90]
# to [79d3e24da249f12334bbb673686ed71159d21fb5]
#
============================================================
--- constants.cc 942d3eebad05095d859d2641150968f01f37c95e
+++ constants.cc b812f3fff900905f174e164024e18c52ff8ffdad
@@ -110,22 +110,6 @@
string const regex_legal_key_name_bytes("(address@hidden)");
- // all the ASCII characters (bytes) which are illegal in a (file|local)_path
-
- char const illegal_path_bytes_arr[33] =
- {
- 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
- 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
- 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
- 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f,
- 0x7f, 0x00
- }
- ;
-
- char const * const illegal_path_bytes =
- illegal_path_bytes_arr
- ;
-
// merkle tree / netcmd / netsync related stuff
size_t const merkle_fanout_bits = 4;
============================================================
--- constants.hh 7812034aa4a4a35decd8018d849102c06623bcd4
+++ constants.hh c5fe8274ac31f96c9cc610e6f6ee8cbed2079aa7
@@ -89,9 +89,6 @@
// boost regex that matches the bytes in legal_key_name_bytes
extern std::string const regex_legal_key_name_bytes;
- // all the ASCII characters (bytes) which are illegal in a (file|local)_path
- extern char const * const illegal_path_bytes;
-
// remaining constants are related to netsync protocol
// number of bytes in the hash used in netsync
============================================================
--- paths.cc 4c98560ebccf3c70cfa26b985403a0a3fd66fb90
+++ paths.cc 79d3e24da249f12334bbb673686ed71159d21fb5
@@ -121,6 +121,8 @@
// -- no doubled /'s
// -- no trailing /
// -- no "." or ".." path components
+//
+// ??? Ensure use of UTF8 encoding internally, validate encoding here.
static inline bool
bad_component(string const & component)
{
@@ -138,25 +140,13 @@
static inline bool
has_bad_chars(string const & path)
{
- static bool bad_chars_init(false);
- static u8 bad_table[128] = {0};
- if (UNLIKELY(!bad_chars_init))
+ for (string::const_iterator c = path.begin(); LIKELY(c != path.end()); c++)
{
- string bad_chars = string("\\") + constants::illegal_path_bytes + string(1, '\0');
- for (string::const_iterator b = bad_chars.begin(); b != bad_chars.end(); b++)
- {
- u8 x = (u8)*b;
- I((x) < sizeof(bad_table));
- bad_table[x] = 1;
- }
- bad_chars_init = true;
- }
-
- for (string::const_iterator c = path.begin(); c != path.end(); c++)
- {
u8 x = (u8)*c;
- if (x < sizeof(bad_table) && bad_table[x])
- return true;
+ // 0x5c is '\\'; we use the hex constant to make the dependency on
+ // ASCII encoding explicit.
+ if (UNLIKELY(x <= 0x1f || x == 0x5c || x == 0x7f))
+ return true;
}
return false;
}
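
As a sanity check on the paths.cc hunk, the following sketch compares the removed table-based test with the new explicit comparison over all 256 byte values. The function names old_is_bad, new_is_bad and predicates_agree are hypothetical helpers made up for this illustration; the table is rebuilt from the byte values listed in the deleted illegal_path_bytes_arr rather than from the monotone headers.

```cpp
#include <string>

// Old approach: a 128-entry table populated from backslash, the bytes
// 0x01-0x1f and 0x7f, and NUL, mirroring the removed initialization.
static bool old_is_bad(unsigned char x)
{
  static bool init = false;
  static unsigned char bad_table[128] = {0};
  if (!init)
    {
      std::string bad_chars("\\");
      for (int b = 0x01; b <= 0x1f; ++b)
        bad_chars += static_cast<char>(b);
      bad_chars += static_cast<char>(0x7f);
      bad_chars += '\0';
      for (std::string::const_iterator b = bad_chars.begin();
           b != bad_chars.end(); ++b)
        bad_table[static_cast<unsigned char>(*b)] = 1;
      init = true;
    }
  return x < sizeof(bad_table) && bad_table[x];
}

// New approach: the forbidden set expressed directly in code.
static bool new_is_bad(unsigned char x)
{
  return x <= 0x1f || x == 0x5c || x == 0x7f;
}

// Hypothetical helper: true if both predicates agree on every byte value.
static bool predicates_agree()
{
  for (int b = 0; b < 256; ++b)
    {
      unsigned char x = static_cast<unsigned char>(b);
      if (old_is_bad(x) != new_is_bad(x))
        return false;
    }
  return true;
}
```

If predicates_agree() returns true, the rewrite rejects exactly the same bytes as the code it replaces.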