monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] Re: [patch] roster-deltas (40% pull speedup)


From: Graydon Hoare
Subject: [Monotone-devel] Re: [patch] roster-deltas (40% pull speedup)
Date: Tue, 22 Aug 2006 14:24:00 -0700
User-agent: Thunderbird 1.5.0.5 (Windows/20060719)

Nathaniel Smith wrote:

The change is very large -- 32 files changed, 2387 insertions(+), 1073
deletions(-) -- and messes about with some very important pieces.  In
particular, if there are bugs in roster reconstruction, that could
jeopardize our sanity checking generally.  So, I'd really appreciate
review of this code before it lands.

First off: fantastic work!

I only have minor nit-picky comments at this point; as far as "the plan" we've been discussing, this looks basically perfect, I'm surprised you got it done so fast.

I haven't carefully gone over the transaction commit / writeback / flush ordering in database.cc, which I think you probably want. I'll try taking a whole-file read soon, but I'd be very surprised if any bugs in there would avoid tripping a fatal assertion due to a missing db row.

Good testcase: download a cvs archive from sourceforge, import it, and sync it back and forth a few times.

Nit picking follows:

--- lru_writeback_cache.hh      7e0017442df3ec9e2a386acb3a424ffffd60a6a7
+++ lru_writeback_cache.hh      7e0017442df3ec9e2a386acb3a424ffffd60a6a7
+/**
+ * @brief Template cache with an LRU removal policy.
+ * @class LRUCache
+ *
+ * @par
+ * This template creats a simple collection of key-value pairs that grows
+ * until the size specified at construction is reached and then begins
+ * discard the Least Recently Used element on each insertion.

Modify this comment to describe the dirty set.

--- roster_delta.cc     1d44ccf2b5bf2911ce0fe1264bed6a33c285173f
+++ roster_delta.cc     1d44ccf2b5bf2911ce0fe1264bed6a33c285173f
@@ -0,0 +1,483 @@
+#include <set>
+#include <map>
+
+#include <boost/lexical_cast.hpp>
+
+#include "safe_map.hh"
+#include "parallel_iter.hh"
+#include "roster_delta.hh"
+#include "basic_io.hh"
+#include "paths.hh"
+
+using boost::lexical_cast;
+
+namespace +{
+
+  struct roster_delta_t
+  {
+    typedef std::set<node_id> nodes_deleted_t;
+    nodes_deleted_t nodes_deleted;
+    typedef std::map<std::pair<node_id, path_component>, node_id> dirs_added_t;
+    dirs_added_t dirs_added;
+    typedef std::map<std::pair<node_id, path_component>, std::pair<node_id, 
file_id> > files_added_t;
+    files_added_t files_added;
+    typedef std::map<node_id, std::pair<node_id, path_component> > 
nodes_renamed_t;
+    nodes_renamed_t nodes_renamed;
+    typedef std::map<node_id, file_id> deltas_applied_t;
+    deltas_applied_t deltas_applied;
+    typedef std::set<std::pair<node_id, attr_key> > attrs_cleared_t;
+    attrs_cleared_t attrs_cleared;
+    typedef std::set<std::pair<node_id, std::pair<attr_key, std::pair<bool, attr_value> 
> > > attrs_changed_t;
+    attrs_changed_t attrs_changed;

Stylistic: I think I'd prefer all the typedefs first, then the slot defs. Also wrap at 80 cols please.

Semantic: I'm slightly concerned that this is the non-invertable representation. The version I was working on (which, granted, is horribly incomplete) was invertable. You can only reconstruct backwards. Does this concern you at all? The only inversion cases I could imagine were just that: imaginary. I guess it's private so we can always revisit.

+
+    // nodes_deleted are automatically removed from the marking_map; these are
+    // all markings that are new or changed
+    typedef std::map<node_id, marking_t> markings_changed_t;
+    markings_changed_t markings_changed;
+
+    void
+    apply(roster_t & roster, marking_map & markings) const;
+  };
+
+  void
+  roster_delta_t::apply(roster_t & roster, marking_map & markings) const
+  {
+    // detach everything that should be detached
+    for (nodes_deleted_t::const_iterator i = nodes_deleted.begin(); i != 
nodes_deleted.end(); ++i)
+      roster.detach_node(*i);
+    for (nodes_renamed_t::const_iterator i = nodes_renamed.begin(); i != 
nodes_renamed.end(); ++i)
+      roster.detach_node(i->first);
+    // delete the delete-able things
+    for (nodes_deleted_t::const_iterator i = nodes_deleted.begin(); i != 
nodes_deleted.end(); ++i)
+      roster.drop_detached_node(*i);
+    // add the new things
+    for (dirs_added_t::const_iterator i = dirs_added.begin(); i != 
dirs_added.end(); ++i)
+      roster.create_dir_node(i->second);
+    for (files_added_t::const_iterator i = files_added.begin(); i != 
files_added.end(); ++i)
+      roster.create_file_node(i->second.second, i->second.first);
+    // attach everything
+    for (dirs_added_t::const_iterator i = dirs_added.begin(); i != 
dirs_added.end(); ++i)
+      roster.attach_node(i->second, i->first.first, i->first.second);
+    for (files_added_t::const_iterator i = files_added.begin(); i != 
files_added.end(); ++i)
+      roster.attach_node(i->second.first, i->first.first, i->first.second);
+    for (nodes_renamed_t::const_iterator i = nodes_renamed.begin(); i != 
nodes_renamed.end(); ++i)
+      roster.attach_node(i->first, i->second.first, i->second.second);

Stylistic: a bit more whitespace and wrapping, please. Also I'm shifting commenting style towards full sentences starting with a capital letter and ending with a period. I know, ghastly. What's happening?!

+
+    // roight, all that tricky tree-rearranging done, just have to do some
+    // individual node edits now

"roight"?

+  void
+  delta_only_in_to(node_t new_n, roster_delta_t & d)
+  delta_in_both(node_t old_n, node_t new_n, roster_delta_t & d)

Smells like duplicated code from roster.cc. Is it refactorable?

+    for (roster_delta_t::nodes_deleted_t::const_iterator i = 
d.nodes_deleted.begin(); i != d.nodes_deleted.end(); ++i)
+    for (roster_delta_t::nodes_renamed_t::const_iterator i = 
d.nodes_renamed.begin(); i != d.nodes_renamed.end(); ++i)
+    for (roster_delta_t::dirs_added_t::const_iterator i = 
d.dirs_added.begin(); i != d.dirs_added.end(); ++i)

... wrap.

--- roster_delta.hh     1939a0f9715abe7b3e01f3c6887fecd5fbb05323
+++ roster_delta.hh     1939a0f9715abe7b3e01f3c6887fecd5fbb05323
@@ -0,0 +1,30 @@
+#ifndef __ROSTER_DELTA_HH__
+
+//////////
+// Experimental roster delta stuff

Axe this comment.

         lru_cache.h                                                    \
+        lru_writeback_cache.h                                          \

Do we really need lru_cache.h to exist? Why not use lru_writeback_cache in all cases and ignore the dirty bits when irrelevant?

-#undef _REENTRANT
-#include "lru_cache.h"
+#include "lru_cache.hh"

If you're using lru_cache.hh (not .h), make sure you change the makefile and commit it. Or just toss lru_cache.{h,hh} as discussed above.

@@ -77,9 +78,10 @@ namespace
 {
   struct query_param
   {
-    enum arg_type { text, blob };
+    enum arg_type { text, blob, s64 };
     arg_type type;
     string data;
+    ::s64 integer;
   };

Hm. I accept that sqlite prefers signed integers, but I sure don't. Can we push the conversion to and from unsigned down inside this interface, so it only happens in database.cc?

@@ -748,6 +784,12 @@ database::fetch(results & res,
                               SQLITE_STATIC);
           }
           break;
+        case query_param::s64:
+          {
+            sqlite3_bind_int64(i->second.stmt(), param,
+                               idx(query.args, param - 1).integer);
+          }
+          break;
         default:
           I(false);
         }

I guess here, in particular. You could isolate it in one line here.

@@ -783,10 +825,19 @@ database::fetch(results & res,
i->second.count++; - E(want_rows == any_rows || want_rows == nrow,
-    F("wanted %d rows got %d in query: %s") % want_rows % nrow % 
query.sql_cmd);
+  I(want_rows == any_rows || want_rows == nrow);
 }

Is there a reason for this hunk? I prefer the former message: more detail for diagnostics.

+unsigned long
+database::roster_size_estimator::operator()(cached_roster const & cr)
+{
+  I(cr.first);
+  I(cr.second);
+  // do estimate using a totally made up multiplier, probably wildly off
+  return cr.first->all_nodes().size() * 175;
+}

Maybe pick this more precisely and move it to constants.{cc,hh} (more generally, maybe those numbers should be come hooks). For example: Assume dir nodes make up a trivial portion of the roster, and file nodes generally have no attributes. Then:

       - 40 bytes of sha1
       - average path component length, say 10 bytes
       - 5 * sizeof(void*) for internal fields
         (+/- a factor of 2 depending on how strings, shared_ptrs, maps
          are implemented)

Should be more like 70-80 bytes.

-database::remove_version(hexenc<id> const & target_id,
-                         string const & data_table,
-                         string const & delta_table)

I wonder if it's wise to remove this function. Isn't it useful?

--- database.hh 5fd8600ce422d017284baa8fa0e38b6899a32e02
+++ database.hh 441546750699828b5f3c25d69e27427009b29d6d
@@ -30,6 +30,9 @@ int sqlite3_finalize(sqlite3_stmt *);
 #include "selectors.hh"
 #include "vocab.hh"
+// hmm, would be better not to include this everywhere...
+#include "lru_writeback_cache.hh"
+

Indeed. Maybe move the caches to a pimpl (can be done later, I don't mind).

Also might want to take the opportunity to remove the hilarious comment about reverse deltas being the most confusing part of the program. Hohoho happy days.

============================================================
--- numeric_vocab.hh    87e847c2ff9af6c90dbe248c244859e4cab7d34e
+++ numeric_vocab.hh    319e12738bcd5ad090a7a0d994a10f9ab7a2a8ff
@@ -20,6 +20,11 @@ typedef uint64_t u64;
 typedef uint32_t u32;
 typedef uint64_t u64;
+typedef int8_t s8;
+typedef int16_t s16;
+typedef int32_t s32;
+typedef int64_t s64;
+

Signed integral types are the devil.

--- roster.cc   d03806d489e8b30fc4564b9a4f025669770403d6
+++ roster.cc   4996a8586b27a298548151e97c3b59e966bcb704
+void
+roster_t::set_delta(node_id nid, file_id const & new_id)

s/set_delta/set_content/

@@ -2439,7 +2486,6 @@ parse_marking(basic_io::parser & pa,
           pa.str(k);
           pa.hex(rev);
           attr_key key = attr_key(k);
-          I(n->attrs.find(key) != n->attrs.end());
           safe_insert(marking.attrs[key], revision_id(rev));
         }

Slightly nervous about this. Why is it ok?

--- roster.hh   43a7100415b7fe11037a50fbd8cdd0cf260b926d
+++ roster.hh   2f33287325fd542a21217aeba673015b0c271912
@@ -14,6 +14,7 @@
#include <boost/shared_ptr.hpp> +#include "basic_io.hh"

You can just forward-declare the namespace basic_io with incomplete types parser and printer in it.

Otherwise looks good.

-graydon





reply via email to

[Prev in Thread] Current Thread [Next in Thread]