Re: [Monotone-devel] Re: [patch] roster-deltas (40% pull speedup)
From: Nathaniel Smith
Subject: Re: [Monotone-devel] Re: [patch] roster-deltas (40% pull speedup)
Date: Wed, 6 Sep 2006 14:46:30 -0700
User-agent: Mutt/1.5.12-2006-07-14
On Wed, Sep 06, 2006 at 01:47:58AM -0700, Graydon Hoare wrote:
> Nathaniel Smith wrote:
>
> >Thanks for looking it over -- I know you don't have much time these
> >days :-).
>
> s/have/make/. It'd be dishonest to claim innocence; I'm just not
> allocating as much time for this stuff anymore.
A philosophical difference, perhaps... if you don't want to do it, you
shouldn't, and people can hardly be blamed for what they find they
prefer to spend time on :-).
> >Well, it's a tricky bit of state tracking. For instance, at one
> >point, I had converted things so that rosters were indexed by
> >roster_id, but hadn't actually changed their storage format or
> >anything yet. I was still caching them in the vcache. I only
> >realized much later that this meant there was a major bug -- because
> >rollback() does not flush the vcache, I could have left stale rosters
> >that never existed in there... this doesn't apply to files in the
> >vcache, because they're indexed by content, but it did to rosters once
> >I switched them over to being indexed by...index.
>
> Well, I don't immediately see any places you're not handling. It's late
> and there's an increasing amount of state in this class, so I'm not very
> confident in that assessment, but I don't know what else to do. I read
> through the methods and their sub-methods using etags. It seems coherent.
>
> I mean, you put them in the write-through cache during put_roster, you
> flush them to disk on commit, you cancel them all on rollback, you
> consult the cache on get_roster_version and populate it on cache-misses.
> Sounds sufficient to me.
'kay; sounds sufficient to me too, but it's always good to get a
sanity check.
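For anyone following along, the put/flush/cancel discipline described above can be sketched like so. This is purely illustrative (all names here are hypothetical, not monotone's actual cache or database classes); it just shows the write-back lifecycle under discussion, including why rollback() must also evict dirty entries:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy write-back cache: entries go in via put(), reach the "db" only on
// commit(), and dirty entries are discarded on rollback().
struct writeback_cache
{
  std::map<int, std::string> entries;  // cached rosters, by id
  std::set<int> dirty;                 // ids not yet written to the db
  std::map<int, std::string> & db;     // stands in for the sqlite tables

  explicit writeback_cache(std::map<int, std::string> & d) : db(d) {}

  // put_roster: insert into the cache and mark dirty (write-back)
  void put(int id, std::string const & data)
  {
    entries[id] = data;
    dirty.insert(id);
  }

  // get_roster_version: consult the cache, populate on a miss
  std::string get(int id)
  {
    std::map<int, std::string>::iterator i = entries.find(id);
    if (i != entries.end())
      return i->second;
    std::string data = db.at(id);  // cache miss: read through
    entries[id] = data;            // populate, entry is clean
    return data;
  }

  // COMMIT: flush every dirty entry to disk before the transaction ends
  void commit()
  {
    for (int id : dirty)
      db[id] = entries[id];
    dirty.clear();
  }

  // ROLLBACK: cancel pending writes -- if we only cleared the dirty set
  // but left the entries, we would have exactly the stale-roster bug
  // described earlier in this thread.
  void rollback()
  {
    for (int id : dirty)
      entries.erase(id);
    dirty.clear();
  }
};
```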
> Three points stand out:
>
> 1. I'm nervous about handing out shared pointers to cache entries.
> I see they're const and all, so realistically I can't imagine
> anything that'd go wrong... it's not like we had anything in
> the old code to prevent fetching soon-to-be-cancelled writes
> either, it just seems a bit ominous.
>
> Is it possible that the db invalidates a cache entry and some
> caller will keep the cache entry alive, thinking it's a real
> roster? If it does, what are the possible consequences?
This is a general problem with transactional stores, right, you might
do BEGIN/write/read/ROLLBACK and the read becomes invalid? I don't
have anything super useful to say about the problem in this particular
case -- I just tried to give the shared_ptr stuff exactly the same
semantics as one would get with a copying approach like we use
elsewhere (e.g., for files, for revisions proper, etc.).
> 2. The manifest-reconstruction stuff is still present. Soon this
> should all be deleted. Maybe now? I don't know. I'm surprised
> you kept it alive. If you're migrating the schema, why not just
> toss the manifest tables entirely? They're empty, no?
I got about half way through tearing it out as being completely
ancient, but then I realized that no, 'rosterify' actually needs the
manifest reconstruction stuff too. 0.30 seems a little early to break
migration from pre-0.26.
> 3. Why is the delayed_files map separate from the vcache? Shouldn't
> the two be unified (and dedicated to files alone, ditching
> manifests) now that you have a proper LRUWritebackCache?
Assuming we do want to keep manifests around for now, using a
writeback cache for files is a little bit tricky, because we'd have to
separate out the file and manifest logic somehow, or something...
Anyway, I call "defer!" on this: the current file/manifest logic is
just like it is on mainline, and if this branch creates new
opportunities, we don't have to take all of them before it lands.
> >That's the kind of bug I'm worried about -- somehow dropping a step in
> >the dance between roster_cache insertion, marking things clean/dirty,
> >and letting them eventually get written or not -- depending on how the
> >transaction resolves, and whether a delta gets written straight to the
> >db first. I _think_ I got all this stuff right, but I would feel
> >better if someone independently convinced themself of the same thing.
>
> Again, I think this is even less likely if you make the file cache use
> LRUWritebackCache too: identical code paths and relationship to
> transactions as roster cache, less opportunity for logic mismatch.
The file logic is already "known good", so I think this is okay for
now.
> If the delta gets "written straight to the db" it's still inside a
> transaction; it'll still get backed out if the transaction rolls back.
> The key is to ensure that everything in memory that might relate to the
> current transaction is written out before issuing "COMMIT", or discarded
> from cache before issuing "ROLLBACK". The fewer paths we have that cause
> write-out or cache-clearing, the easier this is to ensure.
>
> >The other place I'm worried for similar reasons is in the roster delta
> >code -- this is much simpler, but again, I'd appreciate it if someone
> >else independently read through and convinced themselves that we
> >aren't going to accidentally drop some field or whatever.
>
> One thing that stands out here is that you're detaching and attaching in
> random order, not bottom-up / top-down like in the editable tree code.
Yes -- this makes the code much simpler :-). Is this a problem? We
can get away with it because we are not screwing around with paths,
where interpreting one path requires that other nodes be in the proper
place.
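To spell out why random order is safe here: each detach/attach names its node directly by node id and carries its own (parent, name) attachment, so per-node changes commute -- unlike path-based edits, where a rename can only be interpreted once its new parent directory exists. A toy sketch (hypothetical types, not the real roster code):

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using node_id = int;

// A node's attachment point: who its parent is, and its name there.
struct attachment
{
  node_id parent;
  std::string name;
  bool operator==(attachment const & o) const
  { return parent == o.parent && name == o.name; }
};

using roster = std::map<node_id, attachment>;

// Apply a list of per-node changes.  Because each change addresses its
// node by id (not by path), the order of application does not matter.
roster apply(roster r, std::vector<std::pair<node_id, attachment>> changes)
{
  for (std::pair<node_id, attachment> const & c : changes)
    r[c.first] = c.second;
  return r;
}
```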
> You can add some "use std::foo" entries in here (especially make_pair)
> to cut down on some chatter.
>
> You have a couple long lines still that need wrapping.
Okay.
> It seems a bit wasteful to encode the whole marking of each node that
> changes, rather than just the aspect of the marking that changed. Most
> edits are patches. You're going to write out the parent+name marking and
> all the attr marks in each such delta. This makes the deltas likely
> twice as large as they need to be. Maybe gzip will handle that?
I was being lazy, yeah. Looking at 'db info' on nvm* both migrated
and not, I get:
                             mainline   .roster-delta
  number of full rosters:          65              64
  number of roster deltas:       9083            7397
  full roster bytes:          3408371         3452574
  roster delta bytes:         6813125         6121118
  total bytes:               10221496         9573692
Probably a lot of the size difference is due to .roster-delta being
more thrifty about creating multiple reconstruction paths to the same
roster, but there doesn't seem to be too bad a size overhead in any
case.
> Are the function names "delta_only_in_to" and "delta_in_both" really
> correct? It seems they're called when a given *node* is in to/both, not
> a delta. I.e. they should be called "node_only_in_to" and
> "node_in_both". (And really, using "to" as a noun sort of irritates me;
> can you call it "target", "dest" or such?)
Delta is intended as a verb, as in, perform the delta for a node that
is only on one side, or on both :-). But yeah, the names could be
better. Fixed.
> >Hm. I'm not sure. Basically, IIUC, the difference between signed and
> >unsigned is in:
> > 1) comparison operators
> > 2) widening coercion
> > 3) IO/printing/parsing
> >and in particular, arithmetic couldn't care less _which_
> >representatives us silly humans think we're using for our 2^n modulo
> >equivalence classes, the computer'll happily use the same bit strings
> >either way?
> >
> >(1) and (2) are irrelevant here, because these are just uninterpreted
> >blobs, we never _do_ anything with them (also, they're 64 bit, so not
> >much widening going on :-)).
> ...
> >However, I really do not care all that much, and am curious what your
> >reasons are.
>
> This is not much of an argument, and I don't *really* care in the sense
> that I'll hold you up. But you asked, so:
>
> I generally hate signed integers. They are an unmitigated source of
> evil: something to confine to a very small corner of a program, surround
> with an iron box, and glare at.
>
> Once you have any signed value floating around, C will do its best to
> infect everything else with signedness via promotion. It only takes one
> signedness-where-you-didn't-expect-it to cause all manner of
> memory-corrupting security bugs, from underflowing loop counters to
> negative array indices to unexpected sign-extending of "small" numbers
> when widened. I prefer to keep them in a place where they'll infect
> less, sort of like a plague.
>
> (Note that if you compile sqlite with any level of checking for this
> stuff, it's absolutely lousy with mixed-signedness arithmetic. Makes my
> skin crawl.)
Hmm, I have another idea -- looking at this code with fresher eyes,
I'm not sure why we have a roster_id at all. There is a 1-1
relationship between revisions and rosters, we always access rosters
via revisions, and the roster id is arbitrary anyway -- so no point in
allocating these ids and using a bridge table, let's just index
rosters by revision_id.
I just implemented this on .roster-delta.no-roster_id, and it seems to
be working... I'll merge it back into .roster-delta once the tests
finish running.
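The before/after of that simplification looks roughly like this (hypothetical structs standing in for the sqlite tables, not the actual schema): the bridge table plus allocated roster_id collapses into a single map keyed by revision_id, since the relationship is 1-1.

```cpp
#include <cassert>
#include <map>
#include <string>

using revision_id = std::string;

// Before: an arbitrary roster_id, allocated per roster, plus a
// revision -> roster_id bridge table.
struct bridged_store
{
  std::map<revision_id, int> revision_roster;  // bridge table
  std::map<int, std::string> rosters;          // keyed by allocated id
  int next_id = 0;

  void put(revision_id const & rev, std::string const & roster)
  {
    int id = next_id++;
    revision_roster[rev] = id;
    rosters[id] = roster;
  }
  std::string get(revision_id const & rev)
  { return rosters.at(revision_roster.at(rev)); }
};

// After: one table, indexed by revision_id directly; no id allocation
// and one fewer lookup per access.
struct direct_store
{
  std::map<revision_id, std::string> rosters;

  void put(revision_id const & rev, std::string const & roster)
  { rosters[rev] = roster; }
  std::string get(revision_id const & rev)
  { return rosters.at(rev); }
};
```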
> >>>-database::remove_version(hexenc<id> const & target_id,
> >>>- string const & data_table,
> >>>- string const & delta_table)
> >>I wonder if it's wise to remove this function. Isn't it useful?
> >
> >Its only use was so that 'db kill_rev_locally' could remove a
> >killed revision's roster (and potentially files?). I have a bit of a
> >passive aggressive relationship with this feature :-).
>
> Really? Hmm, istr having a use for it when we receive redundant storage
> edges, nuking one of the reconstruction paths to save space. I thought
> this was an important space optimization at one time. Is it no longer,
> or am I mis-remembering? Maybe that's "deltify"...
I dunno, maybe we did at some point, but it only had the one caller.
> >>>@@ -2439,7 +2486,6 @@ parse_marking(basic_io::parser & pa,
> >>> pa.str(k);
> >>> pa.hex(rev);
> >>> attr_key key = attr_key(k);
> >>>- I(n->attrs.find(key) != n->attrs.end());
> >>> safe_insert(marking.attrs[key], revision_id(rev));
> >>> }
> >>Slightly nervous about this. Why is it ok?
> >
> >Because the only caller for ros.parse_from is read_roster_and_marking,
> >which immediately calls roster.check_sane_against(marking), which does
> >the same check.
>
> Eh. Unless profiles say it's expensive, I prefer leaving O(1) and
> O(lg(N)) sanity checks in cold code paths. Suppose the path gains
> another caller someday.
The problem is that when parsing a roster_delta, I don't have the
roster it produces. So I can't sanity check the marking_t against
that roster as I parse it.
> Otherwise looks good. Very nice work. I wish I could grant you 100%
> confidence that it's correct, but I can't. All I can say is that I've
> *certainly* pushed buggier code before; I don't have a recipe for
> avoiding bugs. "make backups" (seriously: save a database to a $2 DVD
> every few months, you'll be glad you did someday)
Sure, we do what we can...
I already added a sanity check on every roster before we put it
into the cache; I think I'll add a check of the manifest_id too
(using revision_ids to index rosters makes this easier), and
then call it done.
-- Nathaniel
--
"If you can explain how you do something, then you're very very bad at it."
-- John Hopfield