monotone-devel

Re: [Monotone-devel] automate sync remote process startup


From: Stephen Leake
Subject: Re: [Monotone-devel] automate sync remote process startup
Date: Thu, 07 Oct 2010 18:16:27 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (windows-nt)

Thomas Keller <address@hidden> writes:

>>> 3) If multiple branch certs are attached to a single revision, this
>>> revision of course pops up (and is counted) multiple times in the final
>>> result. Would it make a lot of work to group these a little smarter,
>>> i.e. not counting revisions by any single, but multiple branch?
>>> Something which would lead to an output like
>>>
>>> mtn: would send 5 revisions:
>>> mtn:         2 in branches first.branch,
>>> mtn:                       second branch
>>> mtn:         3 in branch first branch
>> 
>> I find this confusing. Mostly because it's not what the current netsync
>> hooks in examples/display_branches.lua do.
>> 
>> For me, the main use of the sent rev count per branch is to remind me
>> that my work actually got to the central server, so I can expect other
>> people to see it. Similarly, if someone tells me they don't see
>> something, I run sync, and if I see the "sent branch" message, I can say
>> "oh, I forgot to sync". 
>> 
>> The exact count doesn't matter much.
>
> I took a look in my crystal ball and foresaw dozens of user questions
> about why the sum of revisions in the list does not match the overall
> sum on top...

Good point. For now, I'll mention that in monotone.texi. 

Note that we haven't gotten those dozens of questions from people using
examples/display_branches.lua. But there are probably more people who
will use --dry-run, since it doesn't involve specifying hook code.
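A minimal Python sketch of why the per-branch counts need not add up to the total. The `rev_branches` mapping and the revision ids are made up for illustration; this is not how monotone computes the counts internally.

```python
from collections import defaultdict


def per_branch_counts(rev_branches):
    """Count revisions per branch.

    A revision with several branch certs is counted once for each
    of its branches, so the counts can sum to more than the total.
    """
    counts = defaultdict(int)
    for branches in rev_branches.values():
        for branch in set(branches):
            counts[branch] += 1
    return dict(counts)


# Hypothetical data: 5 revisions, 2 of which carry two branch certs.
rev_branches = {
    "r1": ["first.branch", "second.branch"],
    "r2": ["first.branch", "second.branch"],
    "r3": ["first.branch"],
    "r4": ["first.branch"],
    "r5": ["first.branch"],
}

counts = per_branch_counts(rev_branches)
total = len(rev_branches)            # number of distinct revisions
branch_sum = sum(counts.values())    # sum of the per-branch lines
```

Here the per-branch lines sum to more than the number of distinct revisions, which is the mismatch users are likely to ask about.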

>>> 4) For nvm.basic_io-doc Tim wrote in monotone.texi:
>>>
>>>   Each "symbol" is followed by one or more 'string's and/or 'hex'es.
>>>
>>> The basic_io format for sync / push output is not correct by this
>>> definition, as the "send" and "receive" symbols are not followed by one
>>> string / hash. (see also
>>> http://code.monotone.ca/p/monotone/issues/85/)
>> 
>> Good point. 
>> 
>> This format is supported directly by the basic_io stanza class; it has
>> push_symbol.
>> 
>> It is used in conflict resolutions, for 'resolved_internal'.
>
> Oh well, we should really have defined the format earlier... I didn't
> stumble upon this in the conflicts code because I haven't implemented
> that yet in guitone. Given that this is public already in other
> commands, we should probably adapt the description in Tim's branch.

Ok, I'll do that.

>>> We could make it easy for ourselves now and just change the documentation
>>> accordingly, but are value-less, symbol-only lines really something we
>>> want to have / introduce?
>> 
>> I think they are important in the current case. How would you change the
>> output to avoid them? We'd need 'receive' or 'send' in every line:
>> 
>> receive revision "0"
>> receive     cert "1"
>> receive      key "1"
>>     
>> send revision "1"
>> send     cert "6"
>> send      key "1"
>> send   branch "foo2" "1"
>
> I'd probably do receive_revision, send_revision, aso. as symbols (see
> below).

Hadn't thought of that. I already had 'revision', 'cert', 'key' for the
non-dry-run output. However, that counts as complicating the parser in
my design; it already handles multiple symbols per line, this is just
adding to the number of possible symbols.

>>> 5) Likewise the non-dry-run format "symbol symbol" which is used like
>>> "receive revision" or "send key" is not defined either. While the set of
>>> possible values is predefined here, it might be better to just put them
>>> in double quotes.
>> 
>> This format is supported directly by the basic_io stanza class; it has
>> push_str_pair (symbol, symbol) (which actually pushes a symbol pair).
>>     
>> Conflict resolutions use a symbol pair as well, for the conflict type.
>> 
>> I don't see other uses of symbol pairs in the code.
>> 
>> So all of the violations of the (then undocumented) basic-io format were
>> written by me :).
>
> While I can see the use case for single-standing symbols, multiple
> symbols are making everything just a bit too complex (also for parsers).
> Symbols are usually used to identify a specific line and its data. 

Yes, but what's wrong with a symbol also being data?

> If there are multiple symbols in a line, multiple values and multiple
> hashes, what would a proper data structure for that look like? A struct
> with three vector members? Oh god, please no :-/ - we should really
> restrict the format - also in monotone's code.

I agree that one line of a stanza should provide one value.

Note that in the dry run output we have 'branch "foo" "1"', which is
multiple strings.

'automate inventory' outputs 'changes "content" "attrs"'; 'changes' is a
list.

Can't find any others at the moment.

> We also have no "context" notion in this format - and I doubt we want to
> introduce something like
>
>       foo
>
>       bar "1"
>       baz "2"
>
> equals
>
>       foo bar "1"
>       foo baz "2"
>
> right?

Not explicitly. But I did introduce the notion of context for the
automate sync output; revision, cert, and key stanzas have different
meaning when they occur after a 'receive' stanza, rather than after a
'send' stanza. I don't see a problem with that. Context is a standard
notion in parsers. In other automate commands, we state "the order of
the stanzas may change"; we are simply not making that claim for this
output. We are guaranteeing it won't change (at least at that level; the
order of revs within receive is not defined).
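A minimal Python sketch of context tracking as described above. It assumes the stanzas have already been parsed into lists of `(symbol, values)` lines, and that a stanza whose first line is a bare `receive` or `send` symbol switches the context; the exact stanza layout and the ids are assumptions made for illustration.

```python
def group_by_direction(stanzas):
    """Assign each stanza to the most recent 'receive' or 'send' marker.

    stanzas: list of stanzas, each a list of (symbol, values) lines.
    Relies on the stanza order being guaranteed by the producer.
    """
    result = {"receive": [], "send": []}
    context = None
    for stanza in stanzas:
        if not stanza:
            continue
        head_symbol, head_values = stanza[0]
        if head_symbol in result and not head_values:
            # A value-less 'receive' or 'send' line sets the context.
            context = head_symbol
            continue
        if context is None:
            raise ValueError("stanza seen before any receive/send marker")
        result[context].append(stanza)
    return result


# Hypothetical input; the ids are placeholders.
stanzas = [
    [("receive", [])],
    [("revision", ["aaaa"])],
    [("cert", ["bbbb"])],
    [("send", [])],
    [("key", ["cccc"])],
]
grouped = group_by_direction(stanzas)
```

The only state the parser carries is the current direction, which is why a guaranteed stanza order is enough.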

>> Clearly it is useful for the parser to be able to make a distinction
>> between symbols and strings; symbols have a finite set of values, that
>> can be checked. If we change the second symbol to a string syntax, the
>> higher level semantics still says it has a finite set of values, but I
>> think having the syntax say it as well is a good idea.
>> 
>> It doesn't complicate the parser much.
>> 
>> At this point, I don't think we should change the conflict resolution
>> format. Although I suspect Emacs DVC is the only thing that actually
>> parses it, so the change would be manageable.
>> 
>> So I'm recommending changing nvm.basic_io-doc, and apologize for not
>> realizing it was violated when I reviewed it earlier this week.
>
> I just see that it becomes significantly harder to write a proper
> basic_io parser which allows easy lookup of the data in question.
> Wouldn't you also say that a more restrictive, "flat" data format would
> ease things here?

Having already written the DVC parser that easily handles this format,
no :).

The parser needs to handle symbol, string, and hex tokens. In the DVC
parser, there is no restriction on token locations at the lexical level;
it can handle any of the three types at any position on a line. They are
returned as a list of (type, value) pairs. The semantic level then
imposes the token location rules, in accordance with the high-level
definition of the command-specific format.

I can see that if you attempt to do high-level validation in the lexical
level, allowing symbols at more places complicates things. But that's
why the semantic and lexical levels should be split.
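A minimal Python sketch of that split; it is not the DVC parser (which is Emacs Lisp), and the names `lex_line`, `check_line` and `VALUE_KINDS` are hypothetical. The lexer accepts symbol, string and hex tokens at any position; the semantic check then applies rules for the dry-run lines quoted earlier. The rule table is an assumption based on that example only.

```python
import re

# Lexical level: any of the three token kinds may appear anywhere.
TOKEN_RE = re.compile(r'''
    \s*(?:
        "(?P<string>(?:[^"\\]|\\.)*)"        # double-quoted string
      | \[(?P<hex>[0-9a-fA-F]*)\]            # hex value in brackets
      | (?P<symbol>[A-Za-z_][A-Za-z0-9_]*)   # bare symbol
    )
''', re.VERBOSE)


def lex_line(line):
    """Return the line as a list of (type, value) pairs."""
    tokens = []
    line = line.rstrip()
    pos = 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:
            raise ValueError(f"bad token at column {pos}: {line!r}")
        kind = m.lastgroup
        value = m.group(kind)
        if kind == "string":
            value = re.sub(r'\\(.)', r'\1', value)  # undo backslash escapes
        tokens.append((kind, value))
        pos = m.end()
    return tokens


# Semantic level: value kinds expected after each item symbol
# (assumed from the dry-run example, not taken from monotone).
VALUE_KINDS = {
    "revision": ["string"],
    "cert": ["string"],
    "key": ["string"],
    "branch": ["string", "string"],
}


def check_line(tokens):
    """Impose the token location rules for one dry-run line."""
    if len(tokens) < 2:
        raise ValueError("expected a direction and an item symbol")
    (kind0, direction), (kind1, item) = tokens[0], tokens[1]
    if kind0 != "symbol" or direction not in ("receive", "send"):
        raise ValueError(f"bad direction: {direction!r}")
    if kind1 != "symbol" or item not in VALUE_KINDS:
        raise ValueError(f"bad item: {item!r}")
    rest = tokens[2:]
    if [k for k, _ in rest] != VALUE_KINDS[item]:
        raise ValueError(f"bad values for {item!r}")
    return direction, item, [v for _, v in rest]


tokens = lex_line('send   branch "foo2" "1"')
parsed = check_line(tokens)
```

Allowing a new symbol position only touches the rule table; the lexer stays unchanged.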

> Don't take my wrong sorting in the above use case for granted, we could
> still print out the certs sorted, even grouped. My point was more to
> strip down the complexity of the format a little.

Good point; the sort order does not have to be implied by the format,
and the format does not have to take advantage of the sort order.

I would not mind a flatter format that maintained the sorting, if it
really does make Guitone easier. Then your parser can assume no context,
but my parser can still rely on the context defined by the sorting :).
That's not a big change in my parser; just redefining some strings, and
ignoring revs in some certs.
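A minimal Python sketch of what a context-free reader of a flatter format could look like. It assumes compound symbols such as `receive_revision` and `send_cert`, as suggested in the quoted text; the final symbol names are not decided in this thread, and the ids are placeholders.

```python
from collections import Counter


def split_flat_symbol(symbol):
    """Split a compound symbol like 'send_cert' into (direction, item)."""
    direction, sep, item = symbol.partition("_")
    if direction not in ("receive", "send") or not sep or not item:
        raise ValueError(f"unexpected symbol: {symbol!r}")
    return direction, item


def tally(lines):
    """Count lines per (direction, item) without tracking any context.

    lines: iterable of (symbol, values) pairs.
    """
    return Counter(split_flat_symbol(symbol) for symbol, _ in lines)


# Hypothetical flat output, already lexed.
lines = [
    ("receive_revision", ["aaaa"]),
    ("send_revision", ["bbbb"]),
    ("send_cert", ["cccc"]),
    ("send_cert", ["dddd"]),
]
totals = tally(lines)
```

Each line carries its own direction, so the result does not depend on the order in which the lines arrive.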

That would also make it easier to add other sort orders (selected by
options), for other use cases, which would be good.

So I'll give that a try.

And I'll add some of this to nvm.basic_io-doc.

>> Ah; another use case; put up the list of stuff that would be
>> transferred, and let the user modify the branch pattern before doing the
>> actual sync. In this case, the names of the received branches are more
>> important (for example monotone.ca?* vs
>> monotone.ca?net.venge.monotone*). Would it be possible to get the
>> to-be-received branch names?
>
> I asked Tim something similar already, but I doubt so, because we can't
> print the data before we actually received them. 

Well, we could modify the server code and the netsync format to allow
sending just branch names. But let's not go there; not worth it just for
this use case.

> So what I was heading for here was a unification of push / pull / sync
> in no-dry-run and push / sync in dry-run mode, because in the latter
> use case we should have all the local data available. For push / sync
> in dry-run mode all we can probably really print are the raw numbers.
> To make the format less distinct it _might_ be a good idea to print
> these data also for the non-dry-run mode, so implementors don't have
> to count the received cert stanzas if they're just after the raw
> numbers and they could also have a check whether they processed
> everything.
>
> My main point is to make the output format of push/pull/sync as
> consistent as possible for both, the dry-run and the non-dry-run use
> case.

Right, I'll give that a try.

-- 
-- Stephe


