monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] automate sync remote process startup


From: Thomas Keller
Subject: Re: [Monotone-devel] automate sync remote process startup
Date: Thu, 07 Oct 2010 15:59:57 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; de; rv:1.9.1.11) Gecko/20100714 SUSE/3.0.6 Lightning/1.0b2pre Thunderbird/3.0.6

Am 07.10.2010 15:17, schrieb Stephen Leake:
> Thomas Keller <address@hidden> writes:
> 
>> Am 06.10.10 14:56, schrieb Stephen Leake:
>>> Thomas Keller <address@hidden> writes:
>>>
>>>> Am 02.10.2010 11:30, schrieb Stephen Leake:
>>>>> [...] So this is an Emacs DVC problem, not a mtn problem.
>>>>>
>>>>> Which means that the nvm.netsync.automate branch is ready, except the
>>>>> --dryrun stuff is not tested or documented; I'll work on that.
>>>>
>>>> Coolness, drop me a note when you finished it and I'll review it. This
>>>> branch will then contain all the previous work from Tim, right?
>>>
>>> This is now complete, and all tests pass on Mingw and Debian (Cygwin
>>> tests running, very slowly with some failures).
>>>
>>> This includes Tim's work on 'sync --dry-run', and my work on basic_io
>>> output from 'automate sync'.
>>
>> My observations so far:
>>
>> 1) Even though --dryrun is given, mtn gives a lot of unneeded chatter,
>> especially the following:
>>
>> mtn: bytes in | bytes out | revs in | revs out
>> mtn:   10.1 k |    13.4 k |       0 |        0
>> mtn: successful exchange with <server>
> 
> The bytes in/out is meaningful, since you are talking to the server.
> The revs in/out is not needed, but I'm not sure it's harmful; it's at
> least familiar :).

Yes, it looks indeed familiar, but since these should always keep at 0,
the information there is redundant, or better, useless. Its not a big
dealbreaker for me, it was just unexpected that this ticker line popped
up at all.

>> mtn: note: your workspace has not been updated
> 
> I vote for eliminating this entirely from sync. Which workspace is it
> talking about (I have many that are potentially impacted by every sync),
> and why would I expect it to be updated?

Its the workspace you're currently in. If you pull "some.branch" and
your workspace is at the head of this branch, mtn pull --update will
pull the branch and update your workspace right along. I find this very
useful and already used it many times.

> Currently, 'pull' does 'maybe_do_update', but 'sync' does not. Why?

I don't know, maybe this is a bug? Anyways, the
"maybe_workspace_updater" shouldn't even be called in dry-run mode, that
was my whole point :)

>> Could we prevent the initialization of the transfer tickers 
> 
> I'll see if I can figure this out.

Thanks!

>> 3) If multiple branch certs are attached to a single revision, this
>> revision of course pops up (and is counted) multiple times in the final
>> result. Would it make a lot of work to group these a little smarter,
>> i.e. not counting revisions by any single, but multiple branch?
>> Something which would lead to an output like
>>
>> mtn: would send 5 revisions:
>> mtn:         2 in branches first.branch,
>> mtn:                       second branch
>> mtn:         3 in branch first branch
> 
> I find this confusing. Mostly because it's not what the current netsync
> hooks in examples/display_branches.lua do.
> 
> For me, the main use of the sent rev count per branch is to remind me
> that my work actually got to the central server, so I can expect other
> people to see it. Similarly, if someone tells me they don't see
> something, I run sync, and if I see the "sent branch" message, I can say
> "oh, I forgot to sync". 
> 
> The exact count doesn't matter much.

I took a look in my crystal ball and foreseed dozens of user questions
why the sum of revisions in the list does not match the overall sum on
top... You're right that the current hooks don't provide this fancy
grouping, but I'd still find it nice if it would work like this. Again,
not a big deal breaker, as long as not dozens of users complain...

>> 4) For nvm.basic_io-doc Tim wrote in monotone.texi:
>>
>>   Each "symbol" is followed by one or more 'string's and/or 'hex'es.
>>
>> The basic_io format for sync / push output is not correct by this
>> definition, as the "send" and "receive" symbols are not followed by one
>> string / hash. (see also
>> http://code.monotone.ca/p/monotone/issues/85/)
> 
> Good point. 
> 
> This format is supported directly by the basic_io stanza class; it has
> push_symbol.
> 
> It is used in conflict resolutions, for 'resolved_internal'.

Oh well, we should really have defined the format earlier... I didn't
stumble upon this in the conflicts code because I haven't implemented
that yet in guitone. Given that this is public already in other
commands, we should probably adapt the description in Tim's branch.

>> We could make it ourselves easy now and just change the documentation
>> accordingly, but are value-less, symbol-only lines really something we
>> want to have / introduce?
> 
> I think they are important in the current case. How would you change the
> output to avoid them? We'd need 'receive' or 'send' in every line:
> 
> receive revision "0"
> receive     cert "1"
> receive      key "1"
>     
> send revision "1"
> send     cert "6"
> send      key "1"
> send   branch "foo2" "1"

I'd probably do receive_revision, send_revision, aso. as symbols (see
below).

>> 5) Likewise the non-dry-run format "symbol symbol" which is used like
>> "receive revision" or "send key" is not defined either. While the set of
>> possible values is predefined here, it might be better to just put them
>> in double quotes.
> 
> This format is supported directly by the basic_io stanza class; it has
> push_str_pair (symbol, symbol) (which actually pushes a symbol pair).
>     
> Conflict resolutions use a symbol pair as well, for the conflict type.
> 
> I don't see other uses of symbol pairs in the code.
> 
> So all of the violations of the (then undocumented) basic-io format were
> written by me :).

While I can see the use case for single-standing symbols, multiple
symbols are making everything just a bit too complex (also for parsers).
Symbols are usually used to identify a specific line and its data. If
there are multiple symbols in a line, multiple values and multiple
hashes, how would a proper data structure for that look like? A struct
with three vector members? Oh god, please no :-/ - we should really
restrict the format - also in monotone's code.

We also have no "context" notion in this format - and I doubt we want to
introduce something like

        foo

        bar "1"
        baz "2"

equals

        foo bar "1"
        foo baz "2"

right?

> Clearly it is useful for the parser to be able to make a distinction
> between symbols and strings; symbols have a finite set of values, that
> can be checked. If we change the second symbol to a string syntax, the
> higher level semantics still says it has a finite set of values, but I
> think having the syntax say it as well is a good idea.
> 
> It doesn't complicate the parser much.
> 
> At this point, I don't think we should change the conflict resolution
> format. Although I suspect Emacs DVC is the only thing that actually
> parses it, so the change would be manageable.
> 
> So I'm recommending changing nvm.basic-io-doc, and apologize for not
> realizing it was violated when I reviewed it earlier this week.

I just see that it becomes sufficiently harder to write a proper
basic_io parser which allows easy lookup of the data in question.
Wouldn't you also say that a more restrictive, "flat" data format would
ease things here?

>> 6) Under what circumstances are received certs listed under "received
>> cert" and at what time under "received revision"? If only a certificate
>> is transferred, but not the revision of this certificate? 
> 
> Correct.
> 
>> Do we really need a cert-grouping-by-revision in first instance or
>> wouldn't it be enough to have one format where each cert stanza always
>> gets an identifying
>>
>>      revision [123abc]
>>
>> at the end? 
> 
> This was partly motivated by what examples/display_branches.lua
> currently does. But it's also the prefered display grouping for my use
> case.
> 
> My use case for the sync output is to allow the user to determine what
> workspaces need to be updated, or newly checked out.
> 
> Certs that accompany revisions are irrelevant in that use case; in the
> current format, they are easily ignored.
> 
> Branch certs that do not accompany revisions indicate new branches, that
> need to be newly checked out.
> 
> Other certs are simply ignored.

I just headed for a less complex format specification here again
basically. I greatly prefer to parse a simple text-based format and make
logic assumptions on the consumed data later on. Here, a lot of logic is
already hidden inside the format itself, which makes the parser complex.

>> Maybe it would also be a good idea to make the format a bit more flat,
>> i.e.
>>
>> send_revision [123abc...]
>>
>> send_cert "branch"
>>     value "net.venge.monotone"
>>       key [123abc...]
>>  revision [123abc...]
>>
>> receive_revision [123abc...]
>>
>> receive_cert "author"
>> ....
>>
>> so we and the implementor has to rely less on the particular placement
>> of the information. 
> 
> With this flat format, the front-end has to do the sorting that is
> currently done by the code in cmd_netsync.cc print_info_auto. That code
> is almost a duplicate of code in network/netsync_session.cc
> netsync_session::on_end.
> 
> I did start with a similar flat format, but changed it to the nested
> one, because that sorting code was easier to write in C++ than in Emacs
> lisp.

Don't take my wrong sorting in the above use case for granted, we could
still print out the certs sorted, even grouped. My point was more to
strip down the complexity of the format a little.

>> 8) Please read again over the "Output format" section in monotone.texi -
>> some sentences sound incomplete to me, but then I'm not a native speaker
>> either, so don't hit me if everything is correct :)
> 
> You are right; fixed now. 
> 
> Some of the other 'output format' sections have a similar style
> (non-sentences), but they are much shorter, so it works better there.

Ok, nice, then it was not me this time :)

>> 9) For the sake of consistency - would it make sense / is it possible to
>> unify the basic_io output for the dry-run and non-dry-run version when
>> we know what we want to push somewhere? I understand that we have no
>> information of incoming items we just haven't received yet, but we
>> should surely have the information for the outgoing items, right?
> 
> It depends on the use case for dry-run. I'm not clear what that is; the
> best I can think of is "do I have time to do this push over this slow
> cell-phone modem, or should I wait till I get to a better connection"?
> For that, all you need is some sense of how much data is involved. Of
> course, we could compute the actual number of bytes to transfer, which
> would be more useful.
> 
> But I guess for automate, we should be consistent, and err on the side
> of outputting too much data; we assume there is a front end that filters
> and reformats.
> 
> So I'll look at this.
> 
> Ah; another use case; put up the list of stuff that would be
> transferred, and let the user modify the branch pattern before doing the
> actual sync. In this case, the names of the received branches are more
> important (for example monotone.ca?* vs
> monotone.ca?net.venge.monotone*). Would it be possible to get the
> to-be-received branch names?

I asked Tim something similar already, but I doubt so, because we can't
print the data before we actually received them. So what I was heading
for here was a unification of push / pull / sync in no-dry-run and push
/ sync in dry-run mode, because in the latter use case we should have
all the local data available. For push / sync in dry-run mode all we can
probably really print are the raw numbers. To make the format less
distinct it _might_ be a good idea to print these data also for the
non-dry-run mode, so implementors don't have to count the received cert
stanzas if they're just after the raw numbers and they could also have a
check whether they processed everything.

My main point is to make the output format of push/pull/sync as
consistent as possible for both, the dry-run and the non-dry-run use case.

Thomas.

-- 
GPG-Key 0x160D1092 | address@hidden | http://thomaskeller.biz
Please note that according to the EU law on data retention, information
on every electronic information exchange might be retained for a period
of six months or longer: http://www.vorratsdatenspeicherung.de/?lang=en


Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]