
Re: [Monotone-devel] url schemes


From: Timothy Brownawell
Subject: Re: [Monotone-devel] url schemes
Date: Sat, 22 Mar 2008 19:19:26 -0500

On Sat, 2008-03-22 at 16:19 +0100, Markus Schiltknecht wrote:
> Hi,
> 
> since I've been critiquing Timothy's current extensions of the URL 
> scheme, I think I need to try coming up with something better. Or at 
> least help in doing so. First of all, I've put together a list of URL 
> schemes we are using in and around monotone, including nuskool, which 
> probably is what we will use someday.
> 
> In the first part of the URL, we obviously encode the protocol and 
> the database location. Existing examples are:
> 
>   * file:/path/to/monotone/db.mtn
>   * ssh://host[:port]/path/to/monotone/db.mtn
> 
> And for mtndumb, we already have:
> 
>   * http[s]://host[:port]/path/to/repo
>   * [s]ftp://[user[:address@hidden:port]/path/to/repo
>   * file:/path  (or file:///path??)
> 
> Upcoming URLs to specify a database location might be:
> 
>   * mtn://host[:port]           (as proposed for netsync)
>   * http://host[:port]/path/to/scgi       (as in nuskool)
>   * xmpp:[//address@hidden/address@hidden
>             (as recently proposed on IRC - might somehow
>                               work with nuskool, someday)
>   * pgsql://user:address@hidden:port/database/schema
>                                        (pipe dreaming...)
> 
> 
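As a minimal Python sketch of this "first part", the standard urllib.parse module can split such database URLs into scheme, host, port and path. The DbLocation type, the parse_db_location function and the default-port table are hypothetical illustrations (4691 is monotone's usual netsync port). Note that file:/path and file:///path come out identical, which at least settles the "??" above as far as parsing goes.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

# Hypothetical default ports per scheme; 4691 is monotone's netsync port,
# the rest are the usual well-known ports.
DEFAULT_PORTS = {"mtn": 4691, "ssh": 22, "http": 80, "https": 443,
                 "ftp": 21, "sftp": 22}

class DbLocation(NamedTuple):
    scheme: str
    host: Optional[str]
    port: Optional[int]
    path: str

def parse_db_location(url: str) -> DbLocation:
    """Split a database URL into scheme, host, port and path."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when none is given explicitly.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return DbLocation(parts.scheme, parts.hostname, port, parts.path)
```

For example, parse_db_location("ssh://host:2222/path/to/db.mtn") yields scheme "ssh", host "host", port 2222 and path "/path/to/db.mtn".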
> Often, specifying a database isn't sufficient, because we want to 
> address only parts of the repository, e.g. only a certain branch, a single 
> revision, or even a single file delta.
> 
> Almost all of the above protocols support additional slashes and more 
> path components after the database. The only exception is pgsql, 
> which isn't really much of a standard URL scheme anyway, AFAICT. (In 
> the case of an underlying filesystem, i.e. file and ssh, it should be 
> possible to walk down the path and use the first monotone database or 
> monotone dumb data directory you find. That would only prevent you from 
> accessing a monotone database file within a dumb data directory, but 
> that wouldn't make much sense anyway.)
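A rough sketch of the "walk down the path" idea for file-backed URLs follows. It assumes, hypothetically, that a monotone database can be recognised by its SQLite file header and a dumb data directory by the DATA file from the mtndumb layout; neither check is taken from monotone itself. Because each directory is tested before anything inside it, a database inside a dumb data directory is never reached, which matches the limitation described above.

```python
import os
from typing import Optional, Tuple

SQLITE_MAGIC = b"SQLite format 3\x00"  # monotone databases are SQLite files

def is_monotone_db(path: str) -> bool:
    # Assumption: any SQLite file counts; a real check would inspect the schema.
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC

def is_dumb_dir(path: str) -> bool:
    # Assumption: a dumb data directory is recognised by its DATA file.
    return os.path.isdir(path) and os.path.isfile(os.path.join(path, "DATA"))

def split_at_repository(path: str) -> Optional[Tuple[str, str]]:
    """Walk down `path` one component at a time and return
    (repository_path, remaining_path) for the first match, or None."""
    components = [c for c in path.split("/") if c]
    prefix = "/" if path.startswith("/") else ""
    for i, comp in enumerate(components):
        prefix = os.path.join(prefix, comp) if prefix else comp
        if is_monotone_db(prefix) or is_dumb_dir(prefix):
            return prefix, "/".join(components[i + 1:])
    return None
```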

For ssh transport the remote system would have to send back the db path
it found, so we'd know where in the URL to start looking for parameters.
Putting parameters into the query string instead of appending them to
the path gets around this rather nicely.
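A minimal sketch of the query-string approach: everything before "?" is the database location and everything after it is parameters, so no server round trip is needed to find the split. The include and exclude parameter names are hypothetical; this thread does not define them.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlsplit

def split_db_and_params(url: str) -> Tuple[str, Dict[str, List[str]]]:
    """Separate the database location from its query-string parameters."""
    parts = urlsplit(url)
    # Drop query and fragment; what remains identifies the database.
    db = parts._replace(query="", fragment="").geturl()
    return db, parse_qs(parts.query)
```

With ssh://host/path/to/db.mtn?include=net.venge.monotone* the client knows, without asking the remote side, that the database is ssh://host/path/to/db.mtn.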

> Most protocol types also support an argument list, separated by &, but 
> not all of them. The exceptions are the dumb ones, which cannot parse 
> arguments because there's no clever server to process them. For pgsql, 
> arguments are often used to specify options for the database connection, 
> but as mentioned above, it's not really a standard, so we could certainly 
> use some monotone-specific arguments, if needed.

If the dumb server is being accessed through monotone, we can translate
the provided URL however we have to.

> Now, the question that started this discussion is: what should the rest 
> of the URL look like? IMO, we should look at existing and planned 
> use cases, then make sure they don't conflict with each other.
> 
> The only existing scheme for the rest of the URL comes from mtndumb. 
> However, that one uses a rather meaningless layout to retrieve data from 
> a repository. It looks like it was designed to resemble the merkle trie, 
> while still offering a good compromise on the number of round trips required:
> 
>   $DB/DATA
>   $DB/HASHES_
>   $DB/HASHES_??  (multiple times, where ?? are the first two hex chars)
>   ...
> 
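To illustrate the round-trip trade-off, here is a sketch that computes which mtndumb files a client would fetch for a set of wanted hashes. It assumes one HASHES_?? file per two-hex-character prefix, as read from the listing above, and does not model DATA; the function name is hypothetical. Under that assumption the number of requests is bounded by the index plus at most 256 prefix files, however many hashes are wanted.

```python
from typing import Iterable, List

def dumb_fetch_plan(db_url: str, wanted_ids: Iterable[str]) -> List[str]:
    """List the files a client would fetch to locate `wanted_ids`.

    Assumes (hypothetically) a top-level HASHES_ index plus one HASHES_??
    file per two-hex-character prefix; the DATA file is not modelled.
    """
    prefixes = sorted({h[:2].lower() for h in wanted_ids})
    # One request for the index, then one per distinct prefix (at most 256).
    return [f"{db_url}/HASHES_"] + [f"{db_url}/HASHES_{p}" for p in prefixes]
```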
> Then, there are the planned nuskool commands. Those are currently 
> encoded entirely in JSON. The HTTP client requests the same URL every 
> time, and encodes the query in JSON. ATM nuskool doesn't support branch 
> inclusion or exclusion patterns. The commands currently are:
> 
>   * inquiring revisions: asks the server if it has certain revisions
>   * getting descendants: querying the ancestry map of the server
>   * getting (pulling) a revision
>   * putting (pushing) a revision
>   * getting file data
>   * putting file data
>   * getting file delta
>   * putting file delta
> 
> 
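Since nuskool sends every command to the same URL, all dispatch information has to travel in the JSON body. The sketch below shows what that looks like; the command names and the "command"/"args" field layout are hypothetical, as nuskool's actual message format isn't given here.

```python
import json

# Hypothetical command names mirroring the list above; not nuskool's real ones.
COMMANDS = {"inquire", "descendants", "get_rev", "put_rev",
            "get_file_data", "put_file_data", "get_file_delta", "put_file_delta"}

def nuskool_request(command: str, **args) -> str:
    """Encode one command as the JSON body POSTed to the single nuskool URL."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    return json.dumps({"command": command, "args": args}, sort_keys=True)
```

For example, nuskool_request("inquire", revs=["a1", "b2"]) produces {"args": {"revs": ["a1", "b2"]}, "command": "inquire"}, and the URL itself carries no information about the request.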
> These are current facts and observations, or am I missing something 
> important?
> 
> Then, there are wishes and feature requests. I personally find the 
> following ones very compelling:
> 
>   * mtn itself should be able to talk to dumb servers
>   * it should be possible to do checkouts from remote databases
>   * mtn should feature a simple API for 3rd party tools

Isn't this what 'mtn automate' is supposed to be for?

>   * faster and firewall compatible protocol (covered by nuskool)

Faster is good, but it doesn't always make sense to tunnel everything
over HTTP. I think cert refinement in particular probably isn't a good
match.

> Taking all of that together, to me this smells very much like we need a 
> RESTful API. One which is easy to read, understand and remember, simple 
> to process and universally usable for all supported protocols (as far as 
> possible). What I have in mind would look somewhat like this:
> 
>   * GET $DB/capabilities: inquire the capabilities of that mtn repository
>             (e.g. whether arguments are supported or not)
>   * GET/PUT $DB/revision/$HASH/data: pull or push a revision
>   * GET/PUT $DB/file_data/$HASH: pull or push file data
>   * GET/PUT $DB/file_delta/$HASH: pull or push file delta
>   * GET $DB/branch/$BRANCHNAME/heads: get heads of $BRANCHNAME
>   * GET $DB/revision/$HASH/inquire: inquire *one* revision
>   * GET $DB/revision/$HASH/descendants: fetch descendants of a revision
> 
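A sketch of how a server (or a client-side translator for dumb transports) might dispatch the proposed resources. Paths are matched relative to $DB, which presumes the database location has already been split off; the handler names are hypothetical, and hashes are assumed to be 40-character lowercase hex ids.

```python
import re
from typing import Dict, Optional, Tuple

HASH = r"(?P<hash>[0-9a-f]{40})"  # assumed: 40-char lowercase hex ids

# (allowed methods, path pattern relative to $DB, hypothetical handler name)
ROUTES = [
    ({"GET"}, r"capabilities", "capabilities"),
    ({"GET", "PUT"}, rf"revision/{HASH}/data", "revision_data"),
    ({"GET", "PUT"}, rf"file_data/{HASH}", "file_data"),
    ({"GET", "PUT"}, rf"file_delta/{HASH}", "file_delta"),
    ({"GET"}, r"branch/(?P<branch>[^/]+)/heads", "branch_heads"),
    ({"GET"}, rf"revision/{HASH}/inquire", "inquire"),
    ({"GET"}, rf"revision/{HASH}/descendants", "descendants"),
]

def route(method: str, rel_path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (handler name, captured fields) or None if nothing matches."""
    for methods, pattern, handler in ROUTES:
        match = re.fullmatch(pattern, rel_path)
        if match and method in methods:
            return handler, match.groupdict()
    return None
```

Because every resource type has its own leading segment, adding a new one later is just another entry in ROUTES.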
> This might appear http centric, but think about it: ftp, file and ssh, 
> maybe even xmpp, all provide put and get methods in some way. 
> (Pushing to dumb servers might not work, at least not without some 
> additional processing on the server side. Or maybe proper authentication 
> support would let clients update the metadata on the dumb server?) And 
> since http is about the best-known protocol, what's bad about being 
> http centric? ;-)
> 
> For browsable protocols which support index files (like http[s] and 
> ftp[s]) we could offer those for the following URLs:
> 
>   * GET $DB/:    a listing of branches in the repo, general-purpose
>                  repository information and statistics, etc.
>   * GET $DB/revision/$HASH/: a browsable directory tree
>   * GET $DB/branch/$BRANCHNAME/: some branch information, maybe a graph
>                  with the most recent revisions, links to the branch
>                  heads and to sub-branches
> 
> And others, but you get the point...
> 
> 
> What's important to me is that these URL schemes are compatible with 
> one another. I would find it a wasted opportunity if we now specified:
> 
>   $DB/$BRANCHNAME[?$PATTERNS]
> 
> ..or similar for the mtn (i.e. netsync) protocol, because it certainly 
> conflicts with future extensions for other protocols.
> 
> While the following is longer to type, it's certainly more 
> cross-protocol compatible and wouldn't prevent future extensions:
> 
>   $DB/branch/$BRANCHNAME?PATTERNS
> 
> In other words: omitting that "branch" segment would prevent us from 
> providing other resources, or force us to use different URL schemes 
> for different protocols, e.g.:
> 
>   $DB/$BRANCHNAME for mtn://
> 
> but:
> 
>   $DB/branch/$BRANCHNAME for http://
> 
> ..which would certainly confuse people.
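The conflict can be shown in a few lines. In this hypothetical sketch, the prefixed layout reads the resource type from the first segment, while the flat $DB/$BRANCHNAME layout has to treat anything that isn't a known resource as a branch. A branch that happens to share a resource's name is then shadowed, and every resource type added later can shadow more branches.

```python
from typing import Tuple

# Hypothetical set of resource types, taken from the proposal above.
RESOURCES = {"capabilities", "revision", "file_data", "file_delta", "branch"}

def parse_prefixed(rest: str) -> Tuple[str, str]:
    """$DB/branch/$BRANCHNAME style: the first segment names the resource type."""
    kind, _, name = rest.partition("/")
    if kind not in RESOURCES:
        raise ValueError(f"unknown resource type: {kind}")
    return kind, name

def parse_flat(rest: str) -> Tuple[str, str]:
    """$DB/$BRANCHNAME style: anything that is not a known resource is
    taken to be a branch, so a branch named like a resource is unreachable."""
    head, _, tail = rest.partition("/")
    if head in RESOURCES:
        return head, tail
    return "branch", rest
```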

netsync doesn't take branch names, just patterns (some of which may be
equivalent to branch names).
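Netsync patterns are matched against branch names rather than naming a branch directly. As a rough approximation, Python's fnmatch handles * and ?, though it may differ from monotone's own glob syntax in details such as brace alternatives; select_branches is a hypothetical helper, not monotone code.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

def select_branches(branches: Iterable[str], include: Iterable[str],
                    exclude: Iterable[str] = ()) -> List[str]:
    """Keep branches matching any include pattern and no exclude pattern."""
    include, exclude = list(include), list(exclude)
    return [b for b in branches
            if any(fnmatchcase(b, p) for p in include)
            and not any(fnmatchcase(b, p) for p in exclude)]
```

A pattern without wildcards, such as net.venge.monotone, only happens to behave like a branch name.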

> As another, minor point, IMO the second is also easier to read and 
> understand. A good (but admittedly deprecated) example might be:
> 
>   http://venge.net/net.venge.monotone
> 
> That looks quite confusing to me, whereas:
> 
>   http://venge.net/branch/net.venge.monotone
> 
> makes it easier to understand, especially for newcomers, I think.
> 
> 
> So, that got rather long. Thanks for staying with me so far. I'm 
> curious about your opinions, thoughts and criticism.

There needs to be a clear separation between the db path and any
parameters. http://host/path/to/app/followed/by/parameters can work for
web apps because all processing of the URL happens on the server, which
already knows to discard /path/to/app when looking for parameters. We
can't do that, because the netsync client also needs to know the
parameters.
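To make the ambiguity concrete: given only a path, a client can enumerate several equally plausible (database, parameters) splits, and only something that can inspect the remote filesystem can pick one. The helper below is hypothetical and just lists the candidates; with a "?" separator there would be exactly one.

```python
from typing import List, Tuple

def candidate_splits(path: str) -> List[Tuple[str, str]]:
    """Every way to split a path-only URL into (db_path, parameters)."""
    parts = [p for p in path.split("/") if p]
    return [("/" + "/".join(parts[:i]), "/".join(parts[i:]))
            for i in range(1, len(parts) + 1)]
```

For /srv/mtn/db.mtn/branch/foo this yields five candidates, from ("/srv", "mtn/db.mtn/branch/foo") to ("/srv/mtn/db.mtn/branch/foo", "").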






