
[Monotone-devel] url schemes


From: Markus Schiltknecht
Subject: [Monotone-devel] url schemes
Date: Sat, 22 Mar 2008 16:19:03 +0100
User-agent: Mozilla-Thunderbird 2.0.0.9 (X11/20080109)

Hi,

Since I've been critiquing Timothy's current extensions of the URL scheme, I think I need to try coming up with something better. Or at least help in doing so. First of all, I've put together a list of URL schemes we are using in and around monotone, including nuskool, which is probably what we will use someday.

In the first part of the URL, we obviously encode the protocol and the database location. Existing samples are:

 * file:/path/to/monotone/db.mtn
 * ssh://host[:port]/path/to/monotone/db.mtn

And for mtndumb, we already have:

 * http[s]://host[:port]/path/to/repo
 * [s]ftp://[user[:address@hidden:port]/path/to/repo
 * file:/path  (or file:///path??)

Upcoming URLs to specify a database location might be:

 * mtn://host[:port]           (as proposed for netsync)
 * http://host[:port]/path/to/scgi       (as in nuskool)
 * xmpp:[//address@hidden/address@hidden
           (as recently proposed on IRC - might somehow
                             work with nuskool, someday)
 * pgsql://user:address@hidden:port/database/schema
                                      (pipe dreaming...)


Often enough, specifying a database isn't enough, because we want to address only parts of the repository, i.e. only a certain branch, only a revision or even only a single file delta.

Almost all of the above protocols support additional slashes and more path components after the database. The only exception is pgsql, which isn't really much of a standard URL scheme anyway, AFAICT. (In case of an underlying filesystem - i.e. file and ssh - it should be possible to walk down the path and use the first monotone database or monotone dumb data directory you find. That would only prevent you from accessing a monotone database file within a dumb data directory, but that wouldn't make much sense anyway).
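
To illustrate the walk-down idea, here is a minimal Python sketch. The function name split_db_and_rest and the is_repo predicate are made up for this example; how mtn would actually recognize a database file or a dumb data directory is left open here.

```python
from pathlib import PurePosixPath

def split_db_and_rest(path, is_repo):
    """Walk down `path` and split it at the first prefix accepted by `is_repo`.

    `is_repo` is a hypothetical predicate (path string -> bool) that decides
    whether a prefix is a monotone database or a dumb data directory.
    Returns (database_path, rest_of_path), or None if no prefix matches.
    """
    parts = PurePosixPath(path).parts
    for i in range(1, len(parts) + 1):
        prefix = str(PurePosixPath(*parts[:i]))
        if is_repo(prefix):
            # Everything after the first match addresses a resource in the db.
            return prefix, "/".join(parts[i:])
    return None
```

Because the walk stops at the first match, anything nested below that database (such as a database file inside a dumb data directory) can never be selected, which is the limitation mentioned above.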

Most protocol types also support an argument list, separated by & - but not all of them. Exceptions are the dumb ones, which cannot parse arguments, because there's no clever server to process them. For pgsql, arguments are often used to specify options for the database connection, but as mentioned above, it's not really a standard - we could certainly use some monotone specific arguments, if needed.


Now, the question which started that discussion is, what should the rest of the URL look like? IMO, we should take a look at existing and planned use cases. Then take care they don't conflict with each other.

The only existing rest-URL-scheme is from mtndumb. However, that one uses a rather meaningless scheme to retrieve data from a repository. It looks like it was designed to resemble the merkle trie, while still providing a good compromise on the number of round trips required:

 $DB/DATA
 $DB/HASHES_
 $DB/HASHES_??  (multiple times, where ?? are the first two hex chars)
 ...

Then, there are the planned nuskool commands. Those are currently encoded entirely in JSON. The HTTP client requests the same URL every time, and encodes the query in JSON. ATM nuskool doesn't support branch inclusion or exclusion patterns. The commands currently are:

 * inquiring revisions: asks the server if it has certain revisions
 * getting descendants: querying the ancestry map of the server
 * getting (pulling) a revision
 * putting (pushing) a revision
 * getting file data
 * putting file data
 * getting file delta
 * putting file delta


These are current facts and observations, or am I missing something important?

Then, there are wishes and feature requests. I personally find the following ones very compelling:

 * mtn itself should be able to talk to dumb servers
 * it should be possible to do checkouts from remote databases
 * mtn should feature a simple API for 3rd party tools
 * faster and firewall compatible protocol (covered by nuskool)


Taking all of that together, to me this smells very much like we need a RESTful API. One which is easy to read, understand and remember, simple to process and universally usable for all supported protocols (as far as possible). What I have in mind would look somewhat like this:

 * GET $DB/capabilities: inquire capabilities of that mtn repository
           (i.e. if arguments are supported or not)
 * GET/PUT $DB/revision/$HASH/data: pull or push a revision
 * GET/PUT $DB/file_data/$HASH: pull or push file data
 * GET/PUT $DB/file_delta/$HASH: pull or push file delta
 * GET $DB/branch/$BRANCHNAME/heads: get heads of $BRANCHNAME
 * GET $DB/revision/$HASH/inquire: inquire *one* revision
 * GET $DB/revision/$HASH/descendants: fetch descendants of a revision

This might appear http centric, but think about it: ftp, file and ssh, maybe even xmpp, all provide put and get methods in a way. (Even if pushing to dumb servers might not work - at least not without some additional processing on the server side. Or maybe with proper authentication support, so clients can update meta data on the dumb server?) And as http is about the best known protocol, what's bad about being http centric? ;-)

For browsable protocols which support index files (like http[s] and ftp[s]) we could offer those for the following URLs:

 * GET $DB/:    a listing of branches in the repo, general purpose
                repository information and statistics, etc.
 * GET $DB/revision/$HASH/: a browsable directory tree
 * GET $DB/branch/$BRANCHNAME/: some branch information, maybe a graph
                with the most recent revisions, links to the branch
                heads and to sub-branches

And others, but you get the point...


What's important to me is that these URL schemes should be compatible with one another. I would find it a wasted opportunity if we now specified:

 $DB/$BRANCHNAME[?$PATTERNS]

..or similar for the mtn (i.e. netsync) protocol, because it certainly conflicts with future extensions for other protocols.

While the following is longer and more to type, it's certainly more cross-protocol compatible and wouldn't prevent future extensions:

 $DB/branch/$BRANCHNAME?PATTERNS

In other words: omitting that "branch" in between there would prevent us from providing other resources, or force us to use different URL schemes for different protocols, i.e.:

 $DB/$BRANCHNAME for mtn://

but:

 $DB/branch/$BRANCHNAME for http://

..which would certainly confuse people.
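
To make the conflict concrete, here is a small Python sketch of a parser for the part of the URL after $DB. The set of resource kinds is taken from the list proposed above; the function name parse_rest and the tuple it returns are assumptions made only for this example.

```python
RESOURCE_KINDS = {"capabilities", "revision", "file_data", "file_delta", "branch"}

def parse_rest(rest):
    """Split the URL part after $DB into (kind, name, action, query).

    The first path component must name a resource kind, so a branch name
    can never be mistaken for one of the other resources.
    """
    path, _, query = rest.partition("?")
    parts = [p for p in path.split("/") if p]
    if not parts:
        return ("index", None, None, query)
    kind = parts[0]
    if kind not in RESOURCE_KINDS:
        raise ValueError("unknown resource kind: " + kind)
    name = parts[1] if len(parts) > 1 else None
    action = parts[2] if len(parts) > 2 else None
    return (kind, name, action, query)
```

With the bare $DB/$BRANCHNAME form, a parser like this has no way of telling a branch that happens to be called "revision" from the revision resource, and "net.venge.monotone" alone is rejected because it names no resource kind.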


As another, minor point, IMO the second is also easier to read and understand. A good (but admittedly deprecated) example might be:

 http://venge.net/net.venge.monotone

looks quite confusing to me, whereas:

 http://venge.net/branch/net.venge.monotone

makes the thing easier to understand. Especially for starters, I think.


So, that got rather longish now. Thanks for being with me so far. I'm curious about your opinions, thoughts and criticism.

Regards

Markus




