
Re: [Monotone-devel] Re: user-friendly hash formats, redux


From: Nathan Myers
Subject: Re: [Monotone-devel] Re: user-friendly hash formats, redux
Date: Sat, 4 Dec 2004 09:34:33 -0800
User-agent: Mutt/1.3.28i

On Fri, Dec 03, 2004 at 11:40:19PM -0800, Nathaniel Smith wrote:
> On Fri, Dec 03, 2004 at 10:20:13PM -0800, Nathan Myers wrote:
> > On Sat, Dec 04, 2004 at 12:16:08AM -0500, graydon hoare wrote:
> > > Nathan Myers wrote:
> > > 
> > > > Four words gives you forty bits, extremely unlikely to collide,
> > > > yet are easier to recall (or say) than six hex digits that yield
> > > > only 24 bits.  ...  I don't see any real reason to retain the hex
> > > > format.
> > > 
> > > there's actually an old branch njs put together to do this with a format 
> > > called "bibblebabble".. 
> > 
> > Bibblebabble, IIRC, wasn't actual words.  
> 
> This is a feature, not a bug.  

No.  Nonsense words might be easy enough to say, but they remain very 
hard to remember.  Our brains are tuned to remember words; you can't 
have language neutrality and still take advantage of the acres of 
cortex tuned to actual words.  That advantage completely overwhelms 
any benefit of language neutrality.  Anyway, your language-neutrality 
is illusory: what you consider pronounceable syllables is very much 
tuned to your native language.  There's no getting around that.

Of course the word list to use could be a command-line or configuration
option.  Users of different languages could submit appropriate word 
lists, 4K bytes each, that may all be distributed with all releases of 
the program.  The command language itself uses English, although I 
expect it will be i18n-ized someday.  For interoperability, the natural 
convention would be the default encoding, except where two users agree 
on a different language.
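
A minimal sketch of the word encoding, in Python.  Assumptions: a
1024-entry list gives 10 bits per word, so four words carry 40 bits;
the leading bits of the hash are the ones displayed; and the list
below is a synthesized stand-in so the sketch runs, not a real list
of three-letter words.  All function names are hypothetical.

```python
import hashlib
from itertools import product

BITS_PER_WORD = 10  # a 1024-entry list yields 10 bits per word


def placeholder_wordlist():
    """Stand-in list of 1024 distinct three-letter strings.

    A real list would hold actual words in the user's language; these
    consonant-vowel-consonant strings exist only to make the sketch run.
    """
    consonants = "bcdfghjklmnprstvwz"
    vowels = "aeiou"
    words = ["".join(t) for t in product(consonants, vowels, consonants)]
    return words[:1024]


def encode(digest, words, nwords=4):
    """Render the leading nwords * 10 bits of digest as words."""
    bits = nwords * BITS_PER_WORD
    value = int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)
    out = []
    for i in range(nwords):
        shift = (nwords - 1 - i) * BITS_PER_WORD
        out.append(words[(value >> shift) & (2 ** BITS_PER_WORD - 1)])
    return " ".join(out)


def decode(text, words):
    """Return (value, bit_count) for a space-separated word string."""
    index = {w: i for i, w in enumerate(words)}
    value = 0
    parts = text.split()
    for w in parts:
        value = (value << BITS_PER_WORD) | index[w]
    return value, len(parts) * BITS_PER_WORD


words = placeholder_wordlist()
digest = hashlib.sha1(b"example").digest()  # 160-bit hash
short_id = encode(digest, words)
```

Stored one word per line, such a list is 1024 * (3 + 1) = 4096 bytes,
which matches the 4K figure above.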

> ... And with tab-completion, ...

To rely on tab-completion is to admit failure.  Commands involving 
non-unique hash fragments ought to fail entirely, with a distinguished
exit() argument.  Adding a third or fourth three-letter word is 
practically certain to yield a unique match, and my shell history 
gives me all the interactivity I need.  Tab-completion doesn't script 
well, or interact at all well with a GUI front-end.
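
A sketch of that failure behavior, in Python.  The exit-status values
and all names here are hypothetical, not anything Monotone defines;
identifiers are assumed to be written as space-separated words, and a
fragment is a leading run of whole words.

```python
EXIT_OK = 0
EXIT_NOT_FOUND = 3   # hypothetical value
EXIT_AMBIGUOUS = 4   # hypothetical value, distinct so scripts can test it


class AmbiguousFragment(Exception):
    """Raised when a fragment matches more than one identifier."""


def resolve(fragment, known_ids):
    """Return the single identifier whose leading words equal fragment."""
    want = fragment.split()
    matches = [i for i in known_ids if i.split()[:len(want)] == want]
    if not matches:
        raise KeyError(fragment)
    if len(matches) > 1:
        raise AmbiguousFragment(fragment)
    return matches[0]


def run(fragment, known_ids):
    """Map the outcome of resolve() to a process exit status."""
    try:
        resolve(fragment, known_ids)
    except KeyError:
        return EXIT_NOT_FOUND
    except AmbiguousFragment:
        return EXIT_AMBIGUOUS
    return EXIT_OK


ids = ["cat dog sun map", "cat dog fig oak", "pen ink jar lid"]
```

Here run("cat dog", ids) gives the ambiguous status, and adding a
third word, run("cat dog sun", ids), succeeds.  A script or GUI
front-end can branch on the status without parsing any output.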

> > The reason this keeps coming up is that the hex codes are probably the 
> > most immediately intimidating thing about Monotone.  Those terrifying 
> > blocks of hex codes probably keep more people away from Monotone than 
> > any sort of unfamiliarity with its concepts, or any lack of features.  
> 
> Yeah.  I dunno if they're the biggest thing keeping people away --
> we seem to be doing okay getting new users right now, and if we keep
> growing then buzz will tend to overcome such fears, I think.  

No, we're _not_ doing OK.  We are falling farther and farther behind 
Arch and Subversion.  Arch is held back mainly by its incidental 
weirdnesses in, e.g., file name conventions.  In public discussion 
on SCM systems, Monotone is rarely more than a footnote.  Even Darcs 
gets more respect!

Participants on this list are self-selected as tolerating the hex
strings.  To any normal person, they simply scream "not ready for
prime time".  A public announcement that we have done away with hex 
hash output would raise the project's stature immeasurably, more than 
any other single improvement.  Every other possible improvement is 
hard to discover without really studying the manual or using the 
program in production.

> I haven't been making any argument for Bibblebabble, though, because I
> don't know how strong these human factor effects are.  

Human-factor effects are incredibly strong in normal people.  (I.e. 
people not on this list.)  Bibblebabble didn't seem compelling on the 
first go-around because it doesn't go far enough in addressing them.   

> > I would also suggest inserting a period after the word where, at the
> > moment, the revision is uniquely identified in the current state of the
> > repository.  I expect it would usually appear after the second word
> > even in big repositories.
> 
> Huh, that's a neat idea -- nonintrusive and simple.  

We can do a _lot_ better.  

Each repository might have a parameter U which is the number of bits
necessary to uniquely distinguish all the hashes in it.  Any time a new
entry is made which is not unique in that many bits, U is automatically
increased.  (Every repository still uses full 160-bit hashes.)  When
Monotone writes out a hash, it only expresses U bits worth of whatever
encoding we end up with, unless it has been asked to be verbose.
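
A rough sketch of the U bookkeeping, in Python.  Assumptions: hashes
are held as 160-bit integers, the leading bits are the ones displayed,
and the starting value of U is left as a parameter.  The class and
method names are hypothetical, and the linear scan is only for
clarity; a real store would consult a sorted index.

```python
HASH_BITS = 160  # full hashes are always kept


class Repo:
    def __init__(self, start_bits):
        self.u = start_bits   # bits currently needed for uniqueness
        self.hashes = set()   # full 160-bit hashes, as integers

    def add(self, h):
        """Insert h, raising U if h collides with an entry in U bits."""
        if h in self.hashes:
            return
        for other in self.hashes:
            # Count of leading bits that h and other have in common.
            common = HASH_BITS - (h ^ other).bit_length()
            if common >= self.u:
                self.u = common + 1
        self.hashes.add(h)

    def display(self, h):
        """Return only the leading U bits of h, as an integer."""
        return h >> (HASH_BITS - self.u)


repo = Repo(start_bits=8)
a = 0xABC << 148         # top 12 bits are 0xABC
b = a | (1 << 147)       # same top 12 bits, differs at the 13th
repo.add(a)
repo.add(b)
```

After the second add, U has grown from 8 to 13, the smallest width at
which the two entries display differently.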

New repositories might start U at 40 bits, for interoperability.  Forty 
bits is ten hex digits, which is more than can reasonably be recalled 
or said.  It's three bibblebabble words or eight Oren syllables, which 
I would be hard-pressed to remember for more than ten seconds.  However, 
it's just four three-letter words, which even a college professor could 
recall a day later.  To remember two or three hashes is still within 
most people's capacity, where the hex or babble would be impossible.  
The repository that actually needed to increase U beyond 40 would be 
rare indeed.
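
For a sense of how rare, here is a back-of-envelope estimate in
Python.  It assumes hashes behave like uniformly random values and
uses the standard birthday approximation; the function name is
hypothetical.

```python
import math


def p_needs_more_bits(n_hashes, u_bits=40):
    """Approximate chance that some pair among n_hashes shares its
    leading u_bits, i.e. that U would have to grow past u_bits."""
    pairs = n_hashes * (n_hashes - 1) / 2
    # 1 - exp(-pairs / 2**u_bits), computed stably for tiny values.
    return -math.expm1(-pairs / 2.0 ** u_bits)


small = p_needs_more_bits(1_000)
medium = p_needs_more_bits(100_000)
large = p_needs_more_bits(1_000_000)
```

Under that assumption the chance is below one in a million at a
thousand hashes and under half a percent at a hundred thousand.  It
reaches a bit over a third at a million hashes, which is where the
automatic increase of U would start to matter.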

The result is that most output that mentions hashes displays only
(e.g.) four three-letter words.  That would intimidate nobody.

Nathan Myers
address@hidden



