Re: [Monotone-devel] Re: user-friendly hash formats, redux
From: Oren Ben-Kiki
Subject: Re: [Monotone-devel] Re: user-friendly hash formats, redux
Date: Sat, 4 Dec 2004 20:56:17 +0200
User-agent: KMail/1.7.1
On Saturday 04 December 2004 15:52, Nathaniel Smith wrote:
> > phonetically distinct syllables: { B D F H J K L M R S T V Y } x {
> > a e i o u } - things like "YeRuDa".
>
> Ah, but you still have problems. There are both within-language
> phonetic processes: in English, "ada" and "ata" are pronounced
> identically
But "Ta" and "Da" aren't. Each syllable is separate. From the BB readme
I see things like "obipe". An English speaker will pronounce this
o-bye-p, I suppose, but that would come as a shock to a Finnish-only
speaker (they actually spell things the way they sound. Go figure). I
wouldn't dare to guess how it would be pronounced in French :-)
> -- and cross-language issues: Japanese doesn't have a
> distinct "L" and "R", to take one famous example.
Yes, that's a bummer. That's the only one I know of, though: the b/p of
Arabic is covered, as is the American tendency to pronounce r/w the
same way. Of course, north European languages tend to pronounce 'j' as
'y', but at least they won't mix up their sounds. Sigh. The problem is
you need 13 consonants to get to 64 syllables (13 x 5 = 65), i.e.
6 bits per syllable.
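A minimal sketch of the syllable encoding, written in C++ like the test program mentioned later, assuming the 13 consonants and 5 vowels from the quoted proposal and that only the first 64 of the 65 possible syllables are used, so each syllable carries exactly 6 bits. The function name toSyllables and the index-to-syllable mapping (index / 5 picks the consonant, index % 5 the vowel) are illustrative assumptions, not part of the proposal.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed alphabet: 13 consonants x 5 vowels = 65 syllables; only indices
// 0..63 are used, so every syllable carries exactly 6 bits.
const std::string kConsonants = "BDFHJKLMRSTVY";
const std::string kVowels = "aeiou";

// Hypothetical encoder: render the leading `bits` bits of `id` (most
// significant first) as syllables. The bit count is rounded up to a whole
// number of syllables, with at least one; since 64 is not a multiple of 6,
// a full 64-bit id ends in a zero-padded syllable.
std::string toSyllables(std::uint64_t id, int bits) {
    assert(bits >= 0 && bits <= 64);
    int count = bits == 0 ? 1 : (bits + 5) / 6;
    std::string out;
    for (int i = 0; i < count; ++i) {
        int shift = 64 - 6 * (i + 1);  // where this 6-bit group starts
        unsigned index = shift >= 0 ? unsigned(id >> shift) & 0x3Fu
                                    : unsigned(id << -shift) & 0x3Fu;
        out += kConsonants[index / 5];  // consonant position 0..12
        out += kVowels[index % 5];      // vowel position 0..4
    }
    return out;
}
```

For example, an id whose top 6 bits are all ones renders as "Yo", and 12 zero bits render as "BaBa".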
> Also, I misremembered; BB actually gets 16 bits/5-letter "word". I
Ah. That makes much more sense.
> Not exactly. Transmission over an audio medium is one problem, sure.
> But it's actually not a very difficult one --- you can use something
> like the "phonetic alphabet" (Alpha/Bravo/Charlie/Delta/...)
Quick, what's G? Gnome? :-) At any rate, if this isn't a goal, then I
must say Nathan's approach seems better. At least you have a chance to
remember the words. And as for being English - if you are not an
English speaker, you are not worse off than when using BibbleBabble. If
you are one, your chances of memorizing the id will be that much
higher. Besides, let's face it; most people will know at least _some_
English, and three-letter words are about as "some" as you can get.
> Keeping 4 different distinct ids in working memory is different --
Why keep a distinct id in memory? Whether you are using bb/syl/tlw, or
something else, it is just a presentation/parsing issue. Internally it
is just bit strings, same as today.
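To illustrate that the display format is only a parsing layer over the same bit string, here is a hypothetical parser for the syllable scheme. It assumes the consonant set from the quoted proposal and a mapping in which syllable index = consonant position * 5 + vowel position; the name fromSyllables and the 10-syllable limit are illustrative choices. It returns the leading bits of the id left-aligned in a 64-bit word, and rejects malformed input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Assumed alphabet: the 13 consonants and 5 vowels from the quoted proposal.
const std::string kConsonants = "BDFHJKLMRSTVY";
const std::string kVowels = "aeiou";

// Hypothetical parser: turn a syllable string such as "YeRuDa" back into the
// leading bits of an id, left-aligned in a 64-bit word (low bits zero).
// Returns std::nullopt on empty input, odd length, more than 10 syllables
// (60 bits), unknown letters, or the unused 65th syllable "Yu".
std::optional<std::uint64_t> fromSyllables(const std::string& text) {
    if (text.empty() || text.size() % 2 != 0 || text.size() > 20)
        return std::nullopt;
    std::uint64_t value = 0;
    int shift = 64;
    for (std::size_t i = 0; i < text.size(); i += 2) {
        // Accept either case, so "yeruda" parses like "YeRuDa".
        std::size_t c = kConsonants.find(
            char(std::toupper(static_cast<unsigned char>(text[i]))));
        std::size_t v = kVowels.find(
            char(std::tolower(static_cast<unsigned char>(text[i + 1]))));
        if (c == std::string::npos || v == std::string::npos)
            return std::nullopt;
        std::uint64_t index = c * 5 + v;
        if (index > 63) return std::nullopt;  // "Yu" is never emitted
        shift -= 6;
        assert(shift >= 0);  // guaranteed by the 20-character limit
        value |= index << shift;
    }
    return value;
}
```

For instance, "YeRuDa" parses to the bits 111101 101100 000101 followed by zeros; internally nothing but that bit string is stored.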
> > How do such a program's results tell you about how people use ids...
>
> ... stick
> people in an eye-tracker, and stick them down at a machine that's
> recording their mouse movements and keypresses and doing screen
> captures, and analyze the data in terms of various models...
Ah, that sort of program. We did this sort of testing for usability
studies in a company I worked for. "Easy" is the last word I'd use to
describe it.
> The poor man's version I have in mind, though, just tests things like
> recall span, recognition span, typing speed, etc. -- cognitive
> processes that we can be pretty sure are important.
Assuming you can get people to sit for it... These things are usually
boring as hell. I suppose you could turn them into an engaging game :-).
> ... your
> calculations suggest that 2 words (10 characters) should be enough
> for just about anything :-).
Well, that was just a back-of-the-envelope average; there's also the
distribution to consider. (Hacking a C++ program to test this...
running it for 2,000,000 ids x 10 times...) OK, here are the results.
syl is my 2-char syllables, bb is your 5-char BibbleBabble, and tlw is
Nathan's 3-letter words; each column gives the number of units/chars
needed for that many bits. I'm only showing the maximal and minimal
number of ids that were successfully identified in the 10 trials,
testing up to 2,000,000 ids:
syl/chr  bb/chr  tlw/chr  bits : were enough for (max - min ids)
1/2      1/5     1/3         0 : 1 - 1
1/2      1/5     1/3         1 : 2 - 2
1/2      1/5     1/3         2 : 3 - 2
1/2      1/5     1/3         3 : 4 - 3
1/2      1/5     1/3         4 : 6 - 4
1/2      1/5     1/3         5 : 9 - 4
1/2      1/5     1/3         6 : 11 - 7
2/4      1/5     1/3         7 : 29 - 4
2/4      1/5     1/3         8 : 53 - 31
2/4      1/5     1/3         9 : 63 - 22
2/4      1/5     1/3        10 : 68 - 38
2/4      1/5     2/6        11 : 108 - 62
2/4      1/5     2/6        12 : 153 - 79
3/6      1/5     2/6        13 : 219 - 96
3/6      1/5     2/6        14 : 210 - 138
3/6      1/5     2/6        15 : 317 - 213
3/6      1/5     2/6        16 : 401 - 201
3/6      2/10    2/6        17 : 787 - 367
3/6      2/10    2/6        18 : 1,252 - 451
4/8      2/10    2/6        19 : 1,848 - 553
4/8      2/10    2/6        20 : 1,882 - 843
4/8      2/10    3/9        21 : 2,353 - 1,566
4/8      2/10    3/9        22 : 5,568 - 2,939
4/8      2/10    3/9        23 : 4,980 - 2,419
4/8      2/10    3/9        24 : 9,650 - 3,558
5/10     2/10    3/9        25 : 14,522 - 6,140
5/10     2/10    3/9        26 : 25,823 - 10,320
5/10     2/10    3/9        27 : 32,988 - 16,566
5/10     2/10    3/9        28 : 48,659 - 10,435
5/10     2/10    3/9        29 : 62,809 - 24,949
5/10     2/10    3/9        30 : 88,333 - 35,201
6/12     2/10    4/12       31 : 97,539 - 42,992
6/12     3/15    4/12       32 : 148,945 - 125,579
6/12     3/15    4/12       33 : 157,246 - 118,748
6/12     3/15    4/12       34 : 289,030 - 137,511
6/12     3/15    4/12       35 : 514,452 - 99,846
6/12     3/15    4/12       36 : 421,613
7/14     3/15    4/12       37 : 1,106,307 - 363,971
7/14     3/15    4/12       38 : 1,561,518 - 692,049
7/14     3/15    4/12       39 : >2,000,000 - 987,024
7/14     3/15    4/12       40 : 1,789,355 - 1,297,565
7/14     3/15    5/15       41 : >2,000,000 - 1,013,324
7/14     3/15    5/15       42 : >2,000,000 - 1,359,519
8/16     3/15    5/15       43 : >2,000,000
Well, this gives you an idea. I'm attaching the program if you are
interested.
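The attached unique-ids.cc is not reproduced in this archive, so the following is only a hypothetical reconstruction of one plausible experiment of this kind (a birthday-style collision test): draw random 64-bit ids until two share the same leading bits, count how many ids got through before that first collision, and take the max and min over several trials. The function names, the seed, and the use of std::mt19937_64 are assumptions; the original program's exact definition of "successfully identified" may differ, so this sketch's output need not match the table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>
#include <unordered_set>
#include <utility>

// Hypothetical reconstruction (not the attached unique-ids.cc): draw random
// 64-bit ids until two share the same leading `bits` bits, and return how
// many ids were accepted before that first collision, capped at maxIds
// (the cap plays the role of ">2,000,000"). By the pigeonhole principle the
// result is at most 2^bits.
std::uint64_t idsBeforePrefixCollision(int bits, std::uint64_t maxIds,
                                       std::mt19937_64& rng) {
    assert(bits >= 0 && bits <= 64);
    std::unordered_set<std::uint64_t> seen;
    for (std::uint64_t n = 0; n < maxIds; ++n) {
        std::uint64_t id = rng();
        // With 0 bits every id has the same (empty) prefix.
        std::uint64_t prefix = bits == 0 ? 0 : id >> (64 - bits);
        if (!seen.insert(prefix).second) return n;  // prefix already taken
    }
    return maxIds;
}

// Repeat the experiment `trials` times and return (max, min), the two
// numbers reported per row of the table.
std::pair<std::uint64_t, std::uint64_t>
maxMinOverTrials(int bits, int trials, std::uint64_t maxIds,
                 std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::uint64_t most = 0;
    std::uint64_t least = std::numeric_limits<std::uint64_t>::max();
    for (int t = 0; t < trials; ++t) {
        std::uint64_t n = idsBeforePrefixCollision(bits, maxIds, rng);
        most = std::max(most, n);
        least = std::min(least, n);
    }
    return {most, least};
}
```

Calling maxMinOverTrials(b, 10, 2000000, seed) for b = 0..43 would produce a table of the same shape under this model.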
It is interesting to compare the methods; although bb is in theory the
most dense, it ends up being the most wasteful because of its large
quanta (16 bits per 5-char word). Of course, if you don't bother with
computing the minimal prefix and just use some "safe" constant, that's
not a disadvantage.
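The quantization effect is just a round-up calculation. This sketch assumes 6 bits per 2-char syllable, 16 bits per 5-char bb word (as quoted earlier), and 10 bits per 3-letter tlw word; the last figure is only inferred from where the tlw column steps up in the table (at 11, 21, 31 and 41 bits). The helper names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: characters needed to show `bits` bits in a scheme
// that emits whole units of `bitsPerUnit` bits, each `charsPerUnit` chars
// wide. At least one unit is always shown, as in the table's 0-bit row.
int charsNeeded(int bits, int bitsPerUnit, int charsPerUnit) {
    assert(bits >= 0 && bitsPerUnit > 0 && charsPerUnit > 0);
    int units = bits == 0 ? 1 : (bits + bitsPerUnit - 1) / bitsPerUnit;
    return units * charsPerUnit;
}

// Assumed parameters for the three schemes compared in the table.
int sylChars(int bits) { return charsNeeded(bits, 6, 2); }   // 2-char syllables
int bbChars(int bits)  { return charsNeeded(bits, 16, 5); }  // 5-char bb words
int tlwChars(int bits) { return charsNeeded(bits, 10, 3); }  // 3-letter words (inferred)
```

At 17 bits, for example, this gives 6 characters for syl, 10 for bb and 6 for tlw, the same as the 17-bit row of the table: bb pays for a whole extra 16-bit word to carry one more bit.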
I must say I'm growing to like Nathan's tlw idea - given a careful
choice of words to minimize the bug/bag issue. It would be the best of
both worlds if you could get rid of the more blatant confusions like
bug/bag and buy/bye by introducing a "few" "nice" non-words. E.g.,
'taz' isn't a word, but it works great (besides, it's the name of a
cartoon character :-).
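One crude way to screen a candidate tlw word list for bug/bag-style confusions is to flag pairs of equal-length words that differ in exactly one letter. This is a hypothetical helper (oneLetterApart is not from the thread), and it is only a proxy: homophones such as buy/bye differ in two positions and are not caught, so a phonetic comparison would still be needed for those.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical screening helper: return every pair of equal-length words
// that differ in exactly one position (e.g. bug/bag). Homophones that differ
// in more positions (buy/bye) are not detected.
std::vector<std::pair<std::string, std::string>>
oneLetterApart(const std::vector<std::string>& words) {
    std::vector<std::pair<std::string, std::string>> pairs;
    for (std::size_t i = 0; i < words.size(); ++i) {
        for (std::size_t j = i + 1; j < words.size(); ++j) {
            const std::string& a = words[i];
            const std::string& b = words[j];
            if (a.size() != b.size()) continue;
            int diffs = 0;
            for (std::size_t k = 0; k < a.size(); ++k) diffs += a[k] != b[k];
            if (diffs == 1) pairs.emplace_back(a, b);
        }
    }
    return pairs;
}
```

With {"bug", "bag", "taz", "cat"} it reports only bug/bag, so a non-word like 'taz' can sit alongside real words without being flagged.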
Have fun,
Oren Ben-Kiki
Attachment: unique-ids.cc (text data)