help-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Most used words in current buffer


From: tomas
Subject: Re: Most used words in current buffer
Date: Mon, 23 Jul 2018 09:34:28 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, Jul 23, 2018 at 12:09:15AM -0600, Bob Proulx wrote:

[...]

> Again the hash table is good until you need to grow the table.  Then
> there is a pause while memory is shuffled around.

Precisely. To get this "amortized", you usually grow by some constant
factor, and there's where the log n comes in (which a tree has naturally
built in).

For really big hash tables (think all *dbm variations which are not
some cousins of B-trees), you even have several "levels", at which
point the hash tables start resembling B-trees somewhat (see below).

> But that never happens with a balanced tree because that cost is
> always amortized along as you go.

But once you get big, you start "feeling" the structure of your
storage (i.e. that the "storage as a big constant-time access
array" is just a lie). Having two pointers per node is horribly
wasteful, and if those pointers point to different continents
in your RAM, your CPU ends up waiting on cache fills all the
time.

> Also if you need to extract the entries in sorted order then the
> balanced tree is already sorted.  Whereas with the hash table an extra
> sort step is needed.

That's right: once you need sorted walks or even lookup by sorted
index (i.e. "find the first entry whose value is greater or equal
to X" -- or "find all entries whose values are between X and Y"),
ordered trees are (probably) the way to go. But I'd guess, once
the tree becomes significant in size (yeah, handwaving here :)
some kind of B-tree, where you manage several entries in one node,
should be at an advantage.

> One of looking at this is that it is global optimization versus local
> optimization.

And the ratio of lookups per insert. And whether you want a sorted
index.

> In some ways it is swings and roundabouts.

Nicely put :-)

> But in other ways it depends upon what you need.

Absolutely

Cheers
- -- t
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAltVhQQACgkQBcgs9XrR2kYaIACfdi/XvTyfdbJ9yk4my+eJziMt
yyYAnjw7NadASHDsgDrCJpWdzvDpmxDm
=5dmS
-----END PGP SIGNATURE-----



reply via email to

[Prev in Thread] Current Thread [Next in Thread]