[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Most used words in current buffer
From: |
tomas |
Subject: |
Re: Most used words in current buffer |
Date: |
Mon, 23 Jul 2018 09:34:28 +0200 |
User-agent: |
Mutt/1.5.21 (2010-09-15) |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Mon, Jul 23, 2018 at 12:09:15AM -0600, Bob Proulx wrote:
[...]
> Again the hash table is good until you need to grow the table. Then
> there is a pause while memory is shuffled around.
Precisely. To get this "amortized", you usually grow by some constant
factor, and there's where the log n comes in (which a tree has naturally
built in).
For really big hash tables (think all *dbm variations which are not
some cousins of B-trees), you even have several "levels", at which
point the hash tables start resembling B-trees somewhat (see below).
> But that never happens with a balanced tree because that cost is
> always amortized along as you go.
But once you get big, you start "feeling" the structure of your
storage (i.e. that the "storage as a big constant-time access
array" is just a lie). Having two pointers per node is horribly
wasteful, and if those pointers point to different continents
in your RAM, your CPU ends up waiting on cache fills all the
time.
> Also if you need to extract the entries in sorted order then the
> balanced tree is already sorted. Whereas with the hash table an extra
> sort step is needed.
That's right: once you need sorted walks or even lookup by sorted
index (i.e. "find the first entry whose value is greater or equal
to X" -- or "find all entries whose values are between X and Y"),
ordered trees are (probably) the way to go. But I'd guess, once
the tree becomes significant in size (yeah, handwaving here :)
some kind of B-tree, where you manage several entries in one node,
should be at an advantage.
> One of looking at this is that it is global optimization versus local
> optimization.
And the ratio of lookups per insert. And whether you want a sorted
index.
> In some ways it is swings and roundabouts.
Nicely put :-)
> But in other ways it depends upon what you need.
Absolutely
Cheers
- -- t
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iEYEARECAAYFAltVhQQACgkQBcgs9XrR2kYaIACfdi/XvTyfdbJ9yk4my+eJziMt
yyYAnjw7NadASHDsgDrCJpWdzvDpmxDm
=5dmS
-----END PGP SIGNATURE-----
- Re: Most used words in current buffer, (continued)
- Message not available
- Re: Most used words in current buffer, Udyant Wig, 2018/07/22
- Re: Most used words in current buffer, Eric Abrahamsen, 2018/07/22
- Message not available
- Re: Most used words in current buffer, Udyant Wig, 2018/07/22
- Message not available
- Re: Most used words in current buffer, Udyant Wig, 2018/07/20
- Re: Most used words in current buffer, Stefan Monnier, 2018/07/21
- Re: Most used words in current buffer, tomas, 2018/07/22
- Re: Most used words in current buffer, Bob Proulx, 2018/07/23
- Re: Most used words in current buffer,
tomas <=
- Message not available
- Re: Most used words in current buffer, Udyant Wig, 2018/07/23
- Message not available
- Re: Most used words in current buffer, Udyant Wig, 2018/07/22
- Message not available
- Re: Most used words in current buffer, Udyant Wig, 2018/07/21
- Re: Most used words in current buffer, Stefan Monnier, 2018/07/21
- Message not available
- Re: Most used words in current buffer, Udyant Wig, 2018/07/22