From: Daniel P. Berrangé
Subject: Re: [RFC PATCH] qobject: Rewrite implementation of QDict for in-order traversal
Date: Thu, 7 Jul 2022 16:37:56 +0100
User-agent: Mutt/2.2.6 (2022-06-05)

On Tue, Jul 05, 2022 at 11:54:21AM +0200, Markus Armbruster wrote:
> QDict is implemented as a simple hash table of fixed size.  Observe:
> 
> * Slow for large n.  Not sure this matters.

I presume you're referring to qdict_find() here, which would
ideally be O(1).

Our bucket count is 512, so for dicts with fewer than, say,
2000 entries it is close enough to O(1) that it likely doesn't
matter (except that our deterministic hash function can be
abused to overfill specific buckets).

Ignoring the latter attack though, the fixed hash bucket
count isn't likely a speed issue for normal usage, as our
QDict element counts are typically just not that big. So
it is mostly a memory wastage issue.
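
For illustration, a rough sketch of the kind of fixed-bucket
lookup I mean (not the actual qdict.c code; the names and the
hash below are made up). With a fixed bucket count and a
deterministic hash, a caller who controls the keys can steer
every entry into one bucket and turn lookup into a walk of one
long collision chain:

  #include <stddef.h>
  #include <string.h>

  #define BUCKET_MAX 512

  struct entry {
      char *key;
      void *value;
      struct entry *next;             /* collision chain */
  };

  struct dict {
      struct entry *table[BUCKET_MAX];
  };

  /* Deterministic djb2-style hash: the same key lands in the
   * same bucket on every host and every run. */
  static unsigned bucket_of(const char *key)
  {
      unsigned h = 5381;

      while (*key) {
          h = h * 33 + (unsigned char)*key++;
      }
      return h % BUCKET_MAX;
  }

  static void *dict_find(struct dict *d, const char *key)
  {
      struct entry *e;

      /* O(1) only while the chains stay short */
      for (e = d->table[bucket_of(key)]; e; e = e->next) {
          if (strcmp(e->key, key) == 0) {
              return e->value;
          }
      }
      return NULL;
  }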


Historically QEMU's JSON input has come from sources that
are more trusted than QEMU itself, so this didn't matter. As
we split QEMU up into co-operating processes with potentially
varying privileges, this may cease to be a safe assumption.

For pre-emptive robustness, though, I'd favour a guaranteed
O(1) impl, which would mean a dynamically resizing bucket
count, along with a non-deterministic (ideally cryptographically
strong) key hash function.
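
As a sketch of what I mean by a non-deterministic hash (the
seeding scheme and names below are mine, not an existing QEMU
or GLib convention): mix a per-process random seed into the key
hash so bucket placement can't be predicted from outside. A real
implementation would want a properly keyed hash such as SipHash
and a stronger seed source; GHashTable then provides the resizing
bucket count for free:

  #include <glib.h>

  /* Sketch only.  The seed makes bucket placement unpredictable
   * across runs; a production version would use a cryptographically
   * keyed hash (e.g. SipHash) and a better seed source than
   * g_random_int(). */
  static guint32 hash_seed;

  static guint seeded_str_hash(gconstpointer key)
  {
      const unsigned char *p = key;
      guint h = hash_seed;

      while (*p) {
          h = (h << 5) + h + *p++;    /* djb2-style mixing */
      }
      return h;
  }

  static GHashTable *new_seeded_table(void)
  {
      if (hash_seed == 0) {
          hash_seed = g_random_int() | 1;   /* non-zero, picked once */
      }
      /* GHashTable grows/shrinks its bucket array as needed */
      return g_hash_table_new_full(seeded_str_hash, g_str_equal,
                                   g_free, NULL);
  }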

> * A QDict with n entries takes 4120 + n * 32 bytes on my box.  Wastes
>   space for small n, which is a common case.

So effectively ~4k of usage for every QDict instance at a
minimum. This is not so great with widespread QDict usage.
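
Here's my back-of-the-envelope reading of where "4120 + n * 32"
comes from on a 64-bit host (a guess at the layout, not copied
from qdict.c): a ~24-byte header plus 512 bucket heads of 8 bytes
each is 24 + 512 * 8 = 4120 bytes before the first entry, then
each entry adds two pointers plus 16 bytes of list linkage:

  #include <stdio.h>

  /* Illustrative stand-in structs matching the quoted numbers. */
  struct fake_entry {
      char *key;                      /*  8 bytes                 */
      void *value;                    /*  8 bytes                 */
      struct fake_entry *le_next;     /* 16 bytes of list linkage */
      struct fake_entry **le_prev;
  };                                  /* = 32 bytes per entry     */

  struct fake_dict {
      int type;                       /* ~24-byte header: type,   */
      size_t refcnt;                  /* refcount and entry count */
      size_t size;
      struct fake_entry *table[512];  /* 512 * 8 = 4096 bytes     */
  };                                  /* = 4120 bytes fixed cost  */

  int main(void)
  {
      printf("dict: %zu bytes, entry: %zu bytes\n",
             sizeof(struct fake_dict), sizeof(struct fake_entry));
      return 0;
  }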

> * Order of traversal depends on the hash function and on insertion
>   order, because it iterates first over buckets, then collision
>   chains.
> 
> * Special code ensures qdict_size() takes constant time.
> 
> Replace the hash table by a linked list.  Observe:
> 
> * Even slower for large n.  Might be bad enough to matter.

Guaranteed O(n) every time, even for small values of 'n'.
Just feels like a bad idea to me.

> * A QDict with n entries takes 32 + n * 24 bytes.
> 
> * Traversal is in insertion order.
> 
> * qdict_size() is linear in the number of entries.
> 
> This is an experiment.  Do not commit to master as is.

Two alternative ideas.

 * Implement it as both a hashtable and a linked list.
   Hashtable to get O(1) lookup, linked list to get
   stable iteration order based on insertion order.
   Makes the insert/delete operations more expensive,
   and adds slightly more memory overhead (see the
   sketch after this list).

 * Merely change the users to apply the ordering they
   require when iterating.
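
To make the first option concrete, a rough sketch using GLib
types (illustrative only, names are mine, refcounting and error
handling omitted): entries live on a queue in insertion order,
and a hash table maps each key to its queue link, so lookup
stays O(1) while iteration follows insertion order.

  #include <glib.h>

  typedef struct {
      char *key;
      void *value;
  } OrderedEntry;

  typedef struct {
      GHashTable *index;   /* key (char *) -> GList * node in 'order' */
      GQueue order;        /* OrderedEntry *, in insertion order      */
  } OrderedDict;

  static void ordered_dict_init(OrderedDict *d)
  {
      d->index = g_hash_table_new(g_str_hash, g_str_equal);
      g_queue_init(&d->order);
  }

  static void ordered_dict_put(OrderedDict *d, const char *key,
                               void *value)
  {
      OrderedEntry *e = g_new0(OrderedEntry, 1);

      e->key = g_strdup(key);
      e->value = value;
      g_queue_push_tail(&d->order, e);
      /* remember the list node so lookup and delete stay O(1) */
      g_hash_table_insert(d->index, e->key,
                          g_queue_peek_tail_link(&d->order));
  }

  static void *ordered_dict_get(OrderedDict *d, const char *key)
  {
      GList *node = g_hash_table_lookup(d->index, key);

      return node ? ((OrderedEntry *)node->data)->value : NULL;
  }

Iterating in insertion order is then just a walk of d->order,
e.g. with g_queue_foreach().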

In both those cases, I'd suggest we consider use of GHashTable, to
give us a more dynamic hash table impl with resizing buckets, so it
is more memory efficient and gives a stronger guarantee of O(1)
lookups. It is also quite simple to iterate over the keys in a fixed
order, as you can get a GList of keys and invoke g_list_sort with any
comparator. While we could add more APIs to do this with QDict and
QList, re-inventing the wheel feels dubious unless there's a
compelling benefit to keeping our existing impl.
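
For the sorted-iteration variant, it would look roughly like this
(a sketch; sorting by strcmp is just an example comparator):

  #include <glib.h>
  #include <stdio.h>
  #include <string.h>

  static gint cmp_keys(gconstpointer a, gconstpointer b)
  {
      return strcmp(a, b);
  }

  /* Iterate a GHashTable's string keys in sorted order by sorting
   * the key list, then looking each value up. */
  static void print_sorted(GHashTable *table)
  {
      GList *keys = g_hash_table_get_keys(table);
      GList *l;

      keys = g_list_sort(keys, cmp_keys);
      for (l = keys; l; l = l->next) {
          printf("%s=%s\n", (char *)l->data,
                 (char *)g_hash_table_lookup(table, l->data));
      }
      /* the keys themselves are still owned by the table */
      g_list_free(keys);
  }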


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



