bug#68244: hash-table improvements

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#68244: hash-table improvements

From:	Dmitry Gutov
Subject:	bug#68244: hash-table improvements
Date:	Sun, 7 Jan 2024 05:13:39 +0200
User-agent:	Mozilla Thunderbird

On 06/01/2024 13:34, Mattias Engdegård wrote:

5 jan. 2024 kl. 16.41 skrev Dmitry Gutov <dmitry@gutov.dev>:

That's a good question and it all comes down to how we interpret 
`consing_until_gc`. Here we take the view that it should encompass all parts of 
an allocation and this seems to be consistent with existing code.


But the existing code used objects that would need to be collected by GC, 
right? And the new one, seemingly, does not.


But it does, similar to the same way that we deal with string data.

Actually, vectors might be a better comparison. And we do increase thetally when creating a vector (inside 'allocate_vectorlike').

So I don't quite see the advantage of increasing consing_until_gc then. It's 
like the difference between creating new strings and inserting strings into a 
buffer: new memory is used either way, but the latter doesn't increase consing.


Since we don't know exactly when objects die, we use object allocation as a 
proxy, assuming that on average A bytes die for every B bytes allocated and 
make an informed (and adjusted) guess as to what the A/B ratio might be. That 
is the basis for the GC clock.

Buffer memory is indeed treated differently and does not advance the GC clock 
as far as I can tell. Presumably the reasoning is that buffer size changes make 
a poor proxy for object deaths.

Perhaps we could look at it differently: what are the failure modes fornot increasing the tally.

For strings, one could allocate a handful of very long strings, takingup a lot of memory, and if the consing tally did not take into accountthe lengths of the strings, the GC might never start, and we die of OOM.

For vectors, it almost looks different (the contained values are alreadycounted, and they'd usually be larger than the memory taken by onecell), but then you could put many copies of the same value (could evenbe nil) into a large vector, and we're back to the same problem.

Could we do something like that with a hash-table? Probably not - thehashing should at least guarantee 'eq' uniqueness. But then I supposesomeone could create an empty hash-table of a very large size. If theinternal vectors are pre-allocated, that could have the same effect asthe above.

The same reasoning could work for buffers too, but are they actuallygarbage-collected?

Of course we could reason that growing an existing hash table is also a bad 
proxy for object deaths, but the evidence for that is weak so I used the same 
metric as for other data structures just to be on the safe side.

This reminds me that the `gcstat` bookkeeping should probably include the 
hash-table ancillary arrays as well, since those counters are used to adjust 
the GC clock (see total_bytes_of_live_objects and consing_threshold). Will fix!

It's great that the new hash tables are garbage-collected more easily and 
produce less garbage overall, but in a real program any GC cycle will have to 
traverse the other data structures anyway. So we might be leaving free 
performance gains on the table when we induce GC cycles while no managed 
allocations are done. I could be missing something, of course.


So could I, and please know that your questions are much appreciated. Are you 
satisfied by my replies above, or did I misunderstand your concerns?


Thank you. I hope I'm not too off mark with my reasoning.

[Prev in Thread]

Current Thread

[Next in Thread]

bug#68244: hash-table improvements, (continued)
- bug#68244: hash-table improvements, Stefan Kangas, 2024/01/09
  - bug#68244: hash-table improvements, Mattias Engdegård, 2024/01/12
- bug#68244: hash-table improvements, Mattias Engdegård, 2024/01/05
  - bug#68244: hash-table improvements, Dmitry Gutov, 2024/01/05
    - bug#68244: hash-table improvements, Mattias Engdegård, 2024/01/06
    - bug#68244: hash-table improvements, Eli Zaretskii, 2024/01/06
    - bug#68244: hash-table improvements, Dmitry Gutov <=
    - bug#68244: hash-table improvements, Stefan Monnier, 2024/01/07
    - bug#68244: hash-table improvements, Dmitry, 2024/01/07
    - bug#68244: hash-table improvements, Mattias Engdegård, 2024/01/07
    - bug#68244: hash-table improvements, Stefan Monnier, 2024/01/07
    - bug#68244: hash-table improvements, Mattias Engdegård, 2024/01/08
    - bug#68244: hash-table improvements, Stefan Monnier, 2024/01/08
    - bug#68244: hash-table improvements, Mattias Engdegård, 2024/01/09
    - bug#68244: hash-table improvements, Mattias Engdegård, 2024/01/13
    - bug#68244: hash-table improvements, Gerd Möllmann, 2024/01/14
    - bug#68244: hash-table improvements, Mattias Engdegård, 2024/01/14

Prev by Date: bug#67463: 30.0.50; Eglot may manage js-json-mode buffers with wrong server
Next by Date: bug#68288: 29.1.90; describe-package errors if a package specifies multiple maintainers
Previous by thread: bug#68244: hash-table improvements
Next by thread: bug#68244: hash-table improvements
Index(es):
- Date
- Thread