emacs-devel

Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)


From: Emanuel Berg
Subject: Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
Date: Fri, 11 Aug 2023 20:06:50 +0200
User-agent: Gnus/5.13 (Gnus v5.13)

Ihor Radchenko wrote:

>>> Maybe we could somehow re-use the already allocated bignum
>>> objects, similar to what is done for cons cells (see
>>> src/alloc.c:Fcons).
>>
>> Sounds reasonable :)
>
> And... it has already been done, actually.
> allocate_vectorlike calls allocate_vector_from_block, which
> re-uses pre-allocated objects.
>
> And looking into the call graph, this exact branch calling
> allocate_vector_from_block is indeed taken for the bignums [...]
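
For context, the re-use being described is the classic free-list
pattern: freed objects are chained onto a list and handed back
out before fresh memory is requested. A minimal standalone
sketch of the idea, with toy names rather than the actual
src/alloc.c types:

  #include <stdlib.h>

  /* Toy free-list reuse, analogous in spirit to the cons free
     list consulted by Fcons; hypothetical names, not the real
     Emacs code.  */

  struct toy_cell
  {
    struct toy_cell *next;   /* chain link while on the free list */
    long car, cdr;           /* payload while in use */
  };

  static struct toy_cell *free_list;

  static struct toy_cell *
  toy_alloc (void)
  {
    if (free_list)
      {
        /* Re-use a previously freed cell instead of allocating.  */
        struct toy_cell *c = free_list;
        free_list = c->next;
        return c;
      }
    return malloc (sizeof (struct toy_cell));
  }

  static void
  toy_free (struct toy_cell *c)
  {
    c->next = free_list;     /* push back for later re-use */
    free_list = c;
  }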

Are we talking about a list of Emacs C functions, each with the
time it has spent executing, arranged in a tree data
structure? :O

E.g. where do we find allocate_vectorlike?

> With the patch:
>
> perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
> 1.968 s
>
>     33.17%  emacs    emacs                          [.] process_mark_stack
>      5.51%  emacs    libgmp.so.10.5.0               [.] __gmpz_sizeinbase
>      5.05%  emacs    emacs                          [.] mark_char_table
>      4.88%  emacs    emacs                          [.] pdumper_marked_p_impl
>      3.30%  emacs    emacs                          [.] pdumper_set_marked_impl
> ...
>      2.52%  emacs    emacs                          [.] allocate_vectorlike
>
> allocate_vectorlike clearly takes a lot less time by not trying to loop
> over all the ~500 empty elements of vector_free_lists.
>
> We can further get rid of the GC by temporarily disabling it (just for
> demonstration):
>
> (let ((beg (float-time)))
>   (setq gc-cons-threshold most-positive-fixnum)
>   (fib 10000 1000)
>   (message "%.3f s" (- (float-time) beg)) )
>
> perf record  ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
> 0.739 s
>
>     17.11%  emacs    libgmp.so.10.5.0      [.] __gmpz_sizeinbase
>      7.35%  emacs    libgmp.so.10.5.0      [.] __gmpz_add
>      6.51%  emacs    emacs                 [.] arith_driver
>      6.03%  emacs    libc.so.6             [.] malloc
>      5.57%  emacs    emacs                 [.] allocate_vectorlike
>      5.20%  emacs    [unknown]             [k] 0xffffffffaae01857
>      4.16%  emacs    libgmp.so.10.5.0      [.] __gmpn_add_n_coreisbr
>      3.72%  emacs    emacs                 [.] check_number_coerce_marker
>      3.35%  emacs    fib.eln               [.] F666962_fib_0
>      3.29%  emacs    emacs                 [.] allocate_pseudovector
>      2.30%  emacs    emacs                 [.] Flss
>
> Now, the actual bignum arithmetic (libgmp) takes most of the CPU time.
>
> I am not sure what differs between the Elisp GMP bindings and the
> analogous SBCL bindings that makes SBCL so much faster.
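
As a rough point of comparison for the remaining libgmp cost,
here is an iterative Fibonacci written directly against GMP, with
no Lisp objects involved at all. Purely illustrative, not part of
the thread's benchmark; build with `cc fib-gmp.c -lgmp':

  #include <gmp.h>
  #include <stdio.h>

  int
  main (void)
  {
    mpz_t a, b;
    mpz_init_set_ui (a, 0);
    mpz_init_set_ui (b, 1);
    for (int i = 0; i < 10000; i++)
      {
        mpz_add (a, a, b);   /* a += b */
        mpz_swap (a, b);     /* step the pair forward */
      }
    /* mpz_sizeinbase is the same __gmpz_sizeinbase seen in the
       perf output above (may overestimate by one for base 10).  */
    printf ("~%lu decimal digits\n",
            (unsigned long) mpz_sizeinbase (b, 10));
    mpz_clear (a);
    mpz_clear (b);
    return 0;
  }
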
>
> diff --git a/src/alloc.c b/src/alloc.c
> index 17ca5c725d0..62e96b4c9de 100644
> --- a/src/alloc.c
> +++ b/src/alloc.c
> @@ -3140,6 +3140,7 @@ large_vector_vec (struct large_vector *p)
>     vectors of the same NBYTES size, so NTH == VINDEX (NBYTES).  */
>
>  static struct Lisp_Vector *vector_free_lists[VECTOR_MAX_FREE_LIST_INDEX];
> +static int vector_free_lists_min_idx = VECTOR_MAX_FREE_LIST_INDEX;
>
>  /* Singly-linked list of large vectors.  */
>
> @@ -3176,6 +3177,8 @@ setup_on_free_list (struct Lisp_Vector *v, ptrdiff_t nbytes)
>    set_next_vector (v, vector_free_lists[vindex]);
>    ASAN_POISON_VECTOR_CONTENTS (v, nbytes - header_size);
>    vector_free_lists[vindex] = v;
> +  if ( vindex < vector_free_lists_min_idx )
> +    vector_free_lists_min_idx = vindex;
>  }
>
>  /* Get a new vector block.  */
> @@ -3230,8 +3233,8 @@ allocate_vector_from_block (ptrdiff_t nbytes)
>    /* Next, check free lists containing larger vectors.  Since
>       we will split the result, we should have remaining space
>       large enough to use for one-slot vector at least.  */
> -  for (index = VINDEX (nbytes + VBLOCK_BYTES_MIN);
> -       index < VECTOR_MAX_FREE_LIST_INDEX; index++)
> +  for (index = max ( VINDEX (nbytes + VBLOCK_BYTES_MIN), vector_free_lists_min_idx );
> +       index < VECTOR_MAX_FREE_LIST_INDEX; index++, vector_free_lists_min_idx++)
>      if (vector_free_lists[index])
>        {
>       /* This vector is larger than requested.  */
> @@ -3413,6 +3416,7 @@ sweep_vectors (void)
>    gcstat.total_vectors = 0;
>    gcstat.total_vector_slots = gcstat.total_free_vector_slots = 0;
>    memset (vector_free_lists, 0, sizeof (vector_free_lists));
> +  vector_free_lists_min_idx = VECTOR_MAX_FREE_LIST_INDEX;
>
>    /* Looking through vector blocks.  */

Amazing! :O
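
In other words, the patch just remembers the smallest index at
which vector_free_lists can still be non-empty, so the search for
a larger free vector starts there instead of walking hundreds of
empty buckets. A standalone toy sketch of that bookkeeping
(hypothetical names, not the actual Emacs data structures):

  #include <stdio.h>

  #define NBUCKETS 512

  struct node { struct node *next; };

  static struct node *buckets[NBUCKETS];
  static int min_idx = NBUCKETS;  /* no bucket known to be non-empty */

  static void
  push (int idx, struct node *n)
  {
    n->next = buckets[idx];
    buckets[idx] = n;
    if (idx < min_idx)
      min_idx = idx;              /* remember the lowest used bucket */
  }

  static struct node *
  pop_at_least (int idx)
  {
    /* Start at the hint rather than at idx when the hint is higher;
       the hint only advances past buckets verified to be empty.  */
    for (int i = idx > min_idx ? idx : min_idx; i < NBUCKETS; i++)
      {
        if (buckets[i])
          {
            struct node *n = buckets[i];
            buckets[i] = n->next;
            return n;
          }
        if (i == min_idx)
          min_idx++;
      }
    return NULL;
  }

  int
  main (void)
  {
    static struct node n1, n2;
    push (3, &n1);
    push (200, &n2);
    printf ("%d\n", pop_at_least (0) == &n1);  /* 1: found via the hint */
    printf ("%d\n", min_idx);                  /* 3: still a lower bound */
    return 0;
  }

Resetting the hint back to NBUCKETS when the lists are cleared
corresponds to the sweep_vectors hunk in the patch.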

If you'd like, see if you can run my original test on your
computer: (1) interpreted Elisp, (2) byte-compiled Elisp, and
(3) natively compiled Elisp, and then the Common Lisp run.

Actually, it is a bit of a bummer for the community, since Emacs
is like THE portal into Lisp. We should have the best Lisp in
the business, and I don't see why not. Emacs + SBCL + CL +
Elisp, anyone?

I.e. real CL, not the cl- prefixed stuff, which is actually
Elisp. Not that there is anything wrong with that! On the
contrary ;)

-- 
underground experts united
https://dataswamp.org/~incal



