lightning
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Atomic operations


From: Marc Nieper-Wißkirchen
Subject: Re: Atomic operations
Date: Fri, 9 Sep 2022 22:48:12 +0200

Dear Paulo,

I just want to say that I got your exciting message; I should have time to look into your work in more detail next week.

Thanks a lot,

Marc

Am Do., 8. Sept. 2022 um 21:59 Uhr schrieb Paulo César Pereira de Andrade <paulo.cesar.pereira.de.andrade@gmail.com>:
Em qui., 1 de set. de 2022 às 18:18, Paulo César Pereira de Andrade
<paulo.cesar.pereira.de.andrade@gmail.com> escreveu:

  Hi Marc,

> Em sex., 12 de ago. de 2022 às 08:15, Paulo César Pereira de Andrade
> <paulo.cesar.pereira.de.andrade@gmail.com> escreveu:
> >
> > Em qui., 11 de ago. de 2022 às 17:52, Marc Nieper-Wißkirchen
> > <marc.nieper+gnu@gmail.com> escreveu:
> >
> >   Hi Marc,
> >
> > > PS: This document may be helpful as well: https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html.
>
>   My idea is to keep at least the first implementation as simple
> as possible.
>
> >   Right now, my idea of what should be done would be to have a
> > jit_casr(bool_return_in_register, address_in_register,
> > new_value_in_register, old_value_in_register);
> >
> >   It would need a new pattern of 4 registers instruction, as only
> > the first one is modified. The current qmul* and qdiv* ones put
> > result in the first two register arguments.
> >
> >   And while at that, also have a:
> >
> > jit_tasr(old_value_in_register, address_in_register, new_value_in_register);
> >
> >   Likely can have the _c, _s, _i, and _l modifiers, or have only a 32
> > bit variant.
>
>   At first "test and set" is only cheap on x86*, and would require very complex
> code, and even loops on some architectures. It should also not be friendly to
> cache and end up wasting too much cpu.
> https://godbolt.org/z/WPsPzv66c
>
> >   Very few cpus should not have some construct for it. For those, we just add
> > a software fallback with some kind of spin lock. In either case,  we "cheat" and
> > look at what assembly gcc generates or the kernel uses for the
> > equivalent construct.
>
>   For the moment I am considering only a:
>
> jit_casr(jit_int32_t bool_reg_return, jit_int32_t address, jit_int32_t
> old_val, jit_int32_t new_val);

  Please check https://git.savannah.gnu.org/cgit/lightning.git/commit/?id=d5a7c8e4ad719e84dbb4904c532f906a1ef5a77b
[Implement a simple compare and swap atomic operation.]

  At first I would not like to implement something like gcc libatomic
or libatomic_ops as inline jit generation. It would be way too complex
and very difficult to ensure it would be bug free.

> where:
> * bool_reg_return would be true if it did swap, false otherwise;
>   any loop, until it returns true would need to be explicitly generated.
>   This also makes it easier to implement a "try lock" logic.
> * address is the address in a register
> * old_val is the expected old value in a register
> * new_val is the expected new value in a register
>
> could implement a jit_casi where the address argument could be an
> immediate value.
> At first I would avoid it, because it likely would need a temporary,
> and better to
> have the code implementing any loop choosing what register to use for
> the address,
> and it might still be required temporaries on some architectures even
> receiving all
> arguments in registers.
>
>   About pairs, it is in my low priority TODO complex numbers support :-)
>
> >   About the ABA problem, we ignore it. Just have some note, describe the issue
> > and how to avoid it. I would suggest using it with the address_in_register value
> > as a special lock, not the actual variable.

  At first it implements jit_cas{r,i}(result, address, old, new) where
result is a
register to store the boolean result, address is a register or an immediate
with the machine address of the value/lock, old is the old value to test, and
new is the value to set the contents of the address if it holds the old value.

  Because it only returns true if the value was changed, frequently a loop must
be done until acquiring the lock, *or* if the code can be generated in
a very smart
way, it can do something else and try again later, to avoid wasting cpu time.

  A fallback is implemented with pthreads. It is very expensive, but for now
only used in the first implementation, or on ports where it cannot be properly
tested, as there is no real hardware access, or no smp hardware access.

Thanks,
Paulo

reply via email to

[Prev in Thread] Current Thread [Next in Thread]