lightning
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] ppc: Fix 'calli' when floating-point arguments are passed


From: Paulo César Pereira de Andrade
Subject: Re: [PATCH] ppc: Fix 'calli' when floating-point arguments are passed
Date: Fri, 9 Sep 2022 23:21:03 -0300

Em sex., 9 de set. de 2022 às 18:19, Paul Cercueil
<paul@crapouillou.net> escreveu:
>
> Hi Paulo,

  Hi Paul,

> Le ven., sept. 9 2022 at 16:51:24 -0300, Paulo César Pereira de
> Andrade <paulo.cesar.pereira.de.andrade@gmail.com> a écrit :
> > Em sex., 9 de set. de 2022 às 15:05, Paulo César Pereira de Andrade
> > <paulo.cesar.pereira.de.andrade@gmail.com> escreveu:
> >>
> >>  Em sex., 9 de set. de 2022 às 13:53, Paul Cercueil
> >>  <paul@crapouillou.net> escreveu:
> >>  >
> >>  >
> >>  >
> >>  > Le ven., sept. 9 2022 at 12:30:33 -0300, Paulo César Pereira de
> >>  > Andrade <paulo.cesar.pereira.de.andrade@gmail.com> a écrit :
> >>  > > Em sex., 9 de set. de 2022 às 12:08, Paulo César Pereira de
> >> Andrade
> >>  > > <paulo.cesar.pereira.de.andrade@gmail.com> escreveu:
> >>  > >>
> >>  > >>  Em sex., 9 de set. de 2022 às 11:42, Paul Cercueil
> >>  > >>  <paul@crapouillou.net> escreveu:
> >>  > >>  >
> >>  > >>  > Hi Paulo,
> >>  > >>  >
> >>  > >>  > [snip]
> >>  > >>  >
> >>  > >>  > >>  The very last break happened as it was calling
> >> jit_new_state()
> >>  > >>  > >> again,
> >>  > >>  > >>  so I suppose the memory address that was being watched
> >> was
> >>  > >> being
> >>  > >>  > >> reused
> >>  > >>  > >>  and I stopped the debugging session there.
> >>  > >>  > >
> >>  > >>  > >   Please just do call jit_print() after finishing
> >>  > >> split_branches, for
> >>  > >>  > > example:
> >>  > >>  > >
> >>  > >>  > > Breakpoint 1, _split_branches (_jit=0x10111270) at
> >>  > >> lightning.c:2713
> >>  > >>  > > 2713        for (node = _jitc->head; node; node = next) {
> >>  > >>  > > (gdb) call _jit_print(_jit)
> >>  > >>  > > L0: /* prolog */
> >>  > >>  > >     #name
> >>  > >>  > >     #note 395
> >>  > >>  > > ...
> >>  > >>  > >
> >>  > >>  > > and let me know the output. The likely explanation for it
> >> not
> >>  > >> being
> >>  > >>  > > set,
> >>  > >>  > > assuming lightning code is not modified, is that there is
> >> some
> >>  > >> extra
> >>  > >>  > > code changing the value of r26 after it is used as a
> >> temporary.
> >>  > >>  > > Something
> >>  > >>  > > like in the range from "L2:" to the "stxi_i 0x3e7c r29
> >> r26"
> >>  > >> below,
> >>  > >>  > > there is
> >>  > >>  > > some instruction that changes r26, and causes it to
> >> assume it
> >>  > >> is dead
> >>  > >>  > > in
> >>  > >>  > > that range, and usable as a temporary.
> >>  > >>  >
> >>  > >>  > I'll do better, I'll give you an example that reproduces the
> >>  > >> issue :)
> >>  > >>  > See the attached example.
> >>  > >>  >
> >>  > >>  > Your idea that something must be changing r26 in that slice
> >> of
> >>  > >> code
> >>  > >>  > pointed me in the right direction.
> >>  > >>  > I only changed one line in the example: there is now a
> >>  > >> jit_movi(r26, 0)
> >>  > >>  > line 58.
> >>  > >>  >
> >>  > >>  > I assume Lightning detects that r26 is already 0 anyway
> >> (because
> >>  > >> of the
> >>  > >>  > beqi just before) so it removes this instruction, causing
> >> havoc
> >>  > >> at the
> >>  > >>  > same time.
> >>  > >>
> >>  > >>    Exactly, the instruction is being removed, and the optimizer
> >>  > >> getting
> >>  > >>  confused.
> >>  > >>    This should not happen as the optimizations done are
> >> supposed to
> >>  > >>  be the fool proof ones, but somehow this pattern triggered a
> >> bug.
> >>  > >>    I will try to understand it and have a patch later today.
> >>  > >
> >>  > >   Just in case, this should work as a temporary workaround:
> >>  > >
> >>  > > """
> >>  > > diff --git a/lib/lightning.c b/lib/lightning.c
> >>  > > index d219e6d..c4f2e91 100644
> >>  > > --- a/lib/lightning.c
> >>  > > +++ b/lib/lightning.c
> >>  > > @@ -1702,12 +1702,14 @@ _jit_optimize(jit_state_t *_jit)
> >>  > >             case jit_code_epilog:
> >>  > >                 _jitc->function = NULL;
> >>  > >                 break;
> >>  > > +#if 0
> >>  > >             case jit_code_beqi:
> >>  > >                 redundant_store(node, 1);
> >>  > >                 break;
> >>  > >             case jit_code_bnei:
> >>  > >                 redundant_store(node, 0);
> >>  > >                 break;
> >>  > > +#endif
> >>  > >             default:
> >>  > >  #if JIT_HASH_CONSTS
> >>  > >                 if (mask & jit_cc_a0_flt) {
> >>  > > """
> >>  >
> >>  > I simply commented the 'redunant_store()' calls as I didn't know
> >> if it
> >>  > was OK for the jit_code_beqi/jit_code_bnei cases to fall in the
> >>  > "default" case.
> >>
> >>    It should go to the default case, because it might need to
> >> allocate
> >>  a constant to be loaded, or if in a function body, need to know the
> >>  register is modified, what only matters if it is callee save; this
> >> was
> >>  a second problem in this code path; if it did not remove the movi
> >>  in redundant_store() it should still go to the default case, or,
> >> could
> >>  not detect a callee save register is modified (likely it was already
> >>  noted as modified, but maybe in some very special scenario it
> >>  could not notice that...).
> >>
> >>  > Looks like the workaround works, thanks for the suggestion.
> >>
> >>    The problem was that it basically did add an "implicit" note note
> >> that
> >>  the register was dead in the range from L2 to  the
> >> "jit_movi(JIT_V1, 0)",
> >>  so, it was safe to use it as temporary, but when removing the
> >> jit_movi
> >>  in another optimization pass, it broke the logic.
> >>
> >>  > >   I will try to work on a patch, to have the live register
> >> computation
> >>  > > work as if the movi was still done, or just remove and no
> >> longer use
> >>  > > redundant_store, as this movi removal is causing the problem.
> >>  > >
> >>  > > """
> >>  > >         case jit_code_movi:
> >>  > >         if (regno == jit_regno(iter->u.w)) {
> >>  > >             if (iter->flag || iter->v.w != word)
> >>  > >             return;
> >>  > >             del_node(prev, iter);
> >>  > >             iter = prev;
> >>  > >         }
> >>  > >         break;
> >>  > > """
> >>  > >
> >>  > >   The problem is that the instruction is removed after the
> >> computation
> >>  > > of the initial state at the start of the block.
> >
> >   I just pushed a commit with the easiest and I believe safest
> > approach to
> > keep the optimization and prevent the misuse as a temporary. It just
> > turns the jit_movi into a jit_live.
> >   This works because the jit_movi() could be seen as the non existing
> > jit_dead()* call. Because it was not used without modification up to
> > that
> > removed jit_movi, the start of the block did assume it was dead at
> > block entry. It would be too complicated to fully understand the
> > condition
> > and (not too much) costly to rebuild the initial state. Still, the
> > only case
> > this approach is bad is if the movi would end up in a dead code path,
> > preventing it from being used as a cheap temporary, or, if run out of
> > temporaries, need a spill/reload.
>
> Sadly it does cause more issues than it solves.
> Attached is a piece of code that I generate, with the *workaround* in
> place (not your fix).

  Please confirm the workaround works as expected.

> With the fix in place, the CMPWI and BEQ lines 157-158 are missing.
>
> Besides, I can't give you a jit_print() trace of this code with the fix
> in place, because it segfaults:

  Ok. I was thinking a bit about it later, and it would only work for the
reproducer. It was still buggy. Should either recompute the initial
state at start of the block or leave the redundant register rematerialization,
to the known value, in place.

> #0  0x3f390ee0 in strlen () from
> /home/paul/dev/toolchains/powerpc/powerpc-buildroot-linux-gnu/sysroot/lib/libc.so.6
> #1  0x3f368d64 in fputs () from
> /home/paul/dev/toolchains/powerpc/powerpc-buildroot-linux-gnu/sysroot/lib/libc.so.6
> #2  0x3f5a6248 in _jit_print_node (_jit=_jit@entry=0x402dcbb0,
> node=node@entry=0x40312130) at jit_print.c:178
> #3  0x3f5a700c in _jit_print (_jit=_jit@entry=0x402dcbb0) at
> jit_print.c:73
>
> > * could add a jit_dead() instruction, this could be useful when doing
> > a
> > jump to a non constant address or address not in jit code. This would
> > be used for example, when landing in a code that reloads the called
> > save register value from the stack, and is a hint that it can be used
> > safely as temporary from the jit_dead() to the jump. For the moment,
> > just something to think about for the near future.
>
> That sounds like a good idea. I manage my own register allocation so I
> can tell Lightning exactly when a register is dead.

  To fully work it would need an extra regset mask, otherwise it could
cause bugs in unexpected conditions.  But it would not be too costly,
as it would not be an extra regset per block, but only an extra one in
the _jit->comp structure.

> Cheers,
> -Paul

Thanks,
Paulo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]