
Re: [Lightning] jit_qdivr_u trashes JIT_R0 on x86_64


From: Paul Cercueil
Subject: Re: [Lightning] jit_qdivr_u trashes JIT_R0 on x86_64
Date: Thu, 29 Aug 2019 02:41:05 +0200

Hi Paulo,


On Wed, 28 Aug 2019 at 20:01, Paulo César Pereira de Andrade <address@hidden> wrote:
On Wed, 28 Aug 2019 at 15:36, Paul Cercueil <address@hidden> wrote:

  Hi Paul,
[...]
>> I call jit_qdivr(reg1, reg2, reg2, reg1)
>> (and also jit_qdivr_u with the same arguments). reg1 and reg2 cannot
>> represent the same register.
>
> Ok. I see it can fail on some ports, actually, probably only on x86.
> I will add extra tests for these conditions, and fix any register
> clobbers that are being generated.
> Note that doing it the way you described might cause lightning to
> generate code to get at least one extra temporary, depending on the
> input registers. It might look clean in the lightning input, but may
> generate significantly larger code due to moves to/from %rax and %rdx,
> as well as spill reloads.

Since I have my own register allocator on top of Lightning, sometimes
all the registers offered by Lightning have been written to within a
function, and I guess there's no way for Lightning to know which
registers are safe to use without spilling. A good improvement would be
an API function to mark a register as freely usable.

  I just tested extending check/qalu.inc to test all possible
permutations, and also to check for any register clobber, and it still
passes all tests on x86_64.

  Can you please describe how your API works? I suspect it might be
doing something that is triggering a bug somewhere. Maybe point to
some git repository with an example of how to reproduce the problem.
It would be far better if you could create an example input to
check/lightning. For example:
$ cat test.tst
.data   32
fmt:
.c      "7 / 3 = (%ld, %ld)\n"
.code
        prolog
        movi %r1 7
        movi %r0 3
        // force live
        movi %r2 10
        movi %v0 11
        movi %v1 12
        movi %v2 13
        // end force live
        qdivr %r0 %r1 %r1 %r0
        // check registers
        bnei L0 %r2 10
        bnei L0 %v0 11
        bnei L0 %v1 12
        beqi L1 %v2 13
L0:
        calli @abort
L1:
        prepare
                pushargi fmt
                ellipsis
                pushargr %r0
                pushargr %r1
        finishi @printf
        ret
        epilog
$ ./lightning test.tst
7 / 3 = (2, 1)
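
For comparison, the behaviour this test expects can be sketched in plain C, without lightning. This is only a reference model under stated assumptions: the register enum, qdivr_model and run_check are hypothetical helpers, not part of lightning. The model reads both inputs before writing either output, which is what makes aliased operands such as "qdivr %r0 %r1 %r1 %r0" well defined, and it then checks that the "force live" registers are untouched.

```c
#include <assert.h>

/* Hypothetical register file mirroring the names used in test.tst. */
enum { R0, R1, R2, V0, V1, V2, NREGS };

/* Reference model of qdivr q, r, a, b: q = a / b and r = a % b.
 * Both inputs are read before either output is written, so the
 * result is the same even when the outputs alias the inputs. */
static void qdivr_model(long regs[NREGS], int q, int r, int a, int b)
{
    long dividend = regs[a];
    long divisor = regs[b];

    assert(divisor != 0);
    regs[q] = dividend / divisor;
    regs[r] = dividend % divisor;
}

/* Replays test.tst on the model. Returns 1 when the bystander
 * registers (%r2, %v0, %v1, %v2) survive the division, else 0. */
static int run_check(long regs[NREGS])
{
    regs[R1] = 7;
    regs[R0] = 3;
    /* force live */
    regs[R2] = 10;
    regs[V0] = 11;
    regs[V1] = 12;
    regs[V2] = 13;

    qdivr_model(regs, R0, R1, R1, R0);

    return regs[R2] == 10 && regs[V0] == 11 &&
           regs[V1] == 12 && regs[V2] == 13;
}
```

Calling run_check on a six-element array leaves 2 in regs[R0] and 1 in regs[R1], matching the (2, 1) printed above.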

I tried, but I did not manage to create a test case that shows the issue... Everything I tried worked just fine.

The actual code that triggers the issue is this one: https://gist.github.com/pcercuei/8db789b415f6ced73abb01a6504b64c7
As it's generated, it's not the easiest to understand, sorry...

The thing to note is that on line 26 I write to register 0 (rax), which is then left untouched until line 58 or 71.

This is what Lightning generates: https://gist.github.com/pcercuei/6b4afef692682b139a46449665454681

As you can see, on line 30 %rax is written to, and %eax is written back to memory on line 69 or 84.
But on line 49, %rax is also unconditionally overwritten.
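
One way an overwrite like this can arise: the x86 div instruction takes its dividend in %rdx:%rax and always writes the quotient to %rax and the remainder to %rdx, whatever registers the caller asked for. The sketch below is a toy model in plain C, not lightning's actual code generator. The register array and the functions x86_div_u, qdivr_u_naive and qdivr_u_safe are hypothetical names, and %rdx is assumed to be zeroed so the dividend fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical four-register file; only RAX and RDX are special. */
enum { RAX, RCX, RDX, RBX, NREGS };

/* Models unsigned `div` with RDX already zeroed: the dividend is RAX,
 * the quotient lands in RAX and the remainder in RDX. Real hardware
 * raises a divide error on a zero divisor; the model asserts instead. */
static void x86_div_u(uint64_t regs[NREGS], uint64_t divisor)
{
    uint64_t dividend = regs[RAX];

    assert(divisor != 0);
    regs[RAX] = dividend / divisor;
    regs[RDX] = dividend % divisor;
}

/* qdivr_u q, r, a, b without saving anything: correct results in q and r,
 * but RAX and RDX are clobbered even when they are not destinations. */
static void qdivr_u_naive(uint64_t regs[NREGS], int q, int r, int a, int b)
{
    uint64_t divisor = regs[b];   /* read before RAX/RDX are overwritten */
    uint64_t quot, rem;

    regs[RAX] = regs[a];
    regs[RDX] = 0;
    x86_div_u(regs, divisor);
    quot = regs[RAX];
    rem = regs[RDX];
    regs[q] = quot;
    regs[r] = rem;
}

/* Same operation, but RAX and RDX are saved and then restored unless
 * they are themselves one of the two destination registers. */
static void qdivr_u_safe(uint64_t regs[NREGS], int q, int r, int a, int b)
{
    uint64_t saved_rax = regs[RAX];
    uint64_t saved_rdx = regs[RDX];

    qdivr_u_naive(regs, q, r, a, b);
    if (q != RAX && r != RAX)
        regs[RAX] = saved_rax;
    if (q != RDX && r != RDX)
        regs[RDX] = saved_rdx;
}
```

With RAX=100, RCX=7, RDX=55 and RBX=3, qdivr_u_naive(regs, RCX, RBX, RCX, RBX) leaves RCX=2 and RBX=1 but also overwrites RAX with 2, while qdivr_u_safe gives the same results and keeps RAX=100 and RDX=55.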

Cheers,
-Paul


  You might also use some ideas from the allocator in
https://github.com/pcpa/owl/blob/master/lib/oemit.c
but the allocator really is for virtual registers. It hardcodes actual
registers for the virtual registers, thread pointer, this pointer,
(virtual) stack pointer, global pointer, and some temporaries. But as
long as the code generator knows it is working only with integer
operations that cannot overflow, or with float/double values, it does
not generate code to spill and reload the hardware registers assigned
to the (four) virtual registers.
  Note that with lightning you should think of registers as resources,
somewhat like a file descriptor.
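
The file-descriptor analogy can be sketched as a tiny acquire/release pool. This is a minimal illustration in plain C, assuming a pool of three registers; reg_pool, reg_acquire and reg_release are hypothetical names and do not exist in lightning. As with open(), acquiring hands out the lowest free slot, and a register must be released before it can be handed out again.

```c
#include <assert.h>

#define POOL_SIZE 3   /* assumed number of registers in the pool */

/* Bit i set means register i is currently held by someone. */
typedef struct {
    unsigned in_use;
} reg_pool;

/* Returns the lowest free register index, or -1 if all are held. */
static int reg_acquire(reg_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!(p->in_use & (1u << i))) {
            p->in_use |= 1u << i;
            return i;
        }
    }
    return -1;
}

/* Gives a held register back to the pool. */
static void reg_release(reg_pool *p, int reg)
{
    assert(reg >= 0 && reg < POOL_SIZE);
    assert(p->in_use & (1u << reg));   /* releasing a free register is a bug */
    p->in_use &= ~(1u << reg);
}
```

Starting from an empty pool, three calls to reg_acquire return 0, 1 and 2, and a fourth returns -1 until one of them is released.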

 Thanks,
 -Paul

Thanks!
Paulo




