From: Richard Henderson
Subject: Re: [PATCH v3 18/37] target/ppc: implement vgnb
Date: Fri, 11 Feb 2022 17:15:54 +1100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0

On 2/10/22 23:34, matheus.ferst@eldorado.org.br wrote:
> +    for (int dw = 1; dw >= 0; dw--) {
> +        get_avr64(vrb, a->vrb, dw);
> +        for (; in >= 0; in -= a->n, out--) {
> +            if (in > out) {
> +                tcg_gen_shri_i64(tmp, vrb, in - out);
> +            } else {
> +                tcg_gen_shli_i64(tmp, vrb, out - in);
> +            }
> +            tcg_gen_andi_i64(tmp, tmp, 1ULL << out);
> +            tcg_gen_or_i64(rt, rt, tmp);
> +        }
> +        in += 64;
> +    }
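For reference, here is a host-side model of the quoted loop in plain C. Each TCG op is replaced by the equivalent C operator so that the op count can be checked directly. It is a sketch under assumptions: `in` and `out` are taken to start at 63 (their initialization is outside the quoted hunk), `hi` and `lo` stand in for the two `get_avr64()` halves, and `gnb_naive` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical host-side model of the quoted TCG loop.  hi/lo stand in
 * for get_avr64(vrb, a->vrb, 1) and get_avr64(vrb, a->vrb, 0).
 * Assumption: in = out = 63 on entry (not shown in the quoted hunk).
 * n >= 2, so at most 64 bits are gathered and out never goes negative.
 * *ops counts the shift, andi and or emitted for each gathered bit.
 */
static uint64_t gnb_naive(uint64_t hi, uint64_t lo, int n, int *ops)
{
    uint64_t rt = 0;
    int in = 63, out = 63;

    *ops = 0;
    for (int dw = 1; dw >= 0; dw--) {
        uint64_t vrb = dw ? hi : lo;

        for (; in >= 0; in -= n, out--) {
            /* Move source bit `in` to destination bit `out`. */
            uint64_t tmp = in > out ? vrb >> (in - out) : vrb << (out - in);

            tmp &= 1ULL << out;     /* tcg_gen_andi_i64 */
            rt |= tmp;              /* tcg_gen_or_i64 */
            *ops += 3;              /* shift + andi + or */
        }
        in += 64;                   /* continue into the next doubleword */
    }
    return rt;
}
```

With n = 2 the body runs once per gathered bit, 64 times in all. So `gnb_naive(~0ULL, ~0ULL, 2, &ops)` returns all ones with `ops == 192`, which is the 3*64 figure.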
This is going to produce up to 3*64 operations (n=2). You can produce more than one output pairing per shift, and produce the same result in 3*lg2(64) operations. I've given an example like this on the list before, recently. I think it was in the context of some riscv bit manipulation.
N = 2

                        AxBxCxDxExFxGxHxIxJxKxLxMxNxOxPxQxRxSxTxUxVxWxXxYxZx0x1x2x3x4x5x
& rep(0b10)             A.B.C.D.E.F.G.H.I.J.K.L.M.N.O.P.Q.R.S.T.U.V.W.X.Y.Z.0.1.2.3.4.5.
<< 1                    .B.C.D.E.F.G.H.I.J.K.L.M.N.O.P.Q.R.S.T.U.V.W.X.Y.Z.0.1.2.3.4.5..
|                       ABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZ001122334455.
& rep(0b1100)           AB..CD..EF..GH..IJ..KL..MN..OP..QR..ST..UV..WX..YZ..01..23..45..
<< 2                    ..CD..EF..GH..IJ..KL..MN..OP..QR..ST..UV..WX..YZ..01..23..45....
|                       ABCDCDEFEFGHGHIJIJKLKLMNMNOPOPQRQRSTSTUVUVWXWXYZYZ010123234545..
& rep(0xf0)             ABCD....EFGH....IJKL....MNOP....QRST....UVWX....YZ01....2345....
<< 4                    ....EFGH....IJKL....MNOP....QRST....UVWX....YZ01....2345........
|                       ABCDEFGHEFGHIJKLIJKLMNOPMNOPQRSTQRSTUVWXUVWXYZ01YZ0123452345....
& rep(0xff00)           ABCDEFGH........IJKLMNOP........QRSTUVWX........YZ012345........
<< 8                    ........IJKLMNOP........QRSTUVWX........YZ012345................
|                       ABCDEFGHIJKLMNOPIJKLMNOPQRSTUVWXQRSTUVWXYZ012345YZ012345........
& rep(0xffff0000)       ABCDEFGHIJKLMNOP................QRSTUVWXYZ012345................
deposit(t, 32, 16)      ABCDEFGHIJKLMNOPQRSTUVWXYZ012345................................
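The N = 2 sequence can be checked on the host with a short C model of one doubleword. It is a sketch, not the TCG code: each statement corresponds to one or two TCG ops, and `deposit64_model` is a hypothetical helper modelled on QEMU's `deposit64(value, start, length, fieldval)`. The diagram's `deposit(t, 32, 16)` is read here as depositing bits 16..31 into bits 32..47. A real deposit leaves a stale copy of Q..5 in bits 16..31, which the model drops by returning only the high half.

```c
#include <stdint.h>

/* Hypothetical helper modelled on deposit64(value, start, length, fieldval):
 * replace bits [start, start + length) of value with the low bits of fieldval.
 * Requires 0 < length <= 64 and start + length <= 64. */
static uint64_t deposit64_model(uint64_t value, int start, int length,
                                uint64_t fieldval)
{
    uint64_t mask = (~0ULL >> (64 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/*
 * Gather bits 63, 61, ..., 1 of one doubleword (the letters in the N = 2
 * diagram) into a 32-bit value, with A in bit 31 and 5 in bit 0.
 */
static uint32_t gather2(uint64_t x)
{
    uint64_t t = x & 0xAAAAAAAAAAAAAAAAULL;      /* & rep(0b10)        */

    t |= t << 1;
    t &= 0xCCCCCCCCCCCCCCCCULL;                  /* & rep(0b1100)      */
    t |= t << 2;
    t &= 0xF0F0F0F0F0F0F0F0ULL;                  /* & rep(0xf0)        */
    t |= t << 4;
    t &= 0xFF00FF00FF00FF00ULL;                  /* & rep(0xff00)      */
    t |= t << 8;
    t &= 0xFFFF0000FFFF0000ULL;                  /* & rep(0xffff0000)  */
    t = deposit64_model(t, 32, 16, t >> 16);     /* deposit(t, 32, 16) */

    /* Bits 63..32 now hold A..5; bits 31..16 still hold a copy of Q..5. */
    return (uint32_t)(t >> 32);
}
```

For example, `gather2(0xAAAAAAAAAAAAAAAAULL)` returns 0xFFFFFFFF. `gather2(1ULL << 31)`, which sets only Q, returns 0x8000.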
The same approach extends to larger N. For N >= 4, I believe that half of the masking may be elided, because the shifted bits land in positions that already hold zeros.
N = 5

                        AxxxxBxxxxCxxxxDxxxxExxxxFxxxxGxxxxHxxxxIxxxxJxxxxKxxxxLxxxxMxxx
& rep(0b10000)          A....B....C....D....E....F....G....H....I....J....K....L....M...
<< (5 - 1)              .B....C....D....E....F....G....H....I....J....K....L....M.......
|                       AB...BC...CD...DE...EF...FG...GH...HI...IJ...JK...KL...LM...M...
<< (10 - 2)             ..CD...DE...EF...FG...GH...HI...IJ...JK...KL...LM...M...........
|                       ABCD.BCDE.CDEF.DEFG.EFGH.FGHI.GHIJ.HIJK.IJKL.JKLM.KLM..LM...M...
& rep(0xf0000)          ABCD................EFGH................IJKL................M...
<< (20 - 4)             ....EFGH................IJKL................M...................
|                       ABCDEFGH............EFGHIJKL............IJKLM...............M...
<< (40 - 8)             ........IJKLM...............M...................................
|                       ABCDEFGHIJKLM.......EFGHIJKLM...........IJKLM...............M...
& 0xfff8_0000_0000_0000 ABCDEFGHIJKLM...................................................
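A host-side C model of the N = 5 sequence, again one doubleword and a sketch rather than TCG code, applies only the masks the diagram keeps. `gather5` is a hypothetical name. The constants are the diagram's repeating patterns written out for 64 bits: rep(0b10000) anchored at bit 63 is 0x8421084210842108, and rep(0xf0000) is 0xF0000F0000F0000F. Every step is a shift, an AND with a constant or an OR, so the function distributes over OR. Checking the 64 single-bit inputs is therefore enough to verify it.

```c
#include <stdint.h>

/*
 * Gather bits 63, 58, 53, ..., 3 of one doubleword (A..M in the N = 5
 * diagram) into a 13-bit value, with A in bit 12 and M in bit 0.
 * No mask is applied after the << 4 or << 16 steps.
 */
static uint32_t gather5(uint64_t x)
{
    uint64_t t = x & 0x8421084210842108ULL;      /* & rep(0b10000) */

    t |= t << 4;                                 /* << (5 - 1)     */
    t |= t << 8;                                 /* << (10 - 2)    */
    t &= 0xF0000F0000F0000FULL;                  /* & rep(0xf0000) */
    t |= t << 16;                                /* << (20 - 4)    */
    t |= t << 32;                                /* << (40 - 8)    */
    t &= 0xFFF8000000000000ULL;                  /* top 13 bits    */

    return (uint32_t)(t >> 51);
}
```

For instance, `gather5(1ULL << 43)`, which sets only E, returns 0x100. E is the fifth of the 13 gathered bits, so it lands in bit 8.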
It's probably worth working through the various N to make sure you know which masking is required.
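One way to work through the cases mechanically is a brute-force search on the host. The sketch rests on a generalization assumed here from the two diagrams, not stated in the thread:

* Step k shifts left by (N - 1) << k.
* The optional mask after step k keeps groups of 2^(k+1) bits repeating every N << (k+1) bits from bit 63.
* The first and last masks are always applied.

For a given N it tries every subset of the optional masks and compares each against the expected single-bit results. All helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSB-first "char" c, as in the diagrams, is bit 63 - c. */

/* Keep `width` chars starting at every multiple of `period`. */
static uint64_t group_mask(int period, int width)
{
    uint64_t m = 0;

    for (int start = 0; start < 64; start += period) {
        for (int c = start; c < start + width && c < 64; c++) {
            m |= 1ULL << (63 - c);
        }
    }
    return m;
}

/* Bits gathered from one doubleword: chars 0, n, 2n, ... (2 <= n <= 63). */
static int selected_bits(int n)
{
    return 63 / n + 1;
}

/* Shift/OR steps needed to merge m bits into one group. */
static int num_steps(int m)
{
    int k = 0;

    while ((1 << k) < m) {
        k++;
    }
    return k;
}

/* Bit k of keep selects the optional mask after step k. */
static uint64_t gather_sched(uint64_t x, int n, unsigned keep)
{
    int m = selected_bits(n);
    int steps = num_steps(m);
    uint64_t t = x & group_mask(n, 1);           /* initial selection */

    for (int k = 0; k < steps; k++) {
        t |= t << ((n - 1) << k);
        if (k + 1 < steps && ((keep >> k) & 1)) {
            t &= group_mask(n << (k + 1), 1 << (k + 1));
        }
    }
    return t & group_mask(64, m);                /* final: top m bits */
}

/* Every step is a shift, AND or OR, so checking single bits suffices. */
static bool schedule_ok(int n, unsigned keep)
{
    for (int b = 0; b < 64; b++) {
        int c = 63 - b;
        uint64_t expect = (c % n == 0) ? 1ULL << (63 - c / n) : 0;

        if (gather_sched(1ULL << b, n, keep) != expect) {
            return false;
        }
    }
    return true;
}

static int count_bits(unsigned v)
{
    int c = 0;

    for (; v; v >>= 1) {
        c += v & 1;
    }
    return c;
}

/* Fewest optional masks that still give the right answer for stride n. */
static unsigned min_masks(int n)
{
    unsigned all = (1u << (num_steps(selected_bits(n)) - 1)) - 1;
    unsigned best = all;

    for (unsigned s = 0; s <= all; s++) {
        if (schedule_ok(n, s) && count_bits(s) < count_bits(best)) {
            best = s;
        }
    }
    return best;
}
```

`min_masks(5)` returns 0x2, i.e. only the mask after the `<< (10 - 2)` step, which matches the N = 5 diagram. Running it for other N shows which masks each one needs.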
r~