[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PULL 082/116] target/i386: correct fix for pcmpxstrx substring sear
From: |
Alex Bennée |
Subject: |
Re: [PULL 082/116] target/i386: correct fix for pcmpxstrx substring search |
Date: |
Mon, 15 Jun 2020 14:43:42 +0100 |
User-agent: |
mu4e 1.5.3; emacs 28.0.50 |
Philippe Mathieu-Daudé <philmd@redhat.com> writes:
> On 6/15/20 12:18 PM, Philippe Mathieu-Daudé wrote:
>> On 6/12/20 6:07 PM, Paolo Bonzini wrote:
>>> From: Joseph Myers <joseph@codesourcery.com>
>>>
>>> This corrects a bug introduced in my previous fix for SSE4.2 pcmpestri
>>> / pcmpestrm / pcmpistri / pcmpistrm substring search, commit
>>> ae35eea7e4a9f21dd147406dfbcd0c4c6aaf2a60.
>>>
>>> That commit fixed a bug that showed up in four GCC tests with one libc
>>> implementation. The tests in question generate random inputs to the
>>> intrinsics and compare results to a C implementation, but they only
>>> test 1024 possible random inputs, and when the tests use the cases of
>>> those instructions that work with word rather than byte inputs, it's
>>> easy to have problematic cases that show up much less frequently than
>>> that. Thus, testing with a different libc implementation, and so a
>>> different random number generator, showed up a problem with the
>>> previous patch.
>>>
>>> When investigating the previous test failures, I found the description
>>> of these instructions in the Intel manuals (starting from computing a
>>> 16x16 or 8x8 set of comparison results) confusing and hard to match up
>>> with the more optimized implementation in QEMU, and referred to AMD
>>> manuals which described the instructions in a different way. Those
>>> AMD descriptions are very explicit that the whole of the string being
>>> searched for must be found in the other operand, not running off the
>>> end of that operand; they say "If the prototype and the SUT are equal
>>> in length, the two strings must be identical for the comparison to be
>>> TRUE.". However, that statement is incorrect.
>>>
>>> In my previous commit message, I noted:
>>>
>>> The operation in this case is a search for a string (argument d to
>>> the helper) in another string (argument s to the helper); if a copy
>>> of d at a particular position would run off the end of s, the
>>> resulting output bit should be 0 whether or not the strings match in
>>> the region where they overlap, but the QEMU implementation was
>>> wrongly comparing only up to the point where s ends and counting it
>>> as a match if an initial segment of d matched a terminal segment of
>>> s. Here, "run off the end of s" means that some byte of d would
>>> overlap some byte outside of s; thus, if d has zero length, it is
>>> considered to match everywhere, including after the end of s.
>>>
>>> The description "some byte of d would overlap some byte outside of s"
>>> is accurate only when understood to refer to overlapping some byte
>>> *within the 16-byte operand* but at or after the zero terminator; it
>>> is valid to run over the end of s if the end of s is the end of the
>>> 16-byte operand. So the fix in the previous patch for the case of d
>>> being empty was correct, but the other part of that patch was not
>>> correct (as it never allowed partial matches even at the end of the
>>> 16-byte operand). Nor was the code before the previous patch correct
>>> for the case of d nonempty, as it would always have allowed partial
>>> matches at the end of s.
>>>
>>> Fix with a partial revert of my previous change, combined with
>>> inserting a check for the special case of s having maximum length to
>>> determine where it is necessary to check for matches.
>>>
>>> In the added test, test 1 is for the case of empty strings, which
>>> failed before my 2017 patch, test 2 is for the bug introduced by my
>>> 2017 patch and test 3 deals with the case where a match of an initial
>>> segment at the end of the string is not valid when the string ends
>>> before the end of the 16-byte operand (that is, the case that would be
>>> broken by a simple revert of the non-empty-string part of my 2017
>>> patch).
>>>
>>> Signed-off-by: Joseph Myers <joseph@codesourcery.com>
>>> Message-Id: <alpine.DEB.2.21.2006121344290.9881@digraph.polyomino.org.uk>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>> target/i386/ops_sse.h | 4 ++--
>>> tests/tcg/i386/Makefile.target | 3 +++
>>> tests/tcg/i386/test-i386-pcmpistri.c | 33 ++++++++++++++++++++++++++++
>>> 3 files changed, 38 insertions(+), 2 deletions(-)
>>> create mode 100644 tests/tcg/i386/test-i386-pcmpistri.c
>>>
>>> diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
>>> index 01d6017412..14f2b16abd 100644
>>> --- a/target/i386/ops_sse.h
>>> +++ b/target/i386/ops_sse.h
>>> @@ -2089,10 +2089,10 @@ static inline unsigned pcmpxstrx(CPUX86State *env,
>>> Reg *d, Reg *s,
>>> res = (2 << upper) - 1;
>>> break;
>>> }
>>> - for (j = valids - validd; j >= 0; j--) {
>>> + for (j = valids == upper ? valids : valids - validd; j >= 0; j--) {
>>> res <<= 1;
>>> v = 1;
>>> - for (i = validd; i >= 0; i--) {
>>> + for (i = MIN(valids - j, validd); i >= 0; i--) {
>>> v &= (pcmp_val(s, ctrl, i + j) == pcmp_val(d, ctrl, i));
>>> }
>>> res |= v;
>>> diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
>>> index 43ee2e181e..53efec0668 100644
>>> --- a/tests/tcg/i386/Makefile.target
>>> +++ b/tests/tcg/i386/Makefile.target
>>> @@ -10,6 +10,9 @@ ALL_X86_TESTS=$(I386_SRCS:.c=)
>>> SKIP_I386_TESTS=test-i386-ssse3
>>> X86_64_TESTS:=$(filter test-i386-ssse3, $(ALL_X86_TESTS))
>>>
>>> +test-i386-pcmpistri: CFLAGS += -msse4.2
>>> +run-test-i386-pcmpistri: QEMU_OPTS += -cpu max
>>
>> This test fails on our CI:
>> https://travis-ci.org/github/qemu/qemu/jobs/698006621#L4246
>
> FYI Paolo's analysis from 'make V=1' output
> https://api.travis-ci.org/v3/job/698459904/log.txt:
>
> timeout 60
> /home/travis/build/philmd/qemu/build/i386-linux-user/qemu-i386 -cpu max
> test-i386-pcmpistri > test-i386-pcmpistri.out
>
> timeout 60
> /home/travis/build/philmd/qemu/build/i386-linux-user/qemu-i386 -plugin
> ../../plugin/libbb.so -d plugin -D
> test-i386-pcmpistri-with-libbb.so.pout test-i386-pcmpistri >
> run-plugin-test-i386-pcmpistri-with-libbb.so.out
>
> "incorrect qemu invocation, missing -cpu max in the second".
Just testing some patches now.
--
Alex Bennée