coreutils

Re: Bug#1037264: cksum crashes intermittently with "Illegal instruction"


From: Kristoffer Brånemyr
Subject: Re: Bug#1037264: cksum crashes intermittently with "Illegal instruction" on some Xen DomU
Date: Mon, 12 Jun 2023 16:25:28 +0000 (UTC)

I guess it doesn't hurt to also check for the SSE variants in the function 
that checks whether pclmul is supported.
But I think it's a bit suspicious that it only crashes sometimes. If some 
instruction were causing this, should it not happen every time?
Could it be something else, like an unaligned address read/write? I guess 
ILL_ILLOPN might mean the argument to an instruction (i.e. possibly an 
address?)
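
For what it's worth, here is a rough sketch of what such an extra check could 
look like. It assumes GCC or clang's <cpuid.h>; the names leaf1_allows_pclmul 
and pclmul_supported_strict are made up for illustration and this is not the 
code currently in cksum.c. The bit positions are the ones from the Intel text 
quoted further down.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
# include <cpuid.h>   /* GCC/clang wrapper around the CPUID instruction */
#endif

/* CPUID leaf 1 feature bits, per the Intel manual sections quoted below. */
#define LEAF1_EDX_SSE       (1u << 25)
#define LEAF1_EDX_SSE2      (1u << 26)
#define LEAF1_ECX_PCLMULQDQ (1u << 1)

/* Hypothetical helper: true only if the leaf-1 ECX/EDX words advertise
   SSE, SSE2 and PCLMULQDQ together.  */
static bool
leaf1_allows_pclmul (unsigned int ecx, unsigned int edx)
{
  return (edx & LEAF1_EDX_SSE)
         && (edx & LEAF1_EDX_SSE2)
         && (ecx & LEAF1_ECX_PCLMULQDQ);
}

/* Hypothetical stricter runtime check (illustration only).  */
static bool
pclmul_supported_strict (void)
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

  /* __get_cpuid returns 0 if the requested leaf is not available.  */
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return false;
  return leaf1_allows_pclmul (ecx, edx);
#else
  return false;   /* not an x86 build: no PCLMULQDQ */
#endif
}
```

Note this still only reflects what CPUID reports, so it would not help if the 
hypervisor advertises flags the underlying CPU does not have.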

Can you reproduce the problem running cksum in gdb? Then you could disassemble 
the location it crashes in and possibly see a bit better what causes the issue. 
Also dump the values of the hardware registers. And variables if you can.
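
If running under gdb is not practical on that machine, roughly the same 
information (si_code and the faulting address) could be captured in-process. 
A minimal sketch, assuming POSIX sigaction with SA_SIGINFO; the helper names 
(ill_code_name, fmt_hex, install_sigill_reporter) are hypothetical and nothing 
like this exists in coreutils:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Map a SIGILL si_code to a readable name (codes defined by POSIX).  */
static const char *
ill_code_name (int code)
{
  switch (code)
    {
    case ILL_ILLOPC: return "ILL_ILLOPC (illegal opcode)";
    case ILL_ILLOPN: return "ILL_ILLOPN (illegal operand)";
    case ILL_ILLADR: return "ILL_ILLADR (illegal addressing mode)";
    case ILL_ILLTRP: return "ILL_ILLTRP (illegal trap)";
    case ILL_PRVOPC: return "ILL_PRVOPC (privileged opcode)";
    case ILL_PRVREG: return "ILL_PRVREG (privileged register)";
    case ILL_COPROC: return "ILL_COPROC (coprocessor error)";
    case ILL_BADSTK: return "ILL_BADSTK (internal stack error)";
    default:         return "unknown si_code";
    }
}

/* Format V as fixed-width hex plus newline; OUT needs 2*sizeof V + 3 bytes.
   Returns the number of bytes written (no NUL terminator).  */
static size_t
fmt_hex (uintptr_t v, char *out)
{
  static const char digits[] = "0123456789abcdef";
  size_t n = 0;
  out[n++] = '0';
  out[n++] = 'x';
  for (int shift = (int) (sizeof v * 8) - 4; shift >= 0; shift -= 4)
    out[n++] = digits[(v >> shift) & 0xf];
  out[n++] = '\n';
  return n;
}

/* SIGILL handler: report si_code and si_addr on stderr, then exit.
   Avoids stdio, which is not async-signal-safe.  */
static void
on_sigill (int sig, siginfo_t *info, void *ctx)
{
  char buf[2 * sizeof (uintptr_t) + 3];
  const char *name = ill_code_name (info->si_code);
  ssize_t r;

  (void) sig;
  (void) ctx;
  r = write (STDERR_FILENO, name, strlen (name));
  r = write (STDERR_FILENO, " at ", 4);
  r = write (STDERR_FILENO, buf, fmt_hex ((uintptr_t) info->si_addr, buf));
  (void) r;
  _exit (1);
}

/* Install the reporter; returns 0 on success, -1 on failure.  */
static int
install_sigill_reporter (void)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_sigaction = on_sigill;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset (&sa.sa_mask);
  return sigaction (SIGILL, &sa, NULL);
}
```

Calling install_sigill_reporter() early in a test build would print the code 
and address on the next SIGILL. The address could then be compared against the 
disassembly to see which instruction is at fault.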


-- 
/Kristoffer Brånemyr 

    On Monday, 12 June 2023 at 15:03:11 CEST, Philip Rowlands 
<coreutils@dimebar.com> wrote:  
 
 On Sat, 10 Jun 2023, at 11:09, Pádraig Brady wrote:
> cksum since v9.0 checks at runtime whether pclmul is supported.
> It seems that check is not working appropriately on a Xen DomU.

Hypervisors routinely lie about CPUID feature flags, in order to maintain 
compatibility between a fleet of diverse servers. It's possible in this case 
that the system was misconfigured to present flags which the underlying CPU 
doesn't support.

> The routine in question is pclmul_supported() at:
> https://github.com/coreutils/coreutils/blob/b841f111/src/cksum.c#L160-L191
>
> That either suggests xen is incorrectly setting PCLMUL and AVX bits,
> or perhaps these two bits are not sufficient.
> Hmm I wonder do we also need to explicitly check for SSSE3 support?

Intel says to check for SSE and SSE2; quoting the manual
===
11.6.2 Checking for Intel® SSE and SSE2 Support
Before an application attempts to use Intel SSE and/or Intel SSE2, it should 
check that they are present on the processor:
1. Check that the processor supports the CPUID instruction. Bit 21 of the 
   EFLAGS register can be used to check processor’s support the CPUID 
   instruction.
2. Check that the processor supports Intel SSE and/or SSE2 (true if 
   CPUID.01H:EDX.SSE[bit 25] = 1 and/or CPUID.01H:EDX.SSE2[bit 26] = 1).

12.13.4 Checking for Intel® AES-NI Support
Before an application attempts to use AESNI instructions or PCLMULQDQ, the 
application should follow the steps illustrated in Section 11.6.2, “Checking 
for Intel® SSE and SSE2 Support.” Next, use the additional step provided 
below:
Check that the processor supports Intel AES-NI (if 
CPUID.01H:ECX.AESNI[bit 25] = 1); check that the processor supports 
PCLMULQDQ (if CPUID.01H:ECX.PCLMULQDQ[bit 1] = 1).
===

Wikipedia mentions an AVX-512 version (VPCLMULQDQ) but I don't think we're 
using that.

I can't find the equivalent AMD docs. Is there a library / macro check for 
this, to avoid the low-level bit inspection?

It would be useful to see the output of "cpuid -1" which does a verbose decode 
of all CPUID flags, on the system which sees the SIGILL. (How can it be 
intermittent??)

Interesting that the strace output finishes with:

read(0, "", 61440)                      = 0
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x55bec9cc6cf5} ---
+++ killed by SIGILL +++

i.e. ILL_ILLOPN (operand) rather than ILL_ILLOPC (opcode). What could cause 
this?


Cheers,
Phil
  
