qemu-devel

Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860d


From: Michael Tokarev
Subject: Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
Date: Sat, 24 Jun 2023 17:29:42 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0

23.06.2023 14:09, Anushree Mathur wrote:
Hi everyone,

I was trying to boot a rhel9.3 image with upstream qemu-system-ppc64 using the -smp 2 option and observed a segfault (qemu crash).

qemu command line used:

qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none -nographic -machine pseries -cpu POWER10 -accel tcg -device virtio-scsi-pci -drive file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device scsi-hd,drive=hd0 -boot c

After doing a git bisect, I found that the first bad commit, which introduced this issue, is:

[qemu]# git bisect good
20b6643324a79860dcdfe811ffe4a79942bca21e is the first bad commit
commit 20b6643324a79860dcdfe811ffe4a79942bca21e
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Mon Dec 5 17:45:02 2022 -0600

     tcg/ppc: Reorg goto_tb implementation

I've got another case that bisects to the same commit, with similar results, on a Debian ppc64 machine with qemu 8.0 and master.

The crash doesn't happen every time; sometimes it needs 20+ iterations
to trigger (so my bisection was rather painful, and initially pointed to
an entirely innocent commit).  So far it only occurs on an actual ppc64
machine; I wasn't able to reproduce it on amd64.
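Because the crash is intermittent, each bisection step has to repeat the reproducer enough times before a commit can be called good. A minimal sketch, assuming runs are independent and each one crashes with a fixed probability p. The value p = 0.05 (about one crash in 20 runs) is an illustrative assumption loosely based on the "20+ iterations" above, not a measurement, and the function names are hypothetical.

```python
import math


def miss_probability(p: float, runs: int) -> float:
    """Chance that a bad commit survives `runs` independent runs
    when each run crashes with probability p (assumed model)."""
    return (1.0 - p) ** runs


def runs_needed(p: float, confidence: float) -> int:
    """Smallest n such that (1 - p)**n <= 1 - confidence, i.e. the number
    of clean runs needed before calling a commit good with that confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 0.05  # assumed per-run crash probability (roughly 1 in 20)
    for n in (5, 10, 20):
        print(f"{n:2d} clean runs: a bad commit still looks good "
              f"{miss_probability(p, n):.0%} of the time")
    print("clean runs needed for 95% confidence:", runs_needed(p, 0.95))
```

Under this model, 10 clean runs still let a bad commit pass about 60% of the time, and 59 clean runs are needed for 95% confidence per step. This is one way a bisection with too few iterations can end on an innocent commit.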

More often it ends with SIGSEGV, but sometimes it fails with Illegal
Instruction (SIGILL) instead.  Examining it with gdb, it looks more like
stack corruption.

I triggered it just by booting a Linux system. When it fails, it most often
fails somewhere near the end of boot, but sometimes it fails the moment the
kernel spawns /init from the initramfs and that shell script executes its
first program.
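The repeated boot-until-crash loop can be handed to `git bisect run`, which treats exit status 0 as good, 125 as "skip this commit", and other values from 1 to 127 as bad. A minimal sketch, assuming the reproducer is a QEMU command line like the one above and that a guest still running after a timeout counts as a surviving run. The script name, its arguments and the timeout rule are hypothetical and not part of QEMU or of this report; rebuilding QEMU at each bisection step is not shown.

```python
import signal
import subprocess
import sys

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125

# Signals that mark a build as bad; this report mentions SIGSEGV and SIGILL.
CRASH_SIGNALS = {int(signal.SIGSEGV), int(signal.SIGILL)}


def classify(returncode: int) -> str:
    """Describe a subprocess return code; on POSIX a negative value -N
    means the child was killed by signal N."""
    if returncode < 0:
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return "ok" if returncode == 0 else f"exit {returncode}"


def bisect_verdict(cmd: list[str], runs: int, timeout: float) -> int:
    """Run cmd up to `runs` times and return a `git bisect run` exit code:
    BAD on the first crash signal, SKIP on any other failure, else GOOD."""
    for i in range(1, runs + 1):
        try:
            rc = subprocess.run(cmd, timeout=timeout).returncode
        except subprocess.TimeoutExpired:
            # Assumption: still running after `timeout` seconds counts as a
            # survived boot; subprocess.run has already killed the child.
            continue
        except OSError:
            return SKIP  # e.g. no binary was built at this commit
        if -rc in CRASH_SIGNALS:
            print(f"run {i}/{runs}: {classify(rc)}", file=sys.stderr)
            return BAD
        if rc != 0:
            # Other failures (bad arguments, missing image, SIGKILL, ...)
            # say nothing about this bug, so ask git to skip the commit.
            print(f"run {i}/{runs}: unexpected {classify(rc)}", file=sys.stderr)
            return SKIP
    return GOOD


if __name__ == "__main__":
    # Hypothetical usage:
    #   git bisect run python3 bisect_check.py 60 300 ./build/qemu-system-ppc64 ...
    n_runs, secs, *command = sys.argv[1:]
    sys.exit(bisect_verdict(command, int(n_runs), float(secs)))
```

Treating unexpected exits as SKIP rather than GOOD keeps a broken intermediate build from being recorded as free of the bug.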



[  OK  ] Finished systemd-journal-f…ush Journal to Persistent Storage.
         Starting systemd-tmpfiles-… Volatile Files and Directories...
[  OK  ] Finished systemd-udev-trig…e - Coldplug All udev Devices.
[  OK  ] Finished systemd-tmpfiles-…te Volatile Files and Directories.
         Starting systemd-resolved.…e - Network Name Resolution...
         Starting systemd-update-ut…rd System Boot/Shutdown in UTMP...
[  OK  ] Started systemd-udevd.serv…nager for Device Events and Files.
         Starting systemd-networkd.…ice - Network Configuration...
Segmentation fault (core dumped)

...
Core was generated by `qemu-system-ppc64 -append root=LABEL=debvm rw -nographic -smp 2 -machine accel='.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fff3462395c in code_gen_buffer ()
[Current thread is 1 (Thread 0x7fff79c6e7c0 (LWP 922586))]
(gdb) bt
#0  0x00007fff3462395c in code_gen_buffer ()
#1  0x00000001076cbd2c in cpu_tb_exec (cpu=cpu@entry=0x1001d98b320, itb=itb@entry=0x7fff4b378480 <code_gen_buffer+383812548>,
    tb_exit=tb_exit@entry=0x7fff79c6d8c0) at accel/tcg/cpu-exec.c:460
#2  0x00000001076cc348 in cpu_loop_exec_tb (tb_exit=0x7fff79c6d8c0, last_tb=<synthetic pointer>, pc=140736355546736,
    tb=0x7fff4b378480 <code_gen_buffer+383812548>, cpu=<optimized out>) at accel/tcg/cpu-exec.c:893
#3  cpu_exec_loop (cpu=cpu@entry=0x1001d98b320, sc=sc@entry=0x7fff79c6da10) at accel/tcg/cpu-exec.c:1013
#4  0x00000001076ccd98 in cpu_exec_setjmp (cpu=cpu@entry=0x1001d98b320, sc=sc@entry=0x7fff79c6da10)
    at accel/tcg/cpu-exec.c:1043
#5  0x00000001076cd5ec in cpu_exec (cpu=0x1001d98b320) at accel/tcg/cpu-exec.c:1069
#6  0x0000000107705d30 in tcg_cpus_exec (cpu=0x1001d98b320) at accel/tcg/tcg-accel-ops.c:81
#7  0x0000000107705f20 in mttcg_cpu_thread_fn (arg=0x1001d98b320) at accel/tcg/tcg-accel-ops-mttcg.c:95
#8  0x000000010793ed7c in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:541
#9  0x00007fff81673d0c in ?? () from /lib/powerpc64le-linux-gnu/libc.so.6
#10 0x00007fff81724350 in clone () from /lib/powerpc64le-linux-gnu/libc.so.6

(gdb) l
32      
33      int qemu_default_main(void)
34      {
35         int status;
36      
37         status = qemu_main_loop();
38         qemu_cleanup();
39      
40         return status;
41      }

(gdb) frame 1
#1  0x00000001076cbd2c in cpu_tb_exec (cpu=cpu@entry=0x1001d98b320, itb=itb@entry=0x7fff4b378480 <code_gen_buffer+383812548>,
    tb_exit=tb_exit@entry=0x7fff79c6d8c0) at accel/tcg/cpu-exec.c:460
460        ret = tcg_qemu_tb_exec(env, tb_ptr);
(gdb) l
455        if (qemu_loglevel_mask(CPU_LOG_TB_CPU | CPU_LOG_EXEC)) {
456            log_cpu_exec(log_pc(cpu, itb), cpu, itb);
457        }
458     
459        qemu_thread_jit_execute();
460        ret = tcg_qemu_tb_exec(env, tb_ptr);
461        cpu->can_do_io = 1;
462        qemu_plugin_disable_mem_helpers(cpu);
463        /*
464         * TODO: Delay swapping back to the read-write region of the TB


(This is 8.0.2; the same happens with master.)

Here, frame #0 appears corrupt.

In other attempts, the stack is sometimes corrupted to the point that gdb
can't decode it at all.

I need help debugging this further.

Thanks,

/mjt


