Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860d
From: Michael Tokarev
Subject: Re: qemu-system-ppc64 option -smp 2 broken with commit 20b6643324a79860dcdfe811ffe4a79942bca21e
Date: Sat, 24 Jun 2023 17:29:42 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0
23.06.2023 14:09, Anushree Mathur wrote:
> Hi everyone,
> I was trying to boot a RHEL 9.3 image with upstream qemu-system-ppc64 using
> the -smp 2 option and observed a segfault (QEMU crash).
> QEMU command line used:
> qemu-system-ppc64 -name Rhel9.3.ppc64le -smp 2 -m 16G -vga none -nographic -machine pseries -cpu POWER10 -accel tcg -device virtio-scsi-pci -drive file=/home/rh93.qcow2,if=none,format=qcow2,id=hd0 -device scsi-hd,drive=hd0 -boot c
> After a git bisect, I found that the first bad commit introducing this
> issue is:
> [qemu]# git bisect good
> 20b6643324a79860dcdfe811ffe4a79942bca21e is the first bad commit
> commit 20b6643324a79860dcdfe811ffe4a79942bca21e
> Author: Richard Henderson <richard.henderson@linaro.org>
> Date: Mon Dec 5 17:45:02 2022 -0600
> tcg/ppc: Reorg goto_tb implementation
I have another case that bisects to this same commit, with similar results,
on a Debian ppc64 machine with QEMU 8.0 and master.
The crash doesn't happen every time; sometimes it takes 20+ iterations
to trigger (so my bisection was rather painful, and initially pointed to
an entirely innocent commit). So far it only occurs on an actual ppc64
host; I wasn't able to reproduce it on amd64.
More often it ends with SIGSEGV, but sometimes it fails with SIGILL
(Illegal Instruction) instead. Examining it with gdb, it looks more like
stack corruption.
I triggered it by simply booting a Linux system. Most often it fails
somewhere near the end of boot, but sometimes it fails the moment the
kernel spawns /init from the initramfs and that shell script executes its
first program.
[ OK ] Finished systemd-journal-f…ush Journal to Persistent Storage.
Starting systemd-tmpfiles-… Volatile Files and Directories...
[ OK ] Finished systemd-udev-trig…e - Coldplug All udev Devices.
[ OK ] Finished systemd-tmpfiles-…te Volatile Files and Directories.
Starting systemd-resolved.…e - Network Name Resolution...
Starting systemd-update-ut…rd System Boot/Shutdown in UTMP...
[ OK ] Started systemd-udevd.serv…nager for Device Events and Files.
Starting systemd-networkd.…ice - Network Configuration...
Segmentation fault (core dumped)
...
Core was generated by `qemu-system-ppc64 -append root=LABEL=debvm rw -nographic -smp 2 -machine accel='.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fff3462395c in code_gen_buffer ()
[Current thread is 1 (Thread 0x7fff79c6e7c0 (LWP 922586))]
(gdb) bt
#0 0x00007fff3462395c in code_gen_buffer ()
#1 0x00000001076cbd2c in cpu_tb_exec (cpu=cpu@entry=0x1001d98b320,
itb=itb@entry=0x7fff4b378480 <code_gen_buffer+383812548>,
tb_exit=tb_exit@entry=0x7fff79c6d8c0) at accel/tcg/cpu-exec.c:460
#2 0x00000001076cc348 in cpu_loop_exec_tb (tb_exit=0x7fff79c6d8c0,
last_tb=<synthetic pointer>, pc=140736355546736,
tb=0x7fff4b378480 <code_gen_buffer+383812548>, cpu=<optimized out>) at
accel/tcg/cpu-exec.c:893
#3 cpu_exec_loop (cpu=cpu@entry=0x1001d98b320, sc=sc@entry=0x7fff79c6da10) at
accel/tcg/cpu-exec.c:1013
#4 0x00000001076ccd98 in cpu_exec_setjmp (cpu=cpu@entry=0x1001d98b320,
sc=sc@entry=0x7fff79c6da10)
at accel/tcg/cpu-exec.c:1043
#5 0x00000001076cd5ec in cpu_exec (cpu=0x1001d98b320) at
accel/tcg/cpu-exec.c:1069
#6 0x0000000107705d30 in tcg_cpus_exec (cpu=0x1001d98b320) at
accel/tcg/tcg-accel-ops.c:81
#7 0x0000000107705f20 in mttcg_cpu_thread_fn (arg=0x1001d98b320) at
accel/tcg/tcg-accel-ops-mttcg.c:95
#8 0x000000010793ed7c in qemu_thread_start (args=<optimized out>) at
util/qemu-thread-posix.c:541
#9 0x00007fff81673d0c in ?? () from /lib/powerpc64le-linux-gnu/libc.so.6
#10 0x00007fff81724350 in clone () from /lib/powerpc64le-linux-gnu/libc.so.6
(gdb) l
32
33 int qemu_default_main(void)
34 {
35 int status;
36
37 status = qemu_main_loop();
38 qemu_cleanup();
39
40 return status;
41 }
(gdb) frame 1
#1 0x00000001076cbd2c in cpu_tb_exec (cpu=cpu@entry=0x1001d98b320,
itb=itb@entry=0x7fff4b378480 <code_gen_buffer+383812548>,
tb_exit=tb_exit@entry=0x7fff79c6d8c0) at accel/tcg/cpu-exec.c:460
460 ret = tcg_qemu_tb_exec(env, tb_ptr);
(gdb) l
455 if (qemu_loglevel_mask(CPU_LOG_TB_CPU | CPU_LOG_EXEC)) {
456 log_cpu_exec(log_pc(cpu, itb), cpu, itb);
457 }
458
459 qemu_thread_jit_execute();
460 ret = tcg_qemu_tb_exec(env, tb_ptr);
461 cpu->can_do_io = 1;
462 qemu_plugin_disable_mem_helpers(cpu);
463 /*
464 * TODO: Delay swapping back to the read-write region of the TB
(This is 8.0.2; the same happens with master.)
Here, frame #0 appears corrupt.
In other attempts, the stack is sometimes corrupted so badly that gdb can't
decode it at all.
I need help debugging this further.
Thanks,
/mjt