On Tue, 2021-03-02 at 15:52 +0100, Stefano Garzarella wrote:
On Mon, Mar 01, 2021 at 02:56:40PM +0200, Maxim Levitsky wrote:
> On Mon, 2021-03-01 at 12:17 +0100, Paolo Bonzini wrote:
> > If kvm_arch_remove_sw_breakpoint finds that a software breakpoint does not
> > have an INT3 instruction, it fails. This can happen if one sets a
> > software breakpoint in a kernel module and then reloads it. gdb then
> > thinks the breakpoint cannot be deleted and there is no way to add it
> > back.
> >
> > Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> > target/i386/kvm/kvm.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > index 0b5755e42b..c8d61daf68 100644
> > --- a/target/i386/kvm/kvm.c
> > +++ b/target/i386/kvm/kvm.c
> > @@ -4352,8 +4352,13 @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs,
struct kvm_sw_breakpoint *bp)
> > {
> > uint8_t int3;
> >
> > - if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0) || int3 != 0xcc ||
> > - cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1))
{
> > + if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0)) {
> > + return -EINVAL;
> > + }
> > + if (int3 != 0xcc) {
> > + return 0;
> > + }
> > + if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1))
{
> > return -EINVAL;
> > }
> > return 0;
>
> There still remains a philosopical question if kvm_arch_remove_sw_breakpoint
> should always return 0, since for the usual case of kernel debugging where
> a breakpoint is in unloaded module, the above will probably still fail
> as paging for this module is removed as well by the kernel.
> It is still better though so:
>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
>
> Note that I managed to make lx-symbols to work in a very stable way
> with attached WIP patch I hacked on this Sunday.
> I will send a cleaned up version of it to upstream when I have time.
>
> Since I make gdb unload the symbols, it works even without this patch.
>
> Added Stefano Garzarella to CC as I know that he tried to make this work as
well.
> https://lkml.org/lkml/2020/10/5/514
Thanks Maxim for CCing me!
Just a report when I tried these patches, but I'm not sure they are
related.
I found that gdb 10 has some problem with QEMU:
$ gdb --version
GNU gdb (GDB) Fedora 10.1-2.fc33
(gdb) lx-symbols
loading vmlinux
scanning for modules in linux/build
../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.
With gdb 9 'lx-symbols' works well, but I still have some issue when I
put a breakpoint to a symbol not yet loaded (vsock_core_register in this
example), then I load the module (vsock_loopback in this example) in the
guest.
Whit your patch gdb stuck after loading the symbols of the first new
module:
(gdb) b vsock_core_register
Function "vsock_core_register" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (vsock_core_register) pending.
(gdb) c
Continuing.
loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
Without your patch, gdb loops infinitely reloading the new module:
refreshing all symbols to reload module 'vsock'
loading vmlinux
loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
loading @0xffffffffc007e000: linux/build/net/bridge/bridge.ko
loading @0xffffffffc0077000: linux/build/net/802/stp.ko
loading @0xffffffffc0007000: linux/build/net/llc/llc.ko
loading @0xffffffffc0013000: linux/build/net/sunrpc/sunrpc.ko
loading @0xffffffffc000d000: linux/build/net/ipv4/netfilter/ip_tables.ko
loading @0xffffffffc0000000: linux/build/net/netfilter/x_tables.ko
refreshing all symbols to reload module 'vsock'
loading vmlinux
loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
...
I'll try to get a better look at what's going on.
Let me then explain what I found:
First of all initial execution of lx-symbols works and always worked for me
(I use gdb 9 though from fedora 32 (9.1-7.fc32))
Then a special breakpoint is placed on do_init_module
(it is hidden from the user)
Once it is hit two things can happen:
1. if a not yet seen module is loaded,
(module that wasn't loaded last time all symbols were reloaded)
its symbols are loaded to gdb with 'add-symbol-file' command.
2. if module that was already loaded to gdb, is loaded (see above),
then 'big hammer' is used:
a. all existing breakpoints (including the hidden one)
are disabled since reloading everything
indeed messes up the gdb state
b. the executable's symbols (the kernel) are reloaded,
which makes gdb unload all symbols, and then all symbols are loaded
again (in the same way as lx-symbols works),
including the symbols of currently loading module.
c. all breakpoints are enabled again
Now the issue that you originally reported on LKML is because the (1)
apparently also messes up the software breakpoints state in gdb,
and that makes gdb to not to temporary remove the breakpoint
in do_init_module once the execution is resumed, and then
the guest starts executing garbage (bytes after 'int3' instruction).
The second issue is that (2), aka the big hammer isn't really needed.
GDB does have (maybe this is recent command?) a 'remove-symbol-file'
command.
So it is possible to do remove-symbol-file/add-symbol-file'
on known module reload instead of reloading everything.
But this has a small issue. The issue is that such known module
was already reloaded, so all int3 instructions that gdb placed into
it are already gone, so sofware breakpoints placed to it won't work
This is what the patch that Paulo sent fixes.
However it is even better to create another hidden breakpoint on module removal
path (I used free_module for that) and unload the symbols there.
This allows the gdb to cleanly remove all software breakpoints
in that module, show them again as pending, and even re-install
them again once the module is loaded again.
So those are the two changes I made:
1. I added a breakpoint on module removal which also does
a. disable all breakpoints
b. unload the module's symbols
c. enable all breakpoints
2. On module load I also do
a. disable all breakpoints
b. load the module's symbols
c. enable all breakpoints
There is still an issue that both 'load' and 'unload' breakpoint
can hit more that once.
This happens because these are software breakpoints and
load/unload code in the kernel is executed with interrupts enabled.
So what is happening is that once the hidden breakpoint is hit, gdb script
attached to it is done, and VM is resumed, gdb does more or less this:
a. remove the breakpoint
b. do a single step
c. re-install the breakpoint.
However the single step more often than not, lands us into an interrupt
handler, and so once the handler is finished we end up re-executing the
instruction on which breakpoint was set.
On a single vCPU it works more or less (with several tries) on my machine,
but with many vCPUs, I can end up in live lock like state
when the above prevents the VM from progressing.
I think we can fix this on kvm side by not injecting interrupts
when single step is done.
In fact the below patch works for me,
Not only it fixes the live lock but makes these hidden breakpoints
hit only once in my testing.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eec62c0dafc36..8b7a4e27bcf66 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8501,6 +8501,12 @@ static void inject_pending_event(struct kvm_vcpu *vcpu,
bool *req_immediate_exit
goto busy;
}
+ /*
+ * Don't inject interrupts while single stepping to make guest debug
easier
+ */
+ if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
+ can_inject = false;
+
With this patch lx-symbols works almost perfectly for me (on AMD).