From: Gerd Hoffmann
Subject: Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object
Date: Fri, 2 Sep 2022 07:46:21 +0200

On Fri, Sep 02, 2022 at 02:52:25AM +0000, Sean Christopherson wrote:
> On Fri, Sep 02, 2022, Xiaoyao Li wrote:
> > On 8/26/2022 1:57 PM, Gerd Hoffmann wrote:
> > > Hi,
> > > > For TD guest kernel, it has its own reason to turn SEPT_VE on or off.
> > > > E.g., linux TD guest requires SEPT_VE to be disabled to avoid #VE on
> > > > syscall gap [1].
> > >
> > > Why is that a problem for a TD guest kernel? Installing exception
> > > handlers is done quite early in the boot process, certainly before any
> > > userspace code runs. So I think we should never see a syscall without
> > > a #VE handler being installed. /me is confused.
> > >
> > > Or do you want to tell me linux has no #VE handler?
> >
> > The problem is not "no #VE handler"; Linux does have a #VE handler. The
> > problem is that Linux doesn't want any (or certain) exceptions to occur in
> > the syscall gap; it's not specific to #VE. Frankly, I don't understand the
> > reason clearly; it's something related to the IST used in the x86 Linux
> > kernel.
>
> The SYSCALL gap issue is that because SYSCALL doesn't load RSP, the first
> instruction at the SYSCALL entry point runs with a userspace-controlled RSP.
> With TDX, a malicious hypervisor can induce a #VE on the SYSCALL page and
> thus get the kernel to run the #VE handler with a userspace stack.
>
> The "fix" is to use an IST for #VE so that a kernel-controlled RSP is loaded
> on #VE, but ISTs are terrible because they don't play nice with re-entrancy
> (among other reasons). The RSP used for IST-based handlers is hardcoded, and
> so if a #VE handler triggers another #VE at any point before IRET, the second
> #VE will clobber the stack and hose the kernel.
>
> It's possible to work around this, e.g. change the IST entry at the very
> beginning of the handler, but it's a maintenance burden. Since the only
> reason to use an IST is to guard against a malicious hypervisor, Linux
> decided it would be just as easy and more beneficial to avoid unexpected #VEs
> due to unaccepted private pages entirely.
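
The re-entrancy problem described in the quoted text can be illustrated with
a toy model. This is only a sketch, not real x86 behaviour: the stack is a
Python dict, and IST_TOP, FRAME_WORDS, deliver() and nested_exception() are
hypothetical names made up for the illustration. It shows that a second
exception delivered through a fixed IST stack pointer overwrites the first
frame, while delivery relative to the current RSP does not.

```python
# Toy model, not real x86: "memory" maps stack addresses to the tag of the
# exception frame stored there. All names here are hypothetical.
IST_TOP = 0x1000   # assumed fixed stack top used for every IST delivery
FRAME_WORDS = 5    # SS, RSP, RFLAGS, CS, RIP, 8 bytes each


def deliver(memory, rsp, tag, use_ist):
    """Push an exception frame labelled `tag`; return the new stack pointer."""
    # An IST-based handler ignores the current RSP and always starts at the
    # same hardcoded address; a normal handler pushes below the current RSP.
    top = IST_TOP if use_ist else rsp
    for i in range(1, FRAME_WORDS + 1):
        memory[top - 8 * i] = tag
    return top - 8 * FRAME_WORDS


def nested_exception(use_ist):
    """Return True if the first frame is intact after a nested exception."""
    memory = {}
    first_frame = [IST_TOP - 8 * i for i in range(1, FRAME_WORDS + 1)]
    rsp = deliver(memory, IST_TOP, "first", use_ist)
    # The handler faults again before it has executed IRET.
    deliver(memory, rsp, "second", use_ist)
    return all(memory[addr] == "first" for addr in first_frame)


first_frame_survives_ist = nested_exception(use_ist=True)
first_frame_survives_normal = nested_exception(use_ist=False)
print("IST delivery keeps first frame:   ", first_frame_survives_ist)
print("normal delivery keeps first frame:", first_frame_survives_normal)
```

In the IST case the saved return state of the first #VE is gone, which is the
"clobber the stack" scenario; the workaround mentioned above amounts to
changing the fixed address before a second delivery can happen.
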
Hmm, ok, but shouldn't the SEPT_VE bit *really* be controlled by the guest then?
Having a hypervisor-controlled config bit to protect against a malicious
hypervisor looks pointless to me ...
take care,
Gerd