Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-gues
From: Sean Christopherson
Subject: Re: [PATCH v1 15/40] i386/tdx: Add property sept-ve-disable for tdx-guest object
Date: Fri, 2 Sep 2022 15:26:35 +0000

On Fri, Sep 02, 2022, Gerd Hoffmann wrote:
> On Fri, Sep 02, 2022 at 02:52:25AM +0000, Sean Christopherson wrote:
> > On Fri, Sep 02, 2022, Xiaoyao Li wrote:
> > > On 8/26/2022 1:57 PM, Gerd Hoffmann wrote:
> > > > Hi,
> > > > > For TD guest kernel, it has its own reason to turn SEPT_VE on or off.
> > > > > E.g., linux TD guest requires SEPT_VE to be disabled to avoid #VE on
> > > > > syscall gap [1].
> > > >
> > > > Why is that a problem for a TD guest kernel? Installing exception
> > > > handlers is done quite early in the boot process, certainly before any
> > > > userspace code runs. So I think we should never see a syscall without
> > > > a #VE handler being installed. /me is confused.
> > > >
> > > > Or do you want tell me linux has no #VE handler?
> > >
> > > The problem is not "no #VE handler" and Linux does have #VE handler. The
> > > problem is Linux doesn't want any (or certain) exception occurrence in
> > > syscall gap, it's not specific to #VE. Frankly, I don't understand the
> > > reason clearly, it's something related to IST used in x86 Linux kernel.
> >
> > The SYSCALL gap issue is that because SYSCALL doesn't load RSP, the first
> > instruction at the SYSCALL entry point runs with a userspace-controlled RSP.
> > With TDX, a malicious hypervisor can induce a #VE on the SYSCALL page and
> > thus get the kernel to run the #VE handler with a userspace stack.
> >
> > The "fix" is to use an IST for #VE so that a kernel-controlled RSP is loaded
> > on #VE, but ISTs are terrible because they don't play nice with re-entrancy
> > (among other reasons). The RSP used for IST-based handlers is hardcoded, and
> > so if a #VE handler triggers another #VE at any point before IRET, the second
> > #VE will clobber the stack and hose the kernel.
> >
> > It's possible to work around this, e.g. change the IST entry at the very
> > beginning of the handler, but it's a maintenance burden. Since the only
> > reason to use an IST is to guard against a malicious hypervisor, Linux
> > decided it would be just as easy and more beneficial to avoid unexpected #VEs
> > due to unaccepted private pages entirely.
>
> Hmm, ok, but shouldn't the SEPT_VE bit *really* be controlled by the guest then?
>
> Having a hypervisor-controlled config bit to protect against a malicious
> hypervisor looks pointless to me ...

IIRC, all (most?) of the attributes are included in the attestation report, so a
guest/customer can refuse to provision secrets to the guest if the hypervisor is
misbehaving.
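
To make the attestation point concrete, here is a minimal sketch of the kind
of policy check a relying party might run before releasing secrets. It assumes
the 64-bit TD attributes value has already been extracted from a
signature-verified quote (that parsing is not shown), and it assumes
SEPT_VE_DISABLE is bit 28, as in the Linux guest's ATTR_SEPT_VE_DISABLE
definition; check the TDX module spec before relying on it. The function name
is hypothetical.

```python
# Assumption: SEPT_VE_DISABLE is bit 28 of the 64-bit TD ATTRIBUTES field
# (matches the Linux guest's ATTR_SEPT_VE_DISABLE); verify against the spec.
SEPT_VE_DISABLE = 1 << 28


def should_provision_secrets(td_attributes: int) -> bool:
    """Hypothetical relying-party policy: only release secrets to a TD that
    was launched with SEPT_VE_DISABLE set.

    td_attributes is assumed to come from an attestation report whose
    signature has already been verified elsewhere.
    """
    return bool(td_attributes & SEPT_VE_DISABLE)
```

A hypervisor that launches the TD without the bit set cannot hide that fact,
because the attributes are part of what gets attested; the check above would
simply refuse to provision.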

I'm guessing Intel made it an attribute and not a dynamic control knob to
simplify the TDX module implementation.
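
For readers unfamiliar with the IST re-entrancy problem described in the
quoted text above, the following toy model may help. It is not real exception
delivery: the "frame" is a single return address instead of the full hardware
frame, and all addresses and names are made up for illustration. It only shows
why a fixed stack top gets clobbered by a nested exception while a normal
(non-IST) kernel stack does not.

```python
IST_TOP = 0x1000  # hypothetical fixed stack top used for every IST delivery


def deliver(memory, current_rsp, return_rip, use_ist):
    """Push a one-word toy 'exception frame' and return the new RSP.

    With an IST, the stack pointer is always reloaded from the same fixed
    value; without one, the frame is pushed below the current RSP.
    """
    base = IST_TOP if use_ist else current_rsp
    new_rsp = base - 8
    memory[new_rsp] = return_rip
    return new_rsp


def first_frame_survives(use_ist):
    """Simulate a #VE handler that triggers a second #VE before returning."""
    memory = {}
    rsp = 0x8000                                   # kernel RSP before the first #VE
    rsp1 = deliver(memory, rsp, 0xAAAA, use_ist)   # first #VE
    deliver(memory, rsp1, 0xBBBB, use_ist)         # nested #VE inside the handler
    # Is the first exception's return address still intact?
    return memory[rsp1] == 0xAAAA
```

In this model `first_frame_survives(True)` is false because both deliveries
write to the same slot, while `first_frame_survives(False)` is true because
the nested frame lands below the first one.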