qemu-devel

Re: stable-8.1.1: which bug do we keep?


From: Daniel P . Berrangé
Subject: Re: stable-8.1.1: which bug do we keep?
Date: Wed, 20 Sep 2023 10:17:04 +0100
User-agent: Mutt/2.2.9 (2022-11-12)

On Wed, Sep 20, 2023 at 07:46:36AM +0300, Michael Tokarev wrote:
> Hi!
> 
> I'm somewhat in doubt about what to do with the 8.1.1 release.
> 
> There are 2 compelling issues, and fixing one exposes the other.
> 
> https://gitlab.com/qemu-project/qemu/-/issues/1864
> "x86 VM with TCG and SMP fails to start on 8.1.0"
> is fixed by 0d58c660689f "softmmu: Use async_run_on_cpu in tcg_commit"
> 
> But this brings up
> 
> https://gitlab.com/qemu-project/qemu/-/issues/1866
> "mips/mip64 virtio broken on master (and 8.1.0 with tcg fix)"
> (which is actually more than mips, as I've shown down the line,
> https://gitlab.com/qemu-project/qemu/-/issues/1866#note_1558221926 )
> 
> Also, one commit alone,
> 86e4f93d827 "softmmu: Assert data in bounds in iotlb_to_section",
> when not followed with "async_run_on_cpu in tcg_commit", causes
> assertion failures, eg
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg989846.html
> I don't know if "async_run_on_cpu in tcg_commit" was supposed to
> fix this assertion or not, or maybe some additional fix is needed,
> but I haven't seen this triggered with 0d58c660689f applied.
> 
> There were at least two attempts by Richard to fix issues after
> 0d58c660689f, one "accel/tcg: Always require can_do_io", which fixes
> both reproducers for #1866 but at a high cost, and another,
> "softmmu: Introduce cpu_address_space_sync", which addresses the
> mips regression but does not fix my reproducer with ovmf,
> and neither of the two has landed on master so far.

In the cover letter for the 2nd proposed series Richard says

[quote]
I've done a tiny bit of performance testing between the two
solutions and it seems to be a wash.  So now it's simply a
matter of cleanliness.
[/quote]

Since the 2nd series is shown to still be broken in some cases
and the 1st is thought to solve them all, IMHO it feels like we
should just press ahead with applying the 1st series to
git master, and then stable.

If we still want a cleaner solution, it can be reverted/replaced
later once someone figures out an option that addresses all the
problems. We shouldn't leave such a big regression in TCG unfixed
for so long while we figure out a cleaner option.

> Right now I have a "which bug to keep?" situation for 8.1.1, and
> I'd love to have at least *some* comments about all this.  I've got
> no replies to my earlier emails in this area.
> 
> To me, it *feels* like 0d58c660689f should be there.
> Note: the scheduled deadline for staging-8.1.1 passed yesterday.
> But this stuff seems to be important enough to delay 8.1.1 further.

On the one hand breaking x86 is a big deal because it is a mainstream
architecture, on the other hand people have real x86 hardware, so
using TCG emulation for x86 is less compelling. I agree we need to
fully address this in 8.1.1.

I guess the other unmentioned option is to revert whatever TCG changes
went into 8.1 that caused the regression in the first place. I've no
idea if that is at all practical though.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



