qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.2?] block/qcow2-bitmap: fix crash bug in qcow2_co_remov


From: Eric Blake
Subject: Re: [PATCH for-4.2?] block/qcow2-bitmap: fix crash bug in qcow2_co_remove_persistent_dirty_bitmap
Date: Fri, 6 Dec 2019 08:36:29 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2

[adding in Peter Maydell, as there is now potential talk of other 4.2-worthy patches]

On 12/6/19 4:18 AM, Vladimir Sementsov-Ogievskiy wrote:
05.12.2019 23:16, John Snow wrote:


On 12/5/19 3:09 PM, Eric Blake wrote:
On 12/5/19 1:30 PM, Vladimir Sementsov-Ogievskiy wrote:
Here is double bug:

First, return error but not set errp. This may lead to:
qmp block-dirty-bitmap-remove may report success when actually failed

block-dirty-bitmap-remove used in a transaction will crash, as
qmp_transaction will think that it returned success and will cal

call

block_dirty_bitmap_remove_commit which will crash, as state->bitmap is
NULL

Second (like in anecdote), this case is not an error at all. As it is
documented in the comment above bdrv_co_remove_persistent_dirty_bitmap
definition, absence of bitmap is not an error, and similar case handled
at start of qcow2_co_remove_persistent_dirty_bitmap, it returns 0 when
there is no bitmaps at all..

double .


But when there are some bitmaps, but not the requested one, it return
error with errp unset.

Fix that.

Fixes: b56a1e31759b750
Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
---

Hi all!

Ohm, suddenly we faced this bug. It's a regression in 4.2. I'm very
sorry for introducing it, and it sad that it's found so late..

Personally, I think that this one worth rc5, as it makes new bitmap
interfaces unusable. But the decision is yours.

Last minute edit: hmm, actually, transaction action introduced in
4.2, so crash is not a regression, only broken block-dirty-bitmap-remove
command is a regression... Maybe it's OK for stable.

Libvirt REALLY wants to use transaction bitmap management (and require
qemu 4.2) for its incremental backup patches that Peter is almost ready
to merge in.  I'm trying to ascertain:

When exactly can this bug hit?  Can you give a sequence of QMP commands
that will trigger it for easy reproduction?  Are there workarounds (such
as checking that a bitmap exists prior to attempting to remove it)?  If
it does NOT get fixed in rc5, how will libvirt be able to probe whether
the fix is in place?


It looks like:

- You need to have at least one bitmap
- You need to use transactional remove
- you need to misspell the bitmap name
- The remove will fail (return -EINVAL) but doesn't set errp
- The caller chokes on incomplete information, state->bitmap is NULL


No, that would be too simple, the thing is worse. Absolutele correct removing 
is broken, without any misspelling

Bug triggers when we are removing persistent bitmap that is not stored yet in 
the image AND at least one (another) bitmap already stored in the image. So, 
something like:

1. create persistent bitmap A
2. shutdown vm  (bitmap A is synced)
3. start vm
4. create persistent bitmap B
5. remove bitmap B - it fails (and crashes if in transaction)

====

Hmm, workaround..

I'm afraid that the only thing is not remove persistent bitmaps, which were 
never synced to the image. So, instead the sequence above, we need


1. create persistent bitmap A
2. shutdown vm
3. start vm
4. create persistent bitmap B
5. remember, that we want to remove bitmap B after vm shutdown
...
   some other operations
...
6. vm shutdown
7. start vm in stopped mode, and remove all bitmaps marked for removing
8. stop vm

But, I think that in real circumstances, vm shutdown is rare thing...

This is sounding a bit more serious. As I said earlier, it shouldn't delay 4.2 on its own, but if the fix is obvious (and other than comments, it is a single change from 'ret = -EINVAL' to 'ret = 0' which fixes a definite reproducible crash), I think it rises to the level of acceptable.

I've been so worried about the question of which release, that I don't know if I've previously offered:
Reviewed-by: Eric Blake <address@hidden>

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




reply via email to

[Prev in Thread] Current Thread [Next in Thread]