qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/3] qcow2: Fix corruption bug in qcow2_detect_metadata_preal


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH 3/3] qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()
Date: Thu, 24 Oct 2019 12:41:50 +0000

24.10.2019 14:17, Kevin Wolf wrote:
> Am 24.10.2019 um 12:46 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 23.10.2019 18:26, Kevin Wolf wrote:
>>> qcow2_detect_metadata_preallocation() calls qcow2_get_refcount() which
>>> requires s->lock to be taken to protect its accesses to the refcount
>>> table and refcount blocks. However, nothing in this code path actually
>>> took the lock. This could cause the same cache entry to be used by two
>>> requests at the same time, for different tables at different offsets,
>>> resulting in image corruption.
>>>
>>> As it would be preferable to base the detection on consistent data (even
>>> though it's just heuristics), let's take the lock not only around the
>>> qcow2_get_refcount() calls, but around the whole function.
>>>
>>> This patch takes the lock in qcow2_co_block_status() earlier and asserts
>>> in qcow2_detect_metadata_preallocation() that we hold the lock.
>>>
>>> Fixes: 69f47505ee66afaa513305de0c1895a224e52c45
>>> Cc: address@hidden
>>> Reported-by: Michael Weiser <address@hidden>
>>> Signed-off-by: Kevin Wolf <address@hidden>
>>
>> Oh, I'm very sorry. I was going to backport this patch, and found that it's
>> fixed in our downstream long ago, even before last upstream version patch 
>> sent :(
> 
> Seriously? Is your downstream QEMU so divergent from upstream that you
> wouldn't notice something like this? I think you really have to improve
> something there.

How to improve it? Months and years are passed to bring patches into upstream,
of course we have to live with a bunch (now it's nearer to 300 and this is a lot
better then 500+ in the recent past) of patches above Rhel release.

> 
> I mean, we have a serious data corruptor in the 4.1 release and I wasted
> days to debug this, and you've been sitting on a fix for months without
> telling anyone?

Of course, it was not my goal to hide the fix, I'll do my best to avoid this
in future. Very sorry for your time wasted.

> This isn't a memory leak or something, this kills
> people's images. It's eating their data.

Possibly, I didn't know about data corruption. For some reason I decided that
lock should be taken here, I don't remember. Still I should have
applied it to upstream version too, of course.

> 
> Modifying an image format driver requires utmost care and I think I'll
> try to make sure to allow only few qcow2 changes per release in the
> future. Too many changes were made in too short time, and with too
> little care, and there are even more qcow2 patches pending. Please check
> whether these series actually match your downstream code.

The difference is that I didn't move the lock() but add separate lock()/unlock()
pair around qcow2_detect_metadata_preallocation(). I think there is no serious
difference.

> Anyway, we'll
> tread very carefully now with the pending patches, even if it means
> delaying them for another release or two.

What do you mean? What to do with sent patches during several months?

> Stability is way more
> important for an image format driver than new features and
> optimisations.

Agree. But I think, the main source of stability is testing.

> 
> Do whatever you need to fix your downstream process, but seriously, this
> must not ever happen again.
> 

I don't see how to improve the process to exclude such problems. But I'll
of course will do my best to avoid them.



-- 
Best regards,
Vladimir



reply via email to

[Prev in Thread] Current Thread [Next in Thread]