From: Max Reitz
Subject: Re: [Qemu-devel] [PULL v2 6/8] vmdk: Add read-only support for seSparse snapshots
Date: Tue, 2 Jul 2019 21:46:15 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0

On 02.07.19 21:13, Peter Maydell wrote:
> On Mon, 24 Jun 2019 at 15:48, Max Reitz <address@hidden> wrote:
>>
>> From: Sam Eiderman <address@hidden>
>>
>> Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
>> QEMU).
>>
>> This format was lacking in the following:
>>
>>     * Grain directory (L1) and grain table (L2) entries were 32-bit,
>>       allowing access to only 2TB (slightly less) of data.
>>     * The grain size (default) was 512 bytes - leading to data
>>       fragmentation and many grain tables.
>>     * For space reclamation purposes, it was necessary to find all the
>>       grains which are not pointed to by any grain table - so a reverse
>>       mapping of "offset of grain in vmdk" to "grain table" must be
>>       constructed - which takes large amounts of CPU/RAM.
>>
> 
> Hi; this change has confused Coverity a bit (CID 1402786):
> 
>> @@ -481,7 +529,7 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
>>      int i;
>>
>>      /* read the L1 table */
>> -    l1_size = extent->l1_size * sizeof(uint32_t);
>> +    l1_size = extent->l1_size * extent->entry_size;
>>      extent->l1_table = g_try_malloc(l1_size);
>>      if (l1_size && extent->l1_table == NULL) {
>>          return -ENOMEM;
> 
>>      }
>>
>>      if (extent->l1_backup_table_offset) {
>> +        assert(!extent->sesparse);
>>          extent->l1_backup_table = g_try_malloc(l1_size);
>>          if (l1_size && extent->l1_backup_table == NULL) {
>>              ret = -ENOMEM;
> 
> Here we allocate extent->l1_backup_table via g_try_malloc(),
> and our "ENOMEM" guard checks l1_size. However later on we do:
> 
>         for (i = 0; i < extent->l1_size; i++) {
>             le32_to_cpus(&extent->l1_backup_table[i]);
>         }
> 
> which is a dereference of l1_backup_table whose guard
> is looking at extent->l1_size. Previously Coverity could
> (probably) see this was OK because we were setting
>    l1_size = extent->l1_size * sizeof(uint32_t)
> so l1_size is 0 if and only if extent->l1_size is zero.
> But now there's another possibility because we have changed
> the initialization to
>    l1_size = extent->l1_size * extent->entry_size;
> so if extent->entry_size is zero then l1_size could be 0
> (getting us past the ENOMEM check) but extent->l1_size
> non-zero (causing us to try to dereference a NULL l1_backup_table
> pointer).
> 
> This can't in fact happen because if extent->l1_size is
> non-zero we'll assert that extent->entry_size is
> either 8 or 4, but the logic has become sufficiently
> convoluted that it's no longer clear to Coverity that this
> is an impossible case. I could mark this as a false positive,
> but maybe there's a way of rephrasing this code that makes
> the logic a little clearer to both humans and robots?

There is also the fact that entry_size is initialized to 4
(sizeof(uint32_t)) and updated to 8 (sizeof(uint64_t)) when the image is
in the seSparse subformat.  So entry_size can only ever be 4 or 8.
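
In code terms, roughly (paraphrasing the patch from memory, not
quoting it):

    extent->entry_size = sizeof(uint32_t);   /* default, during extent setup */

and then, only when opening a seSparse extent:

    extent->entry_size = sizeof(uint64_t);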

The only substantial thing I could imagine that might in some way make
it clearer would be to drop entry_size and always decide based on
extent->sesparse (because extent->entry_size == (extent->sesparse ? 8 : 4)).
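
For illustration, the hunk quoted above would then look roughly like
this (just a sketch, not a proposed patch):

    /* Sketch: decide based on extent->sesparse instead of storing
     * the entry size in the extent. */
    l1_size = extent->l1_size *
              (extent->sesparse ? sizeof(uint64_t) : sizeof(uint32_t));
    extent->l1_table = g_try_malloc(l1_size);
    if (l1_size && extent->l1_table == NULL) {
        return -ENOMEM;
    }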

But to me personally, that would not really be clearer.  I think it’s
reasonable to store the size of L1 entries in an extent attribute, and
remember that it’s always 4 or 8.

Of course, there is also the less substantial thing, which is adding a
comment “Size of L1 entries, can only be sizeof(uint32_t) or
sizeof(uint64_t)” above the entry_size definition.
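
I.e. something along these lines above the field in VmdkExtent
(sketch; I am writing the declaration from memory, so the exact type
may differ):

    /* Size of one L1 table entry: always sizeof(uint32_t) (4) or
     * sizeof(uint64_t) (8), the latter only for seSparse extents. */
    int entry_size;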


I think Coverity is actually confused by aliasing.  The report says this:

> 1. Condition l1_size, taking false branch.

So l1_size must be 0, which means either extent->l1_size or
extent->entry_size must be 0.

It then says:

> 3. Condition i < extent->l1_size, taking true branch.

So extent->l1_size must be greater than 0.  (To be exact, it prints
this condition twice, so extent->l1_size must be 2.)

This is followed by:

> 4. Condition extent->entry_size == 8UL /* sizeof (uint64_t) */, taking true branch.

Er, so, extent->entry_size is 8?

That actually makes l1_size 16.  So somewhere between conditions 1 and
3/4, *extent must have changed its contents.

So it looks to me like Coverity just thinks that *extent may be used
concurrently.  Short of adding a “restrict”, I don’t know what to do
about this other than to close the report as a false positive.
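
To spell out the “restrict” idea: it would only promise the analyzer
that *extent is not accessed through any other pointer while the
function runs, so the field reads cannot diverge from the value that
was checked earlier.  A self-contained toy illustration (made-up
names, not QEMU code):

    #include <stddef.h>

    /* Toy stand-in for VmdkExtent, made up purely for this illustration */
    struct toy_extent {
        int l1_size;
        int entry_size;   /* only ever 4 or 8 */
    };

    /* "restrict" promises that, for the duration of the call, *extent is
     * only accessed through this pointer, so an analyzer may assume the
     * fields read below cannot change behind the function's back. */
    static size_t toy_l1_bytes(struct toy_extent *restrict extent)
    {
        return (size_t)extent->l1_size * extent->entry_size;
    }

    int main(void)
    {
        struct toy_extent e = { .l1_size = 2, .entry_size = 8 };
        return toy_l1_bytes(&e) == 16 ? 0 : 1;   /* 2 * 8 = 16, as in the report */
    }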

> Supplementary:
> 
> (1) this code:
>     for (i = 0; i < extent->l1_size; i++) {
>         if (extent->entry_size == sizeof(uint64_t)) {
>             le64_to_cpus((uint64_t *)extent->l1_table + i);
>         } else {
>             assert(extent->entry_size == sizeof(uint32_t));
>             le32_to_cpus((uint32_t *)extent->l1_table + i);
>         }
>     }
> only asserts that extent->entry_size is sane if the l1_size
> is non-zero, and it does it inside the loop so we assert
> once for each entry. Hiding the assert like this might be
> part of what's confusing Coverity.
> 
> (2) we allocate the l1_backup_table() with a size of l1_size,
> which will differ depending on extent->entry_size; but then
> we do the endianness conversion on it using
>         for (i = 0; i < extent->l1_size; i++) {
>             le32_to_cpus(&extent->l1_backup_table[i]);
>         }
> which assumes always 32-bit entries. Is that correct?

Yes, because 64-bit entries only occur for the seSparse format.  At
least our current (RO) implementation does not use a backup table when
accessing seSparse images.  (Hence the assertion against that at the
beginning of the enclosing if block.)

Max
