Re: [PATCH v4 for-6.0? 0/3] qcow2: fix parallel rewrite and discard (rw-

qemu-block

From:	Max Reitz
Subject:	Re: [PATCH v4 for-6.0? 0/3] qcow2: fix parallel rewrite and discard (rw-lock)
Date:	Tue, 30 Mar 2021 18:39:25 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.0

On 30.03.21 15:25, Vladimir Sementsov-Ogievskiy wrote:

30.03.2021 15:51, Max Reitz wrote:
On 30.03.21 12:51, Vladimir Sementsov-Ogievskiy wrote:
30.03.2021 12:49, Max Reitz wrote:
On 25.03.21 20:12, Vladimir Sementsov-Ogievskiy wrote:
ping. Do we want it for 6.0?
I’d rather wait. I think the conclusion was that guests shouldn’t hit this because they serialize discards?
I think we never had bugs, so of course we can wait.
There’s also something Kevin wrote on IRC a couple of weeks ago, for which I had hoped he’d sent an email but I don’t think he did, so I’ll try to remember and paraphrase as well as I can...
He basically asked whether it wouldn’t be conceptually simpler to take a reference to some cluster in get_cluster_offset() and later release it with a to-be-added put_cluster_offset().
He also noted that reading is problematic, too, because if you read a discarded and reused cluster, this might result in an information leak (some guest application might be able to read data it isn’t allowed to read); that’s why making get_cluster_offset() the point of locking clusters against discarding would be better.
Yes, I thought about reads too (RFCed in the cover letter of [PATCH v5 0/6] qcow2: fix parallel rewrite and discard (lockless)).
This would probably work with both of your solutions. For the in-memory solutions, you’d take a refcount to an actual cluster; in the CoRwLock solution, you’d take that lock.
What do you think?
Hmm. What do you mean? Just rename my qcow2_inflight_writes_inc() and qcow2_inflight_writes_dec() to get_cluster_offset()/put_cluster_offset(), to make it more native to use for read operations as well?
Hm.  Our discussion wasn’t so detailed.
I interpreted it to mean all qcow2 functions that find an offset to a qcow2 cluster, namely qcow2_get_host_offset(), qcow2_alloc_host_offset(), and qcow2_alloc_compressed_cluster_offset().
What about qcow2_alloc_clusters() ?

Seems like all callers for that but do_alloc_cluster_offset() call it to allocate metadata clusters, which cannot be discarded from the guest.

Is it really possible that some metadata cluster is used while qcow2 discards it internally at the same time, or isn’t all of this only a problem for data clusters?

When those functions return an offset (in)to some cluster, that cluster (or the image as a whole) should be locked against discards. Every offset received this way would require an accompanying qcow2_put_host_offset().
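To make the get/put pairing concrete, here is a minimal single-threaded sketch in C. It is not qcow2 code: ToyImage, ClusterState and the toy_* functions are hypothetical names invented for this illustration, and deferring the discard until the last put is only one possible policy (an implementation could equally make the discard wait).

```c
#include <assert.h>
#include <stdbool.h>

#define N_CLUSTERS 8   /* toy image size, hypothetical */

/* Hypothetical in-memory state for one host data cluster. */
typedef struct ClusterState {
    unsigned inflight;     /* offsets handed out and not yet put back */
    bool discard_pending;  /* discard was requested while inflight > 0 */
    bool allocated;        /* stands in for a non-zero on-disk refcount */
} ClusterState;

typedef struct ToyImage {
    ClusterState clusters[N_CLUSTERS];
} ToyImage;

/* Hand out cluster idx and pin it against discard.
 * Fails for unallocated clusters and for clusters already being discarded. */
static bool toy_get_cluster_offset(ToyImage *img, unsigned idx)
{
    if (idx >= N_CLUSTERS) {
        return false;
    }
    ClusterState *c = &img->clusters[idx];
    if (!c->allocated || c->discard_pending) {
        return false;
    }
    c->inflight++;
    return true;
}

/* Drop the pin; run a deferred discard once the last user is gone. */
static void toy_put_cluster_offset(ToyImage *img, unsigned idx)
{
    ClusterState *c = &img->clusters[idx];
    assert(c->inflight > 0);
    if (--c->inflight == 0 && c->discard_pending) {
        c->discard_pending = false;
        c->allocated = false;
    }
}

/* Returns true if the cluster was freed immediately, false if deferred. */
static bool toy_discard_cluster(ToyImage *img, unsigned idx)
{
    ClusterState *c = &img->clusters[idx];
    if (c->inflight > 0) {
        c->discard_pending = true;
        return false;
    }
    c->allocated = false;
    return true;
}
```

In this model every successful toy_get_cluster_offset() must be matched by exactly one toy_put_cluster_offset(), which is the obligation described above for each offset a caller receives.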
Or to update any kind of "getting cluster offset" in the whole qcow2 driver to take a kind of "dynamic reference count" by get_cluster_offset() and then call corresponding put() somewhere? In this case I'm afraid it's a lot more work..
Hm, really? I would have assumed we need to do some locking in all functions that get a cluster offset this way, so it should be less work to take the lock in the functions they invoke to get the offset.
There would also be the problem that a lot of paths in qcow2 are not in coroutine context and don't even take s->lock when they actually should.
I’m not sure what you mean here, because all functions that invoke any of the three functions I listed above are coroutine_fns (or, well, I didn’t look it up, but they all have *_co_* in their name).
qcow2_alloc_clusters() has a lot more callers..


Let’s hope we don’t need to worry about it then. O:)

This will also mean that we do the same job as normal qcow2 refcounts already do: no sense in keeping an additional "dynamic refcount" for an L2 table cluster while reading it, as we already have a non-zero normal qcow2 refcount for it..
I’m afraid I don’t understand how normal refcounts relate to this. For example, qcow2_get_host_offset() doesn’t touch refcounts at all.
I mean the following: remember our discussion about what is a free cluster. If we add a "dynamic-refcount" or "inflight-write-counter" thing only to count in-flight data-writing operations (or, as discussed, we should count data reads as well), then the "full reference count" of the cluster is inflight-write-count + qcow2-metadata-refcount.
But if we add a kind of "dynamic refcount" for any use of a host cluster, for example reading an L2 table, then we duplicate the reference in qcow2 metadata to this L2 table (represented as a refcount) by our "dynamic refcount", and we don't have a concept of "full reference count" as the sum above.. We should still treat a cluster as free when both "dynamic refcount" and qcow2-metadata-refcount are zero, but their sum doesn't make good sense. Not a problem maybe.. But it looks like a complication with no benefit.

You’re right, but I don’t think that’s a real problem. Perhaps the sum was even a false equivalency. There is a difference between the dynamic refcount and the on-disk refcount: We must wait with discarding until the dynamic refcount is 0, and discarding will then drop the on-disk refcount to 0, too. I think. So I’m not sure whether the sum really means anything.
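As a rough illustration of why the two counters answer different questions, consider the following sketch. ToyCluster and the toy_* helpers are hypothetical names for this example only, and the state with an on-disk refcount of 0 but a non-zero dynamic refcount is an assumption about a variant in which metadata may be updated before in-flight users finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of counters for one host cluster. */
typedef struct ToyCluster {
    uint16_t disk_refcount;     /* stands in for the qcow2 on-disk refcount */
    unsigned dynamic_refcount;  /* in-flight users, kept in memory only */
} ToyCluster;

/* A cluster may be reused only when nothing refers to it in either sense. */
static bool toy_cluster_is_free(const ToyCluster *c)
{
    return c->disk_refcount == 0 && c->dynamic_refcount == 0;
}

/* Discarding has to wait for in-flight users, whatever the on-disk count. */
static bool toy_may_discard_now(const ToyCluster *c)
{
    return c->dynamic_refcount == 0;
}

/* Release one in-flight reference. */
static void toy_drop_dynamic_ref(ToyCluster *c)
{
    assert(c->dynamic_refcount > 0);
    c->dynamic_refcount--;
}
```

With these definitions, a cluster in state {1, 0} and one in state {0, 1} have the same sum, yet only the first may be discarded right away, and neither is free. That is one way to see that the sum by itself carries little information.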

But if metadata isn’t a problem and that means we don’t have to ask these questions at all, well, that’ll be even better.

==
OK, I think now that you didn't mean qcow2_alloc_clusters(). So, we are talking only about functions returning an offset to a cluster with "guest data", not to any kind of host cluster. Then what you propose looks like this to me:
  - take my v5
  - rename qcow2_inflight_writes_dec() to put_cluster_offset()
  - call qcow2_inflight_writes_inc() from the three functions you mention

Yes, I think so. Or you take the CoRwLock in those three functions, depending on which implementation we want.

Sorry if we’ve discussed this before and I just forgot, but: What are the performance implications of either solution? As far as I remember, the inflight-write-counter solution had the problem of always doing stuff on every I/O access. You said the impact was small and yes, it is, but it’s still there.

I haven’t looked at the CoRwLock solution but I would assume it’s basically zero cost for common cases, right? I.e. the case where the guest already serializes discards from other accesses, which I thought is what e.g. Linux does. (And even if it doesn’t, I would assume that concurrent I/O and discards are rather rare.)
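For reference, the admission rule a reader/writer lock would give here can be modelled as below. This is a hedged, single-threaded try-lock model with hypothetical names (ToyRwLock, toy_*); a real coroutine rwlock would make the caller yield and wait instead of returning false, and this sketch says nothing about the actual cost of QEMU's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: I/O holds the shared side, discard the exclusive side. */
typedef struct ToyRwLock {
    unsigned readers;  /* I/O requests currently between get and put */
    bool writer;       /* a discard is in progress */
} ToyRwLock;

/* Shared side: with no discard in progress this is only a counter bump. */
static bool toy_try_rdlock(ToyRwLock *l)
{
    if (l->writer) {
        return false;  /* a real lock would wait here */
    }
    l->readers++;
    return true;
}

static void toy_rdunlock(ToyRwLock *l)
{
    assert(l->readers > 0);
    l->readers--;
}

/* Exclusive side: only admitted when no I/O and no other discard is active. */
static bool toy_try_wrlock(ToyRwLock *l)
{
    if (l->writer || l->readers > 0) {
        return false;  /* a real lock would wait here */
    }
    l->writer = true;
    return true;
}

static void toy_wrunlock(ToyRwLock *l)
{
    assert(l->writer);
    l->writer = false;
}
```

In the common case where the guest serializes discards against other requests, the shared path never sees writer set and neither side ever has to wait; the model only shows what happens in the rare overlapping case.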

That makes sense to me. Still, the put_cluster_offset() name doesn't make it obvious that it's only for clusters with "guest data", and that we shouldn't call it when working with metadata clusters.

Yeah, it was meant for symmetry with qcow2_get_host_offset() (i.e. it would be named “qcow2_put_host_offset()”). Now that there are three functions that would take a reference, it should get some other name. I don’t know. qcow2_put_data_cluster_offset()?

Max
