qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 for-6.0? 0/3] qcow2: fix parallel rewrite and discard (rw-


From: Max Reitz
Subject: Re: [PATCH v4 for-6.0? 0/3] qcow2: fix parallel rewrite and discard (rw-lock)
Date: Tue, 30 Mar 2021 18:39:25 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.0

On 30.03.21 15:25, Vladimir Sementsov-Ogievskiy wrote:
30.03.2021 15:51, Max Reitz wrote:
On 30.03.21 12:51, Vladimir Sementsov-Ogievskiy wrote:
30.03.2021 12:49, Max Reitz wrote:
On 25.03.21 20:12, Vladimir Sementsov-Ogievskiy wrote:
ping. Do we want it for 6.0?

I’d rather wait.  I think the conclusion was that guests shouldn’t hit this because they serialize discards?

I think, that we never had bugs, so we of course can wait.


There’s also something Kevin wrote on IRC a couple of weeks ago, for which I had hoped he’d sent an email but I don’t think he did, so I’ll try to remember and paraphrase as well as I can...

He basically asked whether it wouldn’t be conceptually simpler to take a reference to some cluster in get_cluster_offset() and later release it with a to-be-added put_cluster_offset().

He also noted that reading is problematic, too, because if you read a discarded and reused cluster, this might result in an information leak (some guest application might be able to read data it isn’t allowed to read); that’s why making get_cluster_offset() the point of locking clusters against discarding would be better.

Yes, I thought about read too, (RFCed in cover letter of [PATCH v5 0/6] qcow2: fix parallel rewrite and discard (lockless))


This would probably work with both of your solutions.  For the in-memory solutions, you’d take a refcount to an actual cluster; in the CoRwLock solution, you’d take that lock.

What do you think?


Hmm. What do you mean? Just rename my qcow2_inflight_writes_inc() and qcow2_inflight_writes_dec() to get_cluster_offset()/put_cluster_offset(), to make it more native to use for read operations as well?

Hm.  Our discussion wasn’t so detailed.

I interpreted it to mean all qcow2 functions that find an offset to a qcow2 cluster, namely qcow2_get_host_offset(), qcow2_alloc_host_offset(), and qcow2_alloc_compressed_cluster_offset().

What about qcow2_alloc_clusters() ?

Seems like all callers for that but do_alloc_cluster_offset() call it to allocate metadata clusters, which cannot be discarded from the guest.

Is it really possible that some metadata cluster is used while qcow2 discards it internally at the same time, or isn’t all of this only a problem for data clusters?

When those functions return an offset (in)to some cluster, that cluster (or the image as a whole) should be locked against discards. Every offset received this way would require an accompanying qcow2_put_host_offset().

Or to update any kind of "getting cluster offset" in the whole qcow2 driver to take a kind of "dynamic reference count" by get_cluster_offset() and then call corresponding put() somewhere? In this case I'm afraid it's a lot more work..

Hm, really?  I would have assumed we need to do some locking in all functions that get a cluster offset this way, so it should be less work to take the lock in the functions they invoke to get the offset.

It would be also the problem that a lot of paths in qcow2 are not in coroutine and don't even take s->lock when they actually should.

I’m not sure what you mean here, because all functions that invoke any of the three functions I listed above are coroutine_fns (or, well, I didn’t look it up, but they all have *_co_* in their name).

qcow2_alloc_clusters() has a lot more callers..

Let’s hope we don’t need to worry about it then. O:)

This will also mean that we do same job as normal qcow2 refcounts already do: no sense in keeping additional "dynamic refcount" for L2 table cluster while reading it, as we already have non-zero qcow2 normal refcount for it..

I’m afraid I don’t understand how normal refcounts relate to this. For example, qcow2_get_host_offset() doesn’t touch refcounts at all.


I mean the following: remember our discussion about what is free-cluster. If we add "dynamic-refcount", or "infligth-write-counter" thing only to count inflight data-writing (or, as discussed, we should count data-reads as well) operations, than "full reference count" of the cluster is inflight-write-count + qcow2-metadata-refcount.

But if we add a kind of "dynamic refcount" for any use of host cluster, for example reading of L2 table, than we duplicate the reference in qcow2-metadata to this L2 table (represented as refcount) by our "dynamic refcount", and we don't have a concept of "full reference count" as the sum above.. We still should treat a cluster as free when both "dynamic refcount" and qcow2-metadata-refcount are zero, but their sum doesn't have a good sense. Not a problem maybe.. But looks like a complication with no benefit.

You’re right, but I don’t think that’s a real problem. Perhaps the sum was even a false equivalency. There is a difference between the dynamic refcount and the on-disk refcount: We must wait with discarding until the the dynamic refcount is 0, and discarding will then drop the on-disk refcount to 0, too. I think. So I’m not sure whether the sum really means anything.

But if metadata isn’t a problem and that means we don’t have ask these questions at all, well, that’ll be even better.

==

OK, I think now that you didn't mean qcow2_alloc_clusters(). So, we are saying about only functions returning an offset to cluster with "guest data", not to any kind of host cluster. Than what you propose looks like this to me:

  - take my v5
  - rename qcow2_inflight_writes_dec() to put_cluster_offset()
  - call qcow2_inflight_writes_inc() from the three functions you mention

Yes, I think so. Or you take the CoRwLock in those three functions, depending on which implementation we want.

Sorry if we’ve discussed this before and I just forgot, but: What are the performance implications of either solution? As far as I remember, the inflight-write-counter solution had the problem of always doing stuff on every I/O access. You said the impact was small and yes, it is, but it’s still there. I haven’t looked at the CoRwLock solution but it I would assume it’s basically zero cost for common cases, right? I.e. the case where the guest already serializes discards from other accesses, which I thought is what e.g. Linux does. (And even if it doesn’t, I would assume that concurrent I/O and discards are rather rare.)

That make sense to me. Still, put_cluster_offset() name doesn't make it obvious that it's only for clusters with "guest data", and we shouldn't call it when work with metadata clusters.

Yeah, it was meant for symmetry with qcow2_get_host_offset() (i.e. it would be named “qcow2_put_host_offset()”). Now that there are three functions that would take a reference, it should get some other name. I don’t know. qcow2_put_data_cluster_offset()?

Max




reply via email to

[Prev in Thread] Current Thread [Next in Thread]