qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlo


From: Damien Le Moal
Subject: Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
Date: Fri, 2 Sep 2022 11:13:11 +0900
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.2.0

On 9/1/22 23:57, Markus Armbruster wrote:
> Sam Li <faithilikerun@gmail.com> writes:
> 
>> Markus Armbruster <armbru@redhat.com> 于2022年8月31日周三 16:35写道:
>>>
>>> Sam Li <faithilikerun@gmail.com> writes:
>>>
>>>> Markus Armbruster <armbru@redhat.com> 于2022年8月30日周二 19:57写道:
>>>>>
>>>>> Sam Li <faithilikerun@gmail.com> writes:
>>>>>
>>>>>> By adding zone management operations in BlockDriver, storage controller
>>>>>> emulation can use the new block layer APIs including Report Zone and
>>>>>> four zone management operations (open, close, finish, reset).
>>>>>>
>>>>>> Add zoned storage commands of the device: zone_report(zrp), 
>>>>>> zone_open(zo),
>>>>>> zone_close(zc), zone_reset(zrs), zone_finish(zf).
>>>>>>
>>>>>> For example, to test zone_report, use following command:
>>>>>> $ ./build/qemu-io --image-opts driver=zoned_host_device, 
>>>>>> filename=/dev/nullb0
>>>>>> -c "zrp offset nr_zones"
>>>>>>
>>>>>> Signed-off-by: Sam Li <faithilikerun@gmail.com>
>>>>>> Reviewed-by: Hannes Reinecke <hare@suse.de>
>>>>>
>>>>> [...]
>>>>>
>>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>>>> index 0a8b4b426e..e3efba6db7 100644
>>>>>> --- a/block/file-posix.c
>>>>>> +++ b/block/file-posix.c
>>>>>
>>>>> [...]
>>>>>
>>>>>> @@ -3752,6 +4025,54 @@ static BlockDriver bdrv_host_device = {
>>>>>>  #endif
>>>>>>  };
>>>>>>
>>>>>> +#if defined(CONFIG_BLKZONED)
>>>>>> +static BlockDriver bdrv_zoned_host_device = {
>>>>>> +        .format_name = "zoned_host_device",
>>>>>
>>>>> Indentation should be 4, not 8.
>>>>>
>>>>>> +        .protocol_name = "zoned_host_device",
>>>>>> +        .instance_size = sizeof(BDRVRawState),
>>>>>> +        .bdrv_needs_filename = true,
>>>>>> +        .bdrv_probe_device  = hdev_probe_device,
>>>>>> +        .bdrv_file_open     = hdev_open,
>>>>>> +        .bdrv_close         = raw_close,
>>>>>> +        .bdrv_reopen_prepare = raw_reopen_prepare,
>>>>>> +        .bdrv_reopen_commit  = raw_reopen_commit,
>>>>>> +        .bdrv_reopen_abort   = raw_reopen_abort,
>>>>>> +        .bdrv_co_create_opts = bdrv_co_create_opts_simple,
>>>>>> +        .create_opts         = &bdrv_create_opts_simple,
>>>>>> +        .mutable_opts        = mutable_opts,
>>>>>> +        .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>>>>>> +        .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
>>>>>> +
>>>>>> +        .bdrv_co_preadv         = raw_co_preadv,
>>>>>> +        .bdrv_co_pwritev        = raw_co_pwritev,
>>>>>> +        .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
>>>>>> +        .bdrv_co_pdiscard       = hdev_co_pdiscard,
>>>>>> +        .bdrv_co_copy_range_from = raw_co_copy_range_from,
>>>>>> +        .bdrv_co_copy_range_to  = raw_co_copy_range_to,
>>>>>> +        .bdrv_refresh_limits = raw_refresh_limits,
>>>>>> +        .bdrv_io_plug = raw_aio_plug,
>>>>>> +        .bdrv_io_unplug = raw_aio_unplug,
>>>>>> +        .bdrv_attach_aio_context = raw_aio_attach_aio_context,
>>>>>> +
>>>>>> +        .bdrv_co_truncate       = raw_co_truncate,
>>>>>> +        .bdrv_getlength = raw_getlength,
>>>>>> +        .bdrv_get_info = raw_get_info,
>>>>>> +        .bdrv_get_allocated_file_size
>>>>>> +                            = raw_get_allocated_file_size,
>>>>>> +        .bdrv_get_specific_stats = hdev_get_specific_stats,
>>>>>> +        .bdrv_check_perm = raw_check_perm,
>>>>>> +        .bdrv_set_perm   = raw_set_perm,
>>>>>> +        .bdrv_abort_perm_update = raw_abort_perm_update,
>>>>>> +        .bdrv_probe_blocksizes = hdev_probe_blocksizes,
>>>>>> +        .bdrv_probe_geometry = hdev_probe_geometry,
>>>>>> +        .bdrv_co_ioctl = hdev_co_ioctl,
>>>>>> +
>>>>>> +        /* zone management operations */
>>>>>> +        .bdrv_co_zone_report = raw_co_zone_report,
>>>>>> +        .bdrv_co_zone_mgmt = raw_co_zone_mgmt,
>>>>>> +};
>>>>>
>>>>> Differences to bdrv_host_device:
>>>>>
>>>>> * .bdrv_parse_filename is not set
>>>>>
>>>>> * .bdrv_co_ioctl is not set
>>>>>
>>>>> * .bdrv_co_zone_report and .bdrv_co_zone_mgmt are set
>>>>
>>>> As Stefan mentioned, zoned_host_device is a new driver that doesn't
>>>> work with string filenames. .bdrv_parse_filename() helps legacy
>>>> drivers strip the optional protocol prefix off the filename and no use
>>>> here. Therefore it can be dropped.
>>>
>>> Makes sense.
>>>
>>>> .bdrv_co_ioctl is set actually.
>>>
>>> You're right; I diffed the two and misread the result.
>>>
>>>> Zoned_host_device is basically host_device + zone operations. It
>>>> serves for a simple purpose: if the host device is zoned, register
>>>> zoned_host_device driver; else, register host_device.
>>>
>>> Why would I ever want to use host_device instead of zoned_host_device?
>>>
>>> To answer this question, we need to understand how their behavior
>>> differs.
>>>
>>> We can ignore the legacy protocol prefix / string filename part.
>>>
>>> All that's left seems to be "if the host device is zoned, then using the
>>> zoned_host_device driver gets you the zoned features, whereas using the
>>> host_device driver doesn't".  What am I missing?
>>
>> I think that's basically what users need to know about.
> 
> Now answer my previous question, please: why would I ever want to use
> host_device instead of zoned_host_device?
> 
> Or in other words, why would I ever want to present a zoned host device
> to a guest as non-zoned device?
> 
>>>>> Notably common is .bdrv_file_open = hdev_open.  What happens when you
>>>>> try to create a zoned_host_device where the @filename argument is not in
>>>>> fact a zoned device?
>>>>
>>>> If the device is a regular block device, QEMU will still open the
>>>> device. For instance, I use a loopback device to test zone_report in
>>>> qemu-io. It returns ENOTTY which indicates Inappropriate ioctl for the
>>>> device. Meanwhile, if using a regular block device when emulation a
>>>> zoned device on a guest os, the best case is that the guest can boot
>>>> but has no emulated block device. In some cases, QEMU just terminates
>>>> because the block device has not met the alignment requirements.
>>>
>>> I'm not sure I understand all of this.  I'm also not sure I have to :)
>>
>> Maybe I didn't explain it very well. Which part would you like to know
>> more about?
> 
> Let's try more specific questions.  Say I configure a zoned_host_device
> backed by a host device that isn't zoned.
> 
> 1. Is this configuration accepted?

If we assume we have the special zoned_host_device driver, with the
associated command line zoned_host_device option explicitly calling for
it, then no, I do not think this should be allowed at all and an error
should be returned on startup. That would be consistent with the fact that
the options zoned_host_device and host_device are different to make sure
we can check that the user knows what he/she/them is doing.

If we have only host_device as a setup option and driver, the driver
methods can be trivially adjusted to do the right thing based on the
device type (i.e. zoned vs regular/not zoned). However, that would prevent
an interesting future extension of this work to implement a full zone
emulation on top of a regular (not zoned) host block device.

With this in mind, we currently have the following:

1) host_device option -> accept only regular non-zoned host block devices
2) zoned_host_device option -> accept only zoned host block devices

And in the future, we can have:

1) host_device option -> accept only regular non-zoned host block devices
2) zoned_host_device option -> accept any host block device type
        a) Use native zone kernel API for zoned host block devices
        b) Use full zone emulation for regular host block devices

But sure, internally, we could have a single driver structure with methods
adjusted to do the correct thing based on the device type and option
specified. Having a 1:1 mapping between the driver name and driver
structure does clarify things I think (even though there are indeed a lot
of methods that are identical).

> 
> 2. Would a guest work as long as it doesn't touch this device?
> 
> 3. Would a guest using this device work as long as it uses no zoned
>    features?
> 
> 4. What happens when a guest tries to use zoned features?
> 
> [...]
> 

-- 
Damien Le Moal
Western Digital Research




reply via email to

[Prev in Thread] Current Thread [Next in Thread]