qemu-devel

Re: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set


From: Klaus Jensen
Subject: Re: [PATCH v9 08/12] hw/block/nvme: Support Zoned Namespace Command Set
Date: Tue, 8 Dec 2020 21:20:17 +0100

On Dec  8 20:02, Dmitry Fomichev wrote:
> Hi Klaus,
> 
> Thank you for your review! Please see replies below...
> 
> 
> On Thu, 2020-11-12 at 20:36 +0100, Klaus Jensen wrote:
> > Hi Dmitry,
> > 
> > I know you posted v10, but my comments should be relevant to that as
> > well.
> > 
> > On Nov  5 11:53, Dmitry Fomichev wrote:
> > > @@ -133,6 +300,12 @@ static Property nvme_ns_props[] = {
> > >      DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
> > >      DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
> > >      DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
> > > +    DEFINE_PROP_BOOL("zoned", NvmeNamespace, params.zoned, false),
> > 
> > I disagree on this. Using the "magic" value ensures that only one
> > command set can be selected. We can do a custom property so we can set
> > `iocs=zoned` as well as `iocs=0x2` if that makes it more user friendly?
> 
> I doubt that an average admin will even know what "iocs" really means,
> let alone know any magic values. On the other hand, it would be trivial
> to add a check to prevent users from specifying zoned=true kv=true,
> etc., so I don't see that as a big problem.
> 

OK, I'm fine with this.
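
A check like the one Dmitry describes could look roughly like this (a
minimal sketch; the "kv" selector and the helper name are hypothetical
illustrations, not part of this series):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of a mutual-exclusion check between command set
 * selector properties. Only "zoned" exists in this series; "kv" stands
 * in for any future selector. Returns true if the combination is OK. */
static bool nvme_ns_iocs_compatible(bool zoned, bool kv)
{
    /* at most one I/O command set beyond NVM may be selected */
    return !(zoned && kv);
}
```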

> > 
> > > +    DEFINE_PROP_SIZE("zoned.zsze", NvmeNamespace, params.zone_size_bs,
> > > +                     NVME_DEFAULT_ZONE_SIZE),
> > > +    DEFINE_PROP_SIZE("zoned.zcap", NvmeNamespace, params.zone_cap_bs, 0),
> > > +    DEFINE_PROP_BOOL("zoned.cross_read", NvmeNamespace,
> > > +                     params.cross_zone_read, false),
> > 
> > Same reason why I think we should just expose ozcs directly instead of
> > adding more parameters.
> > 
> > We are already adding a bunch of new parameters - might as well keep the
> > number as low as possible.
> 
> Only RAZB is defined in OZCS as of now, so exposing OZCS instead of
> RAZB would not reduce the number of module parameters. And telling the
> user what RAZB really means in the parameter name is, IMO, a better
> choice.
> 

The TP that shall not be named puts stuff in there, but I'm OK with the
zoned.cross_read parameter.

> > > +static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, NvmeRequest *req)
> > > +{
> > > +    NvmeCmd *cmd = (NvmeCmd *)&req->cmd;
> > > +    NvmeNamespace *ns = req->ns;
> > > +    /* cdw12 is zero-based number of dwords to return. Convert to bytes */
> > > +    uint32_t data_size = (le32_to_cpu(cmd->cdw12) + 1) << 2;
> > > +    uint32_t dw13 = le32_to_cpu(cmd->cdw13);
> > > +    uint32_t zone_idx, zra, zrasf, partial;
> > > +    uint64_t max_zones, nr_zones = 0;
> > > +    uint16_t ret;
> > > +    uint64_t slba;
> > > +    NvmeZoneDescr *z;
> > > +    NvmeZone *zs;
> > > +    NvmeZoneReportHeader *header;
> > > +    void *buf, *buf_p;
> > > +    size_t zone_entry_sz;
> > > +
> > > +    req->status = NVME_SUCCESS;
> > > +
> > > +    ret = nvme_get_mgmt_zone_slba_idx(ns, cmd, &slba, &zone_idx);
> > > +    if (ret) {
> > > +        return ret;
> > > +    }
> > 
> > Zone Management Receive does not specify anything for the given SLBA.
> > Out-of-bounds is acceptable, just results in no descriptors being
> > returned.
> 
> SLBA is an LBA in the lowest zone that is considered for reporting.
> The text in 4.4.1.1 a) says "report only zone descriptors for which
> the ZSLBA value is greater or equal to the ZSLBA value of the zone
> specified by the SLBA value in the command". The last part of this
> paragraph basically says that SLBA has to point to a zone, hence
> the error if it doesn't.
> 

Hmm. I tend to disagree since nowhere does the spec define that an error
should be returned if the given ZSLBA does not resolve to a valid zone.
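
The disagreement comes down to what the SLBA-to-zone-index helper does
when the SLBA falls past the last zone. A simplified sketch, assuming a
uniform zone size (the function name and status code value here are
illustrative; the patch's actual helper is nvme_get_mgmt_zone_slba_idx()):

```c
#include <assert.h>
#include <stdint.h>

#define NVME_SUCCESS             0x0
#define ZONE_BOUNDARY_STATUS     0x1b8  /* illustrative status value */

/* Resolve an SLBA to a zone index, assuming num_zones zones of
 * zone_size LBAs each. Dmitry's reading: an out-of-range SLBA is an
 * error. Klaus' reading: it is acceptable and simply results in no
 * descriptors being returned, so this check would be dropped. */
static uint16_t zone_idx_from_slba(uint64_t slba, uint64_t zone_size,
                                   uint32_t num_zones, uint32_t *zone_idx)
{
    *zone_idx = slba / zone_size;
    if (*zone_idx >= num_zones) {
        return ZONE_BOUNDARY_STATUS;
    }
    return NVME_SUCCESS;
}
```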

> > > +
> > > +        zone_idx++;
> > > +    }
> > > +
> > > +    if (!partial) {
> > > +        for (; zone_idx < ns->num_zones; zone_idx++) {
> > > +            zs = &ns->zone_array[zone_idx];
> > > +            if (nvme_zone_matches_filter(zrasf, zs)) {
> > > +                nr_zones++;
> > > +            }
> > > +        }
> > > +    }
> > 
> > I did something like this as well (only counting matching zones from
> > given SLBA), but when looking at the spec now, it seems that this is a
> > remnant from an older version of the spec? Please correct me if wrong.
> > 
> > On the Partial Report bit, the ratified specification just says that "If
> > this bit is cleared to '0', then the value in the Number of Zones field
> > indicates the number of zone descriptors that match the criteria in the
> > Zone Receive Action Specific field.".
> > 
> > So, I think if !partial, the Number of Zones field should not consider
> > the SLBA and just count from 0.
> 
> If Partial is 0, then the header contains the number of descriptors
> that could potentially be reported from SLBA until the end of the LBA
> range if the buffer were unlimited. If the Partial bit is 1, the same
> count is additionally limited by the number of descriptors that can
> fit into the provided buffer. Perhaps the ZNS spec is not quite clear
> about this, but this is the way all other zoned devices work. The
> clearest description of this logic that I am aware of can be found in
> T13 ZAC-2, p. 5.2.10.6.2, "ZONE LIST LENGTH field".
> 

Looking again, the spec is actually clear. It says that it "indicates the
number of zones that match the criteria described in section 4.4.1.1".
And that includes the criteria that it only reports zones that have a
ZSLBA equal to or greater than the given SLBA. So it is sound.
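
The semantics the thread converges on can be sketched with simplified
types (everything below is illustrative; the patch operates on NvmeZone
and the real zone state fields, and a zrasf of 0 standing for "report
all" is an assumption of this sketch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int state;  /* simplified stand-in for the zone state */
} Zone;

static bool zone_matches_filter(int zrasf, const Zone *z)
{
    return zrasf == 0 /* report all */ || z->state == zrasf;
}

/* Count zones matching the filter, starting at the zone containing
 * SLBA. With Partial cleared the count ignores the buffer size; with
 * Partial set it is capped at the number of descriptors that fit. */
static uint64_t count_report_zones(const Zone *zones, uint32_t num_zones,
                                   uint32_t start_idx, int zrasf,
                                   bool partial, uint64_t max_fit)
{
    uint64_t nr = 0;
    for (uint32_t i = start_idx; i < num_zones; i++) {
        if (zone_matches_filter(zrasf, &zones[i])) {
            nr++;
        }
    }
    return (partial && nr > max_fit) ? max_fit : nr;
}
```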
