qemu-devel

From: Matias Bjorling
Subject: RE: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set
Date: Tue, 29 Sep 2020 19:42:51 +0000


> -----Original Message-----
> From: Klaus Jensen <its@irrelevant.dk>
> Sent: Tuesday, 29 September 2020 20.36
> To: Matias Bjorling <Matias.Bjorling@wdc.com>
> Cc: Keith Busch <kbusch@kernel.org>; Damien Le Moal
> <Damien.LeMoal@wdc.com>; Fam Zheng <fam@euphon.net>; Kevin Wolf
> <kwolf@redhat.com>; qemu-block@nongnu.org; Niklas Cassel
> <Niklas.Cassel@wdc.com>; Klaus Jensen <k.jensen@samsung.com>; qemu-
> devel@nongnu.org; Alistair Francis <Alistair.Francis@wdc.com>; Philippe
> Mathieu-Daudé <philmd@redhat.com>
> Subject: Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types and
> Zoned Namespace Command Set
> 
> On Sep 29 18:17, Matias Bjorling wrote:
> >
> >
> > > -----Original Message-----
> > > From: Klaus Jensen <its@irrelevant.dk>
> > > Sent: Tuesday, 29 September 2020 20.00
> > > To: Keith Busch <kbusch@kernel.org>
> > > Cc: Damien Le Moal <Damien.LeMoal@wdc.com>; Fam Zheng
> > > <fam@euphon.net>; Kevin Wolf <kwolf@redhat.com>; qemu-
> > > block@nongnu.org; Niklas Cassel <Niklas.Cassel@wdc.com>; Klaus
> > > Jensen <k.jensen@samsung.com>; qemu-devel@nongnu.org; Alistair
> > > Francis <Alistair.Francis@wdc.com>; Philippe Mathieu-Daudé
> > > <philmd@redhat.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> > > Subject: Re: [PATCH v4 00/14] hw/block/nvme: Support Namespace Types
> > > and Zoned Namespace Command Set
> > >
> > > On Sep 29 10:29, Keith Busch wrote:
> > > > On Tue, Sep 29, 2020 at 12:46:33PM +0200, Klaus Jensen wrote:
> > > > > It is unmistakably clear that you are invalidating my arguments
> > > > > about portability and endianness issues by suggesting that we
> > > > > just remove persistent state and deal with it later, but
> > > > > persistence is the killer feature that sets the QEMU emulated
> > > > > device apart from other emulation options. It is not about using
> > > > > emulation in production (because yeah, why would you?), but
> > > > > persistence is what makes it possible to develop and test "zoned
> > > > > > FTLs" or something that requires recovery at power up.
> > > > > This is what allows testing of how your host software deals with
> > > > > opened zones being transitioned to FULL on power up and the
> > > > > persistent tracking of LBA allocation (in my series) can be used
> > > > > to properly test error recovery if you lost state in the app.
> > > >
> > > > Hold up -- why does an OPEN zone transition to FULL on power up?
> > > > The spec suggests it should be CLOSED. The spec does appear to
> > > > support going to FULL on a NVM Subsystem Reset, though. Actually,
> > > > now that I'm looking at this part of the spec, these implicit
> > > > transitions seem a bit less clear than I expected. I'm not sure
> > > > it's clear enough to evaluate qemu's compliance right now.
> > > >
> > > > But I don't see what testing these transitions has to do with
> > > > having a persistent state. You can reboot your VM without tearing
> > > > down the running QEMU instance. You can also unbind the driver or
> > > > shutdown the controller within the running operating system. That
> > > > should make those implicit state transitions reachable in order to
> > > > exercise your FTL's recovery.
> > > >
> > >
> > > Oh dear - don't "spec" with me ;)
> > >
> > > NVMe v1.4 Section 7.3.1:
> > >
> > >     An NVM Subsystem Reset is initiated when:
> > >       * Main power is applied to the NVM subsystem;
> > >       * A value of 4E564D64h ("NVMe") is written to the NSSR.NSSRC
> > >         field;
> > >       * Requested using a method defined in the NVMe Management
> > >         Interface specification; or
> > >       * A vendor specific event occurs.
> > >
> > > In the context of QEMU, "Main power" is tearing down QEMU and
> > > starting it from scratch. Just like on a "real" host, unbinding the
> > > driver, rebooting or shutting down the controller does not cause a
> > > subsystem reset (and does not cause the zones to change state). And
> > > since the device does not indicate support for the optional
> > > NSSR.NSSRC register, that way to initiate a subsystem cannot be used.
> > >
> > > The reason for moving to FULL is that write pointer updates are not
> > > persisted on each advancement, only when the zone state changes. So
> > > zones that were opened might have valid data, but invalid write pointer.
> > > So the device transitions them to FULL as it is allowed to.
> > >
> >
> > How about when one must also recover from intermediate states (i.e.,
> > open/closed upon power loss). For example, I don't hope a real SSD
> > implementation transition zones to full when it has thousands of open
> > simultaneously. That could be a disaster for the PE cycles, and a lot
> > of media going to waste. One would want applications to support that
> > kind of failure mode as well.
> 
> Christ. The WDC Strike Force is really jumping out of lightspeed here.
> I'm afraid I don't have an opposing force to engage with. So I'll be your only
> boxing bag for the evening.
> 
> As Keith just said, "Opened" is not a valid initial state. Didn't you write the
> spec? ;) As for Closed, they will be brought up as is.

Upon power failure, a zone that is in the Explicitly Opened or Implicitly
Opened state and has LBAs written can be transitioned by the controller to
either the Full or the Closed state.

In the previous mail, I wanted to point out that if the intention of qemu is
to test applications across power failures, it could be beneficial to have an
option that transitions open zones to Closed upon power failure.

Applications could then be tested with that behavior in mind as well, without
needing access to an SSD that implements it.
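
To make the suggestion concrete, here is a rough sketch of what such an option
could look like. It is not taken from the patch series: the names ZoneState,
PowerFailPolicy, Zone and zones_power_fail() are made up for illustration, and
sending an open zone with no LBAs written back to Empty is my assumption, since
the paragraph above only covers zones that have LBAs written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zone states, named after the states discussed in this thread. */
typedef enum {
    ZS_EMPTY,
    ZS_IMPLICITLY_OPENED,
    ZS_EXPLICITLY_OPENED,
    ZS_CLOSED,
    ZS_FULL,
} ZoneState;

/* Hypothetical device option: where written open zones go on power fail. */
typedef enum {
    PF_OPEN_TO_FULL,
    PF_OPEN_TO_CLOSED,
} PowerFailPolicy;

/* Minimal, illustrative zone descriptor (not the QEMU structure). */
typedef struct {
    ZoneState state;
    uint64_t start; /* first LBA of the zone */
    uint64_t wp;    /* write pointer */
} Zone;

int zone_is_open(const Zone *z)
{
    return z->state == ZS_IMPLICITLY_OPENED ||
           z->state == ZS_EXPLICITLY_OPENED;
}

/*
 * Apply the power-fail policy to every open zone and return the number of
 * zones whose state changed. Zones in any other state are left as is.
 */
size_t zones_power_fail(Zone *zones, size_t nr_zones, PowerFailPolicy policy)
{
    size_t changed = 0;

    assert(zones != NULL || nr_zones == 0);

    for (size_t i = 0; i < nr_zones; i++) {
        Zone *z = &zones[i];

        if (!zone_is_open(z)) {
            continue;
        }

        if (z->wp == z->start) {
            /* Assumption: nothing was written, so the zone becomes Empty. */
            z->state = ZS_EMPTY;
        } else if (policy == PF_OPEN_TO_CLOSED) {
            z->state = ZS_CLOSED;
        } else {
            z->state = ZS_FULL;
        }
        changed++;
    }

    return changed;
}
```

With PF_OPEN_TO_CLOSED the write pointer has to be valid after power up, so a
device using that policy would also need to persist it on each advancement
rather than only on zone state changes.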

> 
> With that in mind, I'm not sure what you specifically refer to? I'll gently
> remind you that the QEMU nvme device is not a real SSD and does not deal with
> NAND so it does not really do any "recovering" of intermediate states on power
> on if that is what you refer to?
