qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Libguestfs] [libnbd PATCH v3 05/22] states: Prepare to receive 64-b


From: Eric Blake
Subject: Re: [Libguestfs] [libnbd PATCH v3 05/22] states: Prepare to receive 64-bit replies
Date: Thu, 1 Jun 2023 08:00:55 -0500
User-agent: NeoMutt/20230517

On Thu, Jun 01, 2023 at 11:04:05AM +0200, Laszlo Ersek wrote:
> On 5/25/23 15:00, Eric Blake wrote:
> > Support receiving headers for 64-bit replies if extended headers were
> > negotiated.  We already insist that the server not send us too much
> > payload in one reply, so we can exploit that and merge the 64-bit
> > length back into a normalized 32-bit field for the rest of the payload
> > length calculations.  The NBD protocol specifically documents that
> > extended mode takes precedence over structured replies, and that there
> > are no simple replies in extended mode.  We can also take advantage
> > that the handle field is in the same offset in all the various reply
> > types.
> 
> (1) We might want to replace "handle" with "cookie" at this point.

Yep, I'm already rebasing the series to do that, now that c1df4df9
landed.

> 
> >
> > Note that if we negotiate extended headers, but a non-compliant server
> > replies with a non-extended header, this patch will stall waiting for
> > the server to send more bytes
> 
> Ah, yes. If we negotiate extended headers, we set "rlen" to "sizeof
> extended" in REPLY.START, which is larger than either of "sizeof simple"
> and "sizeof structured". Therefore we'll get stuck in REPLY.RECV_REPLY.
> Correct?

Yes.

> 
> > rather than noticing that the magic
> > number is wrong (for aio operations, you'll get a magic number
> > mismatch once you send a second command that elicits a reply; but for
> > blocking operations, we basically deadlock).  The easy alternative
> > would be to read just the first 4 bytes of magic, then determine how
> > many more bytes to expect; but that would require more states and
> > syscalls, and not worth it since the typical server will be compliant.
> 
> Not liking this "not worth it" argument. But...
> 
> > The other alternative is what the next patch implements: teaching
> > REPLY.RECV_REPLY to handle short reads that were at least long enough
> > to transmit magic to specifically look for magic number mismatch.
> 
> ... that sounds OK!

If you haven't guessed already, I actually did hit this bug (my
initial qemu patches sent the wrong reply type, and it took me several
minutes to figure out why libnbd was hung), so I was also pleasantly
surprised at how easy the next patch turned out to be, after first
trying (and failing) to go down the rabbit hole mentioned here of
adding yet more states.

But, as you mentioned earlier in the series, it MAY be worth
rearranging states anyways, to specifically route extended header
parsing differently from structured header parsing.  So I'm still
playing with that, and seeing what may happen to this patch as
fallout.

> 
> >
> > At this point, h->extended_headers is permanently false (we can't
> > enable it until all other aspects of the protocol have likewise been
> > converted).
> >
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> >  lib/internal.h                      |  1 +
> >  generator/states-reply-structured.c | 52 +++++++++++++++++++----------
> >  generator/states-reply.c            | 34 ++++++++++++-------
> >  3 files changed, 58 insertions(+), 29 deletions(-)
> 
> Probably best to reorder the files in this patch for review:

I see what you mean: because of the state hierarchy, it is probably
worth tweaking the git orderfile to favor files nearer the top of the
state tree first, rather than strict alphabetical ordering of *.c.  I
may just push a patch for that without needing review, as it doesn't
affect library semantics at all.

...
> > +++ b/generator/states-reply.c
> > @@ -62,15 +62,19 @@  REPLY.START:
> >     */
> >    ssize_t r;
> >
> > -  /* We read all replies initially as if they are simple replies, but
> > -   * check the magic in CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
> > -   * This works because the structured_reply header is larger.
> > +  /* With extended headers, there is only one size to read, so we can do it
> > +   * all in one syscall.  But for older structured replies, we don't know 
> > if
> > +   * we have a simple or structured reply until we read the magic number,
> > +   * requiring a two-part read with CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
> >     */
> >    assert (h->reply_cmd == NULL);
> >    assert (h->rlen == 0);
> >
> >    h->rbuf = &h->sbuf.reply.hdr;
> > -  h->rlen = sizeof h->sbuf.reply.hdr.simple;
> > +  if (h->extended_headers)
> > +    h->rlen = sizeof h->sbuf.reply.hdr.extended;
> > +  else
> > +    h->rlen = sizeof h->sbuf.reply.hdr.simple;
> >
> >    r = h->sock->ops->recv (h, h->sock, h->rbuf, h->rlen);
> >    if (r == -1) {
> 
> The comment "This works because the structured_reply header is larger"
> is being removed, which makes the non-ext branch even less
> comprehensible than before.
> 
> (2) Can we add the STATIC_ASSERT() here, stating that "sizeof simple" is
> smaller than "sizeof structured"?

Yep, I'm already rebasing onto some additional asserts along those
lines, based on your reviews earlier in the series.

> 
> > @@ -116,16 +120,22 @@  REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY:
> >    uint64_t cookie;
> >
> >    magic = be32toh (h->sbuf.reply.hdr.simple.magic);
> > -  if (magic == NBD_SIMPLE_REPLY_MAGIC) {
> > +  switch (magic) {
> > +  case NBD_SIMPLE_REPLY_MAGIC:
> > +    if (h->extended_headers)
> > +      goto invalid;
> >      SET_NEXT_STATE (%SIMPLE_REPLY.START);
> > -  }
> > -  else if (magic == NBD_STRUCTURED_REPLY_MAGIC) {
> > +    break;
> > +  case NBD_STRUCTURED_REPLY_MAGIC:
> > +  case NBD_EXTENDED_REPLY_MAGIC:
> > +    if ((magic == NBD_STRUCTURED_REPLY_MAGIC) == h->extended_headers)
> 
> Heh, I love this ((a == b) == c) "equivalence" form! :)

I still do a double-take every time I see 'a < b < c' in languages
(like python) that support it as shorthand for 'a < b && b < c'.  Even
weirder is when you see 'a < b > c'.  But yes, C's non-transitive ==
can be a surprise for the uninitiated.

> 
> > +      goto invalid;
> >      SET_NEXT_STATE (%STRUCTURED_REPLY.START);
> > -  }
> > -  else {
> > -    /* We've probably lost synchronization. */
> > -    SET_NEXT_STATE (%.DEAD);
> > -    set_error (0, "invalid reply magic 0x%" PRIx32, magic);
> > +    break;
> > +  default:
> > +  invalid:
> > +    SET_NEXT_STATE (%.DEAD); /* We've probably lost synchronization. */
> > +    set_error (0, "invalid or unexpected reply magic 0x%" PRIx32, magic);
> >  #if 0 /* uncomment to see desynchronized data */
> >      nbd_internal_hexdump (&h->sbuf.reply.hdr.simple,
> >                            sizeof (h->sbuf.reply.hdr.simple),
> 
> My apologies, but I find *this* placement of "invalid" terrible. I
> thought the "goto invalid" statements were jumping to the end of the
> function. Now I see they jump effectively to another case label.
> 
> (3) How about this (on top of your patch, to be squashed):

That part of the patch pre-dates other reviews I've seen from you on
the same topic (this patch series has been percolating on my local
builds for a long time), so I'm not surprised by your request, and I'm
happy to squash in your improvement.  I may have copied from other
similar code in the generator/states-*.c files, if so, I'll probably
do a separate cleanup patch first.

> 
> ... At the same time, even with this "cleanup" for the labels, I find it
> relatively confusing (albeit correct, as far as I can tell!) that in
> REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY, we first check the magic
> number(s) and *then* whether we negotiated extended headers, whereas in
> all the other state handlers, the order (= condition nesting) is the
> opposite: we first check extended headers, and deal with any other
> objects second. This makes the assert()s in REPLY.STRUCTURED_REPLY.START
> especially tricky to demonstrate -- I think I've managed to do it, it
> looks correct, it's just hard. So that's a half- (or quarter-)hearted
> proposal to investigate how REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY
> looked if there too, "extended_headers" were the outer check. Anyway, I
> don't feel strongly about this. :)
> 
> (4) Still in REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY, we have a comment
> saying
> 
>    /* NB: This works for both simple and structured replies because the
>     * handle (our cookie) is stored at the same offset.
>     */
> 
> The code under it applies to extended headers too, so the comment should
> be updated.
> 
> (There's a similar comment in REPLY.FINISH_COMMAND, too; I'm not sure if
> that state is relevant for extended headers.)

Indeed, when rebasing, I need to remember to grep (to cover comments)
rather than merely relying on the compiler (which only covers code)
for renames.

> 
> Then:
> 
> > diff --git a/generator/states-reply-structured.c 
> > b/generator/states-reply-structured.c
> > index df509216..c53aed7b 100644
> > --- a/generator/states-reply-structured.c
> > +++ b/generator/states-reply-structured.c
> > @@ -45,14 +45,20 @@ structured_reply_in_bounds (uint64_t offset, uint32_t 
> > length,
> >
> >  STATE_MACHINE {
> >   REPLY.STRUCTURED_REPLY.START:
> > -  /* We've only read the simple_reply.  The structured_reply is longer,
> > -   * so read the remaining part.
> > +  /* If we have extended headers, we've already read the entire header.
> > +   * Otherwise, we've only read enough for a simple_reply; since structured
> > +   * replies are longer, read the remaining part.
> >     */
> 
> (5) Ah, the word "simple_reply", which this change preserves, is a
> leftover. It should be updated in patch#2, "internal: Refactor layout of
> replies in sbuf", where the "simple_reply" field is replaced with
> "reply.hdr.simple" in "sbuf".
> 
> I guess I didn't notice this omission in patch#2 because the context
> there only shows the "so read the remaining part" line of the comment,
> and "simple_reply" is on the preceding line.
> 
> And then this patch too should refer to "reply.hdr.simple", in the new
> comment.
> 
> Alternatively, perhaps use the type name (struct tag, actually):
> "nbd_simple_reply".

Will fix in the appropriate place(s).

> 
> > -  h->rbuf = &h->sbuf;
> > -  h->rbuf = (char *)h->rbuf + sizeof h->sbuf.reply.hdr.simple;
> > -  h->rlen = sizeof h->sbuf.reply.hdr.structured;
> > -  h->rlen -= sizeof h->sbuf.reply.hdr.simple;
> > -  SET_NEXT_STATE (%RECV_REMAINING);
> > +  if (h->extended_headers) {
> > +    assert (h->rbuf == sizeof h->sbuf.reply.hdr.extended + 
> > (char*)&h->sbuf);
> > +    SET_NEXT_STATE (%CHECK);
> > +  }
> > +  else {
> > +    assert (h->rbuf == sizeof h->sbuf.reply.hdr.simple + (char*)&h->sbuf);
> > +    h->rlen = sizeof h->sbuf.reply.hdr.structured -
> > +      sizeof h->sbuf.reply.hdr.simple;
> > +    SET_NEXT_STATE (%RECV_REMAINING);
> > +  }
> >    return 0;
> >
> >   REPLY.STRUCTURED_REPLY.RECV_REMAINING:
> 
> This looks OK, but can we make it less verbose? How about:
> 
> /----------------------
> | diff --git a/lib/internal.h b/lib/internal.h
> | index e4e21a4d1ffa..a630b2511ff7 100644
> | --- a/lib/internal.h
> | +++ b/lib/internal.h
> | @@ -240,7 +240,7 @@ struct nbd_handle {
> |      } or;
> |      struct nbd_export_name_option_reply export_name_reply;
> |      struct {
> | -      union {
> | +      union reply_header {

Oh cool.  I keep forgetting that you can declare a type name usable at
the top level even while declaring a nested struct.  Yeah, doing that
probably has some nice line length improvements.

> |          struct nbd_simple_reply simple;
> |          struct nbd_structured_reply structured;
> |          struct nbd_extended_reply extended;
> | diff --git a/generator/states-reply-structured.c 
> b/generator/states-reply-structured.c
> | index c53aed7bb859..852cb5b6053c 100644
> | --- a/generator/states-reply-structured.c
> | +++ b/generator/states-reply-structured.c
> | @@ -49,14 +49,14 @@  REPLY.STRUCTURED_REPLY.START:
> |     * Otherwise, we've only read enough for a simple_reply; since structured
> |     * replies are longer, read the remaining part.
> |     */
> | +  union reply_header *hdr = &h->sbuf.reply.hdr;
> |    if (h->extended_headers) {
> | -    assert (h->rbuf == sizeof h->sbuf.reply.hdr.extended + 
> (char*)&h->sbuf);
> | +    assert (h->rbuf == sizeof hdr->extended + (char*)&h->sbuf);
> |      SET_NEXT_STATE (%CHECK);
> |    }
> |    else {
> | -    assert (h->rbuf == sizeof h->sbuf.reply.hdr.simple + (char*)&h->sbuf);
> | -    h->rlen = sizeof h->sbuf.reply.hdr.structured -
> | -      sizeof h->sbuf.reply.hdr.simple;
> | +    assert (h->rbuf == sizeof hdr->simple + (char*)&h->sbuf);
> | +    h->rlen = sizeof hdr->structured - sizeof hdr->simple;
> |      SET_NEXT_STATE (%RECV_REMAINING);
> |    }
> |    return 0;
> \----------------------
> 
> Anyway, feel free to ignore this -- I can see two counter-arguments
> myself:
> 
> - the rest of the code remains just as verbose,
>

Not necessarily - once I add in more STATIC_ASSERTs, being able to
refer to individual nested types without having to go all the way
through struct nbd_handle or union sbuf will be shorter.

> - one could argue that the addition
> 
>     sizeof hdr->simple + (char*)&h->sbuf
> 
>   is *more* confusing than
> 
>     sizeof h->sbuf.reply.hdr.simple + (char*)&h->sbuf

That argument still remains - it's easier to see that we expect a
particular offset from sbuf when the use of sbuf in the offset
calculation is not hidden several lines away in the intermediate
initialization.   I'll think about it for this line.

> 
> Then:
> 
> > @@ -69,11 +75,18 @@  REPLY.STRUCTURED_REPLY.RECV_REMAINING:
> >   REPLY.STRUCTURED_REPLY.CHECK:
> >    struct command *cmd = h->reply_cmd;
> >    uint16_t flags, type;
> > -  uint32_t length;
> > +  uint64_t length;
> > +  uint64_t offset = -1;
> 
> (6) I disagree with initializing the local variable "offset" here.
> 
> Below (in the rest of REPLY.STRUCTURED_REPLY.CHECK), we only read
> "offset" back if "extended_headers" is set. But if "extended_headers" is
> set, we also store a value to "offset", before the read.
> 
> Initializing "offset" to -1 suggests that the code might otherwise read
> an indeterminate value from "offset" -- but that's not the case.

I'll have to double-check; I thought that offset was read even for
structured replies (where there really isn't an offset read from the
wire), but my rebasing efforts may have changed that.

> 
> >
> > +  assert (cmd);
> 
> (7) What guarantees this?
> 
> Is it the lookup code at the end of
> REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY?
> 
> But that code can set "reply_cmd" to NULL, in case the cookie is not
> found; and the cookie lookup there defers to FINISH for diagnosing the
> unassociated reply.

Hmmm.  Reading ahead, I see what you wrote in (9).  This may have been
something I copied from other states, but I'll have to double check
whether it still makes sense.

> 
> >    flags = be16toh (h->sbuf.reply.hdr.structured.flags);
> >    type = be16toh (h->sbuf.reply.hdr.structured.type);
> > -  length = be32toh (h->sbuf.reply.hdr.structured.length);
> > +  if (h->extended_headers) {
> > +    length = be64toh (h->sbuf.reply.hdr.extended.length);
> > +    offset = be64toh (h->sbuf.reply.hdr.extended.offset);
> > +  }
> > +  else
> > +    length = be32toh (h->sbuf.reply.hdr.structured.length);
> >
> >    /* Reject a server that replies with too much information, but don't
> >     * reject a single structured reply to NBD_CMD_READ on the largest
> > @@ -83,13 +96,18 @@  REPLY.STRUCTURED_REPLY.CHECK:
> >     * not worth keeping the connection alive.
> >     */
> >    if (length > MAX_REQUEST_SIZE + sizeof 
> > h->sbuf.reply.payload.offset_data) {
> > -    set_error (0, "invalid server reply length %" PRIu32, length);
> > +    set_error (0, "invalid server reply length %" PRIu64, length);
> >      SET_NEXT_STATE (%.DEAD);
> >      return 0;
> >    }
> > +  /* For convenience, we now normalize extended replies into compact,
> > +   * doable since we validated that payload length fits in 32 bits.
> > +   */
> > +  h->sbuf.reply.hdr.structured.length = length;
> 
> (8) I'm very confused by this. For an extended reply, this will
> overwrite the "offset" field.

At one point, I considered normalizing in the opposite direction (take
structured replies and widen them into the extended header form); it
required touching more lines of code, so I didn't pursue it at the
time.  But I can see how compressing things down (intentionally
throwing away information we will no longer reference) can be
confusing without good comments, so at a minimum, I will be adding
comments, if not outright going with the widening rather than
narrowing approach.

> 
> I understand we have stolen that field already, above; but that's little
> solace, especially without specific comments.
> 
> >
> >    /* Skip an unexpected structured reply, including to an unknown cookie. 
> > */
> > -  if (cmd == NULL || !h->structured_replies)
> > +  if (cmd == NULL || !h->structured_replies ||
> > +      (h->extended_headers && offset != cmd->offset))
> >      goto resync;
> >
> >    switch (type) {
> 
> (9) Preserving the (cmd == NULL) sub-condition, plus the reference to
> "unkown cookie" in the comment, looks like a logic bug to me: it can
> never evaluate to "true", given the assert() I'm questioning above at
> (7).
> 
> Alternatively, the assert() is wrong.

Off-hand, I think the assert is correct and I do have a dead condition
here; but I'll have to convince myself of that.  Since the condition
is pre-existing, it may be worth a separate patch adding just the
assertion and removing the 'cmd == NULL' check here, if it is correct.

> 
> (10) The comment only talks about "unexpected structured reply", but the
> condition now deals with extended headers too, and I don't know how
> those relate to each other.
> 
> What I'm trying to express is that
> 
> (a) checking "structured_replies" *here*, but
> 
> (b) checking "extended_headers" against magic numbers in
> REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY,
> 
> at the same time, seems inconsistent. If we get a structured reply after
> *not* having negotiated them, we should be able to catch that in the
> exact same location -- i.e., where we check magic numbers -- where we
> can also realize that we're getting an extended reply in spite of *not*
> having negotiated *those*.
> 
> In other words, the treatment differs, and I don't know why: if we get
> an unexpected structured reply, we skip it and "resync", whereas if we
> get an unexpected extended reply, we move to the DEAD state.
> 
> I think it's fine to skip replies that refer to unknown cookies, but
> *reply format* violations should be fatal.

I think this all stems from me trying to be as generous as possible
with bad servers: if we get a wrong reply type per the protocol but it
is a reply type we know how to handle, AND we haven't consumed too
many bytes for the size of that reply, then we can tolerate the
server's bug.  If it is the wrong type, and we already read too many
bytes for what the right type would be, we can't (easily) undo the
fact that we read too much.

If we did not negotiate extended headers, we start by reading the
length of a simple header; if the magic says it was instead an
extended header, we can tolerate the server's mistake by reading the
rest of that packet.  But I definitely see your point about
consolidating the decisions to be local to one state, rather than
split across two files, and aiming for consistent handling.

I'm liking your idea earlier in the series about reworking the state
engine to FULLY parse the header (whether simple, structured, or
extended) in states-reply.c, and only then move to
states-reply-structured.c if the header was structured or extended,
rather than splitting the header parse across two files.

> 
> 
> Then, this is not a request for an update, just a comment: I don't
> understand why the spec introduced the "offset" field at all. It does
> not seem to carry additional information beyond the cookie. So we can
> certainly check that the response's offset matches the command's offset
> (the code looks OK), but the larger purpose is unclear. (And this is
> probably also the reason why you get away with corrupting
> "nbd_extended_reply.offset" at (8) -- the field is never again needed,
> beyond the sanity check performed here.)

The evolution of that field: in my first draft, I wanted power-of-2
sizing to the reply header (both request and reply having a header of
exactly 32 bytes); this left dead space in the reply packet, which I
originally tried to mandate to be zero-filled.  But when looking at
the types again, I noticed that if the 'offset' field occupies the
same relative place in the request and reply struct, a server can
rewrite the reply in the same memory as it had saved for the request
(I don't know that I actually implemented it that way in qemu as
server, but it is possible for some other server).

There's also NBD_REPLY_TYPE_OFFSET_DATA and NBD_REPLY_TYPE_OFFSET_HOLE
(used in response to NBD_CMD_READ), which each return an offset field
in order to handle the case where a single client read request was
split across multiple server replies.  When we first specified
structured replies, the spec was ambiguous: if I issue
NBD_CMD_READ(length=1k, offset=0) but the server replies with the pair
OFFSET_DATA(length=512, offset=0) and OFFSET_HOLE(length=512,
offset=512), it's obvious how to reconstruct the original buffer (read
into the first 512 bytes, memset the second 512 bytes).  But if I
issue NBD_CMD_READ(length=1k, offset=512), should the server reply
OFFSET_DATA(length=512, offset=512) and OFFSET_HOLE(length=512,
offset=1k) (that is, the reply offsets are absolute to the overall
image), or with OFFSET_DATA(length=512, offset=0) and
OFFSET_HOLE(length=512, offset=512) (that is, the reply offsets are
relative to the start of the 1k buffer of the read request)?  The
initial implementations of qemu and nbdkit disagreed, and the spec
ended up settling in favor of the former (see nbd.git commit
587ba722).

As a result, when receiving an absolute offset in a data reply, the
client has to place the data into buffer + (reply_offset -
request_offset) (that is, convert the server's absolute offset back
into a buffer-relative offset), which requires knowing what the
reqeust offset was; having the server return this offset in extended
replies may make it easier to implement a client that doesn't have to
store its request offset separately.  Since libnbd also handles
non-extended headers (and has to hang on to the request offset for
those), you are right that we end up merely validating that the
server's reply offset was expected, and then throwing it away as it
didn't add anything further than a validation that we haven't lost
sync with the server.

> 
> > @@ -168,7 +186,7 @@  REPLY.STRUCTURED_REPLY.RECV_ERROR:
> >      SET_NEXT_STATE (%.READY);
> >      return 0;
> >    case 0:
> > -    length = be32toh (h->sbuf.reply.hdr.structured.length);
> > +    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
> >      assert (length >= sizeof h->sbuf.reply.payload.error.error.error);
> >      assert (cmd);
> >
> > @@ -207,7 +225,7 @@  REPLY.STRUCTURED_REPLY.RECV_ERROR_MESSAGE:
> >      SET_NEXT_STATE (%.READY);
> >      return 0;
> >    case 0:
> > -    length = be32toh (h->sbuf.reply.hdr.structured.length);
> > +    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
> >      msglen = be16toh (h->sbuf.reply.payload.error.error.len);
> >      type = be16toh (h->sbuf.reply.hdr.structured.type);
> >
> > @@ -307,7 +325,7 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_DATA:
> >      SET_NEXT_STATE (%.READY);
> >      return 0;
> >    case 0:
> > -    length = be32toh (h->sbuf.reply.hdr.structured.length);
> > +    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
> >      offset = be64toh (h->sbuf.reply.payload.offset_data.offset);
> >
> >      assert (cmd); /* guaranteed by CHECK */
> > @@ -346,7 +364,7 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_DATA_DATA:
> >      SET_NEXT_STATE (%.READY);
> >      return 0;
> >    case 0:
> > -    length = be32toh (h->sbuf.reply.hdr.structured.length);
> > +    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
> >      offset = be64toh (h->sbuf.reply.payload.offset_data.offset);
> >
> >      assert (cmd); /* guaranteed by CHECK */
> > @@ -426,7 +444,7 @@  REPLY.STRUCTURED_REPLY.RECV_BS_HEADER:
> >      SET_NEXT_STATE (%.READY);
> >      return 0;
> >    case 0:
> > -    length = be32toh (h->sbuf.reply.hdr.structured.length);
> > +    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
> >
> >      assert (cmd); /* guaranteed by CHECK */
> >      assert (cmd->type == NBD_CMD_BLOCK_STATUS);
> > @@ -460,7 +478,7 @@  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:
> >      SET_NEXT_STATE (%.READY);
> >      return 0;
> >    case 0:
> > -    length = be32toh (h->sbuf.reply.hdr.structured.length);
> > +    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
> >
> >      assert (cmd); /* guaranteed by CHECK */
> >      assert (cmd->type == NBD_CMD_BLOCK_STATUS);
> 
> (11) This is all fallout from (8). The commit message does document it
> (in the first paragraph), but I don't understand where we *benefit* from
> stuffing "nbd_extended_reply.length" into "nbd_structured_reply.length".

The benefit is that all later code only has to deal with
nbd_structured_reply (rather than adding yet more 'if
(h->extended_reply)' in all subsequent states).  Conversely, if I had
widened instead of narrowed, then all later code would only need to
doea with nbd_extended_reply.

> 
> Regarding downsides thereof, I can see two:
> 
> - as I mentioned, "nbd_extended_reply.offset" becomes unusable,
> 
> - "nbd_structured_reply.length" will no longer be big-endian but
>   host-endian, which arguably does not match the other fields'
>   endianness in the scratch space (= sbuf).
> 
> I feel this back-stuffing brings the stashed header into an inconsistent
> state that is specific to a subset of the automaton's states.
> 
> I'd rather introduce an entirely new field to "sbuf.reply" -- between
> "sbuf.reply.hdr" and "sbuf.reply.payload" --, if we really need to cache
> a host-endian length value, regardless of which kind of reply header we
> got.

You may be on to something.  Leaving the wire reply ALWAYS in network
endian order, and adding a few bytes to nbd_reply to stash a
normalized host-endian decoded value, would certainly be less
confusing than having to figure out "which parts of my wire reply are
still network-endian vs. natively converted".  It may make the overall
structure a few bytes larger, but that's probably in the noise (since
our nbd_handle already accomodatese maximum-length NBD strings of 4k,
adding up to another 32 bytes for fully-normalized host-endian form
isn't bad).  I'll keep that in mind for v4 (this is not the only place
that needs normalization; handling 32- vs. 64-bit block status replies
also needs normalization; the tradeoff between minimal storage by
normalizing in place vs. maintainability by normalizing into a copy
has interesting implications).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




reply via email to

[Prev in Thread] Current Thread [Next in Thread]