qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Libguestfs] [libnbd PATCH v3 05/22] states: Prepare to receive 64-b


From: Laszlo Ersek
Subject: Re: [Libguestfs] [libnbd PATCH v3 05/22] states: Prepare to receive 64-bit replies
Date: Thu, 1 Jun 2023 16:11:31 +0200

On 6/1/23 15:00, Eric Blake wrote:
> On Thu, Jun 01, 2023 at 11:04:05AM +0200, Laszlo Ersek wrote:

>> Probably best to reorder the files in this patch for review:
> 
> I see what you mean: because of the state hierarchy, it is probably
> worth tweaking the git orderfile to favor files nearer the top of the
> state tree first, rather than strict alphabetical ordering of *.c.  I
> may just push a patch for that without needing review, as it doesn't
> affect library semantics at all.

Ouch, I'm sorry, I meant that *I* should rearrange the hunks in the
patch for review! I didn't mean to ask you for any orderfile changes! :)

Of course if you can do that: high five! :)

> 
> ...
>>> +++ b/generator/states-reply.c
>>> @@ -62,15 +62,19 @@  REPLY.START:
>>>     */
>>>    ssize_t r;
>>>
>>> -  /* We read all replies initially as if they are simple replies, but
>>> -   * check the magic in CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
>>> -   * This works because the structured_reply header is larger.
>>> +  /* With extended headers, there is only one size to read, so we can do it
>>> +   * all in one syscall.  But for older structured replies, we don't know 
>>> if
>>> +   * we have a simple or structured reply until we read the magic number,
>>> +   * requiring a two-part read with CHECK_SIMPLE_OR_STRUCTURED_REPLY below.
>>>     */
>>>    assert (h->reply_cmd == NULL);
>>>    assert (h->rlen == 0);
>>>
>>>    h->rbuf = &h->sbuf.reply.hdr;
>>> -  h->rlen = sizeof h->sbuf.reply.hdr.simple;
>>> +  if (h->extended_headers)
>>> +    h->rlen = sizeof h->sbuf.reply.hdr.extended;
>>> +  else
>>> +    h->rlen = sizeof h->sbuf.reply.hdr.simple;
>>>
>>>    r = h->sock->ops->recv (h, h->sock, h->rbuf, h->rlen);
>>>    if (r == -1) {
>>
>> The comment "This works because the structured_reply header is larger"
>> is being removed, which makes the non-ext branch even less
>> comprehensible than before.
>>
>> (2) Can we add the STATIC_ASSERT() here, stating that "sizeof simple" is
>> smaller than "sizeof structured"?
> 
> Yep, I'm already rebasing onto some additional asserts along those
> lines, based on your reviews earlier in the series.
> 
>>
>>> @@ -116,16 +120,22 @@  REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY:
>>>    uint64_t cookie;
>>>
>>>    magic = be32toh (h->sbuf.reply.hdr.simple.magic);
>>> -  if (magic == NBD_SIMPLE_REPLY_MAGIC) {
>>> +  switch (magic) {
>>> +  case NBD_SIMPLE_REPLY_MAGIC:
>>> +    if (h->extended_headers)
>>> +      goto invalid;
>>>      SET_NEXT_STATE (%SIMPLE_REPLY.START);
>>> -  }
>>> -  else if (magic == NBD_STRUCTURED_REPLY_MAGIC) {
>>> +    break;
>>> +  case NBD_STRUCTURED_REPLY_MAGIC:
>>> +  case NBD_EXTENDED_REPLY_MAGIC:
>>> +    if ((magic == NBD_STRUCTURED_REPLY_MAGIC) == h->extended_headers)
>>
>> Heh, I love this ((a == b) == c) "equivalence" form! :)
> 
> I still do a double-take every time I see 'a < b < c' in languages
> (like python) that support it as shorthand for 'a < b && b < c'.  Even
> weirder is when you see 'a < b > c'.  But yes, C's non-transitive ==
> can be a surprise for the uninitiated.
> 
>>
>>> +      goto invalid;
>>>      SET_NEXT_STATE (%STRUCTURED_REPLY.START);
>>> -  }
>>> -  else {
>>> -    /* We've probably lost synchronization. */
>>> -    SET_NEXT_STATE (%.DEAD);
>>> -    set_error (0, "invalid reply magic 0x%" PRIx32, magic);
>>> +    break;
>>> +  default:
>>> +  invalid:
>>> +    SET_NEXT_STATE (%.DEAD); /* We've probably lost synchronization. */
>>> +    set_error (0, "invalid or unexpected reply magic 0x%" PRIx32, magic);
>>>  #if 0 /* uncomment to see desynchronized data */
>>>      nbd_internal_hexdump (&h->sbuf.reply.hdr.simple,
>>>                            sizeof (h->sbuf.reply.hdr.simple),
>>
>> My apologies, but I find *this* placement of "invalid" terrible. I
>> thought the "goto invalid" statements were jumping to the end of the
>> function. Now I see they jump effectively to another case label.
>>
>> (3) How about this (on top of your patch, to be squashed):
> 
> That part of the patch pre-dates other reviews I've seen from you on
> the same topic (this patch series has been percolating on my local
> builds for a long time), so I'm not surprised by your request, and I'm
> happy to squash in your improvement.  I may have copied from other
> similar code in the generator/states-*.c files, if so, I'll probably
> do a separate cleanup patch first.
> 
>>
>> ... At the same time, even with this "cleanup" for the labels, I find it
>> relatively confusing (albeit correct, as far as I can tell!) that in
>> REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY, we first check the magic
>> number(s) and *then* whether we negotiated extended headers, whereas in
>> all the other state handlers, the order (= condition nesting) is the
>> opposite: we first check extended headers, and deal with any other
>> objects second. This makes the assert()s in REPLY.STRUCTURED_REPLY.START
>> especially tricky to demonstrate -- I think I've managed to do it, it
>> looks correct, it's just hard. So that's a half- (or quarter-)hearted
>> proposal to investigate how REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY
>> looked if there too, "extended_headers" were the outer check. Anyway, I
>> don't feel strongly about this. :)
>>
>> (4) Still in REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY, we have a comment
>> saying
>>
>>    /* NB: This works for both simple and structured replies because the
>>     * handle (our cookie) is stored at the same offset.
>>     */
>>
>> The code under it applies to extended headers too, so the comment should
>> be updated.
>>
>> (There's a similar comment in REPLY.FINISH_COMMAND, too; I'm not sure if
>> that state is relevant for extended headers.)
> 
> Indeed, when rebasing, I need to remember to grep (to cover comments)
> rather than merely relying on the compiler (which only covers code)
> for renames.
> 
>>
>> Then:
>>
>>> diff --git a/generator/states-reply-structured.c 
>>> b/generator/states-reply-structured.c
>>> index df509216..c53aed7b 100644
>>> --- a/generator/states-reply-structured.c
>>> +++ b/generator/states-reply-structured.c
>>> @@ -45,14 +45,20 @@ structured_reply_in_bounds (uint64_t offset, uint32_t 
>>> length,
>>>
>>>  STATE_MACHINE {
>>>   REPLY.STRUCTURED_REPLY.START:
>>> -  /* We've only read the simple_reply.  The structured_reply is longer,
>>> -   * so read the remaining part.
>>> +  /* If we have extended headers, we've already read the entire header.
>>> +   * Otherwise, we've only read enough for a simple_reply; since structured
>>> +   * replies are longer, read the remaining part.
>>>     */
>>
>> (5) Ah, the word "simple_reply", which this change preserves, is a
>> leftover. It should be updated in patch#2, "internal: Refactor layout of
>> replies in sbuf", where the "simple_reply" field is replaced with
>> "reply.hdr.simple" in "sbuf".
>>
>> I guess I didn't notice this omission in patch#2 because the context
>> there only shows the "so read the remaining part" line of the comment,
>> and "simple_reply" is on the preceding line.
>>
>> And then this patch too should refer to "reply.hdr.simple", in the new
>> comment.
>>
>> Alternatively, perhaps use the type name (struct tag, actually):
>> "nbd_simple_reply".
> 
> Will fix in the appropriate place(s).
> 
>>
>>> -  h->rbuf = &h->sbuf;
>>> -  h->rbuf = (char *)h->rbuf + sizeof h->sbuf.reply.hdr.simple;
>>> -  h->rlen = sizeof h->sbuf.reply.hdr.structured;
>>> -  h->rlen -= sizeof h->sbuf.reply.hdr.simple;
>>> -  SET_NEXT_STATE (%RECV_REMAINING);
>>> +  if (h->extended_headers) {
>>> +    assert (h->rbuf == sizeof h->sbuf.reply.hdr.extended + 
>>> (char*)&h->sbuf);
>>> +    SET_NEXT_STATE (%CHECK);
>>> +  }
>>> +  else {
>>> +    assert (h->rbuf == sizeof h->sbuf.reply.hdr.simple + (char*)&h->sbuf);
>>> +    h->rlen = sizeof h->sbuf.reply.hdr.structured -
>>> +      sizeof h->sbuf.reply.hdr.simple;
>>> +    SET_NEXT_STATE (%RECV_REMAINING);
>>> +  }
>>>    return 0;
>>>
>>>   REPLY.STRUCTURED_REPLY.RECV_REMAINING:
>>
>> This looks OK, but can we make it less verbose? How about:
>>
>> /----------------------
>> | diff --git a/lib/internal.h b/lib/internal.h
>> | index e4e21a4d1ffa..a630b2511ff7 100644
>> | --- a/lib/internal.h
>> | +++ b/lib/internal.h
>> | @@ -240,7 +240,7 @@ struct nbd_handle {
>> |      } or;
>> |      struct nbd_export_name_option_reply export_name_reply;
>> |      struct {
>> | -      union {
>> | +      union reply_header {
> 
> Oh cool.  I keep forgetting that you can declare a type name usable at
> the top level even while declaring a nested struct.  Yeah, doing that
> probably has some nice line length improvements.
> 
>> |          struct nbd_simple_reply simple;
>> |          struct nbd_structured_reply structured;
>> |          struct nbd_extended_reply extended;
>> | diff --git a/generator/states-reply-structured.c 
>> b/generator/states-reply-structured.c
>> | index c53aed7bb859..852cb5b6053c 100644
>> | --- a/generator/states-reply-structured.c
>> | +++ b/generator/states-reply-structured.c
>> | @@ -49,14 +49,14 @@  REPLY.STRUCTURED_REPLY.START:
>> |     * Otherwise, we've only read enough for a simple_reply; since 
>> structured
>> |     * replies are longer, read the remaining part.
>> |     */
>> | +  union reply_header *hdr = &h->sbuf.reply.hdr;
>> |    if (h->extended_headers) {
>> | -    assert (h->rbuf == sizeof h->sbuf.reply.hdr.extended + 
>> (char*)&h->sbuf);
>> | +    assert (h->rbuf == sizeof hdr->extended + (char*)&h->sbuf);
>> |      SET_NEXT_STATE (%CHECK);
>> |    }
>> |    else {
>> | -    assert (h->rbuf == sizeof h->sbuf.reply.hdr.simple + (char*)&h->sbuf);
>> | -    h->rlen = sizeof h->sbuf.reply.hdr.structured -
>> | -      sizeof h->sbuf.reply.hdr.simple;
>> | +    assert (h->rbuf == sizeof hdr->simple + (char*)&h->sbuf);
>> | +    h->rlen = sizeof hdr->structured - sizeof hdr->simple;
>> |      SET_NEXT_STATE (%RECV_REMAINING);
>> |    }
>> |    return 0;
>> \----------------------
>>
>> Anyway, feel free to ignore this -- I can see two counter-arguments
>> myself:
>>
>> - the rest of the code remains just as verbose,
>>
> 
> Not necessarily - once I add in more STATIC_ASSERTs, being able to
> refer to individual nested types without having to go all the way
> through struct nbd_handle or union sbuf will be shorter.
> 
>> - one could argue that the addition
>>
>>     sizeof hdr->simple + (char*)&h->sbuf
>>
>>   is *more* confusing than
>>
>>     sizeof h->sbuf.reply.hdr.simple + (char*)&h->sbuf
> 
> That argument still remains - it's easier to see that we expect a
> particular offset from sbuf when the use of sbuf in the offset
> calculation is not hidden several lines away in the intermediate
> initialization.   I'll think about it for this line.
> 
>>
>> Then:
>>
>>> @@ -69,11 +75,18 @@  REPLY.STRUCTURED_REPLY.RECV_REMAINING:
>>>   REPLY.STRUCTURED_REPLY.CHECK:
>>>    struct command *cmd = h->reply_cmd;
>>>    uint16_t flags, type;
>>> -  uint32_t length;
>>> +  uint64_t length;
>>> +  uint64_t offset = -1;
>>
>> (6) I disagree with initializing the local variable "offset" here.
>>
>> Below (in the rest of REPLY.STRUCTURED_REPLY.CHECK), we only read
>> "offset" back if "extended_headers" is set. But if "extended_headers" is
>> set, we also store a value to "offset", before the read.
>>
>> Initializing "offset" to -1 suggests that the code might otherwise read
>> an indeterminate value from "offset" -- but that's not the case.
> 
> I'll have to double-check; I thought that offset was read even for
> structured replies (where there really isn't an offset read from the
> wire), but my rebasing efforts may have changed that.
> 
>>
>>>
>>> +  assert (cmd);
>>
>> (7) What guarantees this?
>>
>> Is it the lookup code at the end of
>> REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY?
>>
>> But that code can set "reply_cmd" to NULL, in case the cookie is not
>> found; and the cookie lookup there defers to FINISH for diagnosing the
>> unassociated reply.
> 
> Hmmm.  Reading ahead, I see what you wrote in (9).  This may have been
> something I copied from other states, but I'll have to double check
> whether it still makes sense.
> 
>>
>>>    flags = be16toh (h->sbuf.reply.hdr.structured.flags);
>>>    type = be16toh (h->sbuf.reply.hdr.structured.type);
>>> -  length = be32toh (h->sbuf.reply.hdr.structured.length);
>>> +  if (h->extended_headers) {
>>> +    length = be64toh (h->sbuf.reply.hdr.extended.length);
>>> +    offset = be64toh (h->sbuf.reply.hdr.extended.offset);
>>> +  }
>>> +  else
>>> +    length = be32toh (h->sbuf.reply.hdr.structured.length);
>>>
>>>    /* Reject a server that replies with too much information, but don't
>>>     * reject a single structured reply to NBD_CMD_READ on the largest
>>> @@ -83,13 +96,18 @@  REPLY.STRUCTURED_REPLY.CHECK:
>>>     * not worth keeping the connection alive.
>>>     */
>>>    if (length > MAX_REQUEST_SIZE + sizeof 
>>> h->sbuf.reply.payload.offset_data) {
>>> -    set_error (0, "invalid server reply length %" PRIu32, length);
>>> +    set_error (0, "invalid server reply length %" PRIu64, length);
>>>      SET_NEXT_STATE (%.DEAD);
>>>      return 0;
>>>    }
>>> +  /* For convenience, we now normalize extended replies into compact,
>>> +   * doable since we validated that payload length fits in 32 bits.
>>> +   */
>>> +  h->sbuf.reply.hdr.structured.length = length;
>>
>> (8) I'm very confused by this. For an extended reply, this will
>> overwrite the "offset" field.
> 
> At one point, I considered normalizing in the opposite direction (take
> structured replies and widen them into the extended header form); it
> required touching more lines of code, so I didn't pursue it at the
> time.  But I can see how compressing things down (intentionally
> throwing away information we will no longer reference) can be
> confusing without good comments, so at a minimum, I will be adding
> comments, if not outright going with the widening rather than
> narrowing approach.
> 
>>
>> I understand we have stolen that field already, above; but that's little
>> solace, especially without specific comments.
>>
>>>
>>>    /* Skip an unexpected structured reply, including to an unknown cookie. 
>>> */
>>> -  if (cmd == NULL || !h->structured_replies)
>>> +  if (cmd == NULL || !h->structured_replies ||
>>> +      (h->extended_headers && offset != cmd->offset))
>>>      goto resync;
>>>
>>>    switch (type) {
>>
>> (9) Preserving the (cmd == NULL) sub-condition, plus the reference to
>> "unkown cookie" in the comment, looks like a logic bug to me: it can
>> never evaluate to "true", given the assert() I'm questioning above at
>> (7).
>>
>> Alternatively, the assert() is wrong.
> 
> Off-hand, I think the assert is correct and I do have a dead condition
> here; but I'll have to convince myself of that.  Since the condition
> is pre-existing, it may be worth a separate patch adding just the
> assertion and removing the 'cmd == NULL' check here, if it is correct.
> 
>>
>> (10) The comment only talks about "unexpected structured reply", but the
>> condition now deals with extended headers too, and I don't know how
>> those relate to each other.
>>
>> What I'm trying to express is that
>>
>> (a) checking "structured_replies" *here*, but
>>
>> (b) checking "extended_headers" against magic numbers in
>> REPLY.CHECK_SIMPLE_OR_STRUCTURED_REPLY,
>>
>> at the same time, seems inconsistent. If we get a structured reply after
>> *not* having negotiated them, we should be able to catch that in the
>> exact same location -- i.e., where we check magic numbers -- where we
>> can also realize that we're getting an extended reply in spite of *not*
>> having negotiated *those*.
>>
>> In other words, the treatment differs, and I don't know why: if we get
>> an unexpected structured reply, we skip it and "resync", whereas if we
>> get an unexpected extended reply, we move to the DEAD state.
>>
>> I think it's fine to skip replies that refer to unknown cookies, but
>> *reply format* violations should be fatal.
> 
> I think this all stems from me trying to be as generous as possible
> with bad servers: if we get a wrong reply type per the protocol but it
> is a reply type we know how to handle, AND we haven't consumed too
> many bytes for the size of that reply, then we can tolerate the
> server's bug.  If it is the wrong type, and we already read too many
> bytes for what the right type would be, we can't (easily) undo the
> fact that we read too much.
> 
> If we did not negotiate extended headers, we start by reading the
> length of a simple header; if the magic says it was instead an
> extended header, we can tolerate the server's mistake by reading the
> rest of that packet.  But I definitely see your point about
> consolidating the decisions to be local to one state, rather than
> split across two files, and aiming for consistent handling.

Right, I found your note about commit b31e7bac in patch#6, and I found
it hard to agree with the motivation behind it :) When reading patch#6,
I immediately felt it was somehow related to my point (10) here, under
patch#5. I couldn't figure out how exactly though, as the affected files
were different!

> 
> I'm liking your idea earlier in the series about reworking the state
> engine to FULLY parse the header (whether simple, structured, or
> extended) in states-reply.c, and only then move to
> states-reply-structured.c if the header was structured or extended,
> rather than splitting the header parse across two files.

I don't want to generate unnecessary work for you, but you too might
find future maintenance easier this way. From my perspective, I already
have to ping-pong between these two C files; not easy. :)

> 
>>
>>
>> Then, this is not a request for an update, just a comment: I don't
>> understand why the spec introduced the "offset" field at all. It does
>> not seem to carry additional information beyond the cookie. So we can
>> certainly check that the response's offset matches the command's offset
>> (the code looks OK), but the larger purpose is unclear. (And this is
>> probably also the reason why you get away with corrupting
>> "nbd_extended_reply.offset" at (8) -- the field is never again needed,
>> beyond the sanity check performed here.)
> 
> The evolution of that field: in my first draft, I wanted power-of-2
> sizing to the reply header (both request and reply having a header of
> exactly 32 bytes); this left dead space in the reply packet, which I
> originally tried to mandate to be zero-filled.  But when looking at
> the types again, I noticed that if the 'offset' field occupies the
> same relative place in the request and reply struct, a server can
> rewrite the reply in the same memory as it had saved for the request
> (I don't know that I actually implemented it that way in qemu as
> server, but it is possible for some other server).
> 
> There's also NBD_REPLY_TYPE_OFFSET_DATA and NBD_REPLY_TYPE_OFFSET_HOLE
> (used in response to NBD_CMD_READ), which each return an offset field
> in order to handle the case where a single client read request was
> split across multiple server replies.  When we first specified
> structured replies, the spec was ambiguous: if I issue
> NBD_CMD_READ(length=1k, offset=0) but the server replies with the pair
> OFFSET_DATA(length=512, offset=0) and OFFSET_HOLE(length=512,
> offset=512), it's obvious how to reconstruct the original buffer (read
> into the first 512 bytes, memset the second 512 bytes).  But if I
> issue NBD_CMD_READ(length=1k, offset=512), should the server reply
> OFFSET_DATA(length=512, offset=512) and OFFSET_HOLE(length=512,
> offset=1k) (that is, the reply offsets are absolute to the overall
> image), or with OFFSET_DATA(length=512, offset=0) and
> OFFSET_HOLE(length=512, offset=512) (that is, the reply offsets are
> relative to the start of the 1k buffer of the read request)?  The
> initial implementations of qemu and nbdkit disagreed, and the spec
> ended up settling in favor of the former (see nbd.git commit
> 587ba722).

Interesting: while reading your comparison, I immediately found the read
request relative reply offsets far-fetched and cumbersome, and the
absolute reply offsets "natural". :)

> 
> As a result, when receiving an absolute offset in a data reply, the
> client has to place the data into buffer + (reply_offset -
> request_offset) (that is, convert the server's absolute offset back
> into a buffer-relative offset), which requires knowing what the
> reqeust offset was; having the server return this offset in extended
> replies may make it easier to implement a client that doesn't have to
> store its request offset separately.  Since libnbd also handles
> non-extended headers (and has to hang on to the request offset for
> those), you are right that we end up merely validating that the
> server's reply offset was expected, and then throwing it away as it
> didn't add anything further than a validation that we haven't lost
> sync with the server.

Thanks for clarifying! So a less "legacy compatible" NBD client could
insist on extended headers (refuse working with servers not providing
extended headers), and then such a client might need the "offset" field
for real.

(On a tangent: I wonder if it were *secure* to trust the server's
"offset" field in the reply, for addressing client memory!)

> 
>>
>>> @@ -168,7 +186,7 @@  REPLY.STRUCTURED_REPLY.RECV_ERROR:
>>>      SET_NEXT_STATE (%.READY);
>>>      return 0;
>>>    case 0:
>>> -    length = be32toh (h->sbuf.reply.hdr.structured.length);
>>> +    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
>>>      assert (length >= sizeof h->sbuf.reply.payload.error.error.error);
>>>      assert (cmd);
>>>
>>> @@ -207,7 +225,7 @@  REPLY.STRUCTURED_REPLY.RECV_ERROR_MESSAGE:
>>>      SET_NEXT_STATE (%.READY);
>>>      return 0;
>>>    case 0:
>>> -    length = be32toh (h->sbuf.reply.hdr.structured.length);
>>> +    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
>>>      msglen = be16toh (h->sbuf.reply.payload.error.error.len);
>>>      type = be16toh (h->sbuf.reply.hdr.structured.type);
>>>
>>> @@ -307,7 +325,7 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_DATA:
>>>      SET_NEXT_STATE (%.READY);
>>>      return 0;
>>>    case 0:
>>> -    length = be32toh (h->sbuf.reply.hdr.structured.length);
>>> +    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
>>>      offset = be64toh (h->sbuf.reply.payload.offset_data.offset);
>>>
>>>      assert (cmd); /* guaranteed by CHECK */
>>> @@ -346,7 +364,7 @@  REPLY.STRUCTURED_REPLY.RECV_OFFSET_DATA_DATA:
>>>      SET_NEXT_STATE (%.READY);
>>>      return 0;
>>>    case 0:
>>> -    length = be32toh (h->sbuf.reply.hdr.structured.length);
>>> +    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
>>>      offset = be64toh (h->sbuf.reply.payload.offset_data.offset);
>>>
>>>      assert (cmd); /* guaranteed by CHECK */
>>> @@ -426,7 +444,7 @@  REPLY.STRUCTURED_REPLY.RECV_BS_HEADER:
>>>      SET_NEXT_STATE (%.READY);
>>>      return 0;
>>>    case 0:
>>> -    length = be32toh (h->sbuf.reply.hdr.structured.length);
>>> +    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
>>>
>>>      assert (cmd); /* guaranteed by CHECK */
>>>      assert (cmd->type == NBD_CMD_BLOCK_STATUS);
>>> @@ -460,7 +478,7 @@  REPLY.STRUCTURED_REPLY.RECV_BS_ENTRIES:
>>>      SET_NEXT_STATE (%.READY);
>>>      return 0;
>>>    case 0:
>>> -    length = be32toh (h->sbuf.reply.hdr.structured.length);
>>> +    length = h->sbuf.reply.hdr.structured.length; /* normalized in CHECK */
>>>
>>>      assert (cmd); /* guaranteed by CHECK */
>>>      assert (cmd->type == NBD_CMD_BLOCK_STATUS);
>>
>> (11) This is all fallout from (8). The commit message does document it
>> (in the first paragraph), but I don't understand where we *benefit* from
>> stuffing "nbd_extended_reply.length" into "nbd_structured_reply.length".
> 
> The benefit is that all later code only has to deal with
> nbd_structured_reply (rather than adding yet more 'if
> (h->extended_reply)' in all subsequent states).  Conversely, if I had
> widened instead of narrowed, then all later code would only need to
> doea with nbd_extended_reply.

Hmmm. That's a good point. I guess I didn't think of it because it would
mean (for example) handling extended block descriptors in a payload that
*seemingly* followed a structured (not extended) reply header.

I've not yet gotten used to the idea that header and payload are going
to be treated independently, after initial consistency checks.

> 
>>
>> Regarding downsides thereof, I can see two:
>>
>> - as I mentioned, "nbd_extended_reply.offset" becomes unusable,
>>
>> - "nbd_structured_reply.length" will no longer be big-endian but
>>   host-endian, which arguably does not match the other fields'
>>   endianness in the scratch space (= sbuf).
>>
>> I feel this back-stuffing brings the stashed header into an inconsistent
>> state that is specific to a subset of the automaton's states.
>>
>> I'd rather introduce an entirely new field to "sbuf.reply" -- between
>> "sbuf.reply.hdr" and "sbuf.reply.payload" --, if we really need to cache
>> a host-endian length value, regardless of which kind of reply header we
>> got.
> 
> You may be on to something.  Leaving the wire reply ALWAYS in network
> endian order, and adding a few bytes to nbd_reply to stash a
> normalized host-endian decoded value, would certainly be less
> confusing than having to figure out "which parts of my wire reply are
> still network-endian vs. natively converted".  It may make the overall
> structure a few bytes larger, but that's probably in the noise (since
> our nbd_handle already accomodatese maximum-length NBD strings of 4k,
> adding up to another 32 bytes for fully-normalized host-endian form
> isn't bad).  I'll keep that in mind for v4 (this is not the only place
> that needs normalization; handling 32- vs. 64-bit block status replies
> also needs normalization; the tradeoff between minimal storage by
> normalizing in place vs. maintainability by normalizing into a copy
> has interesting implications).
> 

Thanks!
Laszlo




reply via email to

[Prev in Thread] Current Thread [Next in Thread]