Re: [PATCH 6/7] block/nbd: decouple reconnect from drain

qemu-block

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 6/7] block/nbd: decouple reconnect from drain

From:	Vladimir Sementsov-Ogievskiy
Subject:	Re: [PATCH 6/7] block/nbd: decouple reconnect from drain
Date:	Tue, 16 Mar 2021 21:09:12 +0300
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

16.03.2021 19:03, Roman Kagan wrote:

On Mon, Mar 15, 2021 at 11:10:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:

15.03.2021 09:06, Roman Kagan wrote:

The reconnection logic doesn't need to stop while in a drained section.
Moreover it has to be active during the drained section, as the requests
that were caught in-flight with the connection to the server broken can
only usefully get drained if the connection is restored.  Otherwise such
requests can only either stall resulting in a deadlock (before
8c517de24a), or be aborted defeating the purpose of the reconnection
machinery (after 8c517de24a).

Since the pieces of the reconnection logic are now properly migrated
from one aio_context to another, it appears safe to just stop messing
with the drained section in the reconnection code.

Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")


I'd not think that it "fixes" it. Behavior changes.. But 5ad81b4946
didn't introduce any bugs.


I guess you're right.

In fact I did reproduce the situation when the drain made no progress
exactly becase the only in-flight reference was taken by the
connection_co, but it may be due to some intermediate stage of the patch
development so I need to do a more thorough analysis to tell if it was
triggerable with the original code.

Fixes: 8c517de24a ("block/nbd: fix drain dead-lock because of nbd 
reconnect-delay")


And here..

1. There is an existing problem (unrelated to nbd) in Qemu that long
io request which we have to wait for at drained_begin may trigger a
dead lock
(https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html)

2. So, when we have nbd reconnect (and therefore long io requests) we
simply trigger this deadlock.. That's why I decided to cancel the
requests (assuming they will most probably fail anyway).

I agree that nbd driver is wrong place for fixing the problem
described in
(https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg01339.html),
but if you just revert 8c517de24a, you'll see the deadlock again..


I may have misunderstood that thread, but isn't that deadlock exactly
due to the requests being unable to ever drain because the
reconnection process is suspended while the drain is in progress?



Hmm, I didn't thought about it this way.  What you mean is that reconnection is 
cancelled on drain_begin, so drain_begin will never finish, because it waits 
for requests, which will never be reconnected. So, you are right it's a 
deadlock too.

But as I remember what is described in 8c517de24a is another deadlock, triggered by 
"just a long request during drain_begin". And it may be triggered again, if the 
we'll try to reconnect for several seconds during drained_begin() instead of cancelling 
requests. Didn't you try the scenario described in 8c517de24a on top of your series?



--
Best regards,
Vladimir

[Prev in Thread]

Current Thread

[Next in Thread]

[PATCH 4/7] block/nbd: transfer reconnection stuff across aio_context switch, (continued)
- [PATCH 4/7] block/nbd: transfer reconnection stuff across aio_context switch, Roman Kagan, 2021/03/15
- [PATCH 1/7] block/nbd: avoid touching freed connect_thread, Roman Kagan, 2021/03/15
  - Re: [PATCH 1/7] block/nbd: avoid touching freed connect_thread, Vladimir Sementsov-Ogievskiy, 2021/03/15
    - Re: [PATCH 1/7] block/nbd: avoid touching freed connect_thread, Roman Kagan, 2021/03/16
- [PATCH 5/7] block/nbd: better document a case in nbd_co_establish_connection, Roman Kagan, 2021/03/15
- [PATCH 2/7] block/nbd: use uniformly nbd_client_connecting_wait, Roman Kagan, 2021/03/15
  - Re: [PATCH 2/7] block/nbd: use uniformly nbd_client_connecting_wait, Vladimir Sementsov-Ogievskiy, 2021/03/15
- [PATCH 6/7] block/nbd: decouple reconnect from drain, Roman Kagan, 2021/03/15
  - Re: [PATCH 6/7] block/nbd: decouple reconnect from drain, Vladimir Sementsov-Ogievskiy, 2021/03/15
    - Re: [PATCH 6/7] block/nbd: decouple reconnect from drain, Roman Kagan, 2021/03/16
    - Re: [PATCH 6/7] block/nbd: decouple reconnect from drain, Vladimir Sementsov-Ogievskiy <=
    - Re: [PATCH 6/7] block/nbd: decouple reconnect from drain, Roman Kagan, 2021/03/26
- [PATCH 3/7] block/nbd: assert attach/detach runs in the proper context, Roman Kagan, 2021/03/15
  - Re: [PATCH 3/7] block/nbd: assert attach/detach runs in the proper context, Vladimir Sementsov-Ogievskiy, 2021/03/15
    - Re: [PATCH 3/7] block/nbd: assert attach/detach runs in the proper context, Roman Kagan, 2021/03/15
- [PATCH 7/7] block/nbd: stop manipulating in_flight counter, Roman Kagan, 2021/03/15
  - Re: [PATCH 7/7] block/nbd: stop manipulating in_flight counter, Vladimir Sementsov-Ogievskiy, 2021/03/15
    - Re: [PATCH 7/7] block/nbd: stop manipulating in_flight counter, Roman Kagan, 2021/03/16
    - Re: [PATCH 7/7] block/nbd: stop manipulating in_flight counter, Vladimir Sementsov-Ogievskiy, 2021/03/16
    - Re: [PATCH 7/7] block/nbd: stop manipulating in_flight counter, Roman Kagan, 2021/03/26
- Re: [PATCH 0/7] block/nbd: decouple reconnect from drain, Vladimir Sementsov-Ogievskiy, 2021/03/15

Prev by Date: Re: [PATCH v3 6/6] block/qcow2: use seqcache for compressed writes
Next by Date: [PULL v3 00/42] Block layer patches and object-add QAPIfication
Previous by thread: Re: [PATCH 6/7] block/nbd: decouple reconnect from drain
Next by thread: Re: [PATCH 6/7] block/nbd: decouple reconnect from drain
Index(es):
- Date
- Thread