[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v4 07/14] vfio-user: run vfio-user context
From: |
Stefan Hajnoczi |
Subject: |
Re: [PATCH v4 07/14] vfio-user: run vfio-user context |
Date: |
Thu, 6 Jan 2022 13:35:32 +0000 |
On Wed, Jan 05, 2022 at 10:38:10AM +0000, Thanos Makatos wrote:
>
>
> > -----Original Message-----
> > From: Jag Raman <jag.raman@oracle.com>
> > Sent: 17 December 2021 18:00
> > To: Stefan Hajnoczi <stefanha@redhat.com>; John Levon
> > <john.levon@nutanix.com>; Thanos Makatos <thanos.makatos@nutanix.com>
> > Cc: qemu-devel <qemu-devel@nongnu.org>; Alex Williamson
> > <alex.williamson@redhat.com>; Marc-André Lureau
> > <marcandre.lureau@gmail.com>; Philippe Mathieu-Daudé
> > <philmd@redhat.com>; pbonzini@redhat.com; alex.bennee@linaro.org;
> > thuth@redhat.com; crosa@redhat.com; wainersm@redhat.com;
> > bleal@redhat.com; Elena Ufimtseva <elena.ufimtseva@oracle.com>; John
> > Levon <john.levon@nutanix.com>; John Johnson
> > <john.g.johnson@oracle.com>; Thanos Makatos
> > <thanos.makatos@nutanix.com>; Swapnil Ingle <swapnil.ingle@nutanix.com>
> > Subject: Re: [PATCH v4 07/14] vfio-user: run vfio-user context
> >
> >
> >
> > > On Dec 16, 2021, at 6:17 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > On Wed, Dec 15, 2021 at 10:35:31AM -0500, Jagannathan Raman wrote:
> > >> @@ -114,6 +118,62 @@ static void vfu_object_set_device(Object *obj,
> > const char *str, Error **errp)
> > >> vfu_object_init_ctx(o, errp);
> > >> }
> > >>
> > >> +static void vfu_object_ctx_run(void *opaque)
> > >> +{
> > >> + VfuObject *o = opaque;
> > >> + int ret = -1;
> > >> +
> > >> + while (ret != 0) {
> > >> + ret = vfu_run_ctx(o->vfu_ctx);
> > >> + if (ret < 0) {
> > >> + if (errno == EINTR) {
> > >> + continue;
> > >> + } else if (errno == ENOTCONN) {
> > >> + qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
> > >> + o->vfu_poll_fd = -1;
> > >> + object_unparent(OBJECT(o));
> > >> + break;
> > >
> > > If nothing else logs a message then I think that should be done here so
> > > users know why their vfio-user server object disappeared.
> >
> > Sure will do.
> >
> > Do you prefer a trace, or a message to the console? Trace makes sense to me.
> > Presently, the client could unplug the vfio-user device which would trigger
> > the
> > deletion of this object. This process could happen quietly.
> >
> > >
> > >> + } else {
> > >> + error_setg(&error_abort, "vfu: Failed to run device %s
> > >> - %s",
> > >> + o->device, strerror(errno));
> > >
> > > error_abort is equivalent to assuming !o->daemon. In the case where the
> > > user doesn't want to automatically shut down the process we need to log
> > > a message without aborting.
> >
> > OK, makes sense.
> >
> > >
> > >> + break;
> > >
> > > Indentation is off.
> > >
> > >> + }
> > >> + }
> > >> + }
> > >> +}
> > >> +
> > >> +static void vfu_object_attach_ctx(void *opaque)
> > >> +{
> > >> + VfuObject *o = opaque;
> > >> + GPollFD pfds[1];
> > >> + int ret;
> > >> +
> > >> + qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
> > >> +
> > >> + pfds[0].fd = o->vfu_poll_fd;
> > >> + pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
> > >> +
> > >> +retry_attach:
> > >> + ret = vfu_attach_ctx(o->vfu_ctx);
> > >> + if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
> > >> + qemu_poll_ns(pfds, 1, 500 * (int64_t)SCALE_MS);
> > >> + goto retry_attach;
> > >
> > > This can block the thread indefinitely. Other events like monitor
> > > commands are not handled in this loop. Please make this asynchronous
> > > (set an fd handler and return from this function so we can try again
> > > later).
> > >
> > > The vfu_attach_ctx() implementation synchronously negotiates the
> > > vfio-user connection :(. That's a shame because even if accept(2) is
> > > handled asynchronously, the negotiation can still block. It would be
> > > cleanest to have a fully async libvfio-user's vfu_attach_ctx() API to
> > > avoid blocking. Is that possible?
> >
> > Thanos / John,
> >
> > Any thoughts on this?
>
> I'm discussing this with John and FYI there are other places where
> libvfio-user can block, e.g. sending a response or receiving a command. Is it
> just the negotiation you want it to be asynchronous or _all_ libvfio-user
> operations? Making libvfio-user fully asynchronous might require a
> substantial API rewrite.
I see at least two reasons for a fully async API:
1. The program wants to handle other events (e.g. a management REST API)
from the same event loop thread that invokes libvfio-user. If
libvfio-user blocks then the other events cannot be handled within a
reasonable time frame.
The workaround for this is to use multi-threading and ignore the
event-driven architecture implied by vfu_get_poll_fd().
2. The program handles multiple clients that do not trust each other.
This could be a software-defined network switch or storage appliance.
A malicious client can cause a denial-of-service by making a
libvfio-user call block.
Again, the program needs separate threads instead of an event loop to
work around this.
The downside to a sync approach is that programs that already have an
event loop require extra code to set up dedicated threads for
libvfio-user. That's a library integration/usability issue.
In some cases it's okay to block: when the program doesn't need to
handle other events. If most users of libvfio-user are expected to fall
into this category then there's no need to change the API.
Either way, the doc comments in libvfio-user.h aren't very clear.
Someone integrating this library may think vfu_get_poll_fd() allows for
fully async operation.
Stefan
signature.asc
Description: PGP signature