qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/4] Modifying ‘migrate’ qmp command to add multi-FD socket o


From: Claudio Fontana
Subject: Re: [PATCH 1/4] Modifying ‘migrate’ qmp command to add multi-FD socket on particular source and destination pair
Date: Wed, 13 Jul 2022 14:54:34 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.4.0

On 6/16/22 19:26, Dr. David Alan Gilbert wrote:
> * Het Gala (het.gala@nutanix.com) wrote:
>> i) Modified the format of the qemu monitor command : 'migrate' by adding a 
>> list,
>>    each element in the list consists of multi-FD connection parameters: 
>> source
>>    and destination uris and of the number of multi-fd channels between each 
>> pair.
>>
>> ii) Information of all multi-FD connection parameters’ list, length of the 
>> list
>>     and total number of multi-fd channels for all the connections together is
>>     stored in ‘OutgoingArgs’ struct.
>>
>> Suggested-by: Manish Mishra <manish.mishra@nutanix.com>
>> Signed-off-by: Het Gala <het.gala@nutanix.com>
>> ---
>>  include/qapi/util.h   |  9 ++++++++
>>  migration/migration.c | 47 ++++++++++++++++++++++++++++++--------
>>  migration/socket.c    | 53 ++++++++++++++++++++++++++++++++++++++++---
>>  migration/socket.h    | 17 +++++++++++++-
>>  monitor/hmp-cmds.c    | 22 ++++++++++++++++--
>>  qapi/migration.json   | 43 +++++++++++++++++++++++++++++++----
>>  6 files changed, 170 insertions(+), 21 deletions(-)
>>
>> diff --git a/include/qapi/util.h b/include/qapi/util.h
>> index 81a2b13a33..3041feb3d9 100644
>> --- a/include/qapi/util.h
>> +++ b/include/qapi/util.h
>> @@ -56,4 +56,13 @@ int parse_qapi_name(const char *name, bool complete);
>>      (tail) = &(*(tail))->next; \
>>  } while (0)
>>  
>> +#define QAPI_LIST_LENGTH(list) ({ \
>> +    int _len = 0; \
>> +    typeof(list) _elem; \
>> +    for (_elem = list; _elem != NULL; _elem = _elem->next) { \
>> +        _len++; \
>> +    } \
>> +    _len; \
>> +})
>> +
>>  #endif
> 
> This looks like it should be a separate patch to me (and perhaps size_t
> for len?)
> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 31739b2af9..c408175aeb 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -2328,13 +2328,14 @@ static bool migrate_prepare(MigrationState *s, bool 
>> blk, bool blk_inc,
>>      return true;
>>  }
>>  
>> -void qmp_migrate(const char *uri, bool has_blk, bool blk,
>> +void qmp_migrate(const char *uri, bool has_multi_fd_uri_list,
>> +                 MigrateUriParameterList *cap, bool has_blk, bool blk,
>>                   bool has_inc, bool inc, bool has_detach, bool detach,
>>                   bool has_resume, bool resume, Error **errp)
>>  {
>>      Error *local_err = NULL;
>>      MigrationState *s = migrate_get_current();
>> -    const char *p = NULL;
>> +    const char *dst_ptr = NULL;
>>  
>>      if (!migrate_prepare(s, has_blk && blk, has_inc && inc,
>>                           has_resume && resume, errp)) {
>> @@ -2348,20 +2349,46 @@ void qmp_migrate(const char *uri, bool has_blk, bool 
>> blk,
>>          }
>>      }
>>  
>> +    /*
>> +     * In case of Multi-FD migration parameters, if uri is provided,
> 
> I think you mean 'if uri list is provided'
> 
>> +     * supports only tcp network protocol.
>> +     */
>> +    if (has_multi_fd_uri_list) {
>> +        int length = QAPI_LIST_LENGTH(cap);
>> +        init_multifd_array(length);
>> +        for (int i = 0; i < length; i++) {
>> +            const char *p1 = NULL, *p2 = NULL;
> 
> Keep these as ps/pd  to make it clear which is source and dest.
> 
>> +            const char *multifd_dst_uri = cap->value->destination_uri;
>> +            const char *multifd_src_uri = cap->value->source_uri;
>> +            uint8_t multifd_channels = cap->value->multifd_channels;
>> +            if (!strstart(multifd_dst_uri, "tcp:", &p1) ||
>> +                !strstart(multifd_src_uri, "tcp:", &p2)) {
> 
> I've copied in Claudio Fontana; Claudio is fighting to make snapshots
> faster and has been playing with various multithread schemes for multifd
> with files and fd's;  perhaps the syntax you're proposing doesn't need
> to be limited to tcp.


Hi,

I will try to express our current problem, and see where there might be some 
overlap, maybe you can see more.

The current problem we are facing is, saving or restoring of VM state to disk, 
which in libvirt terms is "virsh save" or "virsh managedsave" and "virsh 
restore" or "virsh start",
is currently needlessly slow with large VMs, using upstream libvirt and qemu.

We need to get the transfer speeds of VM state (mainly RAM) to disk in the 
multiple GiB/s range with modern processors and NVMe disks; we have shown it is 
feasible.

Mainline libvirt uses QEMU migration to "fd://" to implement saving of VMs to 
disk, but adds copying though pipes via a libvirt "iohelper" process to work 
around a set of problems that are still not 100% clear to me;
(I cannot find the right email where this was discussed).

One clearly factual issue is that the QEMU migration stream is not currently 
O_DIRECT friendly, as it assumes it goes to network, and unfortunately for 
large transfers of this kind, O_DIRECT is needed due to the kernel file cache 
trashing problem.

So already, if your series were to address "fd://", it would potentially 
automatically provide an additional feature for libvirt's current save VM 
implementation; but I am not sure if what you are trying to achieve applies 
here.

Our temporary solution for libvirt to the throughput problem takes advantage of 
multifd migration to a "unix://" socket target to save in parallel,
with a new helper process (multifd-helper) taking the place of iohelper and 
performing the parallel multithreaded copy from the UNIX socket to a single 
file (in the latest iteration of the series),
or to multiple files in previous iterations, one for each multifd channel.

It works very well in practice, achieving dramatic throughput improvements by 
parallelizing the transfer reaching the GiB/s range.
This temporary solution is available here:

https://listman.redhat.com/archives/libvir-list/2022-June/232252.html

Libvirt is not accepting this approach, because the maintainer (Daniel, in Cc:) 
argues that the problem needs to be solved in QEMU instead,
while solving it in libvirt is an unwanted hack. My understanding is that this 
new feature is no more of a hack than the existing libvirt iohelper solution 
for basic VM save currently in mainline.

I don't really know how really this QEMU solution could look like yet.

If we code up a new QEMU "disk://" migration transport to save to a local file, 
and parameters to specify whether the transfer should happen in parallel, and 
how many parallel channels to use,
then we could solve the problem entirely in QEMU (possibly reusing some multifd 
code, or even not reusing that at all), but we end up with libvirt unable to 
efficiently put its own header as part of the savefile format libvirt expects.

An alternative could be instead to adjust the QEMU "fd://" migration protocol 
to add "parallel" parameters, and so keep the existing mechanism for 
libvirt/qemu communication for save vm,
change libvirt header read/write to be O_DIRECT friendly, and have qemu migrate 
in parallel directly to the open fd.

In both cases I presume that the QEMU migration stream code, including the code 
for all device state save, would need to be adjusted to be O_DIRECT friendly.

This modulo some additional details is my current understanding of the 
situation, I hope it helps.

Ciao,

Claudio

> 
>> +                error_setg(errp, "multi-fd destination and multi-fd source "
>> +                "uri, both should be present and follows tcp protocol 
>> only");
>> +                break;
>> +            } else {
>> +                store_multifd_migration_params(p1 ? p1 : multifd_dst_uri,
>> +                                            p2 ? p2 : multifd_src_uri,
>> +                                            multifd_channels, i, 
>> &local_err);
>> +            }
>> +            cap = cap->next;
>> +        }
>> +    }
>> +
>>      migrate_protocol_allow_multi_channels(false);
>> -    if (strstart(uri, "tcp:", &p) ||
>> +    if (strstart(uri, "tcp:", &dst_ptr) ||
>>          strstart(uri, "unix:", NULL) ||
>>          strstart(uri, "vsock:", NULL)) {
>>          migrate_protocol_allow_multi_channels(true);
>> -        socket_start_outgoing_migration(s, p ? p : uri, &local_err);
>> +        socket_start_outgoing_migration(s, dst_ptr ? dst_ptr : uri, 
>> &local_err);
>>  #ifdef CONFIG_RDMA
>> -    } else if (strstart(uri, "rdma:", &p)) {
>> -        rdma_start_outgoing_migration(s, p, &local_err);
>> +    } else if (strstart(uri, "rdma:", &dst_ptr)) {
>> +        rdma_start_outgoing_migration(s, dst_ptr, &local_err);
>>  #endif
>> -    } else if (strstart(uri, "exec:", &p)) {
>> -        exec_start_outgoing_migration(s, p, &local_err);
>> -    } else if (strstart(uri, "fd:", &p)) {
>> -        fd_start_outgoing_migration(s, p, &local_err);
>> +    } else if (strstart(uri, "exec:", &dst_ptr)) {
>> +        exec_start_outgoing_migration(s, dst_ptr, &local_err);
>> +    } else if (strstart(uri, "fd:", &dst_ptr)) {
>> +        fd_start_outgoing_migration(s, dst_ptr, &local_err);
>>      } else {
>>          if (!(has_resume && resume)) {
>>              yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>> diff --git a/migration/socket.c b/migration/socket.c
>> index 4fd5e85f50..7ca6af8cca 100644
>> --- a/migration/socket.c
>> +++ b/migration/socket.c
>> @@ -32,6 +32,17 @@ struct SocketOutgoingArgs {
>>      SocketAddress *saddr;
>>  } outgoing_args;
>>  
>> +struct SocketArgs {
>> +    struct SrcDestAddr data;
> 
> 'data' is an odd name; 'addresses' perhaps?
> 
>> +    uint8_t multifd_channels;
>> +};
>> +
>> +struct OutgoingMigrateParams {
>> +    struct SocketArgs *socket_args;
>> +    size_t length;
>> +    uint64_t total_multifd_channel;
>> +} outgoing_migrate_params;
>> +
>>  void socket_send_channel_create(QIOTaskFunc f, void *data)
>>  {
>>      QIOChannelSocket *sioc = qio_channel_socket_new();
>> @@ -47,6 +58,14 @@ int socket_send_channel_destroy(QIOChannel *send)
>>          qapi_free_SocketAddress(outgoing_args.saddr);
>>          outgoing_args.saddr = NULL;
>>      }
>> +
>> +    if (outgoing_migrate_params.socket_args != NULL) {
>> +        g_free(outgoing_migrate_params.socket_args);
>> +        outgoing_migrate_params.socket_args = NULL;
> 
> I think g_free is safe on NULL; so I think you can just do this without
> the if.
> 
>> +    }
>> +    if (outgoing_migrate_params.length) {
> 
> Does that ever differ from the != NULL test ?
> I think you can always just set this to 0 without the test.
> 
>> +        outgoing_migrate_params.length = 0;
>> +    }
>>      return 0;
>>  }
>>  
>> @@ -117,13 +136,41 @@ 
>> socket_start_outgoing_migration_internal(MigrationState *s,
>>  }
>>  
>>  void socket_start_outgoing_migration(MigrationState *s,
>> -                                     const char *str,
>> +                                     const char *dst_str,
>>                                       Error **errp)
>>  {
>>      Error *err = NULL;
>> -    SocketAddress *saddr = socket_parse(str, &err);
>> +    SocketAddress *dst_saddr = socket_parse(dst_str, &err);
>> +    if (!err) {
>> +        socket_start_outgoing_migration_internal(s, dst_saddr, &err);
>> +    }
>> +    error_propagate(errp, err);
>> +}
>> +
>> +void init_multifd_array(int length)
>> +{
>> +    outgoing_migrate_params.socket_args = g_new0(struct SocketArgs, length);
>> +    outgoing_migrate_params.length = length;
>> +    outgoing_migrate_params.total_multifd_channel = 0;
>> +}
>> +
>> +void store_multifd_migration_params(const char *dst_uri,
>> +                                    const char *src_uri,
>> +                                    uint8_t multifd_channels,
>> +                                    int idx, Error **errp)
>> +{
>> +    Error *err = NULL;
>> +    SocketAddress *src_addr = NULL;
>> +    SocketAddress *dst_addr = socket_parse(dst_uri, &err);
>> +    if (src_uri) {
>> +        src_addr = socket_parse(src_uri, &err);
>> +    }
>>      if (!err) {
>> -        socket_start_outgoing_migration_internal(s, saddr, &err);
>> +        outgoing_migrate_params.socket_args[idx].data.dst_addr = dst_addr;
>> +        outgoing_migrate_params.socket_args[idx].data.src_addr = src_addr;
>> +        outgoing_migrate_params.socket_args[idx].multifd_channels
>> +                                                         = multifd_channels;
>> +        outgoing_migrate_params.total_multifd_channel += multifd_channels;
>>      }
>>      error_propagate(errp, err);
>>  }
>> diff --git a/migration/socket.h b/migration/socket.h
>> index 891dbccceb..bba7f177fe 100644
>> --- a/migration/socket.h
>> +++ b/migration/socket.h
>> @@ -19,12 +19,27 @@
>>  
>>  #include "io/channel.h"
>>  #include "io/task.h"
>> +#include "migration.h"
>> +
>> +/* info regarding destination and source uri */
>> +struct SrcDestAddr {
>> +    SocketAddress *dst_addr;
>> +    SocketAddress *src_addr;
>> +};
>>  
>>  void socket_send_channel_create(QIOTaskFunc f, void *data);
>>  int socket_send_channel_destroy(QIOChannel *send);
>>  
>>  void socket_start_incoming_migration(const char *str, Error **errp);
>>  
>> -void socket_start_outgoing_migration(MigrationState *s, const char *str,
>> +void socket_start_outgoing_migration(MigrationState *s, const char *dst_str,
>>                                       Error **errp);
>> +
>> +int multifd_list_length(MigrateUriParameterList *list);
>> +
>> +void init_multifd_array(int length);
>> +
>> +void store_multifd_migration_params(const char *dst_uri, const char 
>> *src_uri,
>> +                                    uint8_t multifd_channels, int idx,
>> +                                    Error **erp);
>>  #endif
>> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
>> index 622c783c32..2db539016a 100644
>> --- a/monitor/hmp-cmds.c
>> +++ b/monitor/hmp-cmds.c
>> @@ -56,6 +56,9 @@
>>  #include "migration/snapshot.h"
>>  #include "migration/misc.h"
>>  
>> +/* Default number of multi-fd channels */
>> +#define DEFAULT_MIGRATE_MULTIFD_CHANNELS 2
>> +
>>  #ifdef CONFIG_SPICE
>>  #include <spice/enums.h>
>>  #endif
>> @@ -1574,10 +1577,25 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
>>      bool inc = qdict_get_try_bool(qdict, "inc", false);
>>      bool resume = qdict_get_try_bool(qdict, "resume", false);
>>      const char *uri = qdict_get_str(qdict, "uri");
>> +
>> +    const char *src_uri = qdict_get_str(qdict, "source-uri");
>> +    const char *dst_uri = qdict_get_str(qdict, "destination-uri");
>> +    uint8_t multifd_channels = qdict_get_try_int(qdict, "multifd-channels",
>> +                                        DEFAULT_MIGRATE_MULTIFD_CHANNELS);
>>      Error *err = NULL;
>> +    MigrateUriParameterList *caps = NULL;
>> +    MigrateUriParameter *value;
>> +
>> +    value = g_malloc0(sizeof(*value));
>> +    value->source_uri = (char *)src_uri;
>> +    value->destination_uri = (char *)dst_uri;
>> +    value->multifd_channels = multifd_channels;
>> +    QAPI_LIST_PREPEND(caps, value);
>> +
>> +    qmp_migrate(uri, !!caps, caps, !!blk, blk, !!inc,
>> +                inc, false, false, true, resume, &err);
>> +    qapi_free_MigrateUriParameterList(caps);
>>  
>> -    qmp_migrate(uri, !!blk, blk, !!inc, inc,
>> -                false, false, true, resume, &err);
>>      if (hmp_handle_error(mon, err)) {
>>          return;
>>      }
> 
> Please split the HMP changes into a separate patch.
> 
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 6130cd9fae..fb259d626b 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -1454,12 +1454,38 @@
>>  ##
>>  { 'command': 'migrate-continue', 'data': {'state': 'MigrationStatus'} }
>>  
>> +##
>> +# @MigrateUriParameter:
>> +#
>> +# Information regarding which source interface is connected to which
>> +# destination interface and number of multifd channels over each interface.
>> +#
>> +# @source-uri: the Uniform Resource Identifier of the source VM.
>> +#              Default port number is 0.
>> +#
>> +# @destination-uri: the Uniform Resource Identifier of the destination VM
> 
> I would just say 'uri' rather than spelling it out.
> 
>> +# @multifd-channels: number of parallel multifd channels used to migrate 
>> data
>> +#                    for specific source-uri and destination-uri. Default 
>> value
>> +#                    in this case is 2 (Since 4.0)
> 
> 7.1 at the moment.
> 
>> +#
>> +##
>> +{ 'struct' : 'MigrateUriParameter',
>> +  'data' : { 'source-uri' : 'str',
>> +             'destination-uri' : 'str',
>> +             '*multifd-channels' : 'uint8'} }
> 
> OK, so much higher level question - why do we specify both URIs on
> each end?  Is it just the source that is used on the source side to say
> which NIC to route down?  On the destination side I guess there's no
> need?
> 
> Do we have some rule about needing to specify enough channels for all
> the multifd channels we specify (i.e. if we specify 4 multifd channels
> in the migration parameter do we have to supply 4 channels here?)
> What happens with say Peter's preemption channel?
> 
> Is there some logical ordering rule; i.e. if we were to start ordering
> particular multifd threads, then can we say that we allocate these
> channels in the same order as this list?
> 
>>  ##
>>  # @migrate:
>>  #
>>  # Migrates the current running guest to another Virtual Machine.
>>  #
>>  # @uri: the Uniform Resource Identifier of the destination VM
>> +#       for migration thread
>> +#
>> +# @multi-fd-uri-list: list of pair of source and destination VM Uniform
>> +#                     Resource Identifiers with number of multifd-channels
>> +#                     for each pair
>>  #
>>  # @blk: do block migration (full disk copy)
>>  #
>> @@ -1479,20 +1505,27 @@
>>  # 1. The 'query-migrate' command should be used to check migration's 
>> progress
>>  #    and final result (this information is provided by the 'status' member)
>>  #
>> -# 2. All boolean arguments default to false
>> +# 2. The uri argument should have the Uniform Resource Identifier of default
>> +#    destination VM. This connection will be bound to default network
>> +#
>> +# 3. All boolean arguments default to false
>>  #
>> -# 3. The user Monitor's "detach" argument is invalid in QMP and should not
>> +# 4. The user Monitor's "detach" argument is invalid in QMP and should not
>>  #    be used
>>  #
>>  # Example:
>>  #
>> -# -> { "execute": "migrate", "arguments": { "uri": "tcp:0:4446" } }
>> +# -> { "execute": "migrate",
>> +#                 "arguments": { "uri": "tcp:0:4446", "multi-fd-uri-list": 
>> [ {
>> +#                                "source-uri": "tcp::6900", 
>> "destination-uri": "tcp:0:4480",
>> +#                                "multifd-channels": 4}, { "source-uri": 
>> "tcp:10.0.0.0: ",
>> +#                                "destination-uri": "tcp:11.0.0.0:7789", 
>> "multifd-channels": 5} ] } }
>>  # <- { "return": {} }
>>  #
>>  ##
>>  { 'command': 'migrate',
>> -  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
>> -           '*detach': 'bool', '*resume': 'bool' } }
>> +  'data': {'uri': 'str', '*multi-fd-uri-list': ['MigrateUriParameter'], 
>> '*blk': 'bool',
>> +           '*inc': 'bool', '*detach': 'bool', '*resume': 'bool' } }
>>  
>>  ##
>>  # @migrate-incoming:
>> -- 
>> 2.22.3
>>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]