On Mon, May 30, 2022 at 08:07:35PM +0300, Avihai Horon wrote:
+/* Returns 1 if end-of-stream is reached, 0 if more data and -1 if error */
+static int vfio_save_block(QEMUFile *f, VFIOMigration *migration)
+{
+    ssize_t data_size;
+
+    data_size = read(migration->data_fd, migration->data_buffer,
+                     migration->data_buffer_size);
+    if (data_size < 0) {
+        return -1;
+    }
+    if (data_size == 0) {
+        return 1;
+    }
+
+    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE);
+    qemu_put_be64(f, data_size);
+    qemu_put_buffer_async(f, migration->data_buffer, data_size, false);
+    qemu_fflush(f);
+    bytes_transferred += data_size;
+
+    trace_vfio_save_block(migration->vbasedev->name, data_size);
+
+    return qemu_file_get_error(f);
+}
We looked at this with an eye to how much data is transferred per
callback.

The above function is the basic data mover, and
'migration->data_buffer_size' is currently set to 1MB. So we produce
VFIO_MIG_FLAG_DEV_DATA_STATE sections of up to 1MB each.
This series does not include precopy support, but that will add a
'save_live_iterate' callback along these lines:
static int vfio_save_iterate(QEMUFile *f, void *opaque)
{
    VFIODevice *vbasedev = opaque;
    VFIOMigration *migration = vbasedev->migration;
    int ret;

    ret = vfio_save_block(f, migration);
    if (ret < 0) {
        return ret;
    }
    if (ret == 1) {
        return 1;
    }
    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
    return 0;
}
Thus, during precopy this will never do more than 1MB per callback.
+static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+    enum vfio_device_mig_state recover_state;
+    int ret;
+
+    /* We reach here with device state STOP or STOP_COPY only */
+    recover_state = VFIO_DEVICE_STATE_STOP;
+    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
+                                   recover_state);
+    if (ret) {
+        return ret;
+    }
+
+    do {
+        ret = vfio_save_block(f, vbasedev->migration);
+        if (ret < 0) {
+            return ret;
+        }
+    } while (!ret);
This seems to be the main problem: we chain together 1MB blocks until
the entire STOP_COPY data has been transferred. The above is hooked to
'save_live_complete_precopy'.
So, if we want to break the above up into some 'save_iterate'-like
function, do you have any advice on how to do it? The above do/while
must only run after the device has entered VFIO_DEVICE_STATE_STOP_COPY.
For mlx5, the above loop will often move ~10MB for small VMs and
~100MB for big VMs (big meaning making extensive use of RDMA
functionality), and this will not change whether pre-copy is supported
or not. Is that still a problem?
For other devices, like a GPU, I would imagine pre-copy support will be
implemented and the post-precopy residual will be smaller.
Jason