Re: [PATCH] migration: support file: uri for source migration

qemu-devel

From:	Daniel P . Berrangé
Subject:	Re: [PATCH] migration: support file: uri for source migration
Date:	Mon, 12 Sep 2022 16:41:47 +0100
User-agent:	Mutt/2.2.6 (2022-06-05)

On Thu, Sep 08, 2022 at 01:26:32PM +0300, Nikolay Borisov wrote:
> This is a prototype of supporting a 'file:' based uri protocol for
> writing out the migration stream of qemu. Currently the code always
> opens the file in DIO mode and adheres to an alignment of 64k to be
> generic enough. However this comes with a problem - it requires copying
> all data that we are writing (qemu metadata + guest ram pages) to a
> bounce buffer so that we adhere to this alignment.

The ad hoc device metadata clearly needs bounce buffers since it
is scattered all over RAM with no concern for alignment. The use
of bounce buffers for this shouldn't be a performance issue, though,
as the metadata is small relative to the size of the snapshot as a whole.
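A minimal sketch of bouncing the metadata, assuming the 64 KiB alignment from the patch description and POSIX `posix_memalign`/`pwrite`. The `MetaBounce` type and `meta_bounce_*` helpers are hypothetical, not QEMU code: small, arbitrarily placed metadata chunks are copied into one aligned buffer, and each flush writes a full, zero-padded aligned block at an aligned file offset.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Alignment taken from the patch description (64 KiB); the real O_DIRECT
 * requirement depends on the filesystem and the device. */
#define DIO_ALIGN ((size_t)64 * 1024)

/* Hypothetical staging buffer for device metadata. */
typedef struct {
    unsigned char *buf;  /* DIO_ALIGN-aligned, DIO_ALIGN bytes long */
    size_t used;         /* bytes staged but not yet written */
    int fd;              /* destination (would be opened with O_DIRECT) */
    off_t off;           /* next file offset; should start DIO_ALIGN-aligned */
} MetaBounce;

static int meta_bounce_init(MetaBounce *b, int fd, off_t off)
{
    void *p = NULL;
    if (posix_memalign(&p, DIO_ALIGN, DIO_ALIGN) != 0)
        return -1;
    b->buf = p;
    b->used = 0;
    b->fd = fd;
    b->off = off;
    return 0;
}

/* Write the staged bytes as one full aligned block, zero-padding the tail. */
static int meta_bounce_flush(MetaBounce *b)
{
    if (b->used == 0)
        return 0;
    memset(b->buf + b->used, 0, DIO_ALIGN - b->used);
    if (pwrite(b->fd, b->buf, DIO_ALIGN, b->off) != (ssize_t)DIO_ALIGN)
        return -1;
    b->off += (off_t)DIO_ALIGN;
    b->used = 0;
    return 0;
}

/* Copy a metadata chunk from wherever it lives into the aligned buffer,
 * flushing whenever the buffer fills up. */
static int meta_bounce_append(MetaBounce *b, const void *data, size_t len)
{
    const unsigned char *src = data;
    while (len > 0) {
        size_t n = DIO_ALIGN - b->used;
        if (n > len)
            n = len;
        memcpy(b->buf + b->used, src, n);
        b->used += n;
        src += n;
        len -= n;
        if (b->used == DIO_ALIGN && meta_bounce_flush(b) != 0)
            return -1;
    }
    return 0;
}
```

Because the metadata is small, the extra `memcpy` here is cheap; a real format would also have to record the unpadded length so a reader can skip the zero padding.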

The guest RAM pages should not need bounce buffers at all when using
huge pages, as their alignment will already be far larger than we require.
Guests with huge pages are the ones which are likely to have huge
RAM sizes and thus need the DIO mode, so we should be sorted for that.

When using small pages for guest RAM, if it is not already allocated
with suitable alignment, I feel like we should be able to make it
so that we allocate the RAM block with good alignment to avoid the
need for bounce buffers. This would address the less common case of
a guest with huge RAM size but not huge pages.

Thus if we assume guest RAM is suitably aligned, then we can avoid
bounce buffers for RAM pages, while still using bounce buffers for
the metadata.
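A minimal sketch of the "assume guest RAM is suitably aligned" idea, using POSIX `posix_memalign` and the patch's assumed 64 KiB alignment. The helpers `alloc_ram_block` and `dio_direct_ok` are hypothetical; QEMU has its own RAM allocation paths and this is not them. A 2 MiB huge page is 32 × 64 KiB, so huge-page-backed RAM passes the check trivially, while small-page RAM passes only if it was allocated on an aligned boundary; unaligned metadata still fails and would take the bounce path.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment, from the patch description. */
#define DIO_ALIGN ((size_t)64 * 1024)

/* True if a region could go straight to an O_DIRECT fd with no bounce copy:
 * memory address, length and file offset must all be aligned. */
static bool dio_direct_ok(const void *ptr, size_t len, uint64_t file_off)
{
    return ((uintptr_t)ptr % DIO_ALIGN) == 0 &&
           (len % DIO_ALIGN) == 0 &&
           (file_off % DIO_ALIGN) == 0;
}

/* Hypothetical allocator for a small-page RAM block: start it on a
 * DIO_ALIGN boundary and round its size up to a whole number of units. */
static void *alloc_ram_block(size_t size, size_t *out_len)
{
    size_t len = (size + DIO_ALIGN - 1) / DIO_ALIGN * DIO_ALIGN;
    void *p = NULL;
    if (posix_memalign(&p, DIO_ALIGN, len) != 0)
        return NULL;
    *out_len = len;
    return p;
}
```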

>                                                    With this code I get
> the following performance results:
> 
>         DIO    exec: cat > file    virsh --bypass-cache
>         82     77                  81
>         82     78                  80
>         80     80                  82
>         82     82                  77
>         77     79                  77
> 
> AVG:    80.6   79.2                79.4
> stddev: 1.959  1.720               2.05
> 
> All numbers are in seconds.
> 
> Those results are somewhat surprising to me as I'd expected doing the
> writeout directly within qemu and avoiding copying between qemu and
> virsh's iohelper process would result in a speedup. Clearly that's not
> the case; I attribute this to the fact that all memory pages have to be
> copied into the bounce buffer. There is more measurement/profiling
> work that I'd have to do in order to (dis)prove this hypothesis and will
> report back when I have the data.
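The AVG and stddev rows in the quoted table can be reproduced from the five runs in each column. The sketch below (array names are illustrative) computes the mean and the population variance, dividing by n rather than n - 1. The variances come out as 3.84, 2.96 and 4.24, whose square roots are about 1.9596, 1.7205 and 2.0591, so the quoted stddev figures match the population formula and appear to be truncated rather than rounded; the sample (n - 1) standard deviations would be roughly 2.19, 1.92 and 2.30. The standard deviation itself is left as a comment to avoid a libm dependency.

```c
#include <stddef.h>

/* Run times in seconds, transcribed from the quoted table. */
static const double dio[]   = {82, 82, 80, 82, 77};
static const double cat[]   = {77, 78, 80, 82, 79};
static const double virsh[] = {81, 80, 82, 77, 77};

static double mean(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s / n;
}

/* Population variance (divides by n, not n - 1);
 * the population standard deviation is sqrt() of this value. */
static double pop_variance(const double *x, size_t n)
{
    double m = mean(x, n), s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += (x[i] - m) * (x[i] - m);
    return s / n;
}
```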

When using the libvirt iohelper we have multiple CPUs involved. IOW the
bounce buffer copy is taking place on a separate CPU from the QEMU
migration loop. This ability to use multiple CPUs may well have balanced
out any benefit from doing DIO on the QEMU side.
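To illustrate the split described above, here is a minimal sketch that is not libvirt's actual iohelper code: a parent process, standing in for the QEMU migration loop, writes the stream into a pipe, and a forked helper process drains the pipe and writes to the output fd. Because they are separate processes, the scheduler can run the copy on a different CPU from the producer. The names `save_via_helper` and `helper_copy` are hypothetical, and the helper uses plain `write()` rather than the aligned O_DIRECT writes the real helper would do.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Helper side: copy everything from the pipe into out_fd. In a real
 * helper this is where the bounce-buffered O_DIRECT writes would happen. */
static int helper_copy(int in_fd, int out_fd)
{
    char buf[65536];
    ssize_t n;
    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : 0;
}

/* Producer side: stands in for the migration loop emitting the stream. */
static int save_via_helper(const void *data, size_t len, int out_fd)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        /* Helper process: free to run on another CPU. */
        close(p[1]);
        _exit(helper_copy(p[0], out_fd) == 0 ? 0 : 1);
    }
    close(p[0]);
    const unsigned char *src = data;
    size_t done = 0;
    int rc = 0;
    while (done < len) {
        ssize_t w = write(p[1], src + done, len - done);
        if (w < 0) {
            rc = -1;
            break;
        }
        done += (size_t)w;
    }
    close(p[1]);  /* EOF tells the helper the stream is complete */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        rc = -1;
    return rc;
}
```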

If you eliminate bounce buffers for guest RAM and write it directly to
the fixed location on disk, then we should see the benefit; if not,
then something is really wrong in our reasoning.

> However I'm sending the code now as I'd like to facilitate a discussion
> as to whether this is an approach that would be acceptable to upstream
> merging. Any ideas/comments would be much appreciated.

AFAICT this impl is still using the existing on-disk format, where RAM
pages are just written inline to the stream. For the DIO benefit to be
maximised we need the on-disk format to be changed, so that the guest
RAM regions can be directly associated with fixed locations on disk.
This also means that if the guest dirties RAM while it is being saved,
we overwrite the existing content on disk, such that restore only ever
needs to restore each RAM page once, instead of restoring every
dirtied version.
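A minimal sketch of the fixed-location idea, under stated assumptions: a single RAM block, a 4 KiB guest page size, and a 64 KiB header area placed before the RAM region. None of this is QEMU's migration format, and `open_snapshot`, `page_file_offset`, `save_page` and `load_page` are hypothetical names. Each page's file offset depends only on its index, so re-saving a page the guest dirtied overwrites its slot in place, the file never grows beyond one copy per page, and restore reads each page exactly once.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed layout: a fixed header area, then guest RAM page N always at
 * HEADER_SIZE + N * GUEST_PAGE_SIZE. Sizes are illustrative. */
#define GUEST_PAGE_SIZE 4096u
#define HEADER_SIZE ((uint64_t)64 * 1024)

/* A real implementation would add O_DIRECT (Linux, _GNU_SOURCE) and write
 * aligned runs of pages; buffered I/O keeps this sketch portable. */
static int open_snapshot(const char *path)
{
    return open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
}

static off_t page_file_offset(uint64_t ram_base, uint64_t page_index)
{
    return (off_t)(ram_base + page_index * GUEST_PAGE_SIZE);
}

/* Save a page, or re-save it after the guest dirtied it: the write lands
 * on the same offset, so the file only ever holds the latest copy. */
static int save_page(int fd, uint64_t ram_base, uint64_t idx, const void *page)
{
    ssize_t n = pwrite(fd, page, GUEST_PAGE_SIZE, page_file_offset(ram_base, idx));
    return n == (ssize_t)GUEST_PAGE_SIZE ? 0 : -1;
}

/* Restore reads each page once, from its fixed slot. */
static int load_page(int fd, uint64_t ram_base, uint64_t idx, void *page)
{
    ssize_t n = pread(fd, page, GUEST_PAGE_SIZE, page_file_offset(ram_base, idx));
    return n == (ssize_t)GUEST_PAGE_SIZE ? 0 : -1;
}
```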

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
