qemu-discuss

Virtual hard disks slow to back up because they are single big files


From: R. Diez
Subject: Virtual hard disks slow to back up because they are single big files
Date: Thu, 24 Dec 2020 12:46:15 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

Hi all:

I have written a Bash script to stop and back up my libvirt/KVM/QEMU virtual 
machine every night:

https://github.com/rdiez/Tools/blob/master/VirtualMachineManager/BackupVm.sh

There are ways to optimise it, but this simple method works well for me.

The trouble is that backing up the virtual hard disk takes a long time.

I could optimise the backup with LVM snapshots or with a deduplicating backup tool like BorgBackup. I could also move some files from the VM's main virtual disk to a separate virtual disk and then back up the files inside it (rather than the whole separate virtual disk). Faster hardware would of course help too.

But one issue stays the same: every time, any backup tool must read the whole of the big file that holds the main virtual hard disk (where the OS is installed).

As far as I can see, there are two main virtual disk formats: qcow2 and raw. But
both store the disk as a single huge file.

Overlays or "backing chains" may alleviate the problem, but we would only be moving the big file problem from one big file to another. At some point in time, you need to merge the changes back. So it would not really solve the problem, and it would add administration costs.

There is an article on LWN that discusses the issue:

  Changed-block tracking and differential backups in QEMU
  https://lwn.net/Articles/837053/

But the proposed implementation sounds rather complicated.

Wouldn't it be easier if QEMU could partition a raw virtual disk into several files? For example, a 200 GB virtual disk could be made up of 100 files, each with 2 GB of data. This way, a backup tool could check the "last modified time" to know whether a particular chunk has changed since the last backup.
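
To make the arithmetic concrete, here is a minimal sketch of how a byte offset on the virtual disk could map to a chunk file. Nothing like this exists in QEMU as far as this message knows: the file naming scheme (disk.raw.0000, disk.raw.0001, ...), the function names, and the use of decimal gigabytes are all assumptions made for illustration.

```python
# Hypothetical layout: a raw disk split into fixed-size chunk files.
CHUNK_SIZE = 2 * 10**9    # 2 GB per chunk (decimal GB is an assumption)
DISK_SIZE = 200 * 10**9   # 200 GB virtual disk, as in the example


def chunk_count(disk_size=DISK_SIZE, chunk_size=CHUNK_SIZE):
    """Number of chunk files needed, rounding up for a partial last chunk."""
    return -(-disk_size // chunk_size)


def locate(offset, chunk_size=CHUNK_SIZE):
    """Map a disk byte offset to (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, chunk_size)


def chunk_name(index, base="disk.raw"):
    """Hypothetical file name for a chunk, e.g. disk.raw.0002."""
    return f"{base}.{index:04d}"


if __name__ == "__main__":
    print(chunk_count())                # number of chunk files
    index, inner = locate(5 * 10**9 + 7)
    print(chunk_name(index), inner)     # which file a given write would touch
```

A write to the guest disk would then only touch the one or two chunk files that cover the written range, leaving the modification time of all other chunks unchanged.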

It is not very high tech, but it would considerably reduce the reading/scanning phase when doing backups. And the implementation would be pretty straightforward.
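
On the backup side, the scanning phase could look roughly like the sketch below. It assumes the VM is stopped during the backup (as in the script above), that each chunk is an ordinary file in one directory, and that the time of the last backup is known as seconds since the epoch. The function names and the directory layout are hypothetical.

```python
import os
import shutil


def changed_chunks(chunk_dir, last_backup_time):
    """Return the names of chunk files modified after last_backup_time."""
    changed = []
    with os.scandir(chunk_dir) as entries:
        for entry in entries:
            # Only the file metadata is read here, not the file contents.
            if entry.is_file() and entry.stat().st_mtime > last_backup_time:
                changed.append(entry.name)
    return sorted(changed)


def backup_changed(chunk_dir, backup_dir, last_backup_time):
    """Copy only the changed chunks into backup_dir and return their names."""
    os.makedirs(backup_dir, exist_ok=True)
    copied = changed_chunks(chunk_dir, last_backup_time)
    for name in copied:
        # copy2 also preserves the modification time of the chunk.
        shutil.copy2(os.path.join(chunk_dir, name),
                     os.path.join(backup_dir, name))
    return copied
```

With 100 chunks, the scan costs 100 stat calls instead of reading 200 GB, and only the chunks that actually changed are read and copied.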

Regards,
  rdiez

