The load/store API does not scale once bitmaps reach 1 MB or more.
For example, a 500 GB disk image with 64 KB granularity requires a 1 MB
bitmap. If a guest has several disk images of this size, then multiple
megabytes must be read to start the guest and written out to shut down
the guest.

By comparison, the L1 table for the 500 GB disk image is less than 8 KB.

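To make the numbers above concrete, here is a rough calculation. It
assumes one bit per 64 KB chunk for the bitmap, and the usual qcow2
layout of 64 KB clusters with 8-byte table entries for the L1 table.
The function names are made up for this example.

```python
# Back-of-the-envelope sizes. The cluster size and entry width below are
# assumptions (64 KB clusters, 8-byte table entries), not values read
# from an actual image.
GiB = 1024 ** 3
KiB = 1024


def bitmap_bytes(disk_bytes, granularity_bytes):
    """One bit per granularity-sized chunk, rounded up to whole bytes."""
    bits = -(-disk_bytes // granularity_bytes)  # ceiling division
    return -(-bits // 8)


def l1_table_bytes(disk_bytes, cluster_bytes=64 * KiB, entry_bytes=8):
    """One L1 entry per L2 table; each L2 table occupies one cluster."""
    l2_entries = cluster_bytes // entry_bytes
    bytes_per_l2 = l2_entries * cluster_bytes
    l1_entries = -(-disk_bytes // bytes_per_l2)
    return l1_entries * entry_bytes


disk = 500 * GiB
print(bitmap_bytes(disk, 64 * KiB))  # 1024000 bytes, about 1 MB
print(l1_table_bytes(disk))          # 8000 bytes, just under 8 KB
```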
I think something like qcow2-cache.c or metabitmaps should be used to
lazily read/write persistent bitmaps. That way only small portions need
to be read/written at a time.
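
As a rough illustration of the lazy approach (not how qcow2-cache.c
works, and not a proposed API), the sketch below loads bitmap chunks
only when a bit in them is touched and writes back only the chunks that
changed. LazyBitmap, the 4 KB chunk size, and the file-like backing
object are all assumptions made for this example. A real cache would
also bound memory use and evict chunks.

```python
import io


class LazyBitmap:
    """Hypothetical sketch: chunked, on-demand access to a stored bitmap."""

    def __init__(self, backing, size_bytes, chunk_bytes=4096):
        self.backing = backing          # seekable binary file-like object
        self.size_bytes = size_bytes    # total bitmap size in bytes
        self.chunk_bytes = chunk_bytes  # assumed chunk size
        self.chunks = {}                # chunk index -> bytearray
        self.dirty = set()              # chunk indexes modified in memory

    def _locate(self, bit):
        if not 0 <= bit < self.size_bytes * 8:
            raise IndexError("bit out of range")
        byte, shift = divmod(bit, 8)
        index, offset = divmod(byte, self.chunk_bytes)
        return index, offset, shift

    def _chunk(self, index):
        # Read the chunk from the backing store on first access only.
        if index not in self.chunks:
            start = index * self.chunk_bytes
            length = min(self.chunk_bytes, self.size_bytes - start)
            self.backing.seek(start)
            data = self.backing.read(length)
            self.chunks[index] = bytearray(data.ljust(length, b"\0"))
        return self.chunks[index]

    def test(self, bit):
        index, offset, shift = self._locate(bit)
        return bool((self._chunk(index)[offset] >> shift) & 1)

    def set(self, bit):
        index, offset, shift = self._locate(bit)
        self._chunk(index)[offset] |= 1 << shift
        self.dirty.add(index)

    def flush(self):
        # Write back only the chunks that were modified.
        for index in sorted(self.dirty):
            self.backing.seek(index * self.chunk_bytes)
            self.backing.write(self.chunks[index])
        self.dirty.clear()


backing = io.BytesIO(bytes(1024000))  # stand-in for the 1 MB bitmap on disk
bm = LazyBitmap(backing, 1024000)
bm.set(5_000_000)
bm.flush()
print(len(bm.chunks))  # 1: only one 4 KB chunk was read and written
```

With this shape, setting one bit touches a single 4 KB chunk rather
than the whole 1 MB bitmap.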

Stefan