bug#9500: [PATCH]: use posix_fallocate where supported
From: Pádraig Brady
Subject: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Wed, 23 Nov 2011 00:49:11 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0

On 09/14/2011 03:46 PM, Pádraig Brady wrote:
> On 09/14/2011 03:06 PM, Eric Blake wrote:
>> On 09/13/2011 11:55 PM, Kelly Anderson wrote:
>>> Hi,
>>>
>>> I put together a patch 2 or 3 years ago (back when posix_fallocate was
>>> first introduced in glibc).
>>
>> Thanks for the effort. However, this has been discussed in the past, and
>> the consensus was that we should first write a patch to gnulib that provides
>> a posix_fallocate() stub for all platforms, so that coreutils can
>> unconditionally call posix_fallocate, rather than making coreutils have to
>> use #ifdef. Among other things, a gnulib module would make it possible to
>> emulate posix_fallocate() even on older glibc where it is missing or broken.
>>
>
> Also we probably want fallocate() for this use case
> rather than posix_fallocate() in any case,
> as we don't want to fall back to writing zeros.
>
> Also I had a whole lot of fallocate() things to try
> once the fiemap() stuff landed, but unfortunately
> that doesn't work reliably on all file systems
> and is currently restricted to sparse files.
> So I need to dig out my notes on how to apply
> fallocate() to files with holes and "empty portions" again.

I thought a little about this today.
fallocate() is a feature to quickly allocate space in a file system.
It's useful for 3 things as far as I can see:
1. Improved file layout for subsequent access
2. Immediate indication of ENOSPC
3. Efficient writing of NUL portions
Note that item 1 is somewhat moot with newer file systems that do "delayed allocation".
So what do we need to consider when using fallocate on the destination file?
Considering just cp for the moment, its inputs impacting this are the options:
--sparse={auto,always,never}
Note that with no --sparse option specified we behave as with --sparse=auto,
where we try to detect holes based on st_size vs st_blocks.
The other significant input is the construction of the source file.
Now data in a file can generally be classed into 4 types:
Data: normal data
Zero: normal data containing only NULs
Hole: unallocated data containing only NULs
Empty: allocated data containing only NULs
One can have any of the above types at any point in the file.
Also 'Empty' is special in that it can extend beyond the apparent size.
In fact this tail allocation is common on XFS for performance reasons.
An important factor is how well we can distinguish the above data classes.
There are currently three possible identification options:
Heuristics
This is used by default to see if holes might be present.
The test is simply whether st_size is larger than the space accounted for by the allocated st_blocks.
Note, this can fail for example in the case where there is
a tail allocation not accounted for in the size like:
+-----------+---+
| D | E | H | E |
+-----------+---+
Traditionally when a sparse source is detected we check input blocks
for all zeros and create a 'Hole' in the destination instead.
This is inefficient as it requires reading all the NUL data
and verifying that it is in fact NUL.
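The traditional approach can be sketched as below. This is a minimal illustration under assumptions, not the actual cp code: the names `is_nul` and `write_or_skip` are hypothetical, and a real copy would also have to handle EINTR and pick a sensible block size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Return true if the LEN bytes at BUF are all NUL.  Every byte has to
   be inspected, which is the cost described above.  */
bool
is_nul (const char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (buf[i] != '\0')
      return false;
  return true;
}

/* Write one block to FD, or seek over it when it is all NULs so that
   the file system can leave a hole.  Returns 0 on success, -1 on error.
   If the file ends in a hole, the caller still has to ftruncate() it
   to the final size.  */
int
write_or_skip (int fd, const char *buf, size_t len)
{
  assert (buf != NULL);

  if (is_nul (buf, len))
    return lseek (fd, (off_t) len, SEEK_CUR) < 0 ? -1 : 0;

  while (len > 0)
    {
      ssize_t n = write (fd, buf, len);
      if (n < 0)
        return -1;
      buf += n;
      len -= (size_t) n;
    }
  return 0;
}
```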
SEEK_HOLE
Available on Linux since 3.1
'Empty' is treated like a 'Hole' which at least
allows 'Empty' portions to be processed quickly by `cp`.
We lose the ability to copy the allocation from src to dst.
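To make the SEEK_HOLE approach concrete, here is a small sketch that walks a file with SEEK_DATA/SEEK_HOLE and adds up the data ranges without reading them. The helper name `count_data_bytes` is hypothetical; a copy loop would read and write each range where the sketch only counts it. The fallback macro values are the ones Linux uses and are an assumption for other systems.

```c
#define _GNU_SOURCE             /* for SEEK_DATA and SEEK_HOLE in glibc */
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback in case the feature macro did not take effect; these are
   the values Linux uses.  */
#ifndef SEEK_DATA
# define SEEK_DATA 3
# define SEEK_HOLE 4
#endif

/* Hypothetical helper: sum the bytes of FD that lseek() reports as
   data.  Ranges reported as holes are skipped without being read.
   Moves the file offset.  Returns -1 on error.  */
off_t
count_data_bytes (int fd)
{
  assert (fd >= 0);

  off_t total = 0;
  off_t pos = 0;

  for (;;)
    {
      off_t data = lseek (fd, pos, SEEK_DATA);
      if (data < 0)
        return errno == ENXIO ? total : -1;   /* ENXIO: no data after POS */

      off_t hole = lseek (fd, data, SEEK_HOLE);
      if (hole <= data)
        return -1;                            /* error, or file changed */

      total += hole - data;   /* a real copy would read and write here */
      pos = hole;
    }
}
```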
fiemap
Available on Linux since around 2.6.28
Gives greater control by distinguishing Hole and Empty,
thus allowing us to both efficiently copy and maintain allocation.
Requires sync on ext4, xfs
Code already done and used (with sync) for sparse files
Note that not being able to use fiemap with non-sparse files
means that we need to read() the empty extents, which is
inefficient, especially in --sparse=always mode.
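For reference, a minimal sketch of querying extents with the FIEMAP ioctl is shown below. It is an illustration under assumptions rather than the code used in cp: the names `classify_extent` and `get_extents` are hypothetical, and treating every extent that is not flagged FIEMAP_EXTENT_UNWRITTEN as 'Data' is a simplification.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, FIEMAP_* */

/* Classify one extent using the letters from the text: 'E' (Empty) for
   allocated but unwritten space, 'D' for everything else.  Holes are
   not reported as extents; they are the gaps between extents.  */
char
classify_extent (uint32_t fe_flags)
{
  return (fe_flags & FIEMAP_EXTENT_UNWRITTEN) ? 'E' : 'D';
}

/* Fetch up to MAX extents of FD into a newly allocated struct fiemap.
   FIEMAP_FLAG_SYNC asks the kernel to sync the file before mapping,
   which is the sync cost mentioned above.  Returns NULL on error, for
   example when the file system does not support the ioctl.  The caller
   frees the result.  */
struct fiemap *
get_extents (int fd, uint32_t max)
{
  assert (max > 0);

  size_t size = sizeof (struct fiemap) + max * sizeof (struct fiemap_extent);
  struct fiemap *fm = calloc (1, size);
  if (!fm)
    return NULL;

  fm->fm_start = 0;
  fm->fm_length = FIEMAP_MAX_OFFSET;   /* map the whole file */
  fm->fm_flags = FIEMAP_FLAG_SYNC;
  fm->fm_extent_count = max;

  if (ioctl (fd, FS_IOC_FIEMAP, fm) < 0)
    {
      free (fm);
      return NULL;
    }
  return fm;   /* fm->fm_mapped_extents entries in fm->fm_extents[] */
}
```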
So given the above info, what functionality might the use
of fallocate() make available to cp?
Exact copy from source to dest:
Copying the source layout would mean that one could, for example,
create a backup copy of a large db file, which could then be used
without worrying about fragmentation or ENOSPC issues.
There is the argument that this might be better as a higher level
file operation anyway, and perhaps `cp --reflink` might cover
this use case on some file systems at least.
fiemap gives us the most control, allowing us to copy even tail
allocations from source to destination. But the sync issue
makes it unusable in general at present, and it is currently
restricted to sparse files, where it's used to avoid reading
'Empty' and 'Hole' portions.
Copying sparse files
It's worth noting again the caveat mentioned above: we
might not recognise some sparse files due to tail allocation.
Given that we use fiemap (with sync) for sparse files at present,
we can augment the fiemap copying code to use fallocate where appropriate.
So depending on the options, the operations would be:
--sparse=auto => 'Empty' -> 'Empty'
--sparse=always => 'Empty' -> 'Hole' && discard tail allocation
--sparse=never => 'Hole' -> 'Empty'
Perhaps the first case could be simplified to initially doing:
fallocate(dest, blocks*blocksize)
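The per-option mapping listed above can be written as a small pure function. This is only a sketch: the enum and function names are hypothetical, 'Zero' regions are treated as 'Data' because extent information alone cannot tell them apart, and the combinations not listed above (for example a 'Hole' staying a 'Hole' with --sparse=auto) are assumptions.

```c
#include <assert.h>

/* Hypothetical names for the --sparse modes and the region types.  */
enum sparse_mode { SPARSE_AUTO, SPARSE_ALWAYS, SPARSE_NEVER };
enum region { REGION_DATA, REGION_HOLE, REGION_EMPTY };

/* Map the type of a source region to the type to create in the
   destination for the given --sparse mode.  */
enum region
dest_region (enum sparse_mode mode, enum region src)
{
  assert (src == REGION_DATA || src == REGION_HOLE || src == REGION_EMPTY);

  if (src == REGION_DATA)
    return REGION_DATA;         /* data is always copied as data */

  switch (mode)
    {
    case SPARSE_ALWAYS:
      return REGION_HOLE;       /* 'Empty' -> 'Hole' */
    case SPARSE_NEVER:
      return REGION_EMPTY;      /* 'Hole' -> 'Empty' */
    case SPARSE_AUTO:
      break;
    }
  return src;                   /* auto: keep the source layout */
}
```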
Copying normal files
Note that using SEEK_HOLE for this case would only help
to avoid reading 'Hole' and, more likely, 'Empty' portions,
and should not affect the use of fallocate(dest).
So assuming we initially did:
  if ! --sparse=always
    fallocate(dest, st_size)
That would throw away any tail allocation in the source,
which is probably OK as noted above. In fact we might always
discard tail allocation for consistency, unless we can use fiemap
for all cases.
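In C, the initial preallocation step for normal files might look like the sketch below. It is not a proposed patch: the function name `preallocate_dest` and its `sparse_always` parameter are hypothetical, and it uses the Linux-specific fallocate() with mode 0, which reserves the space and extends the file to SIZE without falling back to writing zeros.

```c
#define _GNU_SOURCE             /* for fallocate() in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>

/* Reserve SIZE bytes for FD unless the copy should be as sparse as
   possible.  Returns 0 on success or when nothing was attempted, else
   an errno value.  ENOSPC is reported here, before any data is
   written; EOPNOTSUPP means the file system cannot preallocate, and
   the caller can carry on with a plain copy.  */
int
preallocate_dest (int fd, off_t size, bool sparse_always)
{
  assert (size >= 0);

  if (sparse_always || size == 0)   /* fallocate() rejects a zero length */
    return 0;

  if (fallocate (fd, 0, 0, size) == 0)
    return 0;
  return errno;
}
```

Since only st_size bytes are reserved, any tail allocation in the source is not reproduced, matching the behaviour described above.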
I'll cook something up on this soon.
cheers,
Pádraig.