[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or
From: |
Goffredo Baroncelli |
Subject: |
[PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile. |
Date: |
Wed, 18 Jul 2018 08:24:50 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 |
On 07/12/2018 03:46 PM, Daniel Kiper wrote:
> On Tue, Jun 19, 2018 at 07:39:48PM +0200, Goffredo Baroncelli wrote:
>> Signed-off-by: Goffredo Baroncelli <address@hidden>
>> ---
>> grub-core/fs/btrfs.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 70 insertions(+)
>>
>> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
>> index be195448d..fbabaebbe 100644
>> --- a/grub-core/fs/btrfs.c
>> +++ b/grub-core/fs/btrfs.c
>> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
>> #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
>> #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20
>> #define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40
>> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
>> +#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100
>> grub_uint8_t dummy2[0xc];
>> grub_uint16_t nstripes;
>> grub_uint16_t nsubstripes;
>> @@ -764,6 +766,74 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data,
>> grub_disk_addr_t addr,
>> stripe_offset = low + chunk_stripe_length
>> * high;
>> csize = chunk_stripe_length - low;
>> + break;
>> + }
>> + case GRUB_BTRFS_CHUNK_TYPE_RAID5:
>> + case GRUB_BTRFS_CHUNK_TYPE_RAID6:
>> + {
>> + grub_uint64_t nparities, stripe_nr, high, low;
>> +
>> + redundancy = 1; /* no redundancy for now */
>> +
>> + if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
>> + {
>> + grub_dprintf ("btrfs", "RAID5\n");
>> + nparities = 1;
>> + }
>> + else
>> + {
>> + grub_dprintf ("btrfs", "RAID6\n");
>> + nparities = 2;
>> + }
>> +
>> + /*
>> + * Below is an example of a RAID 6 layout and the meaning of the
>> + * variables. The same applies to RAID 5. The only differences is
>> + * that there is only one parity disk instead of two.
>> + *
>> + * A RAID 6 layout consists of several stripes spread
>> + * on the disks, following a layout like the one below
>> + *
>> + * Disk0 Disk1 Disk2 Ddisk3
>
> s/Ddisk3/Disk3/
>
>> + *
>> + * A1 B1 P1 Q1
>> + * Q2 A2 B2 P2
>> + * P3 Q3 A3 B3
>> + * [...]
>> + *
>> + * Note that the placement of the parities depends on row index.
>> + * In the code below:
>> + * - stripe_nr is the stripe number not considering the parities
>> + * (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...),
>> + * - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...),
>> + * - stripen is the column number (or disk number),
>> + * - off is the logical address to read (from the beginning of
>> + * the chunk space),
>
> What do you mean by chunk?
Is a specific btrfs data structure (see
https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID).
Most of the BTRFS internal structures (e.g. where the file data is stored) are
in terms of *logical address*. At this level there is no concept of redundancy
or raid nor disks. You can see the logical address as a flat space, large as
the sum of the disks sizes (minus the space used by the parities or mirroring).
The "logical space" is divided in "slices" called chunk. A chunk typically has
a size in terms of hundred of megabytes (IIRC up to 1GB)
A chunk (or block group) is a mapping. A chunk maps a "logical address" to a
"physical address" on the disks.
You can think a chunk as a structure like:
u64 profile
u64 logical_start, logical_end
u64 disk1_id, disk1_start, disk1_end
u64 disk2_id, disk2_start, disk2_end
u64 disk3_id, disk3_start, disk3_end
[...]
In order to translate a "logical_address" to the pair of "disk" and "physical
address:
- find the chunk which maps the logical address (looking to the
"logical_start", "logical_end")
- then on the basis of "profile" value you can have the following cases:
profile == SINGLE:
is the simplest one: the logical address is mapped to the range
(disk1_start...disk1_end) of the disk "disk_id1"
profile == RAID1
like the previous one; however if the disk1 is not available,
you can find the data in the range
(disk2_start...disk2_end) of the disk "disk_id2". NOTE: the two
ranges
must have the same sizes, but may have different starts
profile == RAID0
the data is without redundancy and it is spread on different
disk each
"chunk_stripe_length" bytes; so:
off = logical_address - logical_start // this is what I
call
// address from
the
// beginning of
the chunk
and so on....
>
>> + * - chunk_stripe_length is the size of a stripe (typically 64k),
>> + * - nstripes is the number of disks,
>> + * - low is the offset of the data inside a stripe,
>> + * - stripe_offset is the disk offset from the beginning
>> + * of the disk chunk mapping start,
>
> Err... I am confused here...
As described above, the chunk data structure translate from "logical address"
to the pair (disk, physical address on the disk). Each chunk translate a small
portion of the space (typically 1GB); so it is more correct to say that a chunk
maps a range in the logical address to a range in the disks. 'off' is the
offset from the beginning of the range in the logical address. 'stripe_offset'
is the offset from the beginning of the range in the physical space
>
>> + * - csize is the "potential" data to read. It will be reduced to
>> + * size if the latter is smaller.
>> + */
>> + stripe_nr = grub_divmod64 (off, chunk_stripe_length, &low);
>
> Here it seems to me that off is from the beginning of RAID volume space.
> So, it does not agree with the comment above...
RAID volume space == chunk logical address space
>
>> + /*
>> + * stripen is evaluated without considering
>> + * the parities (0 for A1, A2, A3... 1 for B1, B2...).
>> + */
>> + high = grub_divmod64 (stripe_nr, nstripes - nparities, &stripen);
>
> This looks good.
>
>> + /*
>> + * stripen now considers also the parities (0 for A1, 1 for A2,
>> + * 2 for A3....). The math is performed modulo number of disks.
>> + */
>> + grub_divmod64 (high + stripen, nstripes, &stripen);
>
> Ditto.
>
>> + stripe_offset = low + chunk_stripe_length * high;
>
> And here I am confused. Why "* high"? If chunk_stripe_length is stripe
> length then stripe_offset leads to nowhere. Am I missing something?
> I have a feeling that comments does not agree with the code somehow.
The "physical range" is divided in small range called(?) "stripe". These stripe
are spread on the disks. 'high' is the offset from the beginning of the
physical chunk range in unit of "chunk_stripe_length"
I found quite confuse the btrfs terminology around the "stripe" word. And I
have to point out that these variables were named before my patch.
I am afraid that my comment doesn't help to "de-obfuscate" my code. I know that
my English is far to be perfect. However this code is not so different that the
previous one (i.e. handling the btrfs raid1/dup/raid0....). So I am thinking to
remove all the comments because it seems to me that the my comments increase
the confusion instead of reducing it.
Please, let me know your opinion
>
> Daniel
>
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5