[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID
From: |
Goffredo Baroncelli |
Subject: |
Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile. |
Date: |
Wed, 26 Sep 2018 22:40:32 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0 |
On 25/09/2018 17.31, Daniel Kiper wrote:
> On Wed, Sep 19, 2018 at 08:40:32PM +0200, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli <address@hidden>
>>
>> Signed-off-by: Goffredo Baroncelli <address@hidden>
>> ---
>> grub-core/fs/btrfs.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 66 insertions(+)
>>
>> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
>> index be195448d..56c42746d 100644
>> --- a/grub-core/fs/btrfs.c
>> +++ b/grub-core/fs/btrfs.c
>> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
>> #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
>> #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20
>> #define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40
>> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
>> +#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100
>> grub_uint8_t dummy2[0xc];
>> grub_uint16_t nstripes;
>> grub_uint16_t nsubstripes;
>> @@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data,
>> grub_disk_addr_t addr,
>> stripe_offset = low + chunk_stripe_length
>> * high;
>> csize = chunk_stripe_length - low;
>> + break;
>> + }
>> + case GRUB_BTRFS_CHUNK_TYPE_RAID5:
>> + case GRUB_BTRFS_CHUNK_TYPE_RAID6:
>> + {
>> + grub_uint64_t nparities, block_nr, high, low;
>> +
>> + redundancy = 1; /* no redundancy for now */
>> +
>> + if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
>> + {
>> + grub_dprintf ("btrfs", "RAID5\n");
>> + nparities = 1;
>> + }
>> + else
>> + {
>> + grub_dprintf ("btrfs", "RAID6\n");
>> + nparities = 2;
>> + }
>> +
>> + /*
>> + * A RAID 6 layout consists of several blocks spread on the disks.
>> + * The raid terminology is used to call all the blocks of a row
>> + * "stripe". Unfortunately the BTRFS terminology confuses block
>
> Stripe is data set or parity (parity stripe) on one disk. Block has
> different meaning. Please stick to btrfs terminology and say it clearly
> in the comment. And even add a link to btrfs wiki page to ease reading.
>
> I think about this one:
>
> https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID
>
>> + * and stripe.
>
> I do not think so. Or at least not so much...
Trust me, generally speaking stripe is the "row" in the disks (without the
parity); looking at the ext3 man page:
....
stride=stride-size
Configure the filesystem for a RAID array with
stride-size filesystem blocks. This is the number of
blocks read or written to disk before moving to the
next disk, which is sometimes referred to as the
chunk size. This mostly affects placement of
filesystem metadata like bitmaps at mke2fs time to
avoid placing them on a single disk, which can hurt
performance. It may also be used by the block allo‐
cator.
stripe_width=stripe-width
Configure the filesystem for a RAID array with
stripe-width filesystem blocks per stripe. This is
typically stride-size * N, where N is the number of
data-bearing disks in the RAID (e.g. for RAID 5
there is one parity disk, so N will be the number of
disks in the array minus 1). This allows the block
allocator to prevent read-modify-write of the parity
in a RAID stripe if possible when the data is writ‐
ten.
....
Looking at the RAID5 wikipedia page, it seems that the term "stripe" is
coherent with the ext3 man page.
I suspect that the confusion is due to the fact that in RAID1 a stripe is in
ONE disk (because the others are like parities). In BTRFS the RAID5/6 code uses
the structure of RAID1 with some minimal extensions...
To be clear, I don't have problem to be adherent to the BTRFS terminology.
However I found this very confusing because I was used to a different
terminology. I am bit worried about the fact that grub uses both MD/DM code and
BTRFS code; a quick look to the code (eg ./grub-core/disk/diskfilter.c) shows
that the stripe_size field seems to be related to a disks row without parities.
And... yes in BTRFS "chunk" is a completely different beast than what it is
reported in the ext3 man page :-)
>
>> + *
>> + * Disk0 Disk1 Disk2 Disk3
>> + *
>> + * A1 B1 P1 Q1
>> + * Q2 A2 B2 P2
>> + * P3 Q3 A3 B3
>> + * [...]
>> + *
>> + * Note that the placement of the parities depends on row index.
>> + * In the code below:
>> + * - block_nr is the block number without the parities
>
> Well, it seems to me that the btrfs code introduces confusion not the
> spec itself. I would leave code as is but s/block number/stripe number/.
Ok I will replace the two terms. However I have to put a warning that this is a
"BTRFS" terminology :-)
>
>> + * (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...),
>> + * - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...),
>> + * - stripen is the disk number (0 for A1,Q2,P3, 1 for B1...),
>
> s/disk number/disk number in a row/
This value doesn't depend by the row. So "number of disk" is more correct
>
>> + * - off is the logical address to read
>> + * - chunk_stripe_length is the size of a block (typically 64k),
>
> s/a block/a stripe/
>
>> + * - nstripes is the number of disks,
>
> s/number of disks/number of disks in a row/
ditto
>
> I miss the description of nparities here...
Right:
+ * - nparities is the number of parities (1 for RAID5, 2 for
RAID6);
+ * used only in RAID5/6 code.
>
>> + * - low is the offset of the data inside a stripe,
>> + * - stripe_offset is the disk offset,
>
> s/the disk offset/the data offset in an array/?
Yes
>
>> + * - csize is the "potential" data to read. It will be reduced to
>> + * size if the latter is smaller.
>> + */
>> + block_nr = grub_divmod64 (off, chunk_stripe_length, &low);
>> +
>> + /*
>> + * stripen is computed without the parities (0 for A1, A2, A3...
>> + * 1 for B1, B2...).
>> + */
>> + high = grub_divmod64 (block_nr, nstripes - nparities, &stripen);
>
> This is clear...
>
>> + /*
>> + * stripen is recomputed considering the parities (0 for A1, 1 for
>> + * A2, 2 for A3....).
>> + */
>> + grub_divmod64 (high + stripen, nstripes, &stripen);
>
> ... but this looks strange... You add disk number to row number. Hmmm...
> It looks that it works but this is not obvious at first sight. Could you
> explain that?
What about
+ /*
+ * stripen is recomputed considering the parities: different row
have
+ * a different offset, we have to add to stripen the number of
row ("high") in
+ * modulo nstripes (0 for A1, 1 for A2, 2 for A3....).
+ */
>
>> + stripe_offset = low + chunk_stripe_length * high;
>> + csize = chunk_stripe_length - low;
>> +
>> break;
>> }
>> default:
>
> Daniel
>
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
- [PATCH V7] Add support for BTRFS raid5/6 to GRUB, Goffredo Baroncelli, 2018/09/19
- [PATCH 5/9] btrfs: Move logging code in grub_btrfs_read_logical(), Goffredo Baroncelli, 2018/09/19
- [PATCH 2/9] btrfs: Add helper to check the btrfs header., Goffredo Baroncelli, 2018/09/19
- [PATCH 3/9] btrfs: Move the error logging from find_device() to its caller., Goffredo Baroncelli, 2018/09/19
- [PATCH 4/9] btrfs: Avoid a rescan for a device which was already not found., Goffredo Baroncelli, 2018/09/19
[PATCH 7/9] btrfs: Add support for recovery for a RAID 5 btrfs profiles., Goffredo Baroncelli, 2018/09/19