[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID
From: |
Daniel Kiper |
Subject: |
Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile. |
Date: |
Thu, 27 Sep 2018 17:47:59 +0200 |
User-agent: |
Mutt/1.3.28i |
On Wed, Sep 26, 2018 at 10:40:32PM +0200, Goffredo Baroncelli wrote:
> On 25/09/2018 17.31, Daniel Kiper wrote:
> > On Wed, Sep 19, 2018 at 08:40:32PM +0200, Goffredo Baroncelli wrote:
> >> From: Goffredo Baroncelli <address@hidden>
> >>
> >> Signed-off-by: Goffredo Baroncelli <address@hidden>
> >> ---
> >> grub-core/fs/btrfs.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 66 insertions(+)
> >>
> >> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
> >> index be195448d..56c42746d 100644
> >> --- a/grub-core/fs/btrfs.c
> >> +++ b/grub-core/fs/btrfs.c
> >> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
> >> #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
> >> #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20
> >> #define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40
> >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
> >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100
> >> grub_uint8_t dummy2[0xc];
> >> grub_uint16_t nstripes;
> >> grub_uint16_t nsubstripes;
> >> @@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data
> >> *data, grub_disk_addr_t addr,
> >> stripe_offset = low + chunk_stripe_length
> >> * high;
> >> csize = chunk_stripe_length - low;
> >> + break;
> >> + }
> >> + case GRUB_BTRFS_CHUNK_TYPE_RAID5:
> >> + case GRUB_BTRFS_CHUNK_TYPE_RAID6:
> >> + {
> >> + grub_uint64_t nparities, block_nr, high, low;
> >> +
> >> + redundancy = 1; /* no redundancy for now */
> >> +
> >> + if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
> >> + {
> >> + grub_dprintf ("btrfs", "RAID5\n");
> >> + nparities = 1;
> >> + }
> >> + else
> >> + {
> >> + grub_dprintf ("btrfs", "RAID6\n");
> >> + nparities = 2;
> >> + }
> >> +
> >> + /*
> >> + * A RAID 6 layout consists of several blocks spread on the disks.
> >> + * The raid terminology is used to call all the blocks of a row
> >> + * "stripe". Unfortunately the BTRFS terminology confuses block
> >
> > Stripe is data set or parity (parity stripe) on one disk. Block has
> > different meaning. Please stick to btrfs terminology and say it clearly
> > in the comment. And even add a link to btrfs wiki page to ease reading.
> >
> > I think about this one:
> >
> > https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID
> >
> >> + * and stripe.
> >
> > I do not think so. Or at least not so much...
>
> Trust me, generally speaking stripe is the "row" in the disks (without the
> parity); looking at the ext3 man page:
>
> ....
> stride=stride-size
> Configure the filesystem for a RAID array with
> stride-size filesystem blocks. This is the number of
> blocks read or written to disk before moving to the
> next disk, which is sometimes referred to as the
> chunk size. This mostly affects placement of
> filesystem metadata like bitmaps at mke2fs time to
> avoid placing them on a single disk, which can hurt
> performance. It may also be used by the block
> allo???
> cator.
>
> stripe_width=stripe-width
> Configure the filesystem for a RAID array with
> stripe-width filesystem blocks per stripe. This is
> typically stride-size * N, where N is the number of
> data-bearing disks in the RAID (e.g. for RAID 5
> there is one parity disk, so N will be the number of
> disks in the array minus 1). This allows the block
> allocator to prevent read-modify-write of the parity
> in a RAID stripe if possible when the data is
> writ???
> ten.
>
> ....
> Looking at the RAID5 wikipedia page, it seems that the term "stripe"
> is coherent with the ext3 man page.
Ugh... It looks that I have messed up things. Sorry about that.
> I suspect that the confusion is due to the fact that in RAID1 a stripe
> is in ONE disk (because the others are like parities). In BTRFS the
> RAID5/6 code uses the structure of RAID1 with some minimal
> extensions...
>
> To be clear, I don't have problem to be adherent to the BTRFS
> terminology. However I found this very confusing because I was used to
> a different terminology. I am bit worried about the fact that grub
Yeah, I have the same feeling. However, I think that in btrfs code we
should stay with btrfs terms. Though I think that it make sense to
underline differences between btrfs and well known RAID here.
> uses both MD/DM code and BTRFS code; a quick look to the code (eg
> ./grub-core/disk/diskfilter.c) shows that the stripe_size field seems
> to be related to a disks row without parities.
>
> And... yes in BTRFS "chunk" is a completely different beast than what
> it is reported in the ext3 man page :-)
As I said above, please say it in the comment. This will ease reading
for people who are not used to btrfs terms.
> >> + *
> >> + * Disk0 Disk1 Disk2 Disk3
> >> + *
> >> + * A1 B1 P1 Q1
> >> + * Q2 A2 B2 P2
> >> + * P3 Q3 A3 B3
> >> + * [...]
> >> + *
> >> + * Note that the placement of the parities depends on row index.
> >> + * In the code below:
> >> + * - block_nr is the block number without the parities
> >
> > Well, it seems to me that the btrfs code introduces confusion not the
> > spec itself. I would leave code as is but s/block number/stripe number/.
>
> Ok I will replace the two terms. However I have to put a warning that this is
> a "BTRFS" terminology :-)
Yep, and please explain the differences at the beginning of the comment.
> >> + * (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...),
> >> + * - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...),
> >> + * - stripen is the disk number (0 for A1,Q2,P3, 1 for B1...),
> >
> > s/disk number/disk number in a row/
>
> This value doesn't depend by the row. So "number of disk" is more correct
Yes, but without "row" it is a bit confusing because it suggests that
it is an arbitrary number. Even if you give an example next to the
description. So, I am insisting on adding "in a row" here.
> >> + * - off is the logical address to read
> >> + * - chunk_stripe_length is the size of a block (typically 64k),
> >
> > s/a block/a stripe/
Taking into account discussion above I am not sure right now which one
is correct. Please double check and fix it if it is needed.
> >> + * - nstripes is the number of disks,
> >
> > s/number of disks/number of disks in a row/
> ditto
As above, Is it total number of disks in array? I do not think so.
Hence, I think that "in a row" helps a bit. Even if it is not very
precise. However, if you come up with something better I am not
against it.
> > I miss the description of nparities here...
>
> Right:
> + * - nparities is the number of parities (1 for RAID5, 2 for
> RAID6);
> + * used only in RAID5/6 code.
LGTM.
> >> + * - low is the offset of the data inside a stripe,
> >> + * - stripe_offset is the disk offset,
> >
> > s/the disk offset/the data offset in an array/?
>
> Yes
>
> >
> >> + * - csize is the "potential" data to read. It will be reduced to
> >> + * size if the latter is smaller.
> >> + */
> >> + block_nr = grub_divmod64 (off, chunk_stripe_length, &low);
> >> +
> >> + /*
> >> + * stripen is computed without the parities (0 for A1, A2, A3...
> >> + * 1 for B1, B2...).
> >> + */
> >> + high = grub_divmod64 (block_nr, nstripes - nparities, &stripen);
> >
> > This is clear...
> >
> >> + /*
> >> + * stripen is recomputed considering the parities (0 for A1, 1 for
> >> + * A2, 2 for A3....).
> >> + */
> >> + grub_divmod64 (high + stripen, nstripes, &stripen);
> >
> > ... but this looks strange... You add disk number to row number. Hmmm...
> > It looks that it works but this is not obvious at first sight. Could you
> > explain that?
>
> What about
> + /*
> + * stripen is recomputed considering the parities: different row
> have
> + * a different offset, we have to add to stripen the number of
> row ("high") in
> + * modulo nstripes (0 for A1, 1 for A2, 2 for A3....).
> + */
This is better but not much. You are repeating what code does.
I am especially interested in why this math is correct. It is not
obvious at first sight. If it is not it should be explained.
Otherwise we will forget in a few months why it is correct.
Daniel
- [PATCH V7] Add support for BTRFS raid5/6 to GRUB, Goffredo Baroncelli, 2018/09/19
- [PATCH 5/9] btrfs: Move logging code in grub_btrfs_read_logical(), Goffredo Baroncelli, 2018/09/19
- [PATCH 2/9] btrfs: Add helper to check the btrfs header., Goffredo Baroncelli, 2018/09/19
- [PATCH 3/9] btrfs: Move the error logging from find_device() to its caller., Goffredo Baroncelli, 2018/09/19
- [PATCH 4/9] btrfs: Avoid a rescan for a device which was already not found., Goffredo Baroncelli, 2018/09/19
[PATCH 7/9] btrfs: Add support for recovery for a RAID 5 btrfs profiles., Goffredo Baroncelli, 2018/09/19