bug-grub

Grub fails with "couldn't find root_dataset" on ZFS pool (patch attached)


From: Zachary Bedell
Subject: Grub fails with "couldn't find root_dataset" on ZFS pool (patch attached)
Date: Fri, 29 Jul 2011 17:21:08 -0400

Greetings all,

I've run into a bit of trouble using Grub with the native ZFS driver from 
ZFSonLinux.org.  Patch is attached which fixes the problem.

The system in question uses a mirrored ZFS pool as its root.  The partition 
table is GPT with a bios_grub partition in front and ZFS on the remainder of 
the two SCSI drives.  Initially Grub installed without issue to this pool, and 
it booted fine.  

After a number of ungraceful reboots, pool scrubs, and some general mayhem 
caused by some swap-related deadlocks in the ZFS driver, the pool was no longer 
recognized by Grub, either at boot or by grub-probe.  The ZFS tools still 
imported the pool successfully, and it scrubbed completely with no errors 
reported.  I was able to use the pool without issue once the system was booted 
from a livecd and manually chroot'ed into the ZFS root.

Running grub-probe from within the chroot gave the following:

livecd grub-bzr # ./grub-probe -v /
./grub-probe: info: Looking for /dev/sda2.
./grub-probe: info: /dev/sda2 starts from 135168.
./grub-probe: info: opening the device hd0.
./grub-probe: info: the size of hd0 is 142264000.
./grub-probe: info: Partition 0 starts from 2048.
./grub-probe: info: Partition 1 starts from 135168.
./grub-probe: info: Looking for /dev/sda2.
./grub-probe: info: /dev/sda2 starts from 135168.
./grub-probe: info: opening the device hd0.
./grub-probe: info: the size of hd0 is 142264000.
./grub-probe: info: Partition 0 starts from 2048.
./grub-probe: info: Partition 1 starts from 135168.
./grub-probe: info: opening hd0,gpt2.
./grub-probe: info: the size of hd0 is 142264000.
./grub-probe: error: couldn't find root_dataset.

Comparing the pool to other pools which booted correctly showed one significant 
difference:  The ZAP object containing the pool's Object Directory was a "fat" 
ZAP on the non-bootable pool and only a micro-ZAP on the pools that worked.  
Diving into Grub's code, I found that the problem was caused by a modification 
made to the zap_leaf_chunk->zap_leaf_array structure.  According to the ZFS 
on-disk specification, that structure should be defined as something like:

        struct zap_leaf_array {
                grub_uint8_t la_type;           /* always ZAP_CHUNK_ARRAY */
                grub_uint8_t la_array[ZAP_LEAF_ARRAY_BYTES];
                grub_uint16_t la_next;          /* next blk or CHAIN_END */
        } l_array;

But Grub's zap_leaf.h defined it as:

        struct zap_leaf_array {
                grub_uint8_t la_type;           /* always ZAP_CHUNK_ARRAY */
                union
                {
                        grub_uint8_t la_array[ZAP_LEAF_ARRAY_BYTES];
                        grub_uint64_t la_array64;
                };
                grub_uint16_t la_next;          /* next blk or CHAIN_END */
        } l_array;

Using the re-defined structure, the ZAP_LEAF_CHUNK macro in zfs.c was only able 
to correctly resolve the 0th ZAP entry in a fat ZAP.  The 
sizeof(zap_leaf_array) became 40 bytes after the modification (I'm assuming 
because the 64-bit union member forces the struct to be aligned and padded?) 
whereas the correct size of that structure is 24 bytes.  As a result, the 1st 
and subsequent indexes into the ZAP chunk array in zap_leaf_phys were 
calculated incorrectly.  The attached patch restores the correct definition of 
zap_leaf_array and fixes the two places where the added la_array64 union 
member was accessed.  
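
For illustration, the size difference can be reproduced outside Grub with a 
small standalone sketch.  This is not code from the patch: the struct and 
function names (spec_leaf_array, union_leaf_array, spec_chunk_offset, 
union_chunk_offset) are hypothetical, stdint types stand in for Grub's 
grub_uint*_t, the union is given a name (u) so it compiles as plain C, and 
the constants assume the 24-byte chunk size from the ZFS on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed from the ZFS on-disk format: each leaf chunk is 24 bytes, and an
   array chunk spends 1 byte on la_type and 2 bytes on la_next. */
#define ZAP_LEAF_CHUNKSIZE   24
#define ZAP_LEAF_ARRAY_BYTES (ZAP_LEAF_CHUNKSIZE - 3)

/* Layout as given by the specification (hypothetical name). */
struct spec_leaf_array {
        uint8_t  la_type;                         /* always ZAP_CHUNK_ARRAY */
        uint8_t  la_array[ZAP_LEAF_ARRAY_BYTES];
        uint16_t la_next;                         /* next blk or CHAIN_END */
};

/* Layout with the added 64-bit union member (hypothetical name).  The
   uint64_t raises the alignment requirement of the union, so the compiler
   may insert padding before and after it. */
struct union_leaf_array {
        uint8_t la_type;
        union {
                uint8_t  la_array[ZAP_LEAF_ARRAY_BYTES];
                uint64_t la_array64;
        } u;
        uint16_t la_next;
};

/* The on-disk chunk must be exactly one chunk in size. */
static_assert(sizeof(struct spec_leaf_array) == ZAP_LEAF_CHUNKSIZE,
              "spec layout must match the on-disk chunk size");

/* Byte offset of chunk idx when chunks are indexed as an array of structs,
   which is how a ZAP_LEAF_CHUNK-style lookup finds an entry. */
size_t spec_chunk_offset(size_t idx)
{
        return idx * sizeof(struct spec_leaf_array);
}

size_t union_chunk_offset(size_t idx)
{
        return idx * sizeof(struct union_leaf_array);
}
```

Both functions agree for index 0, which is consistent with only the 0th entry 
resolving correctly.  For index 1 the spec layout gives offset 24, while on a 
typical x86-64 build the union layout gives 40; other ABIs may pad 
differently, but any padding at all moves every later chunk off its on-disk 
position.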

With this patch in place, grub-probe again correctly detects 'zfs' as the 
filesystem in use on root, and after re-running grub-install, the system again 
boots cleanly.

The attached patch applies cleanly against both the Grub 1.99 release and a 
BZR pull of trunk as of earlier today (29-Jul-2011).  The patch can also be 
found on GitHub in case of damage in transit:

https://github.com/pendor/gentoo-zfs-overlay/blob/master/sys-boot/grub/files/grub-9999-fzap.patch

Attachment: grub-9999-fzap.patch
Description: Binary data


Best regards,
Zac Bedell


