Re: Bug#478238: grub-probe: fails to find drive for /dev/sda10
From: Török Edwin
Subject: Re: Bug#478238: grub-probe: fails to find drive for /dev/sda10
Date: Sun, 11 May 2008 14:35:41 +0300
User-agent: Mozilla-Thunderbird 2.0.0.12 (X11/20080420)
[sending to grub-devel@ as requested]
Robert Millan wrote:
> On Sun, May 04, 2008 at 05:01:32PM +0300, Török Edwin wrote:
>
>>>> Device Boot Start End Blocks Id System
>>>> /dev/sda1 * 1 1275 10241406 7 HPFS/NTFS
>>>> /dev/sda2 1276 2248 7815622+ a6 OpenBSD
>>>> /dev/sda3 2249 5289 24426832+ f W95 Ext'd (LBA)
>>>> /dev/sda4 6080 7296 9775552+ bf Solaris
>>>> /dev/sda5 2249 2371 987966 82 Linux swap / Solaris
>>>> /dev/sda6 2372 3587 9767488+ 83 Linux
>>>> /dev/sda7 3588 3600 104391 83 Linux
>>>> /dev/sda8 3601 4863 10145016 8e Linux LVM
>>>> /dev/sda9 4864 5228 2931831 a6 OpenBSD
>>>> /dev/sda10 5229 5289 489951 83 Linux
>>>>
>> [...]
>> grub> ls (hd0,10)
>> error: unknown device
>> grub> ls (hd0,11)
>> error: unknown device
>> grub>
>>
>
> I tried reproducing your setup, but I can't hit the same bug. This starts to
> look really nasty. Just spotted this:
>
> /build/buildd/grub2-1.96+20080426/partmap/pc.c:141: partition 0: flag 0x80, type 0x7, start 0x3f, len 0x1388afc
> [...]
> /build/buildd/grub2-1.96+20080426/partmap/pc.c:141: partition 0: flag 0x0, type 0x82, start 0x2270f07, len 0x1e267c
>
> for which I can't find any explanation other than memory corruption. Also,
> due to a missing fflush() call the output is somewhat scrambled, which makes
> it harder to track (I fixed this already in upstream).
>
> Could you:
>
> - Apply the attached patch & run grub-probe again (this time output
> will be a bit more readable)
>
No patch was attached, so I ran 'cvs diff -u -D2008-04-30' and applied
the resulting diff instead.
I found the problem, and it also explains why you couldn't reproduce it.
/dev/sda9 is typed as OpenBSD (a6) but does not contain a valid OpenBSD
disk label, so at partmap/pc.c:176 the iteration returns with the error:
invalid disk label magic 0x%x.
If I replace that return with a continue, it works.
The problem is that grub2 stops looking for further partitions as soon
as it encounters the invalid one. grub 0.97 worked perfectly, so I never
noticed that the partition had the wrong type!
Also, if I change the partition type to 83 (as it should be), an
unpatched grub-probe can find that /boot is on /dev/sda10:
# grub-probe -t device /boot
/dev/sda10
I think grub2 should handle such errors more gracefully: possibly mark
the partition as invalid, and keep going.
grub-probe was looking for /dev/sda10, and it shouldn't be affected by
/dev/sda9 being corrupted or invalid.
Think of it this way: a corrupted partition shouldn't prevent the system
from booting, as long as the boot and root partitions are still OK.
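A minimal sketch of the skip-and-continue behaviour argued for above. This
is not GRUB's code from partmap/pc.c: struct part_entry, iterate_partitions,
find_hook and find_partition are hypothetical names invented for
illustration, and the "label_ok" flag stands in for the real disk label
magic check.

```c
#include <stddef.h>

/* Hypothetical, simplified partition entry; not GRUB's real structure. */
struct part_entry
{
  int index;     /* partition number as shown to the user */
  int type;      /* MBR type byte, e.g. 0xa6 for OpenBSD */
  int label_ok;  /* nonzero if the nested disk label has a valid magic */
};

#define PART_TYPE_OPENBSD 0xa6

typedef int (*part_hook) (const struct part_entry *p, void *ctx);

/* Visit every entry in order. An OpenBSD-typed entry with a bad label is
   skipped (the "continue" variant) instead of ending the whole iteration
   (the "return" variant). Returns the number of entries skipped. */
int
iterate_partitions (const struct part_entry *parts, size_t n,
                    part_hook hook, void *ctx)
{
  int skipped = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      if (parts[i].type == PART_TYPE_OPENBSD && !parts[i].label_ok)
        {
          skipped++;
          continue;   /* a bad neighbour must not hide later partitions */
        }
      if (hook (&parts[i], ctx))
        break;        /* hook asked to stop: it found what it wanted */
    }
  return skipped;
}

struct find_ctx
{
  int wanted;
  int found;
};

/* Hook that stops the iteration once the wanted partition is seen. */
int
find_hook (const struct part_entry *p, void *ctx)
{
  struct find_ctx *f = ctx;

  if (p->index == f->wanted)
    {
      f->found = 1;
      return 1;
    }
  return 0;
}

/* Returns 1 if a partition with the given index is reachable, else 0. */
int
find_partition (const struct part_entry *parts, size_t n, int wanted)
{
  struct find_ctx f = { wanted, 0 };

  iterate_partitions (parts, n, find_hook, &f);
  return f.found;
}
```

With a table in which entry 9 is OpenBSD-typed with a bad label and entry
10 is a normal Linux partition, find_partition still reports entry 10 as
found. If the loop returned on the bad entry instead, entry 10 would never
be visited.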
Compare what grub-emu says when sda9 has the wrong type:
grub> ls (hd0,10)
error: unknown device
And this is what it says when sda9 has the correct type:
grub> ls (hd0,10)
Partition hd0,10: Filesystem type ext2, Label debian_BOOT
> - Send it to address@hidden
>
Done
> ?
>
> Maybe someone there has an idea, but if it's memory corruption and we can't
> reproduce it, tracing the problem remotely isn't going to work very well.
>
It wasn't memory corruption. However, I ran valgrind and it showed some
leaks, plus a call to stat() with a NULL path.
The attached patch fixes some of the valgrind warnings. Some leaks still
remain; I have attached the new valgrind logs.
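A minimal sketch of a guarded stat() call, for illustration only. It is not
the code from the attached patch, and is_regular_file is a hypothetical
helper name. It also checks the return value of stat() and uses the POSIX
S_ISREG() macro: st_mode carries permission bits as well as the file type,
so comparing it directly against S_IFREG will generally not match.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Returns 1 if PATH names a regular file, 0 otherwise. A NULL path or a
   failed stat() is treated as "not a regular file", so the caller never
   reads an uninitialised struct stat. */
int
is_regular_file (const char *path)
{
  struct stat st;

  if (path == NULL)
    return 0;
  if (stat (path, &st) != 0)
    return 0;
  return S_ISREG (st.st_mode) ? 1 : 0;
}
```

A caller would branch on is_regular_file (path) where it previously tested
st.st_mode directly.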
P.S.: grub2 seems to work now; I am able to boot with it using the
text-mode menu. The default graphics mode doesn't work; I will open a
separate bug about that.
Best regards,
--Edwin
diff -ur grub2-1.96+20080429/kern/disk.c ../grub2-1.96+20080429/kern/disk.c
--- grub2-1.96+20080429/kern/disk.c 2008-02-08 14:22:51.000000000 +0200
+++ ../grub2-1.96+20080429/kern/disk.c 2008-05-11 13:58:02.270673755 +0300
@@ -317,7 +317,10 @@
/* Reset the timer. */
grub_last_time = grub_get_rtc ();
- grub_free (disk->partition);
+ if(disk->partition) {
+ grub_free (disk->partition->data);
+ grub_free (disk->partition);
+ }
grub_free ((void *) disk->name);
grub_free (disk);
}
diff -ur grub2-1.96+20080429/util/grub-probe.c ../grub2-1.96+20080429/util/grub-probe.c
--- grub2-1.96+20080429/util/grub-probe.c 2008-05-11 13:59:14.934811935 +0300
+++ ../grub2-1.96+20080429/util/grub-probe.c 2008-05-11 13:46:21.729236855 +0300
@@ -190,9 +190,10 @@
struct stat st;
grub_fs_t fs;
- stat (path, &st);
+ if(path)
+ stat (path, &st);
- if (st.st_mode == S_IFREG)
+ if (path && st.st_mode == S_IFREG)
{
/* Regular file. Verify that we can read it properly. */
==25071== Memcheck, a memory error detector.
==25071== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==25071== Using LibVEX rev 1804, a library for dynamic binary translation.
==25071== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==25071== Using valgrind-3.3.0-Debian, a dynamic binary instrumentation framework.
==25071== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==25071== For more details, rerun with: -v
==25071==
==25071== My PID = 25071, parent PID = 5663. Prog and args are:
==25071== ./grub-probe
==25071== -d
==25071== /dev/sda10
==25071==
==25071== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==25071== This could cause spurious value errors to appear.
==25071== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25071== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==25071== This could cause spurious value errors to appear.
==25071== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25071== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==25071== This could cause spurious value errors to appear.
==25071== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25071==
==25071== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 10 from 1)
==25071== malloc/free: in use at exit: 611,077 bytes in 176 blocks.
==25071== malloc/free: 901 allocs, 725 frees, 2,397,201 bytes allocated.
==25071== For counts of detected errors, rerun with: -v
==25071== searching for pointers to 176 not-freed blocks.
==25071== checked 662,256 bytes.
==25071==
==25071== 4,096 bytes in 1 blocks are possibly lost in loss record 3 of 5
==25071== at 0x4006AB8: malloc (vg_replace_malloc.c:207)
==25071== by 0x804AFE4: xmalloc (misc.c:81)
==25071== by 0x804B41A: grub_malloc (misc.c:222)
==25071== by 0x804C3EB: grub_disk_cache_store (disk.c:162)
==25071== by 0x804CDC1: grub_disk_read (disk.c:461)
==25071== by 0x8069A72: grub_lvm_scan_device (lvm.c:288)
==25071== by 0x804C014: iterate_partition.2134 (device.c:132)
==25071== by 0x8066C9C: pc_partition_map_iterate (pc.c:153)
==25071== by 0x804F3AD: grub_partition_iterate (partition.c:126)
==25071== by 0x804C09D: iterate_disk.2131 (device.c:101)
==25071== by 0x80498FA: call_hook (biosdisk.c:132)
==25071== by 0x804992B: grub_util_biosdisk_iterate (biosdisk.c:141)
==25071==
==25071==
==25071== 41,136 (41,132 direct, 4 indirect) bytes in 12 blocks are definitely lost in loss record 4 of 5
==25071== at 0x4006AB8: malloc (vg_replace_malloc.c:207)
==25071== by 0x804AFE4: xmalloc (misc.c:81)
==25071== by 0x804B41A: grub_malloc (misc.c:222)
==25071== by 0x804C3EB: grub_disk_cache_store (disk.c:162)
==25071== by 0x804CDC1: grub_disk_read (disk.c:461)
==25071== by 0x8066D4E: pc_partition_map_iterate (pc.c:165)
==25071== by 0x804F3AD: grub_partition_iterate (partition.c:126)
==25071== by 0x804C09D: iterate_disk.2131 (device.c:101)
==25071== by 0x80498FA: call_hook (biosdisk.c:132)
==25071== by 0x804992B: grub_util_biosdisk_iterate (biosdisk.c:141)
==25071== by 0x804C4CC: grub_disk_dev_iterate (disk.c:205)
==25071== by 0x804BF63: grub_device_iterate (device.c:138)
==25071==
==25071== LEAK SUMMARY:
==25071== definitely lost: 41,132 bytes in 12 blocks.
==25071== indirectly lost: 4 bytes in 1 blocks.
==25071== possibly lost: 4,096 bytes in 1 blocks.
==25071== still reachable: 565,845 bytes in 162 blocks.
==25071== suppressed: 0 bytes in 0 blocks.
==25071== Reachable blocks (those to which a pointer was found) are not shown.
==25071== To see them, rerun with: --leak-check=full --show-reachable=yes