help-grub

Re: Trying to set a grub variable and reboot when vmlinuz is corrupted


From: Andrei Borzenkov
Subject: Re: Trying to set a grub variable and reboot when vmlinuz is corrupted
Date: Wed, 23 Oct 2024 20:20:57 +0300
User-agent: Mozilla Thunderbird

23.10.2024 15:12, Sayan Paul wrote:
Hi all,

I am trying to implement a robust rollback mechanism for an image-based
OS (which stores two deployments, both available in the boot selection
menu). The use case is edge devices with limited or no connectivity.
The deployment to boot into can be controlled using the grub_env parameter
`default=0/1` (0 = current, 1 = fallback). More info:
https://github.com/fedora-iot/greenboot

Although it works in user space, I want the implementation to also handle
corrupted kernels. Currently this happens:
1. GRUB shows the error `unable to load kernel`
2. It goes back to the selection menu and waits for the user to choose
a deployment.
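For context, upstream GRUB documents a special `fallback` variable: if it is set and the default entry fails to boot while GRUB is still in control (for example because `linux` cannot load the kernel image), GRUB tries the named entry instead of stopping. A minimal sketch, assuming entry 0 is the current deployment, entry 1 the fallback, and that the fragment sits near the top of grub.cfg before the menu entries. Note that GRUB appears to apply `fallback` only when the default entry is booted automatically (timeout expiry), not when a user picks an entry by hand, and it cannot help once the kernel has started and crashes later.

```
# Hypothetical grub.cfg fragment (entry numbers are assumptions):
# load saved variables such as "default" from the grubenv block
load_env

# If the default entry fails to boot while GRUB is still in control
# (e.g. "linux" reports an error), try menu entry 1 instead of
# returning to the menu
set fallback=1

# Boot the default entry unattended after 5 seconds
set timeout=5
```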

Since I want it to run unattended, I am looking for a mechanism that:
1. Detects the kernel load error
2. Sets a counter and reboots
3. Decrements the counter and reboots if the boot fails due to kernel
issues again
4. When the counter reaches 0,
5. Clears the counter, sets grub_env `default=1`, and reboots
6. Automatically boots into the fallback deployment.
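The steps above can be sketched in GRUB script roughly as follows. This assumes hand-written `menuentry` blocks (not BLS entries generated by `blscfg`), a writable grubenv block for `save_env`, and the Fedora-specific `increment` module that provides `decrement`, as in the original config. The counter name `kernel_load_tries` is hypothetical (chosen so it does not clash with greenboot's `boot_counter`), and the kernel paths, kernel arguments, start value 3 and 5-second delay are placeholders. It only reacts to errors GRUB itself sees while loading the kernel.

```
# Hypothetical sketch: paths, kernel arguments and the start value 3
# are placeholders; "decrement" comes from Fedora's "increment" module.
insmod increment
load_env

# Steps 4-6: the counter has run out, so clear it and switch to the
# fallback deployment (entry 1) for this and all later boots.
if [ "${kernel_load_tries}" = "0" ]; then
   set default=1
   set kernel_load_tries=""
   save_env default kernel_load_tries
fi

menuentry 'Current deployment' {
   # Step 1: "linux" returns a non-zero status if GRUB cannot load
   # the kernel image.
   if linux /path/to/current/vmlinuz root=/dev/placeholder; then
      initrd /path/to/current/initramfs.img
   else
      # Steps 2-3: arm the counter on the first failure, otherwise
      # decrement it; persist it and reboot to try again.
      if [ -z "${kernel_load_tries}" ]; then
         set kernel_load_tries=3
      else
         decrement kernel_load_tries
      fi
      save_env kernel_load_tries
      echo "Kernel failed to load, rebooting in 5 seconds..."
      sleep 5
      reboot
   fi
}

menuentry 'Fallback deployment' {
   linux /path/to/fallback/vmlinuz root=/dev/placeholder
   initrd /path/to/fallback/initramfs.img
}
```

Because GRUB stops running once it hands control to the kernel, a kernel that loads but then crashes never reaches the `else` branch; that case needs a check on the next boot, which is what greenboot's `boot_success` flag is for.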

I have already tried a few approaches, including adding this GRUB config:
```
# Greenboot support for boot counter and boot success reporting
insmod increment

# Check if boot_counter exists and boot_success is 0, to activate boot counting behavior
if [ -n "${boot_counter}" -a "${boot_success}" = "0" ]; then
   # If boot_counter has expired (0 or -1), select rollback deployment (default=1)
   if [ "${boot_counter}" = "0" -o "${boot_counter}" = "-1" ]; then
     set default=1   # Rollback to previous OSTree deployment (second entry)
     set boot_counter=-1  # Stop decrementing further
   else
     # Otherwise, decrement boot_counter and try the new kernel again
     decrement boot_counter
   fi
   save_env boot_counter
fi

# Reset boot_success for the current boot attempt to 0
set boot_success=0
save_env boot_success

# Set timeout for kernel failure detection (30 seconds)
set timeout=30

# If control is not passed to systemd within 30 seconds, reboot
if [ "$?" -ne 0 ]; then
   echo "Kernel failed to load, rebooting in 30 seconds..."
   sleep 30  # Wait for 30 seconds
   reboot
fi
```
Can someone help me figure out what I am missing? I am out of options
now.


It is completely unclear where this config "was added": is it the complete grub.cfg or just part of it, etc.?


