help-grub


Trying to set a grub variable and reboot when vmlinuz is corrupted


From: Sayan Paul
Subject: Trying to set a grub variable and reboot when vmlinuz is corrupted
Date: Wed, 23 Oct 2024 17:42:20 +0530

Hi all,

I am trying to implement a robust rollback mechanism for an image-based OS
(which stores two deployments, both available in the boot selection menu).
The use case is edge devices with limited or no connectivity.
The deployment to boot into is controlled by the grubenv variable
`default=0/1` (0 = current, 1 = fallback). More info:
https://github.com/fedora-iot/greenboot

This already works in user space, but I also want it to handle corrupted
kernels. Currently, with a corrupted kernel:
1. GRUB shows the error `unable to load kernel`.
2. It goes back to the selection menu and waits for the user to choose a
deployment.
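For the case where GRUB itself cannot load the kernel image, upstream GRUB's `fallback` variable is one way to avoid landing back in the menu. This is a minimal sketch, assuming the fallback deployment is menu entry 1 and that the menu has a finite timeout (as far as I know, GRUB only honors `fallback` for an autobooted entry, not for one picked by hand). It does not help with a kernel that loads and then panics.

```
# Sketch: if the autobooted default entry cannot be booted (for example,
# GRUB fails to load its kernel image), try entry 1 instead of returning
# to the menu. Assumption: entry 1 is the fallback deployment.
set fallback=1

# fallback only applies to autobooted entries, so the menu must time out;
# a timeout of -1 would wait for the user forever
if [ "${timeout}" = "-1" ]; then
  set timeout=5
fi
```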

Since the devices must run unattended, I want GRUB to:
1. Detect the kernel load error.
2. Set a counter and reboot.
3. If the boot fails again because of the kernel, decrement the counter and
reboot.
4. When the counter reaches 0, clear it, set `default=1` in grubenv, and
reboot.
5. Boot automatically into the fallback deployment.
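For kernels that GRUB fails to load, the steps above could be approximated by a hand-written menu entry that tests the return status of the `linux` command directly instead of checking `$?` later. This is a minimal sketch, not greenboot's implementation, under several assumptions: the kernel and initramfs paths are placeholders, `kernel_fail_count` is a hypothetical grubenv variable kept separate from greenboot's `boot_counter`, `decrement` comes from Fedora's `increment` module (not upstream GRUB), and menu entry 1 is the fallback deployment.

```
# Sketch only: hand-written entry, not generated by ostree/BLS.
# /path/to/vmlinuz, /path/to/initramfs.img and kernel_fail_count are
# hypothetical placeholders. decrement needs Fedora's increment module.
insmod increment

menuentry 'Current deployment (load-checked)' {
  load_env kernel_fail_count
  # No countdown stored yet: start with 3 attempts
  if [ -z "${kernel_fail_count}" ]; then
    set kernel_fail_count=3
  fi

  if linux /path/to/vmlinuz; then
    initrd /path/to/initramfs.img
    # GRUB loaded the kernel: reset the countdown; the entry then boots
    set kernel_fail_count=3
    save_env kernel_fail_count
  else
    echo "Kernel failed to load (attempts left: ${kernel_fail_count})"
    decrement kernel_fail_count
    if [ "${kernel_fail_count}" = "0" ]; then
      # Countdown expired: reset it and make the fallback deployment default
      set kernel_fail_count=3
      set default=1
      save_env default
    fi
    save_env kernel_fail_count
    sleep 5
    reboot
  fi
}
```

Resetting the countdown whenever the kernel loads is a design choice in this sketch; a kernel that loads but fails later still has to be caught by the `boot_counter`/`boot_success` logic.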

I have already tried a few approaches by adding the following snippet to the
GRUB config:
```
# Greenboot support for boot counter and boot success reporting
insmod increment

# Check if boot_counter exists and boot_success is 0, to activate
# boot counting behavior
if [ -n "${boot_counter}" -a "${boot_success}" = "0" ]; then
  # If boot_counter has expired (0 or -1), select rollback deployment
  # (default=1)
  if [ "${boot_counter}" = "0" -o "${boot_counter}" = "-1" ]; then
    set default=1   # Rollback to previous OSTree deployment (second entry)
    set boot_counter=-1  # Stop decrementing further
  else
    # Otherwise, decrement boot_counter and try the new kernel again
    decrement boot_counter
  fi
  save_env boot_counter
fi

# Reset boot_success for the current boot attempt to 0
set boot_success=0
save_env boot_success

# Set timeout for kernel failure detection (30 seconds)
set timeout=30

# If control is not passed to systemd within 30 seconds, reboot
if [ "$?" -ne 0 ]; then
  echo "Kernel failed to load, rebooting in 30 seconds..."
  sleep 30  # Wait for 30 seconds
  reboot
fi
```
Can someone help me figure out what I am missing? I am out of options
now.

Thanks
Sayan Paul

