4.7 Troubleshooting Boot Failures: Recovery Mode and Rescue Boot

Right, so it’s all gone sideways. The machine is on, but nothing friendly is happening. Maybe you’re staring at a black screen with a blinking cursor, a cryptic error message, or worse, your manufacturer’s logo staring back at you, mocking your inability to actually do anything. Don’t panic. This isn’t a catastrophe; it’s a conversation. The hardware is talking, it’s just speaking in a language of beep codes, LED blink patterns, and text-mode errors. Our job is to listen and then talk back in a way it understands.

First, a dose of reality. If your machine won’t POST at all—no lights, no fans, nothing—this chapter isn’t your first stop. Check your power cable. Seriously. I’ve “troubleshot” for an hour only to find the power strip was switched off. If it’s not that simple, you’re likely dealing with a hardware failure: RAM, CPU, motherboard. But if you’re getting some sign of intelligent life before the operating system loads, you’re in the right place.

The Power of Last Known Good

Sometimes, the problem is simple: you just installed something gnarly. A kernel module that doesn’t play nice, a new driver that claims compatibility but lies, or a lib file you overwrote because a forum post from 2008 told you to. Modern systems anticipate this stupidity (yours or a package maintainer’s, we don’t judge).

GRUB to the rescue: If you’re using GRUB (and you probably are on Linux), hold the Shift key (on BIOS systems) or tap Esc (on UEFI systems) right after the firmware hands off control. This interrupts the default boot and brings up the menu. If you don’t see it, you pressed too late: reboot and start mashing the key before your manufacturer’s logo appears. Here, you can select an older kernel version. The genius part is the “Advanced options” menu, which on Debian and Ubuntu contains a “recovery mode” entry for each kernel.

Selecting a recovery mode option boots you into a minimal environment as root, with your main filesystem mounted read-only. This is your surgical table. From here, you can uninstall the broken package, check logs, and fix filesystem errors without the offending software even being active.

# Recovery mode starts with the root filesystem read-only. If you suspect
# filesystem damage, check it now (if it's /dev/sda2, for example).
# Never run fsck on a filesystem that is mounted read-write; if it reports
# that it changed anything, reboot before doing more.
fsck -y /dev/sda2

# To make any other changes, remount the root filesystem as read-write:
mount -o remount,rw /

# Now you can, for instance, purge the suspect kernel module or driver package:
apt purge nvidia-driver-550

# After fixing, reboot
reboot
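
Before you purge anything, it helps to know what actually changed. Here is a minimal sketch, assuming a Debian or Ubuntu system where dpkg keeps its history in /var/log/dpkg.log with the usual DATE TIME ACTION PACKAGE layout. The helper name recent_changes is made up for this example; it is not a standard tool.

```shell
#!/bin/sh
# recent_changes [LOGFILE] [COUNT]
# Print the last COUNT (default 10) install/upgrade events from a dpkg-style log.
# Assumption: each line looks like "DATE TIME ACTION PACKAGE OLD-VERSION NEW-VERSION".
recent_changes() {
    log=${1:-/var/log/dpkg.log}
    count=${2:-10}
    # Keep only install/upgrade lines and drop the version columns.
    awk '$3 == "install" || $3 == "upgrade" { print $1, $2, $3, $4 }' "$log" \
        | tail -n "$count"
}

# Usage (hypothetical): show the 20 most recent package changes
#   recent_changes /var/log/dpkg.log 20
```

The newest entries come last, so the package that broke your boot is probably near the bottom of the output.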

When GRUB Itself is the Problem

What if you don’t get a GRUB menu? What if you get a grub> prompt? This means GRUB itself loaded but couldn’t find its configuration file (grub.cfg). (If the prompt says grub rescue> instead, things are one step worse: GRUB couldn’t even find its own modules. The same commands apply.) This is often fixable without a rescue disk, right from the prompt. Your goal is to locate the partition that contains /boot/grub and tell GRUB where it is, so it can read its config and show you the menu.

This is where it feels like wizardry. You’re essentially booting the system manually to then re-install GRUB automatically.

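# At the grub> prompt, list the disks and partitions GRUB can see
ls

# Probe candidates until one shows GRUB's files; (hd0,gpt2) is only an example
ls (hd0,gpt2)/boot/grub

# Set the root device to the partition you just found
set root=(hd0,gpt2)

# Set the prefix (path to the /boot/grub directory on that partition)
set prefix=(hd0,gpt2)/boot/grub

# Now, load the normal module, which will read the config and give you the menu
insmod normal
normal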

If that works and you boot successfully, you must immediately re-install GRUB to your disk’s master boot record (MBR) or EFI partition to make the fix permanent.

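# Use the one grub-install line that matches your firmware.
# BIOS/MBR: install to the whole disk, never to a partition such as /dev/sda1
sudo grub-install /dev/sda
# UEFI: point GRUB at the mounted ESP (assumed here to be /boot/efi) instead
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi
# Regenerate grub.cfg
sudo update-grub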
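
Not sure which firmware flavour you booted with? The kernel only creates /sys/firmware/efi when it was started through UEFI, so you can ask it. This is a small sketch, not a definitive tool: boot_mode and suggest_grub_install are hypothetical helper names, and the suggested commands assume an x86-64 machine with the ESP mounted at /boot/efi.

```shell
#!/bin/sh
# boot_mode [SYSFS_ROOT]
# Prints "uefi" if the kernel exposes EFI firmware data, otherwise "bios".
# SYSFS_ROOT defaults to /sys; it is a parameter only to make testing easy.
boot_mode() {
    sysfs=${1:-/sys}
    if [ -d "$sysfs/firmware/efi" ]; then
        echo uefi
    else
        echo bios
    fi
}

# suggest_grub_install DISK [SYSFS_ROOT]
# Prints (does not run) a grub-install command matching the boot mode.
# Assumptions: x86-64 hardware, ESP mounted at /boot/efi.
suggest_grub_install() {
    disk=$1
    case $(boot_mode "${2:-/sys}") in
        uefi) echo "grub-install --target=x86_64-efi --efi-directory=/boot/efi" ;;
        bios) echo "grub-install --target=i386-pc $disk" ;;
    esac
}

# Usage (hypothetical):
#   suggest_grub_install /dev/sda
```

It only prints the command. Read it, make sure it matches your disk, then run it yourself with sudo.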

The Nuclear Option: Live USB & chroot

If you can’t get a recovery mode to work, or you’ve borked your GRUB installation beyond a quick fix, it’s time to break out the big guns: a Live USB. This is a full operating system on a stick, and it’s your lifeline. You’ll boot from it, mount your broken system’s drives, and chroot into them to fix the problem from the inside. This is the gold standard for serious repair.

The process is methodical:

1. Boot from the Live USB.
2. Identify your root partition and, if separate, your boot/EFI partition.
3. Mount them in a logical structure under /mnt.
4. Bind mount the virtual filesystems so your chroot environment has access to your hardware and processes.
5. chroot in and fix whatever is broken.
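
Identifying the EFI partition is the part people get wrong, so here is one way to take the guesswork out of it. On a GPT disk, the EFI System Partition carries a fixed partition-type GUID, and lsblk can report it. This is a sketch under assumptions: find_esp is a made-up helper, it expects lsblk's key="value" output with NAME as the first column, and it will not spot an ESP on an MBR-partitioned disk.

```shell
#!/bin/sh
# GPT partition-type GUID that marks an EFI System Partition.
ESP_GUID="c12a7328-f81f-11d2-ba4b-00a0c93ec93b"

# find_esp: read `lsblk -P` output on stdin and print the NAME of every
# partition whose PARTTYPE matches the ESP GUID (case-insensitive).
find_esp() {
    grep -i "PARTTYPE=\"$ESP_GUID\"" \
        | sed -n 's/^NAME="\([^"]*\)".*/\1/p'
}

# Usage on a real system (needs lsblk from util-linux):
#   lsblk -pPo NAME,FSTYPE,PARTTYPE | find_esp
```

Whatever it prints is your candidate for the EFI partition; the root partition is usually the large ext4 or btrfs one, but confirm that by eye.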

# Step 2: Use lsblk or fdisk -l to identify partitions. Let's assume:
# Root: /dev/nvme0n1p5
# EFI:  /dev/nvme0n1p1

# Step 3: Mount the root partition
sudo mount /dev/nvme0n1p5 /mnt

# Mount the EFI partition inside the root
sudo mount /dev/nvme0n1p1 /mnt/boot/efi

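# Step 4: Bind the virtual filesystems
sudo mount --bind /dev /mnt/dev
sudo mount --bind /dev/pts /mnt/dev/pts
sudo mount --bind /proc /mnt/proc
sudo mount --bind /sys /mnt/sys
# UEFI only: a plain bind of /sys does not carry submounts along, so expose
# the firmware variables too (grub-install uses them to register a boot entry)
sudo mount --bind /sys/firmware/efi/efivars /mnt/sys/firmware/efi/efivars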

# Step 5: chroot into your system
sudo chroot /mnt

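# You are now inside your broken system. You can fix GRUB, reconfigure packages, etc.
# This layout has an EFI partition, so install for UEFI
# (on a BIOS system you would run: grub-install /dev/nvme0n1)
grub-install --target=x86_64-efi --efi-directory=/boot/efi
update-grub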

# Exit the chroot, unmount everything, and reboot
exit
sudo umount -l /mnt/dev/pts
sudo umount -l /mnt/dev
sudo umount -l /mnt/proc
sudo umount -l /mnt/sys
sudo umount /mnt/boot/efi
sudo umount /mnt
sudo reboot

The key insight here is that chroot changes the root directory that the running process sees. By doing this from a Live environment, we trick our repair tools into thinking your broken system’s / is the actual /. So when we reinstall GRUB, it uses your installed system’s own GRUB files, configuration, and mounted EFI partition, and writes the result to your actual disk, not the Live USB’s temporary one.
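
Because the chroot only sees what is mounted under the new root, order matters: a nested mount point like /mnt/boot/efi or /mnt/dev/pts only exists once its parent is mounted, and teardown has to run in reverse. The sketch below prints the sequence instead of running it, so you can check device names before touching anything. plan_chroot and plan_teardown are hypothetical helpers, and the device paths in the usage line are the example ones from above.

```shell
#!/bin/sh
# plan_chroot ROOT_DEV EFI_DEV [TARGET]
# Print the mount commands for a rescue chroot, parents before children.
# Nothing is executed; TARGET defaults to /mnt.
plan_chroot() {
    root_dev=$1
    efi_dev=$2
    target=${3:-/mnt}
    echo "mount $root_dev $target"
    echo "mount $efi_dev $target/boot/efi"
    for fs in dev dev/pts proc sys; do
        echo "mount --bind /$fs $target/$fs"
    done
    echo "chroot $target"
}

# plan_teardown [TARGET]
# Print the matching umount commands, children before parents.
plan_teardown() {
    target=${1:-/mnt}
    for fs in sys proc dev/pts dev boot/efi; do
        echo "umount $target/$fs"
    done
    echo "umount $target"
}

# Usage (example device names, adjust to your disk):
#   plan_chroot /dev/nvme0n1p5 /dev/nvme0n1p1
#   plan_teardown
```

Run the printed lines by hand with sudo once they look right. Printing first is a deliberate choice: a typo in a device name is much cheaper to catch on screen than after a mount.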

Remember, the goal isn’t to avoid breaking things—that’s impossible if you’re actually exploring. The goal is to know exactly how to fix them when you do. Consider this your first real rite of passage. Welcome to the club.