Chapter 42: Debugging
In this chapter, you will learn:
- Why kernel debugging is harder than user-space debugging
- How to use printf-style debugging via the UART serial port
- How to set up and use GDB for kernel debugging with QEMU
- How to use the QEMU monitor and its debugging commands
- How to read ARM64 exception syndrome registers for fault diagnosis
- How to implement a simple kernel panic handler with register dump
42.1 The Challenge of Kernel Debugging
Debugging a kernel is fundamentally harder than debugging a user-space program. Several factors contribute:
- No process boundary: the kernel has no OS to catch its crashes. A kernel bug crashes the entire system.
- No standard library: printf, assert, and other familiar debugging aids are not available until you implement them.
- No address space isolation: a rogue pointer in kernel code can corrupt any memory.
- Interrupt context: code that runs in an interrupt handler must not block or sleep, so it cannot take sleeping locks (spinlocks are still usable).
- Hardware state: CPU system registers, MMU configuration, and peripheral state are not visible to ordinary C code without dedicated instructions or a debugger.
42.2 printf Debugging via UART
The simplest and most effective debugging technique is printing messages to the UART serial port. Every kernel chapter in this book uses this technique. The UART output appears in the terminal where QEMU runs:
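/* Minimal printf for kernel debugging (see Chapter 33 for UART driver).
 * Assumes the kernel supplies its own freestanding vsnprintf; output longer
 * than the 256-byte buffer is truncated. */
#include <stdarg.h>
void kprintf(const char *fmt, ...) {
    char buf[256];
    va_list args;
    va_start(args, fmt);
    vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);
    uart_send_string(buf);
}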
/* Usage throughout the kernel */
kprintf("timer: counter frequency %lu Hz\n", get_timer_freq());
kprintf("scheduler: starting PID %d\n", new_pid);
kprintf("ERROR: out of memory in kmalloc(%lu)\n", size);
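Because the kernel has no C library, the formatter behind kprintf has to be written by hand. As a minimal sketch of the building blocks such a formatter needs, the two helpers below convert a 64-bit value to hexadecimal and decimal text. The names fmt_hex64 and fmt_dec64 are hypothetical and are not part of this book's kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Write v as exactly 16 lowercase hex digits plus a NUL terminator.
 * out must have room for 17 bytes. Hypothetical helper. */
void fmt_hex64(uint64_t v, char out[17]) {
    static const char digits[] = "0123456789abcdef";
    for (int i = 15; i >= 0; i--) {
        out[i] = digits[v & 0xF];   /* lowest nibble fills the rightmost slot */
        v >>= 4;
    }
    out[16] = '\0';
}

/* Write v in decimal and return the number of characters written
 * (excluding the NUL). out must have room for 21 bytes: 20 digits for
 * UINT64_MAX plus the terminator. Hypothetical helper. */
size_t fmt_dec64(uint64_t v, char out[21]) {
    char tmp[20];
    size_t n = 0;
    do {                            /* runs at least once, so 0 prints as "0" */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];    /* digits were produced in reverse order */
    out[n] = '\0';
    return n;
}
```

Neither helper calls into a library, so both can run before any other kernel service is available.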
Guidelines for effective printf debugging:
- Always include the subsystem name as a prefix (e.g., "mmu:", "sched:", "uart:")
- Use different log levels (INFO, WARN, ERROR) that can be enabled or disabled at compile time
- Print function entry/exit when tracking down crashes
- Print the values of key variables at decision points
/* Log level macros for compile-time filtering */
#define LOG_ERROR 0
#define LOG_WARN 1
#define LOG_INFO 2
#define LOG_DEBUG 3
#ifndef LOG_LEVEL
#define LOG_LEVEL LOG_INFO
#endif
#define log(level, fmt, ...) do { \
    if ((level) <= LOG_LEVEL) { \
        kprintf("%s: " fmt "\n", #level, ##__VA_ARGS__); \
    } \
} while (0)
/* Usage */
log(LOG_INFO, "mmu: page table at %p", (void *)page_table);
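To see the level filter in action without booting the kernel, the sketch below can be compiled on a host machine. It is an illustration under stated assumptions: kprintf_stub stands in for kprintf and appends to a buffer instead of writing to the UART, and KLOG, log_level_name, and log_demo are hypothetical names. Like the macro above, it relies on the GNU ##__VA_ARGS__ extension.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define LOG_ERROR 0
#define LOG_WARN  1
#define LOG_INFO  2
#define LOG_DEBUG 3

#ifndef LOG_LEVEL
#define LOG_LEVEL LOG_INFO
#endif

/* Host-side stand-in for kprintf: appends to log_buf instead of the UART. */
char   log_buf[512];
size_t log_len;

void kprintf_stub(const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(log_buf + log_len, sizeof(log_buf) - log_len, fmt, args);
    va_end(args);
    if (n > 0) {
        size_t room = sizeof(log_buf) - log_len - 1;
        log_len += ((size_t)n < room) ? (size_t)n : room;  /* clamp on truncation */
    }
}

const char *log_level_name(int level) {
    switch (level) {
    case LOG_ERROR: return "ERROR";
    case LOG_WARN:  return "WARN";
    case LOG_INFO:  return "INFO";
    case LOG_DEBUG: return "DEBUG";
    default:        return "?";
    }
}

/* Level and subsystem are separate arguments so every message gets a
 * uniform "[LEVEL] subsystem: " prefix. */
#define KLOG(level, subsys, fmt, ...) do {                              \
    if ((level) <= LOG_LEVEL) {                                         \
        kprintf_stub("[%s] %s: " fmt "\n",                              \
                     log_level_name(level), subsys, ##__VA_ARGS__);     \
    }                                                                   \
} while (0)

void log_demo(void) {
    log_len = 0;
    log_buf[0] = '\0';
    KLOG(LOG_INFO, "mmu", "page table at 0x%lx", 0x40080000UL);
    KLOG(LOG_DEBUG, "sched", "tick %d", 7);  /* dropped when LOG_LEVEL is LOG_INFO */
}
```

Because the comparison uses compile-time constants, an optimizing compiler can remove the disabled calls entirely, so verbose DEBUG logging costs nothing in a build that sets a lower LOG_LEVEL.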
42.3 GDB with QEMU
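QEMU has a built-in GDB stub for remote debugging. Start QEMU with the -s flag (shorthand for -gdb tcp::1234) to open the GDB server on port 1234, and add -S so the CPU stays halted until the debugger tells it to continue. Without -S the kernel starts running immediately and may pass early breakpoints before GDB connects. Then connect with a cross-debugger:
# Terminal 1: start QEMU with the GDB server, CPU halted at reset
qemu-system-aarch64 -M virt -cpu cortex-a72 -nographic \
    -kernel kernel.elf -s -S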
# Terminal 2: connect GDB
aarch64-none-elf-gdb kernel.elf
(gdb) target remote localhost:1234
(gdb) break kernel_main
(gdb) continue
(gdb) stepi
(gdb) info registers
(gdb) monitor system_reset # QEMU monitor commands via GDB
Key GDB commands for kernel debugging:
| Command | Purpose |
|---|---|
| target remote :1234 | Connect to QEMU's GDB stub |
| break kernel_main | Set a breakpoint on a function |
| break *0x40080000 | Set a breakpoint at an address |
| stepi | Execute one instruction |
| info registers | View all CPU registers |
| info reg esr_el1 | View a specific system register |
| x/16gx 0x40000000 | Examine 16 eight-byte words of memory in hex |
| layout asm | Open the assembly TUI view |
| layout src | Open the source TUI view |
| monitor info registers | QEMU monitor: dump all CPU registers |
| monitor info cpus | QEMU monitor: list virtual CPUs and their state |
42.4 QEMU Monitor
When running with -nographic, press Ctrl-A C to enter the QEMU monitor. The monitor provides debugging commands without GDB:
QEMU 8.0.0 monitor - type 'help' for more information
(qemu) info registers
(qemu) info cpus
(qemu) info irq
(qemu) info mtree # Show memory tree (mapped devices)
(qemu) system_reset # Reset the VM
(qemu) system_powerdown # Shut down
(qemu) stop # Pause execution
(qemu) cont # Resume execution
(qemu) quit
To use the monitor alongside GDB, give the serial console and the monitor separate channels, for example the serial port on stdio and the monitor on a telnet port:
qemu-system-aarch64 -M virt -cpu cortex-a72 \
    -nographic \
    -serial stdio \
    -monitor telnet::45454,server,nowait \
    -kernel kernel.elf -s
# Connect to monitor in another terminal:
telnet localhost 45454
42.5 Exception Syndrome Decoding
When an exception is taken to EL1, the CPU records its cause in ESR_EL1 (Exception Syndrome Register). Bits [31:26] hold the exception class (EC), bit 25 (IL) gives the length of the trapped instruction, and bits [24:0] hold the instruction-specific syndrome (ISS), whose layout depends on the EC. Understanding ESR_EL1 is essential for debugging crashes:
/* Exception class field (ESR_EL1 bits [31:26]) */
#define EC_UNKNOWN      0b000000
#define EC_SVC64        0b010101
#define EC_IABORT_EL0   0b100000 /* Instruction abort from a lower EL */
#define EC_IABORT_EL1   0b100001 /* Instruction abort, same EL */
#define EC_DABORT_EL0   0b100100 /* Data abort from a lower EL */
#define EC_DABORT_EL1   0b100101 /* Data abort, same EL */
#define EC_SP_ALIGN     0b100110 /* SP alignment fault */
#define EC_FP_EXCEPTION 0b101100 /* Trapped floating-point exception (AArch64) */
#define EC_SError       0b101111 /* SError interrupt */
/* Decode and print exception information (called from sync handler) */
void dump_exception_info(void) {
    uint64_t esr, elr, far, spsr;
    asm volatile("mrs %0, esr_el1" : "=r"(esr));
    asm volatile("mrs %0, elr_el1" : "=r"(elr));
    asm volatile("mrs %0, far_el1" : "=r"(far));
    asm volatile("mrs %0, spsr_el1" : "=r"(spsr));
    uint32_t ec = (esr >> 26) & 0x3F;
    uint32_t iss = esr & 0x1FFFFFF; /* ISS is bits [24:0] */
    kprintf("=== EXCEPTION DUMP ===\n");
    kprintf("ESR_EL1: %016lx EC=0x%02x ISS=0x%07x\n", esr, ec, iss);
    kprintf("ELR_EL1: %016lx\n", elr);
    kprintf("FAR_EL1: %016lx\n", far);
    kprintf("SPSR_EL1: %016lx\n", spsr);
    /* Decode exception class */
    switch (ec) {
    case EC_IABORT_EL1:
        kprintf("Cause: Instruction Abort at EL1\n");
        kprintf(" FAR holds faulting address\n");
        /* ISS[5:0] is the instruction fault status code */
        kprintf(" IFSC (ISS[5:0]): 0x%02x\n", iss & 0x3F);
        break;
    case EC_DABORT_EL1:
        kprintf("Cause: Data Abort at EL1\n");
        kprintf(" FAR holds access address\n");
        /* ISS[6] is WnR: 1 = write, 0 = read */
        kprintf(" WnR (ISS[6]): %s\n", (iss & 64) ? "write" : "read");
        kprintf(" DFSC (ISS[5:0]): 0x%02x\n", iss & 0x3F);
        break;
    case EC_SVC64:
        kprintf("Cause: SVC (system call) from AArch64\n");
        break;
    case EC_UNKNOWN:
        kprintf("Cause: Unknown exception\n");
        break;
    default:
        kprintf("Cause: Other (EC=0x%02x)\n", ec);
        break;
    }
}
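As a worked example of the field layout, consider the syndrome value 0x96000045. The sketch below separates field extraction from printing so the decoding can be checked on a host machine. The helper names (esr_ec, esr_il, esr_iss, dabort_is_write, abort_fsc, fsc_name) are hypothetical, and fsc_name covers only the most common fault status codes.

```c
#include <stdint.h>

/* ESR_ELx fields: EC = bits [31:26], IL = bit 25, ISS = bits [24:0]. */
static inline uint32_t esr_ec(uint64_t esr)  { return (uint32_t)((esr >> 26) & 0x3F); }
static inline uint32_t esr_il(uint64_t esr)  { return (uint32_t)((esr >> 25) & 0x1); }
static inline uint32_t esr_iss(uint64_t esr) { return (uint32_t)(esr & 0x1FFFFFF); }

/* Data abort ISS: WnR = bit 6 (1 = write). Fault status code = bits [5:0]
 * for both instruction aborts (IFSC) and data aborts (DFSC). */
static inline uint32_t dabort_is_write(uint32_t iss) { return (iss >> 6) & 1; }
static inline uint32_t abort_fsc(uint32_t iss)       { return iss & 0x3F; }

/* Name the common fault status codes. For the first four groups the low
 * two bits of the code give the translation table level. */
const char *fsc_name(uint32_t fsc) {
    switch (fsc >> 2) {
    case 0x0: return "address size fault";
    case 0x1: return "translation fault";
    case 0x2: return "access flag fault";
    case 0x3: return "permission fault";
    }
    if (fsc == 0x10) return "synchronous external abort";
    if (fsc == 0x21) return "alignment fault";
    return "other";
}
```

For 0x96000045 these helpers give EC = 0x25 (data abort taken from the same exception level), IL = 1, and ISS = 0x45. Within the ISS, WnR = 1 and the fault status code is 0x05, so the fault was a write that hit a level 1 translation fault. This is the pattern an unmapped address, such as a NULL pointer write, typically produces.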
42.6 Kernel Panic Handler
A panic handler prints diagnostic information and halts the system when an unrecoverable error occurs:
/* Kernel panic: print diagnostic info and halt */
void __attribute__((noreturn)) panic(const char *msg, ...) {
    /* Mask all interrupts and aborts (DAIF = D, A, I, F) */
    asm volatile("msr daifset, #0xf");
    kprintf("\n\n*** KERNEL PANIC ***\n");
    va_list args;
    va_start(args, msg);
    kvprintf(msg, args); /* va_list variant of kprintf */
    va_end(args);
    kprintf("\n");
    /* Dump exception info (meaningful only when panic is reached from an exception handler) */
    dump_exception_info();
    /* Dump stack trace */
    kprintf("Stack trace:\n");
    dump_stack_trace();
    /* Dump register state. read_reg is assumed to be a macro around inline
     * asm; by this point the argument registers hold panic()'s own
     * arguments, not the values at the time of the fault. */
    kprintf("x0-x3: %016lx %016lx %016lx %016lx\n",
            read_reg("x0"), read_reg("x1"),
            read_reg("x2"), read_reg("x3"));
    /* ... dump all registers ... */
    kprintf("=== SYSTEM HALTED ===\n");
    /* Infinite loop with WFI */
    while (1) {
        asm volatile("wfi");
    }
}
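Registers read from inside panic() describe the state of panic() itself, because the compiler has already reused them for the function's arguments and locals. One common approach, sketched here as an assumption and not as this kernel's actual layout, is for the exception entry stub to save x0 to x30 into a structure on the stack and pass a pointer to it. The struct trap_frame and format_trap_frame below are hypothetical names, and the host C library's snprintf stands in for the kernel's formatter.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout: the exception vector stub is assumed to store
 * x0..x30 in order, followed by ELR_EL1 and SPSR_EL1. */
struct trap_frame {
    uint64_t x[31];   /* x0..x30 as they were when the exception was taken */
    uint64_t elr;
    uint64_t spsr;
};

/* Format the saved general-purpose registers, four per line, into buf.
 * Returns the number of bytes written (excluding the NUL); output is
 * truncated to fit n bytes. */
size_t format_trap_frame(const struct trap_frame *tf, char *buf, size_t n) {
    size_t off = 0;
    if (n == 0)
        return 0;
    buf[0] = '\0';
    for (int i = 0; i < 31 && off < n - 1; i++) {
        int w = snprintf(buf + off, n - off, "x%d=%016llx%c", i,
                         (unsigned long long)tf->x[i],
                         (i % 4 == 3 || i == 30) ? '\n' : ' ');
        if (w < 0)
            break;
        /* snprintf reports the untruncated length, so clamp on overflow */
        off += ((size_t)w < n - off) ? (size_t)w : n - off - 1;
    }
    return off;
}
```

In a kernel, the sync handler would receive the frame pointer from the assembly stub and hand the formatted text to kprintf before calling panic().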
42.7 Assertions and Sanity Checks
Assertions catch programming errors early. Our kernel provides assert() and ASSERT() macros:
/* Kernel assertion macro */
#define ASSERT(cond) do { \
    if (!(cond)) { \
        panic("ASSERTION FAILED: %s at %s:%d\n", \
              #cond, __FILE__, __LINE__); \
    } \
} while (0)
/* Check for common kernel invariants */
ASSERT(irq_count >= 0);
ASSERT(current_process != NULL);
ASSERT(preempt_count >= 0);
ASSERT(page_table != NULL);
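Assertions are usually compiled out of release builds. The sketch below shows one way to do that, assuming a hypothetical KERNEL_DEBUG build switch. The macro is named KASSERT to avoid suggesting it is the kernel's own ASSERT, and assert_failed is a host-side stand-in for panic() that records the failure instead of halting, so the behaviour can be tested.

```c
#include <stdio.h>

#ifndef KERNEL_DEBUG
#define KERNEL_DEBUG 1   /* hypothetical switch; build with -DKERNEL_DEBUG=0 to disable */
#endif

/* Host-side stand-in for panic(): records the failure instead of halting. */
char last_failure[128];
int  failure_count;

void assert_failed(const char *expr, const char *file, int line) {
    snprintf(last_failure, sizeof(last_failure), "%s at %s:%d", expr, file, line);
    failure_count++;
}

#if KERNEL_DEBUG
#define KASSERT(cond) do {                              \
    if (!(cond))                                        \
        assert_failed(#cond, __FILE__, __LINE__);       \
} while (0)
#else
/* sizeof keeps the expression type-checked without evaluating it. */
#define KASSERT(cond) ((void)sizeof(!(cond)))
#endif

/* Example invariant: a reference count must be positive before release. */
int release_ref(int refcount) {
    KASSERT(refcount > 0);
    return refcount - 1;
}
```

Keeping the disabled form type-checked means a condition that stops compiling is caught even in release builds, while side effects in the condition are never executed there. For that reason assertion conditions should never contain side effects.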
42.8 Our Implementation
Our kernel includes the following debugging infrastructure:
- kprintf: formatted output to UART with log level filtering
- Kernel panic handler: register dump, exception decode, stack trace, halt
- ASSERT macros: compile-time enabled assertions in <debug.h>
- EL1 sync handler: automatic exception dump on crashes (Chapter 11)
- GDB support: QEMU -s flag, kernel.elf with debug symbols (-g flag)
- Stack trace: frame-pointer-based backtrace in AArch64
- Memory corruption detection: canary values at heap block boundaries
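To enable debug symbols in the kernel, build with CFLAGS += -g -O0. The -g flag emits debug information into kernel.elf so that GDB can map addresses back to source lines, and -O0 turns off optimizations so that single-stepping follows the source predictably. Use -O2 for release builds.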
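The canary-based corruption check listed above can be sketched as follows. This is an illustration and not the kernel's actual allocator: the host malloc and free stand in for the kernel heap, and dbg_alloc, dbg_check, dbg_free, and the canary constants are hypothetical names and values.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CANARY_HEAD 0xDEADC0DECAFEF00DULL   /* arbitrary marker values */
#define CANARY_TAIL 0xFEEDFACE0BADF00DULL

struct block_hdr {
    uint64_t canary;   /* guards against writes before the payload */
    size_t   size;     /* payload size in bytes */
};

/* Allocate size bytes with a canary before and after the payload. */
void *dbg_alloc(size_t size) {
    struct block_hdr *h = malloc(sizeof(*h) + size + sizeof(uint64_t));
    uint64_t tail = CANARY_TAIL;
    if (!h)
        return NULL;
    h->canary = CANARY_HEAD;
    h->size = size;
    /* The tail may be unaligned, so copy it byte-wise. */
    memcpy((unsigned char *)(h + 1) + size, &tail, sizeof(tail));
    return h + 1;
}

/* Returns 0 if both canaries are intact, -1 if the head canary is damaged
 * (underflow), or -2 if the tail canary is damaged (overflow). */
int dbg_check(const void *p) {
    const struct block_hdr *h = (const struct block_hdr *)p - 1;
    uint64_t tail;
    if (h->canary != CANARY_HEAD)
        return -1;
    memcpy(&tail, (const unsigned char *)p + h->size, sizeof(tail));
    return (tail == CANARY_TAIL) ? 0 : -2;
}

void dbg_free(void *p) {
    if (p)
        free((struct block_hdr *)p - 1);
}
```

A kernel would typically run the check when a block is freed and panic on a mismatch, which reports an overflow close to the code that caused it.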
42.9 Exercises
Exercise 1: Stack Trace
Implement dump_stack_trace() using the frame pointer (x29). In AArch64, each frame record holds the caller's saved frame pointer at [fp] and the saved return address at [fp+8]. Walk the chain and print each return address.
Exercise 2: Debugging with GDB
Set a breakpoint on the sync exception handler. Trigger a deliberate data abort (dereference NULL) and step through the handler. Examine ESR_EL1, FAR_EL1, and ELR_EL1.
Exercise 3: Logging Levels
Implement a run-time log level that can be changed via a sysctl or a syscall. Add a user-space tool to set the current debug level.
42.10 Summary
Kernel debugging requires a multi-layered approach. printf-style logging via UART is the simplest and most widely used technique. GDB with QEMU provides instruction-level single-stepping and memory inspection. The QEMU monitor offers low-level hardware state inspection. ESR_EL1 decoding reveals the class and cause of an exception. A robust panic handler with register dumps and stack traces is essential for diagnosing unrecoverable errors. Together, these tools make kernel development practical despite the lack of a traditional debugging environment.