ARM64 OS Handbook

Chapter 42: Debugging

What You Will Learn in This Chapter
  • Why kernel debugging is harder than user-space debugging
  • How to use printf-style debugging via the UART serial port
  • How to set up and use GDB for kernel debugging with QEMU
  • How to use the QEMU monitor and its debugging commands
  • How to read ARM64 exception syndrome registers for fault diagnosis
  • How to implement a simple kernel panic handler with register dump

42.1 The Challenge of Kernel Debugging

Debugging a kernel is fundamentally harder than debugging a user-space program. Several factors contribute:

  • No process boundary: the kernel has no OS to catch its crashes. A kernel bug crashes the entire system.
  • No standard library: printf, assert, and other familiar debugging aids are not available until you implement them.
  • No address space isolation: a rogue pointer in kernel code can corrupt any memory.
  • Interrupt context: code that runs in an interrupt handler cannot block or sleep, so it cannot take locks that may sleep (spinlocks are the usual alternative).
  • Hardware state: CPU registers, MMU configuration, and peripheral state are invisible from regular code.

42.2 printf Debugging via UART

The simplest and most effective debugging technique is printing messages to the UART serial port. Every kernel chapter in this book uses this technique. The UART output appears in the terminal where QEMU runs:

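/* Minimal printf for kernel debugging (see Chapter 33 for UART driver).
 * Needs <stdarg.h>. vsnprintf must come from the kernel's own library,
 * and messages longer than 255 characters are truncated. */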
void kprintf(const char *fmt, ...) {
    char buf[256];
    va_list args;
    va_start(args, fmt);
    vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);
    uart_send_string(buf);
}

/* Usage throughout the kernel */
kprintf("timer: counter frequency %lu Hz\n", get_timer_freq());
kprintf("scheduler: starting PID %d\n", new_pid);
kprintf("ERROR: out of memory in kmalloc(%lu)\n", size);
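The kprintf above leans on vsnprintf, which a freestanding kernel has to supply itself. The following is a minimal sketch of such a formatter, not the book's implementation: the names kvsnprintf_min and ksnprintf_min are hypothetical, and only %d, %u, %x, %s, %c, %p, %%, the l modifier, and a numeric width with optional zero padding are handled.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Output cursor: counts every character, stores only what fits. */
struct out { char *buf; size_t cap; size_t len; };

static void out_putc(struct out *o, char c) {
    if (o->len + 1 < o->cap)
        o->buf[o->len] = c;
    o->len++;
}

/* Emit an unsigned value in the given base, left-padded to `width`. */
static void out_num(struct out *o, uint64_t v, unsigned base, int width, char pad) {
    char tmp[32];
    int n = 0;
    do {
        tmp[n++] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v != 0);
    while (width-- > n)
        out_putc(o, pad);
    while (n > 0)
        out_putc(o, tmp[--n]);
}

/* Hypothetical minimal vsnprintf replacement. Returns the length the full
 * output would have had, like vsnprintf. */
int kvsnprintf_min(char *buf, size_t cap, const char *fmt, va_list ap) {
    struct out o = { buf, cap, 0 };
    for (; *fmt; fmt++) {
        if (*fmt != '%') { out_putc(&o, *fmt); continue; }
        fmt++;
        char pad = ' ';
        int width = 0, is_long = 0;
        if (*fmt == '0') { pad = '0'; fmt++; }
        while (*fmt >= '0' && *fmt <= '9')
            width = width * 10 + (*fmt++ - '0');
        if (*fmt == 'l') { is_long = 1; fmt++; }
        switch (*fmt) {
        case 'd': {
            long v = is_long ? va_arg(ap, long) : va_arg(ap, int);
            uint64_t mag = (uint64_t)v;
            /* The sign is written before any padding, which matches
             * printf only for zero padding. */
            if (v < 0) { out_putc(&o, '-'); mag = 0 - mag; width--; }
            out_num(&o, mag, 10, width, pad);
            break;
        }
        case 'u': case 'x': {
            uint64_t v = is_long ? va_arg(ap, unsigned long)
                                 : va_arg(ap, unsigned int);
            out_num(&o, v, *fmt == 'x' ? 16 : 10, width, pad);
            break;
        }
        case 'p':
            out_putc(&o, '0'); out_putc(&o, 'x');
            out_num(&o, (uint64_t)(uintptr_t)va_arg(ap, void *), 16, 0, pad);
            break;
        case 's': {
            const char *s = va_arg(ap, const char *);
            if (!s) s = "(null)";
            while (*s) out_putc(&o, *s++);
            break;
        }
        case 'c': out_putc(&o, (char)va_arg(ap, int)); break;
        case '%': out_putc(&o, '%'); break;
        case '\0': fmt--; break;          /* trailing '%': stop cleanly */
        default:  out_putc(&o, '%'); out_putc(&o, *fmt); break;
        }
    }
    if (cap > 0)
        o.buf[o.len < cap ? o.len : cap - 1] = '\0';
    return (int)o.len;
}

/* Variadic convenience wrapper around the formatter above. */
int ksnprintf_min(char *buf, size_t cap, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = kvsnprintf_min(buf, cap, fmt, ap);
    va_end(ap);
    return n;
}
```

Because the formatter only writes into a caller-supplied buffer, it needs no heap and no locking of its own, which makes it usable very early in boot.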

Guidelines for effective printf debugging:

  • Always include the subsystem name as a prefix (e.g., "mmu:", "sched:", "uart:")
  • Use different log levels (INFO, WARN, ERROR) that can be enabled or disabled at compile time
  • Print function entry/exit when tracking down crashes
  • Print the values of key variables at decision points

/* Log level macros for compile-time filtering */
#define LOG_ERROR 0
#define LOG_WARN  1
#define LOG_INFO  2
#define LOG_DEBUG 3

#ifndef LOG_LEVEL
#define LOG_LEVEL LOG_INFO
#endif

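/* #level prints the macro name itself (e.g. "LOG_INFO").
 * ##__VA_ARGS__ is a GNU extension supported by GCC and Clang. */
#define log(level, fmt, ...) do { \
    if ((level) <= LOG_LEVEL) { \
        kprintf("%s: " fmt "\n", #level, ##__VA_ARGS__); \
    } \
} while (0)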

/* Usage */
log(LOG_INFO, "mmu: page table at %p", page_table);
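The guidelines above (subsystem prefix, compile-time levels, entry/exit tracing) can be combined in one small sketch. Everything here is an assumption made for illustration: the macro is renamed klog, TRACE_ENTER and TRACE_EXIT are hypothetical helpers, map_page is a made-up function, and kprintf is a host-side stand-in that appends to a capture buffer so the example runs outside the kernel.

```c
#include <stdarg.h>
#include <stdio.h>

#define LOG_ERROR 0
#define LOG_WARN  1
#define LOG_INFO  2
#define LOG_DEBUG 3

#ifndef LOG_LEVEL
#define LOG_LEVEL LOG_INFO
#endif

/* Host stand-in for the kernel's kprintf: appends to a capture buffer. */
static char log_capture[512];
static size_t log_used;

static void kprintf(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(log_capture + log_used,
                      sizeof(log_capture) - log_used, fmt, ap);
    va_end(ap);
    if (n > 0) {
        size_t room = sizeof(log_capture) - log_used - 1;
        log_used += ((size_t)n < room) ? (size_t)n : room;
    }
}

/* Same shape as the chapter's log macro; the comparison is on constants,
 * so disabled levels are removed by the compiler. */
#define klog(level, fmt, ...) do { \
    if ((level) <= LOG_LEVEL) \
        kprintf("%s: " fmt "\n", #level, ##__VA_ARGS__); \
} while (0)

/* Entry/exit tracing, only emitted when LOG_LEVEL is LOG_DEBUG. */
#define TRACE_ENTER() klog(LOG_DEBUG, "%s: enter", __func__)
#define TRACE_EXIT()  klog(LOG_DEBUG, "%s: exit", __func__)

/* Hypothetical function showing prefixes and values at decision points. */
int map_page(unsigned long va) {
    TRACE_ENTER();
    klog(LOG_INFO, "mmu: mapping va=0x%lx", va);
    if (va & 0xfff) {
        klog(LOG_ERROR, "mmu: va 0x%lx is not page aligned", va);
        TRACE_EXIT();
        return -1;
    }
    TRACE_EXIT();
    return 0;
}
```

Building with -DLOG_LEVEL=LOG_DEBUG turns the trace lines on without touching the source; with the default level they cost nothing at run time.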

42.3 GDB with QEMU

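QEMU includes a GDB stub (gdbstub) that speaks GDB's remote protocol. Start QEMU with the -s flag (shorthand for -gdb tcp::1234) to enable it, then connect with a cross-debugger. Adding -S makes QEMU wait for the debugger before executing the first instruction, which helps when debugging early boot code: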

# Terminal 1: start QEMU with GDB server
qemu-system-aarch64 -M virt -cpu cortex-a72 -nographic \
    -kernel kernel.elf -s

# Terminal 2: connect GDB
aarch64-none-elf-gdb kernel.elf
(gdb) target remote localhost:1234
(gdb) break kernel_main
(gdb) continue
(gdb) stepi
(gdb) info registers
# QEMU monitor commands can be issued through GDB:
(gdb) monitor system_reset

Key GDB commands for kernel debugging:

Command                   Purpose
target remote :1234       Connect to QEMU's GDB stub
break kernel_main         Set a breakpoint on a function
break *0x40080000         Set a breakpoint at an address
stepi                     Execute one instruction
info registers            View all CPU registers
info reg esr_el1          View a specific system register
x/16gx 0x40000000         Examine 16 memory words as 8-byte hex values
layout asm                Open the assembly TUI view
layout src                Open the source TUI view
monitor info registers    QEMU monitor: show all CPU registers
monitor info cpus         QEMU monitor: list the virtual CPUs
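When GDB is not attached, a view similar to x/16gx can be produced from inside the kernel. The sketch below is one possible way to do it, under stated assumptions: format_words is a hypothetical helper, it formats into a caller-supplied buffer using the host's snprintf so that it can be tried outside the kernel, and in the kernel the result would be handed to kprintf.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format `count` 64-bit words, two per line, each line prefixed with its
 * address, similar to GDB's x/gx layout. `base` is the address printed
 * for words[0]; in the kernel it would simply be the pointer value.
 * Returns the number of characters written, excluding the terminator.
 * Output stops at the last field that fits completely. */
size_t format_words(char *out, size_t cap, uint64_t base,
                    const uint64_t *words, size_t count)
{
    size_t used = 0;

    if (cap == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        char field[48];              /* longest field is 39 characters */
        int n = 0;
        int end_of_line = (i % 2 == 1) || (i + 1 == count);

        if (i % 2 == 0)
            n = snprintf(field, sizeof field, "0x%llx:",
                         (unsigned long long)(base + 8 * i));
        n += snprintf(field + n, sizeof field - (size_t)n, "\t0x%016llx%s",
                      (unsigned long long)words[i],
                      end_of_line ? "\n" : "");

        if (used + (size_t)n >= cap)
            break;                   /* keep only complete fields */
        memcpy(out + used, field, (size_t)n + 1);
        used += (size_t)n;
    }
    return used;
}
```

Formatting into a buffer first, rather than printing word by word, keeps each line intact when several CPUs share the UART.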

42.4 QEMU Monitor

When running with -nographic, press Ctrl-A C to enter the QEMU monitor. The monitor provides debugging commands without GDB:

QEMU 8.0.0 monitor - type 'help' for more information
(qemu) info registers
(qemu) info cpus
(qemu) info irq
(qemu) info mtree       # Show memory tree (mapped devices)
(qemu) system_reset     # Reset the VM
(qemu) system_powerdown # Shut down
(qemu) stop             # Pause execution
(qemu) cont             # Resume execution
(qemu) quit

To use the monitor alongside GDB, enable the GDB stub with -s and expose a second monitor on a telnet port. The serial console (and the Ctrl-A C monitor) stays on stdio:

qemu-system-aarch64 -M virt -cpu cortex-a72 \
    -nographic \
    -serial mon:stdio \
    -monitor telnet::45454,server,nowait \
    -kernel kernel.elf -s

# Connect to monitor in another terminal:
telnet localhost 45454

42.5 Exception Syndrome Decoding

When an exception is taken to EL1, the CPU records its cause in ESR_EL1 (Exception Syndrome Register). Bits [31:26] hold the exception class (EC), bit [25] is the instruction length (IL), and bits [24:0] hold the instruction-specific syndrome (ISS), whose layout depends on the EC. Understanding ESR_EL1 is essential for debugging crashes:

/* Exception class field (ESR_EL1 bits [31:26]) */
#define EC_UNKNOWN         0b000000
#define EC_SVC64           0b010101
#define EC_IABORT_EL0      0b100000  /* Instruction abort, EL0 */
#define EC_IABORT_EL1      0b100001  /* Instruction abort, EL1 */
#define EC_DABORT_EL0      0b100100  /* Data abort, EL0 */
#define EC_DABORT_EL1      0b100101  /* Data abort, EL1 */
#define EC_SP_ALIGN        0b100110  /* SP alignment fault */
#define EC_FP_EXCEPTION    0b101100  /* Trapped floating-point exception, AArch64 */
#define EC_SError          0b101111  /* SError interrupt */

/* Decode and print exception information (called from sync handler) */
void dump_exception_info(void) {
    uint64_t esr, elr, far, spsr;
    asm volatile("mrs %0, esr_el1" : "=r"(esr));
    asm volatile("mrs %0, elr_el1" : "=r"(elr));
    asm volatile("mrs %0, far_el1"  : "=r"(far));
    asm volatile("mrs %0, spsr_el1" : "=r"(spsr));

    uint32_t ec = (esr >> 26) & 0x3F;
    uint32_t iss = esr & 0x1FFFFFF;   /* ISS is 25 bits wide: [24:0] */

    kprintf("=== EXCEPTION DUMP ===\n");
    kprintf("ESR_EL1:  %016lx  EC=0x%02x ISS=0x%07x\n", esr, ec, iss);
    kprintf("ELR_EL1:  %016lx\n", elr);
    kprintf("FAR_EL1:  %016lx\n", far);
    kprintf("SPSR_EL1: %016lx\n", spsr);

    /* Decode exception class */
    switch (ec) {
    case EC_IABORT_EL1:
        kprintf("Cause: Instruction Abort at EL1\n");
        kprintf("  FAR holds faulting address\n");
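        /* For instruction aborts ISS[6] is reserved; the fault status
         * code (IFSC) in ISS[5:0] identifies the fault type and level. */
        kprintf("  ISS[5:0] (IFSC): fault status 0x%02x\n", iss & 0x3F);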
        break;
    case EC_DABORT_EL1:
        kprintf("Cause: Data Abort at EL1\n");
        kprintf("  FAR holds access address\n");
        kprintf("  ISS[6]: %s\n", (iss & 64) ? "write" : "read");
        break;
    case EC_SVC64:
        kprintf("Cause: SVC (system call) from AArch64\n");
        break;
    case EC_UNKNOWN:
        kprintf("Cause: Unknown exception\n");
        break;
    default:
        kprintf("Cause: Other (EC=%u)\n", ec);
    }
}
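For instruction and data aborts, most of the diagnostic value is in the fault status code held in ISS[5:0]. The sketch below shows one way to split an ESR value into its fields and name the common fault status codes. The helper names (esr_split, fsc_name, fsc_level, dabort_is_write) are hypothetical, only the most common codes are covered, and the functions take the ESR value as an argument so they can be exercised without reading the system register.

```c
#include <stdint.h>

struct esr_fields {
    uint32_t ec;    /* exception class, ESR bits [31:26] */
    uint32_t il;    /* instruction length, bit [25]: 1 = 32-bit instruction */
    uint32_t iss;   /* instruction-specific syndrome, bits [24:0] */
};

/* Split a raw ESR_EL1 value into its three top-level fields. */
struct esr_fields esr_split(uint64_t esr) {
    struct esr_fields f;
    f.ec  = (uint32_t)((esr >> 26) & 0x3F);
    f.il  = (uint32_t)((esr >> 25) & 0x1);
    f.iss = (uint32_t)(esr & 0x1FFFFFF);
    return f;
}

/* Name the fault status code in ISS[5:0] of an instruction or data abort. */
const char *fsc_name(uint32_t iss) {
    uint32_t fsc = iss & 0x3F;
    switch (fsc >> 2) {
    case 0x0: return "address size fault";   /* 0b0000LL */
    case 0x1: return "translation fault";    /* 0b0001LL */
    case 0x2: return "access flag fault";    /* 0b0010LL */
    case 0x3: return "permission fault";     /* 0b0011LL */
    }
    if (fsc == 0x10) return "synchronous external abort";
    if (fsc == 0x21) return "alignment fault";
    return "other fault status";
}

/* Translation table level (LL); meaningful only for the four 0b00xxLL
 * groups handled in the switch above. */
unsigned fsc_level(uint32_t iss) {
    return iss & 0x3;
}

/* WnR, ISS[6] of a data abort: 1 = the faulting access was a write. */
int dabort_is_write(uint32_t iss) {
    return (int)((iss >> 6) & 1);
}
```

As a worked example, ESR_EL1 = 0x96000045 splits into EC = 0x25 (data abort at EL1), IL = 1, and ISS = 0x45: a level 1 translation fault caused by a write, which is what a store through an unmapped pointer typically produces.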

42.6 Kernel Panic Handler

A panic handler prints diagnostic information and halts the system when an unrecoverable error occurs:

/* Kernel panic: print diagnostic info and halt */
void __attribute__((noreturn)) panic(const char *msg, ...) {
    /* Mask all exceptions (debug, SError, IRQ, FIQ) so nothing preempts the panic path */
    asm volatile("msr daifset, #0xf" ::: "memory");

    kprintf("\n\n*** KERNEL PANIC ***\n");

    va_list args;
    va_start(args, msg);
    kvprintf(msg, args);   /* va_list variant of kprintf */
    va_end(args);

    kprintf("\n");

    /* Dump exception info. Only meaningful when panic() is reached from an
     * exception handler; otherwise ESR/ELR/FAR hold stale values. */
    dump_exception_info();

    /* Dump stack trace */
    kprintf("Stack trace:\n");
    dump_stack_trace();

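    /* Dump register state. By this point x0-x3 hold panic()'s own arguments
     * and temporaries, not the values at the time of the fault; for those,
     * dump the registers saved by the exception vector instead. */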
    kprintf("x0-x3:  %016lx %016lx %016lx %016lx\n",
            read_reg("x0"), read_reg("x1"),
            read_reg("x2"), read_reg("x3"));
    /* ... dump all registers ... */

    kprintf("=== SYSTEM HALTED ===\n");

    /* Infinite loop with WFI */
    while (1) {
        asm volatile("wfi");
    }
}
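Registers read from inside panic() reflect panic()'s own execution, so a faithful dump has to come from the copy saved on exception entry. The following sketch formats such a saved copy. It assumes a hypothetical struct trap_frame layout (the real one depends on how the exception vector saves registers) and uses the host's vsnprintf into a buffer so that it can be tried outside the kernel; in the kernel each line would go through kprintf.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout of the registers saved by the exception vector. */
struct trap_frame {
    uint64_t x[31];   /* x0..x30 as they were when the exception was taken */
    uint64_t sp;
    uint64_t elr;     /* ELR_EL1: address of the faulting instruction */
    uint64_t spsr;    /* SPSR_EL1: saved processor state */
};

/* Append formatted text at out[used]; returns the new length. Output that
 * does not fit is truncated and the buffer stays NUL-terminated. */
static size_t append(char *out, size_t cap, size_t used, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (used + 1 >= cap)
        return used;                       /* no room left */
    va_start(ap, fmt);
    n = vsnprintf(out + used, cap - used, fmt, ap);
    va_end(ap);
    if (n < 0)
        return used;
    size_t room = cap - used - 1;
    return used + ((size_t)n < room ? (size_t)n : room);
}

/* Format x0..x30 four per line, followed by SP, ELR_EL1 and SPSR_EL1. */
size_t format_trap_frame(char *out, size_t cap, const struct trap_frame *tf)
{
    size_t used = 0;

    for (int i = 0; i < 31; i++) {
        char sep = (i % 4 == 3 || i == 30) ? '\n' : ' ';
        used = append(out, cap, used, "x%-2d %016llx%c",
                      i, (unsigned long long)tf->x[i], sep);
    }
    used = append(out, cap, used, "SP       %016llx\n",
                  (unsigned long long)tf->sp);
    used = append(out, cap, used, "ELR_EL1  %016llx\n",
                  (unsigned long long)tf->elr);
    used = append(out, cap, used, "SPSR_EL1 %016llx\n",
                  (unsigned long long)tf->spsr);
    return used;
}
```

Passing a pointer to the saved frame into the panic path keeps the dump independent of whatever the compiler did with the registers after the fault.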

42.7 Assertions and Sanity Checks

Assertions catch programming errors early. Our kernel provides an ASSERT() macro that calls panic() with the failed condition and its source location:

/* Kernel assertion macro */
#define ASSERT(cond) do { \
    if (!(cond)) { \
        panic("ASSERTION FAILED: %s at %s:%d\n", \
              #cond, __FILE__, __LINE__); \
    } \
} while (0)

/* Check for common kernel invariants */
ASSERT(irq_count >= 0);
ASSERT(current_process != NULL);
ASSERT(preempt_count >= 0);
ASSERT(page_table != NULL);

42.8 Our Implementation

Our kernel includes the following debugging infrastructure:

  • kprintf: formatted output to UART with log level filtering
  • Kernel panic handler: register dump, exception decode, stack trace, halt
  • ASSERT macros: compile-time enabled assertions in <debug.h>
  • EL1 sync handler: automatic exception dump on crashes (Chapter 11)
  • GDB support: QEMU -s flag, kernel.elf with debug symbols (-g flag)
  • Stack trace: frame-pointer-based backtrace in AArch64
  • Memory corruption detection: canary values at heap block boundaries
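The canary idea from the last item can be sketched as follows. This is an illustration under stated assumptions, not the kernel's allocator: HEAP_CANARY is an arbitrary pattern, struct block_header, canary_wrap, and canary_check are hypothetical names, and the caller is assumed to provide a suitably aligned raw region of at least CANARY_OVERHEAD + size bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAP_CANARY 0xDEADC0DEDEADC0DEULL   /* arbitrary pattern (assumption) */

/* Header placed directly in front of the payload. The front canary is the
 * last field so that an underflow hits it before the size. */
struct block_header {
    size_t   size;           /* payload size in bytes */
    uint64_t front_canary;
};

/* Extra bytes needed per block: header plus the rear canary. */
#define CANARY_OVERHEAD (sizeof(struct block_header) + sizeof(uint64_t))

/* Lay out [header][payload][rear canary] in `raw` and return the payload. */
void *canary_wrap(void *raw, size_t size) {
    struct block_header *h = raw;
    uint64_t c = HEAP_CANARY;
    unsigned char *payload = (unsigned char *)(h + 1);

    h->size = size;
    h->front_canary = c;
    /* The rear canary may be unaligned, so copy it byte-wise. */
    memcpy(payload + size, &c, sizeof c);
    return payload;
}

/* Returns 0 if both canaries are intact, -1 if the front canary was
 * overwritten (underflow), -2 if the rear canary was (overflow). */
int canary_check(const void *payload) {
    const struct block_header *h = (const struct block_header *)payload - 1;
    uint64_t rear;

    if (h->front_canary != HEAP_CANARY)
        return -1;
    memcpy(&rear, (const unsigned char *)payload + h->size, sizeof rear);
    return rear == HEAP_CANARY ? 0 : -2;
}
```

A natural place to run the check is when a block is freed, with a failed check routed to panic() so that the corruption is reported close to where it happened.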

To enable debug symbols in the kernel, build with CFLAGS += -g -O0. The -O0 flag turns off optimizations, making GDB single-stepping behave predictably. Use -O2 for release builds.

42.9 Exercises

Exercise 1: Stack Trace

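Implement dump_stack_trace() using the frame pointer (x29). In AArch64, x29 points to a frame record: the caller's saved frame pointer is at [fp] and the saved return address is at [fp+8]. Walk the chain, printing each return address, and stop when the frame pointer is zero or leaves the kernel stack. Build with -fno-omit-frame-pointer so that every function maintains the chain.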
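As a starting point, the walk itself can be written and tested on a host before any register access is added. The sketch below assumes a hypothetical walk_frames helper that receives the initial frame pointer and the stack bounds as arguments; reading x29 and printing the addresses with kprintf are left to the exercise. It also assumes the boot code sets x29 to zero so that the chain terminates.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame record as laid out by the AArch64 procedure call standard. */
struct frame_record {
    const struct frame_record *prev;  /* caller's saved x29, at [fp]        */
    uint64_t                   lr;    /* saved return address, at [fp + 8]  */
};

/* Collect up to `max` return addresses into `out`, starting at `fp`.
 * [lo, hi) are the bounds of the stack being walked. The walk stops at a
 * NULL frame pointer, at a record outside the stack or misaligned, or when
 * the chain fails to move toward higher addresses (which also breaks loops
 * in a corrupted chain). Returns the number of addresses stored. */
size_t walk_frames(const struct frame_record *fp, uintptr_t lo, uintptr_t hi,
                   uint64_t *out, size_t max)
{
    size_t n = 0;

    while (fp != NULL && n < max) {
        uintptr_t a = (uintptr_t)fp;

        if (a < lo || a + sizeof(*fp) > hi || (a & 0x7) != 0)
            break;                      /* not a plausible frame record */
        out[n++] = fp->lr;

        /* The stack grows down, so each caller's record must sit at a
         * higher address than the current one. */
        if (fp->prev != NULL && (uintptr_t)fp->prev <= a)
            break;
        fp = fp->prev;
    }
    return n;
}
```

The bounds and direction checks matter more in a panic path than in normal code, because the stack being walked may be the very thing that was corrupted.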

Exercise 2: Debugging with GDB

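Set a breakpoint on the sync exception handler. Trigger a deliberate data abort, for example by dereferencing NULL once the MMU is on and address 0 is unmapped (with the MMU off on the QEMU virt machine, address 0 is backed by flash and the access may not fault), and step through the handler. Examine ESR_EL1, FAR_EL1, and ELR_EL1.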

Exercise 3: Logging Levels

Implement a run-time log level that can be changed via a sysctl or a syscall. Add a user-space tool to set the current debug level.

42.10 Summary

Kernel debugging requires a multi-layered approach. printf-style logging via UART is the simplest and most used technique. GDB with QEMU provides step-by-step instruction tracing and memory inspection. The QEMU monitor offers low-level hardware state inspection. ESR_EL1 decoding reveals the exact cause of exceptions. A robust panic handler with register dumps and stack traces is essential for diagnosing unrecoverable errors. Together, these tools make kernel development practical despite the lack of a traditional debugging environment.