Chapter 8: The Bootloader

What You Will Learn in This Chapter

What a bootloader is and why we need one
How a bootloader reads the kernel from storage and places it in memory
The structure of a simple bootloader for QEMU virt
How to load an ELF file from a raw disk image
How to pass control to the kernel with the correct CPU state
The difference between a bootloader and a kernel

8.1 What is a Bootloader?

A bootloader is a small program that loads the operating system kernel into memory and starts it. It is the bridge between the firmware (which initializes hardware) and the kernel (which manages the system).

The bootloader solves a chicken-and-egg problem: the kernel is stored on disk, but the disk driver is part of the kernel. How do we load the kernel without the kernel being loaded? The bootloader contains just enough code to read from storage, parse the kernel image format, and jump to it.

A bootloader does these things:

Initialize hardware enough to read the kernel (storage controller, memory, UART for debug output)
Read the kernel image from a storage device (SD card, disk, flash, network)
Parse the kernel format (raw binary, ELF, or a custom format) to know where to place each section in memory
Set up the kernel environment (device tree, boot parameters)
Jump to the kernel entry point at the correct exception level

So far we have been using QEMU's -kernel kernel.elf option, which makes QEMU act as its own bootloader. In this chapter, we will write our own bootloader so we understand exactly what happens between power-on and kernel execution.

8.2 Bootloader vs Kernel

Property	Bootloader	Kernel
Size	Tiny (a few KB)	Large (MB)
Lifetime	Runs once, then exits	Runs until shutdown
Complexity	Minimal	High (scheduler, MMU, drivers, FS)
Storage driver	Minimal (enough to read kernel)	Full driver stack
Memory management	None (loads to fixed addresses)	Complete (paging, heap, VMA)
User interaction	May have menu/console	Shell or GUI
Where it runs	EL2 (or EL1 if no EL2 support)	EL1

The bootloader does only what is necessary to start the kernel, then it is done. It does not manage processes, handle system calls, or schedule tasks. That is the kernel's job.

8.3 Storage on QEMU virt

To load our kernel from a disk, we need a storage device. QEMU virt provides several options:

Device	Interface	How to Add It
SD card (SDHCI)	PCI bus (virt has no built-in SDHCI; 0x09040000 is the secure UART)	`-device sdhci-pci -device sd-card,drive=sd0`
VirtIO block	MMIO at 0x0A000000+	`-device virtio-blk-device,drive=disk0`
PCI storage	PCI bus	`-device ahci,id=ahci`
Flash/NOR	Memory-mapped	`-drive file=flash.img,format=raw,if=pflash`

For simplicity, we will use a raw disk image with our kernel stored at a known offset. The bootloader reads the kernel from this image and loads it into memory.

# Create a 64 MB raw disk image
dd if=/dev/zero of=disk.img bs=1M count=64

# Write our kernel binary at sector 1 (byte offset 512)
dd if=kernel.bin of=disk.img bs=512 seek=1 conv=notrunc

# Run QEMU with the disk image
qemu-system-aarch64 -M virt -cpu cortex-a72 -nographic \
    -drive file=disk.img,format=raw,if=none,id=hd0 \
    -device virtio-blk-device,drive=hd0
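
The layout above means "sector N" always translates to byte offset N * 512 in disk.img. The following is a minimal host-side sketch in C of that translation, assuming 512-byte sectors as in the dd commands. The names sector_to_offset and read_sectors are hypothetical helpers invented for this illustration; a real bootloader would issue block requests to the storage device instead of calling fread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SECTOR_SIZE 512u  /* matches bs=512 in the dd commands above */

/* Hypothetical helper: byte offset of a sector inside a raw disk image. */
static uint64_t sector_to_offset(uint64_t lba) {
    return lba * SECTOR_SIZE;
}

/* Hypothetical host-side stand-in for a block driver: read 'count'
 * sectors starting at sector 'lba' from a raw image file into 'buf'.
 * Returns 0 on success, -1 on a failed seek or a short read. */
static int read_sectors(FILE *img, uint64_t lba, uint32_t count, void *buf) {
    assert(img != NULL && buf != NULL);
    if (fseek(img, (long)sector_to_offset(lba), SEEK_SET) != 0)
        return -1;
    size_t want = (size_t)count * SECTOR_SIZE;
    return fread(buf, 1, want, img) == want ? 0 : -1;
}
```

With the image built above, read_sectors(img, 1, n, buf) would return the first n sectors of kernel.bin, because the kernel was written with seek=1.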

8.4 A Minimal Bootloader

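Our bootloader will be the simplest possible: it runs at EL2, reads the kernel from a known location in memory, and jumps to it at EL1. Because the bootloader itself takes the place of the firmware here, something else must put the kernel image at that location before the bootloader runs. On QEMU, one option is the generic loader device (-device loader,file=kernel.bin,addr=0x50000000).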

On QEMU virt, we can place the bootloader at the beginning of the flash and have it load the kernel from a fixed address. Real hardware more often uses a two-stage boot:

Stage 1: A tiny assembly program at the reset vector that initializes basic hardware and loads stage 2
Stage 2: A slightly larger program that reads the kernel from disk, parses it, and jumps to it

For QEMU virt, we can build stage 2 directly. Here is a minimal bootloader that:

Runs at EL2
Sets up a stack and UART for debug
Copies the kernel from a known memory location to the kernel load address
Sets up the device tree pointer
Drops to EL1 and jumps to the kernel

boot.S

.section .text
.global _start

.equ KERNEL_LOAD_ADDR,  0x40000000
.equ KERNEL_SOURCE_ADDR, 0x50000000  /* staging area: kernel image must already be here */
.equ KERNEL_MAX_SIZE,    0x00100000  /* 1 MB max kernel size */
.equ UART_BASE,          0x09000000

_start:
    /* Set up stack */
    ldr x0, =_stack_end
    mov sp, x0

    /* Initialize UART for debug output */
    bl uart_init
    ldr x0, =boot_msg
    bl uart_puts

    /* Copy kernel from source to load address */
    ldr x0, =KERNEL_SOURCE_ADDR
    ldr x1, =KERNEL_LOAD_ADDR
    ldr x2, =KERNEL_MAX_SIZE
    bl memcpy

    /* Set up device tree */
    ldr x0, =dtb_addr      /* address of device tree provided by firmware */
    ldr x0, [x0]

    /* Set CPU ID */
    mov x1, #0

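    /* Make EL1 execute in AArch64 state (HCR_EL2.RW = 1); without this
       the ERET below is an illegal exception return */
    mov x2, #(1 << 31)
    msr HCR_EL2, x2

    /* Drop to EL1 and jump to kernel */
    mov x2, #0x3C5         /* EL1h, all interrupts masked */
    msr SPSR_EL2, x2
    ldr x2, =KERNEL_LOAD_ADDR
    msr ELR_EL2, x2
    eret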

uart_init:
    /* Minimal UART init for PL011 */
    ldr x0, =UART_BASE
    mov w1, #0
    str w1, [x0, #0x30]     /* disable UART */
    mov w1, #13
    str w1, [x0, #0x24]     /* IBRD */
    mov w1, #1
    str w1, [x0, #0x28]     /* FBRD */
    mov w1, #0x70
    str w1, [x0, #0x2C]     /* LCR_H (8-bit, enable FIFO) */
    mov w1, #0x301
    str w1, [x0, #0x30]     /* enable UART, TX, RX */
    ret

uart_puts:
    ldr x1, =UART_BASE
1:  ldrb w2, [x0], #1
    cbz w2, 2f
    str w2, [x1]            /* write character to UART data register */
    b 1b
2:  ret

memcpy:
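    /* x0 = source, x1 = destination, x2 = size in bytes (must be a
       multiple of 8). Note that the argument order differs from C's
       memcpy(dst, src, n). */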
1:  cbz x2, 2f
    ldr x3, [x0], #8
    str x3, [x1], #8
    sub x2, x2, #8
    b 1b
2:  ret

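boot_msg:
    .asciz "[BOOT] Loading kernel...\r\n"

/* Device tree address passed to the kernel in x0. 0 is a placeholder
 * meaning "no device tree". On bare-metal boots QEMU virt puts its DTB
 * at the start of RAM (0x40000000), which is also KERNEL_LOAD_ADDR, so
 * copy the DTB elsewhere before loading the kernel if you need it. */
.align 3
dtb_addr:
    .quad 0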

.section .bss
.align 4
_stack_start:
    .skip 4096
_stack_end:
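
The values 13 and 1 that uart_init writes to IBRD and FBRD are baud-rate divisors. A PL011 divides its reference clock by 16 times the baud rate, storing the integer part in IBRD and the fraction, in 64ths, in FBRD. The sketch below reproduces that arithmetic in C, assuming a 24 MHz UART clock and 115200 baud. Both figures are assumptions for QEMU virt and are not stated in the listing; pl011_baud_divisor is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

struct pl011_divisor {
    uint32_t ibrd;  /* integer part, written to offset 0x24 */
    uint32_t fbrd;  /* fractional part in 1/64ths, written to offset 0x28 */
};

/* Hypothetical helper: compute PL011 divisors for a clock and baud rate. */
static struct pl011_divisor pl011_baud_divisor(uint32_t uart_clk_hz, uint32_t baud) {
    assert(baud != 0);
    /* 64 * clk / (16 * baud) == 4 * clk / baud; adding baud / 2 rounds
     * the result to the nearest 1/64th. */
    uint64_t div64 = ((uint64_t)uart_clk_hz * 4 + baud / 2) / baud;
    struct pl011_divisor d = { (uint32_t)(div64 >> 6), (uint32_t)(div64 & 0x3F) };
    return d;
}
```

Under those assumptions, pl011_baud_divisor(24000000, 115200) yields ibrd = 13 and fbrd = 1, matching the constants in boot.S. On QEMU the emulated UART transmits regardless of the divisor, so these values matter mainly on real hardware.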

This bootloader:

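Runs at EL2 (the hypervisor exception level; a QEMU virt CPU starts there only when the machine is launched with -M virt,virtualization=on, otherwise it starts at EL1)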
Copies the kernel from a staging area to the final load address
Configures SPSR_EL2 to specify EL1 as the target exception level
Uses ERET to jump to the kernel at EL1
Passes the device tree address in x0 and CPU ID in x1
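
The constant 0x3C5 written to SPSR_EL2 is easier to check once it is split into its fields: bits 9 to 6 are the D, A, I, and F interrupt masks, and bits 3 to 0 select the target exception level and stack pointer. The following C sketch rebuilds the value from named fields. The macro and function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* SPSR M[3:0] encodings for AArch64 targets */
#define SPSR_MODE_EL0T 0x0u  /* EL0, using SP_EL0 */
#define SPSR_MODE_EL1T 0x4u  /* EL1, using SP_EL0 */
#define SPSR_MODE_EL1H 0x5u  /* EL1, using SP_EL1 */

/* Interrupt mask bits */
#define SPSR_F (1u << 6)     /* FIQ masked */
#define SPSR_I (1u << 7)     /* IRQ masked */
#define SPSR_A (1u << 8)     /* SError masked */
#define SPSR_D (1u << 9)     /* debug exceptions masked */
#define SPSR_DAIF_ALL (SPSR_D | SPSR_A | SPSR_I | SPSR_F)

/* Hypothetical helper: combine a mode field and mask bits into an SPSR value. */
static uint32_t spsr_for(uint32_t mode, uint32_t masks) {
    assert((mode & ~0xFu) == 0);  /* mode must fit in M[3:0] */
    return masks | mode;
}
```

Here spsr_for(SPSR_MODE_EL1H, SPSR_DAIF_ALL) evaluates to 0x3C5, the value used in boot.S. Passing SPSR_MODE_EL1T instead would give 0x3C4, which enters EL1 on the SP_EL0 stack pointer.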

8.5 Loading an ELF File

Our kernel is an ELF (Executable and Linkable Format) file. ELF is the standard binary format on Unix-like systems. It contains multiple sections (.text, .data, .bss), which the linker groups into loadable segments, each with its own load address.

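Parsing ELF in a bootloader is more complex than copying a raw binary, but it lets the kernel describe its own layout: each segment carries its load address, the BSS takes no space in the file, and the entry point is recorded in the header. The key ELF structures are: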

Structure	What It Contains
ELF header	Magic number (0x7F 'ELF'), architecture (AArch64), entry point address
Program header table	Array of segments: each segment describes a region to load
Segment	A chunk of the file to load into memory at a specific address

A segment (program header) has these fields:

struct elf64_phdr {
    uint32_t p_type;    /* type of segment (PT_LOAD = 1 means load into memory) */
    uint32_t p_flags;   /* permissions (PF_R=4, PF_W=2, PF_X=1) */
    uint64_t p_offset;  /* offset in the file where this segment starts */
    uint64_t p_vaddr;   /* virtual address to load this segment to */
    uint64_t p_paddr;   /* physical address (same as vaddr for our kernel) */
    uint64_t p_filesz;  /* size of the segment in the file */
    uint64_t p_memsz;   /* size of the segment in memory (may be larger than filesz for BSS) */
    uint64_t p_align;   /* alignment requirement */
};
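
Because the loader overlays this struct directly on bytes read from the file, its in-memory layout has to match the on-disk layout exactly. A small sketch of how that could be checked at compile time, assuming a C11 compiler (static_assert comes from assert.h, offsetof from stddef.h):

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

struct elf64_phdr {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_filesz;
    uint64_t p_memsz;
    uint64_t p_align;
};

/* A 64-bit ELF program header is 56 bytes with no padding. */
static_assert(sizeof(struct elf64_phdr) == 56, "unexpected phdr size");
static_assert(offsetof(struct elf64_phdr, p_offset) == 8, "p_offset misplaced");
static_assert(offsetof(struct elf64_phdr, p_paddr) == 24, "p_paddr misplaced");
static_assert(offsetof(struct elf64_phdr, p_memsz) == 40, "p_memsz misplaced");
```

If a compiler ever laid the struct out differently, the build would fail instead of the loader silently reading the wrong fields.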

To load an ELF kernel, the bootloader:

Reads the ELF header at offset 0 of the kernel image
Validates the magic number and architecture
Iterates through program headers, loading each PT_LOAD segment to its p_paddr
For segments where p_memsz > p_filesz, zeroes the extra space (this is BSS)
Jumps to the entry point from the ELF header

Here is the C code for an ELF loader:

#include <stdint.h>

#define PT_LOAD 1

struct elf64_hdr {
    uint8_t  ident[16];
    uint16_t type;
    uint16_t machine;
    uint32_t version;
    uint64_t entry;
    uint64_t phoff;      /* program header offset */
    uint64_t shoff;
    uint32_t flags;
    uint16_t ehsize;
    uint16_t phentsize;  /* size of each program header */
    uint16_t phnum;      /* number of program headers */
    uint16_t shentsize;
    uint16_t shnum;
    uint16_t shstrndx;
};

struct elf64_phdr {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_filesz;
    uint64_t p_memsz;
    uint64_t p_align;
};

/* Load an ELF kernel from memory at 'addr' and jump to its entry point */
void load_elf(void *addr) {
    struct elf64_hdr *hdr = (struct elf64_hdr *)addr;

    /* Validate ELF magic */
    if (hdr->ident[0] != 0x7F || hdr->ident[1] != 'E' ||
        hdr->ident[2] != 'L'  || hdr->ident[3] != 'F') {
        /* Not a valid ELF file */
        return;
    }

    /* Validate it is AArch64 */
    if (hdr->machine != 0xB7) {  /* EM_AARCH64 */
        return;
    }

    /* Load each segment */
    struct elf64_phdr *phdr = (struct elf64_phdr *)((uint8_t *)addr + hdr->phoff);
    for (int i = 0; i < hdr->phnum; i++) {
        if (phdr[i].p_type != PT_LOAD) continue;

        /* Source in the ELF file */
        uint8_t *src = (uint8_t *)addr + phdr[i].p_offset;
        /* Destination in memory */
        uint8_t *dst = (uint8_t *)(uintptr_t)phdr[i].p_paddr;

        /* Copy the segment data */
        for (uint64_t j = 0; j < phdr[i].p_filesz; j++) {
            dst[j] = src[j];
        }

        /* Zero the BSS portion (memsz > filesz) */
        for (uint64_t j = phdr[i].p_filesz; j < phdr[i].p_memsz; j++) {
            dst[j] = 0;
        }
    }

    /* Jump to the entry point. Both arguments are placeholders here:
     * a full loader passes the device tree address and the CPU ID. */
    void (*entry)(uint64_t, uint64_t) =
        (void (*)(uint64_t, uint64_t))(uintptr_t)hdr->entry;
    entry(0, 0);
}
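
The loader above trusts every field in the program headers. A corrupt or hostile image could make it read past the end of the file or write outside RAM. The following is a minimal sketch of the checks that could run before each copy. The names range_within, load_limits, and segment_is_safe are hypothetical, and the RAM window is an assumption that the caller supplies (for example 0x40000000 to 0x48000000 for a 128 MB QEMU virt guest).

```c
#include <assert.h>
#include <stdint.h>

struct elf64_phdr {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_filesz;
    uint64_t p_memsz;
    uint64_t p_align;
};

/* Returns 1 if [start, start + len) lies inside [lo, hi). The subtraction
 * form avoids overflow when start + len would wrap around. */
static int range_within(uint64_t start, uint64_t len, uint64_t lo, uint64_t hi) {
    assert(lo <= hi);
    if (start < lo || start > hi) return 0;
    return len <= hi - start;
}

/* Hypothetical limits supplied by the caller. */
struct load_limits {
    uint64_t image_size;  /* size of the ELF image in bytes */
    uint64_t ram_base;    /* first usable RAM address */
    uint64_t ram_end;     /* one past the last usable RAM address */
};

/* Returns 1 if the segment can be copied without leaving the image or RAM. */
static int segment_is_safe(const struct elf64_phdr *ph, const struct load_limits *lim) {
    if (ph->p_filesz > ph->p_memsz) return 0;  /* file data must fit in memory size */
    if (!range_within(ph->p_offset, ph->p_filesz, 0, lim->image_size)) return 0;
    return range_within(ph->p_paddr, ph->p_memsz, lim->ram_base, lim->ram_end);
}
```

In load_elf, a call such as segment_is_safe(&phdr[i], &lim) placed before the copy loop would let the loader refuse the image instead of corrupting memory. The loader would also need the image size as an extra parameter, which the version above does not take.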

8.6 Bootloader with UEFI

On many modern systems, the boot firmware provides UEFI (Unified Extensible Firmware Interface). On Raspberry Pi 4/5, UEFI is available through community EDK II firmware ports rather than the stock firmware. UEFI is a standard interface between firmware and operating system. It provides:

File system access (read files from FAT32 partitions)
Memory map of available RAM
Device tree or ACPI tables
Runtime services (set time, reboot, etc.)

UEFI loads an EFI application (a PE32+ executable) from the EFI System Partition. The kernel can be an EFI application itself (like Linux's efi-stub), or UEFI can load a separate bootloader like GRUB which then loads the kernel.

For our kernel, we have two options:

UEFI stub: Link our kernel as an EFI application so UEFI loads it directly. Linux supports this on ARM64 through its EFI stub.
U-Boot: Use U-Boot as an intermediate bootloader that reads our kernel from a FAT partition and loads it.

We will start with raw binary loading (the simplest) and progress to ELF loading, then UEFI support later.

8.7 Bootloader Configuration

A bootloader often needs configuration: where is the kernel on disk? What kernel arguments should be passed? What device tree should be used?

Simple bootloaders use hard-coded values (like our boot.S above). More sophisticated bootloaders read a configuration file. For example, U-Boot uses boot.scr or extlinux.conf, and GRUB uses grub.cfg.

A simple configuration approach is to store a struct at a fixed location on disk:

struct boot_config {
    uint8_t  magic[4];         /* "CFG!" */
    uint64_t kernel_offset;    /* sector offset of kernel on disk */
    uint64_t kernel_size;      /* size of kernel in bytes */
    uint64_t dtb_offset;       /* sector offset of device tree */
    uint64_t dtb_size;         /* size of device tree in bytes */
    char     cmdline[256];     /* kernel command line arguments */
};

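The bootloader reads this struct from a known location (e.g., a reserved sector), validates the magic, and uses the fields to locate and load the kernel and device tree. Because the struct is read as raw bytes, the bootloader and the tool that writes the disk image must agree on its exact layout, including the padding the compiler inserts after magic.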
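
The validation step can be sketched in C as follows. This version makes the 4 bytes of padding after magic explicit through a reserved field, so the layout does not depend on the compiler. On typical 64-bit ABIs that gives the same layout as the struct above, but the reserved field, the 512-byte sector size, and the function name boot_config_valid are assumptions made for this illustration. It is written as host-side code, so it uses memcmp and memchr from string.h, which a freestanding bootloader would have to provide itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct boot_config {
    uint8_t  magic[4];         /* "CFG!" */
    uint8_t  reserved[4];      /* explicit padding (assumption, see above) */
    uint64_t kernel_offset;    /* sector offset of kernel on disk */
    uint64_t kernel_size;      /* size of kernel in bytes */
    uint64_t dtb_offset;       /* sector offset of device tree */
    uint64_t dtb_size;         /* size of device tree in bytes */
    char     cmdline[256];     /* kernel command line arguments */
};

static_assert(sizeof(struct boot_config) == 296, "unexpected config size");
static_assert(offsetof(struct boot_config, kernel_offset) == 8, "layout changed");

/* Returns 1 if the config looks usable on a disk of 'disk_sectors' sectors. */
static int boot_config_valid(const struct boot_config *cfg, uint64_t disk_sectors) {
    assert(cfg != NULL);
    if (memcmp(cfg->magic, "CFG!", 4) != 0) return 0;
    if (cfg->kernel_size == 0) return 0;
    /* The command line must be NUL-terminated inside its buffer. */
    if (memchr(cfg->cmdline, '\0', sizeof cfg->cmdline) == NULL) return 0;
    /* Round the kernel size up to whole 512-byte sectors without overflow. */
    uint64_t sectors = cfg->kernel_size / 512 + (cfg->kernel_size % 512 != 0);
    if (cfg->kernel_offset > disk_sectors) return 0;
    return sectors <= disk_sectors - cfg->kernel_offset;
}
```

A bootloader would call boot_config_valid right after reading the sector and halt with an error message if it returns 0. The same pattern extends to dtb_offset and dtb_size.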

8.8 Our Implementation

For the first version of our OS, we do not need a separate bootloader. QEMU's -kernel flag handles the loading for us. However, we will eventually need a bootloader for real hardware.

Here is our plan for bootloader support:

Phase	Boot Method	When
Phase 1 (current)	QEMU -kernel kernel.elf (QEMU's built-in loader)	Development on QEMU
Phase 2	Raw kernel on disk image + our bootloader	After we have UART and storage drivers
Phase 3	ELF kernel + our ELF loader	After kernel grows beyond raw binary limits
Phase 4	UEFI boot or U-Boot	Raspberry Pi 4/5 port

For now, our start.S from Chapter 7 acts as both the bootloader entry and the kernel entry. This is acceptable because QEMU's built-in loader handles the storage and loading part. When we move to real hardware, we will split this into a separate bootloader binary.

Building a Bootloader + Kernel System

To build a bootloader and kernel together, we can combine them in a single ELF or use separate binaries:

# Build bootloader
aarch64-none-elf-as boot.S -o boot.o
aarch64-none-elf-ld -T boot.ld boot.o -o boot.elf
aarch64-none-elf-objcopy -O binary boot.elf boot.bin

# Build kernel
aarch64-none-elf-gcc -c -ffreestanding -O2 kernel.c -o kernel.o
aarch64-none-elf-as start.S -o start.o
aarch64-none-elf-ld -T kernel.ld start.o kernel.o -o kernel.elf
aarch64-none-elf-objcopy -O binary kernel.elf kernel.bin

# Create disk image with bootloader at sector 0 and kernel at sector 1
dd if=/dev/zero of=disk.img bs=512 count=65536
dd if=boot.bin of=disk.img bs=512 conv=notrunc
dd if=kernel.bin of=disk.img bs=512 seek=1 conv=notrunc

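# Run QEMU with the bootloader as firmware and disk as storage.
# virtualization=on makes the CPU start at EL2, which boot.S expects;
# if=none keeps QEMU from also attaching the drive to a default device.
qemu-system-aarch64 -M virt,virtualization=on -cpu cortex-a72 -nographic \
    -bios boot.bin \
    -drive file=disk.img,format=raw,if=none,id=hd0 \
    -device virtio-blk-device,drive=hd0

The -bios boot.bin flag tells QEMU to use our bootloader as the firmware. QEMU loads it into the first flash bank at address 0, where the CPU begins executing after reset. Because the code then runs from flash, boot.ld must place .bss and the stack in RAM.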

8.9 Exercises

Exercise 1: Bootloader Output

Modify the boot.S bootloader to print the kernel size and load address before jumping to the kernel. The output should look like:

[BOOT] Loading kernel...
[BOOT] Kernel size: 12345 bytes
[BOOT] Load address: 0x40000000
[BOOT] Jumping to kernel...

Exercise 2: ELF Parser

Write a C function that reads the ELF header of kernel.elf and prints the entry point address and the number of loadable segments. Build it as a standalone program for your host system, not for the target. Use fopen and fread.

Exercise 3: Checksum Verification

Add a checksum to the boot configuration struct. Before jumping to the kernel, compute an XOR checksum of the kernel data and compare it to the stored checksum. If they do not match, print an error and halt.

Exercise 4: Boot Time Measurement

Use the system counter (CNTPCT_EL0) to measure how long the bootloader takes to load the kernel. Read CNTFRQ_EL0 to get the counter frequency in Hz and convert the tick count to microseconds, then print the time before jumping to the kernel.

Exercise 5: Boot Menu (Challenge)

Add a simple boot menu to the bootloader that waits for a keypress during a 3-second window. If the user presses 'k', boot the kernel. If 'd', enter a debug mode that dumps memory. Otherwise, boot the kernel by default.

8.10 Summary

In this chapter, we learned what a bootloader is and why it is needed. The bootloader bridges the gap between firmware (which initializes hardware) and the kernel (which manages the system).

We built a minimal bootloader in assembly that copies the kernel from a staging area to the load address, configures the CPU state, and jumps to the kernel at EL1 using the ERET instruction.

We examined the ELF file format and wrote a C function that parses ELF headers and loads segments into memory at the correct addresses. ELF loading lets the kernel describe its own memory layout and entry point instead of relying on addresses hard-coded in the bootloader.

We discussed bootloader configuration and the boot phases we will follow: starting with QEMU's built-in loader, progressing to our own bootloader with raw binary, then ELF support, and finally UEFI for real hardware.

For now, QEMU's -kernel flag is sufficient. When we are ready to run on real hardware, we will build a full bootloader based on the concepts from this chapter.

In the next chapter, we will focus on what happens inside the kernel entry point: installing exception vectors, enabling caches, and transitioning from early boot to full kernel initialization.