Chapter 8: The Bootloader
- What a bootloader is and why we need one
- How a bootloader reads the kernel from storage and places it in memory
- The structure of a simple bootloader for QEMU virt
- How to load an ELF file from a raw disk image
- How to pass control to the kernel with the correct CPU state
- The difference between a bootloader and a kernel
8.1 What is a Bootloader?
A bootloader is a small program that loads the operating system kernel into memory and starts it. It is the bridge between the firmware (which initializes hardware) and the kernel (which manages the system).
The bootloader solves a chicken-and-egg problem: the kernel is stored on disk, but the disk driver is part of the kernel. How do we load the kernel without the kernel being loaded? The bootloader contains just enough code to read from storage, parse the kernel image format, and jump to it.
A bootloader does these things:
- Initialize hardware enough to read the kernel (storage controller, memory, UART for debug output)
- Read the kernel image from a storage device (SD card, disk, flash, network)
- Parse the kernel format (raw binary, ELF, or a custom format) to know where to place each section in memory
- Set up the kernel environment (device tree, boot parameters)
- Jump to the kernel entry point at the correct exception level
So far we have been using QEMU's -kernel kernel.elf option, which makes
QEMU act as its own bootloader. In this chapter, we will write our own bootloader so
we understand exactly what happens between power-on and kernel execution.
8.2 Bootloader vs Kernel
| Property | Bootloader | Kernel |
|---|---|---|
| Size | Small (a few KB for a minimal loader; full-featured loaders such as U-Boot are much larger) | Large (MB) |
| Lifetime | Runs once, then hands control to the kernel | Runs until shutdown |
| Complexity | Minimal | High (scheduler, MMU, drivers, FS) |
| Storage driver | Minimal (enough to read kernel) | Full driver stack |
| Memory management | None (loads to fixed addresses) | Complete (paging, heap, VMA) |
| User interaction | May have menu/console | Shell or GUI |
| Where it runs | Usually EL3 or EL2 (EL1 on platforms that start there, such as QEMU virt by default) | EL1 |
The bootloader does only what is necessary to start the kernel, then it is done. It does not manage processes, handle system calls, or schedule tasks. That is the kernel's job.
8.3 Storage on QEMU virt
To load our kernel from a disk, we need a storage device. QEMU virt provides several options:
| Device | Interface | How to Add It |
|---|---|---|
| SD card (SDHCI) | PCI (virt has no built-in SD controller) | -device sdhci-pci -device sd-card,drive=sd0 |
| VirtIO block | MMIO at 0x0A000000+ | -device virtio-blk-device,drive=disk0 |
| PCI storage (AHCI/SATA) | PCI bus | -device ahci,id=ahci -device ide-hd,drive=disk0,bus=ahci.0 |
| Flash/NOR | Memory-mapped | -drive file=flash.img,format=raw,if=pflash |
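The "0x0A000000+" in the VirtIO row means the board exposes a bank of VirtIO MMIO transport slots, and the block device lands in one of them. A disk-reading bootloader therefore has to find it first. Below is a minimal sketch under two assumptions. The slot layout is taken to be QEMU virt's: 32 slots, 0x200 bytes apart, starting at 0x0A000000. The register offsets are MagicValue at 0x000 and DeviceID at 0x008, as defined by the VirtIO specification. The function and macro names are hypothetical. The base address is a parameter so the scan can be tested against an ordinary buffer.

```c
#include <stdint.h>

/* VirtIO MMIO register offsets (from the VirtIO specification) */
#define VIRTIO_MMIO_MAGIC_VALUE 0x000  /* reads 0x74726976 ("virt") */
#define VIRTIO_MMIO_DEVICE_ID   0x008  /* 0 = empty slot, 2 = block device */

#define VIRTIO_MAGIC            0x74726976u
#define VIRTIO_ID_BLOCK         2u

/* Assumed QEMU virt layout: 32 slots, 0x200 bytes apart, from 0x0A000000 */
#define VIRT_MMIO_BASE          0x0A000000u
#define VIRT_MMIO_STRIDE        0x200u
#define VIRT_MMIO_SLOTS         32

/* volatile so the compiler performs every device read */
static uint32_t mmio_read32(uintptr_t addr) {
    return *(volatile const uint32_t *)addr;
}

/* Return the base address of the first VirtIO block slot, or 0 if none.
 * On the board: find_virtio_blk(VIRT_MMIO_BASE, VIRT_MMIO_SLOTS). */
uintptr_t find_virtio_blk(uintptr_t base, int slots) {
    for (int i = 0; i < slots; i++) {
        uintptr_t slot = base + (uintptr_t)i * VIRT_MMIO_STRIDE;
        if (mmio_read32(slot + VIRTIO_MMIO_MAGIC_VALUE) != VIRTIO_MAGIC)
            continue;                      /* not a VirtIO transport */
        if (mmio_read32(slot + VIRTIO_MMIO_DEVICE_ID) == VIRTIO_ID_BLOCK)
            return slot;
    }
    return 0;
}
```

Slots that QEMU has not populated still answer with the magic value but report device ID 0. That is why the loop checks the device ID, not just the magic value.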
For simplicity, we will use a raw disk image with our kernel stored at a known offset. The bootloader reads the kernel from this image and loads it into memory.
```sh
# Create a 64 MB raw disk image
dd if=/dev/zero of=disk.img bs=1M count=64
# Write our kernel binary at sector 1 (byte offset 512)
dd if=kernel.bin of=disk.img bs=512 seek=1 conv=notrunc
# Attach the image to QEMU as a VirtIO block device
# (nothing is loaded from it yet; reading it is the bootloader's job)
qemu-system-aarch64 -M virt -cpu cortex-a72 -nographic \
    -drive if=none,file=disk.img,format=raw,id=hd0 \
    -device virtio-blk-device,drive=hd0
```
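With bs=512, dd's seek=1 is plain sector arithmetic. A bootloader that reads the disk needs the reverse calculation: how many 512-byte sectors to request for a kernel of a given size, and whether that many fit on the disk. Here is a small sketch; the helper names are hypothetical.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Byte offset of a sector, matching dd's bs=512 seek=N */
static inline uint64_t sector_to_offset(uint64_t sector) {
    return sector * SECTOR_SIZE;
}

/* Whole sectors needed to hold 'bytes' (rounded up) */
static inline uint64_t sectors_for(uint64_t bytes) {
    return (bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* Does a kernel of 'bytes' starting at 'first_sector' fit on the disk? */
static inline int kernel_fits(uint64_t first_sector, uint64_t bytes,
                              uint64_t disk_sectors) {
    return first_sector <= disk_sectors &&
           sectors_for(bytes) <= disk_sectors - first_sector;
}
```

The 64 MB image above has 131072 sectors. A 12345-byte kernel at sector 1 occupies sectors_for(12345) = 25 of them.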
8.4 A Minimal Bootloader
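Our bootloader will be the simplest possible. It runs at EL2 and copies the kernel from a known location in RAM, where the kernel has already been placed (it does not read the disk yet). It then jumps to the kernel at EL1. On QEMU virt, running at EL2 requires -M virt,virtualization=on; by default QEMU starts the CPU at EL1.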
On QEMU virt, we can place the bootloader at the beginning of the flash and have it load the kernel from a fixed address. Real boards usually split this work into a two-stage boot:
- Stage 1: A tiny assembly program at the reset vector that initializes basic hardware and loads stage 2
- Stage 2: A slightly larger program that reads the kernel from disk, parses it, and jumps to it
On QEMU virt, RAM is usable from reset (there is no DRAM controller to initialize as on real boards), so we can skip stage 1 and build stage 2 directly. Here is a minimal bootloader that:
- Runs at EL2
- Sets up a stack and UART for debug
- Copies the kernel from a known memory location to the kernel load address
- Sets up the device tree pointer
- Drops to EL1 and jumps to the kernel
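```asm
    .section .text
    .global _start

    .equ KERNEL_LOAD_ADDR,   0x40000000
    .equ KERNEL_SOURCE_ADDR, 0x50000000  /* kernel image was staged here beforehand */
    .equ KERNEL_MAX_SIZE,    0x00100000  /* 1 MB max kernel size (multiple of 8) */
    .equ UART_BASE,          0x09000000  /* PL011 UART on QEMU virt */
    .equ DTB_ADDR,           0x0         /* placeholder: physical address of the DTB, 0 = none */

_start:
    /* Set up stack (SP_EL2); the linker script must place .bss in RAM, not flash */
    ldr x0, =_stack_end
    mov sp, x0

    /* Initialize UART for debug output */
    bl uart_init
    ldr x0, =boot_msg
    bl uart_puts

    /* Copy kernel from source to load address */
    ldr x0, =KERNEL_SOURCE_ADDR
    ldr x1, =KERNEL_LOAD_ADDR
    ldr x2, =KERNEL_MAX_SIZE
    bl copy_words

    /* Configure EL1 before leaving EL2 */
    mov x2, #0x80000000          /* HCR_EL2.RW = 1: EL1 executes in AArch64 */
    msr HCR_EL2, x2
    mov x2, #3                   /* CNTHCTL_EL2.EL1PCTEN | EL1PCEN: EL1 may use the physical timer */
    msr CNTHCTL_EL2, x2
    msr CNTVOFF_EL2, xzr         /* virtual counter = physical counter */
    ldr x2, =0x30D00800          /* SCTLR_EL1: RES1 bits set, MMU and caches off, little-endian */
    msr SCTLR_EL1, x2

    /* Kernel arguments: x0 = device tree address, x1 = CPU ID */
    ldr x0, =DTB_ADDR
    mov x1, #0

    /* Drop to EL1 and jump to kernel */
    mov x2, #0x3C5               /* EL1h, D/A/I/F all masked */
    msr SPSR_EL2, x2
    ldr x2, =KERNEL_LOAD_ADDR
    msr ELR_EL2, x2
    eret

uart_init:
    /* Minimal UART init for PL011 */
    ldr x0, =UART_BASE
    mov w1, #0
    str w1, [x0, #0x30]          /* CR: disable UART */
    mov w1, #13
    str w1, [x0, #0x24]          /* IBRD */
    mov w1, #1
    str w1, [x0, #0x28]          /* FBRD */
    mov w1, #0x70
    str w1, [x0, #0x2C]          /* LCR_H: 8-bit words, FIFOs enabled */
    mov w1, #0x301
    str w1, [x0, #0x30]          /* CR: enable UART, TX, RX */
    ret

uart_puts:
    /* x0 = NUL-terminated string */
    ldr x1, =UART_BASE
1:  ldrb w2, [x0], #1
    cbz w2, 3f
2:  ldr w3, [x1, #0x18]          /* FR (flag register) */
    tbnz w3, #5, 2b              /* wait while TXFF (transmit FIFO full) is set */
    str w2, [x1]                 /* DR: write character */
    b 1b
3:  ret

copy_words:
    /* x0 = source, x1 = destination, x2 = size in bytes (multiple of 8).
     * Not named memcpy: the argument order differs from C's memcpy(dst, src, n). */
1:  cbz x2, 2f
    ldr x3, [x0], #8
    str x3, [x1], #8
    sub x2, x2, #8
    b 1b
2:  ret

boot_msg:
    .asciz "[BOOT] Loading kernel...\r\n"

    .section .bss
    .align 4
_stack_start:
    .skip 4096
_stack_end:
```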
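The constants 13 and 1 in uart_init are the PL011 baud-rate divisor. The PL011 TRM defines BAUDDIV = UARTCLK / (16 * baud): IBRD holds the integer part and FBRD holds the fraction in 64ths. The sketch below reproduces them, assuming a 24 MHz UART clock (the rate QEMU virt advertises in its device tree) and 115200 baud. The struct and function names are hypothetical. QEMU does not actually throttle output to the baud rate, but real hardware does.

```c
#include <stdint.h>

/* PL011 baud-rate divisor split into its two registers */
struct pl011_div {
    uint32_t ibrd;  /* integer part */
    uint32_t fbrd;  /* 6-bit fractional part */
};

/* BAUDDIV = UARTCLK / (16 * baud). Scaling by 64 gives 4 * UARTCLK / baud;
 * adding baud / 2 before dividing rounds to the nearest 1/64. */
struct pl011_div pl011_divisor(uint32_t uartclk, uint32_t baud) {
    uint64_t div64 = ((uint64_t)uartclk * 4 + baud / 2) / baud;
    struct pl011_div d = { (uint32_t)(div64 >> 6), (uint32_t)(div64 & 0x3F) };
    return d;
}
```

pl011_divisor(24000000, 115200) gives IBRD = 13 and FBRD = 1, the values hard-coded above. Rounding the whole 64ths value, instead of the fraction alone, lets a fraction that rounds up to 64 carry correctly into IBRD.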
This bootloader:
- Runs at EL2, the hypervisor exception level (firmware on real hardware often starts even higher, at EL3)
- Copies the kernel from a staging area to the final load address
- Configures SPSR_EL2 to specify EL1 as the target exception level
- Uses ERET to jump to the kernel at EL1
- Passes the device tree address in x0 and CPU ID in x1
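The 0x3C5 written to SPSR_EL2 is a few PSTATE fields packed together. ERET copies this value into PSTATE, which selects the target exception level and stack pointer and sets the interrupt masks. The small helper below shows where the number comes from. The macro and function names are hypothetical; the bit positions follow the AArch64 SPSR_EL2 layout.

```c
#include <stdint.h>

/* SPSR_EL2 fields used when returning to AArch64 EL1 */
#define SPSR_M_EL1T  0x4u        /* M[3:0] = 0b0100: EL1, using SP_EL0 */
#define SPSR_M_EL1H  0x5u        /* M[3:0] = 0b0101: EL1, using SP_EL1 */
#define SPSR_F       (1u << 6)   /* FIQ masked */
#define SPSR_I       (1u << 7)   /* IRQ masked */
#define SPSR_A       (1u << 8)   /* SError masked */
#define SPSR_D       (1u << 9)   /* debug exceptions masked */

/* Value for SPSR_EL2 so that eret enters EL1 (M[4] = 0 selects AArch64) */
static inline uint64_t spsr_el1(int use_sp_el1, int mask_all) {
    uint64_t v = use_sp_el1 ? SPSR_M_EL1H : SPSR_M_EL1T;
    if (mask_all)
        v |= SPSR_D | SPSR_A | SPSR_I | SPSR_F;
    return v;
}
```

spsr_el1(1, 1) yields 0x3C5. "EL1h" means the kernel starts on its own stack pointer, SP_EL1. That is why the kernel's entry code must set sp before it uses the stack.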
8.5 Loading an ELF File
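Our kernel is an ELF (Executable and Linkable Format) file. ELF is the standard binary format on Unix-like systems. It describes the program twice: as sections (.text, .data, .bss) for the linker, and as segments for the loader. Each loadable segment carries its own load address.
Parsing ELF in a bootloader is more complex than copying a raw binary, but it lets the kernel describe its own layout:
- Segments can sit at widely separated addresses without padding the image file.
- The entry point does not have to be the first byte.
- BSS is described by a size rather than stored as zeros.

The key ELF structures are:
| Structure | What It Contains |
|---|---|
| ELF header | Magic number (0x7F 'ELF'), class (64-bit), architecture (AArch64), entry point address, location and count of program headers |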
| Program header table | Array of segments: each segment describes a region to load |
| Segment | A chunk of the file to load into memory at a specific address |
A segment (program header) has these fields:
```c
struct elf64_phdr {
    uint32_t p_type;    /* type of segment (PT_LOAD = 1 means load into memory) */
    uint32_t p_flags;   /* permissions (PF_R=4, PF_W=2, PF_X=1) */
    uint64_t p_offset;  /* offset in the file where this segment starts */
    uint64_t p_vaddr;   /* virtual address to load this segment to */
    uint64_t p_paddr;   /* physical address (same as vaddr for our kernel) */
    uint64_t p_filesz;  /* size of the segment in the file */
    uint64_t p_memsz;   /* size of the segment in memory (may be larger than filesz for BSS) */
    uint64_t p_align;   /* alignment requirement */
};
```
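Two details of this struct are easy to get wrong by hand: its size and the meaning of p_flags. The sketch below checks the size at compile time (56 bytes, as the ELF64 format specifies). It also decodes the permission flags and computes the BSS tail the loader must zero. The helper names are hypothetical.

```c
#include <stdint.h>

struct elf64_phdr {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_filesz;
    uint64_t p_memsz;
    uint64_t p_align;
};

/* ELF64 program headers are 56 bytes; a typo in the struct fails the build */
_Static_assert(sizeof(struct elf64_phdr) == 56, "elf64_phdr must be 56 bytes");

#define PF_X 1u
#define PF_W 2u
#define PF_R 4u

/* Write p_flags as "rwx"-style text: PF_R | PF_X (5) becomes "r-x" */
void phdr_flags_str(uint32_t flags, char out[4]) {
    out[0] = (flags & PF_R) ? 'r' : '-';
    out[1] = (flags & PF_W) ? 'w' : '-';
    out[2] = (flags & PF_X) ? 'x' : '-';
    out[3] = '\0';
}

/* Bytes the loader must zero after copying p_filesz bytes (the BSS tail) */
uint64_t phdr_bss_bytes(const struct elf64_phdr *ph) {
    return ph->p_memsz > ph->p_filesz ? ph->p_memsz - ph->p_filesz : 0;
}
```

A typical kernel has two segments. The text segment has flags 5 (r-x). The data segment has flags 6 (rw-), and its p_memsz exceeds p_filesz by roughly the size of .bss.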
To load an ELF kernel, the bootloader:
- Reads the ELF header at offset 0 of the kernel image
- Validates the magic number and architecture
- Iterates through program headers, loading each PT_LOAD segment to its p_paddr
- For segments where p_memsz > p_filesz, zeroes the extra space (this is BSS)
- Jumps to the entry point from the ELF header
Here is the C code for an ELF loader:
```c
#include <stdint.h>

#define PT_LOAD    1
#define ELFCLASS64 2       /* value of ident[4] for 64-bit files */
#define EM_AARCH64 0xB7

struct elf64_hdr {
    uint8_t  ident[16];
    uint16_t type;
    uint16_t machine;
    uint32_t version;
    uint64_t entry;
    uint64_t phoff;      /* program header offset */
    uint64_t shoff;
    uint32_t flags;
    uint16_t ehsize;
    uint16_t phentsize;  /* size of each program header */
    uint16_t phnum;      /* number of program headers */
    uint16_t shentsize;
    uint16_t shnum;
    uint16_t shstrndx;
};

struct elf64_phdr {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_filesz;
    uint64_t p_memsz;
    uint64_t p_align;
};

/* Load an ELF kernel from memory at 'addr' and jump to its entry point,
 * passing 'dtb' in x0. Returns only if the image is rejected.
 * Notes:
 * - The image at 'addr' must not overlap any segment's destination.
 * - The jump happens at the current exception level; to enter the kernel
 *   at EL1, return hdr->entry instead and use eret as boot.S does.
 * - GCC may turn the copy loops into memcpy/memset calls even with
 *   -ffreestanding, so the bootloader must provide those functions. */
void load_elf(void *addr, uint64_t dtb) {
    struct elf64_hdr *hdr = (struct elf64_hdr *)addr;

    /* Validate ELF magic */
    if (hdr->ident[0] != 0x7F || hdr->ident[1] != 'E' ||
        hdr->ident[2] != 'L' || hdr->ident[3] != 'F') {
        return;  /* not a valid ELF file */
    }
    /* Validate it is a 64-bit AArch64 file */
    if (hdr->ident[4] != ELFCLASS64 || hdr->machine != EM_AARCH64) {
        return;
    }

    /* Walk the program headers using phentsize, the size the file declares */
    uint8_t *ph_table = (uint8_t *)addr + hdr->phoff;
    for (int i = 0; i < hdr->phnum; i++) {
        struct elf64_phdr *ph =
            (struct elf64_phdr *)(ph_table + (uint64_t)i * hdr->phentsize);
        if (ph->p_type != PT_LOAD) continue;

        uint8_t *src = (uint8_t *)addr + ph->p_offset;     /* in the file */
        uint8_t *dst = (uint8_t *)(uintptr_t)ph->p_paddr;  /* in memory */

        /* Copy the segment data */
        for (uint64_t j = 0; j < ph->p_filesz; j++) {
            dst[j] = src[j];
        }
        /* Zero the BSS portion (memsz > filesz) */
        for (uint64_t j = ph->p_filesz; j < ph->p_memsz; j++) {
            dst[j] = 0;
        }
    }

    /* Jump to the entry point: x0 = device tree address, x1 = CPU ID */
    void (*entry)(uint64_t, uint64_t) =
        (void (*)(uint64_t, uint64_t))(uintptr_t)hdr->entry;
    entry(dtb, 0);
}
```
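load_elf trusts every field in the image and writes to raw physical addresses, which makes it hard to test. The sketch below is a bounds-checked variant (elf_load_into is a hypothetical name) that copies into a caller-supplied buffer standing in for RAM. The same logic can then be exercised on the host before it runs in the bootloader. Headers are read with memcpy, so the image buffer needs no particular alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_LOAD    1
#define ELFCLASS64 2
#define EM_AARCH64 0xB7

struct elf64_hdr {
    uint8_t  ident[16];
    uint16_t type, machine;
    uint32_t version;
    uint64_t entry, phoff, shoff;
    uint32_t flags;
    uint16_t ehsize, phentsize, phnum, shentsize, shnum, shstrndx;
};

struct elf64_phdr {
    uint32_t p_type, p_flags;
    uint64_t p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align;
};

/* Copy every PT_LOAD segment of 'img' into 'ram', a buffer that stands in
 * for physical memory [ram_base, ram_base + ram_size). Returns the entry
 * point, or 0 if the image is invalid or a segment falls outside img or ram. */
uint64_t elf_load_into(const uint8_t *img, size_t img_size,
                       uint8_t *ram, uint64_t ram_base, uint64_t ram_size) {
    struct elf64_hdr hdr;
    if (img_size < sizeof hdr) return 0;
    memcpy(&hdr, img, sizeof hdr);
    if (memcmp(hdr.ident, "\x7f" "ELF", 4) != 0) return 0;
    if (hdr.ident[4] != ELFCLASS64 || hdr.machine != EM_AARCH64) return 0;
    if (hdr.phentsize < sizeof(struct elf64_phdr) || hdr.phoff > img_size) return 0;

    for (uint16_t i = 0; i < hdr.phnum; i++) {
        uint64_t off = hdr.phoff + (uint64_t)i * hdr.phentsize;
        if (off > img_size || img_size - off < sizeof(struct elf64_phdr)) return 0;
        struct elf64_phdr ph;
        memcpy(&ph, img + off, sizeof ph);
        if (ph.p_type != PT_LOAD) continue;

        /* Source range must lie inside the image */
        if (ph.p_filesz > ph.p_memsz) return 0;
        if (ph.p_offset > img_size || img_size - ph.p_offset < ph.p_filesz) return 0;
        /* Destination range must lie inside RAM */
        if (ph.p_paddr < ram_base) return 0;
        uint64_t rel = ph.p_paddr - ram_base;
        if (rel > ram_size || ram_size - rel < ph.p_memsz) return 0;

        memcpy(ram + rel, img + ph.p_offset, (size_t)ph.p_filesz);
        memset(ram + rel + ph.p_filesz, 0, (size_t)(ph.p_memsz - ph.p_filesz));
    }
    return hdr.entry;
}
```

On the target, ram would simply be (uint8_t *)(uintptr_t)ram_base. Using 0 as the error value assumes no valid kernel has its entry point at address 0.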
8.6 Bootloader with UEFI
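On many modern ARM64 systems (most servers and many laptops and development boards), the boot firmware provides UEFI (Unified Extensible Firmware Interface). The Raspberry Pi 4/5 is an exception: its stock firmware loads a kernel image directly. UEFI is available there only through community-maintained firmware ports. UEFI is a standard interface between firmware and operating system. It provides: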
- File system access (read files from FAT32 partitions)
- Memory map of available RAM
- Device tree or ACPI tables
- Runtime services (set time, reboot, etc.)
UEFI loads an EFI application (a PE32+ executable) from the EFI System
Partition. The kernel can be an EFI application itself (like Linux's
efi-stub), or UEFI can load a separate bootloader like GRUB which then
loads the kernel.
For our kernel, we have two options:
- UEFI stub: Link our kernel as an EFI application so UEFI loads it directly. This is how Linux usually boots on UEFI-based ARM64 systems (through its EFI stub).
- U-Boot: Use U-Boot as an intermediate bootloader that reads our kernel from a FAT partition and loads it.
We will start with raw binary loading (the simplest) and progress to ELF loading, then UEFI support later.
8.7 Bootloader Configuration
A bootloader often needs configuration: where is the kernel on disk? What kernel arguments should be passed? What device tree should be used?
Simple bootloaders use hard-coded values (like our boot.S above). More sophisticated
bootloaders read a configuration file. For example, U-Boot uses boot.scr
or extlinux.conf, and GRUB uses grub.cfg.
A simple configuration approach is to store a struct at a fixed location on disk:
```c
struct boot_config {
    uint8_t  magic[4];       /* "CFG!" */
    uint64_t kernel_offset;  /* sector offset of kernel on disk */
    uint64_t kernel_size;    /* size of kernel in bytes */
    uint64_t dtb_offset;     /* sector offset of device tree */
    uint64_t dtb_size;       /* size of device tree in bytes */
    char     cmdline[256];   /* kernel command line arguments */
};
```
The bootloader reads this struct from a known location (e.g., sector 0), validates the magic, and uses the fields to locate and load the kernel and device tree.
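Here is a sketch of that validation step. It assumes the struct sits at the start of sector 0 and that the tool that wrote it used the same ABI as the bootloader. The compiler inserts 4 bytes of padding after magic[4], and the _Static_assert pins that layout down. boot_config_parse is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

struct boot_config {
    uint8_t  magic[4];       /* "CFG!" */
    uint64_t kernel_offset;  /* sector offset of kernel on disk */
    uint64_t kernel_size;    /* size of kernel in bytes */
    uint64_t dtb_offset;     /* sector offset of device tree */
    uint64_t dtb_size;       /* size of device tree in bytes */
    char     cmdline[256];   /* kernel command line arguments */
};

/* The implicit padding after magic[4] is part of the on-disk format */
_Static_assert(offsetof(struct boot_config, kernel_offset) == 8,
               "unexpected boot_config layout");
_Static_assert(sizeof(struct boot_config) <= SECTOR_SIZE,
               "boot_config must fit in one sector");

/* Copy sector 0 into *cfg and validate it. Returns 0 on success, -1 if the
 * magic is wrong or the command line is not NUL-terminated. */
int boot_config_parse(const uint8_t *sector0, struct boot_config *cfg) {
    memcpy(cfg, sector0, sizeof *cfg);
    if (memcmp(cfg->magic, "CFG!", 4) != 0)
        return -1;
    if (memchr(cfg->cmdline, '\0', sizeof cfg->cmdline) == NULL)
        return -1;
    return 0;
}
```

After a successful parse, the kernel starts at byte kernel_offset * SECTOR_SIZE on the disk. The NUL check keeps a corrupted sector from handing the kernel an unterminated command line.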
8.8 Our Implementation
For the first version of our OS, we do not need a separate bootloader. QEMU's
-kernel flag handles the loading for us. However, we will eventually need
a bootloader for real hardware.
Here is our plan for bootloader support:
| Phase | Boot Method | When |
|---|---|---|
| Phase 1 (current) | QEMU -kernel (QEMU's built-in loader) | Development on QEMU |
| Phase 2 | Raw kernel on disk image + our bootloader | After we have UART and storage drivers |
| Phase 3 | ELF kernel + our ELF loader | When a flat raw binary becomes impractical (for example, segments at widely separated addresses) |
| Phase 4 | UEFI boot or U-Boot | Raspberry Pi 4/5 port |
For now, our start.S from Chapter 7 acts as both the bootloader entry and
the kernel entry. This is acceptable because QEMU's built-in loader handles the storage
and loading part. When we move to real hardware, we will split this into a separate
bootloader binary.
Building a Bootloader + Kernel System
To build a bootloader and kernel together, we can combine them in a single ELF or use separate binaries:
```sh
# Build bootloader
aarch64-none-elf-as boot.S -o boot.o
aarch64-none-elf-ld -T boot.ld boot.o -o boot.elf
aarch64-none-elf-objcopy -O binary boot.elf boot.bin
# Build kernel
aarch64-none-elf-gcc -c -ffreestanding -O2 kernel.c -o kernel.o
aarch64-none-elf-as start.S -o start.o
aarch64-none-elf-ld -T kernel.ld start.o kernel.o -o kernel.elf
aarch64-none-elf-objcopy -O binary kernel.elf kernel.bin
# Create disk image with bootloader at sector 0 and kernel at sector 1
dd if=/dev/zero of=disk.img bs=512 count=65536
dd if=boot.bin of=disk.img bs=512 conv=notrunc
dd if=kernel.bin of=disk.img bs=512 seek=1 conv=notrunc
# Run QEMU with the bootloader as firmware and the disk as storage.
# virtualization=on starts the CPU at EL2; -m 512M makes RAM reach 0x50000000;
# the loader device stages kernel.bin at KERNEL_SOURCE_ADDR for the minimal
# boot.S from 8.4, which does not read the disk yet.
qemu-system-aarch64 -M virt,virtualization=on -cpu cortex-a72 -m 512M -nographic \
    -bios boot.bin \
    -device loader,file=kernel.bin,addr=0x50000000,force-raw=on \
    -drive if=none,file=disk.img,format=raw,id=hd0 \
    -device virtio-blk-device,drive=hd0
```
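The -bios boot.bin flag tells QEMU to use our bootloader as the firmware. QEMU copies it into the virt board's flash at address 0x0, where the CPU starts executing on reset. Because boot.S writes EL2 system registers, it needs -M virt,virtualization=on. Without that option, QEMU starts the CPU at EL1 and those instructions trap as undefined.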
8.9 Exercises
Exercise 1: Bootloader Output
Modify the boot.S bootloader to print the kernel size and load address before jumping to the kernel. The output should look like:
```text
[BOOT] Loading kernel...
[BOOT] Kernel size: 12345 bytes
[BOOT] Load address: 0x40000000
[BOOT] Jumping to kernel...
```
Exercise 2: ELF Parser
Write a C function that reads the ELF header of kernel.elf and prints
the entry point address and the number of loadable segments. Build it as a standalone
program for your host system, not for the target. Use fopen and
fread.
Exercise 3: Checksum Verification
Add a checksum to the boot configuration struct. Before jumping to the kernel, compute an XOR checksum of the kernel data and compare it to the stored checksum. If they do not match, print an error and halt.
Exercise 4: Boot Time Measurement
Use the system counter (CNTPCT_EL0) to measure how long the bootloader takes to load the kernel. Convert ticks to time using the counter frequency in CNTFRQ_EL0, and print the time in microseconds before jumping to the kernel.
Exercise 5: Boot Menu (Challenge)
Add a simple boot menu to the bootloader that waits for a keypress during a 3-second window. If the user presses 'k', boot the kernel. If 'd', enter a debug mode that dumps memory. Otherwise, boot the kernel by default.
8.10 Summary
In this chapter, we learned what a bootloader is and why it is needed. The bootloader bridges the gap between firmware (which initializes hardware) and the kernel (which manages the system).
We built a minimal bootloader in assembly that copies the kernel from a staging area
to the load address, configures the CPU state, and jumps to the kernel at EL1 using
the ERET instruction.
We examined the ELF file format and wrote a C function that parses ELF headers and loads segments into memory at the correct addresses. ELF loading lets the kernel describe its own layout, entry point, and BSS instead of relying on a flat image copied to a fixed address.
We discussed bootloader configuration and the boot phases we will follow: starting with QEMU's built-in loader, progressing to our own bootloader with raw binary, then ELF support, and finally UEFI for real hardware.
For now, QEMU's -kernel flag is sufficient. When we are ready to run on
real hardware, we will build a full bootloader based on the concepts from this chapter.
In the next chapter, we will focus on what happens inside the kernel entry point: installing exception vectors, enabling caches, and transitioning from early boot to full kernel initialization.