Chapter 7: Boot Process
- The complete boot sequence from power-on to kernel execution
- The role of firmware, bootloader, and kernel in the boot chain
- How exception levels change during boot
- What the CPU state is when our kernel starts
- How QEMU boots our kernel with the -kernel flag
- The device tree and how to read it
- What our _start code must do before calling kernel_main
7.1 The Boot Sequence Overview
When you press the power button, the CPU is in an undefined state. A sequence of events must occur before our kernel can run. This sequence is called the boot chain.
graph LR
A[Power On] --> B[ROM Firmware]
B --> C[EL3 Boot Loader]
C --> D[EL2 Boot Loader]
D --> E[EL1 Kernel]
E --> F[EL0 Applications]
Figure 7.1: The ARM64 boot chain. Each stage runs at the same or a lower exception level than the previous one.
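Each stage initializes some hardware, prepares for the next stage, and then jumps to it, usually after dropping to a lower exception level. This is called exception level dropping. Once software has dropped to a lower exception level, it cannot raise its own level again; control returns to a higher level only through an exception, such as an interrupt or an explicit call into firmware (SMC) or a hypervisor (HVC).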
| Stage | EL | What It Does |
|---|---|---|
| ROM Firmware | EL3 | Basic CPU init, loads first boot loader from flash |
| EL3 Boot Loader | EL3 | DRAM init, loads EL2 boot loader (e.g., U-Boot) |
| EL2 Boot Loader | EL2 | Loads kernel from disk/network, passes device tree |
| Kernel | EL1 | Our kernel: MMU, scheduler, drivers, system calls |
| Applications | EL0 | User-space programs running under kernel control |
7.2 Firmware and the Boot ROM
When power is first applied, the CPU starts executing code from a fixed address in ROM (read-only memory). This is the boot ROM, built into the CPU itself. It cannot be modified.
The boot ROM does the minimum needed to get the system started:
- Initializes the CPU (caches off, MMU off, all interrupts masked)
- Sets up the stack pointer for EL3
- Reads boot configuration (boot device priority, etc.)
- Loads the next-stage boot loader from flash/EEPROM into SRAM
- Jumps to it at EL3
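On real hardware this first stage is board-specific. On a Raspberry Pi 3 and earlier, the ROM loads a file called bootcode.bin
from the SD card; on a Raspberry Pi 4 or 5, it loads the second-stage boot loader from an on-board EEPROM instead. On QEMU, firmware is optional:
it can be supplied with the -bios option (usually QEMU_EFI.fd for UEFI), or skipped entirely when -kernel is used.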
7.3 The Boot Loader
The boot loader is a program that loads our kernel into memory and prepares the environment for it. On ARM64 systems, common boot loaders include:
- U-Boot: the most common open-source boot loader for ARM boards
- UEFI: modern firmware interface, standard on ARM servers and available for the Raspberry Pi 4/5
- ARM Trusted Firmware (TF-A): reference EL3 firmware for ARM
- QEMU's built-in loader: when using -kernel, QEMU acts as a simple boot loader itself
The boot loader's responsibilities:
- Initialize DRAM (memory) so there is somewhere to load the kernel
- Load the kernel image from storage (SD card, network, flash) into DRAM
- Load the device tree into memory (tells the kernel about hardware)
- Set up CPU registers before jumping to the kernel
- Drop exception level to EL1 or EL2 before entering the kernel
How QEMU Handles This
When we run:
qemu-system-aarch64 -M virt -cpu cortex-a72 -nographic -kernel kernel.elf
QEMU does the following:
- Creates a virtual ARM64 machine with the virt platform
- Acts as the boot loader itself: because no firmware image is given with -bios, no separate EL3 or EL2 firmware runs inside the guest
- Reads kernel.elf and places its segments at the addresses specified in the ELF headers (0x40000000)
- Generates a device tree that describes the virtual machine
- Starts the primary CPU at the kernel entry point, at EL1 by default (the virtualization=on and secure=on machine options start it at EL2 or EL3 instead)
This means our kernel starts at EL1 with:
- MMU disabled (all addresses are physical)
- Data and instruction caches disabled
- Stack pointer undefined (we must set it up)
- All interrupts masked (DAIF bits set)
- Device tree address in x0 (if QEMU provides one)
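- CPU ID in x1 (this book's convention for multicore systems; not every boot loader sets it, and the core number can always be read from MPIDR_EL1)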
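The "address specified in the ELF headers" can be made concrete with a small host-side sketch. It is not QEMU's loader; it only shows where a 64-bit ELF file records its entry point. The helper name elf64_entry_point is hypothetical, and a little-endian image is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets within the ELF64 file header (little-endian image assumed). */
#define ELF_MAGIC_SIZE   4
#define ELF_CLASS_OFFSET 4   /* e_ident[EI_CLASS]: 2 means 64-bit */
#define ELF_ENTRY_OFFSET 24  /* e_entry: 8-byte entry point address */
#define ELF64_HDR_SIZE   64

/* Read a little-endian 64-bit value byte by byte (independent of host endianness). */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: returns 0 and stores the entry point on success,
 * or -1 if the buffer does not look like a 64-bit ELF header. */
int elf64_entry_point(const uint8_t *image, size_t len, uint64_t *entry)
{
    static const uint8_t magic[ELF_MAGIC_SIZE] = { 0x7f, 'E', 'L', 'F' };

    if (image == NULL || entry == NULL || len < ELF64_HDR_SIZE)
        return -1;
    for (size_t i = 0; i < ELF_MAGIC_SIZE; i++)
        if (image[i] != magic[i])
            return -1;
    if (image[ELF_CLASS_OFFSET] != 2)   /* not ELFCLASS64 */
        return -1;

    *entry = read_le64(image + ELF_ENTRY_OFFSET);
    return 0;
}
```

Run against the first 64 bytes of kernel.elf on the host, this would report the address of _start chosen by the linker script. The load addresses of the individual segments live in the program headers, which this sketch does not parse.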
7.4 The Device Tree (FDT)
A device tree is a data structure that describes the hardware to the kernel. It tells the kernel what devices exist, where their registers are in memory, how interrupts are wired, and other configuration details.
The form handed to the kernel is the Flattened Device Tree (FDT), also called a Device Tree Blob (DTB). It is a binary format, but it can be represented as text in a Device Tree Source (DTS) file.
When QEMU boots our kernel, it can pass a device tree. The address of the device tree
is placed in register x0 before our kernel starts. In simplified DTS form, the device tree
for the virt machine looks like this:
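/dts-v1/;
/ {
    model = "QEMU virt";
    compatible = "arm,virt";
    /* Addresses and sizes below each take two 32-bit cells */
    #address-cells = <2>;
    #size-cells = <2>;
    memory@40000000 {
        device_type = "memory";
        /* <address-hi address-lo size-hi size-lo>: 1 GB at 0x40000000 */
        reg = <0x00000000 0x40000000 0x00000000 0x40000000>;
    };
    uart@9000000 {
        compatible = "arm,pl011";
        reg = <0x00000000 0x09000000 0x00000000 0x00001000>;
        /* GIC format <type number flags>: shared interrupt 1, level-triggered */
        interrupts = <0x00000000 0x00000001 0x00000004>;
    };
    cpus {
        #address-cells = <1>;
        #size-cells = <0>;
        cpu@0 {
            device_type = "cpu";
            compatible = "arm,cortex-a72";
            reg = <0x00000000>;
        };
    };
};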
The device tree allows the same kernel binary to run on different hardware with different memory sizes, different UART addresses, or different numbers of CPUs. Instead of hard-coding addresses, the kernel reads them from the device tree.
For now, our kernel hard-codes the UART address (0x09000000). Later, we will write a device tree parser so the kernel can discover hardware dynamically.
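Before a parser walks the tree, it usually checks that the pointer really refers to a device tree blob. The sketch below shows that first step only. The helper names fdt_be32 and fdt_total_size are hypothetical; the magic value 0xd00dfeed and the big-endian header layout come from the Devicetree Specification.

```c
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC        0xd00dfeedu
#define FDT_HEADER_SIZE  40u   /* ten 32-bit fields in the version 17 header */

/* All FDT header fields are stored big-endian, whatever the CPU's endianness. */
static uint32_t fdt_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: returns the total size of the blob in bytes,
 * or 0 if the pointer does not refer to a plausible device tree. */
uint32_t fdt_total_size(const void *blob)
{
    const uint8_t *p = blob;

    if (p == NULL)
        return 0;
    if (fdt_be32(p) != FDT_MAGIC)       /* offset 0: magic */
        return 0;

    uint32_t total = fdt_be32(p + 4);   /* offset 4: totalsize */
    if (total < FDT_HEADER_SIZE)
        return 0;
    return total;
}
```

In kernel_main, a call such as fdt_total_size((const void *)dtb_addr) returning 0 would be a signal to fall back to the hard-coded UART address.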
7.5 CPU State at Kernel Entry
When the boot loader jumps to our kernel entry point (_start), the CPU is
in a specific state. Understanding this state is critical because our startup code must
handle it correctly.
| Component | State at Entry | What We Must Do |
|---|---|---|
| Exception level | EL1 (or EL2 if booted by EL2 loader) | If at EL2, drop to EL1 |
| MMU | Disabled (all addresses physical) | Keep disabled until we set up page tables |
| Data cache | Disabled | Keep disabled until MMU is on |
| Instruction cache | Usually disabled (some boot loaders leave it enabled) | Can enable early for performance |
| Stack pointer | Undefined (SP_EL1 is not set up) | Set SP_EL1 immediately |
| Interrupts | All masked (DAIF bits set) | Keep masked until we have handlers |
| x0 | Device tree address (or 0 if none) | Save before using (we pass to kernel_main) |
| x1 | CPU ID by this book's convention (0 for primary core); the Linux arm64 boot protocol reserves x1 as 0 | Save for multicore boot |
| Other registers | Undefined | Do not assume any value |
| BSS section | Not zeroed (contains garbage) | Zero it before using any global variables |
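Our current _start code handles only the most critical item, the stack pointer:
_start:
    ldr x0, =_stack_end     /* load top of stack address */
    mov sp, x0              /* set stack pointer */
    bl  kernel_main         /* jump to C code */
halt:
    wfi                     /* kernel_main returned: sleep until an interrupt */
    b   halt                /* then sleep again rather than rerun the boot code */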
This is minimal. Later, we will need to add BSS clearing, exception level checking, and multicore handling.
7.6 BSS Clearing
The BSS section contains global and static variables that are zero-initialized or have no explicit initializer. In a normal C program, the C runtime startup code (crt0) zeros BSS before calling main. In our freestanding kernel, we must do this ourselves.
Our linker script defines two symbols, _bss_start and _bss_end, that mark the BSS region. The startup code zeroes everything between them before calling kernel_main:
/* Before calling kernel_main, zero the BSS section */
_start:
    ldr x0, =_stack_end
    mov sp, x0
    /* Clear BSS: x0 = current address, x1 = end address */
    ldr x0, =_bss_start
    ldr x1, =_bss_end
    mov x2, xzr             /* zero */
1:
    cmp x0, x1
    b.hs 2f                 /* unsigned compare: done once x0 >= _bss_end */
    str x2, [x0], #8        /* store zero and advance by 8 bytes */
    b   1b
2:
    bl  kernel_main
halt:
    wfi                     /* kernel_main returned: halt here */
    b   halt
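This loop stores 8 bytes of zero to each 8-byte word in the BSS range, so the linker script must align _bss_start and _bss_end to 8 bytes; otherwise the final store writes past the end of BSS. Without this step, any global variable that should be zero may contain garbage values, causing unpredictable behavior.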
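The same loop can be written in C, which makes it easy to try out on a host machine. This is a sketch, not the book's startup code: the function name zero_words is hypothetical, and both pointers are assumed to be 8-byte aligned.

```c
#include <stdint.h>

/* Zero every 64-bit word in the half-open range [start, end).
 * Assumes both pointers are 8-byte aligned, like the assembly loop. */
void zero_words(uint64_t *start, uint64_t *end)
{
    /* volatile keeps the compiler from replacing the loop with a call
     * to memset(), which a freestanding kernel may not provide. */
    volatile uint64_t *p = start;

    while (p < (volatile uint64_t *)end)
        *p++ = 0;
}
```

In the kernel, the range would come from the linker symbols, for example by declaring extern uint64_t _bss_start[], _bss_end[]; and calling zero_words(_bss_start, _bss_end). Any C code that runs before BSS is cleared must not rely on global or static variables, which is one reason the book does this step in assembly.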
7.7 Exception Level Drop (EL2 to EL1)
Some boot configurations start our kernel at EL2 instead of EL1. Since our kernel is designed to run at EL1, we need to detect this and drop to EL1 if necessary.
_start:
    /* Check current exception level (CurrentEL bits [3:2]) */
    mrs x0, CurrentEL
    lsr x0, x0, #2
    cmp x0, #2              /* Are we at EL2? */
    b.ne setup_el1
    /* We are at EL2. Configure EL2 and drop to EL1. */
    /* HCR_EL2.RW (bit 31) = 1: EL1 runs in AArch64 state */
    mov x0, #(1 << 31)
    msr HCR_EL2, x0
    /* Set SPSR_EL2 to boot to EL1 */
    mov x0, #0x3C5          /* EL1h, all interrupts masked */
    msr SPSR_EL2, x0
    /* Set the return address to our EL1 startup code */
    adr x0, setup_el1
    msr ELR_EL2, x0
    /* Return to EL1 */
    eret
setup_el1:
    /* Now at EL1 */
    ldr x0, =_stack_end
    mov sp, x0
    /* Clear BSS */
    ldr x0, =_bss_start
    ldr x1, =_bss_end
    mov x2, xzr
1:  cmp x0, x1
    b.hs 2f                 /* unsigned compare: done once x0 >= _bss_end */
    str x2, [x0], #8
    b   1b
2:
    bl  kernel_main
halt:
    wfi                     /* kernel_main returned: halt here */
    b   halt
The ERET instruction loads the exception return address from
ELR_EL2 and the processor state from SPSR_EL2, and then
jumps to the return address at the specified exception level.
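The two magic numbers in this listing, the shift by 2 and the value 0x3C5, follow directly from the register layouts. The sketch below spells them out in C so they can be checked on a host machine. The macro and function names are hypothetical; the bit positions are those of CurrentEL and SPSR_EL2 in AArch64.

```c
#include <stdint.h>

/* SPSR_EL2 fields used when returning to EL1 in AArch64 state. */
#define SPSR_MODE_EL1H  0x5u        /* M[3:0] = 0b0101: EL1 using SP_EL1 */
#define SPSR_F_BIT      (1u << 6)   /* mask FIQ */
#define SPSR_I_BIT      (1u << 7)   /* mask IRQ */
#define SPSR_A_BIT      (1u << 8)   /* mask SError */
#define SPSR_D_BIT      (1u << 9)   /* mask debug exceptions */
#define SPSR_DAIF_MASK  (SPSR_D_BIT | SPSR_A_BIT | SPSR_I_BIT | SPSR_F_BIT)

/* CurrentEL holds the exception level in bits [3:2]. */
unsigned current_el_from_reg(uint64_t currentel)
{
    return (unsigned)((currentel >> 2) & 0x3u);
}

/* Value written to SPSR_EL2 before ERET: EL1h with D, A, I, and F masked. */
uint64_t spsr_for_el1h_masked(void)
{
    return SPSR_DAIF_MASK | SPSR_MODE_EL1H;
}
```

Here spsr_for_el1h_masked() evaluates to 0x3C5, the constant loaded in the assembly, and a raw CurrentEL value of 0x8 decodes to exception level 2.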
7.8 Multicore Considerations
QEMU virt defaults to 1 CPU, but can be configured for more:
qemu-system-aarch64 -M virt -cpu cortex-a72 -smp 4 -nographic -kernel kernel.elf
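On some platforms, all CPUs start executing at the kernel entry point simultaneously; on others, including QEMU virt booted with -kernel, the secondary CPUs stay powered off until the kernel starts them through PSCI. To be safe in both cases, we need to: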
- Identify which CPU is the primary (boot core, CPU 0)
- Send secondary CPUs to a spin loop (wait for work)
- Let only the primary CPU continue with initialization
_start:
/* x1 contains the CPU ID (set by QEMU/bootloader) */
mov x2, x1 /* save CPU ID */
cbz x2, primary_cpu /* CPU 0 is primary, proceed */
secondary_cpu:
/* Secondary CPUs spin here until the kernel wakes them */
wfe
b secondary_cpu
primary_cpu:
ldr x0, =_stack_end
mov sp, x0
/* ... clear BSS, call kernel_main ... */
We will implement a proper multicore wake-up mechanism using SEV (send
event) later in the book when we discuss scheduling.
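The listing above relies on the boot loader placing the CPU ID in x1. Where that is not the case, the core can identify itself from the MPIDR_EL1 register. The sketch below decodes such a value on the host. The function names are hypothetical, and treating the core whose affinity fields are all zero as the primary is an assumption that holds on many platforms but not all.

```c
#include <stdint.h>

/* Affinity fields of MPIDR_EL1: Aff0 [7:0], Aff1 [15:8],
 * Aff2 [23:16], Aff3 [39:32]. Other bits carry flags, not core IDs. */
#define MPIDR_AFF_MASK  0x000000FF00FFFFFFull

/* Lowest affinity level: the core number within its cluster. */
unsigned mpidr_aff0(uint64_t mpidr)
{
    return (unsigned)(mpidr & 0xFFu);
}

/* Assumption: the primary core is the one whose affinity fields are all zero. */
int is_primary_core(uint64_t mpidr)
{
    return (mpidr & MPIDR_AFF_MASK) == 0;
}
```

In the kernel, the raw value would be read with mrs x0, MPIDR_EL1. Bit 31 of that register always reads as 1, so comparing the whole register against zero would never match; masking the affinity fields avoids that.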
7.9 The Boot Flow on QEMU virt
Let us trace the exact boot flow when we run our kernel on QEMU virt:
- QEMU starts and creates the virtual machine with the virt platform, including the GIC (interrupt controller) and the virtual memory map
- Because we pass -kernel without -bios, no ROM firmware or ARM Trusted Firmware (ATF) runs in the guest; QEMU emulates PSCI (Power State Coordination Interface) itself
- QEMU's internal loader reads our kernel.elf, parses the ELF headers, and loads the segments at the addresses specified in the program headers
- QEMU starts the primary CPU at our kernel entry point (when a firmware image is used, an EL2 stub or the firmware makes this jump instead)
- Our _start code executes: set stack, clear BSS, call kernel_main
We can observe this boot flow using QEMU's tracing:
# Trace the boot process
qemu-system-aarch64 -M virt -cpu cortex-a72 -nographic \
-kernel kernel.elf -d cpu_reset,int -D qemu_trace.log
7.10 Our Implementation
Let us now write a complete start.S that handles all the boot requirements
we have discussed:
.section .text._start
.global _start
_start:
    /* Save boot parameters from bootloader */
    mov x20, x0             /* save device tree address */
    mov x21, x1             /* save CPU ID */
    /* Check if we need to drop from EL2 to EL1 */
    mrs x0, CurrentEL
    lsr x0, x0, #2
    cmp x0, #2
    b.ne 1f
    /* Drop from EL2 to EL1 */
    mov x0, #(1 << 31)      /* HCR_EL2.RW = 1: EL1 runs in AArch64 state */
    msr HCR_EL2, x0
    mov x0, #0x3C5          /* EL1h, DAIF masked */
    msr SPSR_EL2, x0
    adr x0, 1f
    msr ELR_EL2, x0
    eret
1:
    /* Set stack pointer for EL1 */
    ldr x0, =_stack_end
    mov sp, x0
    /* Clear BSS section */
    ldr x0, =_bss_start
    ldr x1, =_bss_end
    mov x2, xzr
2:  cmp x0, x1
    b.hs 3f                 /* unsigned compare: done once x0 >= _bss_end */
    str x2, [x0], #8
    b   2b
3:
    /* Restore boot parameters and enter C code */
    mov x0, x20             /* device tree address */
    mov x1, x21             /* CPU ID */
    bl  kernel_main
    /* If kernel_main returns, halt */
halt:
    wfi
    b   halt
This start.S now does five things:
- Saves boot parameters from the boot loader
- Detects and handles EL2 start (drops to EL1)
- Sets up the stack pointer for C code
- Zeroes the BSS section
- Passes the device tree address and CPU ID to kernel_main
Correspondingly, our kernel_main signature changes to accept these
parameters:
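#include <stdint.h>

/* uart_init() and uart_puts() are assumed to be declared by our UART driver */
void kernel_main(uint64_t dtb_addr, uint64_t cpu_id) {
    /* Now we know which CPU we are and where the device tree is */
    (void)dtb_addr;         /* not parsed yet */
    if (cpu_id == 0) {
        /* Primary CPU does full initialization */
        uart_init();
        uart_puts("Primary CPU booting...\r\n");
    } else {
        /* Secondary CPUs wait; volatile keeps the compiler from dropping the wfe */
        while (1) __asm__ volatile("wfe");
    }
}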
This is our foundation. In Chapter 9 (Kernel Entry Point), we will refine the startup sequence further with cache enabling, exception vector installation, and early memory initialization.
7.11 Exercises
Exercise 1: Trace the Boot Flow
When you run qemu-system-aarch64 -M virt -nographic -kernel kernel.elf,
list every software component that executes between power-on and the first instruction
of our _start. Use QEMU documentation and online resources.
Exercise 2: Read CurrentEL
Add code to kernel_main that reads the current exception level and prints
it. If it is EL2, print a warning. Build and run to verify our kernel starts at EL1.
Exercise 3: Dump the Device Tree
Use QEMU's -M virt,dumpdtb=qemu-virt.dtb option to extract the device
tree binary. Then use dtc -I dtb -O dts qemu-virt.dtb to convert it to
human-readable DTS format. Identify the UART, GIC, and memory nodes, and note their
addresses.
Exercise 4: BSS Bug Hunting
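Remove the BSS-clearing loop from start.S and add a global variable
int counter = 0; in kernel.c that increments in kernel_main. Build and
run several times. Check whether the value is always zero at start; on QEMU, freshly loaded guest memory is often
zero already, so the bug may only show up on real hardware or after a warm reset. Write a short
explanation of why skipping the loop is still a bug.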
Exercise 5: Boot with SMP
Run QEMU with -smp 4 and add code to start.S that prints
"CPU X booting" for each CPU that reaches _start. The primary CPU should
print the message; secondary CPUs should spin. Count how many messages you see.
Exercise 6: EL2 Detection (Challenge)
Modify QEMU's boot to start our kernel at EL2 instead of EL1. One way is to use the
virt machine's virtualization option: -machine virt,virtualization=on (secure=on would start the CPU at EL3 instead), or a custom EL2 loader.
Then verify that our EL2-to-EL1 drop code works correctly. Hint: look at QEMU's
-bios option for providing an EL2 loader.
7.12 Summary
In this chapter, we traced the complete boot sequence from power-on to our kernel entry point. The boot chain goes through multiple stages, each running at the same or a higher exception level than the next. The firmware at EL3 initializes the system, the boot loader at EL2 loads our kernel into memory, and our kernel runs at EL1.
We learned about the device tree, which describes hardware to the kernel in a platform-independent way. While we currently hard-code hardware addresses, we will eventually parse the device tree to discover devices dynamically.
We examined the exact CPU state at kernel entry: MMU off, caches off, interrupts masked,
stack undefined, BSS uninitialized. Our _start code must handle each of these
before it can safely call C code.
Finally, we built a robust start.S that saves boot parameters, drops from
EL2 to EL1 if needed, sets up the stack, clears BSS, and calls kernel_main with the
device tree address and CPU ID.
In the next chapter, we will look at building a complete boot loader that can load our kernel from disk or over a serial connection.