ARM64 OS Handbook
Chapter 37: ELF Files

What You Will Learn in This Chapter
  • What an ELF file is and how it is structured
  • The ELF header, program headers, and section headers
  • How the kernel loads an ELF executable into memory
  • Relocation and dynamic linking basics
  • How exec() uses the ELF loader to start a new program
  • Our ELF loader implementation

37.1 What is ELF?

ELF (Executable and Linkable Format) is the standard binary format for executables, object files, and shared libraries on Unix-like systems, including ARM64 Linux. An ELF file contains the machine code, data, and metadata needed to load and execute a program.

37.2 ELF Header

/* ELF64 header */
struct elf64_hdr {
    uint8_t  e_ident[16];      /* Magic: \x7fELF, class, data, version... */
    uint16_t e_type;           /* ET_EXEC=2, ET_DYN=3, ET_REL=1 */
    uint16_t e_machine;        /* EM_AARCH64=183 */
    uint32_t e_version;
    uint64_t e_entry;          /* Entry point virtual address */
    uint64_t e_phoff;          /* Program header table offset */
    uint64_t e_shoff;          /* Section header table offset */
    uint32_t e_flags;
    uint16_t e_ehsize;         /* ELF header size */
    uint16_t e_phentsize;      /* Size of each program header entry */
    uint16_t e_phnum;          /* Number of program headers */
    uint16_t e_shentsize;      /* Section header entry size */
    uint16_t e_shnum;          /* Number of section headers */
    uint16_t e_shstrndx;       /* Section header string table index */
};

/* Program header (describes a segment to load) */
struct elf64_phdr {
    uint32_t p_type;           /* PT_LOAD=1, PT_DYNAMIC=2, PT_INTERP=3 */
    uint32_t p_flags;          /* PF_R=4, PF_W=2, PF_X=1 */
    uint64_t p_offset;         /* Offset in file */
    uint64_t p_vaddr;          /* Virtual address to load at */
    uint64_t p_paddr;          /* Physical address (unused) */
    uint64_t p_filesz;         /* Size in file */
    uint64_t p_memsz;          /* Size in memory (may be > filesz for .bss) */
    uint64_t p_align;          /* Alignment requirement */
};
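
Both structures must match the on-disk layout byte for byte, because the loader reads them straight from the file. The sketch below repeats the two definitions and adds compile-time checks for the sizes fixed by the ELF64 format (64 bytes for the header, 56 for a program header). The helper elf64_sizes_ok is a hypothetical name used for illustration, not a function from our loader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct elf64_hdr {
    uint8_t  e_ident[16];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint64_t e_entry;
    uint64_t e_phoff;
    uint64_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

struct elf64_phdr {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_filesz;
    uint64_t p_memsz;
    uint64_t p_align;
};

/* Fail the build if the compiler inserted padding or a field was mistyped */
static_assert(sizeof(struct elf64_hdr) == 64, "ELF64 header is 64 bytes");
static_assert(sizeof(struct elf64_phdr) == 56, "ELF64 program header is 56 bytes");
static_assert(offsetof(struct elf64_hdr, e_entry) == 24, "e_entry follows e_version");
static_assert(offsetof(struct elf64_phdr, p_offset) == 8, "p_offset follows p_flags");

/* Hypothetical helper: returns 1 if the sizes the file reports about itself
 * agree with the structures above, 0 otherwise. */
int elf64_sizes_ok(const struct elf64_hdr *h) {
    if (h->e_ehsize != sizeof(struct elf64_hdr))
        return 0;
    /* e_phentsize only matters when there is at least one program header */
    if (h->e_phnum != 0 && h->e_phentsize != sizeof(struct elf64_phdr))
        return 0;
    return 1;
}
```

A loader could call such a check right after reading the header and reject the file before it indexes into the program header table.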

37.3 ELF Loader

/* Load an ELF executable into the address space of the given process */
int elf_load(struct pcb *proc, const char *path) {
    /* Open and read the ELF file */
    struct file *f = vfs_open(path, O_RDONLY);
    if (!f) return -1;

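    /* Read ELF header (assumes vfs_read returns the number of bytes read) */
    struct elf64_hdr ehdr;
    if (vfs_read(f, &ehdr, sizeof(ehdr)) != sizeof(ehdr)) {
        vfs_close(f);
        return -1;  /* file too short to hold an ELF header */
    }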

    /* Verify ELF magic */
    if (ehdr.e_ident[0] != 0x7f || ehdr.e_ident[1] != 'E' ||
        ehdr.e_ident[2] != 'L'  || ehdr.e_ident[3] != 'F') {
        vfs_close(f);
        return -1;
    }

    /* Verify architecture */
    if (ehdr.e_machine != 183) {  /* EM_AARCH64 */
        vfs_close(f);
        return -1;
    }

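    /* Read program headers; reject files whose entry size differs from ours */
    if (ehdr.e_phentsize != sizeof(struct elf64_phdr)) {
        vfs_close(f);
        return -1;
    }
    struct elf64_phdr *phdrs = kmalloc(ehdr.e_phnum * sizeof(struct elf64_phdr));
    if (!phdrs) {
        vfs_close(f);
        return -1;
    }
    vfs_seek(f, ehdr.e_phoff, SEEK_SET);
    vfs_read(f, phdrs, ehdr.e_phnum * sizeof(struct elf64_phdr));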

    /* Load each PT_LOAD segment */
    for (int i = 0; i < ehdr.e_phnum; i++) {
        struct elf64_phdr *ph = &phdrs[i];
        if (ph->p_type != 1) continue;  /* PT_LOAD */

        /* Calculate the page-aligned range that covers the segment */
        uint64_t va_start = ph->p_vaddr & ~(PAGE_SIZE - 1);
        uint64_t va_end = (ph->p_vaddr + ph->p_memsz + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
        uint64_t file_end_va = ph->p_vaddr + ph->p_filesz;  /* end of file-backed bytes */

        /* Map pages */
        for (uint64_t va = va_start; va < va_end; va += PAGE_SIZE) {
            void *page = page_alloc();
            if (!page) {
                /* NOTE: pages mapped so far are not released here */
                vfs_close(f);
                kfree(phdrs);
                return -1;
            }

            /* Zero the whole page first. This covers .bss (memsz > filesz)
             * and any padding before p_vaddr or after the file data. */
            memset(page, 0, PAGE_SIZE);

            /* Copy the part of [p_vaddr, p_vaddr + p_filesz) inside this page */
            uint64_t copy_start = (va > ph->p_vaddr) ? va : ph->p_vaddr;
            uint64_t copy_end = min(va + PAGE_SIZE, file_end_va);
            if (copy_start < copy_end) {
                vfs_seek(f, ph->p_offset + (copy_start - ph->p_vaddr), SEEK_SET);
                vfs_read(f, (char *)page + (copy_start - va), copy_end - copy_start);
            }

            /* Map with correct permissions.
             * NOTE: `executable` is computed but not passed to map_page in
             * this listing, so execute permission is not applied here. */
            int writable = ph->p_flags & 2;    /* PF_W */
            int executable = ph->p_flags & 1;  /* PF_X */
            (void)executable;
            map_page(proc->page_table, va, (uint64_t)page, writable, 1);
        }
    }

    /* Set entry point */
    proc->context.pc = ehdr.e_entry;

    vfs_close(f);
    kfree(phdrs);
    return 0;
}
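
The per-page arithmetic is the easiest part of the loader to get wrong, so it helps to isolate it. The sketch below computes, for a single page of a segment, which file bytes belong in that page and where they land. Everything outside that range stays zero. The names struct page_copy and plan_page_copy are hypothetical and exist only for this illustration. The page size is passed as a parameter so the function can be tested outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one page's file-backed portion */
struct page_copy {
    uint64_t file_pos;  /* file offset to read from */
    uint64_t page_off;  /* destination offset inside the page */
    uint64_t count;     /* bytes to read; 0 means the page is all zero-fill */
};

/* Intersect the page [page_va, page_va + page_size) with the file-backed
 * range [p_vaddr, p_vaddr + p_filesz) and translate the result into a file
 * offset and an offset within the page. */
struct page_copy plan_page_copy(uint64_t p_offset, uint64_t p_vaddr,
                                uint64_t p_filesz, uint64_t page_va,
                                uint64_t page_size) {
    /* page_size must be a power of two and page_va must be page-aligned */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert((page_va & (page_size - 1)) == 0);

    struct page_copy pc = {0, 0, 0};
    uint64_t file_end = p_vaddr + p_filesz;
    uint64_t page_end = page_va + page_size;
    uint64_t start = (page_va > p_vaddr) ? page_va : p_vaddr;
    uint64_t end = (page_end < file_end) ? page_end : file_end;

    if (start < end) {
        pc.file_pos = p_offset + (start - p_vaddr);
        pc.page_off = start - page_va;
        pc.count = end - start;
    }
    return pc;
}
```

For a segment with p_offset = 0x1234, p_vaddr = 0x401234, p_filesz = 0x1000, and 4 KiB pages, the first page receives 0xdcc bytes at page offset 0x234, the second receives the remaining 0x234 bytes at offset 0, and any further pages up to p_memsz are pure zero-fill.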

37.4 Kernel ELF vs User ELF

Our kernel itself is also an ELF file. However, the kernel is not loaded by the ELF loader: it is loaded by the bootloader (UEFI or start.S). The kernel ELF:

  • Has a virtual address entry point at 0xFFFF_0000_0000_0000 + offset
  • Contains program headers that map segments to the kernel's higher-half address space
  • Is statically linked (no shared library dependencies)
  • Has a single PT_LOAD segment covering .text, .rodata, .data, .bss

37.5 Our Implementation

Our ELF loader (fs/elf.c) provides:

  • ELF validation: checks magic, architecture (AArch64), and endianness
  • Segment loading: maps PT_LOAD segments at specified virtual addresses with correct permissions
  • .bss zeroing: handles memsz > filesz by zero-filling the extra memory
  • Page alignment: handles p_vaddr alignment and partial pages
  • Interpreter support: detects PT_INTERP (dynamic linker) for shared executables
  • User stack setup: allocates a stack at USER_STACK_TOP with argc/argv/envp
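
The validation step can be sketched as a function over the first bytes of the file. This is an illustration, not the code in fs/elf.c: the function name elf_ident_ok is hypothetical, and the sketch assumes the target is 64-bit little-endian AArch64, so it accepts only ELFCLASS64 and ELFDATA2LSB. The numeric constants are the standard ELF values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    EI_CLASS    = 4,    /* index of the class byte in e_ident */
    EI_DATA     = 5,    /* index of the data-encoding byte */
    EI_VERSION  = 6,    /* index of the ident version byte */
    ELFCLASS64  = 2,
    ELFDATA2LSB = 1,    /* little-endian */
    EV_CURRENT  = 1,
    EM_AARCH64  = 183
};

/* Hypothetical check: returns 1 if buf starts with a header we would accept.
 * Needs the first 20 bytes: e_ident (16), e_type (2), e_machine (2). */
int elf_ident_ok(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    if (len < 20)
        return 0;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return 0;                       /* bad magic */
    if (buf[EI_CLASS] != ELFCLASS64)
        return 0;                       /* not a 64-bit object */
    if (buf[EI_DATA] != ELFDATA2LSB)
        return 0;                       /* not little-endian */
    if (buf[EI_VERSION] != EV_CURRENT)
        return 0;
    /* e_machine sits at offset 18; decode it as little-endian by hand so the
     * check does not depend on the host's byte order */
    uint16_t machine = (uint16_t)(buf[18] | (buf[19] << 8));
    return machine == EM_AARCH64;
}
```

Checking the class and data bytes before touching any multi-byte field matters because those two bytes decide how every later field must be decoded.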

37.6 Exercises

Exercise 1: Readelf

Implement a simple readelf utility that parses and prints the ELF header, program headers, and section headers of a given file.

Exercise 2: Dynamic Linker

Research how PT_INTERP works. Write a minimal dynamic linker that loads a shared library, resolves symbols, and jumps to the program entry point.
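
As a possible starting point, the sketch below covers only the first step: finding the interpreter path in an ELF image that is already in memory. It is not a dynamic linker. The function name elf_find_interp is hypothetical, and the sketch assumes the whole file has been read into one buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct elf64_hdr {
    uint8_t  e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint64_t e_entry, e_phoff, e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
};

struct elf64_phdr {
    uint32_t p_type, p_flags;
    uint64_t p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align;
};

enum { PT_INTERP = 3 };

/* Hypothetical helper: returns a pointer to the NUL-terminated interpreter
 * path inside image, or NULL if there is no valid PT_INTERP segment. */
const char *elf_find_interp(const uint8_t *image, size_t size) {
    assert(image != NULL);
    struct elf64_hdr eh;
    if (size < sizeof(eh))
        return NULL;
    memcpy(&eh, image, sizeof(eh));     /* memcpy avoids unaligned access */
    if (eh.e_phentsize != sizeof(struct elf64_phdr) || eh.e_phoff > size)
        return NULL;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        uint64_t off = eh.e_phoff + (uint64_t)i * sizeof(struct elf64_phdr);
        if (off > size || size - off < sizeof(struct elf64_phdr))
            return NULL;                /* table runs past end of file */
        struct elf64_phdr ph;
        memcpy(&ph, image + off, sizeof(ph));
        if (ph.p_type != PT_INTERP)
            continue;
        /* The path must lie inside the file and end with a NUL byte */
        if (ph.p_filesz == 0 || ph.p_offset > size ||
            size - ph.p_offset < ph.p_filesz)
            return NULL;
        if (image[ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;
        return (const char *)(image + ph.p_offset);
    }
    return NULL;                        /* statically linked: no interpreter */
}
```

A NULL result for a well-formed file means the executable is statically linked. Otherwise the returned path names the program that would have to be loaded and given control before the executable's own entry point runs.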

37.7 Summary

ELF is the standard executable format on our OS. The ELF loader reads the file header, iterates over the program headers, and maps each PT_LOAD segment into the process's address space with the correct permissions. It handles page alignment, partial pages, and the zero-initialized .bss region (where memsz exceeds filesz). The entry point from the ELF header becomes the initial PC of the new process. Our loader supports both static executables and dynamically linked executables with a PT_INTERP interpreter.