ARM64 OS Handbook

Chapter 35: Storage

What You Will Learn in This Chapter
  • How the kernel interfaces with block storage devices
  • The generic block layer abstraction
  • The virtio-blk driver for QEMU virtual disks
  • SD card interface (for Raspberry Pi)
  • Block I/O request queue and completion
  • Our storage driver implementation

35.1 The Block Layer

Storage devices are block devices: data is read and written in fixed-size blocks (typically 512 bytes or 4096 bytes). The kernel's block layer abstracts away device-specific details behind a common interface.

/* Generic block device structure */
struct block_device {
    const char *name;
    int block_size;                    /* Block size in bytes (usually 512) */
    uint64_t num_blocks;               /* Total blocks on device */
    struct block_device_ops *ops;      /* Device-specific operations */
    void *private_data;                /* Driver-specific data */
    struct list_head bd_list;          /* For block device list */
};

/* Block device operations (filled in by each driver).
 * lba and count are in units of dev->block_size; both operations
 * return 0 on success and -1 on error. */
struct block_device_ops {
    int (*read)(struct block_device *dev, uint64_t lba,
                void *buffer, int count);
    int (*write)(struct block_device *dev, uint64_t lba,
                 const void *buffer, int count);
};

35.2 The virtio-blk Driver

On QEMU's virt machine, the usual storage device is virtio-blk, a paravirtualized block device. The guest kernel and QEMU (the host) communicate through a virtqueue, a ring buffer in shared memory:

/* virtio-blk device */
struct virtio_blk {
    uint64_t mmio_base;          /* MMIO base from device tree */
    struct virtqueue *vq;        /* Virtqueue for I/O requests */
    uint64_t capacity;           /* Number of 512-byte sectors */
    uint64_t features;           /* Negotiated feature bits (up to 64 in virtio 1.x) */
};

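/* virtio-mmio transport registers (offsets from mmio_base) */
#define VIRTIO_MMIO_MAGIC            0x000  /* reads 0x74726976 ("virt") */
#define VIRTIO_MMIO_VERSION          0x004
#define VIRTIO_MMIO_DEVICE_ID        0x008  /* 2 = block device */
#define VIRTIO_MMIO_QUEUE_SEL        0x030
#define VIRTIO_MMIO_QUEUE_NUM        0x038
#define VIRTIO_MMIO_QUEUE_READY      0x044
#define VIRTIO_MMIO_QUEUE_NOTIFY     0x050
#define VIRTIO_MMIO_INTERRUPT_STATUS 0x060
#define VIRTIO_MMIO_INTERRUPT_ACK    0x064
#define VIRTIO_MMIO_STATUS           0x070

/* virtio-blk request types and status values */
#define VIRTIO_BLK_T_IN   0         /* read */
#define VIRTIO_BLK_T_OUT  1         /* write */
#define VIRTIO_BLK_S_OK   0

/* virtio-blk request header. A request is a chain of three buffers:
 * this header (device-readable), the data, and a one-byte status
 * (device-writable). They cannot share one struct, because no member
 * may follow a flexible array member. */
struct virtio_blk_req_hdr {
    uint32_t type;              /* VIRTIO_BLK_T_IN or VIRTIO_BLK_T_OUT */
    uint32_t reserved;
    uint64_t sector;            /* starting sector, always in 512-byte units */
};

/* Read blocks from virtio-blk (assumes dev->block_size == 512) */
int virtio_blk_read(struct block_device *dev, uint64_t lba,
                    void *buffer, int count) {
    struct virtio_blk *vblk = dev->private_data;

    /* Header and status must live in memory the device can reach */
    struct virtio_blk_req_hdr *hdr = kmalloc(sizeof(*hdr));
    uint8_t *status = kmalloc(1);
    if (!hdr || !status) {
        if (hdr) kfree(hdr);
        if (status) kfree(status);
        return -1;
    }
    hdr->type = VIRTIO_BLK_T_IN;
    hdr->reserved = 0;
    hdr->sector = lba;
    *status = 0xff;             /* device overwrites this on completion */

    /* Queue the three-buffer chain. No memcpy is needed: for a read the
     * device writes straight into the caller's buffer. virtqueue_add() is
     * this driver's own helper (argument list assumed here); it must
     * publish the chain with a write barrier before we notify. */
    virtqueue_add(vblk->vq, hdr, sizeof(*hdr), buffer, count * 512, status);

    /* Tell the device that queue 0 has new work */
    writel(vblk->mmio_base, VIRTIO_MMIO_QUEUE_NOTIFY, 0);

    /* Poll for the used-buffer notification (bit 0), then acknowledge it
     * so the next request does not see a stale bit. An interrupt-driven
     * driver would sleep here and reclaim the chain from the used ring. */
    while (!(readl(vblk->mmio_base, VIRTIO_MMIO_INTERRUPT_STATUS) & 1))
        ;
    writel(vblk->mmio_base, VIRTIO_MMIO_INTERRUPT_ACK, 1);

    /* The device wrote the status byte behind the compiler's back */
    int ok = (*(volatile uint8_t *)status == VIRTIO_BLK_S_OK);
    kfree(hdr);
    kfree(status);
    return ok ? 0 : -1;
}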

35.3 Generic Block I/O Request Queue

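Rather than calling the driver directly for every read or write, the kernel routes requests through a generic I/O request queue. The queue is the natural place to schedule (reorder) and merge requests; the minimal version below is a plain FIFO serviced by one kernel thread: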

/* Block I/O request */
struct bio_request {
    struct block_device *dev;
    uint64_t lba;
    void *buffer;
    int count;
    int dir;                     /* 0 = read, 1 = write */
    struct semaphore completion; /* For synchronous I/O */
    int error;
    struct list_head node;
};

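/* Shared queue state, initialized once at boot:
 *   blk_request_list - pending bio_request nodes (FIFO)
 *   blk_request_lock - protects blk_request_list
 *   blk_request_sem  - counts queued requests; blk_thread sleeps on it */

/* Submit a synchronous read request */
int blk_read(struct block_device *dev, uint64_t lba,
             void *buffer, int count) {
    struct bio_request *req = kmalloc(sizeof(*req));
    if (!req)
        return -1;
    req->dev = dev;
    req->lba = lba;
    req->buffer = buffer;
    req->count = count;
    req->dir = 0;                /* read */
    req->error = 0;
    sem_init(&req->completion, 0);

    /* Append to the request queue: list_add_tail(new, head) */
    spinlock_lock(&blk_request_lock);
    list_add_tail(&req->node, &blk_request_list);
    spinlock_unlock(&blk_request_lock);

    /* Wake up the block I/O thread */
    sem_signal(&blk_request_sem);

    /* Sleep until the I/O thread has serviced the request */
    sem_wait(&req->completion);
    int error = req->error;
    kfree(req);
    return error;
}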

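/* Block I/O kernel thread: services one request per semaphore count */
void blk_thread(void) {
    while (1) {
        sem_wait(&blk_request_sem);

        /* Dequeue the oldest request. The list links list_head nodes,
         * so recover the enclosing bio_request with list_first_entry(). */
        struct bio_request *req = NULL;
        spinlock_lock(&blk_request_lock);
        if (!list_empty(&blk_request_list)) {
            req = list_first_entry(&blk_request_list, struct bio_request, node);
            list_del(&req->node);
        }
        spinlock_unlock(&blk_request_lock);

        if (req) {
            if (req->dir == 0)
                req->error = req->dev->ops->read(req->dev, req->lba,
                                                  req->buffer, req->count);
            else
                req->error = req->dev->ops->write(req->dev, req->lba,
                                                   req->buffer, req->count);
            sem_signal(&req->completion);   /* wake the submitter */
        }
    }
}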

35.4 SD Card Interface (Raspberry Pi)

On real hardware (Raspberry Pi 4/5), storage is typically an SD card attached to an SDHCI-compatible (SD Host Controller Interface) controller; on the Raspberry Pi 4 this is the BCM2711's EMMC2 controller. The interface involves:

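  1. Initializing the SD controller (clock, voltage, bus width)
  2. Sending SD commands to identify and select the card: CMD0 (reset), CMD8 (check voltage), ACMD41 (power-up and capacity), CMD2 (read CID), CMD3 (get the relative card address), then CMD7 (select the card)
  3. Reading/writing sectors using CMD17/CMD18 (read single/multiple) and CMD24/CMD25 (write single/multiple)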

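QEMU's virt machine has no SD host controller, so -drive if=sd cannot be used there. To exercise the SDHCI driver under emulation, use a Raspberry Pi machine model instead, for example -M raspi3b with -drive file=disk.img,if=sd,format=raw. Our kernel detects and initializes the SDHCI controller from the device tree.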

35.5 Our Implementation

Our storage subsystem (drivers/block/) provides:

  • Generic block layer: block_device interface with read/write operations
  • virtio-blk driver: for QEMU virtual disks (fast paravirtualized I/O)
  • SDHCI driver: for SD card access on Raspberry Pi
  • Block I/O thread: asynchronous request processing with semaphore synchronization
  • Partition support: parses the MBR partition table to detect partitions
  • Device registration: block devices appear as /dev/sda, /dev/sdb, etc. via devfs

35.6 Exercises

Exercise 1: MBR Parser

Implement a function that reads the Master Boot Record (LBA 0) and returns the four partition entries with their start sectors and sizes.

Exercise 2: Read Speed Benchmark

Benchmark the read speed of the virtio-blk device by reading 1 MB in various transfer sizes (512 bytes, 4 KB, 64 KB). Compare the throughput.

35.7 Summary

The storage subsystem provides block-level I/O for persistent storage. The generic block layer hides device details behind a common interface. The virtio-blk driver (for QEMU) performs paravirtualized I/O through virtqueues. Requests are queued to a block I/O thread, and each submitter sleeps on a per-request semaphore until its request completes, so other threads can run while I/O is in progress. Our kernel supports both virtio-blk (QEMU) and SDHCI (Raspberry Pi), with MBR partition parsing.