ARM64 OS Handbook
Chapter 25: Synchronization

What You Will Learn in This Chapter
  • Why synchronization is needed in concurrent systems
  • Race conditions and how they occur
  • Critical sections and mutual exclusion
  • Atomic operations and their role in synchronization
  • ARM64 atomic instructions: LDXR/STXR
  • Deadlocks, livelocks, and starvation

25.1 The Problem: Race Conditions

A race condition occurs when two or more threads access shared data concurrently without synchronization and at least one access is a write. The result then depends on the exact timing of the accesses, which the program does not control. Consider:

/* Shared counter */
int counter = 0;

/* Two threads execute this simultaneously */
counter++;  /* Read counter, increment, write counter */

If both threads read counter=0 at the same time, they both write 1, and the final value is 1 instead of 2. This happens because counter++ is not an atomic operation: it is three separate steps (load, add, store). Without synchronization, concurrent updates lose data.
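The lost update can be reproduced deterministically by modelling the three steps by hand. The sketch below is a single-threaded simulation, not real concurrency: struct sim_thread and the step_* helpers are hypothetical names introduced only for illustration, and each simulated thread holds its own private register, as a CPU core would.

```c
/* Hypothetical model: each simulated thread has one private register. */
struct sim_thread { int reg; };

static int counter;

/* The three steps hidden inside counter++ */
static void step_load(struct sim_thread *t)  { t->reg = counter; }
static void step_add(struct sim_thread *t)   { t->reg += 1; }
static void step_store(struct sim_thread *t) { counter = t->reg; }

/* Both threads load before either one stores: one update is lost. */
int run_racy_interleaving(void) {
    struct sim_thread a, b;
    counter = 0;
    step_load(&a);
    step_load(&b);   /* b also reads 0 */
    step_add(&a);
    step_add(&b);
    step_store(&a);  /* counter = 1 */
    step_store(&b);  /* counter = 1 again, a's update is overwritten */
    return counter;
}

/* Thread a finishes all three steps before thread b starts. */
int run_serial_interleaving(void) {
    struct sim_thread a, b;
    counter = 0;
    step_load(&a);
    step_add(&a);
    step_store(&a);  /* counter = 1 */
    step_load(&b);   /* b reads 1 */
    step_add(&b);
    step_store(&b);  /* counter = 2 */
    return counter;
}
```

With real threads the scheduler and the hardware pick the interleaving, so either outcome can occur on any given run.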

25.2 Critical Sections

A critical section is a region of code that accesses shared data and must not be executed by more than one thread at a time. The goal of synchronization is to ensure mutual exclusion: only one thread executes in the critical section at any moment.

// enter_critical_section();  /* Wait until no other thread is inside */
counter++;                     /* Critical section */
// exit_critical_section();   /* Allow other threads to enter */
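One possible way to fill in the two placeholders is a minimal busy-wait lock. The sketch below is a hosted, portable illustration using C11 atomic_flag and POSIX threads, not the kernel's actual lock; section_lock, worker, and count_with_two_threads are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag section_lock = ATOMIC_FLAG_INIT;
static long counter;

/* Spin until the flag was previously clear, i.e. no other thread is inside. */
static void enter_critical_section(void) {
    while (atomic_flag_test_and_set_explicit(&section_lock,
                                             memory_order_acquire)) {
        /* busy-wait */
    }
}

/* Clear the flag so that one waiting thread can enter. */
static void exit_critical_section(void) {
    atomic_flag_clear_explicit(&section_lock, memory_order_release);
}

static void *worker(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        enter_critical_section();
        counter++;                /* critical section */
        exit_critical_section();
    }
    return NULL;
}

/* Run two workers concurrently and return the final counter value. */
long count_with_two_threads(long increments_per_thread) {
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, worker, &increments_per_thread);
    pthread_create(&t2, NULL, worker, &increments_per_thread);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

Because every increment happens while the flag is held, count_with_two_threads(n) returns 2*n. Depending on the toolchain, the program may need to be built with -pthread.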

25.3 Atomic Operations

At the hardware level, certain operations can be made atomic (indivisible). ARM64 provides the Load-Exclusive/Store-Exclusive (LDXR/STXR) instructions for building atomic operations:

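#include <stdint.h>

/* ARM64 atomic compare-and-swap (CAS).
   Returns 1 if *ptr equalled expected and was replaced by newval, else 0. */
int atomic_cas(uint64_t *ptr, uint64_t expected, uint64_t newval) {
    uint64_t result;   /* Value observed by the exclusive load */
    uint32_t status;   /* STXR status: 0 = stored, 1 = lost exclusivity */

    asm volatile(
        "1:  ldxr %0, [%2]\n"      /* Load exclusive */
        "    cmp %0, %3\n"         /* Compare with expected */
        "    b.ne 2f\n"            /* If not equal, fail without storing */
        "    stxr %w1, %4, [%2]\n" /* Store exclusive */
        "    cbnz %w1, 1b\n"       /* Retry if the store failed */
        "2:"
        : "=&r"(result), "=&r"(status)
        : "r"(ptr), "r"(expected), "r"(newval)
        : "cc", "memory"           /* cmp writes the condition flags */
    );

    return result == expected;     /* Nonzero only if the swap happened */
}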

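/* Atomic increment (LDADD, ARMv8.1 LSE; build with -march=armv8.1-a or later) */
void atomic_inc(uint64_t *ptr) {
    uint64_t old;  /* LDADD also returns the previous value; unused here */
    asm volatile("ldadd %1, %0, [%2]"  /* *ptr += 1, old = previous *ptr */
                 : "=&r"(old) : "r"((uint64_t)1), "r"(ptr) : "memory");
}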

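/* Brief reminder: LDXR loads a value and marks the memory location for
   exclusive access. STXR stores only if that exclusive marking is still
   intact, writing 0 to its status register on success and 1 on failure.
   The marking is lost when another core writes to the location, and it
   may also be lost for other reasons (for example, around an exception),
   so callers must always be prepared to retry. Neither instruction orders
   the surrounding memory accesses; use LDAXR/STLXR or explicit barriers
   when ordering matters (Section 25.4). */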
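For comparison, the same two operations can be written portably with C11 <stdatomic.h>, leaving instruction selection to the compiler (on ARM64 this is typically an exclusive load/store loop, or single LSE instructions when the target allows them). The function names below are hypothetical; atomic_max_u64 is included only to show the usual CAS retry loop.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Single CAS attempt: store newval only if *ptr still equals expected. */
bool cas_u64(_Atomic uint64_t *ptr, uint64_t expected, uint64_t newval) {
    return atomic_compare_exchange_strong(ptr, &expected, newval);
}

/* Atomic increment; returns the value held before the increment. */
uint64_t fetch_inc_u64(_Atomic uint64_t *ptr) {
    return atomic_fetch_add(ptr, 1);
}

/* CAS retry loop: raise *ptr to at least candidate.
   Returns the value observed before any update. */
uint64_t atomic_max_u64(_Atomic uint64_t *ptr, uint64_t candidate) {
    uint64_t seen = atomic_load(ptr);
    while (seen < candidate &&
           !atomic_compare_exchange_weak(ptr, &seen, candidate)) {
        /* On failure, seen now holds the current value; re-check and retry. */
    }
    return seen;
}
```

The weak variant is allowed to fail spuriously, which mirrors STXR losing exclusivity, so it belongs inside a loop; the strong variant fails only when the value really differs.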

25.4 Memory Ordering

Modern CPUs reorder memory operations for performance. Without ordering constraints, a thread might observe shared memory updates in a different order than they were written. ARM64 provides memory barriers:

#define dmb()  asm volatile("dmb ish" ::: "memory")  /* Data memory barrier */
#define dsb()  asm volatile("dsb ish" ::: "memory")  /* Data sync barrier */
#define isb()  asm volatile("isb" ::: "memory")       /* Instruction sync barrier */

/* Example: producer-consumer ordering.
   Assumes data->ready is declared volatile so the spin loop re-reads it. */
/* Thread A (producer) */
data->value = 42;
dmb();  /* Ensure the value write is visible before the ready flag */
data->ready = 1;

/* Thread B (consumer) */
while (!data->ready) { /* spin */ }
dmb();  /* Ensure the value read happens after seeing ready=1 */
int val = data->value;  /* Guaranteed to see 42 */
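The same publish/consume pattern can also be expressed with C11 release/acquire atomics instead of explicit barriers. This is a sketch under assumptions: struct mailbox and both function names are hypothetical, and the consumer is written as a non-blocking check so that it can be called from a polling loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct mailbox {
    int value;          /* payload, plain (non-atomic) data */
    atomic_int ready;   /* flag that publishes the payload */
};

/* Producer: write the payload, then set the flag with release ordering,
   so the payload write cannot be observed after the flag write. */
void mailbox_publish(struct mailbox *m, int value) {
    m->value = value;
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Consumer: read the flag with acquire ordering; only if it is set,
   read the payload. Returns false if nothing has been published yet. */
bool mailbox_try_consume(struct mailbox *m, int *out) {
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *out = m->value;
    return true;
}
```

A consumer that must wait would call mailbox_try_consume in a loop. On ARM64, compilers usually implement the release store and acquire load with STLR and LDAR, so no separate DMB is needed for this pattern.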

25.5 Deadlocks

A deadlock occurs when two or more threads are each waiting for another to release a resource, so none of them can proceed. Four conditions must hold simultaneously for a deadlock to occur (the Coffman conditions):

  1. Mutual exclusion: resources cannot be shared
  2. Hold and wait: a thread holds resources while waiting for others
  3. No preemption: resources cannot be forcibly taken
  4. Circular wait: a cycle of waiting relationships exists

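To prevent deadlocks, break at least one of these conditions: acquire locks in a fixed global order or follow a lock hierarchy (both eliminate circular wait), or use try-lock with backoff (which avoids hold and wait by releasing held locks when the next one is unavailable).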
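Two of these techniques can be sketched for the common case of taking two locks at once. The code below is an illustration, not the kernel's API: struct spinlock, lock_pair, and trylock_pair are hypothetical names, and the lock itself is a bare C11 atomic_flag. lock_pair imposes a fixed order by always taking the lower-addressed lock first; trylock_pair gives up and releases the first lock if the second is busy, leaving the backoff delay and the retry to the caller.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct spinlock { atomic_flag held; };

void spin_lock(struct spinlock *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait */
    }
}

/* Returns true if the lock was free and is now held by the caller. */
bool spin_trylock(struct spinlock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void spin_unlock(struct spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Fixed order: lower address first, whatever the argument order.
   a and b must be distinct locks. */
void lock_pair(struct spinlock *a, struct spinlock *b) {
    if ((uintptr_t)a > (uintptr_t)b) {
        struct spinlock *tmp = a;
        a = b;
        b = tmp;
    }
    spin_lock(a);
    spin_lock(b);
}

void unlock_pair(struct spinlock *a, struct spinlock *b) {
    spin_unlock(a);
    spin_unlock(b);
}

/* Try-lock variant: never waits while holding a lock.
   On failure nothing is held, so the caller can back off and retry. */
bool trylock_pair(struct spinlock *a, struct spinlock *b) {
    if (!spin_trylock(a))
        return false;
    if (!spin_trylock(b)) {
        spin_unlock(a);
        return false;
    }
    return true;
}
```

With lock_pair, two threads that call lock_pair(&x, &y) and lock_pair(&y, &x) both take the locks in the same order, so the circular wait from the classic two-lock deadlock cannot form.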

25.6 Our Implementation

Our kernel provides these synchronization primitives, built from atomic operations and memory barriers:

  • Spinlocks: busy-wait locks for short critical sections (Chapter 27)
  • Mutexes: sleeping locks that block the waiting thread instead of spinning (Chapter 26)
  • Semaphores: counting synchronization (Chapter 28)
  • Read-Write locks: multiple readers, exclusive writer
  • Condition variables: wait for a condition with a mutex

All primitives use ARM64 LDXR/STXR for atomicity and DMB ISH barriers for memory ordering. The kernel also provides local IRQ disable for synchronization between a thread and an interrupt handler on the same core:

/* Disable/enable interrupts on the current core */
uint64_t local_irq_save(void) {
    uint64_t flags;
    asm volatile("mrs %0, daif" : "=r"(flags));
    asm volatile("msr daifset, #2" : : : "memory");  /* Mask IRQ */
    return flags;
}

void local_irq_restore(uint64_t flags) {
    asm volatile("msr daif, %0" : : "r"(flags) : "memory");
}

/* Use for: protect shared data accessed from both thread and interrupt context */
void uart_write_char(char c) {
    uint64_t flags = local_irq_save();
    while (*(volatile uint32_t *)(UART_BASE + UART_FR) & UART_FR_TXFF);
    *(volatile uint32_t *)(UART_BASE + UART_DR) = c;
    local_irq_restore(flags);
}
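Returning the previous flags, rather than unconditionally re-enabling interrupts, is what lets these sections nest. The host-side model below illustrates that property; it does not touch real hardware. model_daif is a hypothetical stand-in for the DAIF register, and the model_* functions mirror local_irq_save and local_irq_restore above.

```c
#include <stdint.h>

#define DAIF_I_BIT (UINT64_C(1) << 7)   /* IRQ mask bit (I) in DAIF */

static uint64_t model_daif;             /* stand-in for the DAIF register */

/* Mirror of local_irq_save: remember the old state, then mask IRQs. */
uint64_t model_irq_save(void) {
    uint64_t flags = model_daif;
    model_daif |= DAIF_I_BIT;
    return flags;
}

/* Mirror of local_irq_restore: put back exactly what was saved. */
void model_irq_restore(uint64_t flags) {
    model_daif = flags;
}

int model_irqs_masked(void) {
    return (model_daif & DAIF_I_BIT) != 0;
}

/* A helper with its own protected section, like uart_write_char. */
static void inner_section(void) {
    uint64_t flags = model_irq_save();
    /* ... touch shared data ... */
    model_irq_restore(flags);   /* restores "masked" if the caller masked */
}

/* Returns 1 if IRQs are still masked in the outer section after the
   inner section has finished. */
int nested_sections_stay_masked(void) {
    model_daif = 0;                       /* start with IRQs unmasked */
    uint64_t flags = model_irq_save();    /* outer section begins */
    inner_section();
    int still_masked = model_irqs_masked();
    model_irq_restore(flags);             /* outer section ends */
    return still_masked;
}
```

If the inner section simply cleared the mask on exit, the outer section would run its remaining code with interrupts enabled.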

25.7 Exercises

Exercise 1: Race Detection

Write a program with two threads incrementing a shared counter 100,000 times each without synchronization. Run it and observe the incorrect result.

Exercise 2: Deadlock Creation

Write code that produces a deadlock using two mutexes. Verify the deadlock by observing that both threads are blocked forever.

25.8 Summary

Synchronization is essential for correctness in concurrent systems. Race conditions occur when multiple threads access shared data without coordination. Critical sections are protected by locks that enforce mutual exclusion, and those locks are built from atomic operations and memory barriers. ARM64 provides LDXR/STXR for atomic read-modify-write, DMB and DSB barriers for memory ordering, and ISB for instruction-stream synchronization. The kernel builds spinlocks, mutexes, semaphores, read-write locks, and condition variables on top of these hardware primitives. Deadlocks are a key hazard, prevented by ordered lock acquisition and try-lock patterns.