Implementing atomic read-modify-write operations with LL/SC instructions

Like many RISC1 architectures, Arm does not have dedicated RMW instructions. Given that the processor may switch contexts to another thread at any moment, constructing RMW operations from standard loads and stores is not feasible. Special instructions are required instead: load-link and store-conditional (LL/SC). These instructions are complementary: load-link performs a read operation from an address, similar to any load, but it also signals the processor to watch that address. Store-conditional executes a write operation only if no other writes have occurred at that address since its paired load-link. This mechanism is illustrated through an atomic fetch and add example.

On Arm

void incFoo() { ++foo; }

compiles to

incFoo:
  ldr r3, <&foo>
  dmb
loop:
  ldrex r2, [r3] // LL foo
  adds r2, r2, #1 // Increment
  strex r1, r2, [r3] // SC
  cmp r1, #0 // Check the SC result.
  bne loop // Loop if the SC failed.
  dmb
  bx lr

We LL the current value, add one, and immediately try to store it back with a SC. If that fails, another thread may have written to foo since our LL, so we try again. In this way, at least one thread is always making forward progress in atomically modifying foo, even if several are attempting to do so at once.

1

Reduced instruction set computer, in contrast to a complex instruction set computer (CISC) architecture like x64.