Consume

Last but not least, we introduce memory_order_consume. Imagine a situation where data changes rarely but is read frequently by many threads. For example, a kernel tracking the peripherals connected to a machine updates this information very infrequently: only when a device is plugged in or removed. In such cases, it makes sense to optimize reads as much as possible. Given what we have covered so far, the best we can do is:

std::atomic<PeripheralData*> peripherals;

// Writers:
PeripheralData* p = kAllocate(sizeof(*p));
populateWithNewDeviceData(p);
peripherals.store(p, std::memory_order_release);

// Readers:
PeripheralData* p = peripherals.load(std::memory_order_acquire);
if (p != nullptr) {
    doSomethingWith(p->keyboards);
}
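For readers who want to run the pattern above, here is a minimal, self-contained sketch. It makes several assumptions that are not in the original: the fields of PeripheralData are invented, plain new stands in for kAllocate, and the helper names publish, waitForKeyboards, and runDemo are hypothetical. Reclaiming a previously published structure is deliberately left out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical layout; the article does not define PeripheralData.
struct PeripheralData {
    int keyboards = 0;
    int mice = 0;
};

std::atomic<PeripheralData*> peripherals{nullptr};

// Writer: fill in the structure first, then publish the pointer.
// The release store keeps the field writes ordered before the pointer
// becomes visible to readers.
void publish(int keyboards, int mice) {
    PeripheralData* p = new PeripheralData;  // stand-in for kAllocate
    p->keyboards = keyboards;
    p->mice = mice;
    // Any previously published structure is leaked here; safe
    // reclamation is outside the scope of this sketch.
    peripherals.store(p, std::memory_order_release);
}

// Reader: the acquire load pairs with the release store above, so once
// a non-null pointer is seen, the fields behind it are fully written.
int waitForKeyboards() {
    PeripheralData* p;
    while ((p = peripherals.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();  // nothing published yet
    }
    return p->keyboards;
}

// Runs one reader and one writer concurrently and returns what the
// reader observed.
int runDemo() {
    int seen = -1;
    std::thread reader([&seen] { seen = waitForKeyboards(); });
    std::thread writer([] { publish(2, 1); });
    writer.join();
    reader.join();
    assert(seen == 2);
    return seen;
}
```

Calling runDemo() from main exercises both sides. The reader can never observe a half-initialized structure, because the only way to reach the fields is through a pointer obtained by the acquire load.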

We can squeeze more out of the readers: on weakly-ordered systems, it would be ideal if their loads could skip the memory barrier altogether. Fortunately, this is often possible. The data being read (p->keyboards) depends on the value of p, so most platforms, including weakly-ordered ones, cannot reorder the initial load (p = peripherals) after its use (p->keyboards). A few extremely weakly-ordered architectures, such as DEC Alpha, can reorder even these dependent loads, much to the frustration of developers. On the platforms that do preserve this ordering, all that remains is to keep the compiler from performing a similar reordering itself, and memory_order_consume is designed for this purpose. Change readers to:

PeripheralData* p = peripherals.load(std::memory_order_consume);
if (p != nullptr) {
    doSomethingWith(p->keyboards);
}

and an ARM compiler could emit:

  ldr r3, =peripherals // Address of the atomic
  ldr r3, [r3]         // Load p
  // Look ma, no barrier!
  cbz r3, was_null // Check for null
  ldr r0, [r3, #4] // Load p->keyboards
  b doSomethingWith(Keyboards*)
was_null:
  ...

Sadly, the emphasis here is on could. Figuring out what constitutes a “dependency” between expressions is not as trivial as one might hope,[1] so all compilers currently convert consume operations to acquires.
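To make the difficulty concrete, here is a small sketch of two reads that both follow a consume load. The side table portTable, the PeripheralData layout, and the helpers publish, readTotals, and demo are all hypothetical, invented for illustration. The first read reaches its data through the loaded pointer, so the hardware sees a genuine address dependency. The second uses an index that is derived from the loaded value in the source but is always zero, which an optimizer would normally fold to a constant, erasing the dependency the hardware would have relied on.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical layout; the article does not define PeripheralData.
struct PeripheralData {
    int keyboards = 0;
};

int portTable[1] = {0};  // hypothetical side table, written before publication
std::atomic<PeripheralData*> peripherals{nullptr};

void publish(PeripheralData* p, int keyboards, int ports) {
    p->keyboards = keyboards;
    portTable[0] = ports;
    peripherals.store(p, std::memory_order_release);
}

int readTotals() {
    PeripheralData* p = peripherals.load(std::memory_order_consume);
    if (p == nullptr) {
        return -1;
    }

    // (1) The address of p->keyboards is computed from p, so this load
    //     clearly depends on the consume load.
    int k = p->keyboards;

    // (2) In the source, the index is computed from k, so by the letter
    //     of the language rules the dependency continues into this read.
    //     But k - k is always 0, and an optimizer would like to rewrite
    //     the access as portTable[0], leaving no dependency for the
    //     hardware to honor.
    int zero = k - k;
    int ports = portTable[zero];

    return k + ports;
}

// Single-threaded walk-through, just to show the calls fit together.
int demo() {
    static PeripheralData data;
    publish(&data, 2, 5);
    int total = readTotals();
    assert(total == 7);
    return total;
}
```

A compiler that wanted to implement consume without a barrier would have to either track chains like (2) through every optimization pass or give up on folding them. Treating consume as acquire avoids the problem entirely, at the cost of the barrier the reader was hoping to skip.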

[1] Even the experts in the ISO committee’s concurrency study group, SG1, came away with different understandings. See N4036 for the gory details. Proposed solutions are explored in P0190R3 and P0462R1.