Consume
Last but not least, we introduce memory_order_consume.
Imagine a situation where data changes rarely but is frequently read by many threads.
For example, in a kernel tracking peripherals connected to a machine,
updates to this information occur very infrequently—only when a device is plugged in or removed.
In such cases, it is logical to prioritize read optimization as much as possible.
Based on our current understanding, the most effective strategy is:
std::atomic<PeripheralData*> peripherals;
// Writers:
PeripheralData* p = kAllocate(sizeof(*p));
populateWithNewDeviceData(p);
peripherals.store(p, std::memory_order_release);
// Readers:
PeripheralData* p = peripherals.load(std::memory_order_acquire);
if (p != nullptr) {
    doSomethingWith(p->keyboards);
}
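The fragment above can be fleshed out into a small program. What follows is a minimal sketch, not the article's actual kernel code: the PeripheralData layout, the publish and readKeyboards helpers, and runPublishDemo are all hypothetical names introduced for illustration, and a static object stands in for kAllocate.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the article's PeripheralData.
struct PeripheralData {
    int keyboards = 0;
};

std::atomic<PeripheralData*> peripherals{nullptr};

// Writer: fill in the record, then publish the pointer with a release store.
// Everything written before the store is visible to a reader whose acquire
// load observes the new pointer.
void publish(PeripheralData* p, int keyboardCount) {
    p->keyboards = keyboardCount;
    peripherals.store(p, std::memory_order_release);
}

// Reader: returns -1 while nothing has been published yet.
int readKeyboards() {
    PeripheralData* p = peripherals.load(std::memory_order_acquire);
    return p != nullptr ? p->keyboards : -1;
}

// Runs one writer and one reader; the reader spins until it sees the data.
int runPublishDemo() {
    static PeripheralData data;  // assumption: replaces kAllocate for the demo
    peripherals.store(nullptr, std::memory_order_relaxed);

    std::thread writer(publish, &data, 2);
    int seen = -1;
    std::thread reader([&seen] {
        while ((seen = readKeyboards()) == -1) {
            std::this_thread::yield();
        }
    });

    writer.join();
    reader.join();
    return seen;
}
```

Calling runPublishDemo() from main should return 2: once the reader observes the non-null pointer, the release/acquire pairing guarantees it also observes the keyboards field written before the store.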
To optimize readers even further,
it would be ideal if loads could avoid a memory barrier on weakly-ordered systems.
Fortunately, this is often achievable.
The data being accessed (p->keyboards) depends on the value of p,
so most platforms, including weakly-ordered ones,
cannot reorder the initial load (p = peripherals) to take place after its use (p->keyboards).
However, a few particularly weakly-ordered architectures, such as DEC Alpha,
can reorder the load and its dependent use, much to the frustration of developers.
We still need to ensure that the compiler does not perform a similar reordering, and memory_order_consume is designed for exactly this purpose.
Change readers to:
PeripheralData* p = peripherals.load(std::memory_order_consume);
if (p != nullptr) {
    doSomethingWith(p->keyboards);
}
and an ARM compiler could emit:
    ldr r3, &peripherals
    ldr r3, [r3]
    // Look ma, no barrier!
    cbz r3, was_null // Check for null
    ldr r0, [r3, #4] // Load p->keyboards
    b doSomethingWith(Keyboards*)
was_null:
    ...
Sadly, the emphasis here is on “could”. Figuring out what constitutes a “dependency” between expressions is not as trivial as one might hope, so all compilers currently convert consume operations to acquires.
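For completeness, here is a minimal sketch of the consume-based reader as a small program. The PeripheralData layout and the names readKeyboardsConsume and runConsumeDemo are hypothetical, introduced only for illustration. Since compilers treat the consume load as an acquire, the program behaves the same as the acquire version. Note also that the C++17 standard temporarily discourages memory_order_consume while its specification is revised, so treat this as illustrative rather than as recommended practice.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the article's PeripheralData.
struct PeripheralData {
    int keyboards = 0;
};

std::atomic<PeripheralData*> peripherals{nullptr};

// Reader using consume: only accesses that depend on the loaded value
// (here, p->keyboards) are ordered after the load. In practice compilers
// promote this load to an acquire.
int readKeyboardsConsume() {
    PeripheralData* p = peripherals.load(std::memory_order_consume);
    return p != nullptr ? p->keyboards : -1;
}

// Runs one writer and one reader; the reader spins until it sees the data.
int runConsumeDemo() {
    static PeripheralData data;  // assumption: replaces kAllocate for the demo
    peripherals.store(nullptr, std::memory_order_relaxed);

    std::thread writer([] {
        data.keyboards = 3;                                   // fill in the record
        peripherals.store(&data, std::memory_order_release);  // then publish it
    });
    int seen = -1;
    std::thread reader([&seen] {
        while ((seen = readKeyboardsConsume()) == -1) {
            std::this_thread::yield();
        }
    });

    writer.join();
    reader.join();
    return seen;
}
```

Calling runConsumeDemo() from main should return 3. The read of p->keyboards is safe here because it is computed from the loaded pointer itself; an access that did not depend on p would get no ordering guarantee from a true consume load.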