Spurious LL/SC failures
It is impractical for CPU hardware to track load-linked addresses for each byte within a system due to the immense resource requirements. To mitigate this, many processors monitor these operations at a broader scale, like the cache line level. Consequently, a SC operation may fail if any part of the monitored block is written to, not just the specific address that was load-linked.
This limitation poses a particular challenge for operations like compare and swap,
highlighting the essential purpose of compare_exchange_weak
.
Consider, for example, the task of atomically multiplying a value without an architecture-specific atomic read-multiply-write instruction.
void atomicMultiply(int by)
{
int expected = foo;
// Which CAS should we use?
while (!foo.compare_exchange_?(expected, expected * by)) {
// Empty loop.
// (On failure, expected is updated with foo's most recent value.)
}
}
Many lockless algorithms use CAS loops like this to atomically update a variable when calculating its new value is not atomic. They:
- Read the variable.
- Perform some (non-atomic) operation on its value.
- CAS the new value with the previous one.
- If the CAS failed, another thread beat us to the punch, so try again.
If we use compare_exchange_strong
for this family of algorithms,
the compiler must emit nested loops:
an inner one to protect us from spurious SC failures,
and an outer one which repeatedly performs our operation until no other thread has interrupted us.
But unlike the _strong
version,
a weak CAS is allowed to fail spuriously, just like the LL/SC mechanism that implements it.
So, with compare_exchange_weak
,
the compiler is free to generate a single loop,
since we do not care about the difference between retries from spurious SC failures and retries caused by another thread modifying our variable.