Takeaways

We have only scratched the surface here, but hopefully you now know:

Why compilers and CPU hardware reorder loads and stores.
Why we need special tools to prevent these reorderings when communicating between threads.
How we can guarantee sequential consistency in our programs.
What atomic read-modify-write operations are.
How atomic operations can be implemented on weakly-ordered hardware, and what implications this can have for a language-level API.
How we can carefully optimize lockless code using non-sequentially-consistent memory orderings.
How false sharing can impact the performance of concurrent memory access.
Why volatile is an inappropriate tool for inter-thread communication.
How to prevent the compiler from fusing atomic operations in undesirable ways.
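
As a compact reminder of two of these points, here is a minimal C++ sketch of an atomic read-modify-write combined with padding against false sharing. The names PaddedCounter and count_in_parallel, and the 64-byte cache line size, are assumptions made for illustration; check the line size of your target before relying on it.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Assumed cache line size. 64 bytes is common, but not universal.
constexpr std::size_t kCacheLine = 64;

// Each per-thread counter sits on its own cache line, so a write by one
// thread does not invalidate the line holding the other thread's counter.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

// Hypothetical helper: two threads each perform n increments.
// Returns the value of the shared total once both have finished.
long count_in_parallel(long n) {
    PaddedCounter per_thread[2];   // logically independent, padded apart
    std::atomic<long> total{0};    // genuinely shared between the threads

    auto work = [&](int id) {
        for (long i = 0; i < n; ++i) {
            // Atomic read-modify-write. Relaxed ordering suffices because
            // only the increment itself must be indivisible; join() below
            // orders every increment before the final load.
            per_thread[id].value.fetch_add(1, std::memory_order_relaxed);
            total.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread a(work, 0);
    std::thread b(work, 1);
    a.join();
    b.join();
    return total.load(std::memory_order_relaxed);
}
```

With a plain long in place of total, the two threads' increments could interleave and lose updates; the fetch_add makes each one indivisible.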

To learn more, see the additional resources below, or examine lock-free data structures and algorithms, such as a single-producer/single-consumer (SP/SC) queue or read-copy-update (RCU).¹
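
For a first taste of such a structure, the following is a rough sketch of a bounded SP/SC queue built on acquire and release orderings. It is one possible design under stated assumptions, not a reference implementation: SpscQueue, spsc_demo_sum, and the 64-byte alignment are hypothetical choices, and the queue holds at most N - 1 items because one slot is left empty to tell "full" from "empty".

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Hypothetical fixed-capacity ring buffer for exactly one producer thread
// and exactly one consumer thread.
template <typename T, std::size_t N>
class SpscQueue {
public:
    // Call only from the producer thread. Returns false if the queue is full.
    bool push(const T& item) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        // Acquire: the consumer must be done reading a slot before we reuse it.
        if (next == head_.load(std::memory_order_acquire)) return false;
        slots_[tail] = item;
        // Release: publish the slot contents along with the new tail.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Call only from the consumer thread. Returns false if the queue is empty.
    bool pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire: pairs with the producer's release store to tail_.
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = slots_[head];
        // Release: tell the producer this slot may be overwritten.
        head_.store((head + 1) % N, std::memory_order_release);
        return true;
    }

private:
    T slots_[N];
    // Keep the two indices on separate (assumed 64-byte) cache lines.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};

// Hypothetical demo: send 1..count through the queue and sum what arrives.
long spsc_demo_sum(int count) {
    SpscQueue<int, 8> q;
    long sum = 0;
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i)
            while (!q.push(i)) std::this_thread::yield();
    });
    std::thread consumer([&] {
        int v = 0;
        for (int received = 0; received < count;) {
            if (q.pop(v)) { sum += v; ++received; }
            else std::this_thread::yield();
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

Each index is written by only one thread, so no read-modify-write is needed; the paired release stores and acquire loads are what make the slot contents visible to the other side.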

Good luck and godspeed!

¹ See the LWN article "What is RCU, Fundamentally?" for an introduction.
