Takeaways
We have only scratched the surface here, but hopefully you now know:
- Why compilers and CPU hardware reorder loads and stores.
- Why we need special tools to prevent these reorderings when communicating between threads.
- How we can guarantee sequential consistency in our programs.
- What atomic read-modify-write operations are.
- How atomic operations can be implemented on weakly-ordered hardware, and what implications this can have for a language-level API.
- How we can carefully optimize lockless code using non-sequentially-consistent memory orderings.
- How false sharing can impact the performance of concurrent memory access.
- Why volatile is an inappropriate tool for inter-thread communication.
- How to prevent the compiler from fusing atomic operations in undesirable ways.
To learn more, see the additional resources below, or examine lock-free data structures and algorithms, such as a single-producer/single-consumer (SP/SC) queue or read-copy-update (RCU).¹
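As a starting point, here is a minimal sketch of an SP/SC queue that draws on several of the takeaways above: acquire and release orderings instead of sequential consistency, and padding to avoid false sharing. The names SpscQueue, tryPush, tryPop, and runDemo are hypothetical, and the 64-byte cache line size is an assumption that does not hold on every platform. Treat it as an illustration under those assumptions, not a production-ready implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical fixed-capacity SP/SC ring buffer. One slot always stays
// empty so that head == tail unambiguously means "empty".
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N >= 2, "need at least one usable slot");

public:
    // Call from the single producer thread only.
    bool tryPush(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        // Acquire: the consumer must be done reading a slot before we reuse it.
        if (next == head_.load(std::memory_order_acquire)) {
            return false; // full
        }
        slots_[tail] = value;
        // Release: publish the slot write before the new tail becomes visible.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Call from the single consumer thread only.
    bool tryPop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire: pairs with the producer's release store to tail_.
        if (head == tail_.load(std::memory_order_acquire)) {
            return false; // empty
        }
        out = slots_[head];
        // Release: tell the producer this slot may be overwritten.
        head_.store((head + 1) % N, std::memory_order_release);
        return true;
    }

private:
    T slots_[N];
    // Assumed 64-byte cache lines: keep the two indices on separate lines
    // so the producer and consumer do not falsely share one.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};

// Sends 1..count through the queue and returns the sum the consumer saw.
long long runDemo(int count) {
    SpscQueue<int, 16> queue;
    long long sum = 0;

    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) {
            while (!queue.tryPush(i)) {
                std::this_thread::yield();
            }
        }
    });

    std::thread consumer([&] {
        int value = 0;
        for (int received = 0; received < count;) {
            if (queue.tryPop(value)) {
                assert(value == received + 1); // values arrive in FIFO order
                sum += value;
                ++received;
            } else {
                std::this_thread::yield();
            }
        }
    });

    producer.join();
    consumer.join();
    return sum; // safe to read: join() synchronizes with the consumer
}
```

Each index is written by exactly one thread, which is why its owner can read it with a relaxed load; only the reads of the other thread's index need acquire semantics.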
Good luck and godspeed!
¹ See the LWN article, "What is RCU, Fundamentally?", for an introduction.