Takeaways

We have only scratched the surface here, but hopefully you now know:

  • Why compilers and CPU hardware reorder loads and stores.
  • Why we need special tools to prevent these reorderings when communicating between threads.
  • How we can guarantee sequential consistency in our programs.
  • How atomic read-modify-write operations work.
  • How atomic operations can be implemented on weakly-ordered hardware, and what implications this can have for a language-level API.
  • How we can carefully optimize lockless code using non-sequentially-consistent memory orderings.
  • How false sharing can impact the performance of concurrent memory access.
  • Why volatile is an inappropriate tool for inter-thread communication.
  • How to prevent the compiler from fusing atomic operations in undesirable ways.

To learn more, see the additional resources below, or examine lock-free data structures and algorithms, such as a single-producer/single-consumer (SP/SC) queue or read-copy-update (RCU).1
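As a starting point for that kind of study, here is a minimal sketch of a bounded SP/SC queue built from acquire and release operations. It is one possible design, not a reference implementation. The names SpscQueue and transfer_sum are hypothetical, the 64-byte alignment is an assumed cache line size, and the code assumes C++17 and a default-constructible, copyable element type.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>

// Hypothetical bounded SP/SC ring buffer (illustrative name and interface).
// Exactly one thread may call push() and exactly one thread may call pop().
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Producer only. Returns false if the queue is full.
    bool push(const T& value) {
        // Only the producer writes tail_, so a relaxed load of it is enough.
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        // Acquire pairs with the consumer's release store to head_, so the
        // consumer has finished reading a slot before we overwrite it.
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;
        slots_[tail % Capacity] = value;
        // Release publishes the slot write along with the new tail.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Consumer only. Returns std::nullopt if the queue is empty.
    std::optional<T> pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release store to tail_.
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;
        T value = slots_[head % Capacity];
        // Release tells the producer that this slot may be reused.
        head_.store(head + 1, std::memory_order_release);
        return value;
    }

private:
    T slots_[Capacity];
    // Assumption: 64-byte cache lines. Separating the indices keeps the
    // producer and consumer from false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};

// Usage sketch: one producer sends 0..n-1 and one consumer sums them.
inline long long transfer_sum(int n) {
    assert(n >= 0);
    SpscQueue<int, 8> queue;
    long long sum = 0;
    std::thread producer([&] {
        for (int i = 0; i < n; ++i) {
            while (!queue.push(i)) std::this_thread::yield();
        }
    });
    std::thread consumer([&] {
        for (int received = 0; received < n;) {
            if (auto v = queue.pop()) { sum += *v; ++received; }
            else std::this_thread::yield();
        }
    });
    producer.join();
    consumer.join();
    return sum;  // Safe to read: join() synchronizes with the consumer.
}
```

Each index is written by only one thread, which is why its owner can read it with a relaxed load. The acquire/release pairs on the other thread's index are what order the slot accesses, and sequentially consistent operations would also be correct here, just potentially slower on weakly-ordered hardware.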

Good luck and godspeed!

1 See the LWN article, "What is RCU, Fundamentally?", for an introduction.