1) I'd have liked it if you had dived into the cases where you actually do need to flush the CPU cache. I've run into this maybe once or twice in my entire career, and that was while writing MIPS kernel drivers. I'm guessing it would be cool for the audience to understand what shenanigans actually require it, particularly as more people transition from x86 to ARM.
2) You are ascribing meaning to volatile which it absolutely does not have (in C/C++). You really should be going deeper into load/store memory barriers. Using volatile in the hope that it gives you some kind of synchronization is misguided.
The classic case where you need a cache flush is a JIT writing out native code. Once you've written the instructions into memory you need to clean the data cache [i.e. ensure that your changes are made visible to anything 'below' you in the memory hierarchy] and invalidate the icache [i.e. throw away whatever it currently holds for those addresses], so that when the CPU starts executing from the memory you've just written it doesn't get stale old versions that might otherwise still be in the icache. In fact you only need to clean data out to the point in the memory hierarchy where the icache and dcache of all your cores come together, which is probably not the same as "write it all the way back to system RAM", but the difference is basically invisible to the programmer.
NB that x86 maintains coherency between icache and dcache (unlike ARM, say), so you don't need to do this on that CPU architecture.
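To make the JIT case concrete, here is a minimal sketch in C. It assumes GCC or Clang (for `__builtin___clear_cache`) and a POSIX system with `mmap`/`mprotect`. The helper name `publish_code` is made up for illustration, and a real JIT would manage its code buffers quite differently.

```c
#define _DEFAULT_SOURCE        /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy `len` bytes of freshly generated machine code
 * into a new mapping and make it safe to execute.
 * Returns NULL on failure. */
void *publish_code(const unsigned char *code, size_t len)
{
    /* Map the buffer writable first; many systems refuse pages that are
     * writable and executable at the same time. */
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* These stores go through the data cache like any other write. */
    memcpy(buf, code, len);

    /* Clean the dcache and invalidate the icache for this address range.
     * On x86 this is typically a no-op; on ARM it expands to cache
     * maintenance instructions or a call into the OS. */
    __builtin___clear_cache((char *)buf, (char *)buf + len);

    /* Flip the mapping from writable to executable. */
    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}
```

The caller would then cast the returned pointer to a function pointer of the right type and call it, and `munmap` the buffer when the code is no longer needed.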
You are absolutely right that volatile is inadequate for ordering memory accesses in concurrent C/C++ algorithms, and that memory barriers/fences are required as well. I tried to focus on the hardware in this article.
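For readers wondering what the barrier-based alternative to volatile looks like, here is a minimal sketch using C11 `<stdatomic.h>`. The names `payload`, `ready`, `publish` and `try_consume` are illustrative only and do not come from the article. The release store and acquire load are what provide the ordering; the payload itself is a plain variable.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain data, deliberately not volatile */
static atomic_bool ready;  /* flag that publishes the payload; starts false */

/* Producer side: write the data, then publish it. The release store
 * keeps the payload write from being reordered after the flag write. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer side: returns true and fills *out once the flag is seen.
 * The acquire load keeps the payload read from being reordered before
 * the flag read, so a consumer that sees the flag also sees the data. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

One thread would call `publish(42)` while another polls `try_consume(&v)`. Declaring `ready` as `volatile bool` instead would stop the compiler from caching the flag in a register, but it would not give either the compiler or the CPU any ordering constraint between the flag and the payload.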