however, the hard part of fixing this has never been identifying where to put the barriers; it has been adding them in a portable way. the __atomic_* builtins are, as far as i can tell, gcc-specific. am i wrong about that?
I think you're almost correct. Clang supports them too, so the __atomic_* builtins are portable across GCC and Clang.