On Fri, Dec 29, 2017 at 1:10 PM, Chris Caudle <chris(a)chriscaudle.org> wrote:
On Fri, December 29, 2017 10:23 am, Paul Davis wrote:
Are they NUMA in the "traditional" sense that there are local caches
and a complex cache invalidation scheme? Or just NUMA in the sense that
"it's a bit slower to get there from here"?
Well, technically NUMA means non-uniform memory access, so any shared
memory scheme that has differing access speeds for different parts of the
address space counts as NUMA, but yes, any modern multi-socket server or
workstation is considered cache-coherent NUMA, and even many single-die
processors have multiple levels of cache that are dedicated per core or
pair of cores, and could be considered a form of NUMA. The new AMD server
and workstation processors are actually NUMA on package: each package has
four separate die mounted with cache-coherent interconnect between the
die, and each die has two memory controllers. On each of the four die the
8 processor cores are arranged in pairs, with some levels of cache
dedicated to each pair, and some shared between all four pairs. So yes,
"complex" is an apt description of the cache management. It is somewhat
mind-boggling that it works at all, much less that it has the performance levels
it does.
that description makes it sound as if someone (at least AMD) did in fact
figure out a working cache architecture for NUMA. i was working on highly
parallel systems (64 cores) in the mid 90s, and my impression was that
everybody just gave up on the idea that we could ever get cache coherency
correct. it warms my tiny cold (*) heart if someone actually managed to do
so.
(*) it's just the local weather for 10 days, not a psychological condition
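
For anyone who wants to check the "differing access speeds for different
parts of the address space" definition on their own machine, here is a
minimal sketch. It assumes a Linux system that exposes per-node distance
rows under /sys/devices/system/node/nodeN/distance. The function names are
made up for illustration, and the distance values are relative costs
reported by the firmware, not measured latencies.

```python
from pathlib import Path

# Linux sysfs directory holding one subdirectory per NUMA node (assumed present).
NODE_ROOT = Path("/sys/devices/system/node")


def parse_distance_row(text):
    """Turn one 'distance' file body such as '10 16 16 16' into a list of ints."""
    return [int(token) for token in text.split()]


def read_distance_matrix(root=NODE_ROOT):
    """Collect one distance row per node directory, ordered by node number.

    Returns an empty list if the directory does not exist.
    """
    rows = {}
    for node_dir in root.glob("node[0-9]*"):
        node_id = int(node_dir.name[len("node"):])
        rows[node_id] = parse_distance_row((node_dir / "distance").read_text())
    return [rows[node_id] for node_id in sorted(rows)]


def is_non_uniform(matrix):
    """True if any node reports a remote distance that differs from its local one."""
    for i, row in enumerate(matrix):
        local = row[i]
        for j, distance in enumerate(row):
            if j != i and distance != local:
                return True
    return False
```

Calling is_non_uniform(read_distance_matrix()) on a single-node desktop
should give False, since the only entry is the local distance. On a
multi-socket or multi-die system the remote entries are typically larger
than the local one, which is the "bit slower to get there from here" case.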