Wednesday, September 21, 2016

[ykvgtzld] Nonuniform shared memory

Having a distributed memory system simulate shared memory, i.e., cache-coherent non-uniform memory access (ccNUMA), is a very convenient abstraction for programmers.  However, what other features could be provided so that a dedicated programmer can break the abstraction to optimize performance?

The OS moves blocks of memory closer to the processor using them.  A program could declare in advance that it plans to frequently read (or write) a block of memory.  Or the program could explicitly request that a block of memory be moved closer.  Conversely, the program could prevent the OS from automatically moving a block of memory if it knows that moving it would be a bad idea.
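
The three kinds of hint can be sketched as a toy model.  Everything here is hypothetical: the class ToyPlacement and its methods (advise_frequent, move, pin, rebalance) are names invented for illustration, not an existing OS interface, and the "OS" is reduced to a single pass that follows hints.

```python
class ToyPlacement:
    """Toy model of an OS that migrates memory blocks between NUMA nodes.

    All names are hypothetical; this is not a real OS interface.
    """

    def __init__(self):
        self.node = {}       # block name -> node currently holding the block
        self.hint = {}       # block name -> node that declared frequent access
        self.pinned = set()  # blocks the OS may not migrate automatically

    def allocate(self, block, node):
        self.node[block] = node

    def advise_frequent(self, block, node):
        """Declare in advance that `node` will access `block` often."""
        self.hint[block] = node

    def move(self, block, node):
        """Explicit request: relocate `block` to `node` right now."""
        self.node[block] = node

    def pin(self, block):
        """Forbid automatic migration of `block` (explicit moves still work)."""
        self.pinned.add(block)

    def rebalance(self):
        """One automatic OS pass: follow hints, but skip pinned blocks."""
        for block, wanted in self.hint.items():
            if block not in self.pinned:
                self.node[block] = wanted


mem = ToyPlacement()
mem.allocate("a", node=0)
mem.allocate("b", node=0)
mem.advise_frequent("a", node=1)
mem.advise_frequent("b", node=1)
mem.pin("b")     # the program knows moving "b" would be a bad idea
mem.rebalance()  # "a" follows its hint to node 1; "b" stays on node 0
```

One design choice in this sketch is that pinning only blocks automatic migration, so an explicit move still overrides it.  The opposite choice would be equally defensible.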

Alternatively, provide some way of migrating a process to a processor close to the memory that the process will access.  Migration could be automatic, explicitly requested by the program, or explicitly disabled.
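
On Linux, something like the explicit variant can already be approximated by restricting a process to the CPUs of one NUMA node.  The following is a minimal sketch, assuming Linux, the sysfs file /sys/devices/system/node/nodeN/cpulist, and Python's os.sched_setaffinity; the helper names parse_cpulist, cpus_of_node, and migrate_near are made up for this example.

```python
import os


def parse_cpulist(text):
    """Parse a kernel CPU list such as "0-3,8-11" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A bare number like "5" has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def cpus_of_node(node):
    """Read the CPUs belonging to NUMA node `node` from sysfs (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def migrate_near(node):
    """Restrict the calling process to the CPUs of NUMA node `node`.

    Returns the CPU set actually applied. Linux only.
    """
    # Only keep CPUs this process is already allowed to run on.
    allowed = cpus_of_node(node) & os.sched_getaffinity(0)
    if not allowed:
        raise ValueError(f"no usable CPUs on node {node}")
    os.sched_setaffinity(0, allowed)
    return allowed
```

A program that knows its data lives on node 1 would call migrate_near(1) before its main loop.  This only covers the "explicitly yes" case; it says nothing about how an OS would decide to migrate automatically.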

A program could issue a bunch of memory requests, wait for the first few to respond, then cancel the remaining requests.  Responses to the cancelled requests, if they arrive anyway, should not displace elements in the cache.
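
The issue, wait, cancel pattern can be sketched in software with asyncio.  This is only an analogy: fetch stands in for a remote memory request with a simulated latency, first_k is a made-up helper name, and nothing here models the cache, so the "do not displace cache elements" part is not represented.

```python
import asyncio


async def fetch(block, latency):
    """Stand-in for a remote memory request; `latency` is simulated."""
    await asyncio.sleep(latency)
    return block


async def first_k(requests, k):
    """Issue all requests, keep the first k responses, cancel the rest.

    `requests` is a list of (block, latency) pairs and k is assumed >= 1.
    Returns (responses in arrival order, number of requests cancelled).
    """
    tasks = [asyncio.create_task(fetch(block, latency))
             for block, latency in requests]
    results = []
    for next_done in asyncio.as_completed(tasks):
        results.append(await next_done)
        if len(results) == k:
            break
    cancelled = 0
    for task in tasks:
        # cancel() returns False for tasks that have already finished.
        if task.cancel():
            cancelled += 1
    # Let the cancellations take effect before returning.
    await asyncio.gather(*tasks, return_exceptions=True)
    return results, cancelled
```

For example, asyncio.run(first_k([("a", 0.01), ("b", 0.02), ("c", 1.0)], 2)) returns as soon as "a" and "b" have responded, without waiting the full second for "c".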

