Having a distributed-memory system simulate shared memory, i.e., cache-coherent non-uniform memory access (ccNUMA), is a very convenient abstraction for programmers. But what additional features could let a dedicated programmer break that abstraction in order to optimize performance?
The OS moves blocks of memory closer to the processor that is using them. A program could declare in advance that it plans to access (or write) a block of memory frequently. Alternatively, the program could explicitly request that a block of memory be moved closer. Conversely, the program could prevent the OS from automatically moving a block of memory when it knows the move would be a bad idea.
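To make these three kinds of hints concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not how any real OS works: the `NumaPlacer` class and its `advise_frequent`, `request_move`, and `pin` methods are hypothetical, and the automatic policy (migrate a block after a fixed number of remote accesses from the same node) is just one simple heuristic chosen for illustration. Real systems expose related controls; Linux, for instance, has `mbind(2)` and `move_pages(2)`.

```python
class NumaPlacer:
    """Toy model of OS page placement plus programmer hints (hypothetical API)."""

    def __init__(self, default_threshold=3):
        self.home = {}          # block -> node whose memory currently holds it
        self.remote = {}        # (block, node) -> remote accesses seen so far
        self.threshold = {}     # block -> per-block migration threshold
        self.pinned = set()     # blocks the OS must not move automatically
        self.default_threshold = default_threshold

    def allocate(self, block, node):
        self.home[block] = node

    # Hints a program could give the OS.
    def advise_frequent(self, block):
        """Declare in advance that this block will be hot: migrate on first remote use."""
        self.threshold[block] = 1

    def request_move(self, block, node):
        """Explicitly move a block now, bypassing the automatic heuristic."""
        self.home[block] = node

    def pin(self, block):
        """Forbid automatic migration, e.g. for data shared by many nodes."""
        self.pinned.add(block)

    def access(self, block, node):
        """Record an access from `node`; migrate after enough remote accesses."""
        if self.home[block] == node:
            return "local"
        if block in self.pinned:
            return "remote"             # pinned: never moved automatically
        key = (block, node)
        self.remote[key] = self.remote.get(key, 0) + 1
        if self.remote[key] >= self.threshold.get(block, self.default_threshold):
            self.home[block] = node     # automatic migration toward the user
            del self.remote[key]
            return "migrated"
        return "remote"
```

In this model, advising that a block will be hot lowers its migration threshold to one, an explicit request moves the block immediately, and pinning keeps the block in place no matter how many remote accesses it sees.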
Alternatively, there could be some way of migrating a process to the processor closest to the memory it will access. This could happen automatically, or the program could explicitly request a migration or explicitly disable automatic migration.
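The migration decision can be sketched the same way. The `choose_node` function, its `mode` values, and the `min_share` cutoff are hypothetical; the idea is simply to run the process on the node holding most of its pages, with an explicit override in each direction. The cutoff in automatic mode keeps a process whose data is spread evenly from being moved for little gain.

```python
from collections import Counter


def choose_node(page_homes, current_node, mode="auto", min_share=0.5):
    """Pick the node a process should run on (toy model, hypothetical policy).

    page_homes: node id of each page the process is expected to access.
    mode: "auto" lets the OS decide, "migrate" forces a move to the node
          holding the most pages, "stay" disables automatic migration.
    min_share: in auto mode, only move if that node holds at least this
               fraction of the pages, so the process does not bounce around.
    """
    if mode not in ("auto", "migrate", "stay"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "stay" or not page_homes:
        return current_node
    best, count = Counter(page_homes).most_common(1)[0]
    if mode == "migrate":
        return best
    # Automatic mode: migrate only when the data is concentrated enough.
    return best if count / len(page_homes) >= min_share else current_node
```

On Linux, a process could act on the chosen node by restricting itself to that node's CPUs, for example with Python's `os.sched_setaffinity`; mapping a node to its CPU set is left out here.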
A program could issue a batch of memory requests, wait for the first few to respond, and then cancel the rest. If responses to the cancelled requests arrive anyway, they should not displace entries in the cache.
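The cancellation idea can be sketched as a toy simulation in Python. Everything here is assumed for illustration: the `Cache` class, the `first_k` helper, and the latency numbers are hypothetical, and arrival order is simulated by sorting on latency rather than coming from real hardware. The key design choice is that the cache tracks which requests are still outstanding and silently drops a reply whose request was cancelled, so a straggler cannot evict a useful line.

```python
from collections import OrderedDict


class Cache:
    """Tiny LRU cache keyed by address (a toy model, not real hardware)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # addr -> value, least recently used first
        self.outstanding = set()     # addresses with live (uncancelled) requests

    def issue(self, addr):
        self.outstanding.add(addr)

    def cancel(self, addr):
        # The reply may still come back later; just stop expecting it.
        self.outstanding.discard(addr)

    def fill(self, addr, value):
        """Install a reply only if its request is still outstanding."""
        if addr not in self.outstanding:
            return False             # late reply to a cancelled request: drop it
        self.outstanding.discard(addr)
        self.lines[addr] = value
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)   # evict least recently used line
        return True


def first_k(cache, latencies, k):
    """Issue one request per address, keep the first k replies, cancel the rest.

    latencies: hypothetical dict mapping addr -> response time (smaller is sooner).
    """
    for addr in latencies:
        cache.issue(addr)
    order = sorted(latencies, key=latencies.get)   # simulated arrival order
    winners, losers = order[:k], order[k:]
    for addr in winners:
        cache.fill(addr, f"data@{addr:#x}")
    for addr in losers:
        cache.cancel(addr)           # we have what we need; abandon the rest
    for addr in losers:
        cache.fill(addr, f"data@{addr:#x}")   # stragglers arrive but are dropped
    return winners
```

For example, with a three-line cache already holding one hot line, issuing four requests and keeping the first two leaves the hot line and the two winners in the cache; the two late replies are discarded instead of evicting them.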