Thursday, November 17, 2011

[qkhsskbg] Automatic deparallelization

Consider always writing code in a style of egregiously fine-grained parallelism: assume lots of cores with no communication latency and no overhead.
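A minimal sketch of the style in today's Haskell, using parMap from Control.Parallel.Strategies to spark a unit of work per list element, as if cores were free:

    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- Egregiously fine-grained: one spark per list element.
    squares :: [Int] -> [Int]
    squares = parMap rdeepseq (\x -> x * x)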

It is the compiler's job to deparallelize (unparallelize, serialize) the program to run on the actual number of cores available, taking into account communication latency and the overhead of parallelization.  While difficult, this still seems like an easier task than automatic parallelization of serial code.  The deparallelizer may be a just-in-time compiler, or even dynamic optimization by a run-time system.  The number of cores available might change depending on other processes running on the system.
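Today this coarsening is done by hand. A sketch of roughly what the deparallelizer would automate, assuming GHC's Control.Parallel.Strategies and Control.Concurrent: parListChunk coarsens the grain, and getNumCapabilities reports the core count at run time.

    import Control.Concurrent (getNumCapabilities)
    import Control.Parallel.Strategies (withStrategy, parListChunk, rdeepseq)

    -- Coarsen fine-grained sparks into roughly one chunk per core,
    -- querying the actual core count at run time.
    mapDeparallelized :: (Int -> Int) -> [Int] -> IO [Int]
    mapDeparallelized f xs = do
      n <- getNumCapabilities
      let chunk = max 1 (length xs `div` n)
      return (withStrategy (parListChunk chunk rdeepseq) (map f xs))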

Of particular difficulty is parallel code that does wasted work (compared to a good serial implementation) on the assumption that the gain from parallelization offsets the wasted work.  Another difficult case is a Monte Carlo simulation in which each run takes a variable amount of time not known in advance, and the answer depends (slightly) on the order in which the results come back.  I think we need a theoretical foundation.
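The wasted-work case can be as simple as a speculative search, sketched here: both halves are evaluated in parallel even though the serial version could stop at the first hit, so the losing half is wasted work that parallelism is supposed to pay for.

    import Control.Parallel (par, pseq)

    -- Speculative search: spark the left half while evaluating the
    -- right.  A serial 'any' could short-circuit at the first hit;
    -- here the losing half is wasted work.
    anyPar :: (a -> Bool) -> [a] -> Bool
    anyPar _ []  = False
    anyPar p [x] = p x
    anyPar p xs  = l `par` (r `pseq` (l || r))
      where
        (ls, rs) = splitAt (length xs `div` 2) xs
        l = anyPar p ls
        r = anyPar p rs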

As with any compiler, the programmer should be able to give hints about how to deparallelize effectively.

We will need to rewrite most existing software, which was written in the serial paradigm.  Consider it an opportunity rather than a burden.

Pure functional programming (e.g., Haskell) might be the answer, or at least a good first step.  All maps are pmaps.  But beware of serial constructs, often (but not always) monads.  foldl and foldr are bad because each step depends on the previous one, but an associative fold is good (parallel prefix?).
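A sketch of the associative case, assuming the combining function op is associative: a balanced tree reduction exposes logarithmic depth where foldl and foldr force a linear chain of dependencies.

    import Control.Parallel (par, pseq)

    -- Associative reduction as a balanced tree: the two halves are
    -- independent, so each can go to a different core.
    reduceTree :: (a -> a -> a) -> a -> [a] -> a
    reduceTree _  z []  = z
    reduceTree _  _ [x] = x
    reduceTree op z xs  = l `par` (r `pseq` (l `op` r))
      where
        (ls, rs) = splitAt (length xs `div` 2) xs
        l = reduceTree op z ls
        r = reduceTree op z rs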
