Wednesday, November 11, 2009

Task Queue, Graph Partitioning and Loop Parallelism

I was ill and therefore not very productive at the start of this week, so this blog post comes a bit late, but better late than never.

Task Queue
I was pretty happy with this pattern and felt it had enough examples for my taste. It did a decent job of describing shared queues, and it gave a high-level overview of work stealing, although it was perhaps a bit light on the latter. The only thing I missed from the example section was in the Java example: when implementing task queues in shared-memory programs, one usually uses a condition variable to put idle threads to sleep instead of having them spin waiting for the queue to return non-null values. But I guess this is mentioned in the Shared Queue pattern.
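As a concrete illustration of what I mean, here is a minimal sketch of a blocking task queue using an explicit lock and condition variable (in practice Java's java.util.concurrent.LinkedBlockingQueue already does exactly this, so you would rarely write it by hand):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch: workers block on a condition variable while the queue
// is empty instead of spinning on a poll that returns null.
class BlockingTaskQueue<T> {
    private final Deque<T> tasks = new ArrayDeque<T>();
    private final Lock lock = new ReentrantLock();
    private final Condition nonEmpty = lock.newCondition();

    public void put(T task) {
        lock.lock();
        try {
            tasks.addLast(task);
            nonEmpty.signal(); // wake one sleeping worker
        } finally {
            lock.unlock();
        }
    }

    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (tasks.isEmpty()) {
                nonEmpty.await(); // sleep until a task arrives
            }
            return tasks.removeFirst();
        } finally {
            lock.unlock();
        }
    }
}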

Graph Partitioning
I was pretty unhappy with this pattern, as it was hard to understand what the authors were talking about. Like Professor Johnson, I get the impression that the pattern is really about how to map tasks onto processing elements at startup time for load balancing and reduced communication. However, this was far from clear from the start of the paper, which quickly delved into the details. I suppose the technique could also be used to remap tasks at runtime if the graph changes.

I read a similar-sounding pattern from ParaPLoP '09 a while back, called "A Pattern Language for Topology Aware Mapping". That pattern language talked about how to map a graph of tasks, and the communication between them, onto a machine topology graph so as to use the machine's resources effectively: "Given an object communication graph and a processor topology graph, how can we map the first graph onto the second such that most messages travel a small number of hops (links) on the network". In that paper the intent was much clearer, and if the two patterns are talking about the same thing, I prefer that version.
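To make that quoted problem concrete, here is a toy greedy mapper of my own construction (not from either paper). It assumes the processors form a 1-D chain, so the hop count between processors p and q is just |p - q|, and that there are at least as many processors as tasks:

import java.util.Arrays;

// Toy sketch of topology-aware mapping: each task is greedily placed on
// the free processor that minimizes the weighted hop distance to its
// already-placed neighbours in the communication graph.
class GreedyTopologyMapper {
    // comm[i][j] = message volume between tasks i and j (symmetric).
    static int[] map(int[][] comm, int numProcs) {
        int n = comm.length;
        int[] taskToProc = new int[n];
        Arrays.fill(taskToProc, -1);
        boolean[] used = new boolean[numProcs];
        for (int t = 0; t < n; t++) {
            int best = -1;
            long bestCost = Long.MAX_VALUE;
            for (int p = 0; p < numProcs; p++) {
                if (used[p]) continue;
                long cost = 0;
                for (int u = 0; u < n; u++) {
                    if (taskToProc[u] >= 0) {
                        // weight * hops on a 1-D chain of processors
                        cost += (long) comm[t][u] * Math.abs(p - taskToProc[u]);
                    }
                }
                if (cost < bestCost) {
                    bestCost = cost;
                    best = p;
                }
            }
            used[best] = true;
            taskToProc[t] = best;
        }
        return taskToProc;
    }
}

A real mapper would of course use the actual machine topology graph and a better heuristic than this greedy placement, but the cost function is the essence of the quoted intent.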

Loop Parallelism
Loop parallelism is, besides fork-join, a pretty common pattern for parallelization on shared-memory machines. I thought this paper did a fairly good job of describing it, and it went through a lot of different techniques one can use to prepare loops for parallelization. However, unless one is already familiar with these techniques, one would probably not grasp them from reading this pattern alone, as the descriptions were very brief (they did contain examples, though!).
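For readers who have not seen the pattern before, here is a minimal sketch of the basic idea: the iterations of a loop with no loop-carried dependencies are split into contiguous chunks, one per worker thread (I use a plain ExecutorService here; OpenMP's parallel for is the classic shorthand for the same thing):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal sketch of loop parallelism: chunk an independent loop
// across a fixed pool of worker threads.
class ParallelLoop {
    static void saxpy(final float a, final float[] x, final float[] y,
                      int numThreads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        int n = x.length;
        int chunk = (n + numThreads - 1) / numThreads;
        for (int t = 0; t < numThreads; t++) {
            final int lo = t * chunk;
            final int hi = Math.min(n, lo + chunk);
            pool.execute(new Runnable() {
                public void run() {
                    for (int i = lo; i < hi; i++) {
                        y[i] = a * x[i] + y[i]; // no loop-carried dependency
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }
}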

There were a few minor issues that I thought were strange. For example, why don't the i and j loops carry dependencies in the example at the top of page 3? It looks to me like they do (data created in iteration i of the i loop requires data from iterations i-1 and i+1), unless they employ double buffering, in which case that should be mentioned.
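For comparison, here is a sketch of the double-buffered version I would expect: reads come from the old grid and writes go to a fresh grid, so neither loop carries a dependency and both can be parallelized (the stencil itself is my guess at what the example computes, not a quote from the paper):

// Double buffering: read from 'in', write to 'out', so iterations
// of both loops are independent and safe to parallelize.
class DoubleBufferedStencil {
    static float[][] step(float[][] in) {
        int n = in.length, m = in[0].length;
        float[][] out = new float[n][m];
        for (int i = 1; i < n - 1; i++) {
            for (int j = 1; j < m - 1; j++) {
                out[i][j] = 0.25f * (in[i - 1][j] + in[i + 1][j]
                                   + in[i][j - 1] + in[i][j + 1]);
            }
        }
        return out; // caller swaps buffers before the next time step
    }
}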

Another problem I had was that they refer to the Map-Reduce pattern for a description of reduce. Surely a pattern on loop parallelism is the right place to include a description of both reduces and scans? Reduces are certainly used by map-reduce, but that is a very high-level pattern, and I think a more detailed description fits here.
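For what it's worth, the description I would have liked to see is roughly this: each thread reduces its own chunk privately, and the partial results are combined at the end, which relies on the operator being associative. A scan (prefix sum) follows the same chunking idea but needs a second pass to propagate each chunk's offset to the chunks after it. A minimal sketch of the reduce half:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal sketch of a parallel reduce over a long[] with +.
class ParallelReduce {
    static long sum(final long[] a, int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        final int chunk = (a.length + numThreads - 1) / numThreads;
        List<Future<Long>> partials = new ArrayList<Future<Long>>();
        for (int t = 0; t < numThreads; t++) {
            final int lo = t * chunk;
            final int hi = Math.min(a.length, lo + chunk);
            partials.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long s = 0;
                    for (int i = lo; i < hi; i++) s += a[i]; // private partial
                    return s;
                }
            }));
        }
        long total = 0;
        for (Future<Long> f : partials) total += f.get(); // combine partials
        pool.shutdown();
        return total;
    }
}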
