Thursday, November 5, 2009

Pipeline, Geometric Decomposition and Data Parallelism

Pipeline
Pipelining is an important pattern and is one of the best examples of clear-cut task parallelism (interestingly, one book actually placed pipelining in its own category, disjoint from parallelism: either things operate in parallel or they operate in a pipeline). Even though pipelining tends not to expose a lot of parallelism when used in software (usually just a few parallel tasks), it is sometimes the only parallelism we can extract. And in other cases it can be composed with data or other task parallelism, which has the effect of multiplying the parallelism we would otherwise have in the system.

The pattern on pipelining starts out with a lot of (interesting) facts about the use of pipelining in hardware that an application developer trying to implement software pipelining won't care about. At this level in the pattern language we are not really concerned with vector processors, and certainly not with instruction pipelining, so these discussions distract from the goal. A better approach would be to remove these facts, or at least move them to the known uses section, and focus on how to implement multithreaded/multiprocess software pipelines in our applications. Besides, having the first example in assembly is pretty strange.

The universal forces section mentions throughput vs. latency. I have heard this consideration before when comparing different CPUs, and CPUs to GPUs. In hardware it makes sense, as the trade-off is between adding transistors to perform more work on one instruction per cycle or adding transistors to perform work on multiple instructions per cycle (for example through pipelining). But I wonder how relevant it is for a software pipeline. If you disregard communication cost between pipeline stages, a given software task will take as long as it takes, no matter whether you have a deep or a shallow pipeline. Latency should be the same whether you have a two-stage pipeline performing n+n work or a single task performing 2n work. Of course this is altered by communication between pipeline stages, but is that what the author has in mind? Or am I missing something?
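
To make my reasoning concrete with a toy calculation: if each of two stages takes time n per item and communication is free, the latency per item through the pipeline is n + n = 2n, exactly the same as one task doing 2n of work. What changes is throughput, which improves from one item per 2n to one item per n once the pipeline is full. So, ignoring communication, pipeline depth buys throughput, not latency, and communication cost between stages only makes the deeper pipeline's latency worse.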

Finally, I did not like the example. Why use MATLAB and not a language that is better known to the target audience, like C or Java? The author says it is to avoid implementing the FFT, but the FFT could just have been hidden behind a function call in the example. And the example code only shows how to crop out a circle to do blurring using the FFT and MATLAB, not how to actually implement a pipeline. I would be much more satisfied with an example implemented using threads or message passing to show how to set up a pipeline (the parallel programming patterns book from which this pattern was adapted had the Java code for this; why was it removed?). The first example could even be as simple as having each stage add a constant to a stream of numbers, with the FFT as a more advanced second example.
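
To show the kind of first example I have in mind, here is a minimal sketch of such a pipeline in Java, using blocking queues between the stages (the class and all names are my own invention, not taken from the pattern):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A minimal two-stage pipeline: each stage adds a constant to the
// numbers flowing through it. -1 acts as an end-of-stream marker.
public class AddPipeline {
    static final int END = -1;

    static class AddStage implements Runnable {
        final BlockingQueue<Integer> in, out;
        final int constant;
        AddStage(BlockingQueue<Integer> in, BlockingQueue<Integer> out, int constant) {
            this.in = in; this.out = out; this.constant = constant;
        }
        public void run() {
            try {
                for (int x = in.take(); x != END; x = in.take())
                    out.put(x + constant);      // pass result to the next stage
                out.put(END);                   // propagate end-of-stream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> q1 = new ArrayBlockingQueue<Integer>(16);
        BlockingQueue<Integer> q2 = new ArrayBlockingQueue<Integer>(16);
        BlockingQueue<Integer> q3 = new ArrayBlockingQueue<Integer>(16);

        new Thread(new AddStage(q1, q2, 10)).start();   // stage 1: add 10
        new Thread(new AddStage(q2, q3, 100)).start();  // stage 2: add 100

        for (int i = 0; i < 5; i++)
            q1.put(i);                                  // producer feeds the pipeline
        q1.put(END);

        for (int x = q3.take(); x != END; x = q3.take())
            System.out.println(x);                      // consumer: prints 110..114
    }
}

The point is not the arithmetic but the structure: stages communicate only through the queues, so adding a third stage is a one-line change, and each stage could be swapped for something heavyweight like an FFT without touching the plumbing.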


Geometric Decomposition
I think this pattern is pretty important and I also think that it is specific enough, as opposed to "Data Parallelism", which is rather vague. The description discusses the most important considerations when implementing geometric decomposition and contains a lot of good information, but it definitely lacks an example with real code. Again, the Berkeley format suffers from having an explicit Example section, which discourages the authors from interleaving the example with the solution description. And even the example in the Example section only shows how to do heat diffusion, with no code and little text showing how to do it in parallel. After reading this pattern I know about many things to consider when doing geometric decomposition, but I know nothing about how to actually begin implementing it.
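
To illustrate the kind of starting point I am missing, here is a minimal sketch in Java (my own code, not from the pattern) of a 1D heat diffusion grid decomposed into contiguous chunks, one thread per chunk, with barriers between time steps:

import java.util.concurrent.CyclicBarrier;

// A minimal geometric decomposition of a 1D heat diffusion stencil:
// the grid is split into contiguous chunks, one thread per chunk,
// with barriers separating the time steps.
public class HeatDiffusion {
    static final int N = 16, STEPS = 100, THREADS = 4;
    static double[] cur = new double[N + 2];   // two extra boundary cells
    static double[] next = new double[N + 2];

    public static void main(String[] args) throws Exception {
        cur[0] = next[0] = 100.0;              // fixed hot boundary on the left
        final CyclicBarrier barrier = new CyclicBarrier(THREADS);

        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            final int lo = 1 + t * (N / THREADS);   // first cell of this chunk
            final int hi = lo + (N / THREADS);      // one past the last cell
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    try {
                        for (int step = 0; step < STEPS; step++) {
                            // read cur (including neighbor chunks' edge cells), write next
                            for (int i = lo; i < hi; i++)
                                next[i] = cur[i] + 0.1 * (cur[i - 1] - 2 * cur[i] + cur[i + 1]);
                            barrier.await();    // all chunks finished writing
                            if (lo == 1) {      // one thread swaps the buffers
                                double[] tmp = cur; cur = next; next = tmp;
                            }
                            barrier.await();    // swap is now visible to everyone
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(java.util.Arrays.toString(cur));
    }
}

Even a toy like this makes the two central decisions concrete: how to split the grid into chunks and how to synchronize the updates along the chunk boundaries, which is exactly what the pattern only talks around.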

It also boggles my mind that one can describe something so closely related to the Cartesian coordinate system and real-world phenomena without a single figure showing a grid of cells/points distributed amongst processors. But then again, I am very much a graphical thinker. Indeed, their only figure when describing how to solve a problem in parallel is of a heat equation...

What I don't understand is that the parallel programming patterns book contains these figures and the code, in addition to the issues described here. Why can't they just use those, since patterns like Discrete Event were taken almost directly from it? Is there some restriction on how much they can take, even though the author of this pattern is also an author of the book, or do they explicitly want to shorten it?

Data Parallelism
There is not really much to say about the pattern description, as it is only three sentences long. Like the Task Parallelism pattern, this pattern is too vague. In the Geometric Decomposition pattern they say that if the geometric chunks have no dependencies, then it is an instance of task parallelism because it is embarrassingly parallel. Why isn't it data parallelism? We discussed this in class, and the distinction is very vague, which makes me think task and data parallelism are not patterns but two potentially intersecting categories. If they must be included in OPL, then they should be two sub-categories of algorithm strategy. If they were, I would put the following patterns in the data parallel group: Geometric Decomposition, Recursive Splitting (yes, really: you perform only one task repeatedly on the data structure, although the granularity of the data differs dynamically, and "the splitting is not a part of the core computation") and at least parts of non-work-efficient parallelism such as Scans. Discrete Event, Pipelining and Speculation fall into the task parallel category.

Finally, I agree with Professor Johnson that we should replace them with the Embarrassingly Parallel pattern. But even embarrassingly parallel can be both task and data parallelism. To me, embarrassing task parallelism is when you have different tasks that do not need to communicate (MIMD). Embarrassing data parallelism is when you have one task that operates on multiple pieces of data (SIMD). Of course this gets a bit vague when considering that the data parallel operations could have branches, but as long as those branches branch based on the data, I still think it is data parallelism. Whether, based on this, there should be one or two embarrassingly parallel patterns is not really that important, and I guess it depends on whether you are a lumper or a splitter, as Professor Johnson so nicely put it.
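
To sketch the distinction I have in mind (my own toy illustration, not from OPL): the first two tasks below are embarrassingly task parallel, while the strided loop over the array is embarrassingly data parallel.

import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Two flavors of embarrassing parallelism on one thread pool.
public class EmbarrassinglyParallel {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Embarrassing task parallelism (MIMD-like): different,
        // independent tasks with no communication between them.
        pool.execute(new Runnable() { public void run() { System.out.println("indexing the logs"); } });
        pool.execute(new Runnable() { public void run() { System.out.println("compressing the backups"); } });

        // Embarrassing data parallelism (SIMD-like): the same operation
        // applied independently to every element of a data set.
        final double[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        for (int t = 0; t < 4; t++) {
            final int id = t;
            pool.execute(new Runnable() {
                public void run() {
                    for (int i = id; i < data.length; i += 4)
                        data[i] = data[i] * data[i];    // same task, different data
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(Arrays.toString(data));      // squares of the inputs
    }
}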
