Tuesday, October 20, 2009

Map Reduce and Event-Based Parallel Patterns

Map_Reduce
This pattern described he very topical map-reduce algorithm. Map-reduce has been around for a long time and I believe it is supported natively in some functional programming languages (OCaml?) under the guise of map-fold. The pattern does a decent job of describing the component that I need (the map and reduce functions) as well as some aspects of implementing a map-reduce framework, but I think it could do with a coding example using a simple toy map-reduce library in which a simple operation such as for example scaling down images and then merge them could be implemented.

Also, the implementation section seemed to assume a master-worker paradigm underneath ("the software architecture is logically based on master-worker"), but although this is certainly one reasonable way to implement it there is no reason why one shouldn't do statical load balancing or work stealing instead in certain situations (I don't consider work stealing a sub-category of master worker as OPL does). They do later suggest a round robin (cyclic) schedule for some tasks, but I still think they put too much focus on master-worker which is described elsewhere and really an implementation detail of this pattern.

I also wonder whether filter/reduce should be discussed as a variant of this pattern. It is certainly used out there, but maybe it is a different pattern.

Finally, I their precondition that the reduce function must be associative. Perhaps this should be changed to "should for optimal performance be" as one can certainly get great speedups from the map phase even if the reduce step has to be performed sequentially.

Event-based, Implicit Invocation
This pattern discusses the architecture of event-based systems. One very common example of this is GUI framework where events that are mainly triggered by the user are passed on to event handlers that subscribe to that event type. This pattern was contained more details than the other OPL patterns we have read and for that reason I thought it was one of the better ones. However, at one point they state that "load balancing is significantly simpler with a centralized medium". Does not a centralized medium make the load balancing problem go away entirely for this pattern?

Later on in the example section they discussed GUI event-based systems. These systems have a centralized manager and this is identified as the biggest impediment to scalability. However, I would claim that the biggest impediment is not the centralized manager, but the fact that the event handling is done as blocking functions. If any of these functions take some time to complete they will block everything. This can also be hard to figure out as if one for example use file IO then everything may work well, but then the GUI suddenly starts to lock when one start accessing files through a networked file system. I thought such issues, seeing as they are so common with this very important case of event-based systems, deserve to be mentioned more clearly.

No comments:

Post a Comment