Topics of Software Engineering Course Blog: Making reliable distributed systems in the presence of software errors

Chapter 2 of Joe Armstrong's doctoral thesis, "Making reliable distributed systems in the presence of software errors", discusses the architectural model of a system that can achieve this goal and presents Erlang as one such system. Processes are the central concept in both the architecture and the Erlang language. The chapter argues that the only way to achieve true reliability in the presence of bugs is to encapsulate each module in its own process, so that errors are contained and the process can be killed as soon as an error is detected. Modules should therefore not attempt to correct errors; they should terminate and leave recovery to someone else (usually the runtime environment). This is a strong property that simplifies the recovery model, because recovery runs in code that has not itself been affected by the error. It also frees programmers from programming defensively: they can rely on any other process either succeeding at what it was supposed to do or crashing. As a result, one does not need extensive error-checking and error-handling code around each call to another module.
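The "crash and let someone else recover" idea above can be sketched in a few lines. This is only an illustration in Python, not Erlang and not code from the thesis: Python threads do not give the isolation that Erlang processes do, and the names `worker`, `supervise` and `max_restarts` are made up for the example. The wrapper that records the exception stands in for the exit signal a supervisor would receive.

```python
import queue
import threading


def worker(inbox, outbox):
    """Handle requests with no defensive checks: any error ends the worker."""
    while True:
        item = inbox.get()
        if item is None:           # hypothetical shutdown marker
            return
        outbox.put(100 // item)    # raises ZeroDivisionError on 0; not handled here


def supervise(requests, max_restarts=3):
    """Run the worker in its own thread and restart it cleanly when it crashes."""
    inbox, outbox = queue.Queue(), queue.Queue()
    for request in requests:
        inbox.put(request)
    inbox.put(None)

    restarts = 0
    while True:
        crashed = []

        def run():
            # Stand-in for an exit signal: only record that the worker died.
            try:
                worker(inbox, outbox)
            except Exception as exc:
                crashed.append(exc)

        thread = threading.Thread(target=run)
        thread.start()
        thread.join()
        if not crashed:
            break                  # worker finished normally
        restarts += 1              # the request that caused the crash is dropped
        if restarts > max_restarts:
            raise RuntimeError("worker keeps crashing, giving up")

    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results, restarts


if __name__ == "__main__":
    print(supervise([5, 0, 20]))   # ([20, 5], 1)
```

Note that `worker` contains no error handling at all. The recovery decision (restart, and give up after too many restarts) lives entirely in `supervise`, which never ran the failing code itself.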

The chapter introduces the concept of a Concurrency Oriented Programming Language (COPL), in which programs model the world much as they do in Object Oriented programming. Unlike Object Oriented programming, however, a COPL recognizes that the world is concurrent, and programmers are encouraged to map the concurrency of the world one-to-one onto the concurrency of their programs. To make this practical, concurrency is a first-class citizen of the language, and the runtime environment uses lightweight processes (green threads) to make concurrency cheap. This gives the programmer the freedom to model all the concurrency of the problem domain.
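As a rough illustration of "one concurrent activity per real-world entity", here is a sketch that uses Python's asyncio tasks as a stand-in for cheap, runtime-scheduled processes. The `caller` entity and the `simulate` helper are hypothetical, and asyncio tasks are cooperative coroutines rather than Erlang processes, so this only shows that the one-to-one mapping stays affordable when the unit of concurrency is lightweight.

```python
import asyncio


async def caller(i):
    """One lightweight task per real-world caller (a hypothetical domain entity)."""
    await asyncio.sleep(0)         # yield so the other tasks get to run
    return f"caller-{i} done"


async def run_all(n):
    # Start n concurrent tasks, one per entity, and wait for all of them.
    return await asyncio.gather(*(caller(i) for i in range(n)))


def simulate(n):
    """Model n concurrent callers and return their results in order."""
    return asyncio.run(run_all(n))


if __name__ == "__main__":
    print(len(simulate(10_000)))   # 10000
```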

Processes share no state and communicate only through asynchronous message passing: if another process needs some data, that data must be copied to it by sending a message. This enforces the use of the Separable parallel pattern, which tends to lead to more scalable programs because access to the data need not be serialized through mutual exclusion. Together with the fact that messages are asynchronous, this also greatly reduces the potential for deadlocks, although it does not remove it entirely: two processes can still each wait forever for a message from the other. Finally, programs that copy data (as one has to do with message-passing APIs such as MPI) have greater data locality and avoid false-sharing problems, both of which are important for performance.
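The copy-on-send behaviour described above can be imitated in Python. This is a sketch under assumptions, not Erlang's actual mechanism: a `queue.Queue` plays the role of a mailbox, `copy.deepcopy` plays the role of copying the message, and `send`, `receiver` and `demo` are names invented for the example.

```python
import copy
import queue
import threading


def send(mailbox, message):
    """Asynchronous send: copy the message, enqueue it, return without waiting."""
    mailbox.put(copy.deepcopy(message))


def receiver(mailbox, replies):
    """Receive one message, change the private copy, and send it back."""
    message = mailbox.get()
    message["items"].append("changed by receiver")
    send(replies, message)


def demo():
    mailbox, replies = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=receiver, args=(mailbox, replies))
    thread.start()

    original = {"items": ["a", "b"]}
    send(mailbox, original)        # does not block on the receiver
    reply = replies.get()          # wait for the answer only when it is needed
    thread.join()
    return original, reply


if __name__ == "__main__":
    original, reply = demo()
    print(original)                # {'items': ['a', 'b']}
    print(reply)                   # {'items': ['a', 'b', 'changed by receiver']}
```

Because the receiver only ever sees its own copy, no lock is needed around `original`, which is the point the paragraph makes about avoiding mutual exclusion.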

In Erlang, concurrency is not intended primarily for performance. The main goals of programming concurrently are reliability (through the process abstraction), scalability, security, and even modularity.

Tuesday, October 20, 2009

Making reliable distributed systems in the presence of software errors - Chapter 2
