A workshop on distributed computing was held 12-13 October 1978 at the Harvard Faculty Club in Cambridge, Massachusetts. The goal of the workshop was to encourage discussion, investigation, and exploration of the fundamental problems of distributed computing. The workshop was sponsored by the Laboratory for Computer Science of M.I.T., and organized by Barbara Liskov and Liba Svobodova, of M.I.T. Approximately twenty-five computer scientists with active interest in and experience with the area of distributed computing attended the workshop. The Appendix lists the attendees.
The workshop was, by design, highly informal. To start the workshop, Dave Clark presented the basic concepts being developed by the Distributed Systems Group (DSG) of the Laboratory for Computer Science. This produced a number of comments, questions, and arguments which were then used to provide a loose framework for the remainder of the workshop. The remaining sessions consisted of informal extemporaneous presentations and discussions as various issues were raised and considered by the group. Figure 1 lists the session structure.
Figure 1: Workshop on Distributed Computing - Session Content
In this note, we report on the issues that were discussed at the workshop. The field of distributed computing is still very much in its infancy, and as such, much of the discussion at the workshop concerned the problems which have been encountered or are envisioned; definitive solutions to these problems are not yet available, but rather several approaches to solutions are possible.
Clark's remarks were intended to consider why distributed systems exist (or will exist), and from these considerations, to deduce some of the constraints on the resulting structures and languages for distributed computing. Clark established first of all that the interest of DSG is in information intensive applications with naturally distributed information, as opposed to distributed computing for improved performance (such as off-loading).
One very strong constraint on these systems is the need for autonomy. The distributed components of a distributed system should be autonomous. This includes the ability to remove a machine from the network (although this ability is needed for reliability anyway), but more generally, it means that the internal algorithms and organization of information can be freely selected at each node, independently of other nodes in the network. Decisions about what information is to be kept, how it is to be organized, how it is to be processed, and for what purposes it may be used are all to be locally decided.
The requirement of autonomy must be balanced against the need for coherence. A distributed system should not be distributed anarchy. Thus, certain conventions and protocols must be maintained, but only where two nodes interact, and where a choice exists between autonomy and coherence, Clark would give the choice to autonomy.
The system under development by DSG at M.I.T. does not allow information to be moved between nodes dynamically by the system, but maintains that the location and movement of information is to be controlled by the application developers. Protection is needed only between separate nodes, but not necessarily within a node. This implies that local data is inherently different from remote data, and this difference will show through to the application builder.
Clark went on to describe briefly the specific system under development at M.I.T.. This system is described in the 1978 progress report of DSG [Clark, et. al., 1978], available from the authors.
Several points were made during Clark's presentation which elicited comment and debate. One such topic was: Why a distributed system? The comments on this question included:
This last point (protection) was hotly debated and produced the statement that security results from a logically distributed system, and not from physical distribution. It was argued however that physically distributed systems, where your part resides at your site, provide more apparent protection of information by providing physical control over that part of the system; information can flow in and out only over easily identified wires. This produces "warm feelings" in the user of the system. This explicit control over local processing and information is presumably the major reason for physically distributed systems.
However, it was generally recognized that physical distribution is not as important as logical distribution. This logical distribution could be implemented on one processor, on several processors at one site, or at several processors at different sites. The physical separation of processors affects mainly assumptions about bandwidth and response time. It was suggested that these parameters might greatly influence how a distributed system is organized.
Redell and Popek produced the following list of system parameters and assumptions. The discussion of these topics indicated that there is a very wide spectrum in distributed systems, and this may result in vastly different logical structures.
A related issue is the question of the cost of processes and messages. Most approaches to distributed systems are based on the concept of multiple processes communicating by exchanging messages. The feasibility of this approach is largely dependent upon having cheap processes and cheap messages.
Gray argued that, currently, neither of these are cheap. Processes, at least on IBM systems, are very big, and expensive. A process has memory, registers, protection information, virtual memory information, accounting, timing, files, and so on. Thus a process switch can be very expensive. The growth of operating system services and features tends to make processes even more expensive by associating more and more state information with each process. Cheap processes would seem to require minimal operating system support (requiring minimal state information).
Messages are a similar problem. Message passing appears to be very expensive. Although it was difficult to establish uniform definitions of what was being measured, a time of about 20 milliseconds was quoted as the round trip time to send a (null) message and receive an answer on the Xerox Alto systems, with similar numbers put forth for IBM systems and Multics. This is both a surprisingly high and surprisingly uniform time. Sturgis indicated that, on the Xerox system, several recoding efforts had failed to significantly change this number. Still no one was able to indicate exactly where the time went, or why 20 ms should be a universal lower bound on message passing time.
A related topic, brought up by Sturgis, was a recent paper by Lauer and Needham presented at IRIA [Lauer and Needham, 1978] The Lauer and Needham paper argued that, for real (single processor) systems, message passing and procedure calling are essentially equivalent, and hence either construct can be used for the basic implementation of an operating system. This provoked a heated argument about the extension of this duality to distributed systems. The suggestion was that communication should be by a remote procedure call rather than sending a message and waiting for a return. This reduces the semantics at the programmers' level to only the semantics of the procedure call, simplifying the programmers' view by eliminating the need for messages.
One complaint was that treating message passing as procedure calling forces every request (procedure call) to have a reply (procedure return). It was argued that there exist systems which are structured as a pipeline, where all information flows in one direction only and this request/reply model is inappropriate.
Also, it was argued that remote procedure calls or messages are semantically different from local procedure calls. For local procedure calls, the responses from a call are of two types:
In either case, positive knowledge of the result of the procedure call exists. Even in cases of an infinite loop in the procedure, a timer can be set and a result of class (b) occurs.
For remote procedure calls or messages, these two types of responses to a request still occur, but another result is possible:
In this case, it is unknown if the request message was lost (procedure call not made), the server failed but was unable to respond so, or the server completed but the reply message was lost (procedure return not made). It was claimed that this third result class is a semantic difference between local and remote procedure calls.
The differences cited above for local and remote procedure calls hold for any remote reference across the net. Thus it was argued that remote references (messages) are fundamentally different from local references. This raises the problem of transparency: should accesses to remote objects appear differently to the programmer from accesses to local objects? While there appeared to be strong feelings on this subject, no agreement was reached. It was agreed only that this problem is tied to two other problems: naming and binding.
Within the system, objects (data and processes) need names. However, there may be several different levels of object names including:
local addresses system wide addresses routing information complete path names unique (system wide) identifiers symbolic names
It is important to realize that these names are potentially quite different. Failure to appreciate these differences can lead to considerable confusion. Any conversation about naming must maintain a consistent meaning for the concept of a name.
Objects will generally be referred to by unique system wide identifiers (uid). This uid will be bound to some physical address to allow access to the physical object. An important aspect of this binding is the frequency of binding, and the set of actions which can occur between a binding and the subsequent unbinding. If the object cannot be moved after it is bound to a uid, then the uid can contain location and routing information. On the other hand, if the object can be moved, then the uid must be location independent, and local and remote accesses must be transparent (since the local or remote nature of the access cannot be determined).
Binding time can be delayed by the introduction of indirection, with a resulting loss of efficiency. Also, transparency can be provided by an additional level of user software, if it is not supported by the system.
Much of the group seemed to support the view that data objects exist within (data abstraction) modules providing a limited set of operations on the objects. Object names would consist of an (module name, local name) pair. Objects could only be manipulated by presenting the module with the object name; the module would use the local name to identify the particular object. Of course, this view rests on the crucial assumpution that a data abstraction is implemented entirely in one node -- an assumption that some of the group was unwilling to make.
Naming and transparency also affect reliability and robustness. If objects cannot move and locations are not transparent, then these objects will become inaccessible if the processor or links needed for these objects fail; the system cannot reconfigure itself to recover from a failure.
The topics of reliability and recovery were considered by Svobodova and Shrivastava.
Svobodova discussed the multiple update problem for copies of an object from two points of view, depending upon the reason for providing the copies. Copies can be maintained for reliability: a master plus k back-ups. In this case, all updates can be funneled through the master into the sequential back-up-copy managers. In the other case, multiple copies are provided to increase efficiency of access: accessing the closest copy will reduce communication delays while the existence of multiple copies will allow concurrent access increasing bandwidth. In this case, it is claimed, most programs can accept a copy which is slightly out-of-date in exchange for the faster access; a program which requires the most current copy can always access the master file. Changes to the copies need be made only within a reasonable time after the master is changed.
Shrivastava talked about the recovery of concurrent processes based on three techniques:
Transactions, a reliability concept similar to spheres of control, was advanced by Gray. A transaction is semantically atomic, it either happens (commits) or does not happen (aborts). The important point is that messages and data cannot be lost or partially changed from the point of view of the application programmer. (The case (c) in the discussion of procedures/messages above is not allowed). From the programmer's point of view a transaction might look like:
begin transaction Send ... Receive ... Send ... Send ... ... end transaction (commit or abort)
In a transaction either everything happens or nothing happens. Thus the transaction becomes a unit of recovery.
IBM's CICS/ISC system implements distributed transaction processing. CICS/ISC is a distributed data management system which allows transactions to act on "non-local" objects (databases, queues, or terminals) by translating remote procedure calls into messages to remote servers. Key aspects of CICS/ISC are:
Although it is attractive to assume that the job with sole responsibility for retaining certain data or performing some processing is wholly contained in one physical node of the distributed system, that assumption cannot always be made. Rovner and Jones discussed different aspects of the distributed job issue. Their remarks were based on their respective systems which are in an advanced state of development.
Rovner focused on the issue of restarting or continuing a job after recovering from an error. A job is defined as a collection of processes communicating via messages. In the University of Rochester system each distributed job has one controlling process which is responsible for handling errors encountered by all processes in the job. A registry of notification is used to specify which process should be notified in case of process or physical node failure. In addition, each process name includes an invocation number. If a node should fail and be brought up, then the servers implemented by the node are restarted. Their invocation numbers will differ from their previous invocation numbers and so the requesting processes can be notified of the server failure. They may then respond appropriately.
Jones defined task forces as a collection of executing processes that have a single high level objective to fulfill. She gave several examples of a language used to define task forces, so that certain parameters of a task force configuration could be varied to adjust to hardware variation due to failure and repair and to meet objectives of enhanced performance or reliability. Dimensions of task force variation include replication of data and processes, partitioning of data into separately addressable data structures, placement of information in specific memory modules, and process-to-processor assignment. The task force language was developed in conjunction with the StarOS operating system for the Cm* computer system.
The workshop raised many issues: basic assumptions, costs of processes and messages, models of communication, naming, binding, transparency, reliability and recoverability. All of these fundamental issues must be decided in order to construct a distributed system. It appears that these problems can be solved in a variety of ways and the effects of these decisions are not yet known. Even when the basic system is designed, however, its use poses even more problems.
Discussion at the workshop was energetic. It was apparent that the group did not share a single paradigm of what a distributed system is. The underlying assumptions and objectives of the group members varied, leading them to state problems in different terms and to defend different solution approaches quite staunchly.
Thus, we conclude that further research is needed on distributed computing, both in the design of distributed systems and in the use of distributed systems.
James Gray, Anita Jones, Barbara Liskov and Liba Svobodova have lent their assistance to this report.
Robert Chansler Carnegie-Mellon University Pittsburgh, PA David Clark M.I.T. Laboratory for Computer Science Cambridge, MA F. J. Corbato M.I.T. Laboratory for Computer Science Cambridge, MA Jack Dennis M.I.T. Laboratory for Computer Science Cambridge, MA James Gray IBM Research San Jose, CA Irene Greif M.I.T. Laboratory for Computer Science Cambridge, MA Carl Hewitt M.I.T. Laboratory for Computer Science Cambridge, MA Anita Jones Carnegie-Mellon University Pittsburgh, PA Barbara Liskov M.I.T. Laboratory for Computer Science Cambridge, MA James Low University of Rochester Rochester, NY Allen Luniewski M.I.T. Laboratory for Computer Science Cambridge, MA James Peterson University of Texas Austin, TX Gerald Popek University of California Los Angeles, CA David Redell Xerox System Development Division Palo Alto, CA David Reed M.I.T. Laboratory for Computer Science Cambridge, MA Paul Rovner University of Rochester Rochester, NY Jerome Saltzer M.I.T. Laboratory for Computer Science Cambridge, MA Craig Schaffert M.I.T. Laboratory for Computer Science Cambridge, MA Santosh Shrivastava University of Newcastle Upon Tyne England Howard Sturgis Xerox Research Palo Alto, CA Liba Svobodova M.I.T. Laboratory for Computer Science Cambridge, MA Stephen Ward M.I.T. Laboratory for Computer Science Cambridge, MA