The GENESIS parallelism management system employing concurrent process-creation services

https://doi.org/10.1016/S0141-9331(00)00093-4

Abstract

Clusters of PCs or workstations are increasingly being chosen over traditional supercomputers as platforms for executing parallel applications, owing to their excellent price-to-performance ratio and the widespread availability of relatively inexpensive PCs/workstations and high-speed networks. Unfortunately, current operating systems and run-time environments do not provide satisfactory support for parallel processing on clusters, forcing programmers to use third-party software that operates on top of existing network operating systems, which increases overheads and reduces flexibility. In particular, the creation of parallel processes suffers from poor performance because these processes are created sequentially. This paper presents an original concurrent local and remote process-creation mechanism, part of the GENESIS operating system, which manages parallelism on clusters and, in particular, provides execution transparency and reduces the time required to instantiate the child processes of single program and multiple data (SPMD) parallel applications.

Introduction

The execution of parallel applications on clusters has gained popularity over recent years, owing primarily to the widespread availability of such systems throughout many scientific, educational and business organizations; the relatively low component (commodity PCs or workstations and networks) costs; and the vast computational potential located within clusters. This means that many users have the potential to benefit from parallel processing.

Two models treat the program as the primary unit of execution: single program and multiple data (SPMD), i.e. data parallelism, and multiple program and multiple data (MPMD), i.e. functional parallelism. The SPMD model is ideally suited to execution on clusters for three main reasons [5]: it offers high ratios of computation to communication; identical copies of the same program perform the computation; and there is minimal interaction between the parent and the parallel child processes. Unfortunately, executing parallel applications on clusters has been difficult and inefficient owing to the poor level of support and transparency provided by the operating systems currently used to control the computers within a cluster. Current approaches to simplifying this situation have tended to focus on providing execution environments built on top of these existing operating systems, which leads to increased overheads, duplication of code and poor execution performance. This is true for systems such as the parallel virtual machine (PVM) [1] and the message-passing interface (MPI) [10], [12]. Similar observations apply to systems such as NOW [6], Paralex [13], Beowulf [15] and MOSIX [7], although real progress has been made by the last of these. The reason for this poor situation is that parallelism management, which is responsible for managing parallel processes and computational resources, has been neglected [5].

An important stage in the execution of an SPMD parallel application is the initialization stage, in which the child processes that perform the bulk of the application's computation are instantiated on a set of selected computers within the cluster. In traditional environments these child processes are created sequentially, as the process-creation mechanisms provided by the underlying operating system support only the creation of a single process at a time, and the computers of the cluster used to support the application's execution are selected manually. As the number of processes and computers used in the execution of a parallel application increases, so does the total instantiation time, which can degrade the overall performance of the application. Furthermore, the whole initialization process is a burden for programmers: it is error-prone, time-consuming and irrelevant to the application itself.
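The cost of sequential instantiation can be illustrated with a small sketch. This is not GENESIS code: a fixed sleep stands in for the per-process creation cost, and Python threads stand in for the concurrent creation paths; the point is only that sequential creation time grows linearly with the number of children, while concurrent creation stays roughly constant.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CREATE_COST = 0.05   # assumed per-process instantiation cost (seconds)
N_CHILDREN = 8       # number of SPMD child processes to create

def create_child(rank):
    """Stand-in for instantiating one SPMD child process."""
    time.sleep(CREATE_COST)
    return rank

def sequential_create():
    """Traditional approach: one child at a time; total time ~ N * cost."""
    start = time.perf_counter()
    children = [create_child(r) for r in range(N_CHILDREN)]
    return children, time.perf_counter() - start

def concurrent_create():
    """Concurrent approach: all creations overlap; total time ~ cost."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=N_CHILDREN) as pool:
        children = list(pool.map(create_child, range(N_CHILDREN)))
    return children, time.perf_counter() - start

if __name__ == "__main__":
    _, t_seq = sequential_create()
    _, t_con = concurrent_create()
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_con:.2f}s")
```

With eight children the sequential variant takes roughly eight times the single-creation cost, which is exactly the linear growth described above.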

The major goal of the research reported here was to design and develop a unique concurrent local and remote process-creation mechanism, part of a parallelism management system, that is built as an integral component of a cluster operating system to provide execution transparency, and that employs group communication services to reduce the time required to instantiate the child processes of SPMD parallel applications. The second goal was twofold: first, to show, using a simple parallel simulation and two common mathematical parallel applications (successive over-relaxation and quicksort), that overall performance can be improved when this concurrent process-creation mechanism is employed; and secondly, to show that the developed system provides full transparency, relieving the user of operating system activities such as the instantiation of all child processes and the movement of these processes between computers during their execution, in such a manner that the load is balanced.

To demonstrate that these goals are achieved, Section 2 presents the design and implementation of the GENESIS parallelism management system, with the focus placed on the concurrent local and remote process-creation mechanism. Section 3 reports on the SPMD parallel applications used to test the performance of the concurrent process-creation mechanism and the performance results obtained. Section 4 demonstrates that execution transparency and ease of use are achieved. Section 5 presents related work. Conclusions and future work are provided in Section 6.

Managing parallelism in the GENESIS system

This section presents a transparent and efficient parallelism management system, as part of GENESIS, an operating system supporting parallel processing on a cluster, and the design and implementation of the concurrent process-creation mechanism that forms a key component of the GENESIS parallelism management system.
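The key idea behind concurrent remote creation, as described in the abstract and introduction, is that group communication lets a single create request reach every selected computer, after which each computer instantiates its child locally and in parallel. The sketch below models this with threads and queues; the names `process_server` and `group_create` are hypothetical illustrations, not the GENESIS primitives, and the per-node queues merely model a group multicast.

```python
import queue
import threading

def process_server(node_id, request_q, ack_q):
    """Hypothetical per-node process server: reacts to a group
    'create' request by instantiating the child locally."""
    req = request_q.get()                     # receive the multicast request
    child = f"{req['image']}@node{node_id}"   # model local instantiation
    ack_q.put(child)                          # acknowledge to the parent

def group_create(image, nodes):
    """Sketch of concurrent creation: one request reaches every
    selected node; all nodes instantiate their child in parallel."""
    ack_q = queue.Queue()
    request_qs = [queue.Queue() for _ in nodes]
    servers = [threading.Thread(target=process_server, args=(n, q, ack_q))
               for n, q in zip(nodes, request_qs)]
    for s in servers:
        s.start()
    request = {"image": image}
    for q in request_qs:          # models a single group multicast
        q.put(request)
    for s in servers:
        s.join()
    return sorted(ack_q.get() for _ in nodes)

children = group_create("sor_child", [0, 1, 2, 3])
print(children)
```

Because every node reacts to the same request independently, the creation latency is governed by the slowest single node rather than by the sum over all nodes.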

Performance of the concurrent process-creation service

The objective of this section is to present the performance results obtained from executing a variety of common SPMD parallel applications on a cluster controlled by GENESIS using the parallelism management system. The first experiment conducted was based on a simple parallel simulation, which enables the influence of the various creation methods to be clearly examined. The remaining experiments were performed on SPMD parallel applications chosen from a set of real-world mathematical problems,

Transparency and ease of use

The GENESIS parallelism management system provides the concurrent process-creation service in a transparent manner and relieves the programmer of activities such as mapping processes to a virtual computer, process creation, and load balancing both at the instantiation of a process and during its execution. This is clearly demonstrated by the pseudocode presented in the previous sub-sections for each of the SPMD parallel applications, in which there is only one primitive required by the
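The flavour of such transparency can be sketched as follows. Here `create_parallel_processes` is a hypothetical stand-in for a single creation primitive, and Python's `multiprocessing` stands in for the operating system's process mechanisms; in the parent's code there is no mention of which computers run the children or how they were created.

```python
import multiprocessing as mp

def worker(rank, n, results):
    # Each SPMD child runs the same program on its own slice of the data.
    data = list(range(100))
    lo, hi = rank * len(data) // n, (rank + 1) * len(data) // n
    results.put(sum(data[lo:hi]))

def create_parallel_processes(n_children):
    """Hypothetical single primitive: the programmer asks for n children;
    mapping to computers, creation and load balancing would be handled
    transparently by the system underneath."""
    results = mp.Queue()
    procs = [mp.Process(target=worker, args=(r, n_children, results))
             for r in range(n_children)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # The parent only combines the partial results.
    return sum(results.get() for _ in procs)

if __name__ == "__main__":
    print(create_parallel_processes(4))
```

The parent's entire involvement is one call and one reduction; everything between the two is the system's responsibility.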

Related work

A number of systems have been developed to support the execution of parallel applications on clusters. PVM [1] is an execution environment that runs on top of a variety of computers with the aim of providing parallel applications with a method of accessing the resources of the cluster. The PVM system was developed as a set of cooperating servers and a suite of specialized libraries that provide programmers with a set of consistent primitives for parallel process communication, execution and

Conclusions

Clusters of computers provide an extremely attractive platform upon which parallel applications, especially SPMD applications, can be executed. Currently available systems suffer from increased overheads and do not provide the user with satisfactory levels of transparency or support for parallelism. As a result, the overall performance achieved from parallel applications executed on clusters is degraded. We have presented in this paper a unique solution to this problem, which addresses the

Acknowledgements

This work was partly supported by the Small ARC Grant 0504003157.

Michael Hobbs received his BS (Honours) and PhD in computer science from Deakin University, Geelong, Australia, in 1995 and 1998, respectively. He is presently a software design engineer with the Storage Systems Program of Hewlett-Packard Laboratories in Palo Alto, California. His research interests include distributed operating systems, parallel and distributed processing, cluster computing and storage systems management.

References (17)

  • A. Barak et al., The MOSIX multicomputer operating system for high performance cluster computing, Future Generation Computer Systems (1998)
  • D. Beguelin, J. Dongarra, A. Geist, R. Manchek, S. Otto, J. Walpole, PVM: experiences, current status and future...
  • A. Barak, A. Braverman, I. Gilderman, O. La'adan, Performance of PVM with the MOSIX preemptive process migration,...
  • M. Hobbs et al., The RHODOS remote process creation facility supporting parallel execution on distributed systems, Journal of High Performance Computing (1996)
  • M. Hobbs et al., A concurrent process creation service to support SPMD based parallel processing on COWs, Concurrency: Practice and Experience (1999)
  • A. Goscinski, Towards an operating system managing parallelism of computing on clusters of workstations, Future Generation Computer Systems (2000)
  • T. Anderson et al., A case for networks of workstations: NOW, IEEE Micro (1995)
  • M. Hobbs, The management of SPMD based parallel processing on clusters of workstations, PhD thesis, School of Computing...

Andrzej M. Goscinski is a chair professor of computing at Deakin University. He received his MSc, PhD and DSc from the Staszic University of Mining and Metallurgy, Krakow, Poland. Dr Goscinski is recognized as one of the leading researchers in distributed systems, distributed operating systems and parallel processing on clusters. The results of his research have been published in international refereed journals and conference proceedings and presented at specialized conferences. In 1997, Dr Goscinski and his research group initiated a study of the design and development of a cluster operating system supporting parallelism management and offering a single system image. The first version of this system has been in use from the end of 1998. Currently, Dr Goscinski is carrying out research into global computing, based on distributed, networked and parallel systems, to support the information economy, in particular, electronic commerce and knowledge acquisition and management.
