These pages contain basic information on clusters and the cluster technologies analysed in the first part of the CRO-GRID Infrastructure project.
General information on clusters is given below, while the individual cluster technologies are described on separate web pages.
The cluster technologies described are:
- job management systems
- cluster monitoring systems
- standard parallel libraries
- automatic installation systems
- OS level clusters
- cluster benchmarking
- cluster distributions
A cluster is defined as a group of independent computers linked by a computer network and operating as a single computer. Clusters were developed as an alternative to expensive multiprocessor supercomputers.
The ability of a system composed of a group of components to hide its complexity and present itself as a single system is called Single System Image (SSI). The table below lists the features a cluster should have in order to provide SSI. Clusters that have the features listed below are called SSI clusters.
| Feature | Description |
| --- | --- |
| Single entry point | The system is accessed as if it were a single computer. |
| Single user interface | Users work with the cluster through a single interface. |
| Single control point | Cluster management and monitoring are performed through one system. |
| Single job management | One system manages all jobs run on the cluster. |
| Single file hierarchy | Clients see the same directory organization, i.e. they use the same file system. |
| Single virtual networking | Clients see themselves as linked into one network, even though in reality they may be linked into several networks. |
| Single memory space | Programs see a single memory space. |
| Single I/O space * | Programs see a single I/O space. |
| Single process space * | Processes on the clients run in a single process space, i.e. a single space of process identifiers. |
| Checkpointing and process migration * | Checkpointing of running processes allows a process to be restarted from the last stored point in the event of an error. Process migration allows transparent migration from a malfunctioning client to a functioning one. |
* Features that are not required for a cluster to be considered an SSI cluster.
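The checkpointing feature in the table can be illustrated with a minimal sketch. It assumes application-level checkpointing, where the program saves its own state, whereas an SSI cluster would do this transparently for arbitrary processes. The function name `run_with_checkpoint` and the `fail_at` parameter are hypothetical and exist only for illustration.

```python
import os
import pickle
import tempfile


def run_with_checkpoint(total_steps, checkpoint_path, fail_at=None):
    """Sum the integers 0..total_steps-1, saving state after every step.

    If a checkpoint file exists, the computation resumes from the last
    stored point instead of starting over. `fail_at` simulates a crash.
    """
    state = {"next_step": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            state = pickle.load(f)

    for step in range(state["next_step"], total_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated failure at step {step}")
        state["total"] += step
        state["next_step"] = step + 1
        # Write to a temporary file first, then rename it, so that a crash
        # during the write cannot corrupt the previous checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["total"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    try:
        run_with_checkpoint(10, path, fail_at=6)  # crashes part-way through
    except RuntimeError:
        pass
    # The second run picks up at step 6 rather than step 0.
    print(run_with_checkpoint(10, path))
```

Process migration would go one step further and restart the saved state on a different client, which this sketch does not attempt.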
Clusters can provide SSI at three levels: the hardware level, the operating system level and the application level. SSI at the hardware level is achieved with special hardware that lets the user see the computers in the cluster as a single computer. SSI at the OS level is achieved with special operating systems, or operating systems with additional features, that create the image of a single machine. SSI at the application level is achieved with a set of programs called cluster middleware.
Cluster middleware is a set of programs that allow the cluster clients to operate as one. The picture below shows the cluster architecture with some components of the cluster middleware:
- job management system
- cluster monitoring system
- parallel libraries
- automatic client installation systems
- cluster management tools
- distributed and parallel file systems
- global process space
The Job Management System (JMS) is the cluster middleware component responsible for controlling, scheduling, monitoring and running users' applications. The monitoring system tracks the status of the clients and displays the workload of client components. Standard parallel libraries support the development and execution of parallel applications. Automatic client installation systems automatically install and configure the OS and a set of programs on cluster clients. These four components are described in more detail on the pages mentioned above.
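The scheduling role of a job management system can be sketched as follows. This is a deliberately simple first-in, first-out policy, not the behaviour of any particular JMS product; the class `FifoScheduler` and its methods are hypothetical names introduced for illustration.

```python
from collections import deque


class FifoScheduler:
    """Assign queued jobs to free nodes in strict submission order."""

    def __init__(self, nodes):
        self.total = len(nodes)
        self.free = list(nodes)   # nodes with no job assigned
        self.queue = deque()      # waiting (job, nodes_needed) pairs
        self.running = {}         # job name -> list of assigned nodes

    def submit(self, job, nodes_needed):
        if nodes_needed > self.total:
            raise ValueError("job needs more nodes than the cluster has")
        self.queue.append((job, nodes_needed))
        self._dispatch()

    def finish(self, job):
        # Return the job's nodes to the free pool and start waiting jobs.
        self.free.extend(self.running.pop(job))
        self._dispatch()

    def _dispatch(self):
        # Strict FIFO: the job at the head of the queue blocks later jobs
        # until enough nodes are free for it.
        while self.queue and self.queue[0][1] <= len(self.free):
            job, needed = self.queue.popleft()
            self.running[job] = [self.free.pop(0) for _ in range(needed)]
```

In this sketch a small job can wait behind a large one even when nodes are idle. Avoiding that requires a more elaborate policy, which is outside the scope of the example.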
Cluster management tools are a set of tools that administrators use for direct client management. Examples are tools for executing a command on all clients or for copying files to all clients.
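A tool that executes a command on all clients could look roughly like the sketch below. It assumes passwordless `ssh` access from the administrator's machine to every client; the function names `ssh_runner` and `run_on_all` and the host names in the usage note are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command, timeout=30):
    """Run `command` on `host` over ssh and return (exit_code, output).

    Assumes key-based login; BatchMode makes ssh fail instead of
    prompting for a password.
    """
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip()


def run_on_all(hosts, command, runner=ssh_runner, max_workers=16):
    """Run `command` on every host in parallel; return {host: (code, output)}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        return {host: future.result() for host, future in futures.items()}
```

A call such as `run_on_all(["node01", "node02"], "uptime")` would then collect the output of `uptime` from both clients. The `runner` parameter makes it possible to substitute another transport, or a stub for testing.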
Distributed file systems allow users to use the same file system on all clients. Network File System (NFS) is the most commonly used. Parallel file systems allow files to be processed simultaneously on several clients, which gives better performance when working with large files.
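One common way for a parallel file system to spread a large file over several machines is striping, where fixed-size pieces of the file are placed on storage servers in round-robin order. The text above does not say which scheme is used, so the following is only a sketch under that assumption; `stripe_location` and `split_read` are hypothetical names.

```python
def stripe_location(offset, stripe_size, num_servers):
    """Map a byte offset in a file to (server index, offset on that server).

    Stripes are assigned round-robin: stripe 0 to server 0, stripe 1 to
    server 1, and so on, wrapping around after the last server.
    """
    stripe_index = offset // stripe_size
    server = stripe_index % num_servers
    # Number of full stripes this server holds before the current one.
    local_stripe = stripe_index // num_servers
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return server, local_offset


def split_read(offset, length, stripe_size, num_servers):
    """Split a read of `length` bytes at `offset` into per-server requests.

    Returns a list of (server, local_offset, chunk_length) tuples. Requests
    addressed to different servers can be issued at the same time.
    """
    requests = []
    end = offset + length
    while offset < end:
        server, local_offset = stripe_location(offset, stripe_size, num_servers)
        # Read up to the end of the current stripe or the end of the request.
        chunk = min(stripe_size - offset % stripe_size, end - offset)
        requests.append((server, local_offset, chunk))
        offset += chunk
    return requests
```

Because consecutive stripes live on different servers, a large sequential read turns into several smaller reads that proceed in parallel.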
A global process space system ensures that all processes on the clients have unique identifiers at the OS level. Using such a system, a user can control all processes on all clients. A global process space is not necessary for cluster operation and is not implemented in most clusters.
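One simple way to obtain cluster-wide unique process identifiers is to combine a client number with the local process identifier. The sketch below assumes this encoding purely for illustration; real global process space systems may work differently, and the constant `MAX_LOCAL_PID` and both function names are hypothetical.

```python
MAX_LOCAL_PID = 1 << 22  # assumed upper bound on process IDs within one client


def to_global_pid(node_id, local_pid):
    """Combine a client number and a local PID into one cluster-wide PID."""
    if node_id < 0:
        raise ValueError("node_id must be non-negative")
    if not 0 <= local_pid < MAX_LOCAL_PID:
        raise ValueError("local_pid out of range")
    return node_id * MAX_LOCAL_PID + local_pid


def from_global_pid(global_pid):
    """Recover (node_id, local_pid) from a cluster-wide PID."""
    return divmod(global_pid, MAX_LOCAL_PID)
```

With such a mapping, a tool on any client can tell from the identifier alone which client a process runs on and forward a signal or status query there.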
Clusters can be classified in several ways. One possible classification distinguishes Beowulf clusters from Networks of Workstations (NOW), also called Clusters of Workstations (COW).
A Beowulf cluster consists of a group of computers that have no peripherals (keyboard, monitor). In a Beowulf cluster one computer differs from all the others and is called the front-end. The front-end is the core of the cluster, and the servers of certain cluster middleware systems run there. Specifically, the file system and job management system servers, the cluster monitoring system and the automatic client installation system are located on the front-end. The front-end and the clients are linked into a private network that is physically isolated from the public network, which makes communication between clients more efficient. The front-end has two network interfaces: one towards the public network and the other towards the private network. Users of a Beowulf cluster work only on the front-end, which therefore forms a single point of access.
A network of workstations consists of a group of computers that are used for everyday work and that perform cluster tasks when they are otherwise idle. Cluster middleware in a network of workstations must allow the workload of the computers to be monitored. Furthermore, the job management system must be able to transfer jobs from a client that becomes loaded, i.e. when its owner starts to use it, to an unloaded client. Beowulf clusters achieve better performance than networks of workstations. Their advantages over networks of workstations include a private network, dedicated computers, and the fact that the computers form a single administrative domain. Beowulf cluster clients are linked to a network that is physically isolated from the public network, so there is no background load on the network.
The clients are used only for cluster purposes, so there is no need to monitor the background load of the computers. For the same reason, the OS kernel on the clients can be tuned to improve the performance of the whole system. The fact that the computers are in a private local network makes administration easier and increases security.
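The job transfer that a network of workstations requires, as described above, can be sketched as a reassignment step that runs whenever the set of busy workstations changes. This is an illustrative policy (move each displaced job to the least-loaded idle workstation), not the algorithm of any specific job management system; the function name `rebalance` and its parameters are hypothetical.

```python
def rebalance(assignments, busy_nodes, all_nodes):
    """Move jobs off workstations whose owners are active.

    assignments: dict mapping job name -> node name
    busy_nodes:  set of nodes currently in use by their owners
    all_nodes:   list of every workstation in the network
    Returns a new assignments dict. A job is mapped to None (suspended)
    when no idle workstation is available.
    """
    # Count the jobs that are already running on each idle node.
    load = {node: 0 for node in all_nodes if node not in busy_nodes}
    for node in assignments.values():
        if node in load:
            load[node] += 1

    new_assignments = {}
    for job, node in assignments.items():
        if node not in busy_nodes:
            new_assignments[job] = node  # the job stays where it is
        elif not load:
            new_assignments[job] = None  # nowhere to go: suspend the job
        else:
            # Pick the least-loaded idle node; ties are broken by name.
            target = min(load, key=lambda n: (load[n], n))
            load[target] += 1
            new_assignments[job] = target
    return new_assignments
```

A dedicated Beowulf cluster has no need for this step, since its clients never have interactive owners.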
Clusters can also be classified according to the type of computing they are intended for: High Performance Computing (HPC), High Throughput Computing (HTC) and High Availability (HA) clusters.
HPC refers to a set of applications that require extremely high computing power. Typical HPC applications are parallel applications whose subprocesses are tightly coupled and exchange large quantities of information. HPC applications are well suited to the Beowulf class of clusters.
HTC refers to a set of applications consisting of a large number of mutually independent tasks. HTC applications are suitable for networks of workstations.
The last class refers to clusters that run applications which are a vital part of some system and have to be permanently available. Examples of applications that require HA are web servers, mail servers and directory servers. HA clusters are not described in detail because they are not needed for the CRO-GRID project.