What is Clustering?
A cluster is a group of independent computers working together as a single system to ensure that mission-critical applications and resources are as highly available as possible. The group is managed as a single system, shares a common namespace, and is specifically designed to tolerate component failures and to support the addition or removal of components in a way that is transparent to users.
Thus clustering means linking together two or more systems to handle variable workloads or to provide continued operation in the event one fails. Each computer may be a multiprocessor system itself. For example, a cluster of four computers, each with two CPUs, would provide a total of eight CPUs processing simultaneously. When clustered, these computers behave like a single computer and are used for load balancing, fault tolerance, and parallel processing.
Two or more servers configured in a cluster use a heartbeat mechanism to continuously monitor each other's health. Each server sends the other an "I am OK" message at regular intervals. If several consecutive heartbeats are missed, the server is presumed to have failed, and the surviving server begins the failover operation: it assumes the failed server's identity in addition to its own, then recovers and restores the network interfaces, storage connections, and applications. Clients are then reconnected to their applications on the surviving server.
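The heartbeat-and-failover logic described above can be sketched in a few lines. This is a minimal illustration only; the class, the interval, and the miss threshold are assumptions for the sketch, not the API of any real cluster product.

```python
import time

class HeartbeatMonitor:
    """Tracks the peer's 'I am OK' messages and decides when it has failed."""

    def __init__(self, interval=1.0, max_missed=3):
        self.interval = interval        # seconds between expected heartbeats
        self.max_missed = max_missed    # consecutive misses before declaring failure
        self.last_beat = time.monotonic()

    def record_heartbeat(self):
        """Called each time the peer's 'I am OK' message arrives."""
        self.last_beat = time.monotonic()

    def peer_failed(self, now=None):
        """True once max_missed intervals have elapsed with no heartbeat."""
        now = time.monotonic() if now is None else now
        return (now - self.last_beat) > self.interval * self.max_missed

def failover(monitor, now=None):
    """If the peer is presumed dead, the survivor takes over its resources."""
    if monitor.peer_failed(now):
        return ["assume peer identity", "mount peer disks", "restart applications"]
    return []
```

In a real cluster the heartbeat travels over a dedicated private network link, and the takeover steps involve the storage interconnect and application restart machinery rather than simple strings.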
The minimum requirements for a server cluster are (a) two servers connected by a network, (b) a method for each server to access the other’s disk data, and (c) special cluster software like Microsoft Cluster Service (MSCS). The special software provides services such as failure detection, recovery, and the ability to manage the service as a single system.
Availability, scalability, and, to a lesser extent, investment protection and simplified administration are all touted as benefits of clustering technology. Availability translates into decreased downtime, scalability into flexible growth, and investment protection and simplified administration into a lower cost of ownership. Clustered systems also bring fault tolerance and support for rolling upgrades.
The most common uses of clustering technology are mission-critical database management, file/intranet data sharing, messaging, and general business applications.
There are two main cluster models in the industry: the shared device model and the shared nothing model.
In the shared device model, applications running within a cluster can access any hardware resource connected to any node in the cluster, so access to data must be synchronized. A special component called the Distributed Lock Manager (DLM) manages access to cluster hardware resources; when multiple applications access the same resource, the DLM resolves any conflicts.
Although the DLM brings sophistication, it also adds complexity and imposes significant overhead on the cluster, resulting in a performance hit.
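A toy sketch can make the DLM's conflict resolution concrete: every access to a shared resource must first acquire a lock, which serializes conflicting requests from different nodes. The class and method names below are illustrative assumptions, not a real DLM interface.

```python
class ToyLockManager:
    """Grants exclusive locks on named resources, one holder at a time."""

    def __init__(self):
        self.owners = {}  # resource name -> node currently holding the lock

    def acquire(self, resource, node):
        """Grant the lock if free or already held by this node; deny on conflict."""
        holder = self.owners.get(resource)
        if holder is None or holder == node:
            self.owners[resource] = node
            return True
        return False  # conflict: another node holds the lock

    def release(self, resource, node):
        """Free the lock, but only if this node actually holds it."""
        if self.owners.get(resource) == node:
            del self.owners[resource]
```

Every acquire and release is a round of cluster communication in a real DLM, which is precisely where the overhead mentioned above comes from.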
The other type, the shared nothing model, does not use a DLM, so clusters built on it avoid this overhead. In the shared nothing model, only one node can own and access a given hardware resource at any time. When a failure occurs, a surviving node takes ownership of the failed node's resources and makes them available to users.
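The shared nothing ownership rule can be sketched as a simple mapping from each disk to its single owner, with a takeover function that reassigns a failed node's disks to a survivor. The function and variable names are assumptions for illustration.

```python
def take_over(disk_owner, failed_node, survivor):
    """Reassign every disk owned by failed_node to survivor; leave the rest alone."""
    return {disk: (survivor if owner == failed_node else owner)
            for disk, owner in disk_owner.items()}
```

Because ownership is exclusive, no lock manager is needed: at any instant exactly one node may touch each disk, and takeover is just a change of owner.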
Microsoft Cluster Service (MSCS) was built into Microsoft Windows NT Server 4.0, Enterprise Edition. An updated version of the cluster service is built into Windows 2000 Advanced Server and Windows 2000 Datacenter Server. Because Microsoft's clustering implementation uses Intel-based platforms and standard networking technology, it provides a low-cost clustering solution. The cluster service has also been designed for smaller organizations that cannot afford a highly skilled administrative staff: Microsoft provides wizards in the Cluster Administrator to automate and simplify cluster configuration. With Windows 2000, Windows Internet Name Service (WINS), Distributed File System (DFS), and Dynamic Host Configuration Protocol (DHCP) are cluster-aware services that can be failed over automatically.
Microsoft’s Windows clusters are made up of two servers and a set of disks that are physically attached to both servers. Although the disks are connected to both servers, only one server may support, own and have access privileges to a particular disk at any time.
When one server fails, the other server restarts the applications that were running on the failed server and takes over ownership of its disks and other resources. The term shared storage is used to differentiate this type of cluster design from a cluster design where each disk is physically duplicated and connected to a different server and where all data changes are written to both sets of disks. The latter architecture is referred to as a mirrored storage cluster design.
MSCS supports clusters of two computers running Windows NT Server 4.0, Enterprise Edition. MSCS consists of two main components: the clustering software and the Cluster Administrator. The clustering software enables the two servers of a cluster to exchange specific types of messages that trigger the transfer of resources at the appropriate time. It has two primary components: the cluster service and the resource monitor. The cluster service runs on each cluster server and controls cluster activity, communication between cluster servers, and failover operations. The resource monitor handles communication between the cluster service and the application resources.
The Cluster Administrator is a graphical application used to manage a cluster. The network applications, data files, and other tools available on the systems are the cluster resources, which provide services to network clients. A resource is hosted on only one node at any time.
Windows Load Balancing Service (WLBS) provides load balancing and clustering for Windows NT TCP/IP applications, including Web-based services such as Internet Information Server (IIS) (Web, FTP, etc.), streaming media, virtual private networking (VPN), and Microsoft Proxy Server. Network Load Balancing (NLB) is the name of the TCP/IP application load balancing software in Windows 2000. NLB clusters distribute client connections over multiple servers, providing stability and high availability for TCP/IP-based services and applications.
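One common way such a cluster can spread client connections across servers is to hash the client's address and use the result to pick a host, so the same client is consistently routed to the same server. The hashing scheme below is an assumption for illustration only, not the actual NLB algorithm.

```python
import hashlib

def pick_server(client_ip, servers):
    """Deterministically map a client address to one of the cluster's servers."""
    # Hash the client IP so the mapping is stable across requests,
    # then reduce the digest modulo the number of servers.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

Because the mapping depends only on the client address and the server list, every node in the cluster can compute it independently, with no central dispatcher to become a bottleneck or single point of failure.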
From a software perspective, MSCS on Windows NT is lagging in management functionality. MSCS views each cluster as an island unto itself, making it cumbersome for an administrator to manage multiple clusters in one enterprise.
With the Windows 2000 cluster service, Microsoft addressed cluster administrative deficiencies, added availability improvements, and provided better integration with underlying Windows 2000 technologies making it easier to develop and deploy applications.
Windows 2000 Datacenter Server provides enterprise-class scalability, supporting SMP systems with up to 32 processors. It also supports four-node clustering, with promised future enhancements in cluster size, integration with other operating system services, and ease of management.