High Availability clusters

At the operating system level, two cluster architectures exist: high performance clusters and high availability clusters. This article describes high availability clusters.

A high availability cluster is a group of computers (the nodes of the cluster) that can fail over applications to another node when one of the computers fails.

Cluster software

Special clustering software is needed to set up a high availability cluster. Several products are available for the common operating systems.

This software lets an application running on one node of the cluster fail over to another node as quickly as possible. The software periodically (for instance every minute) checks whether the application on a node still works as expected. If the application fails, a failover is initiated: the application is stopped on the failed node (if this is still possible) and restarted on another node in the cluster.

The intention is to keep interruptions for end users to a minimum, so they can continue to work as if nothing happened.
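
The check-and-failover cycle described above can be sketched in a few lines of Python. This is only an illustrative model, not the behaviour of any particular cluster product: the `Node` and `Cluster` classes, the `healthy` flag that stands in for a real application probe, and the node names are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical cluster member; `healthy` stands in for a real probe."""
    name: str
    healthy: bool = True
    running: bool = False

    def start_app(self) -> None:
        self.running = True

    def stop_app(self) -> None:
        # On a real failed node this may not be possible; here it always works.
        self.running = False

    def app_ok(self) -> bool:
        return self.running and self.healthy


class Cluster:
    """Runs the application on one node and moves it when a check fails."""

    def __init__(self, nodes: list[Node]) -> None:
        self.nodes = nodes
        self.active = nodes[0]
        self.active.start_app()

    def check_once(self) -> Optional[Node]:
        """One health check; returns the new active node after a failover."""
        if self.active.app_ok():
            return None  # application works as expected, nothing to do
        failed = self.active
        failed.stop_app()  # stop on the failed node, if still possible
        for candidate in self.nodes:
            if candidate is not failed and candidate.healthy:
                candidate.start_app()  # restart on another node
                self.active = candidate
                return candidate
        raise RuntimeError("no healthy node left to take over the application")


node_a, node_b = Node("node-a"), Node("node-b")
cluster = Cluster([node_a, node_b])
cluster.check_once()        # healthy: no failover
node_a.healthy = False      # simulate a failure on the active node
print(cluster.check_once().name)  # node-b
```

In a real cluster, `check_once` would be called on a timer (for instance every minute), and the health check would probe the application itself rather than read a flag.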

Cluster-aware applications

The above description applies to cluster-unaware applications: applications that don't know they are running on a cluster. There are also cluster-aware applications.

An example of a cluster-aware application is Oracle RAC (Real Application Clusters). With RAC, an Oracle database can run on multiple nodes at the same time and can cope with node failures. End users will not notice that a node has failed, although they might experience some reduced performance.
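
The difference with the failover model can be illustrated with a small Python sketch in which all nodes serve requests at the same time and a failed node is simply skipped. This is a toy model of the general active-active idea, not a description of how Oracle RAC works internally; the `DbNode` and `ActiveActiveService` classes and the round-robin routing are assumptions made for this example.

```python
from __future__ import annotations


class NodeDown(Exception):
    """Raised when a request reaches a node that has failed."""


class DbNode:
    """Hypothetical node that can answer requests while it is up."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True

    def execute(self, query: str) -> str:
        if not self.up:
            raise NodeDown(self.name)
        return f"{self.name} handled: {query}"


class ActiveActiveService:
    """All nodes serve requests at once; a failed node is skipped."""

    def __init__(self, nodes: list[DbNode]) -> None:
        self.nodes = nodes
        self._next = 0  # round-robin position

    def execute(self, query: str) -> str:
        # Try each node at most once, starting at the round-robin position.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            try:
                return node.execute(query)
            except NodeDown:
                continue  # the caller never sees this failure
        raise RuntimeError("all nodes are down")


nodes = [DbNode("node-1"), DbNode("node-2")]
service = ActiveActiveService(nodes)
print(service.execute("q1"))  # node-1 handled: q1
print(service.execute("q2"))  # node-2 handled: q2
nodes[0].up = False           # simulate a node failure
print(service.execute("q3"))  # node-2 handled: q3
```

After the failure, every request lands on the remaining node. The caller still gets an answer, but one node now carries the full load, which is where the reduced performance comes from.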

Testing

It is crucial to test high availability clusters regularly.

I have experience with a 2-node HP-UX Serviceguard cluster that was set up correctly once but was never tested afterwards. Everyone assumed the cluster would perform a correct failover in case of a node failure. But when a node actually failed some years later, the cluster did not function, and a considerable amount of downtime was the result.

This could have been prevented if the cluster had been tested a few times per year.


This entry was posted on Friday 06 April 2007



Copyright Sjaak Laan