Active/Active Clustering 101 - Sanity Solutions INC

Posted on July 24, 2017.

When setting up a server infrastructure, there are several options for deploying a fault-tolerant design that can survive the failure of one or more nodes. Many system administrators use an Active/Active clustering model because it eliminates the downtime customers would otherwise experience when a node fails. Is Active/Active clustering right for your infrastructure? This guide is designed to help you learn more about this topology and its benefits.

What is Active/Active Clustering?

Also known as Dual Active clustering, Active/Active clustering means that independent nodes in a networked server cluster each have access to their own replicated database server. Essentially, multiple copies of the same application run in their own server environments for the purposes of load balancing and redundant backup, which keeps applications running at peak performance.
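Because every node in an Active/Active cluster is live, a load balancer can spread requests across all of them. The sketch below is a minimal, hypothetical round-robin balancer; the class name, method names, and node names are illustrative assumptions, not part of any particular product. It skips nodes marked as unhealthy and lets new nodes join the rotation.

```python
class RoundRobinBalancer:
    """Hypothetical balancer: rotate requests across all active nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)        # every node is active and can serve traffic
        self.healthy = set(self.nodes)  # nodes currently passing health checks
        self._next = 0                  # index of the next node to try

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def add_node(self, node):
        # Scaling out: a new node joins the rotation immediately
        self.nodes.append(node)
        self.healthy.add(node)

    def pick(self):
        """Return the next healthy node in round-robin order."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")
```

With two healthy nodes, requests alternate between them; if one is marked down, every request goes to the survivor until it is marked up again.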

Because the databases are replicated, the cluster behaves as though there were only one instance of the application, and the data stays in sync. This scheme is called Continuous Availability because additional servers are already running and ready to accept the primary environment's client connections if a failover occurs. A failover is the process by which the function of a failed primary component or node is switched seamlessly to a secondary system that serves as a backup.

The advantages of an Active/Active clustering architecture are numerous. Organizations with little to no tolerance for service interruptions can rely on the resiliency and redundancy of critical components throughout this architecture. Load balancing and high transaction throughput are additional benefits of this design.

Active/Active vs Active/Passive

In Active/Active clustering, a pair of companion nodes, a primary and a secondary, both work against the same replicated database. If the primary node fails, the secondary picks up where it left off, so there is no outage. The secondary handles the traffic until the primary node comes back online and takes back its responsibilities, a procedure called failback. Any pending transactions are held during failover and failback and then resubmitted to the currently active node, so that no duplication occurs and the data remains consistent.
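The failover, failback, and hold-and-resubmit behavior described above can be sketched as follows. This is a simplified model under stated assumptions: `Node`, `ActiveActivePair`, and the dictionary standing in for the replicated database are hypothetical, and each transaction is assumed to carry a unique ID so that a resubmitted transaction is applied only once.

```python
from collections import deque


class Node:
    def __init__(self, name):
        self.name = name
        self.up = True


class ActiveActivePair:
    """Hypothetical primary/secondary pair sharing one replicated database."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.db = {}            # stand-in for the replicated database: txn_id -> value
        self.pending = deque()  # transactions held while no node can accept them

    def active_node(self):
        # Prefer the primary; fall back to the secondary (failover).
        # Once the primary is back up it is preferred again (failback).
        if self.primary.up:
            return self.primary
        if self.secondary.up:
            return self.secondary
        return None

    def submit(self, txn_id, value):
        self.pending.append((txn_id, value))
        return self.flush()

    def flush(self):
        """Resubmit held transactions to whichever node is currently active."""
        node = self.active_node()
        if node is None:
            return None  # keep holding transactions until a node is available
        while self.pending:
            txn_id, value = self.pending.popleft()
            # Keyed by transaction ID, so a duplicate submission is applied at most once
            self.db.setdefault(txn_id, value)
        return node.name
```

In this sketch, deduplication relies entirely on the assumed transaction IDs; a real cluster would also need to replicate `db` between the nodes, which the shared dictionary glosses over.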

In an Active/Passive scenario, a single server instance runs on either the primary or the secondary node. It runs on the primary node until a failover occurs; the server is then restarted on the secondary node. Failing back to the primary node is not necessary, since the environment is already running on the secondary node. In both failover and failback, clients must reconnect to the active node once it is back online and resubmit their transactions.
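For contrast, here is a minimal, hypothetical model of the Active/Passive behavior just described: one service instance, restarted on the surviving node after a failure, with no automatic failback. The class and method names are illustrative assumptions. Note that client connections are dropped on failover, which is why clients must reconnect and resubmit.

```python
class ActivePassiveService:
    """Hypothetical single service instance that runs on exactly one node at a time."""

    def __init__(self, primary, secondary):
        self.nodes = {primary: True, secondary: True}  # node name -> is the node up?
        self.running_on = primary                      # the one node hosting the service
        self.connected_clients = set()

    def connect(self, client):
        self.connected_clients.add(client)
        return self.running_on

    def node_failed(self, node):
        self.nodes[node] = False
        if node == self.running_on:
            # Failover: restart the service on a surviving node.
            # Existing client connections are lost and must be re-established.
            survivors = [n for n, up in self.nodes.items() if up]
            if not survivors:
                raise RuntimeError("no node available to host the service")
            self.running_on = survivors[0]
            self.connected_clients.clear()

    def node_recovered(self, node):
        # The recovered node simply becomes the new standby; no failback is performed
        self.nodes[node] = True
```

After the primary fails and recovers, the service stays on the secondary, matching the "no failback needed" behavior above.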

The success of any clustered approach depends on the concept of a quorum. The quorum model ensures that whenever the cluster is running, enough members of the distributed system are operational and communicating that at least one replica of the current state is guaranteed. The nodes communicate over a private network, periodically notifying their companion nodes of their health status. Nodes in a known good state cast votes according to specific algorithms. As a cluster grows in node count, or when a stretch cluster is implemented, the quorum becomes essential for safeguarding against a split-brain scenario, in which isolated groups of nodes each keep running as if they were the whole cluster.
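The periodic health notifications described above are commonly implemented as heartbeats. The sketch below is a minimal, hypothetical failure detector: `HeartbeatMonitor`, `timeout`, and the explicit `now` timestamps are illustrative assumptions (passing the time in explicitly keeps the example deterministic, and every peer is assumed to have last been seen at time 0). A peer silent for longer than `timeout` is suspected to have failed, and only the remaining peers would be counted as voters.

```python
class HeartbeatMonitor:
    """Hypothetical detector: track each peer's last heartbeat and flag silent peers."""

    def __init__(self, peers, timeout):
        self.timeout = timeout                          # seconds of silence tolerated
        self.last_seen = {peer: 0.0 for peer in peers}  # assume all peers seen at t=0

    def heartbeat(self, peer, now):
        # Called whenever a health message arrives over the private network
        self.last_seen[peer] = now

    def healthy_peers(self, now):
        return {p for p, t in self.last_seen.items() if now - t <= self.timeout}

    def suspected_failed(self, now):
        return set(self.last_seen) - self.healthy_peers(now)
```

The choice of `timeout` is a trade-off: a short value detects failures quickly but risks declaring a slow node dead, while a long value delays failover.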

There are four quorum modes:

Node Majority: Each node that is available and in communication can vote. The cluster functions only with a majority, or more than half, of the votes.
Node and Disk Majority: Each node plus a designated disk in the cluster storage (the “disk witness”) can vote, whenever they are available and in communication. The cluster functions only with a majority of the votes.
Node and File Share Majority: Each node plus a designated file share created by the administrator (the “file share witness”) can vote, whenever they are available and in communication. The cluster functions only with a majority of the votes.
No Majority: Disk Only: The cluster has quorum if one node is available and in communication with a specific disk in the cluster storage.
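The vote counting behind the four modes above can be sketched as a single function. This is a simplified model, not any vendor's implementation: `QuorumMode`, `has_quorum`, and its parameters are hypothetical, the witness (disk or file share) is assumed to hold exactly one vote, and refinements such as vote weighting are ignored.

```python
from enum import Enum


class QuorumMode(Enum):
    NODE_MAJORITY = "node_majority"
    NODE_AND_DISK_MAJORITY = "node_and_disk_majority"
    NODE_AND_FILE_SHARE_MAJORITY = "node_and_file_share_majority"
    DISK_ONLY = "disk_only"


def has_quorum(mode, total_nodes, nodes_up, witness_reachable=False):
    """Return True if a partition containing `nodes_up` nodes may keep running."""
    if mode is QuorumMode.DISK_ONLY:
        # No majority: one node in communication with the quorum disk is enough
        return nodes_up >= 1 and witness_reachable
    total_votes = total_nodes
    votes = nodes_up
    if mode in (QuorumMode.NODE_AND_DISK_MAJORITY,
                QuorumMode.NODE_AND_FILE_SHARE_MAJORITY):
        total_votes += 1                     # the witness holds one extra vote
        votes += 1 if witness_reachable else 0
    # Strict majority: more than half of all possible votes
    return 2 * votes > total_votes
```

For example, a four-node cluster split two and two has no majority under Node Majority, but the half that can still reach a witness has three of five votes under Node and Disk Majority, which is how a witness breaks ties in even-sized clusters.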

Active/Passive is called a “high availability” solution. Downtime is minimal, but there is still a short gap between the moment the first node fails and the moment the secondary node picks up the traffic.

Benefits of Active/Active Clustering

Secondary servers are already active and waiting to receive connections, so there is no downtime during a failover.
Processing capacity improves because two server nodes are actively serving traffic instead of one sitting idle, as it does in Active/Passive clustering.
No extra hardware is needed, since both approaches use multiple server nodes; Active/Active simply puts all of them to work.
Capacity is easy to expand as traffic grows: network staff can add new nodes as needed.
Reliability also improves because there is no single point of failure: the primary server always has a ready backup.
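The capacity benefits above come down to simple arithmetic. The sketch below assumes identical nodes with a hypothetical per-node capacity in requests per second, and models Active/Passive as a single serving node, as in the two-node comparison in this article. It also checks whether the surviving Active/Active nodes can absorb the full load after one node fails, since zero-downtime failover assumes that headroom exists.

```python
def serving_capacity(total_nodes, per_node_capacity, active_active):
    """Requests/sec the cluster can serve while all nodes are healthy."""
    serving_nodes = total_nodes if active_active else 1  # Active/Passive: one node serves
    return serving_nodes * per_node_capacity


def survives_one_failure(total_nodes, per_node_capacity, load):
    """Active/Active: can the remaining nodes carry `load` after one node fails?"""
    return (total_nodes - 1) * per_node_capacity >= load
```

For instance, two nodes at 1,000 requests/sec each serve 2,000 requests/sec in Active/Active versus 1,000 in Active/Passive, but a 1,500 requests/sec load would overwhelm the single survivor after a failure; adding a third node restores that margin.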

Active/Active clustering is one of the most reliable and scalable server configurations currently being utilized. Learn more by contacting Sanity Solutions – we’re happy to answer any questions you may have about Active/Active clustering and network solutions.