Tuesday, January 10, 2017

vSAN Availability Part 10 - Stretched Cluster Preferred and Secondary Fault Domains

A vSAN stretched cluster consists of exactly three sites or, more accurately, three fault domains: the Preferred fault domain, the Secondary fault domain, and the Witness fault domain. This short article explains the difference between the Preferred and Secondary fault domains and how that difference affects recovery depending on the type of failure.

The screenshot above shows one of the steps in configuring a vSAN stretched cluster. Two fault domains store data such as VM Home objects and virtual disks: the Preferred and Secondary fault domains. If a refresher is needed on vSAN objects and components, see vSAN Availability Part 2 - Storage Policies and Component Placement.

Currently, only the RAID-1 mirroring rule with Number of Failures to Tolerate = 1 is supported with a vSAN stretched cluster. That means one copy of the data is stored in the Preferred fault domain, one copy is stored in the Secondary fault domain, and the witness component is stored on the Witness host. This component distribution provides resilience against a disk failure, a host failure, and the loss of an entire site (fault domain).

Back to the original question: What is the difference between the Preferred and Secondary fault domains?

When network connectivity between the Preferred and Secondary fault domains is lost, vSAN is no longer able to write data to both copies of an object. Keep in mind that vSAN data is not accessible if vSAN is unable to achieve quorum (access to more than 50% of the components that make up an object). Either the Preferred or the Secondary fault domain must combine with the Witness host (the third fault domain) to achieve quorum.
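The quorum rule described above can be illustrated with a minimal Python sketch. It assumes a simplified model in which an object has exactly three components (one replica per data site plus the witness component) and each component counts equally; the names `COMPONENTS`, `has_quorum`, and `accessible_from` are hypothetical and are not part of any vSAN interface.

```python
# Simplified model (assumption): one mirrored object with three equally
# weighted components, one per fault domain.
COMPONENTS = frozenset({"preferred", "secondary", "witness"})


def has_quorum(reachable_count, total_count):
    """True when strictly more than 50% of the components are reachable."""
    return 2 * reachable_count > total_count


def accessible_from(partition):
    """Check whether the fault domains in `partition` can reach quorum."""
    reachable = COMPONENTS & set(partition)
    return has_quorum(len(reachable), len(COMPONENTS))


# A data site alone holds 1 of 3 components, which is not more than 50%.
print(accessible_from({"secondary"}))             # False
# A data site plus the Witness host holds 2 of 3 components.
print(accessible_from({"preferred", "witness"}))  # True
```

In this model, neither data site can keep an object accessible on its own, which is why one of them has to pair with the Witness host.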

In this scenario, vSAN automatically selects the Preferred fault domain to form quorum with the Witness host. VMs running in the Preferred fault domain continue to run without interruption. vSphere HA powers off VMs running in the Secondary fault domain and restarts them in the Preferred fault domain. This preserves data integrity with minimal downtime.
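As a rough illustration of the behavior just described, the following Python sketch maps each VM to its outcome when the link between the two data sites fails while the Witness host remains reachable. It is only a model of the described outcome, not of how vSAN or vSphere HA is implemented; the function name, the VM names, and the outcome strings are all hypothetical.

```python
def resolve_inter_site_partition(vm_sites):
    """Return {vm: (site_after_failure, action)} for a loss of connectivity
    between the Preferred and Secondary fault domains.

    Assumption: the Witness host is still reachable, so the Preferred
    fault domain forms quorum with it and all VMs end up running there.
    """
    outcome = {}
    for vm, site in vm_sites.items():
        if site == "preferred":
            # Already on the winning side: no interruption.
            outcome[vm] = ("preferred", "keeps running")
        elif site == "secondary":
            # Powered off and restarted on the winning side.
            outcome[vm] = ("preferred", "restarted by vSphere HA")
        else:
            raise ValueError(f"unknown data site for {vm}: {site}")
    return outcome


# Hypothetical VM placement before the failure.
before = {"app01": "preferred", "db01": "secondary"}
for vm, (site, action) in resolve_inter_site_partition(before).items():
    print(f"{vm}: {action} in the {site} fault domain")
```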

When network connectivity is restored between the Preferred and Secondary fault domains, changes at the Preferred site are synchronized to the Secondary site and operations return to normal (writes occurring synchronously to both sites). vSphere DRS affinity rules can be used to automatically migrate specific VMs back to the Secondary site.

As you can see, the concept of the Preferred and Secondary fault domains for vSAN stretched clusters is fairly simple: vSAN "prefers" running all of the VMs in the Preferred fault domain when there is a loss of network connectivity between the two data sites/fault domains. In part 11 of this series, we examine more failure scenarios and how vSAN responds to them.

