Wednesday, September 14, 2016

Virtual SAN Availability Part 3 - Network Partitions

VSAN Utilizes the Network

VSAN consists of two or more physical hosts, typically connected by 10GbE networking. 1GbE is supported with hybrid VSAN configurations, but 10GbE is recommended. 10GbE is required for all-flash VSAN clusters. These network connections are required to replicate data across hosts for redundancy and to share metadata updates such as the location of an object's components.

As with any storage fabric, redundant connections are strongly recommended. The VMware Virtual SAN 6.2 Network Design Guide provides more details on network configurations and recommendations. Considering VSAN's dependence on the network, questions often come up about what happens if one or more hosts lose network connectivity with other hosts in the cluster. This article aims to address those questions.

Every host in the cluster must have a vmkernel adapter with the Virtual SAN traffic service enabled, as shown in the screen shot below.

The VSAN health check service verifies that a vmkernel adapter with the Virtual SAN traffic service enabled is present on each host. The health check service also verifies a number of other network configurations and connectivity, as shown below.

Network Partition Groups

A module of VSAN called the Cluster Monitoring, Membership, and Directory Services (CMMDS) is responsible for monitoring links to the cluster. A master node (host) and a backup node are elected in a VSAN cluster. The master node is responsible for dispatching an update provided by one node to all other nodes in the cluster. In a healthy cluster where all nodes are communicating over the VSAN network, there is only one master node and one backup node. All of the nodes in this healthy cluster are members of the same network partition group. Normally, there is only one network partition group, named "Group 1", as seen in this screen shot.

If both the master and backup nodes go offline, CMMDS participates in electing a new master node and backup node. In cases where one or more hosts become isolated from each other, VSAN attempts to form a VSAN cluster and elect a master in each partition. In other words, it is possible to have more than one network partition group.
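Conceptually, the network partition groups reported by VSAN correspond to the connected components of the host-to-host connectivity graph: hosts that can still reach each other over the VSAN network end up in the same group. Here is a minimal Python sketch of that idea; the hostnames and links are hypothetical, not actual VSAN output.

```python
from collections import defaultdict

def partition_groups(hosts, links):
    """Group hosts into network partition groups: each group is a
    connected component of the host-to-host connectivity graph."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    groups, seen = [], set()
    for host in hosts:
        if host in seen:
            continue
        # Walk the graph to collect every host reachable from this one.
        group, frontier = set(), [host]
        while frontier:
            h = frontier.pop()
            if h in group:
                continue
            group.add(h)
            frontier.extend(adjacency[h])
        seen |= group
        groups.append(sorted(group))
    return groups

hosts = ["esxi-01", "esxi-02", "esxi-03", "esxi-04"]
# esxi-04 has lost VSAN network connectivity to the other three hosts.
links = [("esxi-01", "esxi-02"), ("esxi-02", "esxi-03"), ("esxi-01", "esxi-03")]
print(partition_groups(hosts, links))
# Group 1: esxi-01/02/03 together; Group 2: esxi-04 on its own
```

In a healthy cluster, every host is linked to every other host and the function returns a single group, matching the single "Group 1" described above.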

VSAN and vSphere High Availability (HA) Integration

vSphere HA required some enhancements to work well with VSAN. Perhaps the most significant change is that HA now uses the same network connections as VSAN when VSAN is enabled. This helps ensure that vSphere HA and VSAN see the same network partition(s) in a cluster. Having a consistent view of network partitions across VSAN and HA enables accurate and reliable responses to host isolation and multiple network partitions.

There are a number of recommendations to consider when configuring vSphere HA for a VSAN cluster. This blog article, VMware Virtual SAN & vSphere HA Recommendations, discusses the recommendations in detail. While the article covers vSphere 5.5 and VSAN 5.5, the recommendations also apply to vSphere 6.0 and VSAN 6.x.

Let's take a look at a couple of scenarios involving the loss of network connectivity.

Single Host is Isolated

Remember that an object is available if more than 50% of its components (more specifically, votes) are active and accessible. There is a minimum of three components when Number of Failures to Tolerate (FTT) = 1: two replicas, which contain mirror copies of the data, and a witness, which acts as a tie breaker when the network is partitioned.
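The greater-than-50% rule can be expressed directly. Below is a hypothetical sketch of the availability check, assuming each component carries a single vote (VSAN can assign extra votes internally, but one vote per component is the common case for FTT=1 with RAID-1).

```python
def object_available(total_votes, accessible_votes):
    """An object is available only when strictly more than 50% of its
    votes are active and accessible."""
    return accessible_votes * 2 > total_votes

# FTT=1, RAID-1: two replicas plus one witness, one vote each.
print(object_available(3, 2))  # two of three components reachable -> True
print(object_available(3, 1))  # only one component reachable -> False
```

Note that exactly 50% is not enough: with an even split, neither side of a partition can claim the object, which is precisely why the witness tie breaker exists.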

As we saw in the screen shot above, a second network partition group (Group 2) is created when a host becomes isolated from the rest of the cluster. Components are distributed across hosts for availability. In the case of FTT=1 and RAID-1 mirroring, a minimum of three components are present. Each one resides on a different host. That means the isolated host has access to, at most, one of the three components that make up an object.

There is no way for this host by itself to achieve quorum for any of the objects. A VM can run on the isolated host, but it will not be able to read or write to storage as it does not have access to its objects (VM Home, VMDKs, etc.).

To resolve this issue, vSphere HA powers off running VMs on the isolated host and attempts to restart them on other hosts in the cluster. These other hosts in network partition group 1 have access to enough components (more than 50%) to enable access to the objects. The VMs are restarted successfully even though one of the hosts is offline. Net result: no data loss, and downtime for the VMs on the affected host is minimized.

All Hosts Are Disconnected

A loss of network connectivity between all of the hosts in a VSAN cluster is more troublesome. Each host in the cluster effectively forms its own single-node cluster. In other words, there are multiple network partition groups. This screen shot shows four network partition groups as the VSAN network on all hosts in this 4-node cluster is disconnected.

Each host still has access to the data on its local drives. The challenge in this scenario is that components are commonly distributed across multiple hosts. Therefore, it is not possible for any of the disconnected hosts to have access to greater than 50% of the components. Even if one of the components that make up a VM's object is located on the same host where the VM is running, that is still access to less than 50%. In other words, any VM where FTT=1 or higher will not be able to read from or write to any of its objects.
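Putting the two ideas together: given a component placement, no single-host partition can reach a majority of an FTT=1 object's three components. The sketch below walks through that arithmetic; the placement and hostnames are hypothetical, chosen only to illustrate the vote counting.

```python
def accessible_in_partition(component_hosts, partition):
    """Count an object's components reachable from within one partition,
    then apply the >50% vote rule (one vote per component assumed)."""
    reachable = sum(1 for host in component_hosts if host in partition)
    return reachable * 2 > len(component_hosts)

# FTT=1 object: replica on esxi-01, replica on esxi-02, witness on esxi-03.
placement = ["esxi-01", "esxi-02", "esxi-03"]

# Every host fully isolated: four single-host partitions.
partitions = [{"esxi-01"}, {"esxi-02"}, {"esxi-03"}, {"esxi-04"}]
print([accessible_in_partition(placement, p) for p in partitions])
# Each partition sees at most one of three components -> all False
```

Contrast this with the single-host isolation scenario above: there, the surviving three-host partition still reaches two of the three components, so the object stays available on that side.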

The VMs might continue to run from a compute and memory standpoint, but they will not be able to write to storage. VMs in this state are sometimes referred to as "zombie" VMs - not to be confused with "monster" VMs :) - and they often end up crashing. The good news is data is not lost and access to objects will be restored when network connectivity is restored. Any "zombie" VMs might have to be reset to resume normal operations.

A corner-case scenario in which a VM might survive the disconnection of all hosts is when the VM has a policy assigned with FTT=0. The component(s) that make up the object could be located on the same host where the VM is running. However, it is more likely that the components will be located on other hosts - even in a small, 3-node cluster. This likelihood increases as the number of hosts in the cluster increases. Therefore, assigning a policy with FTT=0 is not an effective measure to counteract the effects of multiple host disconnections.


We have already covered most of these items, but I thought it made sense to summarize...

1. Design your VSAN network to be resilient just like any other storage fabric.

2. Consult the VMware Virtual SAN 6.2 Network Design Guide.

3. Read the VMware Virtual SAN & vSphere HA Recommendations blog article and Using vSphere HA with Virtual SAN documentation.

In Part 4 of this series, we look at some component states and discuss the VSAN rebuild timer.

