Thursday, February 16, 2017

vSAN Availability Part 12 - Data Protection and Disaster Recovery

This is the final article in my vSAN Availability blog series. To wrap things up, I thought it made sense to briefly discuss data protection, replication, and disaster recovery. I say "briefly" because it would be nearly impossible to cover all of the solutions and options available from VMware and its massive ecosystem of data protection and disaster recovery partners. I will mention a few solutions and product features in this article - again, this is not a comprehensive list. We will start with data protection, i.e., backup and restore, move on to replication, and finish with disaster recovery.

Data Protection

VMware vSphere includes vSphere APIs for Data Protection (VADP), which provides a framework and mechanisms such as Changed Block Tracking (CBT) to data protection vendors including Dell EMC, Veeam, Symantec, Commvault, and IBM to name just a few. These vendors utilize the APIs and redistribute code provided by VMware to enable efficient backup and recovery of virtual machines. The use of VADP also helps ensure compatibility with a number of features such as vMotion and various vSphere storage types including vSAN. In other words, the majority of backup solutions will be able to back up VMs on vSAN the same as traditional VMFS and NFS datastores.
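To make the idea behind Changed Block Tracking a bit more concrete, here is a minimal Python sketch. It is a toy model and not the VADP API: the `TrackedDisk` class, the `BLOCK_SIZE` value, and the method names are all made up for illustration. It only shows the general principle that a backup product can ask "which blocks changed since the last backup?" and copy just those blocks.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the example is readable


@dataclass
class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last backup."""

    data: bytearray
    changed: set[int] = field(default_factory=set)

    def write(self, offset: int, payload: bytes) -> None:
        """Write payload at offset and record every block the write touches."""
        if not payload:
            return
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write falls outside the disk")
        self.data[offset:offset + len(payload)] = payload
        first = offset // BLOCK_SIZE
        last = (offset + len(payload) - 1) // BLOCK_SIZE
        self.changed.update(range(first, last + 1))

    def incremental_backup(self) -> dict[int, bytes]:
        """Return only the changed blocks, then reset the tracking set."""
        blocks = {
            i: bytes(self.data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
            for i in sorted(self.changed)
        }
        self.changed.clear()
        return blocks


disk = TrackedDisk(bytearray(16))       # 16-byte disk = 4 blocks
disk.write(5, b"abc")                   # touches block 1 only
first_backup = disk.incremental_backup()
disk.write(3, b"xy")                    # straddles blocks 0 and 1
second_backup = disk.incremental_backup()
```

In this toy model the first incremental backup copies one block out of four and the second copies two. How a real backup product queries and consumes changed-block information depends on the vendor's implementation.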

An important thing to point out here is that the backup vendors provide the products. VMware simply provides the APIs (to the backup vendors). Each vendor utilizes the APIs in different ways. Therefore, it is up to the vendors to test/certify their solution(s) and provide support statements to their customers. The reason I point this out is I often get asked "What backup products does VMware support?" The answer to that is none - the backup vendors provide support for their products.

A number of vendors support backing up VMs residing on vSAN datastores. Be sure to check with your backup vendor to determine version requirements and understand best practices for deployment and operations.

One more item I'll mention while on the topic of data protection: snapshots. While the use of traditional VM snapshots has provided significant benefits for backup and recovery, there are also a few challenges with this approach. These snapshots use redo logs to store changes while backups and, in some cases, restores are being performed. I am sure a few of you reading this article have experienced stability and/or performance issues with traditional VM snapshots. I will not dig into that here as it has been covered numerous times in many other publications. What I am getting to is that vSAN 6.0 and higher versions use a new type of snapshot called vsanSparse. This snapshot type uses a copy-on-write (COW) mechanism, which provides considerable improvements over redo log-based snapshots. If you want to read more about vsanSparse snapshots, see the vsanSparse Tech Note. Long story short, one more good reason to use vSAN is improved backup and recovery.

Replication

There are a number of host-based replication solutions that can be used to replicate VMs such as vSphere Replication and Dell EMC RecoverPoint for VMs. vSphere Replication is an asynchronous replication feature that is included with vSphere Essentials Plus Kit and higher editions of vSphere. Since these solutions are host-based (as opposed to array-based - some differences documented here), they can replicate VMs residing on just about any type of vSphere storage including vSAN.

Similar to the use of VADP for backup solutions, VMware provides vSphere APIs for IO Filtering (VAIO), which enables solutions such as Dell EMC RecoverPoint for VMs to provide efficient replication and very low RPOs with little or no impact to the protected VMs. For more details on VAIO, see vSphere APIs for IO Filtering (blog article) and vSphere APIs for I/O Filtering (VAIO) Program on the VMware Code site.

Since vSphere Replication is included with most editions of vSphere at no additional charge, I'll spend a few moments on this feature. There are basically two components: The replication "transmitter", which is built into vSphere, and the replication "receiver", which is a virtual appliance. When replication is configured for a VM, vSphere Replication creates a copy of the files that make up the VM at the target location. After the initial full sync, only changes to the VM are replicated.

The frequency of replication is based on the RPO configured for the replicated VM. For example, if the RPO is set to four hours, vSphere Replication will replicate changed data approximately every four hours to help ensure the replica data at the target site is never more than four hours old. This video shows how easy it is to configure replication for a VM. Notice that vSphere Replication is compatible with and "aware of" vSAN storage policies.
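The RPO arithmetic in the four-hour example can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how vSphere Replication schedules its work internally: the function names are hypothetical, and the model assumes that a replica reflects the VM as of the moment its sync started and that the transfer time is known in advance.

```python
from datetime import datetime, timedelta


def replica_age(last_sync_point: datetime, now: datetime) -> timedelta:
    """How old the data at the target site is at time `now`."""
    return now - last_sync_point


def rpo_violated(last_sync_point: datetime, now: datetime, rpo: timedelta) -> bool:
    """True when the replica is older than the configured RPO."""
    return replica_age(last_sync_point, now) > rpo


def latest_sync_start(
    last_sync_point: datetime, rpo: timedelta, expected_transfer: timedelta
) -> datetime:
    """Latest time the next sync can start and still finish inside the RPO.

    Assumption: the next sync must complete before the current replica
    turns `rpo` old, so the transfer time is subtracted from the deadline.
    """
    return last_sync_point + rpo - expected_transfer


rpo = timedelta(hours=4)
last_sync_point = datetime(2017, 2, 16, 8, 0)

deadline = latest_sync_start(last_sync_point, rpo, timedelta(minutes=30))
ok_at_11 = rpo_violated(last_sync_point, datetime(2017, 2, 16, 11, 0), rpo)
late_at_1230 = rpo_violated(last_sync_point, datetime(2017, 2, 16, 12, 30), rpo)
```

Under these assumptions, a replica taken at 08:00 with a four-hour RPO and a 30-minute transfer means the next sync needs to start by 11:30. This is one reason replication happens "approximately" every four hours rather than on a fixed timer.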

Recovering a VM with vSphere Replication is also quite simple - it is a matter of just a few mouse clicks. However, only one VM can be recovered at a time. This is fine if you are only recovering a few VMs, but it becomes a bit more cumbersome when recovering dozens or hundreds of VMs. The good news is vSAN and vSphere Replication are compatible with VMware Site Recovery Manager (SRM), which automates the process of recovering larger numbers of replicated VMs. The mention of SRM is a perfect segue into our next topic, disaster recovery.

Disaster Recovery

The keys to a good disaster recovery (DR) solution are speed and reliability. Naturally, organizations want to minimize downtime in the event of a disaster by using tools that provide the fastest recovery times. Automation is the main ingredient that facilitates speed of recovery. Reliability of a DR plan is just as important as speed. Frequent testing is the best way to help ensure reliability - especially in IT environments where change is constant.

SRM delivers on both speed through automation and reliability by enabling frequent, non-disruptive DR plan testing. SRM is tightly integrated with vSphere Replication. Multiple groups of protected VMs can be configured and included in one or more recovery plans giving organizations the flexibility needed to satisfy varying DR requirements.
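The relationship between groups of protected VMs and recovery plans can be pictured with a small Python data model. To be clear, this is not the SRM API or its object model: the `ProtectionGroup` and `RecoveryPlan` classes, the VM names, and the ordering rule are assumptions made purely to illustrate that one group can be reused in more than one plan.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionGroup:
    """A named set of replicated VMs that are recovered together."""

    name: str
    vms: tuple[str, ...]


@dataclass(frozen=True)
class RecoveryPlan:
    """An ordered collection of protection groups."""

    name: str
    groups: tuple[ProtectionGroup, ...]

    def vms_to_recover(self) -> list[str]:
        """List each VM once, in the order the groups appear in the plan."""
        seen: set[str] = set()
        ordered: list[str] = []
        for group in self.groups:
            for vm in group.vms:
                if vm not in seen:
                    seen.add(vm)
                    ordered.append(vm)
        return ordered


web = ProtectionGroup("web-tier", ("web01", "web02"))
db = ProtectionGroup("db-tier", ("db01",))

# The same db group is part of two different plans.
full_site = RecoveryPlan("full-site", (db, web))
db_only = RecoveryPlan("db-only", (db,))
```

Here the hypothetical `full-site` plan recovers the database VM before the web VMs, while `db-only` covers a narrower scenario using the same group. That is the kind of flexibility the paragraph above describes.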

Another important consideration is the speed of the storage at the DR site. This is especially true when recovering larger numbers of VMs, which is effectively a "boot storm". All-flash vSAN configurations are ideal for this use case as shown in this video: Recover 1000 VMs in 26 mins with SRM & VR on vSAN

I have been asked if it is possible to utilize a vSAN stretched cluster and SRM together and the answer is yes. The vSAN stretched cluster would provide protection for two sites located relatively close together while SRM with vSphere Replication enables disaster recovery in scenarios where both sides of the stretched cluster are impacted by an event such as a large hurricane or power grid failure. The diagram below provides a simple, high-level view of what this architecture might look like.


vSAN, vSphere Replication, and SRM provide an integrated solution for ensuring the highest levels of availability. As we have seen throughout this blog series, these products work together to provide resiliency against disk failure, host failure, and loss of a server rack with vSAN fault domains. vSAN stretched clusters with vSphere HA enable disaster avoidance with no downtime and rapid recovery from an unplanned site outage. Layering on the proper data protection, replication, and disaster recovery solutions creates an environment that can recover from nearly any kind of disruptive scenario in a rapid and reliable manner.

Hopefully, this blog series has provided plenty of insight on how vSAN achieves outstanding resiliency in a wide variety of scenarios and rapid recovery from unplanned downtime. As you might expect, products and features will continue to evolve to further eliminate and minimize downtime. Follow me on Twitter - @jhuntervmware - to hear about these and other topics related to virtualized storage and availability.
