Monday, December 5, 2016

vSAN Availability Part 9 - Configuring a Stretched Cluster

A vSAN stretched cluster configuration provides a simple solution for extending a vSAN cluster across geographically dispersed locations. These locations could be opposite sides of a data center with separate power feeds or two different cities. Stretched clusters enable rapid recovery from site failure with no data loss. They also provide an excellent option for migrating workloads between locations with zero downtime if maintenance at one site or the other is needed. For more of an introduction to vSAN stretched clusters, take a look at the previous article (Part 8) in this blog series.

This article covers the simplicity of configuring a vSAN stretched cluster, complete with a video demo. Before we get to the video, let's take a moment to cover the main prerequisites that must be in place prior to configuring the stretched cluster. For starters, note that stretched clusters are supported with vSAN versions 6.1 and higher, and that vSAN Enterprise licensing is required for this feature.
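
The two prerequisites named above (vSAN 6.1 or higher, Enterprise licensing) can be expressed as a small pre-flight check. This is only a sketch: the function name, the version-string format, and the license label are assumptions made for illustration and are not part of any vSAN or vSphere API.

```python
def meets_stretched_cluster_prereqs(vsan_version: str, license_edition: str) -> bool:
    """Return True if the version is 6.1 or higher and the license is Enterprise.

    Both parameters are hypothetical inputs; in practice the values would come
    from whatever inventory tooling you already use.
    """
    # Compare versions numerically so that "6.10" sorts after "6.2".
    version = tuple(int(part) for part in vsan_version.split("."))
    return version >= (6, 1) and license_edition.strip().lower() == "enterprise"


print(meets_stretched_cluster_prereqs("6.0", "Enterprise"))  # False
print(meets_stretched_cluster_prereqs("6.2", "Enterprise"))  # True
print(meets_stretched_cluster_prereqs("6.2", "Standard"))    # False
```

Parsing the version into a tuple of integers avoids the string-comparison trap where "6.10" would sort before "6.2".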

Monday, November 7, 2016

vSphere Replication Target Storage Consumption

How Much Space Will Be Consumed?

vSphere Replication is an asynchronous, host-based replication feature that is included with vSphere Essentials Plus Kit and higher editions. It can be used as a standalone solution for simple, storage-agnostic, cost-effective virtual machine replication. vSphere Replication also serves as a replication component for VMware Site Recovery Manager (SRM) and VMware vCloud Air Disaster Recovery. When replication is configured for a powered-on virtual machine, vSphere Replication starts replicating the files that make up the virtual machine from the source location to the target location. A question that comes up sometimes is “How much storage will be consumed by the virtual machine at the target location?” As with many questions like this, the short answer is “It depends.” 🙂

Friday, November 4, 2016

vSAN Availability Part 8 - Intro to vSAN Stretched Clusters

First, Virtual SAN's New Name: vSAN

The previous seven blog articles are titled "Virtual SAN Availability..." and this article starts with "vSAN". Why the name change, you ask? Duncan Epping discusses it in this blog article: Virtual SAN >> vSAN, and grown to 5500 customers.

Now that the name change is out of the way, let's get back to the topic of vSAN availability. Earlier posts discussed availability within a single site. The next few articles will cover resiliency across sites using a vSAN stretched cluster. We will begin with a short introduction to this vSAN feature.

Monday, October 10, 2016

Virtual SAN Availability Part 7 - Degraded Disk Handling (DDH)

Degraded Disk Handling (DDH)

While this blog series focuses on availability, performance is certainly worth mentioning. In many cases, a poorly performing application or platform can be the equivalent of offline. For example, excessive latency (network, disk, etc.) can cause a database query to take much longer than normal. If an end-user expects query results in 30 seconds and suddenly it takes 10 minutes, it is likely the end-user will stop using the application and report the issue to IT, which is the same result as the database being offline altogether.

A cache or capacity device that is constantly producing errors and/or high latencies can have a similar negative effect on a Virtual SAN (VSAN) cluster. This can impact multiple workloads in the cluster. Prior to VSAN 6.1, a badly behaving disk caused issues in a handful of cases, which led to another VSAN availability feature. It is commonly called Dying Disk Handling, Degraded Disk Handling, or simply "DDH".

Virtual SAN (VSAN) 6.1 and newer versions monitor cache and capacity devices for issues such as excessive latency and errors. These symptoms can be indicative of an imminent drive failure. Monitoring these conditions enables VSAN to be proactive in correcting conditions such as excessive latencies, which negatively affect performance and availability. Depending on the version of VSAN you are running, you might see varying responses to disks that are behaving badly.
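
To make the idea of watching a device for excessive latency and errors concrete, here is a minimal sketch of that kind of check. It is not how VSAN implements DDH: the data structure, the function name, and both thresholds are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DeviceSample:
    """One observation of a cache or capacity device (hypothetical structure)."""
    latency_ms: float
    io_errors: int


# Hypothetical thresholds, chosen only for illustration.
LATENCY_LIMIT_MS = 50.0
ERROR_LIMIT = 5


def looks_degraded(samples: list[DeviceSample]) -> bool:
    """Flag a device whose average latency or total error count exceeds a limit."""
    if not samples:
        return False  # No data, so nothing to flag.
    avg_latency = mean(s.latency_ms for s in samples)
    total_errors = sum(s.io_errors for s in samples)
    return avg_latency > LATENCY_LIMIT_MS or total_errors > ERROR_LIMIT


healthy = [DeviceSample(2.0, 0), DeviceSample(3.0, 0)]
slow = [DeviceSample(80.0, 0), DeviceSample(120.0, 0)]
print(looks_degraded(healthy))  # False
print(looks_degraded(slow))     # True
```

Averaging over several samples rather than reacting to a single slow I/O is one simple way to avoid flagging a device for a momentary spike.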

Friday, September 30, 2016

Virtual SAN Availability Part 6 - Maintenance Mode

Planned Downtime

The last few articles in this series focused on unplanned downtime. While we still have more to cover there, let's briefly shift focus to planned downtime. The primary example of planned downtime is host maintenance. There are a number of reasons a vSphere host might need to be taken offline such as firmware updates, storage device replacement, and software patches.

vSphere has a feature designed specifically for these types of activities. It is called "maintenance mode". When a host is put into maintenance mode, vSphere automatically evacuates the running virtual machines to other hosts in the cluster. This is done with vMotion so that virtual machine downtime is not incurred. This can take a few minutes or longer, depending on factors such as the number of virtual machines that must be migrated and the speed of the vMotion network. Once all of the virtual machines have been evacuated, the host enters maintenance mode and work on that host can begin.

Virtual SAN introduces another consideration, which is the utilization of local storage devices inside of each host. These devices contain components that make up Virtual SAN objects. Shutting down or rebooting a host naturally makes these components inaccessible until the host is back online. Let's take a closer look at how vSphere's maintenance mode has been enhanced for Virtual SAN clusters.

Thursday, September 22, 2016

Virtual SAN Availability Part 5 - Fault Domains

Fault Domains

"Fault domain" is a term that comes up fairly often in availability discussions. In IT, a fault domain usually refers to a group of servers, storage, and/or networking components that would be impacted collectively by an outage. A very common example of this is a server rack. If a top-of-rack (TOR) switch or the power distribution unit (PDU) for a server rack were to fail, it would take all of the servers in that rack offline even though the servers themselves are functioning properly. That server rack is considered a fault domain.

Virtual SAN (VSAN) includes a feature called "Rack Awareness", which enables an administrator to configure fault domains in the context of a Virtual SAN cluster. Before we get into the details of this feature, let's briefly revisit the default behavior of VSAN.
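
The server-rack example of a fault domain can be sketched in a few lines. The host names, rack names, and function names below are hypothetical, and this models only the general idea of "everything in the rack goes offline together", not the Rack Awareness feature itself.

```python
from collections import defaultdict

# Hypothetical inventory: host name -> rack it is installed in.
HOST_TO_RACK = {
    "esxi-01": "rack-a",
    "esxi-02": "rack-a",
    "esxi-03": "rack-b",
    "esxi-04": "rack-b",
    "esxi-05": "rack-c",
}


def group_by_fault_domain(host_to_rack):
    """Group hosts by rack; each rack is treated as one fault domain."""
    domains = defaultdict(list)
    for host, rack in host_to_rack.items():
        domains[rack].append(host)
    return {rack: sorted(hosts) for rack, hosts in domains.items()}


def hosts_lost_if_rack_fails(host_to_rack, failed_rack):
    """Return every host taken offline when a rack's TOR switch or PDU fails."""
    return group_by_fault_domain(host_to_rack).get(failed_rack, [])


print(hosts_lost_if_rack_fails(HOST_TO_RACK, "rack-a"))  # ['esxi-01', 'esxi-02']
```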

Monday, September 19, 2016

Virtual SAN Availability Part 4 - Component States

VSAN Component States

VSAN components can be found in a few different states. The most common state is Active, which means the component is accessible and is up to date. Below we see two components that are Active.

Another fairly common component state is Reconfiguring. This state is observed when a change to a storage policy is made or a new storage policy is assigned to an object. For example, it appears when the Failure Tolerance Method is changed from RAID-1 mirroring to RAID-5/6 erasure coding on an all-flash VSAN cluster. The screenshot below shows a component in the Reconfiguring state.

There are other component states related to availability that are observed when a disk or host is offline. Let's take a closer look at these states.