Degraded Disk Handling (DDH)
While this blog series focuses on availability, performance is certainly worth mentioning. In many cases, a poorly performing application or platform can be the equivalent of one that is offline. For example, excessive latency (network, disk, etc.) can cause a database query to take much longer than normal. If an end user expects query results in 30 seconds and they suddenly take 10 minutes, that user will likely stop using the application and report the issue to IT, which is the same result as the database being offline altogether.
A cache or capacity device that constantly produces errors or high latencies can have a similar negative effect on a Virtual SAN (VSAN) cluster, and the impact can extend to multiple workloads in the cluster. Prior to VSAN 6.1, a badly behaving disk caused issues in a handful of cases, which led to another VSAN availability feature. It is commonly called Dying Disk Handling, Degraded Disk Handling, or simply "DDH".
VSAN 6.1 and newer versions monitor cache and capacity devices for issues such as excessive latency and errors. These symptoms can indicate an imminent drive failure. Monitoring them enables VSAN to proactively correct conditions, such as excessive latency, that negatively affect performance and availability. Depending on the version of VSAN you are running, you might see varying responses to disks that are behaving badly.
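To make the idea of monitoring for sustained latency and errors more concrete, here is a minimal Python sketch. It is not VSAN's actual detection logic: the class name `DeviceMonitor`, the 50 ms threshold, the six-sample window, and the "four bad samples" rule are all assumptions chosen purely for illustration. The point is only that a device should be flagged when it misbehaves persistently, not after a single spike.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DeviceMonitor:
    """Hypothetical per-device health tracker (illustrative only, not VSAN code)."""

    latency_threshold_ms: float = 50.0  # assumed threshold for "excessive" latency
    window: int = 6                     # assumed number of recent samples to keep
    min_bad_samples: int = 4            # assumed bad samples needed to flag the device
    samples: deque = field(default_factory=deque)

    def record(self, avg_latency_ms: float, io_errors: int = 0) -> bool:
        """Store one sampling interval and return True if the device looks degraded."""
        # An interval counts as bad if latency is too high or any I/O error occurred.
        is_bad = avg_latency_ms > self.latency_threshold_ms or io_errors > 0
        self.samples.append(is_bad)
        # Drop the oldest sample once the window is full.
        if len(self.samples) > self.window:
            self.samples.popleft()
        return self.is_degraded()

    def is_degraded(self) -> bool:
        """A device is degraded when enough recent intervals were bad."""
        return sum(self.samples) >= self.min_bad_samples
```

With these assumed values, one slow interval does not flag the device, but four bad intervals within the last six do. A run of healthy intervals afterward clears the flag as the bad samples age out of the window.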