Windows failover clustering. Now, AZ‑801 covers a lot of failover clustering, so this is a Windows feature you really need to be sharp on. I dare say that if you go into AZ‑801 without a clear understanding of failover clustering, you probably won't pass the exam; it's that big a topic.

A cluster, in general, is a group of independent hosts, basically computers. These can be physical computers or virtual machines, and together they provide high availability and scalability to clustered roles. The bottom line is that you might have a SQL Server database, file shares, or Hyper‑V virtual machines that run services and offer data for which you may have service‑level agreements, so it's really important that you keep those services available. That's high availability. And normally, when it comes to high availability, we're talking about redundancy, which again is a core principle of a failover cluster: the more nodes you have in the cluster, the more redundancy you have.

Now, there are some Microsoft‑specific twists to the subject of failover clustering. One is the Cluster Shared Volume, which you'll know a lot about by the end of this course and certainly by the end of the skill path.
A Cluster Shared Volume is a way to support two specific types of clustered roles. There's the Scale‑Out File Server, which, as the name suggests, provides highly available file shares. And then there are highly available Hyper‑V VMs. These are definitely the marquee clustered roles that most Microsoft customers use with failover clustering. I would actually add SQL Server, but you're not going to see SQL Server on your AZ‑801 exam. All of these function excellently, if that's a legitimate term, when you're using commodity storage, Storage Spaces Direct, and Cluster Shared Volumes. The idea is that you can do your clustering end to end with your commodity physical servers or VMs plus the Windows Server Failover Clustering and Storage Spaces Direct features. You don't necessarily need a dedicated SAN, or storage area network, for your shared storage layer.

We'll also learn in this course about Cluster‑Aware Updating, which answers the question: how do we maintain the integrity and availability of our failover cluster when we have to patch the servers from time to time, particularly with Microsoft‑provided updates?
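Cluster‑Aware Updating rests on the idea of patching one node at a time so the clustered roles keep running on the other nodes. The Python sketch below is a simplified model of that rolling pattern, not CAU's actual implementation: the `Node` class, `drain`, and `rolling_update` are hypothetical names, the update itself is a caller‑supplied callback, and failback after a node resumes isn't modeled. The drain step plays roughly the part of pausing a node with `Suspend-ClusterNode -Drain`.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    # Hypothetical model of a cluster node, for illustration only.
    name: str
    online: bool = True
    patched: bool = False
    roles: List[str] = field(default_factory=list)


def drain(node: Node, peers: List[Node]) -> None:
    """Move every role off `node` onto the least-loaded online peer."""
    for role in list(node.roles):
        target = min((p for p in peers if p.online), key=lambda p: len(p.roles))
        target.roles.append(role)
        node.roles.remove(role)


def rolling_update(nodes: List[Node], apply_update: Callable[[Node], None]) -> List[str]:
    """Patch nodes one at a time so the remaining nodes keep hosting the roles."""
    order: List[str] = []
    for node in nodes:
        peers = [n for n in nodes if n is not node]
        drain(node, peers)            # move roles away before taking the node down
        node.online = False           # node is out of service while it patches/reboots
        assert sum(not n.online for n in nodes) == 1, "only one node may be down"
        apply_update(node)
        node.patched = True
        node.online = True            # node rejoins and can host roles again
        order.append(node.name)
    return order
```

Because each node is drained before it goes offline, every role stays hosted somewhere in the cluster for the whole run; the roles simply end up on different nodes, since this sketch never fails them back.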
Guest clustering is a vocab term that refers to building a failover cluster whose nodes are Hyper‑V virtual machines. That's absolutely possible. Stretch clustering is another vocab term; it refers to taking a failover cluster and stretching it across multiple sites, which potentially gives your cluster multi‑site, even multi‑region, high availability.

Let's take a look at a couple of conceptual diagrams I've put together with Lucidchart, my favorite drawing tool. This one shows a Windows Server Failover Clustering topology. Something that all failover clusters, Microsoft or non‑Microsoft, have in common is shared storage. Hopefully it makes sense that the nodes need shared access to the storage layer, because that's where the data you're providing high availability for lives. Think of a file share with files, SQL Server databases, or virtual machines.

In this case, we have a number of nodes, as they're called; these are Windows Server machines. I'm dealing with Active Directory in this course, so I'm going to assume that all of these nodes are domain members. That's not even implied here; it's assumed.
69 00:03:44,510 --> 00:03:48,960 And the idea is that you can do active‑passive high availability. 70 00:03:48,960 --> 00:03:50,460 So in this example, 71 00:03:50,460 --> 00:03:55,540 Node2 is active for a particular virtual machine, that highly 72 00:03:55,540 --> 00:03:58,340 available VM is called a clustered role. 73 00:03:58,340 --> 00:04:04,140 The idea of failover is if Node2 goes down either expectedly or unexpectedly, 74 00:04:04,140 --> 00:04:07,470 we can either manually or have Windows Server take care of 75 00:04:07,470 --> 00:04:10,860 shifting or moving that role to another node. 76 00:04:10,860 --> 00:04:14,090 Node3 would have been passive for that VM, 77 00:04:14,090 --> 00:04:19,240 but after the failover completes it will become the active node for VM. 78 00:04:19,240 --> 00:04:22,150 And the idea is when Node2 comes back online, 79 00:04:22,150 --> 00:04:24,980 you can shift the clustered role, that's called failback, 80 00:04:24,980 --> 00:04:27,220 back to the original node, 81 00:04:27,220 --> 00:04:32,240 or you might keep it at Node3 and keep that as an active node. 82 00:04:32,240 --> 00:04:35,910 The idea of quorum, which we'll get into later in this course, 83 00:04:35,910 --> 00:04:40,660 deals with what happens when one or more cluster nodes becomes unavailable. 84 00:04:40,660 --> 00:04:40,820 See, 85 00:04:40,820 --> 00:04:43,900 they communicate, these Windows Servers communicate with each other 86 00:04:43,900 --> 00:04:46,460 using what are called cluster heartbeat messages. 87 00:04:46,460 --> 00:04:50,540 TCP/UDP port 3343 is standard. 88 00:04:50,540 --> 00:04:54,090 And the idea is if a node stops issuing heartbeats, 89 00:04:54,090 --> 00:04:58,530 then the quorum needs to each vote to determine do we have majority, 90 00:04:58,530 --> 00:05:01,340 do we have a majority of our nodes here? 
Depending on whether you start with an even or odd number of nodes, you may need to bring in a witness. There are a few different types of witness, which we'll talk about, and a witness can serve as a tie‑breaking vote. So, bottom line, quorum determines how aggressive your cluster is in deciding whether it has a majority and whether it should shut down or stay available.

Now, VMs are kind of an exception among clustered roles, because you continue to access highly available, or clustered, VMs on their own IP addresses. But normally, as you can see with the Scale‑Out File Server, we've got a shared folder that's accessible on a DNS name and IP address. That's what's called a client access point, or a virtual IP address. This is convenient because we then have a single DNS name and IP address anchored to that clustered role. The cluster is responsible for resolving which node is active for those requests, but the requests themselves always go to the client access point. So your application, its connection string, or whatever issues the incoming service request only has to worry about reaching the client access point.
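The majority rule behind quorum, with a witness acting as a tie‑breaker, can be sketched as a small Python function. This is a simplified model that assumes every node and the witness carry one vote each and ignores vote adjustments such as dynamic quorum; `has_quorum` is a hypothetical name.

```python
def has_quorum(total_nodes: int, nodes_up: int,
               witness_configured: bool = False,
               witness_reachable: bool = False) -> bool:
    """Return True if the surviving votes form a strict majority.

    Simplified model: each node and the optional witness hold one vote,
    and vote adjustment such as dynamic quorum is ignored.
    """
    if not 0 <= nodes_up <= total_nodes:
        raise ValueError("nodes_up must be between 0 and total_nodes")
    total_votes = total_nodes + (1 if witness_configured else 0)
    votes_present = nodes_up + (1 if witness_configured and witness_reachable else 0)
    # Strict majority: more than half of all configured votes.
    return 2 * votes_present > total_votes
```

With four nodes split two and two, neither half has a majority on its own (2 of 4 votes). Adding a witness makes five votes in total, so whichever half can reach the witness holds 3 of 5 and keeps quorum, which is exactly the even‑node case where a witness earns its keep.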
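Here's a minimal Python sketch of why a client access point is convenient: the client only ever resolves one stable name, while the cluster tracks which node currently owns the address behind it. The DNS table, the `fs01.contoso.local` name, the IP address, and the `Cluster` class are all hypothetical; real name registration and ownership tracking in Windows Server are far more involved.

```python
from typing import Dict

# Hypothetical DNS record for the client access point (illustration only).
DNS: Dict[str, str] = {"fs01.contoso.local": "10.0.0.50"}


class Cluster:
    """Toy model: tracks which node currently owns each client access point IP."""

    def __init__(self) -> None:
        self.owner_by_ip: Dict[str, str] = {}

    def set_owner(self, ip: str, node: str) -> None:
        # Bringing the role online or failing it over only changes the owner;
        # the DNS name and IP address the clients use stay the same.
        self.owner_by_ip[ip] = node

    def serve(self, ip: str, request: str) -> str:
        return f"{self.owner_by_ip[ip]} handled {request!r}"


def client_request(cluster: Cluster, name: str, request: str) -> str:
    """The client knows only the access point's name, never the node names."""
    ip = DNS[name]
    return cluster.serve(ip, request)
```

If the role moves from Node2 to Node3 with `set_owner`, the same `client_request` call against `fs01.contoso.local` is simply served by Node3; nothing on the client side, such as a connection string, has to change.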