1
00:00:00,840 --> 00:00:03,240
Windows failover clustering.

2
00:00:03,240 --> 00:00:06,940
Now AZ‑801 has a lot of failover clustering,

3
00:00:06,940 --> 00:00:10,860
so this is a Windows feature that you really need to be sharp on.

4
00:00:10,860 --> 00:00:16,410
I dare say if you go into AZ‑801 not very clear on failover clustering,

5
00:00:16,410 --> 00:00:20,240
you probably won't clear the exam, it's that big of a topic.

6
00:00:20,240 --> 00:00:25,120
A cluster, in general, is a group of independent hosts, basically computers.

7
00:00:25,120 --> 00:00:29,270
These can be physical computers or virtual machines that provide high

8
00:00:29,270 --> 00:00:33,300
availability and scalability to clustered roles, though the bottom

9
00:00:33,300 --> 00:00:37,400
line is you have a SQL Server database, you could have file shares,

10
00:00:37,400 --> 00:00:41,390
you could have Hyper‑V virtual machines that are running services and

11
00:00:41,390 --> 00:00:44,510
offering data for which you may have service‑level agreements,

12
00:00:44,510 --> 00:00:48,490
so it's really important that you keep those services available,

13
00:00:48,490 --> 00:00:50,540
that's high availability.

14
00:00:50,540 --> 00:00:52,840
And normally when it comes to high availability,

15
00:00:52,840 --> 00:00:54,840
we're talking about redundancy,

16
00:00:54,840 --> 00:00:58,770
and that again is a core principle of a failover cluster.

17
00:00:58,770 --> 00:01:02,380
The more nodes you have in the cluster, the more redundancy you have.

18
00:01:02,380 --> 00:01:05,420
Now there are some Microsoft‑specific twists to the

19
00:01:05,420 --> 00:01:07,940
subject of failover clustering.

20
00:01:07,940 --> 00:01:09,810
One is the Cluster Shared Volume,

21
00:01:09,810 --> 00:01:12,890
which you'll know a lot about by the end of this course and

22
00:01:12,890 --> 00:01:14,970
certainly by the end of the skill path.

23
00:01:14,970 --> 00:01:18,310
A Cluster Shared Volume is a way to support two

24
00:01:18,310 --> 00:01:20,910
specific types of clustered roles.

25
00:01:20,910 --> 00:01:22,840
There's the Scale‑Out File Server,

26
00:01:22,840 --> 00:01:26,860
which from the name you might think means highly available file shares,

27
00:01:26,860 --> 00:01:27,800
and that's true.

28
00:01:27,800 --> 00:01:30,970
And then there's highly available Hyper‑V VMs,

29
00:01:30,970 --> 00:01:35,530
these are definitely the marquee clustered roles that most

30
00:01:35,530 --> 00:01:38,420
Microsoft customers use with failover clustering.

31
00:01:38,420 --> 00:01:40,660
I would add SQL Server actually,

32
00:01:40,660 --> 00:01:44,840
but you're not going to see SQL Server on your AZ‑801 exam.

33
00:01:44,840 --> 00:01:49,840
But these all function excellently, if that's a legitimate term,

34
00:01:49,840 --> 00:01:54,240
when you're using commodity storage, Storage Spaces Direct,

35
00:01:54,240 --> 00:01:56,150
and Cluster Shared Volumes.

36
00:01:56,150 --> 00:01:59,940
The idea is that you can do your clustering end‑to‑end with

37
00:01:59,940 --> 00:02:02,790
your commodity physical servers or VMs,

38
00:02:02,790 --> 00:02:07,040
and the Windows Server Failover Clustering and Storage Spaces Direct features.

39
00:02:07,040 --> 00:02:09,660
You don't necessarily have to have a dedicated SAN,

40
00:02:09,660 --> 00:02:13,220
or storage area network, for your shared storage layer.

41
00:02:13,220 --> 00:02:16,750
We also learn in this course about Cluster‑Aware Updating,

42
00:02:16,750 --> 00:02:18,440
which picks up the question,

43
00:02:18,440 --> 00:02:22,480
how do we maintain the integrity and availability of our failover cluster

44
00:02:22,480 --> 00:02:25,350
when we have to patch the servers from time to time,

45
00:02:25,350 --> 00:02:29,020
particularly with Microsoft‑provided updates? Guest

46
00:02:29,020 --> 00:02:32,670
clustering is a vocab term that refers to setting up a

47
00:02:32,670 --> 00:02:36,840
failover cluster virtually using Hyper‑V VMs.

48
00:02:36,840 --> 00:02:38,390
That's absolutely possible.

49
00:02:38,390 --> 00:02:42,980
And stretch clustering is another vocab term that refers to taking a

50
00:02:42,980 --> 00:02:46,510
failover cluster and stretching it across multiple sites,

51
00:02:46,510 --> 00:02:52,840
so this gives your cluster potentially multi‑region high availability.

52
00:02:52,840 --> 00:02:55,880
Let's take a look at a couple of conceptual diagrams

53
00:02:55,880 --> 00:02:59,160
I've put together with Lucidchart, my favorite drawing tool.

54
00:02:59,160 --> 00:03:03,040
This shows a topology of Windows Server Failover Clustering.

55
00:03:03,040 --> 00:03:06,880
Something that all failover clusters,

56
00:03:06,880 --> 00:03:09,240
whether they're Microsoft or non‑Microsoft,

57
00:03:09,240 --> 00:03:11,740
have in common is the idea of shared storage,

58
00:03:11,740 --> 00:03:15,980
and hopefully it makes sense that you need that shared access

59
00:03:15,980 --> 00:03:19,970
to the storage layer because this is your data that you're

60
00:03:19,970 --> 00:03:22,220
providing high availability for.

61
00:03:22,220 --> 00:03:25,640
Think of a file share with files.

62
00:03:25,640 --> 00:03:28,080
Think of SQL Server databases.

63
00:03:28,080 --> 00:03:30,640
Think of virtual machines.

64
00:03:30,640 --> 00:03:33,650
In this case, we have a number of nodes as they're called,

65
00:03:33,650 --> 00:03:35,770
these are Windows Server machines.

66
00:03:35,770 --> 00:03:38,160
I'm dealing with Active Directory in this course,

67
00:03:38,160 --> 00:03:41,080
so I'm going to assume that all of these nodes are domain members.

68
00:03:41,080 --> 00:03:44,510
That's assumed here, not even implied, it's assumed.

69
00:03:44,510 --> 00:03:48,960
And the idea is that you can do active‑passive high availability.

70
00:03:48,960 --> 00:03:50,460
So in this example,

71
00:03:50,460 --> 00:03:55,540
Node2 is active for a particular virtual machine, that highly

72
00:03:55,540 --> 00:03:58,340
available VM is called a clustered role.

73
00:03:58,340 --> 00:04:04,140
The idea of failover is if Node2 goes down either expectedly or unexpectedly,

74
00:04:04,140 --> 00:04:07,470
we can either do it manually or have Windows Server take care of

75
00:04:07,470 --> 00:04:10,860
shifting or moving that role to another node.

76
00:04:10,860 --> 00:04:14,090
Node3 would have been passive for that VM,

77
00:04:14,090 --> 00:04:19,240
but after the failover completes it will become the active node for that VM.

78
00:04:19,240 --> 00:04:22,150
And the idea is when Node2 comes back online,

79
00:04:22,150 --> 00:04:24,980
you can shift the clustered role, that's called failback,

80
00:04:24,980 --> 00:04:27,220
back to the original node,

81
00:04:27,220 --> 00:04:32,240
or you might keep it at Node3 and keep that as an active node.

82
00:04:32,240 --> 00:04:35,910
The idea of quorum, which we'll get into later in this course,

83
00:04:35,910 --> 00:04:40,660
deals with what happens when one or more cluster nodes become unavailable.

84
00:04:40,660 --> 00:04:40,820
See,

85
00:04:40,820 --> 00:04:43,900
these Windows Server nodes communicate with each other

86
00:04:43,900 --> 00:04:46,460
using what are called cluster heartbeat messages.

87
00:04:46,460 --> 00:04:50,540
TCP/UDP port 3343 is standard.

88
00:04:50,540 --> 00:04:54,090
And the idea is if a node stops issuing heartbeats,

89
00:04:54,090 --> 00:04:58,530
then the remaining nodes each need to vote to determine, do we have a majority,

90
00:04:58,530 --> 00:05:01,340
do we have a majority of our nodes here?

91
00:05:01,340 --> 00:05:05,130
And depending upon whether you start with an even or odd number of nodes,

92
00:05:05,130 --> 00:05:07,440
you may need to bring in a witness.

93
00:05:07,440 --> 00:05:10,380
And there are a few different types of witness that we'll talk about,

94
00:05:10,380 --> 00:05:12,810
and they can serve as a tie‑breaking vote.

95
00:05:12,810 --> 00:05:13,160
So,

96
00:05:13,160 --> 00:05:16,790
bottom line is quorum determines how aggressive your

97
00:05:16,790 --> 00:05:19,740
cluster is in terms of deciding majority and whether it

98
00:05:19,740 --> 00:05:22,340
should shut down or stay available.

99
00:05:22,340 --> 00:05:25,230
Now VMs are kind of an exception for clustered roles

100
00:05:25,230 --> 00:05:28,270
because you'll continue to access highly available or

101
00:05:28,270 --> 00:05:31,740
clustered VMs on their own IP addresses.

102
00:05:31,740 --> 00:05:32,680
But normally,

103
00:05:32,680 --> 00:05:36,480
as you can see with a Scale‑Out File Server, we've got a shared

104
00:05:36,480 --> 00:05:41,000
folder that's accessible on a DNS name and IP address.

105
00:05:41,000 --> 00:05:46,040
So we have what's called a client access point or a virtual IP address,

106
00:05:46,040 --> 00:05:50,460
and this is convenient because then we have a single DNS name and IP

107
00:05:50,460 --> 00:05:53,640
address that anchors to that shared role.

108
00:05:53,640 --> 00:05:57,550
And the idea is the cluster is responsible for resolving

109
00:05:57,550 --> 00:06:00,450
which node is active for those requests,

110
00:06:00,450 --> 00:06:04,450
but the requests themselves will always go to the client access point.

111
00:06:04,450 --> 00:06:07,780
So your application, your connection string, or wherever

112
00:06:07,780 --> 00:06:14,000
your incoming request for service originates just has to worry about getting to the client access point.