So let's talk about health checks in Route 53. Health checks are a way for you to check the health of mainly public resources, although there is also a way to do it for private resources, as we'll see in this lecture.

Here's the idea. Say we have two public load balancers in different regions, and behind them our application is running in both regions. We're running a multi-region setup because we want high availability at the region level. Then we use Route 53 to create DNS records, so that when users access our URL, for example mydomain.com, they get redirected to the closest load balancer. This would be the case with a latency-based record. But if one region is down, we obviously don't want to send our users to that region. To prevent that, we create health checks in Route 53: one on the load balancer in us-east-1, and one on the load balancer in eu-west-1.
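To make the failover idea concrete, here is a minimal Python simulation of a latency-based lookup that skips regions whose health check is failing. It is a sketch, not Route 53's actual implementation. The `LatencyRecord` type, the `resolve` function, the load balancer names, and the latency figures are all hypothetical. The fallback branch for when every region is unhealthy reflects our reading of the Route 53 documentation, which says it then treats all records as healthy; treat that branch as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyRecord:
    """Hypothetical latency-based record: one region and the load balancer it points to."""
    region: str
    target: str
    healthy: bool  # status of the health check associated with this record


def resolve(records, latency_ms):
    """Return the target of the lowest-latency record whose health check passes.

    latency_ms maps region name -> latency from the user (hypothetical input).
    Returns None if there are no records at all.
    """
    if not records:
        return None
    candidates = [r for r in records if r.healthy]
    if not candidates:
        # Assumption: with no healthy record, fall back to treating all of
        # them as healthy, as we understand the Route 53 docs to describe.
        candidates = list(records)
    best = min(candidates, key=lambda r: latency_ms[r.region])
    return best.target
```

For example, with 20 ms to eu-west-1 and 120 ms to us-east-1, a European user gets the eu-west-1 load balancer while its health check passes, and the us-east-1 one as soon as it fails.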
We can then associate these two health checks with our Route 53 records, and the reason we do so is to get automated DNS failover.

There are three types of health checks. The first, which is what I just showed you, monitors an endpoint, meaning a public endpoint: it could be an application, a server, or another AWS resource. The second monitors other health checks, and is called a calculated health check. The third monitors a CloudWatch alarm, which gives you more control and is helpful for private resources, as we'll see in this lecture. Finally, health checks have their own metrics, which you can view in CloudWatch metrics as well.

So let's look at how health checks work with a specific endpoint. If we have a health check on an ALB in eu-west-1, the AWS health checkers come from all around the world. It's not just one health checker; it's about 15 health checkers from all around the world.
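As a sketch of what an endpoint health check looks like when defined programmatically, the function below builds a dictionary shaped like the `HealthCheckConfig` structure of the Route 53 `CreateHealthCheck` API (fields such as `Type`, `FullyQualifiedDomainName`, `ResourcePath`, `RequestInterval`, `FailureThreshold`). The function name, its defaults, and the example domain are hypothetical; the validation mirrors the options covered in this lecture (HTTP, HTTPS, or TCP, and a 10- or 30-second interval) plus the 1 to 10 range Route 53 accepts for the failure threshold.

```python
def endpoint_health_check_config(fqdn, path="/", port=443, protocol="HTTPS",
                                 interval=30, failure_threshold=3):
    """Build a HealthCheckConfig-shaped dict for a public endpoint (hypothetical helper)."""
    if protocol not in ("HTTP", "HTTPS", "TCP"):
        raise ValueError(f"unsupported protocol: {protocol}")
    if interval not in (10, 30):
        # 30 s is the standard interval; 10 s is the pricier "fast" health check
        raise ValueError("RequestInterval must be 10 or 30 seconds")
    if not 1 <= failure_threshold <= 10:
        raise ValueError("FailureThreshold must be between 1 and 10")
    config = {
        "Type": protocol,
        "FullyQualifiedDomainName": fqdn,
        "Port": port,
        "RequestInterval": interval,
        "FailureThreshold": failure_threshold,
    }
    if protocol != "TCP":
        # A TCP check only opens a connection, so a request path makes no sense
        config["ResourcePath"] = path
    return config
```

A dictionary like this could be passed as `HealthCheckConfig` to the boto3 Route 53 client's `create_health_check`, together with a unique `CallerReference`; the sketch itself makes no AWS calls.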
They all send requests to our public endpoint, on whatever path we set, and if they get a successful response back, such as 200 OK, the resource is deemed healthy. So about 15 global health checkers check the endpoint health, and you can set a threshold for healthy or unhealthy. You can also set an interval, with two options: every 30 seconds for regular health checks, or every 10 seconds, which costs more and is called a fast health check. Several protocols are supported: HTTP, HTTPS, and TCP. The rule is that if more than 18% of the health checkers report the endpoint as healthy, Route 53 considers it healthy; otherwise it's deemed unhealthy. You can also choose which locations you want Route 53 to use for the health checks.

Specifically, the health checks only pass if the load balancer returns a 2xx or 3xx status code. The health check also has a useful capability.
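The pass rule and the 18% rule can be sketched in a few lines of Python. This is a simplification under stated assumptions: each checker really evaluates several consecutive checks against the failure threshold before reporting, whereas here each checker's report is collapsed into a single HTTP status code. The function names are hypothetical.

```python
HEALTHY_FRACTION_THRESHOLD = 0.18  # "more than 18% of checkers" rule


def checker_reports_healthy(status_code):
    """One checker's verdict for an HTTP(S) check: only 2xx and 3xx pass."""
    return 200 <= status_code < 400


def endpoint_is_healthy(status_codes):
    """Aggregate one status code per checker into Route 53's overall verdict."""
    if not status_codes:
        return False
    healthy = sum(1 for code in status_codes if checker_reports_healthy(code))
    return healthy / len(status_codes) > HEALTHY_FRACTION_THRESHOLD
```

With about 15 checkers, the arithmetic works out as follows: 3 healthy reports give 3/15 = 20%, which is above 18%, so the endpoint is healthy; 2 healthy reports give about 13.3%, so it is unhealthy.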
If the response is text-based, the health checkers can search the first 5,120 bytes of the response for specific text.

Finally, and this is very important from a networking perspective: for any of this to work, the health checkers must obviously be able to reach your Application Load Balancer, or whatever endpoint you have. Therefore, you must allow incoming requests from the Route 53 health checkers' IP address ranges. You can find these address ranges at the URL shown in the bottom right of the screen.

The second type of health check is the calculated health check, which combines the results of multiple health checks into a single health check. For example, in Route 53, with three EC2 instances, we can create three health checks. These are child health checks, and each one monitors one EC2 instance. Then we can define a parent health check on top of all these child health checks. The conditions used to combine the child health checks can be OR, AND, or NOT.
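In the Route 53 API, a calculated health check has `Type` set to `CALCULATED`, a list of `ChildHealthChecks`, a `HealthThreshold` (how many children must be healthy for the parent to be healthy), and an optional `Inverted` flag. One way to express the lecture's conditions with those fields, which is our interpretation rather than something stated here, is OR as a threshold of 1, AND as a threshold equal to the number of children, and NOT as an inverted check. The sketch below evaluates a parent locally from hypothetical child statuses; the function names and child IDs are illustrative only.

```python
MAX_CHILDREN = 256  # limit on child health checks per calculated health check


def calculated_is_healthy(child_statuses, health_threshold, inverted=False):
    """Evaluate a parent health check from its children (True = healthy)."""
    if len(child_statuses) > MAX_CHILDREN:
        raise ValueError("a calculated health check monitors at most 256 children")
    healthy_children = sum(1 for ok in child_statuses if ok)
    result = healthy_children >= health_threshold
    # Inverting the result is one way to express a NOT condition
    return not result if inverted else result


def calculated_config(child_ids, health_threshold, inverted=False):
    """Build a HealthCheckConfig-shaped dict for a calculated health check."""
    if len(child_ids) > MAX_CHILDREN:
        raise ValueError("a calculated health check monitors at most 256 children")
    return {
        "Type": "CALCULATED",
        "ChildHealthChecks": list(child_ids),
        "HealthThreshold": health_threshold,
        "Inverted": inverted,
    }
```

With three instances where one is down, an OR-style parent (threshold 1) stays healthy, while an AND-style parent (threshold 3) reports unhealthy.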
You can monitor up to 256 child health checks, and you can specify how many of them need to pass for the parent to pass. One use case for this is performing maintenance on your website without causing all the health checks to fail.

So how do we monitor the health of a private resource? Monitoring something private is difficult, because all the Route 53 health checkers live on the public internet, outside of your VPC, so they cannot reach private endpoints, whether the resource is in a private VPC or on premises. The way we can do it, though, is to create a CloudWatch metric, put a CloudWatch alarm on it, and then assign that CloudWatch alarm to the health check. So the idea is that we monitor the health of our EC2 instance in a private subnet with a CloudWatch metric, and we create a CloudWatch alarm that fires when that metric is breached.
When the alarm goes into the ALARM state, the health check automatically becomes unhealthy. That way we've created exactly what we want: a health check on a private resource, and this is the most common way to do it.

So that's it for this lecture. I hope you liked it, and I will see you in the next lecture.
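Here is a sketch of the CloudWatch-alarm approach. The first function builds a dictionary shaped like the Route 53 `HealthCheckConfig` for `Type` `CLOUDWATCH_METRIC`, which points at an alarm through `AlarmIdentifier` (`Region` and `Name`) and uses `InsufficientDataHealthStatus` (`Healthy`, `Unhealthy`, or `LastKnownStatus`) to decide what happens when the alarm lacks data. The second maps the three CloudWatch alarm states (`OK`, `ALARM`, `INSUFFICIENT_DATA`) to a health status, as a local simulation only. The alarm name and region in the usage are hypothetical.

```python
VALID_INSUFFICIENT = ("Healthy", "Unhealthy", "LastKnownStatus")


def alarm_health_check_config(alarm_name, region, insufficient_data="LastKnownStatus"):
    """Build a HealthCheckConfig-shaped dict that follows a CloudWatch alarm."""
    if insufficient_data not in VALID_INSUFFICIENT:
        raise ValueError(f"invalid InsufficientDataHealthStatus: {insufficient_data}")
    return {
        "Type": "CLOUDWATCH_METRIC",
        "AlarmIdentifier": {"Region": region, "Name": alarm_name},
        "InsufficientDataHealthStatus": insufficient_data,
    }


def health_from_alarm(state, insufficient_data="LastKnownStatus", last_known=True):
    """Simulate how an alarm state maps to health (True = healthy)."""
    if state == "ALARM":
        return False  # alarm firing: the health check becomes unhealthy
    if state == "OK":
        return True
    if state == "INSUFFICIENT_DATA":
        if insufficient_data == "Healthy":
            return True
        if insufficient_data == "Unhealthy":
            return False
        return last_known  # LastKnownStatus keeps the previous verdict
    raise ValueError(f"unknown alarm state: {state}")
```

For example, `alarm_health_check_config("private-ec2-status-check", "us-east-1")` describes a health check that turns unhealthy whenever that (hypothetical) alarm on the private instance's metric fires.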