1 00:00:00,120 --> 00:00:02,580 ‫Welcome to this section on Cloud Monitoring. 2 00:00:02,580 --> 00:00:04,620 ‫In this section, we're going to know how we can 3 00:00:04,620 --> 00:00:06,780 ‫get a better idea and a better picture 4 00:00:06,780 --> 00:00:09,930 ‫of the performance of our cloud deployments. 5 00:00:09,930 --> 00:00:11,740 ‫So the first service I want to talk about 6 00:00:11,740 --> 00:00:13,420 ‫is CloudWatch Metrics. 7 00:00:13,420 --> 00:00:17,330 ‫CloudWatch provides metrics for every service in AWS, 8 00:00:17,330 --> 00:00:20,220 ‫and a Metric is a variable to monitor. 9 00:00:20,220 --> 00:00:23,480 ‫For example, the CPUUtilization or the NetworkIn. 10 00:00:23,480 --> 00:00:25,720 ‫The metrics are going through the time, 11 00:00:25,720 --> 00:00:27,400 ‫so they will have timestamps, 12 00:00:27,400 --> 00:00:30,260 ‫and if you want to visualize all your metrics at once, 13 00:00:30,260 --> 00:00:33,570 ‫you can create a CloudWatch dashboard of metrics. 14 00:00:33,570 --> 00:00:37,160 ‫So here is an example of a very common metric 15 00:00:37,160 --> 00:00:40,050 ‫for CloudWatch which is called the Billing metric. 16 00:00:40,050 --> 00:00:43,740 ‫So this metric is only available in us-east-I, 17 00:00:43,740 --> 00:00:46,790 ‫so only in one region, and it represents the total amounts 18 00:00:46,790 --> 00:00:49,550 ‫you have spent on your AWS cloud. 19 00:00:49,550 --> 00:00:52,960 ‫So obviously, at every month end it will reset back 20 00:00:52,960 --> 00:00:55,020 ‫in to zero, but as you can see, 21 00:00:55,020 --> 00:00:58,180 ‫over time that metric goes up and then will go back to zero, 22 00:00:58,180 --> 00:01:00,510 ‫and so this month I have spent more than $100 23 00:01:00,510 --> 00:01:04,070 ‫because I am experimenting with different AWS services. 24 00:01:04,070 --> 00:01:05,750 ‫So this is one metric, but obviously, 25 00:01:05,750 --> 00:01:08,010 ‫there are a ton more metrics we can look at. 26 00:01:08,010 --> 00:01:10,170 ‫For example, for our EC2 instances, 27 00:01:10,170 --> 00:01:11,730 ‫we can look at the CPU Utilization, 28 00:01:11,730 --> 00:01:14,580 ‫which is how much we are making the CPU work 29 00:01:14,580 --> 00:01:16,630 ‫and if it makes it work a lot, 30 00:01:16,630 --> 00:01:18,320 ‫than maybe our instance is too busy 31 00:01:18,320 --> 00:01:21,130 ‫and we need to scale it up or scale it out. 32 00:01:21,130 --> 00:01:23,680 ‫The Status Check to make sure that our EC2 instance 33 00:01:23,680 --> 00:01:26,330 ‫is properly functioning, and the Network, 34 00:01:26,330 --> 00:01:28,960 ‫to see how much network is going in our instance 35 00:01:28,960 --> 00:01:30,580 ‫and out our instance. 36 00:01:30,580 --> 00:01:33,360 ‫As you can see, the RAM is not an available metric 37 00:01:33,360 --> 00:01:35,350 ‫for your EC2 instances. 38 00:01:35,350 --> 00:01:38,160 ‫These metrics you get every five minutes by default, 39 00:01:38,160 --> 00:01:40,010 ‫but you can enable a Detailed Monitoring, 40 00:01:40,010 --> 00:01:41,280 ‫which is more expensive, 41 00:01:41,280 --> 00:01:43,780 ‫to get these metrics every one minute. 42 00:01:43,780 --> 00:01:45,410 ‫Then you have EBS volumes, 43 00:01:45,410 --> 00:01:47,190 ‫which are where you store your data, 44 00:01:47,190 --> 00:01:48,900 ‫and you get information about the amount 45 00:01:48,900 --> 00:01:51,310 ‫of disk read and writes that are happening. 46 00:01:51,310 --> 00:01:53,390 ‫Then for your S3 buckets, you can get some information 47 00:01:53,390 --> 00:01:56,600 ‫around the bucket size and bytes, the number of objects, 48 00:01:56,600 --> 00:01:59,770 ‫or the number of requests done into your S3 buckets, 49 00:01:59,770 --> 00:02:01,610 ‫and the Billing metric that just shows you, 50 00:02:01,610 --> 00:02:03,350 ‫shows you the total estimated charge 51 00:02:03,350 --> 00:02:05,690 ‫for your account only in us-east-I, 52 00:02:05,690 --> 00:02:07,900 ‫but it's for your entire account. 53 00:02:07,900 --> 00:02:10,120 ‫Then the Service Limits, which is how much you have been 54 00:02:10,120 --> 00:02:12,640 ‫using a service API, or finally, 55 00:02:12,640 --> 00:02:14,250 ‫if you don't find the metric you like, 56 00:02:14,250 --> 00:02:17,480 ‫you can push your own custom metrics. 57 00:02:17,480 --> 00:02:19,530 ‫Next lets talk about CloudWatch Alarms. 58 00:02:19,530 --> 00:02:23,170 ‫So Alarms are used to trigger notifications for any metric, 59 00:02:23,170 --> 00:02:26,480 ‫and that means that once a metric goes above a threshold, 60 00:02:26,480 --> 00:02:29,350 ‫then we can have a CloudWatch Alarm action, 61 00:02:29,350 --> 00:02:31,790 ‫and these actions can be for an auto scaling group 62 00:02:31,790 --> 00:02:33,260 ‫to increase or decrease the number 63 00:02:33,260 --> 00:02:35,510 ‫of EC2 instances desired counts 64 00:02:35,510 --> 00:02:37,680 ‫effectively allowing your auto scaling group 65 00:02:37,680 --> 00:02:39,500 ‫to scale automatically. 66 00:02:39,500 --> 00:02:42,730 ‫EC2 Actions, if you want to stop, terminate, reboot, 67 00:02:42,730 --> 00:02:45,260 ‫or recover an EC2 instance, 68 00:02:45,260 --> 00:02:48,370 ‫and SNS notifications if you wanted to send a notification 69 00:02:48,370 --> 00:02:49,810 ‫into an SNS topic. 70 00:02:49,810 --> 00:02:51,250 ‫For example, you're saying, okay, 71 00:02:51,250 --> 00:02:54,410 ‫if my EC2 instance has a utilization 72 00:02:54,410 --> 00:02:57,310 ‫of over 90%, then send us an email 73 00:02:57,310 --> 00:02:59,990 ‫because we want to look at it and something's wrong. 74 00:02:59,990 --> 00:03:02,560 ‫Then you get various options for creating the alarm, 75 00:03:02,560 --> 00:03:05,130 ‫sampling, percentage, max, min, et cetera, 76 00:03:05,130 --> 00:03:07,160 ‫and you can which, you can choose the period on which 77 00:03:07,160 --> 00:03:08,980 ‫to evaluate an alarm, whether it be five minutes, 78 00:03:08,980 --> 00:03:11,690 ‫ten minutes, an hour, and then finally, 79 00:03:11,690 --> 00:03:13,750 ‫you can create what's called a billing alarm 80 00:03:13,750 --> 00:03:15,620 ‫on the CloudWatch Billing metric, 81 00:03:15,620 --> 00:03:19,190 ‫which allows you to get notified if your metric goes over, 82 00:03:19,190 --> 00:03:21,400 ‫for example, 10 or $20. 83 00:03:21,400 --> 00:03:24,020 ‫The alarm state can be OK when everything is green, 84 00:03:24,020 --> 00:03:26,070 ‫INSUFFICIENT_DATA when there's not enough data points 85 00:03:26,070 --> 00:03:28,420 ‫to figure out if it should be green or bad, 86 00:03:28,420 --> 00:03:30,230 ‫and then ALARM when it's bad. 87 00:03:30,230 --> 00:03:32,090 ‫Okay, so that's it for the overview, 88 00:03:32,090 --> 00:03:33,400 ‫now we'll see you in the next lecture 89 00:03:33,400 --> 00:03:35,323 ‫to practice using metrics and alarms.