1
00:00:00,120 --> 00:00:02,580
‫Welcome to this section on Cloud Monitoring.

2
00:00:02,580 --> 00:00:04,620
‫In this section, we're going to see how we can

3
00:00:04,620 --> 00:00:06,780
‫get a better idea and a better picture

4
00:00:06,780 --> 00:00:09,930
‫of the performance of our cloud deployments.

5
00:00:09,930 --> 00:00:11,740
‫So the first service I want to talk about

6
00:00:11,740 --> 00:00:13,420
‫is CloudWatch Metrics.

7
00:00:13,420 --> 00:00:17,330
‫CloudWatch provides metrics for every service in AWS,

8
00:00:17,330 --> 00:00:20,220
‫and a Metric is a variable to monitor.

9
00:00:20,220 --> 00:00:23,480
‫For example, the CPUUtilization or the NetworkIn.

10
00:00:23,480 --> 00:00:25,720
‫Metrics are recorded over time,

11
00:00:25,720 --> 00:00:27,400
‫so they will have timestamps,

12
00:00:27,400 --> 00:00:30,260
‫and if you want to visualize all your metrics at once,

13
00:00:30,260 --> 00:00:33,570
‫you can create a CloudWatch dashboard of metrics.

14
00:00:33,570 --> 00:00:37,160
‫So here is an example of a very common metric

15
00:00:37,160 --> 00:00:40,050
‫for CloudWatch which is called the Billing metric.

16
00:00:40,050 --> 00:00:43,740
‫So this metric is only available in us-east-1,

17
00:00:43,740 --> 00:00:46,790
‫so only in one region, and it represents the total amount

18
00:00:46,790 --> 00:00:49,550
‫you have spent on your AWS cloud.

19
00:00:49,550 --> 00:00:52,960
‫So obviously, at the end of every month it will reset back

20
00:00:52,960 --> 00:00:55,020
‫to zero, but as you can see,

21
00:00:55,020 --> 00:00:58,180
‫over time that metric goes up and then will go back to zero,

22
00:00:58,180 --> 00:01:00,510
‫and so this month I have spent more than $100

23
00:01:00,510 --> 00:01:04,070
‫because I am experimenting with different AWS services.

24
00:01:04,070 --> 00:01:05,750
‫So this is one metric, but obviously,

25
00:01:05,750 --> 00:01:08,010
‫there are a ton more metrics we can look at.

26
00:01:08,010 --> 00:01:10,170
‫For example, for our EC2 instances,

27
00:01:10,170 --> 00:01:11,730
‫we can look at the CPU Utilization,

28
00:01:11,730 --> 00:01:14,580
‫which is how much we are making the CPU work

29
00:01:14,580 --> 00:01:16,630
‫and if we make it work a lot,

30
00:01:16,630 --> 00:01:18,320
‫then maybe our instance is too busy

31
00:01:18,320 --> 00:01:21,130
‫and we need to scale it up or scale it out.

32
00:01:21,130 --> 00:01:23,680
‫The Status Check to make sure that our EC2 instance

33
00:01:23,680 --> 00:01:26,330
‫is properly functioning, and the Network,

34
00:01:26,330 --> 00:01:28,960
‫to see how much network traffic is going into our instance

35
00:01:28,960 --> 00:01:30,580
‫and out of our instance.

36
00:01:30,580 --> 00:01:33,360
‫As you can see, the RAM is not an available metric

37
00:01:33,360 --> 00:01:35,350
‫for your EC2 instances.

38
00:01:35,350 --> 00:01:38,160
‫These metrics you get every five minutes by default,

39
00:01:38,160 --> 00:01:40,010
‫but you can enable Detailed Monitoring,

40
00:01:40,010 --> 00:01:41,280
‫which is more expensive,

41
00:01:41,280 --> 00:01:43,780
‫to get these metrics every minute.

42
00:01:43,780 --> 00:01:45,410
‫Then you have EBS volumes,

43
00:01:45,410 --> 00:01:47,190
‫which are where you store your data,

44
00:01:47,190 --> 00:01:48,900
‫and you get information about the amount

45
00:01:48,900 --> 00:01:51,310
‫of disk reads and writes that are happening.

46
00:01:51,310 --> 00:01:53,390
‫Then for your S3 buckets, you can get some information

47
00:01:53,390 --> 00:01:56,600
‫about the bucket size in bytes, the number of objects,

48
00:01:56,600 --> 00:01:59,770
‫or the number of requests made to your S3 buckets,

49
00:01:59,770 --> 00:02:01,610
‫and the Billing metric that just shows you,

50
00:02:01,610 --> 00:02:03,350
‫the total estimated charge

51
00:02:03,350 --> 00:02:05,690
‫for your account only in us-east-1,

52
00:02:05,690 --> 00:02:07,900
‫but it's for your entire account.

53
00:02:07,900 --> 00:02:10,120
‫Then the Service Limits, which is how much you have been

54
00:02:10,120 --> 00:02:12,640
‫using a service API, or finally,

55
00:02:12,640 --> 00:02:14,250
‫if you don't find the metric you like,

56
00:02:14,250 --> 00:02:17,480
‫you can push your own custom metrics.

57
00:02:17,480 --> 00:02:19,530
‫Next, let's talk about CloudWatch Alarms.

58
00:02:19,530 --> 00:02:23,170
‫So Alarms are used to trigger notifications for any metric,

59
00:02:23,170 --> 00:02:26,480
‫and that means that once a metric goes above a threshold,

60
00:02:26,480 --> 00:02:29,350
‫then we can have a CloudWatch Alarm action,

61
00:02:29,350 --> 00:02:31,790
‫and these actions can be for an auto scaling group

62
00:02:31,790 --> 00:02:33,260
‫to increase or decrease the number

63
00:02:33,260 --> 00:02:35,510
‫of desired EC2 instances,

64
00:02:35,510 --> 00:02:37,680
‫effectively allowing your auto scaling group

65
00:02:37,680 --> 00:02:39,500
‫to scale automatically.

66
00:02:39,500 --> 00:02:42,730
‫EC2 Actions, if you want to stop, terminate, reboot,

67
00:02:42,730 --> 00:02:45,260
‫or recover an EC2 instance,

68
00:02:45,260 --> 00:02:48,370
‫and SNS notifications if you want to send a notification

69
00:02:48,370 --> 00:02:49,810
‫to an SNS topic.

70
00:02:49,810 --> 00:02:51,250
‫For example, you're saying, okay,

71
00:02:51,250 --> 00:02:54,410
‫if my EC2 instance has a CPU utilization

72
00:02:54,410 --> 00:02:57,310
‫of over 90%, then send us an email

73
00:02:57,310 --> 00:02:59,990
‫because we want to look at it and something's wrong.

74
00:02:59,990 --> 00:03:02,560
‫Then you get various options for creating the alarm,

75
00:03:02,560 --> 00:03:05,130
‫sampling, percentage, max, min, et cetera,

76
00:03:05,130 --> 00:03:07,160
‫and you can choose the period over which

77
00:03:07,160 --> 00:03:08,980
‫to evaluate an alarm, whether it be five minutes,

78
00:03:08,980 --> 00:03:11,690
‫ten minutes, an hour, and then finally,

79
00:03:11,690 --> 00:03:13,750
‫you can create what's called a billing alarm

80
00:03:13,750 --> 00:03:15,620
‫on the CloudWatch Billing metric,

81
00:03:15,620 --> 00:03:19,190
‫which allows you to get notified if your metric goes over,

82
00:03:19,190 --> 00:03:21,400
‫for example, $10 or $20.

83
00:03:21,400 --> 00:03:24,020
‫The alarm state can be OK when everything is green,

84
00:03:24,020 --> 00:03:26,070
‫INSUFFICIENT_DATA when there aren't enough data points

85
00:03:26,070 --> 00:03:28,420
‫to figure out if it should be green or bad,

86
00:03:28,420 --> 00:03:30,230
‫and then ALARM when it's bad.

87
00:03:30,230 --> 00:03:32,090
‫Okay, so that's it for the overview,

88
00:03:32,090 --> 00:03:33,400
‫now we'll see you in the next lecture

89
00:03:33,400 --> 00:03:35,323
‫to practice using metrics and alarms.