1
00:00:00,260 --> 00:00:01,850
Now let's discuss CloudWatch Alarms.

2
00:00:01,850 --> 00:00:02,990
So alarms, as we know,

3
00:00:02,990 --> 00:00:06,060
they're used to trigger notifications from any metric.

4
00:00:06,060 --> 00:00:07,990
And you can define complex alarms

5
00:00:07,990 --> 00:00:10,210
based on various options such as sampling,

6
00:00:10,210 --> 00:00:13,200
percentage, max, min, and so on.

7
00:00:13,200 --> 00:00:14,480
An alarm has three states:

8
00:00:14,480 --> 00:00:15,846
OK means that it's not triggered,

9
00:00:15,846 --> 00:00:18,230
INSUFFICIENT_DATA means that there's not enough data

10
00:00:18,230 --> 00:00:20,300
for the alarm to determine a state,

11
00:00:20,300 --> 00:00:23,630
and ALARM, which means that your threshold has been breached

12
00:00:23,630 --> 00:00:26,190
and therefore a notification will be sent.

13
00:00:26,190 --> 00:00:27,970
The period is how long you want the alarm

14
00:00:27,970 --> 00:00:29,750
to evaluate the metric for,

15
00:00:29,750 --> 00:00:32,750
and so it could be very, very short or very, very long,

16
00:00:32,750 --> 00:00:35,780
and it can also apply to high-resolution custom metrics.

17
00:00:35,780 --> 00:00:37,210
For example, 10 seconds, 30 seconds,

18
00:00:37,210 --> 00:00:39,509
or a multiple of 60 seconds.

19
00:00:39,509 --> 00:00:41,924
Now, alarms have three main targets,

20
00:00:41,924 --> 00:00:45,210
the first one is actions on EC2 Instances,

21
00:00:45,210 --> 00:00:48,110
such as stopping it, terminating it, rebooting it,

22
00:00:48,110 --> 00:00:50,010
or recovering an instance.

23
00:00:50,010 --> 00:00:52,450
The second one is to trigger an Auto Scaling action,

24
00:00:52,450 --> 00:00:54,844
for example, a scale out or a scale in.

25
00:00:54,844 --> 00:00:57,330
And the last one is to send a notification

26
00:00:57,330 --> 00:00:59,366
to the SNS service, for example,

27
00:00:59,366 --> 00:01:02,060
and from the SNS service we can hook it

28
00:01:02,060 --> 00:01:04,590
to a Lambda function and have the Lambda function

29
00:01:04,590 --> 00:01:06,130
do pretty much anything we want based

30
00:01:06,130 --> 00:01:08,430
on an alarm being breached.

31
00:01:08,430 --> 00:01:10,040
So, let's talk about EC2 Instance Recovery.

32
00:01:10,040 --> 00:01:10,873
We've already seen it,

33
00:01:10,873 --> 00:01:13,410
but there's an instance status check to check the EC2 VM

34
00:01:13,410 --> 00:01:14,840
and the system status check

35
00:01:14,840 --> 00:01:16,694
to check the underlying hardware.

36
00:01:16,694 --> 00:01:18,690
And you can define a CloudWatch Alarm

37
00:01:18,690 --> 00:01:19,960
on both of these checks.

38
00:01:19,960 --> 00:01:22,592
Okay, so you will monitor a specific EC2 Instance

39
00:01:22,592 --> 00:01:25,232
and in case the alarm is breached,

40
00:01:25,232 --> 00:01:28,550
then you can start an EC2 Instance Recovery

41
00:01:28,550 --> 00:01:29,530
to make sure, for example,

42
00:01:29,530 --> 00:01:31,130
that you move your EC2 Instance

43
00:01:31,130 --> 00:01:32,610
from one host to another.

44
00:01:32,610 --> 00:01:34,030
When you do a recovery,

45
00:01:34,030 --> 00:01:35,730
you get the same private, public,

46
00:01:35,730 --> 00:01:37,480
and elastic IP, the same metadata,

47
00:01:37,480 --> 00:01:39,104
and the same placement group for your instance.

48
00:01:39,104 --> 00:01:43,060
And you can also send an alert to your SNS topic

49
00:01:43,060 --> 00:01:46,713
to get notified that the EC2 Instance is being recovered.

50
00:01:47,680 --> 00:01:49,110
Now, there are some good things to know about CloudWatch Alarms.

51
00:01:49,110 --> 00:01:49,943
So, first of all,

52
00:01:49,943 --> 00:01:51,010
as we've seen,

53
00:01:51,010 --> 00:01:52,540
we can create an alarm on top

54
00:01:52,540 --> 00:01:54,020
of a CloudWatch Logs Metric Filter.

55
00:01:54,020 --> 00:01:54,853
So remember,

56
00:01:54,853 --> 00:01:57,210
CloudWatch Logs can have a metric filter,

57
00:01:57,210 --> 00:01:58,920
which is hooked to a CloudWatch Alarm

58
00:01:58,920 --> 00:02:01,268
and then when we receive too many instances

59
00:02:01,268 --> 00:02:03,574
of a specific word, for example, the word error,

60
00:02:03,574 --> 00:02:08,479
then raise an alert and send a message to Amazon SNS.

61
00:02:08,479 --> 00:02:10,530
And so if you want to test alarm notifications,

62
00:02:10,530 --> 00:02:13,618
you can use a CLI call named set-alarm-state.

63
00:02:13,618 --> 00:02:16,360
And this is helpful when you want to trigger an alarm,

64
00:02:16,360 --> 00:02:19,154
even though it didn't reach a specific threshold,

65
00:02:19,154 --> 00:02:22,340
because you want to see whether or not the alarm being

66
00:02:22,340 --> 00:02:24,680
triggered results in the correct action

67
00:02:24,680 --> 00:02:26,900
for your infrastructure.

68
00:02:26,900 --> 00:02:27,733
So that's it for alarms.

69
00:02:27,733 --> 00:02:28,566
I hope you liked it,

70
00:02:28,566 --> 00:02:31,023
and I will see you in the next lecture for some practice.