Please note that this content is targeted at SysOps administrators. If you're a Solutions Architect or a developer, you may want to skip over this one.

Welcome back to BackSpace Academy. If you're a systems administrator, you're not only going to be heavily involved in troubleshooting services when they go wrong, you're also going to be expected to implement systems that can alert you to problems and, beyond that, invoke other services that may be able to correct the problem. So we'll go through one of the most important of those services, the EC2 service, and look at the CloudWatch metrics that are available. We'll also look at custom metrics, different metrics that we can implement that are not standard within CloudWatch. We'll look at CloudWatch statistics, which provide us with aggregate data of the metrics being recorded. We'll look at the different types of actions that are available from a CloudWatch alarm, and then finally we'll look at Elastic Load Balancer monitoring as well.

Among the standard EC2 CloudWatch metrics available to us, for our T2 burstable instances we can keep track of our CPU credit usage and credit balance, to make sure that we always have that ability to burst when required. For general instance metrics we have CPU utilization, disk I/O, and network information that we can monitor as metrics, and we also have our status checks that we can monitor as a metric as well.

CloudWatch metrics can be filtered using a dimension. For example, if you wanted to get just the metrics for all of the instances within a specific Auto Scaling group, you can use the Auto Scaling group name, or you can use the image ID, the instance ID, or the instance type. The available metrics will be listed in the CloudWatch console, but you can also use the command-line interface, with the CloudWatch list-metrics command, to list them as well.

Detailed monitoring can be enabled, and that will give you one-minute intervals for your CloudWatch metrics. You can do that at launch, or you can do it for existing instances using the EC2 console, but you can also use the command-line interface: when you're running an ec2 run-instances command you can set monitoring to Enabled=true for detailed monitoring, or you can run the ec2 monitor-instances command against an existing instance.

It is possible to create your own custom metrics, and they can be collected and published to CloudWatch from your EC2 instances. You can do that using the CloudWatch put-metric-data command in the CLI, or, if you're using an SDK in an application running on your EC2 instance, you can use PutMetricData, and that will enable you to publish that information to CloudWatch, and CloudWatch will collect and monitor it for you. You can have that at the standard resolution of one minute, or up to a high resolution of one second.

There are also CloudWatch monitoring scripts available, which again produce custom metrics. These are Perl scripts that run on your EC2 instances, and they can collect memory, swap, and disk-space utilization data. If you would like to have that monitored on a regular basis, you can create a cron job, and that will publish the data at regular intervals to the CloudWatch service; from there you can view that information as you would any other CloudWatch metric.

A lot of the time it is not really beneficial to look at instantaneous metrics; instead, we might want to look at an aggregation of that data over a specific period of time. So we can look at the minimum or maximum levels that occurred over that period. We can look at the sum of all of the values that were submitted during that period, from all of the samples that were taken. We can also look at the sample count, which is the number of samples received over that period, and the average, which will of course be the sum of all those values divided by the sample count. Finally, we can look at a specified percentile: if we have a percentile of 95.45, that means that 95.45% of all of the data collected will be lower than this value; if we used 94, then 94% of that data would be below this value. So we can select a specific percentile and report our metrics based upon that.

One of the great features of using CloudWatch with the EC2 service is being able to use alarm actions, which can automatically stop, terminate, reboot, or recover our instances for us. We don't need to intervene in any way; it will happen automatically once it's set up. They can be created using either the EC2 or the CloudWatch console, and there are a number of very good use cases for this feature. We could use it to stop idle instances that are not really being used. We can use it to stop web servers that are getting unusually high traffic, for example because they're being attacked, by looking at the network-out metric. We can stop an instance that is experiencing a memory leak. We can stop impaired instances that have failed their status checks, and we can terminate an instance when a job has finished; for example, you might have a batch job to process a video, and you can terminate that instance when it is completed.

In addition to monitoring our EC2 instances, we can also monitor our Elastic Load Balancers. Out of the box we have CloudWatch metrics, and they will be monitored and reported at sixty-second, or one-minute, intervals. In addition to that we have access logs: when enabled, the Elastic Load Balancer can publish log files recording information at anywhere from five- to sixty-minute intervals, and those logs will be saved to Amazon S3, where we can access them if we need to. A feature of Application Load Balancers, not Classic Load Balancers, is request tracing, which enables us to track our HTTP requests from our clients to our targets or other services. It does that by adding or updating the trace ID header (X-Amzn-Trace-Id) before passing the request on, and that is again integrated with the ELB access logs. And finally, we can look at implementing CloudTrail on our Elastic Load Balancer to log any API calls to our load balancer.

So that's all I need to discuss from a high level around monitoring of EC2 and ELB. Coming up next we'll have a hands-on session to apply this stuff, so I'll see you in that one.
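As a sketch of the dimension filtering mentioned in the lecture, assuming the AWS CLI is configured and using a placeholder instance ID and Auto Scaling group name:

```shell
# List CloudWatch metrics in the EC2 namespace for a single instance
aws cloudwatch list-metrics \
    --namespace AWS/EC2 \
    --dimensions Name=InstanceId,Value=i-1234567890abcdef0

# Or filter by Auto Scaling group instead ("my-asg" is a placeholder)
aws cloudwatch list-metrics \
    --namespace AWS/EC2 \
    --dimensions Name=AutoScalingGroupName,Value=my-asg
```

These commands require live AWS credentials, so they are illustrative fragments rather than a runnable script.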
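The two ways of enabling detailed monitoring from the CLI might look like this; the AMI and instance IDs are placeholders:

```shell
# Enable detailed (1-minute) monitoring at launch
aws ec2 run-instances \
    --image-id ami-0abcdef1234567890 \
    --instance-type t2.micro \
    --monitoring Enabled=true

# Enable detailed monitoring on an instance that is already running
aws ec2 monitor-instances --instance-ids i-1234567890abcdef0
```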
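A custom metric can be published with put-metric-data as described; here is a hedged sketch using a made-up namespace, metric name, and value:

```shell
# Publish a hypothetical memory-utilization reading to a custom namespace
aws cloudwatch put-metric-data \
    --namespace "Custom/MyApp" \
    --metric-name MemoryUtilization \
    --dimensions InstanceId=i-1234567890abcdef0 \
    --unit Percent \
    --value 72.5

# For high-resolution (1-second) metrics, add: --storage-resolution 1
```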
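Scheduling the CloudWatch monitoring scripts with cron, as mentioned, could be done with a crontab entry along these lines (the install path is an assumption):

```shell
# Crontab entry: publish memory, swap, and disk-space utilization every 5 minutes
*/5 * * * * ~/aws-scripts-mon/mon-put-instance-data.pl --mem-util --swap-util --disk-space-util --disk-path=/ --from-cron
```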
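An alarm action like the "stop idle instances" use case could be sketched as follows; the alarm name, thresholds, instance ID, and region are all illustrative choices, not prescribed values:

```shell
# Stop an instance whose average CPU has stayed below 10% for 30 minutes
# (six 5-minute evaluation periods), i.e. a likely idle instance
aws cloudwatch put-metric-alarm \
    --alarm-name stop-idle-instance \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
    --statistic Average \
    --period 300 \
    --evaluation-periods 6 \
    --threshold 10 \
    --comparison-operator LessThanThreshold \
    --alarm-actions arn:aws:automate:us-east-1:ec2:stop
```

Swapping the action ARN suffix to :terminate, :reboot, or :recover gives the other alarm actions mentioned in the lecture.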
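The aggregation statistics described above can be reproduced locally to build intuition. A minimal sketch, using made-up latency samples rather than real CloudWatch data, and a simple nearest-rank approximation for the percentile:

```shell
# Hypothetical samples (e.g. request latencies in ms) from one period
samples="120 135 98 142 110 127 150 105 133 118"

# Sort numerically, then compute SampleCount, Sum, Average, Min, Max, and p95
stats=$(printf '%s\n' $samples | sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
        n = NR
        idx = int(0.95 * n); if (idx < 1) idx = 1   # nearest-rank p95
        printf "SampleCount=%d Sum=%d Average=%.1f Min=%d Max=%d p95=%d",
               n, sum, sum / n, v[1], v[n], v[idx]
    }')
echo "$stats"
```

Here the average is the sum divided by the sample count, exactly as described, and the p95 value is one below which roughly 95% of the samples fall.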