1
00:00:00,450 --> 00:00:01,830
‫Okay, welcome to this section

2
00:00:01,830 --> 00:00:03,820
‫on monitoring, troubleshooting, and audits.

3
00:00:03,820 --> 00:00:05,510
‫We are going to learn about CloudWatch,

4
00:00:05,510 --> 00:00:07,470
‫X-Ray, and CloudTrail and to me

5
00:00:07,470 --> 00:00:09,380
‫it's one of the most exciting sections.

6
00:00:09,380 --> 00:00:11,500
‫So why is monitoring important?

7
00:00:11,500 --> 00:00:12,660
‫I think you already know the answer

8
00:00:12,660 --> 00:00:14,200
‫but I like to say it out loud.

9
00:00:14,200 --> 00:00:15,760
‫We know how to deploy applications.

10
00:00:15,760 --> 00:00:17,520
‫We have seen how to do it safely,

11
00:00:17,520 --> 00:00:20,210
‫automatically using infrastructure as code,

12
00:00:20,210 --> 00:00:21,870
‫leveraging the best AWS components.

13
00:00:21,870 --> 00:00:24,950
‫So we know how to do deployments.

14
00:00:24,950 --> 00:00:27,160
‫What we tend to forget is that once our applications

15
00:00:27,160 --> 00:00:30,250
‫are deployed, our users don't really care how we did it.

16
00:00:30,250 --> 00:00:32,970
‫They don't care if we used Elastic Beanstalk,

17
00:00:32,970 --> 00:00:35,320
‫they don't care if we use infrastructure as code.

18
00:00:35,320 --> 00:00:37,960
‫It's great that we did it, it's engineering prowess,

19
00:00:37,960 --> 00:00:39,700
‫but the users don't care.

20
00:00:39,700 --> 00:00:42,740
‫The users only care that the application is working.

21
00:00:42,740 --> 00:00:46,830
‫And so what we need to watch is, for example, the latency.

22
00:00:46,830 --> 00:00:50,040
‫Will the application latency increase over time and why?

23
00:00:50,040 --> 00:00:52,380
‫Outages, you know, if there's an outage,

24
00:00:52,380 --> 00:00:55,150
‫well, our customer experience should not be degraded, okay?

25
00:00:55,150 --> 00:00:57,020
‫It should still be good, that's why we deploy

26
00:00:57,020 --> 00:00:58,900
‫highly available things.

27
00:00:58,900 --> 00:01:02,040
‫And then if the user contacts the IT department

28
00:01:02,040 --> 00:01:03,810
‫or complains, that's really, really bad.

29
00:01:03,810 --> 00:01:07,610
‫We don't want to be alerted of problems by our users,

30
00:01:07,610 --> 00:01:09,930
‫we kind of want to be able to do troubleshooting

31
00:01:09,930 --> 00:01:12,090
‫and remediation beforehand.

32
00:01:12,090 --> 00:01:15,130
‫So internally, can we prevent issues before they happen,

33
00:01:15,130 --> 00:01:18,100
‫or if they happen, can we see them before our users?

34
00:01:18,100 --> 00:01:20,580
‫Can we also monitor performance and cost?

35
00:01:20,580 --> 00:01:23,300
‫Can we look at trends in terms of how things scale,

36
00:01:23,300 --> 00:01:25,350
‫in terms of patterns of outages?

37
00:01:25,350 --> 00:01:28,770
‫And, you know, what can we learn and how can we improve?

38
00:01:28,770 --> 00:01:30,050
‫Thanks to this monitoring.

39
00:01:30,050 --> 00:01:33,280
‫So to me monitoring is really, really, really important.

40
00:01:33,280 --> 00:01:35,810
‫Now, in AWS, there is CloudWatch.

41
00:01:35,810 --> 00:01:38,440
‫And CloudWatch allows you to collect metrics.

42
00:01:38,440 --> 00:01:40,630
‫It allows you to collect logs to monitor

43
00:01:40,630 --> 00:01:42,310
‫and analyze the log files.

44
00:01:42,310 --> 00:01:44,850
‫Events to send notifications when certain things happen

45
00:01:44,850 --> 00:01:46,310
‫in your AWS environment.

46
00:01:46,310 --> 00:01:48,650
‫And alarms to react in real time

47
00:01:48,650 --> 00:01:51,060
‫to metrics, events, and even logs.

48
00:01:51,060 --> 00:01:54,430
‫Then we have X-Ray and X-Ray is kind of a new service

49
00:01:54,430 --> 00:01:55,640
‫that is not very popular yet,

50
00:01:55,640 --> 00:01:57,920
‫but I think it is one of the most awesome ones.

51
00:01:57,920 --> 00:01:59,390
‫And so it allows you to troubleshoot

52
00:01:59,390 --> 00:02:01,250
‫your application performance and errors,

53
00:02:01,250 --> 00:02:05,170
‫so we'll see the latency and we'll see the errors just live.

54
00:02:05,170 --> 00:02:06,900
‫And it allows us to do something really cool

55
00:02:06,900 --> 00:02:09,310
‫called distributed tracing of microservices.

56
00:02:09,310 --> 00:02:10,880
‫So if you have a lot of services

57
00:02:10,880 --> 00:02:13,240
‫doing a lot of things and calling one another,

58
00:02:13,240 --> 00:02:15,060
‫or if you're interacting with many AWS components,

59
00:02:15,060 --> 00:02:18,120
‫such as, you know, S3, DynamoDB, et cetera,

60
00:02:18,120 --> 00:02:21,050
‫then you're able to see how your application makes calls

61
00:02:21,050 --> 00:02:23,490
‫and how long they take and you can trace your call

62
00:02:23,490 --> 00:02:26,040
‫all the way through, which is really, really nice.

63
00:02:26,040 --> 00:02:28,940
‫CloudTrail allows you to do internal monitoring

64
00:02:28,940 --> 00:02:32,060
‫of the API calls being made, and also audit the changes

65
00:02:32,060 --> 00:02:35,100
‫made to AWS resources by your users.

66
00:02:35,100 --> 00:02:38,360
‫So overall, these three technologies all together

67
00:02:38,360 --> 00:02:41,400
‫give you a really solid combination to monitor AWS.

68
00:02:41,400 --> 00:02:43,330
‫We're going to learn about these in this section.

69
00:02:43,330 --> 00:02:44,880
‫So see you in the next lecture.