1 00:00:00,450 --> 00:00:01,830 Okay, welcome to this section 2 00:00:01,830 --> 00:00:03,820 on monitoring, troubleshooting, and audits. 3 00:00:03,820 --> 00:00:05,510 We are going to learn about CloudWatch, 4 00:00:05,510 --> 00:00:07,470 X-Ray, and CloudTrail, and to me 5 00:00:07,470 --> 00:00:09,380 it's one of the most exciting sections. 6 00:00:09,380 --> 00:00:11,500 So why is monitoring important? 7 00:00:11,500 --> 00:00:12,660 I think you already know the answer 8 00:00:12,660 --> 00:00:14,200 but I like to say it out loud. 9 00:00:14,200 --> 00:00:15,760 We know how to deploy applications. 10 00:00:15,760 --> 00:00:17,520 We have seen how to do it safely, 11 00:00:17,520 --> 00:00:20,210 automatically using infrastructure as code, 12 00:00:20,210 --> 00:00:21,870 leveraging the best AWS components. 13 00:00:21,870 --> 00:00:24,950 So we know how to do deployments. 14 00:00:24,950 --> 00:00:27,160 What we don't know is that once our applications 15 00:00:27,160 --> 00:00:30,250 are deployed, our users don't really care how we did it. 16 00:00:30,250 --> 00:00:32,970 They don't care if we used Elastic Beanstalk, 17 00:00:32,970 --> 00:00:35,320 they don't care if we used infrastructure as code. 18 00:00:35,320 --> 00:00:37,960 It's great that we did it, it's an engineering prowess, 19 00:00:37,960 --> 00:00:39,700 but the users don't care. 20 00:00:39,700 --> 00:00:42,740 The users only care that the application is working. 21 00:00:42,740 --> 00:00:46,830 And so what we need to monitor is, for example, the latency. 22 00:00:46,830 --> 00:00:50,040 Will the application latency increase over time and why? 23 00:00:50,040 --> 00:00:52,380 Outages, you know, if there's an outage, 24 00:00:52,380 --> 00:00:55,150 well, our customer experience should not be degraded, okay? 25 00:00:55,150 --> 00:00:57,020 It should still be good, that's why we deploy 26 00:00:57,020 --> 00:00:58,900 highly available things.
27 00:00:58,900 --> 00:01:02,040 And then if the users contact the IT department 28 00:01:02,040 --> 00:01:03,810 and complain, that's really, really bad. 29 00:01:03,810 --> 00:01:07,610 We don't want to be alerted to problems by our users, 30 00:01:07,610 --> 00:01:09,930 we kind of want to be able to do troubleshooting 31 00:01:09,930 --> 00:01:12,090 and remediation beforehand. 32 00:01:12,090 --> 00:01:15,130 So internally, can we prevent issues before they happen, 33 00:01:15,130 --> 00:01:18,100 or if they happen, can we see them before our users? 34 00:01:18,100 --> 00:01:20,580 Can we also monitor performance and cost? 35 00:01:20,580 --> 00:01:23,300 Can we look at trends in terms of how things scale, 36 00:01:23,300 --> 00:01:25,350 in terms of patterns of outages? 37 00:01:25,350 --> 00:01:28,770 And, you know, what can we learn and how can we improve, 38 00:01:28,770 --> 00:01:30,050 thanks to this monitoring? 39 00:01:30,050 --> 00:01:33,280 So to me monitoring is really, really, really important. 40 00:01:33,280 --> 00:01:35,810 Now, in AWS there's CloudWatch. 41 00:01:35,810 --> 00:01:38,440 And CloudWatch allows you to collect metrics. 42 00:01:38,440 --> 00:01:40,630 It allows you to collect logs to monitor 43 00:01:40,630 --> 00:01:42,310 and analyze the log files. 44 00:01:42,310 --> 00:01:44,850 Events to send notifications when certain things happen 45 00:01:44,850 --> 00:01:46,310 in your AWS environment. 46 00:01:46,310 --> 00:01:48,650 And alarms to react in real time 47 00:01:48,650 --> 00:01:51,060 to metrics, events, and even logs. 48 00:01:51,060 --> 00:01:54,430 Then we have X-Ray, and X-Ray is kind of a new service 49 00:01:54,430 --> 00:01:55,640 that is not very popular yet, 50 00:01:55,640 --> 00:01:57,920 but I think it is one of the most awesome ones.
51 00:01:57,920 --> 00:01:59,390 And so it allows you to troubleshoot 52 00:01:59,390 --> 00:02:01,250 your application performance and errors, 53 00:02:01,250 --> 00:02:05,170 so we'll see the latency and we'll see the errors just live. 54 00:02:05,170 --> 00:02:06,900 And it allows us to do something really cool 55 00:02:06,900 --> 00:02:09,310 called distributed tracing of microservices. 56 00:02:09,310 --> 00:02:10,880 So if you have a lot of services 57 00:02:10,880 --> 00:02:13,240 doing a lot of things and calling one another, 58 00:02:13,240 --> 00:02:15,060 or if you interact with many AWS components, 59 00:02:15,060 --> 00:02:18,120 such as, you know, S3, DynamoDB, et cetera, 60 00:02:18,120 --> 00:02:21,050 then you're able to see how your application makes calls 61 00:02:21,050 --> 00:02:23,490 and how long they take, and you can trace your call 62 00:02:23,490 --> 00:02:26,040 all the way through, which is really, really nice. 63 00:02:26,040 --> 00:02:28,940 CloudTrail allows you to do internal monitoring 64 00:02:28,940 --> 00:02:32,060 of the API calls being made, and also audit the changes 65 00:02:32,060 --> 00:02:35,100 made to AWS resources by your users. 66 00:02:35,100 --> 00:02:38,360 So overall, these three technologies all together 67 00:02:38,360 --> 00:02:41,400 give you a really solid combination to monitor AWS. 68 00:02:41,400 --> 00:02:43,330 We're going to learn about these in this section. 69 00:02:43,330 --> 00:02:44,880 So see you in the next lecture.