Welcome back to BackSpace Academy. In this lecture we're going to be building on what we already know about CloudWatch from the Cloud Practitioner course. We know that we can use it for infrastructure monitoring, for example to trigger an auto scaling event that adds EC2 instances to our infrastructure when demand is high. We know we can use it to alert us when our AWS bill is going to exceed a certain level. We can also use it for application monitoring, so not only monitoring that our infrastructure is working, but also looking at our application and making sure it is doing what we intended it to do and performing exactly how we would like it to perform. We'll look at CloudWatch metrics and alarms specifically for containers in Container Insights. We'll look at CloudWatch Logs, how we can filter and analyze those logs, and the different services that can integrate with CloudWatch Logs. We'll look at how AWS services can subscribe to log events from multiple sources and multiple accounts and collate them in one place through the use of CloudWatch subscriptions, and finally we'll look at securing our CloudWatch logs as well.

We already know about metrics and what they can do. We know that there are default metrics for EC2 and SQS, but there are also default metrics for many other AWS services. In fact, more than 70 AWS services have default metrics available for you to use. That includes not only EC2 and SQS but also DynamoDB, S3, ECS, Lambda, API Gateway and many more. If those default metrics don't do what we want, we can also create our own custom metrics. The way we do that is to have an application deliver data directly to CloudWatch using the CloudWatch API, an SDK or the command-line interface, with the put-metric-data command, which sends the data we have collected to CloudWatch. For example, you might have an Internet of Things device that is acquiring data from a sensor, and you would like to put that data somewhere. Your application could use the put-metric-data command through the AWS SDK to send that data to the CloudWatch service, and from there you can analyze it, view it and do whatever you want with it.
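Just as a sketch of that flow, here is roughly what that call looks like using the AWS SDK for JavaScript v3. The namespace, metric name and dimension are made-up example values, not anything CloudWatch defines for you.

```ts
import { CloudWatchClient, PutMetricDataCommand } from "@aws-sdk/client-cloudwatch";

const cloudwatch = new CloudWatchClient({ region: "us-east-1" });

// Publish one sensor reading as a data point on a custom metric.
// "IoT/Sensors", "Temperature" and the DeviceId dimension are
// illustrative names; you choose your own namespace and metric names.
await cloudwatch.send(new PutMetricDataCommand({
  Namespace: "IoT/Sensors",
  MetricData: [{
    MetricName: "Temperature",
    Dimensions: [{ Name: "DeviceId", Value: "sensor-42" }],
    Timestamp: new Date(),
    Value: 21.7,
    Unit: "None",
  }],
}));
```

Once a few data points like this have arrived, the metric appears under that namespace in the CloudWatch console just like any default metric.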
Now, standard custom metrics are at a one-minute resolution, but there are also high-resolution custom metrics that can go down to one-second resolution. One thing to consider, though, is that the pricing per data point is the same regardless of resolution, so higher resolution will of course cost you more, simply because you are sending more data points. All metrics expire after a defined retention period: if you've got high-resolution custom metrics at less than 60 seconds of resolution, CloudWatch will hold those data points for up to 3 hours, and after that they expire and are removed from the system. If it's 1-minute resolution the retention is 15 days, if it's 5 minutes it's 63 days, and if you have a one-hour resolution between your samples it's 455 days, or 15 months. You cannot delete those metrics manually; they must expire after that defined retention period. If an instance is terminated or monitoring is disabled, the previously collected metrics will still be available for you to view. Now, if you find that the retention period is not long enough for you and you want to archive further, you can download your data points using the CloudWatch API, an SDK or the command-line interface with the get-metric-statistics command. That allows you to download the data points over a defined period, and from there you can store them somewhere else, such as Amazon S3. If you use CloudWatch Logs to store your log data, it will store it indefinitely by default, and you can change that retention period to whatever you want as well.

Any CloudWatch metric, whether it's default or custom, can have an alarm associated with it, and that alarm will be triggered when a threshold is met. An example of an alarm would be on EC2 CPU utilization, and we could use that alarm to trigger an auto scaling event. We could use the SQS queue length to add worker EC2 instances on a processing line. We could also have an alarm that alerts us when our AWS bill exceeds a certain level. Alarms are triggered when a threshold level is met, so you set a target value and then choose whether the metric must be greater than, greater than or equal to, less than, or less than or equal to that target value for the alarm to be triggered.
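As a rough sketch, creating an alarm like that from code could look like the following; the alarm name, instance ID and SNS topic ARN are placeholders.

```ts
import { CloudWatchClient, PutMetricAlarmCommand } from "@aws-sdk/client-cloudwatch";

const cloudwatch = new CloudWatchClient({ region: "us-east-1" });

// Alarm when average EC2 CPU utilization is at or above 80% for two
// consecutive 5-minute periods; notify a (placeholder) SNS topic.
await cloudwatch.send(new PutMetricAlarmCommand({
  AlarmName: "high-cpu",
  Namespace: "AWS/EC2",
  MetricName: "CPUUtilization",
  Dimensions: [{ Name: "InstanceId", Value: "i-0123456789abcdef0" }],
  Statistic: "Average",
  Period: 300,
  EvaluationPeriods: 2,
  Threshold: 80,
  ComparisonOperator: "GreaterThanOrEqualToThreshold",
  AlarmActions: ["arn:aws:sns:us-east-1:123456789012:alerts"],
}));
```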
If you are using high-resolution custom metrics, then you can establish a high-resolution alarm for periods as low as 10 seconds. The history of alarms is retained for up to 14 days and available for you to view.

Not only can we collect metric data and view that metric data, we can also apply basic statistics to that data. We can apply metric data aggregations over a specified period of time: the minimum or maximum value, the sum of values, the average of all the data points, the sample count, or a percentile of the data within that period of time. CloudWatch supports many unit types, and if you want to know what they are, just go to the CloudWatch API documentation and look up MetricDatum; that lists all of the different units that are available. The periods for our data aggregations are defined in seconds, and can be one second, five seconds, 10 seconds, 30 seconds, or any multiple of 60 seconds. Sub-minute periods are only available if you are using high-resolution custom metrics; they are not available with standard monitoring.

Not only can you collect data, you can also view those metrics and alarms in the console with CloudWatch dashboards. They are a customizable, auto-refreshing view of that data. You can also create cross-account and cross-region dashboards if you need to. If live data is switched on, data is shown as soon as it is published for a period; so when that period is up and the data for that specific period is published, it will appear on the dashboard. If live data is switched off, the data will be shown one minute later. Those dashboards can obviously be created using the AWS Management Console, but they can also be created programmatically using the command-line interface, the API or one of the many software development kits, with the put-dashboard command. Anyone who wants to view those dashboards will need IAM permission to do so: that could be administrator access or the CloudWatch full access policy, or you could create a custom policy with get-dashboard, list-dashboards, put-dashboard (to create a dashboard) or delete-dashboards permissions attached to it.
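As a minimal sketch of the programmatic route, creating a dashboard could look like this; the dashboard name and the widget layout are invented for illustration.

```ts
import { CloudWatchClient, PutDashboardCommand } from "@aws-sdk/client-cloudwatch";

const cloudwatch = new CloudWatchClient({ region: "us-east-1" });

// The dashboard body is a JSON document describing the widgets.
const body = {
  widgets: [{
    type: "metric",
    x: 0, y: 0, width: 12, height: 6,
    properties: {
      metrics: [["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"]],
      period: 300,
      stat: "Average",
      region: "us-east-1",
      title: "EC2 CPU",
    },
  }],
};

await cloudwatch.send(new PutDashboardCommand({
  DashboardName: "example-dashboard", // placeholder name
  DashboardBody: JSON.stringify(body),
}));
```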
CloudWatch Synthetics allows you to create configurable scripts, or canaries, to monitor web endpoints and APIs. What that means is, if you want to not only check that a website is up and available, but also check that it is performing and doing exactly what it is supposed to do, you can create a script that tests that for you, to make life easier for you. There are blueprints available, and you can choose from: a heartbeat monitor, just to check that the endpoint or API is actually accessible; an API canary that can send an HTTP POST request, or whatever you need, to an API and check that it is responding correctly; a broken link checker that can go through a website and make sure that there are no broken links; and a graphical user interface workflow blueprint that is based upon Puppeteer.

Now this is where the real power of Synthetics comes in. Puppeteer, not to be confused with Puppet, is a Node.js package that allows you to drive a headless browser for web automation. If you're not a developer, you're probably saying: what is that? This is a very powerful tool. I've been using web automation for many, many years and it is part of the really powerful tool kit that I use quite a bit. I used PhantomJS quite a lot, and since Google came out with their headless Chrome browser in 2017 I've been using that with Puppeteer. Both of them are very good. What they do is allow you to interact with a website without actually having a browser open; you can do it all with code. So you can write code to enter text into a text box and then click on submit, you can wait until a button or some CSS element appears, and you can do all of this through code. From the backend, so from the website's perspective, it doesn't know whether it's communicating with a real person using a browser or whether it is actually just code sending commands to a headless browser. That is very powerful, because we can use it for testing: we can go through a web page just as a normal user would, we can download things, we can input things, and we can do everything that a normal user would do.
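To make that concrete, here's a minimal Puppeteer sketch of the kind of scripted interaction I'm describing; the URL and the CSS selectors are made up for illustration.

```ts
import puppeteer from "puppeteer";

// Drive a headless browser through a form just like a user would:
// type into a box, submit, and wait for a result element to appear.
const browser = await puppeteer.launch();
const page = await browser.newPage();

await page.goto("https://example.com/search"); // placeholder URL
await page.type("#query", "hello world");      // placeholder selectors
await page.click("#submit");
await page.waitForSelector(".results");        // wait until results render

console.log("Search flow completed");
await browser.close();
```

In a Synthetics canary, the same Puppeteer logic runs inside the canary script on the schedule you define, rather than as a standalone program like this.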
So we can set up CloudWatch Synthetics to run through this GUI test using Puppeteer: we write our code in Puppeteer, and Synthetics will run that test against our application on a regular schedule that we define. This is great because, although we may have unit testing at the design and development stage of our application, things do change, and there may be things that we didn't pick up and only discover later when the application goes live; that's where this really comes in and is really powerful. Those Synthetics tests can be scheduled at intervals as short as one minute, and statistics and details of the runs can be viewed in the console. Another great feature is that it integrates with CloudWatch ServiceLens, which we'll talk about on the next slide.

CloudWatch ServiceLens allows you to visualize and analyze the health, performance and availability of your applications in a single place, and it does that by integrating AWS X-Ray with CloudWatch. Now, you'll learn a lot about AWS X-Ray if you're going on to do the AWS Certified Developer course with BackSpace Academy, but if you're not: it's a service that allows you to insert code within your application to send data to the AWS X-Ray service. The X-Ray service collects those data points and displays them, which means you can see how long it takes for different functions within your application to run, and it allows you to pick up on any errors or bottlenecks within your application when it's running. The way it works is that we integrate X-Ray with CloudWatch by first deploying the X-Ray code in the application, and we do that using one of the software development kits. Then, separate from our application, we have to install the X-Ray daemon, and what the daemon does is communicate between our application and the AWS X-Ray service, so the X-Ray service can collect that data. Then we need to install the CloudWatch agent on our server, and that will allow the X-Ray data points to be sent off to CloudWatch, and from there we can monitor and view them.
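As a hedged sketch of that instrumentation step, this is roughly what wiring the X-Ray SDK into a Node.js Express application looks like; the service name is a placeholder, and the X-Ray daemon still has to be running separately for the data to go anywhere.

```ts
import express from "express";
import AWSXRay from "aws-xray-sdk";

const app = express();

// Open an X-Ray segment for every incoming request; the timing data
// is sent to the locally running X-Ray daemon, which forwards it on
// to the X-Ray service. "my-app" is a placeholder service name.
app.use(AWSXRay.express.openSegment("my-app"));

app.get("/", (_req, res) => {
  res.send("ok");
});

// Close the segment once the response has been sent.
app.use(AWSXRay.express.closeSegment());

app.listen(3000);
```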
Container Insights provides metrics and logs specifically for containerized applications and microservices. Some examples of those metrics are CPU utilization, memory, storage and task count. It's available for Elastic Container Service, Elastic Kubernetes Service, and also for Kubernetes running on EC2. Container Insights metrics are charged as custom metrics.

CloudWatch Logs allows you to monitor, store and access log files from a number of AWS services such as EC2, VPC Flow Logs, CloudTrail, API Gateway, ECS, EKS, Lambda, RDS, Route 53 and SNS, just to name a few. Log events in CloudWatch Logs consist of a timestamp and then a message. Log streams are sequences of these log events that come from the same source, and log groups are groups of these log streams that share the same retention, monitoring and access control settings. You can apply metric filters to the data collected by the CloudWatch Logs service by assigning a metric filter to an individual log group, and by doing that the metric filter will be applied to all of the log streams associated with that group. Now, by default, unlike standard CloudWatch metrics, the logs are kept indefinitely and never expire, but you can change this by modifying the retention settings. So by creating retention settings and assigning them to a log group, any streams that are associated with that log group will have those retention settings applied to them.

CloudWatch Logs Insights is a fully managed service that allows you to analyze data stored in CloudWatch Logs, and it does that with a really fast, interactive querying service; it also allows you to visualize the results afterwards in line and area charts. It uses a really simple query language with simple commands, such as display to choose which fields appear in the results, fields to retrieve certain fields, filter to apply a filter to the results, stats to apply statistics to an aggregation of those results, limit to limit the number of results, and parse to conduct parse operations on those results. You can apply arithmetic operations and regular expressions to those results as well. There are a number of built-in functions, such as string functions, date/time functions, functions that operate on IP addresses, and functions that apply across aggregations of data, and again the results can be visualized in line or area charts.
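Here is a small sketch of running such a query from code; the log group name is a placeholder, and because queries run asynchronously the script polls for the results until the query completes.

```ts
import {
  CloudWatchLogsClient,
  StartQueryCommand,
  GetQueryResultsCommand,
} from "@aws-sdk/client-cloudwatch-logs";

const logs = new CloudWatchLogsClient({ region: "us-east-1" });

// Count log events in 5-minute bins over the last hour.
const { queryId } = await logs.send(new StartQueryCommand({
  logGroupName: "/my/app/logs", // placeholder log group
  startTime: Math.floor(Date.now() / 1000) - 3600,
  endTime: Math.floor(Date.now() / 1000),
  queryString: "stats count(*) by bin(5m)",
}));

// Poll until the query status is no longer Running/Scheduled.
let results = await logs.send(new GetQueryResultsCommand({ queryId }));
while (results.status === "Running" || results.status === "Scheduled") {
  await new Promise((resolve) => setTimeout(resolve, 1000));
  results = await logs.send(new GetQueryResultsCommand({ queryId }));
}
console.log(results.results);
```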
CloudWatch Contributor Insights is a great tool if you're conducting a root cause analysis into problems in your infrastructure. It helps you to identify the top contributors that are influencing your system performance. It does this by having you create rules in JSON, or JavaScript Object Notation, and those rules allow Contributor Insights to evaluate patterns in the streams of log events that are sent to CloudWatch Logs. To make life easier for you, there is a wizard that can develop these JSON rules for you, so you don't have to actually write them yourself as such, and there are also sample rules available to create that JSON. So if you're not too comfortable creating JSON, the wizard and the sample rules are there for you. The results can also be sorted and filtered according to what you require. Here we've got Contributor Insights for a DynamoDB table, and we can see the most accessed items in a good line graph; we can clearly identify at what time which items were most accessed, and also the most throttled items. So again, we can see which areas of this DynamoDB table are under load, as such, and we might be able to make some changes to our schema or to our infrastructure to accommodate that.

CloudWatch subscriptions allow a Kinesis stream, Kinesis Data Firehose or AWS Lambda to subscribe to a real-time feed of log events from CloudWatch Logs. It achieves this by having you define a subscription filter, which defines which events are to be delivered to that subscribed service, and then you define the ARN, the Amazon Resource Name, of that particular resource, be it a Kinesis stream, a Lambda function or whatever it is. Another really good feature of subscriptions is that you can create an IAM role that will allow log data from other AWS accounts to be subscribed to as well. So you could have one account with a Kinesis stream, or a Data Firehose delivering to S3 or whatever, receiving a real-time feed from multiple AWS accounts.
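A minimal sketch of creating such a subscription filter with the SDK follows; the log group, filter pattern, stream ARN and role ARN are all placeholders.

```ts
import {
  CloudWatchLogsClient,
  PutSubscriptionFilterCommand,
} from "@aws-sdk/client-cloudwatch-logs";

const logs = new CloudWatchLogsClient({ region: "us-east-1" });

// Deliver only events containing "ERROR" from this log group to a
// Kinesis stream in real time. All names and ARNs are placeholders.
await logs.send(new PutSubscriptionFilterCommand({
  logGroupName: "/my/app/logs",
  filterName: "errors-to-kinesis",
  filterPattern: "ERROR",
  destinationArn: "arn:aws:kinesis:us-east-1:123456789012:stream/log-feed",
  // Role that lets CloudWatch Logs put records onto the stream.
  roleArn: "arn:aws:iam::123456789012:role/CWLtoKinesisRole",
}));
```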
The data in CloudWatch Logs can be encrypted at rest by simply associating an AWS Key Management Service (KMS) customer master key with a particular log group, and so any log streams that are associated with that log group will be encrypted at rest with that KMS key.

Identity-based IAM policies can be used to restrict access from an IAM entity to log groups, log streams or destinations for CloudWatch Logs. This entity could be a user or a group in your account, but it could also be someone who has been granted permissions through a role, and by doing that they may be able to access your data securely, with the permissions defined, from another account. You can also have resource-based policies, and they can be attached to the actual resources: the CloudWatch Logs destinations that are defined in the CloudWatch subscription service. By doing that, you restrict access to that particular resource by applying the policy to the resource itself. If the service that you're collecting data from resides within a VPC, you can encrypt that data in transit between your VPC and the CloudWatch service by using a VPC endpoint, and that will allow for private communication between the resources you're monitoring inside your VPC and the CloudWatch service.

So that brings us to the end of this lecture. I hope you enjoyed it, and I look forward to seeing you in the next one.