Please note that this content is targeted at SysOps administrators. If you're a Solutions Architect or a developer, you may want to skip over this one.

Welcome back to BackSpace Academy. If you're a systems administrator, you're not only going to be heavily involved in troubleshooting services when they go wrong, you're also going to be expected to implement systems that can alert you to problems and, beyond that, invoke other services that may be able to correct the problem. So we'll go through one of the most important of those services, the EC2 service, and look at the CloudWatch metrics that are available. We'll also look at custom metrics, different metrics that we can implement that are not standard within CloudWatch. We'll look at CloudWatch statistics, which provide us with aggregate data of the metrics being recorded. We'll look at the different types of actions that are available from a CloudWatch alarm, and then finally we'll look at Elastic Load Balancer monitoring as well.

Among the standard EC2 CloudWatch metrics available to us, for our T2 burstable instances we can keep track of our CPU credit usage and credit balance, to make sure that we always have that ability to burst when required. For general instance metrics we have CPU utilization, disk I/O, and network information that we can monitor as metrics, and we also have our status checks that we can monitor as a metric as well.

CloudWatch metrics can be filtered using a dimension. For example, if you wanted to get just the metrics for all of the instances within a specific Auto Scaling group, you can use the Auto Scaling group name, or you can use the image ID, the instance ID, or the instance type. The available metrics will be listed in the CloudWatch console, but you can also use the command-line interface, with the CloudWatch list-metrics command, to list them as well.

Detailed monitoring can be enabled, and that will give you one-minute intervals for your CloudWatch metrics. You can do that at launch, or you can do it for existing instances using the EC2 console, but you can also use the command-line interface: when you're running an ec2 run-instances command you can set monitoring to Enabled=true for detailed monitoring, or you can run the ec2 monitor-instances command against an existing instance.

It is possible to create your own custom metrics, and they can be collected and published to CloudWatch from your EC2 instances. You can do that using the CloudWatch put-metric-data command in the CLI, or, if you're using an SDK in an application running on your EC2 instance, you can use PutMetricData, and that will enable you to publish that information to CloudWatch, and CloudWatch will collect and monitor it for you. You can have that at the standard resolution of one minute, or up to a high resolution of one second.

There are also CloudWatch monitoring scripts available, which again produce custom metrics. These are Perl scripts that run on your EC2 instances, and they can collect memory, swap, and disk-space utilization data. If you would like to have that monitored on a regular basis, you can create a cron job, and that will publish the data at regular intervals to the CloudWatch service; from there you can view that information as you would any other CloudWatch metric.

A lot of the time it is not really beneficial to look at instantaneous metrics; instead, we might want to look at an aggregation of that data over a specific period of time. So we can look at the minimum or maximum levels that occurred over that period. We can look at the sum of all of the values that were submitted during that period, from all of the samples that were taken. We can also look at the sample count, which is the number of samples received over that period, and the average, which will of course be the sum of all those values divided by the sample count. Finally, we can look at a specified percentile: if we have a percentile of 95.45, that means that 95.45% of all of the data collected will be lower than this value; if we used 94, then 94% of that data would be below this value. So we can select a specific percentile and report our metrics based upon that.

One of the great features of using CloudWatch with the EC2 service is being able to use alarm actions, which can automatically stop, terminate, reboot, or recover our instances for us. We don't need to intervene in any way; it will happen automatically once it's set up. They can be created using either the EC2 or the CloudWatch console, and there are a number of very good use cases for this feature. We could use it to stop idle instances that are not really being used. We can use it to stop web servers that are getting unusually high traffic, for example because they're being attacked, by looking at the network-out metric. We can stop an instance that is experiencing a memory leak. We can stop impaired instances that have failed their status checks, and we can terminate an instance when a job has finished; for example, you might have a batch job to process a video, and you can terminate that instance when it is completed.

In addition to monitoring our EC2 instances, we can also monitor our Elastic Load Balancers. Out of the box we have CloudWatch metrics, and they will be monitored and reported at sixty-second, or one-minute, intervals. In addition to that we have access logs: when enabled, the Elastic Load Balancer can publish log files recording information at anywhere from five- to sixty-minute intervals, and those logs will be saved to Amazon S3, where we can access them if we need to. A feature of Application Load Balancers, not Classic Load Balancers, is request tracing, which enables us to track our HTTP requests from our clients to our targets or other services. It does that by adding or updating the trace ID header (X-Amzn-Trace-Id) before passing the request on, and that is again integrated with the ELB access logs. And finally, we can look at implementing CloudTrail on our Elastic Load Balancer to log any API calls to our load balancer.

So that's all I need to discuss from a high level around monitoring of EC2 and ELB. Coming up next we'll have a hands-on session to apply this stuff, so I'll see you in that one.
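As a sketch of the dimension filtering mentioned in the lecture, assuming the AWS CLI is configured and using a placeholder instance ID and Auto Scaling group name:

```shell
# List CloudWatch metrics in the EC2 namespace for a single instance
aws cloudwatch list-metrics \
    --namespace AWS/EC2 \
    --dimensions Name=InstanceId,Value=i-1234567890abcdef0

# Or filter by Auto Scaling group instead ("my-asg" is a placeholder)
aws cloudwatch list-metrics \
    --namespace AWS/EC2 \
    --dimensions Name=AutoScalingGroupName,Value=my-asg
```

These commands require live AWS credentials, so they are illustrative fragments rather than a runnable script.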
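The two ways of enabling detailed monitoring from the CLI might look like this; the AMI and instance IDs are placeholders:

```shell
# Enable detailed (1-minute) monitoring at launch
aws ec2 run-instances \
    --image-id ami-0abcdef1234567890 \
    --instance-type t2.micro \
    --monitoring Enabled=true

# Enable detailed monitoring on an instance that is already running
aws ec2 monitor-instances --instance-ids i-1234567890abcdef0
```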
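A custom metric can be published with put-metric-data as described; here is a hedged sketch using a made-up namespace, metric name, and value:

```shell
# Publish a hypothetical memory-utilization reading to a custom namespace
aws cloudwatch put-metric-data \
    --namespace "Custom/MyApp" \
    --metric-name MemoryUtilization \
    --dimensions InstanceId=i-1234567890abcdef0 \
    --unit Percent \
    --value 72.5

# For high-resolution (1-second) metrics, add: --storage-resolution 1
```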
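Scheduling the CloudWatch monitoring scripts with cron, as mentioned, could be done with a crontab entry along these lines (the install path is an assumption):

```shell
# Crontab entry: publish memory, swap, and disk-space utilization every 5 minutes
*/5 * * * * ~/aws-scripts-mon/mon-put-instance-data.pl --mem-util --swap-util --disk-space-util --disk-path=/ --from-cron
```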
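An alarm action like the "stop idle instances" use case could be sketched as follows; the alarm name, thresholds, instance ID, and region are all illustrative choices, not prescribed values:

```shell
# Stop an instance whose average CPU has stayed below 10% for 30 minutes
# (six 5-minute evaluation periods), i.e. a likely idle instance
aws cloudwatch put-metric-alarm \
    --alarm-name stop-idle-instance \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
    --statistic Average \
    --period 300 \
    --evaluation-periods 6 \
    --threshold 10 \
    --comparison-operator LessThanThreshold \
    --alarm-actions arn:aws:automate:us-east-1:ec2:stop
```

Swapping the action ARN suffix to :terminate, :reboot, or :recover gives the other alarm actions mentioned in the lecture.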
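The aggregation statistics described above can be reproduced locally to build intuition. A minimal sketch, using made-up latency samples rather than real CloudWatch data, and a simple nearest-rank approximation for the percentile:

```shell
# Hypothetical samples (e.g. request latencies in ms) from one period
samples="120 135 98 142 110 127 150 105 133 118"

# Sort numerically, then compute SampleCount, Sum, Average, Min, Max, and p95
stats=$(printf '%s\n' $samples | sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
        n = NR
        idx = int(0.95 * n); if (idx < 1) idx = 1   # nearest-rank p95
        printf "SampleCount=%d Sum=%d Average=%.1f Min=%d Max=%d p95=%d",
               n, sum, sum / n, v[1], v[n], v[idx]
    }')
echo "$stats"
```

Here the average is the sum divided by the sample count, exactly as described, and the p95 value is one below which roughly 95% of the samples fall.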