Please note that this content is targeted at SysOps Administrators. If you're a Solutions Architect or a developer, you may want to skip over this one.
Welcome back to BackSpace Academy. One thing that a SysOps administrator will be doing a lot of is troubleshooting services, and one of the services you will no doubt be involved in troubleshooting is Amazon EC2. This lecture runs through the different issues that may arise with the EC2 service and how to troubleshoot them. We'll look at launch issues, then at failed status checks, then at cases where you are unable to terminate or stop an instance, and finally at connection issues with Linux instances and with Windows instances.
There is a range of probable causes that can lead to your EC2 instance failing to launch. The first, and probably the most significant, is that you've exceeded the EC2 service limits. That can cover a range of things. For example, for a lot of instance types you may only be able to launch one or two, while for others you may be able to launch 20. If you exceed that limit, AWS is not going to let you launch any more instances. The same goes for the EBS volume limits: if you exceed those, you are also going to run into problems when you go to launch an instance.
There are other possible issues. You could have a corrupt EBS snapshot that you are trying to launch your instance with. You could also have a problem with an instance store-backed AMI. These AMIs, and this applies only to instance store-backed AMIs, are stored in Amazon S3 in parts, and when AWS goes to launch, it pieces those parts back together to create the AMI that will be used to launch the instance. If you're missing one of those part files, you're not going to be able to launch that instance.
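As a rough illustration of the missing part file problem, here is a minimal Python sketch. It assumes you already have two lists in hand: the part file names the AMI is expected to consist of, and the object keys actually present in the S3 bucket. The function name `missing_parts` and the file names are hypothetical, chosen only for this example; they are not taken from an AWS API.

```python
def missing_parts(expected_parts, bucket_keys):
    """Return the expected part files that are absent from the bucket listing."""
    present = set(bucket_keys)
    return [part for part in expected_parts if part not in present]


# Hypothetical example data: three parts are expected, one is not in the bucket.
expected_parts = ["image.part.00", "image.part.01", "image.part.02"]
bucket_keys = ["image.manifest.xml", "image.part.00", "image.part.02"]

missing = missing_parts(expected_parts, bucket_keys)
if missing:
    print("AMI cannot be pieced together, missing:", ", ".join(missing))
else:
    print("All part files are present")
```

If the list of missing parts is not empty, the AMI cannot be reassembled and the launch will fail, which matches the behaviour described above.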
You may also have insufficient instance capacity to run your application. For example, if you have a very compute-intensive (or otherwise intensive) application and the instance you selected is a t2.nano or something like that, it may not have enough capacity to actually run that application. Finally, you may encounter account issues: if you haven't paid your bill, you're not going to be able to launch instances. Any one of those can cause your EC2 instances to fail to launch.
As for the action we can take, the first thing to do is look at the state transition reason, which is in the instance description. When we go to the console and click on our instance, we can see that state transition reason. We can also get similar information using the CLI describe-instances command. If we find that it is something around limits, we can check our limits in the EC2 console; there is a menu option on the left-hand side that we can click to look at our available limits. If we find that we're exceeding those limits, for example the number of instances we are able to launch, we can request an increase from AWS to correct the problem.
If an EC2 instance fails its status checks, there are a number of probable causes. It can be memory issues, problems with EBS or with an I/O device, kernel issues, file system issues, or other issues with the operating system. The first action is simply to wait and see whether it resolves itself. If it does not, and it's an EBS-backed instance, we can restart it by stopping and then starting it, or we can relaunch it. With an instance store-backed instance you can't stop it; you'll have to terminate it and then relaunch it to see if that fixes it. You can also retrieve the system log, which is basically the console output for the instance, so for a Linux operating system you can retrieve all of the output from the Linux console. We can also create an instance recovery alarm with CloudWatch, so that if there is an issue, CloudWatch takes an action to recover the instance automatically for us.
If we have problems terminating or stopping our instance, it's most likely a problem with the underlying host computer, or the underlying host computer is processing scripts that have to finish before you can actually stop or terminate the instance.
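Whether an instance failed to launch or appears stuck while stopping, the state and state transition reason can be read from the JSON that `aws ec2 describe-instances --output json` returns. The following is a minimal Python sketch that pulls those fields out of such output. The sample data, including the instance IDs and reason strings, is made up for illustration, and the helper name `summarize_states` is hypothetical.

```python
import json

# Made-up sample shaped like describe-instances JSON output (illustrative only).
SAMPLE_OUTPUT = """
{
  "Reservations": [
    {
      "Instances": [
        {
          "InstanceId": "i-0123456789abcdef0",
          "State": {"Code": 48, "Name": "terminated"},
          "StateTransitionReason": "Client.VolumeLimitExceeded: Volume limit exceeded",
          "StateReason": {
            "Code": "Client.VolumeLimitExceeded",
            "Message": "Client.VolumeLimitExceeded: Volume limit exceeded"
          }
        },
        {
          "InstanceId": "i-0fedcba9876543210",
          "State": {"Code": 64, "Name": "stopping"},
          "StateTransitionReason": "User initiated (2024-01-01 00:00:00 GMT)",
          "StateReason": {
            "Code": "Client.UserInitiatedShutdown",
            "Message": "Client.UserInitiatedShutdown: User initiated shutdown"
          }
        }
      ]
    }
  ]
}
"""


def summarize_states(describe_output):
    """Collect the state, transition reason and reason code for every instance."""
    summaries = []
    for reservation in describe_output.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            summaries.append({
                "id": instance.get("InstanceId"),
                "state": instance.get("State", {}).get("Name"),
                "transition_reason": instance.get("StateTransitionReason", ""),
                "reason_code": instance.get("StateReason", {}).get("Code", ""),
            })
    return summaries


summaries = summarize_states(json.loads(SAMPLE_OUTPUT))
for item in summaries:
    print(f"{item['id']}: {item['state']} ({item['transition_reason']})")
```

A reason code that mentions a limit points back to the limits page in the EC2 console, while an instance that stays in the stopping state for a long time is a candidate for the stop and terminate actions in this lecture.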
Another common problem is that the instance might be inside an Auto Scaling group, or it might be part of Elastic Beanstalk, so every time you terminate the instance it is replaced with another one. That's quite a common problem to encounter. The action you can take is to use the CLI stop-instances command with the force option, or you can create an AMI of the instance, then terminate it and replace it with another instance. If you find that you just cannot terminate the EC2 instance, you're going to have to contact AWS Support, and they can terminate it for you.
If you are having trouble connecting to your EC2 Linux instance, it may be that the instance is overloaded and just doesn't have the resources available to accept the connection. It could be, and most probably would be, a problem with your VPC setup. It could be a problem with the private key you're using to connect to the instance. And if you're trying to ping the instance, it could be a problem with how ICMP is set up.
For the actions available: if it's a VPC issue, make sure your security group rules and your network access control lists are set up to allow inbound traffic on that port. Make sure you have an internet gateway or a virtual private gateway, and that you have a route from your subnet through to that gateway. Also make sure you have a public IP address; if you don't have a public IP, your instance will not be visible on the wider internet. If you have a problem with your private key and it's not recognized, check the format of the key; if you're using PuTTY, make sure you're using the PPK format. It could be that you're using the wrong username to connect: for an Amazon Linux AMI the user is ec2-user, and for an Ubuntu instance it is ubuntu. You also need to make sure that you have permission to access the private key file, and that it is not a completely unprotected file, because either of those will prevent you from connecting to the EC2 Linux instance.
The issues with failing to connect to Windows instances are similar to those for Linux instances. You can still have problems arising from the instance being overloaded, VPC issues, or problems with credentials. But we also have Windows Firewall, which may be giving us issues, and there is a maximum number of RDP sessions that can be concurrently connected to the instance; if we exceed that, we're going to run into issues. With credentials, we need to make sure that the username and password are correct, and that the password has not expired. If we find Windows Firewall issues on our Windows server, we need to disable Windows Firewall and use our security group rules to control RDP access to the instance.
That's all I'm going to talk about on troubleshooting for now. Coming up next, we'll have a hands-on lesson on how to actually use this, so I'll see you in the next one.