0
1
00:00:00,330 --> 00:00:05,229
Please note that this content is
targeted for SysOps Administrators. If
1

2
00:00:05,229 --> 00:00:09,820
you're a Solutions Architect or a
developer you may want to skip over this
2

3
00:00:09,820 --> 00:00:12,060
one.
3

4
00:00:14,710 --> 00:00:20,119
Welcome back to BackSpace Academy.
One thing that a SysOps administrator
4

5
00:00:20,119 --> 00:00:24,830
will be doing a lot of is
troubleshooting of services, and one of
5

6
00:00:24,830 --> 00:00:28,610
those services which you will no doubt
be involved in troubleshooting will be
6

7
00:00:28,610 --> 00:00:34,400
Amazon EC2. So this lecture is going to
run through all of the different issues
7

8
00:00:34,400 --> 00:00:40,250
that may arise with the EC2 service and
how to troubleshoot those. So we'll look
8

9
00:00:40,250 --> 00:00:45,590
at launching issues. We'll look at issues
with failed status checks. We'll also
9

10
00:00:45,590 --> 00:00:51,200
look at cases where you are unable
to terminate or stop an instance and
10

11
00:00:51,200 --> 00:00:55,280
then we'll finally look at connection
issues with the Linux instances and also
11

12
00:00:55,280 --> 00:01:04,759
with Windows instances as well. There are
a range of probable causes that can lead
12

13
00:01:04,759 --> 00:01:11,030
to your EC2 instance failing to launch.
The first one and probably the most
13

14
00:01:11,030 --> 00:01:15,530
significant one would be that you've
exceeded the EC2 service limits, and that
14

15
00:01:15,530 --> 00:01:20,720
can be a range of things. For example, for a
lot of instance types you may only be able to
15

16
00:01:20,720 --> 00:01:25,700
launch one or two of those, others you
may be able to launch 20 of, so if you
16

17
00:01:25,700 --> 00:01:32,179
exceed that limit then AWS is not going
to let you launch any more instances and
17

18
00:01:32,179 --> 00:01:37,819
the same goes for the EBS volume limits:
if you exceed those then you are also
18

19
00:01:37,819 --> 00:01:42,289
going to run into problems when you go
to launch an instance. There are other
19

20
00:01:42,289 --> 00:01:46,789
issues: you could have a corrupt EBS
snapshot that you are trying to launch
20

21
00:01:46,789 --> 00:01:52,640
your instance with. You could also have a
problem with an instance store-backed
21

22
00:01:52,640 --> 00:01:58,069
AMI. With those, and this
is only for instance store-backed AMIs,
22

23
00:01:58,069 --> 00:02:04,220
they will be stored in Amazon S3 in
parts and the AWS service when it goes
23

24
00:02:04,220 --> 00:02:09,410
to launch it will piece those parts back
together to create that AMI that is
24

25
00:02:09,410 --> 00:02:13,700
going to be used to launch that instance.
So if you're missing one of those part
25

26
00:02:13,700 --> 00:02:18,050
files then you're not going to be able
to launch that instance. If you have
26

27
00:02:18,050 --> 00:02:22,220
insufficient instance capacity to run
your application, for example if you
27

28
00:02:22,220 --> 00:02:26,390
have a very compute intensive
application or
28

29
00:02:26,390 --> 00:02:33,709
intensive application, and the instance
that you selected might be a t2.nano or
29

30
00:02:33,709 --> 00:02:36,830
something like that, it may not have
enough capacity to actually run that
30

31
00:02:36,830 --> 00:02:42,020
application. Finally, you may
encounter account issues: if you haven't
31

32
00:02:42,020 --> 00:02:46,220
paid your bill then you're not going to
be able to launch instances. So any one
32

33
00:02:46,220 --> 00:02:53,060
of those can cause you
to fail to launch your EC2 instances. So
33

34
00:02:53,060 --> 00:02:56,810
for the action that we can take, the first
thing we should do is to look at the
34

35
00:02:56,810 --> 00:03:01,310
instance or the state-transition reason
which will be in the instance
35

36
00:03:01,310 --> 00:03:05,510
description. So when we go to the console
we click on our instance and
36

37
00:03:05,510 --> 00:03:09,140
we'll be able to get that
state-transition reason. We can also get
37

38
00:03:09,140 --> 00:03:15,680
similar information by using the CLI
describe-instances command. If we
38

39
00:03:15,680 --> 00:03:20,120
find that it is something around limits
then we can check our limits in the EC2
39

40
00:03:20,120 --> 00:03:24,200
console. There will be a menu option
there on the left hand side that we can
40

41
00:03:24,200 --> 00:03:29,000
click on to have a look at our available
limits. If we find that we're exceeding
41

42
00:03:29,000 --> 00:03:32,360
those limits, for example the number of
instances that we may be able to launch,
42

43
00:03:32,360 --> 00:03:42,200
we can request an increase from AWS to
correct that problem. If an EC2 instance
43

44
00:03:42,200 --> 00:03:47,239
fails its status checks, there are a
number of probable causes for that. It
44

45
00:03:47,239 --> 00:03:52,370
can be memory issues, it can be problems
with EBS or with an I/O device, it could be
45

46
00:03:52,370 --> 00:03:57,350
kernel issues, it could be filesystem
issues, and it could be other issues with
46

47
00:03:57,350 --> 00:04:01,549
the operating system. So the action that
we would want to take is that first of
47

48
00:04:01,549 --> 00:04:05,989
all we would like to wait for it to
resolve itself. Obviously if we find that
48

49
00:04:05,989 --> 00:04:12,200
it's not resolving itself then, if
it's an EBS-based instance, or EBS-backed
49

50
00:04:12,200 --> 00:04:17,900
instance we can restart it by stopping
and then starting it or we can relaunch
50

51
00:04:17,900 --> 00:04:21,890
it. With an instance store you
can't stop that, you'll have to terminate
51

52
00:04:21,890 --> 00:04:26,300
it and then relaunch it to see if it
fixes it. You can also retrieve the
52

53
00:04:26,300 --> 00:04:31,039
system log which is basically the
console output for the instance. So if
53

54
00:04:31,039 --> 00:04:36,229
it's a Linux operating system you can
actually retrieve all of the output from
54

55
00:04:36,229 --> 00:04:40,240
the console or from the Linux console
there
55

56
00:04:40,240 --> 00:04:45,729
and we can also look at creating an
instance recovery alarm with CloudWatch
56

57
00:04:45,729 --> 00:04:50,680
and so if there is an issue then we can
have CloudWatch take an action to
57

58
00:04:50,680 --> 00:04:58,150
recover that instance automatically for
us. If we have problems with terminating
58

59
00:04:58,150 --> 00:05:02,919
or stopping our instance, it's most
likely going to be a problem with the
59

60
00:05:02,919 --> 00:05:08,710
underlying host computer or the
underlying host computer is processing
60

61
00:05:08,710 --> 00:05:12,310
scripts that haven't been finished
before you can actually stop or
61

62
00:05:12,310 --> 00:05:16,300
terminate that instance. Another common
problem is that you might have that
62

63
00:05:16,300 --> 00:05:20,710
instance inside of an Auto Scaling group
or it might be part of Elastic Beanstalk
63

64
00:05:20,710 --> 00:05:24,699
and every time you terminate that
instance it's actually replaced with
64

65
00:05:24,699 --> 00:05:28,500
another instance, so quite a common
problem that you can encounter.
65

66
00:05:28,500 --> 00:05:34,150
So the action that you can take is that
you can use the CLI stop-instances command
66

67
00:05:34,150 --> 00:05:41,289
with the --force option on it, or you can
actually create an AMI of that instance
67

68
00:05:41,289 --> 00:05:45,789
and then terminate it and replace it
with another instance. If you find that
68

69
00:05:45,789 --> 00:05:49,210
you just cannot terminate that EC2
instance then you're going to have to
69

70
00:05:49,210 --> 00:05:56,440
contact AWS support and they can
terminate that instance for you. If you
70

71
00:05:56,440 --> 00:06:01,840
are having trouble connecting to your
EC2 Linux instance, it may be that the
71

72
00:06:01,840 --> 00:06:06,280
instance is overloaded and just doesn't
have the resources available to do that
72

73
00:06:06,280 --> 00:06:12,159
connection. It could, and most probably
would, be a problem with your VPC setup,
73

74
00:06:12,159 --> 00:06:16,150
it could be a problem with your private
key that you're using to connect to that
74

75
00:06:16,150 --> 00:06:20,259
instance. If you're looking to ping that
instance, it could be a problem with
75

76
00:06:20,259 --> 00:06:25,960
setting up ICMP. So for the action available,
if it's a VPC issue make sure that you
76

77
00:06:25,960 --> 00:06:29,770
have your security group rules
set up and your network access control
77

78
00:06:29,770 --> 00:06:34,900
lists set up to allow inbound traffic on
that port. Make sure you have an
78

79
00:06:34,900 --> 00:06:38,860
Internet gateway or a virtual private
gateway and that you have a route from
79

80
00:06:38,860 --> 00:06:43,780
your subnet through to that internet
gateway. Also make sure that you have a
80

81
00:06:43,780 --> 00:06:47,979
public IP address. If you don't have a
public IP then your
81

82
00:06:47,979 --> 00:06:52,750
instance will not be visible on the
wider internet. If you find you've got a
82

83
00:06:52,750 --> 00:06:57,280
problem with your private
key and it's not recognized, you need to
83

84
00:06:57,280 --> 00:07:02,350
check the format of that private key. So
if you're using PuTTY make sure that
84

85
00:07:02,350 --> 00:07:09,220
you're using the PPK format. It could be
that you're using the wrong username to
85

86
00:07:09,220 --> 00:07:16,180
connect. So if you are looking at an AWS
AMI or an Amazon AMI, the user will be
86

87
00:07:16,180 --> 00:07:23,620
ec2-user; if it's an Ubuntu instance it
will be ubuntu as the user. You also
87

88
00:07:23,620 --> 00:07:28,210
need to make sure that you've got
permission to access that private key
88

89
00:07:28,210 --> 00:07:31,840
and also make sure that it is not a
completely unprotected file because
89

90
00:07:31,840 --> 00:07:38,830
either of those will prevent you on your side
from connecting to that EC2 Linux
90

91
00:07:38,830 --> 00:07:46,180
instance. The issues when failing to
connect to Windows instances are
91

92
00:07:46,180 --> 00:07:51,580
similar to Linux instances, so you still
have problems that may arise from the
92

93
00:07:51,580 --> 00:07:57,220
instance being overloaded, VPC issues,
or problems with the credentials, but we
93

94
00:07:57,220 --> 00:08:01,630
also have a Windows Firewall that may
also be giving us issues and we also
94

95
00:08:01,630 --> 00:08:06,390
have a maximum number of RDP sessions
that we can concurrently have
95

96
00:08:06,390 --> 00:08:11,920
connecting to that instance, so if we exceed
that we're going to run into issues. So
96

97
00:08:11,920 --> 00:08:15,070
with our credentials we need to make
sure that our username and password are
97

98
00:08:15,070 --> 00:08:21,040
correct and also make sure that that
password has not expired. If we find that
98

99
00:08:21,040 --> 00:08:26,410
we have Windows Firewall issues on
our Windows server, we need to disable
99

100
00:08:26,410 --> 00:08:33,240
that Windows Firewall and use our
security group rules to control access
100

101
00:08:33,240 --> 00:08:38,310
via RDP for our instance. So that's all
I'm going to talk to you now about
101

102
00:08:38,310 --> 00:08:42,849
troubleshooting. Coming up next we'll
have a hands-on lesson on how to
102

103
00:08:42,849 --> 00:08:48,180
actually use this, so I'll see you in the
next one.
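
The lecture suggests reading the state-transition reason from the describe-instances output when an instance fails to launch. A minimal Python sketch of that step follows, assuming the JSON printed by `aws ec2 describe-instances` has already been parsed into a dictionary. The function name `launch_failure_reasons` and the sample data, including the instance ID and reason text, are made up for illustration and are not from the lecture.

```python
def launch_failure_reasons(describe_output):
    """Summarise state and state reason for each instance in a
    parsed `aws ec2 describe-instances` JSON document."""
    summaries = []
    for reservation in describe_output.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # StateReason is not always present, so fall back to an empty dict.
            state_reason = instance.get("StateReason") or {}
            summaries.append({
                "instance_id": instance.get("InstanceId"),
                "state": instance.get("State", {}).get("Name"),
                "transition_reason": instance.get("StateTransitionReason", ""),
                "reason_code": state_reason.get("Code"),
                "reason_message": state_reason.get("Message"),
            })
    return summaries


# Hypothetical sample shaped like describe-instances output (values invented).
SAMPLE_OUTPUT = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-0123456789abcdef0",
                    "State": {"Code": 48, "Name": "terminated"},
                    "StateTransitionReason": "Client.VolumeLimitExceeded: Volume limit exceeded",
                    "StateReason": {
                        "Code": "Client.VolumeLimitExceeded",
                        "Message": "Volume limit exceeded",
                    },
                }
            ]
        }
    ]
}

if __name__ == "__main__":
    for summary in launch_failure_reasons(SAMPLE_OUTPUT):
        print(summary["instance_id"], summary["state"],
              summary["reason_code"], summary["reason_message"])
```

In practice the dictionary would come from something like `json.load` on saved CLI output. A reason code that mentions a limit would then point you to the limits page in the EC2 console, as described in the lecture.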