1
00:00:00,330 --> 00:00:05,229
Please note that this content is
targeted for SysOps administrators. If

2
00:00:05,229 --> 00:00:09,820
you're a Solutions Architect or a
developer, you may want to skip over this

3
00:00:09,820 --> 00:00:12,060
one.

4
00:00:13,010 --> 00:00:20,010
Welcome back to BackSpace Academy. In
this troubleshooting lesson, this time

5
00:00:20,010 --> 00:00:25,320
it'll be on the RDS service. We'll again
look at issues around failing to connect

6
00:00:25,320 --> 00:00:29,640
to instances, and we'll also look at
those specific to SQL

7
00:00:29,640 --> 00:00:35,610
Server. We'll then look at the
probable causes of a service outage and

8
00:00:35,610 --> 00:00:44,520
also the causes of MySQL read
replica lag. If you're having

9
00:00:44,520 --> 00:00:49,589
trouble connecting to your RDS instance,
the most probable cause would be, first

10
00:00:49,589 --> 00:00:52,920
off, your security group rules, so make
sure that you have an appropriate

11
00:00:52,920 --> 00:00:57,870
ingress rule for the instance that you
are connecting to. It could be a wrong

12
00:00:57,870 --> 00:01:03,510
password, and it could also be problems
on your side of the connection, so you

13
00:01:03,510 --> 00:01:07,050
could have local firewall restrictions
on your side that are preventing you

14
00:01:07,050 --> 00:01:11,070
from connecting. It could be something as
simple as you haven't given enough time

15
00:01:11,070 --> 00:01:15,630
for the instance to finish being created. So the
action that you can take is, first

16
00:01:15,630 --> 00:01:20,670
off, test the connection using netcat
if you're in a UNIX environment or Mac

17
00:01:20,670 --> 00:01:26,340
OS X, or else use telnet if you're in a
Windows environment. The netcat command

18
00:01:26,340 --> 00:01:32,370
will be nc -zv, then the endpoint
followed by the port, and on Windows

19
00:01:32,370 --> 00:01:37,020
it will be similar: telnet, then
the endpoint and the port. You can also

20
00:01:37,020 --> 00:01:45,000
try resetting your master password as
well. In addition to the general failure

21
00:01:45,000 --> 00:01:50,010
to connect issues, there are also some
SQL Server-specific issues. You may

22
00:01:50,010 --> 00:01:54,000
get a message from SQL Server saying
"Could not open a connection to SQL

23
00:01:54,000 --> 00:01:58,950
Server." That would most probably mean
you're using the wrong connection string,

24
00:01:58,950 --> 00:02:02,520
so you might be using the wrong endpoint,
or you could have the wrong

25
00:02:02,520 --> 00:02:06,660
port number, or the port number may be missing
from the connection string that

26
00:02:06,660 --> 00:02:11,519
you're trying to connect with. It could
also be that you're using the incorrect

27
00:02:11,519 --> 00:02:15,209
user name or password. If that
occurs, you would get a "Login failed for user" error.

28
00:02:15,209 --> 00:02:19,410
A message similar to "No connection could be made
because the target machine actively

29
00:02:19,410 --> 00:02:23,370
refused it" means the connection itself was refused,
so check the endpoint and port. You would also need

30
00:02:23,370 --> 00:02:27,069
to go back and have a look and make sure
that your password hasn't expired and

31
00:02:27,069 --> 00:02:33,849
that you're using the correct username. If
your RDS instance goes down and you

32
00:02:33,849 --> 00:02:38,739
experience a service outage, the most
probable cause, first off, would be

33
00:02:38,739 --> 00:02:44,019
that the instance has rebooted for
whatever reason. You may have changed

34
00:02:44,019 --> 00:02:47,739
a setting that requires a reboot
and applied it immediately. For example, if

35
00:02:47,739 --> 00:02:51,640
you change the backup retention period
to or from zero, or the instance class of

36
00:02:51,640 --> 00:02:56,889
that DB instance and select Apply
Immediately, it will immediately reboot

37
00:02:56,889 --> 00:03:01,769
that instance. If you change the storage
type, it will also reboot that instance.

38
00:03:01,769 --> 00:03:06,430
It could also be that you've just run
out of storage, and if that occurs,

39
00:03:06,430 --> 00:03:11,709
your RDS instance will go down. So the
action that you can take is, first off,

40
00:03:11,709 --> 00:03:17,620
check the instance status by looking in
the console, or you can also use

41
00:03:17,620 --> 00:03:23,859
the command line interface as well. Also,
if you're changing a setting that requires

42
00:03:23,859 --> 00:03:28,480
a reboot,
you can set the Apply Immediately option

43
00:03:28,480 --> 00:03:32,379
to false, and that will make sure
that the reboot occurs during the next

44
00:03:32,379 --> 00:03:37,269
maintenance window, which will cause
less disruption for you. If you find that

45
00:03:37,269 --> 00:03:41,709
you are running out of storage, make
sure that you monitor the free storage

46
00:03:41,709 --> 00:03:48,540
space metric (FreeStorageSpace) in CloudWatch
and set up a CloudWatch alarm for it as well.

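The outage checks described above can be scripted. The following is a minimal sketch, assuming Python with the AWS SDK (boto3): its RDS client offers `describe_db_instances` and `modify_db_instance`, and its CloudWatch client offers `put_metric_alarm`. The helpers take an already-created client so they can be tried without AWS credentials. The helper names, the alarm name, the five-minute period and the GiB threshold are hypothetical choices, not values from this lesson.

```python
from typing import Any

GIB = 1024 ** 3  # CloudWatch reports FreeStorageSpace in bytes


def instance_status(rds: Any, db_id: str) -> str:
    """Return the DBInstanceStatus, e.g. 'available', 'rebooting' or 'storage-full'."""
    resp = rds.describe_db_instances(DBInstanceIdentifier=db_id)
    return resp["DBInstances"][0]["DBInstanceStatus"]


def change_class_in_maintenance_window(rds: Any, db_id: str, new_class: str) -> None:
    """Queue an instance class change instead of applying it (and its outage) now."""
    rds.modify_db_instance(
        DBInstanceIdentifier=db_id,
        DBInstanceClass=new_class,
        ApplyImmediately=False,  # change is applied in the next maintenance window
    )


def free_storage_alarm_params(db_id: str, threshold_gib: float, topic_arn: str) -> dict:
    """Build put_metric_alarm arguments for a low-FreeStorageSpace alarm.

    The alarm name, statistic and period are hypothetical choices.
    """
    return {
        "AlarmName": f"{db_id}-low-free-storage",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_id}],
        "Statistic": "Minimum",
        "Period": 300,  # seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold_gib * GIB,  # convert GiB to bytes
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [topic_arn],  # e.g. an SNS topic that notifies operators
    }


def create_free_storage_alarm(cloudwatch: Any, db_id: str, threshold_gib: float, topic_arn: str) -> None:
    """Create (or overwrite) the alarm using a boto3 CloudWatch client."""
    cloudwatch.put_metric_alarm(**free_storage_alarm_params(db_id, threshold_gib, topic_arn))
```

For example, `create_free_storage_alarm(boto3.client("cloudwatch"), "mydb", 2, topic_arn)` would, under these assumptions, alarm when the minimum free storage over five minutes drops below 2 GiB; `mydb` and `topic_arn` (an SNS topic ARN) are placeholders. Passing `ApplyImmediately=False` corresponds to leaving Apply Immediately off in the console.
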
47
00:03:48,989 --> 00:03:55,989
Sometimes with MySQL, we may find
that there is a lag between the data

48
00:03:55,989 --> 00:04:01,569
that is in our master database and what
is in our read replicas, if we've got

49
00:04:01,569 --> 00:04:05,769
a read replica set up. That lag
between them, in seconds, is what we call

50
00:04:05,769 --> 00:04:11,859
the replica lag. So if we experience a
high level of replica lag, there are a

51
00:04:11,859 --> 00:04:16,030
number of causes that could be behind
that. First off, we could find that there

52
00:04:16,030 --> 00:04:20,859
is a difference in the capability of our
master database and our read replicas.

53
00:04:20,859 --> 00:04:25,240
For example, the replica may have a different
storage type, or the master database

54
00:04:25,240 --> 00:04:30,220
might have high provisioned IOPS while
the read replica only has low

55
00:04:30,220 --> 00:04:35,110
provisioned IOPS, so it can't keep up.
It could also be that the

56
00:04:35,110 --> 00:04:40,150
DB parameter group settings are
incompatible between the read

57
00:04:40,150 --> 00:04:44,349
replica
and the master database. It could also be

58
00:04:44,349 --> 00:04:50,110
that we are experiencing a high write
rate, and that is causing the MySQL

59
00:04:50,110 --> 00:04:55,049
query cache to be invalidated too often,
and it can't keep up with that

60
00:04:55,049 --> 00:05:00,309
appropriately. So the action that we can
take is, first of all, we can monitor

61
00:05:00,309 --> 00:05:06,309
replica lag in seconds using the CloudWatch
ReplicaLag metric, and that will

62
00:05:06,309 --> 00:05:11,019
return in seconds what that replica lag
is. If the replica is fully caught up, it will

63
00:05:11,019 --> 00:05:17,229
return zero. If replication isn't active,
for example because the instance is having an outage, it will

64
00:05:17,229 --> 00:05:23,139
return minus one. We should make sure
that we use the same instance class and

65
00:05:23,139 --> 00:05:28,629
storage type for every replica as
we've got with our master database, and

66
00:05:28,629 --> 00:05:33,129
if we find that we're having a high
write rate, we can look at disabling the

67
00:05:33,129 --> 00:05:37,629
query cache. Another thing we can do is
warm the InnoDB

68
00:05:37,629 --> 00:05:43,299
buffer pool (for MySQL) or XtraDB buffer pool
(for MariaDB) on

69
00:05:43,299 --> 00:05:49,839
that read replica. The way we do that is
that we can copy data from our master

70
00:05:49,839 --> 00:05:54,999
database over to our read replica, and
that will update the buffer pool on that

71
00:05:54,999 --> 00:06:01,089
read replica, which should bring the replica
lag back down to zero. So that's all I

72
00:06:01,089 --> 00:06:06,249
need to discuss on troubleshooting
RDS. The best way to learn this stuff is

73
00:06:06,249 --> 00:06:09,819
to get hands-on with it, and that's what
we'll be doing in the next lecture, so

74
00:06:09,819 --> 00:06:12,629
we'll see you in that one.
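
Two of the checks from this lesson can be scripted as well. The sketch below assumes Python: `port_reachable` uses only the standard library to do roughly what `nc -zv <endpoint> <port>` or `telnet <endpoint> <port>` does, opening a TCP connection and reporting whether it succeeded; `latest_replica_lag` reads the CloudWatch `ReplicaLag` metric through a boto3-style CloudWatch client passed in by the caller; and `describe_replica_lag` interprets the value as described in the replica lag section (0 when caught up, -1 when replication isn't active). The function names, the 10-minute look-back window and the 30-second threshold are hypothetical choices for illustration.

```python
import socket
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Roughly what `nc -zv host port` checks: can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, DNS failure, ...
        return False


def latest_replica_lag(cloudwatch: Any, replica_id: str) -> Optional[float]:
    """Most recent ReplicaLag datapoint (seconds) from the last 10 minutes, or None."""
    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="ReplicaLag",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": replica_id}],
        StartTime=now - timedelta(minutes=10),
        EndTime=now,
        Period=60,
        Statistics=["Maximum"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Maximum"] if points else None


def describe_replica_lag(seconds: float, max_ok: float = 30.0) -> str:
    """Interpret a ReplicaLag value; max_ok is a hypothetical threshold."""
    if seconds < 0:
        return "replication not active"  # CloudWatch reports -1 in this case
    if seconds == 0:
        return "caught up"
    return "lagging" if seconds > max_ok else "minor lag"
```

A reachable port only shows that security groups and local firewalls let the traffic through; a wrong user name or password will still be rejected when you log in.
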