Please note that this content is targeted at SysOps administrators. If you're a Solutions Architect or a developer, you may want to skip over this one.

Welcome back to BackSpace Academy. In this troubleshooting lesson we'll look at the RDS service. We'll again look at issues around failing to connect to instances, and at those specific to SQL Server. We'll also look at the probable causes of a service outage and the causes of MySQL read replica lag.

If you're having trouble connecting to your RDS instance, the most probable cause is your security group rules, so make sure you have an appropriate ingress rule for the instance you are connecting to. It could be a wrong password. It could also be a problem on your side of the connection, such as local firewall restrictions that are preventing you from connecting. It could even be something as simple as not having given the instance enough time to be created. The first action you can take is to test the connection using netcat if you're in a UNIX environment or on Mac OS X, or telnet if you're in a Windows environment. The netcat command is nc -zv, then the endpoint followed by the port. Windows is similar: it'll be telnet and
then the endpoint and the port. You can also try resetting your master password.

In addition to the general failing-to-connect issues, there are also SQL Server specific issues. You may get a message from SQL Server saying "Could not open a connection to SQL Server". That most probably means you're using a wrong connection string: you might have the wrong endpoint, the wrong port number, or the port number missing from the connection string you're trying to connect with. It could also be that you're using an incorrect username and password. If that occurs, you would get a message similar to "No connection could be made because the target machine actively refused it". That means the credentials have been refused, so you would need to go back and make sure that your password hasn't expired and that you're using the correct username.

If your RDS instance goes down and you experience a service outage, the most probable cause is that the instance has rebooted. You may have changed a setting that requires the instance to reboot. For example, if you change the backup retention period or the instance class of the DB instance and you select apply immediately, it will immediately reboot
that instance. If you change the storage type, it will also reboot the instance. It could also be that you've simply run out of storage; if that occurs, your RDS instance will go down.

The first action you can take is to check the instance status in the console, or with the command line interface. Also, if you're changing a setting that requires a reboot, you can set apply immediately to false. That makes sure the reboot occurs during a maintenance window, which will produce less disruption for you. If you find that you are running out of storage, make sure you monitor the FreeStorageSpace metric in CloudWatch and set up a CloudWatch alarm for it as well.

Sometimes with MySQL we may find that there is a lag between the data in our master database and what is in our read replicas, if we've got a read replica set up. That lag, in seconds, is what we call the replica lag. If we experience a high level of replica lag, there are a number of causes that could be behind it. First off, there could be a difference in capability between our master database and our read replicas. For example, the replica may have a different storage class, or the master database might have high Provisioned IOPS and the read replica only has low
Provisioned IOPS, and it's not taking advantage of that. It could be that the DB parameter group settings are incompatible between the read replicas and the master database. It could also be that we are experiencing a high write rate, and that is causing the MySQL query cache to be refreshed too often, so it can't keep up.

The first action we can take is to monitor replica lag using the CloudWatch ReplicaLag metric, which returns the replica lag in seconds. If the replica and the master are identical, it returns zero. If our instance is experiencing an outage, it returns minus one. We should make sure we use the same instance and storage class for every replica as we've got for our master database. If we find that we have a high write rate, we can look at disabling the query cache. Another thing we can do is warm the buffer pool on the read replica: InnoDB if it's MySQL, or XtraDB if it's MariaDB. The way we do that is to copy data from our master database over to our read replicas. That will update the buffer pool on the read replica and set the replica lag back to zero for us.
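
As a rough illustration of how the replica lag values described above might be read, here is a minimal Python sketch. It does not call CloudWatch; it only interprets a number you have already fetched. The function name `describe_replica_lag` and the `warn_after` threshold of 30 seconds are assumptions made for this example, not anything defined by RDS.

```python
def describe_replica_lag(lag_seconds: float, warn_after: float = 30.0) -> str:
    """Interpret a replica lag value given in seconds.

    warn_after is a hypothetical threshold chosen for this sketch;
    pick a value that suits your own workload.
    """
    if lag_seconds == -1:
        # Minus one is the special value reported when the lag
        # cannot be measured, for example during an outage.
        return "unknown"
    if lag_seconds < 0:
        raise ValueError("replica lag must be -1 or a non-negative number")
    if lag_seconds == 0:
        # Zero means the replica is identical to the master.
        return "in sync"
    if lag_seconds <= warn_after:
        return "lagging"
    return "lagging badly"
```

A value that stays in the last category for a long time is the case where the checks from this lesson apply: compare the instance and storage class of the replica with the master, and look at the write rate.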
So that's all I need to discuss on troubleshooting RDS. The best way to learn this stuff is to get hands-on with it, and that's what we'll be doing in the next lecture, so we'll see you in that one.
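
As a footnote to the connection test from earlier in the lesson (nc -zv or telnet against the endpoint and port), the same kind of check can be sketched in Python with only the standard library. This is a minimal sketch, not a replacement for those tools. The function name `can_connect` is an assumption made for this example, and the endpoint shown in the comment is a made-up placeholder.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only proves that the port is reachable. It does not check
    the database username or password.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


# Hypothetical usage; replace the endpoint and port with your own:
# can_connect("mydb.example.us-east-1.rds.amazonaws.com", 3306)
```

If this returns False, the security group rules and any local firewall restrictions are the first things to look at. If it returns True but the database client still fails, the problem is more likely the credentials or the connection string.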