0 1 00:00:00,450 --> 00:00:05,250 Please note that this content is targeted for SysOps administrators. If 1 2 00:00:05,250 --> 00:00:09,840 you're a Solutions Architect or a developer you may want to skip over this 2 3 00:00:09,840 --> 00:00:15,660 one. Welcome back to backspace Academy in this lecture we'll have a 3 4 00:00:15,660 --> 00:00:19,230 look at some of the tools that are available for troubleshooting our RDS 4 5 00:00:19,230 --> 00:00:23,100 instances. We're also going to have a look at ways that we can troubleshoot 5 6 00:00:23,100 --> 00:00:31,109 and correct issues with excessive read replicas lag. Okay so here I am in the 6 7 00:00:31,109 --> 00:00:35,789 RDS console we already know how to create an RDS instance so I'm just going 7 8 00:00:35,789 --> 00:00:40,530 to run through this quite quickly so get started now free tier eligible only will 8 9 00:00:40,530 --> 00:00:47,609 select mySQL Community Edition we'll leave everything as it is there will use 9 10 00:00:47,609 --> 00:00:54,749 a t2.micro it's on the free tier. We will give our instance an identifier, we 10 11 00:00:54,749 --> 00:01:11,759 just call it test and a username and a password and next. We'll leave the RDS 11 12 00:01:11,759 --> 00:01:15,960 instance to create a new security group for us, we just launch this into the 12 13 00:01:15,960 --> 00:01:22,429 default subnets, we give the database a name just use tests again we'll be fine 13 14 00:01:22,429 --> 00:01:27,990 and we'll leave everything else as it is, we'll do a backup that will be fine, 14 15 00:01:27,990 --> 00:01:31,859 what we also will do actually is we will put a maintenance window in there 15 16 00:01:31,859 --> 00:01:36,899 as well, so it's going to select a maintenance window so what that means is 16 17 00:01:36,899 --> 00:01:40,859 that when we do any major updates that require a reboot 17 18 00:01:40,859 --> 00:01:45,689 they will occur during this maintenance window, So if you don't want an 18 19 00:01:45,689 --> 00:01:50,670 interruption to your service it's a good idea to pick a maintenance window that 19 20 00:01:50,670 --> 00:01:55,560 corresponds with the lowest time that you've got usage on your database so 20 21 00:01:55,560 --> 00:01:59,389 here we've got here Monday at midnight at for half an hour that will be fine 21 22 00:01:59,389 --> 00:02:03,439 we'll launch that DB instance 22 23 00:02:06,460 --> 00:02:11,740 so after certain amount of time again our DB instance will go from creating through 23 24 00:02:11,740 --> 00:02:18,910 to available for us and after a certain amount of time our mySQL RDS 24 25 00:02:18,910 --> 00:02:24,790 instance is up and running with status available. Now if we found that it didn't 25 26 00:02:24,790 --> 00:02:30,190 for whatever reason become available there is there are some things that we 26 27 00:02:30,190 --> 00:02:34,750 can do we can go into instance actions and we can have a look at the details 27 28 00:02:34,750 --> 00:02:40,810 and we can have a look at the recent events and logs and that will give us a 28 29 00:02:40,810 --> 00:02:44,739 running view of what has happened to our instance and all the events that have 29 30 00:02:44,739 --> 00:02:50,950 occurred and also additional to that we have here the mySQL error running 30 31 00:02:50,950 --> 00:02:55,200 log and so we can actually download that or we can just view it from here and 31 32 00:02:55,200 --> 00:03:01,480 there we can see that there are a number of notes there is a warning here and but 32 33 00:03:01,480 --> 00:03:04,540 there are no errors as such they've been in the log so there's nothing 33 34 00:03:04,540 --> 00:03:09,310 particularly alarming there but if you found something in there that would be 34 35 00:03:09,310 --> 00:03:16,560 certainly a very good troubleshooting tool to find where the problem exists 35 36 00:03:17,040 --> 00:03:22,510 now we just jump back into our DB instances and we'll have a look at some 36 37 00:03:22,510 --> 00:03:26,560 of the issues that can arise with connecting to our our RDS instance and 37 38 00:03:26,560 --> 00:03:30,400 what we can do about it so we've got here our endpoints so we're just going 38 39 00:03:30,400 --> 00:03:38,950 to copy that over and we'll jump into the mySQL workbench and let's have a 39 40 00:03:38,950 --> 00:03:42,030 look at connecting into that 40 41 00:03:54,430 --> 00:03:58,500 so let's just test that connection now 41 42 00:04:04,950 --> 00:04:11,020 so there we can see our connection has been successful, so let's just say we had 42 43 00:04:11,020 --> 00:04:24,400 the wrong username and let's see what happens here and there we can see that 43 44 00:04:24,400 --> 00:04:29,680 we have an access denied so we have been able to establish a connection as such 44 45 00:04:29,680 --> 00:04:34,960 but we haven't been able to authorize that user and the same would happen if 45 46 00:04:34,960 --> 00:04:41,350 we uses a wrong password as well now another thing that can occur is if we 46 47 00:04:41,350 --> 00:04:45,880 use the wrong port so just say we tried to connect in on port 22 that will 47 48 00:04:45,880 --> 00:04:52,720 basically just hang for quite some time before it gives any message, so nothing 48 49 00:04:52,720 --> 00:04:57,550 is happening it's just hanging there but over time it will eventually give a 49 50 00:04:57,550 --> 00:05:00,430 message, so I'm not gonna wait for that so I'm just going to get out of that and 50 51 00:05:00,430 --> 00:05:08,440 jump into telnet so I've just now jumped into the command line and I'm just going 51 52 00:05:08,440 --> 00:05:12,850 to run a telnet command to do a low-level connection to our RDS instance 52 53 00:05:12,850 --> 00:05:19,150 now telnet is not by default enabled in Windows but you need to go into your 53 54 00:05:19,150 --> 00:05:23,919 control panel and then go into turning Windows Features on and off and select 54 55 00:05:23,919 --> 00:05:29,440 telnet and turn that on and that will install the windows telnet client for 55 56 00:05:29,440 --> 00:05:34,419 you, so the command that we use after we've installed telnet is telling it and 56 57 00:05:34,419 --> 00:05:40,900 simply the endpoint and the port so a space and then the port and we'll 57 58 00:05:40,900 --> 00:05:47,950 connect in and that has actually connected in its return something that 58 59 00:05:47,950 --> 00:05:52,000 doesn't really make any sense the reason for that is that although that we are 59 60 00:05:52,000 --> 00:05:59,110 able to physically connect to our RDS instance we cannot actually do anything 60 61 00:05:59,110 --> 00:06:04,360 from there so we cannot actually ssh into an RDS instance and we can't 61 62 00:06:04,360 --> 00:06:07,419 connect in through telnet or anything like that it doesn't let us to do that 62 63 00:06:07,419 --> 00:06:13,060 but if we get a message back like this it means that the low-level physical 63 64 00:06:13,060 --> 00:06:17,500 connection is okay so it means that our security groups are okay 64 65 00:06:17,500 --> 00:06:24,760 VPC is okay our endpoint is okay and our Porter's okay so if we try and connect 65 66 00:06:24,760 --> 00:06:29,940 in with a bad URL let's see what or bad endpoint but to see what happens 66 67 00:06:29,940 --> 00:06:38,470 put a D in there and straight away it comes back and could not connect to that 67 68 00:06:38,470 --> 00:06:42,850 host so this is a really good tool that if you can't really find out what's 68 69 00:06:42,850 --> 00:06:47,050 going on in the mySQL workbench just jump into telling it and just see if you 69 70 00:06:47,050 --> 00:06:51,610 can do a quick connection through to telling it if you can't start having a 70 71 00:06:51,610 --> 00:06:57,340 look at your port your port number start having a look at your your endpoint URL 71 72 00:06:57,340 --> 00:07:01,930 and start looking at your security groups in your VPC as well was that are 72 73 00:07:01,930 --> 00:07:08,830 associated to that RDS instance . Now just jumping back into the RDS console 73 74 00:07:08,830 --> 00:07:12,700 again what I want to have a look at now is just looking at some of the issues 74 75 00:07:12,700 --> 00:07:18,040 that can arise around read replicas or mySQL read replicas and what we can 75 76 00:07:18,040 --> 00:07:22,780 do some of the troubleshooting tools that are available for us to identify 76 77 00:07:22,780 --> 00:07:26,979 what's going on and causing that read replicas lag and and it also some of the 77 78 00:07:26,979 --> 00:07:31,660 corrective actions that we can take. So first off let's look at launching a red 78 79 00:07:31,660 --> 00:07:40,000 lip replica so we go into instance actions create read replicas we'll give 79 80 00:07:40,000 --> 00:07:49,090 that a name and we'll leave everything there as it is and create that raed 80 81 00:07:49,090 --> 00:07:51,570 replica. 81 82 00:07:55,540 --> 00:08:00,920 okay so we can see there that the master database is being modified and we are 82 83 00:08:00,920 --> 00:08:04,880 currently creating that replica so after a certain amount of time that will be 83 84 00:08:04,880 --> 00:08:11,270 available for us okay so after a certain amount of time we've got our mySQL 84 85 00:08:11,270 --> 00:08:15,590 replica database up and running I'm just going to expand those details and I'm 85 86 00:08:15,590 --> 00:08:20,780 going to click on the replication tab here on the left and there we can see 86 87 00:08:20,780 --> 00:08:24,560 the replication details we've got our master database called test we've got 87 88 00:08:24,560 --> 00:08:31,040 our replica database test - we can also see that we've got our replicas lag at 0 88 89 00:08:31,040 --> 00:08:35,540 milliseconds which is what you'd expect because there's no database in the 89 90 00:08:35,540 --> 00:08:42,560 database now if we found that we're having excessive replicas lag the first 90 91 00:08:42,560 --> 00:08:47,150 thing we would do is that we would have a look at our cloud watch our cloud 91 92 00:08:47,150 --> 00:08:50,930 watch metrics and our cloud watch graphs and we would see if there's anything 92 93 00:08:50,930 --> 00:08:55,970 that coincides with that increase in replica lag for example if you had 93 94 00:08:55,970 --> 00:09:02,000 significant write operations to your database at that time then that would 94 95 00:09:02,000 --> 00:09:05,690 obviously cause it because the replication is not keeping up with the 95 96 00:09:05,690 --> 00:09:12,080 speed of what which with which you are changing your data if that's a one-off 96 97 00:09:12,080 --> 00:09:17,930 situation you can look at pre-warming that replica by copying data from 97 98 00:09:17,930 --> 00:09:24,440 your master database over to your replica database now the problem with 98 99 00:09:24,440 --> 00:09:31,280 that is that a replica instance is read-only so what you need to do is that 99 100 00:09:31,280 --> 00:09:36,380 you need to temporarily change that to be writable and then change it back to 100 101 00:09:36,380 --> 00:09:41,570 being read-only if you find that it's not a one-off situation you find that 101 102 00:09:41,570 --> 00:09:44,960 it's an ongoing situation then the next thing you would do is that you would 102 103 00:09:44,960 --> 00:09:49,970 disable your query cache, and we'll have a look at how we do that soon and if you 103 104 00:09:49,970 --> 00:09:53,870 still have problems then you're going to have to look at upgrading your your 104 105 00:09:53,870 --> 00:09:57,980 master database and your replica database to a larger instance type with 105 106 00:09:57,980 --> 00:10:05,510 more memory or more capacity whatever now to make these changes to the query 106 107 00:10:05,510 --> 00:10:11,660 cache or to make our replicas writable temporarily we need to 107 108 00:10:11,660 --> 00:10:16,910 change the DB parameter groups for that instance so again what a parameter group 108 109 00:10:16,910 --> 00:10:22,820 is it defines the parameters around that database application and so it doesn't 109 110 00:10:22,820 --> 00:10:26,960 go into anything low level as far as a networking layer and it doesn't go 110 111 00:10:26,960 --> 00:10:32,150 anything to about the operating system behind that database application it's 111 112 00:10:32,150 --> 00:10:36,620 about the parameters around the my sequel database application so just 112 113 00:10:36,620 --> 00:10:41,180 click on parameter groups here on the left and we can see there that we've got 113 114 00:10:41,180 --> 00:10:47,150 our default mySQL parameter group now that is applied by default to all of 114 115 00:10:47,150 --> 00:10:52,790 these including the read replicas to all of our RDS instance or mySQL 115 116 00:10:52,790 --> 00:10:56,660 earliest instances that are launched now one thing you'll notice from this is if 116 117 00:10:56,660 --> 00:11:01,460 you select it you cannot edit the parameter everything is grayed out so 117 118 00:11:01,460 --> 00:11:07,580 you cannot copy them you cannot you cannot change them in any way if you go 118 119 00:11:07,580 --> 00:11:11,150 to delete them it'll come up with a message saying do you really want to 119 120 00:11:11,150 --> 00:11:16,520 delete that and we obviously don't want to do that so what we do is we create a 120 121 00:11:16,520 --> 00:11:23,900 new parameter group and we select my sequel parameter group family for that 121 122 00:11:23,900 --> 00:11:27,920 we'll give it a name we just call it test again we leave the description just 122 123 00:11:27,920 --> 00:11:35,330 a test that will be fine and we'll create that parameter group so that's 123 124 00:11:35,330 --> 00:11:42,890 been created so we just select it and we'll edit the parameters so we can see 124 125 00:11:42,890 --> 00:11:47,540 there there's a lot of parameters that we can edit edit for our database so the 125 126 00:11:47,540 --> 00:11:53,750 ones that we are concerned with will be the query which I'll just go down to and 126 127 00:11:53,750 --> 00:11:56,980 read as well 127 128 00:11:59,830 --> 00:12:05,080 okay so we can see here we've got query cache size and there's nothing in there 128 129 00:12:05,080 --> 00:12:12,310 so by default the query cache is empty and not being utilized so if it is 129 130 00:12:12,310 --> 00:12:17,620 you can disable it here but by default it is actually disabled so with our 130 131 00:12:17,620 --> 00:12:25,089 replicas we can see here read-only it is currently if it is actually a replica so 131 132 00:12:25,089 --> 00:12:31,000 can see true replica so if this instance is a replica it will be read-only so 132 133 00:12:31,000 --> 00:12:37,450 what we can do is we can select that and we can change that to zero and so that 133 134 00:12:37,450 --> 00:12:42,010 means that it is not read-only its writable and then we can go back and we 134 135 00:12:42,010 --> 00:12:49,240 can save that save the changes so now that we've got a parameter group that 135 136 00:12:49,240 --> 00:12:58,959 allows us to write we can now associate this with our replica database so we go 136 137 00:12:58,959 --> 00:13:06,130 back into instances we'll select our replica database and we go to instance 137 138 00:13:06,130 --> 00:13:17,020 actions and we go to modify so we scroll down we'll leave everything as it is 138 139 00:13:17,020 --> 00:13:25,930 there and we select DB parameter group so we can change this to our new test so 139 140 00:13:25,930 --> 00:13:29,230 it'll provide you with a list of all of the mySQL parameter groups that we 140 141 00:13:29,230 --> 00:13:33,399 have available and because we just created a new one testi available we can 141 142 00:13:33,399 --> 00:13:43,060 go through there and we can continue and we can modify that DB instance and there 142 143 00:13:43,060 --> 00:13:49,240 we can see that's done so that brings us to the end of the troubleshooting 143 144 00:13:49,240 --> 00:13:56,440 section for RDS and by all means go ahead and how to play around with it 144 145 00:13:56,440 --> 00:13:59,529 yourself try and connect into an instance to try and play around with 145 146 00:13:59,529 --> 00:14:04,029 different port numbers see what happens see if you can create a new parameter 146 147 00:14:04,029 --> 00:14:06,490 group and have a play around with that yourself because that's going to be the 147 148 00:14:06,490 --> 00:14:11,699 best way to learn and I'll see you in the next one