1 00:00:00,000 --> 00:00:01,650 In this lesson, we're going to discuss 2 00:00:01,650 --> 00:00:04,019 how you can understand the troubleshooting process 3 00:00:04,019 --> 00:00:06,240 by walking through an example of an issue 4 00:00:06,240 --> 00:00:08,310 that you may experience as a network technician 5 00:00:08,310 --> 00:00:09,600 or administrator. 6 00:00:09,600 --> 00:00:10,920 We're going to take this step by step 7 00:00:10,920 --> 00:00:12,870 and walk through all seven steps 8 00:00:12,870 --> 00:00:15,600 of the process in order in just a few minutes. 9 00:00:15,600 --> 00:00:17,640 Now, the first step is to identify the problem. 10 00:00:17,640 --> 00:00:18,750 So let's pretend you're working 11 00:00:18,750 --> 00:00:20,250 at the help desk and a user reports 12 00:00:20,250 --> 00:00:22,860 that they have intermittent network connectivity issues. 13 00:00:22,860 --> 00:00:24,390 Now, as you start working with this person, 14 00:00:24,390 --> 00:00:26,850 you're gathering information, you're questioning the user, 15 00:00:26,850 --> 00:00:28,920 and you're trying to identify the symptoms. 16 00:00:28,920 --> 00:00:30,180 When you ask the user questions, 17 00:00:30,180 --> 00:00:32,369 they tell you that based on their observations, 18 00:00:32,369 --> 00:00:34,230 it seems that the connection is dropping 19 00:00:34,230 --> 00:00:35,850 and then resuming sporadically. 20 00:00:35,850 --> 00:00:37,830 And they tell you that there's no rhyme or reason. 21 00:00:37,830 --> 00:00:39,420 It's happening at all times of the day, 22 00:00:39,420 --> 00:00:41,010 no matter what program they're using. 23 00:00:41,010 --> 00:00:43,140 So we really don't know what's causing it yet, 24 00:00:43,140 --> 00:00:45,180 but we're going to gather as much information as we can. 
25 00:00:45,180 --> 00:00:46,740 We'll ask them additional questions, 26 00:00:46,740 --> 00:00:48,030 we'll identify those symptoms, 27 00:00:48,030 --> 00:00:50,340 and we're going to ask things like, has anything changed, 28 00:00:50,340 --> 00:00:52,200 and can we duplicate the problem? 29 00:00:52,200 --> 00:00:54,780 Also, we're going to approach multiple problems individually, 30 00:00:54,780 --> 00:00:56,730 even if we have multiple users calling up 31 00:00:56,730 --> 00:00:58,710 and telling us they're all having the same issue. 32 00:00:58,710 --> 00:00:59,543 It's important to note 33 00:00:59,543 --> 00:01:01,350 that there are multiple people having this issue, 34 00:01:01,350 --> 00:01:03,150 but we're going to treat each individual user 35 00:01:03,150 --> 00:01:05,370 as their own ticket and their own problem 36 00:01:05,370 --> 00:01:08,490 because each one may be caused by different issues. 37 00:01:08,490 --> 00:01:10,140 This brings us to our second step. 38 00:01:10,140 --> 00:01:12,870 We now need to establish a theory of probable cause. 39 00:01:12,870 --> 00:01:14,130 So here we're going to think to ourselves, 40 00:01:14,130 --> 00:01:16,080 what could be causing this issue? 41 00:01:16,080 --> 00:01:19,200 Why is this connection sporadically dropping and resuming? 42 00:01:19,200 --> 00:01:20,940 Well, it could be a faulty router. 43 00:01:20,940 --> 00:01:22,620 It could be that your bandwidth is overloaded. 44 00:01:22,620 --> 00:01:24,480 It could be there's an IP conflict. 45 00:01:24,480 --> 00:01:26,280 It could be the switch port is failing on the router. 46 00:01:26,280 --> 00:01:27,720 It could be a faulty cable. 47 00:01:27,720 --> 00:01:29,670 There are lots and lots of different things, 48 00:01:29,670 --> 00:01:31,260 but our goal here is to establish 49 00:01:31,260 --> 00:01:33,090 a theory of probable cause. 
50 00:01:33,090 --> 00:01:35,400 And when we do this, we are going to question the obvious, 51 00:01:35,400 --> 00:01:37,170 and we're going to consider different approaches 52 00:01:37,170 --> 00:01:38,730 such as the top-to-bottom approach, 53 00:01:38,730 --> 00:01:41,820 the bottom-up approach, or the divide and conquer approach. 54 00:01:41,820 --> 00:01:42,840 Now, because we're having an issue 55 00:01:42,840 --> 00:01:44,940 with intermittent network connectivity, 56 00:01:44,940 --> 00:01:48,060 we might want to go ahead and do this from the bottom up. 57 00:01:48,060 --> 00:01:50,460 And the bottom up would be starting with the physical layer. 58 00:01:50,460 --> 00:01:52,260 So if this is a wireless connection, 59 00:01:52,260 --> 00:01:54,690 we might want to see if there are any wireless frequency issues 60 00:01:54,690 --> 00:01:56,610 that are causing this connection to drop. 61 00:01:56,610 --> 00:01:59,100 If it's a cabling issue, we may want to check the cable 62 00:01:59,100 --> 00:02:01,020 and verify it's working correctly. 63 00:02:01,020 --> 00:02:02,130 Now, at this point in step two, 64 00:02:02,130 --> 00:02:04,440 we're not going to take the actual actions yet. 65 00:02:04,440 --> 00:02:05,910 We're just coming up with the idea 66 00:02:05,910 --> 00:02:08,220 of what we think the probable cause is, 67 00:02:08,220 --> 00:02:10,320 and then we move into step three. 68 00:02:10,320 --> 00:02:11,790 So let's move into step three, 69 00:02:11,790 --> 00:02:14,400 and we're going to test the theory to determine the cause. 70 00:02:14,400 --> 00:02:17,190 Here, we need to go ahead and test this theory 71 00:02:17,190 --> 00:02:20,040 and see if this theory is correct, and if it's confirmed, 72 00:02:20,040 --> 00:02:21,240 then we would figure out what steps 73 00:02:21,240 --> 00:02:23,340 we need to take to resolve the issue. 
74 00:02:23,340 --> 00:02:25,380 If it's not correct and we've ruled it out, 75 00:02:25,380 --> 00:02:27,180 then we need to come up with a new theory, 76 00:02:27,180 --> 00:02:29,790 and we'll start this process again by going back to step two 77 00:02:29,790 --> 00:02:31,500 and then into step three again. 78 00:02:31,500 --> 00:02:32,333 So in our case, 79 00:02:32,333 --> 00:02:34,440 let's say that we believe it was an issue with the router, 80 00:02:34,440 --> 00:02:35,910 so we're going to go ahead and test the router, 81 00:02:35,910 --> 00:02:38,220 and we're going to check its logs and reboot it. 82 00:02:38,220 --> 00:02:40,800 At this point, we've done that and we see if it works. 83 00:02:40,800 --> 00:02:42,570 Nope, we're still getting sporadic issues 84 00:02:42,570 --> 00:02:43,710 with the connectivity. 85 00:02:43,710 --> 00:02:45,690 All right, in that case, we'll move back to step two, 86 00:02:45,690 --> 00:02:47,450 establish a new theory of probable cause. 87 00:02:47,450 --> 00:02:50,040 In this case, we're going to think it's overloaded bandwidth. 88 00:02:50,040 --> 00:02:51,660 So we go into step three, 89 00:02:51,660 --> 00:02:54,240 and we test our theory by monitoring bandwidth usage 90 00:02:54,240 --> 00:02:56,430 to rule out if it's overloaded or not. 91 00:02:56,430 --> 00:02:58,830 In this case, we look at our bandwidth and it looks normal. 92 00:02:58,830 --> 00:03:01,080 It's all within our normal bands and tolerances. 93 00:03:01,080 --> 00:03:03,150 So we go back to step two and we go ahead 94 00:03:03,150 --> 00:03:05,160 and think of another probable cause. 95 00:03:05,160 --> 00:03:07,410 Well, if it keeps dropping network connectivity 96 00:03:07,410 --> 00:03:09,450 and then it resumes itself automatically, 97 00:03:09,450 --> 00:03:11,400 maybe it's an IP conflict. 
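The back-and-forth between step two and step three described so far can be written down as a simple loop. This is only a minimal sketch in Python: the theory names come from the lesson, but the function `first_confirmed` and the check results are hypothetical placeholders, not real diagnostics.

```python
from typing import Callable, Optional

def first_confirmed(
    theories: list[tuple[str, Callable[[], bool]]],
) -> tuple[Optional[str], list[str]]:
    """Test theories in order (step three); return the confirmed one, if any,
    along with the theories that were ruled out on the way."""
    ruled_out = []
    for name, check in theories:
        if check():              # theory confirmed, move on to step four
            return name, ruled_out
        ruled_out.append(name)   # theory ruled out, go back to step two
    return None, ruled_out

# Hypothetical checks: each lambda stands in for a real test whose result
# we are simply assuming here for illustration.
theories = [
    ("faulty router", lambda: False),         # checked logs, rebooted, drops continued
    ("overloaded bandwidth", lambda: False),  # usage within normal tolerances
    ("IP conflict", lambda: True),            # assumed result, for illustration only
]

confirmed, ruled_out = first_confirmed(theories)
print("Confirmed:", confirmed)
print("Ruled out:", ruled_out)
```

Keeping the list of ruled-out theories is a design choice that pays off later, because those are exactly the things that get written into the ticket in step seven.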
98 00:03:11,400 --> 00:03:14,430 So I'm going to go into step three and test my theory, 99 00:03:14,430 --> 00:03:15,660 and I'm going to go ahead and scan the network 100 00:03:15,660 --> 00:03:17,940 to see if anybody else is using that IP address. 101 00:03:17,940 --> 00:03:19,710 And if we're using DHCP, 102 00:03:19,710 --> 00:03:21,480 I'm going to see if that DHCP address 103 00:03:21,480 --> 00:03:23,640 is being handed out to multiple people. 104 00:03:23,640 --> 00:03:25,710 At this point, let's say that was the issue. 105 00:03:25,710 --> 00:03:27,390 So now that we know that's the issue, 106 00:03:27,390 --> 00:03:29,730 we have to establish a plan of action. 107 00:03:29,730 --> 00:03:31,560 And this brings us to step four. 108 00:03:31,560 --> 00:03:33,900 In step four, we're going to establish a plan of action 109 00:03:33,900 --> 00:03:37,170 to resolve the problem and identify potential effects. 110 00:03:37,170 --> 00:03:39,420 Now, when we do this, we are going to go ahead 111 00:03:39,420 --> 00:03:42,030 and figure out what is causing this IP conflict. 112 00:03:42,030 --> 00:03:44,010 Maybe we're using static IP addresses 113 00:03:44,010 --> 00:03:46,560 and somebody isn't doing proper IP management. 114 00:03:46,560 --> 00:03:49,770 Or maybe it's a problem with the DHCP server. 115 00:03:49,770 --> 00:03:52,260 In either case, we have to establish what our plan is, 116 00:03:52,260 --> 00:03:55,380 and then we're going to go about carrying out that plan in step five. 117 00:03:55,380 --> 00:03:56,820 So for this case, let's go ahead 118 00:03:56,820 --> 00:03:58,830 and say it was an issue with our DHCP server. 119 00:03:58,830 --> 00:04:00,750 So we come up with a plan to go ahead 120 00:04:00,750 --> 00:04:03,690 and troubleshoot that DHCP server to reconfigure it 121 00:04:03,690 --> 00:04:07,110 and to change our scope addresses inside the DHCP server. 
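To make the IP conflict test from step three a bit more concrete, here is a rough sketch of what "is anybody else using that IP address" could look like once you have scan results in hand. It assumes you already collected (IP, MAC) pairs from something like an ARP scan or a DHCP lease export. The function name `find_ip_conflicts` and the sample addresses are made up for illustration.

```python
from collections import defaultdict

def find_ip_conflicts(observations):
    """Return {ip: sorted MAC list} for every IP seen with more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so the same adapter is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}

# Made-up sample data standing in for real scan or lease output.
observations = [
    ("192.168.1.10", "AA:BB:CC:00:00:01"),
    ("192.168.1.11", "AA:BB:CC:00:00:02"),
    ("192.168.1.10", "AA:BB:CC:00:00:03"),  # second device claiming .10
    ("192.168.1.11", "aa:bb:cc:00:00:02"),  # same device, different letter case
]

conflicts = find_ip_conflicts(observations)
for ip, macs in conflicts.items():
    print(f"{ip} is claimed by {len(macs)} devices: {', '.join(macs)}")
```

An address that shows up with two different MAC addresses is a strong hint of a conflict, though it is not proof on its own, since some high-availability setups share an address on purpose.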
122 00:04:07,110 --> 00:04:08,790 This now brings us to step five, 123 00:04:08,790 --> 00:04:12,150 which is to implement the solution or escalate as necessary. 124 00:04:12,150 --> 00:04:12,983 In this case, 125 00:04:12,983 --> 00:04:15,870 if I have permission to do that change on the DHCP server, 126 00:04:15,870 --> 00:04:17,310 I can implement that change. 127 00:04:17,310 --> 00:04:19,350 But if I don't, I'm going to have to escalate that 128 00:04:19,350 --> 00:04:21,570 to the change advisory board for their approval 129 00:04:21,570 --> 00:04:23,670 for us to make a change to the DHCP server 130 00:04:23,670 --> 00:04:25,290 and to the technicians who are responsible 131 00:04:25,290 --> 00:04:27,210 for that DHCP server. 132 00:04:27,210 --> 00:04:28,500 When we get into step six, 133 00:04:28,500 --> 00:04:30,600 we need to verify full system functionality 134 00:04:30,600 --> 00:04:33,180 and implement preventive measures if applicable. 135 00:04:33,180 --> 00:04:36,390 So let's say that the DHCP server had a configuration issue, 136 00:04:36,390 --> 00:04:37,770 we got permission to make the change, 137 00:04:37,770 --> 00:04:39,090 we implemented that change. 138 00:04:39,090 --> 00:04:41,940 Now, we want to see does this solve the problem? 139 00:04:41,940 --> 00:04:43,620 And when we look at it, we figure out that, yes, 140 00:04:43,620 --> 00:04:44,730 it does solve the problem. 141 00:04:44,730 --> 00:04:46,770 We've verified full system functionality, 142 00:04:46,770 --> 00:04:48,540 everybody's getting the proper IP addresses 143 00:04:48,540 --> 00:04:50,040 and everyone is happy. 144 00:04:50,040 --> 00:04:52,530 The only problem is we figured out the issue was, 145 00:04:52,530 --> 00:04:54,690 somebody went in and made a configuration change 146 00:04:54,690 --> 00:04:57,570 to the DHCP server without permission and without going 147 00:04:57,570 --> 00:04:59,670 through the configuration management process. 
148 00:04:59,670 --> 00:05:01,350 Now, we have a preventative measure 149 00:05:01,350 --> 00:05:02,580 that we need to implement. 150 00:05:02,580 --> 00:05:04,200 And here we're going to work within the bounds 151 00:05:04,200 --> 00:05:06,330 of our organizational structure to figure out 152 00:05:06,330 --> 00:05:08,280 how do we implement preventative measures. 153 00:05:08,280 --> 00:05:09,120 In this case, 154 00:05:09,120 --> 00:05:11,220 it's probably going to involve talking with your manager, 155 00:05:11,220 --> 00:05:12,840 and having your manager talk with the technician 156 00:05:12,840 --> 00:05:14,490 who made the unauthorized change 157 00:05:14,490 --> 00:05:16,860 to ensure that doesn't happen again in the future. 158 00:05:16,860 --> 00:05:18,330 And then we go into our seventh step, 159 00:05:18,330 --> 00:05:20,790 which is to document our findings, actions, outcomes, 160 00:05:20,790 --> 00:05:22,950 and lessons learned throughout the process. 161 00:05:22,950 --> 00:05:25,230 As we're going through and doing all these different steps, 162 00:05:25,230 --> 00:05:26,490 we are going to be documenting this 163 00:05:26,490 --> 00:05:28,230 inside of our trouble ticket system 164 00:05:28,230 --> 00:05:30,330 so we know exactly what's going on. 165 00:05:30,330 --> 00:05:31,680 And if you go home for the day 166 00:05:31,680 --> 00:05:32,730 and this is still an issue, 167 00:05:32,730 --> 00:05:34,140 somebody else can come in behind you 168 00:05:34,140 --> 00:05:35,970 and start working on this issue too. 169 00:05:35,970 --> 00:05:36,990 So as you're doing this, 170 00:05:36,990 --> 00:05:39,810 we want to record the issue symptoms we gathered in step one. 171 00:05:39,810 --> 00:05:41,850 We want to talk about the different theories we had. 
172 00:05:41,850 --> 00:05:43,770 We want to talk about what the diagnostic steps 173 00:05:43,770 --> 00:05:45,390 and tests were in step three, 174 00:05:45,390 --> 00:05:47,760 where we're testing our theory to determine the cause. 175 00:05:47,760 --> 00:05:49,620 We want to talk about the solution that we came up with 176 00:05:49,620 --> 00:05:51,990 and how it worked when we tried to implement it. 177 00:05:51,990 --> 00:05:53,760 And then we want to talk about the outcome 178 00:05:53,760 --> 00:05:54,840 and what changes we made, 179 00:05:54,840 --> 00:05:57,510 and what we're now seeing as a result of those changes. 180 00:05:57,510 --> 00:05:59,370 All of these are things that we want to document 181 00:05:59,370 --> 00:06:00,690 inside our trouble ticket system 182 00:06:00,690 --> 00:06:03,660 and in some cases in our lessons learned system too, 183 00:06:03,660 --> 00:06:06,240 so we can then share these lessons with other people. 184 00:06:06,240 --> 00:06:07,470 That's the idea of how you work 185 00:06:07,470 --> 00:06:09,870 through this troubleshooting methodology methodically, 186 00:06:09,870 --> 00:06:12,240 going from step one down to step seven 187 00:06:12,240 --> 00:06:13,560 by identifying the problem, 188 00:06:13,560 --> 00:06:15,480 establishing a theory of probable cause, 189 00:06:15,480 --> 00:06:17,340 testing the theory to determine the cause, 190 00:06:17,340 --> 00:06:19,380 establishing a plan of action to resolve the problem 191 00:06:19,380 --> 00:06:21,090 and identify potential effects, 192 00:06:21,090 --> 00:06:23,700 implement the solution or escalate as necessary, 193 00:06:23,700 --> 00:06:25,380 verify full system functionality 194 00:06:25,380 --> 00:06:27,630 and implement preventative measures if applicable. 195 00:06:27,630 --> 00:06:30,450 And then finally document findings, actions, outcomes, 196 00:06:30,450 --> 00:06:32,600 and lessons learned throughout the process.
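As a rough illustration of step seven, the items we said we want to document can be laid out as a simple record. This is a sketch only: the `TroubleTicket` class and its field names are hypothetical and do not come from any particular ticketing system, and the sample entries are paraphrased from the example in this lesson.

```python
from dataclasses import dataclass, field

@dataclass
class TroubleTicket:
    """Hypothetical ticket record covering what step seven asks us to document."""
    symptoms: list[str] = field(default_factory=list)         # gathered in step one
    theories: list[str] = field(default_factory=list)         # from step two
    tests: list[str] = field(default_factory=list)            # diagnostics from step three
    solution: str = ""                                        # steps four and five
    outcome: str = ""                                         # verified in step six
    lessons_learned: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True once every core section has at least one entry."""
        return all([self.symptoms, self.theories, self.tests,
                    self.solution, self.outcome])

ticket = TroubleTicket(
    symptoms=["Connection drops and resumes sporadically, at all times of day"],
    theories=["Faulty router", "Overloaded bandwidth", "IP conflict"],
    tests=["Checked router logs and rebooted", "Monitored bandwidth usage",
           "Scanned network for duplicate IP addresses"],
    solution="Reconfigured the DHCP server and changed its scope addresses",
    outcome="Users are receiving proper IP addresses",
    lessons_learned=["DHCP changes must go through configuration management"],
)
print("Ready to close:", ticket.is_complete())
```

A record like this also helps with the handoff case mentioned earlier, since the next technician can see which theories were already ruled out.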