1 00:00:00,480 --> 00:00:01,680 Welcome back. 2 00:00:01,710 --> 00:00:06,730 In the last video, we finished off by saying that the next classification model evaluation metric we're going 3 00:00:06,730 --> 00:00:09,180 to use is a confusion matrix. 4 00:00:09,180 --> 00:00:15,060 Now, if you're confused by the name "confusion matrix", don't worry; it will make sense after we go through 5 00:00:15,060 --> 00:00:15,180 it. 6 00:00:15,390 --> 00:00:25,830 So let's define it: a confusion matrix is a quick way to compare the labels a model predicts with the 7 00:00:26,010 --> 00:00:33,450 actual labels it was supposed to predict. 8 00:00:33,450 --> 00:00:34,320 In essence, it's 9 00:00:37,230 --> 00:00:44,850 giving you an idea of where the model is getting confused. 10 00:00:44,850 --> 00:00:45,770 Beautiful. 11 00:00:45,810 --> 00:00:48,180 Now, if it's still confusing, don't worry; let's have a look. 12 00:00:48,180 --> 00:00:49,310 That's what we always do, right? 13 00:00:49,320 --> 00:00:50,580 We prefer to get hands-on. 14 00:00:50,580 --> 00:00:51,800 So from sklearn, 15 00:00:51,840 --> 00:00:54,470 we're going to go into the metrics module 16 00:00:54,620 --> 00:01:02,330 and import confusion_matrix. Now we can create a confusion matrix by making some predictions. 17 00:01:02,390 --> 00:01:08,190 We've seen how to do this with the predict() function; we're going to do that on the test dataset, and 18 00:01:08,190 --> 00:01:15,330 then we're going to pass the results to the confusion_matrix function: confusion_matrix(y_test, y_preds), the true labels and 19 00:01:15,330 --> 00:01:15,990 the predictions. 20 00:01:16,020 --> 00:01:19,730 This is the crux of what basically all evaluation metrics do, right? 21 00:01:19,740 --> 00:01:25,560 They compare the true labels with the predictions and give us some insight from that comparison. 22 00:01:25,560 --> 00:01:33,960 And so if we hit Shift+Enter, we get an array back. But what does this even mean? It's just four numbers.
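As a minimal sketch of the call described above, the snippet below runs Scikit-Learn's confusion_matrix on a few made-up labels. The arrays are hypothetical stand-ins for y_test and the model's predictions, not the course data. In the video, the predictions come from the model's predict() method on the test set. The result is the array of four numbers: rows are the actual labels and columns are the predicted labels, both ordered 0 then 1.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels standing in for y_test and y_preds
y_test = np.array([0, 0, 0, 1, 1, 1, 1])
y_preds = np.array([0, 1, 1, 0, 1, 1, 1])

# Rows = actual labels, columns = predicted labels (label order: 0, 1)
conf_mat = confusion_matrix(y_test, y_preds)
print(conf_mat)
# [[1 2]
#  [1 3]]

# For binary labels, ravel() flattens row by row: TN, FP, FN, TP
tn, fp, fn, tp = conf_mat.ravel()
```

Reading the toy output: there is one true negative, two false positives (actual 0, predicted 1), one false negative and three true positives.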
23 00:01:34,050 --> 00:01:39,260 Again, this is probably another thing that's easier to understand when it's visualized. 24 00:01:39,420 --> 00:01:48,120 One way to do so is to visualize the confusion matrix with pd.crosstab(). 25 00:01:48,120 --> 00:01:49,610 So let's see it in action. 26 00:01:49,680 --> 00:01:55,100 We're going to go pd, for pandas, dot crosstab, which is going to compare two different things: 27 00:01:55,100 --> 00:01:55,800 y_test... 28 00:01:55,800 --> 00:02:00,120 We want to compare the y_test labels; again, it's just another comparison between the true labels and the 29 00:02:00,120 --> 00:02:01,320 predictions. 30 00:02:01,350 --> 00:02:13,550 So y_test, y_preds, and we're going to pass rownames as "Actual Labels", and colnames is... can you guess? 31 00:02:13,550 --> 00:02:15,450 If not, that's fine. 32 00:02:15,600 --> 00:02:21,350 colnames is short for column names, and this one is rownames, so we don't need that "s". 33 00:02:21,410 --> 00:02:30,000 This is going to be "Predicted Labels". 34 00:02:30,200 --> 00:02:31,740 Wonderful. 35 00:02:31,760 --> 00:02:33,470 So what's going on here? 36 00:02:33,980 --> 00:02:39,920 Well, because we're comparing the true labels with the predicted labels, 37 00:02:39,930 --> 00:02:45,060 this is what our crosstab version of the confusion matrix is showing us: the predicted labels, where 38 00:02:45,060 --> 00:02:49,240 our model has predicted 0 or 1, and our actual labels. 39 00:02:49,260 --> 00:02:55,860 So here the rows are the actual labels and the columns are the predicted labels. 40 00:02:55,890 --> 00:03:04,080 So in our case, where the actual label is 0 and the predicted label is 0, we have 22 examples, and 41 00:03:04,110 --> 00:03:10,300 where the predicted label is 1 and the actual label is 1, we have 24 examples.
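Here is a small sketch of the pd.crosstab version, again on hypothetical toy labels rather than the course's test set. Plain NumPy arrays have no names, so rownames and colnames supply the axis titles. The rows end up as the actual labels and the columns as the predicted labels, which matches the layout described above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy labels standing in for y_test and y_preds
y_test = np.array([0, 0, 0, 1, 1, 1, 1])
y_preds = np.array([0, 1, 1, 0, 1, 1, 1])

# Count every (actual, predicted) pair; unnamed arrays need rownames/colnames
ct = pd.crosstab(y_test, y_preds,
                 rownames=["Actual Labels"],
                 colnames=["Predicted Labels"])
print(ct)
```

ct.loc[actual, predicted] reads a single cell. For example, ct.loc[0, 1] is the number of false positives.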
42 00:03:10,420 --> 00:03:17,700 And now, if we total all of these up before we check out the other two: twenty-two plus seven plus eight 43 00:03:17,820 --> 00:03:25,440 plus twenty-four gives us sixty-one. And if we remind ourselves how many predictions 44 00:03:25,440 --> 00:03:28,210 we've made, there are sixty-one. 45 00:03:28,290 --> 00:03:28,650 Why? 46 00:03:28,710 --> 00:03:31,120 Because there are sixty-one examples in the test set. 47 00:03:32,010 --> 00:03:34,430 Okay, so what are these two here? 48 00:03:34,590 --> 00:03:34,830 Right. 49 00:03:34,830 --> 00:03:39,420 Let's read it out: we've got a predicted label where the model has predicted 1, 50 00:03:39,420 --> 00:03:43,380 but the actual label was 0. 51 00:03:43,710 --> 00:03:46,860 Okay, so that's a false positive. 52 00:03:46,860 --> 00:03:51,270 There are seven of those. And now we have a predicted label of 0, 53 00:03:51,510 --> 00:03:54,310 but the actual label is 1. 54 00:03:54,430 --> 00:03:56,220 There are eight examples of that. 55 00:03:56,230 --> 00:04:02,100 That's eight false negatives. And hence, this is where the word "confusion" comes 56 00:04:02,100 --> 00:04:10,300 into "confusion matrix": these examples here, on the other diagonal, are where our model is getting confused, a.k.a. 57 00:04:10,360 --> 00:04:16,510 predicting 0 where the actual label is 1, or predicting 1 where the actual label is 0. In a confusion 58 00:04:16,510 --> 00:04:17,650 matrix, 59 00:04:17,650 --> 00:04:22,500 the samples the model got right are on the diagonal. 60 00:04:22,630 --> 00:04:23,200 So, this here. 61 00:04:23,200 --> 00:04:29,830 Now, if we come to our confusion matrix anatomy (we're going to replicate this in the next video), 62 00:04:29,860 --> 00:04:34,410 in our confusion matrix anatomy we see the correct predictions here, on the diagonal. 63 00:04:34,420 --> 00:04:38,380 So these are the true positives and true negatives, and the false negatives are over here.
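The bookkeeping in this part can be checked with a few lines of plain Python. The four counts are the ones read out in the video (22, 7, 8 and 24), arranged with rows as actual labels and columns as predicted labels. Everything else is arithmetic on those counts. The total matches the 61 examples in the test set.

```python
# Counts read out in the video: rows = actual label, columns = predicted label
conf_mat = [[22, 7],   # actual 0: 22 predicted 0 (TN), 7 predicted 1 (FP)
            [8, 24]]   # actual 1: 8 predicted 0 (FN), 24 predicted 1 (TP)

(tn, fp), (fn, tp) = conf_mat

total = tn + fp + fn + tp  # every test example lands in exactly one cell
correct = tn + tp          # the main diagonal: predictions the model got right
confused = fp + fn         # the other diagonal: where the model got "confused"

print(total, correct, confused)  # 61 46 15
```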
64 00:04:38,530 --> 00:04:43,570 That's on the right side of the confusion matrix, and the false positives are down here. 65 00:04:43,570 --> 00:04:44,770 Why is it a false negative? 66 00:04:44,770 --> 00:04:47,170 Well, the predicted label is 0, 67 00:04:47,200 --> 00:04:47,550 right, 68 00:04:47,920 --> 00:04:52,480 and the true label is actually 1. And this is a false positive because the model has predicted 69 00:04:52,480 --> 00:04:54,580 1 and the true label is 0. 70 00:04:54,610 --> 00:04:58,810 And we've seen before what a true positive, a false positive, a true negative and a false negative 71 00:04:58,810 --> 00:04:59,440 are. 72 00:04:59,500 --> 00:05:03,940 So how would we replicate something like this, which is, again, a little bit more visual than our 73 00:05:03,940 --> 00:05:04,630 crosstab? 74 00:05:04,810 --> 00:05:17,140 One way to do so is with... let's make a comment here: "Make our confusion matrix more visual with Seaborn's 75 00:05:17,680 --> 00:05:19,000 heatmap". 76 00:05:19,260 --> 00:05:19,960 Now you might be going, 77 00:05:19,960 --> 00:05:21,940 "Daniel, what the hell is Seaborn?" 78 00:05:21,950 --> 00:05:24,490 Well, if we go here: "seaborn heatmap". 79 00:05:24,490 --> 00:05:26,910 This is what I do whenever I don't know something. 80 00:05:27,140 --> 00:05:28,730 I search "seaborn heatmap". 81 00:05:28,800 --> 00:05:35,350 We look at Seaborn. You might be asking, what even is Seaborn? seaborn.heatmap: "Plot rectangular data 82 00:05:35,380 --> 00:05:38,560 as a color-encoded matrix." Right. 83 00:05:39,040 --> 00:05:44,530 Well, I'll give you the nuts and bolts: Seaborn is a visualization library that's built on top of 84 00:05:44,530 --> 00:05:49,420 Matplotlib, and it's relatively easy to use, but we're mostly just going to use the 85 00:05:49,420 --> 00:05:51,300 heatmap function, so let's see it. 86 00:05:51,580 --> 00:05:55,230 So: import seaborn as sns.
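To make the false negative / false positive naming concrete, here is a tiny helper that labels a single (actual, predicted) pair, assuming 1 is the positive class. The name outcome is hypothetical and not part of any library. Counting its results over a list of made-up labels gives the same four numbers a confusion matrix would.

```python
from collections import Counter


def outcome(actual: int, predicted: int) -> str:
    """Name the confusion-matrix cell for one example (1 = positive class)."""
    if predicted == 1:
        # Model said positive: right if the truth is 1, a false alarm otherwise
        return "TP" if actual == 1 else "FP"
    # Model said negative: right if the truth is 0, a miss otherwise
    return "TN" if actual == 0 else "FN"


# Hypothetical toy labels, just to exercise the helper
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 0, 1, 1, 1]

counts = Counter(outcome(a, p) for a, p in zip(y_true, y_pred))
print(counts)
```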
87 00:05:55,740 --> 00:06:02,350 And then we're going to set Seaborn's font scale, because the default text can be a bit small. We set the font 88 00:06:02,350 --> 00:06:04,420 scale that we want with sns.set(), 89 00:06:04,730 --> 00:06:06,500 passing font_scale. 90 00:06:06,520 --> 00:06:12,100 This is one of the easiest ways to visualize a confusion matrix, other than pd.crosstab(). 91 00:06:12,160 --> 00:06:16,220 So we're going to create a confusion matrix. 92 00:06:16,410 --> 00:06:17,650 We're going to call it conf_mat, 93 00:06:17,650 --> 00:06:23,440 short for confusion matrix, as we like to minimize how many keystrokes we take: 94 00:06:23,440 --> 00:06:30,180 confusion_matrix(y_test, y_preds), passing the test labels and predictions just like we've done up here. And then we're going 95 00:06:30,180 --> 00:06:37,980 to plot it using Seaborn: sns.heatmap(conf_mat). 96 00:06:41,050 --> 00:06:49,730 Ah, what? ModuleNotFoundError. Drat, we can't use Seaborn. Nah, I'm kidding. 97 00:06:49,730 --> 00:06:54,350 The reason this module isn't found is that, remember, right back at the start, we created 98 00:06:54,350 --> 00:06:55,810 an environment using Conda. 99 00:06:56,780 --> 00:07:02,150 Well, when we did that, we didn't install Seaborn. And even though Seaborn is built on top of Matplotlib 100 00:07:02,150 --> 00:07:08,390 and we do have Matplotlib, we don't have Seaborn. We have pandas, we have scikit-learn, we have Jupyter, 101 00:07:08,960 --> 00:07:10,120 but we're missing Seaborn. 102 00:07:10,120 --> 00:07:15,320 So how would you go about installing a module into your Conda environment, 103 00:07:15,320 --> 00:07:17,400 the one we've got running in our terminal? 104 00:07:17,570 --> 00:07:20,050 Right, so this is our Conda environment running in our terminal. 105 00:07:20,060 --> 00:07:23,180 This is what's serving up our Jupyter Notebook server. 106 00:07:23,180 --> 00:07:25,430 How do you think you would install it?
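A ModuleNotFoundError like this one means Python can't find the package in the active environment. As a small sketch using only the standard library, importlib.util.find_spec can check for this before you import. The is_installed helper and the package list are illustrative. Note that scikit-learn is imported as sklearn, so that's the name to check.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    # find_spec returns None when the module can't be found, which is the
    # situation that raises ModuleNotFoundError on `import module_name`
    return importlib.util.find_spec(module_name) is not None


# Import names to check (scikit-learn's import name is sklearn)
for name in ["matplotlib", "pandas", "sklearn", "seaborn"]:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```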
107 00:07:25,630 --> 00:07:33,680 So one way would be to go to the terminal, start a new window, and cd into the folder on my Desktop where my 108 00:07:33,680 --> 00:07:40,950 environment is stored, sample project. I'll zoom in here so you can see: sample project. 109 00:07:41,210 --> 00:07:42,790 So we've got our environment here. 110 00:07:42,830 --> 00:07:49,410 And then if we go conda env list, wonderful. And then if we wanted to activate it, we could go conda 111 00:07:49,500 --> 00:07:58,230 activate with our sample project (I'll copy and paste that), and now I could go conda install seaborn. 112 00:08:00,040 --> 00:08:04,180 That's one option. I won't hit Enter, because I'm going to show you the other option: how 113 00:08:04,180 --> 00:08:08,350 you can do it right within a Jupyter Notebook. This is a little bonus tip, 114 00:08:08,410 --> 00:08:16,270 since we're doing this during a classification model evaluation video, but because we want to use Seaborn, let's 115 00:08:16,270 --> 00:08:19,660 see how we can do it from within a Jupyter Notebook. 116 00:08:19,660 --> 00:08:23,650 This is something you might try whenever you get a ModuleNotFoundError. Say you came across 117 00:08:23,650 --> 00:08:29,380 some code, or a function you want to use, from a library you don't have installed: this 118 00:08:29,380 --> 00:08:34,390 is how you can install it into the environment from within a Jupyter Notebook, so it's ready 119 00:08:34,390 --> 00:08:35,140 to go. 120 00:08:35,140 --> 00:08:42,890 So after running this cell here, we'll be able to use this cell, if everything goes to plan. --yes, 121 00:08:43,500 --> 00:08:44,340 --prefix. 122 00:08:44,860 --> 00:08:51,370 So what this is saying is: we're importing sys, which is going to let us access our system, a.k.a. our 123 00:08:51,370 --> 00:08:59,110 computer. And then we're running this bang (!) command, which is just a way of telling Jupyter to run a shell 124 00:08:59,110 --> 00:09:01,730 command, as if we ran the same command here in the terminal. 
125 00:09:02,710 --> 00:09:08,890 Now if we run !ls here, it's going to show us the exact same output, but from within the Jupyter 126 00:09:08,890 --> 00:09:11,160 Notebook. We'll get rid of that; we don't need it. 127 00:09:11,580 --> 00:09:16,600 And we go conda install, and we pass --yes so it doesn't ask for confirmation, and --prefix. We've seen prefix when we were 128 00:09:16,600 --> 00:09:24,400 setting up our environment. We're going to go {sys.prefix}; prefix is just another word for path name, a.k.a. 129 00:09:25,390 --> 00:09:32,410 this thing here, the path name. And then we add seaborn. So, fingers crossed: 130 00:09:32,540 --> 00:09:33,480 we run the cell, 131 00:09:33,500 --> 00:09:38,900 it should install Seaborn, and then we should be able to plot our confusion matrix using Seaborn's 132 00:09:38,960 --> 00:09:40,440 heatmap function. 133 00:09:40,550 --> 00:09:41,510 So let's hit Shift+Enter. 134 00:09:45,870 --> 00:09:47,250 Ah, again, some loading. 135 00:09:47,310 --> 00:09:48,540 Beautiful. 136 00:09:48,540 --> 00:09:50,270 This is telling us what it's going to do. 137 00:09:50,340 --> 00:09:52,690 "A newer version of conda exists." Thank you. 138 00:09:53,010 --> 00:09:56,360 Package Plan, added / updated specs: seaborn. 139 00:09:56,530 --> 00:09:57,100 Yep. 140 00:09:57,120 --> 00:09:59,330 "The following NEW packages will be INSTALLED": 141 00:09:59,340 --> 00:10:02,280 patsy, seaborn, statsmodels. Preparing transaction. 142 00:10:02,280 --> 00:10:03,180 Dun dun dun. 143 00:10:03,180 --> 00:10:04,130 Tick tick tick. 144 00:10:04,140 --> 00:10:04,920 Moment of truth. 145 00:10:06,940 --> 00:10:08,530 It might take a little while to import. 146 00:10:08,770 --> 00:10:09,700 Oh, there we go.
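For reference, the notebook cell built here is import sys followed by !conda install --yes --prefix {sys.prefix} seaborn. The ! line is IPython syntax, not plain Python. The sketch below is a hypothetical plain-Python equivalent that builds the same command and runs it with subprocess. The function names conda_install_cmd and conda_install are illustrative. The sketch assumes the conda executable is on your PATH and that the notebook's kernel runs inside the Conda environment you want to install into.

```python
import subprocess
import sys


def conda_install_cmd(package: str) -> list[str]:
    """Build the argv that installs `package` into this interpreter's environment.

    Plain-Python counterpart of the notebook cell:
        import sys
        !conda install --yes --prefix {sys.prefix} seaborn
    """
    # --yes skips the confirmation prompt; --prefix targets the environment
    # whose path is sys.prefix (the one running this kernel)
    return ["conda", "install", "--yes", "--prefix", sys.prefix, package]


def conda_install(package: str) -> None:
    # Runs conda as a subprocess; raises CalledProcessError if the install fails
    subprocess.run(conda_install_cmd(package), check=True)
```

In a notebook, calling conda_install("seaborn") would then install Seaborn into the same environment the kernel is using, which is the point of passing sys.prefix rather than relying on whichever environment happens to be active in a terminal.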
147 00:10:10,390 --> 00:10:12,710 But it's not really telling us much. 148 00:10:12,970 --> 00:10:13,550 Why? 149 00:10:13,820 --> 00:10:15,100 If we go back to our Keynote 150 00:10:15,100 --> 00:10:16,740 to see our confusion matrix anatomy, 151 00:10:16,740 --> 00:10:20,820 we want it to look a bit more like this, right, with some numbers on there. 152 00:10:20,890 --> 00:10:23,690 Now, why isn't it showing them? 153 00:10:23,740 --> 00:10:25,570 Well, let's pause the video there. 154 00:10:25,560 --> 00:10:30,140 We'll continue fixing up our confusion matrix in the next video and see how we get some numbers on there. 155 00:10:30,340 --> 00:10:33,220 But take a little note of this one here; 156 00:10:33,220 --> 00:10:35,250 it'll be in the example code reference that you can use. 157 00:10:35,260 --> 00:10:43,160 This is how to install a Conda package 158 00:10:44,660 --> 00:10:50,830 from a Jupyter Notebook into the current environment, 159 00:10:54,270 --> 00:10:55,840 and then you just put the package name over here. 160 00:10:57,250 --> 00:10:58,910 All right, let's take a quick break. 161 00:10:58,910 --> 00:11:00,230 There was a lot to cover in that video. 162 00:11:00,320 --> 00:11:01,250 I'll see you in the next one.
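The video leaves the missing numbers for next time. One likely fix is to pass annot=True to sns.heatmap so each cell shows its count, with fmt="d" formatting the counts as integers. This is sketched here as an assumption rather than the course's own code, and it assumes seaborn and Matplotlib are installed. The counts are the 22/7/8/24 from this video's crosstab. The figure size, axis labels and the Agg backend line are illustrative choices, and in a notebook you can drop the matplotlib.use("Agg") call.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Counts from this video's crosstab: rows = actual label, columns = predicted label
conf_mat = np.array([[22, 7],
                     [8, 24]])


def plot_conf_mat(conf_mat):
    """Draw a confusion matrix as a heatmap with each cell's count written on it."""
    fig, ax = plt.subplots(figsize=(3, 3))
    # annot=True writes the values on the cells; fmt="d" formats them as integers
    sns.heatmap(conf_mat, annot=True, fmt="d", cbar=False, ax=ax)
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    return ax


ax = plot_conf_mat(conf_mat)
```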