1 00:00:00,233 --> 00:00:01,066 Okay, my friends, 2 00:00:01,066 --> 00:00:04,733 we have three steps left, and the one we're about to implement 3 00:00:04,733 --> 00:00:08,433 is the confusion matrix, which is a simple 2D matrix. 4 00:00:08,466 --> 00:00:11,700 You know, two rows, two columns, which will show us the number 5 00:00:11,700 --> 00:00:15,733 of correct predictions we made in both cases, you know, predicting 0 or 1, 6 00:00:15,733 --> 00:00:20,300 and how many incorrect predictions we made in both cases, 0 and 1. 7 00:00:20,400 --> 00:00:20,700 All right. 8 00:00:20,700 --> 00:00:24,266 So that will be a nice way to quickly see where we were right and wrong. 9 00:00:24,266 --> 00:00:28,433 And also, in that same code cell, we will compute the accuracy. 10 00:00:29,066 --> 00:00:32,766 Now, at the end of the previous tutorial, I actually asked you to 11 00:00:32,800 --> 00:00:37,966 try to figure it out on your own by looking at the scikit-learn API. 12 00:00:37,966 --> 00:00:39,900 So that's exactly what we're going to do. 13 00:00:39,900 --> 00:00:40,833 First, here, 14 00:00:40,833 --> 00:00:45,166 I'm going to show you how to navigate it and find the information we want, 15 00:00:45,166 --> 00:00:47,000 you know, the tool that we need. 16 00:00:47,000 --> 00:00:50,333 So let's go back to the scikit-learn welcome page. 17 00:00:50,466 --> 00:00:53,466 Then, remember, you have to go to API here, which contains 18 00:00:53,666 --> 00:00:56,666 all the classes and functions from the different modules. 19 00:00:57,000 --> 00:00:58,900 And I actually gave you a hint. 20 00:00:58,900 --> 00:01:00,233 In the previous tutorial, 21 00:01:00,233 --> 00:01:06,400 I told you to look into a module called metrics, remember? 22 00:01:06,633 --> 00:01:09,633 So we just have to scroll down a bit here 23 00:01:09,766 --> 00:01:12,766 and we will very soon find metrics. 24 00:01:12,800 --> 00:01:14,533 It's in alphabetical order. 
25 00:01:14,533 --> 00:01:17,400 So there it is, metrics. 26 00:01:17,400 --> 00:01:19,200 And it's very well organized. 27 00:01:19,200 --> 00:01:23,166 As you can see, you have the regression metrics, of which we already covered 28 00:01:23,166 --> 00:01:26,466 the most important ones, and the classification metrics. 29 00:01:26,700 --> 00:01:29,266 And here we are of course dealing with classification. 30 00:01:29,266 --> 00:01:31,400 Therefore you had to look in here. 31 00:01:31,400 --> 00:01:33,700 And now we're getting closer. We're getting warmer. 32 00:01:33,700 --> 00:01:35,666 What do we see inside? 33 00:01:35,666 --> 00:01:38,966 Well, we actually see metrics.confusion_matrix. 34 00:01:39,000 --> 00:01:43,800 And that's exactly what we'll take in order to build this confusion matrix. 35 00:01:44,500 --> 00:01:45,066 All right. 36 00:01:45,066 --> 00:01:48,900 And then what I usually do is simply have a look at one of the examples, 37 00:01:48,900 --> 00:01:52,600 because it usually contains the code on how to build such a tool, 38 00:01:52,600 --> 00:01:55,900 you know, the confusion matrix. But it usually gives you this example 39 00:01:55,900 --> 00:01:59,233 on some random values of dependent variable vectors. 40 00:01:59,233 --> 00:02:00,900 You know, that's like y_test, 41 00:02:00,900 --> 00:02:02,666 you know, the vector containing the real results, 42 00:02:02,666 --> 00:02:05,300 and y_pred, the vector containing the predictions. 43 00:02:05,300 --> 00:02:08,866 And it tells you how to apply this confusion_matrix function 44 00:02:09,100 --> 00:02:11,966 to the vector of real results and the vector of predictions. 45 00:02:11,966 --> 00:02:12,300 Okay. 46 00:02:12,300 --> 00:02:16,500 So in order to implement this, well, let's just take this so that we can 47 00:02:16,500 --> 00:02:18,900 import that confusion_matrix function, 48 00:02:18,900 --> 00:02:22,500 which belongs to the metrics module from the scikit-learn library. 
49 00:02:22,733 --> 00:02:23,566 So there we go. 50 00:02:23,566 --> 00:02:26,566 Let's take that first. I just copied it. 51 00:02:26,566 --> 00:02:28,933 Let's create a new code cell and paste this. 52 00:02:28,933 --> 00:02:31,200 So that's how you import the confusion_matrix function. 53 00:02:31,200 --> 00:02:34,833 And then let's grab that other piece of code, 54 00:02:34,833 --> 00:02:40,200 you know, this particular line where we apply the confusion_matrix function 55 00:02:40,600 --> 00:02:45,200 to the vectors of predictions and real results. 56 00:02:45,533 --> 00:02:46,400 So let's paste this. 57 00:02:46,400 --> 00:02:48,300 So you see what I'm trying to do here. 58 00:02:48,300 --> 00:02:50,166 I don't do this because I'm lazy. 59 00:02:50,166 --> 00:02:53,500 I do this in order to train you to be independent. 60 00:02:53,500 --> 00:02:56,466 You know, whenever you need new information or a new tool, 61 00:02:56,466 --> 00:03:00,400 I'm training you on how to find it in the scikit-learn 62 00:03:00,400 --> 00:03:04,300 API, and I will do the same later on when we start working with TensorFlow, 63 00:03:04,300 --> 00:03:07,333 you know, in the deep learning part of this course, part eight. 64 00:03:07,700 --> 00:03:09,666 But you see, this is very important. 65 00:03:09,666 --> 00:03:13,400 I really want you to be independent and figure things out on your own. 66 00:03:13,866 --> 00:03:18,100 And now, inside this confusion_matrix function, what do we have to replace? 67 00:03:18,300 --> 00:03:21,333 Well, you know, here they call the vector of real results 68 00:03:21,333 --> 00:03:21,833 y_true. 69 00:03:21,833 --> 00:03:24,333 But since we actually want to distinguish 70 00:03:24,333 --> 00:03:27,666 the vector of real results in the training set and the test set, 71 00:03:27,900 --> 00:03:30,533 well, we actually called our y_true vectors, 
72 00:03:30,533 --> 00:03:34,200 let's say, y_train for the training set and y_test for the test set. 73 00:03:34,200 --> 00:03:38,400 So here, since of course the confusion matrix is usually evaluated 74 00:03:38,533 --> 00:03:39,566 on the test set, 75 00:03:39,566 --> 00:03:44,366 you know, on new observations, here we have to replace y_true with y_test 76 00:03:44,566 --> 00:03:49,500 so that we will get the confusion matrix showing the correct predictions 77 00:03:49,500 --> 00:03:53,933 and the incorrect predictions for both cases, 0 and 1, on the test set. 78 00:03:54,733 --> 00:03:55,300 Great. 79 00:03:55,300 --> 00:04:00,400 So we will actually put the output of this confusion_matrix function, applied 80 00:04:00,400 --> 00:04:04,700 to y_test and y_pred, into a new variable, which we're going to call cm, 81 00:04:04,700 --> 00:04:08,500 which stands for confusion matrix and which will be exactly 82 00:04:08,500 --> 00:04:11,666 the output returned by this confusion_matrix function. 83 00:04:11,666 --> 00:04:12,633 So there we go. 84 00:04:12,633 --> 00:04:15,733 And then we'll add a final print 85 00:04:16,366 --> 00:04:21,533 of cm in order to print that confusion matrix. Okay. 86 00:04:21,533 --> 00:04:25,433 So these are the three lines of code that allow us to build that confusion 87 00:04:25,433 --> 00:04:26,833 matrix and print it. 88 00:04:26,833 --> 00:04:33,433 And now remember that I also asked you to compute the accuracy. Well, to do this, 89 00:04:33,433 --> 00:04:35,933 we just have to do exactly the same as what we just did, 90 00:04:35,933 --> 00:04:39,633 you know, to find that information. Actually, try to press pause on the video 91 00:04:39,633 --> 00:04:42,633 and find it yourself if you haven't already. 92 00:04:42,666 --> 00:04:45,933 So we're going to go back to that metrics module, 93 00:04:45,933 --> 00:04:49,300 you know, remember, the metrics module from the scikit-learn library. 
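As a rough sketch of the three lines described above (the import, the call stored in cm, and the print), here is what the code cell might look like. The two label lists are made-up stand-ins, since the course's real y_test and y_pred come from earlier cells that are not shown here.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical stand-ins for the course's y_test (real results)
# and y_pred (predictions); the real vectors come from earlier cells.
y_test = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

# Real results go first, predictions second.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

In the matrix scikit-learn returns, rows correspond to the real classes and columns to the predicted classes, so the diagonal holds the correct predictions for 0 and 1 and the off-diagonal entries hold the incorrect ones.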
94 00:04:49,633 --> 00:04:52,300 And we're going to look back into this classification 95 00:04:52,300 --> 00:04:55,733 metrics section to find the accuracy. 96 00:04:55,966 --> 00:04:57,766 So, according to you, where is it? 97 00:04:57,766 --> 00:04:59,400 Well, it's hard to miss. 98 00:04:59,400 --> 00:05:03,133 It's actually the first one, accuracy_score, which computes 99 00:05:03,200 --> 00:05:06,200 the accuracy classification score, which is just, 100 00:05:06,266 --> 00:05:09,033 you know, the rate of correct predictions. 101 00:05:09,033 --> 00:05:10,166 So let's do this. 102 00:05:10,166 --> 00:05:14,066 Let's click this link and we will get 103 00:05:14,233 --> 00:05:17,933 all the documentation on this accuracy_score function, 104 00:05:18,166 --> 00:05:22,600 which returns the accuracy of your model on whatever set of data. 105 00:05:22,600 --> 00:05:25,433 And we'll apply it, of course, to the test set. 106 00:05:25,433 --> 00:05:28,033 So first of all, as we can see here, 107 00:05:28,033 --> 00:05:32,100 this accuracy_score function belongs of course to the same metrics module. 108 00:05:32,366 --> 00:05:35,100 So we don't have to take all of this again. 109 00:05:35,100 --> 00:05:38,133 We can just take the name of the function here. 110 00:05:38,533 --> 00:05:41,400 And I will show you what to do here. 111 00:05:41,400 --> 00:05:44,933 Just next to confusion_matrix, you add a comma 112 00:05:44,933 --> 00:05:48,100 and then you can paste this other function you need, 113 00:05:48,266 --> 00:05:48,766 you know, the one 114 00:05:48,766 --> 00:05:52,533 you still have to import from that metrics module from the scikit-learn library.
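The combined import described here, with both functions separated by a comma, could be sketched as follows. The label lists and the variable name acc are illustrative assumptions, not taken from the video; in the course notebook y_test and y_pred would already exist.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical labels standing in for the course's y_test and y_pred.
y_test = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]

cm = confusion_matrix(y_test, y_pred)
print(cm)

# accuracy_score returns the rate of correct predictions;
# the name acc is an assumption made for this sketch.
acc = accuracy_score(y_test, y_pred)
print(acc)

# The same rate can be read off the matrix: diagonal over total.
acc_from_cm = cm.trace() / cm.sum()
```

Reading the accuracy off the diagonal of cm is just a sanity check that the two tools agree; accuracy_score alone is enough in practice.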