1 00:00:00,200 --> 00:00:01,066 Hello my friends. 2 00:00:01,066 --> 00:00:06,400 Welcome to this new practical activity on our second classification model, 3 00:00:06,466 --> 00:00:11,566 which is of course k-nearest neighbors. And now we're going to go into Part 4 00:00:11,566 --> 00:00:17,100 3, Classification, and then K-Nearest Neighbors (K-NN). 5 00:00:17,833 --> 00:00:18,233 All right. 6 00:00:18,233 --> 00:00:20,533 And as usual we're going to start with Python. 7 00:00:20,533 --> 00:00:24,000 And inside this folder you will indeed find two files. 8 00:00:24,266 --> 00:00:27,966 Of course the k-nearest neighbors implementation in ipynb format, 9 00:00:27,966 --> 00:00:31,366 which you can open with either Jupyter Notebook or Google Colab, 10 00:00:31,366 --> 00:00:34,866 and the same dataset, Social Network Ads, 11 00:00:35,166 --> 00:00:39,466 containing 400 observations, where each observation, 12 00:00:39,466 --> 00:00:42,466 you know, each row, corresponds to a customer. 13 00:00:42,600 --> 00:00:45,200 And for each of these customers, we get the age, 14 00:00:45,200 --> 00:00:47,800 that's the first feature, the estimated salary, 15 00:00:47,800 --> 00:00:49,033 that's the second feature, 16 00:00:49,033 --> 00:00:54,066 and their purchase decision, whether or not they bought the SUV, yes or no. 17 00:00:54,300 --> 00:00:56,833 And that's of course the dependent variable. 18 00:00:56,833 --> 00:01:01,466 And our exercise here is to learn some correlations between these features, 19 00:01:01,466 --> 00:01:06,166 age and estimated salary, to predict this dependent variable, Purchased. 20 00:01:06,166 --> 00:01:10,100 And the goal of all this is, of course, to predict which customers 21 00:01:10,100 --> 00:01:14,133 are going to buy the new SUV so that we can optimize our targeting. 22 00:01:14,333 --> 00:01:15,100 All right. 23 00:01:15,100 --> 00:01:16,766 So exactly the same as before.
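The dataset layout described above (two features, age and estimated salary, plus the dependent variable) could be sketched roughly as follows. This is only an illustration: the column names `Age`, `EstimatedSalary` and `Purchased`, the file name in the comment, and the six rows are assumptions standing in for the real 400-observation file.

```python
import pandas as pd

# Made-up stand-in rows; the real dataset has 400 observations.
# Column names are assumed from the spoken description, not confirmed.
dataset = pd.DataFrame({
    "Age": [19, 35, 26, 47, 52, 32],
    "EstimatedSalary": [19000, 20000, 43000, 25000, 90000, 150000],
    "Purchased": [0, 0, 0, 1, 1, 1],
})

# With the real file this would be something like (file name assumed):
# dataset = pd.read_csv("Social_Network_Ads.csv")

# Features: age and estimated salary. Dependent variable: purchase decision.
X = dataset[["Age", "EstimatedSalary"]].values
y = dataset["Purchased"].values

print(X.shape, y.shape)
```

Keeping the features in `X` and the dependent variable in `y` is what lets the rest of the notebook stay identical from one classifier to the next.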
24 00:01:16,766 --> 00:01:20,700 And now we're going to open this k-nearest neighbors implementation 25 00:01:20,833 --> 00:01:24,066 with either Google Colaboratory or Jupyter Notebook. 26 00:01:24,066 --> 00:01:25,366 Choose your favorite. 27 00:01:25,366 --> 00:01:30,000 And now it is opening the notebook, laying it out in a second. 28 00:01:30,300 --> 00:01:32,633 So loading it, laying it out, there we go. 29 00:01:32,633 --> 00:01:34,166 And that's the notebook. 30 00:01:34,166 --> 00:01:37,466 And as you probably noticed, I actually kept 31 00:01:37,633 --> 00:01:41,933 the logistic regression implementation to show you, indeed, that 32 00:01:42,300 --> 00:01:47,200 all the code cells inside these two implementations are the same. 33 00:01:47,200 --> 00:01:50,200 You know, all the code cells are the same, except 34 00:01:50,566 --> 00:01:55,366 the one where we indeed build and train the model on the training set. 35 00:01:55,433 --> 00:01:55,766 Right? 36 00:01:55,766 --> 00:01:59,700 So up to here, you know, all this is the same. 37 00:02:00,000 --> 00:02:03,533 And once we reach that step where we train the classification 38 00:02:03,533 --> 00:02:06,900 model on the training set, well, that's where we have this single change. 39 00:02:06,900 --> 00:02:08,766 And then all the rest is the same. 40 00:02:08,766 --> 00:02:13,100 Then we predict a new result, you know, with the same name 41 00:02:13,100 --> 00:02:15,066 for the classifier, which we called classifier. 42 00:02:15,066 --> 00:02:18,666 Then we predict the test set results; here once again we have nothing to change. 43 00:02:18,666 --> 00:02:21,666 It's all the same, right? All the same.
44 00:02:21,766 --> 00:02:25,333 So if by any chance you are starting with K-NN, 45 00:02:25,333 --> 00:02:27,233 you know, before logistic regression, 46 00:02:27,233 --> 00:02:30,133 I really encourage you to do logistic regression first 47 00:02:30,133 --> 00:02:32,900 because all these cells are explained in detail there. 48 00:02:32,900 --> 00:02:36,600 And for this K-NN implementation we will just re-implement, 49 00:02:37,233 --> 00:02:41,333 you know, that cell to train the K-NN model on the training set. 50 00:02:41,833 --> 00:02:42,366 All right. So you see, 51 00:02:42,366 --> 00:02:45,766 that's why it is a classification template 52 00:02:46,033 --> 00:02:49,633 which you can use to build any other classification model. 53 00:02:49,633 --> 00:02:52,866 And together we will of course use this classification template 54 00:02:52,966 --> 00:02:58,066 to build all our other classification models, including K-NN, Support 55 00:02:58,066 --> 00:03:02,700 Vector Machine, Kernel SVM, Naive Bayes, Decision Tree Classification 56 00:03:02,700 --> 00:03:07,733 and Random Forest Classification, and we will build them all with maximum efficiency. 57 00:03:07,966 --> 00:03:09,500 All right. So let's start with K-NN. 58 00:03:09,500 --> 00:03:14,133 And as you understood, the only cell that we have to change here is this one. 59 00:03:14,366 --> 00:03:18,266 So actually, since we cannot do it here, you know, re-implement it here, 60 00:03:18,266 --> 00:03:20,233 because this is in read-only mode, 61 00:03:20,233 --> 00:03:23,700 well, we'll have to create a new file, you know, a copy of this file, 62 00:03:23,700 --> 00:03:24,400 by clicking here, 63 00:03:24,400 --> 00:03:27,333 this button, Save a copy in Drive. 64 00:03:27,333 --> 00:03:29,200 This will create a copy 65 00:03:29,200 --> 00:03:32,200 inside which we will be able to re-implement 66 00:03:32,200 --> 00:03:35,900 that cell to build and train the K-NN model.
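The idea of a classification template, where every step is shared and only the model-building cell changes, might be sketched as below. This is a rough illustration, not the notebook's actual code: the helper name `run_template`, the 25% test split, `random_state=0`, the feature scaling step, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler


def run_template(classifier, X, y):
    """Hypothetical helper: identical steps for any model; only `classifier` varies."""
    # Split into training and test sets (split ratio and seed are assumptions).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    # Scale features using statistics from the training set only (assumed step).
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    # The single step that differs between models: which classifier gets trained.
    classifier.fit(X_train, y_train)
    # Predict the test set results and score them.
    y_pred = classifier.predict(X_test)
    return float(accuracy_score(y_test, y_pred))


# Synthetic stand-in data (not the real dataset): age, salary, and a made-up rule.
rng = np.random.default_rng(0)
age = rng.integers(18, 60, size=200)
salary = rng.integers(15000, 150000, size=200)
X = np.column_stack([age, salary]).astype(float)
y = ((age > 40) | (salary > 90000)).astype(int)

# Swapping the model is the only change between the two runs.
for model in (LogisticRegression(random_state=0), KNeighborsClassifier(n_neighbors=5)):
    print(type(model).__name__, run_template(model, X, y))
```

Passing the classifier in as an argument mirrors the point made in the video: the variable is always called `classifier`, so the prediction and evaluation steps never need to be touched.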
67 00:03:35,900 --> 00:03:36,300 All right. 68 00:03:36,300 --> 00:03:38,000 So let's do this. 69 00:03:38,000 --> 00:03:39,966 We just need to scroll down here. 70 00:03:39,966 --> 00:03:42,266 All the rest is the same. And there we go. 71 00:03:42,266 --> 00:03:46,633 So what we're going to do is just remove that cell, and there we go. 72 00:03:46,966 --> 00:03:49,500 Now we can re-implement that cell. 73 00:03:49,500 --> 00:03:51,733 So let's create a new code cell. 74 00:03:51,733 --> 00:03:53,866 And now I would like you to please 75 00:03:53,866 --> 00:03:57,500 press pause on this video to indeed try to do it yourself first. 76 00:03:57,733 --> 00:03:59,233 Because once again, 77 00:03:59,233 --> 00:04:03,133 I not only want to train you on machine learning, but I also want to train you 78 00:04:03,133 --> 00:04:06,300 on how to be independent and create things on your own. 79 00:04:06,300 --> 00:04:09,000 You know, create your machine learning models on your own. 80 00:04:09,000 --> 00:04:12,366 I already guided you on how to navigate the scikit-learn 81 00:04:12,366 --> 00:04:15,400 API to find some information and some tools. 82 00:04:15,633 --> 00:04:18,766 And now I would like you to do this same exercise again. 83 00:04:19,000 --> 00:04:23,366 Please press pause on the video and navigate the scikit-learn API 84 00:04:23,366 --> 00:04:28,200 to find the class that allows you indeed to build the K-NN model, 85 00:04:28,200 --> 00:04:31,200 and then, you know, how to train it on the training set. 86 00:04:31,500 --> 00:04:35,266 All right, so in two seconds now we're going to start the solution together. 87 00:04:36,333 --> 00:04:37,266 Here we go. 88 00:04:37,266 --> 00:04:37,666 All right. 89 00:04:37,666 --> 00:04:41,700 So let's suppose, you know, I'm just like you, I'm not an expert in machine learning, 90 00:04:41,700 --> 00:04:45,566 and I would like to build and train a K-NN model on the training set.
91 00:04:45,900 --> 00:04:49,500 So since I actually have no idea what the name of the class 92 00:04:49,500 --> 00:04:54,566 that does it is, well, I'm going to go, of course, and search for it online. 93 00:04:54,566 --> 00:04:58,433 Well, we will do it this way: scikit-learn. 94 00:04:58,766 --> 00:05:00,966 Then we're going to go to the first link. 95 00:05:00,966 --> 00:05:01,600 Right. 96 00:05:01,600 --> 00:05:05,066 We will end up on the scikit-learn welcome page. 97 00:05:05,066 --> 00:05:08,700 Then we're going to go, remember, to API, which contains all the 98 00:05:08,900 --> 00:05:11,933 modules and functions and classes of scikit-learn. 99 00:05:12,166 --> 00:05:13,466 And now there we go. 100 00:05:13,466 --> 00:05:19,000 We are looking for the class that allows us to build the K-NN model. 101 00:05:19,466 --> 00:05:20,500 So let's scroll down a bit. 102 00:05:20,500 --> 00:05:23,466 And if we have, you know, too much difficulty finding it, 103 00:05:23,466 --> 00:05:27,466 well, we will do Ctrl or Command F to find it. 104 00:05:27,500 --> 00:05:28,833 You know, there are many tricks actually. 105 00:05:28,833 --> 00:05:33,966 You can also directly type in your search bar "k-nearest neighbors class 106 00:05:33,966 --> 00:05:37,566 scikit-learn" or "scikit-learn k-nearest neighbors class", something like that. 107 00:05:38,033 --> 00:05:42,100 But I like to navigate the scikit-learn API because it is really well made. 108 00:05:42,900 --> 00:05:46,300 And so here by scrolling down, do we find it? 109 00:05:46,500 --> 00:05:49,933 Yes. Here it is. Nearest Neighbors. 110 00:05:49,966 --> 00:05:51,700 So it was actually hard to miss. 111 00:05:51,700 --> 00:05:54,100 Indeed we had to scroll down a bit, but that's okay 112 00:05:54,100 --> 00:05:57,900 because it's really good to get familiar with the scikit-learn library.
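For readers who paused to try it themselves: assuming the class this search leads to is `KNeighborsClassifier` from the `sklearn.neighbors` module (the video has not named it at this point), building and training it could look roughly like the sketch below. The tiny training set and the choice of `n_neighbors=3` are made up for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Tiny made-up training set with two already-scaled features per row.
X_train = [
    [-1.0, -1.0], [-0.8, -1.2], [-1.1, -0.7],  # class 0
    [1.0, 1.1], [0.9, 0.8], [1.2, 1.0],        # class 1
]
y_train = [0, 0, 0, 1, 1, 1]

# Build the model; n_neighbors=3 is an illustrative choice, not the video's.
classifier = KNeighborsClassifier(n_neighbors=3)

# Train it on the training set.
classifier.fit(X_train, y_train)

# Predict the class of a new point lying close to the class-1 cluster.
prediction = classifier.predict([[0.9, 1.0]])
print(prediction)
```

The new point's three nearest training points all belong to class 1, so the majority vote assigns it to class 1.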