Hello and welcome to this R tutorial. Today we're going to implement the k-nearest neighbors (K-NN) classifier in R. So let's start right now. The first thing we're going to do is set the working directory. Right now I'm on my desktop. We're going to go to the Machine Learning A-Z folder, then Part 3 - Classification, and then the K-Nearest Neighbors section. All right, that's the folder we want to set as the working directory. Make sure that it contains the Social_Network_Ads.csv file. If that's the case, you're ready to click on the More button here to set the folder as the working directory. Here we go, that's done. Now we're going to use our classification template to be efficient and generate all the results in a flash. So I'm going to select everything from here to the top. There you go. Copy, and paste it here. Now we only need to change a few things. First, we need to create the classifier here. Then we just need to change the titles of our plots for the training set results and the test set results. So let's do this right now so that we don't forget: I'm going to replace "Classifier" with "K-NN".
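Clicking More > Set As Working Directory in RStudio's Files pane has a code equivalent, `setwd()`. The sketch below wraps it in a small helper, `use_tutorial_dir()` (a hypothetical name, not part of the course files), that also checks the dataset file is present. The folder path in the commented example is an assumption; adjust it to wherever the course files live on your machine.

```r
# Hypothetical helper: switch to the tutorial folder and report whether
# the dataset file is there.
use_tutorial_dir <- function(dir, csv = "Social_Network_Ads.csv") {
  if (!dir.exists(dir)) stop("Folder not found: ", dir)
  setwd(dir)          # same effect as Files pane > More > Set As Working Directory
  file.exists(csv)    # TRUE if the CSV sits in the new working directory
}

# Example call (assumed path, edit to match your machine):
# use_tutorial_dir("~/Desktop/Machine Learning A-Z/Part 3 - Classification/K-Nearest Neighbors")
```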
We're just doing this to specify which algorithm we're plotting. And here as well, "Classifier" is replaced with "K-NN". All right. Now the only thing left to do is create our classifier. But first, let's select and execute the preprocessing steps. I just need to select everything from here and press Command + Enter (Ctrl + Enter on Windows) to execute. That's done. All right, so we have our datasets, and we can have a look: that's our dataset, our training set and our test set, which are properly scaled. Okay. As a reminder, the dataset contains information about users of a social network. The social network has a business client, a car company, that puts ads on the social network. The social network gathered not only information like the age and the estimated salary of those users, but also each user's response to the ad: 0 if the user didn't buy the product, the car, and 1 if the user bought it. You know, it's a very cool luxury SUV launched at a ridiculously low price, so a lot of people, when they saw the ad, said: let's do this, let's get the car. And we can see there are a lot of buyers.
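The template's preprocessing code isn't shown in this part of the video. As a rough sketch of what scaled training and test sets look like, the code below builds a made-up stand-in for the dataset: the values, the labelling rule and the column names `Age`, `EstimatedSalary` and `Purchased` are all assumptions, not the real data. It splits the rows 75/25 (an assumed ratio) and standardizes the two features. Here the test set is scaled with the training set's mean and standard deviation, which is one common choice and may differ from what the template does.

```r
set.seed(123)
n <- 400

# Invented stand-in for the social network data: two features plus a 0/1 label
dataset <- data.frame(
  Age = sample(18:60, n, replace = TRUE),
  EstimatedSalary = sample(seq(15000, 150000, by = 1000), n, replace = TRUE)
)
# Made-up labelling rule: 1 = bought the SUV, 0 = did not
dataset$Purchased <- factor(
  as.integer(dataset$Age > 40 | dataset$EstimatedSalary > 100000),
  levels = c(0, 1)
)

# Assumed 75/25 train/test split
train_idx <- sample(seq_len(n), size = 0.75 * n)
training_set <- dataset[train_idx, ]
test_set <- dataset[-train_idx, ]

# Standardize the features (every column except the label, column 3)
# using statistics computed on the training set only
centers <- colMeans(training_set[-3])
sds <- apply(training_set[-3], 2, sd)
training_set[-3] <- scale(training_set[-3], center = centers, scale = sds)
test_set[-3] <- scale(test_set[-3], center = centers, scale = sds)
```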
So yeah, as you can see, it must be a very cool and cheap car. Now let's go back to K-NN and create our classifier. To create it, we first need to import the right library, which, by the way, is called class. So let's do this. If by any chance the class library is not in your packages list, you need to install it with this command: install.packages, with the name of the package, "class", in quotes inside the parentheses. I think you probably have it by default, but if that's not the case, write this command with "class" inside, execute it, and it will install the package. Here we're just going to write library, with class in parentheses. You can see that right now the class package is not selected in the Packages list; once this line is executed, it will be selected automatically. All right. Now we're going to do something a little different. Usually, as you can see here in the template, we create a classifier, and then we create the vector of predictions y_pred using the predict function, which we apply to our classifier and to the new observations in the test set.
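The install-then-load step can be written so that `install.packages()` only runs when the package is actually missing. `requireNamespace()` checks whether class is available without attaching it; `library()` then attaches it, which is what ticks its box in RStudio's Packages pane.

```r
# Install "class" only if it is missing (it normally ships with R as a
# recommended package), then attach it so knn() is available.
if (!requireNamespace("class", quietly = TRUE)) {
  install.packages("class")
}
library(class)
```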
But here we're going to do these two steps all at once, so we're actually going to remove this line. We're also going to replace this comment with "Fitting K-NN to the Training set and Predicting the Test set results", because in one go we're going to fit K-NN to our training set and predict whether the users of our test set buy the SUV, yes or no. You'll see why we're going to directly create the vector of predictions y_pred here. Before, we had classifier equals an SVM or a logistic regression; here it's going to be directly y_pred equals knn, because this knn function returns the predictions for the test set observations. Okay. So now let's have a look at this knn function, and you'll get it. Press F1. Okay: k-nearest neighbour classification. Let's first input the arguments. The first argument is train. You can guess that by train they mean the training set, so here we need to specify the training set, and since we called it training_set, it's training_set. So training_set here. All right, training_set.
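To see why `knn()` replaces both the classifier line and the `predict()` line, here is a minimal sketch on a tiny hand-made dataset (the numbers and column names are invented and already on a comparable scale). `knn()` from the class package takes the training features (`train`), the test features (`test`), the training labels (`cl`) and the number of neighbours (`k`), and returns a factor of predicted classes for the test rows; no fitted model object is produced. The value `k = 3` is purely illustrative, since the tutorial hasn't chosen k at this point.

```r
library(class)

# Invented toy data: two scaled features plus a 0/1 label in column 3
training_set <- data.frame(
  Age             = c(-1.2, -1.0, -0.8, 0.9, 1.1, 1.3),
  EstimatedSalary = c(-0.9, -1.1, -0.7, 1.0, 0.8, 1.2),
  Purchased       = factor(c(0, 0, 0, 1, 1, 1), levels = c(0, 1))
)
test_set <- data.frame(
  Age             = c(-1.1, 1.0),
  EstimatedSalary = c(-1.0, 1.1),
  Purchased       = factor(c(0, 1), levels = c(0, 1))
)

# One call "fits" and predicts: knn() just compares each test row with
# the stored training rows and takes a majority vote of the k closest.
y_pred <- knn(train = training_set[, -3],  # features only
              test  = test_set[, -3],      # features only
              cl    = training_set[, 3],   # training labels
              k     = 3)                   # illustrative choice of k
y_pred
# [1] 0 1
# Levels: 0 1
```

Each test point here sits right next to one cluster of training points, so all three of its nearest neighbours share a label and the vote is unanimous.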
However, as you can see, the training set contains both the independent variables and the dependent variable, and we only need the independent variables: we don't want to include the dependent variable in the train argument of the knn function. So here we need to take only the first two columns. We're going to add a bracket here and then a comma. To the left of this comma are the rows, and leaving it empty means I'm taking all the observations; to the right of the comma are the columns I want to include in the training set. That's all the columns except the last one, so I'm going to put -3 here, which means I'm removing the last column, the third one, of my training set.
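The `[, -3]` indexing can be checked on its own. In the sketch below, the data frame and its column names are invented for illustration. An empty row index keeps every observation, a negative column index drops that column, and selecting a single column with `[, 3]` returns a plain vector, which is the kind of input `knn()`'s `cl` argument expects.

```r
# Invented three-column data frame shaped like the training set
training_set <- data.frame(
  Age             = c(25, 47, 33),
  EstimatedSalary = c(30000, 82000, 56000),
  Purchased       = factor(c(0, 1, 0), levels = c(0, 1))
)

# Empty before the comma = all rows; -3 after it = every column but the 3rd
features <- training_set[, -3]
names(features)
# [1] "Age"             "EstimatedSalary"

# A single column selected this way comes back as a vector (here a factor)
labels <- training_set[, 3]
```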