1 00:00:00,066 --> 00:00:02,633 Hello and welcome to this R tutorial. 2 00:00:02,633 --> 00:00:06,233 So for those of you who haven't checked the Python tutorials on kernel SVM, 3 00:00:06,233 --> 00:00:08,000 I can't wait to show you the results. 4 00:00:08,000 --> 00:00:10,900 It's going to be something. You'll definitely see how the kernel 5 00:00:10,900 --> 00:00:11,766 SVM classifier 6 00:00:11,766 --> 00:00:13,566 can be a powerful classifier, 7 00:00:13,566 --> 00:00:17,700 especially in our situation where the data set is not linearly separable. 8 00:00:18,000 --> 00:00:22,533 Here you're going to see how it can manage to overcome this non-linear separability 9 00:00:22,800 --> 00:00:26,933 and classify most of our users of the social network correctly. 10 00:00:27,433 --> 00:00:29,966 So let's make this kernel SVM classifier 11 00:00:29,966 --> 00:00:32,966 right now and quickly look at the results. 12 00:00:32,966 --> 00:00:36,333 So as usual we're going to start by setting our working directory. 13 00:00:36,600 --> 00:00:38,400 So right now I'm on my desktop. 14 00:00:38,400 --> 00:00:41,400 I'm going to my Machine Learning A-Z folder, Part 15 00:00:41,400 --> 00:00:44,566 3, Classification, section Kernel SVM. 16 00:00:44,566 --> 00:00:45,933 And here we are. 17 00:00:45,933 --> 00:00:49,866 Make sure that you have this Social_Network_Ads.csv file in your folder. 18 00:00:49,866 --> 00:00:52,900 And if that's the case, you're ready to click on this More button here 19 00:00:53,233 --> 00:00:56,233 to set your folder as working directory. 20 00:00:56,600 --> 00:00:58,600 Here it is. All good. 21 00:00:58,600 --> 00:01:01,033 And now let's start making our model. 22 00:01:01,033 --> 00:01:04,100 So we made an awesome classification template 23 00:01:04,100 --> 00:01:06,233 in the logistic regression section. 24 00:01:06,233 --> 00:01:08,300 So we're going to use it. 
25 00:01:08,300 --> 00:01:12,100 So here we're going to take everything from here to the end. 26 00:01:12,533 --> 00:01:17,300 Copy this and simply paste it in the kernel SVM 27 00:01:17,300 --> 00:01:18,500 R file. 28 00:01:18,500 --> 00:01:23,333 And now the only thing we need to do is create our classifier here, 29 00:01:23,733 --> 00:01:25,800 because the template is made in such a way 30 00:01:25,800 --> 00:01:29,200 to make your machine learning experience as efficient as possible. 31 00:01:29,566 --> 00:01:32,066 So we're not going to create it now. 32 00:01:32,066 --> 00:01:36,100 First we're going to, you know, select all the data preprocessing, 33 00:01:36,100 --> 00:01:38,300 the first step, to preprocess the data. 34 00:01:38,300 --> 00:01:41,300 So we're going to press Command or Control plus Enter to execute. 35 00:01:41,366 --> 00:01:42,266 And all good. 36 00:01:42,266 --> 00:01:45,100 As you can see, the code executed properly. 37 00:01:45,100 --> 00:01:49,300 Now we can have a quick look at the data set, the training set, and the test set. 38 00:01:49,800 --> 00:01:52,800 So we can see that we have 400 observations 39 00:01:52,800 --> 00:01:56,600 in the data set, 300 that were selected to go to the training set, 40 00:01:56,833 --> 00:01:59,466 and 100 selected to go to the test set. 41 00:01:59,466 --> 00:02:03,233 That's because of our 0.75 split ratio here. 42 00:02:04,133 --> 00:02:07,466 And quick reminder, this dataset is about a social network. 43 00:02:07,466 --> 00:02:11,300 It contains information about the users of that social network. 44 00:02:11,366 --> 00:02:12,500 So we have their age, 45 00:02:12,500 --> 00:02:16,500 their estimated salary, and the last column tells whether or not 46 00:02:16,633 --> 00:02:21,133 the user bought a product of a business client of this social network, 47 00:02:21,133 --> 00:02:22,566 which is a car company. 
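As a rough illustration of the 0.75 split ratio mentioned above, here is a minimal sketch in base R. The data frame is a synthetic stand-in with hypothetical values (not the course's CSV file), the column names are assumed, and `sample()` is used here in place of whatever splitting function the template actually calls.

```r
# Synthetic stand-in for the 400-row data set (hypothetical values, assumed column names)
set.seed(123)
n = 400
dataset = data.frame(
  Age = round(runif(n, 18, 60)),
  EstimatedSalary = round(runif(n, 15000, 150000)),
  Purchased = sample(c(0, 1), n, replace = TRUE)
)

# Draw 75% of the row indices at random for the training set
train_idx = sample(seq_len(n), size = 0.75 * n)
training_set = dataset[train_idx, ]
test_set = dataset[-train_idx, ]

# Row counts of the two subsets
print(nrow(training_set))
print(nrow(test_set))
```

With 400 rows and a ratio of 0.75, this gives the 300 and 100 observations described in the video.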
48 00:02:22,566 --> 00:02:26,033 So this car company put ads on the social network, 49 00:02:26,300 --> 00:02:31,133 and the social network gathered this information to see the users' reactions. 50 00:02:31,466 --> 00:02:35,266 And so zero means here that the user didn't buy the SUV. 51 00:02:35,500 --> 00:02:38,500 And one here means that the user bought the SUV. 52 00:02:38,666 --> 00:02:42,033 And our goal now is to make a classifier that classifies 53 00:02:42,300 --> 00:02:44,400 those users into two categories: 54 00:02:44,400 --> 00:02:48,100 the category of users that didn't buy the SUV, and the category of users 55 00:02:48,100 --> 00:02:49,633 that bought the SUV. 56 00:02:49,633 --> 00:02:50,466 So let's do this. 57 00:02:50,466 --> 00:02:52,800 Let's do it with kernel SVM. 58 00:02:52,800 --> 00:02:55,800 And so right now we need to create our classifier. 59 00:02:56,633 --> 00:03:00,100 And as usual it's going to be very intuitive and simple. 60 00:03:00,333 --> 00:03:02,266 We're going to use the best library for this. 61 00:03:02,266 --> 00:03:06,066 And besides, if you watched the SVM tutorials, you're going 62 00:03:06,066 --> 00:03:09,600 to see that we're using the same library and the same function. 63 00:03:09,866 --> 00:03:12,466 We will just need to change some parameters. 64 00:03:12,466 --> 00:03:13,233 So let's do this. 65 00:03:13,233 --> 00:03:13,900 For those of you 66 00:03:13,900 --> 00:03:17,500 who didn't follow the SVM tutorials, you need to install the package. 67 00:03:17,800 --> 00:03:20,666 So to do this you type this command, 68 00:03:20,666 --> 00:03:23,666 install.packages, 69 00:03:23,666 --> 00:03:27,400 and in parentheses and in quotes, e1071. 70 00:03:27,700 --> 00:03:30,700 So that's the most popular package for SVMs. 71 00:03:30,733 --> 00:03:35,200 Another very popular package is kernlab, which you can play around with and check out. 72 00:03:35,466 --> 00:03:36,566 It's actually quite simple. 
73 00:03:36,566 --> 00:03:40,066 It's kind of the same function, only with slightly different parameters to input. 74 00:03:40,433 --> 00:03:42,733 And it's also a great package for SVMs. 75 00:03:42,733 --> 00:03:46,266 But here we're going to use e1071, which is one of the most popular packages. 76 00:03:46,566 --> 00:03:51,700 And so that's for you guys. You know, if you go to your Packages tab, this one here, 77 00:03:52,066 --> 00:03:55,333 and if you go to e1071, 78 00:03:55,900 --> 00:03:59,866 you can see that I have this package installed in my packages. 79 00:04:00,000 --> 00:04:01,366 It might not be the case for you. 80 00:04:01,366 --> 00:04:03,133 So that's why I'm adding this line here, 81 00:04:03,133 --> 00:04:06,266 in case you don't have the package installed in R. 82 00:04:06,566 --> 00:04:08,433 I won't do it right now. 83 00:04:08,433 --> 00:04:10,633 You just need to select this line and execute. 84 00:04:10,633 --> 00:04:13,200 And it will install the package very quickly. 85 00:04:13,200 --> 00:04:16,533 So I'm just going to comment that out. 86 00:04:17,233 --> 00:04:21,266 I just pressed Command + Shift + C. And now what we need to do, 87 00:04:21,266 --> 00:04:25,600 another important thing about this library, is to, you know, type this command, library, 88 00:04:26,333 --> 00:04:30,300 and then in parentheses the name of the library, not in quotes: 89 00:04:30,300 --> 00:04:33,300 e1071. Okay. 90 00:04:33,500 --> 00:04:34,300 And adding 91 00:04:34,300 --> 00:04:37,466 this line is very important in case we have some automated scripts 92 00:04:37,733 --> 00:04:41,266 that, you know, make some kernel SVM models today. 93 00:04:41,400 --> 00:04:42,900 So imagine you have a workflow 94 00:04:42,900 --> 00:04:46,400 and you want to include the kernel SVM model in this workflow. 95 00:04:46,500 --> 00:04:48,800 Well, you will need some automated scripts. 
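For an automated script, the install and load steps described above could be combined roughly as follows. This is only a sketch: the `requireNamespace()` guard is an addition for illustration and is not what is typed in the video, where the install line is simply commented out.

```r
# Install e1071 only if it is not already available (guard added as an assumption)
if (!requireNamespace("e1071", quietly = TRUE)) {
  install.packages("e1071")
}

# Attach the package so that svm() can be called directly
library(e1071)
```

The guard avoids reinstalling the package on every run, which matters when the script is executed repeatedly as part of a workflow.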
96 00:04:48,800 --> 00:04:52,833 So it's important to add this line because this will automatically select 97 00:04:53,033 --> 00:04:55,433 your library. Because as you can see here, it's not selected. 98 00:04:55,433 --> 00:04:57,966 That means it's not imported, in a way. 99 00:04:57,966 --> 00:05:01,666 So you need this line that will automatically select this package. 100 00:05:02,300 --> 00:05:03,266 Okay. 101 00:05:03,266 --> 00:05:06,000 And now we are ready to create our classifier. 102 00:05:06,000 --> 00:05:08,866 So it's going to be the same as for SVM, 103 00:05:08,866 --> 00:05:11,866 which is actually an SVM with a linear kernel. 104 00:05:12,000 --> 00:05:13,666 Here we're not going to use a linear kernel. 105 00:05:13,666 --> 00:05:16,966 We're going to use something else, which will be the Gaussian kernel. 106 00:05:17,400 --> 00:05:18,166 So let's do this. 107 00:05:18,166 --> 00:05:21,466 We're going to call our classifier, as usual, classifier, 108 00:05:22,666 --> 00:05:23,700 and then equals. 109 00:05:23,700 --> 00:05:26,966 And then here we're going to use the svm function 110 00:05:27,266 --> 00:05:30,266 of the e1071 library. 111 00:05:30,366 --> 00:05:32,533 And now we need to input the parameters. 112 00:05:32,533 --> 00:05:35,000 So as usual let's have a look. 113 00:05:35,000 --> 00:05:36,133 Here it is. 114 00:05:36,133 --> 00:05:38,166 And now you need to click on this. 115 00:05:38,166 --> 00:05:41,266 And here we are in the R documentation for svm 116 00:05:41,466 --> 00:05:44,466 in e1071.
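The arguments have not been spelled out at this point in the video, so the following is only a sketch of what a call to `svm()` with a Gaussian kernel might look like. It assumes e1071 is already installed, uses a synthetic stand-in for the training set (hypothetical values and assumed column names), and relies on the fact that e1071 names the Gaussian (RBF) kernel `'radial'`.

```r
library(e1071)  # assumes the package is already installed

# Synthetic stand-in for the training set (hypothetical values, not the course data)
set.seed(123)
n = 300
training_set = data.frame(
  Age = runif(n, 18, 60),
  EstimatedSalary = runif(n, 15000, 150000)
)

# A label that no straight line can separate in these two features
training_set$Purchased = factor(
  as.integer(training_set$Age > 45 | training_set$EstimatedSalary > 100000),
  levels = c(0, 1)
)

# Kernel SVM: same svm() function as for the linear case, with kernel = 'radial'
classifier = svm(formula = Purchased ~ .,
                 data = training_set,
                 type = 'C-classification',
                 kernel = 'radial')
print(classifier)
```

Changing `kernel = 'radial'` to `kernel = 'linear'` would give the linear SVM mentioned above, which is why only a parameter needs to change between the two tutorials.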