1 00:00:00,300 --> 00:00:02,933 Hello and welcome to this R tutorial. 2 00:00:02,933 --> 00:00:05,666 Today we're going to implement Support Vector Machine, 3 00:00:05,666 --> 00:00:08,033 or more commonly called SVM. 4 00:00:08,033 --> 00:00:12,533 And after having implemented it in Python we're going to do it in R. 5 00:00:12,900 --> 00:00:16,833 So let's quickly set the folder as working directory: Machine Learning A-Z, 6 00:00:17,400 --> 00:00:20,066 Part 3, Classification, SVM. 7 00:00:20,066 --> 00:00:21,200 And here we are. 8 00:00:21,200 --> 00:00:24,500 Make sure that you have the Social_Network_Ads.csv file. 9 00:00:24,900 --> 00:00:26,433 And then you can click on this More button 10 00:00:26,433 --> 00:00:29,066 here to set the folder as working directory. 11 00:00:29,066 --> 00:00:29,966 All good. 12 00:00:29,966 --> 00:00:31,300 Now we're going to go to our 13 00:00:31,300 --> 00:00:34,833 classification template that we made in the logistic regression section. 14 00:00:35,266 --> 00:00:40,833 So let's take everything from here to the end. 15 00:00:41,066 --> 00:00:45,566 Copy, and let's paste it in our SVM file. 16 00:00:46,233 --> 00:00:46,766 Here we go. 17 00:00:46,766 --> 00:00:49,766 And now we need to change very few things. 18 00:00:49,966 --> 00:00:52,733 So first we need to create our classifier here, 19 00:00:52,733 --> 00:00:55,666 which is going to be the SVM classifier of course. 20 00:00:55,666 --> 00:00:59,533 And then we just need to change the title here for the graph, 21 00:00:59,533 --> 00:01:01,200 the training set results. 22 00:01:01,200 --> 00:01:04,600 So we're going to specify that it's the SVM classifier. 23 00:01:05,066 --> 00:01:08,066 Then for the test set here, SVM. 24 00:01:08,066 --> 00:01:13,066 And now let's move back up to create our classifier. Okay. 25 00:01:13,066 --> 00:01:15,633 So as usual we're going to import a library.
26 00:01:15,633 --> 00:01:19,866 And this time the library is going to be the most popular library 27 00:01:19,866 --> 00:01:24,000 that is used for SVM, which is the library e1071. 28 00:01:24,466 --> 00:01:25,600 So let's do this first. 29 00:01:25,600 --> 00:01:27,700 Let's have a look at the packages here. 30 00:01:27,700 --> 00:01:31,066 Check to see if you have the e1071 library. 31 00:01:31,500 --> 00:01:34,800 It is installed on my RStudio because I used it a lot of times, but 32 00:01:34,966 --> 00:01:37,966 if you installed R for the first time then you might not have it. 33 00:01:38,133 --> 00:01:40,200 So I'm going to just 34 00:01:40,200 --> 00:01:43,933 write this command here for those of you who need to install it. 35 00:01:43,933 --> 00:01:47,266 So install.packages. Here it is. 36 00:01:47,533 --> 00:01:50,766 And then in parentheses you put in quotes 37 00:01:51,133 --> 00:01:55,500 the name of the library, which is named this way: e1071. 38 00:01:56,133 --> 00:01:56,533 All right. 39 00:01:56,533 --> 00:02:01,500 So if you select and execute this line it will install the package. 40 00:02:01,500 --> 00:02:05,400 I won't do it because it's already installed, but it will do it, I promise you. 41 00:02:05,700 --> 00:02:08,700 So I'm going to put that as a comment there. 42 00:02:09,100 --> 00:02:12,333 And now let's start creating our classifier. 43 00:02:12,333 --> 00:02:18,300 Well, first we need to, you know, add this line: library(e1071). 44 00:02:18,300 --> 00:02:21,833 And that's in case you want to make some automated script 45 00:02:21,900 --> 00:02:25,700 that selects your library automatically here, because this won't be 46 00:02:25,700 --> 00:02:26,700 selected all the time. 47 00:02:26,700 --> 00:02:28,800 So you want to make sure that you select it. 48 00:02:28,800 --> 00:02:31,900 And now we are going to create our SVM classifier. 49 00:02:31,900 --> 00:02:34,566 So as usual we're going to call our classifier
50 00:02:34,566 --> 00:02:37,166 classifier, equals. 51 00:02:37,166 --> 00:02:41,566 And then here we're going to use very simply the function svm. 52 00:02:42,033 --> 00:02:43,866 So let's do this: svm. 53 00:02:43,866 --> 00:02:47,066 Then let's press F1 to look at the parameters. 54 00:02:47,533 --> 00:02:50,366 And here are the parameters, the arguments. 55 00:02:50,366 --> 00:02:52,666 So the first argument is formula. 56 00:02:52,666 --> 00:02:53,766 So let's add it. 57 00:02:53,766 --> 00:02:58,966 So as usual it's the formula, that is, the dependent variable expressed 58 00:02:59,100 --> 00:03:02,600 using a tilde with respect to all your independent variables, 59 00:03:02,600 --> 00:03:04,400 which you represent by a dot. 60 00:03:04,400 --> 00:03:07,300 So here we're just going to add formula equals. 61 00:03:07,300 --> 00:03:11,100 So first we take our dependent variable, which is Purchased. 62 00:03:11,966 --> 00:03:12,900 All right. 63 00:03:12,900 --> 00:03:15,200 And now we use this tilde here. 64 00:03:15,200 --> 00:03:18,433 And then a dot, a dot meaning that we're taking 65 00:03:18,433 --> 00:03:22,400 all the independent variables of our data set. Okay. 66 00:03:22,633 --> 00:03:28,000 Now comma to go to the next argument, and the next argument is data. 67 00:03:28,333 --> 00:03:32,266 So of course that's the data on which you want to train your classifier, 68 00:03:32,433 --> 00:03:34,033 on which you want your classifier 69 00:03:34,033 --> 00:03:37,233 to learn, to make the future classifications. 70 00:03:37,833 --> 00:03:41,033 And of course, this data is our training set. 71 00:03:41,933 --> 00:03:42,866 Okay. Perfect. 72 00:03:43,833 --> 00:03:44,500 Comma. 73 00:03:44,500 --> 00:03:45,700 Let's go to the next argument. 74 00:03:45,700 --> 00:03:48,633 The next argument is x and y. 75 00:03:48,633 --> 00:03:50,766 But we don't really care about that.
76 00:03:50,766 --> 00:03:55,300 However, we do care about the type here and the kernel, 77 00:03:56,166 --> 00:04:00,466 because for the type, as you notice, there are two types of SVM. 78 00:04:00,466 --> 00:04:05,300 There is the SVM for classification and the SVR for regression, 79 00:04:05,300 --> 00:04:08,866 which is kind of the same support vector machines algorithm, 80 00:04:08,866 --> 00:04:11,400 but there is one for classification and one for regression. 81 00:04:11,400 --> 00:04:15,900 So here with this argument type you choose the classifier version. 82 00:04:15,933 --> 00:04:19,766 And the default type for classification is C-classification. 83 00:04:19,966 --> 00:04:22,800 So that's the one we're going to choose for type. 84 00:04:22,800 --> 00:04:25,800 So let's add here type equals 85 00:04:26,266 --> 00:04:29,266 C-classification. 86 00:04:30,333 --> 00:04:31,300 All right. 87 00:04:31,300 --> 00:04:34,766 And now the last very important argument is the kernel. 88 00:04:35,166 --> 00:04:40,800 So we're starting with the simplest SVM, which is the linear SVM. 89 00:04:40,800 --> 00:04:43,266 So here we're going to choose the linear kernel. 90 00:04:43,266 --> 00:04:46,433 And then we're going to try some more sophisticated SVM 91 00:04:46,433 --> 00:04:48,300 with some Gaussian kernel. 92 00:04:48,300 --> 00:04:51,066 But for now we're going to choose a linear kernel. 93 00:04:51,066 --> 00:04:55,466 So here we'll input kernel equals linear. 94 00:04:55,733 --> 00:04:56,166 All right. 95 00:04:56,166 --> 00:05:00,066 And that's all. With these four arguments our classifier is ready. 96 00:05:00,066 --> 00:05:02,400 Our SVM classifier is ready. 97 00:05:02,400 --> 00:05:07,166 So we're going to select the different steps, or sections, of our code. 98 00:05:07,366 --> 00:05:11,133 So let's first select this one to preprocess the data. 99 00:05:11,833 --> 00:05:14,600 Here it is. Okay, so perfect.
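[Editor's note] The classifier described in cues 26 to 95 can be sketched in R roughly as follows. This is a minimal sketch, not the code shown on screen: the synthetic `training_set` below is a hypothetical stand-in for the one produced by the course's classification template (which is not part of this transcript), and it assumes the e1071 package is already installed.

```r
# install.packages("e1071")  # run once if the package is not installed yet
library(e1071)

# Hypothetical stand-in for the template's training set: two already-scaled
# features and a 0/1 factor target named Purchased (assumed layout).
set.seed(123)
n <- 200
training_set <- data.frame(
  Age = rnorm(n),
  EstimatedSalary = rnorm(n)
)
training_set$Purchased <- factor(
  ifelse(training_set$Age + training_set$EstimatedSalary > 0, 1, 0),
  levels = c(0, 1)
)

# The four arguments named in the video: formula, data, type, kernel.
classifier <- svm(formula = Purchased ~ .,
                  data = training_set,
                  type = "C-classification",
                  kernel = "linear")

# Predict on the feature columns only (column 3 is the target).
y_pred <- predict(classifier, newdata = training_set[-3])

# Cross-tabulate actual against predicted classes.
print(table(training_set$Purchased, y_pred))
```

Swapping `kernel = "linear"` for another kernel is the only change needed for the more sophisticated variants mentioned in cue 90.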
100 00:05:14,600 --> 00:05:18,000 We have our data set here, our training set and our test set. 101 00:05:18,900 --> 00:05:23,900 So the data set contains 400 observations, which are information about users 102 00:05:23,900 --> 00:05:27,100 of a social network, including the age and the estimated salary. 103 00:05:27,100 --> 00:05:27,766 And this column, 104 00:05:27,766 --> 00:05:30,300 Purchased, here tells if, yes or no, 105 00:05:30,300 --> 00:05:35,000 the user bought a car when the user got the ad that 106 00:05:35,000 --> 00:05:39,300 the car company put on the social network for marketing campaign purposes. 107 00:05:39,600 --> 00:05:42,933 And so our classifier, as usual, is going to try to classify 108 00:05:43,300 --> 00:05:45,300 the users into the two categories: 109 00:05:45,300 --> 00:05:47,533 yes, they bought the car, and no, they didn't buy the car.