1 00:00:00,133 --> 00:00:01,633 Hello my friends, and welcome 2 00:00:01,633 --> 00:00:06,000 to this new practical activity on support vector machines. 3 00:00:06,533 --> 00:00:06,900 All right. 4 00:00:06,900 --> 00:00:11,466 So we already built two classification models: logistic regression and K-NN. 5 00:00:11,466 --> 00:00:14,466 And we got the best results so far with K-NN. 6 00:00:14,500 --> 00:00:17,500 And now let's see if SVM can beat it. 7 00:00:17,966 --> 00:00:18,266 All right. 8 00:00:18,266 --> 00:00:21,800 So before we start, as usual, let's make sure everyone here is on the same page. 9 00:00:22,033 --> 00:00:26,333 And if that's the case, then follow me into part three, classification, 10 00:00:26,466 --> 00:00:30,466 and then section 16, Support Vector Machine (SVM). 11 00:00:30,866 --> 00:00:33,466 And we're going to start with Python, of course, as usual. 12 00:00:33,466 --> 00:00:36,300 And in this Python folder you will get two files. 13 00:00:36,300 --> 00:00:40,666 The first is the same data set, Social_Network_Ads.csv, 14 00:00:40,900 --> 00:00:46,000 containing 400 observations, where each observation is actually a customer 15 00:00:46,166 --> 00:00:51,533 who bought, yes or no, an SUV that is advertised on social networks. 16 00:00:51,766 --> 00:00:55,566 And for each of these customers you have the age and the estimated salary. 17 00:00:55,566 --> 00:00:57,533 So these are the two features. 18 00:00:57,533 --> 00:01:00,966 And with these two features you will predict the dependent variable, 19 00:01:00,966 --> 00:01:05,366 Purchased, meaning whether or not the customer bought the SUV. 20 00:01:05,366 --> 00:01:09,633 So one means yes, the customer bought a previous SUV, and zero means no, 21 00:01:09,700 --> 00:01:12,166 the customer didn't buy any SUV. 22 00:01:12,166 --> 00:01:14,133 All right, so same data set. 23 00:01:14,133 --> 00:01:18,466 And of course the second file is this support vector machine implementation.
24 00:01:18,633 --> 00:01:20,133 And it is in the .ipynb format, 25 00:01:20,133 --> 00:01:23,733 which you can either open with Google Colaboratory or Jupyter Notebook. 26 00:01:24,000 --> 00:01:27,766 And as far as I'm concerned, I'm going to open it with Google Colaboratory. 27 00:01:27,766 --> 00:01:30,766 But feel free to choose your favorite IDE. 28 00:01:31,033 --> 00:01:31,533 All right. 29 00:01:31,533 --> 00:01:35,000 So let's put that file here. Actually here. 30 00:01:35,200 --> 00:01:37,900 And right now it is loading the notebook, laying it out. 31 00:01:37,900 --> 00:01:40,566 And in a second we should have it open. 32 00:01:40,566 --> 00:01:41,766 There we go. 33 00:01:41,766 --> 00:01:42,033 All right. 34 00:01:42,033 --> 00:01:45,100 So that's the whole support vector machine implementation. 35 00:01:45,100 --> 00:01:48,033 And of course it is exactly the same as before. 36 00:01:48,033 --> 00:01:51,766 In order to re-implement this, we will only have to change 37 00:01:51,900 --> 00:01:54,900 the code cell where we build and train this model. 38 00:01:55,100 --> 00:01:59,466 Because indeed this implementation results from the exact same classification 39 00:01:59,466 --> 00:02:03,366 template that we made when we built the logistic regression model. 40 00:02:03,666 --> 00:02:05,500 We saw clearly, when implementing 41 00:02:05,500 --> 00:02:09,266 the K-nearest neighbors model, how indeed we only had to change one cell 42 00:02:09,266 --> 00:02:12,733 and how this template worked super well for that model. 43 00:02:12,900 --> 00:02:16,100 So here for SVM we're going to do exactly the same. 44 00:02:16,233 --> 00:02:17,900 We're just going to leave all the cells 45 00:02:17,900 --> 00:02:21,166 as they are, as they actually were in the logistic regression model. 46 00:02:21,366 --> 00:02:25,400 And we will only re-implement the cell where we build the SVM. 47 00:02:25,933 --> 00:02:26,400 All right. 48 00:02:26,400 --> 00:02:27,300 So let's do this.
49 00:02:27,300 --> 00:02:29,966 Let's create a new copy of this file, 50 00:02:29,966 --> 00:02:32,000 because this file is in read-only mode. 51 00:02:32,000 --> 00:02:34,800 So let's click here: Save a copy in Drive. 52 00:02:34,800 --> 00:02:39,600 And this will create a copy inside which we will indeed be able to modify 53 00:02:39,600 --> 00:02:40,500 the implementation, 54 00:02:40,500 --> 00:02:44,533 and mostly to re-implement that code cell to build the SVM model. 55 00:02:45,033 --> 00:02:46,500 All right. Perfect. 56 00:02:46,500 --> 00:02:50,466 So at the beginning, of course, we start with the data preprocessing phase 57 00:02:50,466 --> 00:02:53,733 with all the same outputs displayed on the notebook. 58 00:02:53,966 --> 00:02:55,266 So that's all good. 59 00:02:55,266 --> 00:02:57,100 Then we apply feature scaling because, you know, 60 00:02:57,100 --> 00:02:59,100 it improves the training performance. 61 00:02:59,100 --> 00:03:02,066 And anyway it's never bad to apply feature scaling. 62 00:03:02,066 --> 00:03:04,100 And finally there we go. 63 00:03:04,100 --> 00:03:08,500 That's the cell we have to re-implement together. 64 00:03:08,500 --> 00:03:09,300 Because indeed 65 00:03:09,300 --> 00:03:12,833 it is the one that differs with respect to the previous implementations. 66 00:03:13,066 --> 00:03:17,000 So let's click this trash button here to, you know, re-implement it again. 67 00:03:17,000 --> 00:03:18,633 Let's create a new code cell. 68 00:03:18,633 --> 00:03:23,033 And now, my friends, over to you. Once again, I would like you to please 69 00:03:23,033 --> 00:03:26,233 press pause on the video and try to implement that code 70 00:03:26,233 --> 00:03:26,900 cell yourself. 71 00:03:26,900 --> 00:03:29,400 And that's because I not only want to train you 72 00:03:29,400 --> 00:03:32,566 in machine learning, but also train you on how to be independent 73 00:03:32,700 --> 00:03:33,900 with machine learning.
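The feature scaling step mentioned above can be sketched as follows. This is only a minimal illustration, not the notebook's actual cell: the small arrays are hypothetical stand-ins for the age and estimated salary columns, and the variable names (`sc`, `X_train`, `X_test`) are assumptions. The scaler is fitted on the training set only and then reused on the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the two features (age, estimated salary);
# the real values come from the data set used in the notebook.
X_train = np.array([[19, 19000], [35, 20000], [26, 43000], [47, 25000]], dtype=float)
X_test = np.array([[30, 87000], [50, 36000]], dtype=float)

# Fit the scaler on the training set only, then apply the same
# mean and standard deviation to the test set.
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)
```

After scaling, each training column has mean 0 and standard deviation 1, so age and salary end up on comparable scales.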
74 00:03:33,900 --> 00:03:38,133 So right now, the exercise I want you to do is to do some research 75 00:03:38,133 --> 00:03:41,166 in the scikit-learn API to figure out 76 00:03:41,233 --> 00:03:44,333 which class allows us to build the SVM model. 77 00:03:44,433 --> 00:03:46,933 So you will find it very easily actually, because 78 00:03:46,933 --> 00:03:50,400 there is no trap in the name of the class or the name of the module. 79 00:03:50,566 --> 00:03:51,400 So I trust 80 00:03:51,400 --> 00:03:56,100 you will totally be able to do this exercise successfully, and mostly know 81 00:03:56,100 --> 00:04:00,700 which method to use at the end to train that SVM model on the training set. 82 00:04:01,266 --> 00:04:02,766 All right, so please press pause. 83 00:04:02,766 --> 00:04:05,566 And now in two seconds I'm going to give you the solution. 84 00:04:07,733 --> 00:04:09,133 All right, let's do this. 85 00:04:09,133 --> 00:04:12,566 So I already have the scikit-learn API open. 86 00:04:12,600 --> 00:04:15,266 You know, that was for the nearest neighbors, 87 00:04:15,266 --> 00:04:18,066 the K-nearest neighbors, which we implemented previously. 88 00:04:18,066 --> 00:04:21,800 In the previous section we used this class, KNeighborsClassifier. 89 00:04:22,033 --> 00:04:26,466 And now the next thing we would like to find in this API documentation 90 00:04:26,700 --> 00:04:31,533 is the module that contains the class that allows us to build the SVM model. 91 00:04:31,900 --> 00:04:34,066 So naturally, where can we find it? 92 00:04:34,066 --> 00:04:37,333 You know, here, should we scroll back up or scroll down? 93 00:04:37,566 --> 00:04:41,333 Well, let's hope that, you know, the name of the module starts with an S, 94 00:04:41,333 --> 00:04:45,200 because here, you know, the modules are organized in alphabetical order.
95 00:04:45,200 --> 00:04:49,866 So since here we are at N for neighbors, let's hope that the name of the module 96 00:04:49,866 --> 00:04:53,400 we're looking for starts with an S, like support vector machine. 97 00:04:53,400 --> 00:04:57,900 So let's scroll down: random projection, semi-supervised learning. 98 00:04:57,900 --> 00:05:01,300 And there we go, support vector machines. 99 00:05:01,300 --> 00:05:02,033 Hello. 100 00:05:02,033 --> 00:05:04,200 That's exactly what we were looking for. 101 00:05:04,200 --> 00:05:06,633 Support vector machines. So that's not the name of the module. 102 00:05:06,633 --> 00:05:08,966 The name of the module is svm. It's svm, 103 00:05:08,966 --> 00:05:11,700 that stands for Support Vector Machines. 104 00:05:11,700 --> 00:05:12,133 All right. 105 00:05:12,133 --> 00:05:14,500 And then, well, you know, the hardest part is done. 106 00:05:14,500 --> 00:05:17,966 Now, according to you, which estimator is it? You know, because here you have 107 00:05:17,966 --> 00:05:22,333 basically all the support-vector-machine-based machine learning models. 108 00:05:22,566 --> 00:05:25,200 And so, according to you, which one do we need to take here? 109 00:05:25,200 --> 00:05:28,133 Well, we actually have two options. 110 00:05:28,133 --> 00:05:31,800 We could either take LinearSVC, which will directly 111 00:05:31,800 --> 00:05:34,800 build the linear support vector machine model, 112 00:05:34,800 --> 00:05:39,500 or we can take this one, SVC, and choose a linear kernel. 113 00:05:39,966 --> 00:05:40,600 All right. 114 00:05:40,600 --> 00:05:44,566 And we will actually go for this option, because in the next section 115 00:05:44,566 --> 00:05:48,600 we will study the kernel SVM models, which, as you might guess, 116 00:05:48,600 --> 00:05:53,233 allow us to choose some different kernels in our SVM, including the linear one 117 00:05:53,233 --> 00:05:57,833 and the nonlinear ones, like, for example, the very famous one, RBF.
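The two options described above can be sketched like this. It is a hedged illustration rather than the notebook's actual cell: the training data is a synthetic, well-separated toy set invented for the example, and the variable names (`classifier`, `linear_classifier`) and the `random_state=0` setting are assumptions. Both `SVC` and `LinearSVC` live in `sklearn.svm`, and both are trained with the `fit` method.

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC

# Synthetic toy data (an assumption for this sketch): two well-separated
# clusters standing in for the scaled training set.
rng = np.random.RandomState(0)
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(20, 2))
X_pos = rng.normal(loc=2.0, scale=0.5, size=(20, 2))
X_train = np.vstack([X_neg, X_pos])
y_train = np.array([0] * 20 + [1] * 20)

# Option chosen in the lecture: SVC with a linear kernel.
classifier = SVC(kernel="linear", random_state=0)
classifier.fit(X_train, y_train)

# The other option: LinearSVC, which builds a linear SVM directly.
linear_classifier = LinearSVC(random_state=0)
linear_classifier.fit(X_train, y_train)

# Predict the class of one point near each cluster center.
predictions = classifier.predict(np.array([[-2.0, -2.0], [2.0, 2.0]]))
```

Going through `SVC` keeps the `kernel` argument available, so switching later to a nonlinear kernel such as `"rbf"` only means changing that one argument. Note that the two estimators use different solvers and default losses, so they are not guaranteed to produce identical models.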