1 00:00:00,366 --> 00:00:01,833 All right, so there we go. 2 00:00:01,833 --> 00:00:03,066 Let's go inside. 3 00:00:03,066 --> 00:00:06,533 And here is the name of the module, SVM, of course. 4 00:00:06,533 --> 00:00:12,000 And the class that we must now import in order to build our SVM model. 5 00:00:12,466 --> 00:00:14,366 All right. So let's start with this as usual. 6 00:00:14,366 --> 00:00:20,100 Let's get, you know, this path and let's adapt it inside our code cell. 7 00:00:20,200 --> 00:00:21,166 You know how to do this. 8 00:00:21,166 --> 00:00:25,266 We need to start with from, so from the scikit-learn library. 9 00:00:25,266 --> 00:00:28,266 And then the SVM module of the scikit-learn library, 10 00:00:28,366 --> 00:00:33,200 we will import the SVC class. Okay. 11 00:00:33,566 --> 00:00:34,633 So that's the first step. 12 00:00:34,633 --> 00:00:37,333 And then the next natural step, you know it by heart. 13 00:00:37,333 --> 00:00:40,100 It is of course to create an object of this class. 14 00:00:40,100 --> 00:00:43,100 And we need to keep here the name classifier, 15 00:00:43,266 --> 00:00:47,400 in order to not have anything to change afterwards, the classifier 16 00:00:47,400 --> 00:00:50,533 which will be nothing else than the SVM model itself. 17 00:00:50,900 --> 00:00:54,033 And so to create such an object we need to call this class. 18 00:00:54,033 --> 00:00:57,900 Well, I can just type it as SVC and then add some parentheses. 19 00:00:58,200 --> 00:01:02,866 And now the question is, what do we need to input here as parameters? 20 00:01:03,300 --> 00:01:05,800 Well, I hope you have the reflex to think: 21 00:01:05,800 --> 00:01:10,266 of course we need to input the kernel, because as I've just explained, this 22 00:01:10,466 --> 00:01:15,000 SVC class allows us to build kernel SVM models with either a linear kernel, 23 00:01:15,000 --> 00:01:18,400 which is the classic SVM model, or a nonlinear kernel. 
24 00:01:18,400 --> 00:01:21,400 So here in our parameters we'll have to specify that we want 25 00:01:21,533 --> 00:01:25,300 a linear kernel, because we're starting with the classic SVM model. 26 00:01:25,300 --> 00:01:26,733 And then in the next section, you know, 27 00:01:26,733 --> 00:01:30,900 on kernel SVM, we'll choose a nonlinear kernel like RBF or polynomial. 28 00:01:30,966 --> 00:01:31,800 We'll see. 29 00:01:31,800 --> 00:01:34,200 But there you go. That's our first argument. 30 00:01:34,200 --> 00:01:37,533 And when we have a look at the documentation, we can see 31 00:01:37,533 --> 00:01:40,733 that indeed the second parameter here is kernel. 32 00:01:40,966 --> 00:01:42,300 And the default value is RBF. 33 00:01:42,300 --> 00:01:45,800 So we'll have to specify that we want linear to start with. 34 00:01:46,300 --> 00:01:50,433 And then, as you can see, you have many, many other arguments, you know, parameters. 35 00:01:50,700 --> 00:01:53,366 And you can check out the description for each of them. 36 00:01:53,366 --> 00:01:56,666 But no worries, we won't change their default value. 37 00:01:56,666 --> 00:01:58,033 We'll just keep their default value. 38 00:01:58,033 --> 00:02:00,133 Like, for example, the regularization parameter, 39 00:02:00,133 --> 00:02:02,133 for which we'll keep the default value of one. 40 00:02:02,133 --> 00:02:02,666 That's fine. 41 00:02:02,666 --> 00:02:05,100 You'll see in the visualization that there is not much 42 00:02:05,100 --> 00:02:08,100 we can do in order to improve the model, or to avoid overfitting. 43 00:02:08,366 --> 00:02:09,300 So there we go. 44 00:02:09,300 --> 00:02:15,466 The only parameter that we'll take is this one: kernel equals not RBF but linear. 45 00:02:15,633 --> 00:02:17,366 All right. So let's do this. 46 00:02:17,366 --> 00:02:20,333 Kernel equals, in quotes, 47 00:02:21,400 --> 00:02:22,666 linear. 48 00:02:22,666 --> 00:02:23,566 All right. 
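As a rough sketch of the step just described, the snippet below imports the SVC class and compares an object built with the default arguments against one built with kernel set to linear. The name default_model is only an illustration and is not part of the lesson's notebook; classifier is the name the lesson keeps.

```python
from sklearn.svm import SVC

# Hypothetical comparison object: SVC with no arguments uses the RBF kernel.
default_model = SVC()

# The object the lesson builds: only the kernel is changed from its default.
classifier = SVC(kernel='linear')

print(default_model.kernel)  # default kernel
print(classifier.kernel)     # kernel chosen for the classic SVM
print(classifier.C)          # regularization parameter left at its default
```

Every other parameter, including the regularization parameter C, keeps its default value here.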
49 00:02:23,566 --> 00:02:27,800 And then I know I've just said that we won't have to input any other parameters. 50 00:02:27,800 --> 00:02:29,933 But let's add anyway, you know, this one. 51 00:02:29,933 --> 00:02:33,900 The random state parameter, and set it equal to zero. 52 00:02:34,033 --> 00:02:38,433 Just to make sure that we get the same result displayed on our notebook. 53 00:02:38,433 --> 00:02:42,200 You know, because there are some random factors when we build this SVM model. 54 00:02:42,200 --> 00:02:44,133 And therefore, for teaching purposes, it's 55 00:02:44,133 --> 00:02:47,900 better if we all have the same results displayed in our notebook. 56 00:02:48,066 --> 00:02:48,800 All right. 57 00:02:48,800 --> 00:02:50,666 And that's it. Congratulations. 58 00:02:50,666 --> 00:02:54,966 That builds the SVM model, the classic one with a linear kernel. 59 00:02:55,233 --> 00:02:57,666 And now you know how to finish this. 60 00:02:57,666 --> 00:03:01,166 Of course, our last step here is to take our classifier, 61 00:03:01,500 --> 00:03:04,500 from which we're going to call the fit method 62 00:03:04,766 --> 00:03:09,000 to train our SVM classifier on the training set. 63 00:03:09,000 --> 00:03:11,366 And we have to input that here in two parts. 64 00:03:11,366 --> 00:03:15,600 First, the matrix of features of the training set, which is X train, 65 00:03:15,800 --> 00:03:20,166 and then the dependent variable vector of the training set, which is y train. 66 00:03:20,700 --> 00:03:22,333 All right. So you know this perfectly well. 67 00:03:22,333 --> 00:03:24,966 Now it's like your native language, right? 68 00:03:24,966 --> 00:03:26,133 Okay, I hope I'm right. 69 00:03:26,133 --> 00:03:27,933 But there you go. Congratulations. 70 00:03:27,933 --> 00:03:29,733 This implementation is now over, 71 00:03:29,733 --> 00:03:33,033 thanks to this very efficient code template. 
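Put together, the import, the object creation and the call to fit might look like the sketch below. The training data is a small synthetic stand-in generated with NumPy, used only so the snippet runs on its own; in the lesson, X_train and y_train come from the notebook's earlier cells.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption, not the lesson's data set):
# two well-separated clusters with two features each.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(-1.0, 0.5, size=(50, 2)),  # class 0
    rng.normal(1.0, 0.5, size=(50, 2)),   # class 1
])
y_train = np.array([0] * 50 + [1] * 50)

# Linear-kernel SVM. Per the scikit-learn documentation, random_state only
# affects the probability estimates (probability=True); it is kept here
# to mirror the lesson.
classifier = SVC(kernel='linear', random_state=0)

# Train on the matrix of features and the dependent variable vector.
classifier.fit(X_train, y_train)

print(classifier.predict([[-1.0, -1.0], [1.0, 1.0]]))
```

Because the object keeps the name classifier, the later cells of the template that predict and evaluate can stay unchanged.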
72 00:03:33,333 --> 00:03:38,500 And so now, well, we'll just run everything and observe the final results in the end. 73 00:03:38,800 --> 00:03:39,166 All right. 74 00:03:39,166 --> 00:03:39,900 So let's do this. 75 00:03:39,900 --> 00:03:44,033 Let's not forget to, you know, import the data set here. 76 00:03:44,033 --> 00:03:45,800 You know, upload it in the notebook. 77 00:03:45,800 --> 00:03:47,966 So click this folder here. 78 00:03:47,966 --> 00:03:51,000 Then you will have to wait a few seconds because your notebook 79 00:03:51,300 --> 00:03:55,666 will be connecting to a runtime to enable file browsing. 80 00:03:56,166 --> 00:04:00,133 And in a second we should see, there we go, the upload button. 81 00:04:00,400 --> 00:04:02,033 So we're going to click that. 82 00:04:02,033 --> 00:04:03,600 And well, you know that's the data set. 83 00:04:03,600 --> 00:04:05,833 But I'll show you the whole path again. 84 00:04:05,833 --> 00:04:07,466 So you find your Machine Learning 85 00:04:07,466 --> 00:04:08,466 A-Z folder, Codes 86 00:04:08,466 --> 00:04:11,566 and Datasets, which you could download in the previous tutorial in the article. 87 00:04:11,833 --> 00:04:15,633 Then you're going to go to Part 3 Classification, then to Section 88 00:04:15,633 --> 00:04:19,633 16 Support Vector Machine, then Python, and Social Network 89 00:04:19,633 --> 00:04:24,966 Ads dot csv. You click open and this will upload the data set inside the notebook. 90 00:04:24,966 --> 00:04:25,633 There we go. 91 00:04:25,633 --> 00:04:29,666 And now you can run all the cells by clicking runtime here. 92 00:04:29,666 --> 00:04:32,433 And then, are you ready? Run, 93 00:04:32,433 --> 00:04:36,766 and this will run all the cells, and perfect. 94 00:04:36,800 --> 00:04:41,166 Now we have our SVM model with a linear kernel, with all the default 95 00:04:41,166 --> 00:04:44,533 values of the parameters here except this one, the linear kernel. 96 00:04:45,000 --> 00:04:45,733 And so there we go. 
97 00:04:45,733 --> 00:04:48,300 Then we have the rest of the cells. 98 00:04:48,300 --> 00:04:51,233 When we predict the new result, indeed we get the right prediction. 99 00:04:51,233 --> 00:04:54,633 Remember, it's zero for that particular first customer of the test 100 00:04:54,633 --> 00:04:58,500 set, of age 30 and estimated salary $87,000. 101 00:04:58,800 --> 00:05:02,966 Indeed, it is predicted not to buy the SUV, as is the case in reality. 102 00:05:03,166 --> 00:05:06,666 Okay, then when we predict the test results, once again 103 00:05:06,666 --> 00:05:09,900 we see that we have many correct predictions, except some of them. 104 00:05:09,900 --> 00:05:11,766 This one, this one. 105 00:05:11,766 --> 00:05:14,533 And, you know, this one. 106 00:05:14,533 --> 00:05:15,933 Okay, but it looks very good. 107 00:05:15,933 --> 00:05:19,533 However, what we're mostly interested in, and we're about to get it right now, 108 00:05:19,800 --> 00:05:23,366 is to see the accuracy on the test set, you know, the number 109 00:05:23,366 --> 00:05:26,366 of correct predictions or, if you want, the number of incorrect predictions. 110 00:05:26,666 --> 00:05:28,900 Are you ready? We're about to get it right now. 111 00:05:28,900 --> 00:05:33,133 Right, that code cell prints the confusion matrix and displays the accuracy. 112 00:05:33,133 --> 00:05:35,166 And wow, okay, interesting. 113 00:05:35,166 --> 00:05:38,766 So it actually beat the logistic regression model, right? 114 00:05:38,766 --> 00:05:42,600 Remember, the logistic regression model had an accuracy of 89%. 115 00:05:43,000 --> 00:05:46,366 And here the SVM slightly beat it, by 1%. 116 00:05:46,566 --> 00:05:49,433 However, it doesn't beat the K-NN model, 117 00:05:49,433 --> 00:05:53,133 which, remember, had an accuracy of 93%.
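For reference, a confusion matrix and an accuracy score can be computed as in the minimal sketch below. The two label lists are made-up placeholders chosen to keep the arithmetic easy to check; they are not the lesson's test set, and the numbers do not correspond to the accuracies quoted above.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical true labels and predictions (placeholders, not real results).
y_test = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Accuracy is the share of correct predictions: the diagonal of the
# confusion matrix divided by the total number of observations.
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
```

In this toy case 8 of the 10 predictions are correct, so the accuracy is 0.8, and the two off-diagonal entries of the matrix count the incorrect predictions.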