Hello my friends, and welcome to this new practical activity, this time on support vector regression. I have to start by telling you that the model we're about to implement will be slightly more advanced than the models we built before, but it's totally fine. We will do it together and we will succeed 100%.

So why is it slightly more advanced? Well, that's because we will have to play a lot with feature scaling. And I can tell you that after this implementation of the SVR model, you will be a master of feature scaling, because you will know not only how to apply the feature scaling transformation, but also how to apply the inverse transformation, you know, to go back to the original scale. So you will know the whole set of feature scaling tools, and you will handle them like a pro. All right.

So if you're ready, let's start. And just before we go inside the folder, let's make sure that we are all on the same page. I gave you the link to this folder containing all the code and data sets in an article right before this tutorial, so make sure to connect to it. Now we should all be on the same page. Therefore we're going to go, of course, to Part 2, Regression.
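To make the forward and inverse transformations concrete, here is a minimal, dependency-free sketch of standardization, z = (x - mean) / std, and its inverse, x = z * std + mean. It assumes standardization (the transformation scikit-learn's `StandardScaler` applies) rather than min-max normalization. The helper names `fit_standardizer`, `standardize` and `unstandardize` are hypothetical, and the salary values are illustrative placeholders within the range quoted in the video.

```python
from statistics import mean, pstdev

def fit_standardizer(values):
    """Learn the mean and population standard deviation (ddof=0) of `values`."""
    return mean(values), pstdev(values)

def standardize(values, mu, sigma):
    """Forward transform: z = (x - mu) / sigma."""
    return [(v - mu) / sigma for v in values]

def unstandardize(scaled, mu, sigma):
    """Inverse transform: x = z * sigma + mu, back to the original units."""
    return [z * sigma + mu for z in scaled]

# Hypothetical salaries (USD per year) spanning the range mentioned in the video.
salaries = [45_000, 80_000, 150_000, 500_000, 1_000_000]
mu, sigma = fit_standardizer(salaries)
scaled = standardize(salaries, mu, sigma)        # mean 0, standard deviation 1
restored = unstandardize(scaled, mu, sigma)      # back to dollars
print([round(z, 3) for z in scaled])
print([round(x) for x in restored])
```

The mean and standard deviation are learned once and then reused for the inverse. This is why, in practice, you keep the fitted scaler object around after training.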
And then, of course, to Support Vector Regression (SVR). All right. As usual, we're going to start with Python. So we're going to go to the Python folder, inside which we'll find the same data set as before. I would like to compare the performance of different regression models on this data set, which proved to have some non-linear relationships between the feature, which is the position level going from 1 to 10, and the salary, going from $45,000 to $1 million per year.

So here again, this is exactly the same scenario. This time we want to train a support vector regression model to learn and understand the correlations between these position levels and these salaries. To quickly recall the context: we are hiring a new person who is expecting a 160K salary, and who justifies this by the fact that they earned a 160K salary at their previous company. This data is exactly the data of that previous company, with the different position levels from Business Analyst to CEO and their corresponding salaries.
And so not only do we want to train a model to learn these relationships, but we also want to deploy this model to predict the salary that this person had at the previous company. We know that this person was a Region Manager for a couple of years and is therefore considered to have a position level of 6.5. So that's the exact same scenario and the exact same data set. Now let's build the support vector regression model on this data set to see if it performs better than the previous model, which showed great results: the polynomial regression model. All right, so let's do this.

Let's close this and open our support vector regression implementation, either with Google Colaboratory if you love it, or with Jupyter Notebook, because it is indeed an .ipynb file. All right, so that's the whole implementation. Let me quickly show you its structure. We're going to start by importing the libraries as usual, then importing the data set, then applying feature scaling. This part is interesting. This time we have to apply feature scaling, because in the SVR model there is no explicit equation of the dependent variable with respect to the features.
Above all, there are none of those coefficients multiplying each of the features. In a linear model, those coefficients compensate for features taking high values by taking lower values themselves. Not here. This time the support vector regression model has an implicit relationship between the dependent variable and the features. So we don't have such coefficients, and we definitely have to apply feature scaling for this model.

So you see, you're starting to understand when and when not to apply feature scaling. We don't have to apply feature scaling for linear regression models, where you have those coefficients that can compensate for high feature values. These of course include simple linear regression, multiple linear regression and polynomial regression, and you will see many other models later on in the course. For other models, which usually have an implicit equation, you know, an implicit relationship between the dependent variable y and the features X, we usually do have to apply feature scaling. All right. Then we will of course train the SVR model on the whole data set. You know, this time it is the same as before.
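Here is a small, dependency-free illustration of the argument above, using hypothetical data. In a least-squares line, the slope simply shrinks when the feature is rescaled, so predictions do not change. A Gaussian (RBF) kernel, the kind SVR commonly uses, has no such coefficient and reacts to the raw distances. This is a simplification. scikit-learn's `SVR` defaults to `gamma='scale'`, which adapts to the variance of X. However, its epsilon tube (default `epsilon=0.1`) is still measured in the units of y, which is one reason the target is scaled as well. The helpers `ols_fit` and `rbf` are hypothetical names for this sketch.

```python
import math

def ols_fit(xs, ys):
    """Closed-form simple linear regression; returns (intercept b0, slope b1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel similarity: exp(-gamma * (a - b)**2)."""
    return math.exp(-gamma * (a - b) ** 2)

levels = [1, 2, 3, 4, 5]                  # position levels
ys = [45.0, 50.0, 60.0, 80.0, 110.0]      # hypothetical salaries in $k
levels_big = [x * 1000 for x in levels]   # same feature on a 1000x larger scale

# Linear model: the slope absorbs the rescaling, so predictions are unchanged.
b0, b1 = ols_fit(levels, ys)
b0_big, b1_big = ols_fit(levels_big, ys)
pred_small = b0 + b1 * 3.5
pred_big = b0_big + b1_big * 3500
print(b1, b1_big)            # the slope shrinks by a factor of 1000
print(pred_small, pred_big)  # same prediction either way

# RBF kernel with a fixed gamma: no coefficient compensates, so the
# similarity between neighbouring levels collapses to 0 on the larger scale.
print(rbf(3, 4), rbf(3000, 4000))
```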
We won't split the whole data set into a training set and a test set, because we want to leverage the maximum amount of data to learn the correlations between the position levels and the salaries. So we won't do that split, and we will directly train the SVR model on the whole data set. After this training, we'll have a smart SVR model, which we're going to use to predict the new result: the salary for the position level of 6.5. And we will compare that prediction with the prediction of the polynomial regression model, which I've kept here. Then, of course, we will visualize the SVR results in low resolution and high resolution to see the regression curve of the SVR model.

All right. I'm sure you noticed that I did not expand any of these sections. Well, I did that on purpose, because I don't want to reveal the prediction, you know, the salary predicted by the SVR model. I want us to save the surprise until we execute that cell to return the predicted salary.
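The steps listed above (scale X and y, train on the whole data set, predict level 6.5, invert the scaling, build a high-resolution grid) can be sketched as follows. This is not the course notebook. It assumes scikit-learn's `StandardScaler` and `SVR` with an RBF kernel. It also assumes the salary values below match the course data set, with only the 1-to-10 levels and the $45,000 to $1 million range confirmed by the video, so replace them with the values from your own CSV file. No expected prediction is stated here, to keep the surprise.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Position levels 1..10 and assumed salaries (replace with your own data).
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = np.array([45_000, 50_000, 60_000, 80_000, 110_000,
              150_000, 200_000, 300_000, 500_000, 1_000_000], dtype=float)

# One scaler per variable; StandardScaler expects 2D input, so y becomes a column.
sc_X = StandardScaler()
sc_y = StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1))

# Train on the whole data set (no train/test split); SVR.fit expects a 1D target.
regressor = SVR(kernel="rbf")
regressor.fit(X_scaled, y_scaled.ravel())

# Scale the new input with the SAME fitted scaler, predict, then invert the y scaling.
level = np.array([[6.5]])
pred_scaled = regressor.predict(sc_X.transform(level))
pred_salary = sc_y.inverse_transform(pred_scaled.reshape(-1, 1))[0, 0]
print(f"Predicted salary for level 6.5: {pred_salary:,.0f}")

# High-resolution grid for a smooth regression curve, in original salary units.
X_grid = np.arange(X.min(), X.max(), 0.1).reshape(-1, 1)
curve = sc_y.inverse_transform(
    regressor.predict(sc_X.transform(X_grid)).reshape(-1, 1)
)
```

Note that `sc_X.transform` (not `fit_transform`) is used on new inputs, so they are scaled with the statistics learned from the training data. `curve` can then be plotted against `X_grid` with matplotlib.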