1 00:00:00,066 --> 00:00:00,733 Okay. 2 00:00:00,733 --> 00:00:04,866 Actually, this is all we had to change in this regression template. 3 00:00:05,166 --> 00:00:08,100 And now we are ready to execute the sections one by one 4 00:00:08,100 --> 00:00:11,133 to create our model and find out about the final results 5 00:00:11,133 --> 00:00:14,133 and the final verdict, whether it's truth or bluff. 6 00:00:14,200 --> 00:00:16,500 So let's do it. 7 00:00:16,500 --> 00:00:19,500 I'm going to import the dataset first. 8 00:00:19,500 --> 00:00:20,433 All right, here we go. 9 00:00:20,433 --> 00:00:22,200 Well imported. We can have a look. 10 00:00:22,200 --> 00:00:25,366 So this is our dataset with the levels as the independent variable 11 00:00:25,366 --> 00:00:28,366 and the salaries as the dependent variable. 12 00:00:28,666 --> 00:00:29,700 Okay, great. 13 00:00:29,700 --> 00:00:33,766 And now, no need to split the dataset into the training set and the test set. 14 00:00:33,800 --> 00:00:34,800 No need for feature scaling 15 00:00:34,800 --> 00:00:38,700 because we're using the very popular e1071 library that includes it. 16 00:00:39,233 --> 00:00:41,400 And now let's create our model. 17 00:00:41,400 --> 00:00:44,233 So I'm going to select this whole section here. 18 00:00:44,233 --> 00:00:48,600 Press Command and Control plus Enter to execute. Done, regressor created. 19 00:00:48,900 --> 00:00:49,800 Great. 20 00:00:49,800 --> 00:00:52,666 And now let's predict the new result. Or which would you like to start with, 21 00:00:52,666 --> 00:00:56,500 visualizing the SVR results first or predicting the new result? 22 00:00:56,500 --> 00:01:00,833 Well, let's actually predict the result because this is the most exciting step. 23 00:01:01,000 --> 00:01:02,833 And let's keep the best for the end. 24 00:01:02,833 --> 00:01:04,533 But this step is actually very exciting. 
25 00:01:04,533 --> 00:01:05,700 Also, because 26 00:01:05,700 --> 00:01:09,300 we are getting the final prediction, we are getting the final predicted salary. 27 00:01:09,300 --> 00:01:12,600 So, you know, it's as you want. If you're actually doing this code 28 00:01:12,600 --> 00:01:15,633 by yourself before me, that's actually a very good exercise. 29 00:01:15,633 --> 00:01:17,533 So I really encourage you to do this. 30 00:01:17,533 --> 00:01:20,300 Choose whatever you want to execute first. 31 00:01:20,300 --> 00:01:21,300 It's ready anyway. 32 00:01:21,300 --> 00:01:23,300 You don't need to change anything more now. 33 00:01:23,300 --> 00:01:26,100 So I'm going to execute that right now. 34 00:01:26,100 --> 00:01:29,100 And actually we obtain a predicted salary 35 00:01:29,266 --> 00:01:31,933 of $177,000. 36 00:01:31,933 --> 00:01:33,100 Great. 37 00:01:33,100 --> 00:01:36,600 So remember, this employee, with whom we are negotiating the future salary right 38 00:01:36,600 --> 00:01:39,733 now, said that their previous salary was 160K. 39 00:01:40,033 --> 00:01:45,166 And our model predicted that their previous salary was 177K. 40 00:01:45,500 --> 00:01:49,200 So first of all, that's actually close to what this employee said. 41 00:01:49,200 --> 00:01:52,133 And besides, it's on the good side of the negotiation. 42 00:01:52,133 --> 00:01:53,600 So that's actually pretty good. 43 00:01:53,600 --> 00:01:57,333 And we can be satisfied with this result and our model. 44 00:01:57,333 --> 00:02:01,866 But to be really satisfied, let's see what's happening with the graphic result. 45 00:02:01,866 --> 00:02:04,866 So I'm going to execute this section 46 00:02:04,866 --> 00:02:07,733 and let's have a look at the SVR model. 47 00:02:09,600 --> 00:02:10,933 And here it is. 48 00:02:10,933 --> 00:02:14,233 I'm going to zoom in on it. Better, okay. 49 00:02:14,700 --> 00:02:18,900 So first of all, this model fits most of the data points very well. 
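For readers following along without the video, the steps executed so far (import, create the regressor, predict level 6.5) might look roughly like the sketch below. It assumes the R package e1071 and its `svm()` function with `type = "eps-regression"`. The column names `Level` and `Salary` and the salary values are assumptions standing in for the dataset shown on screen, which the transcript does not list, so the printed number is not guaranteed to match the figure quoted in the video.

```r
# Assumes the e1071 package is installed: install.packages("e1071")
library(e1071)

# ASSUMPTION: stand-in values for the levels/salaries dataset shown in the
# video; replace with the real file, e.g. read.csv("<your dataset>.csv").
dataset <- data.frame(
  Level  = 1:10,
  Salary = c(45000, 50000, 60000, 80000, 110000,
             150000, 200000, 300000, 500000, 1000000)
)

# Fit an epsilon-regression SVR. svm() scales the variables by default
# (scale = TRUE), which is why no separate feature scaling step is run.
regressor <- svm(formula = Salary ~ Level,
                 data = dataset,
                 type = "eps-regression")

# Predict the salary for the employee at level 6.5.
y_pred <- predict(regressor, newdata = data.frame(Level = 6.5))
print(y_pred)
```

No train/test split appears here because the walkthrough fits the model on the whole dataset.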
50 00:02:19,266 --> 00:02:22,266 As a reminder, the real observation points are the red points here. 51 00:02:22,500 --> 00:02:26,966 And all the points on this blue curve here, which is the SVR model itself, 52 00:02:27,233 --> 00:02:28,466 are the prediction points. 53 00:02:28,466 --> 00:02:32,100 So for example, if we take this red observation point here, 54 00:02:32,366 --> 00:02:35,366 well, the prediction is perfect because the prediction point 55 00:02:35,366 --> 00:02:38,333 is the projection of this red point on the blue curve. 56 00:02:38,333 --> 00:02:41,000 So it's actually the red point itself because the red point 57 00:02:41,000 --> 00:02:42,466 is on the blue curve. 58 00:02:42,466 --> 00:02:43,866 So that makes a perfect prediction. 59 00:02:43,866 --> 00:02:45,133 But here, for example, 60 00:02:45,133 --> 00:02:48,133 if we take a less accurate prediction but still a very good one, 61 00:02:48,200 --> 00:02:50,100 then we take the real observation point here, 62 00:02:50,100 --> 00:02:53,100 the red point, we project it on the blue curve, 63 00:02:53,166 --> 00:02:57,000 and that's the difference between the real salary 64 00:02:57,000 --> 00:02:58,866 and the predicted salary, which is here 65 00:02:58,866 --> 00:03:02,533 if we project it back on the y axis, which contains the salaries. 66 00:03:02,700 --> 00:03:07,100 But you can see that for all of these points, from this one to actually this one, 67 00:03:07,433 --> 00:03:10,966 the blue curve is actually getting very close to the real observation 68 00:03:10,966 --> 00:03:12,533 points, the red points. 69 00:03:12,533 --> 00:03:15,533 And so the predictions are very close to the real results. 70 00:03:15,866 --> 00:03:21,100 But that's for all these points of the dataset except for this one here. 71 00:03:21,100 --> 00:03:25,200 This one is left alone, which by the way corresponds to the CEO. 
72 00:03:25,200 --> 00:03:26,733 So I'm sorry for the CEO, 73 00:03:26,733 --> 00:03:31,500 but the reason for what is happening here is that this is actually an outlier. 74 00:03:31,600 --> 00:03:33,266 Not to call the CEO a liar. 75 00:03:33,266 --> 00:03:36,266 But this is an outlier because, as you can see, 76 00:03:36,466 --> 00:03:40,766 this point is actually far from the other points in terms of salaries. 77 00:03:41,100 --> 00:03:45,500 The CEO has a much higher salary than the previous positions. 78 00:03:45,500 --> 00:03:47,766 So for the SVR model, that's an outlier. 79 00:03:47,766 --> 00:03:49,800 And therefore it actually didn't consider it. 80 00:03:49,800 --> 00:03:53,100 It's like it excluded this point from the model by not looking at it 81 00:03:53,400 --> 00:03:56,733 and making its predictions on these points here. 82 00:03:57,266 --> 00:04:00,466 So that's specifically due to the SVR algorithm itself. 83 00:04:00,466 --> 00:04:02,466 But there are a lot of parameters. 84 00:04:02,466 --> 00:04:05,533 And you can actually play with the parameters to change 85 00:04:05,533 --> 00:04:08,833 the way the SVR model perceives outliers. 86 00:04:09,000 --> 00:04:10,033 So for example, 87 00:04:10,033 --> 00:04:13,900 these parameters are the penalty parameters, the regularization parameters. 88 00:04:14,233 --> 00:04:16,766 They're most of the time well described in the description. 89 00:04:16,766 --> 00:04:20,300 And of course the e1071 library includes such techniques. 90 00:04:21,000 --> 00:04:25,533 But we're not going to do it here in this tutorial because we actually don't need 91 00:04:25,533 --> 00:04:29,800 to get a good prediction of the CEO's salary. And remember, what we actually need 92 00:04:29,800 --> 00:04:30,633 is a good prediction 93 00:04:30,633 --> 00:04:33,633 of the previous salary of this employee we are negotiating with. 94 00:04:33,833 --> 00:04:36,300 And this employee has a level of 6.5.
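The tutorial skips the parameter tuning it mentions, so here is a minimal sketch of what "playing with the parameters" could look like, using the `cost` and `epsilon` arguments of `svm()` in e1071. The dataset values, the column names, and the particular settings `cost = 100` and `epsilon = 0.01` are illustrative assumptions, not values from the video. A larger `cost` penalizes errors outside the epsilon tube more heavily, which typically pulls the curve closer to extreme points such as the top level.

```r
# Assumes the e1071 package is installed: install.packages("e1071")
library(e1071)

# ASSUMPTION: stand-in values for the levels/salaries dataset in the video.
dataset <- data.frame(
  Level  = 1:10,
  Salary = c(45000, 50000, 60000, 80000, 110000,
             150000, 200000, 300000, 500000, 1000000)
)

# Default settings (cost = 1, epsilon = 0.1), as in the walkthrough.
default_fit <- svm(Salary ~ Level, data = dataset,
                   type = "eps-regression")

# Illustrative alternative: heavier penalty and a narrower epsilon tube.
# These two values are arbitrary choices for demonstration only.
strict_fit <- svm(Salary ~ Level, data = dataset,
                  type = "eps-regression",
                  cost = 100, epsilon = 0.01)

# Compare the two models at the top level (the CEO) and at level 6.5.
new_levels <- data.frame(Level = c(6.5, 10))
comparison <- data.frame(
  Level   = new_levels$Level,
  Default = as.numeric(predict(default_fit, new_levels)),
  Strict  = as.numeric(predict(strict_fit, new_levels))
)
print(comparison)
```

Whether a tighter fit to the top level is desirable depends on the goal. For the negotiation in this tutorial, only the region around level 6.5 matters.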
95 00:04:36,300 --> 00:04:39,533 And around this point we can see that the SVR model fits very well 96 00:04:39,533 --> 00:04:40,633 to our dataset. 97 00:04:40,633 --> 00:04:45,366 And we actually get a prediction of 177K, quite close to the real 98 00:04:45,366 --> 00:04:49,533 or the mentioned salary of this employee, which is 160K. 99 00:04:49,900 --> 00:04:51,100 So that's actually pretty good. 100 00:04:51,100 --> 00:04:54,233 And therefore the verdict of truth 101 00:04:54,233 --> 00:04:57,866 or bluff according to our SVR model is rather truth.
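As a recap of the graphic result discussed in this video, the sketch below draws the red observation points and the blue SVR curve, then computes the difference between each real salary and its predicted salary (the vertical gap described when projecting a red point onto the curve). It uses base R graphics rather than whatever plotting code the course template contains, and the dataset values and column names are assumptions standing in for the data shown on screen.

```r
# Assumes the e1071 package is installed: install.packages("e1071")
library(e1071)

# ASSUMPTION: stand-in values for the levels/salaries dataset in the video.
dataset <- data.frame(
  Level  = 1:10,
  Salary = c(45000, 50000, 60000, 80000, 110000,
             150000, 200000, 300000, 500000, 1000000)
)

regressor <- svm(Salary ~ Level, data = dataset, type = "eps-regression")

# Red points: real observations.
plot(dataset$Level, dataset$Salary,
     col = "red", pch = 19,
     xlab = "Level", ylab = "Salary",
     main = "Truth or Bluff (SVR)")

# Blue curve: model predictions on a fine grid of levels.
grid <- data.frame(Level = seq(1, 10, by = 0.1))
lines(grid$Level, predict(regressor, grid), col = "blue")

# Real salary minus predicted salary at each observed level.
predicted <- as.numeric(predict(regressor, dataset["Level"]))
gaps <- data.frame(
  Level     = dataset$Level,
  Real      = dataset$Salary,
  Predicted = predicted,
  Gap       = dataset$Salary - predicted
)
print(gaps)
```

If the outlier reading in the video holds for your data, the largest absolute `Gap` should appear at the top level, while the rows near level 6.5 should show comparatively small gaps.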