1 00:00:00,200 --> 00:00:00,800 All right. 2 00:00:00,800 --> 00:00:02,866 Okay. So now I'm going to tell you the solution. 3 00:00:02,866 --> 00:00:06,700 Well, first of course we have to change our regressor here 4 00:00:06,700 --> 00:00:09,733 because we want to use our polynomial regression model, 5 00:00:09,733 --> 00:00:12,900 which is based on lin_reg_2. 6 00:00:13,200 --> 00:00:14,466 So that's the first change. 7 00:00:14,466 --> 00:00:16,400 lin_reg_2. Yes. 8 00:00:16,400 --> 00:00:18,533 And now in the predict method, 9 00:00:18,533 --> 00:00:22,500 do you think we need to keep X or replace it with something else? 10 00:00:23,400 --> 00:00:26,833 Well, of course we can't keep X, because remember, X is 11 00:00:26,833 --> 00:00:30,566 only that matrix of a single feature containing the position levels. 12 00:00:30,866 --> 00:00:35,600 Remember that when we use the lin_reg_2 regressor, which is a linear regressor, 13 00:00:35,833 --> 00:00:39,366 it has to be applied to the transformed matrix 14 00:00:39,366 --> 00:00:43,333 of features, X transformed into this matrix of features at the different powers, 15 00:00:43,333 --> 00:00:47,800 you know, position level, then squared position level, and then the other powers. 16 00:00:47,800 --> 00:00:50,066 Here we only chose n equals two so far. 17 00:00:50,066 --> 00:00:52,766 So we only have the squared position levels. 18 00:00:52,766 --> 00:00:55,200 But that's exactly what you need to change here. 19 00:00:55,200 --> 00:00:57,666 You can't keep X because it is only the single feature. 20 00:00:57,666 --> 00:01:02,200 So you need to input here that transformed matrix of features 21 00:01:02,200 --> 00:01:06,000 containing the different powers of that single feature, position level. 22 00:01:06,333 --> 00:01:10,200 And that's of course exactly this poly_reg 23 00:01:10,466 --> 00:01:15,333 fit_transform method applied to this single-feature matrix X. 24 00:01:15,566 --> 00:01:16,633 So I'm copying this.
25 00:01:16,633 --> 00:01:21,666 And that's exactly what we have to input inside this predict method. 26 00:01:21,866 --> 00:01:22,833 You understand. 27 00:01:22,833 --> 00:01:25,733 So that was the little thing not to forget. 28 00:01:25,733 --> 00:01:26,900 And now you're all good. 29 00:01:26,900 --> 00:01:27,533 You're ready 30 00:01:27,533 --> 00:01:31,333 to have a beautiful visualization of the polynomial regression results. 31 00:01:31,333 --> 00:01:34,433 Let's just replace this linear here by 32 00:01:34,800 --> 00:01:38,500 polynomial, and now we're 100% ready. 33 00:01:38,600 --> 00:01:42,600 Let's visualize the polynomial regression results. 34 00:01:42,600 --> 00:01:43,866 And there you go. 35 00:01:43,866 --> 00:01:47,866 Now we have indeed a way more adaptive regression 36 00:01:47,866 --> 00:01:52,166 curve, coming indeed much closer to the real results. 37 00:01:52,166 --> 00:01:53,833 You know, the real salaries. 38 00:01:53,833 --> 00:01:54,366 Indeed. 39 00:01:54,366 --> 00:01:56,233 If we compare the two, I'm 40 00:01:56,233 --> 00:02:00,100 going to zoom out a bit so that we can see both of them at the same time. 41 00:02:00,100 --> 00:02:01,100 There we go. 42 00:02:01,100 --> 00:02:04,800 So if we compare these points where we had issues previously with the linear 43 00:02:04,800 --> 00:02:09,166 regression model, well, we can clearly see now that the issue is resolved 44 00:02:09,200 --> 00:02:12,233 because indeed the predictions on this blue 45 00:02:12,233 --> 00:02:15,533 curve come way closer to the real salaries. 46 00:02:15,533 --> 00:02:18,200 And this is only with n equals two. 47 00:02:18,200 --> 00:02:23,033 I'm going to show you then that with higher powers, you know, n equals 3 or 4, 48 00:02:23,200 --> 00:02:25,300 we will get even better results. 49 00:02:25,300 --> 00:02:28,200 And I'm going to actually show this to you right away.
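[Editor's sketch] The change described above, passing the transformed features to predict instead of X, might look roughly like the following. The names lin_reg, lin_reg_2, poly_reg and X follow what is said in the video; the salary values are illustrative stand-ins, not data taken from the transcript, and the mean-squared-error comparison is an added check rather than part of the lesson.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Illustrative stand-in data (assumed, not from the transcript):
# one feature (position level 1..10) and one salary per level.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

# Plain linear regression on the single feature, for comparison.
lin_reg = LinearRegression().fit(X, y)

# Polynomial regression with n = 2: expand X into the columns [1, x, x^2],
# then fit an ordinary linear regressor on that expanded matrix.
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression().fit(X_poly, y)

# lin_reg_2 was trained on 3 columns, so it must be given the transformed
# matrix when predicting; passing the raw single-column X would not match.
linear_pred = lin_reg.predict(X)
poly_pred = lin_reg_2.predict(poly_reg.fit_transform(X))

# Added check: mean squared error of each model on the training points.
linear_mse = float(np.mean((y - linear_pred) ** 2))
poly_mse = float(np.mean((y - poly_pred) ** 2))
print(f"linear MSE: {linear_mse:.3e}, degree-2 MSE: {poly_mse:.3e}")
```

Plotting poly_pred against X in place of linear_pred is what produces the more adaptive curve shown in the video.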
50 00:02:28,200 --> 00:02:31,633 So now what we're going to do is, well, we're going to keep this, 51 00:02:31,633 --> 00:02:35,166 but we're going to remove this because we're going to retrain 52 00:02:35,200 --> 00:02:38,700 the polynomial regression model with a higher degree. 53 00:02:38,800 --> 00:02:40,466 Let's take for example, 54 00:02:40,466 --> 00:02:43,466 you can try with three, but we're going to directly try with four. 55 00:02:43,500 --> 00:02:46,500 So there you go, I'm going to remove the output. 56 00:02:46,733 --> 00:02:50,066 Retrain the polynomial regression model on the whole data set with, 57 00:02:50,066 --> 00:02:54,666 therefore, this time a degree of four, which means that the polynomial regression 58 00:02:54,666 --> 00:03:00,600 equation will be salary equals b0 plus b1 times position level plus 59 00:03:00,600 --> 00:03:04,766 b2 times position level squared plus b3 times position 60 00:03:04,766 --> 00:03:08,800 level to the power of three plus b4 times position level to the power of four. 61 00:03:08,866 --> 00:03:11,400 So that will be the new polynomial regression equation. 62 00:03:11,400 --> 00:03:13,833 And so therefore let's retrain it. 63 00:03:13,833 --> 00:03:18,366 Let's build this new polynomial regression model by just running this cell again. 64 00:03:18,733 --> 00:03:20,266 All right. So now we have it. 65 00:03:20,266 --> 00:03:23,166 And now very simply we're going to visualize the new 66 00:03:23,166 --> 00:03:26,166 polynomial regression results by running this cell here. 67 00:03:26,400 --> 00:03:29,333 And now, as you can see, the polynomial regression 68 00:03:29,333 --> 00:03:32,466 model is perfectly fitting this data set. 69 00:03:32,466 --> 00:03:34,466 So here we clearly have overfitting. 70 00:03:34,466 --> 00:03:37,466 But that's okay, only in this situation, because we want to have 71 00:03:37,466 --> 00:03:43,200 a perfect prediction of the salary between position level six and seven, okay.
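[Editor's sketch] Retraining with a degree of four, and reading the fitted equation back out, could be sketched as below. The data values are illustrative stand-ins and the helper salary() is hypothetical, written only to spell out salary = b0 + b1*level + b2*level^2 + b3*level^3 + b4*level^4 with the coefficients the model learned.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Illustrative stand-in data (assumed, not from the transcript).
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

# Same two steps as before, only the degree changes from 2 to 4.
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)          # columns: 1, x, x^2, x^3, x^4
lin_reg_2 = LinearRegression().fit(X_poly, y)

# b0 is the fitted intercept; b1..b4 are the weights of x, x^2, x^3, x^4.
# coef_[0] belongs to the constant column and is skipped here, since the
# intercept already plays that role.
b0 = lin_reg_2.intercept_
b1, b2, b3, b4 = lin_reg_2.coef_[1:]


def salary(level):
    """Hypothetical helper: evaluate the degree-4 equation by hand."""
    return b0 + b1 * level + b2 * level**2 + b3 * level**3 + b4 * level**4


# The hand-written equation and the model's own predict should agree.
level = 6.5
manual = salary(level)
model_pred = lin_reg_2.predict(poly_reg.transform([[level]]))[0]
print(f"salary at level {level}: manual={manual:.2f}, predict={model_pred:.2f}")
```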
72 00:03:43,200 --> 00:03:45,133 And now one final thing because I really want you 73 00:03:45,133 --> 00:03:48,000 to have the best results and best visualizations. 74 00:03:48,000 --> 00:03:48,333 Indeed. 75 00:03:48,333 --> 00:03:52,200 As you can see here, what happened is that only some straight lines 76 00:03:52,200 --> 00:03:57,633 were plotted between consecutive points of the data set, right? 77 00:03:57,900 --> 00:04:01,900 And therefore that makes this curve not as smooth as we would hope for. 78 00:04:02,266 --> 00:04:06,233 So I actually prepared another piece of code, which we won't write together, 79 00:04:06,233 --> 00:04:09,300 because this is just for the sake of having a more beautiful curve. 80 00:04:09,466 --> 00:04:12,966 So we're going to get it from the original implementation. 81 00:04:13,400 --> 00:04:14,966 It is actually right here. 82 00:04:14,966 --> 00:04:16,733 You see, visualizing the polynomial 83 00:04:16,733 --> 00:04:19,733 regression results for higher resolution and smoother curve. 84 00:04:19,733 --> 00:04:23,733 So I'm going to take all this code and then paste it 85 00:04:24,066 --> 00:04:28,100 in our copy of the implementation right here, 86 00:04:28,100 --> 00:04:29,400 in a new code cell. 87 00:04:29,400 --> 00:04:30,500 And you will see that 88 00:04:30,500 --> 00:04:34,500 we will get indeed a much smoother and more beautiful curve, as you can see. 89 00:04:34,500 --> 00:04:35,166 Right. 90 00:04:35,166 --> 00:04:38,400 And the trick to plot this curve, I'm going to explain this quickly, 91 00:04:38,866 --> 00:04:42,600 is just to, instead of taking, you know, the integers zero, one, 92 00:04:42,600 --> 00:04:44,833 two, three, four, five, six, seven, eight, nine, ten.
93 00:04:44,833 --> 00:04:47,800 Well, we increase the density of these points by taking 94 00:04:47,800 --> 00:04:52,666 not only these integers, but, you know, 1.1, 1.2, 1.3, and 1.4 95 00:04:52,900 --> 00:04:56,833 up to, you know, 9.1, 9.2, 9.8, 9.9, ten. 96 00:04:56,833 --> 00:04:58,666 That's what this 0.1 means. 97 00:04:58,666 --> 00:05:00,466 That's what we call the step. 98 00:05:00,466 --> 00:05:02,700 All right. So you don't have to understand this. 99 00:05:02,700 --> 00:05:03,600 You can if you want. 100 00:05:03,600 --> 00:05:07,300 But this is something you will probably do only once in your life, because I remind you 101 00:05:07,300 --> 00:05:10,300 that usually your data sets will have many features. 102 00:05:10,300 --> 00:05:14,233 And here I only took one feature in order to show you the results on a graph. 103 00:05:14,333 --> 00:05:16,466 Because, you know, if we had many features, I couldn't 104 00:05:16,466 --> 00:05:19,400 show this to you because we would have way too many dimensions. 105 00:05:19,400 --> 00:05:21,600 So don't worry too much about this, okay? 106 00:05:21,600 --> 00:05:23,666 But just appreciate the result, right? 107 00:05:23,666 --> 00:05:25,366 We have a very well trained 108 00:05:25,366 --> 00:05:29,433 and therefore very well fitted, though overfitted, model for this data set. 109 00:05:29,633 --> 00:05:33,600 But that's fine because then indeed we will be able to get an amazing 110 00:05:33,600 --> 00:05:38,000 and accurate prediction to figure out if there is truth or.
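[Editor's sketch] The smoother-curve trick explained above, a grid of position levels with a step of 0.1, might be written as follows. This is a sketch under assumptions and not the pasted cell from the video: the data values are illustrative stand-ins, X_grid is an assumed name, and the colors, title and axis labels are choices made here.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; not needed in a notebook
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Illustrative stand-in data (assumed, not from the transcript).
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

poly_reg = PolynomialFeatures(degree=4)
lin_reg_2 = LinearRegression().fit(poly_reg.fit_transform(X), y)

# Dense grid: 1.0, 1.1, 1.2, ... in steps of 0.1 instead of only the integers.
# np.arange excludes the stop value, so the grid ends just below X.max().
X_grid = np.arange(X.min(), X.max(), 0.1).reshape(-1, 1)

# The grid must go through the same polynomial transform before predict.
y_grid = lin_reg_2.predict(poly_reg.transform(X_grid))

fig, ax = plt.subplots()
ax.scatter(X, y, color="red", label="real salaries")
ax.plot(X_grid, y_grid, color="blue", label="polynomial regression (degree 4)")
ax.set_title("Polynomial regression, higher-resolution curve")
ax.set_xlabel("Position level")
ax.set_ylabel("Salary")
ax.legend()
```

With many more x-values between the data points, the straight segments become short enough that the plotted line looks like a smooth curve.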