1 00:00:00,166 --> 00:00:00,966 So let's try. 2 00:00:00,966 --> 00:00:05,600 Let's now reselect this section to create a new regressor. 3 00:00:06,000 --> 00:00:09,433 So rpart is already imported, so I don't need to select that again. 4 00:00:09,800 --> 00:00:12,800 And let's create this new regressor. 5 00:00:13,033 --> 00:00:13,800 Okay. Done. 6 00:00:13,800 --> 00:00:14,833 Perfect. 7 00:00:14,833 --> 00:00:17,666 New regressor created, all properly. Okay. 8 00:00:17,666 --> 00:00:20,933 And now let's perhaps first visualize the results 9 00:00:20,933 --> 00:00:24,533 to see if our model is now correct before getting to the final verdict. 10 00:00:25,066 --> 00:00:28,066 Because, you know, we need to validate the model now. 11 00:00:29,066 --> 00:00:32,200 So I'm selecting this section and let's see what we get. 12 00:00:32,233 --> 00:00:35,233 Let's cross our fingers. 13 00:00:35,233 --> 00:00:35,933 Here we go. 14 00:00:35,933 --> 00:00:38,500 I'm going to zoom in on the plot. 15 00:00:38,500 --> 00:00:40,700 And now the first word I have to say 16 00:00:40,700 --> 00:00:43,900 here is trap, or red flag. 17 00:00:44,100 --> 00:00:47,000 Right now we're just in front of a new trap. 18 00:00:47,000 --> 00:00:48,166 The model is improved. 19 00:00:48,166 --> 00:00:48,866 Definitely. 20 00:00:48,866 --> 00:00:52,033 We can definitely see that we have more than one split here. 21 00:00:52,233 --> 00:00:53,700 For example, here, that's a split. 22 00:00:53,700 --> 00:00:55,766 That's another split here and here. 23 00:00:55,766 --> 00:00:58,933 So okay, we solved the problem of the number of splits. 24 00:00:59,400 --> 00:01:03,000 But according to what Kirill explained in the intuition tutorial, 25 00:01:03,300 --> 00:01:07,000 do you think that this is the real shape of a decision tree regression model?
26 00:01:07,600 --> 00:01:11,066 Well, because you know, what Kirill explains about the algorithm of decision tree 27 00:01:11,066 --> 00:01:14,700 regression is that, by considering the entropy and the information gain, 28 00:01:14,933 --> 00:01:19,033 it's splitting the independent variables into several intervals. 29 00:01:19,266 --> 00:01:22,200 So in the intuition tutorial you had two independent variables. 30 00:01:22,200 --> 00:01:25,233 So the different intervals formed some rectangles in which 31 00:01:25,233 --> 00:01:28,233 you took the average of the dependent variable values. 32 00:01:28,300 --> 00:01:31,900 But here, since we are in one dimension, that means that the algorithm 33 00:01:31,900 --> 00:01:34,900 will only take intervals here of the independent variable. 34 00:01:34,933 --> 00:01:37,333 For example, that should be one interval. 35 00:01:37,333 --> 00:01:39,800 And here it looks like it's the second one. 36 00:01:39,800 --> 00:01:41,266 And here it's the third one. 37 00:01:41,266 --> 00:01:43,166 And here a fourth one. 38 00:01:43,166 --> 00:01:43,566 Okay. 39 00:01:43,566 --> 00:01:46,666 So basically it looks like we have four conditions 40 00:01:46,666 --> 00:01:49,600 and four intervals that are making the splits. 41 00:01:49,600 --> 00:01:50,533 But as you understood 42 00:01:50,533 --> 00:01:54,566 in the intuition tutorial, it's taking the average in each interval. 43 00:01:54,900 --> 00:01:58,500 So if it's taking the average, how can we have this 44 00:01:58,500 --> 00:02:00,633 straight line here that is not horizontal? 45 00:02:00,633 --> 00:02:02,366 Because you know, what the decision tree 46 00:02:02,366 --> 00:02:05,766 regression is doing is that in each interval it's calculating 47 00:02:05,766 --> 00:02:08,766 the average of the dependent variable, the salaries.
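The interval-averaging idea described here can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used in the video: the salary figures and the three split points are made-up assumptions (a real tree learns its split points from the data), and the helper names `interval_index`, `fit_interval_means` and `predict` are hypothetical.

```python
from statistics import mean

# Hypothetical data: ten position levels and made-up salaries (in thousands).
levels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
salaries = [45, 50, 60, 80, 110, 150, 200, 300, 500, 1000]

# Hypothetical split points giving four intervals; a real tree learns these.
thresholds = [6.5, 8.5, 9.5]


def interval_index(level, thresholds):
    """Return the index of the interval that contains `level`."""
    return sum(level >= t for t in thresholds)


def fit_interval_means(levels, salaries, thresholds):
    """Average the salaries that fall into each interval."""
    groups = [[] for _ in range(len(thresholds) + 1)]
    for level, salary in zip(levels, salaries):
        groups[interval_index(level, thresholds)].append(salary)
    return [mean(group) for group in groups]


def predict(level, thresholds, means):
    """Every level inside one interval gets the same constant prediction."""
    return means[interval_index(level, thresholds)]


means = fit_interval_means(levels, salaries, thresholds)
print(means)
print(predict(1, thresholds, means), predict(6.4, thresholds, means))
```

With these assumed numbers, any level from 1 up to just below 6.5 gets the same prediction, which is why the curve inside one interval has to be horizontal.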
48 00:02:08,800 --> 00:02:11,800 And therefore for all the levels contained in this interval, 49 00:02:12,133 --> 00:02:15,100 the value of the prediction should be a constant equal 50 00:02:15,100 --> 00:02:18,100 to this average of the dependent variable in this interval. 51 00:02:18,333 --> 00:02:20,600 And here, as we can see, it's not a constant. 52 00:02:20,600 --> 00:02:23,266 You know, the prediction here is not the same as the prediction here. 53 00:02:23,266 --> 00:02:27,000 So either it's considering an infinity of intervals 54 00:02:27,000 --> 00:02:30,433 with different constants in each of those infinitely many intervals, 55 00:02:30,800 --> 00:02:33,233 or we have a problem here. 56 00:02:33,233 --> 00:02:34,766 Of course it's not the first option. 57 00:02:34,766 --> 00:02:37,800 Of course, the decision tree regression is not considering 58 00:02:37,800 --> 00:02:41,000 an infinity of intervals between this level and this level. 59 00:02:41,500 --> 00:02:43,700 So it's definitely the second option. 60 00:02:43,700 --> 00:02:46,266 So now do you see where the problem comes from? 61 00:02:46,266 --> 00:02:49,500 Well, the answer is in our regression template. 62 00:02:49,966 --> 00:02:53,966 Because what we observe here is only due to the resolution 63 00:02:53,966 --> 00:02:57,366 we picked to plot these decision tree regression results, 64 00:02:57,900 --> 00:03:00,533 because we are actually plotting the predictions 65 00:03:00,533 --> 00:03:03,566 for each of the ten levels, incremented by one. 66 00:03:03,600 --> 00:03:05,700 That means that here, you know, it's only plotting 67 00:03:05,700 --> 00:03:09,000 the predictions of the ten salaries corresponding to the ten levels, 68 00:03:09,333 --> 00:03:12,366 and then it's joining the predictions by a straight line here, 69 00:03:12,533 --> 00:03:16,300 because it had no predictions to plot in this interval 70 00:03:16,300 --> 00:03:18,866 here of the independent variable, the level.
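To make the resolution problem concrete, here is a small sketch in plain Python (not the code from the video). It assumes a hypothetical step-function model `tree_predict` with made-up constants and split points, and a hypothetical helper `drawn_value` that mimics what a line plot shows when it only has predictions at the ten integer levels and joins them with straight segments.

```python
def tree_predict(level):
    """Hypothetical step-function model: one constant per interval."""
    if level < 6.5:
        return 82.5
    if level < 8.5:
        return 250.0
    if level < 9.5:
        return 500.0
    return 1000.0


def drawn_value(level, grid):
    """Value a line plot shows at `level` when predictions exist only at the
    `grid` points and neighbouring points are joined by straight segments."""
    for left, right in zip(grid, grid[1:]):
        if left <= level <= right:
            y_left, y_right = tree_predict(left), tree_predict(right)
            weight = (level - left) / (right - left)
            return y_left + weight * (y_right - y_left)
    raise ValueError("level is outside the grid")


coarse_grid = list(range(1, 11))  # the ten integer levels

drawn = drawn_value(6.25, coarse_grid)  # what the low-resolution plot shows
actual = tree_predict(6.25)             # what the model really predicts
print(drawn, actual)
```

Between levels 6 and 7 the plotted line passes through values the model never predicts. The sloped segment comes from the plotting resolution, not from the model.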
71 00:03:18,866 --> 00:03:22,900 And the fact that it's a problem for this new non-linear regression model, 72 00:03:22,900 --> 00:03:26,500 the decision tree model, is due to a very specific reason. 73 00:03:26,966 --> 00:03:28,800 For the previous non-linear regression models, 74 00:03:28,800 --> 00:03:31,700 we could use the code that generated this plot 75 00:03:31,700 --> 00:03:33,933 because the models were actually continuous. 76 00:03:33,933 --> 00:03:36,700 So for example in the polynomial regression model, 77 00:03:36,700 --> 00:03:39,666 well, between this prediction and this prediction, 78 00:03:39,666 --> 00:03:42,400 well, it was actually almost a straight line here. 79 00:03:42,400 --> 00:03:45,633 However, right now we are facing a new kind of regression model. 80 00:03:46,000 --> 00:03:47,766 Remember the first kind of regression model 81 00:03:47,766 --> 00:03:50,400 we studied was the linear regression model. 82 00:03:50,400 --> 00:03:54,333 Then the second kind of regression model we saw was the non-linear regression model. 83 00:03:54,633 --> 00:03:58,000 And now we're facing a new kind of regression model. 84 00:03:58,200 --> 00:04:02,100 It's the non-linear and non-continuous regression model. 85 00:04:03,000 --> 00:04:05,633 Indeed all the previous regression models that we saw, 86 00:04:05,633 --> 00:04:09,600 whether they were linear or non-linear, they were all continuous. 87 00:04:09,866 --> 00:04:13,433 But here the decision tree regression model is not continuous. 88 00:04:13,433 --> 00:04:14,833 And this is the first non- 89 00:04:14,833 --> 00:04:17,833 continuous machine learning model we are seeing together. 90 00:04:17,966 --> 00:04:18,666 And so do you know 91 00:04:18,666 --> 00:04:21,900 what is the best way to visualize a non-continuous regression model? 92 00:04:21,966 --> 00:04:24,333 Well, as I was telling you, the answer, 93 00:04:24,333 --> 00:04:27,433 the solution for this is in our regression template.
94 00:04:27,766 --> 00:04:29,133 So let's have a look. 95 00:04:29,133 --> 00:04:32,733 And actually we need to take the code section 96 00:04:32,733 --> 00:04:36,633 that visualizes the regression model results in higher resolution. 97 00:04:37,100 --> 00:04:41,033 So let's take this and let's actually go back to our decision 98 00:04:41,033 --> 00:04:45,166 tree regression file and replace this code here. 99 00:04:45,166 --> 00:04:47,233 Because this is totally not appropriate 100 00:04:47,233 --> 00:04:50,533 for our decision tree regression model because it's a non-continuous model. 101 00:04:50,866 --> 00:04:55,433 And so we need to replace this code by the same, but for the high resolution.
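As a rough illustration of what the high-resolution version changes, the sketch below evaluates a model on a dense grid of levels instead of only the ten integer levels. It is written in plain Python and is not the template code from the video: the step size of 0.01, the helper `make_grid`, and the step-function model `tree_predict` with its constants are all assumptions chosen for the example.

```python
def tree_predict(level):
    """Hypothetical step-function model: one constant per interval."""
    if level < 6.5:
        return 82.5
    if level < 8.5:
        return 250.0
    if level < 9.5:
        return 500.0
    return 1000.0


def make_grid(start, stop, step):
    """Evenly spaced points from start to stop (inclusive, up to rounding)."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]


# Dense grid of levels; the 0.01 step is an assumed resolution.
fine_grid = make_grid(1, 10, 0.01)
predictions = [tree_predict(x) for x in fine_grid]

# Neighbouring grid points between which the prediction changes.
jumps = [
    (fine_grid[i], fine_grid[i + 1])
    for i in range(len(predictions) - 1)
    if predictions[i] != predictions[i + 1]
]

print(len(fine_grid), sorted(set(predictions)), len(jumps))
```

On the dense grid the predictions take only four distinct values and change at three places, so a plot drawn through these points shows flat steps with near-vertical jumps rather than sloped segments.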