1 00:00:00,066 --> 00:00:00,900 Perfect. 2 00:00:00,900 --> 00:00:03,766 And now you know what the next natural step is. 3 00:00:03,766 --> 00:00:07,566 Well, it is, of course, to create a new object of this 4 00:00:07,566 --> 00:00:09,333 RandomForestRegressor class. 5 00:00:09,333 --> 00:00:13,100 And we're going to call this object, as usual, regressor, 6 00:00:13,433 --> 00:00:18,500 which will be equal to, well, you know, an instance of this class. 7 00:00:18,533 --> 00:00:23,300 That's why I'm copying it and pasting it here and adding some parentheses. 8 00:00:23,833 --> 00:00:24,366 All right. 9 00:00:24,366 --> 00:00:28,900 So this time, do you think we have to input something in this class? 10 00:00:28,900 --> 00:00:33,266 Well, let's not fall into the trap of too much easiness. 11 00:00:33,266 --> 00:00:36,400 Let's remember that for decision tree regression 12 00:00:36,600 --> 00:00:39,933 we actually did not input, you know, an essential parameter. 13 00:00:39,933 --> 00:00:43,466 We just input a random_state parameter in order to fix the seed 14 00:00:43,500 --> 00:00:46,733 so that we can all have the same results displayed in the output. 15 00:00:47,033 --> 00:00:51,833 But this time there's actually a parameter that is very important. 16 00:00:51,833 --> 00:00:54,800 And that parameter is the number of trees. 17 00:00:54,800 --> 00:00:57,233 So that's exactly what we're going to input here. 18 00:00:57,233 --> 00:01:01,700 The name of that parameter is n_estimators. 19 00:01:02,466 --> 00:01:06,800 We're going to set it equal to 10. Ten trees, right, ten estimators. 20 00:01:06,800 --> 00:01:09,533 Each tree is an estimator. Okay. 21 00:01:09,533 --> 00:01:14,166 And then as usual we're going to add the random_state parameter. 22 00:01:14,166 --> 00:01:18,033 And once again we'll set that equal to zero so that we can fix the seed 23 00:01:18,033 --> 00:01:21,600 and get the same output displayed in our notebook.
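For reference, the object creation described above can be sketched as follows. This is a minimal sketch, not the notebook's exact cell; it assumes scikit-learn is installed and uses only the two parameters named in the video.

```python
from sklearn.ensemble import RandomForestRegressor

# n_estimators is the number of trees in the forest (10, as chosen in the video).
# random_state=0 fixes the seed so that reruns build the same forest.
regressor = RandomForestRegressor(n_estimators=10, random_state=0)

# The object is only configured at this point; no training has happened yet.
print(regressor.n_estimators)  # prints 10
```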
24 00:01:22,200 --> 00:01:23,100 Okay, great. 25 00:01:23,100 --> 00:01:24,333 And now, final step. 26 00:01:24,333 --> 00:01:26,166 You know this step by heart. 27 00:01:26,166 --> 00:01:28,300 Now it is so obvious for you. 28 00:01:28,300 --> 00:01:34,133 It is, of course, to fit the regressor object to the whole data set. 29 00:01:34,133 --> 00:01:34,833 In other words, 30 00:01:34,833 --> 00:01:38,666 that means we are training the regressor on the whole data set, right? 31 00:01:38,666 --> 00:01:42,500 So here we have to add a dot and then fit, applied 32 00:01:42,500 --> 00:01:45,500 to X and y. 33 00:01:45,600 --> 00:01:46,466 Perfect. 34 00:01:46,466 --> 00:01:50,033 And that's all we had to do, you know, for this random 35 00:01:50,033 --> 00:01:51,400 forest regression implementation. 36 00:01:51,400 --> 00:01:54,866 All the rest is the same as the decision tree. 37 00:01:54,866 --> 00:01:59,700 So we can just execute all the cells here and observe, comfortably in our chair, 38 00:01:59,700 --> 00:02:01,533 the final result. So let's have a look. 39 00:02:01,533 --> 00:02:04,000 I just want to warn you again that it's not going to be pretty. 40 00:02:04,000 --> 00:02:07,266 And that's of course for the exact same reason as decision trees. 41 00:02:07,266 --> 00:02:11,266 A random forest regression model is way better adapted to high 42 00:02:11,266 --> 00:02:14,400 dimensional data sets or, you know, data sets with multiple features, 43 00:02:14,400 --> 00:02:18,600 which you will see in the final section of this Part 2, Regression, 44 00:02:18,800 --> 00:02:22,500 when learning not only how to evaluate your regression models, 45 00:02:22,500 --> 00:02:26,800 but also how to select the best model for any data 46 00:02:26,800 --> 00:02:30,600 set, you know, for the particular data set you're working with. All right. 47 00:02:30,600 --> 00:02:31,500 So let's do this.
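The training step described above can be sketched like this. The arrays X and y below are hypothetical stand-in values (ten position levels and ten salaries) so that the snippet runs on its own; in the notebook they come from the uploaded data set instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data, NOT read from the course file:
# one feature (position level 1..10) and one target (salary).
X = np.arange(1, 11).reshape(-1, 1)  # 2D feature matrix, shape (10, 1)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000]) * 1000.0

regressor = RandomForestRegressor(n_estimators=10, random_state=0)

# Train on the whole data set: no train/test split, as in the video.
regressor.fit(X, y)

# After fitting, the individual trees are available in estimators_.
print(len(regressor.estimators_))  # prints 10
```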
48 00:02:31,500 --> 00:02:34,500 Let's execute each of these cells, 49 00:02:34,500 --> 00:02:37,500 starting with importing the libraries. 50 00:02:37,500 --> 00:02:42,433 Then, I'm not going to forget, I almost forgot to upload the data set. 51 00:02:42,566 --> 00:02:46,333 You know, I was about to execute the cell, but that would have returned an error 52 00:02:46,466 --> 00:02:49,033 because indeed the data set needs to be uploaded. 53 00:02:49,033 --> 00:02:51,266 So let's upload it now. All right. 54 00:02:51,266 --> 00:02:53,666 So as usual we have to go into our Machine Learning 55 00:02:53,666 --> 00:02:56,600 A-Z folder, wherever you put it on your machine. 56 00:02:56,600 --> 00:02:58,400 Then Part 2, Regression. 57 00:02:58,400 --> 00:03:01,266 Then Section 9, the last section of this Part 58 00:03:01,266 --> 00:03:04,266 2, Random Forest Regression, and Python. 59 00:03:04,300 --> 00:03:05,966 And there we go. Position salaries. 60 00:03:05,966 --> 00:03:09,666 This is still, of course, the same data set. Okay. 61 00:03:09,666 --> 00:03:11,066 So that's all good. 62 00:03:11,066 --> 00:03:12,266 Now we have the data set. 63 00:03:12,266 --> 00:03:15,133 And now we can import it inside the notebook. 64 00:03:15,133 --> 00:03:18,133 Well, actually inside our Python program. 65 00:03:18,300 --> 00:03:20,000 And now we're going to 66 00:03:20,000 --> 00:03:23,700 train the random forest regression model on the whole data set. 67 00:03:24,000 --> 00:03:25,400 So let's do this. 68 00:03:25,400 --> 00:03:30,266 And this will output the RandomForestRegressor model with all the parameters. 69 00:03:30,266 --> 00:03:34,533 And at this stage the only parameter that I recommend to tune 70 00:03:34,533 --> 00:03:39,033 is indeed that number of estimators, which we chose to be equal to ten. 71 00:03:39,333 --> 00:03:41,133 All right. And don't worry too much about the rest. 72 00:03:41,133 --> 00:03:44,300 Now this will already give you an excellent model.
73 00:03:44,733 --> 00:03:47,866 And then now let's predict the final result 74 00:03:48,233 --> 00:03:52,466 by, you know, just calling this predict method from our regressor object. 75 00:03:52,466 --> 00:03:54,966 And the predict method just has to take as input, 76 00:03:54,966 --> 00:03:59,366 well, that position level number 6.5, which, remember, 77 00:03:59,366 --> 00:04:02,533 you have to input in a double pair of square brackets, 78 00:04:02,700 --> 00:04:07,000 because the predict method expects a 2D array as its input. 79 00:04:07,300 --> 00:04:09,166 Right. So that's very important for you to know. 80 00:04:09,166 --> 00:04:10,700 But we saw this many times. 81 00:04:10,700 --> 00:04:14,000 I'm sure it has also become very obvious for you. 82 00:04:14,233 --> 00:04:17,066 So let's do this. Let's get this prediction. 83 00:04:17,066 --> 00:04:22,966 And we get, wow, we get a pretty good prediction actually: $167,000, 84 00:04:22,966 --> 00:04:26,266 which is very close to, you know, that salary that this person 85 00:04:26,533 --> 00:04:30,933 claimed to earn in the previous company, which was 160 K. 86 00:04:30,933 --> 00:04:32,633 So that's very, very good. 87 00:04:32,633 --> 00:04:35,133 And now let's visualize the final result. 88 00:04:35,133 --> 00:04:38,333 Oh, I actually forgot to delete that output. 89 00:04:38,333 --> 00:04:39,200 But that's fine. 90 00:04:39,200 --> 00:04:42,200 Let's run this cell and we'll get that output again. 91 00:04:42,500 --> 00:04:47,500 And here is the regression curve of the random forest regression model. 92 00:04:47,500 --> 00:04:51,366 And of course it looks a lot like the one of the decision tree, 93 00:04:51,366 --> 00:04:54,900 although this time there are more steps in the stairs. 94 00:04:54,900 --> 00:04:57,600 You know, remember, with the decision tree regression model, 95 00:04:57,600 --> 00:05:00,766 we had a step for each of the position levels.
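As a side note on the prediction call described earlier in this part, here is a minimal sketch of why the double pair of square brackets matters. The data is a hypothetical stand-in, not the course file, so the printed salary is not expected to match the figure quoted in the video.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data: position levels 1..10 and made-up salaries.
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000]) * 1000.0

regressor = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# [[6.5]] is a 2D input: one row (one observation), one column (one feature).
prediction = regressor.predict([[6.5]])
print(prediction.shape)  # prints (1,): one prediction per input row

# A 1D input such as [6.5] is rejected by scikit-learn with a ValueError.
rejected_1d = False
try:
    regressor.predict([6.5])
except ValueError:
    rejected_1d = True
print("1D input rejected:", rejected_1d)
```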
96 00:05:00,933 --> 00:05:05,000 And here, for example, we have two steps between two position levels. 97 00:05:05,000 --> 00:05:08,066 So that's of course because we have more trees this time 98 00:05:08,066 --> 00:05:11,500 and therefore more splits of, you know, the features where you have 99 00:05:11,700 --> 00:05:15,166 the same prediction, you know, the average of the predicted salary. 100 00:05:15,433 --> 00:05:18,433 So it only makes sense that there are more steps. 101 00:05:18,733 --> 00:05:20,933 All right. So congratulations. 102 00:05:20,933 --> 00:05:24,433 That was your final regression model of Part 2. 103 00:05:24,733 --> 00:05:26,300 You now have a complete 104 00:05:26,300 --> 00:05:30,166 toolkit of regression models, which gives you a lot of different options 105 00:05:30,166 --> 00:05:34,000 and solutions for your future data sets and future machine learning problems. 106 00:05:34,266 --> 00:05:37,266 So I'm really happy for you that we built this toolkit together. 107 00:05:37,366 --> 00:05:39,000 Make sure to use it the right way. 108 00:05:39,000 --> 00:05:41,700 And now we will finish with this last section 109 00:05:41,700 --> 00:05:45,000 to teach you how to use this regression toolkit the best way, 110 00:05:45,000 --> 00:05:48,833 you know, by selecting the best model for any data set. 111 00:05:49,233 --> 00:05:51,666 All right, so join me in this next section. 112 00:05:51,666 --> 00:05:53,433 I can't wait to see you again. 113 00:05:53,433 --> 00:05:55,300 And until then, enjoy machine learning.
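The remark above about the forest curve having more steps than a single tree can be illustrated with a toy example in plain Python. The two "trees" below are hand-written step functions with made-up thresholds and salaries (they are hypothetical, not what scikit-learn actually fits); the point is only that averaging step functions whose splits fall at different places produces more plateaus than either one alone.

```python
# Two hypothetical one-split "trees"; thresholds and values are made up.
def tree_a(level):
    return 150_000 if level < 6.5 else 200_000

def tree_b(level):
    return 130_000 if level < 6.0 else 210_000

def forest(level):
    # A regression forest predicts the average of its trees' predictions.
    trees = (tree_a, tree_b)
    return sum(tree(level) for tree in trees) / len(trees)

def count_plateaus(model, grid):
    # Count runs of constant predictions along an increasing grid.
    predictions = [model(x) for x in grid]
    changes = sum(1 for prev, cur in zip(predictions, predictions[1:]) if prev != cur)
    return 1 + changes

grid = [5 + i / 100 for i in range(201)]  # levels from 5.00 to 7.00

print(count_plateaus(tree_a, grid),
      count_plateaus(tree_b, grid),
      count_plateaus(forest, grid))  # prints: 2 2 3
```

Each toy tree has two plateaus, while their average has three, because the average changes value wherever either tree splits.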