1 00:00:00,133 --> 00:00:00,966 Hello, my friends. 2 00:00:00,966 --> 00:00:02,233 Happy to see you again. 3 00:00:02,233 --> 00:00:06,166 Now for this new section, and mostly this new practical activity, 4 00:00:06,300 --> 00:00:10,166 where we're going to build the random forest regression together. 5 00:00:10,500 --> 00:00:13,500 This is our last model of the regression part. 6 00:00:13,500 --> 00:00:16,633 And we're going to build it very quickly and efficiently, 7 00:00:16,633 --> 00:00:21,266 because it actually looks very much like the decision tree regression model. 8 00:00:21,400 --> 00:00:22,933 So we will be very efficient. 9 00:00:22,933 --> 00:00:27,366 And besides, right after this section comes the most important section for you. 10 00:00:27,633 --> 00:00:33,000 It is the section where Kirill and I will explain to you not only how to evaluate 11 00:00:33,000 --> 00:00:36,533 your regression models, but also how to select the best one. 12 00:00:36,766 --> 00:00:39,733 So right after we're done with the random forest regression, 13 00:00:39,733 --> 00:00:43,700 we will have a new data set with, you know, multiple features, 14 00:00:43,700 --> 00:00:46,000 a more real-world data set. 15 00:00:46,000 --> 00:00:51,866 And I will show you how to use your regression code templates to quickly plug 16 00:00:51,900 --> 00:00:56,266 your regression models into the data set and quickly find the best one. 17 00:00:56,300 --> 00:00:56,666 All right. 18 00:00:56,666 --> 00:00:59,966 So that will be very important for you to learn. 19 00:01:00,100 --> 00:01:04,833 But now let's smash our last regression model: the random forest regression. 20 00:01:05,133 --> 00:01:08,633 And before we start, let's just make sure everyone here is on the same page. 21 00:01:08,833 --> 00:01:13,066 I gave you the link to this folder in the article right before this tutorial. 22 00:01:13,100 --> 00:01:14,966 So make sure to open that link.
23 00:01:14,966 --> 00:01:16,866 And now we should all be on the same page. 24 00:01:16,866 --> 00:01:19,866 So we're all going to go into Part 2, Regression. 25 00:01:20,100 --> 00:01:20,900 Then we're going to go, 26 00:01:20,900 --> 00:01:24,866 this time, to the last regression model, which is our random forest regression. 27 00:01:25,100 --> 00:01:28,366 And then we're going to start, as usual, with Python. 28 00:01:28,900 --> 00:01:32,566 This folder once again contains the Position Salaries data set, 29 00:01:32,766 --> 00:01:37,833 and of course the random forest regression implementation in the .ipynb format, 30 00:01:37,833 --> 00:01:42,400 which you can therefore open with either Google Colaboratory or Jupyter Notebook. 31 00:01:42,600 --> 00:01:46,033 And as far as I'm concerned, as usual, I'm going to open it with 32 00:01:46,266 --> 00:01:47,966 Google Colaboratory. 33 00:01:47,966 --> 00:01:48,966 So there we go. 34 00:01:48,966 --> 00:01:53,400 Let's start implementing the random forest regression model. 35 00:01:53,700 --> 00:01:57,533 So now it is, you know, loading the notebook. 36 00:01:58,066 --> 00:01:59,800 And in a second we should have it. 37 00:01:59,800 --> 00:02:00,966 There we go. 38 00:02:00,966 --> 00:02:01,600 All right. 39 00:02:01,600 --> 00:02:04,533 As usual, this is in read-only mode, 40 00:02:04,533 --> 00:02:09,166 so we're going to quickly create a copy so that we can re-implement it. 41 00:02:09,400 --> 00:02:13,400 We're not going to re-implement it from scratch, because it is really similar 42 00:02:13,533 --> 00:02:16,333 to the decision tree regression model. 43 00:02:16,333 --> 00:02:20,200 You will see that we will only re-implement one code cell.
44 00:02:20,200 --> 00:02:22,833 And if, you know, you come to this regression 45 00:02:22,833 --> 00:02:25,966 folder for the first time with random forest regression, well, 46 00:02:25,966 --> 00:02:29,700 I encourage you to have a look at decision tree regression first, because 47 00:02:30,000 --> 00:02:32,000 all these code cells were explained there. 48 00:02:32,000 --> 00:02:35,433 But this time we're only going to delete this one, 49 00:02:35,733 --> 00:02:36,533 the one where we, 50 00:02:36,533 --> 00:02:40,366 you know, train the random forest regression model on the whole data set. 51 00:02:40,600 --> 00:02:42,300 And then all the rest is the same. 52 00:02:42,300 --> 00:02:45,366 It is all the same as in decision tree regression. 53 00:02:45,366 --> 00:02:48,600 We first import the libraries, then we import the data set. 54 00:02:49,000 --> 00:02:51,700 Then, after the training of the random forest regression 55 00:02:51,700 --> 00:02:53,266 model on the whole data set, 56 00:02:53,266 --> 00:02:57,333 we predict a new result, with the exact same syntax. 57 00:02:57,333 --> 00:02:59,800 Actually, I'm going to hide this now. 58 00:02:59,800 --> 00:03:03,666 So this is the exact same syntax as in the decision tree regression model. 59 00:03:03,766 --> 00:03:05,400 And then here, that's the same. 60 00:03:05,400 --> 00:03:08,533 This is the exact same code as before, 61 00:03:08,533 --> 00:03:11,533 used to visualize the random forest regression results. 62 00:03:11,566 --> 00:03:11,900 All right. 63 00:03:11,900 --> 00:03:14,666 So let's just keep this, because we did it many times. 64 00:03:14,666 --> 00:03:18,600 And I'm sure you're looking forward to that final section, 65 00:03:18,600 --> 00:03:21,000 where, you know, everything is going to make sense, 66 00:03:21,000 --> 00:03:24,633 because you will learn how to use this regression 67 00:03:24,633 --> 00:03:27,833 folder containing all these code templates for regression.
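To illustrate the point that only the training cell changes, here is a minimal sketch, not the course notebook itself. It assumes scikit-learn and NumPy are installed, replaces the Position Salaries data with made-up stand-in values, and picks `n_estimators=10`, `random_state=0`, the new level `6.5` and the grid step `0.1` purely for illustration. The "predict a new result" and visualization-grid calls are identical for a `DecisionTreeRegressor` and a `RandomForestRegressor`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in data (NOT the real Position Salaries file):
# position levels 1..10 as a 2-D feature matrix, with made-up salaries.
X = np.arange(1, 11).reshape(-1, 1)
y = 40_000 + 1_000 * np.arange(1, 11) ** 3

results = {}
for regressor in (
    DecisionTreeRegressor(random_state=0),
    RandomForestRegressor(n_estimators=10, random_state=0),  # assumed settings
):
    # Only the constructor above differs between the two notebooks.
    regressor.fit(X, y)

    # "Predicting a new result": predict() expects a 2-D array, hence [[6.5]].
    prediction = regressor.predict([[6.5]])

    # Fine grid of levels that the visualization cell would plot to show
    # the step-shaped prediction curve (the plotting itself is omitted).
    X_grid = np.arange(X.min(), X.max(), 0.1).reshape(-1, 1)
    curve = regressor.predict(X_grid)

    results[type(regressor).__name__] = (prediction, curve)
    print(type(regressor).__name__, prediction)
```

With matplotlib, the visualization cell would then draw `plt.scatter(X, y)` for the data points and `plt.plot(X_grid, curve)` for the model, again unchanged between the two models.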
68 00:03:27,833 --> 00:03:30,966 And you will learn how to decide which model to choose, 69 00:03:30,966 --> 00:03:33,666 and, you know, select the best one for your data set. 70 00:03:33,666 --> 00:03:36,366 I will explain everything in this last section, 71 00:03:36,366 --> 00:03:40,333 but for now, let's implement the only missing code 72 00:03:40,333 --> 00:03:44,500 cell, to train the random forest regression model on the whole data set. 73 00:03:44,800 --> 00:03:46,900 So let's add a new code cell here. 74 00:03:46,900 --> 00:03:47,700 And now, there you go. 75 00:03:47,700 --> 00:03:48,733 You could once again 76 00:03:48,733 --> 00:03:53,033 totally do it yourself by looking at some documentation online. 77 00:03:53,033 --> 00:03:54,933 But actually, well, let's do it together. 78 00:03:54,933 --> 00:03:58,800 This time we're going to pretend that, you know, we would like to build a random 79 00:03:58,800 --> 00:04:01,333 forest regression model and train it on the data set, 80 00:04:01,333 --> 00:04:05,200 and that I have absolutely no clue how to build it, or, you know, 81 00:04:05,266 --> 00:04:08,300 which scikit-learn class to use to build it. 82 00:04:08,633 --> 00:04:09,433 So let's see. 83 00:04:09,433 --> 00:04:14,166 Well, what I would do, as I said, would be to go to Google or Bing. 84 00:04:14,633 --> 00:04:15,400 So here is Google. 85 00:04:15,400 --> 00:04:17,566 And I would type here in the search bar, 86 00:04:17,566 --> 00:04:20,566 well, you know, for example, scikit-learn. 87 00:04:20,900 --> 00:04:21,400 All right. 88 00:04:21,400 --> 00:04:25,300 And then random forest regression, this one, right. 89 00:04:25,666 --> 00:04:28,033 Even the suggestion helps, so that's perfect. 90 00:04:28,033 --> 00:04:29,366 Then I press Enter. 91 00:04:29,366 --> 00:04:32,800 And then I would go for the scikit-learn link, 92 00:04:32,800 --> 00:04:34,266 you know, the scikit-learn website.
93 00:04:34,266 --> 00:04:37,666 This will usually be the first link, as is the case right here. 94 00:04:37,666 --> 00:04:39,233 So let's click this. 95 00:04:39,233 --> 00:04:44,700 And normally I should find here the exact name of the random forest regression 96 00:04:44,700 --> 00:04:48,633 class, and also the name of the module that contains this class. 97 00:04:48,633 --> 00:04:52,100 And indeed, that is exactly what we see in big letters here. 98 00:04:52,333 --> 00:04:54,366 This is the whole library, sklearn. 99 00:04:54,366 --> 00:04:56,966 This is the module that contains the class we want. 100 00:04:56,966 --> 00:04:59,200 And this is the name of the class. 101 00:04:59,200 --> 00:05:00,366 So there you go. 102 00:05:00,366 --> 00:05:03,300 Well, let's actually copy everything here. 103 00:05:03,300 --> 00:05:04,800 And we will rework this bit, 104 00:05:04,800 --> 00:05:07,800 because this is not exactly what we have to write in Python. 105 00:05:07,800 --> 00:05:10,533 But I'm sure you'll know how to adapt this. 106 00:05:10,533 --> 00:05:11,100 There you go. 107 00:05:11,100 --> 00:05:12,266 That's what I wanted to show you. 108 00:05:12,266 --> 00:05:15,600 You know, it's very, very easy to find online 109 00:05:15,600 --> 00:05:19,200 the name of the class that lets you build the model you want, right? 110 00:05:19,200 --> 00:05:22,266 I just had to type scikit-learn and random forest regression 111 00:05:22,500 --> 00:05:24,000 and just go to the first link. 112 00:05:24,000 --> 00:05:26,000 So you see, very, very easy. 113 00:05:26,000 --> 00:05:29,300 So let's go back to our implementation and start building 114 00:05:29,433 --> 00:05:33,433 this random forest regression model and training it on the whole data set. 115 00:05:33,700 --> 00:05:35,833 So now I'm going to paste what I've just copied.
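The documentation header gives the fully qualified name `sklearn.ensemble.RandomForestRegressor`: library, then module, then class. As a small illustration of how that dotted path maps onto Python's import syntax, here is a sketch with a hypothetical helper `import_statement` (not part of scikit-learn), plus a dynamic equivalent built on the standard-library `importlib.import_module`. The demo loads `collections.OrderedDict` so that it runs without scikit-learn installed.

```python
import importlib


def import_statement(dotted_path: str) -> str:
    """Hypothetical helper: turn 'package.module.Class' into an import line."""
    module, _, name = dotted_path.rpartition(".")
    if not module:
        raise ValueError(f"expected 'package.module.Class', got {dotted_path!r}")
    return f"from {module} import {name}"


def load_class(dotted_path: str):
    """Import the module part of the path, then fetch the class from it."""
    module, _, name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module), name)


print(import_statement("sklearn.ensemble.RandomForestRegressor"))
# → from sklearn.ensemble import RandomForestRegressor

OrderedDict = load_class("collections.OrderedDict")
print(OrderedDict.__name__)
# → OrderedDict
```

In the notebook you would simply type the resulting `from ... import ...` line; the helper only makes the library/module/class split explicit.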
116 00:05:35,833 --> 00:05:39,333 Because indeed, in order to import this class, 117 00:05:39,333 --> 00:05:44,133 I simply need to add "from" here at the beginning, right: from sklearn, 118 00:05:44,500 --> 00:05:47,633 and then from the ensemble module of scikit-learn, 119 00:05:48,000 --> 00:05:52,833 I'm going to import that RandomForestRegressor class.
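For reference, here is a minimal sketch of what the completed training cell could look like. The import line is the one dictated above; everything else is an assumption, because the transcript stops before any parameters are chosen: `n_estimators=10` (the number of trees) and `random_state=0` are illustrative values, and `X`, `y` are made-up stand-ins for the feature matrix and target that the notebook builds from the Position Salaries file.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-ins for the notebook's X (position levels) and y (salaries).
X = np.arange(1, 11).reshape(-1, 1)
y = 40_000 + 1_000 * np.arange(1, 11) ** 3

# Assumed settings: 10 trees, and a fixed seed so the bootstrap samples
# (and therefore the predictions) are reproducible from run to run.
regressor = RandomForestRegressor(n_estimators=10, random_state=0)

# Train the random forest on the whole data set (no train/test split here).
regressor.fit(X, y)

# The forest's prediction is the average of its individual trees' predictions.
print(regressor.predict([[6.5]]))
```

If `n_estimators` is left out, recent scikit-learn versions build 100 trees by default (the default was 10 before version 0.22), so passing it explicitly keeps the cell's behavior independent of the installed version.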