1 00:00:00,133 --> 00:00:01,100 Hello my friends. 2 00:00:01,100 --> 00:00:05,233 Welcome to this new section on Decision Tree Regression. 3 00:00:05,233 --> 00:00:09,333 So we're about to tackle a new practical activity where we will 4 00:00:09,333 --> 00:00:12,766 all learn together how to build the decision tree regression model. 5 00:00:13,166 --> 00:00:17,100 You will see that it will be super easy based on what we did before. 6 00:00:17,100 --> 00:00:22,200 You know, with all the feature scaling and inverse transformation of the SVR, now 7 00:00:22,200 --> 00:00:26,633 we will not have to apply feature scaling and therefore we will just smash this. 8 00:00:26,633 --> 00:00:27,366 All right. 9 00:00:27,366 --> 00:00:28,033 Are you ready? 10 00:00:28,033 --> 00:00:29,133 Are you ready to start? 11 00:00:29,133 --> 00:00:32,133 Before we go into this part two regression folder, 12 00:00:32,200 --> 00:00:35,200 let's just make sure that everyone here is on the same page. 13 00:00:35,333 --> 00:00:38,033 I gave you the link to this folder just before this tutorial. 14 00:00:38,033 --> 00:00:39,966 So you just have to click the link. 15 00:00:39,966 --> 00:00:42,133 And then now we should all be on the same page. 16 00:00:42,133 --> 00:00:45,133 So we're going to go into part two regression. 17 00:00:45,300 --> 00:00:48,300 And then section eight decision tree regression. 18 00:00:48,400 --> 00:00:50,766 We are almost at the end of the regression part. 19 00:00:50,766 --> 00:00:54,266 Congratulations on the great progress you've made so far. 20 00:00:54,533 --> 00:00:55,300 And now we're going to go, 21 00:00:55,300 --> 00:00:59,833 of course, to Python in order to find our files for this section. 22 00:01:00,300 --> 00:01:01,800 So there are two files here.
23 00:01:01,800 --> 00:01:06,566 The Python implementation of the decision tree regression model in .ipynb format, 24 00:01:06,566 --> 00:01:10,033 which you can open with either Google Colab or Jupyter Notebook. 25 00:01:10,333 --> 00:01:13,500 And our same position salaries data set 26 00:01:13,800 --> 00:01:17,333 containing that same data of the previous company, showing the 27 00:01:17,333 --> 00:01:21,966 different position levels from 1 to 10, corresponding to business analyst to CEO, 28 00:01:22,066 --> 00:01:27,633 and the corresponding salaries from $45,000 to $1 million per year. 29 00:01:28,100 --> 00:01:28,500 All right. 30 00:01:28,500 --> 00:01:32,100 So this time we're going to train a decision tree regression model 31 00:01:32,100 --> 00:01:35,700 to understand the correlations between these two variables. 32 00:01:35,900 --> 00:01:38,700 However, I have to say something important here. 33 00:01:38,700 --> 00:01:43,066 The decision tree regression model is not really well adapted to, 34 00:01:43,200 --> 00:01:45,300 you know, these simple data sets, 35 00:01:45,300 --> 00:01:48,566 you know, with only one feature and the dependent variable vector. 36 00:01:48,966 --> 00:01:53,166 You'll see what I mean by that at the end, you know, on the visualization graphs. 37 00:01:53,400 --> 00:01:57,200 But having said that, I would like you not to worry 38 00:01:57,433 --> 00:02:00,666 because the implementation of the decision tree regression model 39 00:02:00,666 --> 00:02:05,233 we are about to build will still work on any other data sets. 40 00:02:05,233 --> 00:02:06,933 You know, with several features. 41 00:02:06,933 --> 00:02:10,066 Here we have one feature, but the code we are about to make will work 42 00:02:10,066 --> 00:02:13,500 for data sets having any number of features. 43 00:02:13,766 --> 00:02:14,100 All right.
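To make the two claims so far a bit more concrete (the same code works for any number of features, and no feature scaling is needed), here is a minimal sketch. It assumes scikit-learn's `DecisionTreeRegressor` and uses a made-up synthetic data set with three features, not the position salaries file. The names `X_rescaled` and `regressor_rescaled`, as well as the `max_depth=3` setting, are choices made only for this illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumed for illustration): 200 rows, 3 integer-valued features.
rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=(200, 3)).astype(float)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 1.0, size=200)

# The call is the same whatever the number of columns in X; no scaler is applied.
regressor = DecisionTreeRegressor(max_depth=3, random_state=0)
regressor.fit(X, y)
pred = regressor.predict(X)

# Rescale each column by a different positive factor and train a second tree.
X_rescaled = X * np.array([1000.0, 10.0, 1.0])
regressor_rescaled = DecisionTreeRegressor(max_depth=3, random_state=0)
regressor_rescaled.fit(X_rescaled, y)
pred_rescaled = regressor_rescaled.predict(X_rescaled)

# Splits depend only on the ordering of values within each feature,
# so both trees should group the rows the same way.
print("features seen:", regressor.n_features_in_)
print("same predictions after rescaling:", np.allclose(pred, pred_rescaled))
```

The point of the comparison is that a tree only asks questions of the form "is this feature below a threshold?", so stretching a feature moves the thresholds but should not change which rows end up together.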
44 00:02:14,100 --> 00:02:17,400 So even if the results won't be beautiful in the end, well, 45 00:02:17,400 --> 00:02:21,366 you will still be able to use this decision tree regression implementation 46 00:02:21,600 --> 00:02:24,900 on your other data sets, even if they have hundreds of features. 47 00:02:24,900 --> 00:02:29,266 But then make sure to add some data preprocessing tools if needed. 48 00:02:29,400 --> 00:02:32,566 For example, if your data set has some categorical data 49 00:02:32,566 --> 00:02:35,866 or missing data, but you don't have to apply feature scaling 50 00:02:35,866 --> 00:02:39,900 for decision tree regression, and neither for random forest regression, 51 00:02:39,900 --> 00:02:42,900 which will be our next model, you know, in the next section. 52 00:02:43,500 --> 00:02:44,000 All right. 53 00:02:44,000 --> 00:02:46,666 So that's the important thing I wanted to say here. 54 00:02:46,666 --> 00:02:51,500 And now we're going to start our implementation by double clicking 55 00:02:51,500 --> 00:02:55,900 this file here which you can either open with Google Colab, 56 00:02:55,900 --> 00:02:59,566 if you like Google Colab like me, or Jupyter Notebook. 57 00:02:59,700 --> 00:03:00,033 All right. 58 00:03:00,033 --> 00:03:01,600 So choose your favorite. 59 00:03:01,600 --> 00:03:04,600 And now let's open Google Colaboratory. 60 00:03:04,633 --> 00:03:08,166 It is opening the notebook and 61 00:03:09,266 --> 00:03:09,966 here we go. 62 00:03:09,966 --> 00:03:12,266 That's the whole implementation. 63 00:03:12,266 --> 00:03:12,566 All right. 64 00:03:12,566 --> 00:03:16,600 So as usual now we're going to create a copy of this notebook. 65 00:03:16,600 --> 00:03:21,600 Because this is in read only mode which means you can't modify it or recode 66 00:03:21,600 --> 00:03:25,900 it. So we're going to go to file here and then click here. 67 00:03:26,100 --> 00:03:28,366 Save a copy in drive.
68 00:03:28,366 --> 00:03:31,433 And this will as you can see create a copy of this notebook 69 00:03:31,766 --> 00:03:35,333 on which you will be able to recode. 70 00:03:35,333 --> 00:03:38,600 You know, re-implement this decision tree regression model. 71 00:03:39,100 --> 00:03:39,933 Perfect. 72 00:03:39,933 --> 00:03:42,266 So now you know the next step. 73 00:03:42,266 --> 00:03:44,633 We're going to delete the code cells. 74 00:03:44,633 --> 00:03:50,700 But since it is the third time we actually work on this position salaries data set, 75 00:03:50,966 --> 00:03:55,100 and of course each time it's the same two first steps of the data 76 00:03:55,100 --> 00:03:58,266 preprocessing phase, importing the libraries and importing the data set, 77 00:03:58,466 --> 00:04:00,900 well, this time we won't re-implement this. 78 00:04:00,900 --> 00:04:03,500 We will just leave them and not delete them. 79 00:04:03,500 --> 00:04:06,766 So we will just delete all the code cells from here. 80 00:04:06,766 --> 00:04:08,466 You know, from the step training the decision tree 81 00:04:08,466 --> 00:04:10,600 regression model on the whole data set. All right. 82 00:04:10,600 --> 00:04:11,533 So let's do this. 83 00:04:11,533 --> 00:04:15,733 Let's start by deleting this one because we will re-implement it together 84 00:04:15,966 --> 00:04:17,366 then this one. 85 00:04:17,366 --> 00:04:19,300 And now this one. 86 00:04:19,300 --> 00:04:20,133 All right. 87 00:04:20,133 --> 00:04:20,766 Perfect. 88 00:04:20,766 --> 00:04:24,433 Also you can notice that at the end we will only visualize 89 00:04:24,433 --> 00:04:27,433 the decision tree regression results in high resolution. 90 00:04:27,433 --> 00:04:30,033 Because you will see and I will show this to you 91 00:04:30,033 --> 00:04:33,533 that the decision tree regression results in low resolution, 92 00:04:33,533 --> 00:04:37,733 you know, without applying the high resolution, will absolutely not make sense.
93 00:04:37,733 --> 00:04:40,200 And I will explain that at the end of this section.
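As a rough preview of why the low resolution view can mislead, here is a small sketch in plain Python. It is a hand-written stand-in for a regression tree (not the notebook's code and not scikit-learn), it splits at midpoints to minimize the sum of squared errors, and the helper names `sse`, `build_tree` and `predict` are hypothetical. The salary values are assumed for illustration; only the range from 45,000 to 1 million comes from the lecture.

```python
# Assumed salaries for levels 1..10; only the 45,000 to 1,000,000 range
# is stated in the lecture.
levels = list(range(1, 11))
salaries = [45000, 50000, 60000, 80000, 110000,
            150000, 200000, 300000, 500000, 1000000]


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(points):
    """Grow a one-feature regression tree from (x, y) points sorted by x."""
    ys = [y for _, y in points]
    mean = sum(ys) / len(ys)
    if len(points) == 1 or len(set(ys)) == 1:
        return {"value": mean}  # leaf: nothing left to separate
    best = None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical x values
        cost = sse(ys[:i]) + sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return {"value": mean}
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return {"threshold": threshold,
            "left": build_tree(points[:i]),
            "right": build_tree(points[i:])}


def predict(tree, x):
    """Walk down the tree until a leaf is reached."""
    while "threshold" in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["value"]


tree = build_tree(sorted(zip(levels, salaries)))

# Low resolution: predict only at the ten training levels.
low_res = [predict(tree, x) for x in levels]
# High resolution: predict on a grid with a step of 0.1 between levels 1 and 10.
grid = [1 + k / 10 for k in range(91)]
high_res = [predict(tree, x) for x in grid]

print("low resolution reproduces the training salaries:",
      low_res == [float(s) for s in salaries])
print("distinct predicted values on the fine grid:", len(set(high_res)))
```

At the ten training levels the tree simply returns the known salaries, so joining those ten points with straight lines would suggest a smooth curve that the model never predicts. On the fine grid the predictions stay flat between thresholds and then jump, which is the staircase shape a decision tree actually produces.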