1 00:00:00,300 --> 00:00:02,733 All right, so we're almost ready to start. 2 00:00:02,733 --> 00:00:06,166 Now we're just going to upload, you know, the data 3 00:00:06,166 --> 00:00:09,166 set by clicking this little folder here. 4 00:00:09,200 --> 00:00:13,533 And right now it is connecting to the runtime to enable file browsing. 5 00:00:13,766 --> 00:00:17,700 And in a second we should be able to see the Upload button. 6 00:00:17,700 --> 00:00:19,566 Here we go, to upload 7 00:00:19,566 --> 00:00:22,400 indeed the data set. 8 00:00:22,400 --> 00:00:24,633 And now we are on my desktop. 9 00:00:24,633 --> 00:00:26,333 That's where I put my Machine Learning 10 00:00:26,333 --> 00:00:29,900 A-Z folder, but make sure to go wherever you saved it. 11 00:00:30,200 --> 00:00:33,200 And now inside we're going to go to Part 2, Regression, 12 00:00:33,366 --> 00:00:36,466 and then Decision Tree Regression, and then Python. 13 00:00:36,466 --> 00:00:37,600 And then there you go. 14 00:00:37,600 --> 00:00:42,566 You select your Position_Salaries data set and click Open. 15 00:00:42,933 --> 00:00:46,133 And this will upload the data set inside the notebook. 16 00:00:46,400 --> 00:00:47,700 And now we're ready to start. 17 00:00:47,700 --> 00:00:50,200 Let's just run these two cells here. 18 00:00:50,200 --> 00:00:52,700 The first one to import the libraries. 19 00:00:52,700 --> 00:00:56,100 And then the second one to import the data set. 20 00:00:56,666 --> 00:00:57,466 All right. Perfect. 21 00:00:57,466 --> 00:00:59,266 So now we have the data set. 22 00:00:59,266 --> 00:01:03,500 And of course the matrix of features X containing only the position levels 23 00:01:03,733 --> 00:01:07,200 and the dependent variable vector containing the salaries. 24 00:01:07,200 --> 00:01:07,966 All right. 25 00:01:07,966 --> 00:01:11,700 And here, a quick reminder about this model that we're about to build. 
26 00:01:11,733 --> 00:01:15,100 You will totally be able to implement it on your data sets. 27 00:01:15,366 --> 00:01:19,066 And you will only have two things to change, which are, first of course, 28 00:01:19,066 --> 00:01:20,000 the name of the data set. 29 00:01:20,000 --> 00:01:23,000 Here you will put the name of your data set. 30 00:01:23,133 --> 00:01:24,600 And then, in the matrix of features, 31 00:01:24,600 --> 00:01:28,100 well, maybe you will want to select all the columns. 32 00:01:28,100 --> 00:01:33,133 Here we just excluded the first column because it contains only the 33 00:01:33,366 --> 00:01:38,033 positions in strings, which are exactly the same as the levels in this column. 34 00:01:38,033 --> 00:01:39,700 So of course we didn't want to include it. 35 00:01:39,700 --> 00:01:41,400 But check your data sets. 36 00:01:41,400 --> 00:01:45,000 Check if you want to include all the columns, and mostly check 37 00:01:45,000 --> 00:01:50,633 if you need to apply some of the tools of your data preprocessing toolkit, 38 00:01:50,766 --> 00:01:53,766 which are either, you know, taking care of missing data 39 00:01:54,000 --> 00:01:57,000 or encoding categorical data. Right? 40 00:01:57,000 --> 00:02:01,466 So you would need to check the variables and see if there are some categorical 41 00:02:01,466 --> 00:02:02,766 variables in strings. 42 00:02:02,766 --> 00:02:06,333 If the order matters, like, for example, the size of clothes, 43 00:02:06,466 --> 00:02:08,733 well, you will apply LabelEncoder. 44 00:02:08,733 --> 00:02:13,033 And if the order doesn't matter, like some countries or some states, 45 00:02:13,166 --> 00:02:17,000 well, you will apply ColumnTransformer with OneHotEncoder. 46 00:02:17,000 --> 00:02:17,666 All right. 47 00:02:17,666 --> 00:02:21,000 And then you don't have to apply feature scaling. 
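The encoding advice above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the toy DataFrame and its column names (`Size`, `Country`, `Age`) are made up. For the ordered column the sketch swaps in scikit-learn's `OrdinalEncoder` with an explicit category order instead of the `LabelEncoder` mentioned in the video, because `LabelEncoder` is documented as a target encoder and sorts labels alphabetically rather than by the order you intend.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data (NOT the course data set), for illustration only.
df = pd.DataFrame({
    "Size": ["S", "L", "M", "S"],                          # order matters
    "Country": ["France", "Spain", "France", "Germany"],   # no order
    "Age": [25, 40, 31, 52],                               # already numeric
})

ct = ColumnTransformer(
    transformers=[
        # Ordered categories: give the order explicitly so that S < M < L.
        ("size", OrdinalEncoder(categories=[["S", "M", "L"]]), ["Size"]),
        # Unordered categories: one binary column per country.
        ("country", OneHotEncoder(), ["Country"]),
    ],
    remainder="passthrough",  # keep the numeric columns unchanged
    sparse_threshold=0,       # always return a dense array
)

X = ct.fit_transform(df)
print(X)
```

The resulting array holds the encoded size first, then one column per country, then the untouched numeric column.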
48 00:02:21,000 --> 00:02:24,000 You can totally split your data set into the training set and test set 49 00:02:24,233 --> 00:02:27,233 if you want to evaluate your model on new observations. 50 00:02:27,333 --> 00:02:30,333 But you don't have to apply feature scaling 51 00:02:30,400 --> 00:02:34,500 for decision tree regression, and neither for random forest regression. 52 00:02:34,766 --> 00:02:35,400 Why is that? 53 00:02:35,400 --> 00:02:37,100 That's because, you know, the predictions 54 00:02:37,100 --> 00:02:41,366 from a decision tree regression or random forest regression model are results 55 00:02:41,466 --> 00:02:45,433 from successive splits of the data, you know, through the different nodes 56 00:02:45,466 --> 00:02:46,300 of your tree, 57 00:02:46,300 --> 00:02:50,000 and therefore there are no equations like with the previous models. 58 00:02:50,000 --> 00:02:53,000 And that's why, of course, no feature scaling is needed 59 00:02:53,000 --> 00:02:55,466 to, you know, split the different values of your feature 60 00:02:55,466 --> 00:02:58,733 into these different categories, leading to different predictions. 61 00:02:58,733 --> 00:03:02,466 We can still do this with the original scale of your features, 62 00:03:02,466 --> 00:03:05,466 even if your features take different ranges of values. 63 00:03:05,700 --> 00:03:06,366 All right. 64 00:03:06,366 --> 00:03:08,866 So remember this: no feature scaling. 65 00:03:08,866 --> 00:03:10,566 And then check for the other tools. 66 00:03:10,566 --> 00:03:13,633 But that's just for your future data sets where you would like 67 00:03:13,633 --> 00:03:17,100 to apply decision tree regression. Okay. 68 00:03:17,100 --> 00:03:19,866 Perfect. So we have everything. We have the data set. 69 00:03:19,866 --> 00:03:22,866 And now we are ready to build the decision tree regression 70 00:03:22,866 --> 00:03:25,166 model on the whole data set. Right, this time 71 00:03:25,166 --> 00:03:26,400 we don't want to split it. 
72 00:03:26,400 --> 00:03:29,933 We want to leverage the maximum data to understand the correlations 73 00:03:29,933 --> 00:03:33,300 in this small amount of information. 74 00:03:33,433 --> 00:03:35,533 So we will train it on the whole data set. 75 00:03:35,533 --> 00:03:37,766 Then we will predict our final result, 76 00:03:37,766 --> 00:03:40,766 you know, the salary of the position level 6.5. 77 00:03:40,900 --> 00:03:44,233 And then at the end we will visualize what the regression 78 00:03:44,233 --> 00:03:47,533 curve of the decision tree regression model looks like. 79 00:03:48,033 --> 00:03:48,533 All right. 80 00:03:48,533 --> 00:03:50,866 So let's do all this in the next three tutorials, 81 00:03:50,866 --> 00:03:54,300 starting with this step: training the decision tree regression model 82 00:03:54,433 --> 00:03:56,100 on the whole data set. 83 00:03:56,100 --> 00:03:57,966 And until then, enjoy machine learning.
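As a rough sketch of the point made earlier about feature scaling, the snippet below trains one tree on raw position levels and another on standardized levels, then checks that both give the same predictions. The salary figures are stand-in values assumed for illustration, not read from the course CSV, and the variable names are hypothetical. It also fits on the whole data set with no train/test split, as described above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Assumed stand-in data: ten position levels and illustrative salaries.
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float) * 1000

# Tree trained on the whole data set, features left on their original scale.
tree_raw = DecisionTreeRegressor(random_state=0).fit(X, y)

# Same model trained on standardized features, for comparison only.
scaler = StandardScaler().fit(X)
tree_scaled = DecisionTreeRegressor(random_state=0).fit(scaler.transform(X), y)

# Compare both trees on query points that sit away from the split thresholds.
X_grid = (np.arange(1.0, 10.0, 0.5) + 0.25).reshape(-1, 1)
grid_raw = tree_raw.predict(X_grid)
grid_scaled = tree_scaled.predict(scaler.transform(X_grid))
print("Same predictions with and without scaling:", np.array_equal(grid_raw, grid_scaled))

# Prediction for position level 6.5, using the unscaled tree.
pred_raw = tree_raw.predict(np.array([[6.5]]))
print("Predicted salary for level 6.5:", pred_raw[0])
```

Because a tree only compares each feature against thresholds, rescaling a feature moves the thresholds but leaves the resulting groups of observations, and therefore the predictions, unchanged.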