1 00:00:00,100 --> 00:00:01,400 And now, the final step. 2 00:00:01,400 --> 00:00:05,066 You know, with this line of code, we actually built the multiple linear 3 00:00:05,066 --> 00:00:07,900 regression model. So we have the model already, right? 4 00:00:07,900 --> 00:00:08,666 Now it is dumb. 5 00:00:08,666 --> 00:00:11,466 You know, it is not trained yet on the data set. 6 00:00:11,466 --> 00:00:14,066 So now we're going to make it smart by training 7 00:00:14,066 --> 00:00:18,766 it indeed on our training set, composed of, remember, X_train and y_train. 8 00:00:19,100 --> 00:00:19,666 So there we go. 9 00:00:19,666 --> 00:00:22,666 Let's take our regressor object, 10 00:00:23,600 --> 00:00:26,433 from which we're going to call the fit 11 00:00:26,433 --> 00:00:29,066 method, which takes as input, 12 00:00:29,066 --> 00:00:31,033 right, exactly the same as before: 13 00:00:31,033 --> 00:00:34,800 first, the matrix of features X_train, 14 00:00:35,266 --> 00:00:40,033 and second, the dependent variable vector y_train of the training set. 15 00:00:40,033 --> 00:00:40,666 Of course. 16 00:00:40,666 --> 00:00:42,300 All right. And that's it. 17 00:00:42,300 --> 00:00:45,733 This line of code will build our multiple linear regression model. 18 00:00:45,733 --> 00:00:49,166 And this line of code will train it on our training set. 19 00:00:49,766 --> 00:00:50,633 All right. Perfect. 20 00:00:50,633 --> 00:00:52,666 So now let's just execute the cell. 21 00:00:52,666 --> 00:00:55,633 So let me check the cells that we didn't execute before. 22 00:00:55,633 --> 00:00:58,400 So we executed everything up to here. 23 00:00:58,400 --> 00:01:01,666 So we still have to execute this cell, 24 00:01:01,733 --> 00:01:05,100 you know, to split indeed the data set into the training set and test set. 25 00:01:05,400 --> 00:01:06,800 And now we have the training set.
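A minimal sketch of the two lines described above (build the model, then train it), assuming scikit-learn's `LinearRegression` and `train_test_split`. The data here is a synthetic stand-in, not the course data set: 50 rows and four numeric feature columns are assumptions made only so the example runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the data set (assumption): 50 observations, 4 features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 4))
y = X @ np.array([0.8, 0.1, 0.05, 0.02]) + rng.normal(0, 1, size=50)

# Split into training set (80%) and test set (20%).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# This line builds the model; it is not trained yet.
regressor = LinearRegression()

# This line trains it on the training set: features first, then the target.
regressor.fit(X_train, y_train)
```

After `fit`, the learned coefficients are available as `regressor.coef_` (one per feature) and the intercept as `regressor.intercept_`.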
26 00:01:06,800 --> 00:01:10,233 And therefore we can proceed to this next step to train 27 00:01:10,400 --> 00:01:13,200 the multiple linear regression model on the training set. 28 00:01:13,200 --> 00:01:16,200 So here we go. Let's do it, clicking play. 29 00:01:16,200 --> 00:01:17,400 And there we go. 30 00:01:17,400 --> 00:01:22,700 We now have a fully trained linear regression model on this data set. 31 00:01:22,700 --> 00:01:26,266 So now we have a model that was trained to understand the correlations 32 00:01:26,266 --> 00:01:30,966 between the different types of spends by 50 startups and their profit, 33 00:01:30,966 --> 00:01:34,800 so that now investors can deploy this model on new startups 34 00:01:35,000 --> 00:01:39,300 in order to predict what profit they will generate based on this information. 35 00:01:39,600 --> 00:01:42,166 All right. Perfect. So congratulations. 36 00:01:42,166 --> 00:01:44,366 Now let's recap, you know, three things. 37 00:01:44,366 --> 00:01:46,600 Thanks to this amazing LinearRegression class, 38 00:01:46,600 --> 00:01:49,600 you don't have to worry about the dummy variable trap. 39 00:01:49,600 --> 00:01:53,633 You don't have to worry either about selecting the best features, 40 00:01:53,633 --> 00:01:54,300 meaning the features 41 00:01:54,300 --> 00:01:57,600 with the lowest p-value, or the most statistically significant. 42 00:01:57,900 --> 00:02:00,466 The LinearRegression class will take care of that as well. 43 00:02:00,466 --> 00:02:04,666 And also, you know how to build and train a multiple linear regression model 44 00:02:04,900 --> 00:02:08,733 on a data set, which you also know how to preprocess like a pro. 45 00:02:09,000 --> 00:02:09,733 All right. 46 00:02:09,733 --> 00:02:10,866 So we're now 47 00:02:10,866 --> 00:02:14,933 going to move on to the last step, which will be to predict the test results. 48 00:02:14,933 --> 00:02:19,066 Because remember that so far our model was trained on the training set.
49 00:02:19,233 --> 00:02:23,000 And therefore we still need to check its performance on new observations, 50 00:02:23,233 --> 00:02:25,400 which are contained in the test set. 51 00:02:25,400 --> 00:02:27,966 However, here we have to understand something important, 52 00:02:27,966 --> 00:02:29,200 you know, compared to what we did 53 00:02:29,200 --> 00:02:32,066 in the previous section on simple linear regression. 54 00:02:32,066 --> 00:02:35,300 This time we actually have several features, right? 55 00:02:35,300 --> 00:02:38,300 We actually have four features instead of one like before. 56 00:02:38,333 --> 00:02:41,400 And therefore we cannot actually plot a graph 57 00:02:41,566 --> 00:02:44,100 like we did in simple linear regression where, you know, 58 00:02:44,100 --> 00:02:48,100 we have the feature on the x axis and the dependent variable on the y axis. 59 00:02:48,100 --> 00:02:50,000 Because, simply, there are four features, 60 00:02:50,000 --> 00:02:51,966 we would actually need a five-dimensional graph, 61 00:02:51,966 --> 00:02:54,400 which is not possible to visualize as humans. 62 00:02:54,400 --> 00:02:57,633 So what we're going to do instead, you know, instead of visualizing 63 00:02:57,900 --> 00:03:00,666 the test results on a plot, well, 64 00:03:00,666 --> 00:03:03,866 we will actually display two vectors. 65 00:03:04,266 --> 00:03:08,766 The first one is the vector of the real profits in the test set. 66 00:03:09,200 --> 00:03:13,600 And remember that the test set actually accounts for 20% of the whole data set. 67 00:03:13,866 --> 00:03:16,766 And therefore, since here there are, you know, actually 68 00:03:16,766 --> 00:03:19,766 50 observations corresponding to the 50 startups, 69 00:03:20,000 --> 00:03:24,066 well, 20% of 50 will be ten observations in the test set. 70 00:03:24,366 --> 00:03:27,000 So we will actually have ten samples in the test set. 71 00:03:27,000 --> 00:03:28,966 And we will display two vectors.
72 00:03:28,966 --> 00:03:33,433 First, the vector of the ten real profits from the test set, 73 00:03:33,433 --> 00:03:37,666 you know, the ten ground truths, and second, the vector of the ten predicted 74 00:03:37,666 --> 00:03:42,000 profits of the same test set, so that we can compare, for each startup 75 00:03:42,000 --> 00:03:46,400 of the test set, if our predicted profit is close to the real profit. 76 00:03:46,400 --> 00:03:48,733 And that's how we will, you know, evaluate our model. 77 00:03:48,733 --> 00:03:51,466 And then later on in this part, you will actually learn 78 00:03:51,466 --> 00:03:55,366 about some evaluation techniques to better measure 79 00:03:55,366 --> 00:03:58,900 the performance of your regression models with some relevant metrics. 80 00:03:58,900 --> 00:04:00,933 But so far, this is how we will do it. 81 00:04:00,933 --> 00:04:03,666 We will clearly see if, you know, our model performs 82 00:04:03,666 --> 00:04:05,733 well on new observations, you know, on the test set, 83 00:04:05,733 --> 00:04:10,166 because we will clearly see if the predictions are close to the real results. 84 00:04:10,400 --> 00:04:11,100 Okay. 85 00:04:11,100 --> 00:04:13,833 So you can try to do it yourself, at least the first step, 86 00:04:13,833 --> 00:04:16,833 which is to get the vector of predicted profits. 87 00:04:16,833 --> 00:04:20,933 But then in order to display the two vectors, you know, next to each other, 88 00:04:21,166 --> 00:04:24,466 well, we'll have to use certain tricks which are not necessarily obvious. 89 00:04:24,600 --> 00:04:26,800 So we'll do that together. All right. 90 00:04:26,800 --> 00:04:29,933 So as soon as you're ready, let's proceed to the next tutorial. 91 00:04:29,933 --> 00:04:31,766 And until then, enjoy machine learning.
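One possible way to sketch the comparison described above, assuming scikit-learn and NumPy. The data is a synthetic stand-in (50 rows, four numeric features, an assumption), and `np.column_stack` is just one option for placing the two vectors next to each other; the next tutorial may use a different trick.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the data set (assumption): 50 observations, 4 features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 4))
y = X @ np.array([0.8, 0.1, 0.05, 0.02]) + rng.normal(0, 1, size=50)

# 20% of 50 observations gives 10 samples in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# First step: the vector of predicted values for the test set.
y_pred = regressor.predict(X_test)

# Second step: real values (column 0) next to predictions (column 1).
comparison = np.column_stack((y_test, y_pred))

np.set_printoptions(precision=2, suppress=True)
print(comparison)
```

Each printed row corresponds to one test observation, so a row whose two numbers are close means the prediction for that observation is close to the ground truth.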