And now the next one is the other vector we want to concatenate to that vector of predicted profits: the vector of real profits. All right. Here we can do this very efficiently, because it's exactly the same trick. I'm going to copy all this, paste it here, and just replace y_pred by... what do we have to replace y_pred by? Well, of course, we have to replace it by y_test, because y_test contains the real profits of the test set. Remember, here we are evaluating our model on the test set. So here we go, replacing y_pred by y_test. And the same inside len(), although we could actually keep len(y_pred) there, because y_test has the same length as y_pred. There we go. Now we have a beautiful concatenation of two vertical vectors. But remember: everything up to here is only the first argument of the concatenate function, so we need to add the second one, which is the axis, as you can see. The axis here can take two values, 0 or 1. Zero means that we want to do a vertical concatenation, and one means that we want to do a horizontal concatenation.
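To see what the two axis values do, here is a minimal NumPy sketch. It assumes y_pred and y_test are 1-D arrays of equal length; the values are made up, and the names pred_col, test_col, stacked_vertically, side_by_side and positional are hypothetical. It also shows that the axis can be passed positionally as the second argument.

```python
import numpy as np

# Hypothetical 1-D arrays standing in for y_pred and y_test.
y_pred = np.array([10.0, 20.0, 30.0])
y_test = np.array([11.0, 19.0, 31.0])

# reshape(len(v), 1) turns each 1-D array into a vertical (column) vector of shape (3, 1).
pred_col = y_pred.reshape(len(y_pred), 1)
test_col = y_test.reshape(len(y_test), 1)

# axis=0: stack the two columns one under the other -> shape (6, 1).
stacked_vertically = np.concatenate((pred_col, test_col), axis=0)

# axis=1: place the two columns side by side -> shape (3, 2).
side_by_side = np.concatenate((pred_col, test_col), axis=1)

# axis is the second positional parameter, so the keyword can be omitted.
positional = np.concatenate((pred_col, test_col), 1)

print(stacked_vertically.shape)  # (6, 1)
print(side_by_side.shape)        # (3, 2)
print(side_by_side)
```

With axis=1, each row pairs one predicted value with its real counterpart, which is the layout you want for comparing them.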
And since here we want to put two vertical vectors next to each other, that concatenation is actually horizontal, so we have to input axis equals 1 here. We don't have to specify the name of the argument, because it is passed in the expected order: axis is the second positional parameter. All right, good. Now we're going to observe the final result and see if our model was able to return predictions, that is, predicted profits close to the real profits. So there we go, let's press play to run the cell. And awesome, I didn't make any mistakes. Perfect.

So let's recap. We can clearly see that we have two vectors here: that's the first one, and that's the second one. On the left we have the vector of predicted profits, so that's y_pred, and on the right we have the vector of real profits for, of course, the ten startups of the test set. All right, let's see what we get, and whether our predicted profits are close to the real profits. For the first startup of the test set, the predicted profit is around 103,000, and the real profit is also just over 103,000. So very close.
That's perfect, an amazing first prediction. Then, for the second startup of the test set, the predicted profit is 132,582 and the real profit is actually 144,000. So not as good a prediction as the first one, but still not too bad. Third startup: 132,000 predicted versus 146,000 real. Still not great, but not too bad either. Fourth startup: 71,000, actually 72,000, versus 78,000. All right, so pretty close. Then 178,000 versus 191,000, okay. 116,000 versus 105,000. So the first prediction was amazing, but the other ones are still quite good. Then about 67,000 to 68,000 versus 81,000, okay. 98,000 versus 97,000: very good. About 114,000 versus 110,000: very, very good. And 167,000 versus 166,000: amazing predictions.

So we have some amazing predictions, very close to the real profits, and some okay predictions, okay in the sense that they are not too far from the real results. So from what we see, we could say that multiple linear regression is well adapted to this data set. The data set does not necessarily have perfect linear correlations.
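The walkthrough above judges "close" by eye. One way to put a number on it is sketched below: it builds the same side-by-side array and computes the absolute and percentage error for each row, plus the mean absolute error. This is an assumption on top of the lecture, which doesn't compute an error metric here; the profit values are made up and are not the course's results, and comparison, abs_error and pct_error are hypothetical names.

```python
import numpy as np

# Hypothetical predicted and real profits (made up, not the lecture's numbers).
y_pred = np.array([101500.0, 130200.0, 75100.0, 160900.0])
y_test = np.array([103000.0, 141800.0, 78600.0, 158300.0])

# Side-by-side view: column 0 = predicted, column 1 = real.
comparison = np.concatenate((y_pred.reshape(len(y_pred), 1),
                             y_test.reshape(len(y_test), 1)), 1)

# Per-row absolute error and error as a percentage of the real profit.
abs_error = np.abs(comparison[:, 0] - comparison[:, 1])
pct_error = 100 * abs_error / comparison[:, 1]

for (pred, real), err, pct in zip(comparison, abs_error, pct_error):
    print(f"predicted {pred:>10.0f}  real {real:>10.0f}  off by {err:>8.0f} ({pct:.1f}%)")

print(f"mean absolute error: {abs_error.mean():.0f}")
```

Percentage error is handy when profits span very different scales, since an error of 10,000 means more for a 78,000 startup than for a 190,000 one.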
However, you can be assured that this LinearRegression class was able to find the right coefficient for each feature to make these predictions. And even if you tune your linear regression model, for example by applying backward elimination to select a subset of more statistically significant features, you will actually get similar results. You can try it; that would actually be good practice, and we do it in the R section. But in terms of performance this won't change much. And remember, your goal is to be efficient when building and testing your machine learning models. So when you get such results with multiple linear regression, in real life you will try other models, which you can also tune, and in the end you will compare the performance of each of these models and select the best one. We'll talk about this again at the end of this section, and also a lot in the part on model selection.

And so now I have to say congratulations, because you now know how to build another machine learning model, multiple linear regression, which you can add to your toolkit thanks to this new code template. Perfect.
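Comparing candidate models and keeping the best one, as described above, needs a common yardstick. The sketch below assumes R² on the test set as that yardstick (the lecture doesn't name a metric here) and uses NumPy's least-squares solver, np.linalg.lstsq, as a stand-in for the LinearRegression class. The data is synthetic, and the feature subsets are hand-picked for illustration rather than chosen by backward elimination, which would also look at p-values. The helper names fit_ols, predict_ols and r2_score are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept column; returns [intercept, coefficients...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Apply coefficients from fit_ols to new rows."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ coef

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical synthetic data: the target depends on columns 0 and 1; column 2 is irrelevant.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 3))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 10, size=50)

# Simple 80/20 split: first 40 rows for training, last 10 for testing.
X_train, X_test = X[:40], X[40:]
y_train, y_test = y[:40], y[40:]

# Candidate models: all features versus hand-picked feature subsets.
candidates = {
    "all features": [0, 1, 2],
    "columns 0 and 1": [0, 1],
    "column 0 only": [0],
}

scores = {}
for name, cols in candidates.items():
    coef = fit_ols(X_train[:, cols], y_train)
    scores[name] = r2_score(y_test, predict_ols(coef, X_test[:, cols]))

# Keep the candidate with the highest test-set R^2.
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name:16s} R^2 = {score:.3f}")
print("selected:", best)
```

On data like this, dropping the irrelevant column typically changes R² very little, while dropping a column that really drives the target costs a lot, which is the kind of comparison the lecture has in mind when it says tuning the feature set won't change performance much.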
So now we're going to move on to R. As a reminder, you don't have to master both programming languages. If you want to master both, that's fine: join me in the R tutorials. Otherwise, if you want to stick to Python, feel free to skip the R section and join us, Kirill and me, in the next section on polynomial regression, where you will learn how to make predictions on a non-linear data set, that is, a data set with non-linear relationships, for which a multiple linear regression model would not be appropriate. So it's an absolutely essential model to add to your toolkit, and you will add it in the next section. Until then, enjoy machine learning!