All right. So those are the results on the training set. They are quite good, but that's because we trained our linear regression model on the training set. Now let's see how our simple linear regression model predicts new observations, that is, the observations of the test set.

I'll close this. Now let's do the same for the test set to see whether our simple linear regression makes good predictions on new observations: the test set observations on which the model wasn't built.

As usual, we want to be efficient, so we're going to copy all of this and paste it here. Now let's replace training set with test set here, and here as well: Training set becomes Test set, since these are the test set results.

Now we must be careful with something. Let's check whether we really have to replace every occurrence of training set. In the first ggplot component, the one that plots all the real observation points, we have training set years of experience and training set salary. In your opinion, do we need to replace training set with test set here?
Yes, of course, because this time we want the observation points of the test set. We want the years of experience of the test set observations and the salaries of the test set observations, so here we replace training set with test set.

But what about geom_line, the component that builds the regression line? Do we need to replace training set with test set in these two places?

The answer is no, because our regressor is already trained on the training set. Whether we keep the training set here or replace it with the test set, we get the same simple linear regression line. If we did replace it with the test set, we would just compute new points on that same regression line, namely the predictions for the test set observations. When we trained our simple linear regressor on the training set, we obtained one unique model equation: the simple linear equation itself. So whether we build the regression line by predicting the training set points or the test set points, those predictions come from the same unique equation, and we get the same regression line.

So now we're ready: test set here and here, and training set here. We're all good, and we can plot the test set results to find out how our simple linear regression behaves on new observations. Let's do this: press Command + Enter (or Ctrl + Enter) to execute.

And here we go. As I was telling you, if we go back to the training set plot, you can see that only the red points change; the blue regression line stays the same. Here are the training set observation points with the model built on the training set, and here are the test set observations with the same regression line, built on the training set.

Now let's zoom in on the test set plot and interpret it. The test set results are actually not bad: we can see some very good predictions here.
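The claim that the regression line is the same whichever set we predict can be checked numerically. The sketch below is written in Python rather than the R used in the video, and both the toy data and the helper names fit_simple_linear_regression and predict are hypothetical. It fits y = b0 + b1 * x once on a small training set using ordinary least squares, predicts the test inputs with that single equation, and then refits a line through the test set predictions alone. The refit recovers the same intercept and slope, which is why drawing the line from test set predictions would trace the same blue line.

```python
import math


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(coefs, xs):
    """Apply the single fitted equation b0 + b1 * x to any inputs."""
    b0, b1 = coefs
    return [b0 + b1 * x for x in xs]


# Hypothetical toy data: years of experience -> salary (thousands), not the course dataset.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [40.0, 45.0, 52.0, 55.0, 63.0]
test_x = [1.5, 3.5, 6.0]

# Train once, on the training set only.
b0, b1 = fit_simple_linear_regression(train_x, train_y)

# Predictions for training inputs and for test inputs both come from the same equation.
train_pred = predict((b0, b1), train_x)
test_pred = predict((b0, b1), test_x)

# A line fitted through the test set predictions alone has the same intercept and slope.
refit_b0, refit_b1 = fit_simple_linear_regression(test_x, test_pred)

print(f"trained line:            salary = {b0:.2f} + {b1:.2f} * years")
print(f"line through test preds: salary = {refit_b0:.2f} + {refit_b1:.2f} * years")
```

With these toy values, both lines print as salary = 34.20 + 5.60 * years: predicting new inputs only adds points to the existing line, it never moves it.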
Look at this employee, this employee, this employee, and this one as well: the real value is close to the prediction, which lies on the blue regression line. Since these are new observations from which our simple linear regression model didn't learn any correlation, finding predictions so close to the real values is actually quite a good job. Of course, some predictions are further from the real value, but that's because the dependency between salary and years of experience is not 100% linear. There is, however, clearly a certain linear dependency.

All right, thank you for watching this tutorial on simple linear regression. In the next section, we'll learn about a new type of linear regression, multiple linear regression: instead of having only one independent variable, we'll have several independent variables predicting a single dependent variable. There will be a new business problem, and I can't wait to solve it with you using multiple linear regression. I'll see you in the next section, and until then, enjoy machine learning.
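In the tutorial, the quality of the test set predictions is judged by eye from the plot. A common way to put a number on it, not shown in the video, is to compute the residuals (real value minus prediction) on the test set and summarize them with the root mean squared error (RMSE) and the coefficient of determination R². The sketch below is in Python rather than the R used in the tutorial; the data are hypothetical toy values, not the course's salary dataset, and fit_simple_linear_regression is a hypothetical helper.

```python
import math


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical toy data (years of experience, salary in thousands).
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [40.0, 45.0, 52.0, 55.0, 63.0]
test_x = [1.5, 3.5, 6.0]
test_y = [43.0, 53.0, 70.0]

# The model is trained on the training set only, then applied to the unseen test set.
b0, b1 = fit_simple_linear_regression(train_x, train_y)
test_pred = [b0 + b1 * x for x in test_x]

# Residual = real value minus prediction; a small residual means the point is close to the line.
residuals = [y - y_hat for y, y_hat in zip(test_y, test_pred)]
ss_res = sum(r ** 2 for r in residuals)
rmse = math.sqrt(ss_res / len(residuals))

# R^2 on the test set: 1 - SS_res / SS_tot, with the test set mean as the baseline prediction.
mean_test_y = sum(test_y) / len(test_y)
ss_tot = sum((y - mean_test_y) ** 2 for y in test_y)
r2_test = 1 - ss_res / ss_tot

for x, y, y_hat in zip(test_x, test_y, test_pred):
    print(f"years={x:.1f}  real={y:.1f}  predicted={y_hat:.1f}  residual={y - y_hat:+.1f}")
print(f"test RMSE = {rmse:.2f}, test R^2 = {r2_test:.3f}")
```

With these toy values the residuals are 0.4, -0.8 and 2.2, so most predictions sit close to the real value and one sits further away. Using the test set's own mean as the R² baseline is one common convention; an R² near 1 indicates a strong, but not perfect, linear dependency.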