1 00:00:00,066 --> 00:00:00,466 Great. 2 00:00:00,466 --> 00:00:04,466 And now let's add the second component, which is the predictions. 3 00:00:04,500 --> 00:00:08,633 So I'm going to add a plus here and add my second component here. 4 00:00:08,900 --> 00:00:11,933 So the second component is, as I just said, the predictions. 5 00:00:12,266 --> 00:00:16,666 And we are going to add this predictions component with the geom_line function. 6 00:00:16,666 --> 00:00:19,666 So exactly like we did in simple linear regression. 7 00:00:19,666 --> 00:00:25,200 So geom_line here, geom underscore line, and then parentheses of course. 8 00:00:25,200 --> 00:00:30,733 And again we are going to take the aesthetic function, aes, combined with a color. 9 00:00:30,933 --> 00:00:33,933 So I'm just going to copy all this, right. 10 00:00:34,200 --> 00:00:37,200 And paste it in my geom_line function. 11 00:00:37,433 --> 00:00:37,800 All right. 12 00:00:37,800 --> 00:00:40,166 And now of course we will change a few things. 13 00:00:40,166 --> 00:00:43,966 So first let's start with the simplest: let's change the color. 14 00:00:43,966 --> 00:00:47,100 We're going to pick a blue color for predictions. 15 00:00:47,966 --> 00:00:48,833 Here we go. 16 00:00:48,833 --> 00:00:54,033 And now what we need to change are the coordinates of our prediction points. 17 00:00:54,566 --> 00:00:58,100 So the x coordinates of our prediction points are still going 18 00:00:58,100 --> 00:00:59,466 to be the levels themselves. 19 00:00:59,466 --> 00:01:03,600 Because we're just predicting the salaries of the ten levels 20 00:01:03,600 --> 00:01:04,733 that we have in our dataset. 21 00:01:04,733 --> 00:01:08,866 So here we obviously need to take only the levels that we have in our dataset. 22 00:01:09,300 --> 00:01:12,366 So we keep here dataset$Level. 
23 00:01:13,033 --> 00:01:17,100 But then for the y coordinates of course something is going to change here 24 00:01:17,100 --> 00:01:18,833 because by taking dataset$Salary 25 00:01:18,833 --> 00:01:22,933 here, we are taking the real salaries of our ten levels. 26 00:01:23,100 --> 00:01:26,100 That is, the salaries that are in the Salary column of our dataset. 27 00:01:26,100 --> 00:01:27,333 Here. 28 00:01:27,333 --> 00:01:31,933 So of course we need to remove this and replace it with something else. 29 00:01:32,366 --> 00:01:34,500 And what is this something else going to be? 30 00:01:34,500 --> 00:01:37,400 Well of course it's going to be the predictions 31 00:01:37,400 --> 00:01:40,966 of the salaries of the ten levels according to the linear regression model. 32 00:01:41,266 --> 00:01:44,933 And to get these predictions, we are going to use of course the predict 33 00:01:45,166 --> 00:01:46,066 function. 34 00:01:46,066 --> 00:01:49,600 And so actually that's exactly the same as in simple linear regression. 35 00:01:49,933 --> 00:01:53,366 In this predict function we first need to specify the regressor. 36 00:01:53,666 --> 00:01:56,800 So the regressor is called lin_reg because you know 37 00:01:56,800 --> 00:01:58,766 we are plotting the linear regression results. 38 00:01:58,766 --> 00:02:01,766 And our linear regression model was called lin_reg. 39 00:02:02,033 --> 00:02:04,300 That's our regressor. So lin_reg right here. 40 00:02:04,300 --> 00:02:08,066 And now the second argument, remember, is newdata. 41 00:02:08,100 --> 00:02:10,633 So newdata here. 42 00:02:10,633 --> 00:02:14,466 And that's actually the data points we want to make predictions on. 43 00:02:14,633 --> 00:02:18,500 So very simply that's our dataset because we want to get 44 00:02:18,500 --> 00:02:22,200 the predictions of the salaries of the ten levels of our dataset. 45 00:02:22,266 --> 00:02:24,433 So here we input indeed dataset 46 00:02:25,433 --> 00:02:26,133 there. 
47 00:02:26,133 --> 00:02:27,000 Perfect. 48 00:02:27,000 --> 00:02:32,266 So we're good because we already set the blue color for our predictions. 49 00:02:32,566 --> 00:02:33,400 Great. 50 00:02:33,400 --> 00:02:37,966 And now let's complete this graph very quickly by adding a title. A plus here. 51 00:02:38,233 --> 00:02:43,200 And then you know we use the ggtitle function to add a title. 52 00:02:43,200 --> 00:02:44,433 Can you see how simple it is? 53 00:02:44,433 --> 00:02:46,766 It's starting to get really intuitive now. 54 00:02:46,766 --> 00:02:49,766 You know we are plotting the real observation points with the geom_point 55 00:02:49,766 --> 00:02:52,933 function, then the predictions with the geom_line function, 56 00:02:53,066 --> 00:02:55,566 and now a title with the ggtitle function. 57 00:02:55,566 --> 00:02:58,866 Well, everything will seem easier and easier for you. 58 00:02:59,600 --> 00:03:02,800 So for the title we're going to add, as in Python, Truth 59 00:03:03,600 --> 00:03:06,100 or Bluff. 60 00:03:06,100 --> 00:03:08,400 You know, just to give a funny name to our problem. 61 00:03:08,400 --> 00:03:12,333 And we're going to specify here that we are plotting the linear regression result. 62 00:03:12,366 --> 00:03:16,633 So we will just specify here linear regression. 63 00:03:18,033 --> 00:03:18,833 All right. 64 00:03:18,833 --> 00:03:20,100 So that's it for the title. 65 00:03:20,100 --> 00:03:23,233 And now let's add an x label and a y label. 66 00:03:23,766 --> 00:03:26,433 So for the x label we're going to add simply here 67 00:03:26,433 --> 00:03:29,266 xlab, parentheses, then quotes. 68 00:03:29,266 --> 00:03:31,033 And then Level. 69 00:03:31,033 --> 00:03:33,766 Because our x axis will contain the levels. 70 00:03:33,766 --> 00:03:38,400 And then plus ylab, parentheses, 71 00:03:39,033 --> 00:03:43,566 quotes and Salary, and done. 72 00:03:43,833 --> 00:03:46,933 The linear regression results are ready to be plotted. 
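[Editor's sketch] The plot assembled in the narration above might look roughly like the R code below. This is a minimal sketch, not the instructor's exact file: the salary values are hypothetical stand-in data, the object names `dataset` and `lin_reg` are assumed from what is said on screen, the formula `Salary ~ Level` is an assumption, and the exact title string is a guess at the wording.

```r
library(ggplot2)

# Hypothetical stand-in data: ten levels with made-up salaries
# (NOT the values used in the video).
dataset <- data.frame(
  Level  = 1:10,
  Salary = c(40, 48, 55, 70, 95, 130, 180, 260, 420, 800) * 1000
)

# Linear regression of Salary on Level; the name lin_reg is assumed.
lin_reg <- lm(formula = Salary ~ Level, data = dataset)

p <- ggplot() +
  # Component 1: the real observations, as red points.
  geom_point(aes(x = dataset$Level, y = dataset$Salary),
             colour = "red") +
  # Component 2: the model's predictions for the same ten levels, as a blue line.
  geom_line(aes(x = dataset$Level, y = predict(lin_reg, newdata = dataset)),
            colour = "blue") +
  # Title and axis labels (title wording is an assumption).
  ggtitle("Truth or Bluff (Linear Regression)") +
  xlab("Level") +
  ylab("Salary")

print(p)
```

Each `+` adds one layer or label to the plot, which is why the points, the line, the title, and the axis labels can be written one after another. Recent ggplot2 versions may warn about using `dataset$Level` inside `aes()`; the style is kept here only to mirror the narration.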
73 00:03:47,466 --> 00:03:51,000 So let's not wait anymore and let's immediately have a look. 74 00:03:51,433 --> 00:03:53,500 So I'm going to select all this. 75 00:03:53,500 --> 00:03:56,500 I don't need to select this because ggplot2 is already imported. 76 00:03:56,833 --> 00:03:59,833 And now let's press Command or Control plus Enter to execute. 77 00:04:01,366 --> 00:04:04,066 And here are the linear regression results. 78 00:04:04,066 --> 00:04:06,900 So let's zoom in on it and let's interpret. 79 00:04:06,900 --> 00:04:07,800 Great. Okay. 80 00:04:07,800 --> 00:04:10,066 So what is the first thing to understand here? 81 00:04:10,066 --> 00:04:13,200 Well the most important thing to understand is definitely that 82 00:04:13,366 --> 00:04:16,933 we need to make the distinction between the real observation points 83 00:04:16,933 --> 00:04:18,766 that are the red points here. 84 00:04:18,766 --> 00:04:21,800 Each point corresponds to one specific level 85 00:04:22,133 --> 00:04:25,933 and the real salary associated with this level on the y axis here. 86 00:04:26,233 --> 00:04:29,433 And then we have our predictions on this line. That is, 87 00:04:29,733 --> 00:04:33,300 for each level, let's say for example level five, 88 00:04:33,766 --> 00:04:37,166 when we project this level five here on this straight line 89 00:04:37,166 --> 00:04:40,166 here, this point is actually our prediction point. 90 00:04:40,400 --> 00:04:43,766 And we can get the predicted salary by projecting 91 00:04:43,766 --> 00:04:46,800 this point here on the y axis this way. 92 00:04:47,400 --> 00:04:50,533 And so we get a little less than $250,000. 93 00:04:50,900 --> 00:04:52,766 Okay. So that's the first thing to understand. 94 00:04:52,766 --> 00:04:55,066 The red points are the real observation points. 95 00:04:55,066 --> 00:04:58,233 And the points on this blue straight line are the prediction points. 96 00:04:58,600 --> 00:04:59,133 Okay. 
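[Editor's sketch] Reading a prediction off the line by projecting onto the y axis can also be done numerically. The sketch below is only an illustration under stated assumptions: the data are hypothetical stand-in values, the names `dataset` and `lin_reg` and the formula `Salary ~ Level` are assumed, and `level_5` and `pred_5` are names introduced here. Because the data are made up, the printed number will not match the figure quoted in the video.

```r
# Hypothetical stand-in data (NOT the values used in the video).
dataset <- data.frame(
  Level  = 1:10,
  Salary = c(40, 48, 55, 70, 95, 130, 180, 260, 420, 800) * 1000
)

# Linear regression of Salary on Level; the name lin_reg is assumed.
lin_reg <- lm(formula = Salary ~ Level, data = dataset)

# A one-row data frame holding the level we want a prediction for.
# The column name must match the predictor used in the formula.
level_5 <- data.frame(Level = 5)

# The predicted salary is the height of the blue line at Level = 5.
pred_5 <- predict(lin_reg, newdata = level_5)
print(pred_5)
```

This is the same value as intercept plus slope times 5, which is what the projection on the graph shows visually.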
97 00:04:59,133 --> 00:05:01,100 Now the second important thing to understand 98 00:05:01,100 --> 00:05:04,733 is that our prediction result is actually a straight line. 99 00:05:04,966 --> 00:05:07,966 And the fact that it's a straight line is due to a particular reason. 100 00:05:08,233 --> 00:05:12,933 It's that the linear regression model is obviously a linear model. 101 00:05:13,500 --> 00:05:18,000 Each time you build a linear model in two dimensions, you'll get a straight line. 102 00:05:18,300 --> 00:05:21,800 And I'm highlighting this because the next model whose results we are going to visualize 103 00:05:21,800 --> 00:05:24,800 is going to be the polynomial regression model. 104 00:05:24,966 --> 00:05:28,333 And this model turns out to be a non-linear regression model. 105 00:05:28,500 --> 00:05:31,200 And you will see that it will not be a straight line anymore.
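[Editor's sketch] One way to check the straight-line claim numerically: with a single predictor, each one-level step should raise the predicted salary by the same amount, namely the fitted slope. The sketch below assumes hypothetical stand-in data, the names `dataset` and `lin_reg`, and the formula `Salary ~ Level`; `steps` and `slope` are names introduced here for illustration.

```r
# Hypothetical stand-in data (NOT the values used in the video).
dataset <- data.frame(
  Level  = 1:10,
  Salary = c(40, 48, 55, 70, 95, 130, 180, 260, 420, 800) * 1000
)

# Linear regression of Salary on Level; the name lin_reg is assumed.
lin_reg <- lm(formula = Salary ~ Level, data = dataset)

# Predicted salaries for levels 1 to 10, in order.
y_pred <- predict(lin_reg, newdata = dataset)

# Differences between consecutive predictions (level 2 minus level 1, and so on).
steps <- diff(y_pred)

# The fitted slope: the change in predicted salary per one-level increase.
slope <- coef(lin_reg)[["Level"]]

# If the predictions lie on a straight line, every step equals the slope
# (up to floating-point tolerance).
is_straight <- isTRUE(all.equal(unname(steps), rep(slope, length(steps))))
print(is_straight)
```

The real salaries do not have constant steps, which is why the red points curve away from the blue line.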