1 00:00:00,300 --> 00:00:01,033 And now let's add 2 00:00:01,033 --> 00:00:04,500 the next component, which is our regression line. 3 00:00:04,933 --> 00:00:07,933 So here we add a little plus and then press Enter. 4 00:00:08,766 --> 00:00:12,600 So to plot the observation points we used geom_point. 5 00:00:12,900 --> 00:00:16,133 And now, very intuitively, to plot the regression line 6 00:00:16,166 --> 00:00:19,166 we're going to use geom_line. 7 00:00:19,233 --> 00:00:21,600 All right. So pretty easy to remember. 8 00:00:21,600 --> 00:00:23,866 And then in geom_line, same as for the points, 9 00:00:23,866 --> 00:00:28,600 we're going to take the x coordinate and the y coordinate of our regression line. 10 00:00:29,200 --> 00:00:34,200 So let's see. First we need to take the aes function, then parentheses. 11 00:00:34,633 --> 00:00:37,100 And that's where we put the x and the y. 12 00:00:37,100 --> 00:00:40,100 So in your opinion, what is going to be the x? 13 00:00:40,133 --> 00:00:44,533 Well, the x is going to be the years of experience of the training set, 14 00:00:44,700 --> 00:00:47,100 because we want to have the predictions 15 00:00:47,100 --> 00:00:51,400 of the salaries for the observations in the training set. 16 00:00:51,400 --> 00:00:54,600 So on the x axis we need to have the number of years 17 00:00:54,600 --> 00:00:57,333 of experience of the training set observations. 18 00:00:57,333 --> 00:00:59,133 So here I'm just going to copy this 19 00:01:01,566 --> 00:01:04,333 there and 20 00:01:04,333 --> 00:01:06,100 paste it here. 21 00:01:06,100 --> 00:01:09,100 But now, what is going to be the y? 22 00:01:10,133 --> 00:01:11,966 It's not going to be the training set salary, 23 00:01:11,966 --> 00:01:15,033 obviously, because we don't want to take the real salaries now. 24 00:01:15,033 --> 00:01:17,400 We want to take the predicted salaries. 
25 00:01:17,400 --> 00:01:19,933 So in order to take the predicted salaries 26 00:01:19,933 --> 00:01:23,233 we will very simply use this predict function again. 27 00:01:23,633 --> 00:01:26,866 We cannot reuse y_pred here, because y_pred 28 00:01:26,900 --> 00:01:29,900 is the predicted salaries of the test set observations. 29 00:01:30,266 --> 00:01:33,500 But here we want to get the training set predicted salaries. 30 00:01:33,900 --> 00:01:37,433 So we will copy this predict 31 00:01:39,000 --> 00:01:41,000 to take the predictions 32 00:01:41,000 --> 00:01:44,833 of the training set salaries, and therefore in newdata 33 00:01:44,833 --> 00:01:49,200 here we need to replace test set by training set. 34 00:01:50,800 --> 00:01:52,066 And now it's good. 35 00:01:52,066 --> 00:01:55,033 Now this geom_line function here 36 00:01:55,033 --> 00:01:58,633 will plot all the predicted salaries of the training set 37 00:01:58,633 --> 00:01:59,966 observations. 38 00:01:59,966 --> 00:02:02,400 And so for each employee, at 39 00:02:02,400 --> 00:02:05,466 the same number of years of experience, because here we have the same x, 40 00:02:06,033 --> 00:02:09,600 we're going to have both the real salaries, the salaries 41 00:02:09,600 --> 00:02:13,033 they actually have in the company, and the salaries predicted 42 00:02:13,033 --> 00:02:16,700 by our simple linear regression model for the same employees. 43 00:02:17,300 --> 00:02:19,366 Okay, so it's ready. The line is ready. 44 00:02:19,366 --> 00:02:22,366 But as for the observation points, we will just add a color. 45 00:02:22,766 --> 00:02:26,866 And that means that we need to put a comma here and press Enter. 46 00:02:27,200 --> 00:02:30,200 And that's where we put the second argument, which is going to be the color. 47 00:02:30,500 --> 00:02:31,800 And this time we will pick 48 00:02:33,566 --> 00:02:35,066 a blue color. 49 00:02:35,066 --> 00:02:37,166 So again, that's just my taste. 
50 00:02:37,166 --> 00:02:40,166 You're welcome to use some other colors if you prefer. 51 00:02:40,366 --> 00:02:42,933 Okay. So now our plot is almost ready. 52 00:02:42,933 --> 00:02:45,833 We have all the observation points of the training set. 53 00:02:45,833 --> 00:02:47,766 We have the regression line. 54 00:02:47,766 --> 00:02:49,433 And so we could plot the graph now. 55 00:02:49,433 --> 00:02:53,400 But if we want to have a nice graph that we can present to people 56 00:02:53,400 --> 00:02:56,566 like managers or clients or even your friends, it's always 57 00:02:56,566 --> 00:03:00,100 better to add a little title and some labels on the axes. 58 00:03:00,333 --> 00:03:03,633 So let's do this. Plus here, and gg. 59 00:03:03,666 --> 00:03:07,200 So this time it's not geom, it's gg, because it's not a geometrical thing. 60 00:03:07,200 --> 00:03:09,933 ggtitle, parentheses, quotes, 61 00:03:09,933 --> 00:03:12,933 because we have to put the title of our plot in quotes. 62 00:03:13,000 --> 00:03:15,533 And here we're going to give the following title: Salary 63 00:03:16,900 --> 00:03:19,900 vs Experience. 64 00:03:21,966 --> 00:03:23,400 And we are going to specify 65 00:03:23,400 --> 00:03:26,400 that it's the training set, 66 00:03:27,033 --> 00:03:28,500 because then we will do 67 00:03:28,500 --> 00:03:31,500 the same for the test set, and close the parenthesis. 68 00:03:31,833 --> 00:03:34,833 And that's it. So perfect. That's our title. 69 00:03:35,133 --> 00:03:38,633 And now let's just add a name for the x axis. 70 00:03:38,666 --> 00:03:42,066 So to do this we use xlab, and in parentheses 71 00:03:42,066 --> 00:03:46,400 we will say that the name of our x axis is going to be Years 72 00:03:47,400 --> 00:03:50,000 of experience. 73 00:03:50,000 --> 00:03:51,200 Okay. 74 00:03:51,200 --> 00:03:54,000 And then plus, and then the same for our y axis. 75 00:03:54,000 --> 00:03:56,233 And obviously it's going to be ylab. 
76 00:03:56,233 --> 00:04:00,900 And here we will choose Salary, very simply. Okay. 77 00:04:01,200 --> 00:04:04,200 And our graph is ready to be plotted. 78 00:04:04,200 --> 00:04:07,200 So I'm going to select all this. 79 00:04:08,233 --> 00:04:09,833 And now press Command or Control 80 00:04:09,833 --> 00:04:11,033 plus Enter to execute. 81 00:04:12,666 --> 00:04:14,966 And here is our beautiful graph. 82 00:04:14,966 --> 00:04:19,366 So that's the training set results, with the real training set observations, 83 00:04:19,366 --> 00:04:23,366 the real salaries, in red and the regression line in blue. 84 00:04:23,666 --> 00:04:26,666 So let's zoom in on this graph and give an interpretation. 85 00:04:28,133 --> 00:04:28,433 All right. 86 00:04:28,433 --> 00:04:30,900 So this is our training set results. 87 00:04:30,900 --> 00:04:32,466 So we can clearly make the distinction 88 00:04:32,466 --> 00:04:35,466 between the real observations, which are the red points. 89 00:04:35,700 --> 00:04:39,133 So these are the real salaries of the employees in the company. 90 00:04:39,800 --> 00:04:42,466 And this blue straight line here 91 00:04:42,466 --> 00:04:45,200 is our linear regression model. 92 00:04:45,200 --> 00:04:48,200 So that's the prediction. That means that, for example, 93 00:04:48,233 --> 00:04:51,533 if we take this employee here, well, we can see that 94 00:04:51,533 --> 00:04:55,433 his experience is a little more than five years. 95 00:04:55,733 --> 00:05:01,066 His real salary is about $65,000. 96 00:05:01,600 --> 00:05:05,233 But our model predicted, for this particular employee, 97 00:05:05,566 --> 00:05:10,100 this result here. That means it predicted that this employee has a salary 98 00:05:10,366 --> 00:05:13,800 of a little less than $75,000. 99 00:05:14,633 --> 00:05:14,966 Okay. 100 00:05:14,966 --> 00:05:17,600 So that's a really important distinction to understand. 
101 00:05:17,600 --> 00:05:21,133 This is the real value, and this is the value predicted by the model. 102 00:05:21,133 --> 00:05:24,200 And to get the predicted salary you need to project this 103 00:05:24,200 --> 00:05:27,566 point on the line here onto the salary axis, 104 00:05:28,033 --> 00:05:32,833 which gives us a predicted salary of a little less than $75,000.
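
The plot built step by step in this lesson can be sketched in R roughly as follows. This is only an illustration, not the exact code shown on screen: the names `training_set`, `YearsExperience`, `Salary` and `regressor` are assumptions about the course's objects, and the data frame is made-up toy data so that the example runs on its own.

```r
library(ggplot2)

# Made-up toy data standing in for the course's training set.
# The column names YearsExperience and Salary are assumptions.
training_set <- data.frame(
  YearsExperience = c(1.1, 2.0, 3.2, 4.0, 5.3, 6.8, 8.2, 9.5),
  Salary = c(39000, 43500, 54000, 56000, 65000, 91000, 113000, 116000)
)

# Simple linear regression fitted on the training set
# (the object name `regressor` is an assumption).
regressor <- lm(Salary ~ YearsExperience, data = training_set)

p <- ggplot() +
  # Real observations: one red point per employee in the training set.
  geom_point(aes(x = training_set$YearsExperience,
                 y = training_set$Salary),
             colour = 'red') +
  # Regression line: same x values, but y is the model's prediction
  # for the training set (newdata = training_set, not the test set).
  geom_line(aes(x = training_set$YearsExperience,
                y = predict(regressor, newdata = training_set)),
            colour = 'blue') +
  ggtitle('Salary vs Experience (Training set)') +
  xlab('Years of experience') +
  ylab('Salary')

print(p)

# Reading a predicted salary off the line is the same as asking the
# model directly, here for a hypothetical employee with 5.3 years.
predicted_salary <- predict(regressor,
                            newdata = data.frame(YearsExperience = 5.3))
print(predicted_salary)
```

With the toy data the numbers will differ from the ones read off the graph in the video. The point is only that the red points come from the real `Salary` column, while the blue line comes from `predict` applied to the same x values.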