1 00:00:00,100 --> 00:00:00,966 All right. 2 00:00:00,966 --> 00:00:02,166 So, good. 3 00:00:02,166 --> 00:00:04,633 Now we're going to do the same for the test set results. 4 00:00:04,633 --> 00:00:09,400 So I'm just going to copy everything here and paste that in a new 5 00:00:09,400 --> 00:00:11,700 code cell right here. 6 00:00:11,700 --> 00:00:12,933 And so now we're going to 7 00:00:12,933 --> 00:00:14,000 do the right 8 00:00:14,000 --> 00:00:17,166 replacements in this code in order to visualize not the 9 00:00:17,166 --> 00:00:19,700 training set results but the test set results. 10 00:00:19,700 --> 00:00:20,433 And so obviously 11 00:00:20,433 --> 00:00:22,966 the first thing we need to replace here are the 12 00:00:22,966 --> 00:00:24,000 coordinates 13 00:00:24,000 --> 00:00:25,600 of the real observations, 14 00:00:25,600 --> 00:00:27,633 because these are the coordinates of, 15 00:00:27,633 --> 00:00:29,833 you know, the employees in the training set, 16 00:00:29,833 --> 00:00:32,200 and we want the coordinates of the employees of the test set. 17 00:00:32,200 --> 00:00:36,833 Therefore here we have to replace, obviously, X_train by X_test, 18 00:00:37,133 --> 00:00:40,133 and then y_train by y_test. 19 00:00:40,366 --> 00:00:40,733 Okay. 20 00:00:40,733 --> 00:00:42,100 So these are for the real 21 00:00:42,100 --> 00:00:44,433 observations, with the numbers of years of 22 00:00:44,433 --> 00:00:45,533 experience and the 23 00:00:45,533 --> 00:00:47,733 real salaries of the test set. 24 00:00:47,733 --> 00:00:48,500 And now, 25 00:00:48,500 --> 00:00:50,733 according to you, do we have to replace 26 00:00:50,733 --> 00:00:52,033 something here in plt.plot, 27 00:00:52,033 --> 00:00:53,166 X_train, 28 00:00:53,166 --> 00:00:57,133 regressor.predict(X_train)? According to you, do we have to replace 29 00:00:57,133 --> 00:00:58,866 X_train by X_test, 30 00:00:58,866 --> 00:01:01,633 and the same, X_train by X_test?
31 00:01:01,633 --> 00:01:05,400 Well, you know, that's kind of a trick question, because remember 32 00:01:05,400 --> 00:01:06,100 that the 33 00:01:06,100 --> 00:01:07,766 regression line that 34 00:01:07,766 --> 00:01:09,533 we get is actually resulting 35 00:01:09,533 --> 00:01:11,366 from a unique equation, 36 00:01:11,366 --> 00:01:15,700 and therefore the predicted salaries of the test set will be 37 00:01:15,700 --> 00:01:18,266 on the same regression line as 38 00:01:18,266 --> 00:01:20,866 the predicted salaries of the training set. 39 00:01:20,866 --> 00:01:21,300 And that's 40 00:01:21,300 --> 00:01:24,200 why here we actually don't have to replace X_train 41 00:01:24,200 --> 00:01:24,900 by X_test, 42 00:01:24,900 --> 00:01:26,966 here and here. Okay. 43 00:01:26,966 --> 00:01:28,100 You would actually get the 44 00:01:28,100 --> 00:01:29,700 exact same regression line 45 00:01:29,700 --> 00:01:31,200 whether you plot the 46 00:01:31,200 --> 00:01:33,133 coordinates of X_train and the predicted 47 00:01:33,133 --> 00:01:34,300 salaries of the training set, 48 00:01:34,300 --> 00:01:36,100 or X_test and the predicted 49 00:01:36,100 --> 00:01:38,133 salaries of the test set. You can check. 50 00:01:38,133 --> 00:01:39,133 But that's because the 51 00:01:39,133 --> 00:01:41,800 regression line in a simple linear regression model 52 00:01:41,800 --> 00:01:43,800 results from a unique equation, 53 00:01:43,800 --> 00:01:46,733 and we therefore end up with the same regression line. 54 00:01:46,733 --> 00:01:49,433 Okay. So no need to replace anything here. 55 00:01:49,433 --> 00:01:50,266 And finally we 56 00:01:50,266 --> 00:01:51,300 will just replace 57 00:01:51,300 --> 00:01:54,233 here "Training set" by "Test set". 58 00:01:54,233 --> 00:01:54,933 And there you go. 59 00:01:54,933 --> 00:01:56,766 Now we're ready to visualize 60 00:01:56,766 --> 00:01:59,533 the training set results and the test set results.
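The "you can check" remark above can be tried out with a small sketch. It uses made-up years-of-experience and salary values (not the course data set) and a hand-written least-squares fit, so `fit_line`, `predict` and `line_through` are illustrative names standing in for the notebook's `regressor`, not part of any library. The idea is only to show that the predicted points for the training inputs and for the test inputs define one and the same line.

```python
import math

# Sketch with hypothetical data: do training and test predictions
# lie on the same regression line?

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(model, xs):
    """Apply the fitted equation y = intercept + slope * x to each input."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]

def line_through(xs, ys):
    """Recover (intercept, slope) of the line through the first and last point."""
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    return ys[0] - slope * xs[0], slope

# Made-up values, NOT the course data set.
x_train = [1.0, 2.0, 3.0, 5.0, 7.0, 9.0]
y_train = [40000.0, 47000.0, 60000.0, 75000.0, 97000.0, 112000.0]
x_test = [4.0, 8.0]

model = fit_line(x_train, y_train)

# Line drawn from the training predictions vs. line drawn from the test predictions.
train_line = line_through(x_train, predict(model, x_train))
test_line = line_through(x_test, predict(model, x_test))

same_line = (math.isclose(train_line[0], test_line[0])
             and math.isclose(train_line[1], test_line[1]))
print("same regression line:", same_line)
```

In the notebook this is why the `plt.plot(X_train, regressor.predict(X_train))` call can stay unchanged in the test set cell: plotting the test inputs against their predictions would draw a segment of the same line.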
61 00:01:59,533 --> 00:02:02,366 So let's first run the cells we haven't 62 00:02:02,366 --> 00:02:03,633 executed before. 63 00:02:03,633 --> 00:02:04,766 So remember, we 64 00:02:04,766 --> 00:02:06,366 executed up to the training. 65 00:02:06,366 --> 00:02:08,700 So now we have to execute this one to indeed 66 00:02:08,700 --> 00:02:12,600 get the vector y_pred of the predicted salaries of the test set. 67 00:02:13,066 --> 00:02:15,800 So there we go. We now have y_pred. 68 00:02:15,800 --> 00:02:16,633 And so now we're going to 69 00:02:16,633 --> 00:02:19,766 first execute this to visualize the training set results. 70 00:02:20,033 --> 00:02:20,866 And we're going to obtain 71 00:02:20,866 --> 00:02:24,033 a nice 2D plot with, indeed, 72 00:02:24,233 --> 00:02:27,433 the real salaries in these red points here, 73 00:02:27,433 --> 00:02:28,433 and of course 74 00:02:28,433 --> 00:02:31,333 our beautiful regression line containing the 75 00:02:31,333 --> 00:02:33,000 predicted salaries. 76 00:02:33,000 --> 00:02:35,400 And we clearly see that this regression line was 77 00:02:35,400 --> 00:02:37,000 calculated so that 78 00:02:37,000 --> 00:02:37,966 it comes as 79 00:02:37,966 --> 00:02:40,866 close as possible to the real salaries. 80 00:02:40,866 --> 00:02:42,533 And of course, for each of the years of 81 00:02:42,533 --> 00:02:43,966 experience here, in order to 82 00:02:43,966 --> 00:02:47,033 get the predicted salary, you have to project the 83 00:02:47,033 --> 00:02:48,100 years of experience 84 00:02:48,100 --> 00:02:49,566 onto this blue 85 00:02:49,566 --> 00:02:51,666 regression line. So, for example, 86 00:02:51,666 --> 00:02:52,666 the predicted salary 87 00:02:52,666 --> 00:02:55,300 corresponding to eight years of experience 88 00:02:55,300 --> 00:02:56,100 is about, 89 00:02:56,100 --> 00:02:59,666 you know, 100,000 dollars per year. Okay. 90 00:02:59,666 --> 00:03:01,766 So that's how it works. And so we can clearly
91 00:03:01,766 --> 00:03:06,800 see that our predicted salaries are very close to the real salaries, 92 00:03:06,800 --> 00:03:08,600 you know, for most of them. 93 00:03:08,600 --> 00:03:10,300 But that's on the training set. 94 00:03:10,300 --> 00:03:12,700 Remember, that's important, because our model was 95 00:03:12,700 --> 00:03:15,466 actually trained with these observations, 96 00:03:15,466 --> 00:03:18,366 you know, with these years of experience and salaries. 97 00:03:18,366 --> 00:03:19,133 Now what we would 98 00:03:19,133 --> 00:03:21,233 like to observe is if we have 99 00:03:21,233 --> 00:03:23,966 the same results, you know, the same closeness 100 00:03:23,966 --> 00:03:25,800 of the regression line to the real 101 00:03:25,800 --> 00:03:28,266 salaries in the test set, with 102 00:03:28,266 --> 00:03:30,133 which the model wasn't trained. 103 00:03:30,133 --> 00:03:33,133 You know, we want to evaluate it on new observations. 104 00:03:33,166 --> 00:03:36,433 And that's exactly what this new graphic will tell us, 105 00:03:36,600 --> 00:03:38,566 because now we are about to plot, 106 00:03:38,566 --> 00:03:39,866 this time, the real 107 00:03:39,866 --> 00:03:42,166 salaries of the test set and the 108 00:03:42,166 --> 00:03:44,633 predicted salaries of the same test set. 109 00:03:44,633 --> 00:03:46,666 So there we go. Let's run this cell and let's 110 00:03:46,666 --> 00:03:50,700 see if we're still close to the real salaries, 111 00:03:50,700 --> 00:03:52,266 even for new observations. 112 00:03:52,266 --> 00:03:53,166 And yes, 113 00:03:53,166 --> 00:03:56,200 absolutely, our predicted salaries, which are of 114 00:03:56,200 --> 00:03:59,733 course on the blue line once again, are very close indeed to the 115 00:03:59,733 --> 00:04:01,166 real salaries. 116 00:04:01,166 --> 00:04:04,233 So our model, our simple linear regression model, 117 00:04:04,433 --> 00:04:05,300 was able to
118 00:04:05,300 --> 00:04:07,533 do a wonderful job at predicting 119 00:04:07,533 --> 00:04:09,300 new observations. 120 00:04:09,300 --> 00:04:10,533 So congratulations. 121 00:04:10,533 --> 00:04:11,533 That's very exciting. 122 00:04:11,533 --> 00:04:12,966 You built a very successful 123 00:04:12,966 --> 00:04:14,700 first machine learning model. 124 00:04:14,700 --> 00:04:16,800 However, just remember that 125 00:04:16,800 --> 00:04:19,300 the reason why we get such a good 126 00:04:19,300 --> 00:04:20,600 regression line here 127 00:04:20,600 --> 00:04:22,666 is simply because there was a linear 128 00:04:22,666 --> 00:04:25,633 relationship in the data set between the features 129 00:04:25,633 --> 00:04:27,733 and the dependent variable of the data set. 130 00:04:27,733 --> 00:04:29,500 In other words, we had a perfectly 131 00:04:29,500 --> 00:04:30,933 linear data set. 132 00:04:30,933 --> 00:04:32,366 In other words, we had a data set 133 00:04:32,366 --> 00:04:34,033 with linear correlations. 134 00:04:34,033 --> 00:04:36,266 And what I want to make sure you understand is that 135 00:04:36,266 --> 00:04:37,800 these beautiful results 136 00:04:37,800 --> 00:04:39,300 won't happen for every 137 00:04:39,300 --> 00:04:41,400 data set, because of course, you'll get to 138 00:04:41,400 --> 00:04:44,400 work on data sets with non-linear relationships. 139 00:04:44,600 --> 00:04:47,600 And for these data sets, you will need a non-linear model. 140 00:04:47,733 --> 00:04:50,000 And that's exactly why, in this Part 141 00:04:50,000 --> 00:04:51,500 2: Regression, we 142 00:04:51,500 --> 00:04:54,600 will also study some non-linear models, 143 00:04:54,600 --> 00:04:55,533 which you will see will be 144 00:04:55,533 --> 00:04:56,900 polynomial regression or 145 00:04:56,900 --> 00:04:59,166 SVR, which will allow you to 146 00:04:59,166 --> 00:05:00,066 get amazing 147 00:05:00,066 --> 00:05:01,633 results for data
148 00:05:01,633 --> 00:05:03,633 sets having non-linear relationships. 149 00:05:03,633 --> 00:05:06,800 But for this data set, we clearly had linear relationships, 150 00:05:06,800 --> 00:05:07,166 and that's 151 00:05:07,166 --> 00:05:09,133 why a linear regression model 152 00:05:09,133 --> 00:05:10,233 was perfectly fine. 153 00:05:10,233 --> 00:05:12,800 But I just wanted to highlight this. Okay. 154 00:05:12,800 --> 00:05:14,666 So congratulations once again. I 155 00:05:14,666 --> 00:05:17,566 hope you're very happy and excited to have built your very 156 00:05:17,566 --> 00:05:19,033 first machine learning model. 157 00:05:19,033 --> 00:05:21,300 Now we're going to increase the difficulty a bit, 158 00:05:21,300 --> 00:05:25,266 because in the next section we will build a multiple linear regression 159 00:05:25,266 --> 00:05:29,033 model, meaning that instead of having one feature, 160 00:05:29,033 --> 00:05:31,166 you know, only one independent variable, 161 00:05:31,166 --> 00:05:33,700 we will have several of them. And that's exactly the 162 00:05:33,700 --> 00:05:37,333 difference between simple linear regression and multiple linear regression. 163 00:05:37,566 --> 00:05:40,366 Simple linear regression is when you have only one feature, 164 00:05:40,366 --> 00:05:43,466 and multiple linear regression is when you have several features. 165 00:05:43,833 --> 00:05:44,700 So take a 166 00:05:44,700 --> 00:05:46,700 good break now, and as soon as you're 167 00:05:46,700 --> 00:05:49,500 back with your energy, let's implement our 168 00:05:49,500 --> 00:05:52,433 second machine learning model in the next practical 169 00:05:52,433 --> 00:05:54,400 activity, after the 170 00:05:54,400 --> 00:05:55,633 intuition lecture. 171 00:05:55,633 --> 00:05:57,466 And until then, enjoy machine learning.
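The one-feature versus several-features distinction above can be sketched in a few lines. The coefficients and feature values below are invented for illustration only (they are not fitted on the course data set, and the second feature is an unnamed placeholder), and `predict_simple` and `predict_multiple` are hypothetical helper names rather than library functions.

```python
# Sketch with hypothetical coefficients: one feature vs. several features.

def predict_simple(b0, b1, x):
    """Simple linear regression: y = b0 + b1 * x (a single feature)."""
    return b0 + b1 * x

def predict_multiple(b0, coefs, features):
    """Multiple linear regression: y = b0 + b1*x1 + ... + bn*xn."""
    if len(coefs) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return b0 + sum(b * x for b, x in zip(coefs, features))

# Made-up numbers: intercept 30000, 9000 per year of experience,
# and 2000 per unit of a second, unnamed feature.
salary_simple = predict_simple(30000.0, 9000.0, 8.0)
salary_multiple = predict_multiple(30000.0, [9000.0, 2000.0], [8.0, 3.0])

print(salary_simple, salary_multiple)
```

The only structural change is that the single slope becomes a list of coefficients, one per feature; the intercept plays the same role in both cases.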