1 00:00:00,300 --> 00:00:01,200 Hello my friends. 2 00:00:01,200 --> 00:00:01,633 All right. 3 00:00:01,633 --> 00:00:03,900 So how did you do with the little exercise here? 4 00:00:03,900 --> 00:00:06,266 Predicting the test set results. 5 00:00:06,266 --> 00:00:09,133 So, based on what we've just done in the previous tutorial, 6 00:00:09,133 --> 00:00:12,466 you know, when training the simple linear regression model on the training set, 7 00:00:12,700 --> 00:00:16,366 and also based on the hint I gave you, you know, this predict method, 8 00:00:16,566 --> 00:00:19,133 well, you should have been able to crack this. 9 00:00:19,133 --> 00:00:19,500 All right. 10 00:00:19,500 --> 00:00:20,533 So now we're going to implement 11 00:00:20,533 --> 00:00:24,233 the solution together, starting by creating this new code cell. 12 00:00:24,900 --> 00:00:25,233 All right. 13 00:00:25,233 --> 00:00:28,233 So first let's just explain what we want to do here. 14 00:00:28,266 --> 00:00:32,700 We want to predict the results of the observations in the test set. 15 00:00:33,000 --> 00:00:36,500 And so let me show you once again the data set. 16 00:00:36,800 --> 00:00:40,500 So that's the whole data set containing the 30 observations. 17 00:00:40,800 --> 00:00:45,166 And we actually split this data set into the training set and the test set. 18 00:00:45,433 --> 00:00:49,033 The test size was chosen to be 0.2, meaning that 19 00:00:49,233 --> 00:00:53,133 20% of all these observations went into the test set. 20 00:00:53,400 --> 00:00:56,100 So 20% of 30 is actually six. 21 00:00:56,100 --> 00:00:58,766 So let's say, just to simplify the explanation, 22 00:00:58,766 --> 00:01:02,833 that the observations of the test set are the last six ones. 23 00:01:02,833 --> 00:01:05,300 You know, 1, 2, 3, 4, 5, 6. 24 00:01:05,300 --> 00:01:08,266 So let's say all of these observations went into the test set.
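As a rough check of the arithmetic above (20% of 30 observations gives six test observations), here is a minimal sketch using scikit-learn's `train_test_split`. The data is a synthetic stand-in, not the data set shown in the video, and `random_state=0` is an assumption made only so the run is repeatable. Note that the real split picks observations at random; "the last six" is only the speaker's simplification.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 30-row data set (values are made up for illustration).
X = np.linspace(1.0, 10.5, 30).reshape(-1, 1)  # years of experience, one column
y = 26000.0 + 9500.0 * X.ravel()               # salaries

# test_size=0.2 sends 20% of the 30 observations to the test set.
# random_state=0 is an assumption, used only to make the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

n_train, n_test = len(X_train), len(X_test)
print(n_train, n_test)
```

The two counts printed at the end are the sizes of the training set and the test set.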
25 00:01:08,266 --> 00:01:13,000 Well, what we want to do now is predict, for each of these observations, 26 00:01:13,000 --> 00:01:16,300 meaning for each of these employees, the salary. 27 00:01:16,500 --> 00:01:17,100 Right. 28 00:01:17,100 --> 00:01:20,900 So what we're going to input in that predict method is exactly 29 00:01:20,900 --> 00:01:24,533 the number of years of experience for each of these six employees. 30 00:01:24,633 --> 00:01:27,600 And our model will predict the salaries. 31 00:01:27,600 --> 00:01:30,333 So the salaries that we see here, you know, the last six ones, 32 00:01:30,333 --> 00:01:32,333 are the exact salaries. You know, the truth. 33 00:01:32,333 --> 00:01:34,533 We call it the ground truth. 34 00:01:34,533 --> 00:01:37,300 And when calling the predict method on these 35 00:01:37,300 --> 00:01:41,866 six numbers of years of experience, we will get six predicted salaries. 36 00:01:42,000 --> 00:01:42,500 And so what 37 00:01:42,500 --> 00:01:46,200 we'll want to do then will be to compare the predicted 38 00:01:46,200 --> 00:01:49,333 salaries to these six real salaries, okay? 39 00:01:49,500 --> 00:01:52,133 And we will do that in the last steps of this implementation, 40 00:01:52,133 --> 00:01:55,533 when visualizing not only the test set results but also the training 41 00:01:55,533 --> 00:01:58,533 set results. You're going to see, everything will be super clear. 42 00:01:58,933 --> 00:02:00,033 All right, so let's do this. 43 00:02:00,033 --> 00:02:03,166 Let's get the test set results. 44 00:02:03,200 --> 00:02:03,933 First, 45 00:02:03,933 --> 00:02:07,600 you know, in order to call a method we first need to call the object itself, 46 00:02:07,600 --> 00:02:09,000 which is regressor. 47 00:02:09,000 --> 00:02:11,200 So that was the first step you had to do here. 48 00:02:11,200 --> 00:02:13,766 Then from our object we add a dot here. 49 00:02:13,766 --> 00:02:16,133 And we call the function we want.
50 00:02:16,133 --> 00:02:17,200 Oh wow. It's funny. 51 00:02:17,200 --> 00:02:21,266 Google Colab actually guessed what I was about to call. 52 00:02:21,266 --> 00:02:24,000 I never noticed that. But anyway, yeah, that's perfect. 53 00:02:24,000 --> 00:02:25,566 We want to use the predict method. 54 00:02:25,566 --> 00:02:27,433 Of course, that was the hint. 55 00:02:27,433 --> 00:02:31,933 And this predict method, you know, as any function, expects some arguments. 56 00:02:32,233 --> 00:02:36,000 And so now, according to you, what do we have to input in this predict method? 57 00:02:36,233 --> 00:02:37,333 Well, that's simple. 58 00:02:37,333 --> 00:02:41,566 Once again, you know, we want to enter the features, 59 00:02:42,300 --> 00:02:46,066 meaning the numbers of years of experience, and not the salaries. 60 00:02:46,066 --> 00:02:47,466 That was just for the training set. 61 00:02:47,466 --> 00:02:49,700 Here we only need the numbers of years of experience, 62 00:02:49,700 --> 00:02:52,966 because from the numbers of years of experience we want to predict the salaries, 63 00:02:53,300 --> 00:02:58,066 and the numbers of years of experience are exactly contained in X_test. 64 00:02:58,300 --> 00:02:58,600 Right. 65 00:02:58,600 --> 00:03:01,600 Because we want the numbers of years of experience of the test set. 66 00:03:01,666 --> 00:03:05,400 And so the only thing we had to input here is X_test. 67 00:03:05,466 --> 00:03:08,333 And there you go. That was the solution. 68 00:03:08,333 --> 00:03:10,566 Okay. So as you can see, once again, very easy. 69 00:03:10,566 --> 00:03:12,300 That's the beauty of the libraries. 70 00:03:12,300 --> 00:03:16,933 You can do almost anything you want in usually one, two or three lines of code. 71 00:03:16,933 --> 00:03:20,800 And here we just had to call the predict method to make some predictions 72 00:03:20,800 --> 00:03:24,300 from a model that is of course already trained on the training set. 73 00:03:24,433 --> 00:03:25,366 All right.
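The call described here, the predict method of the trained regressor applied to the test set features, might look like the sketch below. It assumes the regressor is scikit-learn's `LinearRegression`; the data is synthetic, and the noise level and `random_state=0` are illustrative assumptions, not values from the video.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the salary data (values are made up for illustration).
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.5, 30).reshape(-1, 1)  # years of experience
y = 26000.0 + 9500.0 * X.ravel() + rng.normal(0.0, 3000.0, size=30)  # salaries

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train on the training set: features and salaries together.
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predict from the test set features only; the salaries are not passed in.
predictions = regressor.predict(X_test)
print(predictions.shape)
```

Only `X_test` goes into `predict`; the real salaries in `y_test` are kept aside for the comparison afterwards.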
74 00:03:25,366 --> 00:03:29,066 Now, because we are then going to visualize the training set results 75 00:03:29,066 --> 00:03:30,466 and the test set results, 76 00:03:30,466 --> 00:03:32,533 I'm actually going to store 77 00:03:32,533 --> 00:03:36,433 all these predictions, because this actually returns a vector of predictions. 78 00:03:36,433 --> 00:03:37,033 You know, a vector 79 00:03:37,033 --> 00:03:40,700 containing the predicted salaries of the employees in the test set. 80 00:03:40,933 --> 00:03:44,366 So I would like to put all these predicted salaries in a vector, 81 00:03:44,566 --> 00:03:46,200 therefore in a new variable. 82 00:03:46,200 --> 00:03:51,300 And therefore here I'm creating this new variable, which I'm calling y_pred. 83 00:03:51,466 --> 00:03:55,866 You know, as opposed to y_test, which contains the real salaries. 84 00:03:55,866 --> 00:03:59,500 So make sure to understand: y_test here contains the real salaries 85 00:03:59,700 --> 00:04:02,700 and y_pred here contains the predicted salaries. 86 00:04:02,700 --> 00:04:06,233 And now in the next steps we're going to compare y_pred to 87 00:04:06,266 --> 00:04:10,300 y_test, and we'll also compare y_train to the predicted salaries 88 00:04:10,300 --> 00:04:11,433 in the training set. 89 00:04:11,433 --> 00:04:13,900 In the last two steps of this implementation, 90 00:04:13,900 --> 00:04:17,233 visualizing the training set results and visualizing the test set results, 91 00:04:17,233 --> 00:04:21,000 we will get amazing graphs, super clear, super simple to understand, 92 00:04:21,066 --> 00:04:25,666 and we will clearly see how well our model was trained with the first visualization, 93 00:04:25,833 --> 00:04:27,466 and how well our model was able 94 00:04:27,466 --> 00:04:30,466 to predict new observations with the second visualization. 95 00:04:30,566 --> 00:04:32,400 So let's do this in the next tutorial. 96 00:04:32,400 --> 00:04:34,366 And until then, enjoy machine learning.
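To make the difference between y_pred and y_test concrete, here is a minimal sketch that stores the predictions in `y_pred` and puts them next to the real salaries. The tutorial itself does this comparison with graphs in the next steps; the numeric table and the mean absolute error below are an added illustration, computed on synthetic data with an assumed `random_state=0`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the salary data (values are made up for illustration).
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.5, 30).reshape(-1, 1)  # years of experience
y = 26000.0 + 9500.0 * X.ravel() + rng.normal(0.0, 3000.0, size=30)  # salaries

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# y_pred holds the predicted salaries; y_test holds the real ones.
y_pred = regressor.predict(X_test)

# One row per test employee: years of experience, real salary, predicted salary.
comparison = np.column_stack((X_test.ravel(), y_test, y_pred))
errors = y_test - y_pred
mean_abs_error = float(np.mean(np.abs(errors)))

np.set_printoptions(precision=1, suppress=True)
print(comparison)
print(mean_abs_error)
```

Each row pairs one test employee's real salary with the model's prediction, which is the same comparison the upcoming plots show visually.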