1 00:00:00,100 --> 00:00:01,000 All right, my friends. 2 00:00:01,000 --> 00:00:04,233 Are you ready for the final steps of this implementation? 3 00:00:04,333 --> 00:00:06,966 First, we're going to visualize the training set results. 4 00:00:06,966 --> 00:00:09,666 And second, we will visualize the test set results. 5 00:00:09,666 --> 00:00:10,633 So let's do this. 6 00:00:10,633 --> 00:00:12,966 Let's start by creating a new code cell. 7 00:00:12,966 --> 00:00:14,066 And there we go. 8 00:00:14,066 --> 00:00:18,000 Now we just want to plot, on a nice graph, the real salaries 9 00:00:18,000 --> 00:00:20,000 compared to the predicted salaries. 10 00:00:20,000 --> 00:00:21,966 So basically we're going to have a 2D plot 11 00:00:21,966 --> 00:00:25,500 with the x axis being the number of years of experience, 12 00:00:25,500 --> 00:00:30,200 you know, from 1 to 10, and the y axis being the salaries, 13 00:00:30,200 --> 00:00:33,600 you know, in the range of salaries given in this data set. 14 00:00:34,000 --> 00:00:37,433 And so we will plot the real salaries as red points 15 00:00:37,700 --> 00:00:41,200 and the predicted salaries as a blue straight line. 16 00:00:41,200 --> 00:00:41,700 Okay. 17 00:00:41,700 --> 00:00:43,166 And we will do that both 18 00:00:43,166 --> 00:00:46,933 for the predictions on the training set and the predictions on the test set. 19 00:00:47,400 --> 00:00:48,100 Are you ready? 20 00:00:48,100 --> 00:00:49,433 Let's do this. 21 00:00:49,433 --> 00:00:52,366 We're going to first call matplotlib. As a reminder, 22 00:00:52,366 --> 00:00:54,266 it has the shortcut plt. 23 00:00:54,266 --> 00:00:55,533 Actually, what we're going to call 24 00:00:55,533 --> 00:00:58,533 is exactly the pyplot module from the matplotlib library. 25 00:00:58,533 --> 00:01:01,933 And this is what has the plt shortcut. 26 00:01:02,233 --> 00:01:03,366 So there we go. 27 00:01:03,366 --> 00:01:06,933 Let's start by calling pyplot: plt. 
28 00:01:07,800 --> 00:01:08,666 And then we're going to call 29 00:01:08,666 --> 00:01:12,000 a specific function from this module, which is called scatter. 30 00:01:12,700 --> 00:01:16,033 And scatter will allow us to put the red points 31 00:01:16,033 --> 00:01:19,033 corresponding to the real salaries in the 2D plot. 32 00:01:19,133 --> 00:01:23,400 Okay, so naturally what we have to input here are the coordinates 33 00:01:23,400 --> 00:01:26,466 of these red points representing the real salaries, 34 00:01:26,733 --> 00:01:30,600 and these coordinates are nothing else than X_train and y_train. 35 00:01:30,600 --> 00:01:31,000 Right? 36 00:01:31,000 --> 00:01:35,233 Because remember, the x axis will be the number of years of experience. 37 00:01:35,233 --> 00:01:38,033 That's what is contained in X_train. 38 00:01:38,033 --> 00:01:40,200 And the y axis will be the salaries. 39 00:01:40,200 --> 00:01:42,033 And that is what is contained in y_train. 40 00:01:42,033 --> 00:01:43,566 So here, in scatter, 41 00:01:43,566 --> 00:01:45,266 we just have to input the coordinates: 42 00:01:45,266 --> 00:01:48,866 first X_train for the number of years of experience. 43 00:01:49,133 --> 00:01:49,800 There you go. 44 00:01:49,800 --> 00:01:53,466 And then y_train for the salaries. Okay. 45 00:01:54,500 --> 00:01:55,233 And then we 46 00:01:55,233 --> 00:01:58,233 can add another argument, which is color, 47 00:01:58,400 --> 00:02:02,133 and which will allow us to choose the color of these points. 48 00:02:02,133 --> 00:02:05,066 And as we said, we're going to choose red. 49 00:02:05,066 --> 00:02:06,000 Perfect. 50 00:02:06,000 --> 00:02:09,266 The next step is to plot the regression line. 
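Here is a minimal sketch of the scatter call just described. The numbers in X_train and y_train are made up for illustration and are not the course's data set; only the call plt.scatter(X_train, y_train, color='red') comes from the lecture.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical illustrative values, NOT the course's data set.
# X_train: years of experience (one column), y_train: real salaries.
X_train = np.array([[1.1], [2.0], [3.2], [4.5], [5.9], [7.1], [8.7], [10.3]])
y_train = np.array([39000, 43500, 54500, 61000, 81000, 98000, 109000, 122000])

# One red point per employee: x = years of experience, y = real salary.
points = plt.scatter(X_train, y_train, color='red')
```

Keeping the return value in points is optional; it is only done here so the plotted coordinates can be inspected afterwards.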
51 00:02:09,266 --> 00:02:13,700 So remember from the intuition lectures, the regression line is the line 52 00:02:13,700 --> 00:02:18,433 of the predictions coming as close as possible to the real results, 53 00:02:18,433 --> 00:02:21,833 you know, to the real salaries. And therefore the predictions, 54 00:02:21,833 --> 00:02:25,433 you know, the points corresponding to the predicted salaries, will follow 55 00:02:25,433 --> 00:02:28,600 a straight line, right, as in a linear function. 56 00:02:29,000 --> 00:02:32,000 And therefore here we're not going to use the scatter function. 57 00:02:32,000 --> 00:02:34,433 We're going to use the plot function, 58 00:02:34,433 --> 00:02:37,233 because that's what we use to plot the curve of a function. 59 00:02:37,233 --> 00:02:40,900 And since our function is linear, it will actually be a straight line. 60 00:02:41,100 --> 00:02:41,466 All right. 61 00:02:41,466 --> 00:02:42,300 So there we go. 62 00:02:42,300 --> 00:02:46,533 Let's first call the pyplot module, plt. 63 00:02:46,933 --> 00:02:50,233 Then from this module we're going to call the plot function. 64 00:02:50,566 --> 00:02:51,666 There you go. 65 00:02:51,666 --> 00:02:54,633 And in this function, same as in the scatter function, 66 00:02:54,633 --> 00:02:59,266 we have to input the coordinates of the points 67 00:02:59,266 --> 00:03:02,100 corresponding to the predictions. So let's do this. According to you, 68 00:03:02,100 --> 00:03:04,866 what is the first coordinate of these predicted salaries? 69 00:03:04,866 --> 00:03:08,266 Well, that's of course X_train. Why is 70 00:03:08,266 --> 00:03:11,933 that? It's because right now we're visualizing the results of the training set. 71 00:03:12,100 --> 00:03:12,566 Okay. 72 00:03:12,566 --> 00:03:13,733 So these correspond 73 00:03:13,733 --> 00:03:17,700 to the number of years of experience of the employees in the training set. 74 00:03:17,700 --> 00:03:19,900 Those are the x coordinates. Okay. 
75 00:03:19,900 --> 00:03:22,600 Now we're going to input the y coordinates. 76 00:03:22,600 --> 00:03:24,666 And according to you, which ones are these? 77 00:03:24,666 --> 00:03:27,000 Well, it's something we haven't created yet. 78 00:03:27,000 --> 00:03:30,066 You know, we actually created the y_pred 79 00:03:30,066 --> 00:03:33,066 variable containing the predicted salaries of the test set. 80 00:03:33,133 --> 00:03:36,800 But we haven't created a vector containing the predicted salaries 81 00:03:36,800 --> 00:03:39,000 of the training set. And that's totally fine. 82 00:03:39,000 --> 00:03:43,333 What we'll actually enter here as the y coordinates of this plot 83 00:03:43,333 --> 00:03:44,433 we're about to make 84 00:03:44,433 --> 00:03:48,066 are going to be exactly the predicted salaries of the training set. 85 00:03:48,066 --> 00:03:53,733 And to get them I'm going to copy this and paste it right here. 86 00:03:53,733 --> 00:03:56,733 But instead of inputting X_test here, 87 00:03:56,733 --> 00:04:02,200 I'm actually going to input X_train, because calling the predict method 88 00:04:02,200 --> 00:04:05,500 on X_train, meaning on the number of years 89 00:04:05,500 --> 00:04:08,766 of experience of the employees in the training set, will get me 90 00:04:08,766 --> 00:04:11,933 exactly the predicted salaries of the training set. 91 00:04:12,100 --> 00:04:15,100 All right, just as before with predict on X_test. 92 00:04:15,433 --> 00:04:17,466 So X_train. Yes. 93 00:04:17,466 --> 00:04:20,666 And then, just as before, we're going to choose a color, 94 00:04:20,766 --> 00:04:24,333 which this time will be blue, right? 95 00:04:24,500 --> 00:04:26,100 So that we can clearly see the difference 96 00:04:26,100 --> 00:04:29,100 between the real salaries and the predicted salaries. 97 00:04:29,100 --> 00:04:31,800 Okay. So this will plot the regression line. 98 00:04:31,800 --> 00:04:33,400 Now, next step: what can we do? 
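The regression line step can be sketched as follows. This is a sketch under assumptions: the lecture only shows a fitted regressor with a predict method, so here it is assumed to be a scikit-learn LinearRegression, and the data values are invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression  # assumption: the regressor's type

# Hypothetical illustrative values, NOT the course's data set.
X_train = np.array([[1.1], [2.0], [3.2], [4.5], [5.9], [7.1], [8.7], [10.3]])
y_train = np.array([39000, 43500, 54500, 61000, 81000, 98000, 109000, 122000])

# Assumed to stand in for the model trained earlier in the lecture series.
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# x coordinates: years of experience in the training set.
# y coordinates: salaries predicted for those same employees, computed
# inline because no training-set prediction vector was stored beforehand.
(line,) = plt.plot(X_train, regressor.predict(X_train), color='blue')
```

Because the model is linear, every predicted point falls on the same straight line, which is why plot draws a line here even though it simply connects the points in the order given.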
99 00:04:33,400 --> 00:04:35,766 So here we're going to add a title. 100 00:04:35,766 --> 00:04:39,200 And to do this we have to call the title function. 101 00:04:39,600 --> 00:04:42,833 And inside the parentheses we just have to input, in quotes, 102 00:04:43,033 --> 00:04:45,133 well, the title we want for our graph. 103 00:04:45,133 --> 00:04:47,733 And so let's say salary 104 00:04:47,733 --> 00:04:50,566 versus experience. Okay. 105 00:04:50,566 --> 00:04:53,566 You can choose any other title if you prefer another one. 106 00:04:53,900 --> 00:04:55,200 So salary versus experience. 107 00:04:55,200 --> 00:04:56,533 And then we're going to specify 108 00:04:56,533 --> 00:05:00,700 the training set, because indeed we'll then do the same for the test set. 109 00:05:01,433 --> 00:05:05,966 Now we're just going to add a label to the x axis and to the y axis. 110 00:05:05,966 --> 00:05:09,500 And for this we have to use the xlabel function, 111 00:05:10,100 --> 00:05:14,966 into which we have to input, well, the label we want to display on the x axis. 112 00:05:15,300 --> 00:05:20,400 And we're going to choose, in quotes of course, years of experience. 113 00:05:21,466 --> 00:05:22,200 All right. 114 00:05:22,200 --> 00:05:23,366 Good. 115 00:05:23,366 --> 00:05:28,433 Now we're going to copy this, because we're going to do the same for the y axis. 116 00:05:28,766 --> 00:05:32,033 And the name of the function for this is ylabel. 117 00:05:32,400 --> 00:05:33,600 And for the ylabel, 118 00:05:33,600 --> 00:05:37,800 well, we will choose to display salary. Okay. 119 00:05:38,233 --> 00:05:43,333 And finally, in order to show the graph, well, we just have to finish here 120 00:05:43,333 --> 00:05:46,533 with plt.show, you know, the show function. 121 00:05:46,866 --> 00:05:49,866 And this will display the graph in the output.
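Putting the whole cell together, one possible version looks like this. Wrapping the calls in a helper named plot_salary_fit is a choice made for this sketch and is not something the lecture does; the data values and the scikit-learn LinearRegression are likewise assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression  # assumption: the regressor's type


def plot_salary_fit(X, y, regressor, set_name):
    """Hypothetical helper: real salaries in red, predicted salaries in blue."""
    plt.scatter(X, y, color='red')                   # real salaries
    plt.plot(X, regressor.predict(X), color='blue')  # regression line
    plt.title(f'Salary vs Experience ({set_name})')
    plt.xlabel('Years of Experience')
    plt.ylabel('Salary')
    ax = plt.gca()  # keep a handle on the axes before the figure is shown
    plt.show()      # displays the graph in the cell output
    return ax


# Hypothetical illustrative values, NOT the course's data set.
X_train = np.array([[1.1], [2.0], [3.2], [4.5], [5.9], [7.1], [8.7], [10.3]])
y_train = np.array([39000, 43500, 54500, 61000, 81000, 98000, 109000, 122000])

regressor = LinearRegression().fit(X_train, y_train)
ax = plot_salary_fit(X_train, y_train, regressor, 'Training set')
```

Passing the set name as an argument keeps the title in step with whichever data is being plotted. Typing the five plt calls directly into the cell, as the lecture does, works just as well.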