So we're going to open our .ipynb file, that is, our Python notebook. Let's open it right now. And there we go. As a reminder, this notebook is in read-only mode. And since, as usual, we want to re-implement this from scratch, we're going to go to File and choose Save a copy in Drive. As you can see, it is creating a copy in which we will be able to re-implement the whole model from scratch. And there we go.

All right. Now, as usual, we're going to delete all the code cells so that we can re-implement everything from scratch. Make sure to delete only the code cells and not the text cells, so that we can still see the structure. And lastly, this one. Perfect.

All right. So this is the whole structure of this implementation. Let's have a look. We're going to start by importing the libraries, then importing the dataset. Then, as I told you, we're going to skip the step where we split the dataset into a training set and a test set. We skip it simply because we want to use as much data as possible for our future prediction of the salary for a position level between 6 and 7. So we skip this step.
Then we move on directly to training the linear regression model on the whole dataset first. We do that for the simple reason that we want to compare the two models, linear regression and polynomial regression, because I want to show you that a polynomial regression model is much better suited to this dataset. So in the end, as you can see, we will predict the salary for a position level between 6 and 7 with both linear regression and polynomial regression. And you will see that we get a much better result with polynomial regression.

Okay. So after training the linear regression model on the whole dataset, we will train the polynomial regression model on the whole dataset as well. Then we will visualize the linear regression results, again on the whole dataset, and then the polynomial regression results. And finally, we will make our last two predictions: predicting a new result with linear regression and predicting a new result with polynomial regression. At the same time, you will learn how to make a single prediction. So far, when making predictions, we passed in all the test set features at once, but you still don't know exactly how to predict the result of a single observation.
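As a preview of the single-prediction skill, here is a minimal sketch using scikit-learn's LinearRegression. It assumes made-up, perfectly linear toy data, not the course dataset. The key point is that even one observation must be passed as a 2D array: one row, one feature column, for example [[6.5]]. A flat list such as [6.5] is rejected with a ValueError.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up, perfectly linear toy data (salary = 10000 * level + 30000); NOT the course dataset.
X = np.arange(1, 11).reshape(-1, 1)   # levels 1..10 as a column: shape (10, 1)
y = 10000 * X.ravel() + 30000         # one target per row: shape (10,)

model = LinearRegression().fit(X, y)

# A single observation must still be 2D: a list containing one row with one feature.
x_new = [[6.5]]
prediction = model.predict(x_new)     # returns a 1D array holding one prediction
print(prediction.shape, prediction[0])

# Passing a 1D input instead is rejected by scikit-learn's input validation.
try:
    model.predict([6.5])
    rejected_1d = False
except ValueError:
    rejected_1d = True
print("1D input rejected:", rejected_1d)
```

The same double-bracket pattern applies later to the polynomial model, where the single row is first transformed into polynomial features.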
And that's exactly what you will learn in this practical activity. That will be a good new skill to have, and of course I will ask you to try to do it on your own before we do it together. So there you go, that's the whole structure, and now we're ready to start.

We're going to tackle the data preprocessing phase, starting by importing the libraries with our data preprocessing template. So I'm copying this and pasting it right here in the new code cell. Then we're going to import the dataset, still using our data preprocessing template, because we will have only one thing to change, which is of course the name of our dataset. No, actually, my mistake: we will have two things to change, and I will explain why right away. So first, let's change the obvious thing, which is the name of the dataset. Let's look at the name of the dataset again: Position Salaries. All right. So I'm going to replace "Data" here with "Position_Salaries". Great. And the second thing I wanted to change is, of course, the matrix of features X.
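For reference, the loading step can be sketched like this. It assumes the template reads a CSV with pandas and that the uploaded file is named Position_Salaries.csv. So that the sketch runs without the upload, it reads a small in-memory stand-in instead. The column names (Position, Level, Salary) and all the values below are illustrative assumptions, not the real file's contents.

```python
import io

import pandas as pd

# Hypothetical stand-in for the uploaded CSV so this sketch runs anywhere.
# Column names and values are illustrative assumptions, not the real dataset.
CSV_TEXT = """Position,Level,Salary
Position A,1,40000
Position B,2,45000
Position C,3,55000
Position D,4,70000
Position E,5,100000
"""

# In the notebook, the first change to the template's read line is only the file name:
#   dataset = pd.read_csv('Data.csv')  ->  dataset = pd.read_csv('Position_Salaries.csv')
dataset = pd.read_csv(io.StringIO(CSV_TEXT))

print(dataset.shape)              # rows, columns
print(dataset.columns.tolist())   # column names parsed from the header line
```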
Remember that, as written, this line automatically takes all the columns except the last one. But here the first two columns are redundant: it's as if we had already done the label encoding, turning each of these positions into an integer from 1 to 10. I left the position column in because it's nice to see the names of the different positions, from Business Analyst to CEO, but we don't want to include it in X. Therefore, we will start taking columns from the second one, which has index 1, because indexes in Python start from zero. So we're only going to take the level column. To do so, we replace the empty lower bound, which means "start at the first index", with 1. The slice then takes all the columns from the second one (index 1) up to, but excluding, the last one. In other words, out of the position, level, and salary columns, it keeps just the level column. A simple trick.

All right, so now we're all good. The second line still takes the last column, the dependent variable vector containing all the salaries. So we're all good.
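The slicing trick can be sketched with pandas iloc. This assumes the template builds X and y with `dataset.iloc[...]` and `.values`, and uses a small made-up DataFrame with the same three-column layout. Changing the lower bound from empty to 1 drops the text column. Because the slice is a range (1:-1) rather than a single index, X stays a 2D matrix with one column, which is the shape scikit-learn expects for features.

```python
import pandas as pd

# Illustrative DataFrame with the same three-column layout (values are made up).
dataset = pd.DataFrame({
    "Position": ["Analyst", "Consultant", "Manager", "Partner", "CEO"],
    "Level": [1, 2, 3, 4, 5],
    "Salary": [40000, 50000, 70000, 120000, 300000],
})

# Template default: every column except the last one (this would include the text column).
X_template = dataset.iloc[:, :-1].values
# Adjusted: start at index 1 to skip the redundant Position column, stop before the last one.
X = dataset.iloc[:, 1:-1].values
# Dependent variable vector: only the last column (the salaries).
y = dataset.iloc[:, -1].values

print(X_template.shape, X.shape, y.shape)
```

By contrast, `dataset.iloc[:, 1].values` would return a 1D array. It would have to be reshaped before fitting a model.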
And now we can move on to the next step, which will be to train the linear regression model on the whole dataset. Great. Should we execute these cells now? Yes, let's do it now so that it's done. Let's click this folder icon here to be able to upload the file. Right now it is connecting to a runtime to enable file browsing, and in a second... there we go. We should be able to upload the dataset. All right. So we're going to go to our machine learning folder, the one I always keep on my desktop. Then Part 2 - Regression, then Section 6 - Polynomial Regression, then Python. And there you go. Let's upload our Position_Salaries dataset. All good, now we have it, and we can run these first two cells. First, importing the libraries: done. And now importing the dataset: done as well.

Now we can move on to the next step: training the linear regression model on the whole dataset. Try to do it on your own, because you already know how to implement the linear regression model.
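If you want to check your attempt, here is one possible sketch of training a linear regression model on the whole dataset with scikit-learn's LinearRegression, with no train/test split. It assumes made-up toy data that grows roughly exponentially with the level. That shape is an assumption chosen only to mimic salaries that rise sharply at higher levels; these are not the course values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up, sharply increasing toy data (roughly exponential); not the course dataset.
X = np.arange(1, 11, dtype=float).reshape(-1, 1)   # levels 1..10, shape (10, 1)
y = 40000 * 1.4 ** X.ravel()                       # salaries, shape (10,)

# No train/test split: fit on every row, as described in the video.
lin_reg = LinearRegression()
lin_reg.fit(X, y)

print("slope:", lin_reg.coef_[0], "intercept:", lin_reg.intercept_)
# R^2 measured on the same data used for fitting (there is no held-out set here).
print("R^2 on the full data:", lin_reg.score(X, y))
```

Since every row is used for fitting, the R² score here only describes how well a straight line fits the training data. It is not an estimate of performance on unseen data.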
I trust you will smash this; there is no trap, and it will be a good way to practice again. As soon as you're ready, let's move on together to the next tutorial to build that first linear regression model, and then the polynomial regression model. Until then, enjoy machine learning!
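To preview where the comparison is heading, here is a sketch of one common way to build a polynomial regression in scikit-learn. It expands the single level feature with PolynomialFeatures into [1, x, x², x³, x⁴], then fits an ordinary LinearRegression on those columns. It reuses the same made-up exponential-style toy data as a stand-in for the real salaries. The degree of 4 is an assumption for illustration; the video has not chosen a degree at this point.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Same made-up toy data as before (not the course dataset).
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = 40000 * 1.4 ** X.ravel()

lin_reg = LinearRegression().fit(X, y)

# Polynomial regression = linear regression on expanded features [1, x, x^2, x^3, x^4].
# degree=4 is an illustrative assumption.
poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(X)                 # shape (10, 5)
poly_reg = LinearRegression().fit(X_poly, y)

# Single prediction at level 6.5: note the 2D input and the same feature transform.
level = [[6.5]]
lin_pred = lin_reg.predict(level)[0]
poly_pred = poly_reg.predict(poly.transform(level))[0]
true_value = 40000 * 1.4 ** 6.5                # known only because the toy data is synthetic

print("linear:", lin_pred, "polynomial:", poly_pred, "toy truth:", true_value)
print("R^2 linear:", lin_reg.score(X, y), "R^2 polynomial:", poly_reg.score(X_poly, y))
```

The model stays "linear" in its coefficients. Only the input features are nonlinear, which is why the same LinearRegression class can be reused.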