1 00:00:00,133 --> 00:00:01,166 So great. 2 00:00:01,166 --> 00:00:02,700 That's it for the data set. 3 00:00:02,700 --> 00:00:05,300 And now we're going to move on to the implementation: 4 00:00:05,300 --> 00:00:06,966 the simple linear regression 5 00:00:06,966 --> 00:00:10,600 IPython notebook, which we're going to open with Google Colaboratory. 6 00:00:10,800 --> 00:00:12,500 So that's simple linear regression. 7 00:00:12,500 --> 00:00:15,500 Let's see how this implementation is structured. 8 00:00:15,733 --> 00:00:18,000 First we're going to import the libraries. 9 00:00:18,000 --> 00:00:19,766 You recognize the first steps of the data 10 00:00:19,766 --> 00:00:22,466 preprocessing phase which we covered in part one. 11 00:00:22,466 --> 00:00:25,333 Second we're going to import the data set. 12 00:00:25,333 --> 00:00:28,566 Then third we're going to split that data set into the training set 13 00:00:28,566 --> 00:00:29,866 and test set. 14 00:00:29,866 --> 00:00:32,700 Then we're going to train the simple linear regression 15 00:00:32,700 --> 00:00:34,233 model on the training set. 16 00:00:34,233 --> 00:00:36,566 Then we're going to predict the test results. 17 00:00:36,566 --> 00:00:37,866 Then we're going to visualize the training 18 00:00:37,866 --> 00:00:40,066 set results and visualize the test results. 19 00:00:40,066 --> 00:00:41,100 But I don't want to show you this 20 00:00:41,100 --> 00:00:44,566 too much right now because I want to keep the surprise for the end. 21 00:00:44,933 --> 00:00:48,000 And so what we're going to do now is implement 22 00:00:48,133 --> 00:00:51,666 this whole simple linear regression model from scratch together. 23 00:00:52,333 --> 00:00:56,166 And to do so, we are going to create a copy of this file 24 00:00:56,333 --> 00:00:59,833 by clicking here on Save a copy in Drive.
25 00:01:00,100 --> 00:01:04,066 This will create a new copy on your drive, in which you will be able 26 00:01:04,066 --> 00:01:07,400 to do some modifications or implement the new model. 27 00:01:07,966 --> 00:01:09,766 Now we just have to do one thing. 28 00:01:09,766 --> 00:01:13,800 It is to remove all the code cells because we are going to re-implement them 29 00:01:13,800 --> 00:01:16,133 from scratch step by step. So let's do this. 30 00:01:16,133 --> 00:01:20,433 We just need to click this trash button here on only the code cells. 31 00:01:20,433 --> 00:01:23,500 Make sure to keep the text in order to highlight, you know, 32 00:01:23,700 --> 00:01:26,700 the structure of this implementation. 33 00:01:26,800 --> 00:01:27,700 So there we go. 34 00:01:27,700 --> 00:01:30,700 Let's just delete everything here. 35 00:01:30,933 --> 00:01:33,066 And one last one. Perfect. 36 00:01:33,066 --> 00:01:35,700 So now we have the whole structure of the implementation. 37 00:01:35,700 --> 00:01:39,100 You can clearly see the different steps that we're going to implement together. 38 00:01:39,366 --> 00:01:42,300 I will ask you to implement some of them on your own. 39 00:01:42,300 --> 00:01:44,866 And of course then we will implement the solution together. 40 00:01:44,866 --> 00:01:47,166 But that's because I really want you to learn by doing. 41 00:01:47,166 --> 00:01:49,100 I really want you to take action 42 00:01:49,100 --> 00:01:52,100 and try to implement some parts of the models on your own. 43 00:01:52,800 --> 00:01:53,533 All right. Perfect. 44 00:01:53,533 --> 00:01:57,366 So now I want you to think about what is going to be our first step here 45 00:01:57,833 --> 00:02:00,833 based on, you know, what we covered in part one.
46 00:02:01,200 --> 00:02:04,500 Well, our first step is obviously the data 47 00:02:04,500 --> 00:02:08,166 preprocessing phase in which we have to import the data set 48 00:02:08,400 --> 00:02:12,800 and maybe use some tools in order to preprocess it the right way, 49 00:02:12,933 --> 00:02:14,733 so that our future simple linear 50 00:02:14,733 --> 00:02:18,200 regression model can be ready to be trained on this data set. 51 00:02:18,833 --> 00:02:19,533 All right. 52 00:02:19,533 --> 00:02:22,966 And now I'm excited because I'm going to show you the efficiency 53 00:02:23,200 --> 00:02:26,966 of the data preprocessing template and how easy and fast 54 00:02:26,966 --> 00:02:30,266 it will be for us to preprocess our data set here. 55 00:02:30,500 --> 00:02:34,133 So what we're going to do now is we're going to go to that previous 56 00:02:34,133 --> 00:02:38,966 folder, you know, the data preprocessing folder which was in part 57 00:02:39,300 --> 00:02:43,633 one, data preprocessing, in the Python section. 58 00:02:43,900 --> 00:02:44,633 And there we go. 59 00:02:44,633 --> 00:02:47,633 Now we're going to open that template. 60 00:02:48,266 --> 00:02:49,566 Perfect. 61 00:02:49,566 --> 00:02:52,866 And you're going to see we're simply going to do some copy-paste. 62 00:02:53,200 --> 00:02:56,066 And you'll see that we will only have one thing to change. 63 00:02:56,066 --> 00:02:59,666 And then the data preprocessing phase will be ready and done. 64 00:02:59,700 --> 00:03:03,666 It will be done for us to move on with the next step, which will be, 65 00:03:03,666 --> 00:03:07,700 of course, to train the simple linear regression model on the training set. 66 00:03:07,966 --> 00:03:08,633 All right. 67 00:03:08,633 --> 00:03:10,866 So now I'm going to show you what I've just said. 68 00:03:10,866 --> 00:03:13,866 I'm simply going to do some copy-paste here.
69 00:03:13,866 --> 00:03:17,000 There we go, first to import the libraries, 70 00:03:17,000 --> 00:03:20,233 and then adding them in a new code cell. 71 00:03:20,233 --> 00:03:26,766 Then to import the data set I'm simply going to copy-paste this second code cell. 72 00:03:26,766 --> 00:03:29,766 And you'll see that I will have almost nothing to change. 73 00:03:30,033 --> 00:03:33,033 So creating a new code cell here, pasting it. 74 00:03:33,133 --> 00:03:33,966 And finally, 75 00:03:35,033 --> 00:03:37,000 let's do that final 76 00:03:37,000 --> 00:03:40,700 step of the data preprocessing phase to split the data 77 00:03:40,700 --> 00:03:45,166 set into the training set and the test set so that we have, I remind you, separate 78 00:03:45,400 --> 00:03:49,200 entities where we're going to train the model and then separately evaluate it. 79 00:03:49,500 --> 00:03:51,600 So there we go. Let's paste it. 80 00:03:51,600 --> 00:03:54,900 And now as I've told you, we only have one little thing 81 00:03:54,900 --> 00:03:58,200 to change, which is of course the name of the data set. 82 00:03:58,500 --> 00:04:01,466 And I remind you, the name of the data set is, well, 83 00:04:01,466 --> 00:04:05,200 we're going to go back to our regression folder here 84 00:04:05,600 --> 00:04:09,700 very quickly: part two regression, simple linear regression, Python. 85 00:04:09,700 --> 00:04:11,100 And there we go. Here we are. 86 00:04:11,100 --> 00:04:14,166 And as you can see this is Salary_Data. 87 00:04:14,200 --> 00:04:17,733 So the only thing I have to change here is the name of the data set. 88 00:04:18,000 --> 00:04:20,333 And then voilà, everything is ready. 89 00:04:20,333 --> 00:04:22,600 The data preprocessing phase is ready. 90 00:04:22,600 --> 00:04:25,900 Because indeed this takes automatically the features 91 00:04:25,900 --> 00:04:28,900 because it selects all the columns except the last one.
92 00:04:28,933 --> 00:04:32,100 And this selects automatically the dependent variable vector 93 00:04:32,366 --> 00:04:33,866 because it selects the last column. 94 00:04:33,866 --> 00:04:36,133 So here of course we have only one feature, 95 00:04:36,133 --> 00:04:39,566 therefore one column for the features and one column for the dependent variable. 96 00:04:39,733 --> 00:04:42,566 But you will see, for example, that for multiple linear regression 97 00:04:42,566 --> 00:04:43,800 it will be exactly the same. 98 00:04:43,800 --> 00:04:46,633 We will have absolutely nothing to change here. 99 00:04:46,633 --> 00:04:47,600 Okay. 100 00:04:47,600 --> 00:04:50,100 So now let me show you the beauty of the result. 101 00:04:50,100 --> 00:04:53,433 Don't forget of course to upload the data set before, 102 00:04:53,733 --> 00:04:55,733 you know, running these cells here. 103 00:04:55,733 --> 00:04:59,966 So to upload the data set you indeed need to click this folder here. 104 00:05:00,300 --> 00:05:03,600 Then it's going to connect to a runtime to enable file browsing, 105 00:05:03,900 --> 00:05:07,100 after which you'll be able to upload your data set. 106 00:05:07,133 --> 00:05:08,166 There we go. 107 00:05:08,166 --> 00:05:10,366 So you click this upload button here. 108 00:05:10,366 --> 00:05:14,333 Then you go to, you know, this whole Machine Learning A-Z folder 109 00:05:14,333 --> 00:05:17,600 which was provided to you in just the previous tutorial. 110 00:05:17,800 --> 00:05:20,100 You know, which you had to download if you hadn't already. 111 00:05:20,100 --> 00:05:22,400 So indeed, we're going to go inside this folder. 112 00:05:22,400 --> 00:05:24,666 Then we're going to go to part two regression, 113 00:05:24,666 --> 00:05:27,333 then simple linear regression, then Python. 114 00:05:27,333 --> 00:05:28,266 And there we go. 115 00:05:28,266 --> 00:05:31,066 We select our data set, Salary_Data.
116 00:05:31,066 --> 00:05:34,866 We open it and now it is indeed in Google Colab. 117 00:05:35,133 --> 00:05:36,200 So perfect. 118 00:05:36,200 --> 00:05:39,900 And now we're simply going to execute each of these cells: first 119 00:05:39,900 --> 00:05:44,333 importing the libraries, then importing the data set. 120 00:05:44,666 --> 00:05:45,233 Perfect. 121 00:05:45,233 --> 00:05:49,500 And now splitting the data set into the training set and test set, and done. 122 00:05:49,766 --> 00:05:54,300 Now the data preprocessing phase is over and we can move on to the next step: 123 00:05:54,466 --> 00:05:57,900 training the simple linear regression model on the training set.
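The three preprocessing cells described in this video can be sketched roughly as follows. This is a minimal sketch, not the exact template shown on screen. The file name `Salary_Data.csv`, the column names, the sample values, and the `test_size` and `random_state` settings are all assumptions. An in-memory CSV stands in for the uploaded file so the snippet runs on its own.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up sample rows standing in for the uploaded data set.
# In the Colab notebook this would presumably be:
#   dataset = pd.read_csv('Salary_Data.csv')   # file name is an assumption
csv_text = """YearsExperience,Salary
1.1,39000
2.0,43500
3.2,54000
4.0,56000
5.1,66000
6.0,93000
7.1,98000
8.2,113000
9.0,105000
10.5,121000
"""
dataset = pd.read_csv(io.StringIO(csv_text))

# Features: every column except the last one.
X = dataset.iloc[:, :-1].values
# Dependent variable vector: the last column only.
y = dataset.iloc[:, -1].values

# Hold out part of the data for evaluation.
# test_size=0.2 and random_state=0 are assumed values, not taken from the video.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X.shape, y.shape)
print(X_train.shape, X_test.shape)
```

Because the features are selected as "all columns except the last" and the dependent variable as "the last column", the same two `iloc` lines should keep working when a data set has several feature columns, which is the point made about multiple linear regression.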