1 00:00:00,100 --> 00:00:00,900 Hello my friends. 2 00:00:00,900 --> 00:00:01,300 All right. 3 00:00:01,300 --> 00:00:02,933 So let's implement quickly 4 00:00:02,933 --> 00:00:06,733 and efficiently the linear regression model on the whole data set. 5 00:00:06,966 --> 00:00:10,566 First, a quick reminder of the linear regression model 6 00:00:10,566 --> 00:00:11,766 that we're about to build. 7 00:00:11,766 --> 00:00:14,033 We're simply about to build this one, right? 8 00:00:14,033 --> 00:00:17,400 Because we have only one feature, which is the position level, 9 00:00:17,533 --> 00:00:21,033 and the dependent variable, which is of course the salary to predict. 10 00:00:21,066 --> 00:00:21,866 All right. 11 00:00:21,866 --> 00:00:23,400 So that's what we'll do first. 12 00:00:23,400 --> 00:00:27,000 And then we'll of course implement the polynomial regression model with, 13 00:00:27,133 --> 00:00:31,733 I will tell you how many, powers of that same position level feature. 14 00:00:32,000 --> 00:00:32,366 Okay. 15 00:00:32,366 --> 00:00:36,100 So let's do this, starting by creating a new code cell. 16 00:00:36,733 --> 00:00:37,166 All right. 17 00:00:37,166 --> 00:00:41,066 So I hope you did it yourself first to make sure you refresh your skills. 18 00:00:41,333 --> 00:00:46,766 And so if that's the case, of course you started from the scikit-learn library, 19 00:00:46,766 --> 00:00:50,100 because that's the library that contains this linear 20 00:00:50,100 --> 00:00:53,100 regression class, which allows us to build a linear regression model. 21 00:00:53,166 --> 00:00:55,400 So scikit-learn, from which we're going 22 00:00:55,400 --> 00:00:59,300 to get access to the linear_model module, which contains this class. 23 00:00:59,500 --> 00:01:02,400 And from this linear_model module we're going to import 24 00:01:02,400 --> 00:01:06,766 the LinearRegression class, as Google Colab 25 00:01:06,766 --> 00:01:09,433 guessed perfectly. Perfect. So that's the first step.
26 00:01:09,433 --> 00:01:11,866 Then we're going to create an object of this class. 27 00:01:11,866 --> 00:01:15,400 So I know we used to call it regressor, but this time, 28 00:01:15,400 --> 00:01:18,000 since we're going to have two regressors, you know, we're going to have 29 00:01:18,000 --> 00:01:21,666 the linear regression regressor and the polynomial regression regressor. 30 00:01:21,666 --> 00:01:27,100 Therefore this time I'm going to call it lin_reg, for linear regressor, okay? 31 00:01:27,266 --> 00:01:32,600 So lin_reg will be created as an object of this LinearRegression class. 32 00:01:32,600 --> 00:01:36,066 So I'm copying and pasting that here, adding some parentheses. 33 00:01:36,233 --> 00:01:39,666 And remember, we don't have to input anything in these parentheses 34 00:01:39,666 --> 00:01:42,866 because there is not much to tune in the linear regression model. 35 00:01:43,633 --> 00:01:44,166 Good. 36 00:01:44,166 --> 00:01:47,466 And finally, now we have, you know, the linear regression model. 37 00:01:47,466 --> 00:01:49,100 But it is not smart yet. 38 00:01:49,100 --> 00:01:52,400 It is not trained yet on this data set, 39 00:01:52,633 --> 00:01:55,433 you know, to understand and learn the correlations 40 00:01:55,433 --> 00:01:57,833 between the position levels and the salaries. 41 00:01:57,833 --> 00:02:01,500 So that's what we're going to do right now by using this fit method, 42 00:02:01,600 --> 00:02:06,666 which is exactly a training method that will train the model on these data. 43 00:02:06,866 --> 00:02:07,700 All right. 44 00:02:07,700 --> 00:02:11,133 So to do this we're going to call our object first, lin_reg, 45 00:02:11,466 --> 00:02:16,133 from which we're going to call the fit method, 46 00:02:16,333 --> 00:02:20,933 which, remember, has to take as input the pair of the matrix 47 00:02:20,933 --> 00:02:24,700 of features and the dependent variable vector of the training set. 48 00:02:24,766 --> 00:02:26,000 Right.
That's what we did before. 49 00:02:26,000 --> 00:02:30,200 But remember here that we didn't split the data set into a training set 50 00:02:30,200 --> 00:02:31,733 and a test set, because we want to leverage 51 00:02:31,733 --> 00:02:34,733 the maximum data in order to train our model. 52 00:02:34,866 --> 00:02:38,200 And therefore this time we're going to take the whole matrix of features X 53 00:02:38,200 --> 00:02:40,433 and the whole dependent variable vector y. All right. 54 00:02:40,433 --> 00:02:45,066 So here we just input X and y, and there we go. 55 00:02:45,100 --> 00:02:48,300 We built the linear regression model in a flash. 56 00:02:48,300 --> 00:02:50,700 So we're going to execute this. 57 00:02:50,700 --> 00:02:51,600 Here we go. 58 00:02:51,600 --> 00:02:57,100 And now we have not only our model but also a trained model on this data set. 59 00:02:57,566 --> 00:02:58,700 Okay, great. 60 00:02:58,700 --> 00:03:00,000 Now we're going to close this. 61 00:03:00,000 --> 00:03:02,200 And we're going to focus on 62 00:03:02,200 --> 00:03:06,633 the heart of the matter, which is the polynomial regression model. 63 00:03:06,633 --> 00:03:10,233 I'm going to teach you now how to build the polynomial regression model. 64 00:03:10,933 --> 00:03:13,000 All right. So we're going to create a new code cell. 65 00:03:13,000 --> 00:03:14,133 And now 66 00:03:14,133 --> 00:03:17,966 we're going to go back to this slide to explain exactly what we're going to do. 67 00:03:18,566 --> 00:03:22,400 All right, so as you understand, what we've just built so far is this model. 68 00:03:22,500 --> 00:03:23,666 We only have one feature, 69 00:03:23,666 --> 00:03:25,200 the position levels, and the dependent 70 00:03:25,200 --> 00:03:28,200 variable vector to predict, which are the salaries. 71 00:03:28,200 --> 00:03:31,666 Now what we're going to do is we're going to create a multiple linear 72 00:03:31,666 --> 00:03:32,833 regression model.
73 00:03:32,833 --> 00:03:36,600 But instead of having different features, you know, x1, x2 and xn, 74 00:03:36,700 --> 00:03:42,300 well, these features will be x1, x1 squared and x1 to the power of n. 75 00:03:42,300 --> 00:03:46,866 And we'll actually tune this parameter a bit to try several powers. 76 00:03:47,200 --> 00:03:49,466 So, not to be confused: polynomial 77 00:03:49,466 --> 00:03:54,100 linear regression is not a linear model in x1, because you will see that it can learn 78 00:03:54,100 --> 00:03:57,600 some nonlinear correlations, but we call it polynomial linear regression 79 00:03:57,600 --> 00:04:02,466 because indeed there is this linear combination of the squared and, 80 00:04:02,466 --> 00:04:06,633 you know, powered features x1, x1 squared and x1 to the power of n. 81 00:04:07,033 --> 00:04:07,700 All right. 82 00:04:07,700 --> 00:04:11,333 And so the process of building this model in Python 83 00:04:11,500 --> 00:04:15,500 will be first to create a matrix of the powered features, 84 00:04:15,500 --> 00:04:17,833 you know, a matrix of features, but not containing 85 00:04:17,833 --> 00:04:22,200 different features like x1, x2 and xn, but a matrix of features containing x1 86 00:04:22,200 --> 00:04:25,433 as a first feature, then x1 squared as a second feature, 87 00:04:25,433 --> 00:04:28,533 and then x1 to the power of n as an nth feature. 88 00:04:28,800 --> 00:04:32,800 So that will be our matrix of features, and we will call it X_poly. 89 00:04:33,300 --> 00:04:35,966 And then we will create a linear 90 00:04:35,966 --> 00:04:38,966 regressor object, you know, from the LinearRegression class, 91 00:04:39,233 --> 00:04:43,200 to integrate these powered features of this matrix of features 92 00:04:43,366 --> 00:04:46,600 in this new linear regressor. You see the idea? 93 00:04:46,800 --> 00:04:48,666 So it's a building process in two steps.
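To make the matrix of powered features concrete, here is a minimal plain-Python sketch of what such a matrix looks like. The helper name `powered_features` and the sample position levels are hypothetical, chosen only for illustration; this is not how scikit-learn builds it internally.

```python
def powered_features(levels, n):
    """Return one row [x1, x1**2, ..., x1**n] per position level x1."""
    return [[x1 ** power for power in range(1, n + 1)] for x1 in levels]


# Hypothetical position levels, used only to show the shape of the matrix.
levels = [1, 2, 3]

# With n = 3, each row holds x1, x1 squared and x1 cubed.
X_poly = powered_features(levels, 3)

for row in X_poly:
    print(row)
```

Each row keeps the original feature in its first column, so a linear regressor trained on this matrix is still computing a linear combination, only of the powers of x1 rather than of distinct features.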
94 00:04:48,666 --> 00:04:50,600 We're going to first create the matrix of features 95 00:04:50,600 --> 00:04:53,600 containing these features at different powers. 96 00:04:53,666 --> 00:04:57,566 And then we'll integrate that into a linear regression model, 97 00:04:57,600 --> 00:05:01,433 because indeed this is a linear combination of these powered features. 98 00:05:02,066 --> 00:05:03,000 All right, perfect. 99 00:05:03,000 --> 00:05:05,466 So let's do this first step. 100 00:05:05,466 --> 00:05:08,533 The first step is to actually import this tool 101 00:05:08,533 --> 00:05:12,366 that will allow us to create this matrix of powered features, 102 00:05:12,366 --> 00:05:15,366 you know, x1, x1 squared and x1 to the power of n. 103 00:05:15,466 --> 00:05:18,000 And this class is called PolynomialFeatures. 104 00:05:18,000 --> 00:05:19,466 So we're going to import it first, 105 00:05:19,466 --> 00:05:24,000 which we can get from, once again, of course, the scikit-learn library. 106 00:05:24,233 --> 00:05:25,166 There we go. 107 00:05:25,166 --> 00:05:30,133 But this time we're going to get access to the preprocessing module, 108 00:05:30,133 --> 00:05:32,766 which contains that PolynomialFeatures class. 109 00:05:32,766 --> 00:05:37,666 This makes sense because indeed we're kind of preprocessing our x1 feature, 110 00:05:37,666 --> 00:05:43,433 because we want to create out of x1, well, x1, x1 squared and x1 to the power of n. 111 00:05:43,433 --> 00:05:46,433 So we're kind of doing preprocessing of the features. 112 00:05:46,700 --> 00:05:47,066 All right. 113 00:05:47,066 --> 00:05:49,900 So from this preprocessing module we're going to import, 114 00:05:49,900 --> 00:05:54,533 well, that class, PolynomialFeatures. 115 00:05:54,533 --> 00:05:54,900 Perfect.
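The steps described so far can be sketched roughly as follows. This is a hedged sketch, not the exact notebook from the video: the small `X` and `y` arrays are made-up stand-ins for the position levels and salaries, the names `poly_reg` and `lin_reg_2` are hypothetical, and `degree=2` is an assumed value, since the number of powers has not been chosen yet at this point.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Made-up stand-in data: position levels (one feature) and salaries.
X = np.array([[1], [2], [3], [4], [5]], dtype=float)
y = np.array([45000, 50000, 60000, 80000, 110000], dtype=float)

# Linear regression trained on the whole data set (no train/test split).
lin_reg = LinearRegression()
lin_reg.fit(X, y)

# Step 1: build the matrix of powered features.
# degree=2 is an assumption made for this sketch.
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)

# Step 2: train a second linear regressor on the powered features.
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)

print(X_poly)
```

Note that, with its default settings, `PolynomialFeatures` also prepends a constant column of ones, so each row of `X_poly` here is `[1, x1, x1**2]` rather than just `[x1, x1**2]`.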