1 00:00:00,466 --> 00:00:01,066 And this little 2 00:00:01,066 --> 00:00:04,500 something that we're going to add is actually the polynomial features. 3 00:00:05,000 --> 00:00:09,000 The polynomial features are actually some additional independent variables 4 00:00:09,266 --> 00:00:12,366 that are going to be the level squared, the level cubed, the level 5 00:00:12,366 --> 00:00:15,800 to the fourth, the level to the fifth, up to any degree you want. 6 00:00:16,200 --> 00:00:18,833 These are the additional independent variables. 7 00:00:18,833 --> 00:00:22,466 They are going to compose our new matrix of features in some way, 8 00:00:22,666 --> 00:00:24,533 which will actually be the matrix 9 00:00:24,533 --> 00:00:28,033 on which we will apply a multiple linear regression model, 10 00:00:28,366 --> 00:00:31,366 which will make the whole model a polynomial regression model. 11 00:00:31,666 --> 00:00:36,233 So in short, a polynomial regression model is a multiple linear regression 12 00:00:36,233 --> 00:00:40,500 model that is composed of one independent variable and additional independent 13 00:00:40,500 --> 00:00:44,633 variables that are the polynomial terms of this first independent variable. 14 00:00:45,166 --> 00:00:49,066 Okay, so now that we get this idea, you'll perfectly understand 15 00:00:49,066 --> 00:00:52,833 what we're about to do right now when building our polynomial regression model. 16 00:00:53,100 --> 00:00:57,600 Because what we are going to do is to build these polynomial terms. 17 00:00:57,966 --> 00:01:02,400 And to that end, what we're simply going to do is to add specifically 18 00:01:02,633 --> 00:01:06,366 a new column here in this data set, which will be the level squared. 19 00:01:06,366 --> 00:01:09,866 So we're going to call this new independent variable Level2. 20 00:01:10,266 --> 00:01:12,733 And let's add this new column right now. 21 00:01:12,733 --> 00:01:16,200 So to add a column in a data frame we need to take our data frame. 
22 00:01:16,200 --> 00:01:18,300 So it's dataset. 23 00:01:18,300 --> 00:01:19,300 Here it is. 24 00:01:19,300 --> 00:01:21,766 And then we need to add a dollar sign here. 25 00:01:21,766 --> 00:01:26,100 And then we can add any name here, which will create a new column of this name 26 00:01:26,100 --> 00:01:28,200 and add it to our data set. 27 00:01:28,200 --> 00:01:31,133 So we're going to call this column Level2 28 00:01:31,133 --> 00:01:34,200 because we're taking the level squared of our ten levels. 29 00:01:34,833 --> 00:01:36,166 Then we add equals. 30 00:01:36,166 --> 00:01:38,466 And then we actually need to give the formula 31 00:01:38,466 --> 00:01:41,900 of the values in this Level2 column of our data set. 32 00:01:42,300 --> 00:01:47,366 And this formula is therefore the levels in our data set to the power of two. 33 00:01:47,600 --> 00:01:49,666 So very simply we're going to take 34 00:01:49,666 --> 00:01:53,000 all the levels of our data set by taking our data set 35 00:01:53,133 --> 00:01:55,200 and again adding this dollar sign here 36 00:01:55,200 --> 00:01:59,000 to take the Level column, which is here. I just have to press Enter. 37 00:01:59,400 --> 00:02:01,900 So by doing this I'm taking the Level column, 38 00:02:01,900 --> 00:02:04,900 that is, all the values of the Level column in our data set. 39 00:02:05,266 --> 00:02:09,766 And now I'm going to take the square of all these levels in our data set. 40 00:02:10,166 --> 00:02:13,166 And so to do this, very simply, I just add a hat 41 00:02:13,166 --> 00:02:17,666 and two. That's it. That will just create a new column 42 00:02:17,666 --> 00:02:20,666 which will contain the squares of our ten levels in our data set. 43 00:02:20,866 --> 00:02:24,500 So let's see. Right now my data set contains only two columns, 44 00:02:24,833 --> 00:02:28,200 Level and Salary. And if I execute this, 45 00:02:29,966 --> 00:02:30,766 here we go. 46 00:02:30,766 --> 00:02:32,733 Now let's have a look at the data set. 
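The column creation narrated above can be sketched in R roughly as follows. This is a minimal sketch, assuming a data frame named dataset with columns Level and Salary; the Salary values are made-up placeholders, not the data used in the tutorial, and the exact spelling of the Level2 name is inferred from the narration.

```r
# Hypothetical stand-in for the tutorial's data set:
# ten levels and placeholder salaries (illustrative values only).
dataset = data.frame(
  Level  = 1:10,
  Salary = c(45, 50, 60, 80, 110, 150, 200, 300, 500, 1000) * 1000
)

# dataset$Level selects the Level column; ^2 squares every value in it.
# Assigning to dataset$Level2 creates a new column with that name.
dataset$Level2 = dataset$Level^2

# The data frame now has three columns: Level, Salary and Level2.
print(dataset)
```

Because `^` is applied element by element, no loop is needed: the whole column is squared in one expression.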
47 00:02:32,733 --> 00:02:37,533 As you can see, I now have three columns: Level, Salary and Level2. 48 00:02:37,833 --> 00:02:41,033 And as you can notice, the values in the Level2 column 49 00:02:41,033 --> 00:02:44,033 are the squares of the values in the Level column. 50 00:02:44,500 --> 00:02:45,833 Okay. 51 00:02:45,833 --> 00:02:48,533 And now we can build our polynomial regression model. 52 00:02:48,533 --> 00:02:52,833 Because now this dot here not only contains the Level column 53 00:02:53,066 --> 00:02:55,166 but also the Level2 column. 54 00:02:55,166 --> 00:02:58,266 So this will build a multiple linear regression model 55 00:02:58,466 --> 00:03:02,900 where the independent variables are the original independent variable 56 00:03:03,200 --> 00:03:06,733 and the polynomial term of this first independent variable. 57 00:03:07,000 --> 00:03:09,766 And if you want to build a polynomial regression 58 00:03:09,766 --> 00:03:12,900 model with a higher degree, well, you will need to do the same here. 59 00:03:12,900 --> 00:03:16,633 You know, we can copy this line and paste it here, 60 00:03:16,800 --> 00:03:21,300 and just add a Level3 column that will contain the cubes 61 00:03:21,300 --> 00:03:26,100 of the levels in the original independent variable Level of our data set. 62 00:03:26,466 --> 00:03:28,333 And as you can see, it'll be very easy. 63 00:03:28,333 --> 00:03:30,700 I just need to execute this line. 64 00:03:30,700 --> 00:03:34,600 And this will add a Level3 column to our data set. 65 00:03:34,600 --> 00:03:37,600 And so now this little dot here will be 66 00:03:37,966 --> 00:03:42,133 the original independent variable Level, the squared values of our levels 67 00:03:42,133 --> 00:03:46,600 in Level2 and the cubed values of our levels in the column Level3. 68 00:03:47,400 --> 00:03:51,433 You can add as many powers of levels as you want, but maybe we will stop here. 
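To make the role of the dot concrete, here is a small R sketch of what it expands to once the squared and cubed columns exist. It assumes the same hypothetical dataset as before (placeholder Salary values, column names inferred from the narration) and uses base R's terms() only to inspect the formula; the narration itself does not show this inspection step.

```r
# Hypothetical stand-in for the tutorial's data set (illustrative values only).
dataset = data.frame(
  Level  = 1:10,
  Salary = c(45, 50, 60, 80, 110, 150, 200, 300, 500, 1000) * 1000
)

# Polynomial terms of the original independent variable.
dataset$Level2 = dataset$Level^2
dataset$Level3 = dataset$Level^3

# In a formula, the dot stands for every column of the data frame
# except the response on the left-hand side (here, Salary).
predictors = attr(terms(Salary ~ ., data = dataset), "term.labels")
print(predictors)
```

Adding a further power, say a fourth-degree column, would follow the same copy-and-edit pattern, and the dot would pick it up without any change to the formula.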
69 00:03:51,700 --> 00:03:54,900 We will see what the best polynomial regression is for our model. 70 00:03:55,066 --> 00:03:57,166 And so we will see what we get with this one. 71 00:03:57,166 --> 00:04:00,800 And now let's actually build the polynomial regression model. 72 00:04:00,800 --> 00:04:04,833 If we select this and press Command or Control plus Enter to execute 73 00:04:05,333 --> 00:04:08,866 and run, our polynomial regression model is created. 74 00:04:09,233 --> 00:04:09,700 Awesome. 75 00:04:09,700 --> 00:04:11,600 So we're going to have a look. 76 00:04:11,600 --> 00:04:14,700 Let's type summary(poly_reg) in the console. 77 00:04:15,700 --> 00:04:16,966 And let's press Enter. 78 00:04:16,966 --> 00:04:18,800 And let's see what we get. Okay. 79 00:04:18,800 --> 00:04:23,000 So now what we can see is that these Level2 and Level3 polynomial terms 80 00:04:23,000 --> 00:04:26,633 that we created are actually statistically significant variables. 81 00:04:27,033 --> 00:04:29,866 But actually this table here does not show 82 00:04:29,866 --> 00:04:33,266 the real deal of the polynomial regression model for our problem. 83 00:04:33,566 --> 00:04:35,133 You will be much more convinced 84 00:04:35,133 --> 00:04:38,700 in the next tutorial, where we will be visualizing the graphic results, 85 00:04:38,933 --> 00:04:41,966 and you will perfectly understand why our polynomial regression, 86 00:04:41,966 --> 00:04:46,866 which is a non-linear model, will do a much better job at predicting what we want 87 00:04:47,000 --> 00:04:50,833 compared to this linear regression model, because it's a linear model. 88 00:04:51,200 --> 00:04:53,266 So we'll check that in the next tutorial. 89 00:04:53,266 --> 00:04:56,266 And until then, enjoy machine learning.
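Putting the steps together, the model-building part might look like the R sketch below. It assumes the same hypothetical dataset with placeholder Salary values, and the object name poly_reg is inferred from the narration, so the numbers printed by summary() will not match the ones discussed in the video.

```r
# Hypothetical stand-in for the tutorial's data set (illustrative values only).
dataset = data.frame(
  Level  = 1:10,
  Salary = c(45, 50, 60, 80, 110, 150, 200, 300, 500, 1000) * 1000
)

# Polynomial terms of the original independent variable.
dataset$Level2 = dataset$Level^2
dataset$Level3 = dataset$Level^3

# A multiple linear regression on Level, Level2 and Level3:
# the dot picks up every column other than Salary.
poly_reg = lm(formula = Salary ~ ., data = dataset)

# Coefficient table, standard errors and p-values for each term.
print(summary(poly_reg))
```

The fitted object is an ordinary lm model; it is the squared and cubed columns in the data, not a different fitting function, that make the result a polynomial regression.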