1 00:00:00,233 --> 00:00:02,400 Hello and welcome to this R tutorial. 2 00:00:02,400 --> 00:00:04,366 And now we are ready to move on to the next step, 3 00:00:04,366 --> 00:00:07,600 which is to fit our polynomial model to the data set. 4 00:00:07,933 --> 00:00:11,733 However, I would like to show you how the polynomial regression 5 00:00:11,733 --> 00:00:15,200 model is a much more powerful model for our situation, 6 00:00:15,566 --> 00:00:20,166 and the best way to show you this is to actually compare it to a baseline model, 7 00:00:20,166 --> 00:00:23,666 like a reference-based model, which will be our linear regression model. 8 00:00:24,100 --> 00:00:26,733 That's why in this tutorial we will build two models: 9 00:00:26,733 --> 00:00:29,733 the linear regression model and the polynomial regression model. 10 00:00:30,000 --> 00:00:31,500 And then we will compare the results. 11 00:00:31,500 --> 00:00:35,366 We will compare the graphic results and also the predictions. 12 00:00:35,400 --> 00:00:38,400 So you'll be convinced that the polynomial regression model 13 00:00:38,566 --> 00:00:41,433 is a much more appropriate model for this problem. 14 00:00:41,433 --> 00:00:45,433 And the main reason for that is that our problem is a nonlinear problem. 15 00:00:46,433 --> 00:00:46,733 Okay. 16 00:00:46,733 --> 00:00:49,400 So let's start by building this linear model. 17 00:00:49,400 --> 00:00:52,700 It's going to be very quick because we actually already did this. 18 00:00:52,933 --> 00:00:56,500 So, you know, we start by creating our regressor, but this time 19 00:00:56,500 --> 00:00:57,766 we're not going to call it regressor, 20 00:00:57,766 --> 00:01:01,866 because we will build two regressors: the linear one and the polynomial one. 21 00:01:02,100 --> 00:01:04,933 So we will call it lin_reg. 22 00:01:04,933 --> 00:01:05,900 Here we go. 23 00:01:05,900 --> 00:01:09,533 And then, you know, later we will call our polynomial regressor poly_reg. 
24 00:01:10,100 --> 00:01:11,600 So lin_reg equals. 25 00:01:11,600 --> 00:01:15,300 And then remember we have to use the lm function. 26 00:01:15,300 --> 00:01:17,533 And so we need to add some parentheses here. 27 00:01:17,533 --> 00:01:21,533 And then let's remind ourselves what we need to input 28 00:01:21,800 --> 00:01:25,400 by pressing F1 here to have a look at the arguments. Okay. 29 00:01:25,400 --> 00:01:28,033 So remember, the first argument is formula. 30 00:01:28,033 --> 00:01:33,800 So the formula is very simply the salary. 31 00:01:33,800 --> 00:01:35,233 That is our dependent variable, 32 00:01:36,266 --> 00:01:36,800 tilde. 33 00:01:36,800 --> 00:01:40,566 So I just pressed Alt+N, and then a dot here 34 00:01:40,566 --> 00:01:42,200 to take all the independent variables. 35 00:01:42,200 --> 00:01:45,133 But actually there is only one independent variable. 36 00:01:45,133 --> 00:01:50,566 So the dot here is strictly equivalent to just writing level. Okay. 37 00:01:50,566 --> 00:01:52,100 So perfect for the first argument. 38 00:01:52,100 --> 00:01:55,066 And then what is the second argument? It's data. 39 00:01:55,066 --> 00:01:56,100 So data, okay. 40 00:01:56,100 --> 00:02:00,900 So let's add this argument, data equals, and now let's see. Okay. 41 00:02:00,900 --> 00:02:05,133 So in the linear regression section we actually used the training 42 00:02:05,133 --> 00:02:09,566 set here as the data argument to train our linear regression model. 43 00:02:09,833 --> 00:02:10,800 But as we explained 44 00:02:10,800 --> 00:02:14,466 before, here we did not split the data set into a training set and a test set. 45 00:02:14,700 --> 00:02:19,166 So we are simply going to train our model on the whole data set itself, 46 00:02:19,400 --> 00:02:21,966 because it's a small data set anyway, and we want to have 47 00:02:21,966 --> 00:02:24,100 the most accurate prediction. 48 00:02:24,100 --> 00:02:27,000 So here we will simply input data set. 
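The call dictated above could be sketched in R roughly as follows. This is only an illustration: the data frame is a made-up stand-in, and the column names `Level` and `Salary` as well as every value in it are assumptions, not the data set used in the tutorial.

```r
# Hypothetical stand-in data (invented values, for illustration only):
# one independent variable (Level) and one dependent variable (Salary).
dataset <- data.frame(
  Level  = 1:10,
  Salary = c(40, 45, 55, 70, 95, 130, 180, 260, 400, 700) * 1000
)

# Linear regression trained on the whole data set (no train/test split).
# The dot means "all other columns", which here is only Level.
lin_reg <- lm(formula = Salary ~ ., data = dataset)

# Naming the predictor explicitly should give the same fit as the dot.
lin_reg_explicit <- lm(formula = Salary ~ Level, data = dataset)
print(all.equal(coef(lin_reg), coef(lin_reg_explicit)))
```

The second model is there only to check the claim that the dot and the explicit variable name are interchangeable when there is a single independent variable.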
49 00:02:28,400 --> 00:02:29,100 All right. 50 00:02:29,100 --> 00:02:31,633 And our linear regression model is ready. 51 00:02:31,633 --> 00:02:33,466 We are actually ready to build it. 52 00:02:33,466 --> 00:02:34,933 So let's just do it. 53 00:02:34,933 --> 00:02:37,633 Let's build our linear regression model. 54 00:02:37,633 --> 00:02:40,266 I'm pressing Command or Control plus Enter to execute. 55 00:02:40,266 --> 00:02:42,666 All right, linear regression made. 56 00:02:42,666 --> 00:02:45,666 We can have a quick look at the summary here. 57 00:02:45,833 --> 00:02:48,833 summary(lin_reg). 58 00:02:49,066 --> 00:02:52,133 So I'm typing this in the console and pressing Enter. 59 00:02:52,600 --> 00:02:55,566 And here are the statistical results of the model. 60 00:02:55,566 --> 00:02:58,933 We're doing this in R because, as you can see, it's really easy. In Python 61 00:02:58,933 --> 00:03:01,900 we would have needed to import a class and create an object. Here 62 00:03:01,900 --> 00:03:02,533 it's really easy. 63 00:03:02,533 --> 00:03:04,866 So we can have a look and we can see that 64 00:03:04,866 --> 00:03:08,433 the level has two stars here for the level of significance 65 00:03:08,700 --> 00:03:12,166 and is actually not a bad predictor of the salary. 66 00:03:12,300 --> 00:03:14,200 But wait for the polynomial model 67 00:03:14,200 --> 00:03:17,500 to see how it's going to be much better than linear regression. 68 00:03:18,166 --> 00:03:20,500 Okay, so now let's move on to the next level. 69 00:03:20,500 --> 00:03:23,500 That is the better model, polynomial regression. 70 00:03:23,633 --> 00:03:25,466 And let's build it. Okay. 71 00:03:25,466 --> 00:03:30,733 So as I mentioned, we are going to call this regressor poly underscore reg. 72 00:03:30,900 --> 00:03:32,100 All right, this way. 
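For the summary step described a moment ago, a minimal sketch might look like this. The data frame is again a hypothetical stand-in with assumed column names `Level` and `Salary` and invented values, so the numbers and significance stars it prints will not match the ones mentioned in the tutorial.

```r
# Hypothetical stand-in data (invented values, for illustration only).
dataset <- data.frame(
  Level  = 1:10,
  Salary = c(40, 45, 55, 70, 95, 130, 180, 260, 400, 700) * 1000
)

# Fit the linear model on the whole data set, then inspect it.
lin_reg <- lm(formula = Salary ~ ., data = dataset)
lin_summary <- summary(lin_reg)

# Full printout, including the significance stars next to each coefficient.
print(lin_summary)

# The coefficient table alone: estimate, standard error, t value, p-value.
print(coef(lin_summary))
```

Typing `summary(lin_reg)` directly in the console prints the same report; storing it in a variable first is just a convenience for pulling out the coefficient table.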
73 00:03:32,100 --> 00:03:34,666 And, you know, since the polynomial regression model 74 00:03:34,666 --> 00:03:39,633 is actually a multiple linear regression model in which the independent variables 75 00:03:39,633 --> 00:03:42,900 are actually the polynomial features of one independent 76 00:03:42,900 --> 00:03:45,900 variable, as Carol explained in the intuition tutorial, 77 00:03:46,033 --> 00:03:50,566 well, we are going to use the lm function again, as we did for linear regression. 78 00:03:50,900 --> 00:03:54,033 So here I'm just going to start by taking my lm function, 79 00:03:54,200 --> 00:03:57,600 add the parentheses, and we're going to input our two arguments: 80 00:03:57,900 --> 00:04:01,100 formula equals salary, 81 00:04:02,033 --> 00:04:05,466 tilde, Alt+N, and then this dot here. 82 00:04:05,800 --> 00:04:08,666 But don't worry, it will actually represent something else. 83 00:04:08,666 --> 00:04:11,566 So far I'm just putting the dot, and you'll understand 84 00:04:11,566 --> 00:04:15,133 what's going to happen next, which will make our polynomial regression 85 00:04:15,300 --> 00:04:17,000 built correctly. 86 00:04:17,000 --> 00:04:19,766 So then comma, and then we add our second argument, 87 00:04:19,766 --> 00:04:22,700 which is still going to be data equals data set, 88 00:04:23,700 --> 00:04:24,733 because we're going to train 89 00:04:24,733 --> 00:04:28,033 our polynomial regression model on the whole data set. Okay. 90 00:04:28,533 --> 00:04:30,433 So now you must be telling yourself, wait, 91 00:04:30,433 --> 00:04:33,066 but this is exactly the same as linear regression. 92 00:04:33,066 --> 00:04:33,800 Well, that's true. 93 00:04:33,800 --> 00:04:36,833 And that's why we need to add a little something here 94 00:04:36,833 --> 00:04:39,900 to make this model a polynomial regression model.
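The point that the second call is, as dictated so far, identical to the linear one can be checked with a short sketch. The data frame is a hypothetical stand-in (assumed column names `Level` and `Salary`, invented values). The step that actually turns the model into a polynomial regression is not covered in this part of the transcript, so it is deliberately left out.

```r
# Hypothetical stand-in data (invented values, for illustration only).
dataset <- data.frame(
  Level  = 1:10,
  Salary = c(40, 45, 55, 70, 95, 130, 180, 260, 400, 700) * 1000
)

# The two calls exactly as dictated so far: same formula, same data.
lin_reg  <- lm(formula = Salary ~ ., data = dataset)
poly_reg <- lm(formula = Salary ~ ., data = dataset)

# With nothing else changed, both fits are the same straight line.
print(all.equal(coef(lin_reg), coef(poly_reg)))
print(all.equal(fitted(lin_reg), fitted(poly_reg)))
```

Because the dot expands to whatever columns the data frame holds besides `Salary`, the formula stays the same and any difference between the two models has to come from the data passed in.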