1 00:00:00,166 --> 00:00:02,766 Hello and welcome to this R tutorial. 2 00:00:02,766 --> 00:00:05,733 So in the previous tutorial, we prepared our data correctly 3 00:00:05,733 --> 00:00:08,900 so that now we are ready to fit our simple linear 4 00:00:08,900 --> 00:00:11,900 regression to our data set without any issues. 5 00:00:12,333 --> 00:00:14,400 So we're going to do that right now. 6 00:00:14,400 --> 00:00:17,700 And as usual we are going to use the simplest way, 7 00:00:17,866 --> 00:00:20,733 which is to use the lm function. 8 00:00:20,733 --> 00:00:25,066 So we're going to do that right now. We're going to create a new variable called 9 00:00:25,100 --> 00:00:27,300 regressor. 10 00:00:27,300 --> 00:00:28,966 And that's going to be the simple linear 11 00:00:28,966 --> 00:00:31,966 regressor itself. Then equals. 12 00:00:32,133 --> 00:00:34,700 And then that's where we use the lm function. 13 00:00:34,700 --> 00:00:36,900 So let's just type lm here. 14 00:00:36,900 --> 00:00:39,600 And then let's press F1 to see the info 15 00:00:39,600 --> 00:00:42,600 on this function and especially the arguments. 16 00:00:42,700 --> 00:00:45,766 So let's see, the first argument we have to input is formula. 17 00:00:46,100 --> 00:00:49,100 So let's input it: formula. 18 00:00:49,266 --> 00:00:51,700 And according to you, what is it going to be? 19 00:00:51,700 --> 00:00:54,733 Well, this is going to be the dependent variable 20 00:00:55,000 --> 00:00:59,266 expressed as a linear combination of the independent variable. 21 00:00:59,733 --> 00:01:01,233 So here it's very simple. 22 00:01:01,233 --> 00:01:04,533 Since we only have one dependent variable and one independent variable, 23 00:01:04,933 --> 00:01:08,833 we just need to type formula equals salary. 24 00:01:10,500 --> 00:01:12,300 Then Alt plus N to type the tilde. 25 00:01:12,300 --> 00:01:12,866 Here you go. 26 00:01:12,866 --> 00:01:17,600 And then we put the independent variable, which is years experience.
27 00:01:18,866 --> 00:01:21,066 So what does this notation mean? 28 00:01:21,066 --> 00:01:25,200 It means that the salary is proportional to the years of experience. 29 00:01:25,533 --> 00:01:27,233 Okay. So that's it for the first argument. 30 00:01:27,233 --> 00:01:29,233 That's the formula we need to input. 31 00:01:29,233 --> 00:01:31,800 And that's actually the simple linear regression formula. 32 00:01:31,800 --> 00:01:34,800 And then we need to add a second argument, 33 00:01:35,300 --> 00:01:37,800 which is, let's see, data. Okay. 34 00:01:37,800 --> 00:01:38,566 And that's normal. 35 00:01:38,566 --> 00:01:42,366 That's because we have to tell R on which data 36 00:01:42,366 --> 00:01:45,566 we want to train our simple linear regression model. 37 00:01:45,900 --> 00:01:49,100 And of course this data is the training set, 38 00:01:49,766 --> 00:01:53,100 because the training set is the set on which you build your model. 39 00:01:53,866 --> 00:01:55,500 Okay. So that's it. 40 00:01:55,500 --> 00:01:57,366 Actually I know there are some other arguments, 41 00:01:57,366 --> 00:02:00,733 but these are optional arguments that we don't really need here. 42 00:02:00,733 --> 00:02:03,733 So we will just use these two: formula and data. 43 00:02:04,066 --> 00:02:05,233 Okay. So that's it. 44 00:02:05,233 --> 00:02:08,433 The regressor will be ready once we select it and execute it. 45 00:02:08,433 --> 00:02:10,600 So let's do this right now. 46 00:02:10,600 --> 00:02:13,500 And let's press Command or Control plus Enter to execute. 47 00:02:15,133 --> 00:02:15,733 Here we go. 48 00:02:15,733 --> 00:02:17,366 Now the regressor is ready. 49 00:02:17,366 --> 00:02:19,000 As you can see, it just appeared here.
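The call described above can be sketched roughly as follows. This is a minimal sketch, not the exact script typed in the video: the small training_set is made-up stand-in data for illustration only, and the column names Salary and YearsExperience are assumed from the spoken variable names.

```r
# Hypothetical stand-in for the training set prepared in the previous tutorial.
# The values are invented for illustration; the column names are an assumption.
training_set = data.frame(
  YearsExperience = c(1.1, 2.0, 3.2, 4.5, 5.9, 7.1, 8.7, 10.3),
  Salary = c(39000, 43500, 54400, 61100, 81400, 98300, 109400, 122400)
)

# Fit Salary as a linear function of YearsExperience, using only the
# two arguments discussed in the tutorial: formula and data.
regressor = lm(formula = Salary ~ YearsExperience, data = training_set)

# Printing the fitted object shows the call and the two estimated coefficients.
print(regressor)
```

Selecting these lines and executing them creates the regressor object in the workspace, as in the video.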
50 00:02:19,000 --> 00:02:23,133 If you want to have some info about this regressor, then the best way to do it 51 00:02:23,466 --> 00:02:27,533 is to, you know, go here in the console and type summary 52 00:02:28,433 --> 00:02:31,533 (regressor), because the name of our regressor is regressor. 53 00:02:32,000 --> 00:02:35,800 Then press Enter, and then you have some very good information 54 00:02:35,800 --> 00:02:39,933 about your simple linear model, for example. 55 00:02:40,233 --> 00:02:41,400 Okay. So let's see. 56 00:02:41,400 --> 00:02:44,400 Let's just put that up. 57 00:02:44,666 --> 00:02:45,533 Right. 58 00:02:45,533 --> 00:02:48,833 So first it tells you what the formula is. Okay. 59 00:02:48,833 --> 00:02:52,433 So it's the salary being proportional to the number of years of experience. 60 00:02:52,866 --> 00:02:55,500 And that the model is built on the training set. 61 00:02:55,500 --> 00:02:57,400 Then you have some info about the residuals. 62 00:02:57,400 --> 00:02:59,233 We won't be talking about this now. 63 00:02:59,233 --> 00:03:03,200 But the really important section is this one: coefficients. 64 00:03:03,200 --> 00:03:05,100 Because not only does it tell you 65 00:03:05,100 --> 00:03:08,966 the value of your coefficients in the simple linear regression equation, 66 00:03:09,300 --> 00:03:13,533 but it also tells you the statistical significance of your coefficients. 67 00:03:14,833 --> 00:03:16,333 And here we observe three stars. 68 00:03:16,333 --> 00:03:16,833 Here. 69 00:03:16,833 --> 00:03:18,166 That means the years 70 00:03:18,166 --> 00:03:22,200 experience independent variable is highly statistically significant, 71 00:03:22,600 --> 00:03:27,600 because you can have either no star, or one star, two stars or three stars. 72 00:03:27,900 --> 00:03:31,500 No star means that there is no statistical significance, 73 00:03:31,500 --> 00:03:34,833 and three stars means that there is a high statistical significance.
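As a rough illustration of the summary call described above, the sketch below prints the full report and then pulls out just the coefficients table. The training_set is made-up stand-in data and the column names Salary and YearsExperience are assumptions, so the numbers it prints will not match those shown in the video.

```r
# Hypothetical stand-in data; values and column names are assumptions.
training_set = data.frame(
  YearsExperience = c(1.1, 2.0, 3.2, 4.5, 5.9, 7.1, 8.7, 10.3),
  Salary = c(39000, 43500, 54400, 61100, 81400, 98300, 109400, 122400)
)
regressor = lm(formula = Salary ~ YearsExperience, data = training_set)

# Full report: the call, the residuals, the coefficients with their
# significance stars, and the R-squared figures.
print(summary(regressor))

# The coefficients section alone, as a numeric matrix with one row per
# term and columns for the estimate, standard error, t value and p-value.
coefficients_table = coef(summary(regressor))
print(coefficients_table)
```

The stars only appear in the printed report; the matrix holds the underlying numbers they are derived from.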
74 00:03:35,233 --> 00:03:37,200 So that's the first info. 75 00:03:37,200 --> 00:03:39,400 That's the first hint of what is going to happen, 76 00:03:39,400 --> 00:03:44,100 because we already know by looking at this that there will be a strong linear 77 00:03:44,100 --> 00:03:47,733 relationship between the salary and the number of years of experience. 78 00:03:48,433 --> 00:03:51,433 And the other info here is the p-value. 79 00:03:51,466 --> 00:03:55,533 The p-value is another indicator of the statistical significance, 80 00:03:55,766 --> 00:03:58,533 because the lower the p-value is, 81 00:03:58,533 --> 00:04:02,133 the more significant your independent variable is going to be. 82 00:04:02,433 --> 00:04:05,433 That is, the more impact, the more effect 83 00:04:05,700 --> 00:04:09,100 your independent variable is going to have on the dependent variable. 84 00:04:09,600 --> 00:04:14,633 And usually a good threshold for the p-value is 5%, which means that 85 00:04:14,633 --> 00:04:18,766 when we are below 5%, the independent variable is highly significant, 86 00:04:18,966 --> 00:04:22,766 and when we are over 5%, that means that it's less significant. 87 00:04:23,100 --> 00:04:28,000 And here, as you can see, the p-value is 1.52 times ten to the power of -14, 88 00:04:28,000 --> 00:04:31,200 which means that it's a very, very, very small p-value. 89 00:04:31,466 --> 00:04:34,266 So that means that this independent variable, 90 00:04:34,266 --> 00:04:37,933 years of experience, is highly statistically significant. 91 00:04:38,166 --> 00:04:42,000 And it has high impact and high effect on the formula's dependent variable. 92 00:04:42,700 --> 00:04:45,600 So that's very important information. 93 00:04:45,600 --> 00:04:50,566 Get the reflex to look at this by, you know, typing summary(regressor), 94 00:04:50,733 --> 00:04:54,233 because this is really important, especially when you want to try out 95 00:04:54,333 --> 00:04:57,133 several potential independent variables.
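The p-value check described above can also be done in code rather than by reading the printed report. This is only a sketch under stated assumptions: the training_set is made-up stand-in data, the column names Salary and YearsExperience are assumed, and the 0.05 threshold is the 5% convention mentioned in the tutorial.

```r
# Hypothetical stand-in data; values and column names are assumptions.
training_set = data.frame(
  YearsExperience = c(1.1, 2.0, 3.2, 4.5, 5.9, 7.1, 8.7, 10.3),
  Salary = c(39000, 43500, 54400, 61100, 81400, 98300, 109400, 122400)
)
regressor = lm(formula = Salary ~ YearsExperience, data = training_set)

# Read the p-value of the YearsExperience term from the coefficients table.
p_value = coef(summary(regressor))["YearsExperience", "Pr(>|t|)"]

# Compare it with the usual 5% threshold.
threshold = 0.05
is_significant = p_value < threshold

print(p_value)
print(is_significant)
```

The same lookup works for any term by changing the row name, which can help when several candidate independent variables are being compared.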
96 00:04:57,133 --> 00:05:00,500 You need to look at their statistical significance to choose them. 97 00:05:01,900 --> 00:05:04,700 And then you have some information about your model 98 00:05:04,700 --> 00:05:08,466 globally, which we'll be talking about at the end of this part, 99 00:05:08,500 --> 00:05:09,666 Part 1, Regression, 100 00:05:09,666 --> 00:05:13,533 when we'll be talking about ways to evaluate your model. Here, 101 00:05:13,533 --> 00:05:17,433 as you can see, you have the multiple R-squared that we'll talk to you about, 102 00:05:17,766 --> 00:05:19,066 and the adjusted R-squared. 103 00:05:19,066 --> 00:05:21,333 If you have several models with several 104 00:05:21,333 --> 00:05:24,933 sets of independent variables, then it's the adjusted R-squared 105 00:05:24,933 --> 00:05:27,666 you must use to choose the best model. 106 00:05:27,666 --> 00:05:28,000 All right. 107 00:05:28,000 --> 00:05:31,100 So that was just a parenthesis to give you this very important trick 108 00:05:31,100 --> 00:05:34,766 to know in R and learn how to evaluate your model already. 109 00:05:35,033 --> 00:05:37,100 So actually we are done 110 00:05:37,100 --> 00:05:40,766 fitting the simple linear regression to our data set, our training set. 111 00:05:41,133 --> 00:05:45,100 And in the next tutorial we are going to be predicting the test 112 00:05:45,100 --> 00:05:49,233 set results to finally see how our simple linear 113 00:05:49,233 --> 00:05:52,500 regression behaves on a new set, on some new observations. 114 00:05:53,000 --> 00:05:54,766 Okay, so that's the end of this tutorial. 115 00:05:54,766 --> 00:05:56,800 I look forward to seeing you in the next one. 116 00:05:56,800 --> 00:05:58,500 And until then, enjoy machine learning.
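For reference, the multiple R-squared and adjusted R-squared mentioned above can be read directly from the summary object. This is a minimal sketch, assuming made-up stand-in data and the column names Salary and YearsExperience; it only shows where the two figures live, not how to interpret them.

```r
# Hypothetical stand-in data; values and column names are assumptions.
training_set = data.frame(
  YearsExperience = c(1.1, 2.0, 3.2, 4.5, 5.9, 7.1, 8.7, 10.3),
  Salary = c(39000, 43500, 54400, 61100, 81400, 98300, 109400, 122400)
)
regressor = lm(formula = Salary ~ YearsExperience, data = training_set)

# The summary object stores both figures as plain numbers.
model_summary = summary(regressor)
multiple_r_squared = model_summary$r.squared
adjusted_r_squared = model_summary$adj.r.squared

print(multiple_r_squared)
print(adjusted_r_squared)
```

When comparing models built on different sets of independent variables, the adjusted figure is the one the tutorial recommends looking at.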