1 00:00:00,066 --> 00:00:00,633 So that's great. 2 00:00:00,633 --> 00:00:01,800 That's the first thing to notice. 3 00:00:01,800 --> 00:00:02,900 And then what I was telling you 4 00:00:02,900 --> 00:00:06,533 is that for each of the coefficients we have different information. 5 00:00:06,533 --> 00:00:10,233 We actually have the coefficient in the linear regression equation. 6 00:00:10,800 --> 00:00:13,600 Then we have the standard error, then the t value, 7 00:00:13,600 --> 00:00:16,800 then the p-value here, and then the significance level. 8 00:00:17,333 --> 00:00:21,600 So the most important info when looking at this is in the last two columns: 9 00:00:22,000 --> 00:00:24,700 the p-value and the significance level. 10 00:00:24,700 --> 00:00:28,000 Because these columns tell us about the statistical 11 00:00:28,000 --> 00:00:31,966 significance of each independent variable for the dependent variable. 12 00:00:32,466 --> 00:00:35,533 That means they tell us if each of the independent variables 13 00:00:35,700 --> 00:00:39,100 has a significant impact on the dependent variable. 14 00:00:40,133 --> 00:00:40,433 Okay. 15 00:00:40,433 --> 00:00:43,033 So let me first explain what the p-value is. 16 00:00:43,033 --> 00:00:47,700 The most important thing to understand is that the lower the p-value is, 17 00:00:48,100 --> 00:00:52,366 the more statistically significant your independent variable is going to be. 18 00:00:53,366 --> 00:00:54,033 That means that the 19 00:00:54,033 --> 00:00:58,200 lower the p-value is, the more impact or effect 20 00:00:58,200 --> 00:01:01,400 your independent variable is going to have on the dependent variable. 21 00:01:02,333 --> 00:01:06,633 And generally a good threshold to use is the 5% threshold. 22 00:01:07,200 --> 00:01:11,733 That means that if your p-value is lower than 5%, then that means 23 00:01:11,733 --> 00:01:15,300 that your independent variable would be highly statistically significant.
24 00:01:15,633 --> 00:01:19,866 And the more it is above 5%, the less it will be statistically significant. 25 00:01:20,400 --> 00:01:23,333 So that's how you must interpret the p-value. 26 00:01:23,333 --> 00:01:25,000 And then we have this last column, 27 00:01:25,000 --> 00:01:28,500 which is just a faster way of interpreting the coefficients. 28 00:01:28,933 --> 00:01:33,066 Because as you can see, we have this line here that explains the stars. 29 00:01:33,633 --> 00:01:37,233 And basically what the stars mean is that when the p-value is 30 00:01:37,233 --> 00:01:42,300 between 0 and 0.1%, then there are going to be three stars, 31 00:01:42,300 --> 00:01:46,966 which means a very high statistical significance of your independent variable. 32 00:01:47,533 --> 00:01:51,833 If the p-value is between 0.1% and 1%, 33 00:01:52,133 --> 00:01:55,566 then it's a quite high level of statistical significance. 34 00:01:56,000 --> 00:01:59,366 Then if your p-value is between 1% and 5%, 35 00:01:59,500 --> 00:02:03,000 then the independent variable is still statistically significant, 36 00:02:03,266 --> 00:02:06,600 but will have a less strong effect than the first categories. 37 00:02:07,333 --> 00:02:11,400 And then if your p-value is between 5% and 10%, it's borderline. 38 00:02:11,700 --> 00:02:14,600 The independent variable might have a certain level 39 00:02:14,600 --> 00:02:19,133 of statistical significance, but not that much; anyway, a lot 40 00:02:19,133 --> 00:02:23,066 less than those independent variables that have three stars or two stars. 41 00:02:23,700 --> 00:02:27,000 And then if your p-value is between 10% and 100%, 42 00:02:27,000 --> 00:02:30,400 then there is absolutely no statistical significance. 43 00:02:30,400 --> 00:02:32,166 That means that your independent variable 44 00:02:32,166 --> 00:02:35,166 won't have any effect on the dependent variable.
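The star legend walked through above can be sketched as a small lookup. This is only an illustration, not the code used in the tutorial: the function name `significance_code` is hypothetical, the one-star and dot symbols follow the legend that R prints under a regression summary (the transcript itself only names three and two stars), and treating each upper bound as inclusive is an assumption made here.

```python
def significance_code(p_value: float) -> str:
    """Map a p-value to a significance marker for the bands described above.

    Hypothetical helper: the symbols mirror R's legend
    (0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1);
    inclusive upper bounds are an assumption of this sketch.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= 0.001:   # 0 to 0.1%: very high significance
        return "***"
    if p_value <= 0.01:    # 0.1% to 1%: quite high significance
        return "**"
    if p_value <= 0.05:    # 1% to 5%: still significant
        return "*"
    if p_value <= 0.1:     # 5% to 10%: borderline
        return "."
    return " "             # 10% to 100%: no significance


if __name__ == "__main__":
    # Made-up p-values, one per band, purely for illustration.
    for p in (0.0005, 0.005, 0.03, 0.07, 0.6):
        print(f"p = {p:<6} -> '{significance_code(p)}'")
```

Reading the marker column this way gives the same verdict as comparing each p-value against the 5% threshold by hand, just faster.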
45 00:02:35,400 --> 00:02:40,166 So that's very interesting because when we look at our independent 46 00:02:40,166 --> 00:02:43,200 variables here, we notice that only one variable has 47 00:02:43,200 --> 00:02:46,833 a high statistical significance for the dependent variable. 48 00:02:47,300 --> 00:02:49,000 It's the R&D spend. 49 00:02:49,000 --> 00:02:54,100 So it looks like the profit is mainly governed by the R&D spend. 50 00:02:54,100 --> 00:02:59,700 That means that the R&D spend is the only variable with a strong effect on the profit. 51 00:03:00,233 --> 00:03:04,166 So that's very important information for our investors because they now know 52 00:03:04,166 --> 00:03:07,833 that they shouldn't only be looking at the profit itself by, you know, 53 00:03:08,166 --> 00:03:11,800 just looking at the maximum profit to decide where to invest. 54 00:03:12,100 --> 00:03:14,866 They should also be looking at the R&D spend variable. 55 00:03:14,866 --> 00:03:17,866 They should be looking at the amount spent in R&D 56 00:03:17,866 --> 00:03:21,300 to add another criterion in their investment decisions. 57 00:03:21,600 --> 00:03:24,300 So that's very good info, and basically what it means, 58 00:03:24,300 --> 00:03:27,900 what all this means, is that among all the independent variables here, 59 00:03:28,133 --> 00:03:33,300 the only predictor, the only strong predictor of the profit is the R&D spend. 60 00:03:33,533 --> 00:03:36,666 The rest is absolutely not useful. 61 00:03:37,166 --> 00:03:39,433 So actually what this means is that we could 62 00:03:40,533 --> 00:03:40,966 we could 63 00:03:40,966 --> 00:03:44,900 rewrite this multiple linear regression equation 64 00:03:44,900 --> 00:03:48,100 and turn it into a simple linear regression.
65 00:03:48,100 --> 00:03:51,233 Because the only independent variable 66 00:03:51,233 --> 00:03:55,133 that has an effect on the profit dependent variable is the R&D spend, 67 00:03:55,133 --> 00:04:00,433 the formula could be profit equals R.D.Spend. 68 00:04:01,366 --> 00:04:02,266 And that would be okay. 69 00:04:02,266 --> 00:04:05,500 That would actually give us the same prediction. Okay. 70 00:04:05,500 --> 00:04:07,700 So you learned a lot of stuff in this tutorial. 71 00:04:07,700 --> 00:04:10,966 But those are very important things to know in linear regression. 72 00:04:11,300 --> 00:04:12,766 So congratulations. 73 00:04:12,766 --> 00:04:16,566 We only have one tutorial left where we will be predicting the test set results. 74 00:04:16,866 --> 00:04:18,466 So I look forward to seeing you there. 75 00:04:18,466 --> 00:04:21,466 And until then enjoy machine learning.