1 00:00:00,600 --> 00:00:03,590 So we have learned how to run a simple linear regression model. 2 00:00:04,840 --> 00:00:07,680 It is time to learn how to run a multiple linear regression model 3 00:00:07,780 --> 00:00:08,030 now. 4 00:00:09,490 --> 00:00:10,990 In a multiple linear regression model, 5 00:00:11,800 --> 00:00:17,170 instead of using one predictor variable, we will be using multiple predictor variables. 6 00:00:18,430 --> 00:00:24,520 So let us first run this model with all the variables that we have in the dataset. 7 00:00:25,080 --> 00:00:28,510 We'll just create a new variable and call it multiple_model. 8 00:00:38,480 --> 00:00:42,500 This will again use the same function, lm. 9 00:00:44,630 --> 00:00:50,570 And within the brackets will come price, this is the dependent variable, then a tilde. 10 00:00:51,600 --> 00:00:55,890 Now, if you want all variables from your dataset, just write a dot. 11 00:00:57,350 --> 00:01:01,910 So apart from price, it will take all the variables from your dataset. 12 00:01:02,120 --> 00:01:04,680 And which dataset that is, we will define now: comma, 13 00:01:05,340 --> 00:01:06,450 data is equal to df. 14 00:01:10,150 --> 00:01:10,950 If I run this, 15 00:01:13,130 --> 00:01:15,710 we have another variable called multiple_model. 16 00:01:18,000 --> 00:01:18,750 To look at the result, 17 00:01:19,060 --> 00:01:19,870 I'll write summary, 18 00:01:21,370 --> 00:01:23,200 and within the brackets I'll write multiple_model. 19 00:01:28,450 --> 00:01:29,270 Run this. 20 00:01:33,480 --> 00:01:35,510 So here we have the output that we want. 21 00:01:36,530 --> 00:01:41,690 We have beta values in this column for all the variables. 22 00:01:41,960 --> 00:01:43,970 These are all the variables that we had. 23 00:01:45,610 --> 00:01:46,810 These are their beta values. 24 00:01:48,070 --> 00:01:49,590 This is beta zero, 25 00:01:49,690 --> 00:01:50,300 beta one, 26 00:01:50,450 --> 00:01:51,010 and so on.
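The call described above can be sketched in R roughly as follows. This is a minimal sketch, not the course's actual script: the data frame is simulated so the example runs on its own, and the column names air_qual, room_num and age, as well as the identifier multiple_model, are assumptions made for illustration.

```r
# Simulated stand-in for the course data; all column names are hypothetical.
set.seed(1)
n <- 506
df <- data.frame(
  air_qual = runif(n, 0.4, 0.9),
  room_num = rnorm(n, mean = 6, sd = 0.7),
  age      = runif(n, 0, 100)
)
df$price <- 20 - 15 * df$air_qual + 4 * df$room_num + rnorm(n, sd = 5)

# "price ~ ." regresses price on every other column of df.
multiple_model <- lm(price ~ ., data = df)

# Prints the coefficient table (estimate, standard error, t value, p-value)
# followed by the residual standard error, R-squared and F-statistic.
summary(multiple_model)
```

The dot on the right-hand side of the formula is shorthand for "all columns in data except the response", which is why no predictor has to be named explicitly.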
27 00:01:53,110 --> 00:01:59,390 This is the column giving the standard error, just as we saw in simple linear regression; that is the t-value. 28 00:01:59,500 --> 00:02:00,390 This is the p-value. 29 00:02:04,190 --> 00:02:10,460 And if you remember, these stars are telling us the level of significance, so 0.001 means 30 00:02:10,570 --> 00:02:15,980 we are 99.9 percent confident of this beta being nonzero. 31 00:02:17,990 --> 00:02:20,680 Two stars will mean we are 99 percent confident. 32 00:02:20,950 --> 00:02:26,370 And a single star will mean that we are 95 percent confident that this beta is nonzero. 33 00:02:27,620 --> 00:02:31,760 So basically, we are getting six variables with these stars. 34 00:02:33,570 --> 00:02:39,350 For four of them, we are 99.9 percent sure that they are related to the price. 35 00:02:40,690 --> 00:02:41,270 For two of them, 36 00:02:41,330 --> 00:02:44,910 we have 95 percent confidence that they are impacting the price variable. 37 00:02:48,450 --> 00:02:49,130 For the others, 38 00:02:50,360 --> 00:02:51,980 although they have a nonzero beta, 39 00:02:53,000 --> 00:02:59,050 we are not really sure whether they are actually impacting the response variable 40 00:02:59,340 --> 00:02:59,610 or not. 41 00:03:01,320 --> 00:03:08,660 So this part is telling you about the coefficient and the significance level for each individual variable, 42 00:03:09,120 --> 00:03:11,640 and this part is about the overall model. 43 00:03:13,310 --> 00:03:15,440 For this whole model, the residual standard error, 44 00:03:15,620 --> 00:03:18,630 it is coming out to be 4.925, 45 00:03:19,730 --> 00:03:21,870 with 490 degrees of freedom. 46 00:03:23,040 --> 00:03:30,690 So the degrees of freedom in simple linear regression was 504 because we calculated it by doing 47 00:03:30,810 --> 00:03:31,680 n minus two. 48 00:03:32,450 --> 00:03:33,710 So that is 506 minus two.
49 00:03:33,840 --> 00:03:35,020 We got 504. 50 00:03:36,210 --> 00:03:37,140 But in this case, 51 00:03:37,350 --> 00:03:41,300 degrees of freedom is 490 because we are subtracting from it 52 00:03:42,330 --> 00:03:44,190 the number of variables also. 53 00:03:46,310 --> 00:03:49,640 So 506 minus the 16 variables. 54 00:03:49,910 --> 00:03:51,620 That is why the degrees of freedom is 55 00:03:52,080 --> 00:03:52,610 490. 56 00:03:53,570 --> 00:03:57,150 So this is the R-squared value, and this is the adjusted R-squared value. 57 00:03:58,520 --> 00:04:01,350 R-squared does not take into account the number of variables. 58 00:04:02,550 --> 00:04:08,400 This is coming out to be around 0.72, meaning that 72 percent of the variance is being explained 59 00:04:09,030 --> 00:04:12,310 by this model, the variance in the house price data. 60 00:04:14,520 --> 00:04:19,500 I think it is a pretty good model, since it is able to explain 72 percent of the variance. 61 00:04:20,820 --> 00:04:23,870 Adjusted R-squared takes into account the number of variables. 62 00:04:24,200 --> 00:04:27,260 That is why it is coming out to be a little less, 63 00:04:27,350 --> 00:04:28,700 since the model has a lot of variables. 64 00:04:29,730 --> 00:04:32,220 As you add variables that contribute little, it goes down. 65 00:04:35,540 --> 00:04:37,610 And for multiple linear regression, 66 00:04:37,690 --> 00:04:40,230 this is a more useful parameter. 67 00:04:41,080 --> 00:04:45,070 So it is better to report adjusted R-squared in case of multiple regression. 68 00:04:47,520 --> 00:04:49,780 This last one is the F-statistic parameter. 69 00:04:51,300 --> 00:04:55,480 We calculate the value of the F-statistic to say with 70 00:04:55,500 --> 00:05:01,950 confidence whether the variables that we took actually impact the response variable. 71 00:05:03,460 --> 00:05:07,250 So we get an F-statistic value of 84.84, 72 00:05:07,790 --> 00:05:09,970 and its p-value is very small.
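The model-level numbers discussed above (residual degrees of freedom, R-squared, adjusted R-squared and the F-statistic p-value) can be reproduced by hand. The following is a minimal sketch on simulated data; the columns x1, x2 and x3 are placeholders, not the course's variables.

```r
# Simulated data with hypothetical predictors x1..x3.
set.seed(2)
n <- 506
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
df$price <- 22 + 3 * df$x1 - 2 * df$x2 + rnorm(n, sd = 4)

fit <- lm(price ~ ., data = df)
s <- summary(fit)

p <- length(coef(fit))   # estimated coefficients, intercept included
resid_df <- n - p        # residual degrees of freedom

r2 <- s$r.squared
# Adjusted R-squared penalises R-squared for the number of coefficients.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p)

# Overall F-test: s$fstatistic holds the value and its two degrees of freedom.
f <- s$fstatistic
f_pvalue <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

c(resid_df = resid_df, r2 = r2, adj_r2 = adj_r2, f_pvalue = f_pvalue)
```

Note that p counts the intercept as well, so with 16 estimated coefficients and 506 rows the residual degrees of freedom would be 490, matching the figure in the transcript.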
73 00:05:10,340 --> 00:05:11,800 Since the p-value is very small, 74 00:05:12,610 --> 00:05:16,240 we are pretty confident that these variables are impacting 75 00:05:16,620 --> 00:05:17,730 the response variable. 76 00:05:19,990 --> 00:05:25,970 And I just noticed that this last variable is also significant. 77 00:05:26,830 --> 00:05:28,000 It also has three stars. 78 00:05:29,130 --> 00:05:38,120 So we have at least seven variables that are significantly impacting the response variable, and for the others, 79 00:05:38,130 --> 00:05:39,150 we are not really sure. 80 00:05:39,810 --> 00:05:42,540 And these are the beta values for all of these. 81 00:05:46,080 --> 00:05:53,820 In terms of business, how all this is going to help me is as follows. I can see that air quality 82 00:05:55,180 --> 00:06:00,160 impacts the house price negatively, and its beta value is also very large. 83 00:06:01,160 --> 00:06:08,180 So for a one unit change in air quality, if it is an increase, it will reduce the house price 84 00:06:08,600 --> 00:06:09,800 by 15 units. 85 00:06:11,790 --> 00:06:17,250 So if I want to maximize the price, I should plan to build a house 86 00:06:19,160 --> 00:06:22,310 in an area with a lower value of this index. 87 00:06:24,090 --> 00:06:26,940 Similarly, room number has a positive impact. 88 00:06:27,780 --> 00:06:33,570 If I add one unit, the house price will increase by four units. 89 00:06:34,200 --> 00:06:36,860 Now, you can also see that room number 90 00:06:37,040 --> 00:06:40,780 earlier had a beta value of nine in the simple linear model. 91 00:06:42,020 --> 00:06:43,610 It had a value of nine. 92 00:06:44,720 --> 00:06:46,000 Now it has a value of four. 93 00:06:47,140 --> 00:06:52,380 So since we added new variables, the value of beta for room number has also changed. 94 00:06:53,820 --> 00:07:00,190 So the main thing you have to note here is that this p-value is saying that your variables are impacting house 95 00:07:00,190 --> 00:07:00,640 price.
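The change in the room number coefficient between the simple and the multiple model can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are synthetic, the variable names room_num and air_qual are hypothetical, and the second predictor is deliberately built to be correlated with the first so the effect is visible.

```r
set.seed(3)
n <- 506
room_num <- rnorm(n, mean = 6, sd = 0.7)
# Hypothetical second predictor, negatively correlated with room_num.
air_qual <- 2.5 - 0.3 * room_num + rnorm(n, sd = 0.1)
price <- 20 + 4 * room_num - 15 * air_qual + rnorm(n, sd = 2)
df <- data.frame(price, room_num, air_qual)

simple_model   <- lm(price ~ room_num, data = df)
multiple_model <- lm(price ~ room_num + air_qual, data = df)

# In the simple model, room_num also absorbs the effect of the omitted,
# correlated predictor, so its coefficient is larger than in the full model.
simple_beta   <- unname(coef(simple_model)["room_num"])
multiple_beta <- unname(coef(multiple_model)["room_num"])
c(simple = simple_beta, multiple = multiple_beta)
```

When predictors are correlated, each coefficient in the multiple model is the effect of that variable with the others held fixed, which is why it need not match the simple regression estimate.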
96 00:07:01,890 --> 00:07:08,720 So first we look at this p-value, then we go and look at these p-values to identify which of the individual 97 00:07:09,470 --> 00:07:13,520 variables we are confident are impacting our response variable. 98 00:07:14,300 --> 00:07:16,460 So we get these seven variables. 99 00:07:17,850 --> 00:07:21,120 And from these seven variables, we look at their 100 00:07:21,140 --> 00:07:27,510 betas and their signs. So a positive sign means if I increase room number, it will increase house 101 00:07:27,510 --> 00:07:29,190 price by this many units. 102 00:07:29,580 --> 00:07:36,980 A negative sign means if I increase air quality by one unit, it will decrease house price by 15 103 00:07:36,990 --> 00:07:37,350 units. 104 00:07:39,140 --> 00:07:43,670 So this is how we run multiple linear regression in R. 105 00:07:44,210 --> 00:07:49,590 And this is also how we interpret it to make business sense out of the reported result.
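The reading order described above (overall p-value first, then individual p-values, then the sign and size of each beta) could be sketched in R like this. The data are simulated and the column names room_num, air_qual and noise are assumptions for illustration; the 0.05 cut-off corresponds to the single-star level mentioned in the transcript.

```r
set.seed(4)
n <- 506
df <- data.frame(
  room_num = rnorm(n, mean = 6, sd = 0.7),
  air_qual = runif(n, 0.4, 0.9),
  noise    = rnorm(n)   # unrelated to price by construction
)
df$price <- 20 + 4 * df$room_num - 15 * df$air_qual + rnorm(n, sd = 3)

fit <- lm(price ~ ., data = df)

# Coefficient table from summary(): one row per term.
coefs <- as.data.frame(coef(summary(fit)))
names(coefs) <- c("estimate", "std_error", "t_value", "p_value")

# Keep predictors (not the intercept) whose p-value is below 0.05.
keep <- rownames(coefs) != "(Intercept)" & coefs$p_value < 0.05
significant <- coefs[keep, ]

# The sign of the estimate gives the direction of the effect on price.
significant$direction <- ifelse(significant$estimate > 0,
                                "raises price", "lowers price")
significant[, c("estimate", "p_value", "direction")]
```

Each remaining estimate is read as the expected change in price for a one unit increase in that variable, with the other variables held constant.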