1 00:00:01,110 --> 00:00:07,050 In this video, we are going to run multiple linear regression, that is, we are going to use all the 2 00:00:07,050 --> 00:00:10,550 variables of our dataset to predict the house price. 3 00:00:12,390 --> 00:00:16,590 This is very similar to the way we ran simple linear regression. 4 00:00:16,950 --> 00:00:23,310 The only difference is that in the Input X Range, instead of just putting room number, we will be putting 5 00:00:23,340 --> 00:00:28,280 all the variables. So we will again go to the Data Analysis button. 6 00:00:29,610 --> 00:00:33,860 We will select Regression. In the Input Y Range, 7 00:00:35,760 --> 00:00:37,020 we have the price column. 8 00:00:40,800 --> 00:00:44,820 In the Input X Range, we'll put all the other variables. 9 00:00:51,170 --> 00:00:52,910 All the other options will remain the same. 10 00:00:53,060 --> 00:00:56,630 We have labels, and we want the output in a new worksheet. 11 00:00:57,950 --> 00:00:59,330 And we'll click on OK. 12 00:01:03,140 --> 00:01:06,890 So here is the result of multiple linear regression. 13 00:01:08,890 --> 00:01:14,760 The R-squared value for our model is coming out as 0.72, whereas the adjusted R-squared 14 00:01:14,770 --> 00:01:15,620 is 0.71. 15 00:01:16,150 --> 00:01:23,760 This is a pretty good value and it means that nearly 72 percent of the variation in the values of house 16 00:01:23,770 --> 00:01:27,280 price is accounted for by this model. 17 00:01:30,720 --> 00:01:39,510 In the table below, we have all the betas for all the independent variables. The first column contains 18 00:01:39,510 --> 00:01:42,640 the names of all the independent variables, plus the intercept. 19 00:01:43,470 --> 00:01:45,660 The second column is of the coefficients. 20 00:01:46,380 --> 00:01:51,930 The third column is giving us the standard error corresponding to those coefficients. 21 00:01:52,620 --> 00:01:55,140 The fourth column contains the t-statistic value. 
22 00:01:56,340 --> 00:02:04,410 The fifth column is the p-value. We will be using the p-value to determine which of the variables are significantly 23 00:02:04,410 --> 00:02:06,640 impacting the dependent variable. 24 00:02:07,380 --> 00:02:14,160 So a p-value of less than 0.05 will be used as the threshold to identify the variables which 25 00:02:14,160 --> 00:02:16,800 are significantly impacting our dependent variable. 26 00:02:18,600 --> 00:02:25,680 So if you look at the p-values for air quality, for room number and for other such variables, 27 00:02:26,340 --> 00:02:28,730 you can see these are very small values. 28 00:02:30,000 --> 00:02:34,910 They are written in scientific format, that is, in exponential format. 29 00:02:35,640 --> 00:02:41,730 So this value is actually 8.24 multiplied by 10 to the power minus 5. 30 00:02:43,410 --> 00:02:45,030 So this is a very small number. 31 00:02:46,260 --> 00:02:52,470 So wherever the number is smaller than 0.05, we are more than 95 percent confident 32 00:02:52,740 --> 00:02:57,040 that that particular variable is impacting our dependent variable. 33 00:02:58,260 --> 00:03:03,270 We will use only those variables and we will correspondingly look at their coefficients. 34 00:03:05,100 --> 00:03:06,750 When we are looking at a coefficient, 35 00:03:07,290 --> 00:03:08,760 we have to look at two things. 36 00:03:09,210 --> 00:03:14,580 First is the sign of that coefficient, and second is the magnitude of that coefficient. 37 00:03:16,780 --> 00:03:24,880 If the sign is positive, that means increasing that particular variable will increase the house price. 38 00:03:25,300 --> 00:03:29,080 For example, for room number I am getting plus 4.01. 39 00:03:30,160 --> 00:03:36,160 This means that if I increase room number by one unit, keeping other things constant, the house price will 40 00:03:36,160 --> 00:03:37,360 increase by about four units. 
41 00:03:39,710 --> 00:03:46,970 Whereas for average distance, the coefficient is negative. It is minus 1.2, which means that 42 00:03:46,970 --> 00:03:54,740 if I increase average distance from the employment hubs by one unit, the price of houses will decrease 43 00:03:54,740 --> 00:03:56,210 by 1.2 units. 44 00:03:59,970 --> 00:04:07,260 So it is important to look at the sign, which will tell you whether increasing the variable increases 45 00:04:07,260 --> 00:04:11,550 the response variable or decreasing the variable increases the response variable. 46 00:04:12,590 --> 00:04:19,760 And the second thing is magnitude, that is, how big that particular coefficient is. If it is 4, that 47 00:04:19,760 --> 00:04:24,560 means increasing the variable by one unit increases the response variable by four units. 48 00:04:24,980 --> 00:04:32,240 And if it is 0.005, it means increasing age by one unit will have a negligible impact 49 00:04:32,570 --> 00:04:33,830 on the response variable. 50 00:04:34,640 --> 00:04:42,670 So we will look at two things for all those variables which are significantly impacting the house price 51 00:04:43,250 --> 00:04:43,760 variable. 52 00:04:44,850 --> 00:04:47,320 One is sign and the other is magnitude. 53 00:04:49,620 --> 00:04:56,490 So in this way, whenever you have a marketing problem and have collected data for that problem, you can 54 00:04:56,490 --> 00:05:01,500 run a linear regression analysis just the way we have demonstrated above. 55 00:05:02,160 --> 00:05:09,660 And using this final table, you can identify which variables are important by looking at the p-value 56 00:05:10,170 --> 00:05:17,850 and what the impact of each variable is using the coefficient value. Once you have this column, 57 00:05:18,030 --> 00:05:24,750 you can create the relationship between the dependent and independent variables and you can predict 58 00:05:24,750 --> 00:05:27,360 the value of price using these coefficients. 
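The reading of the table described here (filter by p-value, then look at sign and magnitude) can be sketched in a few lines of Python. This is only an illustration, not part of the Excel workflow shown in the video: the variable names `room_num`, `avg_dist` and `age` are hypothetical, the coefficients 4.01, -1.2 and 0.005 echo the values quoted in the video, and all p-values are placeholders (8.24e-05 is the number read out in the video, but which variable it belongs to is assumed).

```python
# Illustrative rows only. Coefficients echo the values quoted in the video;
# the names and p-values are assumptions made for this sketch.
rows = [
    {"name": "room_num", "coef": 4.01, "p_value": 8.24e-05},
    {"name": "avg_dist", "coef": -1.2, "p_value": 0.001},
    {"name": "age", "coef": 0.005, "p_value": 0.60},
]

ALPHA = 0.05  # significance threshold used in the video


def describe(row):
    """Return a one-line reading of a coefficient's sign and magnitude."""
    direction = "increases" if row["coef"] > 0 else "decreases"
    return (
        f"{row['name']}: a one-unit increase {direction} price "
        f"by {abs(row['coef'])} units, other variables held constant"
    )


# Keep only variables whose p-value is below the threshold.
significant = [r for r in rows if r["p_value"] < ALPHA]

for r in significant:
    print(describe(r))
```

Note that Python reads the scientific notation `8.24e-05` directly as 0.0000824, so the comparison with 0.05 needs no conversion.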
59 00:05:28,860 --> 00:05:35,010 So let us see how to predict the value of house price using the coefficients that we have estimated. 60 00:05:36,390 --> 00:05:40,910 Let us bring one observation for which we are going to predict the house price. 61 00:05:41,660 --> 00:05:43,530 So for this last observation, 62 00:05:45,820 --> 00:05:48,580 we will predict the house price. So I'm going to paste it. 63 00:05:50,650 --> 00:05:58,750 I had the data horizontally and I pasted it as transpose, that is why now I have this 64 00:05:58,750 --> 00:06:03,250 data vertically. Now, as per the equation, 65 00:06:06,360 --> 00:06:13,110 the predicted price is going to be the intercept plus the value of this 66 00:06:14,490 --> 00:06:22,680 crime rate coefficient multiplied by the value of crime rate, plus the value of the residential area coefficient multiplied 67 00:06:22,680 --> 00:06:25,260 by the value of residential area, and so on. 68 00:06:26,360 --> 00:06:32,660 So in this column, I'm going to get the product of the coefficient and the value. 69 00:06:35,770 --> 00:06:38,130 I drag it down to the last cell. 70 00:06:40,660 --> 00:06:43,990 So this is giving us the impact of 71 00:06:45,750 --> 00:06:48,420 those variables for these particular values. 72 00:06:49,950 --> 00:06:55,860 Now, if I add all these cells, I get the estimated price of the house. 73 00:07:00,750 --> 00:07:08,280 So my model is predicting a value of 24 units for the house. The actual price for the house was 19 74 00:07:08,280 --> 00:07:08,700 units. 75 00:07:12,700 --> 00:07:18,460 Similarly, we can predict the price of any other house for which we have the value of all these variables 76 00:07:18,790 --> 00:07:20,740 and we want to predict the house price. 77 00:07:24,210 --> 00:07:30,900 So this is all. Using the coefficients, we multiply the coefficients with the values of a particular observation 78 00:07:31,140 --> 00:07:34,350 and add them up to get the final predicted price.
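The multiply-and-add step described above can also be written as a short Python sketch. The function name `predict_price`, the variable names and every number below are hypothetical and chosen only to keep the arithmetic easy to follow. They are not the coefficients or the observation from the video's worksheet.

```python
def predict_price(intercept, coefficients, observation):
    """Return intercept + sum of (coefficient * value) over all variables."""
    return intercept + sum(
        coefficients[name] * observation[name] for name in coefficients
    )


# Made-up values for illustration; not taken from the video's output table.
intercept = 10.0
coefficients = {"room_num": 4.0, "avg_dist": -1.0}
observation = {"room_num": 6.0, "avg_dist": 3.0}

# Each product is the "impact" column; their sum plus the intercept
# is the predicted price.
predicted = predict_price(intercept, coefficients, observation)
print(predicted)
```

Each term inside the sum corresponds to one cell of the product column built in the worksheet, and adding the intercept to their total gives the predicted price for that observation.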