1 00:00:00,240 --> 00:00:07,140 In this video, we will discuss the results we received for the categorical variables. If you remember, 2 00:00:07,590 --> 00:00:10,190 there were two categorical variables in our dataset. 3 00:00:11,070 --> 00:00:14,310 One was airport, with values YES and NO. 4 00:00:15,540 --> 00:00:19,710 But we converted it into a corresponding dummy variable called airport 5 00:00:19,710 --> 00:00:22,470 YES, with values one and zero. 6 00:00:23,810 --> 00:00:30,410 The second categorical variable was waterbody, which had four categories, and we made three dummy 7 00:00:30,410 --> 00:00:34,490 variables with values zero and one to correspond to this variable. 8 00:00:36,080 --> 00:00:39,320 Let us see what the model has to say about these variables. 9 00:00:43,090 --> 00:00:45,280 Now, let's talk about the airport variable first. 10 00:00:47,190 --> 00:00:48,370 We look at the equation. 11 00:00:49,600 --> 00:00:55,100 Here I've kept only airport in the linear regression; you can keep all the other variables also. 12 00:00:55,120 --> 00:01:02,080 So basically the equation is: Y is equal to all the other variables multiplied by their coefficients, plus 13 00:01:02,080 --> 00:01:07,360 the constant, plus beta of airport into the airport variable. 14 00:01:10,620 --> 00:01:18,930 What this means is there are two possibilities. If there is an airport, which means the value of X is 15 00:01:18,930 --> 00:01:19,440 one, 16 00:01:20,420 --> 00:01:28,100 then we'll have the value of all the other variables multiplied by their coefficients, plus the constant, plus beta. 17 00:01:29,100 --> 00:01:36,450 And if it is zero, this term will not be present: since X is zero, beta into X is zero. 18 00:01:40,580 --> 00:01:47,450 Therefore, this beta of airport is actually giving us the difference in price 19 00:01:48,350 --> 00:01:51,710 between when there is an airport and when there is not.
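Written out, the airport equation described above looks roughly like this. The symbols are assumed notation for illustration and are not taken from the video's slide: $\beta_0$ is the constant, $\beta_j x_j$ are the other variables with their coefficients, and $x_{\text{airport}}$ is the 0/1 dummy.

```latex
\begin{align*}
y &= \beta_0 + \sum_j \beta_j x_j + \beta_{\text{airport}}\, x_{\text{airport}} \\[4pt]
x_{\text{airport}} = 1: \quad y &= \beta_0 + \sum_j \beta_j x_j + \beta_{\text{airport}} \\
x_{\text{airport}} = 0: \quad y &= \beta_0 + \sum_j \beta_j x_j
\end{align*}
```

Subtracting the second case from the first leaves only $\beta_{\text{airport}}$, which is why that coefficient reads as the difference in predicted price between having an airport and not having one, with all other variables held fixed.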
20 00:01:53,650 --> 00:02:00,490 So if I look at the result of my model, the airport YES variable has an estimate of 1.13, 21 00:02:00,640 --> 00:02:01,810 the beta value. 22 00:02:02,990 --> 00:02:11,030 So this means that if there is an airport and all the other variables are the same, the value of the house, 23 00:02:11,030 --> 00:02:15,440 the price of the house, will increase by 1.13 units. 24 00:02:17,900 --> 00:02:25,070 Also, look at the p-value for this. The p-value is low, which means there is statistical evidence 25 00:02:25,430 --> 00:02:31,130 of a difference in the house price depending on whether there is an airport or whether there is not 26 00:02:31,130 --> 00:02:31,670 an airport. 27 00:02:35,160 --> 00:02:41,670 Also note that this coding of airport YES as one and airport NO as zero is arbitrary. 28 00:02:42,700 --> 00:02:44,620 We can use other values also. 29 00:02:44,710 --> 00:02:49,090 The result will have the same interpretation regardless of these values. 30 00:02:52,140 --> 00:02:56,010 In the other variable, which is waterbody, we had three dummy variables. 31 00:02:58,130 --> 00:03:02,540 So we received three different beta values, one for each of the three variables. 32 00:03:04,410 --> 00:03:11,250 So if you remember, in this we selected the baseline of waterbody none; that is, we said that 33 00:03:11,670 --> 00:03:18,770 all of these values being zero will mean that there is neither a lake nor a river in the area. 34 00:03:20,300 --> 00:03:28,130 So now, what is the meaning of each of these variables? Waterbody lake means: compared to the situation 35 00:03:28,130 --> 00:03:35,870 where there is no water body, if there is a lake in that area, how much will the price increase? 36 00:03:36,810 --> 00:03:43,920 So 0.26 units will be the increase in the price of the house if there is a lake in the area.
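The baseline idea for the water body variable can be sketched in a few lines of Python. This is only an illustration: the category labels, the `waterbody_` column names, and the `dummy_encode` helper are assumptions made here, not the names used in the course dataset or in any library.

```python
# Assumed category labels; the baseline category gets no dummy of its own.
CATEGORIES = ["None", "Lake", "River", "Lake and River"]
BASELINE = "None"


def dummy_encode(value, categories=CATEGORIES, baseline=BASELINE):
    """Return one 0/1 dummy per non-baseline category (n - 1 dummies in total)."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {
        f"waterbody_{category}": int(value == category)
        for category in categories
        if category != baseline
    }


# The baseline row is all zeros; every other row has exactly one dummy set to 1.
for label in CATEGORIES:
    print(label, dummy_encode(label))
```

Because the "None" row is all zeros, each dummy's coefficient is read relative to a house with no water body nearby, which matches the lake interpretation above.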
37 00:03:45,660 --> 00:03:53,490 If you look at the lake and river coefficient, it is minus 0.68, which means that if there 38 00:03:53,490 --> 00:04:02,010 are both a lake and a river, the house price will go down in comparison to if there was no water body in 39 00:04:02,010 --> 00:04:02,510 the area. 40 00:04:05,020 --> 00:04:10,900 And the third variable, which is waterbody river, is saying that if there is a river, the house price 41 00:04:10,900 --> 00:04:15,780 will go down in comparison to if there is no water body in that area. 42 00:04:17,360 --> 00:04:24,740 The next thing we have to look at is the p-value. All of these p-values are large, which means that there 43 00:04:24,740 --> 00:04:31,240 is no statistical evidence that there will be an impact of these three variables on the house price. 44 00:04:32,000 --> 00:04:39,320 So although the relationship is given by these betas, statistically we are not confident 45 00:04:39,320 --> 00:04:43,500 that the relationship given by these betas actually holds. 46 00:04:46,170 --> 00:04:50,680 So this is how qualitative variables are handled and interpreted in a linear model. 47 00:04:51,420 --> 00:04:56,520 We first transform them into dummy variables for n minus one categories. 48 00:04:57,360 --> 00:04:58,650 Then we run the regression. 49 00:04:59,340 --> 00:05:04,020 Then, looking at the betas and the p-values, we interpret the result.
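As a small check on the airport interpretation, the sketch below fits a one-predictor least-squares line to made-up numbers. Everything in it is an assumption for illustration: the toy prices are hypothetical (not the course dataset), `simple_ols` is a helper written here, and the model has only the airport dummy rather than all the variables used in the video.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical toy data, invented for this example.
airport = ["YES", "NO", "YES", "NO", "YES", "NO"]
price = [24.0, 21.0, 26.0, 23.0, 25.0, 22.0]

# Two codings of the same category: YES/NO as 1/0 and as 2/1.
x01 = [1 if a == "YES" else 0 for a in airport]
x12 = [2 if a == "YES" else 1 for a in airport]

a01, b01 = simple_ols(x01, price)
a12, b12 = simple_ols(x12, price)

# Difference between the average price with and without an airport.
mean_yes = mean([p for p, a in zip(price, airport) if a == "YES"])
mean_no = mean([p for p, a in zip(price, airport) if a == "NO"])

print("slope with 1/0 coding:", b01)
print("slope with 2/1 coding:", b12)
print("difference in group means:", mean_yes - mean_no)
```

With a single dummy predictor, the slope equals the difference between the two group means, and recoding the two values one unit apart only shifts the intercept. A coding whose values are not one unit apart would rescale the slope, but the conclusion about whether the airport matters would stay the same.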