1 00:00:00,670 --> 00:00:05,200 So in the last lecture, we performed preprocessing on our data. 2 00:00:06,560 --> 00:00:12,000 We converted these three categorical variables into these four numeric variables. 3 00:00:12,890 --> 00:00:19,820 Now we can use these numeric variables, along with the other variables, to create a linear regression model 4 00:00:19,820 --> 00:00:20,600 in Excel. 5 00:00:22,260 --> 00:00:26,430 Now, to create the model, you have to go to this Data tab. 6 00:00:27,800 --> 00:00:29,660 Click on Data Analysis. 7 00:00:32,230 --> 00:00:39,280 And just look for Regression. Scroll down, you will find Regression over here, and click 8 00:00:39,280 --> 00:00:39,840 on OK. 9 00:00:41,830 --> 00:00:47,260 Now, once you click on Regression, you will get this kind of dialog box where you have to provide 10 00:00:47,260 --> 00:00:51,460 your Y and X variables and some other information. 11 00:00:52,620 --> 00:00:56,520 So our Y variable, that is, the dependent variable, is this Valuation column. 12 00:00:58,540 --> 00:01:02,080 So just select all the records that you have in this column. 13 00:01:03,900 --> 00:01:11,130 So select the top cell, hold down your Ctrl and Shift keys, and press the down arrow. 14 00:01:12,600 --> 00:01:16,260 You will be able to select all the values in this column. 15 00:01:19,010 --> 00:01:23,930 And once you do that, you will find the range that you have selected over here. 16 00:01:25,430 --> 00:01:28,490 Next, we have to select the input X range. 17 00:01:30,000 --> 00:01:39,230 Now, all of these are our X variables, and we've already converted these categorical variables to numerical variables, 18 00:01:39,480 --> 00:01:45,410 so select all the converted variables and all the other numeric variables that were already there. 19 00:01:45,780 --> 00:01:48,090 So select all these records. 
20 00:01:50,410 --> 00:01:56,040 And again, select all the records by holding down Ctrl and Shift and pressing the down arrow.
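For readers who want to reproduce this preprocessing and range selection outside Excel, here is a minimal Python sketch of the same idea. Each categorical column becomes 0/1 dummy columns with one baseline level dropped, which is how three categorical variables can turn into four numeric ones. The column names, category levels, baseline choices (Small, Commercial, Non-Metro) and records are all hypothetical, because the lecture does not show them.

```python
# Hypothetical baselines (dropped levels): Store = "Small",
# Location = "Commercial", City = "Non-Metro". Every other level gets its
# own 0/1 column, so 2 + 1 + 1 = 4 dummy columns replace 3 categorical ones.
DUMMIES = {
    "Store_Large": ("Store", "Large"),
    "Store_Medium": ("Store", "Medium"),
    "Location_Residential": ("Location", "Residential"),
    "City_Metro": ("City", "Metro"),
}

# Numeric columns that go into X unchanged (hypothetical names).
NUMERIC = ["Investment", "Avg_Household_Income"]


def encode_row(row):
    """Return the 0/1 dummy values for one record, in DUMMIES order."""
    return [1 if row[col] == level else 0 for col, level in DUMMIES.values()]


def build_xy(rows):
    """Build the X range (dummies + numeric columns) and the Y range (Valuation).

    The original categorical columns and Valuation itself are kept out of X,
    matching the selection made in the Regression dialog.
    """
    header = list(DUMMIES) + NUMERIC
    X = [encode_row(r) + [r[c] for c in NUMERIC] for r in rows]
    y = [r["Valuation"] for r in rows]
    return header, X, y


rows = [  # made-up records, for illustration only
    {"Store": "Large", "Location": "Residential", "City": "Metro",
     "Investment": 50000, "Avg_Household_Income": 12000, "Valuation": 90000},
    {"Store": "Small", "Location": "Commercial", "City": "Non-Metro",
     "Investment": 30000, "Avg_Household_Income": 9000, "Valuation": 60000},
]
header, X, y = build_xy(rows)
print(header)
print(X)
```

A store at all three baseline levels encodes as four zeros, so its dummy effects are absorbed into the intercept.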
21 00:02:01,610 --> 00:02:08,150 So you can notice that we have ignored these three categorical variables, because we have already converted 22 00:02:08,150 --> 00:02:13,760 them, and we have also ignored this Valuation column, because this is our Y variable. 23 00:02:15,280 --> 00:02:21,100 So now we have selected our X and Y ranges, that is, the independent and dependent variables. 24 00:02:22,760 --> 00:02:24,110 There are a few other options. 25 00:02:25,190 --> 00:02:32,840 Now, while selecting, you can notice that we have also selected these headers, and whenever you select 26 00:02:32,840 --> 00:02:36,890 the headers, you have to tick this Labels checkbox. 27 00:02:37,130 --> 00:02:41,210 This means that you have also selected the labels of your data. 28 00:02:41,540 --> 00:02:43,940 The headers are also known as labels. 29 00:02:45,210 --> 00:02:47,160 So now the settings are fine. 30 00:02:48,090 --> 00:02:54,540 And you can click on OK to create the model and the summary report. 31 00:02:57,390 --> 00:03:01,140 So this is the result of the linear regression. 32 00:03:02,860 --> 00:03:08,980 So whenever you get the result of a linear regression, the first thing you have to check is the Significance 33 00:03:09,030 --> 00:03:09,640 F value. 34 00:03:13,060 --> 00:03:20,770 A small Significance F value signifies that the predictors have some kind of relationship with the 35 00:03:20,770 --> 00:03:21,790 response variable. 36 00:03:23,910 --> 00:03:31,440 In other words, the independent variables that you are using have some significant 37 00:03:31,440 --> 00:03:35,970 relationship with the dependent variable, or the target variable. 38 00:03:38,110 --> 00:03:46,300 So you can see that here the value is 4.42E-264, that is, 4.42 times 10 to the power minus 264.
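Excel's summary report does not show how Significance F is computed. Here is a minimal sketch, assuming a regression with an intercept, k predictors and n rows. The F statistic is F = (R^2/k) / ((1 - R^2)/(n - k - 1)), where R^2 is the plain R Square from the report, not Adjusted R Square. Significance F is the probability of getting an F value at least this large from an F distribution with k and n - k - 1 degrees of freedom, which is the same quantity Excel's F.DIST.RT function returns. To stay within Python's standard library, the sketch computes that probability through the regularized incomplete beta function, using a continued-fraction evaluation in the style of Numerical Recipes. The function names are our own, and the inputs in the usage lines are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued-fraction part of the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges fast, else symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def significance_f(r2, n, k):
    """Return (F statistic, Significance F) for k predictors fitted on n rows."""
    df_reg, df_res = k, n - k - 1
    f_stat = (r2 / df_reg) / ((1.0 - r2) / df_res)
    # Right tail of F(df_reg, df_res): P(F > f) = I_x(df_res/2, df_reg/2),
    # with x = df_res / (df_res + df_reg * f).
    x = df_res / (df_res + df_reg * f_stat)
    return f_stat, betainc_reg(df_res / 2.0, df_reg / 2.0, x)


# Hypothetical inputs: R^2 = 0.6 from 1,000 rows and 8 predictors.
f_stat, sig_f = significance_f(0.6, 1000, 8)
print(f_stat, sig_f)
print(sig_f < 0.05)  # True: the predictors are jointly significant
```

With many rows, even a moderate R^2 gives a Significance F that is many orders of magnitude below 0.05. That is why values like the one in this report are so tiny.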
39 00:03:48,710 --> 00:03:57,500 E raised to the power minus 264 is a very small number, and since the Significance F value is very small, you 40 00:03:57,500 --> 00:04:03,650 can see that there is a significant relationship between the independent and dependent variables. 41 00:04:05,220 --> 00:04:12,810 Now, once we have checked this Significance F value, we can look at how well our model fits the data. 42 00:04:14,940 --> 00:04:21,730 Now, to judge the fit of our model, you can look at this Adjusted R Square value. Here 43 00:04:22,350 --> 00:04:27,070 the value is 0.605 out of 1. 44 00:04:28,110 --> 00:04:29,860 And this is a good enough value. 45 00:04:30,270 --> 00:04:37,530 As a rule of thumb, your Adjusted R Square value should be more than 0.4 or 0.45. 46 00:04:38,250 --> 00:04:40,590 So since we are getting about 0.6, 47 00:04:41,830 --> 00:04:48,190 this means that our independent variables are good enough to predict the dependent variable. 48 00:04:50,420 --> 00:04:57,380 Now, using Significance F and Adjusted R Square, we have seen that there is a significant relationship 49 00:04:57,380 --> 00:05:05,750 between the dependent and independent variables, and that our independent variables are good enough to predict 50 00:05:05,750 --> 00:05:07,340 the dependent variable. 51 00:05:08,360 --> 00:05:16,730 Now let's look at the individual variables and find out which independent variables have more 52 00:05:16,730 --> 00:05:18,950 impact on our dependent variable. 53 00:05:21,500 --> 00:05:25,730 Now, here at the bottom, you will find all the independent variables. 54 00:05:27,390 --> 00:05:35,520 The first item in this list is the intercept. This is beta-naught, or the constant coefficient. Then 55 00:05:35,520 --> 00:05:39,320 we have all the variables that we have in our data.
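Adjusted R Square is R Square corrected for the number of predictors, so adding variables that explain little can lower it. Here is a minimal sketch of the standard formula, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), with n rows and k predictors (the intercept is not counted). The input numbers are hypothetical, not taken from this lecture's data.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical: R^2 = 0.5 from 101 rows and 10 predictors.
adj = adjusted_r2(0.5, 101, 10)
print(round(adj, 4))  # 0.4444
print(adj > 0.4)      # True: passes the lecture's rule of thumb

# Same R^2 but 20 predictors: the penalty grows and the check now fails.
print(round(adjusted_r2(0.5, 101, 20), 4))  # 0.375
```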
56 00:05:40,600 --> 00:05:48,310 Here, for the variables, you have to look at the p-values. Similar to the Significance F value, the p-value 57 00:05:49,030 --> 00:05:57,100 tells us how significant the relationship is between the dependent variable and that independent variable. 58 00:05:58,190 --> 00:06:05,930 And the smaller the p-value, the more significant the relationship is between the dependent and independent 59 00:06:05,930 --> 00:06:06,410 variable. 60 00:06:08,070 --> 00:06:12,420 So we should look for p-values less than 0.05. 61 00:06:13,500 --> 00:06:17,910 Let's highlight all the cells where the p-value is less than 0.05. 62 00:06:19,710 --> 00:06:23,590 So you can see that here the value is E raised to the power minus 17. 63 00:06:24,360 --> 00:06:26,370 So obviously this is very small. 64 00:06:27,400 --> 00:06:29,950 And again, we can ignore this value. 65 00:06:31,170 --> 00:06:36,780 And here the value is E raised to the power minus 19, so we can highlight this. 66 00:06:37,750 --> 00:06:44,230 For these also, the values are E raised to the power minus 20 and E raised to the power minus 39. 67 00:06:45,430 --> 00:06:46,930 And this one also. 68 00:06:48,400 --> 00:06:56,380 So by looking at the p-values, you can see which independent variables are important in your analysis, 69 00:06:56,890 --> 00:07:00,820 or, in other words, have a significant impact on the Y variable. 70 00:07:00,830 --> 00:07:07,330 So you can see that the first variable we have here is Store_Large. 71 00:07:08,370 --> 00:07:16,770 Then Location = Residential, City = Metro, the investment we are making, and the average 72 00:07:16,770 --> 00:07:17,730 household income. 73 00:07:19,750 --> 00:07:28,270 Now, these three are the variables that are not important. Store size = Medium does not have a significant 74 00:07:28,270 --> 00:07:32,500 relationship with the Y variable, or the valuation of the store.
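Each p-value in the report comes from the coefficient's t statistic, which is the coefficient divided by its standard error. Excel evaluates it with Student's t distribution using n - k - 1 degrees of freedom. The sketch below swaps in the standard normal distribution so it can stay within Python's standard library. That is an approximation, close only when the residual degrees of freedom are large. The coefficients and standard errors are invented for illustration and are not the lecture's output.

```python
import math


def two_tailed_p_normal(coef, std_err):
    """Approximate two-tailed p-value for the hypothesis "coefficient = 0".

    t = coef / std_err is compared with a standard normal distribution rather
    than Student's t (as Excel does); close only for large residual df.
    """
    t = coef / std_err
    return math.erfc(abs(t) / math.sqrt(2.0))


# Made-up (coefficient, standard error) pairs, not the lecture's output.
estimates = {
    "Store_Large": (4000.0, 500.0),
    "Store_Medium": (120.0, 480.0),
    "Location_Residential": (-2100.0, 400.0),
    "Investment": (0.5, 0.05),
}

ALPHA = 0.05
# Equivalent of highlighting the cells whose p-value is below 0.05.
significant = [name for name, (b, se) in estimates.items()
               if two_tailed_p_normal(b, se) < ALPHA]
print(significant)  # ['Store_Large', 'Location_Residential', 'Investment']
```

The absolute value of t is used because a large negative coefficient, like the residential one here, is just as significant as a large positive one.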
75 00:07:33,280 --> 00:07:41,590 Similarly, the number of competitor stores and the estimated population in the vicinity are also not 76 00:07:41,590 --> 00:07:42,700 that important 77 00:07:42,970 --> 00:07:45,370 in determining the valuation of our store. 78 00:07:46,520 --> 00:07:53,430 Now, once you have identified the key independent variables, you can look at their coefficients. 79 00:07:54,050 --> 00:07:57,530 These are the beta values for all these variables. 80 00:07:58,550 --> 00:08:02,690 So the coefficient for Store_Large is 81 00:08:03,810 --> 00:08:06,750 4,659. 82 00:08:07,960 --> 00:08:08,950 This means that 83 00:08:09,740 --> 00:08:11,570 if the store is large, 84 00:08:12,730 --> 00:08:18,880 the valuation is going to increase by 4,659 dollars compared with the baseline store size, keeping all the other variables the same. 85 00:08:20,020 --> 00:08:25,420 Similarly, the coefficient for Investment is 0.45. 86 00:08:25,810 --> 00:08:33,490 This means that if we increase the investment amount by one dollar, the valuation is going to increase 87 00:08:33,490 --> 00:08:35,640 by 0.45 dollars. 88 00:08:37,890 --> 00:08:44,200 And here is the coefficient for average household income, which is 0.431. 89 00:08:45,060 --> 00:08:49,530 This means that, keeping all the other variables the same except the household income, 90 00:08:50,380 --> 00:08:58,780 if the household income in one locality is 10,000 and in another locality it is 11,000, then there will be 91 00:08:58,780 --> 00:09:07,370 a difference of 0.431 times 1,000, that is, 431 dollars, in the valuation of the two 92 00:09:07,370 --> 00:09:07,920 stores. 93 00:09:08,500 --> 00:09:15,850 So with each one-dollar increase in the household income, there is an increase of around 0.43 dollars in 94 00:09:15,850 --> 00:09:17,410 the valuation amount.
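The interpretations above are plain arithmetic on the fitted equation y = b0 + b1*x1 + ... + bk*xk. If only xi changes by dxi, the prediction changes by bi*dxi, and the intercept and all other terms cancel. Here is a small Python sketch using the three coefficients quoted in the lecture. The dictionary keys are hypothetical column names. The Store_Large effect is measured against whichever store size was dropped as the baseline during preprocessing.

```python
# Coefficients quoted in the lecture; keys are hypothetical column names.
# Other coefficients are omitted because they cancel when held fixed.
COEF = {
    "Store_Large": 4659.0,
    "Investment": 0.45,
    "Avg_Household_Income": 0.431,
}


def valuation_change(changes):
    """Predicted change in valuation when only the listed variables change.

    For y = b0 + sum(b_i * x_i), changing x_i by dx_i while everything else
    stays fixed changes y by sum(b_i * dx_i); the intercept cancels out.
    """
    return sum(COEF[name] * dx for name, dx in changes.items())


# Household income 10,000 vs 11,000, everything else identical.
print(round(valuation_change({"Avg_Household_Income": 11000 - 10000}), 2))  # 431.0
# A large store vs a baseline-size store (dummy goes from 0 to 1).
print(valuation_change({"Store_Large": 1}))  # 4659.0
# One extra dollar of investment.
print(valuation_change({"Investment": 1}))  # 0.45
```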
95 00:09:19,850 --> 00:09:27,380 So that's how we read the result of a regression analysis. First, we look at the Significance F value you 96 00:09:27,380 --> 00:09:31,330 got. The Significance F value should be less than 0.05. 97 00:09:31,710 --> 00:09:35,270 In our case, this is much smaller than 0.05, 98 00:09:35,280 --> 00:09:41,150 so we can say that there is a significant relationship between the dependent and independent variables. 99 00:09:42,680 --> 00:09:48,380 Once you have identified that there is a significant relationship, you can look at the fit of 100 00:09:48,380 --> 00:09:51,360 your model by looking at Adjusted R Square. 101 00:09:52,580 --> 00:09:56,300 The Adjusted R Square value should be more than 0.4. 102 00:09:56,300 --> 00:09:58,580 In our case, this is 0.6. 103 00:09:58,940 --> 00:10:03,340 So we can say that our model predicts the dependent variable reasonably well. 104 00:10:04,610 --> 00:10:11,630 Then, after looking at these two values, we can look at the individual variable-level data, where we can 105 00:10:11,630 --> 00:10:15,920 identify the important variables by looking at their p-values. 106 00:10:17,350 --> 00:10:22,270 The p-value should be less than 0.05 for important variables. 107 00:10:23,250 --> 00:10:28,470 So just highlight all the variables with a p-value less than 0.05. 108 00:10:29,930 --> 00:10:37,040 Once you have highlighted the variables, you can look at their coefficients to see how those variables are 109 00:10:37,040 --> 00:10:44,210 going to impact your Y variable. Positive means that if you increase your independent variable, the 110 00:10:44,210 --> 00:10:48,530 dependent variable is also going to increase. Negative means that 111 00:10:48,530 --> 00:10:54,530 if you increase your independent variable, the dependent variable is going to decrease.
112 00:10:55,500 --> 00:11:00,840 So, for example, here you can see that for the residential location, the coefficient is negative. 113 00:11:01,020 --> 00:11:05,190 That means that, between the commercial and residential options, 114 00:11:06,160 --> 00:11:09,820 the valuation of the commercial location is higher, keeping everything else the same.
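The reading order summarized above can be written as a small checklist function. This sketch uses the lecture's thresholds: Significance F below 0.05, Adjusted R Square above 0.4, and p-values below 0.05. The function names are our own, and the summary values passed in are hypothetical, not the numbers from this dataset.

```python
def direction(coef):
    """Describe how the dependent variable moves when this predictor increases."""
    if coef > 0:
        return "increases"
    if coef < 0:
        return "decreases"
    return "no change"


def read_regression(significance_f, adj_r2, rows, alpha=0.05, min_adj_r2=0.4):
    """Apply the lecture's reading order to a regression summary.

    rows holds (name, coefficient, p_value) tuples for the predictors (no
    intercept). Returns None if the model fails the Significance F or the
    Adjusted R Square check; otherwise maps each variable with p < alpha to
    the direction in which the dependent variable moves as it increases.
    """
    if significance_f >= alpha or adj_r2 <= min_adj_r2:
        return None
    return {name: direction(coef) for name, coef, p in rows if p < alpha}


summary_rows = [  # (name, coefficient, p-value); made-up values
    ("Store_Large", 4000.0, 1e-10),
    ("Store_Medium", 150.0, 0.62),
    ("Location_Residential", -1800.0, 1e-6),
    ("Investment", 0.5, 1e-12),
]
print(read_regression(1e-50, 0.6, summary_rows))
print(read_regression(1e-50, 0.3, summary_rows))  # None: fit too weak
```

For a 0/1 dummy such as Location_Residential, "decreases" means a residential store is predicted to be worth less than an otherwise identical store at the baseline (commercial) level.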