So you would have noticed that there is one value in this output that we have still not discussed, and that value is called the F-statistic. In this video, we'll be discussing the F-statistic.

Now, if you look at the result, for each of the coefficients there is a corresponding p-value, which tells us whether each individual predictor is related to the response. I'm talking about these p-value and probability columns.

Since at least one of them has a small p-value (here we have six predictors), we may think that surely at least one of the predictors is related to the response. Therefore, we would say that there is a relationship between the response and the predictor variables; that is, not all of the coefficients of the variables are zero.

This argument may sound okay, but it is actually not, especially if we have a large number of variables.

Let me put it mathematically for you first. What I'm saying is that the null hypothesis is that all of the coefficients are zero. I want to show that this is not the case; I want to say that at least one of the betas is definitely non-zero.
That is, the predictors that I have selected have some relationship with the response.

Now, suppose my model were saying that only one of the predictors, say "bedrooms", out of the six that we have, is significantly related to house price (this is its p-value) and the rest are not. Can I say with confidence that, since "bedrooms" is significant, the predictors in this model are impacting the response variable? The answer is no, I cannot say that. Let me show you why.

Suppose I toss a coin five times, and I say that if I get five heads in a row, I will call that coin a biased coin. Now, if this is my criterion for saying that the coin is biased, what is the probability that I will classify a fair coin as a biased coin? As you can see, the probability of a head in one toss of a fair coin is one half. So what is the probability that in all five tosses I get heads? It is half times half times half times half times half, that is, one half to the power five, which comes out to 0.03125. This means that there is roughly a three percent chance that I will wrongly classify a fair coin as a biased coin.

Now, suppose I toss a hundred fair coins five times each.
What is the probability that at least one of them gives all five heads? So these are a hundred fair coins, and the probability of getting a head is 0.5 for each toss. What is the probability that at least one of them is wrongly classified as a biased coin? We can calculate it using the formula I've shown here: first find the probability that none of them gets all heads, and then take one minus that probability. That comes out to about 95 percent. You do not need to know how this came out, but the result is important.

The point that I'm trying to make is that although the probability of calling a fair coin a biased coin was only about three percent, if you do this experiment a hundred times, it is almost certain that you will wrongly classify at least one of the coins. By almost certain, I mean there is more than a 95 percent probability.

This is similar to the t-statistic, in the sense that when I said we are 95 percent sure that a beta is not zero, based on its p-value, we were still nearly five percent unsure. So there is a five percent chance of a beta which is actually zero being classified as a non-zero beta. Now, consider that we have a large number of variables, each having a five percent chance of being wrongly guessed as significant.
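The probabilities discussed above can be checked with a short calculation. This is a minimal sketch: the 100-coin and five-toss numbers match the lecture's example, while the 20-variable figure at the end is an illustrative number of my own, not one from the lecture.

```python
# Probability that a single fair coin shows heads on all 5 tosses
p_all_heads = 0.5 ** 5
print(p_all_heads)  # 0.03125, i.e. about 3 percent

# Probability that, out of 100 fair coins, at least one shows 5 heads in a row:
# one minus the probability that none of them does
p_at_least_one = 1 - (1 - p_all_heads) ** 100
print(round(p_at_least_one, 4))  # 0.9582, i.e. more than 95 percent

# The same effect with p-values: if 20 truly-zero coefficients are each tested
# at the 5 percent level, the chance of at least one false positive is
p_false_positive = 1 - 0.95 ** 20
print(round(p_false_positive, 2))  # 0.64
```

So even a modest number of predictors makes a purely-by-chance "significant" coefficient quite likely, which is exactly the problem the F-statistic addresses.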
64 00:05:25,190 --> 00:05:31,310 It is almost certain that at least one of them is actually wrongly saying that of for that variable 65 00:05:31,330 --> 00:05:31,870 is non-zero. 66 00:05:33,670 --> 00:05:41,140 So if we depend on individual P values, there is a very high chance that we will incorrectly conclude 67 00:05:41,350 --> 00:05:45,130 that there is a relationship between predictive variables and they respond to real. 68 00:05:47,010 --> 00:05:52,440 The solution to this problem is to adjust for the number of predictors and then find the value. 69 00:05:53,960 --> 00:05:58,670 This new statistic, which I just for the number of predictors, is called F statistic. 70 00:06:00,340 --> 00:06:04,660 This includes the number of variables in the form of this be. 71 00:06:06,600 --> 00:06:14,580 So Alz of when we run a multiple regression model, give a F statistic value for the model and the corresponding 72 00:06:14,580 --> 00:06:15,180 P value. 73 00:06:16,790 --> 00:06:23,490 So this P-value must be checked against the threshold that whether there is a relationship between model 74 00:06:23,490 --> 00:06:25,290 predictors and the response. 75 00:06:26,160 --> 00:06:32,790 If there is, then we will look at the P-value and P-value of each individual variable to identify which 76 00:06:32,790 --> 00:06:34,520 of the variables is significant. 77 00:06:34,550 --> 00:06:35,030 Delegate. 78 00:06:37,520 --> 00:06:38,790 I hope you understand the concept. 79 00:06:39,170 --> 00:06:40,720 I'll briefly summarize it. 80 00:06:42,200 --> 00:06:46,790 The idea is, instead of looking at individual variables. 81 00:06:48,080 --> 00:06:53,290 And saying that the model predictors are significantly impacting the response variable. 82 00:06:54,320 --> 00:06:59,930 We will look at a different statistic which takes into account the number of variables that we have. 
This ensures that we do not make the mistake of saying that the model predictors have a significant relationship with the response when, in reality, none of the variables individually impacts the response and it only appeared so by chance.

So, to avoid that error, we use this new statistic, called the F-statistic. We look at its value and we look at the corresponding p-value. If this p-value is lower than a threshold value, say one percent or five percent, then we say that the model predictors are significantly impacting the response. After that, we look at the individual predictors and their estimates and p-values, as I have told you earlier.

Also, even if you do not understand the whole concept behind this statistic, the one takeaway from this video is simply that you have to check whether the p-value of the F-statistic is lower than a threshold, say five percent. You do not need to understand all of the theory behind the F-statistic. Checking this will ensure that the model predictors that you used have some significant impact on the response variable.