1 00:00:01,400 --> 00:00:07,670 You may have noticed that there is one value in this output that we have still not discussed, 2 00:00:07,670 --> 00:00:13,490 and that value is called the F-statistic. In this video we'll discuss the F-statistic. 3 00:00:15,580 --> 00:00:19,300 Now, if you look at the results, for each of the coefficients 4 00:00:20,240 --> 00:00:26,630 there is a corresponding p-value, which tells us whether each individual predictor is related 5 00:00:26,630 --> 00:00:31,040 to the response. I'm talking about these t-values and probability values, 6 00:00:31,070 --> 00:00:31,910 these two columns. 7 00:00:34,500 --> 00:00:42,780 Since at least one of them has a small p-value (here we have six of them), since at least one 8 00:00:42,780 --> 00:00:50,340 of them has a small p-value corresponding to its t-value, we may think that surely at least one 9 00:00:50,340 --> 00:00:52,830 of the predictors is related to the response. 10 00:00:54,810 --> 00:01:00,930 Therefore, we would say that there is a relationship between the response and the predictor variables, that 11 00:01:00,930 --> 00:01:06,660 is, not all of the coefficients of the variables, that is, beta 1, beta 2, up to beta p, 12 00:01:07,890 --> 00:01:10,520 are zero. So all of them are not zero. 13 00:01:12,910 --> 00:01:19,960 This argument may sound okay, but it is actually flawed, especially if we have a large number of variables. 14 00:01:21,830 --> 00:01:30,200 Let me put it mathematically for you. First, what I'm saying is that the null hypothesis is that all of the betas 15 00:01:30,200 --> 00:01:30,710 are zero. 16 00:01:32,180 --> 00:01:35,130 I want to prove that this is not the case. 17 00:01:35,510 --> 00:01:39,500 I want to say that at least one of the betas is definitely not zero. 18 00:01:40,430 --> 00:01:45,350 That is, the predictors that I have selected have some relationship with the response. 
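Written out, the two hypotheses just described look roughly like this. The labels H_0 and H_a and the index j are notation assumed here for illustration; p is the number of predictors.

```latex
H_0:\ \beta_1 = \beta_2 = \cdots = \beta_p = 0
\qquad \text{versus} \qquad
H_a:\ \text{at least one } \beta_j \neq 0
```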
19 00:01:49,310 --> 00:01:58,050 Now, suppose my model is saying that any one of the predictors (here we got six), suppose 20 00:01:58,050 --> 00:02:05,520 it was showing that only one of them, say room number, is significantly related to the response, going by 21 00:02:05,520 --> 00:02:07,890 the p-value, and the rest are not. 22 00:02:09,850 --> 00:02:19,540 Now, can I say with confidence that since the beta of room number is significantly related, I am confident 23 00:02:19,540 --> 00:02:23,950 that the predictors in this model are impacting the response variable? 24 00:02:25,430 --> 00:02:27,800 The answer is no, I cannot say that. 25 00:02:28,780 --> 00:02:37,780 Let me show you why. Suppose I toss a coin five times and I say that if I get five heads in 26 00:02:37,780 --> 00:02:41,470 a row, I will term that coin a biased coin. 27 00:02:43,220 --> 00:02:50,810 Now, if this is my criterion for saying that the coin is biased, what is the probability that I will 28 00:02:51,080 --> 00:02:54,140 classify a fair coin as a biased coin? 29 00:02:56,420 --> 00:03:02,870 As you can see, the probability of a head in one toss of a fair coin is one half. 30 00:03:04,920 --> 00:03:12,510 So what is the probability that in all five tosses I'll get all heads? Half into half into half 31 00:03:12,510 --> 00:03:19,380 into half into half, five times, that is, half to the power five, which comes out to 32 00:03:19,380 --> 00:03:19,860 0.03125. 33 00:03:21,320 --> 00:03:30,380 This means that there is a nearly three percent chance that I will wrongly classify a fair coin as a biased 34 00:03:30,380 --> 00:03:30,700 coin. 35 00:03:33,330 --> 00:03:37,770 Now, suppose I toss 100 fair coins five times each. 36 00:03:39,530 --> 00:03:45,680 What is the probability of at least one of them getting all five heads? 37 00:03:47,670 --> 00:03:53,730 So these are 100 coins, and the probability of getting a head is 0.5 again. 
38 00:03:55,430 --> 00:04:00,830 What is the probability that at least one of them is wrongly classified as a biased coin? 39 00:04:02,180 --> 00:04:08,690 We can calculate this using the formula that I've shown here. It first finds the probability 40 00:04:08,690 --> 00:04:14,930 of getting all heads in none of these coins, and then we find one minus that probability. 41 00:04:15,380 --> 00:04:17,780 And that comes out to be about 95 percent. 42 00:04:18,730 --> 00:04:22,770 You do not need to know how this came out, but the result is important. 43 00:04:23,780 --> 00:04:30,530 The point that I am trying to make is that although the probability of calling a fair coin a biased coin 44 00:04:30,890 --> 00:04:32,660 was only three percent, 45 00:04:33,750 --> 00:04:41,490 if you do this experiment 100 times, it is almost certain that you will wrongly classify 46 00:04:41,670 --> 00:04:43,410 at least one of the coins. 47 00:04:45,440 --> 00:04:49,160 By almost certain, I mean there is more than a 95 percent probability. 48 00:04:50,640 --> 00:04:58,940 Now, this is similar to the t-statistic in the sense that I said we are 95 percent sure 49 00:04:58,970 --> 00:05:00,280 that the beta is not zero 50 00:05:00,780 --> 00:05:03,120 when I showed the t-value and p-value. 51 00:05:04,910 --> 00:05:07,000 So we are nearly five percent unsure. 52 00:05:08,230 --> 00:05:14,260 So there is a five percent chance of a beta which is zero being classified as a non-zero beta. 53 00:05:16,640 --> 00:05:18,760 Now, if we have a large number of variables, 54 00:05:20,110 --> 00:05:26,890 all having a five percent chance of being wrongly flagged as significant, it is almost certain that 55 00:05:26,890 --> 00:05:31,690 at least one of them is wrongly telling us that its beta value is non-zero. 
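The coin arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not the formula from the slide: the function name `prob_at_least_one` is made up for illustration, and the last two calculations assume the individual tests are independent, which is a simplification for regression coefficients.

```python
# Chance that one fair coin shows five heads in five tosses: 0.5 ** 5 = 0.03125.
p_five_heads = 0.5 ** 5


def prob_at_least_one(p_single, n_trials):
    """Probability of at least one 'hit' in n_trials independent trials,
    each with probability p_single (hypothetical helper for this sketch)."""
    return 1 - (1 - p_single) ** n_trials


# 100 fair coins, each tossed five times: roughly 0.96.
coins = prob_at_least_one(p_five_heads, 100)

# Same idea for p-values: each truly-zero beta has a 5% chance of looking
# significant. ASSUMPTION: the tests are treated as independent.
six_betas = prob_at_least_one(0.05, 6)
hundred_betas = prob_at_least_one(0.05, 100)

print(f"one coin wrongly called biased:   {p_five_heads:.5f}")
print(f"at least one of 100 coins:        {coins:.3f}")
print(f"at least one of 6 null betas:     {six_betas:.3f}")
print(f"at least one of 100 null betas:   {hundred_betas:.3f}")
```

Even with only six predictors, the chance of at least one false alarm under this assumption is already well above the five percent we accept for a single test.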
56 00:05:33,640 --> 00:05:41,110 So if we depend on individual p-values, there is a very high chance that we will incorrectly conclude 57 00:05:41,260 --> 00:05:45,100 that there is a relationship between the predictor variables and the response variable. 58 00:05:46,950 --> 00:05:52,380 The solution to this problem is to adjust for the number of predictors and then find the p-value. 59 00:05:53,930 --> 00:05:58,700 This new statistic, which adjusts for the number of predictors, is called the F-statistic. 60 00:06:00,340 --> 00:06:04,600 It includes the number of variables in the form of this p. 61 00:06:06,570 --> 00:06:14,580 So whenever we run a multiple regression model, we get an F-statistic value for the model and the corresponding 62 00:06:14,580 --> 00:06:15,180 p-value. 63 00:06:16,770 --> 00:06:23,490 This p-value must be checked against the threshold to decide whether there is a relationship between the model 64 00:06:23,490 --> 00:06:30,840 predictors and the response. If there is, then we look at the t-value and p-value of each individual 65 00:06:30,840 --> 00:06:34,780 variable to identify which of the variables are significant. 66 00:06:37,400 --> 00:06:38,820 I hope you understand the concept. 67 00:06:39,020 --> 00:06:40,730 I'll briefly summarize it. 68 00:06:42,110 --> 00:06:46,790 The idea is that, instead of looking at individual variables 69 00:06:47,960 --> 00:06:54,860 and saying that the model predictors are significantly impacting the response variable, we will look 70 00:06:54,860 --> 00:06:59,990 at a different statistic which takes into account the number of variables that we have. 71 00:07:01,830 --> 00:07:09,300 This ensures that we do not make the mistake of saying that the predictors have a significant relationship 72 00:07:09,300 --> 00:07:10,060 with the response 73 00:07:10,470 --> 00:07:16,830 if we merely found that one or two variables appear to be significantly impacting the response by chance. 
74 00:07:17,910 --> 00:07:24,360 So, to avoid that chance, we use this new statistic, which is called the F-statistic. We look at 75 00:07:24,360 --> 00:07:28,110 its value and we look at the corresponding p-value. 76 00:07:28,590 --> 00:07:33,890 If this p-value is lower than the threshold value, say, one percent or five percent, 77 00:07:33,900 --> 00:07:40,040 then we say that the model predictors are significantly impacting the response. 78 00:07:40,500 --> 00:07:48,000 After that, we look at the individual predictors and their estimates and p-values. Also, note that, 79 00:07:48,330 --> 00:07:49,620 as I have told you earlier, 80 00:07:49,620 --> 00:07:56,370 even if you do not understand the whole concept behind this statistic, the point that you 81 00:07:56,370 --> 00:08:04,530 should take away from this video lesson is just that you have to check whether the p-value of the F-statistic 82 00:08:04,530 --> 00:08:06,610 is lower than a threshold of five percent. 83 00:08:07,410 --> 00:08:11,700 You do not need to understand all of the concepts behind the F-statistic. 84 00:08:12,810 --> 00:08:20,880 This will ensure that the model predictors that you used have some significant impact on the response, 85 00:08:20,880 --> 00:08:21,270 that is, on the response variable.
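For anyone curious how the number of predictors p enters the F-statistic, here is a minimal sketch using the standard textbook form. The video does not spell out the formula, so treat this as an assumption about what the slide shows; the function name `f_statistic` and the example numbers are hypothetical.

```python
def f_statistic(tss, rss, n, p):
    """Overall F-statistic for a linear regression with an intercept.

    ASSUMPTION: standard textbook form,
        F = ((TSS - RSS) / p) / (RSS / (n - p - 1))
    tss: total sum of squares of the response
    rss: residual sum of squares of the fitted model
    n:   number of observations
    p:   number of predictors (excluding the intercept)
    """
    if p < 1 or n - p - 1 <= 0:
        raise ValueError("need p >= 1 and n > p + 1")
    if rss <= 0:
        raise ValueError("rss must be positive")
    return ((tss - rss) / p) / (rss / (n - p - 1))


# Hypothetical numbers, purely for illustration (not from the video's output).
f_value = f_statistic(tss=1000.0, rss=400.0, n=106, p=5)
print(f_value)
```

The p-value for this F comes from an F distribution with p and n - p - 1 degrees of freedom. Regression software reports it alongside the F-statistic, so in practice you only compare that reported p-value with your threshold, as described above.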