So you would have noticed that there is one value in this output that we have still not discussed, and that value is called the F-statistic. In this video, we'll be discussing the F-statistic.

Now, if you look at the result, for each of the coefficients there is a corresponding p-value, which tells us whether each individual predictor is related to the response. I'm talking about these p-value and probability columns.

Since at least one of them has a small p-value (here we have six predictors), we may think that surely at least one of the predictors is related to the response. Therefore, we would say that there is a relationship between the response and the predictor variables; that is, not all of the coefficients of the variables are zero.

This argument may sound okay, but it is actually not, especially if we have a large number of variables.

Let me put it mathematically for you first. What I'm saying is that the null hypothesis is that all of the coefficients are zero. I want to show that this is not the case; I want to say that at least one of the betas is definitely non-zero.
That is, the predictors that I have selected have some relationship with the response.

Now, suppose my model were saying that only one of the predictors, say "bedrooms", out of the six that we have, is significantly related to house price (this is its p-value) and the rest are not. Can I say with confidence that, since "bedrooms" is significant, the predictors in this model are impacting the response variable? The answer is no, I cannot say that. Let me show you why.

Suppose I toss a coin five times, and I say that if I get five heads in a row, I will call that coin a biased coin. Now, if this is my criterion for saying that the coin is biased, what is the probability that I will classify a fair coin as a biased coin? As you can see, the probability of a head in one toss of a fair coin is one half. So what is the probability that in all five tosses I get heads? It is half times half times half times half times half, that is, one half to the power five, which comes out to 0.03125. This means that there is roughly a three percent chance that I will wrongly classify a fair coin as a biased coin.

Now, suppose I toss a hundred fair coins five times each.
What is the probability that at least one of them gives all five heads? So these are a hundred fair coins, and the probability of getting a head is 0.5 for each toss. What is the probability that at least one of them is wrongly classified as a biased coin? We can calculate it using the formula I've shown here: first find the probability that none of them gets all heads, and then take one minus that probability. That comes out to about 95 percent. You do not need to know how this came out, but the result is important.

The point that I'm trying to make is that although the probability of calling a fair coin a biased coin was only about three percent, if you do this experiment a hundred times, it is almost certain that you will wrongly classify at least one of the coins. By almost certain, I mean there is more than a 95 percent probability.

This is similar to the t-statistic, in the sense that when I said we are 95 percent sure that a beta is not zero, based on its p-value, we were still nearly five percent unsure. So there is a five percent chance of a beta which is actually zero being classified as a non-zero beta. Now, consider that we have a large number of variables, each having a five percent chance of being wrongly guessed as significant.
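The probabilities discussed above can be checked with a short calculation. This is a minimal sketch: the 100-coin and five-toss numbers match the lecture's example, while the 20-variable figure at the end is an illustrative number of my own, not one from the lecture.

```python
# Probability that a single fair coin shows heads on all 5 tosses
p_all_heads = 0.5 ** 5
print(p_all_heads)  # 0.03125, i.e. about 3 percent

# Probability that, out of 100 fair coins, at least one shows 5 heads in a row:
# one minus the probability that none of them does
p_at_least_one = 1 - (1 - p_all_heads) ** 100
print(round(p_at_least_one, 4))  # 0.9582, i.e. more than 95 percent

# The same effect with p-values: if 20 truly-zero coefficients are each tested
# at the 5 percent level, the chance of at least one false positive is
p_false_positive = 1 - 0.95 ** 20
print(round(p_false_positive, 2))  # 0.64
```

So even a modest number of predictors makes a purely-by-chance "significant" coefficient quite likely, which is exactly the problem the F-statistic addresses.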
64 00:05:25,190 --> 00:05:31,310 It is almost certain that at least one of them is actually wrongly saying that of for that variable 65 00:05:31,330 --> 00:05:31,870 is non-zero. 66 00:05:33,670 --> 00:05:41,140 So if we depend on individual P values, there is a very high chance that we will incorrectly conclude 67 00:05:41,350 --> 00:05:45,130 that there is a relationship between predictive variables and they respond to real. 68 00:05:47,010 --> 00:05:52,440 The solution to this problem is to adjust for the number of predictors and then find the value. 69 00:05:53,960 --> 00:05:58,670 This new statistic, which I just for the number of predictors, is called F statistic. 70 00:06:00,340 --> 00:06:04,660 This includes the number of variables in the form of this be. 71 00:06:06,600 --> 00:06:14,580 So Alz of when we run a multiple regression model, give a F statistic value for the model and the corresponding 72 00:06:14,580 --> 00:06:15,180 P value. 73 00:06:16,790 --> 00:06:23,490 So this P-value must be checked against the threshold that whether there is a relationship between model 74 00:06:23,490 --> 00:06:25,290 predictors and the response. 75 00:06:26,160 --> 00:06:32,790 If there is, then we will look at the P-value and P-value of each individual variable to identify which 76 00:06:32,790 --> 00:06:34,520 of the variables is significant. 77 00:06:34,550 --> 00:06:35,030 Delegate. 78 00:06:37,520 --> 00:06:38,790 I hope you understand the concept. 79 00:06:39,170 --> 00:06:40,720 I'll briefly summarize it. 80 00:06:42,200 --> 00:06:46,790 The idea is, instead of looking at individual variables. 81 00:06:48,080 --> 00:06:53,290 And saying that the model predictors are significantly impacting the response variable. 82 00:06:54,320 --> 00:06:59,930 We will look at a different statistic which takes into account the number of variables that we have. 
This ensures that we do not make the mistake of saying that the model predictors have a significant relationship with the response when, in reality, none of the variables individually impacts the response and it only appeared so by chance.

So, to avoid that error, we use this new statistic, called the F-statistic. We look at its value and we look at the corresponding p-value. If this p-value is lower than a threshold value, say one percent or five percent, then we say that the model predictors are significantly impacting the response. After that, we look at the individual predictors and their estimates and p-values, as I have told you earlier.

Also, even if you do not understand the whole concept behind this statistic, the one takeaway from this video is simply that you have to check whether the p-value of the F-statistic is lower than a threshold, say five percent. You do not need to understand all of the theory behind the F-statistic. Checking this will ensure that the model predictors that you used have some significant impact on the response variable.