1
00:00:02,070 --> 00:00:12,240
In this session, let's understand the concepts of central tendency and dispersion. Whenever you are presented

2
00:00:12,240 --> 00:00:22,260
with any data set, before you go on to develop any statistical models, it is important to understand what

3
00:00:22,260 --> 00:00:23,430
is going on in the data.

4
00:00:24,120 --> 00:00:30,660
And the best way to understand the data is to use the concept of central tendency and dispersion.

5
00:00:31,260 --> 00:00:33,490
So what are central tendency and dispersion?

6
00:00:34,350 --> 00:00:39,360
These are the techniques we use when it comes to understanding central tendency.

7
00:00:40,170 --> 00:00:45,390
Dispersion tells me the extent of variation that is there in my process.

8
00:00:46,020 --> 00:00:52,920
Central tendency tells me what is going on in my process, what is there in my dataset, from the perspective of

9
00:00:52,920 --> 00:00:54,720
the midpoint.

10
00:00:54,990 --> 00:00:55,310
OK.

11
00:00:56,330 --> 00:00:57,930
So you have a central tendency.

12
00:00:57,950 --> 00:01:04,550
And you have dispersion. Analyzing the data from both perspectives will give you very useful insights.

13
00:01:05,090 --> 00:01:10,520
OK, so let's understand central tendency first and then we will go to dispersion.

14
00:01:11,180 --> 00:01:16,120
OK, the central tendency measures are mean, median, and mode.

15
00:01:16,610 --> 00:01:23,180
You'll see those first. Arithmetic range, or simply range, is a variation measure.

16
00:01:23,600 --> 00:01:31,430
OK, so let's say you have a dataset: 9, 3, 1, 8, 3, 6. The mean is simply the sum of

17
00:01:31,430 --> 00:01:33,380
these divided by the count.

18
00:01:33,650 --> 00:01:33,950
Right.

19
00:01:34,310 --> 00:01:40,130
If I add all six numbers, I get thirty, and I divide it by six.

20
00:01:40,130 --> 00:01:43,850
Five. This is simply the arithmetic average.

21
00:01:44,480 --> 00:01:49,310
Median, on the other hand, is the midpoint.

22
00:01:50,340 --> 00:01:55,290
The midpoint of my data set after I arrange the data set in ascending order.

23
00:01:55,740 --> 00:01:56,910
So this is the data set.

24
00:01:56,910 --> 00:01:59,270
I have 9, 3, 1, 8, 3, 6.

25
00:01:59,610 --> 00:02:05,790
I arrange this in ascending order: 1, 3, 3, 6, 8, 9. Three and six.

26
00:02:05,790 --> 00:02:08,560
The ones I've highlighted in blue, they are the midpoint.

27
00:02:08,640 --> 00:02:09,030
Right.

28
00:02:09,420 --> 00:02:15,040
So the median is three plus six, divided by two, which is 4.5.

29
00:02:15,690 --> 00:02:19,350
Let's assume for a moment, OK, six is not there.

30
00:02:20,130 --> 00:02:21,340
What will be the median?

31
00:02:22,410 --> 00:02:24,060
The median will be three.

32
00:02:25,030 --> 00:02:32,130
Right, so you add and divide by two only when you have two data points as midpoints, right?

33
00:02:32,590 --> 00:02:38,350
If six is not there, there are only five data points and the midpoint is three.

34
00:02:38,590 --> 00:02:40,000
We will go for the middle point.

35
00:02:40,660 --> 00:02:52,840
OK, and mode. Mode is simply the most frequently occurring number; the mode is three. And range is the maximum minus the minimum

36
00:02:52,840 --> 00:02:54,620
that is there in my data set.
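
The four measures described so far can be checked with a few lines of code. This is only a minimal sketch, assuming Python and its standard `statistics` module (neither is used in the session itself); the variable names are illustrative.

```python
import statistics

# The data set used in the session.
data = [9, 3, 1, 8, 3, 6]

mean = statistics.mean(data)        # sum of the values divided by the count
median = statistics.median(data)    # even count: average of the two middle values
mode = statistics.mode(data)        # most frequently occurring value
data_range = max(data) - min(data)  # maximum minus minimum

# If 6 is removed, five values remain and the median is the single middle value.
without_six = [x for x in data if x != 6]
median_without_six = statistics.median(without_six)

print(mean, median, mode, data_range, median_without_six)
```

Sorting is handled inside `statistics.median`, so the data does not need to be arranged in ascending order first.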

37
00:02:55,540 --> 00:02:58,140
OK, so mean, median, and mode:

38
00:02:58,150 --> 00:03:05,440
These are the measures of central tendency that give information about what is there in my dataset from

39
00:03:05,440 --> 00:03:07,240
the midpoint perspective?

40
00:03:08,170 --> 00:03:11,130
OK, now let's understand.

41
00:03:11,150 --> 00:03:17,710
Standard deviation. I already introduced a measure of variation, that is, range. Range is maximum minus

42
00:03:18,010 --> 00:03:26,380
minimum. Standard deviation is the extent of variation that is there in my dataset from the

43
00:03:27,530 --> 00:03:28,410
average.

44
00:03:29,030 --> 00:03:32,300
OK, so take 1, 2, 3, 4, 5.

45
00:03:32,330 --> 00:03:35,220
This is my dataset and three is my average.

46
00:03:35,780 --> 00:03:38,720
I measure the extent of variation from three.

47
00:03:39,380 --> 00:03:45,650
OK, then I square it, divide by n minus one, and then I take the square root.

48
00:03:46,250 --> 00:03:50,650
OK, this is the sum of the squared differences.

49
00:03:50,660 --> 00:03:56,040
And then I sum it. Count minus one is four, so 10 divided by four is 2.5.

50
00:03:56,450 --> 00:03:59,760
And the square root of this is 1.58.

51
00:04:00,170 --> 00:04:04,730
So my standard deviation is 1.58.

52
00:04:05,060 --> 00:04:09,140
We can also take the square of standard deviation, which is variance.

53
00:04:09,440 --> 00:04:15,410
OK, both will tell the extent of variation that is there in my data or process.

54
00:04:16,070 --> 00:04:16,490
OK.
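
The standard deviation arithmetic walked through above can be reproduced step by step. This is a minimal sketch, assuming Python (not used in the session); the final check against `statistics.stdev` is there only to confirm that the hand calculation matches the sample standard deviation, which divides by count minus one.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]

average = sum(data) / len(data)                      # 3.0
squared_diffs = [(x - average) ** 2 for x in data]   # squared distance of each value from the average
sum_sq = sum(squared_diffs)                          # 10.0
variance = sum_sq / (len(data) - 1)                  # divide by count minus one: 2.5
std_dev = math.sqrt(variance)                        # square root of the variance

print(round(std_dev, 2))

# The library's sample standard deviation uses the same count-minus-one rule.
assert math.isclose(std_dev, statistics.stdev(data))
```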

55
00:04:18,920 --> 00:04:25,850
Having seen the concept of central tendency and dispersion, let's now look at the types of distribution

56
00:04:25,850 --> 00:04:27,590
that you will encounter, OK?

57
00:04:28,070 --> 00:04:35,570
You will either encounter a symmetrical distribution or an asymmetrical distribution, right? In a symmetrical

58
00:04:35,570 --> 00:04:38,860
distribution, of which the normal distribution is the classic example,

59
00:04:39,290 --> 00:04:44,210
the values of mean, median, and mode would be the same. In a skewed distribution,

60
00:04:44,240 --> 00:04:47,120
you can have either a positive skew or a negative skew.

61
00:04:47,540 --> 00:04:49,880
These values won't be the same.

62
00:04:50,950 --> 00:04:57,490
So how will you create a graph like this? We use a technique called frequency distribution.

63
00:04:58,330 --> 00:05:06,550
Frequency distribution is nothing but a display of different frequencies of data that is there in my

64
00:05:06,550 --> 00:05:07,090
dataset.

65
00:05:07,420 --> 00:05:11,890
OK, if you see, these are the marks obtained by different students.

66
00:05:14,630 --> 00:05:20,960
Nine students have got two marks, four students have got one mark, six students have got three marks,

67
00:05:21,260 --> 00:05:22,230
so on and so forth.

68
00:05:22,580 --> 00:05:26,430
If I plot this graphically, it is known as the frequency distribution.

69
00:05:26,810 --> 00:05:31,760
It tells me whether there is any skew that is there in my data set or not.
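
A frequency distribution of this kind can be tallied and drawn as a rough text chart. This is a minimal sketch, assuming Python; the `marks` list is hypothetical and contains only the three counts mentioned in the session, as best the transcript can be read (nine students with 2 marks, four with 1 mark, six with 3 marks). The rest of the table shown on screen is not reproduced.

```python
from collections import Counter

# Hypothetical data: only the three counts quoted in the session.
marks = [2] * 9 + [1] * 4 + [3] * 6

# Count how many students obtained each mark.
frequency = Counter(marks)

# Print one row per mark, with one star per student.
for mark in sorted(frequency):
    print(f"{mark} | {'*' * frequency[mark]}")
```

With the full table, a longer tail on one side of the chart would indicate skew.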

70
00:05:32,630 --> 00:05:35,720
OK, so this understanding is also important.

71
00:05:36,440 --> 00:05:39,900
OK, now we have understood the concept of distribution.

72
00:05:39,930 --> 00:05:44,830
Now let's take a simple exercise, OK?

73
00:05:45,170 --> 00:05:47,500
I have two frequency distributions.

74
00:05:48,380 --> 00:05:52,010
I also have their averages and standard deviation.

75
00:05:52,370 --> 00:05:58,010
These are the marks obtained by students in two different groups.

76
00:05:58,060 --> 00:06:00,340
OK, this is Group A's performance.

77
00:06:00,350 --> 00:06:01,810
This is Group B's performance.

78
00:06:02,960 --> 00:06:06,020
Can you tell me which group is better and why?

79
00:06:07,100 --> 00:06:07,850
Take a moment.

80
00:06:10,000 --> 00:06:11,230
If you really see.

81
00:06:12,710 --> 00:06:21,860
Group A is better because the standard deviation is lower. When the standard deviation is lower, it

82
00:06:21,860 --> 00:06:23,900
means my variation is lower.

83
00:06:24,650 --> 00:06:29,260
Why does variation have to be lower? When variation is low,

84
00:06:29,720 --> 00:06:35,960
my predictability for the future from the current data set is higher.

85
00:06:37,190 --> 00:06:40,220
When the variation is high, my predictability is obviously lower.

86
00:06:40,220 --> 00:06:40,540
Right.

87
00:06:41,640 --> 00:06:48,430
So from that perspective, it is important to understand the concept of mean versus standard deviation.

88
00:06:49,310 --> 00:06:53,010
OK, let's look at one more example.

89
00:06:53,850 --> 00:07:02,610
In this example, I again have a similar scores scenario, with a mean of eighty-five here and a mean there, a standard

90
00:07:02,610 --> 00:07:04,830
deviation of seven here and a standard deviation of 50 there.

91
00:07:05,460 --> 00:07:07,170
Which one is better, and why?

92
00:07:09,590 --> 00:07:12,590
If you really see in this scenario also.

93
00:07:14,100 --> 00:07:23,380
This team's performance is better because the variation is low, OK,

94
00:07:23,460 --> 00:07:25,160
and the average is also higher, right?

95
00:07:25,980 --> 00:07:32,580
Normally in a test, you want more students to score higher marks, right? Look at these extreme data

96
00:07:32,580 --> 00:07:32,900
points.

97
00:07:32,910 --> 00:07:33,750
Look at the skew.

98
00:07:34,820 --> 00:07:41,610
Right, which means that the variation is higher; that means my predictability for the future will be lower.

99
00:07:43,070 --> 00:07:46,410
Are you understanding the concept of standard deviation versus mean?

100
00:07:47,290 --> 00:07:50,120
OK, now let's see one more example.

101
00:07:51,620 --> 00:07:53,840
I have four shooters, OK?

102
00:07:54,790 --> 00:07:58,120
You have to tell me which shooter is the best.

103
00:07:58,480 --> 00:08:02,380
OK, I think you all can guess that this shooter is the best.

104
00:08:02,380 --> 00:08:02,700
Right.

105
00:08:03,160 --> 00:08:10,390
And of the remaining three shooters, who can actually become the best shooter? Right.

106
00:08:10,480 --> 00:08:11,500
He's the best shooter.

107
00:08:12,040 --> 00:08:18,820
So I have three shooters now. Of the three shooters, which shooter can become the best shooter?

108
00:08:19,000 --> 00:08:19,300
Right.

109
00:08:20,720 --> 00:08:23,470
Which shooter can perform like this?

110
00:08:24,610 --> 00:08:28,100
You have one, two, three choices. Please take a moment.

111
00:08:29,480 --> 00:08:34,690
If you see this, I think this is ruled out, right? This shooter is all over the place.

112
00:08:35,930 --> 00:08:44,410
This shooter is closer to the target, this shooter is away from the target, but the variation is low.

113
00:08:47,670 --> 00:08:54,540
In my opinion, this shooter can become a very good shooter, much like this shooter.

114
00:08:55,120 --> 00:09:01,350
OK, it could be that this shooter is probably not holding the gun properly or holding the gun at an

115
00:09:01,350 --> 00:09:04,920
angle or not aiming correctly.

116
00:09:04,960 --> 00:09:09,420
OK, all that this person has to be taught is how to aim.

117
00:09:10,450 --> 00:09:19,100
This person, that is, the first shooter, can actually do a good job if he's taught how to aim correctly.

118
00:09:19,780 --> 00:09:25,420
OK, only a minor modification is needed in shooter one.

119
00:09:26,800 --> 00:09:32,620
Shooter two and three will probably require more effort, more effort.

120
00:09:32,620 --> 00:09:34,270
Only then can they become a good shooter.

121
00:09:35,000 --> 00:09:40,090
So shooter one can actually go all the way to the top, just like this shooter.

122
00:09:40,930 --> 00:09:41,330
Right.

123
00:09:42,250 --> 00:09:47,920
So what is the concept we are trying to explain here? In this case, precision versus accuracy, that

124
00:09:47,920 --> 00:09:48,860
is, again, mean

125
00:09:48,860 --> 00:09:51,490
versus standard deviation.

126
00:09:52,090 --> 00:09:53,920
In this case, precision and accuracy

127
00:09:53,920 --> 00:09:57,480
both are high. In this case, precision and accuracy both are bad.

128
00:09:58,150 --> 00:10:00,340
In this case, precision is lower.

129
00:10:00,340 --> 00:10:01,540
Accuracy is fairly high.

130
00:10:02,320 --> 00:10:05,250
In this case, precision is very good, but accuracy is low.

131
00:10:06,220 --> 00:10:09,820
We actually want both precision and accuracy in our dataset.
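
The shooter comparison can be put into numbers: accuracy as how far the average shot lands from the target, precision as how tightly the shots cluster around their own average. This is a minimal sketch, assuming Python 3.8 or later; the shot coordinates, the `summarize` helper, and the choice of spread formula are all hypothetical illustrations, not taken from the session.

```python
import math

def summarize(shots, target=(0.0, 0.0)):
    """Return (bias, spread) for a list of (x, y) shots.

    bias:   distance from the average shot position to the target (low = accurate)
    spread: root of the summed squared distances from the average shot,
            divided by count minus one (low = precise)
    """
    n = len(shots)
    cx = sum(x for x, _ in shots) / n
    cy = sum(y for _, y in shots) / n
    bias = math.dist((cx, cy), target)
    spread = math.sqrt(sum(math.dist(s, (cx, cy)) ** 2 for s in shots) / (n - 1))
    return bias, spread

# Hypothetical shooter: tight cluster, but away from the target.
tight_off_target = [(4.0, 4.1), (4.2, 3.9), (3.9, 4.0), (4.1, 4.2)]
# Hypothetical shooter: centered on average, but all over the place.
scattered = [(-5.0, 3.0), (4.0, -4.0), (1.0, 6.0), (-2.0, -5.0)]

bias_a, spread_a = summarize(tight_off_target)
bias_b, spread_b = summarize(scattered)

print(f"tight, off target: bias={bias_a:.2f}, spread={spread_a:.2f}")
print(f"scattered:         bias={bias_b:.2f}, spread={spread_b:.2f}")
```

The first shooter has a large bias but a small spread, which matches the point that only the aim needs correcting.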

132
00:10:13,540 --> 00:10:16,540
You understand the concepts of central tendency and dispersion now.

133
00:10:17,770 --> 00:10:18,250
OK.