1 00:00:02,070 --> 00:00:12,240 In this session, let's understand the concept of central tendency and dispersion. Whenever you are presented 2 00:00:12,240 --> 00:00:22,260 with any data set, before you go on to develop any statistical models, it is important to understand what 3 00:00:22,260 --> 00:00:23,430 is going on in the data. 4 00:00:24,120 --> 00:00:30,660 And the best way to understand the data is to use the concept of central tendency and dispersion. 5 00:00:31,260 --> 00:00:33,490 So what are central tendency and dispersion? 6 00:00:34,350 --> 00:00:39,360 These are the techniques we use when it comes to understanding data. 7 00:00:40,170 --> 00:00:45,390 Dispersion tells me the extent of variation that is there in my process. 8 00:00:46,020 --> 00:00:52,920 Central tendency tells me what is going on in my process, what is there in my dataset, from the perspective of 9 00:00:52,920 --> 00:00:54,720 the midpoint. 10 00:00:54,990 --> 00:00:55,310 OK. 11 00:00:56,330 --> 00:00:57,930 So you have central tendency 12 00:00:57,950 --> 00:01:04,550 and you have dispersion. Analyzing the data from both perspectives will give very useful insights. 13 00:01:05,090 --> 00:01:10,520 OK, so let's understand central tendency first and then we will go to dispersion. 14 00:01:11,180 --> 00:01:16,120 OK, the central tendency measures are mean, median, and mode. 15 00:01:16,610 --> 00:01:23,180 You'll see those first. Arithmetic range, or simply range, is a variation measure. 16 00:01:23,600 --> 00:01:31,430 OK, so let's say you have a dataset: nine, three, one, eight, three, six. Mean is simply the sum of 17 00:01:31,430 --> 00:01:33,380 this divided by the count. 18 00:01:33,650 --> 00:01:33,950 Right. 19 00:01:34,310 --> 00:01:40,130 If I add all six numbers, I get thirty, and I divide it by six. 20 00:01:40,130 --> 00:01:43,850 I get five. This is simply the arithmetic average. 21 00:01:44,480 --> 00:01:49,310 Median, on the other hand, is the midpoint. 
22 00:01:50,340 --> 00:01:55,290 That is, the midpoint of my data set after I arrange the data set in ascending order. 23 00:01:55,740 --> 00:01:56,910 So this is the data set. 24 00:01:56,910 --> 00:01:59,270 I have nine, three, one, eight, three, six. 25 00:01:59,610 --> 00:02:05,790 I arrange this in ascending order: one, three, three, six, eight, nine. Three and six, 26 00:02:05,790 --> 00:02:08,560 the ones I've highlighted in blue, are the midpoint. 27 00:02:08,640 --> 00:02:09,030 Right. 28 00:02:09,420 --> 00:02:15,040 So the median is three plus six, divided by two, which is four point five. 29 00:02:15,690 --> 00:02:19,350 Let's assume for a moment that six is not there. 30 00:02:20,130 --> 00:02:21,340 What will be the median? 31 00:02:22,410 --> 00:02:24,060 The median will be three. 32 00:02:25,030 --> 00:02:32,130 Right, so you add and divide by two only if you have two data points as midpoints, right? 33 00:02:32,590 --> 00:02:38,350 If six is not there, there are only five data points and the midpoint is three. 34 00:02:38,590 --> 00:02:40,000 We will go for the middle point. 35 00:02:40,660 --> 00:02:52,840 OK, and mode. Mode is simply the most frequently occurring number; here the mode is three. And range is maximum minus minimum 36 00:02:52,840 --> 00:02:54,620 that is there in my data set. 37 00:02:55,540 --> 00:02:58,140 OK, so mean, median, and mode 38 00:02:58,150 --> 00:03:05,440 are the measures of central tendency that give information about what is there in my dataset from 39 00:03:05,440 --> 00:03:07,240 the midpoint perspective. 40 00:03:08,170 --> 00:03:11,130 OK, now let's understand 41 00:03:11,150 --> 00:03:17,710 standard deviation. I have already introduced one measure of variation, that is, range. Range is maximum minus 42 00:03:18,010 --> 00:03:26,380 minimum. Standard deviation is the extent of variation that is there in my dataset from the 43 00:03:27,530 --> 00:03:28,410 average. 44 00:03:29,030 --> 00:03:32,300 OK, so consider one, two, three, four, five. 
45 00:03:32,330 --> 00:03:35,220 This is my dataset and three is my average. 46 00:03:35,780 --> 00:03:38,720 I measure the extent of variation from three. 47 00:03:39,380 --> 00:03:45,650 OK, then I square it, divide by n minus one, and then I take the square root. 48 00:03:46,250 --> 00:03:50,650 OK, this is the sum of the squares of the differences. 49 00:03:50,660 --> 00:03:56,040 That sum is ten. Count minus one is four, so ten divided by four is two point five. 50 00:03:56,450 --> 00:03:59,760 And the square root of this is one point five eight. 51 00:04:00,170 --> 00:04:04,730 So my standard deviation is one point five eight. 52 00:04:05,060 --> 00:04:09,140 We can also take the square of the standard deviation, which is the variance. 53 00:04:09,440 --> 00:04:15,410 OK, both tell the extent of variation that is there in my data or process. 54 00:04:16,070 --> 00:04:16,490 OK. 55 00:04:18,920 --> 00:04:25,850 Having seen the concept of central tendency and dispersion, let's now look at the types of distribution 56 00:04:25,850 --> 00:04:27,590 that you will encounter, OK? 57 00:04:28,070 --> 00:04:35,570 You will encounter either a symmetrical distribution or an asymmetrical distribution, right? In a symmetrical 58 00:04:35,570 --> 00:04:38,860 distribution, of which the normal distribution is the best-known example, 59 00:04:39,290 --> 00:04:44,210 the value of mean, median, and mode would be the same. In a skewed distribution, 60 00:04:44,240 --> 00:04:47,120 you can have either a positive skew or a negative skew. 61 00:04:47,540 --> 00:04:49,880 These values won't be the same. 62 00:04:50,950 --> 00:04:57,490 So how will you create a graph like this? We use a technique called frequency distribution. 63 00:04:58,330 --> 00:05:06,550 Frequency distribution is nothing but a display of how frequently each value occurs in my 64 00:05:06,550 --> 00:05:07,090 dataset. 65 00:05:07,420 --> 00:05:11,890 OK, if you see, these are the marks obtained by different students. 
66 00:05:14,630 --> 00:05:20,960 Nine students have got two marks, four students have got one mark, six students have got three marks, 67 00:05:21,260 --> 00:05:22,230 so on and so forth. 68 00:05:22,580 --> 00:05:26,430 If I plot this graphically, it is known as the frequency distribution. 69 00:05:26,810 --> 00:05:31,760 It tells me whether there is any skew in my data set or not. 70 00:05:32,630 --> 00:05:35,720 OK, so this understanding is also important. 71 00:05:36,440 --> 00:05:39,900 OK, now we have understood the concept of distribution. 72 00:05:39,930 --> 00:05:44,830 Now let's take a simple exercise, OK? 73 00:05:45,170 --> 00:05:47,500 I have two frequency distributions. 74 00:05:48,380 --> 00:05:52,010 I also have their averages and standard deviations. 75 00:05:52,370 --> 00:05:58,010 These are the marks obtained by students in two different groups. 76 00:05:58,060 --> 00:06:00,340 OK, this is Group A's performance. 77 00:06:00,350 --> 00:06:01,810 This is Group B's performance. 78 00:06:02,960 --> 00:06:06,020 Can you tell me which group is better and why? 79 00:06:07,100 --> 00:06:07,850 Take a moment. 80 00:06:10,000 --> 00:06:11,230 If you really see, 81 00:06:12,710 --> 00:06:21,860 Group A is better because the standard deviation is lower. When the standard deviation is lower, it 82 00:06:21,860 --> 00:06:23,900 means my variation is lower. 83 00:06:24,650 --> 00:06:29,260 Why does variation have to be lower? When variation is low, 84 00:06:29,720 --> 00:06:35,960 my predictability for the future from the current data set is higher. 85 00:06:37,190 --> 00:06:40,220 When the variation is high, my predictability is obviously lower. 86 00:06:40,220 --> 00:06:40,540 Right. 87 00:06:41,640 --> 00:06:48,430 So from that perspective, it is important to understand the concept of mean versus standard deviation. 88 00:06:49,310 --> 00:06:53,010 OK, let's look at one more example. 
89 00:06:53,850 --> 00:07:02,610 In this example, I again have a similar scores scenario: a mean of eighty-five here and the mean there, a standard 90 00:07:02,610 --> 00:07:04,830 deviation of seven here and a standard deviation of 50 there. 91 00:07:05,460 --> 00:07:07,170 Which one is better, and why? 92 00:07:09,590 --> 00:07:12,590 If you really see, in this scenario also, 93 00:07:14,100 --> 00:07:23,380 this team's performance is better because the variation is low, OK, 94 00:07:23,460 --> 00:07:25,160 and the average is also higher, right? 95 00:07:25,980 --> 00:07:32,580 Normally in a test, you want more students to score higher marks. Look at these extreme data 96 00:07:32,580 --> 00:07:32,900 points. 97 00:07:32,910 --> 00:07:33,750 Look at the skew. 98 00:07:34,820 --> 00:07:41,610 Right, which means that the variation is higher, and that means my predictability for the future will be lower. 99 00:07:43,070 --> 00:07:46,410 Are you understanding the concept of standard deviation versus mean? 100 00:07:47,290 --> 00:07:50,120 OK, now let's see one more example. 101 00:07:51,620 --> 00:07:53,840 I have four shooters, OK? 102 00:07:54,790 --> 00:07:58,120 You have to tell me which shooter is the best. 103 00:07:58,480 --> 00:08:02,380 OK, I think you all can guess that this shooter is the best. 104 00:08:02,380 --> 00:08:02,700 Right. 105 00:08:03,160 --> 00:08:10,390 And of the remaining three shooters, who can actually become the best shooter? Right. 106 00:08:10,480 --> 00:08:11,500 He's the best shooter. 107 00:08:12,040 --> 00:08:18,820 So I have three shooters now. Of the three shooters, which shooter can become the best shooter? 108 00:08:19,000 --> 00:08:19,300 Right. 109 00:08:20,720 --> 00:08:23,470 Which shooter can perform like this? 110 00:08:24,610 --> 00:08:28,100 You have one, two, three choices. Take a moment. 111 00:08:29,480 --> 00:08:34,690 If you see this one, I think it is ruled out, right? This shooter is all over the place. 
112 00:08:35,930 --> 00:08:44,410 This shooter is closer to the target. This shooter is away from the target, but the variation is low. 113 00:08:47,670 --> 00:08:54,540 In my opinion, this shooter can become a very good shooter, much like this shooter. 114 00:08:55,120 --> 00:09:01,350 OK, it could be that this shooter is probably not holding the gun properly, or holding the gun at an 115 00:09:01,350 --> 00:09:04,920 angle, or not aiming correctly. 116 00:09:04,960 --> 00:09:09,420 OK, all that this person has to be taught is how to aim. 117 00:09:10,450 --> 00:09:19,100 This person, that is, the first shooter, can actually do a good job if he's taught how to aim correctly. 118 00:09:19,780 --> 00:09:25,420 OK, only a minor modification is needed in shooter one. 119 00:09:26,800 --> 00:09:32,620 Shooter two and three will probably require more effort, more effort. 120 00:09:32,620 --> 00:09:34,270 Only then can they become a good shooter. 121 00:09:35,000 --> 00:09:40,090 So shooter one can actually go all the way to the top, just like this shooter. 122 00:09:40,930 --> 00:09:41,330 Right. 123 00:09:42,250 --> 00:09:47,920 So what is the concept we are trying to explain here? In this case, precision versus accuracy, that 124 00:09:47,920 --> 00:09:48,860 is, again, mean 125 00:09:48,860 --> 00:09:51,490 versus standard deviation in this case. 126 00:09:52,090 --> 00:09:53,920 In this case, precision and accuracy 127 00:09:53,920 --> 00:09:57,480 are both high. In this case, precision and accuracy are both bad. 128 00:09:58,150 --> 00:10:00,340 In this case, precision is lower and 129 00:10:00,340 --> 00:10:01,540 accuracy is fairly high. 130 00:10:02,320 --> 00:10:05,250 In this case, precision is very good, but accuracy is low. 131 00:10:06,220 --> 00:10:09,820 We actually want both precision and accuracy in our dataset. 132 00:10:13,540 --> 00:10:16,540 You understand the concept of central tendency and dispersion now? 133 00:10:17,770 --> 00:10:18,250 OK.
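The numbers worked through in this session can be checked with a few lines of code. What follows is a minimal sketch, assuming Python and its standard `statistics` module, neither of which is used in the session itself. The variable names are illustrative only. The two datasets are the ones read out in the session: nine, three, one, eight, three, six for central tendency, and one to five for standard deviation.

```python
import statistics
from collections import Counter

# Dataset read out in the session for mean, median, mode, and range.
data = [9, 3, 1, 8, 3, 6]

mean = statistics.mean(data)        # sum divided by count: 30 / 6
median = statistics.median(data)    # average of the two middle values, 3 and 6
mode = statistics.mode(data)        # most frequently occurring value
data_range = max(data) - min(data)  # maximum minus minimum

# Same dataset with the six removed: five points, so one middle value.
median_without_six = statistics.median([9, 3, 1, 8, 3])

# Dataset read out in the session for standard deviation.
sd_data = [1, 2, 3, 4, 5]
average = statistics.mean(sd_data)

# Sum of squared differences from the average, then divide by count minus one.
sum_of_squares = sum((x - average) ** 2 for x in sd_data)
variance = sum_of_squares / (len(sd_data) - 1)
std_dev = variance ** 0.5

# Frequency distribution: how many times each value occurs.
frequencies = Counter(data)

print("mean:", mean)                    # 5
print("median:", median)                # 4.5
print("mode:", mode)                    # 3
print("range:", data_range)             # 8
print("median without six:", median_without_six)  # 3
print("variance:", variance)            # 2.5
print("std dev:", round(std_dev, 2))    # 1.58
print("frequencies:", sorted(frequencies.items()))
```

The manual variance calculation mirrors the steps described in the session. The library calls `statistics.variance` and `statistics.stdev` also divide by count minus one, so they should give the same figures.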