1 00:00:01,140 --> 00:00:02,370 Measures of dispersion 2 00:00:02,790 --> 00:00:04,660 tell us about the spread of the data. 3 00:00:05,520 --> 00:00:07,650 We'll be discussing three measures of dispersion. 4 00:00:08,250 --> 00:00:13,740 First is range, second is standard deviation, and third is variance. Variance and standard deviation 5 00:00:13,740 --> 00:00:18,780 are related by a simple formula: variance is the square of standard deviation. 6 00:00:20,490 --> 00:00:22,050 Let's look at range first. 7 00:00:24,130 --> 00:00:27,580 Range is simply the largest value minus the smallest value. 8 00:00:29,310 --> 00:00:35,260 So in this example, the largest value is 34 and the smallest value is 7. 9 00:00:36,000 --> 00:00:37,490 So the range is 27. 10 00:00:38,700 --> 00:00:43,380 But as we discussed for mean, range is influenced by outliers. 11 00:00:43,950 --> 00:00:51,710 So if your data has outliers, range should be avoided. For example, in this data, if you had 12 00:00:51,780 --> 00:00:59,820 one value which was exceptionally high, say its value was 150, then your range would be 13 00:00:59,820 --> 00:01:02,460 150 minus 7, which is 143. 14 00:01:02,910 --> 00:01:09,930 But this range will not be a correct representation of the dispersion in this data, since only one 15 00:01:09,930 --> 00:01:11,230 value is exceptionally high. 16 00:01:11,470 --> 00:01:16,920 Other values are more or less confined to a small range, which is 7 to 34. 17 00:01:18,630 --> 00:01:24,480 So if your data has outliers, try to remove the outliers before finding the range, or else 18 00:01:25,470 --> 00:01:27,720 don't use range as a measure of dispersion. 19 00:01:28,440 --> 00:01:33,260 We will discuss this further when we are preparing our data for analysis. 20 00:01:33,690 --> 00:01:38,450 There will be a separate section on data preprocessing where we will discuss it in further detail.
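The effect of a single outlier on the range can be sketched in a few lines of Python. The list below is hypothetical: only the smallest value (7), the largest value (34), and the outlier (150) come from the lecture, and the helper name `value_range` is made up for illustration.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical observations; only the min (7) and max (34) match the lecture.
data = [7, 10, 15, 22, 28, 34]

# The same data with one exceptionally high value added.
with_outlier = data + [150]

print(value_range(data))          # 27
print(value_range(with_outlier))  # 143
```

One extra value moves the range from 27 to 143 even though every other observation still lies between 7 and 34.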
21 00:01:42,820 --> 00:01:49,870 Next is standard deviation and variance. Variance is the average of squared differences from the mean. 22 00:01:51,550 --> 00:01:54,030 So X minus mu is the difference from the mean, 23 00:01:54,470 --> 00:01:56,440 where mu is the population mean. 24 00:01:56,590 --> 00:02:00,790 As we defined earlier, X is the value of each observation. 25 00:02:01,630 --> 00:02:10,410 When we do X minus mu, it is the difference of that particular observation from the mean. We square 26 00:02:10,540 --> 00:02:14,290 all such differences and then we find the average of them. 27 00:02:18,390 --> 00:02:24,510 When we do it for the population and take the square root, we get the population standard deviation. For population standard 28 00:02:24,510 --> 00:02:27,500 deviation, the formula is as per the definition only. 29 00:02:28,740 --> 00:02:35,970 But if you are estimating the standard deviation of a population from a sample, the formula has n minus 30 00:02:35,970 --> 00:02:36,480 one below. 31 00:02:37,580 --> 00:02:41,460 We will not discuss the proof of why this becomes n minus one. 32 00:02:42,030 --> 00:02:48,690 But in short, this is because when we draw all the possible samples, the variance of all 33 00:02:48,690 --> 00:02:51,600 these samples should, on average, come out to the population variance. 34 00:02:52,870 --> 00:02:57,060 That will happen only when there is n minus one here instead of n. 35 00:02:57,690 --> 00:03:07,710 Just remember that a larger sigma value means the data is more widely 36 00:03:07,710 --> 00:03:10,560 distributed. 37 00:03:10,560 --> 00:03:12,940 So let's calculate the standard deviation for this data. 38 00:03:14,070 --> 00:03:21,780 First, we have to calculate the mean. The mean for this data comes out to 24.8, and then 39 00:03:21,780 --> 00:03:26,890 we find the variance, which is defined as the average of squared distances from the mean.
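For reference, the definitions described in words above can be written out as follows. The symbols N (population size), n (sample size), x-bar (sample mean), and s (sample standard deviation) are notation assumed here; the lecture itself only names X, mu, sigma, and n.

```latex
% Population mean, variance, and standard deviation (divide by N)
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}

% Estimate from a sample of size n (divide by n - 1)
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
```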
40 00:03:27,540 --> 00:03:29,790 So we take the first value, which is 10. 41 00:03:30,330 --> 00:03:33,090 We find the difference from the mean and we square it. 42 00:03:33,840 --> 00:03:39,850 Then we take the second value, find the difference from the mean, and square it. We add all these values, 43 00:03:39,850 --> 00:03:43,300 then we divide by the total number of observations, which is 24. 44 00:03:44,280 --> 00:03:50,040 So we get 1624.65 divided by 24, which comes out to 45 00:03:50,040 --> 00:03:50,710 67.69. 46 00:03:51,930 --> 00:03:52,920 This is the variance. 47 00:03:53,790 --> 00:04:00,460 We take the square root of this value to get the standard deviation. The square root of 67.69 48 00:04:00,480 --> 00:04:01,550 is 8.23. 49 00:04:01,770 --> 00:04:08,410 So the standard deviation for this data is 8.23. The larger this value, the larger is 50 00:04:08,430 --> 00:04:09,780 the dispersion from the center. 51 00:04:11,610 --> 00:04:20,070 So if you get another set of 24 observations which has a similar mean but a lower standard 52 00:04:20,070 --> 00:04:27,420 deviation, that means that data has less dispersion in it and the values are closer to the mean. 53 00:04:28,890 --> 00:04:36,450 And if you have data which has a higher standard deviation, the values are farther away from the mean. So that's 54 00:04:36,450 --> 00:04:37,530 all in this video. Thank you.
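The same steps (mean, squared differences, average, square root) can be sketched in Python. The eight observations below are hypothetical and are not the 24 values used in the lecture, so the results differ from 67.69 and 8.23. The sample versions that divide by n minus one are included for comparison, and the standard library's `statistics` module is used only as a cross-check.

```python
import math
import statistics

# Hypothetical observations (NOT the lecture's data set).
data = [10, 12, 23, 23, 16, 23, 21, 16]

n = len(data)
mean = sum(data) / n                              # step 1: the mean

# Step 2: squared difference of each observation from the mean.
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: average the squared differences (population variance, divide by n).
population_variance = sum(squared_diffs) / n

# Step 4: the square root of the variance is the standard deviation.
population_std = math.sqrt(population_variance)

# Estimating from a sample: divide by n - 1 instead of n.
sample_variance = sum(squared_diffs) / (n - 1)
sample_std = math.sqrt(sample_variance)

print(mean, population_variance)                  # 18.0 24.0
print(population_std, statistics.pstdev(data))    # these two should agree
print(sample_std, statistics.stdev(data))         # these two should agree
```

Dividing by n minus one always gives a slightly larger value than dividing by n, and the gap shrinks as the number of observations grows.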