1 00:00:00,450 --> 00:00:01,600 Welcome back. 2 00:00:01,630 --> 00:00:08,340 In the last video we looked at some aggregation functions using NumPy, such as getting the sum 3 00:00:08,340 --> 00:00:10,080 of all the elements in an array. 4 00:00:10,080 --> 00:00:17,220 We saw how much faster NumPy's code is than pure Python code when working on NumPy data types. 5 00:00:17,280 --> 00:00:22,140 So always use NumPy aggregation functions on NumPy arrays. 6 00:00:22,350 --> 00:00:28,220 And then we left off with a brief introduction to standard deviation and variance. 7 00:00:28,230 --> 00:00:33,020 Now the main thing to remember about these two metrics is, in a word: 8 00:00:33,050 --> 00:00:36,450 they're just measures of the spread of data. 9 00:00:36,630 --> 00:00:38,230 But let's not talk about it. 10 00:00:38,250 --> 00:00:43,860 Let's always focus on running code and seeing practical examples, so that's what we're gonna do in this 11 00:00:43,860 --> 00:00:44,330 video. 12 00:00:44,790 --> 00:00:54,530 So: demo of std and var. std is short for standard deviation, var is short for variance. 13 00:00:54,870 --> 00:01:12,180 So we'll create high_var_array equals np.array of 1, 100, 200, 300, 4000, 5000. Wonderful. low_var_array 14 00:01:12,790 --> 00:01:21,170 equals np.array, maybe we'll make this one nice and simple: 2, 4, 6, 8, 10. 15 00:01:21,180 --> 00:01:22,590 There we go. 16 00:01:22,590 --> 00:01:27,440 Now, before we even run any code, how about we look at the numbers in these arrays. 17 00:01:27,450 --> 00:01:29,610 Imagine these are actual data. 18 00:01:30,150 --> 00:01:34,730 What would you infer from just looking at these numbers? 19 00:01:35,310 --> 00:01:44,730 So this one starts from 1, goes to 100, 200, 300, then jumps to 4000, finishes on 5000, and this one 20 00:01:44,730 --> 00:01:51,690 kind of just goes 2, 4, 6, 8, 10. So first of all you might look at the difference between the two maximums.
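The two arrays described above can be sketched as follows. This is a minimal sketch: the variable names `high_var_array` and `low_var_array` are assumed spellings of the names spoken in the video, and the range calculation (maximum minus minimum) is just one quick way to eyeball spread before computing anything else.

```python
import numpy as np

# Names assumed from the spoken transcript ("high var array", "low var array").
high_var_array = np.array([1, 100, 200, 300, 4000, 5000])
low_var_array = np.array([2, 4, 6, 8, 10])

# Range (max minus min) gives a rough first impression of spread.
high_range = high_var_array.max() - high_var_array.min()
low_range = low_var_array.max() - low_var_array.min()

print(high_range)  # 4999
print(low_range)   # 8
```

The first array covers a span of 4999 while the second covers only 8, which matches the intuition from reading the numbers.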
21 00:01:51,690 --> 00:01:52,930 That's what I kind of looked at. 22 00:01:52,920 --> 00:01:58,380 So this one goes from 1 to 5000 and this one goes from 2 to 10. 23 00:01:58,440 --> 00:02:04,230 So if we looked at this we might say that these numbers, to me, are more spread out, whereas these 24 00:02:04,230 --> 00:02:05,490 are pretty close together. 25 00:02:05,670 --> 00:02:12,000 And the maximum distance is only 8, and they have the same step in between each one. 26 00:02:12,210 --> 00:02:17,280 Whereas this kind of varies: it goes up by 100 at a time and then up by a few thousand and then up by 27 00:02:17,280 --> 00:02:18,360 a thousand to finish off. 28 00:02:18,720 --> 00:02:22,840 But let's put it into practice, let's run the code, let's see what's going on. 29 00:02:22,860 --> 00:02:26,740 So we Shift+Enter. Let's find the variance of both of them. 30 00:02:26,760 --> 00:02:35,040 np.var, high_var, then we can press Tab for autocomplete. np.var, and if we want Tab autocomplete, 31 00:02:35,460 --> 00:02:38,940 we need to actually type the start of a variable, that would be helpful. 32 00:02:39,150 --> 00:02:40,020 Beautiful. 33 00:02:40,280 --> 00:02:42,330 We'll hit Shift and Enter. 34 00:02:42,450 --> 00:02:42,990 Huh. 35 00:02:43,090 --> 00:02:47,740 Now this is an absolutely massive number here and this is not too big. 36 00:02:48,260 --> 00:02:49,520 Mm hmm. 37 00:02:49,740 --> 00:02:52,800 Well, we're not going to look at the mathematics behind these two. 38 00:02:53,130 --> 00:02:56,610 What's more important is the concept here, and the variance. 39 00:02:56,610 --> 00:03:02,550 If we go up here, remember: variance equals a measure of the average degree to which each number is different 40 00:03:02,550 --> 00:03:06,780 to the mean. Higher variance equals wider range of numbers. 41 00:03:06,780 --> 00:03:10,380 Yep, lower variance equals lower range of numbers. 42 00:03:10,380 --> 00:03:14,760 And that makes sense with these two arrays. This array here:
43 00:03:14,760 --> 00:03:21,090 high_var_array has very high variance, whereas low_var_array, because the numbers are close together, has 44 00:03:21,090 --> 00:03:22,440 a lower variance. 45 00:03:22,440 --> 00:03:23,260 So let's have a look. 46 00:03:23,530 --> 00:03:26,090 Let's do standard deviation, np.std. 47 00:03:26,370 --> 00:03:33,960 We want high_var_array and we also want low_var_array. 48 00:03:34,080 --> 00:03:37,520 Standard deviation. Before we do this, 49 00:03:37,710 --> 00:03:41,030 what do you think the standard deviation is here? 50 00:03:41,040 --> 00:03:44,540 Is it going to come out similar to what this looks like? 51 00:03:44,540 --> 00:03:46,940 Well, let's have a look. 52 00:03:46,950 --> 00:03:47,620 There we go. 53 00:03:48,000 --> 00:03:48,900 And that makes sense. 54 00:03:48,900 --> 00:03:56,100 The standard deviation is higher for the high variance array than for the low variance array. And what's 55 00:03:56,100 --> 00:04:01,410 the standard deviation? A measure of how spread out a group of numbers is from the mean. 56 00:04:01,560 --> 00:04:01,970 Okay. 57 00:04:01,980 --> 00:04:05,060 They've both got mean in their definition, so let's have a look. 58 00:04:05,190 --> 00:04:21,290 Let's find the mean: np.mean of high_var_array, and we want np.mean of low_var_array. The name we typed is not defined. 59 00:04:21,290 --> 00:04:23,430 This is why we should use Tab autocomplete. 60 00:04:23,480 --> 00:04:31,100 So what this is saying is the standard deviation, this number here, is the average distance a number is 61 00:04:31,220 --> 00:04:32,240 away from the mean. 62 00:04:33,010 --> 00:04:36,230 So if the mean is about 1,600, 63 00:04:36,350 --> 00:04:45,510 that means on average any number in this array, in high_var_array, is about 2,072 64 00:04:45,560 --> 00:04:47,040 away from the mean. 65 00:04:47,240 --> 00:04:53,550 The same goes for low_var_array: any number in low_var_array is on average about 2.8
66 00:04:53,670 --> 00:04:57,620 away from the mean. So that means that although 10, 67 00:04:57,740 --> 00:05:04,310 this number, is 4 away from 6, because the mean is 6, 8 is closer to 6. 68 00:05:04,400 --> 00:05:05,820 So that's what brings the average down. 69 00:05:06,500 --> 00:05:09,120 All right, now let's check out this spread. 70 00:05:09,120 --> 00:05:14,420 Remember, that's the main crux of standard deviation and variance: it's just the spread of numbers, how they 71 00:05:14,420 --> 00:05:15,790 appear. 72 00:05:15,830 --> 00:05:19,970 Let's plot them. We can do that by importing matplotlib. 73 00:05:19,970 --> 00:05:24,050 We're gonna have a look at matplotlib in a future section, so don't worry too much if you're not sure 74 00:05:24,050 --> 00:05:25,340 what's going on here. 75 00:05:25,340 --> 00:05:30,440 import matplotlib.pyplot as plt is what we want. 76 00:05:30,470 --> 00:05:33,100 Yeah, plt.hist. 77 00:05:33,200 --> 00:05:34,730 So this is a histogram. 78 00:05:34,950 --> 00:05:35,670 high_var, 79 00:05:35,720 --> 00:05:37,610 we just want that Tab autocomplete. 80 00:05:37,610 --> 00:05:40,480 If in doubt, Tab autocomplete, see what comes up. 81 00:05:40,730 --> 00:05:44,910 Shift+Enter. "No module named..." 82 00:05:45,550 --> 00:05:47,160 We got that wrong: matplotlib. 83 00:05:47,310 --> 00:05:47,830 There we go. 84 00:05:48,600 --> 00:05:50,370 Okay, so there we go. 85 00:05:50,430 --> 00:05:52,230 The high_var_array. 86 00:05:52,230 --> 00:06:00,530 Now let's do the same for low_var_array: plt.hist of low_var_array, and we want 87 00:06:00,530 --> 00:06:03,680 plt.show(). Wonderful. 88 00:06:03,700 --> 00:06:09,560 So now we can see visually, if you couldn't see it from these numbers in the number line here. 89 00:06:09,610 --> 00:06:15,980 Now visually we can kind of see the spread of the numbers of high_var_array is a lot bigger, so there's 90 00:06:15,980 --> 00:06:17,800 a lot more whitespace here.
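The plotting steps described above might look roughly like this. It is a sketch under a few assumptions: the array names are the assumed spellings from the transcript, and `plt.figure()` is called before each histogram so the two plots do not land on the same axes when this runs as a plain script rather than in separate notebook cells, which the video does not do.

```python
import numpy as np
import matplotlib.pyplot as plt

# Names assumed from the spoken transcript.
high_var_array = np.array([1, 100, 200, 300, 4000, 5000])
low_var_array = np.array([2, 4, 6, 8, 10])

# Histogram of the widely spread array (default of 10 bins).
plt.figure()
high_counts, high_edges, _ = plt.hist(high_var_array)
plt.title("high_var_array")

# Histogram of the tightly grouped array, on its own figure.
plt.figure()
low_counts, low_edges, _ = plt.hist(low_var_array)
plt.title("low_var_array")

plt.show()
```

In the first plot, four of the six values fall into the lowest bin and the other two sit far to the right, with empty bins in between. In the second plot the values are spread evenly across the axis.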
91 00:06:17,800 --> 00:06:24,880 A lot of the numbers appear towards zero and only two of the samples appear up here, whereas in our low_var_array 92 00:06:24,880 --> 00:06:30,730 the spread is kind of similar across the whole thing because there's gaps of 2 between each data 93 00:06:30,730 --> 00:06:35,420 point and the maximum distance here is only 8. 94 00:06:35,470 --> 00:06:40,300 All right, so that's standard deviation and variance at a conceptual level. 95 00:06:40,300 --> 00:06:46,900 Now remember, if you do want the mathematics behind this, how these two values here, the standard deviation 96 00:06:46,930 --> 00:06:48,520 and the variance, are calculated, 97 00:06:48,600 --> 00:06:54,160 I'll leave some extra resources, but just remember the main takeaway from this is that standard deviation 98 00:06:54,280 --> 00:06:57,770 and variance are measures of the spread of data. 99 00:06:58,880 --> 00:07:04,010 Now with that being said, we've finished up some of the most common aggregation functions you'll see in 100 00:07:04,010 --> 00:07:04,850 NumPy. 101 00:07:04,910 --> 00:07:10,040 Let's get into the next section where we can check out some more ways of manipulating arrays.
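For anyone curious about the calculation itself, here is a small sketch that recomputes the numbers from the video by hand and checks them against `np.var` and `np.std`. The helper `variance_by_hand` is a hypothetical name introduced only for this illustration, the array names are assumed spellings from the transcript, and the sketch relies on NumPy's default `ddof=0` (population variance).

```python
import numpy as np

# Names assumed from the spoken transcript.
high_var_array = np.array([1, 100, 200, 300, 4000, 5000])
low_var_array = np.array([2, 4, 6, 8, 10])


def variance_by_hand(values):
    """Hypothetical helper: mean of squared differences from the mean."""
    mean = values.sum() / values.size
    return ((values - mean) ** 2).sum() / values.size


for name, values in [("high_var_array", high_var_array),
                     ("low_var_array", low_var_array)]:
    var = variance_by_hand(values)
    std = var ** 0.5  # standard deviation is the square root of variance

    # np.var and np.std use the same population formula by default (ddof=0).
    assert np.isclose(var, np.var(values))
    assert np.isclose(std, np.std(values))

    print(name, np.mean(values), var, std)

# high_var_array: mean about 1600.17, variance about 4296133.47, std about 2072.71
# low_var_array:  mean 6.0, variance 8.0, std about 2.83
```

Because the standard deviation is the square root of the average squared distance from the mean, it is a typical distance rather than the plain average of the distances. For `low_var_array` the plain average of the absolute distances is 2.4, slightly below the standard deviation of about 2.83.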