1 00:00:00,390 --> 00:00:01,450 Welcome back. 2 00:00:01,500 --> 00:00:07,260 In the last video we learned a few little arithmetic tricks, and we learned that arithmetic is a fancy word 3 00:00:07,260 --> 00:00:12,350 for mathematical operations that we can do on NumPy arrays. 4 00:00:12,450 --> 00:00:18,990 And so now we're going to learn a bit more about manipulating NumPy arrays through aggregation. 5 00:00:19,410 --> 00:00:22,520 Aggregation is another one of those fancy words. 6 00:00:22,770 --> 00:00:32,470 Let's Google it and define aggregation: the formation of a number of things into a cluster. 7 00:00:32,470 --> 00:00:34,560 Yeah, that's kind of what we're gonna do. 8 00:00:34,580 --> 00:00:41,770 So aggregation: performing the same operation on a number of things. 9 00:00:41,950 --> 00:00:44,830 And in our case the number of things is our NumPy 10 00:00:44,830 --> 00:00:45,940 array. 11 00:00:45,970 --> 00:00:47,630 So let's have a look. 12 00:00:47,800 --> 00:00:51,780 You might be familiar with the sum method. 13 00:00:51,880 --> 00:00:54,970 So dot sum, or Python's sum. 14 00:00:55,000 --> 00:00:56,730 So let's make a list, listy_list. 15 00:00:57,010 --> 00:01:01,550 listy_list equals one, two, three. 16 00:01:01,600 --> 00:01:02,440 Nice and simple. 17 00:01:02,520 --> 00:01:10,670 And we'll go type(listy_list). listy_list is not defined; of course, we're making typos. 18 00:01:10,670 --> 00:01:11,480 There we go. 19 00:01:11,480 --> 00:01:12,440 Type: Python 20 00:01:12,440 --> 00:01:13,210 list. 21 00:01:13,200 --> 00:01:20,790 So if we wanted to get the total of listy_list, we could type in this: sum(listy_list). 22 00:01:20,820 --> 00:01:22,540 Now that's a Python function right there, 23 00:01:22,550 --> 00:01:28,030 sum, but NumPy has its own version of these. 24 00:01:28,120 --> 00:01:28,990 So let's see. 25 00:01:28,990 --> 00:01:33,580 Let's go back to our a1 array, which is a lot more trusty.
26 00:01:33,670 --> 00:01:36,430 Well, not a lot more, but it's just a different type 27 00:01:36,430 --> 00:01:45,000 to listy_list. Even though it contains the same information, it's of type NumPy ndarray. 28 00:01:45,540 --> 00:01:46,260 So let's have a look. 29 00:01:46,260 --> 00:01:51,330 What can we do with a1? We can call sum on it and we'll get six. 30 00:01:51,330 --> 00:01:57,690 We can also call np.sum on a1; we get six as well. 31 00:01:57,720 --> 00:01:58,810 Okay. 32 00:01:58,860 --> 00:02:01,710 Now why would there be different functions here? 33 00:02:01,780 --> 00:02:05,600 And what is the main difference between these two? 34 00:02:05,720 --> 00:02:13,740 It can get confusing having two different ways to perform the same aggregation on a set of data, and 35 00:02:13,830 --> 00:02:17,960 remember, aggregation is just performing the same operation on a number of things. 36 00:02:18,000 --> 00:02:24,270 Sum is just an aggregation to add up all the elements of a certain array or list. 37 00:02:24,300 --> 00:02:31,800 Now the little tidbit you can use here for which sum you should use, whether it be Python's sum or 38 00:02:31,890 --> 00:02:51,870 np.sum, is: use Python's methods on Python data types and use NumPy's methods on NumPy arrays. 39 00:02:53,190 --> 00:03:00,060 So this is sum. Actually, let's write this one in Markdown so that's a bit easier to read. We don't want 40 00:03:00,060 --> 00:03:01,580 heading one. 41 00:03:02,010 --> 00:03:05,750 We'll put this in code, so this is our tidbit. 42 00:03:06,220 --> 00:03:09,360 np.sum. 43 00:03:09,390 --> 00:03:10,730 There we go. 44 00:03:10,740 --> 00:03:12,570 Now let's see this in action. 45 00:03:12,570 --> 00:03:22,210 So we're gonna create a massive NumPy array, massive_array. 46 00:03:22,420 --> 00:03:27,060 This will really demonstrate the power of how fast NumPy can be. 47 00:03:27,070 --> 00:03:29,100 Remember how we talked about that right at the start?
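The tidbit above (Python's sum on Python data types, np.sum on NumPy arrays) can be sketched roughly as follows. The name `listy_list` is a best guess at the variable spoken in the video, and `a1` is assumed to hold the same three values, since both totals come out as six.

```python
import numpy as np

# A plain Python list and a NumPy array holding the same values.
# (Names and values are assumptions based on what is said in the video.)
listy_list = [1, 2, 3]
a1 = np.array([1, 2, 3])

# Python's built-in sum() for the Python list.
python_total = sum(listy_list)

# NumPy's np.sum() for the NumPy array.
numpy_total = np.sum(a1)

print(type(listy_list), python_total)
print(type(a1), numpy_total)
```

Both calls add up every element; the difference is which data type each one is designed for.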
48 00:03:29,140 --> 00:03:32,750 Create a massive array with 100 thousand different elements. 49 00:03:32,770 --> 00:03:37,210 This won't be too uncommon for machine learning problems. 50 00:03:37,210 --> 00:03:38,220 So there we go. 51 00:03:38,290 --> 00:03:40,180 Size one hundred thousand. 52 00:03:40,180 --> 00:03:43,890 Let's just view it so we can see what it looks like. 53 00:03:44,080 --> 00:03:45,330 massive_array. 54 00:03:45,670 --> 00:03:47,740 Let's view the first 100 elements. 55 00:03:47,770 --> 00:03:48,580 There we go. 56 00:03:49,710 --> 00:03:51,540 All random numbers, right? 57 00:03:51,550 --> 00:03:52,770 We might reduce that to 10 58 00:03:52,770 --> 00:03:54,350 so we have a bit more space. 59 00:03:54,360 --> 00:03:55,800 Wonderful. 60 00:03:55,800 --> 00:04:02,100 Now we're going to use one of Jupyter Notebook's, or IPython's, magic functions. Remember, a magic function 61 00:04:02,550 --> 00:04:04,890 has a little percentage sign at the start of it. 62 00:04:05,220 --> 00:04:12,540 We've got %timeit, and this is going to time how long a particular line of code takes to run. 63 00:04:12,630 --> 00:04:13,820 We'll try Python's sum. 64 00:04:14,160 --> 00:04:17,970 So sum across massive_array. 65 00:04:17,970 --> 00:04:20,250 We'll just put a little comment here. 66 00:04:20,370 --> 00:04:21,480 So Python's sum. 67 00:04:21,990 --> 00:04:23,160 And then %timeit. 68 00:04:23,220 --> 00:04:32,960 We want NumPy's sum of massive_array, and we'll put a little comment here: NumPy's sum. 69 00:04:33,040 --> 00:04:34,660 So let's run this and see what happens. 70 00:04:36,140 --> 00:04:40,820 It might take a little while because it's adding up 100000 different numbers. 71 00:04:40,820 --> 00:04:42,350 So what have we got here? 72 00:04:42,350 --> 00:04:46,110 Well, the first line is going to be dedicated towards this line here. 73 00:04:46,280 --> 00:04:49,930 And the second line is gonna be dedicated towards NumPy's sum.
74 00:04:49,990 --> 00:04:57,110 So the first line, summing using Python's sum across 100000 different numbers, took seventeen point 75 00:04:57,110 --> 00:05:04,940 nine milliseconds per loop, plus or minus two point nine four milliseconds, ten loops. 76 00:05:05,270 --> 00:05:13,820 But NumPy's sum, np.sum, took 34 micro... 77 00:05:13,850 --> 00:05:17,620 This little symbol here is mu, for microseconds. 78 00:05:18,320 --> 00:05:23,000 So let's convert that. We want microseconds to milliseconds. 79 00:05:26,090 --> 00:05:31,180 One microsecond equals zero point zero zero one milliseconds. 80 00:05:31,190 --> 00:05:38,540 So let's convert. This is milliseconds; we want to convert that to microseconds. Seventeen point nine 81 00:05:38,660 --> 00:05:43,650 milliseconds to microseconds. 82 00:05:43,700 --> 00:05:45,090 There we go. 83 00:05:45,110 --> 00:05:52,330 So seventeen point nine milliseconds equals seventeen thousand nine hundred microseconds. 84 00:05:52,490 --> 00:05:53,740 My goodness. 85 00:05:53,740 --> 00:06:03,250 So seventeen thousand nine hundred divided by thirty four equals about five hundred and twenty six. 86 00:06:03,290 --> 00:06:12,880 So that means NumPy's np.sum is about five hundred and twenty six times faster than Python's sum. So, 87 00:06:12,890 --> 00:06:14,080 going forward, 88 00:06:14,090 --> 00:06:15,710 remember that tidbit here. 89 00:06:15,710 --> 00:06:23,630 If you're working with NumPy data, use NumPy methods, but if you're working with Python data types, use 90 00:06:23,630 --> 00:06:31,760 Python methods, because, going back to the start, NumPy has been optimized to perform numerical calculations. 91 00:06:31,760 --> 00:06:34,850 So whenever you want to perform numerical calculations, 92 00:06:34,850 --> 00:06:40,190 always remember to use the NumPy version of an aggregation function.
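The timing comparison above can be approximated outside a notebook with Python's standard `timeit` module instead of the `%timeit` magic. This is only a sketch: building the array with `np.random.random(100_000)` is an assumption (the video only says it holds 100000 random numbers), and the measured times will differ from machine to machine.

```python
import timeit

import numpy as np

# Assumed construction: 100,000 random floats, as described in the video.
massive_array = np.random.random(100_000)

# Average seconds per call over 10 runs for each approach.
runs = 10
python_seconds = timeit.timeit(lambda: sum(massive_array), number=runs) / runs
numpy_seconds = timeit.timeit(lambda: np.sum(massive_array), number=runs) / runs

print(f"Python sum(): {python_seconds * 1e3:.3f} ms per call")
print(f"np.sum():     {numpy_seconds * 1e6:.3f} microseconds per call")
print(f"Speed-up on this machine: about {python_seconds / numpy_seconds:.0f}x")

# The arithmetic from the video: 17.9 ms = 17,900 microseconds,
# and 17,900 / 34 is roughly 526.
video_python_us = 17.9 * 1000
video_numpy_us = 34
video_ratio = video_python_us / video_numpy_us
print(f"Ratio quoted in the video: about {video_ratio:.0f}x")
```

The exact speed-up depends on hardware and NumPy version, so the 526 figure is best read as an order of magnitude rather than a fixed number.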
93 00:06:40,420 --> 00:06:46,450 And speaking of aggregation functions, let's have a look at a few more. We'll go a2; we'll have a 94 00:06:46,450 --> 00:06:51,480 look at our second array that we created, our trusty two-dimensional array from before. 95 00:06:51,610 --> 00:06:53,060 Let's find the mean of this. 96 00:06:53,080 --> 00:06:59,110 So np.mean(a2): three point six three three. Beautiful. 97 00:06:59,110 --> 00:07:04,360 Now the mean would be if we added these up, one plus two plus three point three plus four plus five plus 98 00:07:04,360 --> 00:07:07,980 six point five, then divided by the total number of items, 99 00:07:07,990 --> 00:07:14,120 so six. We won't do that, but that's how you find the mean. If we wanted the maximum you can use 100 00:07:14,230 --> 00:07:24,170 np.max. If we wanted the minimum you can do np.min. If we wanted the standard deviation we can do np.std 101 00:07:24,250 --> 00:07:30,620 for standard deviation. Remember we can always press Shift+Tab: "Compute the standard deviation along the 102 00:07:30,620 --> 00:07:37,300 specified axis." If we pass no axis it's gonna do it over the entire array, which is what we want. 103 00:07:37,580 --> 00:07:47,420 Wonderful. And if we want the variance we can do np.var(a2). Wonderful. Shift+Tab: "Compute the variance 104 00:07:47,570 --> 00:07:53,930 along the specified axis." And now you might be saying, Daniel, what the hell is a standard deviation and 105 00:07:53,930 --> 00:07:55,870 what the hell is the variance? 106 00:07:56,030 --> 00:07:58,710 Well, variance, let's put a little comment here.
107 00:07:58,910 --> 00:08:07,400 What you need to remember is the variance equals the measure of the average degree to which each number 108 00:08:07,520 --> 00:08:23,450 is different to the mean. So higher variance equals a wider range of numbers; lower variance equals 109 00:08:23,570 --> 00:08:33,440 a lower range of numbers. Beautiful. And now the standard deviation. The formal definition is: standard deviation 110 00:08:34,490 --> 00:08:46,310 is a measure of how spread out a group of numbers is from the mean. The standard 111 00:08:46,310 --> 00:08:52,270 deviation is actually just the square root of the variance, so let's try this out. 112 00:08:52,280 --> 00:09:00,710 So we've got np.sqrt, there's another NumPy function, square root, of np.var(a2), which should be equal 113 00:09:00,710 --> 00:09:10,760 to the standard deviation. So let's write this down: standard deviation equals square root of variance. 114 00:09:11,260 --> 00:09:15,200 Shift and Enter. There we go, these two numbers are the same. 115 00:09:15,200 --> 00:09:21,110 Now if you'd like to look into a bit more of the math behind the standard deviation and the variance, I'll 116 00:09:21,110 --> 00:09:25,970 leave some resources in the resources section for you. Otherwise, in the next video we'll see a bit 117 00:09:25,970 --> 00:09:31,220 more of an example of these two in practice. They're an important concept to know going forward in any 118 00:09:31,220 --> 00:09:36,540 type of data science or machine learning or statistical work. So let's take a quick break. 119 00:09:36,560 --> 00:09:41,420 Play around with some aggregation functions that you've seen on some of the arrays we've made, or create 120 00:09:41,420 --> 00:09:43,570 your own, and I'll see you in a second.
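As a recap of the aggregation functions covered in this video, here is a minimal sketch. The values of `a2` are taken from the mean calculation spoken aloud (1, 2, 3.3, 4, 5, 6.5); arranging them as two rows of three is an assumption, since the video only calls it a two-dimensional array.

```python
import numpy as np

# Assumed layout: the six values mentioned in the video as a 2x3 array.
a2 = np.array([[1, 2, 3.3],
               [4, 5, 6.5]])

mean_value = np.mean(a2)   # sum of all elements divided by their count
max_value = np.max(a2)     # largest element
min_value = np.min(a2)     # smallest element
std_value = np.std(a2)     # standard deviation over the whole array
var_value = np.var(a2)     # variance over the whole array

# Variance by hand: the average squared difference from the mean.
manual_var = np.mean((a2 - mean_value) ** 2)

# Standard deviation is the square root of the variance.
std_from_var = np.sqrt(var_value)

print("mean:", mean_value)
print("max:", max_value, "min:", min_value)
print("std:", std_value, "sqrt(var):", std_from_var)
print("var:", var_value, "manual var:", manual_var)
```

With no `axis` argument, each function works over every element of the array, which matches what the video does.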