1 00:00:01,200 --> 00:00:06,320 And welcome back to another class of our course about data science, a complete introduction. 2 00:00:07,080 --> 00:00:13,500 So we are going to talk about the basics of statistics and we are going to talk 3 00:00:13,500 --> 00:00:15,630 about some basic formulas. 4 00:00:15,660 --> 00:00:17,430 So really basic concepts. 5 00:00:18,000 --> 00:00:20,520 You'll see it's really, really easy to follow. 6 00:00:20,880 --> 00:00:25,500 So basically, the three formulas that are going to be covered today would be the median, the mode 7 00:00:25,500 --> 00:00:28,440 and finally the average or the mean. 8 00:00:28,920 --> 00:00:35,100 And as you can see, those are pretty simple, and yet really important in statistics. 9 00:00:35,580 --> 00:00:40,860 So basically, the first thing that we are going to talk about today would be the median, and basically 10 00:00:40,860 --> 00:00:44,760 the median is simply the middle number inside of a dataset. 11 00:00:45,510 --> 00:00:47,330 So let's take an example right here. 12 00:00:47,340 --> 00:00:50,550 We have one, two, three, four, eight, nine, twenty two. 13 00:00:50,970 --> 00:00:57,750 In this case, the median will be four, even if it's not the central number. In other words, the central 14 00:00:57,750 --> 00:01:03,460 number means the number halfway between one and twenty two, which would be about 11 in this dataset. 15 00:01:03,750 --> 00:01:05,250 Four is the middle number. 16 00:01:05,250 --> 00:01:11,910 So this is the number that is in the middle. Well, what is that median here? 17 00:01:12,540 --> 00:01:14,220 As you can see, it's the exact same thing. 18 00:01:14,240 --> 00:01:15,730 It's a perfect representation of it. 19 00:01:16,110 --> 00:01:24,960 So right here we have a dataset and the median will be right there because this is the number that is 20 00:01:24,960 --> 00:01:25,970 in the center.
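As a rough illustration of the median example above, here is a minimal Python sketch. The variable names are made up for this note and are not from the class. It assumes the values are sorted first and that the dataset has an odd number of entries, as in the example. It also prints the halfway point between the smallest and largest value (11.5 here, close to the 11 quoted in the class) to show that it differs from the median.

```python
from statistics import median

# Example dataset from the class; sorting makes "the middle position" meaningful.
values = sorted([1, 2, 3, 4, 8, 9, 22])

# With an odd number of entries, the median is the value at the middle index.
middle_index = len(values) // 2
manual_median = values[middle_index]

# Halfway point between the smallest and the largest value, for contrast.
halfway_point = (values[0] + values[-1]) / 2

print(manual_median)   # 4
print(median(values))  # 4
print(halfway_point)   # 11.5
```

The standard library's `statistics.median` gives the same result and also handles datasets with an even number of entries, by averaging the two middle values.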
21 00:01:26,490 --> 00:01:31,680 So basically, each time that we have a number that is in the center, no matter how big the dataset 22 00:01:31,680 --> 00:01:34,280 is, this would be the median. 23 00:01:34,290 --> 00:01:38,400 So let's say, for example, we have one, two, three, four and one thousand. 24 00:01:38,760 --> 00:01:45,780 Well, in this small dataset, the median number would be three, even if the halfway point between one and 25 00:01:45,780 --> 00:01:48,650 one thousand would be about five hundred. 26 00:01:49,710 --> 00:01:54,690 But in this case, the median would be three, because this is the number that 27 00:01:54,690 --> 00:01:56,220 is in the center. 28 00:01:57,060 --> 00:01:58,110 So this is for the median. 29 00:01:58,440 --> 00:02:01,830 The second thing that we are going to talk about today would be the mode. 30 00:02:02,340 --> 00:02:05,690 The mode basically is a bit different from the median. 31 00:02:05,700 --> 00:02:07,320 Very simple to understand as well. 32 00:02:07,800 --> 00:02:12,430 Basically, the mode is the number that appears the most inside of a dataset. 33 00:02:12,720 --> 00:02:18,000 It's possible that we have datasets with more than one mode and also it's possible that we have 34 00:02:18,000 --> 00:02:20,040 datasets with no mode at all. 35 00:02:21,030 --> 00:02:26,700 So let's say, for example, the dataset right here, which is one, two, three, three, three, four, eight, 36 00:02:26,700 --> 00:02:32,850 nine, twenty two, twenty two. In this dataset, the mode will be three because three appears three 37 00:02:32,850 --> 00:02:33,770 times in this case. 38 00:02:34,770 --> 00:02:41,280 And since it's the number that appears the most, this would be our mode. Same thing for twenty two in 39 00:02:41,280 --> 00:02:41,610 this case.
40 00:02:41,610 --> 00:02:46,680 Twenty two is not the mode of this dataset because it appears only two 41 00:02:46,680 --> 00:02:49,340 times and the number three appears three times. 42 00:02:49,830 --> 00:02:54,960 And if we didn't have that number three in this case, the mode would be twenty two. 43 00:02:55,260 --> 00:03:01,490 If, for example, in this dataset we have five numbers and all the numbers appear only one time, 44 00:03:01,830 --> 00:03:03,870 there is no mode in this case. 45 00:03:03,880 --> 00:03:06,690 So we have in this dataset no mode. 46 00:03:07,110 --> 00:03:12,360 And if, for example, we have here three, three, three and we had twenty two, twenty two and twenty 47 00:03:12,360 --> 00:03:14,310 two, so twenty two appears three times, 48 00:03:14,640 --> 00:03:20,130 in that case the modes would have been three and twenty two because they appear the same number of times. 49 00:03:21,040 --> 00:03:24,850 So as you can see in this picture, this represents it really well. 50 00:03:25,200 --> 00:03:27,770 So basically we have the number 10 right here. 51 00:03:27,780 --> 00:03:28,990 Thirty here. 52 00:03:29,010 --> 00:03:29,970 Fifty. 53 00:03:30,010 --> 00:03:31,330 Seventy, ninety. 54 00:03:31,710 --> 00:03:34,500 So basically the number seventy appears forty times. 55 00:03:34,680 --> 00:03:39,090 And this is why this would be the mode of this dataset right here. 56 00:03:39,360 --> 00:03:41,640 All the other numbers don't appear as much. 57 00:03:41,640 --> 00:03:47,760 So if we had, for example, 90 that appears 40 times as well, then it would have been a mode as well. 58 00:03:47,790 --> 00:03:51,340 So we would have had two modes inside of this dataset. 59 00:03:53,220 --> 00:03:57,560 Finally, the last thing that we are going to talk about today would be the mean. 60 00:03:58,350 --> 00:04:01,020 And as I said, this is pretty simple to understand.
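Before moving on to the mean, the mode cases described so far (one mode, two modes, no mode) can be sketched in Python. This is only an illustration: the helper name `modes` is hypothetical, and it follows the convention used in this class that a dataset where every number appears only once has no mode. The five-number dataset in the last call is an assumption, reused from the earlier median example.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, or [] when nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Class convention: if every number appears only once, there is no mode.
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([1, 2, 3, 3, 3, 4, 8, 9, 22, 22]))      # [3]
print(modes([1, 2, 3, 3, 3, 4, 8, 9, 22, 22, 22]))  # [3, 22]
print(modes([1, 2, 3, 4, 1000]))                    # []
```

Note that Python's built-in `statistics.multimode` uses a different convention for the last case: when no value repeats, it returns all the values instead of an empty list.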
61 00:04:01,020 --> 00:04:06,390 Basically, the mean would be the average of a set of numbers that we have in this case. 62 00:04:07,470 --> 00:04:08,870 The example is directly here. 63 00:04:08,880 --> 00:04:10,800 So let's say we have five numbers. 64 00:04:11,070 --> 00:04:17,050 So the height of some people. Basically the mode in this dataset would be, well, there is no mode, 65 00:04:17,070 --> 00:04:23,040 since everything appears the same number of times. The median would be one meter seventy centimeters. 66 00:04:23,050 --> 00:04:25,400 So basically in this case, one hundred seventy centimeters. 67 00:04:25,680 --> 00:04:30,030 And finally the average or the mean. The way to calculate it is pretty simple. 68 00:04:30,300 --> 00:04:36,840 You simply make an addition of all the numbers that we have right here and divide it by the number 69 00:04:36,840 --> 00:04:37,390 of numbers. 70 00:04:37,420 --> 00:04:42,120 So in this case, we have five numbers or five entries, so we will divide it by five. 71 00:04:42,150 --> 00:04:43,650 So the calculation is right here. 72 00:04:43,650 --> 00:04:45,240 What we have done is pretty simple. 73 00:04:45,690 --> 00:04:50,220 We made an addition of one hundred fifty, one hundred sixty, one hundred seventy, one hundred eighty 74 00:04:50,220 --> 00:04:51,870 and one hundred eighty right here. 75 00:04:51,900 --> 00:04:52,740 So I made a mistake. 76 00:04:52,740 --> 00:04:53,300 We have a mode. 77 00:04:53,340 --> 00:04:56,020 It's one hundred eighty because it appears two times, that's the difference. 78 00:04:57,630 --> 00:04:59,580 And finally the answer would be 79 00:05:00,350 --> 00:05:07,220 840 divided by five, because right here we have our five numbers, and divided by five, it's 80 00:05:07,220 --> 00:05:09,400 one hundred sixty eight centimeters. 81 00:05:09,710 --> 00:05:12,890 So if I make a quick summary, it's pretty simple with this example.
82 00:05:12,920 --> 00:05:18,050 So first of all, we have the median, which is one hundred seventy centimeters, the mode, which would 83 00:05:18,050 --> 00:05:21,350 be one hundred eighty centimeters because it appears two times. 84 00:05:21,800 --> 00:05:25,660 So since it's the number that appears the most, it's our mode. 85 00:05:25,940 --> 00:05:30,260 And finally our average, which would be one hundred sixty eight centimeters. 86 00:05:30,260 --> 00:05:32,360 And how we made the calculation, it's pretty simple. 87 00:05:32,720 --> 00:05:38,330 We simply made an addition of all the numbers that we have here and we divided it by the number of numbers, 88 00:05:38,330 --> 00:05:43,960 which is five. As you can see, those were the three basic formulas that we talked about. 89 00:05:44,510 --> 00:05:50,570 So in the next class, we are going to learn some concepts and formulas that are a bit more 90 00:05:50,570 --> 00:05:50,980 advanced. 91 00:05:51,260 --> 00:05:54,500 So that's it for this class, guys, and see you in our next class.
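To check the summary above, here is a short Python sketch of the height example using the standard `statistics` module. The variable names are illustrative and not from the class. The five heights (150, 160, 170, 180 and 180 centimeters) are taken from the calculation in the class.

```python
from statistics import mean, median, multimode

# Heights in centimeters from the class example.
heights_cm = [150, 160, 170, 180, 180]

# Mean by hand: add all the numbers, then divide by the number of numbers.
total = sum(heights_cm)
count = len(heights_cm)
manual_mean = total / count

print(median(heights_cm))     # 170
print(multimode(heights_cm))  # [180]
print(total)                  # 840
print(manual_mean)            # 168.0
```

Calling `mean(heights_cm)` from the same module should agree with the manual calculation of 840 divided by 5.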