1 00:00:00,480 --> 00:00:05,790 Hello and welcome back to another class of our course about the complete introduction to data science 2 00:00:05,790 --> 00:00:06,600 and Python. 3 00:00:07,290 --> 00:00:11,670 So in this class, we are still talking about some more statistical concepts, and this is what we will 4 00:00:11,670 --> 00:00:13,560 cover in this whole class. 5 00:00:13,630 --> 00:00:20,010 We are going to talk about four very interesting statistical concepts that are very widely used in 6 00:00:20,100 --> 00:00:26,250 data science, as well as in statistics in general, and that will be really interesting for you guys 7 00:00:26,250 --> 00:00:31,940 to know, once again for the purpose of this course, but for your life in general as well. 8 00:00:32,770 --> 00:00:34,640 So let's jump right into it. 9 00:00:35,350 --> 00:00:40,120 So the first statistical concept that we are going to talk about would be the range. 10 00:00:40,570 --> 00:00:47,050 So basically, let's say you guys have a dataset and you want to know the range of the numbers that 11 00:00:47,050 --> 00:00:48,120 are in this dataset. 12 00:00:48,430 --> 00:00:52,060 How exactly do you calculate the range of those numbers? 13 00:00:52,600 --> 00:00:53,490 So it's pretty simple. 14 00:00:54,520 --> 00:01:00,880 You take the highest number, so the max, minus the lowest number, which is the min, and this would 15 00:01:00,880 --> 00:01:02,980 give you the range of those numbers. 16 00:01:03,460 --> 00:01:08,500 So, for example, in this case, as you can see, the max would be eight and the min is, well, in 17 00:01:08,500 --> 00:01:11,380 this case four, because this is where it starts. 18 00:01:12,040 --> 00:01:13,490 So the range would be four. 19 00:01:14,110 --> 00:01:20,100 So basically, if we read the definition, it would be the difference between the max and the min in a dataset. 20 00:01:21,380 --> 00:01:27,040 So it will allow us to know what the spread is in the data that we have.
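The range calculation described here can be sketched in a few lines of Python. The dataset below is hypothetical: the slide is not part of the transcript, so the values are only chosen so that the min is four and the max is eight, as in the spoken example. The function name `value_range` is also just an illustrative choice.

```python
def value_range(values):
    """Return the range of a non-empty collection: max minus min."""
    return max(values) - min(values)


# Hypothetical dataset: lowest value 4, highest value 8.
example = [4, 5, 6, 7, 8]
print(value_range(example))  # 4
```

The same function works on any non-empty list of numbers, whatever order they are in.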
21 00:01:27,370 --> 00:01:31,780 So basically the highest number that we have, minus the lowest number that we have, and this would 22 00:01:31,780 --> 00:01:36,330 be our range. So in this example right here, the range will be 10. 23 00:01:36,340 --> 00:01:41,740 So the highest number is 12 and the lowest number would be two. All right. 24 00:01:41,770 --> 00:01:47,760 The second thing that we are going to talk about would be quartiles, or basically percentiles. 25 00:01:48,310 --> 00:01:56,500 So basically, those are things which will allow us to know, for example, who is in the best 10 percent, 26 00:01:56,500 --> 00:02:01,980 or where the worst 10 percent of something starts. It's just a measure in statistics. 27 00:02:02,410 --> 00:02:07,320 So basically, this is a division of observations into four defined intervals. 28 00:02:08,200 --> 00:02:10,860 So you have the Q1, the Q2 and the Q3. 29 00:02:11,290 --> 00:02:15,070 And as you can see right here, we have a graph of quartiles. 30 00:02:16,270 --> 00:02:19,030 So basically the Q1 would be the twenty-fifth percentile. 31 00:02:19,480 --> 00:02:22,280 The Q2 is basically the median or the center. 32 00:02:22,300 --> 00:02:27,460 So basically this would be the center, and finally the Q3 would be the seventy-fifth percentile. 33 00:02:27,820 --> 00:02:33,460 So if we take a basic example of some numbers like here, for example, we have two, four, four, 34 00:02:33,460 --> 00:02:34,870 five, six, seven, eight. 35 00:02:35,620 --> 00:02:37,980 So in this case, the Q1 would be four. 36 00:02:38,380 --> 00:02:42,820 So it would be the twenty-fifth percentile, and the median would be five. 37 00:02:43,570 --> 00:02:44,280 The third 38 00:02:44,710 --> 00:02:46,410 quartile, the Q3, would be seven. 39 00:02:46,420 --> 00:02:49,890 So the seventy-fifth percentile, and the max would be eight. 40 00:02:49,900 --> 00:02:50,710 The min would be two. 41 00:02:51,640 --> 00:02:53,950 So once again, these are just the basics.
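The quartile example with the numbers two, four, four, five, six, seven, eight can be checked with Python's standard library. This is only a sketch: the class does not say which quartile convention it uses, so it is an assumption here that the default "exclusive" method of `statistics.quantiles` matches the values read from the slide.

```python
import statistics

# The seven numbers from the spoken example.
data = [2, 4, 4, 5, 6, 7, 8]

# n=4 asks for the three cut points that split the data into four parts.
# The default method is "exclusive".
q1, q2, q3 = statistics.quantiles(data, n=4)

print(min(data), max(data))  # 2 8
print(q1, q2, q3)            # 4.0 5.0 7.0
```

Quartile conventions differ between tools. With `method="inclusive"` the same call returns 6.5 for Q3 instead of 7, so small differences between libraries on small datasets are expected.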
42 00:02:54,370 --> 00:02:56,680 But let's say you have a sorted dataset of one hundred numbers. 43 00:02:57,400 --> 00:02:59,320 In this case it's pretty simple. 44 00:02:59,320 --> 00:03:05,920 The 25th percentile would simply be the twenty-fifth number, the 50th percentile, or the median, 45 00:03:06,280 --> 00:03:09,790 would be the 50th number, and the Q3, the 75th percentile, 46 00:03:09,790 --> 00:03:11,270 would be the seventy-fifth number. 47 00:03:11,800 --> 00:03:14,280 So this is just a division into percentages. 48 00:03:14,290 --> 00:03:18,970 So the first 10 percent would be below the tenth percentile, 49 00:03:18,970 --> 00:03:20,620 so the lowest values. 50 00:03:22,480 --> 00:03:22,710 All right. 51 00:03:22,810 --> 00:03:26,410 The next thing that we are going to talk about will be the standard deviation. 52 00:03:26,950 --> 00:03:33,400 And this is where it will be a bit more complicated because, well, the formulas are a bit more complicated. 53 00:03:34,030 --> 00:03:36,010 So first of all, what is the standard deviation? 54 00:03:36,010 --> 00:03:43,270 Basically, the standard deviation is a measure of variation or dispersion of a dataset, and it's something 55 00:03:43,270 --> 00:03:48,940 that is really widely used in finance, along with the variance 56 00:03:49,190 --> 00:03:54,310 that we are going to talk about a bit later in this class. 57 00:03:55,300 --> 00:04:01,930 So basically the standard deviation works on datasets, and it can 58 00:04:01,930 --> 00:04:05,200 be calculated on a sample as well as on a population. 59 00:04:06,360 --> 00:04:10,440 So this is the formula, basically, and what exactly does it mean? 60 00:04:10,470 --> 00:04:11,190 It's pretty simple. 61 00:04:11,190 --> 00:04:14,740 So we have the X right here, and the X would be our numbers.
62 00:04:14,740 --> 00:04:17,360 So let's say, for example, we have a dataset of one, two, three, four. 63 00:04:17,730 --> 00:04:22,200 So it will simply be one minus the mean of our dataset. 64 00:04:22,230 --> 00:04:24,280 In this case, the dataset would be one, two, three, four. 65 00:04:24,300 --> 00:04:29,430 So one minus the mean of one, two, three, four, all of this squared. 66 00:04:29,460 --> 00:04:32,760 So basically you take the square. 67 00:04:32,760 --> 00:04:38,190 So you multiply it by itself and you add everything up. 68 00:04:38,200 --> 00:04:42,230 So it's going to be, for example, in this case, if we have a dataset of one, two, three, four, 69 00:04:42,420 --> 00:04:50,190 one minus the mean of our dataset, everything multiplied by itself, plus two minus 70 00:04:50,190 --> 00:04:58,760 our mean, multiplied by itself, plus three minus the mean, multiplied by itself, plus four minus the 71 00:04:58,780 --> 00:05:08,610 mean, multiplied by itself, and everything divided by the number of numbers that we 72 00:05:08,610 --> 00:05:08,880 have, minus one. 73 00:05:09,630 --> 00:05:16,590 So for a sample, if in our dataset we have four numbers, it's going to be four minus one, which would be three. For 74 00:05:16,590 --> 00:05:22,140 a population, that's a bit different: instead of subtracting one at the end, 75 00:05:22,150 --> 00:05:25,280 we will simply divide by the number of numbers that we have. 76 00:05:25,710 --> 00:05:30,720 And at the end it's going to be the square root of everything. 77 00:05:31,920 --> 00:05:35,470 How do we interpret the standard deviation? 78 00:05:35,490 --> 00:05:37,040 So it's pretty simple. 79 00:05:37,050 --> 00:05:45,420 So basically, let's say you have a dataset and you have a mean of 10, and with this mean of ten you 80 00:05:45,420 --> 00:05:48,060 have a standard deviation of two.
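The standard deviation calculation walked through for the dataset one, two, three, four can be written out step by step. This is a minimal sketch, not the course's own code: the function name `std_dev` and its `sample` flag are illustrative choices. Python's `statistics.stdev` and `statistics.pstdev` do the same job and are used here only as a cross-check.

```python
import math
import statistics


def std_dev(values, sample=True):
    """Standard deviation, dividing by n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    # Each number minus the mean, multiplied by itself, then all added up.
    squared_diffs = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    # The square root of everything at the end.
    return math.sqrt(squared_diffs / divisor)


data = [1, 2, 3, 4]
print(std_dev(data, sample=True))   # about 1.29 (divides by 3)
print(std_dev(data, sample=False))  # about 1.12 (divides by 4)

# Cross-check against the standard library.
print(statistics.stdev(data), statistics.pstdev(data))
```

For this dataset the mean is 2.5 and the squared differences add up to 5, so the sample version is the square root of 5/3 and the population version is the square root of 5/4.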
81 00:05:48,630 --> 00:05:55,950 So basically, if the data is roughly normally distributed and you add to or subtract from your mean, or your average, one standard deviation, you have about a sixty-eight 82 00:05:55,950 --> 00:05:58,990 percent chance to be on target. 83 00:05:59,010 --> 00:06:01,080 If, for example, I don't know, you are... 84 00:06:01,380 --> 00:06:03,060 Well, it's really widely used in the stock market. 85 00:06:03,480 --> 00:06:09,990 So let's say you want to predict whether your stock goes up or down, and by how much. 86 00:06:10,230 --> 00:06:11,040 So it's pretty simple. 87 00:06:11,040 --> 00:06:18,000 You will simply take your data, the data of the stock, and then you will simply calculate the standard 88 00:06:18,000 --> 00:06:19,020 deviation of this data. 89 00:06:19,050 --> 00:06:25,800 So let's say, for example, the mean or the average of this data would be 10 with a standard deviation 90 00:06:25,800 --> 00:06:26,310 of one. 91 00:06:26,490 --> 00:06:33,960 So in this case, the price of your stock has about a sixty-eight percent chance to be between ten minus one and 92 00:06:33,960 --> 00:06:34,800 ten plus one. 93 00:06:34,830 --> 00:06:41,990 So it's going to be between nine dollars and 11 dollars in the next, well, in the next X amount of time. 94 00:06:42,570 --> 00:06:49,140 So if you want to have a ninety-five percent chance to be right, it's going to be ten dollars plus 95 00:06:49,680 --> 00:06:50,800 two standard deviations. 96 00:06:50,830 --> 00:06:57,180 So in this case, it's going to be ten plus two multiplied by your standard deviation, or ten minus two multiplied 97 00:06:57,180 --> 00:06:58,070 by your standard deviation. 98 00:06:58,500 --> 00:07:05,460 So your stock would be between eight and 12 in this case, because you have a stock that has an average 99 00:07:05,460 --> 00:07:09,030 of ten dollars and a standard deviation of one. 100 00:07:10,170 --> 00:07:13,490 So it's pretty simple, and it's the same thing for three standard deviations.
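The one, two and three standard deviation ranges from the stock example can be sketched as follows. The percentages quoted in the class only hold if the prices are roughly normally distributed, which is an assumption of this sketch and often not true of real stock data. The function name `sigma_bands` is illustrative, and `statistics.NormalDist` is used only to show where the percentages come from.

```python
from statistics import NormalDist


def sigma_bands(mean, std):
    """Return {k: (low, high, coverage)} for k = 1, 2, 3 standard deviations.

    coverage is the probability of landing inside the band,
    assuming a normal distribution with this mean and std.
    """
    normal = NormalDist(mu=mean, sigma=std)
    bands = {}
    for k in (1, 2, 3):
        low, high = mean - k * std, mean + k * std
        coverage = normal.cdf(high) - normal.cdf(low)
        bands[k] = (low, high, coverage)
    return bands


# The stock example: an average of ten dollars, standard deviation of one.
for k, (low, high, coverage) in sigma_bands(10, 1).items():
    print(f"{k} std: between {low} and {high}, coverage {coverage:.4f}")
```

With a mean of 10 and a standard deviation of 1, the bands come out as 9 to 11, 8 to 12 and 7 to 13, with coverage of roughly 68, 95 and 99.7 percent.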
101 00:07:13,860 --> 00:07:19,500 If you want to have a ninety-nine point seven percent chance to be on target, well, the price 102 00:07:19,500 --> 00:07:22,620 of your stock would be between seven dollars and 13 dollars. 103 00:07:22,620 --> 00:07:30,690 So those would be your extremes. If you want to be, well, ninety-nine 104 00:07:30,690 --> 00:07:36,630 point seven percent sure, having a ninety-nine point seven percent chance to be on target or hitting your 105 00:07:36,630 --> 00:07:43,020 stock price, you will add and subtract three standard deviations. Usually, going out to three 106 00:07:43,020 --> 00:07:49,980 standard deviations is not that much of a good thing because, well, it doesn't give you the opportunity 107 00:07:49,980 --> 00:07:52,560 to take risks, or your risk is really, really small. 108 00:07:52,860 --> 00:07:56,680 So the profit that you will make would also be really, really small. 109 00:07:57,660 --> 00:07:59,020 So this is pretty much the standard deviation. 110 00:07:59,040 --> 00:08:03,000 The next thing that we are going to talk about would be the variance. 111 00:08:03,000 --> 00:08:07,140 So basically the variance is pretty much like the standard deviation. 112 00:08:07,170 --> 00:08:14,040 The only difference is that, well, basically the standard deviation is simply the square root 113 00:08:14,040 --> 00:08:15,600 of the variance. 114 00:08:15,870 --> 00:08:23,310 So basically the variance is simply the standard deviation to the power of two, so the standard deviation multiplied 115 00:08:23,310 --> 00:08:23,760 by itself. 116 00:08:25,680 --> 00:08:30,490 So basically how it works: it does pretty much the same thing as the standard deviation. 117 00:08:31,200 --> 00:08:39,530 It's a measure of spread in a certain dataset, so it works pretty much the same as the standard deviation. 118 00:08:39,780 --> 00:08:42,240 So usually people don't really use it. 119 00:08:42,270 --> 00:08:44,100 People use the standard deviation.
120 00:08:44,120 --> 00:08:47,140 But it's also really, really important to understand. 121 00:08:47,730 --> 00:08:52,030 So as I said, the method of calculation is the same thing as the standard deviation. 122 00:08:52,830 --> 00:08:59,100 You have two methods of calculation, for a population or for a sample, and the main difference is the n 123 00:08:59,100 --> 00:08:59,530 right here. 124 00:08:59,550 --> 00:09:04,380 So for a population it would be all the numbers that we have, or if you calculate it for a sample, it will be all the 125 00:09:04,380 --> 00:09:05,250 numbers that we have minus one. 126 00:09:05,360 --> 00:09:09,770 So let's say, for example, you have a sample of 10 numbers: it's going to be ten minus one, so you will 127 00:09:09,770 --> 00:09:11,860 divide by ten minus one at the end. 128 00:09:12,200 --> 00:09:17,390 And if we compare both formulas, as you can see, for the standard deviation, it's pretty much the 129 00:09:17,390 --> 00:09:17,760 same thing. 130 00:09:17,780 --> 00:09:20,810 The only difference is that you have a square root. 131 00:09:21,710 --> 00:09:23,460 That is the only difference. 132 00:09:23,480 --> 00:09:28,520 So basically the variance is simply the standard deviation multiplied by itself. 133 00:09:29,530 --> 00:09:31,190 So it's pretty much the same thing. 134 00:09:32,000 --> 00:09:34,970 Well, with a difference, because it's multiplied by itself. 135 00:09:34,980 --> 00:09:41,390 But at the end of the day, it gives us the same thing: it measures the spread of a certain dataset. 136 00:09:41,630 --> 00:09:43,340 So this is really important to understand. 137 00:09:44,210 --> 00:09:47,220 So those were the things that we talked about in this class. 138 00:09:47,240 --> 00:09:50,830 So first of all, we talked about the range, which is simply the max minus the min. 139 00:09:51,470 --> 00:09:56,570 Then we talked about the quartiles, which are percentiles as well.
140 00:09:57,230 --> 00:10:04,430 So you have the twenty-fifth percentile, the median and the seventy-fifth percentile, which are Q1, 141 00:10:04,430 --> 00:10:05,390 Q2 and Q3. 142 00:10:05,780 --> 00:10:07,850 So quartile one, quartile two and quartile three. 143 00:10:07,850 --> 00:10:13,910 And then we talked about the standard deviation, which is a measure 144 00:10:14,300 --> 00:10:16,100 of spread. 145 00:10:16,110 --> 00:10:20,480 So you use it to know what the spread is, or how volatile something is. 146 00:10:20,870 --> 00:10:26,630 So if we use that, for example, on a stock, it's a good way to know how volatile the stock is and 147 00:10:27,200 --> 00:10:29,230 how much risk we are ready to take. 148 00:10:29,510 --> 00:10:35,320 And then we talked about the variance, which does pretty much the same thing as the standard deviation. 149 00:10:35,330 --> 00:10:40,060 But if you guys want to use something, for example, to predict stocks, once again, I really and 150 00:10:40,190 --> 00:10:44,140 highly suggest you use the standard deviation, which is way better because it is in the same units as your data. 151 00:10:44,600 --> 00:10:49,970 So that's it for this class, guys, and see you all in our next class, where we will still be talking about statistics.
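The relationship between the variance and the standard deviation covered in this class can be checked with Python's standard library. This is only a sketch on the small dataset one, two, three, four. The variable names are illustrative.

```python
import math
import statistics

data = [1, 2, 3, 4]

# Sample versions divide by n - 1, population versions divide by n.
sample_var = statistics.variance(data)
pop_var = statistics.pvariance(data)

print(sample_var)  # about 1.67
print(pop_var)     # 1.25

# The standard deviation is the square root of the variance,
# so the variance is the standard deviation multiplied by itself.
print(math.sqrt(sample_var), statistics.stdev(data))
print(math.sqrt(pop_var), statistics.pstdev(data))
```

Because the variance is in squared units (for a stock, dollars squared), the standard deviation is usually the easier of the two to read against the original data.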