Hello, and welcome back to another class of our course, the complete introduction to data science and data analysis with Python. In this class, we are going to talk about a really important part of data science: statistics. Statistics are not the whole foundation of data science, but they are a huge player in it. Indeed, a lot of statistical techniques and measures are used in data science, and this is why the topic is so important.

In this class, we'll just have a short introduction to statistics. We're not going to go through all the statistical terminology and all the statistical measures that are used; we'll only look at what statistics is exactly and where it can be used. In the following classes, we are going to cover the basics of statistics that you need in order to understand data science. So let's start.

If I go with a basic definition, statistics is simply a branch of mathematics that deals with the collection, analysis, interpretation, and presentation of masses of numerical data. Basically, you take a population and you work with, let's say, a sample of that population. It is possible to work with the whole population.
But once again, this can be really complicated, because it's not always possible to, for example, run a survey of the whole population. So we work with samples of the population instead. Let's say we want to know the favorite color of the entire American population. Instead of asking every American, we take different samples. We're not going to talk about sampling methods in this class, but as one example, you could take a sample from each city in each state. Since the sample is made up of randomly chosen people, if the favorite color in the sample turns out to be red, we can conclude that, based on this sample, the favorite color of Americans is red.

This is one example of how statistics can be used, and as you can see, it is already a very basic piece of data science. Statistics are used in many fields. In education, for example, we can use statistics to see in which classes students perform best or worst, and then try to find out the problems in those classes.
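A minimal sketch of the favorite-color survey described above, assuming a made-up population of 10,000 answers (the counts and the helper name `favorite_color_from_sample` are hypothetical). It draws a simple random sample with Python's `random` module and reports the most common answer together with its share of the sample. That share is only an estimate of the population share: with a different seed it would come out slightly different, which is exactly the uncertainty inferential statistics tries to handle.

```python
import random
from collections import Counter


def favorite_color_from_sample(population, sample_size, seed=0):
    """Draw a simple random sample (without replacement) and return the
    most common answer in it together with its share of the sample.

    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(population, sample_size)
    color, count = Counter(sample).most_common(1)[0]
    return color, count / sample_size


# Hypothetical population of survey answers (made-up data)
population = ["red"] * 4000 + ["blue"] * 3500 + ["green"] * 2500

color, share = favorite_color_from_sample(population, sample_size=500)
print(color, round(share, 2))
```

Changing `seed` or `sample_size` and re-running is a quick way to see how much the estimate moves from one sample to the next.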
For example, let's say students have problems in a particular field of maths. We then try to find out why students struggle in this field. Maybe it's because of the way it's presented; maybe the students are just not motivated by this field. Whatever the cause, we try to find the explanation. This is possible with statistics as well as with data science, with statistics being the basis of all of it. Statistics can also be used in the stock market, life sciences, weather forecasting, retail, insurance, and many other fields. Once again, statistics is not used alone: it's used together with other mathematical fields as well as data science.

Now let's talk about the types of statistics. Basically, there are two types: descriptive statistics and inferential statistics. We are not going to go into the details of each of them, but I'll give you a brief definition of each type and a small example of what it could be.

Descriptive statistics help us organize the data and focus on the characteristics of the data by providing parameters.
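As a small descriptive-statistics sketch that returns to the education example, here is one way to compute the average score per class with Python's standard `statistics` module. The subjects and scores are invented for illustration. The averages only show where students struggle; finding out why still requires further analysis.

```python
from statistics import mean

# Hypothetical exam scores per class (made-up numbers for illustration)
scores = {
    "maths": [52, 61, 48, 70, 55],
    "history": [78, 82, 69, 90, 75],
    "biology": [66, 71, 80, 58, 74],
}

# Average score per class: a simple descriptive summary of each group
averages = {subject: mean(values) for subject, values in scores.items()}

# Print classes from the lowest average to the highest
for subject, avg in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{subject}: {avg:.1f}")
```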
Descriptive statistics is the use of data to provide a description of a population, either through numerical summaries, graphs, or tables. Inferential statistics is a bit different. It generalizes from a data set and applies probability to arrive at a conclusion: inferential statistics makes inferences and predictions about a population based on a sample of data taken from that population.

Let's take wine as an example. Say you produce wine in huge quantities and you want to look at the age of your wine. With descriptive statistics, when you start analyzing the age of your wine, you will have the average, so the average age of your wine; the max, so the age of the most aged wine; and the min, so the age of the least aged wine. That summarizes the data with three values like this. With inferential statistics, it's going to be a bit different. You will still have your average in the center, but you will have a category that will be, in this case, aged wines rather than the single most aged wine.
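The descriptive summary of the wine example (average, max, min) can be sketched with Python's built-ins and the `statistics` module. The list of ages is hypothetical, made up for illustration.

```python
from statistics import mean

# Hypothetical ages (in years) of the bottles in a wine stock
wine_ages = [2, 5, 3, 8, 12, 4, 6, 10, 1, 7]

summary = {
    "average": mean(wine_ages),  # average age of the stock
    "max": max(wine_ages),       # age of the most aged bottle
    "min": min(wine_ages),       # age of the least aged bottle
}
print(summary)  # {'average': 5.8, 'max': 12, 'min': 1}
```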
So you will have aged wines, meaning all the wines that are older than a certain age, and you will have less aged wines, meaning all the wines that are younger than a certain age. In this case, we work with the quartiles instead of just working with maximums and minimums.

You can also apply this to a population, for example a population of tall and short people. Say you want to know the height of a certain population. With descriptive statistics, you will have the max height, so the tallest person in the room, the shortest person in the room, and the average. So basically, you have those three values. With inferential statistics, you will have a category that is tall, a category that is short, and you have the average as well.

So those are the two types of statistics.

Now, why are statistics useful in data science? They are really useful for plenty of reasons, but personally I think the two main reasons are these. First, statistics give you a basic understanding of different analysis methods and techniques.
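The quartile-based grouping described above can be sketched with `statistics.quantiles`, which returns the three cut points Q1, median, and Q3 when `n=4`. The ages and the helper name `age_categories` are hypothetical. Strictly speaking, computing quartiles of the data you already have is still a descriptive summary; it becomes inference when you use a sample's quartiles to draw conclusions about the wider population.

```python
from statistics import quantiles


def age_categories(ages):
    """Split ages into three groups using the first and third quartiles.

    Hypothetical helper for illustration: values below Q1 are "less aged",
    values above Q3 are "aged", everything else is "around average".
    """
    q1, _median, q3 = quantiles(ages, n=4)  # default 'exclusive' method
    groups = {"less aged": [], "around average": [], "aged": []}
    for age in ages:
        if age < q1:
            groups["less aged"].append(age)
        elif age > q3:
            groups["aged"].append(age)
        else:
            groups["around average"].append(age)
    return (q1, q3), groups


# Hypothetical ages (in years) of the bottles in a wine stock
wine_ages = [2, 5, 3, 8, 12, 4, 6, 10, 1, 7]
(q1, q3), groups = age_categories(wine_ages)
print(q1, q3)
print(groups)
```

The same function works unchanged for the height example: pass a list of heights and the groups become "short", "around average", and "tall".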
Statistics bring many techniques and methods to data science that we can use to analyze data better. This is why statistics are so important in data science; as I said, they are really at the base of data science.

The second reason is that statistics allow data scientists to use many measures, such as the median, the mean, and the standard deviation (those are just some of them), to make a complete analysis of something. That something could be, for example, in the financial field. Let's say you want to analyze a stock: you can use the standard deviation, which is really important because, for a stock, a high standard deviation means that the stock is very volatile, while a low standard deviation means that the stock does not move that much. Once again, this comes from statistics and can be used in data science. The same goes for the median: I may want to find the median of something, or the mean, that is, the average, of something.
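A minimal sketch of the stock example, assuming two made-up series of daily returns in percent (the names `calm_stock` and `volatile_stock` and all numbers are hypothetical). It computes the mean, the median, and the sample standard deviation with Python's `statistics` module; the standard deviation of returns is a common, simple proxy for volatility.

```python
from statistics import mean, median, stdev

# Hypothetical daily returns (%) for two made-up stocks
calm_stock = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2]
volatile_stock = [2.5, -3.1, 1.8, -2.2, 3.4, -1.9]

for name, returns in [("calm", calm_stock), ("volatile", volatile_stock)]:
    print(
        f"{name}: mean={mean(returns):.2f}% "
        f"median={median(returns):.2f}% "
        f"stdev={stdev(returns):.2f}%"  # sample standard deviation
    )
```

The higher standard deviation of `volatile_stock` reflects that its returns swing much further from their mean, which is what "very volatile" means in the explanation above.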
All of these measures come from statistics and are used very widely in data science.

What you see here are just some graphs from analyses that have been made; you will be able to perform these types of analysis yourself. I don't know what those analyses were about, but the results look something like this. At the end of the day, this is a mix of statistics and data science, put together to analyze a population or a certain dataset, to answer a certain problem in the world, to come to a conclusion about something, and to solve a problem. That is the main goal of data science, as well as of statistics within data science: to solve a certain problem.

So now you understand why statistics are that important. It's time for us to start learning all the statistical basics that will be really important in data science, for this course and for your future. So that's it for this class, guys, and see you all in our next class, where we are going to learn all those basics.