1 00:00:01,720 --> 00:00:07,090 Hello, guys, and welcome back to another class of our course about data analysis and science, the 2 00:00:07,090 --> 00:00:09,190 complete introduction with the use of. 3 00:00:10,240 --> 00:00:16,060 So in this class, we are going to have a complete introduction to the concept of data, to understand 4 00:00:16,060 --> 00:00:18,940 what exactly data is and how it works. 5 00:00:18,940 --> 00:00:22,760 And that way you guys will have complete knowledge about this subject. 6 00:00:23,290 --> 00:00:25,120 So let's jump right into it. 7 00:00:25,960 --> 00:00:28,390 So basically, what is data exactly? 8 00:00:28,660 --> 00:00:33,700 So the concept of data can be something really general. Your cell phone, for example: 9 00:00:33,700 --> 00:00:34,970 when you click on your cell phone, 10 00:00:34,990 --> 00:00:41,500 this could be considered as data. When, for example, you are using, I don't know, your chips where? 11 00:00:41,710 --> 00:00:46,390 Well, there is data there. When you are typing something on your keyboard, there could be data there as 12 00:00:46,390 --> 00:00:46,660 well. 13 00:00:48,250 --> 00:00:55,840 Basically, as you can see, data could be seen as statistical sets or facts that can be used 14 00:00:55,840 --> 00:00:56,990 for analysis. 15 00:00:57,250 --> 00:01:02,560 So basically, let's say, for example, you guys are walking outside and you are making steps. 16 00:01:02,560 --> 00:01:08,650 Those steps could be construed as data because they could be used to know exactly how many miles you 17 00:01:08,650 --> 00:01:10,370 guys are walking each and every day. 18 00:01:11,180 --> 00:01:16,630 Let's say, for example, we want to calculate how many steps we need to do per day to be able 19 00:01:16,630 --> 00:01:17,800 to stay healthy. 
20 00:01:18,010 --> 00:01:23,980 Well, the steps that you guys do each and every day can be counted to be able to say that, for example, 21 00:01:23,980 --> 00:01:26,920 you guys have walked 1000 steps and you guys are healthy. 22 00:01:27,250 --> 00:01:32,440 Well, we can make an average of everybody who is walking and say that, OK, in general, on average, 23 00:01:32,680 --> 00:01:40,210 each person that walks one thousand steps is a person that is healthy. Basically, it's not only 24 00:01:40,210 --> 00:01:45,550 steps. As I said, it could be absolutely anything that can be quantified, but not only quantified, 25 00:01:45,550 --> 00:01:48,060 because there are different data types that exist. 26 00:01:48,280 --> 00:01:49,810 We are going to talk about it a bit later. 27 00:01:50,230 --> 00:01:52,090 So principally, we can have numbers. 28 00:01:52,090 --> 00:01:58,060 As you can see here, we can have quantities, information, measurements, facts, observations and 29 00:01:58,120 --> 00:01:59,890 graphs as well. 30 00:01:59,890 --> 00:02:07,560 But graphs, I see them more as a visualization tool, because they allow us to see the big picture. 31 00:02:07,570 --> 00:02:11,690 So let's say, for example, we're talking about those steps again and how many steps we are making. 32 00:02:12,610 --> 00:02:18,610 So you will be able to visualize how many steps you guys are making per day with the graphs. 33 00:02:20,090 --> 00:02:27,550 Another thing that is very important to understand with data is that it has to fit, well, 34 00:02:27,550 --> 00:02:28,930 four big things. 35 00:02:29,470 --> 00:02:33,680 First of all, it has to be able to be collected and stored. So basically, 36 00:02:33,970 --> 00:02:39,820 all types of data need to be collected and stored, so it can be collected and stored. 37 00:02:39,820 --> 00:02:43,790 Well, it's not really the same thing. And it has to be measured. 
38 00:02:43,810 --> 00:02:48,760 So basically the data that you collected, you should be able to measure it, so you should be able to measure 39 00:02:48,760 --> 00:02:51,760 it numerically or with any type of measurement. 40 00:02:52,630 --> 00:02:55,810 You should be able to analyze this data. Once again, 41 00:02:55,810 --> 00:03:02,050 if we're talking once again about our steps, this can be considered as raw 42 00:03:02,200 --> 00:03:02,560 data. 43 00:03:02,570 --> 00:03:04,120 So this is really raw data. 44 00:03:04,390 --> 00:03:08,740 And those steps, we can analyze them to, let's say, for example, know how many steps someone who is 45 00:03:08,740 --> 00:03:12,820 healthy is walking, but we can analyze it once again. 46 00:03:12,820 --> 00:03:17,530 And instead of having this conclusion, we can calculate, for example, on average how many steps each 47 00:03:17,530 --> 00:03:18,100 person makes. 48 00:03:18,250 --> 00:03:24,520 Then on average, how many persons per day are walking at least 30 minutes a day, for example, once 49 00:03:24,520 --> 00:03:24,850 again. 50 00:03:25,520 --> 00:03:27,890 And we can do a lot of analysis with all this. 51 00:03:28,390 --> 00:03:31,390 And finally, your data should be able to be visualized. 52 00:03:31,400 --> 00:03:37,900 So once again, those steps, you can visualize them on a graph or on any other tool that can be used 53 00:03:37,900 --> 00:03:44,680 in data analysis to visualize things, to visualize, let's say, your conclusions. 54 00:03:44,740 --> 00:03:47,050 So in this case, we can visualize our steps on a graph. 55 00:03:47,070 --> 00:03:49,920 So let's say one person walked this amount of steps, 56 00:03:49,930 --> 00:03:53,050 this person walked this amount of steps, et cetera. 57 00:03:53,710 --> 00:03:55,710 So this is for data. 
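The four things just described (collect and store, measure, analyze, visualize) can be sketched in a few lines of Python. This is only an illustration: the names `daily_steps` and `HEALTHY_THRESHOLD` and every number in it are made up for the example, and the 1000-step threshold is just the figure used in the class, not a real health guideline.

```python
from statistics import mean

# Collect and store: hypothetical raw data, steps per day for three people.
daily_steps = {
    "person_1": [900, 1200, 1050],
    "person_2": [400, 650, 450],
    "person_3": [1500, 1100, 1300],
}

# Measure and analyze: average number of steps per day for each person.
average_per_person = {name: mean(steps) for name, steps in daily_steps.items()}

# Analyze again: the average across everybody who is walking.
overall_average = mean(list(average_per_person.values()))

# Another analysis of the same raw data: who reaches the example threshold?
HEALTHY_THRESHOLD = 1000  # assumption taken from the class example only
people_at_threshold = [
    name for name, avg in average_per_person.items() if avg >= HEALTHY_THRESHOLD
]

# Visualize: a very small text "graph", one '#' per 100 steps.
for name, avg in average_per_person.items():
    print(f"{name:<10} {'#' * int(avg // 100)} {avg:.0f}")

print("overall average:", overall_average)
print("at or above threshold:", people_at_threshold)
```

The same raw data feeds every step here, which is the point made in the class: one set of collected steps can support several different analyses and a graph.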
58 00:03:56,200 --> 00:04:00,760 The second thing that we are going to talk about today would be the types of data, because once again, 59 00:04:00,760 --> 00:04:04,020 this is a huge part of our data part. 60 00:04:04,510 --> 00:04:06,190 So types of data. 61 00:04:06,190 --> 00:04:09,460 There are two principal categories of types of data. 62 00:04:09,670 --> 00:04:15,820 We have the qualitative on one side, or the categorical. 63 00:04:15,970 --> 00:04:22,120 Basically, it's the same thing. And you also have the numerical or quantitative data. 64 00:04:22,990 --> 00:04:30,940 So basically the first type of data, which is the qualitative data type, is not really, there are like 65 00:04:30,940 --> 00:04:35,440 no numbers, and you can't really measure what's inside of it with numbers. 66 00:04:35,890 --> 00:04:41,020 And the numerical or quantitative type of data is really all about numbers. 67 00:04:41,500 --> 00:04:43,700 Let's see what it looks like. 68 00:04:43,720 --> 00:04:46,360 So the first type, as I said, would be the categorical one. 69 00:04:46,570 --> 00:04:51,370 And inside of the categorical one, we have the nominal and the ordinal data. 70 00:04:52,870 --> 00:04:55,780 So for the nominal data, basically, it's pretty simple. 71 00:04:55,780 --> 00:05:00,550 It can be used to label variables without providing quantitative values. 72 00:05:01,180 --> 00:05:08,260 So once again, let's say, for example, you receive a questionnaire where they ask you what you prefer 73 00:05:08,260 --> 00:05:15,290 to write with: a pencil, a pen, I don't know, a Sharpie, for example, or any other answer. 74 00:05:16,000 --> 00:05:18,750 Once again, you can't quantify the answer. 75 00:05:19,180 --> 00:05:21,400 You can only qualify this answer. 76 00:05:21,410 --> 00:05:23,220 For example, the person answers with a pen. 77 00:05:23,380 --> 00:05:25,390 So, OK, the answer would be a pen. 
78 00:05:26,030 --> 00:05:34,210 If we say the number of persons that prefer writing with pens, this would no longer 79 00:05:34,210 --> 00:05:35,410 be nominal data. 80 00:05:35,770 --> 00:05:38,020 The nominal data would only be the pen. 81 00:05:38,020 --> 00:05:44,650 The number of persons that prefer writing with pens would be another type of data in this 82 00:05:44,650 --> 00:05:44,940 case. 83 00:05:47,260 --> 00:05:50,470 So it could be, for example, another example that we have here. 84 00:05:50,650 --> 00:05:52,410 Let's say, for example, your favorite animal. 85 00:05:52,690 --> 00:05:56,390 This could be once again another type of nominal data. 86 00:05:56,450 --> 00:05:56,890 Why? 87 00:05:56,890 --> 00:06:02,200 Because, well, in this case, the animals, you can't quantify them. 88 00:06:02,530 --> 00:06:07,810 And once again, there is no numerical value to it. 89 00:06:08,290 --> 00:06:13,900 And another really important thing to understand, the difference between nominal and ordinal data, is 90 00:06:13,900 --> 00:06:19,450 that the nominal data is not ordered, but the ordinal data is ordered. 91 00:06:19,810 --> 00:06:22,350 So let me explain it differently. 92 00:06:22,360 --> 00:06:28,270 So, for example, if we take our first example, which would be, let's say, do we prefer writing 93 00:06:28,270 --> 00:06:29,550 with a pen or a pencil? 94 00:06:29,920 --> 00:06:35,980 If I choose pen or if I choose pencil, there is no order between them. It could be a pen, it could be a pencil, 95 00:06:35,980 --> 00:06:36,790 it could be a Sharpie. 96 00:06:37,060 --> 00:06:40,590 All the answers are equal, but ordinal data would be a bit different. 97 00:06:40,990 --> 00:06:45,180 For example, let's have an example of ordinal data. 
98 00:06:45,490 --> 00:06:52,000 Let's say, for example, I'm asking someone, do you like this type of exams, on a scale of, let's 99 00:06:52,000 --> 00:06:53,750 say, for example, bad to excellent. 100 00:06:54,370 --> 00:06:57,430 So if the person doesn't like it, well, the person will say bad. 101 00:06:57,430 --> 00:06:59,410 If the person likes it, the person will say excellent. 102 00:06:59,890 --> 00:07:03,050 And as you can see, there is a certain order between those. 103 00:07:03,430 --> 00:07:08,410 Well, those emojis here, we have emojis, but it could be, for example, things like don't like, 104 00:07:08,410 --> 00:07:09,460 OK, fantastic. 105 00:07:09,940 --> 00:07:11,350 There is like a certain order. 106 00:07:11,500 --> 00:07:16,960 But if we ask someone if they prefer a pen or a pencil, or if they prefer a dog or a 107 00:07:16,960 --> 00:07:21,460 cat, there is not necessarily an order between those. 108 00:07:21,580 --> 00:07:27,810 Well, there is no ranking between one or the other value, so we can't really say one comes before the other. 109 00:07:27,850 --> 00:07:30,130 For us, it's pretty much the same thing. 110 00:07:31,060 --> 00:07:33,340 So this is for our categorical types of data. 111 00:07:33,610 --> 00:07:39,250 As you can see, the main characteristic of those categories is that they can't be quantified and they, 112 00:07:40,120 --> 00:07:45,670 well, they are really about quality. 113 00:07:45,720 --> 00:07:49,400 So it's really about qualifying something, not really quantifying something. 114 00:07:50,110 --> 00:07:50,440 All right. 115 00:07:50,710 --> 00:07:54,600 Let's jump to the second part, which would be numerical or quantitative data. 116 00:07:54,880 --> 00:08:01,030 So basically, in those types of data, we have two types of data which look the same but 117 00:08:01,030 --> 00:08:02,350 are very, very, well, 118 00:08:02,530 --> 00:08:03,700 they're not really different. 
119 00:08:04,030 --> 00:08:11,380 Well, they're pretty much the same thing with some subtle differences. So for the discrete types 120 00:08:11,380 --> 00:08:15,160 of data, basically those would be whole numbers. 121 00:08:15,170 --> 00:08:21,060 So let's say, for example, it won't be an infinity of possible numbers. 122 00:08:21,070 --> 00:08:28,330 For example, let's take the age. The age of someone could not be, for example, one point five five 123 00:08:28,330 --> 00:08:30,520 three four nine, for example. 124 00:08:30,520 --> 00:08:33,250 It will be one year old, two years old, three years old. 125 00:08:33,670 --> 00:08:39,910 The height of someone will be, for example, one meter 10 centimeters, one meter 80 centimeters, 126 00:08:39,910 --> 00:08:42,900 or six feet or five feet five, for example. 127 00:08:43,090 --> 00:08:45,880 It's really something that is not infinite. 128 00:08:45,890 --> 00:08:47,380 For example, once again. 129 00:08:49,420 --> 00:08:53,570 For example, right here, we can have a sample versus consumer once again here. 130 00:08:53,590 --> 00:09:00,310 It's something that is the has and if we're talking about continuous the U.S. So once again, just to 131 00:09:00,330 --> 00:09:03,290 fully understand, this would be a point on the graph. 132 00:09:03,640 --> 00:09:09,790 So, for example, let's talk about the cost versus the number of tickets. 133 00:09:09,970 --> 00:09:12,580 If I buy one ticket, it's going to cost me, for example, 20 dollars. 134 00:09:12,580 --> 00:09:15,190 If I buy two tickets, it's going to cost me, in this case, 135 00:09:15,190 --> 00:09:16,000 forty dollars. 136 00:09:16,270 --> 00:09:23,830 But I can't buy, in this case, one point three tickets, which in continuous would work because, for 137 00:09:23,830 --> 00:09:28,510 example, in continuous, you will have another thing, which would be, in this case, pounds versus cost. 
138 00:09:28,720 --> 00:09:33,310 And in this case, you will have, let's say, for example, an infinity of possibilities. 139 00:09:33,340 --> 00:09:37,840 So basically the difference between discrete and continuous is that in the discrete type 140 00:09:37,840 --> 00:09:43,630 of data, you don't have an infinity of possibilities, because right here, for example, we can have 141 00:09:43,630 --> 00:09:46,060 the age versus the height and we have points. 142 00:09:46,180 --> 00:09:48,280 And those points are not infinite. 143 00:09:48,370 --> 00:09:53,440 So, for example, we have age number one, age number two, age number three, 144 00:09:53,440 --> 00:09:54,240 age number four. 145 00:09:54,880 --> 00:09:58,810 In this case, it's different because we have an infinity of ages. 146 00:09:59,020 --> 00:10:02,620 Even between the whole numbers, we have an infinity of ages. 147 00:10:02,620 --> 00:10:05,320 We have, for example, one point three four five years, 148 00:10:05,410 --> 00:10:08,020 one point, well, you guys get the point. 149 00:10:08,560 --> 00:10:10,360 So there is an infinity right here. 150 00:10:10,390 --> 00:10:17,200 So this is why we can see that there is, well, a line right here, a straight line, and here 151 00:10:17,230 --> 00:10:17,850 we have points. 152 00:10:17,870 --> 00:10:23,060 So that's the main difference between the discrete and continuous types of data. 153 00:10:23,980 --> 00:10:26,220 So we have the discrete type of data. 154 00:10:26,590 --> 00:10:34,210 As I said, it's not infinite. And the continuous type of data could be considered infinite 155 00:10:34,840 --> 00:10:37,390 in the way that, for example, after the decimal point, 156 00:10:37,390 --> 00:10:43,390 it could be, for example, the age of someone would be thirty point five four three 157 00:10:43,390 --> 00:10:44,780 two one years. 158 00:10:44,840 --> 00:10:46,510 Once again, that's just an example. 
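The tickets versus pounds example can be written down as a small Python sketch. The 20 dollars per ticket comes from the class; `PRICE_PER_POUND` and both function names are hypothetical and only there to make the contrast visible. The discrete quantity accepts whole numbers only, while the continuous one accepts any value in between.

```python
TICKET_PRICE = 20.0     # dollars per ticket, as in the class example
PRICE_PER_POUND = 4.0   # hypothetical price, chosen only for illustration


def ticket_cost(tickets: int) -> float:
    """Discrete quantity: only whole, non-negative ticket counts are valid."""
    if isinstance(tickets, bool) or not isinstance(tickets, int) or tickets < 0:
        raise ValueError("tickets must be a non-negative whole number")
    return tickets * TICKET_PRICE


def cost_by_weight(pounds: float) -> float:
    """Continuous quantity: any non-negative weight is valid, e.g. 1.3 pounds."""
    if pounds < 0:
        raise ValueError("pounds must be non-negative")
    return pounds * PRICE_PER_POUND


# Discrete: the possible costs are separate points (1 ticket, 2 tickets, ...).
for n in (1, 2, 3):
    print(n, "ticket(s):", ticket_cost(n))

# Continuous: every weight in between is possible, so the graph is a line.
for w in (1.0, 1.3, 1.345):
    print(w, "pound(s):", cost_by_weight(w))

# Buying 1.3 tickets is rejected, because the count is discrete.
try:
    ticket_cost(1.3)
except ValueError as err:
    print("rejected:", err)
```

On a graph, the first loop would give isolated points and the second would fall on a straight line, which matches the two pictures described in the class.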
159 00:10:47,740 --> 00:10:52,210 So this is for the what is data part. 160 00:10:52,210 --> 00:10:59,450 So you understand what exactly data is, how it works, and the different types of data. 161 00:10:59,470 --> 00:11:05,750 So in this case, we talked about the numerical or quantitative data as well as the categorical types of data. 162 00:11:06,220 --> 00:11:09,490 So I hope you guys understand all those concepts. 163 00:11:09,490 --> 00:11:14,940 As you can see, it's pretty simple until now. If there is anything that you guys don't understand, 164 00:11:15,070 --> 00:11:20,710 don't hesitate to simply rewatch this class, because once again, this is pretty simple. 165 00:11:20,710 --> 00:11:24,520 It's only the basics of what exactly data is. 166 00:11:25,030 --> 00:11:28,540 So that's it for this class, guys, and see you all in our next class.
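As a recap of the categorical types covered in this class, here is a minimal Python sketch of nominal versus ordinal data. The answer lists and the three-level `RATING_SCALE` are invented for the example; the class only mentions a scale going from bad to excellent.

```python
from collections import Counter

# Nominal data: labels with no order between them (pen, pencil, Sharpie).
writing_tool_answers = ["pen", "pencil", "pen", "Sharpie", "pen"]

# Counting the answers produces numbers, which is no longer nominal data.
tool_counts = Counter(writing_tool_answers)
most_common_tool = tool_counts.most_common(1)[0][0]

# Ordinal data: labels that do have an order, from worst to best.
RATING_SCALE = ["bad", "ok", "excellent"]  # hypothetical scale for the example
rating_answers = ["ok", "excellent", "bad", "ok", "excellent"]

# The position of each label on the scale gives it a rank we can compare.
rank = {label: position for position, label in enumerate(RATING_SCALE)}
sorted_ratings = sorted(rating_answers, key=lambda label: rank[label])

# Because the labels are ordered, a middle (median) answer makes sense.
median_rating = sorted_ratings[len(sorted_ratings) // 2]

print("pen answers:", tool_counts["pen"])
print("most common tool:", most_common_tool)
print("ratings in order:", sorted_ratings)
print("median rating:", median_rating)
```

Sorting or taking a median only works for the ratings. For the writing tools there is no meaningful order, so the most that can be done is to count how often each label appears.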