1
00:00:01,910 --> 00:00:02,390
Welcome back.

2
00:00:02,990 --> 00:00:09,020
I want to take a quick section to cover a topic that is often forgotten in data science and machine

3
00:00:09,020 --> 00:00:10,470
learning courses.

4
00:00:10,550 --> 00:00:17,240
However, in order to truly understand a topic, we need to understand the high-level principles and the

5
00:00:17,240 --> 00:00:19,910
landscape of that topic.

6
00:00:19,910 --> 00:00:24,990
In this case, we want to work as a data scientist or a machine learning engineer,

7
00:00:25,220 --> 00:00:33,440
so we need to at least understand the topic of data engineering. But what does that mean?

8
00:00:34,790 --> 00:00:36,800
What's with all these titles?

9
00:00:36,800 --> 00:00:40,770
Data scientist, data analyst, data visualization.

10
00:00:40,790 --> 00:00:41,870
This is madness.

11
00:00:41,870 --> 00:00:42,980
This is so confusing.

12
00:00:42,980 --> 00:00:49,820
Well, let's talk about data engineering and how it fits into this whole ecosystem that we've been talking

13
00:00:49,820 --> 00:00:51,400
about up until now.

14
00:00:51,410 --> 00:00:57,170
Now, we learned that data science is all about using data to make business decisions, right?

15
00:00:57,200 --> 00:01:04,900
One of the ways that we can make business decisions is to use a technique like machine learning that

16
00:01:04,900 --> 00:01:11,440
allows a computer to learn and figure out the solution to a problem that may be a little too complicated

17
00:01:11,950 --> 00:01:20,160
for a human to solve, or maybe too tedious and time-consuming, so we want to automate it.

18
00:01:20,160 --> 00:01:27,780
Machine learning is a technique, and data science is the idea of taking data and converting it into something

19
00:01:27,870 --> 00:01:37,430
useful for a product or business. Data analysis is a subset of data science that allows us to analyze

20
00:01:37,580 --> 00:01:39,580
the data that we have.

21
00:01:39,680 --> 00:01:45,260
But the missing piece is data engineering, and not all companies have data engineers.

22
00:01:45,380 --> 00:01:49,930
But as companies get bigger and bigger, they need to hire for this role.

23
00:01:49,940 --> 00:01:51,690
Let's take a look at why.

24
00:01:51,800 --> 00:01:55,460
Who brings all this data to us in a nice little file?

25
00:01:55,730 --> 00:01:57,670
Where's this data coming from?

26
00:01:57,680 --> 00:01:58,550
How do we manage it?

27
00:01:58,550 --> 00:02:00,440
Do we throw it out after we're done?

28
00:02:00,440 --> 00:02:02,530
Who labeled this data?

29
00:02:02,730 --> 00:02:07,460
You see, a data scientist or a machine learning expert doesn't need to concern themselves

30
00:02:07,470 --> 00:02:16,710
with these topics most of the time. Ideally, the data is accessible to you because this part of

31
00:02:16,800 --> 00:02:19,980
the data work is handled by a data engineer.

32
00:02:19,980 --> 00:02:22,330
So who is this mythical figure?

33
00:02:22,350 --> 00:02:30,060
Well, a data engineer would take all of the incoming data points, let's say for a company.

34
00:02:30,160 --> 00:02:35,880
Let's say this company has all these products, and all this data is coming from their users,

35
00:02:35,880 --> 00:02:43,470
their security cameras, their website, and IoT devices, and a data engineer takes all this information

36
00:02:44,680 --> 00:02:52,450
and then processes it and maintains it in databases or certain types of computers so that the business

37
00:02:52,510 --> 00:02:59,700
has access to this data in an organized fashion. So they're kind of like librarians: they collect

38
00:02:59,730 --> 00:03:05,850
all this information and organize it for us so that people like machine learning or data science

39
00:03:05,850 --> 00:03:07,670
experts can use it.

40
00:03:08,310 --> 00:03:15,900
But this is a simplified version of what a data engineer does. In order to really understand this,

41
00:03:15,990 --> 00:03:22,650
we need to go back to the beginning and ask the question: what is data? I'll see you in the next one.

42
00:03:22,920 --> 00:03:23,190
Bye.