So now that we've talked a bit about machine learning, it's time to tackle a closely related topic and also a big part of this course: data science. Data science is another one of those terms that gets thrown around a lot, but what does it actually mean? At its heart, data science is about turning data into value.
But let's take a look at how it came to be. In the 1980s, IBM came out with its first commercial relational databases, storing things like customer details or company payroll data. And soon after, we started thinking about what useful things we could mine from the data, and the term data mining caught on. A paper called "From Data Mining to Knowledge Discovery in Databases", published in 1996, defines data mining as the application of specific algorithms for extracting patterns from data. Let's think about that: extracting patterns from data. We're going to see a lot more of that in the course.
Up until then, a lot of those patterns were being extracted with plain old statistics, but we're now in the 1990s, when computer science was progressing at crazy speeds. So a lot of people were looking into levelling up data mining with computer science, and they used the term data science to refer to this new field. As the dot-com era exploded, more and more data was being generated.
And these days, with every click, every scroll, every time you open an app, huge amounts of data are being generated and collected. Why? Business intelligence.
If you were a retailer, wouldn't it be cool if you could predict when somebody was pregnant and then market to them the things that you sell to new parents? Well, this is exactly what Target did. They collected data on people's purchases and then, based on what they bought, they could use data science to predict whether someone was pregnant. Or maybe you've seen the movie Moneyball, the story of how a poor baseball team used data science to pick undervalued players, which changed their fate from a losing team into one that won 20 consecutive games, at the time the longest winning streak in American League history. How did they do it? Data science.
Data science is a broad term. So broad that the Journal of Data Science defines it as almost everything that has something to do with data. So most people think of data science as taking huge amounts of data, or tapping into big data, and performing machine learning or using AI to get important insights. But data science is actually much more than that.
Just as humans need food and water before they can start thinking about shelter or the meaning of life, in the data science hierarchy of needs a company first needs to be able to collect and store its data before it can even start thinking about getting meaning out of it. And in a large company, that's usually done by data engineers. The role of the data scientist is to then take this data, clean it, explore it, visualize it, and then apply intelligent algorithms to start answering questions with the data.
In the upcoming modules, we'll be embarking on projects that traverse all the layers that are the responsibility of the data scientist and the machine learning expert. In every module, we'll try to solve a real-world problem by first understanding how to clean, segment and visualize raw data. We can then learn to apply shallow and deep learning algorithms to extract meaning from the data, algorithms like linear regression, Bayes classification and deep learning with neural networks. Also, we're going to learn how to use tools like TensorFlow, Python, Keras, pandas and NumPy to solve our data problems, problems like movie revenue prediction, spam classification and image recognition.
There's a lot packed into this course. So what are you waiting for? Let's get started.