1 00:00:00,900 --> 00:00:07,440 One thing that I really like doing in my courses is to actually understand the why of everything. 2 00:00:08,070 --> 00:00:11,720 Everything that we learn, there should be a reason we're learning it, right? 3 00:00:11,940 --> 00:00:17,340 And you might be asking yourself, why do we even care about machine learning? How is that useful, and 4 00:00:17,640 --> 00:00:19,440 how did we get here? 5 00:00:19,440 --> 00:00:28,410 Well, if you think about a business, because most technology evolves from business needs, we had the advent 6 00:00:28,620 --> 00:00:36,300 of computers and the ability for businesses to use computers to do things really, really fast and efficiently 7 00:00:36,660 --> 00:00:38,360 so that they gain an edge. 8 00:00:38,550 --> 00:00:45,480 And then we got spreadsheets. Spreadsheets like Excel files and CSV files were amazing because we can 9 00:00:45,480 --> 00:00:53,010 store data that businesses generate, such as maybe customer data, in an Excel file, and then people got 10 00:00:53,010 --> 00:01:00,850 really, really good at analyzing these CSV files, these spreadsheets, to make business decisions. 11 00:01:01,200 --> 00:01:07,560 Maybe forecasting that December sales are going to be high because, well, the past two years we've had 12 00:01:07,570 --> 00:01:15,170 really high December sales because of Christmas. And then as companies got more and more data, we started 13 00:01:15,170 --> 00:01:22,010 getting this idea of relational databases. Spreadsheets were great, CSV files were great, but we started 14 00:01:22,010 --> 00:01:29,850 getting more and more information and data, and we needed a better way to organize things, to understand 15 00:01:29,850 --> 00:01:31,630 things from our data.
16 00:01:31,680 --> 00:01:39,080 That's when we got things like MySQL, which allowed us, instead of using spreadsheets, to use a 17 00:01:39,170 --> 00:01:46,190 language called SQL to read information from our database, write information to our database, but 18 00:01:46,310 --> 00:01:54,050 similar to spreadsheets, use the data that we gathered from the business to make business decisions so 19 00:01:54,050 --> 00:01:57,590 that our business becomes even more profitable. 20 00:01:57,740 --> 00:02:08,260 And then in 2000 we had this fancy term of big data. We had big companies like Facebook, Amazon, Twitter, 21 00:02:09,370 --> 00:02:17,290 Google that started accumulating more and more data, an insane amount of data that you simply couldn't 22 00:02:17,290 --> 00:02:19,260 contain in a spreadsheet. 23 00:02:19,480 --> 00:02:29,390 User actions, user likes, user purchasing histories. This idea of big data meant that we had so much data, 24 00:02:29,420 --> 00:02:37,280 these companies had so much data, and sometimes, unlike relational databases, which had to be a structured 25 00:02:37,340 --> 00:02:44,330 form of data, sometimes we got really messy, unstructured data, and that's where we started getting this 26 00:02:44,330 --> 00:02:52,070 idea of NoSQL, where things like MongoDB came into existence, where you can store unstructured 27 00:02:52,070 --> 00:02:56,620 data and hopefully make business decisions out of that. 28 00:02:56,900 --> 00:03:03,670 Maybe if you were Amazon, you can use customers' purchasing history to recommend different products.
29 00:03:04,550 --> 00:03:13,400 And ever since then, this idea of data, getting more and more data, has turned us toward using machine learning, 30 00:03:14,400 --> 00:03:21,750 because at some point we have so much data that as humans we can't just look, like we did at spreadsheets, 31 00:03:22,380 --> 00:03:29,450 and look at columns and rows and make business decisions. I mean, we still could, but then we'd be wasting 32 00:03:29,480 --> 00:03:32,690 all this data that we've been getting over the years. 33 00:03:32,690 --> 00:03:38,360 So companies like Facebook and Google that collect massive amounts of data every single day are turning 34 00:03:38,360 --> 00:03:44,090 to things like machine learning so that instead of humans looking at the data and trying to figure things 35 00:03:44,090 --> 00:03:50,540 out, we give this data to machines so that they're better able, even better than humans, to make business 36 00:03:50,630 --> 00:03:51,840 decisions. 37 00:03:51,890 --> 00:03:57,590 And this idea of machine learning really came to be because of this growth in data that we received 38 00:03:57,590 --> 00:04:06,110 from businesses, as well as the improvements in CPUs, GPUs, that is, graphics processing units, and 39 00:04:06,110 --> 00:04:07,630 computer advancements. 40 00:04:07,730 --> 00:04:15,500 So using the massive amounts of data and massive improvements in computation, we can use these machines, 41 00:04:15,650 --> 00:04:22,890 give them this big data, and have them make a decision for us, just like we used to with spreadsheets. 42 00:04:22,890 --> 00:04:31,290 Now, this is a simplified version of how we got here, but I hope it gives you a reason as to why businesses 43 00:04:31,770 --> 00:04:40,170 like this idea of machine learning. Now, in this course we're going to be using this framework, and don't 44 00:04:40,170 --> 00:04:40,720 worry. 45 00:04:40,800 --> 00:04:41,880 Don't get intimidated.
46 00:04:41,880 --> 00:04:46,460 You're gonna get really familiar with this framework because, well, we're going to talk about it a lot, 47 00:04:47,100 --> 00:04:49,870 but looking at this, just a brief overview, 48 00:04:49,980 --> 00:04:52,650 what do you think the hardest part is? 49 00:04:52,650 --> 00:04:56,630 Can you guess? It's this first bar right here. 50 00:04:57,480 --> 00:05:06,090 Grabbing the data. The amount of data is doubling every two years in our world. With the Internet, all 51 00:05:06,090 --> 00:05:08,440 the mobile phones and connected devices, 52 00:05:08,440 --> 00:05:14,410 we're creating more and more data, but this data doesn't mean anything unless we understand it. 53 00:05:14,490 --> 00:05:23,490 Yes, we are producing data, but a lot of this data that we generate is unused, and that's what data science 54 00:05:23,520 --> 00:05:24,260 is. 55 00:05:24,390 --> 00:05:32,040 How can we turn this massive quantity of data that is completely useless right now into something that 56 00:05:32,040 --> 00:05:36,740 is useful? And not all data is made equal, right? 57 00:05:36,740 --> 00:05:39,060 Some is noisy, some is messy. 58 00:05:39,110 --> 00:05:40,630 Where do we grab this data from? 59 00:05:40,630 --> 00:05:41,810 How do we find it? 60 00:05:41,810 --> 00:05:45,200 How do we clean it so we can actually learn from it? 61 00:05:45,230 --> 00:05:49,710 We need to understand what data is and then apply machine learning to it. 62 00:05:50,090 --> 00:05:57,110 And the industry is now evolving toward these people that we want to be, data scientists, that is, people 63 00:05:57,110 --> 00:06:02,210 that can turn data from useless to useful.