1 00:00:00,640 --> 00:00:01,450 Welcome back. 2 00:00:02,020 --> 00:00:04,320 And I couldn't be more excited for this section. 3 00:00:04,360 --> 00:00:05,260 You know why? 4 00:00:05,260 --> 00:00:11,710 Because we're finally going to start getting hands-on with an end-to-end project, putting all the tools 5 00:00:11,710 --> 00:00:15,120 that we've learned so far, the processes, the frameworks and whatnot, 6 00:00:15,160 --> 00:00:16,510 we're gonna put them all together. 7 00:00:16,600 --> 00:00:21,400 We're gonna make something of it. We're gonna work on a project specifically with structured data, or 8 00:00:21,400 --> 00:00:23,050 a couple of projects, actually. 9 00:00:23,050 --> 00:00:27,250 And so you might be asking, what is structured data? 10 00:00:27,310 --> 00:00:33,890 Well, we've actually been hands-on with structured data throughout what we've learned so far. Basically, 11 00:00:34,010 --> 00:00:39,610 structured data is whatever you can fit into this sort of structure. What structure is this? 12 00:00:39,680 --> 00:00:46,670 Well, this is rows and columns, something like a pandas DataFrame, or something like an Excel spreadsheet, 13 00:00:46,670 --> 00:00:49,070 or maybe a Google Sheets document. 14 00:00:49,070 --> 00:00:55,130 And so with our heart disease dataset that we've been working with as kind of a demo, on the left-hand 15 00:00:55,130 --> 00:01:00,020 side you could have something like patient ID, and in the middle you might have some feature variables, 16 00:01:00,500 --> 00:01:07,610 such as details about the patient: their weight, their sex, heart rate, chest pain. And then on the right- 17 00:01:07,610 --> 00:01:10,480 hand side, or somewhere in the dataset, 18 00:01:10,490 --> 00:01:12,080 you may have a target variable.
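To make the rows-and-columns idea concrete, here is a minimal sketch of a table shaped like the one described: a patient ID column, a few feature columns, and a target column. The column names and every value are made up for illustration and are not taken from the real heart disease dataset.

```python
import pandas as pd

# Hypothetical rows for illustration only; not real patient data.
heart = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],      # identifier, not used for learning
    "weight_kg": [80, 65, 92, 70],   # feature variable
    "sex": ["M", "F", "M", "F"],     # feature variable
    "heart_rate": [72, 85, 90, 66],  # feature variable
    "chest_pain": [0, 2, 3, 1],      # feature variable
    "target": [0, 1, 1, 0],          # the thing we want to predict
})

# Split the table into feature variables (X) and the target variable (y).
X = heart.drop(columns=["patient_id", "target"])
y = heart["target"]

print(X.shape, y.shape)
```

Each row is one patient and each column is one piece of information about them, which is what makes the data "structured".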
19 00:01:12,230 --> 00:01:16,970 That's the thing you want to predict, and that's in essence what we're trying to do with machine learning: we're 20 00:01:16,970 --> 00:01:23,540 trying to learn some patterns in feature variables and how those patterns relate to a target variable. 21 00:01:23,750 --> 00:01:28,820 And that's what we're gonna do throughout this section: work on structured data projects which take 22 00:01:28,820 --> 00:01:34,850 this sort of approach, getting a machine learning model to learn patterns within feature variables and 23 00:01:34,850 --> 00:01:40,750 then using those patterns to predict some target variable. Now, how are we going to do that? 24 00:01:41,200 --> 00:01:44,080 Well, we're going to put this bad boy into play. 25 00:01:44,140 --> 00:01:46,170 We've seen it a few times already. 26 00:01:46,180 --> 00:01:48,130 We'll go through it step by step. 27 00:01:48,130 --> 00:01:51,520 The data modeling section of a machine learning project. 28 00:01:51,520 --> 00:01:53,800 So this is where we already have some data. 29 00:01:53,980 --> 00:01:56,860 We're going to go through it. We're going to define what our problem is. 30 00:01:56,860 --> 00:01:58,300 What problem are we trying to solve? 31 00:01:58,300 --> 00:01:59,920 Is it classification? Is it regression? 32 00:01:59,920 --> 00:02:01,440 Is it something else? 33 00:02:01,480 --> 00:02:02,950 What data do we have? 34 00:02:02,950 --> 00:02:06,420 We'll look at the dataset. Where might that actually come from? 35 00:02:06,460 --> 00:02:07,840 What defines success for us? 36 00:02:07,840 --> 00:02:12,700 So if we're working through a machine learning project and we're working on a machine learning proof of 37 00:02:12,700 --> 00:02:18,370 concept, what level of success do we have to get to before we can say, okay, this may move from proof of 38 00:02:18,370 --> 00:02:22,820 concept into actually being deployed? What features should we model?
39 00:02:23,170 --> 00:02:27,820 So we have a look again back at the data: are there certain features of the data that should take 40 00:02:27,820 --> 00:02:29,500 more precedence over others? 41 00:02:29,530 --> 00:02:30,850 What kind of model should we use? 42 00:02:30,850 --> 00:02:35,260 So we looked at this a little bit in scikit-learn, how to choose a different machine learning model 43 00:02:35,260 --> 00:02:39,580 for different problems. And then finally, what experiments could we try? 44 00:02:39,580 --> 00:02:41,070 So what have we tried? 45 00:02:41,080 --> 00:02:42,910 What else can we try? And the whole thing here: 46 00:02:42,910 --> 00:02:45,870 this is an iterative process. That's the main thing to remember. 47 00:02:46,030 --> 00:02:51,970 And then again, if we go here, we've got some tools that we can use for each step of the pipeline. So if 48 00:02:51,980 --> 00:02:59,360 this whole overall pipeline is data science, the tools we're going to focus on are pandas, Matplotlib, 49 00:02:59,360 --> 00:03:03,160 NumPy, scikit-learn and Jupyter, 50 00:03:03,320 --> 00:03:07,850 at least for the first project. We've got some green boxes highlighting the ones that we're going to be 51 00:03:07,850 --> 00:03:12,230 focused on, and we've already been hands-on with some of these and we've seen what they can do. 52 00:03:12,230 --> 00:03:18,380 But now it's time to combine them all in an overall project setting. And there's one more diagram we 53 00:03:18,380 --> 00:03:19,690 have to look at. 54 00:03:19,730 --> 00:03:23,660 Well, these are the steps we're going to take to set up a new project, the steps you could take with almost 55 00:03:23,660 --> 00:03:26,290 any machine learning project. 56 00:03:26,290 --> 00:03:30,470 And so far we've got our computer. We've downloaded and installed Miniconda. 57 00:03:30,590 --> 00:03:34,070 We've followed this, and this here is where we're up to now.
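The framework steps above (define the problem, get the data, pick a success measure, choose a model, experiment) can be sketched in a few lines with NumPy and scikit-learn. This is only an illustration under stated assumptions: the data is synthetic, the problem is framed as binary classification, accuracy stands in for the success measure, and `RandomForestClassifier` is one possible model choice rather than the one the project necessarily uses.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Data: synthetic feature variables (assumption, not the real dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))

# Problem: binary classification. The target depends on the first two features.
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 20% of the rows so the model is evaluated on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Modelling: learn patterns in the features that relate to the target.
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Evaluation: mean accuracy on the held-out rows.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Experimentation then means changing something (the features, the model, its settings) and running the evaluation again, which is why the process is iterative.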
58 00:03:34,100 --> 00:03:39,320 So this is if we have our computer and we have Conda or Miniconda: we're gonna start a new project 59 00:03:39,800 --> 00:03:44,560 and then create a project folder, and within that folder will live some data. 60 00:03:44,870 --> 00:03:48,630 We're going to create an environment similar to the one we've already been working in. 61 00:03:48,720 --> 00:03:53,840 And then within that environment we can create our workspace in the form of a Jupyter notebook. 62 00:03:54,260 --> 00:04:00,320 And then within that workspace we can perform data analysis and manipulation using tools like NumPy, 63 00:04:00,350 --> 00:04:05,630 pandas and Matplotlib, and then we'll start to look at how we can build or use machine learning models 64 00:04:05,630 --> 00:04:10,640 within scikit-learn. So scikit-learn has those pre-built machine learning models that we can use. 65 00:04:10,810 --> 00:04:14,370 We're going to see how we can apply those to our problem. 66 00:04:14,390 --> 00:04:17,520 So you might be wondering, of course, where can you get help? 67 00:04:17,670 --> 00:04:22,400 And the first step, because we're going to be working through this project together, is to follow 68 00:04:22,400 --> 00:04:23,900 along with the code if you can. 69 00:04:23,900 --> 00:04:28,790 If you fall behind, don't worry. It will be available to you through some sort of resource, through some 70 00:04:28,790 --> 00:04:32,620 sort of link in the place where you get those extra resources. 71 00:04:32,960 --> 00:04:36,740 If you are following along, try it for yourself, or if you're going along with it at a different 72 00:04:36,740 --> 00:04:37,490 time, 73 00:04:37,640 --> 00:04:39,710 make sure, if in doubt, run the code. Right? 74 00:04:39,710 --> 00:04:40,620 That's the motto here. 75 00:04:40,630 --> 00:04:44,900 If in doubt, run the code. Curious about what a function does?
76 00:04:44,900 --> 00:04:50,570 Remember, you can press Shift+Tab within the brackets of a function to read the docstring. That will 77 00:04:50,570 --> 00:04:55,490 give you a brief overview of what the function does. And then don't be afraid if these three steps don't 78 00:04:55,490 --> 00:04:56,050 work. 79 00:04:56,240 --> 00:04:58,270 Don't be afraid to search for it, right? 80 00:04:58,310 --> 00:05:03,650 You'll probably end up in places like Stack Overflow or the documentation. I've put both documentations 81 00:05:03,650 --> 00:05:07,580 here for scikit-learn and pandas, because we're gonna be using a few tools here. 82 00:05:07,700 --> 00:05:11,630 So you might even have to look up the documentation for NumPy or something like that, 83 00:05:11,630 --> 00:05:14,080 another tool that we're using, maybe even Matplotlib. 84 00:05:14,180 --> 00:05:17,150 But wherever you get stuck, don't be afraid to search for it. 85 00:05:17,190 --> 00:05:23,100 What separates a good data scientist from a poor one is that the poor one doesn't ask questions. 86 00:05:23,150 --> 00:05:28,550 So a good data scientist is just the one that keeps asking more and more questions. A good machine learning 87 00:05:28,550 --> 00:05:32,510 engineer is just the same as a normal machine learning engineer, 88 00:05:32,540 --> 00:05:35,890 but they ask more questions. And this is where the "try again" comes in. 89 00:05:35,890 --> 00:05:40,520 So once you've searched for it, once you've figured out something, once you've read something, don't be 90 00:05:40,520 --> 00:05:42,620 afraid to take it back to your notebook and try it.
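Shift+Tab is a notebook shortcut, but the docstring it shows can also be read with plain Python, which helps when working outside Jupyter. A minimal sketch follows; the `bmi` function is a hypothetical example written for this illustration and is not part of the course code.

```python
import inspect


def bmi(weight_kg, height_m):
    """Return body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


# inspect.getdoc returns the cleaned-up docstring as a string.
doc = inspect.getdoc(bmi)
print(doc)

# The same works for built-in functions; help(len) would print a fuller page.
print(inspect.getdoc(len))
```

In a notebook cell, typing `bmi?` and running the cell is another way to bring up the same text.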
91 00:05:43,100 --> 00:05:48,950 And then finally, as I said before, what separates a poor machine learning engineer from a good machine 92 00:05:48,950 --> 00:05:54,080 learning engineer is the questions that they ask. Never be afraid to ask a question. You might do 93 00:05:54,080 --> 00:05:59,350 this in the forum space or in the Stack Overflow section, wherever you see fit. 94 00:05:59,510 --> 00:06:02,810 So without any further ado, let's jump in. 95 00:06:02,810 --> 00:06:04,420 I'm really excited. I hope you are. 96 00:06:04,430 --> 00:06:08,990 Let's get our hands on our first end-to-end machine learning project.