1
00:00:01,260 --> 00:00:04,470
Machine learning projects can cover many different topics.

2
00:00:04,530 --> 00:00:09,930
It's important to design a framework you can use to approach different kinds of problems.

3
00:00:10,020 --> 00:00:15,390
You can consider what we're about to go through as a little field guide that you can use for machine

4
00:00:15,390 --> 00:00:16,200
learning.

5
00:00:16,200 --> 00:00:20,470
So when you come up against a problem, you can refer back to this field guide and go:

6
00:00:20,580 --> 00:00:21,070
"Hold on.

7
00:00:21,090 --> 00:00:23,790
I need to break this problem down into a few little steps.

8
00:00:23,790 --> 00:00:30,390
What does the field guide say?" The framework we're going to be using comprises six steps.

9
00:00:30,390 --> 00:00:35,820
After working on many machine learning projects across multiple different industries, these are the steps

10
00:00:35,880 --> 00:00:39,730
I've found come up time and time again.

11
00:00:39,900 --> 00:00:44,220
We're going to see this diagram a lot in the next few lectures, but in this one we're

12
00:00:44,220 --> 00:00:50,750
gonna dive into each of these steps individually and see what kind of components they have.

13
00:00:50,770 --> 00:00:53,090
Step one is problem definition.

14
00:00:53,350 --> 00:00:59,230
Since we'll be focused on code-first, practical solutions, it's important to define what problem we're

15
00:00:59,230 --> 00:01:00,520
trying to solve.

16
00:01:00,550 --> 00:01:03,400
Is it a supervised or unsupervised learning problem?

17
00:01:03,430 --> 00:01:06,540
Is it a classification or regression problem?

18
00:01:06,580 --> 00:01:11,730
Don't worry, we'll see how to figure these out in the next few lectures.

19
00:01:11,750 --> 00:01:19,160
Step two is data, since machine learning involves using algorithms to find and learn different patterns

20
00:01:19,160 --> 00:01:20,180
in data.
21
00:01:20,180 --> 00:01:24,480
Data is a requirement for any machine learning project.

22
00:01:24,550 --> 00:01:29,080
The question we're trying to answer here in step two is: what kind of data do we have?

23
00:01:29,150 --> 00:01:36,110
Depending on the problem, there are different kinds of data: structured data, such as rows and columns, or

24
00:01:36,230 --> 00:01:43,410
what you'd expect to find in an Excel spreadsheet, or unstructured data, such as images or audio.

25
00:01:43,430 --> 00:01:50,300
Once we know what kind of data we have, we can start to make decisions on how to use machine learning

26
00:01:50,300 --> 00:01:58,340
with it. Step three is evaluation. Here we'll define what success means to us.

27
00:01:58,490 --> 00:02:04,580
Since much of machine learning is actually experimental, you could keep going

28
00:02:04,580 --> 00:02:08,810
forever trying to improve your results in search of the perfect model.

29
00:02:09,560 --> 00:02:15,530
However, since we are practitioners, we know the perfect model doesn't exist.

30
00:02:15,530 --> 00:02:22,430
Instead, we begin by saying: for this machine learning real estate project to be feasible, we need at least

31
00:02:22,520 --> 00:02:27,620
a model that's 95 percent accurate at predicting the cost of houses.

32
00:02:27,620 --> 00:02:34,420
Of course, in the beginning this evaluation metric won't be exact and will likely change over time.

33
00:02:34,460 --> 00:02:42,260
But having this at the start of a project gives us something to aim for. Step four is features.

34
00:02:42,340 --> 00:02:46,410
The question we answer here is: what do we already know about the data?

35
00:02:46,410 --> 00:02:51,360
Now, even within different types of data, there are different kinds of features.

36
00:02:51,490 --> 00:02:57,850
For example, for predicting whether or not someone has heart disease, you might use their body weight

37
00:02:57,880 --> 00:03:01,090
as a feature. Body weight is a number.
38
00:03:01,390 --> 00:03:08,050
It's called a numerical feature. And after talking to a doctor, they might tell you that if someone's body

39
00:03:08,050 --> 00:03:10,230
weight is over a certain number,

40
00:03:10,300 --> 00:03:14,030
they're more likely to have heart disease.

41
00:03:14,170 --> 00:03:18,550
There are more kinds of features, such as categorical and derived.

42
00:03:18,550 --> 00:03:20,590
We're going to look at these in future lessons.

43
00:03:20,830 --> 00:03:28,030
But the premise remains: a machine learning algorithm's goal is to turn these features, such as weight,

44
00:03:28,240 --> 00:03:36,130
sex, blood pressure and chest pain, into patterns to make predictions, such as whether or not a patient

45
00:03:36,240 --> 00:03:43,770
(we've got unique patient IDs here) has heart disease. Step five is modelling.

46
00:03:43,890 --> 00:03:48,670
Once you've learned a little bit about your data, the next step is to model it.

47
00:03:48,980 --> 00:03:55,240
The question here is: based on our problem and data, what machine learning model should we use?

48
00:03:55,320 --> 00:03:59,970
Unlike other algorithms and sets of instructions you have to write from scratch,

49
00:04:00,000 --> 00:04:05,760
many of the most useful machine learning algorithms have already been coded for you, which is beautiful

50
00:04:05,760 --> 00:04:06,960
for us.

51
00:04:06,960 --> 00:04:12,900
Some models work better on different problems than others, and in the beginning your focus will be to figure

52
00:04:12,900 --> 00:04:17,080
out the right model for the right kind of problem.

53
00:04:17,400 --> 00:04:19,660
Step six is experimentation.

54
00:04:19,890 --> 00:04:23,700
All of the steps we've just been through happen in a cycle.
55
00:04:23,700 --> 00:04:29,310
You might start out with one problem definition and find your data isn't suited to it. Then you might

56
00:04:29,310 --> 00:04:34,680
build a model and find it doesn't work as well as you outlined in your evaluation metric.

57
00:04:35,340 --> 00:04:40,690
So you build another one and you find out this one actually works pretty well.

58
00:04:40,730 --> 00:04:45,480
What's important to remember is that although these steps are here, the steps that we've been through in

59
00:04:45,480 --> 00:04:51,400
this framework, it doesn't mean that they have to be followed in order, nor are they set in stone.

60
00:04:51,420 --> 00:04:55,780
Consider them a rough guide. Now we've been through each of them briefly,

61
00:04:55,960 --> 00:04:58,350
let's look at each one in a little bit more detail.
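
[Editor's sketch, not part of the spoken lecture.] The evaluate-then-experiment cycle described in steps three to six could look roughly like the snippet below. Everything in it is an assumption made for illustration: the toy patient records are invented, the one-rule "model" on body weight is a stand-in for a real algorithm, and names such as `make_threshold_model` and `run_experiments` are hypothetical. The 0.95 target borrows the 95 percent figure from the real estate example. A real project would also score models on held-out data rather than on the same records used to choose them.

```python
# Toy records: (body weight in kg, has heart disease).
# These values are invented purely for illustration.
patients = [
    (62, False), (70, False), (75, False), (81, False),
    (88, True), (93, True), (101, True), (110, True),
]

# Evaluation: decide up front what success means.
TARGET_ACCURACY = 0.95


def make_threshold_model(cutoff_kg):
    """Modelling: a hypothetical one-rule classifier on one numerical feature."""
    def predict(weight_kg):
        return weight_kg > cutoff_kg
    return predict


def accuracy(model, records):
    """Fraction of records where the prediction matches the label."""
    correct = sum(model(weight) == label for weight, label in records)
    return correct / len(records)


def run_experiments(cutoffs, records, target):
    """Experimentation: try candidates in turn, stop once one meets the target."""
    results = []
    for cutoff in cutoffs:
        score = accuracy(make_threshold_model(cutoff), records)
        results.append((cutoff, score))
        if score >= target:
            break
    return results


results = run_experiments([65, 100, 85], patients, TARGET_ACCURACY)
for cutoff, score in results:
    print(f"cutoff={cutoff} kg  accuracy={score:.3f}")
```

On this toy data the first two candidates fall short of the target and the third meets it, which mirrors the "build another one" loop in step six.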