1 00:00:00,650 --> 00:00:03,440 Step five: modeling, part two. 2 00:00:03,560 --> 00:00:07,330 Choosing. As mentioned in the last lesson, 3 00:00:07,340 --> 00:00:09,100 there were three parts to modeling: 4 00:00:09,170 --> 00:00:13,310 choosing a model, tuning a model and comparing models. 5 00:00:13,310 --> 00:00:20,990 Once you've got your data split into training, validation and test sets, you can start to go through each 6 00:00:20,990 --> 00:00:22,390 of these steps. 7 00:00:22,490 --> 00:00:28,700 In this lesson we're going to cover some points on choosing a model, which means you choose a model and 8 00:00:28,700 --> 00:00:30,280 train it on your training data. 9 00:00:31,100 --> 00:00:36,620 Rather than creating your own algorithms from scratch, there are many pre-built machine learning models which 10 00:00:36,710 --> 00:00:40,260 you can take advantage of when you first begin. 11 00:00:40,400 --> 00:00:46,430 Your main goal will be knowing what kind of machine learning algorithm to use with what kind of problem. 12 00:00:46,910 --> 00:00:52,560 This is because some algorithms work better than others on different types of data. 13 00:00:52,610 --> 00:00:56,630 We'll have a look at this specifically when we get hands-on with our projects. 14 00:00:56,780 --> 00:01:03,920 But for now, a tidbit to remember is: if you're working with structured data, decision tree models such as random 15 00:01:03,920 --> 00:01:11,060 forest and gradient boosting algorithms like CatBoost and XGBoost tend to work best, and if you're 16 00:01:11,060 --> 00:01:18,790 working with unstructured data, deep learning, neural networks and transfer learning tend to work best. 17 00:01:18,860 --> 00:01:22,140 Once you've chosen a model, your next step is to train it. 18 00:01:22,520 --> 00:01:27,680 The main goal here will be to line up the inputs and outputs. 
19 00:01:27,770 --> 00:01:34,520 For example, in our heart disease problem we want our model to look at the feature variables, the inputs, 20 00:01:35,090 --> 00:01:40,500 and then find the patterns and use them to predict the target variable. 21 00:01:40,500 --> 00:01:47,060 So remember from the previous lesson, these variables here are the feature variables. We want to use these 22 00:01:47,060 --> 00:01:49,580 to predict the target variable. 23 00:01:49,580 --> 00:01:58,600 Another common naming convention is to use X, which is the data, to predict y, which is the labels. Different 24 00:01:58,600 --> 00:02:01,450 machine learning algorithms have different ways of doing this. 25 00:02:01,900 --> 00:02:06,340 We'll see how to do it for a handful of useful ones in future projects. 26 00:02:06,580 --> 00:02:11,020 And remember, training a model takes place on the training data split. 27 00:02:11,050 --> 00:02:14,490 This is where your model learns the course material. 28 00:02:14,560 --> 00:02:19,780 We don't want to let our models cheat and see the final exam before they do their study. 29 00:02:19,900 --> 00:02:25,790 Depending on how much data you have and how complex your model is, training may take a while. 30 00:02:25,870 --> 00:02:32,710 One of your biggest goals when training a model is to minimize the time between experiments. 31 00:02:32,770 --> 00:02:38,420 So sometimes this will mean using a small portion of your data first. 32 00:02:38,590 --> 00:02:45,340 For example, if your training dataset had 100,000 examples, you might start training a model with only 33 00:02:45,340 --> 00:02:48,770 the first 10,000 and see how it goes. 34 00:02:48,790 --> 00:02:55,240 You might also decide to use a less complicated model to begin with. Deep models such as neural networks generally 35 00:02:55,360 --> 00:02:58,350 take longer to train than other kinds of models. 36 00:02:58,350 --> 00:03:02,590 Now, this is something worth considering when it comes to training your own models. 
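A minimal sketch of the "start small" idea described above, in plain Python. The synthetic data, the nearest-centroid model and every name here (`make_example`, `fit_centroids`, `predict`) are assumptions made purely for illustration. They are not from the course, and the hands-on projects would use real datasets and pre-built models instead.

```python
import random
import time

random.seed(42)  # fixed seed so the sketch is repeatable


def make_example():
    """Create one synthetic (features, label) pair; purely illustrative data."""
    label = random.randint(0, 1)
    # Two made-up numeric features whose mean depends on the label.
    features = [random.gauss(label * 2.0, 1.0), random.gauss(label * 2.0, 1.0)]
    return features, label


# Pretend these are the training and validation splits.
train = [make_example() for _ in range(100_000)]
valid = [make_example() for _ in range(1_000)]

X_train = [features for features, _ in train]  # X: the data (feature variables)
y_train = [label for _, label in train]        # y: the labels (target variable)
X_valid = [features for features, _ in valid]
y_valid = [label for _, label in valid]


def fit_centroids(X, y):
    """'Train' a simple nearest-centroid model: average the features per label."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Predict the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Start small: fit on only the first 10,000 training examples and time it.
X_small, y_small = X_train[:10_000], y_train[:10_000]

start = time.perf_counter()
small_model = fit_centroids(X_small, y_small)
small_seconds = time.perf_counter() - start

# See how it goes: check the quick model on the validation split.
correct = sum(predict(small_model, x) == y for x, y in zip(X_valid, y_valid))
accuracy = correct / len(y_valid)

print(f"Trained on {len(X_small)} examples in {small_seconds:.4f}s, "
      f"validation accuracy {accuracy:.3f}")
```

If the quick run looks promising, the same call can be repeated with all of `X_train` and `y_train`, or with a more complex model, at the cost of a longer wait between experiments.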
37 00:03:02,590 --> 00:03:09,730 For example, if an experiment takes you three hours, or even up to a couple of days, for a small percentage 38 00:03:09,730 --> 00:03:15,050 boost in the performance of your model, you might consider: is this experiment actually worth it? 39 00:03:15,280 --> 00:03:18,160 Because machine learning is highly iterative, 40 00:03:18,160 --> 00:03:25,090 we want to minimize this experimentation time so that we can go from step 1 to step 2 to step 3. 41 00:03:25,090 --> 00:03:32,930 But again, if this looks confusing, we'll see it in practice in the hands-on projects. And things to remember: 42 00:03:33,100 --> 00:03:36,280 Some models work better than others on different problems. 43 00:03:36,280 --> 00:03:38,730 Don't be afraid to try things, and this is really important. 44 00:03:38,730 --> 00:03:41,910 Machine learning is, as we said before, a highly iterative process. 45 00:03:41,920 --> 00:03:48,760 So some things that you try out may not work the first time, and that means you just set that thing aside 46 00:03:48,760 --> 00:03:51,240 going forward and try something else. 47 00:03:51,280 --> 00:03:54,910 Start small and build up. Add complexity as you need. 48 00:03:54,910 --> 00:03:59,620 What this means is that, for example, if you had one hundred thousand examples, we're going to start on 49 00:03:59,620 --> 00:04:04,870 ten thousand. We're going to use a simple model to begin with rather than going to the biggest 50 00:04:04,870 --> 00:04:10,690 and latest and greatest model, because what we're after is practical results, not something that's 51 00:04:10,700 --> 00:04:12,190 the best on paper. 52 00:04:12,190 --> 00:04:15,180 We want to see something that can actually be used in the real world. 53 00:04:16,030 --> 00:04:21,850 And part of your experiments will involve tuning a model which shows good initial results to get better 54 00:04:21,910 --> 00:04:22,660 results. 
55 00:04:22,660 --> 00:04:27,730 It's like tuning a car: if your car does well on one track, it might not do well on another track. 56 00:04:27,940 --> 00:04:29,950 So you tune it up. 57 00:04:29,950 --> 00:04:34,510 Let's have a look at how to do that, but instead of for cars we'll do it for machine learning models.