1
00:00:00,480 --> 00:00:07,080
OK, so in the last section we saw how to choose an estimator, or machine learning model, for the problem

2
00:00:07,080 --> 00:00:08,400
that we're working on.

3
00:00:08,490 --> 00:00:09,720
Do we have a regression problem?

4
00:00:09,720 --> 00:00:11,700
Do we have a classification problem?

5
00:00:11,700 --> 00:00:16,470
And we followed this flowchart through and tried out a bunch of different models, such as LinearSVC

6
00:00:16,800 --> 00:00:22,590
and ensemble classifiers, such as the random forest classifier.

7
00:00:22,680 --> 00:00:26,470
That's done. And now in this section we're going to have a look at... actually,

8
00:00:26,580 --> 00:00:34,530
let's revisit our "what we're covering" list. So we've ticked off this, the scikit-learn workflow. Yep. We've

9
00:00:34,620 --> 00:00:40,080
seen getting the data ready. Yep. We've seen choosing the right estimator/algorithm for our problems.

10
00:00:40,080 --> 00:00:40,980
Aha.

11
00:00:40,980 --> 00:00:46,930
Now we're up to here, three: fit the model/algorithm and use it to make predictions on our data.

12
00:00:46,930 --> 00:00:51,410
Now, we've kind of actually seen this. Well, we've seen the first half: we've seen fitting the model.

13
00:00:51,860 --> 00:00:53,310
Well, we're going to have a look at both.

14
00:00:53,310 --> 00:00:57,210
We're going to look at fitting the model and then using it to make predictions on our data.

15
00:00:57,240 --> 00:01:09,060
So what we might do is go: three is fit the model/algorithm on our data and use it to make predictions.

16
00:01:10,190 --> 00:01:10,660
Wonderful.

17
00:01:10,670 --> 00:01:17,750
And then under here we'll put section 3.1, which is fitting the model to the data.

18
00:01:17,750 --> 00:01:22,970
Now, we've kind of got some code that will help us out for this already, and again,

19
00:01:23,030 --> 00:01:27,620
I'm not a fan of copying and pasting code, but in the interest of time, because otherwise we'll just literally

20
00:01:27,620 --> 00:01:29,960
be retyping everything in this cell,

21
00:01:32,970 --> 00:01:34,410
but I trust you might have already done that.

22
00:01:34,940 --> 00:01:36,780
So if not, you can retype it out.

23
00:01:36,780 --> 00:01:38,210
I'm going to copy that in here.

24
00:01:38,350 --> 00:01:41,270
And the main line we're concerned about here is the fit.

25
00:01:41,280 --> 00:01:45,660
So if we type in here: fit the model to the data.

26
00:01:45,660 --> 00:01:51,210
So let's re-step through what we've got here, because we didn't type it out. We broke our rule, but that's

27
00:01:51,210 --> 00:01:53,610
all right. from sklearn.ensemble

28
00:01:53,610 --> 00:02:00,180
import RandomForestClassifier. This is saying: hey, sklearn ensemble module, import the random forest

29
00:02:00,180 --> 00:02:01,560
classifier model.

30
00:02:01,560 --> 00:02:03,350
We've set up a random seed.

31
00:02:03,540 --> 00:02:05,670
We've split our data into X and y.

32
00:02:05,760 --> 00:02:13,170
So X is the feature variables, or the features, or the data. They're different names, so I might put in here,

33
00:02:13,690 --> 00:02:24,730
or actually maybe up here: X equals features, feature variables,

34
00:02:25,090 --> 00:02:25,570
data.

35
00:02:25,570 --> 00:02:33,970
They're just different names for X. And y equals labels, or targets, or target variables.

36
00:02:33,970 --> 00:02:42,720
So, different names for this. That's just so that when you're out and about and see different names for

37
00:02:42,720 --> 00:02:46,680
these two, you'll recognize them. X and y are common variable names that you'll see in Python,

38
00:02:46,800 --> 00:02:50,780
but these are the different names that they'll be referred to by, right?

39
00:02:50,820 --> 00:02:55,890
And so what is fit doing when it goes through here?

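The cell walked through above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual cell: the heart disease data isn't reproduced here, so a small synthetic dataset stands in for it. The two columns (age, cholesterol), the rule that generates the labels, the seed value 42, and the 80/20 split are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Set up a random seed so the results are reproducible (42 is an arbitrary choice)
np.random.seed(42)

# Synthetic stand-in for the heart disease data (hypothetical columns)
n_samples = 200
age = np.random.randint(30, 80, size=n_samples)
cholesterol = np.random.randint(150, 350, size=n_samples)

# X = features, feature variables, data
X = np.column_stack([age, cholesterol])

# y = labels, targets, target variables
# (made-up rule for illustration: 1 when both values are high, otherwise 0)
y = ((age > 55) & (cholesterol > 240)).astype(int)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Instantiate the model and fit it to the training data
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
```

The line that matters here is the last one: `clf.fit(X_train, y_train)` receives the training features together with their labels.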
40
00:02:55,890 --> 00:02:58,410
So let's have a look at X, with a visual here.

41
00:02:58,410 --> 00:02:59,850
Right.

42
00:02:59,850 --> 00:03:01,620
Not getting stuck in the theory of things.

43
00:03:01,620 --> 00:03:05,940
So y.head(). This is X and y, and what are we passing in? Well, we're passing it

44
00:03:05,940 --> 00:03:10,220
the training data, training X and training y, to our classifier.

45
00:03:10,230 --> 00:03:11,310
So this is the fit function.

46
00:03:11,310 --> 00:03:16,340
This is what we're concerned about: fitting the model to the data. What does fit actually do?

47
00:03:16,590 --> 00:03:23,360
Well, when we pass X and y to fit, it will cause the model to go through all of the examples in X,

48
00:03:23,450 --> 00:03:30,810
the training data in our case, and see what their corresponding y label is, and try to figure out the

49
00:03:30,810 --> 00:03:34,520
patterns in the different combinations of numbers here:

50
00:03:34,650 --> 00:03:40,810
what leads to having a one as a label or a zero as a label. In the case of the bottom end of this,

51
00:03:40,830 --> 00:03:46,020
y.tail(), yes, a zero as a label. That is fit in a nutshell, right?

52
00:03:46,110 --> 00:03:52,260
And how the machine learning model, in our case a random forest classifier, does this is going

53
00:03:52,260 --> 00:03:57,300
to be different depending on the model you use, and explaining the detail...

54
00:03:57,300 --> 00:04:03,450
So remember, all of these green ones here are different machine learning models, and explaining the details

55
00:04:03,450 --> 00:04:09,410
of each and every one would require an entire textbook or a much longer video course.

56
00:04:09,480 --> 00:04:09,960
Right.

57
00:04:10,140 --> 00:04:14,700
For now, to become a practitioner, to start applying machine learning code,

58
00:04:14,700 --> 00:04:21,270
the most important thing you can remember is that it's kind of similar to how you would figure out patterns

59
00:04:21,300 --> 00:04:23,330
if you had enough time.

60
00:04:23,520 --> 00:04:29,580
You'd look at the feature variables in X: the age, the sex, the cholesterol. I'm not sure actually what

61
00:04:29,580 --> 00:04:30,490
the rest of these are.

62
00:04:30,480 --> 00:04:35,700
They're just different health attributes, and you'd see which values of these lead to different

63
00:04:35,700 --> 00:04:41,020
values in y: one for heart disease or zero for no heart disease.

64
00:04:41,070 --> 00:04:46,590
This concept, regardless of the problem, is similar throughout all of machine learning.

65
00:04:46,590 --> 00:04:50,510
So during training, which is what the fit function does, "training" is another term for fit,

66
00:04:51,280 --> 00:04:53,100
so we want "training".

67
00:04:53,490 --> 00:04:53,840
Okay.

68
00:04:53,850 --> 00:05:00,780
That's why we're using "training": training a machine learning model.

69
00:05:00,780 --> 00:05:06,300
The model is going to find patterns here, and then during testing here,

70
00:05:06,510 --> 00:05:12,180
or when a machine learning model is in production, it's going to use the patterns.

71
00:05:12,180 --> 00:05:22,100
So: use the patterns the model has learned. That is the crux of all of machine learning, right?

72
00:05:22,250 --> 00:05:27,380
The way we become practitioners of this is starting off by first diagnosing what problem we're working

73
00:05:27,380 --> 00:05:30,060
with and then coming in and finding a model.

74
00:05:30,110 --> 00:05:35,540
Now, you can dive in as deep as you want to actually figure out the nuts and bolts of what a random forest

75
00:05:35,540 --> 00:05:38,210
classifier does or what a random forest regressor does.

76
00:05:38,210 --> 00:05:44,430
That may be a bit of an extension we work on in the future, but for now that's the crux of it, right?

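To make the training versus testing idea concrete, here is a small hedged sketch. The data is synthetic, and the rule (label 1 when a hypothetical cholesterol value is above 240) is invented purely for illustration. The point is only that `fit` sees the features together with their labels, while `score` checks the learned patterns on examples held back from training.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Arbitrary seed so the sketch is reproducible
np.random.seed(42)

# One hypothetical feature with a simple hidden pattern behind the labels
cholesterol = np.random.randint(150, 350, size=(300, 1))
labels = (cholesterol[:, 0] > 240).astype(int)

# Hold back 20% of the examples so the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(
    cholesterol, labels, test_size=0.2
)

# Training: the model looks at X_train alongside y_train and finds patterns
model = RandomForestClassifier(n_estimators=50)
model.fit(X_train, y_train)

# Testing: the model uses the patterns it has learned on unseen examples
test_accuracy = model.score(X_test, y_test)
print(f"Accuracy on unseen data: {test_accuracy:.2f}")
```

Because the invented rule is so simple, the model should recover it almost perfectly. Real data, like the heart disease features, is noisier, so the accuracy there would usually be lower.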
77
00:05:44,450 --> 00:05:49,340
If you want to figure out what a machine learning model does: it takes some data,

78
00:05:49,340 --> 00:05:54,710
it finds patterns, and then, in supervised learning, it figures out how those patterns

79
00:05:54,710 --> 00:05:56,540
relate to labels.

80
00:05:56,540 --> 00:05:57,740
That's it in a nutshell.

81
00:05:57,740 --> 00:06:02,420
If you wanted to, you can dive as deep as you like into each of these models. We're not focusing on that for

82
00:06:02,420 --> 00:06:02,980
this course.

83
00:06:03,000 --> 00:06:08,030
Remember, our focus is to write machine learning code rather than dive into the theory. If you like, I'll

84
00:06:08,030 --> 00:06:12,980
leave some extra resources on some deep dives into the random forest model, which is the one we've been

85
00:06:12,980 --> 00:06:13,680
using.

86
00:06:13,790 --> 00:06:20,240
But for now we've finally seen, or we've heard, what the crux of machine learning is: finding patterns

87
00:06:20,240 --> 00:06:21,010
in data.

88
00:06:21,170 --> 00:06:31,070
We're going to see how we can, this is 3.2, make predictions using a machine learning model.

89
00:06:31,070 --> 00:06:38,060
So now that our model has learned patterns in the data by calling the fit function, how can we use

90
00:06:38,060 --> 00:06:41,000
what it's learned to make some predictions on data

91
00:06:41,010 --> 00:06:44,140
the model hasn't seen? We'll have a look in the next video.