1 00:00:00,390 --> 00:00:01,170 Did you figure it out? 2 00:00:02,440 --> 00:00:02,990 It's okay 3 00:00:03,010 --> 00:00:03,410 if you didn't, 4 00:00:03,610 --> 00:00:06,040 because we're gonna see what we can do now. 5 00:00:07,020 --> 00:00:09,150 How do you predict with a regression model? 6 00:00:09,690 --> 00:00:17,100 Well, the great news is, and this really has to be applauded, the way the 7 00:00:17,100 --> 00:00:22,860 scikit-learn development team has designed the library is absolutely amazing. 8 00:00:22,980 --> 00:00:23,560 Right. 9 00:00:23,580 --> 00:00:32,610 So predict() can also be used for regression models. 10 00:00:32,610 --> 00:00:37,110 So we could go back up and copy and paste our code, but what we're gonna do is practice a little bit more. 11 00:00:37,830 --> 00:00:43,460 From sklearn.ensemble we're going to import RandomForestRegressor, 12 00:00:44,490 --> 00:00:50,820 because random forest is our friend, and np.random.seed, because we want to make sure our results are 13 00:00:50,820 --> 00:00:52,560 reproducible. 14 00:00:52,560 --> 00:00:55,890 Then we're gonna go create the data. 15 00:00:56,070 --> 00:01:00,490 X equals boston_df.drop. 16 00:01:00,540 --> 00:01:06,380 Now if you want to remind yourself, boston_df looks like this. 17 00:01:07,330 --> 00:01:13,000 So remember, what we're trying to do is build a model that learns from these features to predict this target, 18 00:01:14,190 --> 00:01:15,010 so we go here: 19 00:01:15,040 --> 00:01:24,250 boston_df.drop, we want to remove the target, and we set axis equals 1. y equals the boston_df 20 00:01:24,500 --> 00:01:25,530 target. 21 00:01:25,930 --> 00:01:27,220 y is the target column. 22 00:01:27,250 --> 00:01:39,850 That's the labels. And then we're going to go split into training and test sets: X_test, X_train, y_train, 23 00:01:40,440 --> 00:01:47,950 y_test equals train_test_split. 24 00:01:48,580 --> 00:01:49,350 There we go.
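The data setup and split described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the exact notebook code: the Boston housing loader has been removed from recent scikit-learn releases, so synthetic data from `make_regression` stands in for it, and the column names, the seed value 42, and `test_size=0.2` are all illustrative choices. Note that `train_test_split` returns its outputs in the order `X_train, X_test, y_train, y_test`.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Seed NumPy's global random state so the split is reproducible
np.random.seed(42)

# Assumption: synthetic stand-in for the Boston housing DataFrame
# (506 rows, 13 feature columns, one "target" column)
features, target = make_regression(
    n_samples=506, n_features=13, noise=10.0, random_state=42
)
boston_df = pd.DataFrame(features, columns=[f"feature_{i}" for i in range(13)])
boston_df["target"] = target

# Features: every column except the target
X = boston_df.drop("target", axis=1)
# Labels: the target column
y = boston_df["target"]

# Output order matters: X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Printing the shapes right after the split is a cheap way to confirm that the training features and training labels have the same number of rows before fitting anything.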
25 00:01:49,550 --> 00:01:50,290 Wonderful. 26 00:01:50,300 --> 00:02:01,650 And now we're going to instantiate and fit the model, and go model equals RandomForestRegressor. 27 00:02:01,650 --> 00:02:04,260 And you know, we could even do this in one hit: dot fit. 28 00:02:04,260 --> 00:02:05,350 This is a pretty cool thing, right? 29 00:02:05,350 --> 00:02:06,210 It's called chaining. 30 00:02:08,040 --> 00:02:10,560 So we've just saved ourselves a line of code. 31 00:02:10,560 --> 00:02:12,830 This would usually be model.fit. 32 00:02:12,960 --> 00:02:14,320 Actually one word. 33 00:02:15,150 --> 00:02:21,480 And then what we can do is say this is going to fit the model. We've seen what fit does; it goes, hey, find 34 00:02:21,480 --> 00:02:25,220 the patterns in X_train and compare them to y_train and figure them out. 35 00:02:25,290 --> 00:02:26,940 Now this is gonna learn the patterns. 36 00:02:26,940 --> 00:02:28,220 Now we want to use the patterns. 37 00:02:28,230 --> 00:02:31,920 So we want to make some predictions. Make predictions. 38 00:02:31,920 --> 00:02:34,400 And this is where the predict function comes into play. 39 00:02:34,500 --> 00:02:40,160 y_preds equals model.predict(X_test). 40 00:02:40,220 --> 00:02:45,660 So this is saying, hey, make some predictions on the test dataset and save them to the predictions, or 41 00:02:45,660 --> 00:02:47,120 y_preds, variable. 42 00:02:47,360 --> 00:02:50,710 So let's do that. Oh, what's happened here? 43 00:02:51,930 --> 00:02:53,380 We've got an error here. 44 00:02:53,380 --> 00:02:53,860 Check. 45 00:02:53,860 --> 00:02:58,970 We've probably got a typo. Number of labels, 404, does not match number of samples. 46 00:02:59,050 --> 00:02:59,680 What have we done? 47 00:03:01,260 --> 00:03:02,460 X_train, ah.
48 00:03:02,480 --> 00:03:08,750 This is what we mixed up, we've mixed up these train and test. See, you're gonna get errors, even though 49 00:03:08,750 --> 00:03:14,200 we've typed this one line of code about 20 times in this notebook already, still making errors. 50 00:03:14,510 --> 00:03:15,160 Beautiful. 51 00:03:15,170 --> 00:03:21,020 And again we're getting that warning about n_estimators. I should really just upgrade my scikit-learn 52 00:03:21,110 --> 00:03:23,940 to 0.22 so it removes that warning. 53 00:03:24,140 --> 00:03:25,160 Then we go here. 54 00:03:25,160 --> 00:03:25,730 Beautiful. 55 00:03:25,730 --> 00:03:29,020 So now we've made some predictions and they're stored in y_preds. 56 00:03:29,060 --> 00:03:31,480 So what does this look like? Oh, there's too many there. 57 00:03:31,490 --> 00:03:33,490 Let's just view the first 10. 58 00:03:33,620 --> 00:03:40,870 Now let's compare this to our test labels, and we want to put that in a NumPy array 59 00:03:40,890 --> 00:03:45,410 so it just kind of looks a bit similar. Excellent. 60 00:03:45,430 --> 00:03:51,670 So this is what our regression model has predicted based on the X_test data that it's looked at, and 61 00:03:51,670 --> 00:03:52,660 this is the truth. 62 00:03:53,290 --> 00:03:55,070 So what we want to do is evaluate them. 63 00:03:55,070 --> 00:04:01,000 So how do you think you might evaluate a regression model trying to predict a number? What you might 64 00:04:01,000 --> 00:04:05,470 do is figure out how far away each prediction is. 65 00:04:05,470 --> 00:04:11,020 So see here, the first prediction is 23.002, the actual label is 23.6; 66 00:04:11,020 --> 00:04:16,970 this one, the prediction is 30.826 and the actual label is 32.4. 67 00:04:17,050 --> 00:04:21,060 And so we could do that for each and every sample and maybe get the average.
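The idea of measuring how far each prediction is from its label and then averaging can be worked through by hand. This is a small sketch using only the two prediction/label pairs quoted above; the variable names are illustrative. It takes the absolute value of each difference, so that over-predictions and under-predictions do not cancel out, which is how mean absolute error is defined.

```python
# The two example pairs quoted in the walkthrough
y_preds = [23.002, 30.826]  # model predictions
y_true = [23.6, 32.4]       # actual labels

# Absolute difference between each label and its prediction
abs_errors = [abs(truth - pred) for truth, pred in zip(y_true, y_preds)]

# Average of the absolute differences = mean absolute error
mae = sum(abs_errors) / len(abs_errors)

print(abs_errors)
print(mae)
```

On these two samples the differences are roughly 0.598 and 1.574, so the average is roughly 1.086. The figure reported for the full test set will differ because it averages over every test sample.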
68 00:04:21,130 --> 00:04:24,550 Well, that's an evaluation metric called mean absolute error. 69 00:04:24,910 --> 00:04:26,110 So that's what we can do. 70 00:04:26,260 --> 00:04:38,160 Compare the predictions to the truth. So we want to go from sklearn.metrics import mean_absolute_error. 71 00:04:38,470 --> 00:04:49,490 Can we go there? I say no typos and then I do a typo. Classic. mean_absolute_error, y_test. 72 00:04:49,490 --> 00:04:55,760 So this is saying, hey, exactly what we just said: let's go through each and every prediction, 73 00:04:55,940 --> 00:04:56,240 right? 74 00:04:56,240 --> 00:05:03,150 So y_preds, and compare them to the test labels, and then figure out what the difference is between them. 75 00:05:03,140 --> 00:05:09,800 So we do 23.6 minus 23.002, and then 76 00:05:09,980 --> 00:05:16,310 32.4 minus 30.826, et cetera, et cetera, across the entire dataset, and then we'll figure out what the difference 77 00:05:16,310 --> 00:05:20,940 is for each sample, and then we'll add them all up and then figure out the average. 78 00:05:20,960 --> 00:05:22,960 So that's what mean absolute error does. 79 00:05:23,000 --> 00:05:25,690 We'll do that. Boom. 80 00:05:25,720 --> 00:05:32,020 So what this is essentially saying is that on average, for every single prediction here, what we're trying 81 00:05:32,020 --> 00:05:38,170 to do, we've trained a model that on average predicts something that is 2.2, 82 00:05:38,310 --> 00:05:43,980 this is this figure here, or 2.1, 2.1 away from the target. 83 00:05:43,990 --> 00:05:51,270 So on average it might predict 22 or 26 or 23 or 19, et cetera, et cetera, et cetera. 84 00:05:51,280 --> 00:05:53,310 So that's not too bad, right? 85 00:05:53,350 --> 00:05:55,450 Two off, and it may be pretty bad, right?
86 00:05:55,450 --> 00:06:00,040 If you wanted to be really accurate. But this all depends on what kind of problem you're working with. 87 00:06:00,460 --> 00:06:04,840 What sort of error metric you allow, or what sort of evaluation metric you allow, 88 00:06:04,990 --> 00:06:07,330 depends on the problem you're working on. 89 00:06:07,510 --> 00:06:16,890 And speaking of evaluation metrics, I believe that's next in what we're covering: evaluating a model. 90 00:06:16,990 --> 00:06:20,950 We've kind of just touched on it a little bit here, but we're going to go a bit more in depth in the next 91 00:06:20,950 --> 00:06:21,790 section. 92 00:06:21,790 --> 00:06:26,950 So what we've seen in this section is fitting a model to some training data set, a.k.a. finding patterns 93 00:06:26,950 --> 00:06:33,280 in data, finding patterns between X and y, and then using a trained model, using the patterns that it's learned, 94 00:06:33,520 --> 00:06:36,770 to make predictions on our data. 95 00:06:36,770 --> 00:06:37,130 All right. 96 00:06:37,460 --> 00:06:39,940 So take a little break, go back through what we've done. 97 00:06:40,100 --> 00:06:46,080 See if you can get a model to make some predictions on some data, and then in the next section we'll 98 00:06:46,080 --> 00:06:48,600 look at how we can evaluate our models.
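For going back through the steps, the whole flow of this section (fit, predict, compare, score) might look something like the sketch below. It is an illustration under assumptions rather than the notebook's exact code: synthetic data from `make_regression` stands in for the Boston housing data, and the seed, `test_size=0.2`, and `n_estimators=100` are illustrative choices. The error value it prints will therefore not match the figure quoted in the walkthrough.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Seed NumPy's global random state for reproducible results
np.random.seed(42)

# Assumption: synthetic regression data in place of the housing data
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Instantiate and fit in one chained call; fit() returns the fitted model
model = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)

# Use the learned patterns to predict on unseen data
y_preds = model.predict(X_test)

# Compare the first 10 predictions with the first 10 true labels
print(y_preds[:10])
print(np.array(y_test[:10]))

# Mean absolute error: average absolute gap between labels and predictions
mae = mean_absolute_error(y_test, y_preds)

# The same quantity computed by hand with NumPy
manual_mae = np.mean(np.abs(y_test - y_preds))

print(mae, manual_mae)
```

Computing the metric by hand next to the library call is only a sanity check that `mean_absolute_error` does what the explanation says; in practice the library call alone is enough.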