1
00:00:00,390 --> 00:00:01,170
Did you figure it out?

2
00:00:02,440 --> 00:00:02,990
It's okay

3
00:00:03,010 --> 00:00:03,410
if you didn't,

4
00:00:03,610 --> 00:00:06,040
because we're gonna see what we can do now.

5
00:00:07,020 --> 00:00:09,150
How do you predict with a regression model?

6
00:00:09,690 --> 00:00:17,100
Well, the great news is, and this really has to be applauded, the

7
00:00:17,100 --> 00:00:22,860
scikit-learn development team, the way they've designed the library is absolutely amazing.

8
00:00:22,980 --> 00:00:23,560
Right.

9
00:00:23,580 --> 00:00:32,610
So predict can also be used for regression models.

10
00:00:32,610 --> 00:00:37,110
So we could go back up and copy and paste our code, but what we're gonna do is practice a little bit more.

11
00:00:37,830 --> 00:00:43,460
From sklearn.ensemble we're going to import RandomForestRegressor,

12
00:00:44,490 --> 00:00:50,820
because random forest is our friend, and np.random.seed, because we want to make sure our results are

13
00:00:50,820 --> 00:00:52,560
reproducible.

14
00:00:52,560 --> 00:00:55,890
Then we're gonna go create the data.

15
00:00:56,070 --> 00:01:00,490
X equals boston_df.drop.

16
00:01:00,540 --> 00:01:06,380
Now if you want to remind yourself, boston_df looks like this.

17
00:01:07,330 --> 00:01:13,000
So remember, what we're trying to do is build a model that learns off these features to predict this target,

18
00:01:14,190 --> 00:01:15,010
so we go here.

19
00:01:15,040 --> 00:01:24,250
boston_df.drop, we want to remove the target, and we go axis equals 1. y equals

20
00:01:24,500 --> 00:01:25,530
boston_df.

21
00:01:25,930 --> 00:01:27,220
y is the target column.

22
00:01:27,250 --> 00:01:39,850
That's the labels, and then we're going to go split into training and test sets: X_test, X_train, y_train,

23
00:01:40,440 --> 00:01:47,950
y_test equals train_test_split.

24
00:01:48,580 --> 00:01:49,350
There we go.

25
00:01:49,550 --> 00:01:50,290
Wonderful.

26
00:01:50,300 --> 00:02:01,650
And now we're going to instantiate and fit the model, and go model equals RandomForestRegressor.

27
00:02:01,650 --> 00:02:04,260
And you know, we could even do this in one hit: dot fit.

28
00:02:04,260 --> 00:02:05,350
This is a pretty cool thing, right?

29
00:02:05,350 --> 00:02:06,210
It's called chaining.

30
00:02:08,040 --> 00:02:10,560
So we've just saved ourselves a line of code.

31
00:02:10,560 --> 00:02:12,830
This would usually be model.fit.

32
00:02:12,960 --> 00:02:14,320
Actually one word.

33
00:02:15,150 --> 00:02:21,480
And then what we can do is say this is going to fit the model. We've seen what fit does, it goes: hey, find

34
00:02:21,480 --> 00:02:25,220
the patterns in X_train and compare them to y_train and figure them out.

35
00:02:25,290 --> 00:02:26,940
Now this is gonna learn the patterns.

36
00:02:26,940 --> 00:02:28,220
Now we want to use the patterns.

37
00:02:28,230 --> 00:02:31,920
So we want to make some predictions. Make predictions.

38
00:02:31,920 --> 00:02:34,400
And this is where the predict function comes into play.

39
00:02:34,500 --> 00:02:40,160
y_preds equals model.predict(X_test).

40
00:02:40,220 --> 00:02:45,660
So this is saying: hey, make some predictions on the test dataset and save them to the predictions, or

41
00:02:45,660 --> 00:02:47,120
y_preds, variable.

42
00:02:47,360 --> 00:02:50,710
So let's do that. Oh, what's happened here?

43
00:02:51,930 --> 00:02:53,380
We got an error here.

44
00:02:53,380 --> 00:02:53,860
Check.

45
00:02:53,860 --> 00:02:58,970
We've probably got a typo. Number of labels, 404, does not match number of samples.

46
00:02:59,050 --> 00:02:59,680
What have we done?

47
00:03:01,260 --> 00:03:02,460
X_train, ah.

48
00:03:02,480 --> 00:03:08,750
This is what we mixed up. We've mixed up these train and test. See, you're gonna get errors. Even though

49
00:03:08,750 --> 00:03:14,200
we've typed this one line of code about 20 times in this notebook already, still making errors.

50
00:03:14,510 --> 00:03:15,160
Beautiful.

51
00:03:15,170 --> 00:03:21,020
And again we're getting that warning, n_estimators. I should really just upgrade my scikit-learn to

52
00:03:21,110 --> 00:03:23,940
0.22 so it removes that warning.

53
00:03:24,140 --> 00:03:25,160
Then we go here.

54
00:03:25,160 --> 00:03:25,730
Beautiful.

55
00:03:25,730 --> 00:03:29,020
So now we've made some predictions and they're stored in y_preds.

56
00:03:29,060 --> 00:03:31,480
So what does this look like? Oh, there's too many there.

57
00:03:31,490 --> 00:03:33,490
Let's just view the first 10.

58
00:03:33,620 --> 00:03:40,870
Now let's compare this to our test labels, and we want to put that in a NumPy array.

59
00:03:40,890 --> 00:03:45,410
So it just kind of looks a bit similar. Excellent.

60
00:03:45,430 --> 00:03:51,670
So this is what our regression model has predicted based on the X_test data that it's looked at, and

61
00:03:51,670 --> 00:03:52,660
this is the truth.

62
00:03:53,290 --> 00:03:55,070
So what we want to do is evaluate them.

63
00:03:55,070 --> 00:04:01,000
So how do you think you might evaluate a regression model trying to predict a number? What you might

64
00:04:01,000 --> 00:04:05,470
do is figure out how far away each prediction is.

65
00:04:05,470 --> 00:04:11,020
So see here, the first prediction is 23.002; the actual label is

66
00:04:11,020 --> 00:04:16,970
23.6. This one, the prediction is 30.826 and the actual label is 32.4.

67
00:04:17,050 --> 00:04:21,060
And so we could do that for each and every sample and maybe get the average.

68
00:04:21,130 --> 00:04:24,550
Well, that's an evaluation metric called mean absolute error.

69
00:04:24,910 --> 00:04:26,110
So that's what we can do.

70
00:04:26,260 --> 00:04:38,160
Compare the predictions to the truth. So we want to go from sklearn.metrics import mean_absolute_error.

71
00:04:38,470 --> 00:04:49,490
There we go. I say no typos and then I do a typo. Classic. mean_absolute_error, y_test.

72
00:04:49,490 --> 00:04:55,760
So this is saying, hey, exactly what we just said: let's go through each and every prediction,

73
00:04:55,940 --> 00:04:56,240
right,

74
00:04:56,240 --> 00:05:03,150
so y_preds, and compare them to the test labels, and then figure out what the difference is between them.

75
00:05:03,140 --> 00:05:09,800
So we do 23.6 minus 23.002, and then 32.4 minus

76
00:05:09,980 --> 00:05:16,310
30.826, et cetera, et cetera, across the entire dataset, and then we'll figure out what the difference

77
00:05:16,310 --> 00:05:20,940
is for each sample, and then we'll add them all up and then figure out the average.

78
00:05:20,960 --> 00:05:22,960
So that's what mean absolute error does.
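
[Editor's sketch] The calculation described above can be worked by hand on the two prediction/label pairs read out in the video. This is a minimal sketch, not scikit-learn's implementation: the helper name mae_by_hand and the two-element sample lists are assumptions made for illustration, and only the four numbers come from the transcript. Note that the differences are taken as absolute values, so over- and under-predictions do not cancel out.

```python
def mae_by_hand(y_true, y_pred):
    """Average of the absolute differences between labels and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    # |label - prediction| for every sample, summed, then divided by the count
    total = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return total / len(y_true)


# The two samples mentioned in the video (hypothetical variable names).
y_test_sample = [23.6, 32.4]
y_preds_sample = [23.002, 30.826]

mae = mae_by_hand(y_test_sample, y_preds_sample)
print(round(mae, 3))  # (0.598 + 1.574) / 2
```

On these two samples alone the result is about 1.086; the figure quoted in the video is computed over the whole test set, so it differs.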

79
00:05:23,000 --> 00:05:25,690
We'll do that. Boom.

80
00:05:25,720 --> 00:05:32,020
So what this is essentially saying is that on average, for every single prediction here, what we're trying

81
00:05:32,020 --> 00:05:38,170
to do, we've trained a model that on average predicts something that is 2.2,

82
00:05:38,310 --> 00:05:43,980
this is this figure here, or 2.1, 2.1 away from the target.

83
00:05:43,990 --> 00:05:51,270
So on average it might predict 22 or 26 or 23 or 19 et cetera et cetera et cetera.

84
00:05:51,280 --> 00:05:53,310
So that's not too bad, right?

85
00:05:53,350 --> 00:05:55,450
Two off, and it may be pretty bad, right,

86
00:05:55,450 --> 00:06:00,040
if you wanted to be really accurate. But this all depends on what kind of problem you're working with.

87
00:06:00,460 --> 00:06:04,840
What sort of error metric you allow, or what sort of evaluation metric you allow,

88
00:06:04,990 --> 00:06:07,330
depends on the problem you're working on.

89
00:06:07,510 --> 00:06:16,890
And speaking of evaluation metrics, I believe that's next in what we're covering: evaluating a model.

90
00:06:16,990 --> 00:06:20,950
We've kind of just touched on it a little bit here, but we're going to go a bit more in depth in the next

91
00:06:20,950 --> 00:06:21,790
section.

92
00:06:21,790 --> 00:06:26,950
So what we've seen in this section is fitting a model to some training dataset, a.k.a. finding patterns

93
00:06:26,950 --> 00:06:33,280
in data, finding patterns between X and y, and then using a trained model, using the patterns that it's learned,

94
00:06:33,520 --> 00:06:36,770
to make predictions on our data.

95
00:06:36,770 --> 00:06:37,130
All right.

96
00:06:37,460 --> 00:06:39,940
So take a little break, go back through what we've done.

97
00:06:40,100 --> 00:06:46,080
See if you can get a model to make some predictions on some data, and then in the next section we'll

98
00:06:46,080 --> 00:06:48,600
look at how we can evaluate our models.
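
[Editor's sketch] For anyone going back through the steps, the workflow from this video can be sketched end to end as below. Several things here are assumptions rather than what was typed on screen: the data is synthetic (generated with make_regression) instead of boston_df, so the block runs without the course dataset, and the seed value 42, test_size=0.2, and the sample and feature counts are illustrative choices. The train_test_split outputs are unpacked in the correct order (X_train, X_test, y_train, y_test), which is the fix for the mismatch error hit in the video.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Seed NumPy's global random state so the split and the forest are reproducible.
np.random.seed(42)

# Synthetic stand-in for the features (X) and target (y); not the Boston data.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# Order matters: X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Instantiate and fit in one chained call.
model = RandomForestRegressor().fit(X_train, y_train)

# Use the learned patterns to predict on unseen data.
y_preds = model.predict(X_test)

# Compare the first few predictions with the true labels.
print(y_preds[:10])
print(np.array(y_test[:10]))

# Average absolute distance between predictions and labels.
mae = mean_absolute_error(y_test, y_preds)
print(mae)
```

How large a mean absolute error is acceptable depends on the problem, as discussed above; the number printed here says nothing about the Boston housing model from the video.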