1
00:00:00,270 --> 00:00:01,620
All righty then.

2
00:00:01,620 --> 00:00:06,180
So now we've covered some classification metrics and by the way I've just put this little paragraph

3
00:00:06,180 --> 00:00:11,580
here so you can check out when to use different classification metrics, just to kind of break it

4
00:00:11,580 --> 00:00:13,020
down in five dot points for you.

5
00:00:13,800 --> 00:00:18,600
But now we're going to cover some regression model evaluation metrics and put your hand up if you're

6
00:00:18,600 --> 00:00:20,790
ready.

7
00:00:20,870 --> 00:00:21,920
I'm holding my hand up too.

8
00:00:21,920 --> 00:00:22,900
Don't worry.

9
00:00:22,910 --> 00:00:23,300
All right.

10
00:00:23,510 --> 00:00:35,390
So let's put in a little heading for 4.2.2, regression model evaluation metrics. So as always

11
00:00:35,450 --> 00:00:38,060
the documentation is ready to go.

12
00:00:38,060 --> 00:00:39,030
So we've got this here.

13
00:00:39,050 --> 00:00:40,810
I might put this in here.

14
00:00:41,120 --> 00:00:48,510
Another link here: model evaluation metrics documentation.

15
00:00:48,510 --> 00:00:53,090
Now I don't want you to be scared reading the documentation like I was when I first started.

16
00:00:53,100 --> 00:00:58,380
If you read this and there might be a few times we go through it and you go wow you see something like

17
00:00:58,380 --> 00:01:02,280
this you see a bunch of different words here that you don't really understand.

18
00:01:02,310 --> 00:01:06,480
You look at a bunch of different examples you're reading this and it's like kind of confusing.

19
00:01:06,690 --> 00:01:07,490
Don't worry.

20
00:01:07,620 --> 00:01:12,480
That's exactly how I started but after a little bit of practice after implementing the code like we're

21
00:01:12,480 --> 00:01:13,530
about to do.

22
00:01:13,530 --> 00:01:15,900
That's why I'm such a big fan of implementing code.

23
00:01:15,930 --> 00:01:20,760
I started to understand a little bit more started to understand about what I needed to use.

24
00:01:20,880 --> 00:01:27,270
So speaking of what we need to use for regression we're going to look at three different ones.

25
00:01:27,270 --> 00:01:30,170
Now these are three of the most common three of the most useful.

26
00:01:30,220 --> 00:01:41,560
The first one is R^2, pronounced R squared, or coefficient of determination. Beautiful.

27
00:01:41,750 --> 00:01:51,380
And the second one is mean absolute error, which is also known as MAE, and the third one is mean squared

28
00:01:53,980 --> 00:01:57,400
error which is also known as MSE.

29
00:01:57,650 --> 00:01:58,420
Wonderful.

30
00:01:58,430 --> 00:01:59,450
So now we've got that.

31
00:01:59,510 --> 00:02:01,130
Let's bring back our regression model.

32
00:02:01,160 --> 00:02:04,130
So from sklearn dot ensemble

33
00:02:07,310 --> 00:02:11,880
import RandomForestRegressor.

34
00:02:13,100 --> 00:02:13,710
Wonderful.

35
00:02:13,880 --> 00:02:15,620
We'll set up a random seed.

36
00:02:15,860 --> 00:02:18,070
So our results are reproducible.

37
00:02:18,170 --> 00:02:23,990
Then we'll get our X which is the feature variables from our Boston data frame.

38
00:02:24,060 --> 00:02:30,110
So we want to just drop the target, and axis equals 1.

39
00:02:30,110 --> 00:02:35,600
Labels are the target column or is the target column.

40
00:02:35,600 --> 00:02:37,160
Wonderful.

41
00:02:37,160 --> 00:02:40,580
And now we'll split our data into train and test sets

42
00:02:47,090 --> 00:02:52,250
using train_test_split, passing it X and y.

43
00:02:54,850 --> 00:02:56,010
wonderful.

44
00:02:56,320 --> 00:03:00,730
And then what we're going to do is instantiate our RandomForestRegressor.

45
00:03:01,180 --> 00:03:05,370
Because we want to build a regression model so we can evaluate it.

46
00:03:05,680 --> 00:03:10,360
Wonderful. model dot fit, X train, y train.

47
00:03:10,420 --> 00:03:15,910
Now this is going to give us a warning because we haven't set the number of estimators, so we'll set it to be 100 and

48
00:03:15,910 --> 00:03:21,900
we're going to run this invalid syntax classic from now on.

49
00:03:22,000 --> 00:03:23,380
Important.

50
00:03:23,460 --> 00:03:24,000
Let get.

51
00:03:24,440 --> 00:03:29,130
Oh, boston_df is not defined, because I've got typos.

52
00:03:29,340 --> 00:03:30,130
There we go.

53
00:03:30,360 --> 00:03:34,050
See, this is what happens, right? When you're writing code, don't expect to get it right the first time. You're

54
00:03:34,050 --> 00:03:35,280
gonna get errors.

55
00:03:35,280 --> 00:03:41,430
So always remember: if in doubt, run the code, and then go back and fix the errors when you need to. Use

56
00:03:41,490 --> 00:03:46,500
dot score, we've seen this: X test, y test. Wonderful.
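
The setup described so far can be sketched roughly like this. It is a minimal sketch, not the notebook from the video: the Boston housing data frame is swapped for a synthetic dataset from `make_regression` (an assumption, since `load_boston` was removed from recent scikit-learn releases), and the sample sizes, noise level and seed value 42 are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Random seed so the results are reproducible
np.random.seed(42)

# Assumption: synthetic stand-in for the housing features (X) and target (y)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Instantiate and fit the regression model, setting n_estimators explicitly
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# For regressors, .score() returns R^2 (the coefficient of determination)
r2 = model.score(X_test, y_test)
print(f"R^2 on the test set: {r2:.3f}")
```

The exact score depends on the data, so no particular value is claimed here; the point is only that `.score()` on a regressor reports R squared by default.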

57
00:03:46,700 --> 00:03:50,060
And so now R squared can be calculated using...

58
00:03:50,060 --> 00:03:51,990
Well, actually, where am I getting R squared from?

59
00:03:52,460 --> 00:04:00,740
So this is, remember, regression metric number one. I'll put it here in bold so we know that we're looking

60
00:04:00,740 --> 00:04:07,140
at R squared. So that is the default metric here.

61
00:04:07,300 --> 00:04:08,010
Right.

62
00:04:08,020 --> 00:04:10,340
And so the coefficient of determination.

63
00:04:10,390 --> 00:04:12,610
R squared of the prediction.

64
00:04:12,610 --> 00:04:21,110
Okay, so if we wanted to figure out what exactly R squared was, where would we do that? We go to Wikipedia.

65
00:04:21,560 --> 00:04:27,080
In statistics, the coefficient of determination, denoted R^2 or r^2 and pronounced R squared, is

66
00:04:27,080 --> 00:04:32,840
the proportion of the variance in the dependent variable that is predictable from the independent variables.

67
00:04:33,230 --> 00:04:35,500
That's a pretty complicated definition.

68
00:04:35,640 --> 00:04:39,560
Well after doing some research I created one of my own right.

69
00:04:39,560 --> 00:04:43,850
This is what I'd like you to do right if you ever come across something and you see the formal definitions

70
00:04:43,850 --> 00:04:44,190
of them.

71
00:04:44,210 --> 00:04:48,350
When you look at them at first glance and think, this is kind of confusing, go and research and

72
00:04:48,350 --> 00:04:51,760
find a way to explain things in your own words.

73
00:04:51,770 --> 00:04:59,370
So in my own words, an R squared value compares your model's predictions, here.

74
00:04:59,720 --> 00:05:09,720
What R squared does: it compares your model's predictions to the mean of the targets.

75
00:05:10,290 --> 00:05:15,690
Values of R squared can range from negative infinity.

76
00:05:15,690 --> 00:05:17,080
So that's the lowest possible.

77
00:05:17,760 --> 00:05:19,340
That's a very poor model.

78
00:05:21,230 --> 00:05:22,200
To 1.

79
00:05:22,240 --> 00:05:25,530
Now this is where I love having an example, right? For example:

80
00:05:25,900 --> 00:05:39,530
If all your model does is predict the mean of the targets, its R squared value would be zero.

81
00:05:41,500 --> 00:05:55,910
And if your model perfectly predicts a range of numbers, its R squared value would be 1.
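
Written as a formula, the two examples follow directly from the usual textbook definition of R squared. The notation is an assumption added here, not something shown in the video: y_i are the true targets, \hat{y}_i the model's predictions, and \bar{y} the mean of the targets.

```latex
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}

% Predicting the mean everywhere: \hat{y}_i = \bar{y}, so the fraction is 1 and R^2 = 0.
% Perfect predictions: \hat{y}_i = y_i, so the numerator is 0 and R^2 = 1.
% Predictions worse than the mean make the fraction larger than 1, so R^2 < 0.
```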

82
00:05:56,680 --> 00:06:02,140
If ever you come across something you don't understand, research it, look at different sources, go to

83
00:06:02,140 --> 00:06:08,860
Wikipedia up here read the documentation right see an example but then most importantly implement it

84
00:06:08,870 --> 00:06:11,710
yourself so I want you not to take my word for it here.

85
00:06:11,830 --> 00:06:16,300
For example, if all your model does is predict the mean of the targets, its R squared value would be zero,

86
00:06:16,720 --> 00:06:20,590
and if your model perfectly predicts a range of numbers, its R squared value would be 1.

87
00:06:20,590 --> 00:06:25,730
So as I said don't take my word for it but let's see this in action.

88
00:06:25,750 --> 00:06:32,360
So from sklearn dot metrics import... this is another way to calculate R squared.

89
00:06:32,380 --> 00:06:34,630
You could just use r2_score from there.

90
00:06:34,680 --> 00:06:36,740
So sklearn dot metrics.

91
00:06:36,950 --> 00:06:43,650
Then we go here and we want to fill an array with y test mean.

92
00:06:43,680 --> 00:06:48,510
So the mean value from the y test dataset. We can do that pretty easily with NumPy.

93
00:06:48,540 --> 00:06:56,190
So y test mean equals np dot full, so we're making a full array of len

94
00:06:56,370 --> 00:06:57,300
y test.

95
00:06:57,300 --> 00:07:01,260
So we want it to be the same length as y test, and then we're gonna fill it with y

96
00:07:01,260 --> 00:07:03,100
test dot mean.

97
00:07:03,100 --> 00:07:04,080
So does that make sense.

98
00:07:06,180 --> 00:07:06,780
Beautiful.

99
00:07:07,230 --> 00:07:10,230
So if we look at y test dot mean,

100
00:07:12,940 --> 00:07:21,460
all this array is, is an array full of those values. Wonderful, we'll move on. And so remember what my

101
00:07:21,460 --> 00:07:22,600
example said.

102
00:07:22,600 --> 00:07:28,600
If all your model does is predict the mean of the target, its R squared value would be zero.

103
00:07:28,620 --> 00:07:34,380
Okay, well, let's test this out, because that's what we do. We're engineers, we test things out. r2

104
00:07:34,390 --> 00:07:35,090
score.

105
00:07:35,290 --> 00:07:36,260
y test.

106
00:07:36,310 --> 00:07:41,670
We're going to compare the true labels. All our model did was predict the mean.

107
00:07:41,810 --> 00:07:47,770
So, a very simple model, and it gets an R2 score of 0.

108
00:07:48,450 --> 00:07:50,260
Well okay.

109
00:07:50,290 --> 00:07:55,660
And now the second part of that example was and if your model perfectly predicts a range of numbers

110
00:07:55,810 --> 00:07:57,790
its R squared value would be 1.

111
00:07:58,420 --> 00:07:58,740
Okay.

112
00:07:58,770 --> 00:08:04,000
So if our model got the exact same predictions as the test values, its R squared value would be 1.

113
00:08:04,000 --> 00:08:06,490
So let's see this. We can do this one pretty easily.

114
00:08:06,490 --> 00:08:07,280
y test.

115
00:08:07,300 --> 00:08:13,780
Now if it got the exact same predictions, if it predicted the y test labels perfectly, it would end up

116
00:08:13,780 --> 00:08:15,130
with a score of 1.
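
The two checks just described can be sketched like this, using `r2_score` from `sklearn.metrics` and `np.full` as in the walkthrough. The four target values are made up for illustration (an assumption), standing in for the real `y_test`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Assumption: tiny made-up test targets instead of the real y_test
y_test = np.array([10.0, 20.0, 30.0, 40.0])

# An array the same length as y_test, filled with the mean of y_test
y_test_mean = np.full(len(y_test), y_test.mean())

# A "model" that only ever predicts the mean
r2_mean = r2_score(y_test, y_test_mean)

# A "model" whose predictions match the true labels exactly
r2_perfect = r2_score(y_test, y_test)

print(r2_mean, r2_perfect)
```

The first value should come out as 0 and the second as 1, up to floating point rounding, which matches the two examples in the definition.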

117
00:08:15,130 --> 00:08:16,990
So what does this tell us.

118
00:08:16,990 --> 00:08:19,240
Well this gives us a quick indication.

119
00:08:19,240 --> 00:08:25,420
This score function which implements the coefficient of determination a.k.a. the r squared gives us

120
00:08:25,420 --> 00:08:31,120
a quick insight into how close our model's predictions are to perfect predictions.

121
00:08:31,120 --> 00:08:35,940
So of course 1.0 is perfect and so we've got 1.0.

122
00:08:35,950 --> 00:08:40,120
We saw that here and if it was predicting nothing but just the mean it would get zero.

123
00:08:40,150 --> 00:08:43,330
And it says here that the value can range from negative infinity.

124
00:08:43,330 --> 00:08:48,970
Well what this means is that if our model predicted completely off the radar this value can actually

125
00:08:48,970 --> 00:08:50,040
go negative.

126
00:08:50,080 --> 00:08:55,240
So the mean is actually an okay prediction compared to something that was just predicting all zeros,

127
00:08:55,360 --> 00:08:57,420
right now.
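
To see the negative case in numbers, here is a small hand-written version of the R squared calculation in plain Python. This is a sketch of the textbook formula, not scikit-learn's own implementation; `r_squared` is a hypothetical helper name and the target values are made up.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the predictions against the true values
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Assumption: made-up targets with a mean of 25.0
y_true = [10.0, 20.0, 30.0, 40.0]
mean_preds = [25.0] * len(y_true)   # always predict the mean
zero_preds = [0.0] * len(y_true)    # always predict zero

r2_mean = r_squared(y_true, mean_preds)
r2_perfect = r_squared(y_true, y_true)
r2_zeros = r_squared(y_true, zero_preds)

print(r2_mean)     # 0.0
print(r2_perfect)  # 1.0
print(r2_zeros)    # -5.0
```

Predicting all zeros is six times worse than predicting the mean on this toy data (squared error 3000 versus 500), which is why the score drops to 1 - 6 = -5.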

128
00:08:57,430 --> 00:09:02,920
Now we've seen R squared. It kind of gives us quick insight into how well our model may be doing, but it

129
00:09:03,070 --> 00:09:07,170
doesn't really tell us how far off each prediction is.

130
00:09:07,210 --> 00:09:10,030
So to do that we're going to use mean absolute error.

131
00:09:10,060 --> 00:09:11,410
So we'll look at that in the next video.