1
00:00:00,933 --> 00:00:03,133
Hello and welcome
back to the course of Machine Learning.

2
00:00:03,133 --> 00:00:06,400
Today we're talking about the random
forest and the intuition behind it.

3
00:00:06,766 --> 00:00:09,433
And specifically we're going to be talking
about the random forest

4
00:00:09,433 --> 00:00:12,500
applied to regression trees
rather than classification trees.

5
00:00:12,833 --> 00:00:14,300
But the concept is very similar.

6
00:00:14,300 --> 00:00:16,033
And you'll find that
this tutorial is very similar

7
00:00:16,033 --> 00:00:19,033
to the one for the random forest
on classification trees.

8
00:00:19,533 --> 00:00:22,900
All right,
so random forest and ensemble learning.

9
00:00:22,900 --> 00:00:24,166
Ensemble learning.

10
00:00:24,166 --> 00:00:26,566
So random forest is a version of ensemble
learning.

11
00:00:26,566 --> 00:00:31,166
You've got other versions
such as gradient boosting. And ensemble

12
00:00:31,266 --> 00:00:36,500
learning
is when you take multiple algorithms

13
00:00:36,866 --> 00:00:40,900
or the same algorithm multiple times,
and you put them together

14
00:00:40,900 --> 00:00:43,900
to make something much more powerful
than the original.

15
00:00:43,900 --> 00:00:45,466
And let's see how this works.

16
00:00:45,466 --> 00:00:49,233
So first, you pick k random
data points from your training set.

17
00:00:49,233 --> 00:00:53,133
So now we're kind of going to leverage
a lot of what we talked about

18
00:00:53,466 --> 00:00:56,266
in the section on regression trees.

19
00:00:56,266 --> 00:01:00,066
So you remember there
we had lots of data points.

20
00:01:00,066 --> 00:01:01,800
And then we built a regression tree.

21
00:01:01,800 --> 00:01:05,033
Or rather, we built
the decision tree and used that

22
00:01:05,500 --> 00:01:08,700
to forecast the value

23
00:01:08,700 --> 00:01:12,733
that would be assigned, the y value,
for any new element that would be

24
00:01:12,733 --> 00:01:15,766
added to our data set, as the average
in the terminal leaves, basically.

25
00:01:16,100 --> 00:01:18,566
So here, what we're doing
is we're not using the whole data set

26
00:01:18,566 --> 00:01:19,033
we had.

27
00:01:19,033 --> 00:01:22,033
We're only picking k data points
from the training set.

28
00:01:22,400 --> 00:01:25,066
Then we're going to build a decision tree

29
00:01:25,066 --> 00:01:28,066
associated to these k data points.

30
00:01:28,200 --> 00:01:31,566
Rather than building a decision tree
based on everything in your data set,

31
00:01:31,700 --> 00:01:34,800
you're just building a decision tree
based on those data points,

32
00:01:34,800 --> 00:01:37,800
which are just
a subset of your data set.

33
00:01:38,033 --> 00:01:39,700
Then you choose the number of trees

34
00:01:39,700 --> 00:01:41,633
that you want to build
and you repeat steps one and two.

35
00:01:41,633 --> 00:01:43,800
So you just keep building and building
and building these trees.

36
00:01:43,800 --> 00:01:47,633
You're building a lot of regression
decision trees

37
00:01:48,033 --> 00:01:51,733
and then finally you
use all of them to predict.

38
00:01:51,733 --> 00:01:55,100
So for a new data point,
make each one of your N trees

39
00:01:55,100 --> 00:02:00,166
predict the value of y
for the data point in question and assign

40
00:02:00,166 --> 00:02:04,433
the new data point the average across
all of the predicted Y values.

41
00:02:04,466 --> 00:02:05,466
So basically,

42
00:02:05,466 --> 00:02:09,066
instead of just getting one prediction,
you're getting lots of predictions.

43
00:02:09,066 --> 00:02:09,966
By default,

44
00:02:09,966 --> 00:02:13,300
these algorithms are often set to
about 500 trees.

45
00:02:13,633 --> 00:02:17,100
So you're getting 500 predictions
for the value of y.

46
00:02:17,566 --> 00:02:19,533
And then you're taking the average
across those.

47
00:02:19,533 --> 00:02:22,366
And in that way, you're not just

48
00:02:22,366 --> 00:02:25,366
predicting based on one tree; you're
predicting based on a forest of trees.

49
00:02:25,566 --> 00:02:29,333
And that improves the accuracy
of your prediction because

50
00:02:29,866 --> 00:02:31,766
you're taking the average
of many predictions.

51
00:02:31,766 --> 00:02:36,366
And therefore, even if
somehow

52
00:02:36,933 --> 00:02:39,866
one of the decision trees
wasn't built

53
00:02:39,866 --> 00:02:43,233
perfectly because of the way those data
points were selected,

54
00:02:43,233 --> 00:02:46,500
and it just didn't turn out as a perfect tree
or a great tree,

55
00:02:46,500 --> 00:02:50,400
so that if you were using it by itself,
you'd get a bad prediction,

56
00:02:50,433 --> 00:02:53,333
because you're using the average,
a bad prediction is less likely.

57
00:02:53,333 --> 00:02:55,933
So you're going to get a more accurate
prediction.

58
00:02:55,933 --> 00:02:59,033
And the second thing is that
they're more stable. Algorithms like this,

59
00:02:59,533 --> 00:03:01,466
ensemble algorithms are more stable

60
00:03:01,466 --> 00:03:05,766
because any changes in your data set
could really impact one tree.

61
00:03:05,766 --> 00:03:10,633
But for them to really impact
a forest of trees, it's much harder.

62
00:03:10,633 --> 00:03:14,800
So therefore ensemble
is much more powerful in that way.

63
00:03:15,300 --> 00:03:18,100
And what this reminds me of is the game

64
00:03:18,100 --> 00:03:24,900
that is often played at fairs or parties
and things like that, where you have a jar

65
00:03:24,900 --> 00:03:29,266
and inside this jar there's lots
and lots of, for instance, jelly beans.

66
00:03:29,266 --> 00:03:34,466
Or it could be marbles, or they could be
like a huge net with balloons inside it.

67
00:03:34,466 --> 00:03:38,100
And we have one in the mall
sometimes where

68
00:03:38,100 --> 00:03:41,733
there's lots of balloons
inside a net, in the ceiling.

69
00:03:42,200 --> 00:03:45,200
And you need to guess
how many balloons there are, and whoever

70
00:03:45,200 --> 00:03:48,266
guesses it
will get, like, a car. You can win a car.

71
00:03:48,266 --> 00:03:51,900
And it's like a crazy prize
for just guessing the number of balloons.

72
00:03:52,300 --> 00:03:56,933
And although this is not an example
of specifically a

73
00:03:57,133 --> 00:04:00,400
random forest
or regression forest method,

74
00:04:00,700 --> 00:04:04,300
it's still an example
of an ensemble type of method.

75
00:04:04,466 --> 00:04:09,733
So the best way or one of the ways
to beat that game

76
00:04:09,733 --> 00:04:13,033
when you need to guess
the number of marbles in a jar,

77
00:04:13,033 --> 00:04:16,200
for instance,
is not to actually go and guess,

78
00:04:16,366 --> 00:04:20,600
but it's actually to get a pen and a paper
and stand next to the person

79
00:04:20,600 --> 00:04:23,700
that's holding this jar,
or that's conducting this event,

80
00:04:24,100 --> 00:04:26,966
and you just stand next to them,
and then you wait for other people

81
00:04:26,966 --> 00:04:27,766
to come and guess.

82
00:04:27,766 --> 00:04:31,700
Every time somebody guesses, you just
ask them. As soon as they've guessed

83
00:04:31,700 --> 00:04:35,533
and are walking away, you ask them,
because usually they write down

84
00:04:35,533 --> 00:04:39,733
their number and they put it inside
like an envelope or something.

85
00:04:39,733 --> 00:04:42,433
And then
the winner is announced later on.

86
00:04:42,433 --> 00:04:45,233
So they don't know whether they guessed
right or wrong, but regardless of that,

87
00:04:45,233 --> 00:04:48,166
they're walking away and you just
ask them, hey, what number did you guess?

88
00:04:48,166 --> 00:04:51,066
And you just write down their number,
and then the next person guesses

89
00:04:51,066 --> 00:04:52,966
and you write their number down,
and you write their number,

90
00:04:52,966 --> 00:04:56,300
and you keep writing the numbers down,
and you just keep doing that until

91
00:04:56,300 --> 00:05:00,466
you have like a substantial number of
entries, maybe 100

92
00:05:00,466 --> 00:05:04,366
or maybe if it's a very popular contest
and people are guessing like crazy, like,

93
00:05:04,700 --> 00:05:07,700
trying to attempt
or attempting the guessing,

94
00:05:07,733 --> 00:05:10,500
then you might even get
like a couple of hundred.

95
00:05:10,500 --> 00:05:11,933
Or even, if you're very determined,

96
00:05:11,933 --> 00:05:14,933
you might get a thousand entries
over a couple of days.

97
00:05:15,133 --> 00:05:17,400
And then what you do
is you just average them out.

98
00:05:17,400 --> 00:05:20,266
Or, if you prefer,
maybe take the median,

99
00:05:20,266 --> 00:05:23,333
if you don't want outliers, like people
just guessing random numbers

100
00:05:23,333 --> 00:05:26,533
like 1 or 5 million,
to affect you.

101
00:05:26,533 --> 00:05:28,533
Or you just take the outliers out
and then you average.

102
00:05:28,533 --> 00:05:31,100
Anyway, you either average it out
or you take the median.

103
00:05:31,100 --> 00:05:35,566
And statistically speaking,
you have a much higher likelihood

104
00:05:35,900 --> 00:05:39,766
of being closer to the truth
if you take the average of people's guesses,

105
00:05:39,766 --> 00:05:43,833
because people are natural beings
and their visual perception

106
00:05:43,833 --> 00:05:46,833
will most likely be normally distributed.

107
00:05:47,166 --> 00:05:51,766
And therefore, once you hit the middle
of that normal distribution,

108
00:05:51,766 --> 00:05:54,966
you are more likely to be on the money
than any one of them.

109
00:05:55,266 --> 00:05:58,800
And that's a pretty cool concept.
That's an example of an ensemble method

110
00:05:58,800 --> 00:06:02,800
where, instead of
just throwing out a guess by yourself,

111
00:06:02,800 --> 00:06:06,700
or taking the guess of one individual
person, you're averaging out across

112
00:06:06,700 --> 00:06:11,800
multiple guesses, and you're more likely
to be the closest one to the truth.

113
00:06:12,133 --> 00:06:14,800
And if the prize is given
not just to the person

114
00:06:14,800 --> 00:06:18,300
that gets it spot on, but to the person
that guesses closest to the truth,

115
00:06:18,300 --> 00:06:21,733
then you've got yourself
a very powerful advantage

116
00:06:22,200 --> 00:06:23,400
using data science.

117
00:06:23,400 --> 00:06:27,133
So if you have the patience
and determination, then try it out

118
00:06:27,133 --> 00:06:29,666
next time you see one of these games
and see how you go.

119
00:06:29,666 --> 00:06:31,266
Would love to hear back from you,

120
00:06:31,266 --> 00:06:33,900
because I never have the patience
to stand there and just count.

121
00:06:33,900 --> 00:06:38,100
But it is a statistical approach
to a challenge like that.

122
00:06:38,633 --> 00:06:40,600
So hopefully you enjoyed today's tutorial.

123
00:06:40,600 --> 00:06:41,933
I look forward to seeing you next time.

124
00:06:41,933 --> 00:06:43,800
Until then, enjoy machine learning.
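
The steps described in this tutorial (pick k random data points, build a tree on them, repeat for the number of trees you chose, then average the predictions) can be sketched roughly as below. This is only an illustration under several assumptions, not the implementation of any particular library: each "tree" is simplified to a single split (a stump) on one feature, the k points are drawn with replacement, and all function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(points):
    """Fit a one-split regression tree (a stump) on (x, y) pairs.

    Assumption: a stump stands in for a full regression tree.
    Returns (threshold, left_mean, right_mean).
    """
    xs = sorted({x for x, _ in points})
    if len(xs) < 2:
        # Only one distinct x in the sample: predict the mean y everywhere.
        m = mean(y for _, y in points)
        return (float("inf"), m, m)
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        # Sum of squared errors when each side predicts its own average.
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    """Return the average y of the terminal leaf that x falls into."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_forest(points, n_trees, k, seed=0):
    """Steps 1 to 3: sample k points, build a tree, repeat n_trees times."""
    rng = random.Random(seed)
    # Assumption: the k points are sampled with replacement.
    return [fit_stump(rng.choices(points, k=k)) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Step 4: average the predictions of all trees for a new point."""
    return mean(predict_stump(stump, x) for stump in forest)


# Toy data (made up for illustration): y jumps from 1 to 10 at x = 5.
points = [(x, 1 if x < 5 else 10) for x in range(10)]
forest = fit_forest(points, n_trees=25, k=10)
print(predict_forest(forest, 2), predict_forest(forest, 8))
```

A single stump built on an unlucky sample can be badly off, but the averaged prediction moves much less, which is the stability argument made in the tutorial.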