1
00:00:00,100 --> 00:00:01,833
Hello my friends, and welcome

2
00:00:01,833 --> 00:00:06,766
to the final practical activity
of this part three on classification.

3
00:00:07,133 --> 00:00:10,166
And now we're all going to go into part
three, classification,

4
00:00:10,166 --> 00:00:13,533
then to implement the final classification

5
00:00:13,533 --> 00:00:17,233
model of this part three
the random forest classification model.

6
00:00:17,766 --> 00:00:19,566
All right.
And we're going to start with Python.

7
00:00:19,566 --> 00:00:20,200
Of course.

8
00:00:20,200 --> 00:00:23,200
And inside this folder
you will get the same two files

9
00:00:23,200 --> 00:00:26,033
for the implementation
of this classification model.

10
00:00:26,033 --> 00:00:29,033
And the Social Network Ads dataset,

11
00:00:29,100 --> 00:00:33,333
which contains 400 observations
corresponding to 400 customers.

12
00:00:33,466 --> 00:00:37,100
You know, each row is a customer,
and for each of them we get the two

13
00:00:37,100 --> 00:00:41,066
features age and estimated salary
with which we're going to predict

14
00:00:41,066 --> 00:00:45,600
this dependent variable, Purchased,
which tells if yes or no,

15
00:00:45,766 --> 00:00:49,800
each customer
bought an SUV from this car dealership.

16
00:00:50,066 --> 00:00:52,866
And then once we train this model
to understand the correlations

17
00:00:52,866 --> 00:00:55,766
between these two features
and the dependent variable vector,

18
00:00:55,766 --> 00:00:58,900
we will be able to predict
which new customers will buy

19
00:00:59,033 --> 00:01:03,866
that brand new SUV just released by this
car company, and therefore, we'll be able

20
00:01:03,866 --> 00:01:09,433
to target our customers in the best way
through beautiful ads on social networks.

21
00:01:09,566 --> 00:01:11,633
All right, so let's do this.

22
00:01:11,633 --> 00:01:15,133
Let's start the implementation
of random forest classification.

23
00:01:15,366 --> 00:01:18,600
And let's open
this with either Google Colaboratory

24
00:01:18,600 --> 00:01:22,566
or Jupyter Notebook,
whatever is your favorite.

25
00:01:23,233 --> 00:01:23,600
All right.

26
00:01:23,600 --> 00:01:27,000
So right now it is opening
the notebook, loading it, laying it out.

27
00:01:27,000 --> 00:01:30,966
And here is the random forest
classification implementation

28
00:01:31,233 --> 00:01:34,700
which results once again
from that classification template

29
00:01:34,700 --> 00:01:37,700
we made in the first section
on logistic regression.

30
00:01:37,966 --> 00:01:41,966
So all these cells here are exactly
the same as in logistic regression.

31
00:01:41,966 --> 00:01:45,600
You know, with the same variable names
and everything except

32
00:01:45,900 --> 00:01:48,900
this cell where we build and train the

33
00:01:49,600 --> 00:01:54,000
classification model here,
the random forest classification model.

34
00:01:54,300 --> 00:01:56,400
So we're going to re-implement that cell.

35
00:01:56,400 --> 00:02:00,533
And since this is in read-only mode
so that you can all access it,

36
00:02:00,800 --> 00:02:01,966
well, we are going to create

37
00:02:01,966 --> 00:02:05,866
a copy of this file
by clicking here on "Save a copy in Drive".

38
00:02:06,133 --> 00:02:08,900
This creates a copy and there we go.

39
00:02:08,900 --> 00:02:12,233
We will be able to re-implement
that cell to train

40
00:02:12,233 --> 00:02:15,366
our random forest
classification model on the training set.

41
00:02:15,700 --> 00:02:16,433
All right.

42
00:02:16,433 --> 00:02:19,366
So first let's remove that cell.

43
00:02:19,366 --> 00:02:21,166
And now is the time

44
00:02:21,166 --> 00:02:21,600
where you are going

45
00:02:21,600 --> 00:02:25,533
to press pause on the video to,
of course, implement this yourself.

46
00:02:25,533 --> 00:02:29,433
And also to learn how to be independent
in machine learning and learn

47
00:02:29,433 --> 00:02:33,866
how to get familiar with that
scikit-learn API,

48
00:02:33,866 --> 00:02:37,333
which is the way you're going to find
the information you need right now

49
00:02:37,333 --> 00:02:40,333
to build this random forest
classification model.

50
00:02:40,533 --> 00:02:41,066
All right.

51
00:02:41,066 --> 00:02:42,600
So let's do this together.

52
00:02:42,600 --> 00:02:46,966
Let's go to the API
and let's find that class

53
00:02:46,966 --> 00:02:50,500
that we need to build a random forest
classification model.

54
00:02:51,166 --> 00:02:51,500
All right.

55
00:02:51,500 --> 00:02:55,500
So here, as opposed to before,
we won't find the model

56
00:02:55,500 --> 00:03:00,066
we need easily, you know, by scrolling down,
for example, down to Random Forest,

57
00:03:00,066 --> 00:03:00,800
because, no,

58
00:03:00,800 --> 00:03:01,733
the name of the module

59
00:03:01,733 --> 00:03:05,400
is not random forest as it was the case
with the previous classification models.

60
00:03:05,700 --> 00:03:08,000
This time it's actually, right here it is,

61
00:03:08,000 --> 00:03:09,200
Ensemble Methods.

62
00:03:09,200 --> 00:03:11,833
And the name of the module
is exactly ensemble.

63
00:03:11,833 --> 00:03:13,800
So that's what you had to find.

64
00:03:13,800 --> 00:03:16,700
But you know, if you looked for it
by scrolling down, that's fine,

65
00:03:16,700 --> 00:03:20,700
because really I want you to get familiar
with the scikit-learn API.

66
00:03:21,066 --> 00:03:24,266
And so now the question is, among all these
ensemble methods,

67
00:03:24,500 --> 00:03:26,266
where is the one we want?

68
00:03:26,266 --> 00:03:29,733
Well, that's of course
this one, RandomForestClassifier.

69
00:03:29,733 --> 00:03:30,833
Hard to miss, right?

70
00:03:30,833 --> 00:03:33,033
So we're going to click this link.

71
00:03:33,033 --> 00:03:34,300
And there we go.

72
00:03:34,300 --> 00:03:37,933
This is the RandomForestClassifier class
with all the parameters.

73
00:03:37,933 --> 00:03:40,633
So check them out.
We won't enter all of them.

74
00:03:40,633 --> 00:03:44,700
But let me tell you right now the ones
we will enter. The first and most

75
00:03:44,700 --> 00:03:48,800
important one is the first one, actually:
n_estimators, which is of course

76
00:03:49,033 --> 00:03:52,633
the number of trees you want to have in
your random forest classifier.

77
00:03:52,766 --> 00:03:55,266
Right? Number of trees in the forest.

78
00:03:55,266 --> 00:03:59,633
Then once again we'll choose
another value of the criterion.

79
00:03:59,633 --> 00:04:03,233
And that's in order to be aligned
with what you learned in the theory.

80
00:04:03,233 --> 00:04:05,600
You know, with Kirill's
intuition lectures.

81
00:04:05,600 --> 00:04:08,900
He taught you about the random forest
classification model with

82
00:04:09,066 --> 00:04:10,533
the entropy criterion.

83
00:04:10,533 --> 00:04:12,566
So we're going to select this.

84
00:04:12,566 --> 00:04:14,766
And that's it. No more parameters.

85
00:04:14,766 --> 00:04:17,933
You know for the other parameters here
we'll just keep the default values.

86
00:04:18,133 --> 00:04:22,533
However, we will just add a random_state
parameter and set its value to zero

87
00:04:22,666 --> 00:04:26,200
just so that we can have the same results
displayed on our notebook.

88
00:04:26,333 --> 00:04:27,066
All right.

89
00:04:27,066 --> 00:04:28,800
So first let's copy this.

90
00:04:28,800 --> 00:04:31,833
You know, the name of the class
in the module, right?

91
00:04:31,833 --> 00:04:33,400
So I'm copying this.

92
00:04:33,400 --> 00:04:39,300
Going back to our implementation, creating
a new code cell here, pasting that.

93
00:04:39,700 --> 00:04:42,333
And then remember, we have to start with "from".

94
00:04:42,333 --> 00:04:44,733
So from the scikit-learn library.

95
00:04:44,733 --> 00:04:47,733
Then from the ensemble module
of the scikit-learn library.

96
00:04:47,833 --> 00:04:51,266
And then remember,
we need to add here "import",

97
00:04:51,766 --> 00:04:54,600
well, that RandomForestClassifier,

98
00:04:54,600 --> 00:04:58,233
which will allow us to build
this random forest classification model.

99
00:04:58,500 --> 00:05:02,000
And speaking of building it,
well that's exactly our next step here.

100
00:05:02,200 --> 00:05:05,166
We're going to build the classifier
through this

101
00:05:05,166 --> 00:05:08,166
classifier variable,
which will be nothing else

102
00:05:08,333 --> 00:05:12,833
than an instance of the RandomForestClassifier
class, therefore nothing else

103
00:05:12,866 --> 00:05:15,666
than the random forest classifier
model itself.

104
00:05:15,666 --> 00:05:19,900
So here I'm copying this and pasting it
right here, adding some parentheses

105
00:05:20,133 --> 00:05:20,933
and there we go.

106
00:05:20,933 --> 00:05:23,100
Now let's add our two parameters.

107
00:05:23,100 --> 00:05:26,100
You know, the ones whose
default values we're changing.

108
00:05:26,133 --> 00:05:29,800
The first one is n_estimators.

109
00:05:30,033 --> 00:05:32,200
So that's number of trees in the forest.

110
00:05:32,200 --> 00:05:34,900
The default value is actually 100.

111
00:05:34,900 --> 00:05:38,533
But, you know, it will be totally fine
with ten estimators,

112
00:05:38,533 --> 00:05:40,100
you know, ten trees in the forest.

113
00:05:40,100 --> 00:05:40,866
Why is that?

114
00:05:40,866 --> 00:05:43,466
That's because our data set
is actually quite simple.

115
00:05:43,466 --> 00:05:48,333
It only contains two features and only 400
customers, you know, 400 observations.

116
00:05:48,600 --> 00:05:53,333
So we will definitely be fine
with only ten trees in the forest.

117
00:05:53,533 --> 00:05:56,200
All right. And feel free to try
other numbers if you wish.
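
The cell described so far could be sketched roughly as follows. This is only a minimal sketch, assuming scikit-learn is installed. The synthetic data from make_classification and the names X_demo and y_demo are placeholders for illustration, not the course's dataset or the template's variables. The entropy criterion and random_state=0 are the values announced earlier in the video.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder data (assumption): 400 observations with two features,
# standing in for the real training set used in the notebook.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

# Ten trees, entropy criterion, fixed seed for reproducible results.
classifier = RandomForestClassifier(
    n_estimators=10, criterion="entropy", random_state=0
)
classifier.fit(X_demo, y_demo)

# The fitted forest exposes its individual trees in estimators_.
print(len(classifier.estimators_))
```

Changing n_estimators here is an easy way to try other numbers of trees and compare the results.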