1
00:00:00,400 --> 00:00:03,400
But that's because we wrote y_pred
this way in our template.

2
00:00:03,566 --> 00:00:07,433
And if you don't want this format
of y_pred, well,

3
00:00:07,433 --> 00:00:11,300
you just need to add one simple argument,
which is type, here.

4
00:00:11,933 --> 00:00:15,566
type equals,
and you just need to input 'class'.
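For reference, rpart's predict() on a classification tree returns a matrix of class probabilities by default (one column per class), while type = 'class' returns the predicted labels instead. The pure-Python sketch below only mimics that conversion; probs_to_class is a hypothetical helper, it assumes each row lists the probabilities of classes 0 and 1 in that order, and it breaks ties toward the first class, which may not match rpart.

```python
def probs_to_class(prob_matrix, classes=("0", "1")):
    """Turn an n x k matrix of class probabilities into a vector of n labels.

    Each row is one observation (one user); the label is the class with the
    highest probability. Ties go to the earlier class, because max() keeps
    the first maximum it encounters.
    """
    labels = []
    for row in prob_matrix:
        best = max(range(len(row)), key=lambda j: row[j])
        labels.append(classes[best])
    return labels


# Hypothetical probability rows for three test-set users.
prob_matrix = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(probs_to_class(prob_matrix))  # ['0', '1', '0']
```

The output is one label per user, which is the vector shape the rest of the template expects.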

5
00:00:16,433 --> 00:00:17,700
Okay, let's try.

6
00:00:17,700 --> 00:00:20,700
Let's try to execute this line again.

7
00:00:21,033 --> 00:00:22,033
And there we go.

8
00:00:22,033 --> 00:00:24,933
Let's have a look at y_pred right now.

9
00:00:26,700 --> 00:00:27,400
There you go.

10
00:00:27,400 --> 00:00:28,800
Now y_pred is a vector.

11
00:00:28,800 --> 00:00:32,766
As you can see, for each observation
of the test set, that is, for each user

12
00:00:32,766 --> 00:00:36,333
of the test set,
we have, like before, the prediction

13
00:00:36,333 --> 00:00:39,533
0 or 1 for each user: zero

14
00:00:39,533 --> 00:00:42,300
If the user is predicted
not to buy the SUV,

15
00:00:42,300 --> 00:00:45,566
and one
if the user is predicted to buy the SUV,

16
00:00:45,800 --> 00:00:48,800
according to our decision tree classifier.

17
00:00:49,733 --> 00:00:52,500
Okay,
so that's a little thing to change here.

18
00:00:52,500 --> 00:00:57,800
Make sure that your y_pred
is the vector of predicted results,

19
00:00:57,833 --> 00:00:59,433
the zeros and ones we're used to.

20
00:00:59,433 --> 00:01:02,600
Because here, as you can see,
we use the same predict function

21
00:01:02,600 --> 00:01:06,266
with the two arguments,
classifier and newdata = grid_set.

22
00:01:06,900 --> 00:01:09,333
So that means that this won't work

23
00:01:09,333 --> 00:01:13,200
because this is supposed to be
a vector of prediction results.

24
00:01:13,466 --> 00:01:15,766
Only this time
it's for all the pixel points.

25
00:01:15,766 --> 00:01:18,766
You know, the imaginary pixel
point users in the grid.

26
00:01:19,066 --> 00:01:22,600
But since the predict function
is associated with the classifier,

27
00:01:22,633 --> 00:01:24,766
which is the decision tree classifier,

28
00:01:24,766 --> 00:01:28,333
then if we only keep these two arguments
here, this won't make any sense,

29
00:01:28,333 --> 00:01:33,000
because this will return y_grid
as a matrix of the two class probabilities,

30
00:01:33,666 --> 00:01:35,966
and therefore here
we will have a problem,

31
00:01:35,966 --> 00:01:40,300
because we would be building a matrix out of a matrix,
whereas here it's supposed to be a vector.
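The plotting step reshapes y_grid into a length(X1) by length(X2) matrix for contour(), which only works when y_grid holds exactly one label per pixel point. A minimal Python sketch of that bookkeeping, assuming the grid is built R-style with expand.grid() over two axis vectors (called X1 and X2 here), with a made-up axis-aligned rule, hypothetical_classifier, standing in for the fitted tree and tiny hand-picked axes instead of fine seq() grids:

```python
def hypothetical_classifier(age, salary):
    """Stand-in for the fitted tree: a made-up axis-aligned rule, NOT the real model."""
    return 1 if (age > 0.5 or salary > 1.0) else 0


def predict_grid(x1_values, x2_values, predict):
    """Predict one class per pixel point, then reshape into a len(x1) x len(x2) matrix."""
    # Flat vector of labels, one per grid point, with x1 varying fastest
    # (the same order R's expand.grid(X1, X2) produces).
    y_grid = [predict(a, s) for s in x2_values for a in x1_values]
    n1, n2 = len(x1_values), len(x2_values)
    # Fill column by column, like R's matrix(y_grid, length(X1), length(X2)),
    # so grid[i][j] is the prediction at (x1_values[i], x2_values[j]).
    return [[y_grid[j * n1 + i] for j in range(n2)] for i in range(n1)]


# Tiny axes on a feature-scaled-looking range, chosen for illustration.
x1 = [-1.0, 0.0, 1.0]
x2 = [-1.0, 0.0, 1.0, 2.0]
grid = predict_grid(x1, x2, hypothetical_classifier)
for row in grid:
    print(row)
# [0, 0, 0, 1]
# [0, 0, 0, 1]
# [1, 1, 1, 1]
```

Had predict() returned the n-by-2 probability matrix instead, flattening it with as.numeric() in R would give 2n values rather than one per pixel, so the reshape would no longer line up with the grid.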

32
00:01:40,666 --> 00:01:43,733
So all we need to do,
and we will do it now

33
00:01:43,733 --> 00:01:47,566
so that we don't forget,
is to add this type parameter.

34
00:01:47,933 --> 00:01:50,933
And we will set it equal to 'class'.

35
00:01:51,066 --> 00:01:53,433
And then it will work perfectly.

36
00:01:53,433 --> 00:01:54,600
So we'll copy this

37
00:01:56,266 --> 00:01:59,566
and add it here as well.

38
00:01:59,933 --> 00:02:01,266
Perfect. And now it's ready.

39
00:02:01,266 --> 00:02:04,666
Now it will plot the graph
without any errors.

40
00:02:05,033 --> 00:02:07,966
So I know I gave you a template
that is supposed to work

41
00:02:07,966 --> 00:02:11,300
without changing anything
to plot the classifications.

42
00:02:11,633 --> 00:02:12,166
I'm sorry.

43
00:02:12,166 --> 00:02:15,233
Sometimes we need to change
a few little things, and that's why we need

44
00:02:15,233 --> 00:02:20,633
to, you know, execute each of the lines
one by one to check that everything is as it should be.

45
00:02:21,000 --> 00:02:21,900
And besides, yes,

46
00:02:21,900 --> 00:02:25,800
we would have run into some issues
if we had, you know, computed the confusion

47
00:02:25,800 --> 00:02:29,433
matrix this way, with
y_pred as a matrix of probabilities.

48
00:02:29,733 --> 00:02:33,600
But now it will be fine
because y_pred is set the correct way.

49
00:02:34,033 --> 00:02:37,333
So we'll execute this and look at
the number of incorrect predictions.

50
00:02:37,633 --> 00:02:38,566
All right.

51
00:02:38,566 --> 00:02:41,400
Now let's type cm here.

52
00:02:41,400 --> 00:02:45,933
And we have six
plus 11 equals 17 incorrect predictions.
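The number of incorrect predictions is the sum of the off-diagonal cells of the confusion matrix, here 6 + 11 = 17. A small Python sketch of that bookkeeping, assuming cm is laid out like R's table(actual, predicted), with actual classes in rows and predicted classes in columns; the labels are toy values, not the video's test set:

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes=(0, 1)):
    """Rows = actual class, columns = predicted class (like R's table(actual, predicted))."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def incorrect_predictions(cm):
    """Sum of the off-diagonal cells: predicted class differs from actual class."""
    n = len(cm)
    return sum(cm[i][j] for i in range(n) for j in range(n) if i != j)


# Toy labels for illustration only.
actual = [0, 0, 1, 1, 1, 0]
predicted = [0, 1, 1, 0, 1, 0]
cm = confusion_matrix(actual, predicted)
print(cm)                         # [[2, 1], [1, 2]]
print(incorrect_predictions(cm))  # 2
```

With y_pred left as a probability matrix, zip(actual, predicted) would pair labels with probability rows and every count would be wrong, which is the confusion matrix issue mentioned earlier.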

53
00:02:46,233 --> 00:02:48,500
So now let's see if we were right

54
00:02:48,500 --> 00:02:52,700
to change our code this way
so that we can plot the graph.

55
00:02:52,700 --> 00:02:54,166
Let's see if it will work.

56
00:02:54,166 --> 00:02:55,966
I hope it will work
because I want to show you

57
00:02:55,966 --> 00:02:59,933
the decision tree prediction regions
and prediction boundary.

58
00:02:59,966 --> 00:03:01,500
I really want to show you this.

59
00:03:01,500 --> 00:03:04,566
For those of you who didn't follow
the Python tutorial, of course.

60
00:03:04,933 --> 00:03:09,433
So let's select this
and let's see if we did a good job.

61
00:03:10,900 --> 00:03:12,366
All right, looks good so far.

62
00:03:12,366 --> 00:03:14,300
Looks good. No errors.

63
00:03:14,300 --> 00:03:17,300
Let's see what happens.

64
00:03:17,600 --> 00:03:18,900
And we were right.

65
00:03:18,900 --> 00:03:21,233
This works perfectly well.

66
00:03:21,233 --> 00:03:23,333
That's the decision tree classifier.

67
00:03:23,333 --> 00:03:25,500
That's the prediction boundary.

68
00:03:25,500 --> 00:03:28,866
So as you can see, there are only
horizontal and vertical lines.

69
00:03:29,400 --> 00:03:32,033
That's because, as Kirill
explained, the decision tree

70
00:03:32,033 --> 00:03:36,400
algorithm is based on conditions
on your independent variables.

71
00:03:36,400 --> 00:03:40,500
It finds, you know, intervals at each step
that define conditions,

72
00:03:40,500 --> 00:03:43,800
and those conditions classify
your observations into rectangles.
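To see why the boundary is made only of horizontal and vertical segments: a fitted tree boils down to nested if/else tests, each comparing one variable with one threshold. The Python sketch below hand-writes such a tree; tiny_tree, change_points and the thresholds (age 42.5, salaries 90,000 and 20,000) are invented for illustration and are not the splits rpart found on this data.

```python
def tiny_tree(age, salary):
    """Hand-written two-level tree with made-up thresholds (illustration only).

    Each test compares ONE variable with a constant, so every leaf is an
    axis-aligned rectangle in the (age, salary) plane.
    """
    if age < 42.5:
        return 1 if salary >= 90_000 else 0
    return 1 if salary >= 20_000 else 0


def change_points(tree, ages, salary):
    """Ages at which the prediction flips while salary is held fixed."""
    return [b for a, b in zip(ages, ages[1:]) if tree(a, salary) != tree(b, salary)]


ages = list(range(18, 61))
print(change_points(tiny_tree, ages, 50_000))   # [43]: one vertical boundary at age 42.5
print(change_points(tiny_tree, ages, 100_000))  # []: both sides predict 1
```

Sweeping salary with age held fixed would likewise flip only at 90,000 or 20,000, that is, along horizontal lines.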

73
00:03:44,100 --> 00:03:48,366
And actually, what's funny
is that we clearly have less overfitting

74
00:03:48,366 --> 00:03:49,300
than in Python.

75
00:03:49,300 --> 00:03:52,366
And actually that's
why we have more incorrect predictions.

76
00:03:52,666 --> 00:03:57,566
Because in Python we had, you know,
red rectangles here, red rectangles here.

77
00:03:58,166 --> 00:04:01,166
There was also a red rectangle here

78
00:04:01,700 --> 00:04:02,233
and here.

79
00:04:02,233 --> 00:04:04,900
We didn't
actually specify any extra parameters,

80
00:04:04,900 --> 00:04:08,433
but this amazing rpart library,
and that's why it's so popular,

81
00:04:08,800 --> 00:04:11,800
chose the right parameters, the right
default parameters to,

82
00:04:12,233 --> 00:04:13,733
you know, prevent overfitting.
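For reference, rpart.control() defaults include minsplit = 20 (a node needs at least 20 observations before a split is attempted), cp = 0.01 and maxdepth = 30. The toy Python sketch below illustrates only the minsplit idea: grow is a hypothetical helper that builds a one-feature tree using Gini impurity and returns its leaf count, and it ignores cp, maxdepth and everything else rpart does.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2 p (1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def grow(points, min_split):
    """Grow a 1-D classification tree on (x, label) pairs and return its leaf count.

    A node is split only when it is impure AND holds at least min_split points,
    a simplified version of the minsplit idea (no cp, no maxdepth, no pruning).
    """
    labels = [y for _, y in points]
    if len(points) < min_split or len(set(labels)) <= 1:
        return 1  # leaf
    points = sorted(points)
    best = None  # (weighted impurity, split index)
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # no threshold separates equal x values
        left = [y for _, y in points[:i]]
        right = [y for _, y in points[i:]]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, i)
    if best is None:
        return 1  # all x values identical: cannot split
    i = best[1]
    return grow(points[:i], min_split) + grow(points[i:], min_split)


# One feature, mostly 0 on the left and 1 on the right, with one odd 1 at x = 4.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 1), (5, 0), (6, 0), (7, 1), (8, 1), (9, 1)]
print(grow(points, min_split=2))  # 4 leaves: x = 4 gets a leaf of its own
print(grow(points, min_split=5))  # 3 leaves: x = 4 stays in a 3-point leaf
```

With min_split=5 the node holding x = 4, 5, 6 is too small to split, so its majority label (0) misclassifies x = 4 in training. Accepting that one training error in exchange for a simpler tree is the same trade-off the video describes.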

83
00:04:13,733 --> 00:04:16,233
Because here
we clearly don't have overfitting.

84
00:04:16,233 --> 00:04:19,433
We had overfitting with Python
because of all the red rectangles here

85
00:04:19,433 --> 00:04:23,266
that were desperately trying
to catch every user in the right category.

86
00:04:23,500 --> 00:04:24,633
But here it's not the case.

87
00:04:24,633 --> 00:04:28,400
And here it's doing a terrific job at,
you know,

88
00:04:28,500 --> 00:04:30,933
classifying correctly
most of the red points here,

89
00:04:30,933 --> 00:04:33,733
most of the green points
here in the green region.

90
00:04:33,733 --> 00:04:38,566
As well as these green users here,
who couldn't be classified well by linear

91
00:04:38,566 --> 00:04:43,366
classifiers such as logistic regression
or linear kernel SVM.

92
00:04:44,266 --> 00:04:45,766
So here it's doing a pretty good job.

93
00:04:45,766 --> 00:04:47,333
But still we have some incorrect
predictions.

94
00:04:47,333 --> 00:04:49,766
That's
because these points are difficult to classify

95
00:04:49,766 --> 00:04:52,800
well if you want to prevent
overfitting on your data.

96
00:04:53,433 --> 00:04:56,433
So even though we have 17
incorrect predictions,

97
00:04:56,533 --> 00:05:00,566
that's a very good classification
we have here, okay.

98
00:05:00,566 --> 00:05:02,400
But now let's look at the test
set results.

99
00:05:02,400 --> 00:05:05,333
And I'm actually not worried about that
because

100
00:05:05,333 --> 00:05:08,600
since we don't have overfitting here,
then that means that we're

101
00:05:08,600 --> 00:05:11,666
very likely to get good results
on the test set as well.

102
00:05:11,700 --> 00:05:13,466
Let's check it out.

103
00:05:13,466 --> 00:05:16,200
Test set and execute.

104
00:05:16,200 --> 00:05:17,400
Let's see.

105
00:05:17,400 --> 00:05:19,000
And here is the test set, okay.

106
00:05:19,000 --> 00:05:21,333
So as I told you this looks very good.

107
00:05:21,333 --> 00:05:25,133
This is the set on which we have
those 17 incorrect predictions.

108
00:05:25,133 --> 00:05:27,900
You can count them
if you want. You will find 17.

109
00:05:27,900 --> 00:05:31,100
And it's classifying
most of the red users in the red region

110
00:05:31,266 --> 00:05:33,633
and most of the green users
in the green region.

111
00:05:33,633 --> 00:05:35,300
That's quite okay.

112
00:05:35,300 --> 00:05:38,400
By the way, we can see that most of
the incorrect predictions are here.

113
00:05:38,400 --> 00:05:41,500
We can see that we have many red points
in the green region.

114
00:05:41,500 --> 00:05:43,200
So that's unlucky.

115
00:05:43,200 --> 00:05:43,433
Good.

116
00:05:43,433 --> 00:05:47,600
As I told you, we would rather prevent
overfitting than try to,

117
00:05:47,833 --> 00:05:50,966
you know, reduce the number
of incorrect predictions to zero.