1
00:00:00,066 --> 00:00:01,033
Naive Bayes is ready.

2
00:00:01,033 --> 00:00:02,900
So now let's move on to decision tree
classification.

3
00:00:02,900 --> 00:00:04,066
And there you go.

4
00:00:04,066 --> 00:00:07,700
So that happens when you have too many
sessions open in Google Colab.

5
00:00:07,700 --> 00:00:11,533
I left it on purpose because I'm sure
you will also encounter this situation

6
00:00:11,766 --> 00:00:14,366
and wonder what to do about it. Well,
no worries at all.

7
00:00:14,366 --> 00:00:16,133
That's very simple since Google

8
00:00:16,133 --> 00:00:20,133
Colab actually allows a maximum of
five sessions to run at the same time.

9
00:00:20,166 --> 00:00:23,166
Well, what we'll just do here
is that for decision tree

10
00:00:23,166 --> 00:00:26,100
classification and random forest,
because here that's the same.

11
00:00:26,100 --> 00:00:28,266
Well we'll just close them for now.

12
00:00:28,266 --> 00:00:28,966
All right.

13
00:00:28,966 --> 00:00:32,366
And we will reopen them
after we get the best accuracy

14
00:00:32,366 --> 00:00:34,633
from these five models, okay?

15
00:00:34,633 --> 00:00:35,966
So we'll get the best from these five.

16
00:00:35,966 --> 00:00:37,833
Then we'll run the last two:

17
00:00:37,833 --> 00:00:40,700
Decision tree classification
and random forest classification.

18
00:00:40,700 --> 00:00:42,133
And we'll see which one wins.

19
00:00:42,133 --> 00:00:43,433
Which one is the big winner.

20
00:00:43,433 --> 00:00:44,266
All right.

21
00:00:44,266 --> 00:00:47,733
So now all our implementations are ready.

22
00:00:47,733 --> 00:00:51,600
So of course,
the next natural step now is to run

23
00:00:51,600 --> 00:00:55,400
all these cells to get all the accuracies
of these five models.

24
00:00:55,466 --> 00:00:58,200
All right
so let's start with logistic regression.

25
00:00:58,200 --> 00:00:58,966
Are you ready?

26
00:00:58,966 --> 00:01:03,866
Let's click Runtime and then Run all,
and all the cells will run.

27
00:01:04,133 --> 00:01:06,300
And we shouldn't get any error. Indeed.

28
00:01:06,300 --> 00:01:07,633
And we get wow.

29
00:01:07,633 --> 00:01:11,400
We start with a very good accuracy
because we get an accuracy

30
00:01:11,400 --> 00:01:15,100
close to 95% for logistic regression.

31
00:01:15,100 --> 00:01:15,500
All right.

32
00:01:15,500 --> 00:01:15,800
And indeed

33
00:01:15,800 --> 00:01:20,400
we only have four plus five equals
nine errors, nine incorrect predictions.

34
00:01:20,666 --> 00:01:21,466
Well pretty good.

35
00:01:21,466 --> 00:01:24,433
So let's see what we get
with the next ones. You know.

36
00:01:24,433 --> 00:01:26,700
And it's really reassuring
that we get these high accuracies

37
00:01:26,700 --> 00:01:29,100
because we were doing predictions
for breast cancer.

38
00:01:29,100 --> 00:01:33,200
So we really want to,
you know, be accurate on predicting

39
00:01:33,200 --> 00:01:36,400
if patients have a benign
or malignant tumor.

40
00:01:36,566 --> 00:01:37,066
Okay.

41
00:01:37,066 --> 00:01:39,900
So let's hope that we can do
even better than this.

42
00:01:39,900 --> 00:01:41,133
All right. So I'm going to scroll back up.

43
00:01:41,133 --> 00:01:44,200
Oh no actually I'm going to leave
that here in case we forget.

44
00:01:44,633 --> 00:01:46,200
So 0.947.

45
00:01:46,200 --> 00:01:48,733
Now let's move on to k nearest neighbors.

46
00:01:48,733 --> 00:01:51,300
Let's click Runtime here, then Run all.

47
00:01:51,300 --> 00:01:52,366
And there we go my friends.

48
00:01:52,366 --> 00:01:56,666
We're about to get the next accuracy,
which is exactly the same one.

49
00:01:56,666 --> 00:01:59,833
I just checked that,
you know, I put the right model here.

50
00:01:59,833 --> 00:02:04,500
But we have exactly the same one actually,
which you know, can totally happen

51
00:02:04,500 --> 00:02:07,500
because you just have to make nine
incorrect predictions.

52
00:02:07,500 --> 00:02:10,833
You know, two classification models
can make the same number of incorrect

53
00:02:10,833 --> 00:02:14,100
predictions, and therefore you will end up
with the exact same accuracy.

54
00:02:14,400 --> 00:02:15,400
So that's very interesting.

55
00:02:15,400 --> 00:02:17,833
Actually,
this is the first time I've observed this.

56
00:02:17,833 --> 00:02:18,166
All right.

57
00:02:18,166 --> 00:02:22,433
So well let's still hope we can beat this
with our next classification models.

58
00:02:22,666 --> 00:02:25,833
So now with Support Vector
Machine, we're going to click Runtime.

59
00:02:25,833 --> 00:02:30,566
And we're going to click Run All
to see the next accuracy we're getting.

60
00:02:30,566 --> 00:02:32,533
And all right interesting.

61
00:02:32,533 --> 00:02:36,766
This time we get a lower accuracy
but still a very very good one.

62
00:02:36,766 --> 00:02:40,966
And you know that makes me very excited
to see what kernel SVM is going to do.

63
00:02:41,100 --> 00:02:42,933
You know with a nonlinear kernel.

64
00:02:42,933 --> 00:02:46,966
Because indeed here we get ten incorrect
predictions as opposed to nine

65
00:02:46,966 --> 00:02:50,633
incorrect predictions before with logistic
regression and K-nearest neighbors.

66
00:02:50,933 --> 00:02:53,566
But here with SVM, it's still very,
very good.

67
00:02:53,566 --> 00:02:56,500
We get 94% accuracy.

68
00:02:56,500 --> 00:02:58,566
All right. And now let's try kernel SVM.

69
00:02:58,566 --> 00:03:00,600
I look forward to seeing what
we're going to get.

70
00:03:00,600 --> 00:03:02,066
So let's click Runtime.

71
00:03:02,066 --> 00:03:03,800
And let's click Run all.

72
00:03:03,800 --> 00:03:07,433
And the accuracy is... yes, we beat it:

73
00:03:07,800 --> 00:03:11,400
95%, 95.3%.

74
00:03:11,400 --> 00:03:12,066
That's excellent.

75
00:03:12,066 --> 00:03:14,500
And that was actually expected. Kernel SVM,

76
00:03:14,500 --> 00:03:16,333
you know, is really, really good.

77
00:03:16,333 --> 00:03:18,166
You will get good results with this
because you know,

78
00:03:18,166 --> 00:03:21,900
we get flexibility on the curve
to catch the correct predictions.

79
00:03:22,300 --> 00:03:23,400
All right. So very very good.

80
00:03:23,400 --> 00:03:26,466
But we still have three
other classification models.

81
00:03:26,466 --> 00:03:29,866
Let's see what we're going to get
with them starting with Naive Bayes.

82
00:03:30,366 --> 00:03:30,833
All right.

83
00:03:30,833 --> 00:03:33,833
So let's click Runtime and then Run all,

84
00:03:33,900 --> 00:03:37,033
and the next accuracy is... okay.

85
00:03:37,033 --> 00:03:43,200
So, like SVM, ten incorrect predictions,
resulting in an accuracy of 94%.

86
00:03:43,366 --> 00:03:44,133
All right.

87
00:03:44,133 --> 00:03:45,133
That's okay.

88
00:03:45,133 --> 00:03:47,933
And now
well we still have two more chances.

89
00:03:47,933 --> 00:03:49,666
One with decision tree classification

90
00:03:49,666 --> 00:03:52,166
and the other one
with random forest classification.

91
00:03:52,166 --> 00:03:55,333
So now what we're going to do
is we're going to click Runtime

92
00:03:55,333 --> 00:03:58,333
here, then Manage sessions.

93
00:03:58,433 --> 00:04:01,100
Then we're going to terminate
all these sessions here.

94
00:04:01,100 --> 00:04:06,700
Because, you know, we're only allowed to run
a maximum of five sessions at the same time.

95
00:04:07,033 --> 00:04:09,000
So I terminated all of them.

96
00:04:09,000 --> 00:04:11,566
You can close it
now. And we still keep the accuracy.

97
00:04:11,566 --> 00:04:12,833
So that's totally fine right.

98
00:04:12,833 --> 00:04:14,866
We keep the accuracy everywhere here.

99
00:04:14,866 --> 00:04:17,633
So we can totally compare
with our last two.

100
00:04:17,633 --> 00:04:18,566
So let's do this.

101
00:04:18,566 --> 00:04:21,866
Let's open first you know
random forest classification because

102
00:04:22,500 --> 00:04:25,500
you know it gives them in that order.

103
00:04:25,666 --> 00:04:27,266
Well actually
that doesn't really matter here.

104
00:04:27,266 --> 00:04:30,900
But anyway let's open
decision tree classification now.

105
00:04:31,466 --> 00:04:32,233
All right.

106
00:04:32,233 --> 00:04:34,166
And here we go.

107
00:04:34,166 --> 00:04:36,933
We have our last two models
I can't wait to try them

108
00:04:36,933 --> 00:04:39,266
because I can't wait to see
who is going to be the big winner.

109
00:04:39,266 --> 00:04:44,233
And if we can beat
that best accuracy of 95.3%.

110
00:04:44,700 --> 00:04:45,066
All right.

111
00:04:45,066 --> 00:04:47,433
So the next step is not to click Runtime here.

112
00:04:47,433 --> 00:04:51,366
Because, remember, we haven't yet uploaded
the dataset into the notebook.

113
00:04:51,566 --> 00:04:54,566
So no need to refresh here.
All good. Upload,

114
00:04:54,800 --> 00:04:57,700
then data.csv, Open.

115
00:04:57,700 --> 00:05:00,800
Then let's do quickly
the same for random forest classification.

116
00:05:00,800 --> 00:05:03,866
But first let's not forget to replace this
with data.csv.

117
00:05:03,866 --> 00:05:07,366
All good. Now
random forest classification:

118
00:05:07,900 --> 00:05:10,500
little folder here, then upload,

119
00:05:10,500 --> 00:05:14,333
then data.csv, Open, then OK.

120
00:05:14,633 --> 00:05:17,933
And then let's replace this
with data.csv.

121
00:05:18,600 --> 00:05:19,066
All right.

122
00:05:19,066 --> 00:05:22,400
And now my friends
we're about to reveal the final podium.

123
00:05:22,400 --> 00:05:26,066
You know the three best models
with the three highest accuracies.

124
00:05:26,266 --> 00:05:29,433
So let's do this starting with decision
tree classification.

125
00:05:29,766 --> 00:05:32,500
So let's click Runtime here, then Run all.

126
00:05:32,500 --> 00:05:34,600
And now there we go.

127
00:05:34,600 --> 00:05:36,900
Wow that's incredible.

128
00:05:36,900 --> 00:05:41,100
We actually beat the accuracy.
I didn't... I really didn't expect this.

129
00:05:41,100 --> 00:05:44,066
Usually decision tree
classification is not the winner.

130
00:05:44,066 --> 00:05:46,466
But here we have
a beautiful exception to the rule.

131
00:05:46,466 --> 00:05:52,300
Indeed, we get a beautiful
accuracy of almost 96%: 95.9%.