1
00:00:00,333 --> 00:00:01,033
Perfect.

2
00:00:01,033 --> 00:00:04,933
The next step is to, of course,
create an instance of this class,

3
00:00:04,933 --> 00:00:09,400
which will be an object representing
exactly that Naive Bayes model.

4
00:00:09,700 --> 00:00:10,500
And so there we go.

5
00:00:10,500 --> 00:00:13,200
We're going to call this,
as usual, classifier,

6
00:00:13,200 --> 00:00:16,500
in order to be coherent
with the next sections

7
00:00:16,500 --> 00:00:20,000
of this implementation, and mostly
so that we don't have to change anything.

8
00:00:20,000 --> 00:00:20,400
Right.

9
00:00:20,400 --> 00:00:21,933
Because then we call this

10
00:00:21,933 --> 00:00:25,800
classifier variable to predict the results
and visualize the result.

11
00:00:26,100 --> 00:00:27,866
So there we go classifier here.

12
00:00:27,866 --> 00:00:32,400
And then we're going to call this
GaussianNB class

13
00:00:32,633 --> 00:00:36,033
in order to create
indeed this Naive Bayes model.

14
00:00:36,600 --> 00:00:37,733
Okay. Perfect.

15
00:00:37,733 --> 00:00:39,933
And now you know how to finish this.

16
00:00:39,933 --> 00:00:44,300
We need to take our classifier again,
from which we're going to call the fit

17
00:00:44,633 --> 00:00:49,166
method, which will train this classifier
on the training set

18
00:00:49,400 --> 00:00:54,733
composed of, indeed, X_train and y_train.
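
As a rough sketch of the two steps just described (create the classifier, then fit it), the code could look like the block below. The small synthetic arrays are only a hypothetical stand-in for the lecture's training set, which would normally come from the earlier preprocessing cells.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in data: in the lecture, X_train and y_train come from
# the earlier preprocessing cells, not from this synthetic sample.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(-1.0, 1.0, size=(50, 2)),  # class 0 samples
    rng.normal(1.0, 1.0, size=(50, 2)),   # class 1 samples
])
y_train = np.array([0] * 50 + [1] * 50)

# Create an instance of the GaussianNB class, named classifier as usual.
classifier = GaussianNB()

# Train the classifier on the training set.
classifier.fit(X_train, y_train)
```

Keeping the variable name classifier means the later prediction and visualization cells can stay unchanged.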

19
00:00:55,133 --> 00:00:55,500
Right?

20
00:00:55,500 --> 00:00:57,600
I hope you did it even faster than me,

21
00:00:57,600 --> 00:01:00,333
because indeed,
this is exactly the same as before.

22
00:01:00,333 --> 00:01:02,533
And now you are also independent

23
00:01:02,533 --> 00:01:06,233
and know how to find the information
you need in the API.

24
00:01:07,033 --> 00:01:09,533
Okay, great. So once again there we go.

25
00:01:09,533 --> 00:01:14,100
That implementation is over
and we're ready to get the final result.

26
00:01:14,100 --> 00:01:17,233
And mostly we're ready to find out
if we can beat

27
00:01:17,233 --> 00:01:20,233
the record accuracy, you know, of 93%.

28
00:01:20,300 --> 00:01:22,766
So I can't wait to see.
So let's do it right now.

29
00:01:22,766 --> 00:01:27,000
Let's click
this folder button to upload a data set.

30
00:01:27,100 --> 00:01:27,433
Right.

31
00:01:27,433 --> 00:01:29,366
We have to do it in order to train,

32
00:01:29,366 --> 00:01:31,666
indeed, that Naive Bayes
model on the training set.

33
00:01:31,666 --> 00:01:36,466
Right now your notebook is connecting
to a runtime to enable file browsing.

34
00:01:36,733 --> 00:01:39,866
And once again, in a second
we should get the Up button.

35
00:01:39,866 --> 00:01:42,666
There we go. So we're going to click it.

36
00:01:42,666 --> 00:01:45,800
And here we are in the kernel SVM folder.

37
00:01:45,900 --> 00:01:48,266
So let me show you the path once again.

38
00:01:48,266 --> 00:01:49,733
Please find your whole machine learning folder.

39
00:01:49,733 --> 00:01:54,000
It is that folder which you could download
in the previous tutorial if not already.

40
00:01:54,400 --> 00:01:57,233
And then inside we're going to go to
Part 3, Classification.

41
00:01:57,233 --> 00:01:59,033
Then section 18.

42
00:01:59,033 --> 00:02:00,600
We're making good progress here.

43
00:02:00,600 --> 00:02:05,200
Naive Bayes,
then Python, and then

44
00:02:05,200 --> 00:02:08,633
Social_Network_Ads.csv. Okay, we press OK.

45
00:02:08,633 --> 00:02:10,800
Here we have the data set. All good.

46
00:02:10,800 --> 00:02:16,800
And now we can run everything
in order to get indeed our new result.

47
00:02:16,800 --> 00:02:19,033
So let's do this: Run all.

48
00:02:19,033 --> 00:02:21,166
And now all the cells are running.

49
00:02:21,166 --> 00:02:22,800
And especially this one. There we go.

50
00:02:22,800 --> 00:02:26,066
We now have our Gaussian
Naive Bayes model.

51
00:02:26,433 --> 00:02:30,100
And well let's see the results one by one
starting with this one.

52
00:02:30,100 --> 00:02:32,633
So that's the prediction
of a single result.

53
00:02:32,633 --> 00:02:34,266
You know that first customer of the test

54
00:02:34,266 --> 00:02:38,700
set of age
30 and estimated salary $87,000.

55
00:02:38,933 --> 00:02:41,966
And remember, in y_test,
the real outcome

56
00:02:41,966 --> 00:02:45,000
was zero meaning
that this customer didn't buy the SUV.

57
00:02:45,200 --> 00:02:48,466
And that's the prediction,
which is indeed the correct prediction.

58
00:02:48,900 --> 00:02:51,766
And then when predicting the test results,
well,

59
00:02:51,766 --> 00:02:55,200
once again we see that
we have a lot of correct predictions.

60
00:02:55,200 --> 00:02:56,233
All this is correct.

61
00:02:56,233 --> 00:02:57,300
All this is correct.

62
00:02:57,300 --> 00:02:59,400
This is our first incorrect prediction.

63
00:02:59,400 --> 00:03:02,866
Another one here
and then another one here.

64
00:03:02,866 --> 00:03:05,933
All correct. All correct.
Another one here.

65
00:03:06,300 --> 00:03:09,566
Oh, I'm not sure
we're actually going to beat that accuracy.

66
00:03:09,566 --> 00:03:14,333
We seem to have more than seven incorrect
predictions at first. I'm not sure.

67
00:03:14,333 --> 00:03:15,600
But let's see, let's see.

68
00:03:15,600 --> 00:03:18,233
Well, that's exactly
what we're about to find out right now.

69
00:03:18,233 --> 00:03:19,333
So are you ready?

70
00:03:19,333 --> 00:03:23,166
The question is,
will we beat the accuracy of 93%?

71
00:03:23,166 --> 00:03:27,966
Which was the best accuracy resulting
from both K-NN and the kernel SVM.

72
00:03:28,200 --> 00:03:30,833
And so let's see what we get
with Naive Bayes.

73
00:03:30,833 --> 00:03:33,833
And no,
unfortunately we don't beat the record.

74
00:03:34,100 --> 00:03:38,633
Indeed, the accuracy
we get with that Naive Bayes model is 90%,

75
00:03:38,633 --> 00:03:42,666
which indeed beats logistic regression,
but does exactly the same

76
00:03:42,666 --> 00:03:46,000
as the classic SVM model
with a linear kernel.
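
To make the accuracy figure concrete, here is a minimal sketch of how an accuracy like this is computed from the true and predicted labels. The ten labels below are hypothetical and are not the lecture's test set; they are chosen only to show how one incorrect prediction out of ten gives 90%.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels for illustration only (not the lecture's test set).
y_test = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])  # real outcomes
y_pred = np.array([0, 0, 1, 0, 0, 1, 0, 1, 1, 0])  # predicted outcomes

# Count the incorrect predictions, as done by eye in the lecture.
n_incorrect = int((y_test != y_pred).sum())

# Accuracy is the share of correct predictions.
accuracy = accuracy_score(y_test, y_pred)
print(n_incorrect, accuracy)
```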

77
00:03:46,666 --> 00:03:50,566
All right, but still, I think we will get
nice visualization results.

78
00:03:50,633 --> 00:03:53,733
That's the code cell where we visualize
the training set results.

79
00:03:53,733 --> 00:03:56,266
And well,
this time we got the results pretty fast.

80
00:03:56,266 --> 00:03:59,100
You can see that
the cell is already executed.

81
00:03:59,100 --> 00:04:02,700
So let's see
I can show you that indeed the Naive Bayes

82
00:04:02,700 --> 00:04:04,833
curve is pretty nice right?

83
00:04:04,833 --> 00:04:07,800
It is a nice smooth curve right?

84
00:04:07,800 --> 00:04:10,533
That catches well, indeed, these

85
00:04:10,533 --> 00:04:13,500
green customers
here, you know, the ones who in reality

86
00:04:13,500 --> 00:04:18,400
bought the SUV, in the right green region.
But unfortunately, you know, it separated

87
00:04:18,400 --> 00:04:19,066
the two classes,

88
00:04:19,066 --> 00:04:19,233
you know,

89
00:04:19,233 --> 00:04:23,100
with these two prediction regions,
a bit largely, you know, not very precisely.

90
00:04:23,100 --> 00:04:26,533
And that's why we don't get an accuracy
that is higher than 93%.

91
00:04:26,766 --> 00:04:30,066
But still, you know, we made progress
with respect to logistic regression

92
00:04:30,066 --> 00:04:33,066
because indeed
remember that for logistic regression

93
00:04:33,200 --> 00:04:36,700
these green customers here could not be
well classified.

94
00:04:36,766 --> 00:04:38,300
Right, because of the straight line,

95
00:04:38,300 --> 00:04:40,366
they fell in the red region.

96
00:04:40,366 --> 00:04:44,266
And here in our Naive Bayes
implementation, well these green customers

97
00:04:44,266 --> 00:04:47,200
fall in the right region.
So at least it corrected that.

98
00:04:47,200 --> 00:04:51,666
But since here there is kind of
a large margin, well these green customers

99
00:04:51,666 --> 00:04:55,633
which were correctly classified
with the kernel SVM,

100
00:04:55,633 --> 00:04:58,933
if you remember right,
these are the training set results.

101
00:04:59,300 --> 00:05:01,266
Right.
So I'm talking about these ones here.

102
00:05:01,266 --> 00:05:04,466
They are correctly classified
except these two and the third one.

103
00:05:04,466 --> 00:05:06,400
But clearly, with Naive Bayes,

104
00:05:06,400 --> 00:05:09,000
well, they fall into the wrong region. Okay.

105
00:05:09,000 --> 00:05:13,566
But anyway, at least you see the
prediction curve of the Naive Bayes model.

106
00:05:13,566 --> 00:05:18,166
And mostly you see that Naive Bayes
model is clearly a nonlinear classifier.

107
00:05:18,400 --> 00:05:22,800
And, you know, in some other situations,
because Naive Bayes is causing less

108
00:05:22,800 --> 00:05:27,300
overfitting, well, in some situations
it will do better than your other models.

109
00:05:27,300 --> 00:05:30,033
That's why it's always very important
to try all of them.

110
00:05:30,033 --> 00:05:33,333
And remember, at the end of this part,
I will actually deploy

111
00:05:33,333 --> 00:05:37,100
all our models with new simplified code
templates for each model.

112
00:05:37,100 --> 00:05:39,000
You know, without all the prints
and everything,

113
00:05:39,000 --> 00:05:42,566
in order to deploy them in a flash,
so that we can quickly figure out

114
00:05:42,733 --> 00:05:46,266
what is the best classification model
for any data set.

115
00:05:46,266 --> 00:05:48,300
You know,
regardless of the number of features.