1
00:00:00,100 --> 00:00:01,633
And now let's do the whole thing

2
00:00:01,633 --> 00:00:04,633
and you're going to see that
we won't get the error.

3
00:00:05,000 --> 00:00:07,466
So let's select this. Okay.

4
00:00:07,466 --> 00:00:12,200
So that's all the data preprocessing
steps, now with the encoding included.

5
00:00:12,566 --> 00:00:15,200
So execute. All right.

6
00:00:15,200 --> 00:00:18,766
Now we're going to fit the Naive Bayes
model to the training set.

7
00:00:19,900 --> 00:00:20,866
Execute.

8
00:00:20,866 --> 00:00:22,033
All good.

9
00:00:22,033 --> 00:00:25,200
Now we're going to create our vector
of predictions, y_pred.

10
00:00:25,866 --> 00:00:26,366
Okay.

11
00:00:26,366 --> 00:00:31,900
You're going to see that if I type y_pred
here, we will now have the predictions.

12
00:00:32,266 --> 00:00:36,433
So we can compare it to y_test,
which is the third column of the test set.

13
00:00:37,033 --> 00:00:38,700
Okay. So we can compare them.

14
00:00:38,700 --> 00:00:39,600
But let's not do this.

15
00:00:39,600 --> 00:00:42,333
Let's just get to the point
that I want to show you.

16
00:00:42,333 --> 00:00:46,800
Now the confusion matrix
is going to be created without any error.

17
00:00:46,833 --> 00:00:49,766
So let's select this and execute.

18
00:00:49,766 --> 00:00:54,300
And now as you can see the confusion
matrix is created without any problem.

19
00:00:54,866 --> 00:00:58,666
And now we can have a look
at the number of incorrect predictions,

20
00:00:58,666 --> 00:01:02,433
which is 7 + 7 = 14
incorrect predictions.

21
00:01:03,166 --> 00:01:03,733
Not bad.

22
00:01:03,733 --> 00:01:06,266
Out of the 100
observations of the test set.

23
00:01:06,266 --> 00:01:07,500
Okay, great.

24
00:01:07,500 --> 00:01:11,600
And now we're finally getting to the fun
part, which is to visualize the results.

25
00:01:11,833 --> 00:01:14,333
Okay.
So you can pause the video.

26
00:01:14,333 --> 00:01:15,766
And now I'll just select this.

27
00:01:17,033 --> 00:01:18,633
So let's press Command or Control

28
00:01:18,633 --> 00:01:20,666
plus Enter to execute.

29
00:01:20,666 --> 00:01:24,300
And here are our Naive Bayes
graphic results.

30
00:01:24,633 --> 00:01:29,166
Isn't it beautiful how this prediction
boundary is a smooth curve?

31
00:01:29,233 --> 00:01:32,633
It classifies the data set quite well,

32
00:01:32,900 --> 00:01:34,666
that is, a data set of observations

33
00:01:34,666 --> 00:01:36,300
that are not linearly separable.

34
00:01:36,300 --> 00:01:38,566
It's kind of like the kernel SVM curve.

35
00:01:38,566 --> 00:01:42,900
You know, it's a beautiful smooth curve
that manages to catch those green users

36
00:01:42,900 --> 00:01:46,200
that couldn't be caught
by linear classifiers

37
00:01:46,200 --> 00:01:47,900
because we had a straight line

38
00:01:47,900 --> 00:01:50,533
and therefore it couldn't
catch the green users here

39
00:01:50,533 --> 00:01:53,533
and put them in the green category,
so they ended up in the red category.

40
00:01:53,700 --> 00:01:58,133
But thanks to this curve, we can see that
it's making fewer incorrect predictions.

41
00:01:58,133 --> 00:02:01,733
But there are still some, like those one, two, three here.

42
00:02:02,266 --> 00:02:06,300
Regarding these users here, these are
all the users with a low estimated salary

43
00:02:07,000 --> 00:02:10,066
that were incorrectly predicted
by the linear classifiers.

44
00:02:10,566 --> 00:02:13,733
However, it's
still making a few mistakes here.

45
00:02:13,766 --> 00:02:17,733
We would have liked to have a lower curve
here, like the curve starting from here.

46
00:02:18,133 --> 00:02:21,800
But that's what Naive Bayes could do here,
and that's already quite a good job.

47
00:02:22,333 --> 00:02:25,566
So now let's see what it does
on the test set.

48
00:02:26,133 --> 00:02:29,000
Here we have the test set results code.

49
00:02:29,000 --> 00:02:30,900
Let's execute it.

50
00:02:30,900 --> 00:02:32,733
And here is the test set.

51
00:02:32,733 --> 00:02:35,333
If the execution of this code
is taking too much time,

52
00:02:35,333 --> 00:02:37,633
you can try to take a lower resolution.

53
00:02:37,633 --> 00:02:38,633
Because right now you can see that

54
00:02:38,633 --> 00:02:42,966
we have a very high resolution with this
by = 0.01 step here.

55
00:02:43,033 --> 00:02:46,033
We cannot see the pixels here
thanks to this resolution.

56
00:02:46,500 --> 00:02:50,500
If you take a 0.1 resolution,
the script will execute much faster.

57
00:02:50,500 --> 00:02:52,466
But then you will see the pixel points.

58
00:02:52,466 --> 00:02:53,700
So it's up to you.

59
00:02:53,700 --> 00:02:56,866
So the test set results
are actually not bad as well.

60
00:02:57,366 --> 00:02:59,966
It did a pretty good job classifying
these green users

61
00:02:59,966 --> 00:03:04,900
here into the right green category,
but there are still some incorrect predictions

62
00:03:04,900 --> 00:03:06,033
that persist,

63
00:03:06,033 --> 00:03:09,033
since those green points here
stay in the red region.

64
00:03:09,600 --> 00:03:12,633
All right,
so those are the test set graphic results.

65
00:03:12,633 --> 00:03:14,100
I hope you enjoyed what you saw.

66
00:03:14,100 --> 00:03:17,400
We're going to have other
surprises with other classifiers.

67
00:03:17,400 --> 00:03:20,600
You'll see that we will get very different
kinds of prediction boundaries

68
00:03:20,600 --> 00:03:24,600
when we look at the decision tree
classifiers and the random forest

69
00:03:24,600 --> 00:03:25,566
classifiers.

70
00:03:25,566 --> 00:03:27,566
So I look forward to showing that to you.

71
00:03:27,566 --> 00:03:29,400
And until then, enjoy machine learning.