1
00:00:00,733 --> 00:00:01,833
Hello and welcome back to the

2
00:00:01,833 --> 00:00:03,033
course on Machine Learning.

3
00:00:03,033 --> 00:00:06,100
This is Kirill Eremenko,
and in today's tutorial we're

4
00:00:06,100 --> 00:00:08,433
talking about the Naive Bayes classifier.

5
00:00:08,433 --> 00:00:10,866
This is a very interesting machine
learning algorithm.

6
00:00:10,866 --> 00:00:12,600
And today we're going to get to

7
00:00:12,600 --> 00:00:15,300
know it on a very intuitive level.

8
00:00:15,300 --> 00:00:16,133
And in line

9
00:00:16,133 --> 00:00:19,900
with the SuperDataScience mission,
which is making the complex simple,

10
00:00:20,000 --> 00:00:24,500
we're going to break down this complex
topic into simple steps and

11
00:00:24,500 --> 00:00:26,666
bite-sized pieces of information.

12
00:00:26,666 --> 00:00:29,200
I've got some very exciting slides
prepared ahead, so

13
00:00:29,200 --> 00:00:31,333
let's dive straight into it.

14
00:00:31,333 --> 00:00:31,633
All right.

15
00:00:31,633 --> 00:00:33,600
So here we've got the Bayes theorem.

16
00:00:33,600 --> 00:00:35,466
And this is something we talked about
in the previous tutorials.

17
00:00:35,466 --> 00:00:38,400
So by now we should be quite comfortable
with the concept.

18
00:00:38,400 --> 00:00:40,900
How are we going to apply it to create a

19
00:00:40,900 --> 00:00:42,533
machine learning algorithm?

20
00:00:42,533 --> 00:00:44,133
Well let's have a look here.

21
00:00:44,133 --> 00:00:45,400
We've got a data set.

22
00:00:45,400 --> 00:00:47,133
So it has two features.

23
00:00:47,133 --> 00:00:48,866
It has x1 and x2.

24
00:00:48,866 --> 00:00:50,200
And there are two categories: category

25
00:00:50,200 --> 00:00:52,500
one, which is red, and category two,
which is green.

26
00:00:52,500 --> 00:00:55,900
But instead of working with these abstract
terms, we're going to convert them into

27
00:00:56,200 --> 00:00:57,100
something that

28
00:00:57,100 --> 00:00:58,400
we can understand a bit better,

29
00:00:58,400 --> 00:01:01,666
something that's a bit easier to operate
with or to talk about.

30
00:01:01,666 --> 00:01:03,066
So we're going to call the

31
00:01:03,066 --> 00:01:05,100
x2 variable salary.

32
00:01:05,100 --> 00:01:07,433
And the x1 variable is going to be age.

33
00:01:07,433 --> 00:01:10,533
So basically
we're representing observations, or people,

34
00:01:10,533 --> 00:01:14,600
that are part of our data
set in terms of their age and salary.

35
00:01:14,600 --> 00:01:17,600
As you can see
we have 30 people here on this chart.

36
00:01:17,700 --> 00:01:21,266
And the categories, we're going to replace
them: red will be Walks, meaning that person

37
00:01:21,266 --> 00:01:23,700
walks to work, and green will be Drives.

38
00:01:23,700 --> 00:01:25,833
That means that person drives to work.

39
00:01:25,833 --> 00:01:28,466
And so now we get to a problem
to the machine learning challenge that

40
00:01:28,466 --> 00:01:29,766
we're going to be solving.

41
00:01:29,766 --> 00:01:33,866
What happens if we add a new observation,
a new data point into this set?

42
00:01:34,300 --> 00:01:37,366
How do we classify this new data point?

43
00:01:37,366 --> 00:01:41,400
So as you can tell, this is a supervised
machine learning algorithm

44
00:01:41,400 --> 00:01:44,800
because we're classifying something
based on previously known classes.

45
00:01:45,366 --> 00:01:47,433
And so the question is
is this person going to be

46
00:01:47,433 --> 00:01:49,366
classified as a person who walks to work?

47
00:01:49,366 --> 00:01:49,933
Or is this

48
00:01:49,933 --> 00:01:52,866
person going to be classified
as a person who drives to work?

49
00:01:52,866 --> 00:01:54,633
And the Naive Bayes

50
00:01:54,633 --> 00:01:55,500
algorithm is going

51
00:01:55,500 --> 00:01:57,433
to help us solve this challenge.

52
00:01:57,433 --> 00:01:59,366
All right.
So how are we going to approach this?

53
00:01:59,366 --> 00:02:00,666
We need a plan of attack.

54
00:02:00,666 --> 00:02:02,600
It is going to be quite
a complex approach.

55
00:02:02,600 --> 00:02:04,500
But at the same time
we're going to break it down into

56
00:02:04,500 --> 00:02:06,366
steps and it'll all make sense.

57
00:02:06,366 --> 00:02:08,100
It'll be very easy to understand.

58
00:02:08,100 --> 00:02:09,866
So our plan of attack.

59
00:02:09,866 --> 00:02:11,500
We're going to take the Bayes
theorem and we're

60
00:02:11,500 --> 00:02:12,833
going to apply it twice.

61
00:02:12,833 --> 00:02:15,666
First time
we're going to apply it to find out

62
00:02:15,666 --> 00:02:20,266
what is the probability
that this person walks given his features.

63
00:02:20,266 --> 00:02:24,400
And x over here is the features,
or represents the features,

64
00:02:24,800 --> 00:02:26,400
of that data point.

65
00:02:26,400 --> 00:02:28,600
So let's go back to the visualization
here.

66
00:02:28,600 --> 00:02:29,066
So here

67
00:02:29,066 --> 00:02:30,900
you can see that this is

68
00:02:30,900 --> 00:02:33,700
our new data point.
That person has a certain age.

69
00:02:33,700 --> 00:02:37,366
So let's say the age of that person
maybe is like 25 years old.

70
00:02:37,700 --> 00:02:39,000
And then they have a salary.

71
00:02:39,000 --> 00:02:41,966
So let's say this
salary is $30,000 per year.

72
00:02:41,966 --> 00:02:45,166
So those are features of this observation.

73
00:02:45,166 --> 00:02:46,966
Right now
we're only working with two variables

74
00:02:46,966 --> 00:02:49,233
just for simplicity's sake
so we can visualize

75
00:02:49,233 --> 00:02:50,600
things: age and salary.

76
00:02:50,600 --> 00:02:53,300
But in reality there could be many many
many more features.

77
00:02:53,300 --> 00:02:58,166
There could be features on
what industry they work in or how many

78
00:02:58,166 --> 00:03:02,333
years of education they have, or how long
they've had a driver's license for.

79
00:03:02,333 --> 00:03:04,766
And things like
how far away they live from work.

80
00:03:04,766 --> 00:03:06,766
So there could be lots of variables.

81
00:03:06,766 --> 00:03:07,933
But at the same time, right now

82
00:03:07,933 --> 00:03:10,133
we're only going to be dealing
with two age and salary.

83
00:03:10,133 --> 00:03:12,266
And regardless of how many variables you

84
00:03:12,266 --> 00:03:14,033
have, they will be called,

85
00:03:14,033 --> 00:03:15,600
and we're going to call them, features.

86
00:03:15,600 --> 00:03:19,133
So given the features of X,
so given the age of

87
00:03:19,133 --> 00:03:23,833
25 and the salary of $30,000,
and we'll talk in more detail

88
00:03:23,833 --> 00:03:27,000
about exactly what we mean by features
just in a moment.

89
00:03:27,000 --> 00:03:32,133
And so therefore this part represents
that person that we're trying to classify.

90
00:03:32,333 --> 00:03:35,866
What is the likelihood of a person
with those features x.

91
00:03:35,866 --> 00:03:40,033
So we know that we are taking somebody
with those features that we have

92
00:03:40,033 --> 00:03:41,200
in our new data point.

93
00:03:41,200 --> 00:03:43,200
What is the likelihood of them walking?

94
00:03:43,200 --> 00:03:44,333
And then we've got the right side.

95
00:03:44,333 --> 00:03:47,966
So we're going to talk through
each one of these as we calculate them.

96
00:03:48,000 --> 00:03:50,966
But for now let's just give them
their names going from right to left.

97
00:03:50,966 --> 00:03:54,433
So this one over here
is called the prior probability.

98
00:03:55,166 --> 00:03:58,466
And we're going to calculate that first
because it's the easiest to calculate.

99
00:03:58,900 --> 00:04:01,333
Next one is the marginal likelihood.

100
00:04:01,333 --> 00:04:03,633
And we're going to calculate that second.

101
00:04:03,633 --> 00:04:06,233
The third one is the likelihood.

102
00:04:06,233 --> 00:04:07,400
That's just the names that they have.

103
00:04:07,400 --> 00:04:09,000
And we're going to calculate that

104
00:04:09,000 --> 00:04:12,733
third. And finally, what we're looking for
is called the posterior probability.

105
00:04:12,833 --> 00:04:14,633
We're going to calculate that fourth.
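Written out, the formula being walked through on the slide is Bayes' theorem for the Walks class. The overbrace and underbrace labels are only annotations added here to match the four names just given; the left-hand side is the posterior probability:

```latex
\underbrace{P(\text{Walks} \mid X)}_{\text{posterior}}
  = \frac{\overbrace{P(X \mid \text{Walks})}^{\text{likelihood}} \;\cdot\; \overbrace{P(\text{Walks})}^{\text{prior}}}
         {\underbrace{P(X)}_{\text{marginal likelihood}}}
```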

106
00:04:14,633 --> 00:04:14,900
All right.

107
00:04:14,900 --> 00:04:17,200
So that's our plan of attack for step one.

108
00:04:17,200 --> 00:04:20,966
This is all still step one to calculate
the probability that somebody walks

109
00:04:20,966 --> 00:04:24,566
given those features x
that we see in our new data point.

110
00:04:25,333 --> 00:04:28,200
Next we're going to have step two where
we're going to calculate the probability

111
00:04:28,200 --> 00:04:32,933
that somebody drives given those features
x that we see in our new data point.

112
00:04:33,133 --> 00:04:35,966
And again here we'll have the prior probability,
which we'll calculate first,

113
00:04:35,966 --> 00:04:38,466
then the marginal likelihood
then the likelihood.

114
00:04:38,466 --> 00:04:40,366
And then we'll get to posterior
probability.

115
00:04:40,366 --> 00:04:43,300
And finally we're going to compare
the probability that somebody walks

116
00:04:43,300 --> 00:04:47,000
given features x versus the probability
that somebody drives given features x.

117
00:04:47,366 --> 00:04:51,100
And then from there we'll decide
which class to put that new data point in.

118
00:04:51,566 --> 00:04:55,233
So as you can see the Naive Bayes
classifier is a probabilistic type

119
00:04:55,233 --> 00:04:55,800
of classifier

120
00:04:55,800 --> 00:04:56,900
because we first calculate

121
00:04:56,900 --> 00:05:00,100
the probabilities and then based on
probabilities, we're assigning a class.

122
00:05:00,666 --> 00:05:00,966
All right.

123
00:05:00,966 --> 00:05:04,733
So are you ready to perform these steps?

124
00:05:04,733 --> 00:05:05,866
It's going to be lots of fun.

125
00:05:05,866 --> 00:05:08,766
We're going to take it nice and easy
nice and slowly

126
00:05:08,766 --> 00:05:11,033
so that we understand everything.
And after

127
00:05:11,033 --> 00:05:14,033
this you're going to be very comfortable
with the Naive Bayes classifier.

128
00:05:14,066 --> 00:05:16,500
Step one. All right.
So here we have our visualization.

129
00:05:16,500 --> 00:05:18,666
Let's move it all to the left a little bit
so we

130
00:05:18,666 --> 00:05:19,933
can make some space.

131
00:05:19,933 --> 00:05:20,433
Now we're going to

132
00:05:20,433 --> 00:05:23,833
calculate the first probability
in our Bayes theorem.

133
00:05:23,833 --> 00:05:27,400
We're going to calculate the probability
that somebody walks, right?

134
00:05:27,400 --> 00:05:29,400
Just the overall probability.
And what does that mean?

135
00:05:29,400 --> 00:05:33,800
That is the probability that somebody
walks without knowing anything about them.

136
00:05:33,800 --> 00:05:34,933
So we're just saying

137
00:05:34,933 --> 00:05:38,233
we're going to add a new observation
to our data set into here.

138
00:05:38,600 --> 00:05:41,000
But we don't know their age
and we don't know their salary.

139
00:05:41,000 --> 00:05:43,866
We're just going to put it somewhere
into our data set.

140
00:05:43,866 --> 00:05:45,500
What is the probability that this person

141
00:05:45,500 --> 00:05:47,933
that we're adding to our data set walks
to work?

142
00:05:47,933 --> 00:05:49,566
Well, it's very easy and straightforward.

143
00:05:49,566 --> 00:05:51,700
From here we don't have much choice.

144
00:05:51,700 --> 00:05:54,433
The only thing that we can do
is count the number of

145
00:05:54,433 --> 00:05:55,566
red observations,

146
00:05:55,566 --> 00:05:58,566
the number of people that actually walk
and divide by the overall number.

147
00:05:58,800 --> 00:06:00,966
So probability
that a person walks to work,

148
00:06:00,966 --> 00:06:03,300
without any other knowledge,

149
00:06:03,300 --> 00:06:05,466
is the number of walkers,
the number of people who walk,

150
00:06:05,466 --> 00:06:08,500
which is these red dots, divided
by the total number of observations:

151
00:06:08,500 --> 00:06:08,966
the red and the green dots.

152
00:06:08,966 --> 00:06:11,966
So the gray dot isn't participating
in these calculations.

153
00:06:12,200 --> 00:06:14,800
So here we have
the probability that somebody walks is ten

154
00:06:14,800 --> 00:06:17,800
red dots divided by 30 dots overall.
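As a quick check of that arithmetic, here is a minimal Python sketch of the prior. The variable names are made up for illustration; the counts (10 red, 20 green, 30 overall) are the ones read off the chart in this tutorial.

```python
from fractions import Fraction

# Counts read off the chart described in the tutorial.
n_walks = 10   # red dots: people who walk to work
n_drives = 20  # green dots: people who drive to work
n_total = n_walks + n_drives  # 30 observations; the new gray point is excluded

# Prior probability: the share of walkers, knowing nothing about age or salary.
prior_walks = Fraction(n_walks, n_total)

print(prior_walks)         # 1/3
print(float(prior_walks))  # 0.3333333333333333
```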

155
00:06:17,800 --> 00:06:18,133
All right.

156
00:06:18,133 --> 00:06:18,900
So that was easy.

157
00:06:18,900 --> 00:06:21,233
We've calculated the prior probability.

158
00:06:21,233 --> 00:06:23,433
Next we're calculating
the marginal likelihood.

159
00:06:23,433 --> 00:06:25,966
And this is where things get interesting.

160
00:06:25,966 --> 00:06:28,966
So how do we calculate
the marginal likelihood?

161
00:06:29,200 --> 00:06:30,233
Let's have a look.

162
00:06:30,233 --> 00:06:32,133
Here's our dataset again.

163
00:06:32,133 --> 00:06:35,866
And the first thing we're going to
do is we're going to select a radius.

164
00:06:35,866 --> 00:06:39,600
And we're going to draw a circle
around our observation like that.

165
00:06:40,266 --> 00:06:41,866
Now this radius you need

166
00:06:41,866 --> 00:06:43,500
to select on your own.

167
00:06:43,500 --> 00:06:45,200
You need to decide for your algorithm.

168
00:06:45,200 --> 00:06:47,233
This is going to be like
an input parameter of the algorithm.

169
00:06:47,233 --> 00:06:49,466
You can make it smaller.
You can make it larger.

170
00:06:49,466 --> 00:06:50,533
It's up to you.

171
00:06:50,533 --> 00:06:51,866
Now what does this radius do?

172
00:06:51,866 --> 00:06:55,200
Well, what we're going to do is,
first of all,

173
00:06:55,666 --> 00:06:58,266
just to make things
easier, we're going to remove

174
00:06:58,266 --> 00:07:01,700
our dot for now
just so that it's not confusing us.

175
00:07:02,033 --> 00:07:02,600
And then we're

176
00:07:02,600 --> 00:07:04,000
going to look at all the

177
00:07:04,000 --> 00:07:06,400
points that are inside this radius.

178
00:07:06,400 --> 00:07:07,100
And what

179
00:07:07,100 --> 00:07:07,966
we're saying here is

180
00:07:07,966 --> 00:07:10,966
that all of the points inside the circle,

181
00:07:11,066 --> 00:07:15,833
we're going to deem them to be similar
in terms of features to the

182
00:07:15,833 --> 00:07:18,000
point that we had.

183
00:07:18,000 --> 00:07:22,066
Remember, it had an age of, for example,
25 and a salary of $30,000 per year.

184
00:07:22,366 --> 00:07:24,900
So now we're going to draw
a radius around it.

185
00:07:24,900 --> 00:07:29,833
And let's say anybody between the ages
of 20 and 30 and with salaries

186
00:07:29,833 --> 00:07:33,100
of $25,000 to $35,000,

187
00:07:33,466 --> 00:07:37,233
anybody that falls in that circle
(again, it's not a square,

188
00:07:37,266 --> 00:07:38,900
it's not just a square, it's a circle),

189
00:07:38,900 --> 00:07:42,233
anybody who falls somewhere
in that

190
00:07:42,433 --> 00:07:46,200
vicinity is going to be deemed similar to

191
00:07:46,400 --> 00:07:49,100
the new data point
that we're adding to our data set.

192
00:07:49,100 --> 00:07:50,900
So, as you can imagine,
this radius is actually going

193
00:07:50,900 --> 00:07:54,400
to have a big say in the way
your algorithm works.

194
00:07:54,733 --> 00:07:56,500
Well, let's say we have this radius and

195
00:07:56,500 --> 00:07:58,433
this is how it all played out.

196
00:07:58,433 --> 00:08:01,200
We have three red dots
and one green dot in there.

197
00:08:01,200 --> 00:08:01,500
All right.

198
00:08:01,500 --> 00:08:02,966
So now what do we do?

199
00:08:02,966 --> 00:08:05,233
How do we calculate the probability of x?

200
00:08:05,233 --> 00:08:07,200
And what is the probability of x?

201
00:08:07,200 --> 00:08:11,900
Well, the probability of x
is the probability of a new point

202
00:08:11,900 --> 00:08:13,533
that we add to our data set

203
00:08:13,533 --> 00:08:16,866
being similar in features to the

204
00:08:16,866 --> 00:08:19,466
point that we actually are adding to it.

205
00:08:19,466 --> 00:08:20,866
So basically it is the probability of

206
00:08:20,866 --> 00:08:22,666
that new point that we're adding,

207
00:08:22,666 --> 00:08:25,700
or any random point
that we add, the probability

208
00:08:25,700 --> 00:08:29,233
of that
random point falling into this circle.

209
00:08:29,733 --> 00:08:33,000
And p of x is calculated
as the number of similar observations.

210
00:08:33,000 --> 00:08:35,866
So the number of observations
that we can already see in the circle.

211
00:08:35,866 --> 00:08:40,200
So one, two, three, four, divided by the
total number of observations, which is 30.

212
00:08:40,466 --> 00:08:42,900
So p of x is four divided by 30.

213
00:08:42,900 --> 00:08:48,166
Once again, just to reiterate, p of x tells
us what is the likelihood of any new

214
00:08:48,166 --> 00:08:52,533
random data point that we add to this data
set falling inside this circle.

215
00:08:52,933 --> 00:08:57,033
And it is four over 30 because,
based on prior knowledge,

216
00:08:57,033 --> 00:09:00,300
we can tell that there are four here
out of 30, so this is four over 30.

217
00:09:00,966 --> 00:09:01,233
All right.

218
00:09:01,233 --> 00:09:03,466
So that wasn't hard at all as well.

219
00:09:03,466 --> 00:09:05,200
We've calculated the marginal likelihood.

220
00:09:05,200 --> 00:09:07,666
So far we've got this one
and we've got this one.

221
00:09:07,666 --> 00:09:09,533
Next we're moving on to the likelihood.

222
00:09:09,533 --> 00:09:11,933
And this is probably the most complex one.

223
00:09:11,933 --> 00:09:14,866
What is the likelihood that somebody

224
00:09:14,866 --> 00:09:17,866
who walks exhibits features X?

225
00:09:18,466 --> 00:09:21,733
Well, actually, after we've spoken
about the marginal likelihood, calculating

226
00:09:21,733 --> 00:09:23,833
the likelihood won't be as complex.

227
00:09:23,833 --> 00:09:25,233
So let's have a look.

228
00:09:25,233 --> 00:09:27,033
So there's our chart.

229
00:09:27,033 --> 00:09:28,266
And now what we're

230
00:09:28,266 --> 00:09:29,766
going to do is we're going to

231
00:09:29,766 --> 00:09:32,333
draw the same circle again.

232
00:09:32,333 --> 00:09:33,900
And once again we're going to remove the

233
00:09:33,900 --> 00:09:35,100
gray point for now.

234
00:09:35,100 --> 00:09:37,133
And we're going to color our circle.

235
00:09:37,133 --> 00:09:37,700
And so

236
00:09:37,700 --> 00:09:42,433
anything that falls inside
the circle is deemed to be similar to the

237
00:09:42,433 --> 00:09:44,100
point that we're adding.

238
00:09:44,100 --> 00:09:46,600
So the question is
what is the probability

239
00:09:46,600 --> 00:09:50,100
that a randomly selected data
point from our data

240
00:09:50,100 --> 00:09:52,366
set will be similar to

241
00:09:52,366 --> 00:09:53,933
the data point that we're adding?

242
00:09:53,933 --> 00:09:58,266
So basically, what is the likelihood that
a randomly selected data point will be

243
00:09:58,266 --> 00:09:59,833
from this circle,

244
00:09:59,833 --> 00:10:03,200
given (this vertical pipe means "given") that

245
00:10:03,200 --> 00:10:06,866
that person walks, that we know that
that person walks to work?

246
00:10:07,100 --> 00:10:08,100
The other way to think about

247
00:10:08,100 --> 00:10:11,866
this is we're only working
with people who walk to work.

248
00:10:11,866 --> 00:10:15,533
So we're only working with the red dots
which represent people who walk to work.

249
00:10:15,533 --> 00:10:17,933
So let's forget about the green dots.
There, like that.

250
00:10:17,933 --> 00:10:20,933
Now they're faint and we're not even
talking about them at all.

251
00:10:21,033 --> 00:10:22,600
We're only talking about the red dots.

252
00:10:22,600 --> 00:10:24,666
So the question is, given that we're only

253
00:10:24,666 --> 00:10:28,833
working with the red dots,
what is the likelihood that a

254
00:10:28,866 --> 00:10:32,266
randomly selected data point from our data

255
00:10:32,266 --> 00:10:35,433
set, from the red dots, is somebody

256
00:10:35,433 --> 00:10:38,666
who exhibits features similar to

257
00:10:38,766 --> 00:10:41,400
the point that we are adding
to our data set?

258
00:10:41,400 --> 00:10:44,866
So basically, what is the likelihood
that a randomly selected

259
00:10:44,866 --> 00:10:48,666
red dot falls into this gray area,
into this circle?

260
00:10:49,066 --> 00:10:50,700
That's the question we're asking.

261
00:10:50,700 --> 00:10:52,200
And that is also very simple.

262
00:10:52,200 --> 00:10:54,966
Now that we know how
all of this works, it's basically

263
00:10:54,966 --> 00:10:57,600
the number of similar observations
among those who walk.

264
00:10:57,600 --> 00:11:01,200
So the number of red dots
that actually fall inside this circle,

265
00:11:01,200 --> 00:11:05,766
this gray circle, that's three,
divided by the total number of walkers,

266
00:11:05,766 --> 00:11:08,466
so the total
number of people who walk to work.

267
00:11:08,466 --> 00:11:10,400
And that is three over ten.

268
00:11:10,400 --> 00:11:10,933
There we go.

269
00:11:10,933 --> 00:11:15,866
So that's our P of X given Walks, the likelihood
of somebody exhibiting features similar

270
00:11:15,866 --> 00:11:20,100
to that data point that we're about to add,
given that we're only selecting among

271
00:11:20,366 --> 00:11:21,666
the red dots.

272
00:11:21,666 --> 00:11:22,800
So that's three over ten.

273
00:11:22,800 --> 00:11:24,133
And that was our likelihood.

274
00:11:24,133 --> 00:11:26,833
So now if we plug all that in
so there we go.

275
00:11:26,833 --> 00:11:28,166
That likelihood is done.

276
00:11:28,166 --> 00:11:29,233
So if we plug all of that

277
00:11:29,233 --> 00:11:31,200
in, we'll get our posterior probability.

278
00:11:31,200 --> 00:11:36,200
So three over ten times
ten over 30, divided by four over 30.

279
00:11:36,200 --> 00:11:39,333
So if we calculate that, it'll give us 0.75.

280
00:11:39,533 --> 00:11:42,766
75% is the probability

281
00:11:42,766 --> 00:11:46,633
that somebody that we put into the place
where we're putting X

282
00:11:46,633 --> 00:11:50,033
should be classified
as a person who walks to work.
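The step-one arithmetic can be reproduced in a few lines of Python. This is just a sketch using exact fractions; the variable names are illustrative, and the three inputs are the values calculated above.

```python
from fractions import Fraction

likelihood = Fraction(3, 10)  # P(X | Walks): 3 of the 10 walkers are inside the circle
prior = Fraction(10, 30)      # P(Walks): 10 walkers out of 30 observations
marginal = Fraction(4, 30)    # P(X): 4 of the 30 observations are inside the circle

# Bayes' theorem: posterior = likelihood * prior / marginal likelihood
posterior_walks = likelihood * prior / marginal

print(posterior_walks, float(posterior_walks))  # 3/4 0.75
```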

283
00:11:50,533 --> 00:11:53,900
That was step
one. It was pretty intense, right?

284
00:11:53,900 --> 00:11:57,266
Pretty exciting
to calculate this value now.

285
00:11:57,266 --> 00:11:58,866
And the next step is step two.

286
00:11:58,866 --> 00:12:00,700
That's step one.
The next step is step two.

287
00:12:00,700 --> 00:12:03,600
We do the same thing for the probability
that somebody with

288
00:12:03,600 --> 00:12:04,800
features X

289
00:12:04,800 --> 00:12:09,400
will be classified, or should be classified,
as a person who drives to work.

290
00:12:09,966 --> 00:12:12,000
And here
I'm going to throw you a challenge.

291
00:12:12,000 --> 00:12:15,000
I'm going to challenge you
to pause this video,

292
00:12:15,000 --> 00:12:19,333
or rewind back
to have the image in front of you,

293
00:12:19,866 --> 00:12:22,333
and do these calculations yourself, to

294
00:12:22,333 --> 00:12:23,433
actually go through the

295
00:12:23,433 --> 00:12:25,700
same steps and perform
those calculations.

296
00:12:25,700 --> 00:12:29,400
If you'd like to see
and compare to my calculations,

297
00:12:29,666 --> 00:12:32,966
then I'm going to put in another video
after this one.

298
00:12:32,966 --> 00:12:35,400
So another tutorial
after this one in the course.

299
00:12:35,400 --> 00:12:37,933
So you can just go to the next tutorial
and compare.

300
00:12:37,933 --> 00:12:40,233
Otherwise
I'm just going to show you the result now.

301
00:12:40,233 --> 00:12:43,233
So the result is one over 20 for the likelihood.

302
00:12:43,233 --> 00:12:44,233
Or let's start from the right.

303
00:12:44,233 --> 00:12:48,000
Prior probability is 20 over 30
and marginal likelihood remains

304
00:12:48,000 --> 00:12:49,366
unchanged, four over 30.

305
00:12:49,366 --> 00:12:51,833
Likelihood changes to one over 20.

306
00:12:51,833 --> 00:12:54,433
So the probability of somebody
who exhibits features

307
00:12:54,433 --> 00:12:58,200
X being a person who drives to work
is 25%.

308
00:12:58,700 --> 00:13:00,233
So that was step two.

309
00:13:00,233 --> 00:13:01,500
Now we're going to do step three.

310
00:13:01,500 --> 00:13:02,766
We're going to compare.

311
00:13:02,766 --> 00:13:06,633
The probability
of somebody with features X

312
00:13:06,933 --> 00:13:10,033
the probability of them being a person
who walks to work versus the probability

313
00:13:10,033 --> 00:13:13,033
of somebody with features
X being a person who drives to work.

314
00:13:13,100 --> 00:13:15,466
So it's 75% versus 25%.

315
00:13:15,466 --> 00:13:18,100
And therefore
the first is greater than the second.

316
00:13:18,100 --> 00:13:21,300
And therefore
it is more likely that that person

317
00:13:21,300 --> 00:13:23,233
with features X is

318
00:13:23,233 --> 00:13:27,133
going to be a person who walks to work
than a person who drives to work.

319
00:13:27,133 --> 00:13:30,766
So there's still a 25% chance
that that is a person who drives to work.

320
00:13:31,033 --> 00:13:33,966
But the chance
that it is a person who walks to

321
00:13:33,966 --> 00:13:37,400
work is greater, 75%,
and therefore we're going to classify

322
00:13:37,600 --> 00:13:40,733
this point as a person who walks to work.
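All three steps can be put together in one small Python sketch. The helper name `posterior` and its parameters are hypothetical, chosen here for illustration; the counts are the ones from the tutorial (10 walkers, 20 drivers, 3 walkers and 1 driver inside the circle).

```python
from fractions import Fraction

def posterior(n_class_in_circle, n_class, n_in_circle, n_total):
    """Bayes' theorem from raw counts: likelihood * prior / marginal likelihood."""
    likelihood = Fraction(n_class_in_circle, n_class)  # P(X | class)
    prior = Fraction(n_class, n_total)                 # P(class)
    marginal = Fraction(n_in_circle, n_total)          # P(X)
    return likelihood * prior / marginal

# Step 1 and step 2: one posterior per class.
posteriors = {
    "Walks": posterior(3, 10, 4, 30),
    "Drives": posterior(1, 20, 4, 30),
}

# Step 3: assign the class with the larger posterior probability.
predicted = max(posteriors, key=posteriors.get)

print(posteriors["Walks"], posteriors["Drives"], predicted)  # 3/4 1/4 Walks
```

Since both posteriors share the same denominator P(X), comparing likelihood times prior alone would lead to the same decision.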

323
00:13:41,366 --> 00:13:43,266
There we go. That is how the

324
00:13:43,266 --> 00:13:47,200
Naive Bayes algorithm in machine
learning works.

325
00:13:47,600 --> 00:13:49,800
I hope you found this tutorial useful.

326
00:13:49,800 --> 00:13:53,200
I'm pretty excited and pretty
proud of these slides,

327
00:13:53,200 --> 00:13:56,700
and hopefully this is a step
by step and simple

328
00:13:56,700 --> 00:13:59,100
explanation of a complex concept.

329
00:13:59,100 --> 00:14:00,766
And I look forward
to seeing you next time.

330
00:14:00,766 --> 00:14:02,566
Until then, enjoy machine learning.