1
00:00:00,600 --> 00:00:03,133
Hello and welcome back to the course
on Machine Learning.

2
00:00:03,133 --> 00:00:06,266
In today's tutorial,
I'm going to show you my solution

3
00:00:06,266 --> 00:00:09,266
to the challenge
that I threw at you last time.

4
00:00:09,600 --> 00:00:11,600
So in the previous tutorial,
we had the challenge

5
00:00:11,600 --> 00:00:16,433
to calculate the posterior
probability of somebody being a person

6
00:00:16,433 --> 00:00:20,333
that drives to work,
given that they are placed in the position

7
00:00:20,333 --> 00:00:24,300
where we are placing the new observation
for our data set.

8
00:00:24,533 --> 00:00:26,300
So basically, what is the likelihood

9
00:00:26,300 --> 00:00:29,766
of that new observation
representing a person who drives to work?

10
00:00:30,200 --> 00:00:33,666
And the way we need to
calculate it is using Bayes' theorem,

11
00:00:33,666 --> 00:00:35,100
which is right in front of us.
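
The formula shown on screen is not captured in the transcript. Assuming it is the standard form of Bayes' theorem, with "Drives" as the class and X as the features of the new observation (the labels used later in this tutorial), it would read:

```latex
P(\text{Drives} \mid X) = \frac{P(X \mid \text{Drives}) \, P(\text{Drives})}{P(X)}
```

Here P(Drives) is the prior probability, P(X) is the marginal likelihood, P(X | Drives) is the likelihood, and P(Drives | X) is the posterior probability.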

12
00:00:35,100 --> 00:00:36,433
And now we're going to walk through it.

13
00:00:36,433 --> 00:00:38,600
First we're going to calculate the prior
probability.

14
00:00:38,600 --> 00:00:40,433
Then we're going to calculate
the marginal likelihood.

15
00:00:40,433 --> 00:00:42,966
And then we're going
to calculate the likelihood.

16
00:00:42,966 --> 00:00:47,033
And you can compare along the way
if you got the same result.

17
00:00:47,266 --> 00:00:49,133
So let's go through it.

18
00:00:49,133 --> 00:00:50,366
So there's our data set.

19
00:00:50,366 --> 00:00:53,100
And now let's move it to the left
to make some space.

20
00:00:53,100 --> 00:00:56,000
And the first thing we're going to
calculate is the prior probability.

21
00:00:56,000 --> 00:00:57,500
The prior probability.

22
00:00:57,500 --> 00:00:58,833
There's kind of two ways
to think about it.

23
00:00:58,833 --> 00:01:02,733
So the first way to think about it
is if I just randomly select

24
00:01:02,933 --> 00:01:05,933
a person from our data set right now.

25
00:01:06,166 --> 00:01:07,800
So not including the gray dot,

26
00:01:07,800 --> 00:01:10,800
not including the new data point,
if I randomly select a person from there,

27
00:01:10,966 --> 00:01:15,133
what is the likelihood of them
being a person who drives to work right?

28
00:01:15,300 --> 00:01:17,733
So that will be just...
the answer to

29
00:01:17,733 --> 00:01:20,733
that is the number of green dots
over the total number of dots.

30
00:01:21,000 --> 00:01:24,033
The other way to think about it
is if I just randomly throw

31
00:01:24,033 --> 00:01:27,233
a new data point into our data set, right?

32
00:01:27,266 --> 00:01:31,333
Just randomly and without knowing anything
about their age or salary,

33
00:01:31,333 --> 00:01:35,266
then just knowing all of this prior
information that we have the green dots

34
00:01:35,266 --> 00:01:38,966
and the red dots, what is the likelihood
of that person that we're adding?

35
00:01:39,533 --> 00:01:42,566
What is their likelihood
to be a person who drives to work?

36
00:01:43,033 --> 00:01:46,633
Again, it is very simple
because we don't have any other choice

37
00:01:46,966 --> 00:01:50,100
but to calculate the probability
and assign the probability

38
00:01:50,466 --> 00:01:51,833
just based on what we know.

39
00:01:51,833 --> 00:01:53,966
And that is just to take the green dots,

40
00:01:53,966 --> 00:01:57,133
all the 20 green dots,
divided by the total number of dots.

41
00:01:57,133 --> 00:01:59,066
here, and assign that as a probability.

42
00:01:59,066 --> 00:02:00,700
So we don't have any other choice.

43
00:02:00,700 --> 00:02:03,100
And therefore that's
what it's calculated as.

44
00:02:03,100 --> 00:02:06,466
So the probability of somebody
being a person who drives to work

45
00:02:06,466 --> 00:02:09,633
or a random dot
selected out of our existing dots,

46
00:02:09,900 --> 00:02:12,900
being a person who drives to work
is the number of drivers,

47
00:02:13,300 --> 00:02:16,233
which is 20, divided by total observations,
which is 30.

48
00:02:16,233 --> 00:02:18,300
So we get 20 over 30.
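
As a minimal sketch of the prior calculation just described, using only the counts stated in the tutorial (20 green dots out of 30 dots); the variable names are illustrative assumptions:

```python
from fractions import Fraction

# Counts stated in the tutorial (the new grey point is not included)
n_drivers = 20  # green dots: people who drive to work
n_total = 30    # all dots in the existing data set

# Prior: P(Drives) = number of drivers / total observations
prior_drives = Fraction(n_drivers, n_total)
print(prior_drives, float(prior_drives))
```

Using `Fraction` keeps the value exact, so it can be reused later without rounding error.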

49
00:02:18,300 --> 00:02:20,766
So that was the prior probability
we've done that.

50
00:02:20,766 --> 00:02:23,066
Next one is the marginal likelihood.

51
00:02:23,066 --> 00:02:26,066
So let's go ahead and calculate that.

52
00:02:26,200 --> 00:02:28,366
And you'll find that
the marginal likelihood

53
00:02:28,366 --> 00:02:31,833
is actually going to be exactly the same
as in the previous tutorial.

54
00:02:31,833 --> 00:02:33,366
And we'll talk about this separately.

55
00:02:33,366 --> 00:02:37,766
So again we're going to draw the circle
around our observation.

56
00:02:37,766 --> 00:02:40,466
We're going to remove the observation
so it's not in the way.

57
00:02:40,466 --> 00:02:42,800
Then we're going to shade in this area.

58
00:02:42,800 --> 00:02:48,000
And so now the marginal likelihood
is the question what is the likelihood of

59
00:02:48,000 --> 00:02:52,133
if I just pick a random dot from our data
set just randomly.

60
00:02:52,200 --> 00:02:54,600
What is the likelihood
that I'm going to pick one out of here.

61
00:02:54,600 --> 00:02:59,966
So the reason why we put X here
is because it asks: what is the likelihood of me

62
00:02:59,966 --> 00:03:03,700
picking an observation
that exhibits features

63
00:03:03,900 --> 00:03:08,900
similar to the features of that point
that we are adding to our dataset?

64
00:03:08,900 --> 00:03:11,766
So the point we're adding to the data set
we've just removed; it was over there.

65
00:03:11,766 --> 00:03:17,100
And we've agreed that any dot inside
the circle is deemed to be similar

66
00:03:17,100 --> 00:03:22,433
to that dot, or in other words,
deemed to be exhibiting similar features

67
00:03:22,466 --> 00:03:25,466
so similar age
and similar salary to that dot.

68
00:03:25,800 --> 00:03:28,000
And therefore that's
what we're calculating.

69
00:03:28,000 --> 00:03:29,733
So P of X is very simple.

70
00:03:29,733 --> 00:03:32,100
We just need to calculate the number
of similar observations,

71
00:03:32,100 --> 00:03:35,100
the number of observations
that actually fall in here, which is four

72
00:03:35,233 --> 00:03:37,800
divided by the total number
of observations which is 30.

73
00:03:37,800 --> 00:03:42,300
So that will give us the likelihood
of a new dot falling here,

74
00:03:42,466 --> 00:03:47,100
or the likelihood of if we just pick out
a random dot from our dataset right now,

75
00:03:47,600 --> 00:03:50,800
then the likelihood of it
being one of these is four over 30.

76
00:03:51,133 --> 00:03:53,300
There we go, four over 30.
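
The "dots inside the circle" idea can be sketched as a distance check. The helper `count_similar` and the toy coordinates below are hypothetical and only illustrate the counting; the final line uses the counts stated in the tutorial (4 of 30):

```python
import math
from fractions import Fraction

def count_similar(points, center, radius):
    """Count points lying inside (or on) the circle around `center`."""
    return sum(1 for p in points if math.dist(p, center) <= radius)

# Hypothetical toy coordinates (age, salary in thousands), NOT the tutorial's data
toy_points = [(25, 40), (27, 42), (45, 90), (50, 95), (26, 41)]
toy_similar = count_similar(toy_points, center=(26, 41), radius=3)

# Marginal likelihood with the tutorial's counts: 4 dots in the circle, 30 in total
marginal = Fraction(4, 30)
print(toy_similar, marginal)
```

The radius is a choice made by whoever sets up the model; a different radius changes which dots count as "similar" and therefore changes P(X).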

77
00:03:53,300 --> 00:03:53,700
All right.

78
00:03:53,700 --> 00:03:56,700
So that is our marginal likelihood done.

79
00:03:56,766 --> 00:03:59,100
And now we're going to move on
to the likelihood.

80
00:03:59,100 --> 00:04:02,700
And this time it's going to be
the likelihood of somebody exhibiting

81
00:04:02,700 --> 00:04:06,800
the features of X or being similar
to the datapoint that we're adding,

82
00:04:07,033 --> 00:04:11,700
given that we're only looking at people
who are driving to work.

83
00:04:12,300 --> 00:04:13,500
So let's have a look at that.

84
00:04:14,600 --> 00:04:15,600
Here's our dataset.

85
00:04:15,600 --> 00:04:19,500
And again we're going to draw the circle
around our data point.

86
00:04:19,533 --> 00:04:22,500
Take it out and then add that shading.

87
00:04:22,500 --> 00:04:25,633
So now the question is given
that we're only dealing

88
00:04:25,633 --> 00:04:29,500
with people who drive to work,
what is the likelihood that if we pick

89
00:04:29,500 --> 00:04:32,700
one of them, that that person
will be exhibiting features similar to X?

90
00:04:33,300 --> 00:04:36,133
So because we're only dealing
with the people who drive to work,

91
00:04:36,133 --> 00:04:38,033
we can forget about the red dots.

92
00:04:38,033 --> 00:04:38,366
There you go.

93
00:04:38,366 --> 00:04:42,600
They're shaded there, faded out, and now
we're only dealing with the green dots.

94
00:04:42,600 --> 00:04:46,366
So the question is,
given that we're selecting a random point

95
00:04:46,366 --> 00:04:48,100
out of all of the people that drive.

96
00:04:48,100 --> 00:04:53,300
So this vertical bar drives
means given that a person drives to work.

97
00:04:53,300 --> 00:04:56,300
So we're looking at a random point
out of these.

98
00:04:56,300 --> 00:04:59,400
What is the likelihood that they will
exhibit features similar to X,

99
00:04:59,600 --> 00:05:03,933
which we agreed is the same
as they fall inside this circle.

100
00:05:04,566 --> 00:05:08,500
And the likelihood of that is one over
the total number of green dots.

101
00:05:08,833 --> 00:05:12,900
So there we go
p of x, given that they drive to work,

102
00:05:12,900 --> 00:05:15,800
is the number of similar observations
among those who drive.

103
00:05:15,800 --> 00:05:17,533
So inside of here is one

104
00:05:17,533 --> 00:05:20,533
similar, meaning similar to our new point
that we're adding.

105
00:05:20,700 --> 00:05:23,666
And then divided
by the total number of drivers.

106
00:05:23,666 --> 00:05:25,000
And that's 20.

107
00:05:25,000 --> 00:05:28,500
So one over 20 that is our likelihood.
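
A minimal sketch of this conditional count, again using only the numbers stated in the tutorial; the variable names are illustrative assumptions:

```python
from fractions import Fraction

# Counts stated in the tutorial, restricted to people who drive (green dots)
n_similar_drivers = 1  # green dots inside the circle
n_drivers = 20         # all green dots

# Likelihood: P(X | Drives) = similar drivers / all drivers
likelihood_x_given_drives = Fraction(n_similar_drivers, n_drivers)
print(likelihood_x_given_drives, float(likelihood_x_given_drives))
```

The key difference from the marginal likelihood is the denominator: it is the number of drivers, not the total number of observations.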

108
00:05:29,100 --> 00:05:29,700
There we go.

109
00:05:29,700 --> 00:05:31,866
So now we can plug these into the formula.

110
00:05:31,866 --> 00:05:33,866
Calculate the posterior probability.

111
00:05:33,866 --> 00:05:38,266
So it's going to be one over 20 times
20 over 30 divided by four over 30.

112
00:05:38,266 --> 00:05:40,100
And it's 0.25.

113
00:05:40,100 --> 00:05:42,300
So 25%.
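
To check the arithmetic, here is a small sketch that plugs the three quantities from the tutorial into Bayes' theorem. The function name `posterior` is a hypothetical helper, not part of any library:

```python
from fractions import Fraction

def posterior(likelihood, prior, marginal):
    """Bayes' theorem: P(class | X) = P(X | class) * P(class) / P(X)."""
    return likelihood * prior / marginal

p_drives_given_x = posterior(
    likelihood=Fraction(1, 20),  # P(X | Drives)
    prior=Fraction(20, 30),      # P(Drives)
    marginal=Fraction(4, 30),    # P(X)
)
print(p_drives_given_x, float(p_drives_given_x))
```

The 20s and 30s cancel, leaving 1 over 4: one of the four dots inside the circle is green, which is another way to read the 25% result.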

114
00:05:42,300 --> 00:05:42,966
So there we go.

115
00:05:42,966 --> 00:05:46,100
That was step two of our Naive

116
00:05:46,100 --> 00:05:49,233
Bayes algorithm or Naive Bayes classifier.

117
00:05:49,233 --> 00:05:51,866
Hopefully you were able to follow along.

118
00:05:51,866 --> 00:05:54,233
And I also hope that you had a chance

119
00:05:54,233 --> 00:05:57,866
to perform that exercise on your own
and you got a similar result.

120
00:05:58,233 --> 00:06:00,066
And that's it for today.

121
00:06:00,066 --> 00:06:01,700
I look forward to seeing you next time.

122
00:06:01,700 --> 00:06:03,633
And until then, enjoy machine learning.