1 00:00:00,733 --> 00:00:01,833 Hello and welcome back to the 2 00:00:01,833 --> 00:00:03,033 course on Machine Learning. 3 00:00:03,033 --> 00:00:06,100 This is Kirill Eremenko, and in today's tutorial we're 4 00:00:06,100 --> 00:00:08,433 talking about the Naive Bayes classifier. 5 00:00:08,433 --> 00:00:10,866 This is a very interesting machine learning algorithm, 6 00:00:10,866 --> 00:00:12,600 and today we're going to get to 7 00:00:12,600 --> 00:00:15,300 know it on a very intuitive level. 8 00:00:15,300 --> 00:00:16,133 And in line 9 00:00:16,133 --> 00:00:19,900 with the SuperDataScience mission, which is making the complex simple, 10 00:00:20,000 --> 00:00:24,500 we're going to break down this complex topic into simple steps and 11 00:00:24,500 --> 00:00:26,666 bite-sized pieces of information. 12 00:00:26,666 --> 00:00:29,200 I've got some very exciting slides prepared ahead, so 13 00:00:29,200 --> 00:00:31,333 let's dive straight into it. 14 00:00:31,333 --> 00:00:31,633 All right. 15 00:00:31,633 --> 00:00:33,600 So here we've got the Bayes theorem. 16 00:00:33,600 --> 00:00:35,466 And this is something we talked about in the previous tutorials. 17 00:00:35,466 --> 00:00:38,400 So by now we should be quite comfortable with the concept. 18 00:00:38,400 --> 00:00:40,900 How are we going to apply it to create a 19 00:00:40,900 --> 00:00:42,533 machine learning algorithm? 20 00:00:42,533 --> 00:00:44,133 Well, let's have a look here. 21 00:00:44,133 --> 00:00:45,400 We've got a data set. 22 00:00:45,400 --> 00:00:47,133 So it has two features. 23 00:00:47,133 --> 00:00:48,866 It has x1 and x2. 24 00:00:48,866 --> 00:00:50,200 And there are two categories: category 25 00:00:50,200 --> 00:00:52,500 one, which is red, and category two, which is green. 26 00:00:52,500 --> 00:00:55,900 But instead of working with these abstract terms, we're going to convert them into 27 00:00:56,200 --> 00:00:57,100 something that 
28 00:00:57,100 --> 00:00:58,400 we can understand a bit better, 29 00:00:58,400 --> 00:01:01,666 something that's a bit easier to operate with or to talk about. 30 00:01:01,666 --> 00:01:03,066 So we're going to call the y 31 00:01:03,066 --> 00:01:05,100 variable, the x2 variable, salary. 32 00:01:05,100 --> 00:01:07,433 And the x1 variable is going to be age. 33 00:01:07,433 --> 00:01:10,533 So basically we're representing observations, or people 34 00:01:10,533 --> 00:01:14,600 that are part of our data set, in terms of their age and salary. 35 00:01:14,600 --> 00:01:17,600 As you can see, we have 30 people here on this chart. 36 00:01:17,700 --> 00:01:21,266 And the categories, we're going to replace them: red will be "walks", meaning that person 37 00:01:21,266 --> 00:01:23,700 walks to work, and green will be "drives". 38 00:01:23,700 --> 00:01:25,833 That means that person drives to work. 39 00:01:25,833 --> 00:01:28,466 And so now we get to the problem, to the machine learning challenge that 40 00:01:28,466 --> 00:01:29,766 we're going to be solving. 41 00:01:29,766 --> 00:01:33,866 What happens if we add a new observation, a new data point, into this set? 42 00:01:34,300 --> 00:01:37,366 How do we classify this new data point? 43 00:01:37,366 --> 00:01:41,400 So as you can tell, this is a supervised machine learning algorithm 44 00:01:41,400 --> 00:01:44,800 because we're classifying something based on previously known classes. 45 00:01:45,366 --> 00:01:47,433 And so the question is: is this person going to be 46 00:01:47,433 --> 00:01:49,366 classified as a person who walks to work? 47 00:01:49,366 --> 00:01:49,933 Or is this 48 00:01:49,933 --> 00:01:52,866 person going to be classified as a person who drives to work? 49 00:01:52,866 --> 00:01:54,633 And the Naive Bayes 50 00:01:54,633 --> 00:01:55,500 algorithm is going 51 00:01:55,500 --> 00:01:57,433 to help us solve this challenge. 52 00:01:57,433 --> 00:01:59,366 All right. So how are we going to approach this? 
53 00:01:59,366 --> 00:02:00,666 We need a plan of attack. 54 00:02:00,666 --> 00:02:02,600 It is going to be quite a complex approach, 55 00:02:02,600 --> 00:02:04,500 but at the same time we're going to break it down into 56 00:02:04,500 --> 00:02:06,366 steps and it'll all make sense. 57 00:02:06,366 --> 00:02:08,100 It'll be very easy to understand. 58 00:02:08,100 --> 00:02:09,866 So, our plan of attack: 59 00:02:09,866 --> 00:02:11,500 we're going to take the Bayes theorem and we're 60 00:02:11,500 --> 00:02:12,833 going to apply it twice. 61 00:02:12,833 --> 00:02:15,666 The first time, we're going to apply it to find out 62 00:02:15,666 --> 00:02:20,266 what is the probability that this person walks given his features. 63 00:02:20,266 --> 00:02:24,400 And X over here represents the features 64 00:02:24,800 --> 00:02:26,400 of that data point. 65 00:02:26,400 --> 00:02:28,600 So let's go back to the visualization here. 66 00:02:28,600 --> 00:02:29,066 So here 67 00:02:29,066 --> 00:02:30,900 you can see that this is 68 00:02:30,900 --> 00:02:33,700 our new data point. That person has a certain age. 69 00:02:33,700 --> 00:02:37,366 So let's say the age of that person is maybe 25 years old. 70 00:02:37,700 --> 00:02:39,000 And then they have a salary. 71 00:02:39,000 --> 00:02:41,966 So let's say this salary is $30,000 per year. 72 00:02:41,966 --> 00:02:45,166 So those are the features of this observation. 73 00:02:45,166 --> 00:02:46,966 Right now we're only working with two variables, 74 00:02:46,966 --> 00:02:49,233 just for simplicity's sake, so we can visualize 75 00:02:49,233 --> 00:02:50,600 things: age and salary. 76 00:02:50,600 --> 00:02:53,300 But in reality there could be many, many more features. 77 00:02:53,300 --> 00:02:58,166 There could be features on what industry they work in, or how many 78 00:02:58,166 --> 00:03:02,333 years of education they have, or how long they've had a driver's license for, 
79 00:03:02,333 --> 00:03:04,766 and things like how far away they live from work. 80 00:03:04,766 --> 00:03:06,766 So there could be lots of variables. 81 00:03:06,766 --> 00:03:07,933 But at the same time, right now 82 00:03:07,933 --> 00:03:10,133 we're only going to be dealing with two: age and salary. 83 00:03:10,133 --> 00:03:12,266 And regardless of how many variables you 84 00:03:12,266 --> 00:03:14,033 have, they will be called X, 85 00:03:14,033 --> 00:03:15,600 and we're going to call them features. 86 00:03:15,600 --> 00:03:19,133 So given the features X, so given the age of 87 00:03:19,133 --> 00:03:23,833 25 and the salary of $30,000, and we'll talk in more detail 88 00:03:23,833 --> 00:03:27,000 about exactly what we mean by features in just a moment. 89 00:03:27,000 --> 00:03:32,133 And so therefore this part represents that person that we're trying to classify. 90 00:03:32,333 --> 00:03:35,866 What is the likelihood of a person with those features X, 91 00:03:35,866 --> 00:03:40,033 so we know that we are taking somebody with those features that we have 92 00:03:40,033 --> 00:03:41,200 in our new data point, 93 00:03:41,200 --> 00:03:43,200 what is the likelihood of them walking? 94 00:03:43,200 --> 00:03:44,333 And then we've got the right side. 95 00:03:44,333 --> 00:03:47,966 So we're going to talk through each one of these as we calculate them. 96 00:03:48,000 --> 00:03:50,966 But for now let's just give them their names, going from right to left. 97 00:03:50,966 --> 00:03:54,433 So this one over here is called the prior probability. 98 00:03:55,166 --> 00:03:58,466 And we're going to calculate that first because it's the easiest to calculate. 99 00:03:58,900 --> 00:04:01,333 The next one is the marginal likelihood. 100 00:04:01,333 --> 00:04:03,633 And we're going to calculate that second. 101 00:04:03,633 --> 00:04:06,233 The third one is the likelihood. 102 00:04:06,233 --> 00:04:07,400 Those are just the names that they have. 
103 00:04:07,400 --> 00:04:09,000 And we're going to calculate that 104 00:04:09,000 --> 00:04:12,733 third. And finally, what we're looking for is called the posterior probability. 105 00:04:12,833 --> 00:04:14,633 We're going to calculate that fourth. 106 00:04:14,633 --> 00:04:14,900 All right. 107 00:04:14,900 --> 00:04:17,200 So that's our plan of attack for step one. 108 00:04:17,200 --> 00:04:20,966 This is all still step one, to calculate the probability that somebody walks 109 00:04:20,966 --> 00:04:24,566 given those features X that we see in our new data point. 110 00:04:25,333 --> 00:04:28,200 Next we're going to have step two, where we're going to calculate the probability 111 00:04:28,200 --> 00:04:32,933 that somebody drives given those features X that we see in our new data point. 112 00:04:33,133 --> 00:04:35,966 And again here we'll have the prior probability, which we'll calculate first, 113 00:04:35,966 --> 00:04:38,466 then the marginal likelihood, then the likelihood. 114 00:04:38,466 --> 00:04:40,366 And then we'll get to the posterior probability. 115 00:04:40,366 --> 00:04:43,300 And finally we're going to compare the probability that somebody walks 116 00:04:43,300 --> 00:04:47,000 given features X versus the probability that somebody drives given features X. 117 00:04:47,366 --> 00:04:51,100 And then from there we'll decide which class to put that new data point in. 118 00:04:51,566 --> 00:04:55,233 So as you can see, the Naive Bayes classifier is a probabilistic type 119 00:04:55,233 --> 00:04:55,800 of classifier 120 00:04:55,800 --> 00:04:56,900 because we first calculate 121 00:04:56,900 --> 00:05:00,100 the probabilities, and then, based on the probabilities, we're assigning a class. 122 00:05:00,666 --> 00:05:00,966 All right. 123 00:05:00,966 --> 00:05:04,733 So are you ready to perform these steps? 124 00:05:04,733 --> 00:05:05,866 It's going to be lots of fun. 
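Written out, the two applications of Bayes' theorem in the plan of attack look roughly like this. The notation is an assumption based on the spoken description: X stands for the features of the new data point, and Walks and Drives are the two classes. In each formula the left-hand side is the posterior probability, and the right-hand side combines the likelihood, the prior probability, and the marginal likelihood P(X).

```latex
P(\text{Walks} \mid X) = \frac{P(X \mid \text{Walks})\, P(\text{Walks})}{P(X)}
\qquad
P(\text{Drives} \mid X) = \frac{P(X \mid \text{Drives})\, P(\text{Drives})}{P(X)}
```

The final step of the plan is to compare the two left-hand sides and assign the class with the larger value.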
125 00:05:05,866 --> 00:05:08,766 We're going to take it nice and easy, nice and slowly, 126 00:05:08,766 --> 00:05:11,033 so that we understand everything. And after 127 00:05:11,033 --> 00:05:14,033 this you're going to be very comfortable with the Naive Bayes classifier. 128 00:05:14,066 --> 00:05:16,500 Step one. All right. So here we have our visualization. 129 00:05:16,500 --> 00:05:18,666 Let's move it all to the left a little bit so we 130 00:05:18,666 --> 00:05:19,933 can make some space. 131 00:05:19,933 --> 00:05:20,433 Now we're going to 132 00:05:20,433 --> 00:05:23,833 calculate the first probability in our Bayes theorem. 133 00:05:23,833 --> 00:05:27,400 We're going to calculate the probability that somebody walks, right? 134 00:05:27,400 --> 00:05:29,400 Just the overall probability. And what does that mean? 135 00:05:29,400 --> 00:05:33,800 That is the probability that somebody walks without knowing anything about them. 136 00:05:33,800 --> 00:05:34,933 So we're just saying 137 00:05:34,933 --> 00:05:38,233 we're going to add a new observation to our data set, into here, 138 00:05:38,600 --> 00:05:41,000 but we don't know their age and we don't know their salary. 139 00:05:41,000 --> 00:05:43,866 We're just going to put it somewhere into our data set. 140 00:05:43,866 --> 00:05:45,500 What is the probability that this person 141 00:05:45,500 --> 00:05:47,933 that we're adding to our data set walks to work? 142 00:05:47,933 --> 00:05:49,566 Well, it's very easy and straightforward. 143 00:05:49,566 --> 00:05:51,700 From here we don't have much choice. 144 00:05:51,700 --> 00:05:54,433 The only thing that we can do is calculate the number of 145 00:05:54,433 --> 00:05:55,566 red observations, 146 00:05:55,566 --> 00:05:58,566 the number of people that actually walk, and divide by the overall number. 147 00:05:58,800 --> 00:06:00,966 So the probability that a person walks to work 148 00:06:00,966 --> 00:06:03,300 without any other knowledge 
149 00:06:03,300 --> 00:06:05,466 is the number of walkers, the number of people who walk, 150 00:06:05,466 --> 00:06:08,500 which is these red dots, divided by the total number of observations, 151 00:06:08,500 --> 00:06:08,966 the red and the green dots. 152 00:06:08,966 --> 00:06:11,966 So the gray dot isn't participating in these calculations. 153 00:06:12,200 --> 00:06:14,800 So here we have the probability that somebody walks is 10, 154 00:06:14,800 --> 00:06:17,800 10 red dots divided by 30 dots overall. 155 00:06:17,800 --> 00:06:18,133 All right. 156 00:06:18,133 --> 00:06:18,900 So that was easy. 157 00:06:18,900 --> 00:06:21,233 We've calculated the prior probability. 158 00:06:21,233 --> 00:06:23,433 Next we're calculating the marginal likelihood. 159 00:06:23,433 --> 00:06:25,966 And this is where things get interesting. 160 00:06:25,966 --> 00:06:28,966 So how do we calculate the marginal likelihood? 161 00:06:29,200 --> 00:06:30,233 Let's have a look. 162 00:06:30,233 --> 00:06:32,133 Here's our data set again. 163 00:06:32,133 --> 00:06:35,866 And the first thing we're going to do is we're going to select a radius. 164 00:06:35,866 --> 00:06:39,600 And we're going to draw a circle around our observation like that. 165 00:06:40,266 --> 00:06:41,866 Now this radius you need 166 00:06:41,866 --> 00:06:43,500 to select on your own. 167 00:06:43,500 --> 00:06:45,200 You need to decide for your algorithm. 168 00:06:45,200 --> 00:06:47,233 This is going to be like an input parameter of the algorithm. 169 00:06:47,233 --> 00:06:49,466 You can select it less, you can select it more. 170 00:06:49,466 --> 00:06:50,533 It's up to you. 171 00:06:50,533 --> 00:06:51,866 Now what does this radius do? 172 00:06:51,866 --> 00:06:55,200 Well, what we're going to do is, first of all, 173 00:06:55,666 --> 00:06:58,266 just to make things easier, we're going to remove 174 00:06:58,266 --> 00:07:01,700 our dot for now, just so that it's not confusing us. 
175 00:07:02,033 --> 00:07:02,600 And then we're 176 00:07:02,600 --> 00:07:04,000 going to look at all the 177 00:07:04,000 --> 00:07:06,400 points that are inside this radius. 178 00:07:06,400 --> 00:07:07,100 And what 179 00:07:07,100 --> 00:07:07,966 we're saying here is 180 00:07:07,966 --> 00:07:10,966 that all of the points inside the circle, 181 00:07:11,066 --> 00:07:15,833 we're going to deem them to be similar in terms of features to the 182 00:07:15,833 --> 00:07:18,000 point that we had. The point that we had, 183 00:07:18,000 --> 00:07:22,066 remember, it had an age of, for example, 25 and a salary of $30,000 per year. 184 00:07:22,366 --> 00:07:24,900 So now we're going to draw a radius around it. 185 00:07:24,900 --> 00:07:29,833 And let's say anybody between the ages of 20 and 30 and with salaries 186 00:07:29,833 --> 00:07:33,100 of $25,000 to $35,000, 187 00:07:33,466 --> 00:07:37,233 anybody that falls in that circle, again, it's not a square, 188 00:07:37,266 --> 00:07:38,900 it's not just a square, it's a circle, 189 00:07:38,900 --> 00:07:42,233 anybody who falls somewhere in that 190 00:07:42,433 --> 00:07:46,200 vicinity is going to be deemed similar to 191 00:07:46,400 --> 00:07:49,100 the new data point that we're adding to our data set. 192 00:07:49,100 --> 00:07:50,900 So, as you can imagine, this radius is actually going 193 00:07:50,900 --> 00:07:54,400 to have a big say in the way your algorithm works. 194 00:07:54,733 --> 00:07:56,500 Well, let's say we have this radius and 195 00:07:56,500 --> 00:07:58,433 this is how it all played out. 196 00:07:58,433 --> 00:08:01,200 We have three red dots and one green dot in there. 197 00:08:01,200 --> 00:08:01,500 All right. 198 00:08:01,500 --> 00:08:02,966 So now what do we do? 199 00:08:02,966 --> 00:08:05,233 How do we calculate the probability of X? 200 00:08:05,233 --> 00:08:07,200 And what is the probability of X? 
201 00:08:07,200 --> 00:08:11,900 Well, the probability of X is the probability of a new point 202 00:08:11,900 --> 00:08:13,533 that we add to our data set 203 00:08:13,533 --> 00:08:16,866 being similar in features to the 204 00:08:16,866 --> 00:08:19,466 point that we actually are adding to it. 205 00:08:19,466 --> 00:08:20,866 So basically it is the probability of 206 00:08:20,866 --> 00:08:22,666 that new point that we're adding, 207 00:08:22,666 --> 00:08:25,700 or like any random point that we add, it is the probability 208 00:08:25,700 --> 00:08:29,233 of any random point falling into this circle. 209 00:08:29,733 --> 00:08:33,000 And P(X) is calculated as the number of similar observations, 210 00:08:33,000 --> 00:08:35,866 so the number of observations that we can already see in the circle, 211 00:08:35,866 --> 00:08:40,200 so one, two, three, four, divided by the total number of observations, which is 30. 212 00:08:40,466 --> 00:08:42,900 So P(X) is four divided by 30. 213 00:08:42,900 --> 00:08:48,166 Once again, just to reiterate, P(X) tells us what is the likelihood of any new 214 00:08:48,166 --> 00:08:52,533 random data point that we add to this data set falling inside this circle. 215 00:08:52,933 --> 00:08:57,033 And it is four over 30 because we only have four. Based on prior knowledge, 216 00:08:57,033 --> 00:09:00,300 we can tell that this is four here, and this, therefore, is four over 30. 217 00:09:00,966 --> 00:09:01,233 All right. 218 00:09:01,233 --> 00:09:03,466 So that wasn't hard at all either. 219 00:09:03,466 --> 00:09:05,200 We've calculated the marginal likelihood. 220 00:09:05,200 --> 00:09:07,666 So far we've got this one and we've got this one. 221 00:09:07,666 --> 00:09:09,533 Next we're moving on to the likelihood. 222 00:09:09,533 --> 00:09:11,933 And this is probably the most complex one. 223 00:09:11,933 --> 00:09:14,866 What is the likelihood that somebody 224 00:09:14,866 --> 00:09:17,866 who walks exhibits features X? 
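The prior probability and the marginal likelihood computed so far can be reproduced from the counts read off the chart. This is only a minimal sketch: the variable names are illustrative, and the counts (10 red dots, 20 green dots, 4 dots inside the circle) are the ones quoted in the tutorial.

```python
from fractions import Fraction

# Counts as quoted in the tutorial (read off the chart).
n_walks = 10                   # red dots: people who walk to work
n_drives = 20                  # green dots: people who drive to work
n_total = n_walks + n_drives   # 30 observations overall
n_in_circle = 4                # 3 red + 1 green dot inside the circle

prior_walks = Fraction(n_walks, n_total)    # P(Walks) = 10/30
marginal = Fraction(n_in_circle, n_total)   # P(X) = 4/30

print(prior_walks, marginal)  # 1/3 2/15
```

Using `Fraction` keeps the values exact, so 10/30 and 4/30 show up in their reduced forms 1/3 and 2/15 rather than as rounded decimals.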
225 00:09:18,466 --> 00:09:21,733 Well, actually, after we've spoken about the marginal likelihood, calculating 226 00:09:21,733 --> 00:09:23,833 the likelihood won't be as complex. 227 00:09:23,833 --> 00:09:25,233 So let's have a look. 228 00:09:25,233 --> 00:09:27,033 So there's our chart. 229 00:09:27,033 --> 00:09:28,266 And now what we're 230 00:09:28,266 --> 00:09:29,766 going to do is we're going to 231 00:09:29,766 --> 00:09:32,333 draw the same circle again. 232 00:09:32,333 --> 00:09:33,900 And once again we're going to remove the 233 00:09:33,900 --> 00:09:35,100 gray point for now. 234 00:09:35,100 --> 00:09:37,133 And we're going to color our circle. 235 00:09:37,133 --> 00:09:37,700 And so 236 00:09:37,700 --> 00:09:42,433 anything that falls inside the circle is deemed to be similar to the 237 00:09:42,433 --> 00:09:44,100 point that we're adding. 238 00:09:44,100 --> 00:09:46,600 So the question is: what is the probability 239 00:09:46,600 --> 00:09:50,100 that a randomly selected data point from our data 240 00:09:50,100 --> 00:09:52,366 set will be similar to 241 00:09:52,366 --> 00:09:53,933 the data point that we're adding? 242 00:09:53,933 --> 00:09:58,266 So basically, what is the likelihood that a randomly selected data point will be 243 00:09:58,266 --> 00:09:59,833 from this circle, 244 00:09:59,833 --> 00:10:03,200 given, this vertical pipe means given, that 245 00:10:03,200 --> 00:10:06,866 that person walks, that we know that that person walks to work? 246 00:10:07,100 --> 00:10:08,100 The other way to think about 247 00:10:08,100 --> 00:10:11,866 this is we're only working with people who walk to work. 248 00:10:11,866 --> 00:10:15,533 So we're only working with the red dots, which represent people who walk to work. 249 00:10:15,533 --> 00:10:17,933 So let's forget about the green dots there, like that. 250 00:10:17,933 --> 00:10:20,933 Now they're faint and we're not even talking about them at all. 
251 00:10:21,033 --> 00:10:22,600 We're only talking about the red dots. 252 00:10:22,600 --> 00:10:24,666 So the question is, given that we're only 253 00:10:24,666 --> 00:10:28,833 working with the red dots, what is the likelihood that a 254 00:10:28,866 --> 00:10:32,266 randomly selected data point from our data 255 00:10:32,266 --> 00:10:35,433 set, from the red dots, is somebody 256 00:10:35,433 --> 00:10:38,666 who exhibits features similar to 257 00:10:38,766 --> 00:10:41,400 the point that we are adding to our data set? 258 00:10:41,400 --> 00:10:44,866 So basically, what is the likelihood that a randomly selected 259 00:10:44,866 --> 00:10:48,666 red dot falls into this gray area, into this circle? 260 00:10:49,066 --> 00:10:50,700 That's the question we're asking. 261 00:10:50,700 --> 00:10:52,200 And that is also very simple. 262 00:10:52,200 --> 00:10:54,966 Now that we know how all of this works, it's basically 263 00:10:54,966 --> 00:10:57,600 the number of similar observations among those who walk. 264 00:10:57,600 --> 00:11:01,200 So the number of red dots that actually fall inside this circle, 265 00:11:01,200 --> 00:11:05,766 this gray circle, that's three, divided by the total number of walkers, 266 00:11:05,766 --> 00:11:08,466 so the total number of people who walk to work. 267 00:11:08,466 --> 00:11:10,400 And that is three over ten. 268 00:11:10,400 --> 00:11:10,933 There we go. 269 00:11:10,933 --> 00:11:15,866 So that's our P of X given walks, the likelihood of somebody exhibiting features similar 270 00:11:15,866 --> 00:11:20,100 to that data point that we're about to add, given that we're only selecting among 271 00:11:20,366 --> 00:11:21,666 the red dots. 272 00:11:21,666 --> 00:11:22,800 So that's three over ten. 273 00:11:22,800 --> 00:11:24,133 And that was our likelihood. 274 00:11:24,133 --> 00:11:26,833 So now if we plug all that in, so there we go. 275 00:11:26,833 --> 00:11:28,166 That likelihood is done. 
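The "count the red dots inside the circle" step can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the tutorial shows only a chart, so the ten walker coordinates are made up (age in years, salary in thousands of dollars), the radius of 5 is an arbitrary choice, and `count_in_circle` is a hypothetical helper name. The coordinates were picked so that three of the ten walkers land inside the circle, matching the three over ten from the tutorial.

```python
import math
from fractions import Fraction

# Hypothetical (age, salary in $1000s) coordinates for the ten walkers.
# These values are made up; the tutorial only shows dots on a chart.
walkers = [
    (24, 29), (26, 31), (23, 32),            # chosen to fall inside the circle
    (30, 40), (22, 45), (35, 33), (28, 50),
    (19, 38), (33, 42), (21, 22),            # chosen to fall outside the circle
]
new_point = (25, 30)   # age 25, salary $30,000, as in the tutorial
radius = 5             # assumed input parameter, in the units of the axes


def count_in_circle(points, center, radius):
    """Count the points whose Euclidean distance to center is at most radius."""
    return sum(1 for p in points if math.dist(p, center) <= radius)


similar_walkers = count_in_circle(walkers, new_point, radius)
likelihood_walks = Fraction(similar_walkers, len(walkers))  # P(X | Walks)

print(likelihood_walks)  # 3/10
```

Because the circle is defined by plain Euclidean distance, which points count as "similar" depends on how the two axes are scaled as well as on the radius itself.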
276 00:11:28,166 --> 00:11:29,233 So if we plug all of that 277 00:11:29,233 --> 00:11:31,200 in, we'll get our posterior probability. 278 00:11:31,200 --> 00:11:36,200 So three over ten times ten over 30, divided by four over 30. 279 00:11:36,200 --> 00:11:39,333 So if we calculate that, it'll give us 0.75. 280 00:11:39,533 --> 00:11:42,766 75% is the probability 281 00:11:42,766 --> 00:11:46,633 that somebody that we put into the place where we're putting X 282 00:11:46,633 --> 00:11:50,033 should be classified as a person who walks to work. 283 00:11:50,533 --> 00:11:53,900 That was step one. It was pretty intense, right? 284 00:11:53,900 --> 00:11:57,266 Pretty exciting to calculate this value. 285 00:11:57,266 --> 00:11:58,866 Now, the next step is step two. 286 00:11:58,866 --> 00:12:00,700 That's step one. The next step is step two: 287 00:12:00,700 --> 00:12:03,600 to do the same thing for the likelihood that somebody with 288 00:12:03,600 --> 00:12:04,800 features X 289 00:12:04,800 --> 00:12:09,400 will be classified or should be classified as a person who drives to work. 290 00:12:09,966 --> 00:12:12,000 And here I'm going to throw you a challenge. 291 00:12:12,000 --> 00:12:15,000 I'm going to challenge you to pause this video, 292 00:12:15,000 --> 00:12:19,333 or rewind back, to have the image in front of you 293 00:12:19,866 --> 00:12:22,333 and do these calculations yourself, to 294 00:12:22,333 --> 00:12:23,433 actually go through the 295 00:12:23,433 --> 00:12:25,700 same steps and perform those calculations. 296 00:12:25,700 --> 00:12:29,400 If you'd like to see and compare to my calculations, 297 00:12:29,666 --> 00:12:32,966 then I'm going to put in another video after this one, 298 00:12:32,966 --> 00:12:35,400 so another tutorial after this one in the course. 299 00:12:35,400 --> 00:12:37,933 So you can just go to the next tutorial and compare. 
300 00:12:37,933 --> 00:12:40,233 Otherwise I'm just going to show you the result now. 301 00:12:40,233 --> 00:12:43,233 So the result is one over 20 for the likelihood. 302 00:12:43,233 --> 00:12:44,233 Or let's start from the right. 303 00:12:44,233 --> 00:12:48,000 The prior probability is 20 over 30, and the marginal likelihood remains 304 00:12:48,000 --> 00:12:49,366 unchanged, four over 30. 305 00:12:49,366 --> 00:12:51,833 The likelihood changes to one over 20. 306 00:12:51,833 --> 00:12:54,433 So the probability of somebody who exhibits features 307 00:12:54,433 --> 00:12:58,200 X being a person who drives to work is 25%. 308 00:12:58,700 --> 00:13:00,233 So that was step two. 309 00:13:00,233 --> 00:13:01,500 Now we're going to do step three. 310 00:13:01,500 --> 00:13:02,766 We're going to compare 311 00:13:02,766 --> 00:13:06,633 the probabilities for somebody with features X: 312 00:13:06,933 --> 00:13:10,033 the probability of them being a person who walks to work versus the probability 313 00:13:10,033 --> 00:13:13,033 of somebody with features X being a person who drives to work. 314 00:13:13,100 --> 00:13:15,466 So it's 75% versus 25%. 315 00:13:15,466 --> 00:13:18,100 And therefore the first is greater than the second. 316 00:13:18,100 --> 00:13:21,300 And therefore it is more likely that that person 317 00:13:21,300 --> 00:13:23,233 with features X is 318 00:13:23,233 --> 00:13:27,133 going to be a person who walks to work than a person who drives to work. 319 00:13:27,133 --> 00:13:30,766 So there's still a 25% chance that that is a person who drives to work. 320 00:13:31,033 --> 00:13:33,966 But the percent chance that it is a person who walks to 321 00:13:33,966 --> 00:13:37,400 work is greater, 75%, and therefore we're going to classify 322 00:13:37,600 --> 00:13:40,733 this point as a person who walks to work. 323 00:13:41,366 --> 00:13:43,266 There we go. That is how the 324 00:13:43,266 --> 00:13:47,200 Naive Bayes algorithm in machine learning works. 
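The three steps of the tutorial can be put together in a short sketch that works directly from the counts on the chart. The function name `posterior` and its parameters are illustrative choices, not part of any library; the counts are the ones quoted in the tutorial (3 of 10 walkers and 1 of 20 drivers inside the circle, 4 of 30 points inside the circle overall).

```python
from fractions import Fraction


def posterior(n_class_in_circle, n_class, n_in_circle, n_total):
    """Bayes' theorem with every term estimated by counting dots."""
    likelihood = Fraction(n_class_in_circle, n_class)  # P(X | class)
    prior = Fraction(n_class, n_total)                 # P(class)
    marginal = Fraction(n_in_circle, n_total)          # P(X)
    return likelihood * prior / marginal


# Step one: walkers. 3 of 10 walkers are inside the circle.
p_walks = posterior(3, 10, 4, 30)
# Step two: drivers. 1 of 20 drivers is inside the circle.
p_drives = posterior(1, 20, 4, 30)
# Step three: compare and assign the class with the larger posterior.
label = "walks" if p_walks > p_drives else "drives"

print(p_walks, p_drives, label)  # 3/4 1/4 walks
```

With these counts the two posteriors sum to one, because every point inside the circle belongs to exactly one of the two classes.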
325 00:13:47,600 --> 00:13:49,800 I hope you found this tutorial useful. 326 00:13:49,800 --> 00:13:53,200 I'm pretty excited about and pretty proud of these slides, 327 00:13:53,200 --> 00:13:56,700 and hopefully this is a step-by-step and simple 328 00:13:56,700 --> 00:13:59,100 explanation of a complex concept. 329 00:13:59,100 --> 00:14:00,766 And I look forward to seeing you next time. 330 00:14:00,766 --> 00:14:02,566 Until then, enjoy machine learning.