1 00:00:00,833 --> 00:00:03,466 Hello and welcome back to the course on Machine Learning. 2 00:00:03,466 --> 00:00:05,500 Today we're talking about Bayes Theorem. 3 00:00:05,500 --> 00:00:09,933 Now our main goal for this section is the Naive Bayes classifiers. 4 00:00:10,200 --> 00:00:12,300 But at the same time we can't proceed to them 5 00:00:12,300 --> 00:00:14,100 without talking about Bayes Theorem. 6 00:00:14,100 --> 00:00:14,533 That's why 7 00:00:14,533 --> 00:00:18,000 we're going to have this lovely tutorial devoted specifically to this topic. 8 00:00:18,333 --> 00:00:21,033 So let's kick it off and get straight into it. 9 00:00:21,033 --> 00:00:23,566 Today we're talking about spanners. 10 00:00:23,566 --> 00:00:25,300 I know it's it's a bit weird. 11 00:00:25,300 --> 00:00:26,566 Why are we talking about spanners? 12 00:00:26,566 --> 00:00:28,200 We're supposed to be talking about Bayes theorem, 13 00:00:28,200 --> 00:00:31,533 but I'm going to be illustrating the Bayes theorem on an example. 14 00:00:31,533 --> 00:00:34,866 And we're going to be using spanners just because it's 15 00:00:34,866 --> 00:00:37,866 one of the things I found on the internet. 16 00:00:38,033 --> 00:00:39,966 One of the first pictures that came to mind. 17 00:00:39,966 --> 00:00:42,000 And also that way you'll always remember 18 00:00:42,000 --> 00:00:44,566 whenever somebody says Bayes theorem, you'll remember spanner 19 00:00:44,566 --> 00:00:46,300 and you'll remember what we talked about today. 20 00:00:46,300 --> 00:00:49,300 So it's a good way to anchor this into your memory. 21 00:00:49,333 --> 00:00:51,533 All right. So let's say we're at a factory. 22 00:00:51,533 --> 00:00:53,133 We're doing some analytics for a factory. 23 00:00:53,133 --> 00:00:56,400 And there's two machines, and one machine produces spanners. 24 00:00:56,700 --> 00:00:59,033 And the other machine produces spanners. 
25 00:00:59,033 --> 00:01:01,800 Now the machines work at different rates and they have somewhat 26 00:01:01,800 --> 00:01:05,633 different characteristics, but overall they're producing the same spanners. 27 00:01:05,800 --> 00:01:08,800 The additional information here is that the spanners 28 00:01:08,800 --> 00:01:10,833 are actually marked or tagged, 29 00:01:10,833 --> 00:01:12,233 so we know which machine they came from. 30 00:01:12,233 --> 00:01:15,600 So the top ones came from machine one, the bottom ones came from machine two. 31 00:01:15,900 --> 00:01:19,666 And then at the end of the day we've got a whole pile of these spanners, 32 00:01:19,966 --> 00:01:23,800 and the workers go through them and their goal is to pick out 33 00:01:23,800 --> 00:01:25,433 the defective spanners. 34 00:01:25,433 --> 00:01:26,300 So here we can see that 35 00:01:26,300 --> 00:01:29,733 there's a couple of defective spanners hiding among the pile. 36 00:01:30,066 --> 00:01:33,666 Now, the question that we're going to be asking today is: what's the probability? 37 00:01:33,700 --> 00:01:39,066 What's the probability of machine two producing a defective spanner? 38 00:01:39,300 --> 00:01:42,733 So if you take just a random spanner produced by machine two, 39 00:01:42,733 --> 00:01:46,466 so as it comes off the conveyor belt you pick it up, 40 00:01:46,566 --> 00:01:49,566 what is the probability that that spanner is defective? 41 00:01:49,666 --> 00:01:53,233 And the way we'll get to that probability is through some information 42 00:01:53,233 --> 00:01:55,566 that will already be given to us at the start. 43 00:01:55,566 --> 00:01:57,566 So we'll have a look at that information in a second. 44 00:01:57,566 --> 00:02:02,100 But the rule or the mathematical concept that we'll be using to get to 45 00:02:02,100 --> 00:02:05,233 that probability is called the Bayes theorem. 46 00:02:05,400 --> 00:02:07,500 Here's a mathematical representation of it.
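The slide itself is not reproduced in this transcript. What is presumably on screen is the standard statement of Bayes theorem, sketched here with generic events A and B (the letters are the usual textbook convention, assumed rather than read off the slide); the vertical line reads "given".

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```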
47 00:02:07,500 --> 00:02:10,300 I know it is a bit complicated right now. 48 00:02:10,300 --> 00:02:12,300 It seems like, what are all these signs? 49 00:02:12,300 --> 00:02:15,300 We know that P is probably probability, but then B, A, 50 00:02:15,300 --> 00:02:18,566 and the vertical line, what are all of them doing here? 51 00:02:18,600 --> 00:02:20,666 What is this relationship telling us? 52 00:02:20,666 --> 00:02:22,066 Well, don't worry about it for now. 53 00:02:22,066 --> 00:02:25,566 We will walk through it very slowly step by step, 54 00:02:25,566 --> 00:02:28,866 and you'll get to know the Bayes theorem very intimately. 55 00:02:29,266 --> 00:02:30,866 All right, so let's get started. 56 00:02:30,866 --> 00:02:34,233 What is the information that is provided to us at the very start? 57 00:02:34,700 --> 00:02:38,000 Well, we know that machine one produces 30 wrenches 58 00:02:38,000 --> 00:02:41,066 per hour and machine two 20 wrenches per hour. By the way, 59 00:02:41,133 --> 00:02:42,733 wrenches and spanners: 60 00:02:42,733 --> 00:02:45,400 I found out just now that they're actually the same thing. 61 00:02:45,400 --> 00:02:47,633 A spanner is the term outside of North America. 62 00:02:47,633 --> 00:02:49,700 A wrench is the term inside North America. 63 00:02:49,700 --> 00:02:50,366 So there you go. 64 00:02:50,366 --> 00:02:52,066 You learned an extra thing today 65 00:02:52,066 --> 00:02:55,066 if you didn't know that. I learned an extra thing. 66 00:02:55,100 --> 00:02:57,566 So we're just going to call them wrenches from now on. 67 00:02:57,566 --> 00:02:59,500 Machine one produces 30 wrenches per hour. 68 00:02:59,500 --> 00:03:01,866 Machine two produces 20 wrenches per hour. 69 00:03:01,866 --> 00:03:03,166 Okay, great. 70 00:03:03,166 --> 00:03:08,066 Also, out of all of the produced parts, we can see that 1% are defective.
71 00:03:08,333 --> 00:03:11,133 So after checking the parts, at the end of the day, you know, 72 00:03:11,133 --> 00:03:14,433 thousands and thousands of wrenches, we can see that 1% of them are defective. 73 00:03:14,566 --> 00:03:16,266 Okay, so we know that. 74 00:03:16,266 --> 00:03:18,900 Then we also know that out of all of the defective parts, 75 00:03:18,900 --> 00:03:24,400 we can see that 50% came from machine one and 50% came from machine two. 76 00:03:24,733 --> 00:03:28,166 So here you can see that if you take all the defective parts, 77 00:03:28,233 --> 00:03:32,100 only the defective parts, and you calculate 78 00:03:32,133 --> 00:03:33,833 how many of them came from machine one, 79 00:03:33,833 --> 00:03:37,966 and how many of them came from machine two, you'll see that it's half and half: 80 00:03:37,966 --> 00:03:40,966 half of them are from machine one and half are from machine two. 81 00:03:41,233 --> 00:03:41,966 All right. 82 00:03:41,966 --> 00:03:44,033 So this is just the defective parts. 83 00:03:44,033 --> 00:03:46,466 And the question is: what is the probability 84 00:03:46,466 --> 00:03:50,200 that a part produced by machine two is defective? 85 00:03:50,200 --> 00:03:52,666 So that's the same question which we asked already. 86 00:03:52,666 --> 00:03:54,900 What is the probability that if I come up to machine two 87 00:03:54,900 --> 00:03:57,466 and take the part that just popped out of machine two, 88 00:03:57,466 --> 00:04:00,466 what is the probability that that part is going to be defective? 89 00:04:00,633 --> 00:04:03,700 So how do we put all that information together to get the answer 90 00:04:03,700 --> 00:04:04,333 to this question? 91 00:04:04,333 --> 00:04:07,366 That's what Bayes theorem helps us do. 92 00:04:07,833 --> 00:04:09,033 So let's have a look at how 93 00:04:09,033 --> 00:04:12,033 we can rewrite this information in more mathematical terms.
94 00:04:12,466 --> 00:04:14,066 So the first line, what does it tell us? 95 00:04:14,066 --> 00:04:18,300 It tells us that machine one produces 30 wrenches per hour and machine two 20 wrenches per hour. 96 00:04:18,300 --> 00:04:19,800 So in total, 97 00:04:19,800 --> 00:04:23,166 out of all the wrenches produced, there are 50 wrenches produced per hour. 98 00:04:23,500 --> 00:04:27,233 And that also means that for any given wrench 99 00:04:27,233 --> 00:04:31,200 that you pick up from the pile, just any, whether it's defective or non-defective, 100 00:04:31,400 --> 00:04:34,366 if you pick up a wrench from the final pile, the probability 101 00:04:34,366 --> 00:04:37,400 that it came from machine one is 30 divided by 50. 102 00:04:37,766 --> 00:04:38,033 Right. 103 00:04:38,033 --> 00:04:42,366 So if we're producing 30 from machine one and 50 overall per hour, 104 00:04:42,566 --> 00:04:46,200 that means the likelihood of any given wrench being from machine one 105 00:04:46,600 --> 00:04:50,700 is exactly 30 divided by 50, or 0.6, or 60%. 106 00:04:51,300 --> 00:04:54,700 Similarly, the likelihood of that wrench 107 00:04:54,700 --> 00:04:57,666 being from machine two is 20 over 50, or 40%. 108 00:04:57,666 --> 00:04:58,200 All right. Great. 109 00:04:58,200 --> 00:05:00,500 So we're going to write those down and keep them there for now. 110 00:05:00,500 --> 00:05:02,400 Now let's see what we can get out of the second line. 111 00:05:02,400 --> 00:05:06,866 So out of all the produced parts, we can see that 1% are defective. 112 00:05:06,866 --> 00:05:09,366 Well, it's very simple to rewrite this in mathematical terms. 113 00:05:09,366 --> 00:05:13,300 We can just say the probability of a part being defective equals 1%. 114 00:05:13,600 --> 00:05:14,500 So even though that was a 115 00:05:14,500 --> 00:05:18,300 simple transition, these sentences are actually saying slightly different things.
116 00:05:18,300 --> 00:05:22,200 So the one on the left says we can see that 1% are defective. 117 00:05:22,200 --> 00:05:23,500 So that means that we 118 00:05:23,500 --> 00:05:27,100 counted the number of defective parts and divided by the total amount, 119 00:05:27,266 --> 00:05:29,866 and that gives 1% defective. On the right, 120 00:05:29,866 --> 00:05:34,533 we're saying the probability of a single part being defective is 1%. 121 00:05:34,533 --> 00:05:37,666 So that means if I pick up a random part from the pile, 122 00:05:38,233 --> 00:05:41,500 the likelihood of it being defective is 1%, right? 123 00:05:41,766 --> 00:05:44,300 So that is what we're saying here. 124 00:05:44,300 --> 00:05:47,633 And it's a very simple thing to write in mathematical terms. 125 00:05:47,633 --> 00:05:51,333 There it is: P of defect, the probability that there's a defect, equals 1%. 126 00:05:51,900 --> 00:05:52,266 All right. 127 00:05:52,266 --> 00:05:54,466 And now we're moving on to part three. 128 00:05:54,466 --> 00:05:56,833 Out of all of the defective parts, 129 00:05:56,833 --> 00:06:00,266 we can see that 50% came from machine 130 00:06:00,433 --> 00:06:03,300 one and 50% came from machine two. 131 00:06:03,300 --> 00:06:06,000 So how do we write that in mathematical terms? 132 00:06:06,000 --> 00:06:07,200 And this is the interesting part. 133 00:06:07,200 --> 00:06:09,933 So let's start with the machine one part. 134 00:06:09,933 --> 00:06:14,500 50% came from machine one, meaning that if we take only the defective parts, 135 00:06:14,500 --> 00:06:18,400 so if it's given that we're only looking at the defective parts, 136 00:06:18,800 --> 00:06:21,600 then the likelihood that any part 137 00:06:21,600 --> 00:06:24,666 that we pick out came from machine one is 50%. 138 00:06:24,666 --> 00:06:28,566 And the way to write it in mathematical terms is this: P of machine 139 00:06:28,566 --> 00:06:31,700 one, vertical line, defect equals 50%.
140 00:06:31,866 --> 00:06:36,466 So the right part over here: the vertical line in mathematical terms means "given". 141 00:06:36,666 --> 00:06:40,000 So up here you can see P of machine 142 00:06:40,000 --> 00:06:43,233 one, which means the likelihood or probability of a part coming from machine one. 143 00:06:43,533 --> 00:06:47,733 Here it's the same thing, but given some condition. 144 00:06:47,766 --> 00:06:51,533 So this is the likelihood of a part coming from machine one 145 00:06:52,033 --> 00:06:55,666 given the condition that that part is defective. 146 00:06:55,666 --> 00:06:59,500 So you know a priori that the part is defective 147 00:06:59,500 --> 00:07:02,400 because you're picking it out of the defective pile. 148 00:07:02,400 --> 00:07:05,766 So what is the probability of that part coming from machine one? 149 00:07:06,066 --> 00:07:07,800 Well, it's 50%. 150 00:07:07,800 --> 00:07:11,133 And that is the mathematical transcription of this sentence: 151 00:07:11,133 --> 00:07:15,100 we can see that 50% came from machine one, 50% of defective parts. 152 00:07:15,633 --> 00:07:16,066 Okay. 153 00:07:16,066 --> 00:07:18,266 So now let's do the same thing for 154 00:07:18,266 --> 00:07:20,166 machine two. It's going to look exactly the same, 155 00:07:20,166 --> 00:07:21,766 but here it's going to be machine two. 156 00:07:21,766 --> 00:07:26,733 So for the part that we just picked up, the probability 157 00:07:26,733 --> 00:07:30,200 that it was originally created by machine two is 50%, 158 00:07:30,600 --> 00:07:34,433 given that we're only picking from the defective pile. 159 00:07:34,533 --> 00:07:35,000 Right. 160 00:07:35,000 --> 00:07:38,133 Because if we're picking out of all of them, it's actually 40%. 161 00:07:38,133 --> 00:07:39,300 It's 0.4. 162 00:07:39,300 --> 00:07:43,033 But if we're picking it out only from the defective pile, it's 50%.
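As a rough sketch, the quantities collected so far can be written down in Python. The variable names are made up for illustration and do not come from the lecture; the numbers are the ones given above.

```python
# Production rates given at the start of the lesson (wrenches per hour).
rate_m1 = 30
rate_m2 = 20
total_rate = rate_m1 + rate_m2  # 50 wrenches per hour overall

# Probability that a randomly picked wrench came from each machine.
p_m1 = rate_m1 / total_rate  # 30 / 50 = 60%
p_m2 = rate_m2 / total_rate  # 20 / 50 = 40%

# Observed by the workers at the end of the day.
p_defect = 0.01          # 1% of all parts are defective
p_m1_given_defect = 0.5  # half of the defective parts came from machine one
p_m2_given_defect = 0.5  # the other half came from machine two

print(f"P(M1) = {p_m1:.0%}, P(M2) = {p_m2:.0%}, P(defect) = {p_defect:.0%}")
```

Note that machine two makes 40% of the output but accounts for 50% of the defects, which is the imbalance the next part of the lesson picks up on.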
163 00:07:43,200 --> 00:07:46,200 So you can really tell from here, from these two expressions 164 00:07:46,200 --> 00:07:51,466 or four expressions, that the likelihood of a part coming from machine two is 40%, 165 00:07:51,500 --> 00:07:55,000 meaning that it produces fewer wrenches, 166 00:07:55,400 --> 00:08:00,800 whereas the likelihood of a defective part coming from machine two is 50%. 167 00:08:00,800 --> 00:08:01,333 So 168 00:08:01,333 --> 00:08:05,400 if you take any part, any wrench, and you pick it out from the defective pile, 169 00:08:05,466 --> 00:08:08,133 the likelihood that it was originally produced by machine two is 50%. 170 00:08:08,133 --> 00:08:12,100 So you can tell right away that machine two seems to be producing 171 00:08:12,100 --> 00:08:16,133 disproportionately more defective parts than machine one, right? 172 00:08:16,166 --> 00:08:19,933 So it's only producing 40% of the output, but it's accounting for 50% 173 00:08:19,933 --> 00:08:21,433 of the defective parts. 174 00:08:21,433 --> 00:08:24,266 And the question that we want to ask is actually a bit different. 175 00:08:24,266 --> 00:08:24,466 Right. 176 00:08:24,466 --> 00:08:25,766 So the question we want to ask is: 177 00:08:25,766 --> 00:08:29,766 what is the probability that a part produced by machine two is defective? 178 00:08:29,800 --> 00:08:33,833 So here it's kind of reversed: first we're saying 179 00:08:33,866 --> 00:08:36,333 we're taking a part from machine two. 180 00:08:36,333 --> 00:08:36,600 Right. 181 00:08:36,600 --> 00:08:39,633 We're only looking at parts that are produced by machine two. 182 00:08:39,633 --> 00:08:43,933 What is the probability of a random part taken out of machine two being defective? 183 00:08:44,266 --> 00:08:47,000 How do we write that in mathematical terms? 184 00:08:47,000 --> 00:08:48,033 It's written this way.
185 00:08:48,033 --> 00:08:52,800 So probability of a defective part given that it came from machine two. 186 00:08:53,066 --> 00:08:56,233 If you look at just the probability of a part being defective, it's 1%. 187 00:08:56,333 --> 00:08:59,333 But here we're given a condition that it's machine two. 188 00:08:59,400 --> 00:09:01,200 So the way to think of it, you can either think of it 189 00:09:01,200 --> 00:09:04,600 as parts coming out of machine two and you just pick up a random one. 190 00:09:04,600 --> 00:09:06,666 What's the probability of it being defective? Right. 191 00:09:06,666 --> 00:09:08,766 Or you can think of it in terms of quantities. 192 00:09:08,766 --> 00:09:11,866 The correct term here is frequentist interpretation. 193 00:09:11,866 --> 00:09:15,200 So you can think of it in terms of the frequentist interpretation. 194 00:09:15,300 --> 00:09:18,633 And that means instead of just picking up one part and thinking 195 00:09:18,633 --> 00:09:21,633 about what the likelihood is of it being defective 196 00:09:21,933 --> 00:09:23,100 if we know that it came from machine two, 197 00:09:23,100 --> 00:09:26,466 think of it as there's a pile of wrenches 198 00:09:26,733 --> 00:09:30,333 that only came from machine two, so you have a pile of, you know, 199 00:09:30,400 --> 00:09:32,700 a thousand wrenches that came from machine two. 200 00:09:32,700 --> 00:09:36,433 What is the number of wrenches that are going to be defective? 201 00:09:36,433 --> 00:09:36,633 Right. 202 00:09:36,633 --> 00:09:40,033 So what is the proportion of wrenches that are going to be defective? 203 00:09:40,033 --> 00:09:43,033 So those are two different ways of thinking about probabilities 204 00:09:43,166 --> 00:09:44,666 or the Bayes theorem. 205 00:09:44,666 --> 00:09:46,400 But they're pretty much the same thing. 206 00:09:46,400 --> 00:09:47,400 It's exactly the same thing. 207 00:09:47,400 --> 00:09:49,600 It's just different ways of thinking about it.
208 00:09:49,600 --> 00:09:52,400 So it's either: what's the probability of this one part 209 00:09:52,400 --> 00:09:54,700 that just came out of machine two being defective? 210 00:09:54,700 --> 00:09:58,333 Or: out of all of these parts that came out of machine 211 00:09:58,333 --> 00:10:00,933 two, what is the proportion of them that's going to be defective? 212 00:10:00,933 --> 00:10:04,366 So that's the question that we want to answer given all this information. 213 00:10:04,833 --> 00:10:06,700 All right. So let's start solving this problem. 214 00:10:06,700 --> 00:10:09,900 And we'll illustrate how the Bayes theorem 215 00:10:09,900 --> 00:10:13,133 will help us convert the information we were given into the probability we want. 216 00:10:13,900 --> 00:10:14,233 All right. 217 00:10:14,233 --> 00:10:18,233 So first of all we won't need the probability of machine one 218 00:10:18,233 --> 00:10:20,700 and the probability of machine one given defective. 219 00:10:20,700 --> 00:10:21,766 So we're going to 220 00:10:21,766 --> 00:10:25,633 get rid of those from our information, just to clean things up. Okay. 221 00:10:25,666 --> 00:10:28,666 Now let's just slide everything over to make some space. 222 00:10:29,266 --> 00:10:33,100 And now let's write down the Bayes theorem, which we had a brief 223 00:10:33,100 --> 00:10:34,500 look at before. 224 00:10:34,500 --> 00:10:38,100 And Bayes theorem for this particular problem will look as follows. 225 00:10:38,500 --> 00:10:44,266 So the probability of a defective part, given that it came from machine two. 226 00:10:44,533 --> 00:10:48,466 So probability of a part being defective, given that that part, 227 00:10:48,466 --> 00:10:51,500 that that wrench, came from machine two, equals... 228 00:10:52,400 --> 00:10:54,300 And here we go. This is the fun part. 229 00:10:54,300 --> 00:10:57,000 So we'll start from right to left.
230 00:10:57,000 --> 00:11:00,333 We'll take the probability of a part being defective 231 00:11:00,333 --> 00:11:05,333 overall. We will need to multiply that by the probability 232 00:11:05,633 --> 00:11:10,066 that a part came from machine two given that it was defective. 233 00:11:10,433 --> 00:11:13,433 And we'll need to divide that by the probability 234 00:11:13,466 --> 00:11:16,366 of the part coming from machine two. 235 00:11:16,366 --> 00:11:16,633 All right. 236 00:11:16,633 --> 00:11:20,800 So this might already seem a bit better than what we saw previously. 237 00:11:20,800 --> 00:11:23,700 At least now we can read these terms, right? 238 00:11:23,700 --> 00:11:25,566 And we know how we got them here. 239 00:11:25,566 --> 00:11:28,200 So we actually discussed each one of these terms. 240 00:11:28,200 --> 00:11:31,300 And they're pretty straightforward, even though 241 00:11:31,300 --> 00:11:35,666 the mathematical representation might look a bit scary. 242 00:11:35,666 --> 00:11:38,666 They're pretty straightforward terms; we know what we're talking about. 243 00:11:38,700 --> 00:11:41,766 At the same time, we don't really understand where this formula came from. 244 00:11:41,766 --> 00:11:44,566 And, more importantly, on an intuitive level, 245 00:11:44,566 --> 00:11:46,466 it doesn't really make sense at this stage. 246 00:11:46,466 --> 00:11:47,100 Okay, 247 00:11:47,100 --> 00:11:50,900 for now, let's plug in the numbers and then we'll proceed to the 248 00:11:50,900 --> 00:11:52,633 intuitive part of this formula. 249 00:11:52,633 --> 00:11:56,033 So if we plug in the numbers, what we'll get is 0.5. 250 00:11:56,066 --> 00:11:58,200 So we'll just take numbers from the top.
251 00:11:58,200 --> 00:12:02,266 Or, if we start from right to left again: the probability of a part being defective 252 00:12:02,266 --> 00:12:05,633 overall is 1%, multiplied 253 00:12:05,633 --> 00:12:08,633 by the probability of a part coming from machine two, 254 00:12:08,666 --> 00:12:11,966 given that we're only looking at defective parts, 255 00:12:11,966 --> 00:12:16,100 or given that that part is known to be defective, which is 50%. 256 00:12:16,100 --> 00:12:17,166 So there it is. 257 00:12:17,166 --> 00:12:18,533 And we're dividing that 258 00:12:18,533 --> 00:12:22,533 by the probability of the part coming from machine two overall, 259 00:12:22,900 --> 00:12:24,266 and that is 40%. 260 00:12:24,266 --> 00:12:25,366 So there we go. 261 00:12:25,366 --> 00:12:29,233 So if we plug that in, we get 0.0125. 262 00:12:29,233 --> 00:12:33,966 So one and a quarter percent, 1.25%, 263 00:12:34,466 --> 00:12:38,433 is the probability that machine two will spit out a defective part. 264 00:12:38,433 --> 00:12:38,666 Right. 265 00:12:38,666 --> 00:12:42,300 So if you come up to machine two and you pick up a part, 266 00:12:42,633 --> 00:12:46,200 then the probability of that part being defective is 1.25%. 267 00:12:46,800 --> 00:12:49,533 Or the frequentist interpretation is: 268 00:12:49,533 --> 00:12:52,500 if machine two produced 1000 parts, 269 00:12:52,500 --> 00:12:57,033 then according to Bayes theorem, 12.5 of them will be defective. 270 00:12:57,033 --> 00:13:01,100 But you can't really say 12.5 wrenches will be defective. 271 00:13:01,100 --> 00:13:04,100 So let's say it produced 10,000 parts. 272 00:13:04,266 --> 00:13:07,900 Then the Bayes theorem tells us that 125 of those wrenches 273 00:13:07,900 --> 00:13:09,066 are going to be defective. 274 00:13:09,066 --> 00:13:11,933 So as you can see, we converted all this information that we knew 275 00:13:11,933 --> 00:13:16,800 about the process and about the results of the process into exactly what we want.
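The plug-in step above can be sketched in a few lines of Python. The helper name `bayes` and its argument names are hypothetical, chosen only for this illustration; the three inputs are the figures from the lesson.

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


p_defect = 0.01          # P(defect): 1% of all parts
p_m2_given_defect = 0.5  # P(machine two | defect): 50%
p_m2 = 0.4               # P(machine two): 20 of 50 wrenches per hour

# P(defect | machine two)
p_defect_given_m2 = bayes(p_m2_given_defect, p_defect, p_m2)
print(f"{p_defect_given_m2:.2%}")  # 1.25%

# Frequentist reading: expected number of defects in 10,000 parts from machine two.
expected_defects = round(10_000 * p_defect_given_m2)
print(expected_defects)  # 125
```

The rounding in the last step only guards against floating-point noise; the expected count is 125, as stated in the lesson.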
276 00:13:17,233 --> 00:13:19,500 Now let's move on to the intuition, the fun stuff. 277 00:13:19,500 --> 00:13:20,100 Right. 278 00:13:20,100 --> 00:13:22,100 So we've got this 1.25%. 279 00:13:22,100 --> 00:13:24,266 So let's move this to the top. 280 00:13:24,266 --> 00:13:25,500 That's our Bayes theorem. 281 00:13:25,500 --> 00:13:27,300 And let's look at an example. 282 00:13:27,300 --> 00:13:30,866 And you will see that it's actually very, very intuitive what we just did. 283 00:13:30,866 --> 00:13:32,533 It all makes total sense. 284 00:13:32,533 --> 00:13:34,000 So let's look at an example. 285 00:13:34,000 --> 00:13:36,933 Let's say we produced 1000 wrenches in total. 286 00:13:36,933 --> 00:13:40,800 So in total we have a thousand wrenches after the two machines 287 00:13:40,800 --> 00:13:43,000 were working for some time. 288 00:13:43,000 --> 00:13:46,633 So we know that 400 of them came from machine two. 289 00:13:46,633 --> 00:13:46,933 Right. 290 00:13:46,933 --> 00:13:51,266 So we know that machine two produces... how many wrenches per hour does it 291 00:13:51,266 --> 00:13:56,266 produce? It produces 20 wrenches per hour, and machine one produces 292 00:13:56,500 --> 00:14:00,100 30 wrenches per hour, as per our information from the very start. 293 00:14:00,300 --> 00:14:04,466 And that means that out of the thousand wrenches, 40% 294 00:14:04,500 --> 00:14:06,033 would have been produced by machine two. 295 00:14:06,033 --> 00:14:07,966 That means 400 came from machine two. 296 00:14:07,966 --> 00:14:09,966 Okay, so that makes total sense. 297 00:14:09,966 --> 00:14:12,866 Then we also know that 1% have a defect. 298 00:14:12,866 --> 00:14:16,433 We actually see this from the workers who look at the wrenches 299 00:14:16,433 --> 00:14:18,733 at the end of the day; this was also given to us. 300 00:14:18,733 --> 00:14:20,733 They see that 1% have a defect. 301 00:14:20,733 --> 00:14:22,766 And that means out of 1000, 1% is ten.
302 00:14:22,766 --> 00:14:25,766 So ten wrenches have a defect. Okay, great. 303 00:14:26,133 --> 00:14:27,400 And so what's the next step? 304 00:14:27,400 --> 00:14:32,066 Next step is that we know that 50% of those ten came from machine two. 305 00:14:32,233 --> 00:14:37,200 So we know that exactly five of those defective wrenches came from machine two. 306 00:14:37,200 --> 00:14:38,233 That is also given to us 307 00:14:38,233 --> 00:14:41,233 because those defective wrenches actually have labels on them. 308 00:14:41,500 --> 00:14:46,066 And we can tell by the labels that five of them came from machine two. 309 00:14:46,566 --> 00:14:46,866 All right. 310 00:14:46,866 --> 00:14:51,300 So the question is: what is the percentage of defective parts from machine two? 311 00:14:51,566 --> 00:14:52,800 Well, it's very easy now, right? 312 00:14:52,800 --> 00:14:57,100 Because we know how many defective parts actually came from machine two: five. 313 00:14:57,500 --> 00:15:01,566 And we know how many wrenches came from machine two: 400. 314 00:15:01,566 --> 00:15:01,900 Right. 315 00:15:01,900 --> 00:15:07,100 So all we have to do is divide five by 400 and we'll get 1.25%. 316 00:15:07,100 --> 00:15:08,500 Right. So we get the same answer. 317 00:15:08,500 --> 00:15:13,233 Not only do we get the same answer, we actually followed exactly the same process. 318 00:15:13,233 --> 00:15:17,100 If you think through what we did here, we performed exactly the same process. 319 00:15:17,400 --> 00:15:19,666 We had 1000 wrenches. Right. 320 00:15:19,666 --> 00:15:22,666 And then we turned that into 400. 321 00:15:22,666 --> 00:15:27,400 So we multiplied a thousand wrenches by 40%, which is the probability of a part 322 00:15:27,400 --> 00:15:30,266 coming from machine two. So this is our denominator, right? 323 00:15:30,266 --> 00:15:33,300 So instead of the probability of machine two, what we just did is 324 00:15:33,766 --> 00:15:37,766 this: 400 is P of machine two times a thousand, right?
325 00:15:37,766 --> 00:15:41,100 So we have P of machine two times a thousand in the denominator 326 00:15:41,600 --> 00:15:44,433 instead of just P of machine two in what we're doing here. 327 00:15:44,433 --> 00:15:48,000 Right. Then 1% have a defect, and that means ten. 328 00:15:48,000 --> 00:15:49,566 So where did this ten come from? 329 00:15:49,566 --> 00:15:51,566 That's 1% times a thousand. 330 00:15:51,566 --> 00:15:53,566 So instead of P of defect at the top 331 00:15:53,566 --> 00:15:57,266 we actually have P of defect times a thousand right now. 332 00:15:57,500 --> 00:15:59,500 So just visualize that for a second. 333 00:15:59,500 --> 00:16:02,966 So instead of P of machine two we have P of machine two times 334 00:16:03,366 --> 00:16:06,300 a thousand, and P of defect times a thousand. 335 00:16:06,300 --> 00:16:11,866 And then this 50%: of them, 50% came from machine two, 336 00:16:11,866 --> 00:16:13,200 right, which equals five. 337 00:16:13,200 --> 00:16:17,166 This line is actually us performing this operation here. 338 00:16:17,333 --> 00:16:19,266 We're saying the probability of a part 339 00:16:19,266 --> 00:16:22,733 coming from machine two, given that it's defective, is 50%. 340 00:16:22,733 --> 00:16:26,533 So basically to get to this, what we did is we took this part. 341 00:16:26,533 --> 00:16:28,233 So we took 50%. 342 00:16:28,233 --> 00:16:31,866 We multiplied it by the ten, which is actually... 343 00:16:32,400 --> 00:16:36,466 So we took 50% multiplied by 1% times a thousand. 344 00:16:36,466 --> 00:16:38,933 So here we've got an extra times 1000. 345 00:16:38,933 --> 00:16:42,733 And here we've got P of machine two; what we did is 346 00:16:42,733 --> 00:16:45,733 we actually put in P of machine two times a thousand. 347 00:16:45,733 --> 00:16:48,633 So I just added an extra part over here, 348 00:16:48,633 --> 00:16:51,000 times a thousand, and a part over here, times 1000.
349 00:16:51,000 --> 00:16:52,400 And then we still divided 350 00:16:52,400 --> 00:16:55,800 that final result, that five that we got, by the 400 that we got. 351 00:16:56,133 --> 00:17:01,233 So the logical steps that we took are exactly the same as the Bayes theorem. 352 00:17:01,400 --> 00:17:05,066 The only difference is that we looked at a specific example of a thousand wrenches. 353 00:17:05,433 --> 00:17:09,933 So as you can see the Bayes theorem is a very intuitive theorem. 354 00:17:09,933 --> 00:17:13,500 So we're not even going to go into the mathematics of how it's derived. 355 00:17:13,500 --> 00:17:16,100 This is about us understanding it intuitively. 356 00:17:16,100 --> 00:17:20,366 So as long as you remember the mathematical representation of the Bayes 357 00:17:20,366 --> 00:17:24,233 theorem, it actually makes total sense what it is helping us calculate 358 00:17:24,233 --> 00:17:29,500 and the steps that it implies we take to calculate the final result. 359 00:17:30,066 --> 00:17:30,600 So there we go. 360 00:17:30,600 --> 00:17:32,666 Hopefully that all makes sense. 361 00:17:32,666 --> 00:17:35,500 And we only have one final question left. 362 00:17:35,500 --> 00:17:37,966 We only have one obvious question here. 363 00:17:37,966 --> 00:17:41,666 So the question is why do we have to go through all of this complexity? 364 00:17:41,900 --> 00:17:43,866 Why do we have to go through all this complexity? 365 00:17:43,866 --> 00:17:48,233 Why can't we just count the number of defective wrenches from machine two, 366 00:17:48,566 --> 00:17:51,733 and then count the number of total wrenches that came from machine 367 00:17:51,733 --> 00:17:54,733 two and divide one by the other and get that same result? 368 00:17:54,866 --> 00:17:56,000 Why can't we just do that? 369 00:17:56,000 --> 00:17:57,033 Why don't we have that 370 00:17:57,033 --> 00:18:01,066 input of the total number of wrenches that came from machine two right away?
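The thousand-wrench walkthrough, and the direct count that this question asks about, can be put side by side in a short Python sketch. The batch size of 1000 is the example figure from the lesson; the variable names are made up for illustration.

```python
# A batch of 1000 wrenches, split using only the figures from the lesson.
total = 1000
from_machine_two = total * 20 // (30 + 20)          # 40% of output -> 400
defective = total * 1 // 100                        # 1% of output -> 10
defective_from_machine_two = defective * 50 // 100  # 50% of defects -> 5

# Direct count: defective wrenches from machine two / all wrenches from machine two.
by_counting = defective_from_machine_two / from_machine_two

# Bayes theorem with numerator and denominator both scaled by the batch size.
p_defect, p_m2_given_defect, p_m2 = 0.01, 0.5, 0.4
by_bayes = (p_m2_given_defect * p_defect * total) / (p_m2 * total)

print(f"counting: {by_counting:.2%}, Bayes: {by_bayes:.2%}")
```

Both routes give 1.25%, because the factor of 1000 appears in the numerator and the denominator and cancels.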
371 00:18:01,066 --> 00:18:02,566 So that's the question. 372 00:18:02,566 --> 00:18:06,300 If the items are labeled, why couldn't we just count the number 373 00:18:06,300 --> 00:18:08,533 of defective wrenches that came from machine two 374 00:18:08,533 --> 00:18:11,200 and divide by the total number that came from machine two? 375 00:18:11,200 --> 00:18:13,666 So if you think about it, that's exactly what we are doing. 376 00:18:13,666 --> 00:18:16,300 So we're dividing by the total number of... 377 00:18:16,300 --> 00:18:19,900 So this is the total number of wrenches that came from machine two. At the top, 378 00:18:20,066 --> 00:18:25,600 if you multiply by the thousand, we've got the number of defective wrenches from machine two: 379 00:18:25,600 --> 00:18:27,300 so the probability of a part 380 00:18:27,300 --> 00:18:29,900 coming from machine two, given that it is defective, times 381 00:18:29,900 --> 00:18:30,966 the probability that a part is defective. 382 00:18:30,966 --> 00:18:34,033 So at the top, as you remember, we got that five. 383 00:18:34,033 --> 00:18:39,433 So the top is actually the number of defective parts that came from machine 384 00:18:39,433 --> 00:18:42,866 two. At the end of the day we're actually doing that exact thing. 385 00:18:43,066 --> 00:18:45,100 The question is why can't we do it right away? 386 00:18:45,100 --> 00:18:49,366 Why couldn't the workers just go and count the number of defective parts 387 00:18:49,366 --> 00:18:52,366 that came from machine two and divide by the number of parts, 388 00:18:52,366 --> 00:18:55,066 that is, the overall number of parts that came from machine two? 389 00:18:55,066 --> 00:18:56,466 Why not save some time? 390 00:18:56,466 --> 00:18:59,066 Well, the answer to that question can be multifold. 391 00:18:59,066 --> 00:19:02,800 First of all, it might be very time-consuming, like in this example.
392 00:19:03,000 --> 00:19:04,366 It might be time-consuming 393 00:19:04,366 --> 00:19:08,700 to count the number of wrenches that came just from machine two. 394 00:19:08,700 --> 00:19:12,033 For instance, it might be a standard metric 395 00:19:12,033 --> 00:19:15,666 that the factory measures: how many wrenches were produced overall. 396 00:19:15,666 --> 00:19:15,900 Right? 397 00:19:15,900 --> 00:19:18,666 So we know that number: 1000, or it might be 100,000. 398 00:19:18,666 --> 00:19:21,733 But to count them for machine two, it might take them some time 399 00:19:21,733 --> 00:19:24,600 to sit and count through those wrenches. And therefore it's just faster 400 00:19:24,600 --> 00:19:25,733 to use Bayes theorem. 401 00:19:25,733 --> 00:19:27,633 The other thing here is that sometimes you 402 00:19:27,633 --> 00:19:29,500 might not have access to that information. 403 00:19:29,500 --> 00:19:33,666 Sometimes the problem might be such that you just cannot perform that count. 404 00:19:33,666 --> 00:19:36,666 And you have some inputs like we had in this case, and that's it. 405 00:19:36,866 --> 00:19:40,133 And it's not as simple as in this example. 406 00:19:40,133 --> 00:19:42,900 And therefore you just don't have access to that information. 407 00:19:42,900 --> 00:19:44,100 So there are numerous reasons 408 00:19:44,100 --> 00:19:47,100 why you could be in a situation where you can't just go through 409 00:19:47,100 --> 00:19:51,600 the straightforward, easy, obvious way and you need to use Bayes theorem. 410 00:19:52,000 --> 00:19:56,033 So therefore it's a useful thing to have in your data science arsenal. 411 00:19:56,266 --> 00:19:58,100 And moreover, now that we know Bayes theorem, 412 00:19:58,100 --> 00:20:01,300 we can proceed to the Naive Bayes classifiers. 413 00:20:01,700 --> 00:20:04,700 And to finish off today's tutorial, I've got a quick exercise for you.
414 00:20:04,700 --> 00:20:05,966 Perform the same calculation 415 00:20:05,966 --> 00:20:09,700 or similar calculations for this value that we want to calculate: 416 00:20:09,700 --> 00:20:14,000 what is the probability of a part being defective 417 00:20:14,466 --> 00:20:17,066 given that it came from machine one? 418 00:20:17,066 --> 00:20:18,000 So there you go. 419 00:20:18,000 --> 00:20:19,333 It's a handy quick 420 00:20:19,333 --> 00:20:23,233 exercise to solidify this knowledge, and I look forward to seeing you next time. 421 00:20:23,233 --> 00:20:25,200 Until then, enjoy machine learning.
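If you want to check your answer to the exercise, here is a minimal Python sketch using only the figures from the lesson. The variable names are made up for illustration, and the last step is an extra sanity check (the law of total probability) that the lecture does not cover.

```python
p_defect = 0.01          # P(defect)
p_m1 = 30 / 50           # P(machine one)
p_m2 = 20 / 50           # P(machine two)
p_m1_given_defect = 0.5  # P(machine one | defect)
p_m2_given_defect = 0.5  # P(machine two | defect)

# Bayes theorem for each machine.
p_defect_given_m1 = p_m1_given_defect * p_defect / p_m1
p_defect_given_m2 = p_m2_given_defect * p_defect / p_m2

print(f"P(defect | machine one) = {p_defect_given_m1:.4%}")  # 0.8333%
print(f"P(defect | machine two) = {p_defect_given_m2:.4%}")  # 1.2500%

# Sanity check: weighting each machine's defect rate by its share of the
# output should give back the overall 1% defect rate.
recombined = p_defect_given_m1 * p_m1 + p_defect_given_m2 * p_m2
print(f"recombined P(defect) = {recombined:.2%}")  # 1.00%
```

Machine one comes out below the overall 1% rate and machine two above it, which matches the earlier observation that machine two produces disproportionately many of the defective parts.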