1 00:00:00,133 --> 00:00:01,100 Hello and welcome back. 2 00:00:01,100 --> 00:00:05,333 Today we're talking about the confusion matrix and different accuracy ratios. 3 00:00:05,866 --> 00:00:09,300 So let's imagine that we are building a model 4 00:00:09,300 --> 00:00:14,200 that will predict, based on X-ray images of lungs, 5 00:00:14,200 --> 00:00:17,200 whether or not there is cancer in the lungs. 6 00:00:17,466 --> 00:00:20,100 So we're going to build a matrix here on the left. 7 00:00:20,100 --> 00:00:22,600 On the top we have the prediction of our model. 8 00:00:22,600 --> 00:00:25,500 It can be negative, no cancer, or positive: 9 00:00:25,500 --> 00:00:26,900 yes, there is cancer. 10 00:00:26,900 --> 00:00:32,100 And then on the left we have the actual state of things: 11 00:00:32,333 --> 00:00:33,900 is it negative, 12 00:00:33,900 --> 00:00:36,866 meaning that in reality there is no cancer and 13 00:00:36,866 --> 00:00:40,666 that person doesn't have cancer, or positive, 14 00:00:40,666 --> 00:00:42,900 meaning that person does have cancer? 15 00:00:42,900 --> 00:00:47,800 Once we cross these rows and columns, we'll have four different cells. 16 00:00:47,833 --> 00:00:50,833 The top left one is called the true negative. 17 00:00:51,466 --> 00:00:54,133 And by the way, before we go into this, 18 00:00:54,133 --> 00:00:58,066 the positioning of these cells depends on the source 19 00:00:58,066 --> 00:00:59,833 you are looking at. Some 20 00:00:59,833 --> 00:01:04,266 places draw the confusion matrix one way, other places another way. 21 00:01:04,266 --> 00:01:08,933 I'll link to an article about this at the end of this tutorial. 22 00:01:09,400 --> 00:01:11,666 So, back to our true negative. 23 00:01:11,666 --> 00:01:15,400 The cross between negative and negative is a true negative, 24 00:01:15,400 --> 00:01:18,400 meaning the model predicted that there is no cancer.
25 00:01:18,600 --> 00:01:21,133 And in reality, that person doesn't have cancer. 26 00:01:21,133 --> 00:01:26,400 So in this use case, it's a great result for the person in question. 27 00:01:27,066 --> 00:01:29,300 Now, the bottom right is called the true positive. 28 00:01:29,300 --> 00:01:31,833 The model predicted that this person has cancer. 29 00:01:31,833 --> 00:01:34,500 And in reality, they do have cancer. 30 00:01:34,500 --> 00:01:35,833 And while this is not 31 00:01:35,833 --> 00:01:39,366 a great result at all for the person in question, 32 00:01:39,866 --> 00:01:46,066 at least they know that they have cancer and they can now address it 33 00:01:46,066 --> 00:01:49,066 with their doctor and, 34 00:01:49,066 --> 00:01:52,066 hopefully, get treatment and get better. 35 00:01:52,200 --> 00:01:57,200 Now, in the top right we have something that's called a false positive. 36 00:01:57,700 --> 00:02:00,600 The model predicts that the person has cancer, but in reality 37 00:02:00,600 --> 00:02:01,833 they don't have cancer. 38 00:02:01,833 --> 00:02:04,233 This is called a type one error. 39 00:02:04,233 --> 00:02:07,800 And this is a problem because, even though for the person 40 00:02:07,800 --> 00:02:10,800 in question it's a relief that they don't have cancer, 41 00:02:10,933 --> 00:02:15,900 imagine the emotional stress and suffering that they will go through 42 00:02:15,933 --> 00:02:19,633 when they're told by the model that they have cancer, 43 00:02:19,966 --> 00:02:21,100 even though they don't have to go 44 00:02:21,100 --> 00:02:24,533 through that stress and suffering, because they don't actually have any cancer. 45 00:02:25,200 --> 00:02:28,900 So it would be much better if the model gave them the correct answer, 46 00:02:29,866 --> 00:02:31,933 if it made the correct prediction that they don't have cancer. 47 00:02:31,933 --> 00:02:34,933 In other words, it would be much better if the model gave them a true negative.
48 00:02:35,366 --> 00:02:39,266 And then in the bottom left we have a false negative, which is a type 49 00:02:39,266 --> 00:02:43,666 two error, where the person actually does have cancer in reality, 50 00:02:43,666 --> 00:02:45,233 but the model says they don't. 51 00:02:45,233 --> 00:02:50,166 And this is a very dangerous type of error because, in this use case, 52 00:02:50,600 --> 00:02:55,400 the doctors won't even treat the person, won't even recommend any treatment plan, 53 00:02:55,400 --> 00:02:56,300 because they'll think 54 00:02:56,300 --> 00:02:59,566 that the person doesn't have cancer, and the cancer can grow and get worse. 55 00:03:00,033 --> 00:03:02,500 So both errors are bad, 56 00:03:02,500 --> 00:03:04,233 and we want to avoid them. 57 00:03:04,233 --> 00:03:06,200 The fewer errors our models make, the better. 58 00:03:06,200 --> 00:03:10,200 Now let's populate this matrix with actual figures. 59 00:03:10,200 --> 00:03:12,700 Let's say we have 100 patients. For them, 60 00:03:12,700 --> 00:03:17,333 our model made some predictions, and we have 43 true negatives, 41 true 61 00:03:17,333 --> 00:03:18,333 positives, 62 00:03:18,333 --> 00:03:21,900 12 type one errors, or false positives, 63 00:03:21,900 --> 00:03:26,033 and 4 type two errors, or false negatives. 64 00:03:26,366 --> 00:03:31,666 From this confusion matrix, we can calculate the following rates or ratios. 65 00:03:31,666 --> 00:03:34,666 We have the accuracy rate and the error rate. 66 00:03:34,766 --> 00:03:38,433 The accuracy rate is the total number of correct predictions, 67 00:03:38,900 --> 00:03:41,500 meaning the true negatives plus the true positives, divided 68 00:03:41,500 --> 00:03:45,500 by the total number of patients in this sample. 69 00:03:45,500 --> 00:03:48,400 So we have 84 divided by 100, or 84%.
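As a rough sketch of the calculation just described, the snippet below stores the four counts from the example in a 2x2 grid, with rows for the actual state and columns for the model's prediction, and computes the accuracy rate. The variable names are illustrative assumptions, not something shown in the video.

```python
# Counts from the example of 100 patients. Layout as described in the video:
# rows are the actual state, columns are the model's prediction.
tn, fp = 43, 12   # actual negative: predicted negative, predicted positive
fn, tp = 4, 41    # actual positive: predicted negative, predicted positive

matrix = [
    [tn, fp],
    [fn, tp],
]

total = sum(sum(row) for row in matrix)   # all patients in the sample
correct = matrix[0][0] + matrix[1][1]     # the diagonal: TN + TP
accuracy = correct / total

print(f"accuracy rate: {accuracy:.0%}")
```

With these counts, the diagonal holds the 84 correct predictions out of 100 patients.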
70 00:03:48,400 --> 00:03:52,666 And the error rate is the total number of incorrect predictions, 71 00:03:52,800 --> 00:03:55,700 meaning the type one errors plus the type two errors. 72 00:03:56,700 --> 00:03:57,066 We have 73 00:03:57,066 --> 00:04:00,300 16 of those, divided by the total sample size of 100. 74 00:04:00,300 --> 00:04:03,300 So we have a 16% error rate here. 75 00:04:03,500 --> 00:04:07,233 Those are two important rates to be able to calculate. 76 00:04:07,933 --> 00:04:10,833 So that's how the confusion matrix works. 77 00:04:10,833 --> 00:04:13,900 And as additional reading, I highly recommend checking out 78 00:04:13,900 --> 00:04:18,733 this article, to have it handy for future use, 79 00:04:19,000 --> 00:04:23,000 where the author investigated the different 80 00:04:23,900 --> 00:04:28,600 forms of the confusion matrix, because different tools, Python 81 00:04:28,600 --> 00:04:32,000 or any other tool that you might be using, can produce 82 00:04:32,000 --> 00:04:35,733 a different confusion matrix. It's good to read through it 83 00:04:35,733 --> 00:04:39,300 so you always know which confusion matrix you're dealing with in a specific instance. 84 00:04:39,633 --> 00:04:40,800 And in the tutorial, 85 00:04:40,800 --> 00:04:44,366 in all practical terms, you'll be dealing with the one that we discussed today, 86 00:04:44,366 --> 00:04:46,433 the one that's pictured here on the bottom right. 87 00:04:47,433 --> 00:04:48,733 I hope you enjoyed this tutorial. 88 00:04:48,733 --> 00:04:51,566 I'll see you next time. Until then, enjoy machine learning.
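Since different tools lay out the cells differently, one way to stay safe is to label each cell by name rather than by position. The sketch below is a minimal illustration under that assumption: the `tally` helper and the label lists are hypothetical, constructed only to reproduce the counts from the example (0 means no cancer, 1 means cancer).

```python
from collections import Counter

def tally(actual, predicted):
    """Count the four outcomes for binary labels (0 = no cancer, 1 = cancer)."""
    names = {(0, 0): "tn", (0, 1): "fp", (1, 0): "fn", (1, 1): "tp"}
    counts = Counter(names[(a, p)] for a, p in zip(actual, predicted))
    # Return every cell, including any that happen to be zero.
    return {name: counts[name] for name in ("tn", "fp", "fn", "tp")}

# Hypothetical label lists that reproduce the example:
# 43 true negatives, 41 true positives, 12 false positives, 4 false negatives.
actual    = [0] * 43 + [1] * 41 + [0] * 12 + [1] * 4
predicted = [0] * 43 + [1] * 41 + [1] * 12 + [0] * 4

cells = tally(actual, predicted)
total = sum(cells.values())

error_rate = (cells["fp"] + cells["fn"]) / total   # type one + type two errors
accuracy = (cells["tn"] + cells["tp"]) / total     # correct predictions

print(cells)
print(f"error rate: {error_rate:.0%}, accuracy rate: {accuracy:.0%}")
```

Because the cells are looked up by name, the two rates come out the same regardless of how a given tool happens to draw the matrix.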