1
00:00:02,100 --> 00:00:04,200
Here I've got a confusion matrix.

2
00:00:04,200 --> 00:00:07,233
And this confusion matrix has 10,000 records in it.

3
00:00:07,600 --> 00:00:10,600
And it represents scenario number one, which we'll be looking at.

4
00:00:11,100 --> 00:00:15,400
As you can see, this model has made 150 Type I errors and 50 Type II errors.

5
00:00:15,733 --> 00:00:18,733
But overall it has predicted quite a lot correctly.

6
00:00:18,933 --> 00:00:21,933
Now let's calculate the accuracy rate in this scenario.

7
00:00:22,200 --> 00:00:24,566
The accuracy rate is the total correct divided

8
00:00:24,566 --> 00:00:29,466
by the overall total, and it's 9,800 divided by 10,000, which is 98%.

9
00:00:30,133 --> 00:00:30,766
Okay, great.

10
00:00:30,766 --> 00:00:32,300
But what we're going to do now

11
00:00:32,300 --> 00:00:35,866
is we're going to tell the model to stop making predictions

12
00:00:35,866 --> 00:00:38,866
whatsoever. We're just going to abandon the model completely.

13
00:00:38,933 --> 00:00:42,433
And we're going to say that from now on, our prediction is always zero.

14
00:00:42,433 --> 00:00:46,000
We're always going to predict that the event is not going to occur.

15
00:00:46,333 --> 00:00:49,200
So basically what will happen to the confusion matrix is

16
00:00:49,200 --> 00:00:52,200
these records will move from the right column to the left column.

17
00:00:52,366 --> 00:00:57,900
And our new confusion matrix will look like this: 9,850 and 150.

18
00:00:57,900 --> 00:01:01,766
And then nothing in the column where we predicted

19
00:01:01,766 --> 00:01:03,600
that something will occur.

20
00:01:03,600 --> 00:01:05,800
And of course, that move is against all logic, right?

21
00:01:05,800 --> 00:01:07,666
Why would you abandon a model?

22
00:01:07,666 --> 00:01:10,666
But let's calculate the accuracy rate in this scenario.

23
00:01:10,800 --> 00:01:13,200
In scenario two, the accuracy rate has the same formula.

24
00:01:13,200 --> 00:01:17,800
In this case, the accuracy rate is 9,850 divided by 10,000.

25
00:01:18,200 --> 00:01:20,800
So it's 98.5%.

26
00:01:20,800 --> 00:01:23,800
The accuracy rate went up by half a percentage point.

27
00:01:24,300 --> 00:01:29,200
And as you can see, what we did is we just completely stopped using a model.

28
00:01:29,200 --> 00:01:31,466
But the accuracy rate went up, and

29
00:01:32,400 --> 00:01:34,633
that is why you should not

30
00:01:34,633 --> 00:01:39,933
base your judgment just on accuracy rate, because things like this can happen.

31
00:01:39,933 --> 00:01:44,200
And even though obviously you're not using a model anymore,

32
00:01:44,200 --> 00:01:48,266
which means that you're not applying any kind of logic

33
00:01:48,600 --> 00:01:52,633
in your decision-making process, your accuracy rate is going up.

34
00:01:52,633 --> 00:01:55,800
So it's misleading you into the wrong

35
00:01:55,800 --> 00:01:58,800
conclusion that you should stop using models.

36
00:01:58,833 --> 00:02:01,566
And this effect is called the accuracy paradox.

37
00:02:01,566 --> 00:02:05,100
And starting from the next tutorial, I will show you a much better way

38
00:02:05,100 --> 00:02:09,100
to assess your models using the cumulative accuracy profile.

39
00:02:09,633 --> 00:02:10,566
I look forward to seeing you then.

40
00:02:10,566 --> 00:02:13,566
Until next time, happy analyzing!
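
The two accuracy calculations from the video can be reproduced with a small sketch. The video only states the error counts (150 Type I, 50 Type II), the totals of correct predictions, and the 9,850 / 150 split after always predicting zero; the individual cell values used here (9,700 true negatives and 100 true positives) are inferred from those totals and are an assumption, not something read off the slide. The function and variable names are illustrative only.

```python
def accuracy(tn, fp, fn, tp):
    """Accuracy rate: total correct divided by overall total."""
    return (tn + tp) / (tn + fp + fn + tp)


# Scenario 1: the model as trained.
# fp = Type I errors, fn = Type II errors (both stated in the video).
# tn and tp are inferred so that correct = 9,800 and total = 10,000.
scenario_1 = {"tn": 9700, "fp": 150, "fn": 50, "tp": 100}

# Scenario 2: abandon the model and always predict 0.
# Every record in the "predicted 1" column moves to the "predicted 0" column.
scenario_2 = {
    "tn": scenario_1["tn"] + scenario_1["fp"],  # 9,850 actual negatives
    "fp": 0,
    "fn": scenario_1["fn"] + scenario_1["tp"],  # 150 actual positives
    "tp": 0,
}

acc_1 = accuracy(**scenario_1)
acc_2 = accuracy(**scenario_2)

print(f"Scenario 1 accuracy: {acc_1:.1%}")
print(f"Scenario 2 accuracy: {acc_2:.1%}")
```

Note that in scenario 2 none of the 150 actual positives is caught, yet the accuracy rate is higher. That gap between the metric and the usefulness of the predictions is the accuracy paradox described above.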