1 00:00:00,270 --> 00:00:02,420 Now we've seen cross-validation in action. 2 00:00:02,430 --> 00:00:07,250 Let's see some other important classification model evaluation metrics. 3 00:00:07,250 --> 00:00:09,550 And to do so we're going to start a new heading here. 4 00:00:09,930 --> 00:00:15,780 It's so important to evaluate our models with evaluation metrics, and the four main ones we're going to 5 00:00:15,780 --> 00:00:24,850 cover for classification models are: accuracy; the next one is area under the ROC curve. 6 00:00:25,050 --> 00:00:28,240 Don't worry if you're not sure what these are, we're going to go through them. 7 00:00:28,280 --> 00:00:31,990 The third one is a confusion matrix. Sounds fancy, right? 8 00:00:32,040 --> 00:00:34,920 And the final one is a classification report. 9 00:00:35,670 --> 00:00:40,090 So to start off, we'll keep it nice and simple: we'll do accuracy. 10 00:00:40,230 --> 00:00:46,350 So we'll remind ourselves of how we can train a machine learning model and evaluate it by importing 11 00:00:46,650 --> 00:00:56,230 cross_val_score from scikit-learn's model_selection, and then we'll import our RandomForestClassifier. 12 00:01:00,170 --> 00:01:01,840 Wonder if we can press Tab. 13 00:01:01,840 --> 00:01:02,020 Yeah. 14 00:01:02,020 --> 00:01:02,440 There we go. 15 00:01:02,830 --> 00:01:03,760 Beautiful. 16 00:01:03,760 --> 00:01:08,570 We'll set up a random seed and then we'll create our X data 17 00:01:17,330 --> 00:01:18,820 and we'll create our y data. 18 00:01:23,520 --> 00:01:24,620 Beautiful. 19 00:01:24,660 --> 00:01:27,150 Now we'll set up a train and test split. 20 00:01:27,150 --> 00:01:33,360 So we'll go here: X_train, X_test, y_train, y_test. 21 00:01:33,490 --> 00:01:37,130 Well, actually we don't need to, because we can just use cross_val_score. 22 00:01:37,230 --> 00:01:38,310 So let's do that. 
23 00:01:38,310 --> 00:01:45,940 We might put a little heading up here to make sure we know that in this case we're doing accuracy. clf 24 00:01:46,040 --> 00:01:55,970 equals RandomForestClassifier, and then we'll print out the cross_val_score by passing it our 25 00:01:55,970 --> 00:01:59,390 classifier, our X data and y data, and we'll do five-fold 26 00:01:59,390 --> 00:02:03,280 cross-validation. 27 00:02:03,380 --> 00:02:04,430 Wonderful. 28 00:02:04,430 --> 00:02:05,210 So here we go. 29 00:02:05,210 --> 00:02:09,720 We've got "number of estimators will change", getting this classic warning. 30 00:02:10,080 --> 00:02:16,920 So this is just a reminder we can set n_estimators to 100 and that'll get rid of that. Beautiful. 31 00:02:16,950 --> 00:02:23,700 So what this is going to give back: because the default scoring parameter of our classifier is the mean 32 00:02:23,700 --> 00:02:24,460 accuracy, 33 00:02:24,480 --> 00:02:27,620 this is what cross_val_score is measuring here. 34 00:02:27,780 --> 00:02:32,240 Actually, we might save this to a variable: cross_val_score equals that. 35 00:02:32,310 --> 00:02:33,500 Run it again. 36 00:02:33,780 --> 00:02:41,010 And then if we take the np.mean of cross_val_score, we've seen this one in the cross-validation video. 37 00:02:41,010 --> 00:02:46,140 This is going to give us the mean accuracy of our model, and basically this comes out as a decimal, but 38 00:02:46,140 --> 00:02:55,530 we can easily type this out to be print, actually with an f-string: "Heart Disease Classifier Accuracy". 39 00:02:55,530 --> 00:03:01,440 So, whether or not our model can classify whether someone has heart disease or not, given their parameters. 40 00:03:01,920 --> 00:03:04,140 We want to put in here 
41 00:03:04,450 --> 00:03:14,850 np.mean(cross_val_score), we're going to times this by 100, and we want two decimal places. We maybe 42 00:03:14,880 --> 00:03:21,870 need a percentage sign on the end here. Beautiful. This is going to be cross-validated accuracy, actually, 43 00:03:22,200 --> 00:03:27,820 so that's important to note: cross-validated accuracy. So that's not too bad, right? 44 00:03:27,960 --> 00:03:34,010 And what accuracy is actually saying is: given a random sample, given a sample that a model hasn't seen 45 00:03:34,010 --> 00:03:34,430 before, 46 00:03:34,730 --> 00:03:37,990 how likely is it to predict the right label? 47 00:03:38,120 --> 00:03:41,960 So if we have a look at our heart disease data, we might do it under this cell 48 00:03:42,290 --> 00:03:43,660 so it doesn't interfere with ours. 49 00:03:43,680 --> 00:03:54,960 So, heart_disease.head(). So given a sample that looks like this, it has age, sex, cp, 50 00:03:54,980 --> 00:03:55,770 trestbps, 51 00:03:55,860 --> 00:04:01,280 chol, fbs, it has all these features. Given a sample like that to our trained model, 52 00:04:01,280 --> 00:04:08,030 how likely is it to predict the right target? And in our case our model's cross-validated accuracy is 53 00:04:08,090 --> 00:04:11,680 82.48%, so around about 82.5. 54 00:04:12,050 --> 00:04:19,400 So that means 82, or just over eight times out of ten, our model will predict the right label given 55 00:04:19,550 --> 00:04:24,700 a sample, something like this, based on the original training data. 56 00:04:24,910 --> 00:04:26,650 So that's accuracy in a nutshell. 57 00:04:27,960 --> 00:04:32,610 What we're going to dive into next is area under the ROC curve, and if that sounds confusing, it's a 58 00:04:32,610 --> 00:04:34,500 bit of a mouthful, but we'll go through it. 
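[Editor's note] The steps described so far could be sketched roughly as follows. This is a minimal sketch, not the notebook from the video: the heart disease DataFrame is not available here, so a synthetic dataset from make_classification stands in for it (an assumption), and the names cv_scores and mean_accuracy are illustrative. The video saves the result to a variable called cross_val_score, which shadows the imported function, so a different name is used below.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

np.random.seed(42)

# Stand-in data (assumption): in the video, X and y come from the
# heart disease DataFrame rather than a synthetic dataset.
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

# Setting n_estimators explicitly, as mentioned in the video.
clf = RandomForestClassifier(n_estimators=100)

# With no scoring argument, cross_val_score uses the estimator's own
# score method, which for classifiers is mean accuracy.
cv_scores = cross_val_score(clf, X, y, cv=5)  # one score per fold

mean_accuracy = np.mean(cv_scores)
print(f"Cross-validated accuracy: {mean_accuracy * 100:.2f}%")
```

The printed figure will differ from the 82.48% quoted in the video because the data here is synthetic.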
59 00:04:34,620 --> 00:04:39,990 So that's kind of how you would present your model's accuracy and print out something like this. Again, 60 00:04:39,990 --> 00:04:45,930 it returns a decimal by default. For communication it's easier if it's represented as a percentage.
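[Editor's note] To make the "eight times out of ten" reading concrete, here is a small worked example of accuracy as the fraction of correct predictions, converted from a decimal to a percentage. The label lists y_true and y_pred are made up purely for illustration and do not come from the heart disease data.

```python
# Hypothetical true labels and predictions for ten samples (illustrative only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Accuracy is the number of correct predictions divided by the total.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)  # a decimal between 0 and 1

# Multiply by 100 and keep two decimal places for reporting.
print(f"Accuracy: {accuracy * 100:.2f}%")  # prints "Accuracy: 80.00%"
```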