1
00:00:00,270 --> 00:00:02,420
Now we've seen cross-validation in action.

2
00:00:02,430 --> 00:00:07,250
Let's see some other important classification model evaluation metrics.

3
00:00:07,250 --> 00:00:09,550
And to do so we're going to start a new heading here.

4
00:00:09,930 --> 00:00:15,780
It's so important to evaluate our models with evaluation metrics, and the four main ones we're going to

5
00:00:15,780 --> 00:00:24,850
cover for classification models are accuracy, the next one is area under the ROC curve.

6
00:00:25,050 --> 00:00:28,240
Don't worry if you're not sure what these are, we're gonna go through them.

7
00:00:28,280 --> 00:00:31,990
Third one is a confusion matrix. Sounds fancy, right?

8
00:00:32,040 --> 00:00:34,920
And the final one is a classification report.

9
00:00:35,670 --> 00:00:40,090
So to start off, we'll keep it nice and simple: we'll do accuracy.

10
00:00:40,230 --> 00:00:46,350
So we'll remind ourselves of how we can train a machine learning model and evaluate it by importing

11
00:00:46,650 --> 00:00:56,230
cross_val_score from scikit-learn's model_selection, and then we'll import our RandomForestClassifier.

12
00:01:00,170 --> 00:01:01,840
Wonder if we can press tab.

13
00:01:01,840 --> 00:01:02,020
Yeah.

14
00:01:02,020 --> 00:01:02,440
There we go.

15
00:01:02,830 --> 00:01:03,760
Beautiful.

16
00:01:03,760 --> 00:01:08,570
We'll set up a random seed and then we'll create our X data

17
00:01:17,330 --> 00:01:18,820
and we'll create our y data

18
00:01:23,520 --> 00:01:24,620
Beautiful.

19
00:01:24,660 --> 00:01:27,150
Now we'll set up a train and test split.

20
00:01:27,150 --> 00:01:33,360
So we'll go here X_train, X_test, y_train, y_test.

21
00:01:33,490 --> 00:01:37,130
Well, actually we don't need to, because we can just use cross_val_score.

22
00:01:37,230 --> 00:01:38,310
So let's do that.

23
00:01:38,310 --> 00:01:45,940
We might put a little heading up here to make sure we know that in this case we're doing accuracy. clf

24
00:01:46,040 --> 00:01:55,970
equals RandomForestClassifier, and then we'll print out the cross_val_score by passing it our

25
00:01:55,970 --> 00:01:59,390
classifier clf, X, y, and we'll do five-fold

26
00:01:59,390 --> 00:02:03,280
cross-validation.

27
00:02:03,380 --> 00:02:04,430
Wonderful.

28
00:02:04,430 --> 00:02:05,210
So here we go.

29
00:02:05,210 --> 00:02:09,720
We've got "n_estimators will change", getting this classic warning.

30
00:02:10,080 --> 00:02:16,920
So this is just a reminder we can set n_estimators to 100 and that'll get rid of that. Beautiful.

31
00:02:16,950 --> 00:02:23,700
So what this is going to give back is, because the default score parameter of our classifier is the mean

32
00:02:23,700 --> 00:02:24,460
accuracy.

33
00:02:24,480 --> 00:02:27,620
This is what cross_val_score is measuring here.

34
00:02:27,780 --> 00:02:32,240
Actually, we might save this to a variable: cross_val_score equals that.

35
00:02:32,310 --> 00:02:33,500
Run it again.

36
00:02:33,780 --> 00:02:41,010
And then if we take the np.mean of cross_val_score, we've seen this one in the cross-validation video.

37
00:02:41,010 --> 00:02:46,140
This is going to give us the mean accuracy of our model, and basically this comes out as a decimal, but

38
00:02:46,140 --> 00:02:55,530
we can easily type this out to be print, actually an f-string, "Heart Disease Classifier Accuracy".

39
00:02:55,530 --> 00:03:01,440
So whether or not our model can classify whether someone has heart disease or not given their parameters

40
00:03:01,920 --> 00:03:04,140
we want to put in here.

41
00:03:04,450 --> 00:03:14,850
np.mean of cross_val_score, we're gonna times this by 100, and we want two decimal places. We maybe

42
00:03:14,880 --> 00:03:21,870
need a percentage sign on the end here. Beautiful. This is gonna be cross-validated accuracy, actually,

43
00:03:22,200 --> 00:03:27,820
so that's important to note: cross-validated accuracy. So that's not too bad, right?

44
00:03:27,960 --> 00:03:34,010
And what accuracy is actually saying is, given a random sample, given a sample that the model hasn't seen

45
00:03:34,010 --> 00:03:34,430
before.

46
00:03:34,730 --> 00:03:37,990
How likely is it to predict the right label?

47
00:03:38,120 --> 00:03:41,960
So if we have a look at our heart disease data we might do it under this cell.

48
00:03:42,290 --> 00:03:43,660
So it doesn't interfere with ours.

49
00:03:43,680 --> 00:03:54,960
So we'll call heart_disease.head(). So given a sample that looks like this, has age, sex, cp,

50
00:03:54,980 --> 00:03:55,770
trestbps,

51
00:03:55,860 --> 00:04:01,280
chol, fbs, has all these features. Given a sample like that to our trained model,

52
00:04:01,280 --> 00:04:08,030
how likely is it to predict the right target? And in our case, our model's cross-validated accuracy is

53
00:04:08,090 --> 00:04:11,680
82.48%, so around about 82.5.

54
00:04:12,050 --> 00:04:19,400
So that means 82%, or just over eight times out of ten, our model will predict the right label given

55
00:04:19,550 --> 00:04:24,700
a sample something like this based on the original training data.

56
00:04:24,910 --> 00:04:26,650
So that's accuracy in a nutshell.

57
00:04:27,960 --> 00:04:32,610
What we're going to dive into next is area under the ROC curve, and if that sounds confusing, it's a

58
00:04:32,610 --> 00:04:34,500
bit of a mouthful but we'll go through it.

59
00:04:34,620 --> 00:04:39,990
So that's kind of how you would present your model's accuracy: print out something like this. Again,

60
00:04:39,990 --> 00:04:45,930
it returns by default a decimal, but in communication it's easier if you represent it as a percentage.
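
The steps walked through in this video can be sketched roughly as follows. This is a minimal sketch, not the exact notebook from the video: the heart disease DataFrame isn't available here, so `make_classification` stands in for the X and y data (an assumption), and the variable names `cv_scores` and `mean_accuracy` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

np.random.seed(42)

# Stand-in data (assumption): the video builds X and y from the
# heart disease DataFrame instead of generating synthetic samples.
X, y = make_classification(n_samples=300, n_features=13, random_state=42)

# Setting n_estimators explicitly avoids the "n_estimators will change"
# warning shown by older scikit-learn versions.
clf = RandomForestClassifier(n_estimators=100)

# With no scoring argument, cross_val_score falls back on the estimator's
# own score method, which for classifiers is mean accuracy.
cv_scores = cross_val_score(clf, X, y, cv=5)  # five-fold cross-validation

# The scores come back as decimals, so convert to a percentage for reporting.
mean_accuracy = np.mean(cv_scores)
print(f"Heart Disease Classifier Cross-Validated Accuracy: {mean_accuracy * 100:.2f}%")
```

Storing the result in `cv_scores` rather than a variable named `cross_val_score` keeps the imported function usable if the cell is run again.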