1
00:00:00,170 --> 00:00:01,770
Hoo hoo hoo hoo hoo.

2
00:00:01,980 --> 00:00:04,090
I'm rubbing my hands, if you can't hear them.

3
00:00:04,230 --> 00:00:04,980
You know why.

4
00:00:05,100 --> 00:00:06,240
Because I'm pumped.

5
00:00:06,270 --> 00:00:13,080
I'm ready to evaluate our machine learning models, to see how each of these models goes

6
00:00:13,170 --> 00:00:18,220
using the function we've just created in the last video: see how logistic regression,

7
00:00:18,260 --> 00:00:24,000
KNeighborsClassifier and RandomForestClassifier go at finding patterns in our training

8
00:00:24,000 --> 00:00:27,660
data, and then how those patterns get evaluated on our test data.

9
00:00:28,380 --> 00:00:36,440
So without any further ado, let's see how each of these models performs. We can call our function fit_and_score,

10
00:00:36,440 --> 00:00:40,230
and hopefully it works. And we go here.

11
00:00:40,530 --> 00:00:46,710
We see that it takes models. Actually, if we press Shift+Tab, this is where our docstring comes in handy,

12
00:00:46,710 --> 00:00:51,050
because it tells us what happens: fits and evaluates given machine learning models.

13
00:00:51,060 --> 00:00:51,970
This is beautiful.

14
00:00:51,990 --> 00:00:56,850
This is just like what we've seen with other functions, except we've created this one ourselves.

15
00:00:57,060 --> 00:00:58,800
That's the helpfulness of a docstring, right?

16
00:00:59,370 --> 00:01:00,290
So we're going to go here:

17
00:01:00,300 --> 00:01:09,200
models=models, which is our dictionary of machine learning models. We could just go X_train, X_test,

18
00:01:10,390 --> 00:01:17,650
but for completeness we're going to go X_train=X_train, and then we're going to go X_test=

19
00:01:17,680 --> 00:01:28,830
X_test, and then we're going to go y_train=y_train, and then finally y_test=y_test.
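The call described above can be sketched as follows. The `fit_and_score` helper and the `models` dictionary are assumptions reconstructed from the video's description (the real notebook's version may differ slightly), and the data here is synthetic stand-in data, not the course's heart disease dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def fit_and_score(models, X_train, X_test, y_train, y_test):
    """Fits and evaluates given machine learning models.
    models: a dict of different scikit-learn models.
    Returns a dict mapping model names to test-set scores."""
    np.random.seed(42)  # reproducible results
    model_scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)                       # find patterns in the training data
        model_scores[name] = model.score(X_test, y_test)  # evaluate those patterns on the test data
    return model_scores

# The three models compared in the video
models = {"Logistic Regression": LogisticRegression(max_iter=1000),
          "KNN": KNeighborsClassifier(),
          "Random Forest": RandomForestClassifier()}

# Synthetic binary-classification data standing in for the real dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model_scores = fit_and_score(models=models,
                             X_train=X_train, X_test=X_test,
                             y_train=y_train, y_test=y_test)
print(model_scores)
```

Passing the arguments by keyword, as in the video, makes it obvious which split goes where.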
20
00:01:29,080 --> 00:01:36,860
And if we've written that function correctly, what do you think it's going to return?

21
00:01:36,930 --> 00:01:40,390
I'll give you a few seconds. Okay.

22
00:01:40,520 --> 00:01:42,230
That's enough, because we want to see it.

23
00:01:42,230 --> 00:01:45,380
If in doubt, run the code. Let's see what our function returns.

24
00:01:48,100 --> 00:01:48,380
Okay.

25
00:01:48,410 --> 00:01:49,800
So we get some warnings here.

26
00:01:49,820 --> 00:01:52,140
What's it say? STOP:

27
00:01:52,140 --> 00:01:58,150
total number of iterations reached limit. Understanding that fully would require us to dive in.

28
00:01:58,170 --> 00:02:00,920
So this is where it's helpful:

29
00:02:00,960 --> 00:02:04,260
increase the number of iterations (max_iter) or scale the data.

30
00:02:04,260 --> 00:02:09,780
So this is saying that potentially our logistic regression model could be improved, and to figure out

31
00:02:09,780 --> 00:02:14,770
how you could do that would require going to the documentation for alternative solver options.

32
00:02:14,820 --> 00:02:17,510
But what we're going to do is just work with what we've got so far.

33
00:02:17,580 --> 00:02:23,000
So if we have a look at this, this is the score of each of our models without tuning.

34
00:02:23,160 --> 00:02:25,470
Oh, that's a spoiler for what's coming up.

35
00:02:25,470 --> 00:02:35,160
This is how each of our models, as a baseline, has performed at finding patterns in our test data.

36
00:02:35,190 --> 00:02:36,390
So it's getting the score here.

37
00:02:36,390 --> 00:02:37,850
This is what we're getting back.

38
00:02:37,950 --> 00:02:45,180
So if we look at this, which one is highest? Look at logistic regression coming in as the dark horse.

39
00:02:45,450 --> 00:02:48,440
Not even on the machine learning map.

40
00:02:48,600 --> 00:02:49,410
Right.

41
00:02:49,500 --> 00:02:51,710
And it's getting the highest score.

42
00:02:51,710 --> 00:02:52,690
Mm hmm.
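The ConvergenceWarning mentioned above suggests two fixes: raise `max_iter` or scale the data. A minimal sketch of both, on synthetic stand-in data (not the course dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Unscaled, wide-ranging features can slow the default lbfgs solver's convergence
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * 100
y = (X[:, 0] > 0).astype(int)

# Option 1: give the solver more iterations than the default max_iter=100
clf_more_iters = LogisticRegression(max_iter=1000).fit(X, y)

# Option 2: scale the data first, so the default iteration budget usually suffices
clf_scaled = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

print(clf_more_iters.score(X, y), clf_scaled.score(X, y))
```

Either way, the warning is about optimisation not finishing, not about the model being conceptually wrong, which is why the video can safely carry on with the scores it got.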
43
00:02:52,740 --> 00:02:59,310
Well, this might take a little bit of investigation, because remember where we're up to: we're at experimentation,

44
00:02:59,320 --> 00:03:00,790
and that's what we're doing here.

45
00:03:00,790 --> 00:03:05,650
An experiment is trying different models and comparing them to each other.

46
00:03:05,650 --> 00:03:07,810
Now, this is a comparison in a dictionary.

47
00:03:07,810 --> 00:03:12,750
But if we wanted to show this to someone, we might compare it visually.

48
00:03:12,800 --> 00:03:16,900
So let's do that nice and quickly.

49
00:03:16,900 --> 00:03:19,100
Actually, we might make a little heading here.

50
00:03:19,590 --> 00:03:28,560
So we go here: Model Comparison. Beautiful. model_compare equals... we might turn our dictionary into a data

51
00:03:28,560 --> 00:03:32,070
frame, because that's nice and simple: pd.DataFrame(model_scores).

52
00:03:32,100 --> 00:03:33,990
This is just taking this dictionary here,

53
00:03:34,020 --> 00:03:41,420
model_scores, and we're going to set the index. What does it have to be? It has to be a list called

54
00:03:41,450 --> 00:03:44,360
["accuracy"], because that's what this score function returns.

55
00:03:44,360 --> 00:03:48,520
Because our models are all classifiers, their default score metric is accuracy.

56
00:03:48,530 --> 00:03:50,420
We saw that in the scikit-learn section.

57
00:03:50,930 --> 00:03:59,720
So if we go here: model_compare, we need to transpose it, then .plot.bar(). Boom.

58
00:03:59,720 --> 00:04:04,580
Now I'll just show you why we need to transpose it, because if we didn't, it would look like that,

59
00:04:04,790 --> 00:04:07,880
and we actually want it to look good, like that.

60
00:04:08,690 --> 00:04:13,110
So this is quickly showing how accurate each of our different models is.
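The comparison plot described above can be sketched like this. The scores are example numbers standing in for the real notebook's output, and the variable names follow the video's conventions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import pandas as pd

# Example scores standing in for the dict returned by fit_and_score
model_scores = {"Logistic Regression": 0.88,
                "KNN": 0.69,
                "Random Forest": 0.83}

# Turn the dict into a one-row DataFrame; the index label names the metric
model_compare = pd.DataFrame(model_scores, index=["accuracy"])

# Transpose so each model becomes a row, which plots as one bar per model
ax = model_compare.T.plot.bar()
ax.set_ylabel("accuracy")
print(model_compare.T)
```

Without the transpose, pandas would plot one bar group per metric (a single cluttered cluster); transposing gives the one-bar-per-model chart the video is after.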
61
00:04:13,300 --> 00:04:20,060
And you can see logistic regression just tips out our random forest. And KNN, well, because that's

62
00:04:20,060 --> 00:04:22,850
nowhere near the accuracy of our logistic regression model,

63
00:04:22,850 --> 00:04:27,950
we're going to say goodbye to KNN. So once we've got this graph, this might be something that we can

64
00:04:27,950 --> 00:04:33,470
take to the boss or one of our colleagues and say, hey, look at this, we've built a machine learning model,

65
00:04:33,590 --> 00:04:36,920
a beautiful logistic regression model, and it's performed the best.

66
00:04:36,920 --> 00:04:40,630
So we're going to use the logistic regression model in practice.

67
00:04:40,640 --> 00:04:41,660
And so you might go:

68
00:04:41,690 --> 00:04:42,550
I found it.

69
00:04:42,890 --> 00:04:44,000
And your boss is like:

70
00:04:44,480 --> 00:04:45,140
Nice one.

71
00:04:45,140 --> 00:04:46,410
What did you find?

72
00:04:46,640 --> 00:04:51,990
And then you're like, well, the best algorithm for predicting heart disease is logistic regression. And

73
00:04:51,990 --> 00:04:53,360
then she might say something like:

74
00:04:54,180 --> 00:04:55,280
Excellent.

75
00:04:55,280 --> 00:04:57,970
I'm surprised the hyperparameter tuning isn't finished by now.

76
00:04:58,050 --> 00:05:01,930
And then you might wonder, what is hyperparameter tuning?

77
00:05:01,950 --> 00:05:02,800
Yeah, me too.

78
00:05:02,810 --> 00:05:04,230
That went pretty quick.

79
00:05:04,230 --> 00:05:09,300
So you're sort of covering yourself here. And then she might say, well, I'm very proud.

80
00:05:09,570 --> 00:05:14,190
How about you put together a classification report to show the team, and be sure to include a confusion

81
00:05:14,190 --> 00:05:18,490
matrix and the cross-validated precision, recall and F1 scores.

82
00:05:18,510 --> 00:05:21,150
I'd also be curious to see which features are most important.
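The evaluation the boss asks for above can be sketched with scikit-learn's built-in tools. This is a minimal preview of what the next video covers, on synthetic stand-in data rather than the heart disease dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data standing in for the real dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression()
clf.fit(X, y)
y_preds = clf.predict(X)

# Confusion matrix: rows are true labels, columns are predicted labels
cm = confusion_matrix(y, y_preds)
print(cm)

# Classification report: per-class precision, recall and F1
print(classification_report(y, y_preds))

# Cross-validated versions of each metric (5-fold), as the boss requested
cv_precision = cross_val_score(clf, X, y, cv=5, scoring="precision").mean()
cv_recall = cross_val_score(clf, X, y, cv=5, scoring="recall").mean()
cv_f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
print(cv_precision, cv_recall, cv_f1)
```

The cross-validated scores matter because a single train/test split can flatter (or punish) a model by luck; averaging over 5 folds is more robust.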
83
00:05:21,480 --> 00:05:27,350
Oh, and don't forget to include an ROC curve. And then you might be thinking or asking yourself, okay,

84
00:05:27,370 --> 00:05:29,870
this person said a lot of complex words there.

85
00:05:29,920 --> 00:05:30,310
Right.

86
00:05:30,940 --> 00:05:35,370
But then you actually say, of course, I'll have it to you by tomorrow.

87
00:05:35,820 --> 00:05:37,390
We're going to take care of all these things.

88
00:05:37,440 --> 00:05:41,640
We're going to take care of all these things in the next video, because we're still at a point where,

89
00:05:42,210 --> 00:05:45,690
even though we found a great machine learning model and we've probably shown that to the boss or one

90
00:05:45,690 --> 00:05:51,690
of our colleagues, and they've gone, wow, logistic regression performing at 88 percent accuracy, we're still

91
00:05:51,690 --> 00:05:59,230
not near our evaluation metric, which, if we come back up to the top, is: we said we kind of want at

92
00:05:59,230 --> 00:06:02,970
least 95 percent accuracy to continue with this experiment.

93
00:06:03,010 --> 00:06:08,230
So that's what we're going to be doing, as well as fulfilling all those requests that the boss asked

94
00:06:08,230 --> 00:06:13,630
of us, just to make sure our model is a little bit more robust than just getting the default score metric.

95
00:06:14,860 --> 00:06:18,020
So without any further ado, take a little break,

96
00:06:18,050 --> 00:06:23,770
have a review of what we've done so far, and we're going to review some of the things that we need to

97
00:06:23,770 --> 00:06:30,490
make sure we do to make sure our classification models are evaluated correctly and are improved as much

98
00:06:30,490 --> 00:06:31,060
as possible.
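The ROC curve the boss mentions can be sketched like this: get the model's predicted probabilities for the positive class, then feed them to `roc_curve` and `roc_auc_score`. Again, this uses synthetic stand-in data, not the course dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for the real dataset
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression().fit(X_train, y_train)

# ROC needs probabilities, not hard class predictions
y_probs = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# False positive rate vs. true positive rate at each probability threshold
fpr, tpr, thresholds = roc_curve(y_test, y_probs)
auc = roc_auc_score(y_test, y_probs)
print(auc)
```

Plotting `fpr` against `tpr` gives the curve itself; the AUC summarises it as a single number, where 0.5 is no better than guessing and 1.0 is perfect ranking.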