All right. I'm very excited. It's time to start modelling, time to start using machine learning to figure out whether or not we can classify whether someone has heart disease based on their health parameters.

In the previous video we said we were going to try out three machine learning models, so let's write them out here. We're going to try three different machine learning models: one, Logistic Regression; two, K-Nearest Neighbors classifier; and three, Random Forest classifier.

If you're wondering how we came up with these three, it's because we went through this diagram, the Scikit-Learn machine learning map, and followed it through based on our data. Actually, a good question you might be asking is, "Hey, I don't see Logistic Regression on the map." We kind of covered this before: Logistic Regression doesn't really seem to make sense at first, since it's got "regression" right there in the name, and we're no longer trying to predict a number, we're trying to predict a category. So why are we using it for classification? I was pretty stumped too when I found out that we could use Logistic Regression for classification.

You might be thinking even further back: how did you even find Logistic Regression if it's not on the map? Well, I was a bit confused by this too. I found it after searching for "sklearn logistic regression". And you might be wondering, what if I didn't even know to search for "sklearn logistic regression"? Well, the way I stumbled upon it is that I just searched something like "machine learning models used for classification problems", and someone suggested Logistic Regression. So I decided to search for Logistic Regression, and I found that Scikit-Learn had an implementation. I could of course build it from scratch, but I'm more of a practitioner: I want to apply models that someone else has already built.

Then I read through the docs, and maybe it wasn't on the LogisticRegression page itself, maybe it was in the user guide, I think that's where it was. Logistic regression, here we go: "Logistic regression, despite its name, is a linear model for classification rather than regression." So this line clarifies the confusion in the name. And of course you could read through all of this; there's even some math here on how it's actually implemented.

But the reason we're trying it is because we're pretending that we've tried LinearSVC (we actually have, in a previous Scikit-Learn video) and saying that it's not working. So we've followed the map through: not working, not working with text data, so we're going to try the K-Nearest Neighbors classifier and an ensemble classifier. That's how we've deduced these three different models. And why Logistic Regression?
Well, I've decided to throw it in here because it's not listed on the map and we want to check it out anyway; even if something isn't on the standard list of algorithms to try, we might still try it and figure it out for ourselves.

And again, if this process seems like it's not really structured at all, like you're asking, "Wait, how did Daniel find that machine learning model? How would I even think of it?", it just came from searching something like this. Once we've defined that our problem is classification, if you're trying to figure out which machine learning model to use, it's something as simple as that: you can search for it and explore.

And if we go back to our keynote here, that's exactly what we're doing. We're up to experiments: we're trying different machine learning models. That's enough talking, actually. Well, one more thing, because this is what I want to encourage you to do: because there are so many different ways to do things in machine learning, a lot of it is about just searching for the answer, or not even the answer, just asking questions and trying to implement them. That's what we're going to do now.
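To see for yourself that Scikit-Learn's LogisticRegression really does predict categories despite its name, here's a tiny sketch (the feature values and labels below are made up purely for illustration):

```python
from sklearn.linear_model import LogisticRegression

# Toy data: two features per sample, binary class labels (hypothetical numbers)
X = [[0.0, 1.0], [1.0, 0.0], [0.1, 0.9], [0.9, 0.2]]
y = [0, 1, 0, 1]

clf = LogisticRegression()
clf.fit(X, y)

# Despite the "regression" in its name, .predict() returns a class label,
# not a continuous value
prediction = clf.predict([[0.05, 0.95]])
```

Under the hood it fits a linear model to the log-odds of each class, which is where the "regression" part of the name comes from.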
So, because we want to build three different models, and because we want to evaluate them with a couple of experiments to see which one is best, we want to train them on the training data and test them on the test data. What we might do is set up a little dictionary with our models in it, and then create a function to fit and score those models, so that rather than rewriting the same code for fitting and evaluating each one, we set it all up in a single function. Yeah, that's a good idea, Daniel. Let's do that.

All right, so: put models in a dictionary. We're going to write models equals, then "Logistic Regression" as a key with LogisticRegression as its value, then "KNN" with KNeighborsClassifier (we can probably press Tab to autocomplete, let's be real), and then "Random Forest" with our trusty RandomForestClassifier. Beautiful.

And then let's create a function. We want to fit and score the models, so we need a function to train our models on the training data and then evaluate them on the test data: a fit-and-score function.
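The dictionary just described might look like this in code (a sketch; the key names are simply the labels used in this walkthrough):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Put models in a dictionary: name -> instantiated (but not yet fitted) model
models = {"Logistic Regression": LogisticRegression(),
          "KNN": KNeighborsClassifier(),
          "Random Forest": RandomForestClassifier()}
```

Keeping the models in a dictionary means we can loop over all of them with the same fit-and-score code instead of repeating it three times.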
def fit_and_score, something simple. It'll need to take our dictionary of models, and then X_train, X_test, y_train, y_test. And we can put a little docstring here just to make sure our code is legible: if someone else were to use this function, what would it do? So: "Fits and evaluates given machine learning models." Then we'll describe the parameters: models is a dict of different Scikit-Learn machine learning models; X_train is the training data (no labels); X_test is the testing data (no labels); y_train is the training labels; and y_test is the test labels. Beautiful.

Now what should we do? We might set a random seed, even though it's within the function, to make sure that our results are reproducible. And then we'll make a list to keep the model scores in. Actually, this is better: make a dictionary. That's what we want, because we're working with dictionaries. So model_scores equals an empty dictionary, because we're going to fill it up in a second; we'll talk through this function as we write it. Then, loop through the models. This is where we write "for name, model", so name is the key and model is the value of our dictionary.
So: for name, model in models.items(), accessing the key-value pairs of the dictionary with .items(). Then we fit the model to the data: model.fit(X_train, y_train). So for each model, we're fitting it to the training data. Then we want to evaluate the model and append its score to model_scores. This is how we're going to evaluate each model in one hit: we save the model's name to our dictionary as the key, and its score as the value. So model_scores[name], which creates a key in the (initially empty) model_scores dictionary, equals model.score(X_test, y_test). For example, if the model were Logistic Regression, we've just fit it with its fit(X_train, y_train), and now we're appending its score(X_test, y_test) to model_scores under the name "Logistic Regression".

And if we've gone through that and you're thinking, well, that's a fair few steps, it's really only about two or three. When I first worked with these kinds of functions I wasn't used to it either; I was used to doing things line by line. But writing it as a function keeps us efficient in this project.
So we return model_scores, which gives us back a dictionary, and we'll hit Shift and Enter. Mm hmm. And I wonder if this will work. Basically, what the function does is take our dictionary of models, set up a random seed, set up an empty dictionary, and loop through the models dictionary. So for name, model in models.items(), let's pretend the current entry is Logistic Regression: it calls fit, telling the Logistic Regression model to find the patterns in the training data. Then it creates a key in our model_scores dictionary holding the score of how well the Logistic Regression model performs on the test data, a.k.a. using the patterns it found in the training data. Finally it returns our model_scores dictionary, so we can see how each of our models performs on the test dataset. With our three models here, we should get three different scores.

I'm going to leave it on a cliffhanger there, and we'll evaluate our three models in the next video. So we'll see you there!
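For reference, the function assembled across this walkthrough can be sketched as below. This is a sketch only: the actual heart-disease X_train, X_test, y_train and y_test come from the train/test split in earlier videos and aren't shown in this excerpt.

```python
import numpy as np

def fit_and_score(models, X_train, X_test, y_train, y_test):
    """
    Fits and evaluates given machine learning models.
    models : a dict of different Scikit-Learn machine learning models
    X_train : training data (no labels)
    X_test : testing data (no labels)
    y_train : training labels
    y_test : test labels
    """
    # Set random seed for reproducible results
    np.random.seed(42)
    # Make a dictionary to keep model scores
    model_scores = {}
    # Loop through models: name is the key, model is the value
    for name, model in models.items():
        # Fit the model to the training data
        model.fit(X_train, y_train)
        # Evaluate the model on the test data and store its score under its name
        model_scores[name] = model.score(X_test, y_test)
    return model_scores
```

Calling fit_and_score(models, X_train, X_test, y_train, y_test) with the dictionary of three models then returns a dict mapping each model's name to its test-set accuracy.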