1
00:00:00,800 --> 00:00:01,100
All right.

2
00:00:01,100 --> 00:00:01,566
Great.

3
00:00:01,566 --> 00:00:02,866
So now they're all copies.

4
00:00:02,866 --> 00:00:04,566
Therefore we can modify them.

5
00:00:04,566 --> 00:00:09,366
Let me just show you once again how I
modified the original classification code.

6
00:00:09,366 --> 00:00:12,766
we made before into these new ones,
you know, more simplified ones,

7
00:00:12,766 --> 00:00:15,766
which will get you the accuracy
quickly and efficiently.

8
00:00:15,866 --> 00:00:19,566
So the data preprocessing phase
I kept exactly the same, including,

9
00:00:19,566 --> 00:00:22,566
you know, the feature scaling
applied to X_train and X_test.

10
00:00:22,600 --> 00:00:26,566
But here I put a template name
and I highlighted

11
00:00:26,566 --> 00:00:30,400
that you have to enter
the name of your data set here.

12
00:00:30,466 --> 00:00:32,100
So that's what we'll do. I'll show you.

13
00:00:32,100 --> 00:00:34,133
But that's the only thing
that was changed.

14
00:00:34,133 --> 00:00:34,566
Indeed.

15
00:00:34,566 --> 00:00:37,733
We don't have to do much else
because this will automatically select

16
00:00:37,733 --> 00:00:40,500
all the features
and not your dependent variable.

17
00:00:40,500 --> 00:00:43,533
And this will automatically select
your dependent variable.

18
00:00:43,533 --> 00:00:46,500
And that's provided
of course you have in your data set.

19
00:00:46,500 --> 00:00:49,100
First
the features you know in the first columns

20
00:00:49,100 --> 00:00:52,033
and last the dependent variable
in the last column.

21
00:00:52,033 --> 00:00:53,766
Right. Make sure of this.

22
00:00:53,766 --> 00:00:57,333
What we're going to do now
works for any data set, regardless

23
00:00:57,333 --> 00:00:58,466
of the number of features.

24
00:00:58,466 --> 00:01:01,466
As long as they have the features
in the first columns

25
00:01:01,466 --> 00:01:03,333
and the dependent variable
in the last column.

26
00:01:03,333 --> 00:01:05,666
Make sure to remember this.

27
00:01:05,666 --> 00:01:07,433
All right. And then all good here.

28
00:01:07,433 --> 00:01:10,266
If you want, you can change that from
0.25 to 0.2.

29
00:01:10,266 --> 00:01:10,900
But that's fine.

30
00:01:10,900 --> 00:01:14,100
Both values work
well. Then feature scaling.

31
00:01:14,733 --> 00:01:17,833
All right then
for each of the classification models,

32
00:01:17,833 --> 00:01:20,900
I kept, you know, the code to implement
and train it.

33
00:01:21,100 --> 00:01:24,900
And finally what I did in the last
cells is simply I removed,

34
00:01:24,933 --> 00:01:29,200
you know, the prints that displayed
the vector of predictions and the vector

35
00:01:29,233 --> 00:01:31,166
of real results next to each other, because,

36
00:01:31,166 --> 00:01:33,833
you know, we don't really need it
for our model selection process.

37
00:01:33,833 --> 00:01:36,800
However, what I did
is that I kept this, of course,

38
00:01:36,800 --> 00:01:40,533
but in order to compute the confusion
matrix and the accuracy,

39
00:01:40,800 --> 00:01:44,933
I had to create that y_pred vector
containing all the predictions

40
00:01:45,166 --> 00:01:49,233
by calling the predict method applied
to X_test from our classifier.

41
00:01:49,466 --> 00:01:50,533
And that's all I did.

42
00:01:50,533 --> 00:01:53,033
And I did the same
in all the different files.

43
00:01:53,033 --> 00:01:53,533
Right?

44
00:01:53,533 --> 00:01:57,966
K-nearest neighbors: data preprocessing
phase, training and confusion matrix.

45
00:01:58,266 --> 00:02:00,000
Same for support vector machine:

46
00:02:00,000 --> 00:02:03,000
Data preprocessing training and confusion
matrix.

47
00:02:03,366 --> 00:02:08,266
Then kernel SVM, same: data preprocessing
phase, training and confusion matrix.

48
00:02:08,700 --> 00:02:11,733
Naive Bayes:
data preprocessing phase, training

49
00:02:11,733 --> 00:02:15,266
and confusion
matrix. And decision tree classification:

50
00:02:15,400 --> 00:02:16,300
data preprocessing

51
00:02:16,300 --> 00:02:20,100
phase, training, confusion matrix.
And finally random forest:

52
00:02:20,400 --> 00:02:23,600
data preprocessing
phase, training and confusion matrix. See?

53
00:02:23,800 --> 00:02:28,033
So you have the exact same code templates
for each of the classification models

54
00:02:28,033 --> 00:02:29,100
we built together.

55
00:02:29,100 --> 00:02:31,700
The only thing that changed
is actually this cell,

56
00:02:31,700 --> 00:02:35,066
because this cell actually builds
and trains the classification model

57
00:02:35,066 --> 00:02:38,033
you want to try through this model
selection process.

58
00:02:38,033 --> 00:02:39,200
All right. Perfect.

59
00:02:39,200 --> 00:02:42,233
So now
we're getting very close to the demo.

60
00:02:42,366 --> 00:02:46,233
And so just to recap this demo works
for any data set.

61
00:02:46,233 --> 00:02:48,000
Regardless of the number of features.

62
00:02:48,000 --> 00:02:50,633
And as long as you have your features
in the first columns

63
00:02:50,633 --> 00:02:52,966
and your dependent variable
in the last column,

64
00:02:52,966 --> 00:02:54,300
and also as long as you don't have

65
00:02:54,300 --> 00:02:58,066
some special data preprocessing tools
to use on your data set.

66
00:02:58,266 --> 00:03:01,333
If you have any categorical variables
you know in strings

67
00:03:01,333 --> 00:03:04,633
or categorical variables
where you have to perform one-hot encoding,

68
00:03:04,766 --> 00:03:08,966
well, don't forget to use your data
preprocessing toolkit to preprocess

69
00:03:08,966 --> 00:03:10,100
your data set the right way,

70
00:03:10,100 --> 00:03:14,266
and then you can just deploy all your
classification code templates here.

71
00:03:14,400 --> 00:03:18,200
And that, my friends, is exactly
what I'm about to show you right now.

72
00:03:18,200 --> 00:03:20,400
So now the demo is going to start.

73
00:03:20,400 --> 00:03:21,300
Are you ready?

74
00:03:21,300 --> 00:03:24,400
Three, two, one, go. All right.

75
00:03:24,400 --> 00:03:25,200
So I'm going to do this

76
00:03:25,200 --> 00:03:28,866
as efficiently as I can in order
to show you the power of code templates.

77
00:03:29,200 --> 00:03:34,033
So, first step. The first step is to upload
the data set inside the notebook.

78
00:03:34,033 --> 00:03:36,966
Right now it is connecting to runtime
to enable file browsing.

79
00:03:36,966 --> 00:03:38,700
Actually I'm going to do this for

80
00:03:38,700 --> 00:03:42,266
each of the models here because you know
it always takes a few seconds.

81
00:03:42,533 --> 00:03:45,033
So let's do it
this way to be efficient. Right.

82
00:03:45,033 --> 00:03:48,866
So I'm just
you know, loading all the files here.

83
00:03:49,500 --> 00:03:50,966
All right. Perfect.

84
00:03:50,966 --> 00:03:52,466
And everything.

85
00:03:52,466 --> 00:03:55,466
You know
every file is now connecting to a runtime.

86
00:03:55,600 --> 00:03:56,433
Now be careful.

87
00:03:56,433 --> 00:03:59,233
If you don't see the sample data here,
you have to refresh.

88
00:03:59,233 --> 00:04:02,033
Otherwise you will have issues
uploading your data set.

89
00:04:02,033 --> 00:04:03,400
Good. Now it's good.

90
00:04:03,400 --> 00:04:05,966
So the next step: we upload the data set.

91
00:04:05,966 --> 00:04:06,233
All right.

92
00:04:06,233 --> 00:04:09,900
So this is the model selection folder and
more precisely the classification folder.

93
00:04:10,100 --> 00:04:11,866
But let me show you the path again.

94
00:04:11,866 --> 00:04:15,800
I put this machine learning
model selection folder on my desktop.

95
00:04:15,800 --> 00:04:18,666
But make sure to find it on your machine
wherever it is.

96
00:04:18,666 --> 00:04:20,933
If you have not downloaded that already,

97
00:04:20,933 --> 00:04:23,366
make sure to download it
right before this tutorial.

98
00:04:23,366 --> 00:04:26,700
You will find the link
at the bottom of the article.

99
00:04:27,100 --> 00:04:30,733
Then together we're going to go inside,
then inside classification

100
00:04:30,733 --> 00:04:31,600
and there we go.

101
00:04:31,600 --> 00:04:34,666
We select data.csv,
then we click open.

102
00:04:34,966 --> 00:04:36,266
Then we press okay.

103
00:04:36,266 --> 00:04:39,300
And then what
we simply need to do inside this code

104
00:04:39,333 --> 00:04:42,833
template is just to put here
the name of the data set.

105
00:04:42,833 --> 00:04:46,966
So you just double click
this and then enter data.csv

106
00:04:46,966 --> 00:04:49,966
or you know
the name of your future data set.

107
00:04:50,166 --> 00:04:51,266
All right. And that's it.

108
00:04:51,266 --> 00:04:54,400
That's all we have to do in each code
template.

109
00:04:54,400 --> 00:04:58,266
Only one thing to change so that we can
really call it a code template.

110
00:04:58,633 --> 00:04:59,166
All right. Great.

111
00:04:59,166 --> 00:05:02,266
So now we're going to do the same
in each of the other implementations.

112
00:05:02,533 --> 00:05:04,066
So now K-nearest neighbors.

113
00:05:04,066 --> 00:05:06,900
Let's refresh this
because we need to see this. There we go.

114
00:05:06,900 --> 00:05:10,933
Then upload, then data.csv, then open.

115
00:05:11,266 --> 00:05:12,966
All right. Perfect. Okay.

116
00:05:12,966 --> 00:05:17,000
And then we replace here
the name with data.csv.

117
00:05:17,233 --> 00:05:18,100
Perfect.

118
00:05:18,100 --> 00:05:21,900
Then next one: support
vector machine. Refresh, upload.

119
00:05:22,433 --> 00:05:26,866
Then data.csv, then open, and perfect.

120
00:05:27,000 --> 00:05:28,133
We have the data set.

121
00:05:28,133 --> 00:05:33,466
Now we replace this with
data.csv, and all good. SVM is ready now.

122
00:05:33,466 --> 00:05:35,200
Kernel SVM: refresh,

123
00:05:36,233 --> 00:05:38,200
upload,

124
00:05:38,200 --> 00:05:41,200
data.csv, open, okay,

125
00:05:42,066 --> 00:05:45,300
replacing this with data.csv
or the name of your future data set.

126
00:05:45,566 --> 00:05:47,833
And there we go. Kernel SVM is ready.

127
00:05:47,833 --> 00:05:51,266
All right, then Naive Bayes: refresh,

128
00:05:51,766 --> 00:05:55,966
upload, data.csv, open, okay.

129
00:05:55,966 --> 00:05:58,966
And then replacing this with data.csv.