1
00:00:00,980 --> 00:00:10,170
In this video, we will learn how to build a KNN classifier. The KNN classifier uses a function

2
00:00:10,500 --> 00:00:15,540
called knn, and this function is part of the class package.

3
00:00:16,650 --> 00:00:21,040
You can see on the right there is the class package. If it is installed,

4
00:00:21,090 --> 00:00:22,020
it will be shown here.

5
00:00:22,110 --> 00:00:24,960
If it is not installed, you first have to install it

6
00:00:25,140 --> 00:00:28,500
using the install.packages function. Once you have installed it,

7
00:00:28,680 --> 00:00:30,930
just tick it here so that it is active.

8
00:00:33,890 --> 00:00:38,610
So this function, the knn function, requires four inputs.

9
00:00:40,320 --> 00:00:45,570
The first input is a matrix containing the predictors associated with the training data.

10
00:00:47,250 --> 00:00:49,890
That is, all the independent variables.

11
00:00:50,220 --> 00:00:57,210
The X variables should be segregated from the training data and put into a separate variable.

12
00:00:57,800 --> 00:01:00,060
I'll name this variable trainX.

13
00:01:04,170 --> 00:01:05,610
So the trainX variable

14
00:01:06,420 --> 00:01:14,700
will have all the variables of the training data set except the Sold variable. The Sold variable is in

15
00:01:14,700 --> 00:01:16,890
the sixteenth column position.

16
00:01:17,010 --> 00:01:20,340
If you're not sure about it, you can open the data,

17
00:01:23,950 --> 00:01:28,210
scroll to the right and hover over the name of the variable.

18
00:01:28,840 --> 00:01:29,950
This is column 16.

19
00:01:30,940 --> 00:01:32,610
So we will remove the 16th column.

20
00:01:34,170 --> 00:01:37,240
To do that, we will write train,

21
00:01:37,600 --> 00:01:44,350
and then, in square brackets, the first parameter will be blank;

22
00:01:44,600 --> 00:01:49,040
that is, we want all rows. Then a comma, then minus 16,

23
00:01:49,450 --> 00:01:51,610
that is, not the 16th column,

24
00:01:51,670 --> 00:01:54,940
but everything else we have in train.
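For readers following along in Python rather than R, here is a minimal sketch of what train[, -16] does, assuming the data is stored as a list of rows. The helper drop_column and the toy rows are hypothetical and are not part of the video's R code.

```python
def drop_column(rows, col_1based):
    """Return a copy of rows without the given 1-based column, like R's df[, -col]."""
    idx = col_1based - 1  # R counts columns from 1, Python lists from 0
    return [row[:idx] + row[idx + 1:] for row in rows]

# Hypothetical toy rows: 15 predictor values followed by a label in column 16.
train = [list(range(1, 16)) + [1], list(range(2, 17)) + [0]]
trainX = drop_column(train, 16)
print(len(trainX[0]))  # 15 predictor columns remain
```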

25
00:01:56,560 --> 00:02:01,120
The second parameter we require is the test predictor variables,

26
00:02:01,240 --> 00:02:02,320
that is, testX.

27
00:02:05,930 --> 00:02:14,510
These observations will be classified by the KNN classifier, so this testX will

28
00:02:14,510 --> 00:02:18,890
be created from the test set, and the test set

29
00:02:19,790 --> 00:02:22,340
also has this Sold variable, which has to be removed.

30
00:02:22,400 --> 00:02:24,530
So we'll remove it by writing minus 16.

31
00:02:27,620 --> 00:02:32,030
The third parameter the knn function will take is a vector containing

32
00:02:34,550 --> 00:02:36,740
class labels for the training observations.

33
00:02:37,130 --> 00:02:37,880
What does that mean?

34
00:02:40,010 --> 00:02:46,400
It contains Y, that is, the Sold variable, for all the training observations.

35
00:02:46,760 --> 00:02:47,810
So trainY

36
00:02:51,400 --> 00:02:55,640
is equal to train$Sold.

37
00:02:57,020 --> 00:03:01,850
So this is the dependent variable for which we know the values.
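A matching Python sketch for trainY = train$Sold, again assuming list-of-rows data where the Sold label sits in column 16. The helper get_column and the toy rows are hypothetical.

```python
def get_column(rows, col_1based):
    """Return one 1-based column as a list, like R's train$Sold or train[, 16]."""
    return [row[col_1based - 1] for row in rows]

# Hypothetical toy rows: 15 predictor values, then the label in column 16.
train = [[0.0] * 15 + [1], [1.0] * 15 + [0], [2.0] * 15 + [1]]
trainY = get_column(train, 16)
print(trainY)  # [1, 0, 1]
```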

38
00:03:05,450 --> 00:03:10,370
And testY will again be the dependent variable of the test set.

39
00:03:11,360 --> 00:03:19,060
This will be used to compare the predicted values of Y against this testY

40
00:03:19,060 --> 00:03:19,280
variable.

41
00:03:21,420 --> 00:03:26,250
But this testY is not the fourth parameter. The fourth parameter is

42
00:03:26,620 --> 00:03:29,760
the K in K-nearest neighbors.

43
00:03:29,880 --> 00:03:32,880
I told you that we fix the number of nearest neighbors

44
00:03:32,990 --> 00:03:35,970
we consider. That is the value of K.

45
00:03:36,690 --> 00:03:38,660
We have to input that value of K.

46
00:03:39,540 --> 00:03:43,220
So here we will use a K value of 3 first.

47
00:03:43,760 --> 00:03:46,920
We'll create a variable called k equal to 3.

48
00:03:49,140 --> 00:03:57,720
I also told you that since the KNN classifier uses distances, it is important that we standardize these

49
00:03:57,720 --> 00:04:03,200
variables so that all the variables have an equal impact in terms of their scale.
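To see why scale matters, here is a small Python example with hypothetical numbers: two features, one measured in thousands and one in units. Before rescaling, the large feature decides which point is nearest; after dividing each feature by an assumed standard deviation, the nearest neighbor changes. Centering is left out because subtracting a mean does not change distances between points.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# Hypothetical points: feature 1 is in thousands, feature 2 in units.
a = (1000.0, 1.0)
b = (1030.0, 9.0)   # close to a on feature 1, far on feature 2
c = (1100.0, 1.0)   # equal to a on feature 2, further on feature 1

# Assumed standard deviation of each feature in the training data.
spread = (50.0, 4.0)

def rescale(p):
    """Divide each feature by its spread so both are in comparable units."""
    return tuple(x / s for x, s in zip(p, spread))

raw_ab, raw_ac = euclidean(a, b), euclidean(a, c)
scaled_ab = euclidean(rescale(a), rescale(b))
scaled_ac = euclidean(rescale(a), rescale(c))
print(raw_ab < raw_ac)        # True: feature 1 dominates, so b looks nearer
print(scaled_ab > scaled_ac)  # True: after rescaling, c is the nearer point
```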

50
00:04:05,440 --> 00:04:09,030
So to standardize the variables, we use a function called scale.

51
00:04:10,470 --> 00:04:14,040
So the standardized version of trainX will be trainX_s.

52
00:04:14,280 --> 00:04:14,910
The code is:

53
00:04:21,080 --> 00:04:25,590
trainX_s is equal to scale, and within brackets

54
00:04:25,640 --> 00:04:28,780
we will give trainX.

55
00:04:33,250 --> 00:04:35,270
Similarly, we will do it for testX.

56
00:04:36,860 --> 00:04:44,300
We do not need to do this for the Y variables because they are categorical and they do not need scaling.
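As a rough Python counterpart of scale(), the sketch below centers each column and divides it by its sample standard deviation (denominator n - 1), which matches R's default scale() for a numeric matrix with no missing values. The function scale_columns and the toy data are hypothetical.

```python
import statistics

def scale_columns(rows):
    """Center each column on its mean and divide by its sample standard deviation.

    A constant column has standard deviation 0 and would raise ZeroDivisionError
    here (R's scale() returns NaN instead).
    """
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # n - 1 denominator, as in R's sd()
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

trainX = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]  # hypothetical predictors
trainX_s = scale_columns(trainX)
print(trainX_s)  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

Here, as in the video, each set would be standardized with its own means and standard deviations. A common alternative is to compute the means and standard deviations on the training set only and apply those same values to the test set.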

57
00:04:53,150 --> 00:04:58,250
One last thing that we have to do before running the KNN classifier is setting the seed.

58
00:04:58,760 --> 00:05:06,770
As I told you, whenever there is a tie in assigning the class to an observation, it assigns the class

59
00:05:06,860 --> 00:05:07,430
randomly.

60
00:05:08,060 --> 00:05:15,260
So when you are doing it and when I am doing it, we should both get the same results. To do that,

61
00:05:15,500 --> 00:05:19,910
we will set a seed by writing set.seed(0).

62
00:05:21,470 --> 00:05:24,920
If you do this, your results and mine will be exactly the same.

63
00:05:25,490 --> 00:05:28,520
So for the reproducibility of the results, we are setting the seed.
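The idea behind setting the seed can be sketched in Python: when neighbor votes are tied, pick one of the tied classes at random, using a random generator created from a fixed seed. The function majority_vote is hypothetical, and Python's random module produces a different sequence from R's set.seed(0), so this only illustrates reproducibility, not the video's exact draws.

```python
import random
from collections import Counter

def majority_vote(labels, rng):
    """Return the most common label; break ties uniformly at random using rng."""
    counts = Counter(labels)
    top = max(counts.values())
    tied = sorted(label for label, n in counts.items() if n == top)  # fixed order
    return rng.choice(tied)

neighbor_labels = [0, 1, 0, 1]  # hypothetical 2-2 tie among 4 neighbors
first = majority_vote(neighbor_labels, random.Random(0))
second = majority_vote(neighbor_labels, random.Random(0))
print(first == second)  # True: same seed, same tie-break
```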

64
00:05:31,310 --> 00:05:33,820
Now we are ready to run the KNN classifier.

65
00:05:36,440 --> 00:05:38,230
We will write knn.pred.

66
00:05:38,600 --> 00:05:43,790
This is the variable name which will contain the result of the KNN model.

67
00:05:45,350 --> 00:05:46,790
It starts with the knn function.

68
00:05:48,680 --> 00:05:53,720
The first parameter is trainX_s.

69
00:05:57,030 --> 00:05:59,520
The second parameter is testX_s.

70
00:06:05,090 --> 00:06:08,030
The third parameter is trainY.

71
00:06:15,200 --> 00:06:19,070
And the last one is the value of K, so k = k.

72
00:06:21,630 --> 00:06:23,400
So this first k

73
00:06:24,450 --> 00:06:30,500
is the parameter name for this function, and this second k is the variable name that I have assigned.

74
00:06:31,110 --> 00:06:34,840
So here you could put k = 1 or k = 3 directly as well.

75
00:06:35,460 --> 00:06:42,240
I have just separated it here so that whenever I change the value of k here, I can

76
00:06:42,340 --> 00:06:43,320
rerun the whole analysis.

77
00:06:43,440 --> 00:06:46,950
So let's run this.
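For comparison, here is a minimal from-scratch KNN classifier in Python that takes the same four inputs as knn(trainX_s, testX_s, trainY, k = k): training predictors, test predictors, training labels and k. It is a sketch, not the class package's implementation: it uses Euclidean distance, breaks vote ties with a seeded random generator, and, unlike class::knn, does not add extra training points that tie with the k-th nearest distance. knn_predict and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def knn_predict(train_x, test_x, train_y, k, seed=0):
    """Predict a class for each row of test_x from its k nearest rows in train_x."""
    rng = random.Random(seed)  # seeded so tie-breaks are reproducible
    predictions = []
    for row in test_x:
        # Sort training points by Euclidean distance to this test row.
        ranked = sorted(
            ((math.dist(row, tr), label) for tr, label in zip(train_x, train_y)),
            key=lambda pair: pair[0],
        )
        votes = Counter(label for _, label in ranked[:k])
        top = max(votes.values())
        tied = sorted(label for label, n in votes.items() if n == top)
        predictions.append(rng.choice(tied))  # randomness only matters on a tie
    return predictions

# Hypothetical, already-standardized toy data with two well-separated groups.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [[0.5, 0.5], [5.5, 5.5]]
print(knn_predict(train_x, test_x, train_y, k=3))  # [0, 1]
```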

78
00:06:50,710 --> 00:06:53,300
So the model results are stored in knn.pred.

79
00:06:54,700 --> 00:07:01,410
If you want to create the confusion matrix, we will use this knn.pred and the testY variable, so

80
00:07:01,490 --> 00:07:05,040
table(knn.pred,

81
00:07:09,480 --> 00:07:10,920
testY).

82
00:07:11,610 --> 00:07:12,550
So knn.pred

83
00:07:12,580 --> 00:07:18,070
has the predicted values, and testY has the actual values.
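table(knn.pred, testY) counts how often each predicted class appears alongside each actual class. A minimal Python analogue, with hypothetical labels, counts (predicted, actual) pairs; the pairs where the two agree are the correct predictions on the diagonal of the confusion matrix. The helper confusion_table is hypothetical.

```python
from collections import Counter

def confusion_table(predicted, actual):
    """Count each (predicted, actual) pair, like R's table(knn.pred, testY)."""
    return Counter(zip(predicted, actual))

knn_pred = [1, 0, 1, 1, 0, 0]  # hypothetical predictions
testY = [1, 0, 0, 1, 1, 0]     # hypothetical true labels
table = confusion_table(knn_pred, testY)
correct = sum(n for (p, a), n in table.items() if p == a)  # diagonal cells
print(table[(1, 1)], table[(0, 0)], table[(1, 0)], table[(0, 1)])  # 2 2 1 1
print(correct, len(testY) - correct)  # 4 2
```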

84
00:07:23,800 --> 00:07:26,950
So here is our confusion matrix using the KNN classifier.

85
00:07:28,550 --> 00:07:38,840
You can see we are getting 66 correct predictions out of 120 and 54 incorrect predictions.

86
00:07:40,820 --> 00:07:43,940
Now let's change the value of k from three to one

87
00:07:52,100 --> 00:07:53,040
and then run this again.

88
00:07:55,220 --> 00:08:00,950
You can see that now we are getting only 59 correct predictions on the test set.

89
00:08:01,850 --> 00:08:04,670
Earlier, we were getting 66 correct predictions.
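Turning these counts into accuracy on the 120 test observations gives 66/120 = 0.55 for k = 3 and 59/120, about 0.49, for k = 1. A small Python check of that arithmetic, using only the numbers stated in the video:

```python
total = 120  # test observations in the video's confusion matrix
correct_by_k = {3: 66, 1: 59}  # correct predictions reported for each k
accuracy_by_k = {k: correct / total for k, correct in correct_by_k.items()}
for k, acc in accuracy_by_k.items():
    print(f"k = {k}: {acc:.3f}")  # k = 3: 0.550, then k = 1: 0.492
```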

90
00:08:06,170 --> 00:08:12,410
You can see that by changing the value of k, that is, by increasing the flexibility of the KNN model,

91
00:08:13,130 --> 00:08:16,010
we get a different accuracy from our KNN model.

92
00:08:18,750 --> 00:08:22,560
So this is the template of the K-nearest neighbors model.

93
00:08:23,820 --> 00:08:26,760
We first load the class package.

94
00:08:28,020 --> 00:08:31,800
Then we create trainX, testX, trainY and testY,

95
00:08:32,490 --> 00:08:34,200
and we assign a k value.

96
00:08:34,620 --> 00:08:40,110
trainX, testX, trainY and k go into the knn function to give us the predicted values.

97
00:08:41,430 --> 00:08:47,130
Remember to standardize the values of all the independent variables before putting them into the KNN

98
00:08:47,130 --> 00:08:47,370
model.

99
00:08:48,240 --> 00:08:48,690
That's it.

100
00:08:48,990 --> 00:08:49,590
That's all for this video.