In this video, we are going to learn how to implement AdaBoost. Note that AdaBoost, or adaptive boosting, can only be done on classification trees in R. It cannot be done on regression trees, as of now.

To do AdaBoost, we use a package called adabag (a-d-a-b-a-g). So we'll first install this package.

The package is downloaded and successfully installed. You can see in the Packages pane that this package is now available in the library. Now it will be activated by running the library command.

This package is now active and ready to be used. Since we are going to do classification, we'll be using the classification training set, which is trainC, and we will test the performance on the classification test set, which is testC.

The first thing we have to do is change the response variable, that is Start_Tech_Oscar, to a factor type of variable. If you open the trainC data set, this last variable is numeric, with the values zero and one. But when we are running a classification model, this has to be a factor type, that is, categorical, so that R knows the output variable is a categorical variable. So we need to change this numeric variable to factor type. For that, we will run this command: trainC$Start_Tech_Oscar gets the value as.factor of this same variable. Let me run this.
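The setup steps described above can be sketched as follows. This is a minimal sketch: adabag is the package named in the video, but trainC and testC are the course's own data frames, so a small toy frame with a hypothetical Budget column stands in for them here.

```r
# install.packages("adabag")   # one-time install of the adabag package
# library(adabag)              # load it into the session

# Toy stand-in for the course's trainC data frame
trainC <- data.frame(Budget = c(30, 55, 12, 80),
                     Start_Tech_Oscar = c(0, 1, 1, 0))  # numeric 0/1

# The response must be categorical, so convert numeric -> factor in place
trainC$Start_Tech_Oscar <- as.factor(trainC$Start_Tech_Oscar)

class(trainC$Start_Tech_Oscar)   # "factor"
levels(trainC$Start_Tech_Oscar)  # "0" "1" -- a factor with two levels
```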
Now I open trainC again. If I hover over this column, it is a factor with two levels. This variable can now be used for training the model.

Now we have to train our model. For that, we use the boosting function in the adabag package. This boosting function will give us the model information, which will be stored in an adaboost variable. So this variable will be created, and it will hold the information of the boosted model.

In this boosting function, the first parameter we give is, again, the formula: Start_Tech_Oscar will be our dependent variable, and after the tilde sign come all the independent variables, for which we give a dot. The data is going to be trainC.

There is also a parameter called boos, which we set equal to TRUE. If this is TRUE, boosting will be done; that is, when we are taking a bootstrap sample to train the next tree, it will give more weight to the misclassified cases. If I open the help for this function, you can see there is a boos argument, which says that if TRUE, a bootstrap sample of the training set is drawn using the weights for each observation on that iteration. Since we want the weights to be considered, we will keep boos equal to TRUE.
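The training call just described can be sketched as below, assuming library(adabag) has been loaded and trainC is the classification training set with Start_Tech_Oscar already converted to a factor.

```r
# Train an AdaBoost model on the classification training set
adaboost <- boosting(Start_Tech_Oscar ~ .,  # response ~ all other columns
                     data = trainC,         # classification training set
                     boos = TRUE)           # draw a weighted bootstrap sample
                                            # on each iteration, so misclassified
                                            # cases get more weight
```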
There are other parameters as well which are not mandatory, but you can go through them in the help section. Let me run this command.

Now I have this adaboost variable, which contains the information of the adaptive boosted model.

Next, using the information in this model, I'm going to predict the values on the test data, and these predicted values will be used to see the prediction accuracy. So I use, again, the predict function. The first parameter will be the model, and the second parameter will be the test set. So this predadab variable will get the predicted values from this model. Let me run this.

Now there is a predadab variable. If I click on it, you can see it has several parts: one is the formula, one is votes, then probability, and so on. We are interested in this particular one, which holds the predicted class. So if I want to access this, I write predadab$class. This is what I do in the next part.

I will create a confusion matrix with the table function, using predadab$class, which holds the predicted values. These will be matched against testC$Start_Tech_Oscar; those are the actual values stored in this variable.
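The prediction and confusion-matrix steps above can be sketched as follows, assuming the adaboost model and the testC data frame from the video; predadab is simply the variable name used here for the prediction object.

```r
# Predict on the test set; predict() on a boosting model returns a list
# with components such as formula, votes, prob, class, confusion and error
predadab <- predict(adaboost, testC)

predadab$class  # the predicted class label for each test observation

# Confusion matrix: predicted values (rows) vs. actual values (columns)
table(predadab$class, testC$Start_Tech_Oscar)
```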
When I run this, on the rows I have the predicted values and on the columns I have the actual values. The diagonal values are the correctly classified observations. So we have twenty-nine plus forty-one correctly classified observations, which is nearly 70 out of 113 total observations; our test set had 113 observations in total. If you want to check the prediction accuracy, divide 70 by 113, and it is nearly sixty-two percent.

We can change some of the parameters here to get a different prediction accuracy. If you look at the help, there is this mfinal argument. It is the number of iterations for which boosting is run; by default it is a hundred. If I put mfinal equal to 1000, train this model again, predict the values again, and create this table again, then by increasing the value of mfinal to one thousand, which was earlier one hundred, you can see that the prediction accuracy is now 77 out of 113. If we check, 77 divided by 113 is about sixty-eight percent. So you can see that by changing the parameters of the boosting function, you can increase the prediction accuracy of this model.

There is also an option to plot any tree that AdaBoost created throughout the iteration process.
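The accuracy arithmetic and the mfinal change described above look like this in R. The confusion-matrix counts (29 + 41 correct out of 113) are the ones quoted in the video; the retraining line assumes the same trainC data set.

```r
# Accuracy from the confusion-matrix diagonal: 29 + 41 = 70 correct of 113
accuracy <- (29 + 41) / 113
round(accuracy, 3)  # roughly 0.619, i.e. nearly 62 percent

# Retrain with more boosting iterations (the default mfinal is 100)
# adaboost2 <- boosting(Start_Tech_Oscar ~ ., data = trainC,
#                       boos = TRUE, mfinal = 1000)

# With mfinal = 1000 the video reports 77 of 113 correct
round(77 / 113, 3)  # roughly 0.681, i.e. about 68 percent
```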
If you want to plot the first tree, you can store that tree in some variable and then plot that variable to see the tree. This is what the tree looks like, but it does not have any labelling on it; that is, which variable was used to split and what the splitting value was are not written. For that, we will run the text command. You can see the text is now here, but it is still not looking good. If you zoom in, it comes out to be something like this; although not very readable, you can still get some idea. So the trees created by AdaBoost are also viewable, and you can draw them using the plot function.

So this is how we create an AdaBoost model. Remember a few things. One, it can be done only on classification trees. Second, the dependent variable, or the variable that you want to predict, should be in factor format. I showed you earlier that our variable was in numeric format; it had the values 0 and 1. With the same values, it can be changed to factor as well, so use as.factor to change the training response variable values to factor. Then you can train the model, predict the values, and find the confusion matrix in the same way that we used to do with the other methods.
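The tree-plotting steps walked through above can be sketched as below, assuming the adaboost object from this video; its trees component is a list of rpart trees, one per boosting iteration.

```r
# Pull out the first of the boosted trees and draw it
t1 <- adaboost$trees[[1]]  # an rpart tree
plot(t1, margin = 0.1)     # draw the tree skeleton (no labels yet)
text(t1)                   # add split variables and split values as labels
```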
You should definitely open the help for the boosting function, check out the different arguments, try out different values for them, and see what impact changing the value of these parameters has on the accuracy or the time taken to run the program, and so on. That's it. This is how we do AdaBoost in R.