In this video, we are going to learn how to implement AdaBoost. Note that AdaBoost, or adaptive boosting, can only be done on classification trees in R. It cannot be done on regression trees, as of now.

To do AdaBoost, we use a package called adabag (a-d-a-b-a-g). So we'll first install this package.

The package is downloaded and successfully installed. You can see in the Packages pane that this package is now available in the library. Now it will be activated by running the library command.

This package is now active and ready to be used. Since we are going to do classification, we'll be using the classification training set, which is trainC, and we will test the performance on the classification test set, which is testC.

The first thing we have to do is change the response variable, that is Start_Tech_Oscar, to a factor type of variable. If you open the trainC data set, this last variable is numeric, with the values zero and one. But when we are running a classification model, this has to be a factor type, that is, categorical, so that R knows the output variable is a categorical variable. So we need to change this numeric variable to factor type. For that, we will run this command: trainC$Start_Tech_Oscar gets the value as.factor of this same variable. Let me run this.
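The setup steps described above can be sketched as follows. This is a minimal sketch: adabag is the package named in the video, but trainC and testC are the course's own data frames, so a small toy frame with a hypothetical Budget column stands in for them here.

```r
# install.packages("adabag")   # one-time install of the adabag package
# library(adabag)              # load it into the session

# Toy stand-in for the course's trainC data frame
trainC <- data.frame(Budget = c(30, 55, 12, 80),
                     Start_Tech_Oscar = c(0, 1, 1, 0))  # numeric 0/1

# The response must be categorical, so convert numeric -> factor in place
trainC$Start_Tech_Oscar <- as.factor(trainC$Start_Tech_Oscar)

class(trainC$Start_Tech_Oscar)   # "factor"
levels(trainC$Start_Tech_Oscar)  # "0" "1" -- a factor with two levels
```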
Now I open trainC again. If I hover over this column, it is a factor with two levels. This variable can now be used for training the model.

Now we have to train our model. For that, we use the boosting function in the adabag package. This boosting function will give us the model information, which will be stored in an adaboost variable. So this variable will be created, and it will hold the information of the boosted model.

In this boosting function, the first parameter we give is, again, the formula: Start_Tech_Oscar will be our dependent variable, and after the tilde sign come all the independent variables, for which we give a dot. The data is going to be trainC.

There is also a parameter called boos, which we set equal to TRUE. If this is TRUE, boosting will be done; that is, when we are taking a bootstrap sample to train the next tree, it will give more weight to the misclassified cases. If I open the help for this function, you can see there is a boos argument, which says that if TRUE, a bootstrap sample of the training set is drawn using the weights for each observation on that iteration. Since we want the weights to be considered, we will keep boos equal to TRUE.
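The training call just described can be sketched as below, assuming library(adabag) has been loaded and trainC is the classification training set with Start_Tech_Oscar already converted to a factor.

```r
# Train an AdaBoost model on the classification training set
adaboost <- boosting(Start_Tech_Oscar ~ .,  # response ~ all other columns
                     data = trainC,         # classification training set
                     boos = TRUE)           # draw a weighted bootstrap sample
                                            # on each iteration, so misclassified
                                            # cases get more weight
```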
There are other parameters as well which are not mandatory, but you can go through them in the help section. Let me run this command.

Now I have this adaboost variable, which contains the information of the adaptive boosted model.

Next, using the information in this model, I'm going to predict the values on the test data, and these predicted values will be used to see the prediction accuracy. So I use, again, the predict function. The first parameter will be the model, and the second parameter will be the test set. So this predadab variable will get the predicted values from this model. Let me run this.

Now there is a predadab variable. If I click on it, you can see it has several parts: one is the formula, one is votes, then probability, and so on. We are interested in this particular one, which holds the predicted class. So if I want to access this, I write predadab$class. This is what I do in the next part.

I will create a confusion matrix with the table function, using predadab$class, which holds the predicted values. These will be matched against testC$Start_Tech_Oscar; those are the actual values stored in this variable.
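The prediction and confusion-matrix steps above can be sketched as follows, assuming the adaboost model and the testC data frame from the video; predadab is simply the variable name used here for the prediction object.

```r
# Predict on the test set; predict() on a boosting model returns a list
# with components such as formula, votes, prob, class, confusion and error
predadab <- predict(adaboost, testC)

predadab$class  # the predicted class label for each test observation

# Confusion matrix: predicted values (rows) vs. actual values (columns)
table(predadab$class, testC$Start_Tech_Oscar)
```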
When I run this, on the rows I have the predicted values and on the columns I have the actual values. The diagonal values are the correctly classified observations. So we have twenty-nine plus forty-one correctly classified observations, which is nearly 70 out of 113 total observations; our test set had 113 observations in total. If you want to check the prediction accuracy, divide 70 by 113, and it is nearly sixty-two percent.

We can change some of the parameters here to get a different prediction accuracy. If you look at the help, there is this mfinal argument. It is the number of iterations for which boosting is run; by default it is a hundred. If I put mfinal equal to 1000, train this model again, predict the values again, and create this table again, then by increasing the value of mfinal to one thousand, which was earlier one hundred, you can see that the prediction accuracy is now 77 out of 113. If we check, 77 divided by 113 is about sixty-eight percent. So you can see that by changing the parameters of the boosting function, you can increase the prediction accuracy of this model.

There is also an option to plot any tree that AdaBoost created throughout the iteration process.
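The accuracy arithmetic and the mfinal change described above look like this in R. The confusion-matrix counts (29 + 41 correct out of 113) are the ones quoted in the video; the retraining line assumes the same trainC data set.

```r
# Accuracy from the confusion-matrix diagonal: 29 + 41 = 70 correct of 113
accuracy <- (29 + 41) / 113
round(accuracy, 3)  # roughly 0.619, i.e. nearly 62 percent

# Retrain with more boosting iterations (the default mfinal is 100)
# adaboost2 <- boosting(Start_Tech_Oscar ~ ., data = trainC,
#                       boos = TRUE, mfinal = 1000)

# With mfinal = 1000 the video reports 77 of 113 correct
round(77 / 113, 3)  # roughly 0.681, i.e. about 68 percent
```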
If you want to plot the first tree, you can store that tree in some variable and then plot that variable to see the tree. This is what the tree looks like, but it does not have any labelling on it; that is, which variable was used to split and what the splitting value was are not written. For that, we will run the text command. You can see the text is now here, but it is still not looking good. If you zoom in, it comes out to be something like this; although not very readable, you can still get some idea. So the trees created by AdaBoost are also viewable, and you can draw them using the plot function.

So this is how we create an AdaBoost model. Remember a few things. One, it can be done only on classification trees. Second, the dependent variable, or the variable that you want to predict, should be in factor format. I showed you earlier that our variable was in numeric format; it had the values 0 and 1. With the same values, it can be changed to factor as well, so use as.factor to change the training response variable values to factor. Then you can train the model, predict the values, and find the confusion matrix in the same way that we used to do with the other methods.
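The tree-plotting steps walked through above can be sketched as below, assuming the adaboost object from this video; its trees component is a list of rpart trees, one per boosting iteration.

```r
# Pull out the first of the boosted trees and draw it
t1 <- adaboost$trees[[1]]  # an rpart tree
plot(t1, margin = 0.1)     # draw the tree skeleton (no labels yet)
text(t1)                   # add split variables and split values as labels
```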
You should definitely open the help for the boosting function, check out the different arguments, try out different values for them, and see what impact changing the value of these parameters has on the accuracy or the time taken to run the program, and so on. That's it. This is how we do AdaBoost in R.