Now we are going to discuss some very powerful prediction models based on trees. These methods are also called ensemble methods. Ensemble basically means taking a group of things instead of individual things. These are the three methods that we'll be discussing: bagging, random forest, and boosting.

The concept behind using ensemble methods is simple. The problem with decision trees is that decision trees have high variance. This means that if I split my training data into two halves and train two models using these two sets of training data, the two models that you get can be very different.

So using my movie dataset, I created two parts of the dataset and trained a regression tree on each part. And I am getting two quite different regression trees from those two halves. You can see that for this tree, the first cut is made on the budget variable at a value of 37,000, whereas for the other tree the first cut is made on trailer views at a value of 447. Similarly, the next nodes are also different. Basically, the whole tree is different.

So since the output of a decision tree model varies highly with the training set that we use, we say that decision trees have high variance. So this is the problem: our decision tree has high variance.
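This instability is easy to see even with the simplest possible tree, a one-split "stump". The sketch below is a minimal plain-Python illustration, with a made-up one-feature dataset standing in for the movie data: it fits the best single cut on each half of the same data, and the two chosen cut points land in different places.

```python
import random

def best_split(xs, ys):
    """Find the threshold on x that minimizes the total squared error
    when each side of the split is predicted by its own mean."""
    best_t, best_sse = None, float("inf")
    for t in sorted(set(xs))[1:]:                # candidate cut points
        left  = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        sse = sum((y - sum(left) / len(left)) ** 2 for y in left) \
            + sum((y - sum(right) / len(right)) ** 2 for y in right)
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t

random.seed(0)
x = [random.uniform(0, 100) for _ in range(40)]
y = [xi + random.gauss(0, 25) for xi in x]       # noisy linear relation

# Train the same kind of model on two halves of the training data.
t1 = best_split(x[:20], y[:20])
t2 = best_split(x[20:], y[20:])
print(t1, t2)   # the two "first cuts" land at different values
```

The same effect shows up in full-grown trees: change the training sample a little and the whole sequence of splits can change.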
And a natural way to reduce the variance, and hence increase the prediction accuracy, is to take many training sets from the population, then build a separate prediction model using each training set. Finally, we average the resulting predictions to get the final prediction.

So you can see here, instead of using one training set, I used multiple training sets. Using each training set, I create a different model, just like the model that we created earlier. Using those different models, I predict the outcome for a given observation. So suppose one tree says that the predicted value will be 40,000, and another tree says the predicted value will be 30,000, and so on. The final value of my prediction will be the average of all these predictions.

So instead of using one training set, we are using a group of training sets. That is why this method is called an ensemble method. And this particular technique, in which we are using all the variables but multiple training sets, is called bagging.

This technique is based on the concept that if I have n observations with variance sigma squared, the variance of the mean of these observations will be sigma squared divided by n. Meaning that when I average a set of observations, it reduces the variance. Of course, practically, we do not have access to multiple training sets.
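The sigma-squared-over-n fact is easy to check numerically. This quick plain-Python simulation (the sigma = 10 and n = 25 values are just illustrative choices) compares the variance of single observations with the variance of means of n observations:

```python
import random
import statistics

random.seed(42)
sigma = 10.0
n = 25            # number of observations averaged together

# Many single observations, and many means of n observations each.
singles = [random.gauss(0, sigma) for _ in range(20000)]
means = [statistics.fmean(random.gauss(0, sigma) for _ in range(n))
         for _ in range(20000)]

var_single = statistics.pvariance(singles)  # close to sigma^2     = 100
var_mean = statistics.pvariance(means)      # close to sigma^2 / n = 4
print(var_single, var_mean)
```

Averaging 25 observations cuts the variance by roughly a factor of 25, which is exactly why averaging many trees gives a more stable prediction than any single tree.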
So instead, we use bootstrapping to create multiple samples from a single training dataset.

Let us understand how bootstrapping helps us create multiple samples using the same dataset. Suppose I have this small dataset of five numbers: seven, nine, five, four, three. I want to create three samples, or three training sets, from this single training set. The method is: I randomly choose one number out of these five numbers and add it to my sample one. Then again, I choose randomly from these five numbers and put it in my sample one. I continue this process until I get five numbers in my sample.

You can see that my sample has repetition. Since I am choosing from the five numbers multiple times, I am bound to get some numbers more than one time. So in my first sample, I missed out on seven, but I got four two times. Similarly, in sample two I get seven two times, and in sample three I get nine two times. If I continue making samples, there will be some samples which will have some number three times or even four times.

So basically, in bootstrapping, we are just picking out observations from the same training set.
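The resampling procedure just described is one line of code: sampling with replacement. A small sketch using the five numbers from the example (the seed is fixed only so the output is repeatable):

```python
import random

data = [7, 9, 5, 4, 3]          # the small training set from the example
random.seed(1)                  # fixed seed for a repeatable illustration

# Each bootstrap sample: draw 5 values WITH replacement from the data.
samples = [random.choices(data, k=len(data)) for _ in range(3)]
for i, s in enumerate(samples, start=1):
    print(f"sample {i}: {s}")   # repeats are expected; some values drop out
```

Because each draw puts the number "back", repeats within a sample are inevitable, and on average about a third of the original observations are left out of any given bootstrap sample.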
With repetition of observations, we are able to create multiple training sets out of the same single training set.

Now, when we are growing trees in the bagging technique, we allow the trees to grow fully. That is, we do not prune the trees. Hence, each individual tree has high variance but low bias. Averaging these trees reduces the variance, so overall, our prediction accuracy increases.

Averaging works in case we have regression trees. What should we do when we are building classification trees? I think you know the answer. In case of classification trees, for a given test observation, we can find out the class predicted by each of the different trees created using the bagged data. And then we take a majority vote: that is, the overall prediction is the most commonly occurring class among the predictions of our different tree models.

In the next video, we will learn how to implement the bagging technique in our software.
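Combining the bagged trees' outputs is just an average for regression and a majority vote for classification. A tiny sketch with hypothetical, made-up predictions from five bagged trees for one test observation:

```python
from collections import Counter
from statistics import fmean

# Hypothetical predictions from 5 bagged trees for one test observation.
reg_preds = [40000, 30000, 35000, 42000, 33000]    # regression trees
cls_preds = ["hit", "flop", "hit", "hit", "flop"]  # classification trees

bagged_regression = fmean(reg_preds)                    # average
bagged_class = Counter(cls_preds).most_common(1)[0][0]  # majority vote
print(bagged_regression, bagged_class)  # 36000.0 hit
```

Three of the five trees vote "hit", so "hit" is the bagged classification, while the bagged regression prediction is simply the mean of the five numbers.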