Now we are going to discuss some very powerful prediction models based on trees. These methods are also called ensemble methods. Ensemble basically means taking a group of things instead of individual things. These are the three methods that we'll be discussing: bagging, random forest, and boosting.

The concept behind using ensemble methods is simple. The problem with decision trees is that decision trees have high variance. This means that if I split my training data into two halves and train two models using these two sets of training data, the two models that you get can be very different.

So using my movie dataset, I created two parts of the dataset and trained a regression tree on each part. And I am getting two quite different regression trees from those two halves. You can see that for this tree, the first cut is made on the budget variable at a value of 37,000, whereas for the other tree the first cut is made on trailer views at a value of 447. Similarly, the next nodes are also different. Basically, the whole tree is different.

So since the output of a decision tree model varies highly with the training set that we use, we say that decision trees have high variance. So this is the problem: our decision tree has high variance.
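This instability is easy to see even with the simplest possible tree, a one-split "stump". The sketch below is a minimal plain-Python illustration, with a made-up one-feature dataset standing in for the movie data: it fits the best single cut on each half of the same data, and the two chosen cut points land in different places.

```python
import random

def best_split(xs, ys):
    """Find the threshold on x that minimizes the total squared error
    when each side of the split is predicted by its own mean."""
    best_t, best_sse = None, float("inf")
    for t in sorted(set(xs))[1:]:                # candidate cut points
        left  = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        sse = sum((y - sum(left) / len(left)) ** 2 for y in left) \
            + sum((y - sum(right) / len(right)) ** 2 for y in right)
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t

random.seed(0)
x = [random.uniform(0, 100) for _ in range(40)]
y = [xi + random.gauss(0, 25) for xi in x]       # noisy linear relation

# Train the same kind of model on two halves of the training data.
t1 = best_split(x[:20], y[:20])
t2 = best_split(x[20:], y[20:])
print(t1, t2)   # the two "first cuts" land at different values
```

The same effect shows up in full-grown trees: change the training sample a little and the whole sequence of splits can change.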
And a natural way to reduce the variance, and hence increase the prediction accuracy, is to take many training sets from the population, then build a separate prediction model using each training set. Finally, we average the resulting predictions to get the final prediction.

So you can see here, instead of using one training set, I used multiple training sets. Using each training set, I create a different model, just like the model that we created earlier. Using those different models, I predict the outcome for a given observation. So suppose one tree says that the predicted value will be 40,000, and another tree says the predicted value will be 30,000, and so on. The final value of my prediction will be the average of all these predictions.

So instead of using one training set, we are using a group of training sets. That is why this method is called an ensemble method. And this particular technique, in which we are using all the variables but multiple training sets, is called bagging.

This technique is based on the concept that if I have n observations with variance sigma squared, the variance of the mean of these observations will be sigma squared divided by n. Meaning that when I average a set of observations, it reduces the variance. Of course, practically, we do not have access to multiple training sets.
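The sigma-squared-over-n fact is easy to check numerically. This quick plain-Python simulation (the sigma = 10 and n = 25 values are just illustrative choices) compares the variance of single observations with the variance of means of n observations:

```python
import random
import statistics

random.seed(42)
sigma = 10.0
n = 25            # number of observations averaged together

# Many single observations, and many means of n observations each.
singles = [random.gauss(0, sigma) for _ in range(20000)]
means = [statistics.fmean(random.gauss(0, sigma) for _ in range(n))
         for _ in range(20000)]

var_single = statistics.pvariance(singles)  # close to sigma^2     = 100
var_mean = statistics.pvariance(means)      # close to sigma^2 / n = 4
print(var_single, var_mean)
```

Averaging 25 observations cuts the variance by roughly a factor of 25, which is exactly why averaging many trees gives a more stable prediction than any single tree.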
So instead, we use bootstrapping to create multiple samples from a single training dataset.

Let us understand how bootstrapping helps us create multiple samples using the same dataset. Suppose I have this small dataset of five numbers: seven, nine, five, four, three. I want to create three samples, or three training sets, from this single training set. The method is: I randomly choose one number out of these five numbers and add it to my sample one. Then again, I choose randomly from these five numbers and put it in my sample one. I continue this process until I get five numbers in my sample.

You can see that my sample has repetition. Since I am choosing from the five numbers multiple times, I am bound to get some numbers more than one time. So in my first sample, I missed out on seven, but I got four two times. Similarly, in sample two I get seven two times, and in sample three I get nine two times. If I continue making samples, there will be some samples which will have some number three times or even four times.

So basically, in bootstrapping, we are just picking out observations from the same training set.
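The resampling procedure just described is one line of code: sampling with replacement. A small sketch using the five numbers from the example (the seed is fixed only so the output is repeatable):

```python
import random

data = [7, 9, 5, 4, 3]          # the small training set from the example
random.seed(1)                  # fixed seed for a repeatable illustration

# Each bootstrap sample: draw 5 values WITH replacement from the data.
samples = [random.choices(data, k=len(data)) for _ in range(3)]
for i, s in enumerate(samples, start=1):
    print(f"sample {i}: {s}")   # repeats are expected; some values drop out
```

Because each draw puts the number "back", repeats within a sample are inevitable, and on average about a third of the original observations are left out of any given bootstrap sample.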
With repetition of observations, we are able to create multiple training sets out of the same single training set.

Now, when we are growing trees in the bagging technique, we allow the trees to grow fully. That is, we do not prune the trees. Hence, each individual tree has high variance but low bias. Averaging these trees reduces the variance, so overall, our prediction accuracy increases.

Averaging works in case we have regression trees. What should we do when we are building classification trees? I think you know the answer. In case of classification trees, for a given test observation, we can find out the class predicted by each of the different trees created using the bagged data. And then we take a majority vote: that is, the overall prediction is the most commonly occurring class among the predictions of our different tree models.

In the next video, we will learn how to implement the bagging technique in our software.
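Combining the bagged trees' outputs is just an average for regression and a majority vote for classification. A tiny sketch with hypothetical, made-up predictions from five bagged trees for one test observation:

```python
from collections import Counter
from statistics import fmean

# Hypothetical predictions from 5 bagged trees for one test observation.
reg_preds = [40000, 30000, 35000, 42000, 33000]    # regression trees
cls_preds = ["hit", "flop", "hit", "hit", "flop"]  # classification trees

bagged_regression = fmean(reg_preds)                    # average
bagged_class = Counter(cls_preds).most_common(1)[0][0]  # majority vote
print(bagged_regression, bagged_class)  # 36000.0 hit
```

Three of the five trees vote "hit", so "hit" is the bagged classification, while the bagged regression prediction is simply the mean of the five numbers.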