1
00:00:01,200 --> 00:00:06,650
In this video, we are going to understand the steps taken to build a regression tree.

2
00:00:09,400 --> 00:00:14,560
The conceptual understanding that you will get from this lecture will help you answer your interview

3
00:00:14,590 --> 00:00:15,640
or viva questions.

4
00:00:16,270 --> 00:00:22,900
Plus, you'll be able to manipulate the decision tree and interpret its result much better than someone

5
00:00:23,140 --> 00:00:25,550
who just knows the code to make a decision tree.

6
00:00:27,330 --> 00:00:34,290
So as I showed you earlier in a decision tree, we are trying to create regions or segments.

7
00:00:35,680 --> 00:00:39,790
These segments have particular characteristics, such as here.

8
00:00:40,120 --> 00:00:46,930
We said that this region is a group of students who studied less than 10 hours.

9
00:00:48,530 --> 00:00:54,960
The next region is a group of students who studied more than 10 hours but scored less than sixty

10
00:00:54,960 --> 00:00:56,870
five marks in the midterms.

11
00:00:57,940 --> 00:00:58,540
And so on.

12
00:00:59,800 --> 00:01:07,240
Secondly, when we have these regions, we make a prediction for the response variable, which is usually

13
00:01:07,240 --> 00:01:10,510
the mean of the response value of observations in that region.

14
00:01:12,240 --> 00:01:18,870
So for the first region, we have five students and we are predicting that if a student studies less

15
00:01:18,870 --> 00:01:26,000
than 10 hours, that student will score the average of the scores of these five students, which is thirty

16
00:01:26,000 --> 00:01:26,820
nine marks.

17
00:01:28,520 --> 00:01:34,560
Similarly, if a student belongs to the second region, that is, the student studied more than 10 hours,

18
00:01:34,790 --> 00:01:41,120
but the midterm score is less than 65 marks, then that student will be scoring 70 marks.

19
00:01:42,380 --> 00:01:46,610
But the main question is, how do we decide these regions?

20
00:01:48,180 --> 00:01:50,820
Which variable should we pick for the first split?

21
00:01:51,480 --> 00:01:52,530
And at what value?

22
00:01:54,100 --> 00:02:02,110
How and why did we decide that first this hours variable will be used, and that too with a splitting value

23
00:02:02,110 --> 00:02:02,790
of 10 hours?

24
00:02:03,310 --> 00:02:04,570
And why not 14 hours?

25
00:02:06,990 --> 00:02:14,550
The answer is, we will pick the variable and the splitting value such that we get the minimum sum of squared

26
00:02:14,700 --> 00:02:14,930
error.

27
00:02:16,950 --> 00:02:18,340
This sum of squared error

28
00:02:18,580 --> 00:02:27,220
term is given by this formula, which is the actual value of the response variable in the observation minus

29
00:02:27,700 --> 00:02:31,100
the predicted value of response in that region.

30
00:02:32,400 --> 00:02:40,690
Then we square that term and then we add all such terms. The sigma symbol here means

31
00:02:40,840 --> 00:02:44,380
that we are adding; this is a summation of all such terms.

32
00:02:44,860 --> 00:02:51,160
So for all the observations, we are going to find the difference from that predicted value.

33
00:02:52,460 --> 00:02:57,500
And we are going to square it and we are going to add all those values for all the regions.
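Written out, the formula described here can be sketched as follows. The notation is an assumption, since the slide itself is not part of the transcript: R_j is the j-th region, J is the number of regions, y_i is the actual response of observation i, and \hat{y}_{R_j} is the predicted value (the mean response) in region R_j.

```latex
\mathrm{RSS} \;=\; \sum_{j=1}^{J} \; \sum_{i \in R_j} \left( y_i - \hat{y}_{R_j} \right)^2
```

The inner sum adds the squared differences within one region, and the outer sum adds those totals over all the regions.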

34
00:02:59,740 --> 00:03:03,010
And our variable and splitting value will be chosen

35
00:03:03,030 --> 00:03:06,170
such that the value of this term is minimum.

36
00:03:08,050 --> 00:03:15,040
If you know or remember from linear regression, this is very similar to the ordinary least squares method.

37
00:03:16,540 --> 00:03:19,480
Let us understand what this means for decision trees.

38
00:03:21,340 --> 00:03:25,810
Let us consider only the first cut for now, which is on this hours variable.

39
00:03:27,810 --> 00:03:29,100
So we have two regions.

40
00:03:30,030 --> 00:03:33,380
This is region one, which is for less than 10 hours.

41
00:03:33,630 --> 00:03:36,270
And this is region two, which is for more than 10 hours.

42
00:03:39,750 --> 00:03:40,590
In region one,

43
00:03:41,100 --> 00:03:42,600
We have five values.

44
00:03:43,200 --> 00:03:45,030
That is 50 percent of the population.

45
00:03:45,660 --> 00:03:46,840
These first five,

46
00:03:47,590 --> 00:03:51,360
where the hours value is less than 10, belong to the first region.

47
00:03:52,580 --> 00:03:56,360
And for these five values, we have a predicted value of thirty nine.

48
00:03:57,800 --> 00:03:59,690
Which is the mean score of this population.

49
00:04:01,110 --> 00:04:03,870
For region two, we have the other five values.

50
00:04:04,740 --> 00:04:10,320
These observations belong to Region two, and the predicted value for them is the average value, which

51
00:04:10,320 --> 00:04:11,130
is 75.

52
00:04:13,620 --> 00:04:17,100
So, as per the formula, we find the difference of

53
00:04:18,120 --> 00:04:23,550
the first value and 39, the mean of this region, and square it.

54
00:04:23,950 --> 00:04:25,120
And this is our first term.

55
00:04:26,430 --> 00:04:28,810
Then we do this for the second observation.

56
00:04:29,130 --> 00:04:31,290
We find the difference of 38 and 39.

57
00:04:31,500 --> 00:04:32,170
We square it.

58
00:04:32,310 --> 00:04:33,720
And this is our second error term.

59
00:04:35,460 --> 00:04:43,290
Then we find the difference of 40 and 39, square it to get the third error term, and so on. When we have all these

60
00:04:43,290 --> 00:04:50,100
error terms for all the regions, we add all those error terms to get the value of RSS.

61
00:04:50,240 --> 00:04:50,730
That is,

62
00:04:50,880 --> 00:04:52,320
the residual sum of squares.

63
00:04:54,890 --> 00:05:04,220
Now, instead of a value of 10 for hours, if we had a splitting value of 15, we'll have these seven,

64
00:05:05,060 --> 00:05:11,600
these seven observations in region one and these three in region two. The average of these seven

65
00:05:11,600 --> 00:05:13,040
will be taken as the mean score.

66
00:05:13,280 --> 00:05:19,270
And the average of these three will be taken as the mean score for region two. We will do this exercise again

67
00:05:19,850 --> 00:05:21,470
and find out the RSS value.

68
00:05:22,490 --> 00:05:25,940
And we will choose the RSS value which is lower out of these two.
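As a rough illustration of this comparison, here is a minimal Python sketch. The ten data points are hypothetical: they are only chosen to agree with the numbers mentioned in the lecture (region means of 39 and 75 for a cut at 10 hours, and a seven and three split for a cut at 15 hours). The function names region_rss and split_rss are made up for this example.

```python
def region_rss(scores):
    """Sum of squared differences between each score and the region mean."""
    if not scores:
        return 0.0
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores)


def split_rss(hours, scores, cut):
    """Total RSS when the data is split into hours < cut and hours >= cut."""
    left = [s for h, s in zip(hours, scores) if h < cut]
    right = [s for h, s in zip(hours, scores) if h >= cut]
    return region_rss(left) + region_rss(right)


# Hypothetical data, not taken from the lecture slides.
hours = [2, 4, 5, 7, 9, 11, 13, 16, 17, 18]
scores = [37, 38, 40, 39, 41, 72, 74, 75, 76, 78]

rss_10 = split_rss(hours, scores, 10)  # five students in each region
rss_15 = split_rss(hours, scores, 15)  # seven students and three students

print("RSS for a cut at 10 hours:", rss_10)
print("RSS for a cut at 15 hours:", rss_15)
print("Better cut:", 10 if rss_10 < rss_15 else 15)
```

With this made-up data, the cut at 10 hours keeps low scorers and high scorers in separate regions, so its RSS is much smaller than the RSS for the cut at 15 hours.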

69
00:05:27,830 --> 00:05:31,990
So basically the split is based on the value of RSS.

70
00:05:33,380 --> 00:05:38,960
We try out all possible variables and all possible splitting values of those variables.

71
00:05:39,680 --> 00:05:43,220
We find out the RSS and we choose the RSS which is minimum.

72
00:05:45,120 --> 00:05:51,110
So for this scenario, it turns out that hours is the first variable that we will choose.

73
00:05:51,720 --> 00:05:55,290
And we have a splitting value of 10 hours.

74
00:05:56,940 --> 00:05:59,850
For this combination of variable and splitting value,

75
00:06:00,170 --> 00:06:02,190
we get the minimum value of RSS.

76
00:06:07,000 --> 00:06:12,760
Now, ideally, we have to do this at all possible values of hours, and then we also have to

77
00:06:12,760 --> 00:06:14,620
consider the second variable.

78
00:06:15,910 --> 00:06:16,920
which is the midterm score.

79
00:06:17,770 --> 00:06:24,250
So it turns out that when we have a lot of training observations and a lot of variables, it becomes

80
00:06:24,250 --> 00:06:30,580
computationally infeasible to consider every possible partition and all such possible regions.

81
00:06:31,750 --> 00:06:36,700
That is why we take a top-down, greedy approach known as recursive binary splitting.

82
00:06:38,630 --> 00:06:46,220
The approach that we take is top down because we start at the top of the tree, that is all the observations

83
00:06:46,370 --> 00:06:51,500
belong to a single region in the beginning, and then we start making these splits.

84
00:06:53,740 --> 00:06:58,890
Now, each split separates the predictor space into two parts.

85
00:07:00,670 --> 00:07:02,680
This is why it is called binary splitting.

86
00:07:04,840 --> 00:07:12,040
It is greedy because at each step of the tree-building process, the best split at that particular step

87
00:07:12,640 --> 00:07:13,390
is considered.

88
00:07:13,840 --> 00:07:19,870
We do not look ahead or consider picking a split that will lead to a better tree later on.

89
00:07:20,560 --> 00:07:27,250
We consider only the current split and choose the split which gives us the minimum RSS.

90
00:07:30,490 --> 00:07:33,640
So I'll summarize the whole tree-building process.

91
00:07:35,010 --> 00:07:39,160
When our program is trying to build a decision tree, this is what is happening in the background.

92
00:07:40,780 --> 00:07:52,090
It considers all the predictors X1, X2, up to Xp one by one. Then all possible values of cut points for each

93
00:07:52,090 --> 00:07:55,720
variable are considered to divide the space into two regions.

94
00:07:57,640 --> 00:07:59,740
It calculates the squared error.

95
00:08:01,190 --> 00:08:08,720
It sums these squared error terms for all such possibilities and chooses the one with the least value of the sum

96
00:08:08,720 --> 00:08:09,660
of squared error.

97
00:08:12,860 --> 00:08:20,390
It continues to make splits like this, such that the resulting tree has the lowest RSS, until the

98
00:08:20,410 --> 00:08:22,220
stopping criterion is met.
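The steps just summarized can be sketched in plain Python as below. This is only an illustration under stated assumptions, not the code of any particular library: the function names are made up, the candidate cut points are taken as midpoints between consecutive distinct values, the stopping criterion is a hypothetical min_samples limit, and the data is hypothetical (each row is hours studied and midterm score).

```python
def mean(values):
    return sum(values) / len(values)


def rss(values):
    """Sum of squared differences from the region mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets):
    """Try every predictor and every cut point; keep the lowest total RSS."""
    best = None  # (total_rss, feature_index, cut)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Assumed cut points: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[j] < cut]
            right = [t for r, t in zip(rows, targets) if r[j] >= cut]
            total = rss(left) + rss(right)
            if best is None or total < best[0]:
                best = (total, j, cut)
    return best


def build_tree(rows, targets, min_samples=5):
    """Greedy, top-down recursive binary splitting."""
    # Assumed stopping criterion: do not split a region of min_samples or fewer.
    split = best_split(rows, targets) if len(rows) > min_samples else None
    if split is None:
        return {"prediction": mean(targets)}  # leaf predicts the region mean
    _, j, cut = split
    left = [i for i, r in enumerate(rows) if r[j] < cut]
    right = [i for i, r in enumerate(rows) if r[j] >= cut]
    return {
        "feature": j,
        "cut": cut,
        "left": build_tree([rows[i] for i in left],
                           [targets[i] for i in left], min_samples),
        "right": build_tree([rows[i] for i in right],
                            [targets[i] for i in right], min_samples),
    }


# Hypothetical data: (hours studied, midterm score) and final score.
rows = [(2, 50), (4, 70), (5, 45), (7, 66), (9, 55),
        (11, 60), (13, 62), (16, 68), (17, 72), (18, 80)]
scores = [37, 38, 40, 39, 41, 72, 74, 75, 76, 78]

tree = build_tree(rows, scores, min_samples=5)
print(tree)
```

With this made-up data and limit, the sketch makes one split on hours and then stops in both child nodes. A smaller min_samples would let it keep splitting, and each further split would again be chosen only by the RSS at that step, without looking ahead.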

99
00:08:23,220 --> 00:08:27,180
So in our example problem, we were predicting students' scores.

100
00:08:27,720 --> 00:08:29,430
The program considered

101
00:08:29,640 --> 00:08:30,920
all the variables first.

102
00:08:31,320 --> 00:08:33,830
Initially, we had 10 students' data.

103
00:08:34,590 --> 00:08:36,450
The average was coming out to be 57.

104
00:08:37,970 --> 00:08:47,450
It considered the variables hours and midterm, and all possible values of these hours and midterm variables.

105
00:08:47,630 --> 00:08:51,870
So it considered five, six, seven, eight, nine, 10,

106
00:08:51,950 --> 00:08:53,780
all such possible values of hours,

107
00:08:54,980 --> 00:08:59,110
and all such possible splitting values of the midterm score.

108
00:08:59,840 --> 00:09:07,370
It chose this particular variable, which is hours, and this splitting value, because at this step this was

109
00:09:07,370 --> 00:09:11,120
giving the minimum RSS. Once this split was made,

110
00:09:11,810 --> 00:09:19,850
it went to each of the child nodes. First to the left node, but it did not split it further because some

111
00:09:19,910 --> 00:09:21,850
stopping criteria was met here.

112
00:09:24,130 --> 00:09:28,000
Then it went to the right node. The stopping criterion was not met here.

113
00:09:28,690 --> 00:09:30,490
It again tried all the variables.

114
00:09:30,640 --> 00:09:32,290
That is, hours and midterm.

115
00:09:33,590 --> 00:09:38,390
It again tested all the possible splitting values for the hours and midterm variables.

116
00:09:40,300 --> 00:09:40,560
It got

117
00:09:41,000 --> 00:09:47,260
the RSS value for all such possibilities and chose the RSS which was minimum, which in

118
00:09:47,260 --> 00:09:49,770
this case came out to be the midterm variable,

119
00:09:50,290 --> 00:09:51,630
less than 65.

120
00:09:54,720 --> 00:09:57,120
All this happens in the background of the software.

121
00:09:58,220 --> 00:10:04,760
You just need to give the data, tell which is the one variable that you want to predict, which are the

122
00:10:04,760 --> 00:10:05,930
predictor variables,

123
00:10:06,620 --> 00:10:09,830
and what is the stopping criterion for that decision tree.

124
00:10:11,190 --> 00:10:16,740
Once you specify all these values, the tree can run and you get an output like this.