Till now, we have been discussing regression trees. That is, we were trying to predict a continuous quantitative variable, like the collections of a movie.

Now we are going to discuss classification trees, in which we will try to predict a categorical variable, such as whether the movie will win an award or not.

Although the output will look similar, there are a few minor differences in the model at the back end.

First of all, at the leaf node in regression trees, we find the mean of the response to get the predicted value. But for classification trees, we will be using the mode. That is, we assign to a region the class which is the most commonly occurring class in that region.

For example, suppose we are classifying whether a student will pass or fail, based on the number of hours that the student studies. Say we had a population of 10 students. Of these ten, five students study more than five hours and five students study less than five hours. Out of the first five, four passed, and in the other group of five students, only two passed.
So in the first region, since four out of five passed, we will predict that students in this region will pass the exam. And in the second region, since only two out of five passed and three students failed, we will say that students belonging to this region will fail.

So instead of using the mean to predict the outcome in a particular region, we will use the mode to predict the outcome.

The process of growing a classification tree is similar to growing a regression tree. We use recursive binary splitting here also. However, in regression trees, we chose the split which gave us the minimum RSS. In classification trees, obviously, we cannot use RSS. So there are several other possible criteria for making the split.

One natural, logical criterion is the classification error rate.

Let me explain the steps that will be involved in this process. First, we will consider all the variables and all possible split values. After each split, for each region, we will assign a class to that region, just as we assigned a class of pass to the first region and a class of fail to the second region. In this way, we will assign a class to each region.

Then we will simply find the fraction of training observations in that region that do not belong to the most common class.
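As a quick sketch, the mode-based prediction at a leaf can be written in a few lines of Python. The labels reproduce the ten-student example from the lecture, and the helper name `leaf_prediction` is my own invention for illustration:

```python
from collections import Counter

def leaf_prediction(labels):
    """Predict the most commonly occurring class (the mode) among the
    training observations that fall into a leaf."""
    return Counter(labels).most_common(1)[0][0]

# Region 1: students who studied more than five hours (4 passed, 1 failed).
region1 = ["pass", "pass", "pass", "pass", "fail"]
# Region 2: students who studied five hours or less (2 passed, 3 failed).
region2 = ["pass", "pass", "fail", "fail", "fail"]

print(leaf_prediction(region1))  # the majority class of region 1
print(leaf_prediction(region2))  # the majority class of region 2
```

Note that when two classes are tied, `Counter.most_common` returns the one encountered first, so in practice ties are broken arbitrarily.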
So in the first region, there is one observation, in which the student failed, that does not match the predicted value, which is pass for this region. And the two students who passed in the second region do not match the predicted outcome of that region, which is fail. So three students out of the total 10 students are misclassified.

So the classification error rate in this situation will be 30 percent.

So for each variable and each split, we will find the classification error rate, and whichever split gives us the minimum classification error rate, we will keep that split.

However, it turns out that the classification error rate is not sufficiently sensitive for tree growing. So in practice, we have two other measures which are more preferred.

One is the Gini index, which is depicted by this formula.

Let me explain the intuition behind the Gini index. Suppose we are classifying into two classes, pass and fail. Now, let p be the probability of pass in a particular region, or at a particular node. If the value of p is very small, say zero, then the node is very pure. That is, it has most of the observations belonging to only one class, and the Gini index value is small. The other scenario is when p is large. At the other extreme, when p is one, all the observations belong to the pass category.
Again, there is high purity at this node, and the value of the Gini index again comes out to be zero, because here we will get one minus one, which is equal to zero, in this term.

So you can see that a small value of the Gini index is an indicator of node purity. That is, it indicates that a particular node contains predominantly observations of a single class only.

And then there is the other measure, cross-entropy, which is calculated using this formula. It is numerically similar to the Gini index, and it also takes a small value when the node is pure.

So in practice, when we are building a classification tree, the Gini index or cross-entropy is preferred, as these are more sensitive to node purity.
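To make the comparison concrete, here is a small Python sketch of the three measures for the two-class case. The function names are my own; the formulas are the standard ones: error rate min(p, 1-p), Gini index 2p(1-p), and cross-entropy -p log p - (1-p) log(1-p).

```python
import math

def error_rate(p):
    """Classification error rate at a node: the fraction of observations
    not belonging to the majority class."""
    return min(p, 1 - p)

def gini(p):
    """Gini index for two classes: the sum of p_k * (1 - p_k) over both classes."""
    return p * (1 - p) + (1 - p) * p

def cross_entropy(p):
    """Cross-entropy (deviance); taken as 0 at a perfectly pure node."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

# At a perfectly pure node (p = 0 or p = 1), all three measures are zero.
for measure in (error_rate, gini, cross_entropy):
    print(measure(0.0), measure(1.0))

# Region 1 of the lecture example: 4 of 5 students passed, so p = 0.8,
# and one student in five is misclassified.
print(error_rate(0.8), gini(0.8))

# Pooling both regions: (1 + 2) misclassified out of 10 gives the
# 30 percent error rate mentioned in the lecture.
print((1 + 2) / 10)
```

Both the Gini index and cross-entropy change smoothly as p moves away from 0 or 1, which is why they respond to improvements in node purity that leave the majority class, and hence the error rate, unchanged.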