Till now, we have been discussing regression trees. That is, we were trying to predict a continuous quantitative variable, like the collections of a movie.

Now we are going to discuss classification trees, in which we will try to predict a categorical variable, such as whether the movie will win an award or not.

Although the output will look similar, there are a few minor differences in the model at the back end.

First of all, at the leaf node in regression trees, we find the mean of the response to get the predicted value. But for classification trees, we will be using the mode. That is, we assign to a region the class which is the most commonly occurring class in that region.

For example, suppose we are classifying whether a student will pass or fail, based on the number of hours that the student studies. Say we had a population of 10 students. Of these ten, five students study more than five hours and five students study less than five hours. Out of the first five, four passed, and in the other group of five students, only two passed.
So in the first region, since four out of five passed, we will predict that students in this region will pass the exam. And in the second region, since only two out of five passed and three students failed, we will say that students belonging to this region will fail.

So instead of using the mean to predict the outcome in a particular region, we will use the mode to predict the outcome.

The process of growing a classification tree is similar to growing a regression tree. We use recursive binary splitting here also. However, in regression trees, we chose the split which gave us the minimum RSS. In classification trees, obviously, we cannot use RSS. So there are several other possible criteria for making the split.

One natural, logical criterion is the classification error rate.

Let me explain the steps that will be involved in this process. First, we will consider all the variables and all possible split values. After each split, for each region, we will assign a class to that region, just as we assigned a class of pass to the first region and a class of fail to the second region. In this way, we will assign a class to each region.

Then we will simply find the fraction of training observations in that region that do not belong to the most common class.
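As a quick sketch, the mode-based prediction at a leaf can be written in a few lines of Python. The labels reproduce the ten-student example from the lecture, and the helper name `leaf_prediction` is my own invention for illustration:

```python
from collections import Counter

def leaf_prediction(labels):
    """Predict the most commonly occurring class (the mode) among the
    training observations that fall into a leaf."""
    return Counter(labels).most_common(1)[0][0]

# Region 1: students who studied more than five hours (4 passed, 1 failed).
region1 = ["pass", "pass", "pass", "pass", "fail"]
# Region 2: students who studied five hours or less (2 passed, 3 failed).
region2 = ["pass", "pass", "fail", "fail", "fail"]

print(leaf_prediction(region1))  # the majority class of region 1
print(leaf_prediction(region2))  # the majority class of region 2
```

Note that when two classes are tied, `Counter.most_common` returns the one encountered first, so in practice ties are broken arbitrarily.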
So in the first region, there is one observation, in which the student failed, that does not match the predicted value, which is pass for this region. And the two students who passed in the second region do not match the predicted outcome of that region, which is fail. So three students out of the total 10 students are misclassified.

So the classification error rate in this situation will be 30 percent.

So for each variable and each split, we will find the classification error rate, and whichever split gives us the minimum classification error rate, we will keep that split.

However, it turns out that the classification error rate is not sufficiently sensitive for tree growing. So in practice, we have two other measures which are more preferred.

One is the Gini index, which is depicted by this formula.

Let me explain the intuition behind the Gini index. Suppose we are classifying into two classes, pass and fail. Now, let p be the probability of pass in a particular region, or at a particular node. If the value of p is very small, say zero, then the node is very pure. That is, it has most of the observations belonging to only one class, and the Gini index value is small. The other scenario is when p is large. At the other extreme, when p is one, all the observations belong to the pass category.
Again, there is high purity at this node, and the value of the Gini index again comes out to be zero, because here we will get one minus one, which is equal to zero, in this term.

So you can see that a small value of the Gini index is an indicator of node purity. That is, it indicates that a particular node contains predominantly observations of a single class only.

And then there is the other measure, cross-entropy, which is calculated using this formula. It is numerically similar to the Gini index, and it also takes a small value when the node is pure.

So in practice, when we are building a classification tree, the Gini index or cross-entropy is preferred, as these are more sensitive to node purity.
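To make the comparison concrete, here is a small Python sketch of the three measures for the two-class case. The function names are my own; the formulas are the standard ones: error rate min(p, 1-p), Gini index 2p(1-p), and cross-entropy -p log p - (1-p) log(1-p).

```python
import math

def error_rate(p):
    """Classification error rate at a node: the fraction of observations
    not belonging to the majority class."""
    return min(p, 1 - p)

def gini(p):
    """Gini index for two classes: the sum of p_k * (1 - p_k) over both classes."""
    return p * (1 - p) + (1 - p) * p

def cross_entropy(p):
    """Cross-entropy (deviance); taken as 0 at a perfectly pure node."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

# At a perfectly pure node (p = 0 or p = 1), all three measures are zero.
for measure in (error_rate, gini, cross_entropy):
    print(measure(0.0), measure(1.0))

# Region 1 of the lecture example: 4 of 5 students passed, so p = 0.8,
# and one student in five is misclassified.
print(error_rate(0.8), gini(0.8))

# Pooling both regions: (1 + 2) misclassified out of 10 gives the
# 30 percent error rate mentioned in the lecture.
print((1 + 2) / 10)
```

Both the Gini index and cross-entropy change smoothly as p moves away from 0 or 1, which is why they respond to improvements in node purity that leave the majority class, and hence the error rate, unchanged.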