1 00:00:01,340 --> 00:00:09,140 As I told you earlier too, big decision trees with a lot of nodes and a lot of splits have two problems. 2 00:00:09,730 --> 00:00:16,460 They are difficult to interpret, and they overfit the training data, thus giving bad test set performance. 3 00:00:18,410 --> 00:00:23,400 Therefore, we decided to control the tree growth. In the last two lectures, 4 00:00:23,720 --> 00:00:30,980 we discussed some methods that stop the tree from growing when a particular condition is met. 5 00:00:33,140 --> 00:00:38,750 These strategies do succeed in giving us smaller and more interpretable trees. 6 00:00:40,480 --> 00:00:41,620 But they have a limitation. 7 00:00:42,850 --> 00:00:45,220 These strategies are short-sighted. 8 00:00:47,110 --> 00:00:49,780 Suppose we say that we will not make a split 9 00:00:50,170 --> 00:00:53,890 if it is giving me only a few units of reduction in MSE. 10 00:00:55,350 --> 00:00:56,950 For simplicity, let's say 11 00:00:58,320 --> 00:00:59,430 ten units of benefit. 12 00:01:00,420 --> 00:01:05,070 So if I get at least 10 units of benefit from the split, I will keep that split. 13 00:01:05,640 --> 00:01:08,010 Otherwise, I will not keep that split. 14 00:01:09,760 --> 00:01:13,730 Now, in this situation, we are at this node. 15 00:01:14,290 --> 00:01:21,640 If I make a split, I am getting a benefit of two units, and after this split, in this node, if I make 16 00:01:21,640 --> 00:01:24,880 a split, I am getting a benefit of twelve units. 17 00:01:26,710 --> 00:01:27,820 So in this situation, 18 00:01:29,150 --> 00:01:32,120 because the condition is not met at this node, 19 00:01:33,540 --> 00:01:38,850 tree growth will be stopped at this node only; this node will not be split further. 20 00:01:40,470 --> 00:01:47,130 But if we had made the split, the next split would have given us a benefit of twelve units. 21 00:01:48,240 --> 00:01:53,140 So in such a situation, when we predefine the constraint,
22 00:01:54,000 --> 00:02:00,570 we will never find out that this split was possible, because tree growth was stopped at this 23 00:02:00,570 --> 00:02:01,170 node only. 24 00:02:02,490 --> 00:02:07,650 Therefore, we need a solution where we do not miss out on such splits. 25 00:02:09,320 --> 00:02:12,150 This solution is called tree pruning. 26 00:02:14,030 --> 00:02:22,640 In this strategy, we grow a very large tree and then prune it, or you can say we cut some parts of 27 00:02:22,640 --> 00:02:27,950 it which are not beneficial, in order to obtain an optimal subtree. 28 00:02:29,480 --> 00:02:33,410 The optimal subtree is the one which has the lowest test error rate. 29 00:02:34,800 --> 00:02:40,290 We can use cross-validation to find out the test error rate of all such subtrees. 30 00:02:41,790 --> 00:02:43,650 But this can be computationally expensive. 31 00:02:44,010 --> 00:02:48,240 That is, it may take a very long time to run on the software. 32 00:02:50,280 --> 00:02:56,670 Therefore, we use a method called cost complexity pruning, or weakest link pruning. 33 00:02:58,180 --> 00:03:05,350 In this method, we add an additional cost for the number of terminal nodes in the tree to the RSS. 34 00:03:07,270 --> 00:03:15,010 So instead of minimizing RSS, we minimize RSS plus alpha times the number of terminal nodes. 35 00:03:16,780 --> 00:03:18,550 Here alpha is called the tuning 36 00:03:18,620 --> 00:03:20,870 parameter, or the complexity parameter. 37 00:03:23,740 --> 00:03:25,090 If alpha is zero, 38 00:03:25,840 --> 00:03:27,490 that is normal tree growth. 39 00:03:28,610 --> 00:03:32,540 We are minimizing the normal RSS, so we get normal tree growth. 40 00:03:34,550 --> 00:03:36,470 But as the value of alpha increases, 41 00:03:37,730 --> 00:03:41,000 there is an increasing penalty for having more splits. 42 00:03:45,260 --> 00:03:48,860 Therefore, the value of alpha controls tree growth.
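[Editor's sketch] The penalized cost described here, the tree's error (residual sum of squares, RSS) plus alpha times the number of terminal nodes, can be tried out on the two-unit and twelve-unit example from earlier in the lecture. This is only an illustration under assumptions: the starting RSS of 100 and the names `subtrees`, `penalized_cost` and `best_subtree` are hypothetical and do not come from the lecture.

```python
# Hypothetical numbers: only the 2-unit and 12-unit gains come from the
# lecture's example; the starting RSS of 100 is assumed for illustration.
subtrees = [
    {"name": "no split", "leaves": 1, "rss": 100.0},
    {"name": "first split only", "leaves": 2, "rss": 98.0},  # gain of 2
    {"name": "both splits", "leaves": 3, "rss": 86.0},       # further gain of 12
]


def penalized_cost(tree, alpha):
    """RSS plus alpha times the number of terminal nodes (leaves)."""
    return tree["rss"] + alpha * tree["leaves"]


def best_subtree(alpha):
    """Return the candidate subtree with the lowest penalized cost."""
    return min(subtrees, key=lambda t: penalized_cost(t, alpha))


for alpha in (0, 5, 10):
    chosen = best_subtree(alpha)
    print(alpha, chosen["name"], penalized_cost(chosen, alpha))
```

With alpha at 0 or 5 the full three-leaf tree wins, so the twelve-unit split is kept even though the split before it gained only two units. With alpha at 10 the penalty outweighs both gains and the unsplit node is chosen.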
43 00:03:50,560 --> 00:03:58,150 So the process is this: we have to find out the value of alpha at which we get minimum cross-validated 44 00:03:58,210 --> 00:03:59,680 error on our training set. 45 00:04:00,840 --> 00:04:08,700 Using that value of alpha, we will prune our tree, and we expect that this pruned tree will perform better 46 00:04:08,700 --> 00:04:09,930 on the test set. 47 00:04:12,020 --> 00:04:14,490 So let us see how to do pruning in our software 48 00:04:14,640 --> 00:04:15,390 in the next video.
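[Editor's sketch] The process just described (pick the alpha with the lowest cross-validated error on the training set, then prune with it) might look roughly like this. The transcript does not name the software, so using Python with scikit-learn is an assumption, as are the synthetic data, the five folds and the thinned alpha grid.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption; the lecture's dataset is not shown here).
X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Grow a large tree, then list the alphas at which branches get pruned away.
full_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
# Clip tiny negative values from rounding and thin the grid to save time.
alphas = np.unique(np.clip(path.ccp_alphas, 0.0, None))[::5]


def cv_mse(alpha):
    """Mean 5-fold cross-validated MSE on the training set for one alpha."""
    model = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha)
    scores = cross_val_score(
        model, X_train, y_train, cv=5, scoring="neg_mean_squared_error"
    )
    return -scores.mean()


cv_errors = [cv_mse(a) for a in alphas]
best_alpha = float(alphas[int(np.argmin(cv_errors))])

# Refit on the full training set with the chosen alpha to get the pruned tree.
pruned_tree = DecisionTreeRegressor(random_state=0, ccp_alpha=best_alpha)
pruned_tree.fit(X_train, y_train)

print("best alpha:", best_alpha)
print("leaves:", full_tree.get_n_leaves(), "->", pruned_tree.get_n_leaves())
```

The test split is held back on purpose: alpha is chosen using the training data only, and the test set is kept for checking whether the pruned tree actually performs better.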