1 00:00:00,780 --> 00:00:05,160 In this lecture, we will learn how to do pruning in Python. 2 00:00:06,180 --> 00:00:07,890 Note that in Python, 3 00:00:07,920 --> 00:00:13,730 we can only do pre-pruning, and we will not be able to do post-pruning, 4 00:00:13,740 --> 00:00:15,300 as Python 5 00:00:15,390 --> 00:00:17,070 does not support post-pruning. 6 00:00:18,420 --> 00:00:22,530 We have discussed how to restrict the growth of our tree. 7 00:00:23,550 --> 00:00:25,080 We discussed three ways. 8 00:00:25,740 --> 00:00:30,030 The first one was to pass the maximum depth of our tree. 9 00:00:30,720 --> 00:00:37,690 The second way was to pass the minimum samples required at an internal node to do the splitting. 10 00:00:38,400 --> 00:00:44,290 And the third way was to provide the minimum samples required in the leaf node. 11 00:00:46,620 --> 00:00:52,280 We will use these three ways to control our tree growth in Python. 12 00:00:53,550 --> 00:01:02,730 I hope you remember how we used the max_depth parameter of our regression tree object to give the maximum number 13 00:01:02,730 --> 00:01:04,510 of levels in our tree. 14 00:01:05,190 --> 00:01:07,860 So we will first try this method. 15 00:01:09,480 --> 00:01:11,940 The code will remain the same for us. 16 00:01:12,000 --> 00:01:14,250 We are creating a regression tree object. 17 00:01:14,400 --> 00:01:16,980 We are naming it regtree1. 18 00:01:18,450 --> 00:01:22,440 Then we are creating this object with the tree library of sklearn. 19 00:01:24,090 --> 00:01:28,620 And here, as a parameter, I am passing max_depth equal to three. 20 00:01:29,460 --> 00:01:32,910 If you want to know about all the parameters that you can pass, 21 00:01:33,090 --> 00:01:41,940 you just have to press Shift+Tab and it will give you all the parameters that you can provide in this 22 00:01:42,210 --> 00:01:42,720 object.
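The step described above can be sketched roughly as follows. This is a minimal illustration, not the lecture's notebook: the synthetic data from `make_regression` is an assumption standing in for the course's training data, and only the name `regtree1` and `max_depth=3` come from the lecture.

```python
from sklearn import tree
from sklearn.datasets import make_regression

# Synthetic stand-in for the lecture's training data (an assumption).
X_train, y_train = make_regression(
    n_samples=404, n_features=5, noise=10.0, random_state=0
)

# Pre-pruning by depth: the tree may have at most three levels of splits.
regtree1 = tree.DecisionTreeRegressor(max_depth=3, random_state=0)
regtree1.fit(X_train, y_train)

# The fitted depth can never exceed max_depth,
# so the tree has at most 2**3 = 8 leaves.
print("depth:", regtree1.get_depth())
print("leaves:", regtree1.get_n_leaves())
```

Outside a notebook, `help(tree.DecisionTreeRegressor)` lists the same parameters that Shift+Tab shows.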
23 00:01:45,270 --> 00:01:55,050 If you hit Shift+Tab two times, it will open a larger window, and if you hit Shift+Tab three times, 24 00:01:56,370 --> 00:02:02,940 it will open the help window in the bottom part of your screen. 25 00:02:05,090 --> 00:02:07,980 You can go through all of these parameters on your own. 26 00:02:09,020 --> 00:02:16,910 But we will only be discussing the important ones. The first is maximum depth. Here, 27 00:02:17,180 --> 00:02:20,380 you can define the maximum levels you want in your tree. 28 00:02:21,710 --> 00:02:28,160 The second step will be to fit our X_train and y_train data into this regression tree object. 29 00:02:30,670 --> 00:02:38,410 Then, in the third step, we will create the dot data and plot our regression tree 30 00:02:38,500 --> 00:02:43,330 using that dot data. I have added two more parameters here. 31 00:02:44,410 --> 00:02:50,530 So if you remember, earlier we were getting our columns as X0, X1. 32 00:02:52,330 --> 00:03:00,360 So here, if you mention feature_names as a parameter and then mention the columns of your X data, 33 00:03:01,810 --> 00:03:09,900 then in the final tree you will get variable names instead of X1 or X2 or X3. 34 00:03:11,290 --> 00:03:13,940 The next parameter I have added is filled. 35 00:03:15,210 --> 00:03:19,330 It will fill color according to the value of that bucket. 36 00:03:19,600 --> 00:03:22,450 So it's just like conditional formatting in Excel. 37 00:03:22,540 --> 00:03:29,890 If you know what conditional formatting is, the filled parameter equal to True will sort of do a conditional 38 00:03:29,900 --> 00:03:32,260 formatting on your tree buckets. 39 00:03:33,460 --> 00:03:35,170 So let's run this one. 40 00:03:42,490 --> 00:03:45,110 So this is our tree. 41 00:03:46,180 --> 00:03:52,600 You can see that the maximum depth is three here, as we have provided the condition on maximum depth. 42 00:03:54,190 --> 00:03:55,330 This is level one.
43 00:03:55,630 --> 00:03:56,240 This is level 44 00:03:56,250 --> 00:03:56,610 two. 45 00:03:56,770 --> 00:03:57,910 And this is level three. 46 00:04:02,120 --> 00:04:09,560 If you look inside these small cells, first we have the variable name and then we have the condition. 47 00:04:10,280 --> 00:04:17,300 So our first split is on the budget variable, and if budget is less than thirty-seven thousand nine 48 00:04:17,300 --> 00:04:19,910 hundred, then we are following 49 00:04:20,110 --> 00:04:25,910 this branch, and if budget is greater than thirty-seven thousand nine hundred, we are following this 50 00:04:25,910 --> 00:04:26,390 branch. 51 00:04:27,380 --> 00:04:32,820 We have the MSE value at this bucket of our tree. 52 00:04:33,530 --> 00:04:36,410 Then we have the total sample size at this bucket. 53 00:04:37,130 --> 00:04:42,250 So since the total sample size of our train data is four hundred and four, 54 00:04:42,890 --> 00:04:45,360 that's why we are getting samples as four hundred and four. 55 00:04:45,920 --> 00:04:50,540 And the last is the average value of our collection variable in this bucket. 56 00:04:52,520 --> 00:04:55,760 So if you notice, the colors are changing 57 00:04:55,850 --> 00:05:01,910 according to this collection value. Here, the collection value is a hundred thousand. 58 00:05:01,970 --> 00:05:05,720 That's why the color of this cell is darkest. 59 00:05:06,290 --> 00:05:09,560 And here the mean collection value is twenty-eight thousand eight hundred. 60 00:05:09,590 --> 00:05:11,900 That's why the color is lightest. 61 00:05:13,670 --> 00:05:24,110 You can see all this information in all the cells, and you can follow these branches and cells to find 62 00:05:24,110 --> 00:05:25,770 the final condition of your leaf node.
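A small sketch of the export step discussed in this part of the lecture, under stated assumptions: the data is synthetic and the column names in `feature_names` are hypothetical placeholders for the columns of the course's X data. Passing `out_file=None` makes `export_graphviz` return the dot data as a string, so nothing beyond scikit-learn is needed to inspect it.

```python
from sklearn import tree
from sklearn.datasets import make_regression

# Hypothetical column names; the lecture passes the columns of its own X data.
feature_names = ["budget", "trailer_views", "rating", "other_feature"]

# Synthetic stand-in for the lecture's training data (an assumption).
X_train, y_train = make_regression(
    n_samples=404, n_features=4, noise=10.0, random_state=0
)

regtree1 = tree.DecisionTreeRegressor(max_depth=3, random_state=0)
regtree1.fit(X_train, y_train)

# feature_names puts variable names in each cell instead of generic indices;
# filled=True colors each cell by its predicted value.
dot_data = tree.export_graphviz(
    regtree1,
    out_file=None,
    feature_names=feature_names,
    filled=True,
)

# Each cell in the dot data carries the split condition, the error measure,
# the sample count, and the mean target value of that bucket.
print(dot_data[:300])
```

Rendering the dot data as an image needs a Graphviz renderer, which is left out here to keep the sketch dependency-free.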
63 00:05:25,870 --> 00:05:32,510 For example, for this leaf node, our conditions are: budget is less than thirty-seven thousand 64 00:05:32,510 --> 00:05:38,720 nine hundred, and trailer views is more than four 40000. 65 00:05:38,940 --> 00:05:45,410 The left side is always for true and the right side is always for false. 66 00:05:45,740 --> 00:05:51,440 So that's why trailer views should be greater than four 40000 for this cell. 67 00:05:52,190 --> 00:05:56,690 And then, if the rating is greater than nine point one six, 68 00:05:57,450 --> 00:05:58,640 we will get this bucket. 69 00:05:59,720 --> 00:06:03,680 So there are two observations which satisfy 70 00:06:03,770 --> 00:06:13,620 all three conditions, and the average value of our collection data for these two observations is a hundred thousand. 71 00:06:16,200 --> 00:06:23,310 This is the first way of pruning our tree: we have used maximum depth as a parameter. 72 00:06:26,190 --> 00:06:33,600 The next way we discussed was to give the minimum observations required at an internal node. 73 00:06:34,440 --> 00:06:42,630 That means that, if you look here, our sample size is 24 and still we are splitting it into 74 00:06:42,630 --> 00:06:43,470 two parts. 75 00:06:43,800 --> 00:06:48,920 So if I set my min_samples_split to 25, 76 00:06:49,910 --> 00:06:58,050 that split would not take place. Only if the sample size is at least the min_samples_split value 77 00:06:58,290 --> 00:07:00,870 will the splitting take place. 78 00:07:01,680 --> 00:07:08,570 So for the second way, I will give a min_samples_split of 40. 79 00:07:09,030 --> 00:07:17,100 So I'm saying that for every split, a minimum sample size of 40 is required in that bucket. 80 00:07:20,100 --> 00:07:25,380 Here I am creating the regtree2 object, and the rest of the code is the same. 81 00:07:27,030 --> 00:07:27,870 I will run this. 82 00:07:37,440 --> 00:07:38,390 So this is our tree.
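The second way can be sketched like this, again on synthetic data (an assumption; the lecture uses its own training set). The check at the end walks the fitted tree and confirms that every node that was split held at least 40 samples.

```python
from sklearn import tree
from sklearn.datasets import make_regression

# Synthetic stand-in for the lecture's training data (an assumption).
X_train, y_train = make_regression(
    n_samples=404, n_features=5, noise=10.0, random_state=0
)

# Pre-pruning by split size: a node is split only if it holds
# at least 40 samples.
regtree2 = tree.DecisionTreeRegressor(min_samples_split=40, random_state=0)
regtree2.fit(X_train, y_train)

fitted = regtree2.tree_
# In scikit-learn's tree arrays, a leaf has children_left == -1,
# so every other node is an internal (splitting) node.
is_internal = fitted.children_left != -1
smallest_split_node = fitted.n_node_samples[is_internal].min()

print("depth:", regtree2.get_depth())
print("smallest node that was split:", smallest_split_node)
```

No depth limit is set here, so the tree is free to grow deeper than three levels as long as the nodes it splits are large enough.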
83 00:07:38,970 --> 00:07:45,470 You can see that the maximum depth here is not three; our tree 84 00:07:45,810 --> 00:07:51,180 is much larger. And you see, for every splitting node, 85 00:07:52,500 --> 00:07:55,040 we have a sample size of at least 40. 86 00:07:56,160 --> 00:08:00,720 So this is another way of pruning your tree. 87 00:08:02,400 --> 00:08:08,610 The next way is to provide the minimum number of samples you need in your leaf nodes. 88 00:08:09,030 --> 00:08:11,730 So all these end nodes are your leaf nodes. 89 00:08:12,000 --> 00:08:16,860 And you can see there are some leaves where the sample size is one as well. 90 00:08:17,430 --> 00:08:27,300 So you can limit your tree growth by giving a value for the minimum samples required at each leaf node. 91 00:08:28,290 --> 00:08:31,960 To do that, you can pass the parameter min_samples_leaf. 92 00:08:33,270 --> 00:08:40,910 And as always, you can hit Shift+Tab to get details of all the parameters available 93 00:08:40,910 --> 00:08:41,100 to you 94 00:08:41,750 --> 00:08:43,290 when creating this object. 95 00:08:44,220 --> 00:08:49,000 So here I will give my min_samples_leaf as twenty-five. 96 00:08:49,800 --> 00:08:51,570 And I will run this code again. 97 00:08:55,960 --> 00:09:05,320 You can see this is my new tree with the condition on leaf nodes, and you can see that all my leaf nodes 98 00:09:05,320 --> 00:09:10,370 contain more than or equal to 25 samples. 99 00:09:11,740 --> 00:09:13,870 You can apply multiple conditions as well. 100 00:09:14,800 --> 00:09:17,900 So with this, I can also apply max_depth. 101 00:09:25,780 --> 00:09:28,630 Say I give a max_depth of four. 102 00:09:29,560 --> 00:09:31,060 I can rerun this. 103 00:09:35,750 --> 00:09:41,720 Here you can see that I have pruned my tree, and now my tree only has four layers. 104 00:09:43,970 --> 00:09:46,280 You can combine all the three conditions as well.
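A rough sketch of combining the leaf-size and depth conditions from this part of the lecture. The synthetic data and the name `regtree3` are assumptions made for illustration; the values 25 and 4 are the ones mentioned in the lecture.

```python
from sklearn import tree
from sklearn.datasets import make_regression

# Synthetic stand-in for the lecture's training data (an assumption).
X_train, y_train = make_regression(
    n_samples=404, n_features=5, noise=10.0, random_state=0
)

# Two pre-pruning conditions at once: every leaf must keep at least
# 25 samples, and the tree may not grow beyond four levels.
# The name regtree3 is hypothetical.
regtree3 = tree.DecisionTreeRegressor(
    min_samples_leaf=25, max_depth=4, random_state=0
)
regtree3.fit(X_train, y_train)

fitted = regtree3.tree_
# A leaf has children_left == -1 in scikit-learn's tree arrays.
is_leaf = fitted.children_left == -1
smallest_leaf = fitted.n_node_samples[is_leaf].min()

print("depth:", regtree3.get_depth())
print("smallest leaf:", smallest_leaf)
```

Adding `min_samples_split` to the same constructor call applies all three conditions together.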
105 00:09:47,030 --> 00:09:52,640 So play around with these values and try to get the optimal tree. 106 00:09:53,030 --> 00:10:00,290 You can also use your R-squared and MSE values to evaluate the performance of this tree. In the later 107 00:10:00,290 --> 00:10:01,440 part of our course, 108 00:10:02,060 --> 00:10:11,480 we will see how to automate this process, how to pass multiple values of these hyperparameters, and 109 00:10:11,690 --> 00:10:14,000 how to select the best model 110 00:10:14,120 --> 00:10:23,020 out of all these kinds of models that we are going to create. We will do that after our classification tree. 111 00:10:25,410 --> 00:10:33,940 Till then, you can try different values of maximum depth, minimum samples leaf and minimum samples 112 00:10:33,950 --> 00:10:37,040 split to get the best model for your data.
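One way to try different values by hand, as suggested above, is a plain loop that scores each tree on held-out data with MSE and R-squared. This is only a sketch under assumptions: the data is synthetic, the candidate values are arbitrary, and it is not the automated search the course covers later.

```python
from itertools import product

from sklearn import tree
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the lecture's data (an assumption).
X, y = make_regression(n_samples=506, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Arbitrary candidate values for the three pre-pruning parameters.
depths = [3, 4, None]        # None means no depth limit
leaf_sizes = [1, 25]
split_sizes = [2, 40]

results = []
for depth, leaf, split in product(depths, leaf_sizes, split_sizes):
    model = tree.DecisionTreeRegressor(
        max_depth=depth,
        min_samples_leaf=leaf,
        min_samples_split=split,
        random_state=0,
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results.append(
        (depth, leaf, split, mean_squared_error(y_test, pred), r2_score(y_test, pred))
    )

# Pick the combination with the lowest test MSE.
best = min(results, key=lambda row: row[3])
print("best (max_depth, min_samples_leaf, min_samples_split, MSE, R2):", best)
```

Scoring on a held-out split rather than on the training data matters here, because an unpruned tree will always look best on the data it was fit to.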