1 00:00:00,330 --> 00:00:07,350 Hello all. In the previous session, we learned what entropy and information gain are, and how we 2 00:00:07,350 --> 00:00:10,440 end up with our decision tree. 3 00:00:10,800 --> 00:00:15,060 So in this session, we are going to learn how exactly you can build 4 00:00:15,120 --> 00:00:22,410 this decision tree using something known as the Gini index, something known as the 5 00:00:22,710 --> 00:00:24,540 Gini index. Let's say 6 00:00:24,660 --> 00:00:26,800 this is exactly my dataset. 7 00:00:26,850 --> 00:00:33,840 This is my use case, in which we have to build a decision tree so that, on the basis of class and 8 00:00:33,840 --> 00:00:39,990 on the basis of gender, I can predict whether a particular person is going to stay in the hostel 9 00:00:39,990 --> 00:00:40,290 or not. 10 00:00:40,620 --> 00:00:47,540 This is my use case. Now, how do we build this decision tree? The question that is the major concern in a decision 11 00:00:47,550 --> 00:00:47,820 tree, 12 00:00:47,820 --> 00:00:49,420 I can say the major concern 13 00:00:49,440 --> 00:00:56,760 in this tree, is which feature gets selected as the parent node, because once we have figured out our 14 00:00:56,760 --> 00:00:59,250 root node, our problem statement almost gets solved. 15 00:00:59,790 --> 00:01:05,370 So there is one approach that we have already covered, which uses the concepts 16 00:01:05,370 --> 00:01:10,430 of entropy and information gain that we learned in the previous session. 17 00:01:10,830 --> 00:01:12,120 With that one approach, 18 00:01:12,150 --> 00:01:18,160 we can very easily find, for this decision tree, which feature gets selected as the parent node and the subsequent 19 00:01:18,160 --> 00:01:20,710 nodes; all these things we can easily find.
20 00:01:21,240 --> 00:01:28,320 So now we have to understand how you can solve a similar problem statement using Gini impurity, because 21 00:01:28,590 --> 00:01:33,840 calculation-wise, Gini impurity is less complex than entropy. 22 00:01:34,230 --> 00:01:40,980 And there are definitely lots of algorithms that use this Gini impurity internally, 23 00:01:40,980 --> 00:01:42,830 internally, to build a decision tree. 24 00:01:43,110 --> 00:01:44,640 There are lots of algorithms. 25 00:01:45,570 --> 00:01:49,260 So how exactly does it work? 26 00:01:49,410 --> 00:01:51,360 First you have to understand 27 00:01:51,420 --> 00:01:52,710 its intuition. 28 00:01:53,560 --> 00:01:59,460 Let's say this is our use case and we have to build a decision tree. Let me open a new page. 29 00:01:59,460 --> 00:02:05,160 First of all, let's say, we have to start somewhere. 30 00:02:05,670 --> 00:02:07,800 We have to start with class. 31 00:02:07,800 --> 00:02:16,350 Let's say we start with the class feature, because we have to compute the Gini index of each and every 32 00:02:16,350 --> 00:02:16,960 feature. 33 00:02:17,160 --> 00:02:21,120 So let's start with how to compute the Gini index, 34 00:02:21,460 --> 00:02:25,370 that is, the Gini impurity of this class feature. 35 00:02:26,430 --> 00:02:29,060 So let me construct a table over here. 36 00:02:29,580 --> 00:02:33,690 Let's say I have my feature, class, 37 00:02:34,170 --> 00:02:41,110 and second, let's say I have my target variable: whether the person is going to stay or not. 38 00:02:42,060 --> 00:02:47,910 So you will see that in this class feature, you basically have the values 8, 9, 10 and 39 00:02:47,910 --> 00:02:48,420 11, 40 00:02:48,690 --> 00:02:51,290 and with respect to each of these, you have some counts. 41 00:02:51,600 --> 00:02:53,670 So let me write down all these things.
42 00:02:53,740 --> 00:03:00,790 Let's say, with respect to class, I have 8, 9, 10 and 11. Similarly, 43 00:03:00,960 --> 00:03:02,130 we have some counts as well. 44 00:03:02,470 --> 00:03:04,160 So with respect to class 8, 45 00:03:04,200 --> 00:03:05,760 I have counts here. 46 00:03:06,030 --> 00:03:17,070 I have two yes and one no. And here, for class 9, I have two yes, and the count of no is 47 00:03:17,070 --> 00:03:17,420 one. 48 00:03:17,850 --> 00:03:24,390 Similarly, over here, for class 10, I have one yes, and the count of no is three. 49 00:03:25,290 --> 00:03:31,530 Here, for class 11, the count of yes is three and the count of no is one. 50 00:03:31,530 --> 00:03:36,570 So this is all with respect to the class feature. 51 00:03:36,740 --> 00:03:43,470 These are all things with respect to the class feature that you can simply obtain from this dataset. 52 00:03:43,620 --> 00:03:45,590 There is nothing complicated here. 53 00:03:46,170 --> 00:03:48,540 Now you will see that here you have a total. 54 00:03:48,890 --> 00:03:55,190 Let me write it down; let's say I have one more column for it. 55 00:03:55,980 --> 00:03:58,560 So here you have three, that is two plus one. 56 00:03:58,560 --> 00:04:00,930 Similarly, three here; here you have four, 57 00:04:00,930 --> 00:04:01,970 and here you have four. 58 00:04:03,030 --> 00:04:10,720 So what is the probability for class 8? That is, 59 00:04:10,740 --> 00:04:12,240 out of the class 8 students, 60 00:04:12,240 --> 00:04:13,740 what fraction 61 00:04:13,740 --> 00:04:17,460 are students that are going to stay in the hostel? 62 00:04:17,460 --> 00:04:24,050 We have to compute that in one column, and then we have to compute the probability that a class 63 00:04:24,090 --> 00:04:29,490 8 student is not going to stay; we have to compute that as well.
64 00:04:31,020 --> 00:04:35,310 So basically, the probability of yes is 2/3, because here the count is two. 65 00:04:35,310 --> 00:04:40,450 It means two students are going to stay out of a total of three. 66 00:04:40,950 --> 00:04:42,990 So the probability of no will be 1/3. 67 00:04:43,380 --> 00:04:50,220 Similarly, over here for class 9, you have 2/3 and 1/3. For class 10, the probability of yes is 1/4, 68 00:04:50,820 --> 00:04:52,860 and the probability of no is 3/4. 69 00:04:53,220 --> 00:04:57,420 For class 11, the probability of yes is 3/4, 70 00:04:57,750 --> 00:04:59,670 and here again, for no, you have 71 00:05:00,060 --> 00:05:01,170 1/4. 72 00:05:02,340 --> 00:05:07,960 So let's compute the Gini impurity for this feature. 73 00:05:08,580 --> 00:05:09,570 So what is it? 74 00:05:09,750 --> 00:05:12,930 What is the Gini impurity when your class 75 00:05:14,220 --> 00:05:15,240 is 8? 76 00:05:15,260 --> 00:05:15,990 What does it mean? 77 00:05:16,680 --> 00:05:22,280 It is nothing but one minus the sum of the squared probabilities: 78 00:05:22,620 --> 00:05:24,570 I can say the probability of yes 79 00:05:24,880 --> 00:05:33,870 for that class value, whole squared, plus the probability of no, whole squared. 80 00:05:35,770 --> 00:05:41,770 So if you do some basic calculation, it is nothing but one minus (2/3 squared plus 1/3 squared). 81 00:05:41,920 --> 00:05:48,750 It will definitely give you 4/9, because the squares are 4/9 and 1/9. 82 00:05:49,000 --> 00:05:52,370 And if you work all this out, it gives exactly 4/9. 83 00:05:53,110 --> 00:05:55,380 Similarly, with respect to class 9, 84 00:05:55,990 --> 00:06:00,360 with respect to class 9, you have some value.
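Written out, the class 8 calculation described above looks like the following. The symbols $G$, $p_{\text{yes}}$ and $p_{\text{no}}$ are notation chosen here for illustration and are not from the session; the counts (two yes, one no) are the ones read from the table.

```latex
\[
G(\text{class}=8)
  = 1 - \left(p_{\text{yes}}^{2} + p_{\text{no}}^{2}\right)
  = 1 - \left(\left(\tfrac{2}{3}\right)^{2} + \left(\tfrac{1}{3}\right)^{2}\right)
  = 1 - \tfrac{5}{9}
  = \tfrac{4}{9}
\]
```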
85 00:06:01,390 --> 00:06:05,680 It is nothing but one minus the same terms, which will give you exactly 4/9. 86 00:06:05,700 --> 00:06:12,790 Similarly, with respect to class 10, you have some value, which is nothing but 87 00:06:13,060 --> 00:06:19,510 one minus (1/4 squared plus 3/4 squared), where the denominator is 16. 88 00:06:19,540 --> 00:06:21,360 So it will become 6/16. 89 00:06:22,090 --> 00:06:25,950 And similarly, with respect to 11, here you have class 11. 90 00:06:26,470 --> 00:06:29,210 So in this case, 91 00:06:29,230 --> 00:06:32,770 again you have 6/16; swapping the yes and no counts does not affect it. 92 00:06:33,370 --> 00:06:34,720 So now what do we have to do? 93 00:06:34,720 --> 00:06:38,640 We have to compute the Gini impurity for the whole class feature. 94 00:06:39,400 --> 00:06:40,870 So how do you compute it? 95 00:06:41,740 --> 00:06:44,770 Basically, we take a weighted sum. The Gini impurity of 96 00:06:44,770 --> 00:06:46,960 this whole class feature will be nothing 97 00:06:46,960 --> 00:06:51,180 but, let me open a new page, a weighted sum. 98 00:06:51,190 --> 00:06:54,190 We have to compute the weighted sum. 99 00:06:54,220 --> 00:06:59,100 Let's say this is my Gini of class. 100 00:06:59,470 --> 00:07:00,820 So how do we compute it? 101 00:07:01,330 --> 00:07:06,500 It is a very basic formula; it is a piece of cake. 102 00:07:07,170 --> 00:07:08,290 Let me write it down. 103 00:07:08,320 --> 00:07:14,800 It is nothing but the number of instances in class 8, 104 00:07:16,300 --> 00:07:18,680 let me write it properly, in class 8, 105 00:07:19,930 --> 00:07:23,230 divided by the total instances, 106 00:07:25,050 --> 00:07:29,190 total instances, into the Gini of 107 00:07:30,590 --> 00:07:31,400 class 8,
108 00:07:33,370 --> 00:07:41,470 of class 8; plus I have to do the same for class 9, plus I have to do it for class 10, plus I have to do 109 00:07:41,470 --> 00:07:44,740 it for class 11. As for the weight here, I would say, 110 00:07:45,730 --> 00:07:48,190 for class 9, 111 00:07:50,460 --> 00:07:59,050 let me write it as m, so this is something like m into Gini of 112 00:07:59,070 --> 00:07:59,580 class 9. 113 00:08:01,290 --> 00:08:08,210 Then class 10, and similarly over here I have something like 114 00:08:08,210 --> 00:08:13,650 that with respect to class 11. 115 00:08:13,920 --> 00:08:16,190 So let me just put in some values. 116 00:08:16,770 --> 00:08:25,350 So for class 8, take what we computed previously. For class 8, 117 00:08:25,890 --> 00:08:26,490 the weight is nothing 118 00:08:26,490 --> 00:08:34,370 but this: you will see here that the number of instances where class 8 occurs is exactly three, 119 00:08:34,770 --> 00:08:39,170 and the total number of instances is exactly 14. 120 00:08:39,510 --> 00:08:41,790 So it means it is nothing but 3/14. 121 00:08:42,060 --> 00:08:42,720 So it is nothing 122 00:08:42,720 --> 00:08:45,990 but 3/14 that I have to write here. 123 00:08:46,680 --> 00:08:54,240 And what have you computed for the Gini of class 8? 4/9, in the previous 124 00:08:54,240 --> 00:08:58,850 slide. Similarly, here for class 9 I have 3/14, 125 00:08:59,610 --> 00:09:02,100 and similarly over here I have 4/9. 126 00:09:02,580 --> 00:09:08,960 And similarly, with respect to class 10, we have, I think, four. 127 00:09:08,970 --> 00:09:10,190 Yeah, 4/14. 128 00:09:11,070 --> 00:09:17,730 So this is nothing but 4/14, and the Gini will be exactly 6/16, which we have already 129 00:09:17,730 --> 00:09:18,360 computed.
130 00:09:18,960 --> 00:09:23,850 So here, for class 11, we again have 4/14, and the Gini is nothing but 6/16. 131 00:09:23,850 --> 00:09:31,740 And if you do some basic calculations here, it comes to approximately 0.404, 132 00:09:31,740 --> 00:09:32,340 0.404. 133 00:09:32,340 --> 00:09:37,470 It is almost equal to 0.40. In a similar way, 134 00:09:37,470 --> 00:09:41,960 you have to do all these operations for gender as well. 135 00:09:42,330 --> 00:09:48,480 So let's say you have to compute it for gender. How 136 00:09:48,480 --> 00:09:50,130 do you compute it? 137 00:09:50,340 --> 00:09:51,020 What do you have to do? 138 00:09:51,030 --> 00:09:55,050 First, you just have to create a table. 139 00:09:55,710 --> 00:09:59,430 Then just calculate using the formula that I have given. 140 00:09:59,580 --> 00:10:06,640 What was it? One minus (probability of yes squared plus probability of no squared). 141 00:10:06,750 --> 00:10:07,740 Simple. 142 00:10:08,430 --> 00:10:16,110 Once you get this, you have to compute the Gini of the entire gender feature, as simple as 143 00:10:16,830 --> 00:10:18,670 that, with the same piece-of-cake weighted formula. 144 00:10:19,260 --> 00:10:23,550 So once you have this formula, you will get the Gini for the gender feature. 145 00:10:23,820 --> 00:10:30,230 Once you calculate it, its value will be somewhere around 0.48, 146 00:10:30,240 --> 00:10:32,300 roughly 0.48. 147 00:10:32,310 --> 00:10:41,880 You will see here that the Gini of class is less than 148 00:10:42,360 --> 00:10:45,930 the Gini of gender: Gini of class is less than Gini of gender. 149 00:10:46,790 --> 00:10:48,320 That's what you have.
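The class calculation above can be sketched in a few lines of Python. This is only an illustration: the function names `gini` and `weighted_gini` and the dictionary `class_counts` are made up here, while the yes/no counts are the ones read from the table in the session. Exact fractions are used so the intermediate values (4/9, 6/16) can be compared directly with the spoken ones.

```python
from fractions import Fraction

# (yes, no) counts per class value, as read from the table in the session.
class_counts = {8: (2, 1), 9: (2, 1), 10: (1, 3), 11: (3, 1)}


def gini(yes, no):
    """Gini impurity of one group: 1 - (p_yes^2 + p_no^2)."""
    total = yes + no
    p_yes = Fraction(yes, total)
    p_no = Fraction(no, total)
    return 1 - (p_yes ** 2 + p_no ** 2)


def weighted_gini(counts):
    """Weighted sum of per-value Gini impurities, weighted by group size."""
    n = sum(yes + no for yes, no in counts.values())
    return sum(Fraction(yes + no, n) * gini(yes, no) for yes, no in counts.values())


per_value = {value: gini(*c) for value, c in class_counts.items()}
gini_class = weighted_gini(class_counts)

for value, g in per_value.items():
    print(f"class {value}: Gini = {g}")
print(f"Gini(class) = {gini_class} (about {float(gini_class):.4f})")
```

The weighted result is 17/42, about 0.4048, which agrees with the value of roughly 0.404 quoted in the session. The gender feature would be handled the same way once its yes/no counts are tabulated; those counts are not listed here, so they are left out.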
150 00:10:48,350 --> 00:10:56,160 That's what you have calculated. It means class has the lowest impurity, so it will definitely 151 00:10:56,180 --> 00:10:58,540 be selected as your root node. 152 00:10:58,640 --> 00:11:01,250 So let me open a new page. 153 00:11:01,980 --> 00:11:02,350 Yeah. 154 00:11:02,540 --> 00:11:04,340 So your class feature will 155 00:11:05,470 --> 00:11:13,050 get selected as your root node, and on the basis of it you have all the different 156 00:11:13,060 --> 00:11:18,870 classes: 8, 9, 10, and here you have 11. 157 00:11:19,000 --> 00:11:21,020 So you have some kind of a split. 158 00:11:21,850 --> 00:11:23,980 You have some kind of split here. 159 00:11:23,980 --> 00:11:25,650 Here you have something, 160 00:11:25,660 --> 00:11:26,590 and here you have something. 161 00:11:26,590 --> 00:11:28,330 You just have to fill in these blocks. 162 00:11:28,640 --> 00:11:30,090 You just have to fill in these blocks 163 00:11:30,100 --> 00:11:35,110 according to your data. Once you have done that, you have to solve this branch, 164 00:11:35,110 --> 00:11:37,260 then this one, then this one, in the same way. 165 00:11:37,420 --> 00:11:42,100 And at the end you will end up having your decision tree like this. 166 00:11:43,180 --> 00:11:48,450 So you will end up having a decision tree like this. 167 00:11:49,310 --> 00:11:50,660 That's what I'm trying to show. 168 00:11:50,860 --> 00:11:56,620 That's another approach, if I talk about how to build a decision tree. 169 00:11:57,130 --> 00:12:00,730 So now let's talk about what post-pruning is 170 00:12:00,730 --> 00:12:02,040 and what pre-pruning is, 171 00:12:02,380 --> 00:12:07,600 which definitely help if you have an overfitting condition in your decision tree. 172 00:12:08,440 --> 00:12:14,160 So what is post-pruning, and what is pre-pruning?
173 00:12:14,170 --> 00:12:17,860 First, let's understand what post-pruning is. 174 00:12:17,870 --> 00:12:26,170 Post-pruning and pre-pruning are both extensively used whenever 175 00:12:26,320 --> 00:12:32,080 you have to get rid of overfitting issues. Whenever you 176 00:12:32,080 --> 00:12:38,950 have some overfitting issues in your decision tree, you have to use something known 177 00:12:38,950 --> 00:12:41,420 as post-pruning or pre-pruning. 178 00:12:41,740 --> 00:12:44,770 So what exactly do they do? 179 00:12:45,040 --> 00:12:47,950 They basically control the depth of a tree. 180 00:12:47,950 --> 00:12:48,460 Let's say 181 00:12:48,580 --> 00:12:50,350 we have a tree like this. 182 00:12:50,500 --> 00:12:52,430 We will assume a tree like this. 183 00:12:52,870 --> 00:12:56,050 So what they do is control the depth of the tree. 184 00:12:56,170 --> 00:12:57,190 They control it. 185 00:12:57,220 --> 00:12:58,150 They control it. 186 00:12:59,020 --> 00:12:59,800 They control it. 187 00:13:00,820 --> 00:13:07,360 So let's talk first about the post-pruning approach. 188 00:13:08,110 --> 00:13:10,780 So in this process, what happens? 189 00:13:10,930 --> 00:13:12,270 What happens over here? 190 00:13:12,640 --> 00:13:20,860 Basically, in post-pruning, my decision tree is generated first. 191 00:13:21,280 --> 00:13:23,350 My decision tree is created, 192 00:13:24,310 --> 00:13:27,600 and after that, what will happen?
193 00:13:27,640 --> 00:13:34,000 Post-pruning removes some useless branches, I can say some useless 194 00:13:34,000 --> 00:13:42,190 branches, from this decision tree by experimenting with some cross-validation approaches 195 00:13:42,680 --> 00:13:46,230 like randomized search CV, 196 00:13:46,240 --> 00:13:54,970 or like grid search CV. Basically, using this cross-validation approach, 197 00:13:54,970 --> 00:13:57,580 which is nothing but an experimental approach, 198 00:13:58,360 --> 00:14:03,070 it removes the useless branches of this 199 00:14:03,280 --> 00:14:04,210 decision tree. 200 00:14:04,720 --> 00:14:06,640 So what will they do? 201 00:14:06,670 --> 00:14:11,740 They will basically control the height, or I can say the depth, of a tree, 202 00:14:11,980 --> 00:14:18,010 and we will not end up creating an infinitely growing decision tree. Definitely, 203 00:14:18,010 --> 00:14:22,030 it will basically control its depth. 204 00:14:22,180 --> 00:14:23,110 So what exactly? 205 00:14:23,140 --> 00:14:24,220 Let me summarize 206 00:14:24,220 --> 00:14:31,870 what this post-pruning does. In the very first step, we will build our decision 207 00:14:31,870 --> 00:14:34,150 tree. In the second step, 208 00:14:34,330 --> 00:14:39,090 we will check the accuracy on the training data. 209 00:14:39,100 --> 00:14:43,360 Let's say I have an accuracy of 95 percent on training, 210 00:14:43,990 --> 00:14:47,440 but with respect to my test data, the accuracy is something like 211 00:14:47,440 --> 00:14:48,230 60 percent. 212 00:14:48,670 --> 00:14:52,060 Now you will see that you have some kind of overfitting issue. 213 00:14:52,060 --> 00:14:57,910 Whenever you have overfitting issues, consider these pruning approaches. So what will they do?
214 00:14:58,240 --> 00:15:01,030 They will remove some useless branches. 215 00:15:01,060 --> 00:15:08,140 They will remove some useless branches using an experimental approach like cross-validation, 216 00:15:08,140 --> 00:15:17,080 things like randomized search CV and grid search CV, and some optimization algorithms as well. 217 00:15:17,650 --> 00:15:20,290 Using these techniques, 218 00:15:20,410 --> 00:15:27,460 we can remove some useless branches of this decision tree, and at the end we will improve 219 00:15:27,460 --> 00:15:31,840 our accuracy; we will improve the accuracy of this 220 00:15:31,840 --> 00:15:35,290 decision tree on the test data. That is what this post-pruning approach will do. 221 00:15:35,530 --> 00:15:37,890 So what will pre-pruning do? 222 00:15:37,900 --> 00:15:44,200 Let me open a new page. What will this pre-pruning do? 223 00:15:44,200 --> 00:15:46,510 What will this pre-pruning 224 00:15:47,670 --> 00:15:54,680 do? It is also known as forward pruning, 225 00:15:54,690 --> 00:16:00,150 whereas post-pruning is known as backward pruning. 226 00:16:00,780 --> 00:16:07,260 So what exactly does it do? It also controls the height of our decision tree. 227 00:16:07,410 --> 00:16:10,070 It also controls the height of the decision tree, 228 00:16:10,080 --> 00:16:11,990 but there are some minor differences. 229 00:16:12,540 --> 00:16:17,250 In post-pruning, we first build our decision tree, 230 00:16:17,490 --> 00:16:19,530 and then we remove some useless branches. 231 00:16:20,220 --> 00:16:28,380 But in pre-pruning, what we will do is act before building the decision tree. 232 00:16:29,850 --> 00:16:37,830 Before building any kind of decision tree, we will basically 233 00:16:37,830 --> 00:16:38,550 control
234 00:16:38,550 --> 00:16:41,130 its depth, using 235 00:16:42,180 --> 00:16:50,010 some CV techniques, that is, cross-validation techniques like randomized search CV or 236 00:16:50,010 --> 00:17:00,030 grid search CV. We will find our best parameters, 237 00:17:00,470 --> 00:17:03,930 and then we will find our best model. 238 00:17:04,260 --> 00:17:09,240 Because once we have the best parameters, we will definitely get our best model. So once we have the 239 00:17:09,250 --> 00:17:10,200 best model, 240 00:17:10,410 --> 00:17:18,030 once we have our best decision tree, then what we will do is 241 00:17:18,030 --> 00:17:19,140 perform training. 242 00:17:20,300 --> 00:17:24,840 We will perform training, then we will do testing on our test data. 243 00:17:25,400 --> 00:17:28,040 That's what pre-pruning does. 244 00:17:28,070 --> 00:17:31,550 So before the complete decision tree is built, it will do all these steps. 245 00:17:32,300 --> 00:17:35,530 It will do all these steps before building the decision tree. 246 00:17:36,040 --> 00:17:42,710 But in post-pruning, once the decision tree has been built, we are going to apply all these 247 00:17:42,710 --> 00:17:43,320 techniques. 248 00:17:43,760 --> 00:17:46,670 That is the minor difference between pre-pruning and post-pruning. 249 00:17:47,540 --> 00:17:49,910 So I hope you understand all of this. 250 00:17:50,600 --> 00:17:51,690 So thank you, guys. 251 00:17:51,710 --> 00:17:54,140 Have a nice day. Keep learning, 252 00:17:54,140 --> 00:17:56,270 keep growing, keep practicing.
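The two pruning styles described above could be tried out roughly as follows with scikit-learn. This is a sketch under assumptions, not the method used in the session: the data is synthetic (`make_classification`), the parameter grids are arbitrary, and cost-complexity pruning (`ccp_alpha`) is used as one concrete post-pruning mechanism because the session does not name a specific one. Grid search CV is used in both cases; randomized search CV could be substituted.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the dataset from the session).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Baseline: an unrestricted tree grown with the Gini criterion.
full_tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X_train, y_train)

# Pre-pruning: restrict growth up front; cross-validation picks the limits.
pre_search = GridSearchCV(
    DecisionTreeClassifier(criterion="gini", random_state=0),
    param_grid={"max_depth": [2, 3, 4, 5], "min_samples_leaf": [1, 5, 10]},
    cv=5,
)
pre_search.fit(X_train, y_train)
pre_tree = pre_search.best_estimator_

# Post-pruning: start from the fully grown tree's pruning path, then let
# cross-validation choose how much of the tree to cut back.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
alphas = sorted({max(float(a), 0.0) for a in path.ccp_alphas})
post_search = GridSearchCV(
    DecisionTreeClassifier(criterion="gini", random_state=0),
    param_grid={"ccp_alpha": alphas},
    cv=5,
)
post_search.fit(X_train, y_train)
post_tree = post_search.best_estimator_

# Compare training and test accuracy to see how large the overfitting gap is.
for name, tree in [("full", full_tree), ("pre-pruned", pre_tree), ("post-pruned", post_tree)]:
    print(
        f"{name}: depth={tree.get_depth()}, nodes={tree.tree_.node_count}, "
        f"train={tree.score(X_train, y_train):.2f}, test={tree.score(X_test, y_test):.2f}"
    )
```

A large gap between training and test accuracy for the full tree is the overfitting signal mentioned in the session; the pruned trees trade some training accuracy for a smaller tree. Whether test accuracy actually improves depends on the data.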