1 00:00:00,450 --> 00:00:07,140 Hello. In the previous session, we learned the basics of decision trees, what exactly a decision tree 2 00:00:07,140 --> 00:00:14,120 is, and which factors help us to build that decision tree, which are nothing but entropy and information 3 00:00:14,120 --> 00:00:14,420 gain. 4 00:00:14,640 --> 00:00:18,630 And in the previous session, we learned what entropy is and what impurity is. 5 00:00:19,200 --> 00:00:24,010 We also learned how to compute the entropy of each and every feature. 6 00:00:24,300 --> 00:00:27,830 That's what we computed for X, Y and Z as well. 7 00:00:28,470 --> 00:00:34,950 Similarly, what we have to do now is compute the information gain, because whichever feature 8 00:00:34,950 --> 00:00:43,200 has the highest information gain, that feature will get selected as the parent node, because the major concern 9 00:00:43,200 --> 00:00:47,560 in a decision tree is which feature gets selected as the parent node. 10 00:00:48,210 --> 00:00:51,180 That's why we have to take care of all these things. 11 00:00:51,660 --> 00:00:53,860 So what we will do is open a new page. 12 00:00:54,090 --> 00:01:01,980 So let's talk about what information gain is. Information gain is 13 00:01:01,990 --> 00:01:02,820 nothing but this. 14 00:01:02,850 --> 00:01:11,550 Basically, based on entropy, I can say which feature 15 00:01:11,550 --> 00:01:16,200 is going to have the highest gain. 16 00:01:17,040 --> 00:01:18,680 So whichever feature 17 00:01:18,900 --> 00:01:21,310 gets the highest gain, 18 00:01:21,930 --> 00:01:30,650 that feature gets selected as the parent node, and this information gain is what helps us to select it. 19 00:01:30,810 --> 00:01:33,210 But how do we compute this? 20 00:01:33,390 --> 00:01:38,670 We have a simple formula here, which is nothing but my information gain.
21 00:01:39,150 --> 00:01:43,260 You can write it as gain, or IG. 22 00:01:43,260 --> 00:01:47,270 You may have seen multiple formulas in books. 23 00:01:47,350 --> 00:01:49,520 So here I am going to give a very simple one. 24 00:01:49,860 --> 00:01:58,320 At its core is the summation of S_v divided by S, into the entropy, for each and every value of the 25 00:01:58,320 --> 00:01:58,830 feature. 26 00:01:59,490 --> 00:01:59,910 So, 27 00:02:00,060 --> 00:02:08,090 using entropy, this tells us which feature is going to have the highest gain. 28 00:02:09,000 --> 00:02:11,190 So what is this S? What is it? 29 00:02:11,190 --> 00:02:13,020 This S is nothing 30 00:02:13,020 --> 00:02:17,320 but, let's say, let's talk about this: 31 00:02:17,440 --> 00:02:22,020 we have X, Y and Z, and we have some output feature. Based on this data, 32 00:02:22,030 --> 00:02:23,100 I have to split something. 33 00:02:23,640 --> 00:02:25,020 So this is nothing 34 00:02:25,020 --> 00:02:32,310 but, let's say I talk about the information gain with respect to X, then S is nothing but the total 35 00:02:32,310 --> 00:02:36,360 data points, against the three data points where X is one. 36 00:02:37,200 --> 00:02:43,370 If I talk about IG of X, it means the information gain of X. 37 00:02:43,500 --> 00:02:48,380 In that case S is nothing but the total data points in X. And what is E? 38 00:02:49,020 --> 00:02:52,710 E is nothing but the entropy of X. 39 00:02:53,190 --> 00:02:57,630 We computed the entropy where X equals one, 40 00:02:58,170 --> 00:03:07,380 and then the second case, because here we have a summation: in the second case we take the entropy where X is zero, 41 00:03:07,500 --> 00:03:12,410 because we computed it for both scenarios, where X is zero and where X is one.
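Written out, the formula described here is usually stated as below. The symbols S, S_v and E follow the session. The leading term E(S), the entropy of the whole dataset, is assumed to be the "one" that appears in the worked examples ("one minus ..."), and v is assumed to range over the values a feature can take (here zero and one).

```latex
IG(S, A) = E(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \, E(S_v)
```

Here |S| is the total number of data points, |S_v| is the number of data points where feature A takes the value v, and E(S_v) is the entropy of the class labels within that subset.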
42 00:03:12,600 --> 00:03:18,950 At that time I showed you how to compute it, and you will see it plays a vital role 43 00:03:19,080 --> 00:03:22,010 here. And you will ask, what is this S_v? 44 00:03:22,020 --> 00:03:24,240 What is this S_v? 45 00:03:24,330 --> 00:03:32,760 S_v is nothing but, out of the total data points, how many ones we have. 46 00:03:33,380 --> 00:03:38,060 If we compute with respect to ones, it is how many ones we have. 47 00:03:38,100 --> 00:03:42,360 And if we compute with respect to zeros, it is how many zeros we have. That 48 00:03:42,390 --> 00:03:43,200 is what this 49 00:03:43,290 --> 00:03:48,560 S_v is. So let's calculate the information gain with respect to each and every feature. 50 00:03:48,570 --> 00:03:50,510 Let's calculate it. 51 00:03:50,970 --> 00:03:52,490 Let me open a new page. 52 00:03:52,770 --> 00:03:59,280 Here I am going to compute the information gain of X, so it is nothing but one minus... 53 00:03:59,700 --> 00:04:03,000 Basically, I think we have three ones. 54 00:04:03,270 --> 00:04:07,380 Let me open the previous page. Here I have X; 55 00:04:07,410 --> 00:04:08,910 we computed this here. 56 00:04:08,910 --> 00:04:12,180 Here we have the use case, and here we have this X. 57 00:04:12,180 --> 00:04:16,350 You will see it has three ones, and we have four data points in total. 58 00:04:16,350 --> 00:04:23,370 It means this and this. And here we have, this is the probability, or rather, this is not the probability. 59 00:04:23,820 --> 00:04:27,360 This is the entropy where X equals one, 60 00:04:27,360 --> 00:04:31,350 and this is where X is zero, which is nothing but zero. 61 00:04:31,350 --> 00:04:39,180 And this is in terms of log, and if you compute it, it is almost zero point two 62 00:04:39,180 --> 00:04:39,420 eight. 63 00:04:39,690 --> 00:04:42,210 Almost 0.28.
64 00:04:42,890 --> 00:04:44,310 Let me repeat. 65 00:04:45,690 --> 00:04:51,540 So it is nothing but, as we have a summation, three by four into 0.28, 66 00:04:53,150 --> 00:05:03,890 plus, as we have one zero, one by four into zero, because that is my entropy where X is zero, and 67 00:05:03,890 --> 00:05:08,720 the other is my entropy where X is one. 68 00:05:08,950 --> 00:05:11,750 Let's look at the meaning of this symbol. 69 00:05:12,500 --> 00:05:14,390 So we have to compute this. 70 00:05:14,390 --> 00:05:20,840 If we compute this, one minus all of this, the value is nothing 71 00:05:20,840 --> 00:05:23,840 but approximately, 72 00:05:23,840 --> 00:05:25,090 I can say approximately, 73 00:05:25,540 --> 00:05:31,480 somewhere close to zero point, 74 00:05:31,490 --> 00:05:33,950 or you can say it is approximately 75 00:05:33,950 --> 00:05:35,330 0.79. 76 00:05:37,030 --> 00:05:42,870 In a similar way you can compute it for Y, and in a similar way you can compute it 77 00:05:42,870 --> 00:05:47,060 for Z. Let me compute it for Y and Z. 78 00:05:48,000 --> 00:05:50,220 So you will see, with respect to Y, 79 00:05:50,610 --> 00:05:57,390 this is with respect to Y, where you have two ones and two zeros, and the entropy with respect 80 00:05:57,390 --> 00:05:59,880 to both one and zero is zero, exactly zero. 81 00:06:00,310 --> 00:06:02,400 Let me write it down. It is nothing 82 00:06:02,400 --> 00:06:05,220 but one minus 83 00:06:06,560 --> 00:06:12,480 two by four into zero, plus two by four into zero. 84 00:06:12,500 --> 00:06:14,810 So it is nothing but just one. Simple. 85 00:06:15,710 --> 00:06:19,560 Similarly, with respect to Z, let me open the previous page. 86 00:06:20,200 --> 00:06:21,440 This is with respect to Z.
87 00:06:21,440 --> 00:06:26,570 Here the entropies are one and one, and I have two data points with zero and two with one. 88 00:06:26,720 --> 00:06:30,710 So let me write it down with respect to this one. 89 00:06:30,710 --> 00:06:34,430 With respect to this, I have nothing but one minus 90 00:06:35,570 --> 00:06:37,460 two by four into one, 91 00:06:39,120 --> 00:06:47,300 plus two by four into one. That is nothing but half plus half, which is one, so one minus one gives me zero as the 92 00:06:47,310 --> 00:06:47,740 result. 93 00:06:47,820 --> 00:06:49,740 So you will see this: 94 00:06:49,740 --> 00:06:52,380 Y has the highest information gain. 95 00:06:52,380 --> 00:06:57,770 So it means this column gets selected as your parent node. 96 00:06:58,620 --> 00:07:06,180 So let me construct my decision tree. When I construct it, you will see that Y will get 97 00:07:06,180 --> 00:07:12,900 selected as my decision node, or I can say as my parent node. So on Y you have two conditions. 98 00:07:12,900 --> 00:07:18,790 Basically the first one is when your Y is one, and the second one is when your Y is zero. 99 00:07:19,350 --> 00:07:26,640 So if you look at this use case, when Y is one you have this much data, and 100 00:07:26,850 --> 00:07:28,830 when Y is zero you have this much data. 101 00:07:29,310 --> 00:07:32,700 So let me write it down. When Y is one, 102 00:07:34,130 --> 00:07:36,260 when Y is one, you have this much data. 103 00:07:37,530 --> 00:07:45,210 And when Y is zero, you have this much data. Let me write these things down. So here you have 104 00:07:45,210 --> 00:07:49,840 the X and Z features, and let's say here you have the class. 105 00:07:50,820 --> 00:07:55,710 In X you have one, one, and in Z you have one, zero. 106 00:07:55,710 --> 00:07:58,880 And in the class we have the first label 107 00:07:58,890 --> 00:08:00,180 for both rows. And similarly on the other side, what do we have?
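The three information gain calculations worked out above can be reproduced with a short sketch. The counts and branch entropies are the values quoted in the session (including the 0.28 used for X equal to one), and the parent entropy of 1.0 is the "one" in "one minus ...". The function name `information_gain` and the dictionary layout are illustrative assumptions, not something shown in the session.

```python
def information_gain(parent_entropy, branches):
    """Return parent entropy minus the weighted entropy of the branches.

    branches is a list of (count, entropy) pairs, one pair per feature value.
    """
    total = sum(count for count, _ in branches)
    weighted = sum(count / total * entropy for count, entropy in branches)
    return parent_entropy - weighted


# Values as quoted in the session: four data points in total.
PARENT_ENTROPY = 1.0
branches_by_feature = {
    "X": [(3, 0.28), (1, 0.0)],  # three ones, one zero
    "Y": [(2, 0.0), (2, 0.0)],   # two ones, two zeros, both branches pure
    "Z": [(2, 1.0), (2, 1.0)],   # two ones, two zeros, both branches mixed
}

gains = {
    name: information_gain(PARENT_ENTROPY, branches)
    for name, branches in branches_by_feature.items()
}
best_feature = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"IG({name}) = {gain:.2f}")
print("parent node:", best_feature)
```

With these inputs the gains come out as roughly 0.79 for X, 1 for Y and 0 for Z, so Y is picked as the parent node, matching the conclusion in the session.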
108 00:08:00,180 --> 00:08:01,890 Here we have X and Z, 109 00:08:01,890 --> 00:08:04,290 and on the basis of X and Z I have the class. 110 00:08:05,300 --> 00:08:11,960 So here again you have some ones and zeros, and the class label is basically 111 00:08:12,390 --> 00:08:12,960 second, second. 112 00:08:13,200 --> 00:08:18,930 So now again you can figure out that this is a similar kind of problem to what we had before, because 113 00:08:18,930 --> 00:08:24,990 you will see, initially, what we had here. It is a similar kind 114 00:08:25,150 --> 00:08:26,940 of problem to what we have now. 115 00:08:26,970 --> 00:08:28,980 You have a similar problem over here. 116 00:08:29,760 --> 00:08:31,920 So what do we have to perform? 117 00:08:31,920 --> 00:08:39,900 A similar kind of operations, a similar kind of calculation: the entropy with respect to zero and with respect 118 00:08:39,900 --> 00:08:46,560 to one, then we have to compute the information gain, and whichever feature has the highest gain will 119 00:08:46,560 --> 00:08:47,070 get selected. 120 00:08:48,210 --> 00:08:54,380 All these operations we have already performed in our earlier sessions. 121 00:08:54,390 --> 00:09:00,600 So once we complete everything, once we are done with this X, 122 00:09:00,960 --> 00:09:03,000 then we have to come to Z. 123 00:09:03,210 --> 00:09:10,240 So once we compute each and everything, we will end up having our decision tree. 124 00:09:10,560 --> 00:09:11,520 Let me open a new page. 125 00:09:11,530 --> 00:09:17,340 Let's suppose that this is the decision tree which we will get. Suppose, 126 00:09:17,760 --> 00:09:24,990 hypothetically, I have some decision tree based on some condition. Let's say this is my condition, 127 00:09:24,990 --> 00:09:27,640 let's say here is my parent node and all that. 128 00:09:27,780 --> 00:09:29,160 Let's just assume.
129 00:09:29,340 --> 00:09:33,780 Once you calculate each and every step, you will definitely get your decision tree. 130 00:09:34,770 --> 00:09:40,230 So here, at this X, you have some conditions like the ones you see here. 131 00:09:40,230 --> 00:09:42,870 You have some conditions, one and zero. 132 00:09:42,910 --> 00:09:49,350 And on the basis of this, you have something, let's say, you 133 00:09:49,350 --> 00:09:55,500 have something, let's say, when your X is one, and here you have something else. 134 00:09:55,620 --> 00:09:56,600 Let's suppose, 135 00:09:56,610 --> 00:09:56,920 suppose, 136 00:09:56,990 --> 00:10:00,410 suppose this is the X node. 137 00:10:01,140 --> 00:10:07,800 OK, so then here you have some condition, and here you have again some condition, 138 00:10:07,950 --> 00:10:16,440 some condition. Let's say here one, zero; here one, zero; let's say here one, zero; here again one, zero. 139 00:10:17,430 --> 00:10:19,080 And here you have some labels. 140 00:10:19,380 --> 00:10:25,260 Let's say I have my label, which is nothing but my class output, I can say. 141 00:10:25,890 --> 00:10:32,370 And here I have my first label, similarly over here I have, let's say, the first label, and similarly over here 142 00:10:32,370 --> 00:10:33,740 I have a second label. 143 00:10:33,750 --> 00:10:34,720 Second label. 144 00:10:34,740 --> 00:10:35,640 A second label. 145 00:10:36,420 --> 00:10:37,520 So this is also my second.
146 00:10:37,780 --> 00:10:44,400 So these are all my second labels. Let's say this is my final decision tree, the final decision tree 147 00:10:44,670 --> 00:10:53,390 that I have achieved after doing all the operations: after calculating entropy, then information gain, 148 00:10:53,700 --> 00:11:01,020 we will end up having this beautiful decision tree for my use case. Now how do we use this? Suppose 149 00:11:01,170 --> 00:11:05,910 I have some test data. Suppose my test data is, let's say, zero, zero, one. 150 00:11:05,940 --> 00:11:13,050 Let's say I have X as zero, Y as zero and Z as one, and I have to predict the label. 151 00:11:13,260 --> 00:11:15,080 What can be the label for this? 152 00:11:15,090 --> 00:11:16,860 What can be the label for this? 153 00:11:17,130 --> 00:11:18,000 So what do we have to do? 154 00:11:18,000 --> 00:11:20,930 Basically we have to walk down this tree. Simple. 155 00:11:20,970 --> 00:11:21,480 That's it. 156 00:11:21,670 --> 00:11:25,070 You will see, your X is zero and your Y is zero. 157 00:11:25,620 --> 00:11:26,870 So initially your Y is zero. 158 00:11:26,880 --> 00:11:32,870 So we have to come to this branch, and then we come over here, and here your X is zero. 159 00:11:32,880 --> 00:11:38,920 X is zero, so it means this is the final output that my model, my decision 160 00:11:38,930 --> 00:11:40,680 tree, will predict. Simple. 161 00:11:40,920 --> 00:11:49,200 So we basically have to follow the hierarchy: whatever hierarchy comes out at the time of your 162 00:11:49,200 --> 00:11:51,600 training, the same hierarchy 163 00:11:51,600 --> 00:11:56,150 we have to follow at the time of testing, at the time of prediction. 164 00:11:56,370 --> 00:11:57,270 Now you will see. 165 00:11:57,390 --> 00:11:58,410 Yeah, that's OK. 166 00:11:58,410 --> 00:12:03,870 Now I am very much comfortable with this decision tree in the case of classification.
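The prediction walk described above, following the same hierarchy at test time that was built at training time, can be sketched as a loop over a nested structure. The tree below is hypothetical: the session only sketches its tree on screen, so the tuple-and-dictionary representation, the `predict` helper and the placement of the "first" and "second" leaf labels are all assumptions made for illustration.

```python
# Hypothetical tree: an internal node is (feature_name, {value: subtree}),
# and a leaf is just a class label string.
tree = (
    "Y",
    {
        1: ("Z", {1: "first", 0: "second"}),
        0: ("X", {1: "first", 0: "second"}),
    },
)


def predict(node, sample):
    """Walk from the root to a leaf, following the sample's feature values."""
    while isinstance(node, tuple):
        feature, children = node
        node = children[sample[feature]]
    return node


# Test point from the session: X is zero, Y is zero, Z is one.
test_sample = {"X": 0, "Y": 0, "Z": 1}
print(predict(tree, test_sample))
```

For the test point, the walk checks Y first, takes the Y equals zero branch to the X node, then takes the X equals zero branch and stops at a leaf, which is the order of checks described in the session.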
167 00:12:04,320 --> 00:12:10,290 In the case of classification, definitely, because we have covered a lot regarding this. 168 00:12:10,650 --> 00:12:13,650 But you will think, what if we have regression? 169 00:12:13,650 --> 00:12:18,210 You may ask, what if we have regression? Let me open a new page. 170 00:12:18,450 --> 00:12:19,140 Let me open it. 171 00:12:19,830 --> 00:12:25,830 Let's see what happens if we have regression, or you can say, what if your data is not in the form of zeros 172 00:12:25,830 --> 00:12:28,440 and ones, because in real life 173 00:12:28,800 --> 00:12:30,780 you don't have data in the form of zeros and ones. 174 00:12:30,780 --> 00:12:31,260 Then what? 175 00:12:31,410 --> 00:12:33,000 What do you have to do in such a case? 176 00:12:33,660 --> 00:12:36,990 Let's say my data this time 177 00:12:37,060 --> 00:12:43,900 is of some, let's say, continuous nature, exactly as in our real-world scenarios. 178 00:12:44,520 --> 00:12:52,360 Let's say you have X, and here my X is, let's say, five, four, five. Similarly in Y 179 00:12:52,390 --> 00:13:00,280 I have something like, say, six, two, eight, and in Z I have something like seven, two, ten, and let's say with 180 00:13:00,280 --> 00:13:02,170 respect to this I have some output feature. 181 00:13:02,900 --> 00:13:05,640 Let's say my output is first class, 182 00:13:05,650 --> 00:13:09,580 second, and so on. Let's say I have this one scenario. 183 00:13:09,580 --> 00:13:13,960 Let's say this is still my classification use case. 184 00:13:14,500 --> 00:13:18,460 But my continuous features are more realistic. 185 00:13:18,460 --> 00:13:18,850 They are, 186 00:13:18,850 --> 00:13:22,420 they are somehow related to my real-world scenarios.
187 00:13:22,690 --> 00:13:28,810 Because you will see, in real-world scenarios you don't have data as zero, zero, one; you will almost always 188 00:13:28,810 --> 00:13:31,470 have this numerical kind of data. 189 00:13:31,480 --> 00:13:33,490 So how do we build such a decision tree? 190 00:13:34,390 --> 00:13:40,510 In such a case, what we will do is basically compute the mean along a column. 191 00:13:40,510 --> 00:13:44,140 Let's say with respect to X. 192 00:13:45,100 --> 00:13:50,480 So what we will do, we will see that the mean is approximately 4.67. 193 00:13:50,620 --> 00:13:56,380 So what we will do, we will basically split on the basis of this mean, here and here. 194 00:13:56,380 --> 00:14:01,550 I am going to say wherever it is less than 4.67, I have to take some decision, and whenever 195 00:14:01,570 --> 00:14:05,490 it is greater than 4.67, I have to take some other decision. 196 00:14:05,520 --> 00:14:09,070 This is a general approach that we can follow. 197 00:14:09,440 --> 00:14:16,270 Or it is also possible that we can have multiple branches, let's 198 00:14:16,270 --> 00:14:20,830 say somewhere in the zero to five range here. 199 00:14:20,830 --> 00:14:22,990 Let's say I have five to eight here. 200 00:14:22,990 --> 00:14:24,530 I have eight to ten. 201 00:14:25,000 --> 00:14:27,310 And here, let's say, I have greater than ten. 202 00:14:27,700 --> 00:14:29,460 We can have multiple branches as well. 203 00:14:29,800 --> 00:14:34,440 So that's how you deal with this type of scenario. 204 00:14:34,460 --> 00:14:41,470 In the upcoming session, we are going to learn how to build your decision tree using 205 00:14:41,740 --> 00:14:44,230 the Gini index, or Gini impurity.
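The mean-based split for a continuous column described in this part of the session can be sketched as follows. The column values are illustrative assumptions chosen so that the mean comes out near the 4.67 quoted in the session, and the helper name `split_on_mean` is hypothetical. The session only says "less than" and "greater than" the mean, so sending values equal to the mean to the right-hand branch is also an assumption.

```python
from statistics import mean


def split_on_mean(values):
    """Split row indices into two branches around the column mean."""
    threshold = mean(values)
    left = [i for i, v in enumerate(values) if v < threshold]
    # Assumption: values equal to the mean go to the right-hand branch.
    right = [i for i, v in enumerate(values) if v >= threshold]
    return threshold, left, right


# Illustrative values for the X column.
x_values = [5, 4, 5]
threshold, left, right = split_on_mean(x_values)

print(f"threshold = {threshold:.2f}")
print("rows below the mean:", left)
print("rows at or above the mean:", right)
```

Each branch can then be treated like the zero and one branches of the earlier binary example: compute the entropy of the class labels in each branch and the resulting information gain, then compare features as before.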
206 00:14:44,230 --> 00:14:49,000 This is another technique for how you can build a decision tree. 207 00:14:49,690 --> 00:14:51,040 So that's all about decision trees. 208 00:14:51,040 --> 00:14:52,630 Hopefully you enjoyed it very much. 209 00:14:52,820 --> 00:14:53,440 Thank you. 210 00:14:53,770 --> 00:14:54,790 Have a nice day. 211 00:14:54,970 --> 00:14:55,900 Keep learning. 212 00:14:55,900 --> 00:14:58,150 Keep growing, keep practicing.