1 00:00:00,170 --> 00:00:07,710 Hello, let's talk about what Random Forest is, which is a machine learning algorithm, but 2 00:00:07,710 --> 00:00:15,000 before going deep into random forest, let's have a quick overview of our previous session, in which 3 00:00:15,000 --> 00:00:17,850 we learned that a random forest is nothing 4 00:00:17,850 --> 00:00:20,960 but a collection of multiple decision trees. 5 00:00:20,970 --> 00:00:28,830 And then we also got some knowledge about what a decision tree is, which is exactly nothing 6 00:00:28,830 --> 00:00:31,940 but a hierarchy-like structure. 7 00:00:32,280 --> 00:00:33,450 Then we also learned, 8 00:00:33,750 --> 00:00:34,230 yeah, 9 00:00:34,560 --> 00:00:38,740 what the main key factors are using which you can build 10 00:00:38,740 --> 00:00:39,380 a decision tree, 11 00:00:39,390 --> 00:00:40,500 which are exactly 12 00:00:40,500 --> 00:00:42,750 information gain and Gini index. 13 00:00:42,960 --> 00:00:50,440 So whichever feature has the highest information gain will get selected as the root node, because the major goal 14 00:00:50,440 --> 00:00:52,580 while designing a decision tree is nothing: 15 00:00:53,040 --> 00:00:59,850 you have to select which feature becomes the root node. Similarly, you can use the Gini index in your building as 16 00:00:59,850 --> 00:01:00,120 well. 17 00:01:00,600 --> 00:01:03,510 Whichever feature has the lowest Gini index, 18 00:01:03,510 --> 00:01:07,860 it means that particular feature has the lowest impurity. 19 00:01:08,220 --> 00:01:13,830 So it means I'm going to select that feature as my root node; that is what the Gini index will do. 20 00:01:14,310 --> 00:01:21,150 So using either information gain or Gini index, you can build your decision tree, and once you have a decision tree, 21 00:01:21,330 --> 00:01:22,920 you can easily do prediction. 22 00:01:23,250 --> 00:01:25,140 So in this session, what am I going to do?
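The Gini index recapped above can be sketched in a few lines of Python. The labels below are invented toy data; this is only an illustration of the impurity formula the lecture refers to, not code from the session.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# A pure node has impurity 0; an even 50/50 split of two classes has the
# maximum impurity 0.5, so the feature whose split gives the lowest
# weighted Gini is the one selected for the node.
print(gini(["yes", "yes", "yes", "yes"]))  # 0.0
print(gini(["yes", "yes", "no", "no"]))    # 0.5
```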
23 00:01:25,140 --> 00:01:31,620 I'm going to give you an intuition behind what exactly a random forest is, which is nothing, which 24 00:01:31,620 --> 00:01:39,390 is just a collection of multiple decision trees, which is just a collection 25 00:01:39,390 --> 00:01:41,090 of multiple decision trees. 26 00:01:41,100 --> 00:01:47,370 And so, all in all, a random forest is nothing but just a collection of multiple decision 27 00:01:47,370 --> 00:01:47,880 trees. 28 00:01:49,220 --> 00:01:56,780 So inside the forest there are decision trees, like this is my decision tree one, this is my decision tree 29 00:01:56,780 --> 00:02:00,220 two, and let's say this is my decision tree three. 30 00:02:00,410 --> 00:02:08,270 So the decision trees which you see over here are basically built using a technique known as bagging, 31 00:02:08,810 --> 00:02:17,000 and you can also say bootstrap aggregation, because bagging is also termed as bootstrap 32 00:02:17,960 --> 00:02:18,980 aggregation. 33 00:02:20,220 --> 00:02:22,080 Bootstrap aggregation. 34 00:02:22,110 --> 00:02:27,360 So what exactly is this bagging, or what is this bootstrap aggregation? 35 00:02:27,810 --> 00:02:35,150 So bootstrap aggregation, or bagging, is nothing but a technique in which we are going to create multiple bags of data. 36 00:02:35,200 --> 00:02:39,420 This is my bag one, this is my bag two, and this is my bag three. 37 00:02:39,600 --> 00:02:42,540 Let's say this, this is tree one. 38 00:02:42,840 --> 00:02:45,050 It gives us some prediction for some data; 39 00:02:45,060 --> 00:02:45,710 let's say one. 40 00:02:45,720 --> 00:02:48,110 Let's say this gives zero, and let's say 41 00:02:48,120 --> 00:02:50,250 this gives one for this. 42 00:02:50,430 --> 00:02:51,720 So what will this bagging do? 43 00:02:51,960 --> 00:02:59,160 It will exactly go with the majority, or I can say it will basically aggregate my predictions.
44 00:02:59,160 --> 00:03:07,920 So once I aggregate, I will come up with a final prediction of one, because the majority always wins, or 45 00:03:07,960 --> 00:03:10,650 I can say the majority goes with one. 46 00:03:10,830 --> 00:03:12,650 So you will see over here, 47 00:03:12,870 --> 00:03:16,380 this is exactly my bagging technique. 48 00:03:16,710 --> 00:03:20,440 So let me consider a very simple use case here. 49 00:03:20,480 --> 00:03:22,020 Let me open a new page. 50 00:03:22,320 --> 00:03:23,910 So I'm just going to open a new page. 51 00:03:24,040 --> 00:03:28,670 Now suppose, suppose I have my entire dataset. 52 00:03:29,040 --> 00:03:30,890 This is my entire dataset. 53 00:03:30,900 --> 00:03:38,330 Suppose this is my entire dataset, which has some number of rows and columns. 54 00:03:38,760 --> 00:03:41,300 This is the dimension of my dataset. 55 00:03:41,520 --> 00:03:46,500 So what exactly will happen initially? So initially, what will random forest 56 00:03:46,500 --> 00:03:46,980 do? 57 00:03:47,160 --> 00:03:50,250 Initially, I will pick up some sample. 58 00:03:50,370 --> 00:03:50,910 Let's say 59 00:03:50,910 --> 00:03:57,870 I'm going to name it sample one; so I will pick up some sample of data and I will pass, I will pass 60 00:03:58,110 --> 00:04:04,490 this to my model, this to my model, which is exactly my decision tree one, which is exactly like this one. 61 00:04:04,740 --> 00:04:14,030 So I will pass some sample of data with row sampling, with row sampling, as well as feature sampling, 62 00:04:14,040 --> 00:04:16,560 row sampling and feature sampling, 63 00:04:16,560 --> 00:04:19,860 or you can say row sampling and column sampling. 64 00:04:19,890 --> 00:04:24,540 So what exactly are this row sampling and feature sampling? Suppose 65 00:04:25,170 --> 00:04:30,670 this dataset has six hundred rows and n columns. 66 00:04:31,050 --> 00:04:34,500 So initially, what will happen?
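The "bag" idea above, drawing a same-sized sample with replacement, is what the word bootstrap means. A minimal sketch, assuming a toy dataset of ten row indices (the seed and size are illustrative, not from the lecture):

```python
import random

random.seed(42)            # illustrative seed for reproducibility
rows = list(range(10))     # stand-in for the row indices of a dataset

# One bootstrap "bag": a same-sized draw *with replacement*. Some rows
# appear more than once and some not at all, which is why each tree in
# the forest ends up training on a slightly different view of the data.
bag = [random.choice(rows) for _ in rows]
print(sorted(bag))
print("unique rows in this bag:", len(set(bag)))
```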
67 00:04:36,020 --> 00:04:44,430 So what will happen in sample one: let's say two hundred rows and three columns will get selected, 68 00:04:44,490 --> 00:04:52,520 let's say randomly, randomly. So let's say randomly two hundred rows and three columns will get transferred 69 00:04:52,520 --> 00:04:56,860 to my decision tree one model, and it will do some kind of prediction on this. 70 00:04:57,050 --> 00:05:02,630 Let's say it gives a prediction; let's say we have some regression use case, so it will give a prediction 71 00:05:02,780 --> 00:05:04,490 in the form of some continuous data. 72 00:05:04,490 --> 00:05:11,810 And if we have, let's say, a classification use case, it will give a prediction of a discrete nature, 73 00:05:11,810 --> 00:05:14,090 either one or zero. 74 00:05:14,120 --> 00:05:17,700 And if regression, it will give something of a continuous nature; 75 00:05:17,730 --> 00:05:19,490 let's say ten point eleven, 76 00:05:19,490 --> 00:05:20,220 or anything else. 77 00:05:20,750 --> 00:05:23,440 So this is with respect to my sample one. 78 00:05:23,540 --> 00:05:31,430 Similarly, after it, after that, what will happen: whatever data I had there, it will go back to 79 00:05:31,430 --> 00:05:32,420 my dataset. 80 00:05:32,420 --> 00:05:40,040 It will go back, and now from the dataset, again for my sample two, my sample two, row sampling and feature 81 00:05:40,040 --> 00:05:41,380 sampling will happen. 82 00:05:41,780 --> 00:05:51,140 And again this data will come over here, and my decision tree two will get trained on this particular data, 83 00:05:51,140 --> 00:05:55,740 on this particular sample, and it will do some kind of prediction. 84 00:05:55,760 --> 00:05:57,580 Let's say if it is a regression use case, 85 00:05:57,740 --> 00:06:04,100 then I am going to predict, let's say, some value, and if it is a classification use case, I will predict some 86 00:06:04,100 --> 00:06:05,040 discrete value.
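The row sampling plus feature sampling described above can be sketched with NumPy. The 600 rows and the 200×3 per-tree sample come from the lecture; the 8 total columns are an assumption, since the lecture only says "n columns":

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 600 rows as in the lecture; 8 feature columns is an
# illustrative assumption for the unspecified "n".
X = rng.normal(size=(600, 8))

row_idx = rng.choice(600, size=200, replace=True)   # row sampling, with replacement
col_idx = rng.choice(8, size=3, replace=False)      # feature/column sampling
sample = X[np.ix_(row_idx, col_idx)]                # the subset one tree trains on

print(sample.shape)  # (200, 3)
```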
87 00:06:05,940 --> 00:06:12,890 Similarly, after some row sampling and feature sampling again happen, then I'm going to consider 88 00:06:12,890 --> 00:06:13,770 my sample three. 89 00:06:13,880 --> 00:06:15,920 Let's say this is my decision tree, 90 00:06:16,700 --> 00:06:17,680 decision tree three. 91 00:06:18,020 --> 00:06:26,780 And this decision tree three will get trained on this particular sample, on this particular sample, and 92 00:06:26,780 --> 00:06:30,580 it will give exactly a prediction as well. 93 00:06:30,980 --> 00:06:35,210 So it will exactly give a prediction as well, depending upon what use case we have. 94 00:06:36,640 --> 00:06:38,260 So what will happen here? 95 00:06:38,740 --> 00:06:42,160 So as we know, as we know... let me open a new page. 96 00:06:43,110 --> 00:06:51,330 As we know, so as we know, a decision tree basically has a property, it basically has a property that 97 00:06:51,330 --> 00:06:55,200 it has high variance, that it has high variance. 98 00:06:55,680 --> 00:06:58,380 So what exactly is the meaning of this high variance? 99 00:06:58,680 --> 00:07:07,350 It means basically there is a huge difference between what exactly my prediction is and what the 100 00:07:07,350 --> 00:07:08,430 actual data is. 101 00:07:09,030 --> 00:07:11,610 That is the exact meaning of variance. 102 00:07:12,270 --> 00:07:13,260 So let's say, 103 00:07:14,300 --> 00:07:17,960 on training data, let's say, on training data, 104 00:07:19,120 --> 00:07:24,340 let's say on training data you have some good accuracy; let's say your accuracy is somewhere around 105 00:07:24,610 --> 00:07:25,920 97 percent. 106 00:07:26,050 --> 00:07:34,530 So it means, it means my decision tree, my decision tree is able to learn the relationships very properly. 107 00:07:35,140 --> 00:07:41,310 But when I have some unseen data, when I have some test data, then I have some test data.
108 00:07:41,860 --> 00:07:48,100 So it will not give me that good a prediction, because what does this decision tree look like? It 109 00:07:48,400 --> 00:07:51,300 is nothing but something having some kind of decision rules. 110 00:07:51,760 --> 00:07:52,600 But what if, 111 00:07:52,960 --> 00:07:53,740 but what if 112 00:07:54,830 --> 00:08:04,070 I have a data point that has, that has some very extreme outlier, or I can say that is far beyond, that is 113 00:08:04,070 --> 00:08:07,480 far beyond the range of any one node of this tree? 114 00:08:07,490 --> 00:08:14,310 So it will basically give us a lower score, say 60 percent, in case of test data. 115 00:08:14,480 --> 00:08:20,060 So this is basically my variance: in case of training data you have a high score, whereas in case 116 00:08:20,060 --> 00:08:21,550 of test data you have a low score. 117 00:08:21,890 --> 00:08:24,320 But what will random forest do? 118 00:08:24,710 --> 00:08:31,670 It will convert this high variance, this high variance, into low variance. 119 00:08:32,560 --> 00:08:40,570 But the question is how, how exactly will this random forest convert this high variance into low variance? 120 00:08:40,570 --> 00:08:49,030 Because if you will notice, if you will notice, my decision tree one will get trained on some particular 121 00:08:49,030 --> 00:08:49,500 data. 122 00:08:49,510 --> 00:08:54,470 So it will basically become an expert for this particular subset. 123 00:08:54,640 --> 00:08:59,230 So it will basically give some kind of prediction; decision tree two will give some kind of prediction; 124 00:08:59,230 --> 00:09:05,350 and decision tree three will get trained and do some kind of prediction; and then we have our final prediction, then 125 00:09:05,350 --> 00:09:06,700 we have a final prediction. 126 00:09:06,760 --> 00:09:13,670 So basically each and every decision tree will get trained specifically on some subset of the data, 127 00:09:13,690 --> 00:09:16,620 so they basically become experts on that data.
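The high-train-score, low-test-score behaviour described above can be reproduced with a small scikit-learn experiment. Every number here (dataset size, feature counts, seeds) is an illustrative choice, not taken from the lecture, which quoted 97% train versus 60% test only as an example:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data (parameters are illustrative assumptions)
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# An unpruned tree memorises its training data (train accuracy 1.0) but
# drops on unseen data; averaging many bagged trees narrows that gap.
print("tree   train/test:", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
print("forest train/test:", forest.score(X_tr, y_tr), forest.score(X_te, y_te))
```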
128 00:09:16,810 --> 00:09:18,840 That's why they convert, 129 00:09:19,180 --> 00:09:23,230 that's why they convert this high variance into low variance. 130 00:09:23,380 --> 00:09:29,800 So this is basically the main characteristic of random forest: it can convert that high variance, which 131 00:09:29,800 --> 00:09:32,980 is caused by the decision tree, into this low variance. 132 00:09:33,370 --> 00:09:35,170 So let me open the previous page. 133 00:09:35,560 --> 00:09:42,610 You will see what we have here: decision tree one, two and three. So here, let's say you have a regression 134 00:09:42,610 --> 00:09:43,190 use case. 135 00:09:43,480 --> 00:09:45,070 So what will the random forest do? 136 00:09:45,370 --> 00:09:52,710 What it will do: it will consider the mean of all the predictions given by these trees. 137 00:09:52,750 --> 00:09:58,710 So what it will do: it will consider the mean of decision tree one, two and three. 138 00:09:58,900 --> 00:10:06,130 So it will basically, basically consider the mean of these, whereas in case of classification it will basically 139 00:10:06,460 --> 00:10:08,020 go through majority voting, 140 00:10:08,020 --> 00:10:12,100 or I can say it will consider the mode of all the predictions. 141 00:10:12,380 --> 00:10:18,280 Whichever class has the highest count, my random forest will consider it as my final prediction. 142 00:10:19,030 --> 00:10:21,390 That's what happens in case of classification. 143 00:10:21,550 --> 00:10:28,660 In case of regression, it will basically consider the mean of the predictions. And about interpretation, if I 144 00:10:28,810 --> 00:10:29,200 talk, 145 00:10:29,350 --> 00:10:31,840 so if I will talk about interpretation, 146 00:10:31,840 --> 00:10:32,100 right, 147 00:10:32,260 --> 00:10:34,340 it is difficult to plot a random forest. 148 00:10:34,360 --> 00:10:41,250 It is difficult for us because it is nothing but a collection of multiple decision trees.
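The two aggregation rules above, mean for regression and mode (majority vote) for classification, fit in a few lines. The per-tree outputs below are hypothetical values for a single input row:

```python
from statistics import mean, mode

# Hypothetical per-tree outputs for one input row
regression_preds = [10.11, 9.80, 10.40]  # each tree predicts a number
classification_preds = [1, 0, 1]         # each tree predicts a class

print(mean(regression_preds))     # regression forest: average the tree outputs
print(mode(classification_preds)) # classification forest: majority class -> 1
```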
149 00:10:41,560 --> 00:10:47,740 And you will see with a decision tree: if you have one decision tree, you can easily interpret it, you can easily 150 00:10:47,740 --> 00:10:48,310 interpret it. 151 00:10:48,460 --> 00:10:55,000 If you have to plot it, you can easily plot that decision tree using some libraries, some libraries, 152 00:10:55,000 --> 00:10:57,370 using some functions like export 153 00:10:57,370 --> 00:11:01,630 underscore graphviz, which is exactly from a library in Python, 154 00:11:01,900 --> 00:11:10,460 and plot_tree. Using this, using both of these, you can easily plot your decision tree, but it is difficult 155 00:11:10,460 --> 00:11:15,310 to plot a random forest because it is nothing but a collection of, let's say, hundreds of decision trees. 156 00:11:15,520 --> 00:11:17,430 So it will become a little bit difficult; 157 00:11:17,440 --> 00:11:21,270 it will become practically difficult to interpret your random forest. 158 00:11:21,430 --> 00:11:24,740 That, if I can say, that is a drawback of random forest. 159 00:11:25,150 --> 00:11:26,640 So that's all about the session. 160 00:11:26,680 --> 00:11:28,540 Hopefully you will love the session very much. 161 00:11:28,780 --> 00:11:32,200 And this was my easiest way of explaining random forest to you. 162 00:11:32,350 --> 00:11:34,210 I hope you will love the session very much. 163 00:11:34,240 --> 00:11:34,810 Thank you. 164 00:11:35,140 --> 00:11:36,010 Have a nice day. 165 00:11:36,190 --> 00:11:36,940 Keep learning. 166 00:11:36,940 --> 00:11:37,660 Keep growing. 167 00:11:37,900 --> 00:11:38,710 Keep practicing.
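Assuming the garbled library names above refer to scikit-learn's tree-export utilities (`export_graphviz`, `plot_tree`, and the related `export_text`), visualising a single fitted tree looks like this; the Iris dataset and depth limit are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Plain-text rendering: one line per node/split of the fitted tree
print(export_text(tree))

# Graphviz DOT source; render it with the graphviz package or `dot -Tpng`.
# sklearn.tree.plot_tree(tree) draws the same tree via matplotlib.
dot = export_graphviz(tree, filled=True)
print(dot[:40])
```

A random forest has no single-call equivalent; you would have to export each of its hundreds of `estimators_` one by one, which is the interpretability drawback the session ends on.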