1 00:00:00,933 --> 00:00:03,133 Hello and welcome back to the course on Machine Learning. 2 00:00:03,133 --> 00:00:06,400 Today we're talking about the random forest and the intuition behind it. 3 00:00:06,766 --> 00:00:09,433 And specifically we're going to be talking about the random forest 4 00:00:09,433 --> 00:00:12,500 applied to regression trees rather than classification trees. 5 00:00:12,833 --> 00:00:14,300 But the concept is very similar. 6 00:00:14,300 --> 00:00:16,033 And you'll find that this tutorial is very similar 7 00:00:16,033 --> 00:00:19,033 to the one for the random forest on classification trees. 8 00:00:19,533 --> 00:00:22,900 All right, so random forest and ensemble learning. 9 00:00:22,900 --> 00:00:24,166 Ensemble learning. 10 00:00:24,166 --> 00:00:26,566 So random forest is a version of ensemble learning. 11 00:00:26,566 --> 00:00:31,166 You've got other versions, such as gradient boosting. And ensemble 12 00:00:31,266 --> 00:00:36,500 learning is when you take multiple algorithms 13 00:00:36,866 --> 00:00:40,900 or the same algorithm multiple times, and you put them together 14 00:00:40,900 --> 00:00:43,900 to make something much more powerful than the original. 15 00:00:43,900 --> 00:00:45,466 And let's see how this works. 16 00:00:45,466 --> 00:00:49,233 So step one, you pick at random k data points from your training set. 17 00:00:49,233 --> 00:00:53,133 So now we're kind of going to leverage a lot of what we talked about 18 00:00:53,466 --> 00:00:56,266 in the section on regression trees. 19 00:00:56,266 --> 00:01:00,066 So you remember there we had lots of data points. 20 00:01:00,066 --> 00:01:01,800 And then we built a regression tree. 21 00:01:01,800 --> 00:01:05,033 Or rather, we built the decision tree and used that 22 00:01:05,500 --> 00:01:08,700 to forecast the value.
23 00:01:08,700 --> 00:01:12,733 That is, the Y value assigned to any new element that would be 24 00:01:12,733 --> 00:01:15,766 added to our data set, as the average in the terminal leaves, basically. 25 00:01:16,100 --> 00:01:18,566 So here what we're doing is we're not using the whole data set 26 00:01:18,566 --> 00:01:19,033 we had. 27 00:01:19,033 --> 00:01:22,033 We're only picking k data points from the training set. 28 00:01:22,400 --> 00:01:25,066 Then we're going to build a decision tree 29 00:01:25,066 --> 00:01:28,066 associated to these k data points. 30 00:01:28,200 --> 00:01:31,566 Rather than building a decision tree based on everything in your data set, 31 00:01:31,700 --> 00:01:34,800 you're just building a decision tree based on those data points, 32 00:01:34,800 --> 00:01:37,800 that's just like sort of a subset of your data set. 33 00:01:38,033 --> 00:01:39,700 Then you choose the number of trees 34 00:01:39,700 --> 00:01:41,633 that you want to build and you repeat steps one and two. 35 00:01:41,633 --> 00:01:43,800 So you just keep building and building and building these trees. 36 00:01:43,800 --> 00:01:47,633 You're building a lot of regression decision trees. 37 00:01:48,033 --> 00:01:51,733 And then finally you use all of them to predict. 38 00:01:51,733 --> 00:01:55,100 So for a new data point, make each one of your N trees 39 00:01:55,100 --> 00:02:00,166 predict the value of Y for the data point in question and assign 40 00:02:00,166 --> 00:02:04,433 the new data point the average across all of the predicted Y values. 41 00:02:04,466 --> 00:02:05,466 So basically, 42 00:02:05,466 --> 00:02:09,066 instead of just getting one prediction, you're getting lots of predictions. 43 00:02:09,066 --> 00:02:09,966 By default, 44 00:02:09,966 --> 00:02:13,300 usually these algorithms are set to about 500 trees at least. 45 00:02:13,633 --> 00:02:17,100 So you're getting 500 predictions for the value of Y.
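The four steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used by any library: the helper names (`build_tree`, `predict_tree`, `build_forest`, `predict_forest`), the single input feature, the toy data, and the choice to draw the k points with replacement are all assumptions made here. It also leaves out the random choice of features at each split that full random forest implementations typically add.

```python
import random


def average(values):
    return sum(values) / len(values)


def sse(ys):
    """Sum of squared errors of ys around their average."""
    m = average(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(points, min_leaf=5):
    """Grow a regression tree on (x, y) pairs with one feature x.

    A leaf is a float (the average y of its points); an internal node is a
    tuple (threshold, left_subtree, right_subtree).
    """
    points = sorted(points)
    ys = [y for _, y in points]
    best = None  # (error, split index)
    for i in range(min_leaf, len(points) - min_leaf + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical x values
        err = sse(ys[:i]) + sse(ys[i:])
        if best is None or err < best[0]:
            best = (err, i)
    if best is None:
        return average(ys)  # terminal leaf: predict the average
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return (threshold, build_tree(points[:i], min_leaf), build_tree(points[i:], min_leaf))


def predict_tree(tree, x):
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def build_forest(points, n_trees=50, k=100, seed=0):
    rng = random.Random(seed)
    # Steps 1-3: pick k random points (assumed: with replacement),
    # build a tree on them, and repeat n_trees times.
    return [build_tree(rng.choices(points, k=k)) for _ in range(n_trees)]


def predict_forest(forest, x):
    # Step 4: average the predictions of all trees.
    return average([predict_tree(tree, x) for tree in forest])


# Toy data (an assumption for illustration): y = 2x plus noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + rng.gauss(0, 1)))

forest = build_forest(data)
print(predict_forest(forest, 5.0))
```

Each tree sees a different random sample, so the individual predictions differ. The forest's answer is simply their average.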
46 00:02:17,566 --> 00:02:19,533 And then you're taking the average across those. 47 00:02:19,533 --> 00:02:22,366 And in that way, you're not just 48 00:02:22,366 --> 00:02:25,366 predicting based on one tree, you're predicting based on a forest of trees. 49 00:02:25,566 --> 00:02:29,333 And that improves the accuracy of your prediction, because 50 00:02:29,866 --> 00:02:31,766 you're taking the average of many predictions. 51 00:02:31,766 --> 00:02:36,366 And therefore, even if somehow 52 00:02:36,933 --> 00:02:39,866 one of the decision trees wasn't built exactly 53 00:02:39,866 --> 00:02:43,233 perfectly because of the way those data points were selected, 54 00:02:43,233 --> 00:02:46,500 and it just didn't turn out as a perfect tree or a great tree, 55 00:02:46,500 --> 00:02:50,400 so that, if you were using it by itself, you'd get a bad prediction, 56 00:02:50,433 --> 00:02:53,333 because you're using the average, that is less likely. 57 00:02:53,333 --> 00:02:55,933 So you're going to get a more accurate prediction. 58 00:02:55,933 --> 00:02:59,033 And the second thing is that they're more stable. Algorithms like this, 59 00:02:59,533 --> 00:03:01,466 ensemble algorithms, are more stable 60 00:03:01,466 --> 00:03:05,766 because any changes in your data set could really impact one tree. 61 00:03:05,766 --> 00:03:10,633 But for them to really impact a forest of trees, it's much harder. 62 00:03:10,633 --> 00:03:14,800 So therefore ensemble is much more powerful in that way. 63 00:03:15,300 --> 00:03:18,100 And what this reminds me of is the game 64 00:03:18,100 --> 00:03:24,900 that is often played at fairs or parties and things like that, where you have a jar 65 00:03:24,900 --> 00:03:29,266 and inside this jar there's lots and lots of, for instance, jelly beans. 66 00:03:29,266 --> 00:03:34,466 Or it could be marbles, or it could be like a huge net with balloons inside it.
67 00:03:34,466 --> 00:03:38,100 And we have one in the mall sometimes where 68 00:03:38,100 --> 00:03:41,733 there's lots of balloons inside a net, in the ceiling. 69 00:03:42,200 --> 00:03:45,200 And you need to guess how many balloons there are, and whoever 70 00:03:45,200 --> 00:03:48,266 guesses right can win, like, a car. 71 00:03:48,266 --> 00:03:51,900 And it's like a crazy prize for just guessing the number of balloons. 72 00:03:52,300 --> 00:03:56,933 And although this is not an example of specifically a 73 00:03:57,133 --> 00:04:00,400 random forest or regression forest method, 74 00:04:00,700 --> 00:04:04,300 it's still an example of an ensemble type of method. 75 00:04:04,466 --> 00:04:09,733 So the best way, or one of the ways, to beat that game 76 00:04:09,733 --> 00:04:13,033 when you need to guess the number of marbles in a jar, 77 00:04:13,033 --> 00:04:16,200 for instance, is not to actually go and guess, 78 00:04:16,366 --> 00:04:20,600 but it's actually to get a pen and paper and stand next to the person 79 00:04:20,600 --> 00:04:23,700 that's holding this jar, or that's conducting this event. 80 00:04:24,100 --> 00:04:26,966 And you just stand next to them, and then you wait for other people 81 00:04:26,966 --> 00:04:27,766 to come and guess. 82 00:04:27,766 --> 00:04:31,700 Every time somebody guesses, you just ask them as soon as they've guessed 83 00:04:31,700 --> 00:04:35,533 and are walking away. You ask them, because usually they write down 84 00:04:35,533 --> 00:04:39,733 their number and they put it inside, like, an envelope or something, 85 00:04:39,733 --> 00:04:42,433 and then the winner is announced later on. 86 00:04:42,433 --> 00:04:45,233 So they don't know whether they guessed right or wrong, but regardless of that, 87 00:04:45,233 --> 00:04:48,166 they're walking away and you just ask them, hey, what number did you guess?
88 00:04:48,166 --> 00:04:51,066 And you just write down their number, and then the next person guesses 89 00:04:51,066 --> 00:04:52,966 and you write their number down, and you write their number, 90 00:04:52,966 --> 00:04:56,300 and you keep writing the numbers down, and you just keep doing that until 91 00:04:56,300 --> 00:05:00,466 you have like a substantial number of entries, maybe 100. 92 00:05:00,466 --> 00:05:04,366 Or maybe if it's a very popular contest and people are guessing like crazy, 93 00:05:04,700 --> 00:05:07,700 attempting the guessing, 94 00:05:07,733 --> 00:05:10,500 then you might even get, like, a couple of hundred. 95 00:05:10,500 --> 00:05:11,933 Or even, if you're very determined, 96 00:05:11,933 --> 00:05:14,933 you might get a thousand entries over a couple of days. 97 00:05:15,133 --> 00:05:17,400 And then what you do is you just average them out. 98 00:05:17,400 --> 00:05:20,266 Or, if you don't want to, maybe take the median. 99 00:05:20,266 --> 00:05:23,333 If you don't want outliers, like people just guessing random numbers 100 00:05:23,333 --> 00:05:26,533 like 1 or 5 million, and you don't want them to affect you, 101 00:05:26,533 --> 00:05:28,533 you just take the outliers out and then you average out. 102 00:05:28,533 --> 00:05:31,100 Anyway, you either average it out or you take the median. 103 00:05:31,100 --> 00:05:35,566 And statistically speaking, you have a much higher likelihood 104 00:05:35,900 --> 00:05:39,766 of being closer to the truth if you take the average of people, 105 00:05:39,766 --> 00:05:43,833 because people are natural beings and their visual perception 106 00:05:43,833 --> 00:05:46,833 will be most likely normally distributed. 107 00:05:47,166 --> 00:05:51,766 And therefore, once you hit the middle of that normal distribution, 108 00:05:51,766 --> 00:05:54,966 you are more likely to be on the money than any one of them.
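A small simulation can make the jar-guessing argument concrete. It is a sketch under assumptions that are not from the lecture: the true count (1000), the spread of guesses (250), and the function name `crowd_vs_individuals` are invented for illustration. Guesses are modelled as normally distributed around the truth, which is the speaker's own premise. The function reports what share of individual guessers end up farther from the truth than the crowd's average, and than the crowd's median.

```python
import random
from statistics import median


def crowd_vs_individuals(true_count=1000, spread=250, n_guessers=100,
                         n_contests=500, seed=1):
    """Return the average share of guessers beaten by the crowd's mean and median."""
    rng = random.Random(seed)
    beaten_by_mean = 0.0
    beaten_by_median = 0.0
    for _ in range(n_contests):
        # Assumed model: each guess is the true count plus normal noise.
        guesses = [rng.gauss(true_count, spread) for _ in range(n_guessers)]
        errors = [abs(g - true_count) for g in guesses]
        mean_error = abs(sum(guesses) / n_guessers - true_count)
        median_error = abs(median(guesses) - true_count)
        # Share of individuals whose guess is farther off than the crowd's.
        beaten_by_mean += sum(e > mean_error for e in errors) / n_guessers
        beaten_by_median += sum(e > median_error for e in errors) / n_guessers
    return beaten_by_mean / n_contests, beaten_by_median / n_contests


mean_share, median_share = crowd_vs_individuals()
print(f"average beats {mean_share:.0%} of guessers, median beats {median_share:.0%}")
```

If the guesses were biased, for example if everyone tended to underestimate, the average would inherit that bias, so the result depends on the normality assumption above.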
109 00:05:55,266 --> 00:05:58,800 And that's a pretty cool concept. That's an example of an ensemble method 110 00:05:58,800 --> 00:06:02,800 where, instead of just throwing that guess by yourself 111 00:06:02,800 --> 00:06:06,700 or taking the guess of one individual person, you're averaging out across 112 00:06:06,700 --> 00:06:11,800 multiple guesses, and you're more likely to be the closest one to the truth. 113 00:06:12,133 --> 00:06:14,800 And if the prize is given not just to the person 114 00:06:14,800 --> 00:06:18,300 that gets it spot on, but to the person that guesses closest to the truth, 115 00:06:18,300 --> 00:06:21,733 then you've got yourself a very powerful advantage 116 00:06:22,200 --> 00:06:23,400 using data science. 117 00:06:23,400 --> 00:06:27,133 So if you have the patience and determination, then try it out 118 00:06:27,133 --> 00:06:29,666 next time you see one of these games and see how you go. 119 00:06:29,666 --> 00:06:31,266 Would love to hear back from you, 120 00:06:31,266 --> 00:06:33,900 because I never have the patience to stand there and just count. 121 00:06:33,900 --> 00:06:38,100 But it is a statistical approach to a challenge like that. 122 00:06:38,633 --> 00:06:40,600 So hopefully you enjoyed today's tutorial. 123 00:06:40,600 --> 00:06:41,933 I look forward to seeing you next time. 124 00:06:41,933 --> 00:06:43,800 Until then, enjoy machine learning.