1 00:00:00,833 --> 00:00:03,066 Hello and welcome back to the course on Machine Learning. 2 00:00:03,066 --> 00:00:05,966 Today we're talking about random forests. 3 00:00:05,966 --> 00:00:07,466 Let's have a look inside. 4 00:00:07,466 --> 00:00:10,000 So the first thing we're going to learn is a new concept: 5 00:00:10,000 --> 00:00:12,900 ensemble learning. What is ensemble learning? 6 00:00:12,900 --> 00:00:17,700 Ensemble learning is when you take multiple machine learning algorithms 7 00:00:17,700 --> 00:00:21,233 and put them together to create one bigger machine learning algorithm. 8 00:00:21,233 --> 00:00:24,800 So that final machine learning algorithm is actually 9 00:00:24,800 --> 00:00:28,066 leveraging many other machine learning algorithms. 10 00:00:28,066 --> 00:00:31,066 And they can all be the same machine learning algorithm, as we will see today. 11 00:00:31,400 --> 00:00:32,633 We're going to be looking at the random 12 00:00:32,633 --> 00:00:36,366 forest method, which combines lots of decision trees. 13 00:00:36,366 --> 00:00:40,666 So instead of running the decision tree method, which we talked about earlier, 14 00:00:40,666 --> 00:00:44,733 just once, we're going to run it multiple times, 15 00:00:44,733 --> 00:00:46,033 and that will give us a random forest. 16 00:00:46,033 --> 00:00:49,166 So let's have a look at the step-by-step process to understand 17 00:00:49,166 --> 00:00:50,700 how all of this happens. 18 00:00:50,700 --> 00:00:54,600 So step one is you pick k data points at random from the training set. 19 00:00:55,100 --> 00:00:59,000 Then you build a decision tree associated with those k data points. 20 00:00:59,333 --> 00:01:00,766 So instead of building a decision tree 21 00:01:00,766 --> 00:01:03,000 based on everything you have in your data set, 22 00:01:03,000 --> 00:01:06,666 you build a decision tree based on just some of the data points that you have.
23 00:01:06,666 --> 00:01:09,666 So the tree is built on a subset of your data set. 24 00:01:09,666 --> 00:01:13,066 Next, you choose the number of trees you want to build, and you repeat steps 25 00:01:13,066 --> 00:01:14,200 one and two. 26 00:01:14,200 --> 00:01:18,066 And then, once you have all of those trees, suppose you have a new data point. 27 00:01:18,066 --> 00:01:20,100 When you want to check where that new data point falls, 28 00:01:20,100 --> 00:01:24,600 or how it is classified, you make each one of your trees 29 00:01:24,600 --> 00:01:29,433 predict the category to which the data point belongs, and then you assign 30 00:01:29,433 --> 00:01:32,433 the new data point to the category that wins the majority vote. 31 00:01:32,600 --> 00:01:34,766 So that's how a random forest works. 32 00:01:34,766 --> 00:01:38,233 So basically you start off with one tree and then you build another tree, 33 00:01:38,233 --> 00:01:39,100 another tree, another tree. 34 00:01:39,100 --> 00:01:40,500 And each one of those trees 35 00:01:40,500 --> 00:01:43,900 is built on a randomly selected subset of your data. 36 00:01:44,333 --> 00:01:47,866 And even though each one of those trees might not be ideal, 37 00:01:48,033 --> 00:01:50,800 overall, on average, they can perform very well. 38 00:01:50,800 --> 00:01:53,533 And that's a major advantage of this algorithm. 39 00:01:53,533 --> 00:01:57,100 It's kind of leveraging the power of the crowd, so to speak: 40 00:01:57,100 --> 00:01:58,800 instead of just relying on one tree, 41 00:01:58,800 --> 00:02:01,766 it's checking what all the trees say, 42 00:02:01,766 --> 00:02:05,600 and then taking the majority vote and deciding the classification 43 00:02:05,600 --> 00:02:10,800 based on that. That power of numbers can help get rid of certain errors and 44 00:02:10,800 --> 00:02:14,766 certain uncertainties in your algorithm and make it more accurate.
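The steps just described (pick k random data points, build a tree on them, repeat for the chosen number of trees, then take a majority vote) can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the course's implementation: the function names (`build_tree`, `build_forest`, `predict_forest`) and the toy data are hypothetical, the k points are assumed to be drawn with replacement, and the trees split on Gini impurity. It also covers only the row sampling described in the lecture; full random forest implementations typically randomize the features considered at each split as well.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, max_depth=3):
    """Grow a small classification tree. Leaves are labels, inner nodes are dicts."""
    if max_depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between two values
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # all points identical, nothing to split on
        return majority(labels)
    _, f, t = best
    left = [(row, y) for row, y in zip(rows, labels) if row[f] <= t]
    right = [(row, y) for row, y in zip(rows, labels) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
        "right": build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1),
    }


def predict_tree(tree, row):
    """Walk from the root down to a leaf and return the label stored there."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


def build_forest(rows, labels, n_trees=25, k=None, seed=0):
    """Repeat steps one and two n_trees times."""
    rng = random.Random(seed)
    if k is None:
        k = len(rows)
    forest = []
    for _ in range(n_trees):
        # Step one: pick k data points at random (assumption: with replacement).
        picked = [rng.randrange(len(rows)) for _ in range(k)]
        # Step two: build a decision tree on just those k points.
        forest.append(build_tree([rows[i] for i in picked], [labels[i] for i in picked]))
    return forest


def predict_forest(forest, row):
    """Let every tree vote and return the category that wins the majority vote."""
    return majority([predict_tree(tree, row) for tree in forest])


# Hypothetical toy data: two well-separated groups of 2-D points.
rows = [(0.0, 0.1), (0.2, 0.3), (0.4, 0.1), (0.1, 0.5), (0.3, 0.2),
        (5.0, 5.1), (5.2, 5.3), (5.4, 5.1), (5.1, 5.5), (5.3, 5.2)]
labels = ["a"] * 5 + ["b"] * 5

forest = build_forest(rows, labels, n_trees=25, seed=0)
print(predict_forest(forest, (0.2, 0.2)))
print(predict_forest(forest, (5.2, 5.2)))
```

Each tree here only ever sees its own random sample, so a single tree can be a poor model on its own; the vote in `predict_forest` is what smooths out those individual mistakes.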
45 00:02:15,333 --> 00:02:18,900 And in fact, it's such a good solution that Microsoft used it when developing 46 00:02:18,900 --> 00:02:23,400 Kinect, you know, the device that lets you play games on your television, 47 00:02:23,733 --> 00:02:26,733 that little device that attaches to the Xbox 48 00:02:26,900 --> 00:02:29,100 so that you can play games without any controller. 49 00:02:29,100 --> 00:02:32,700 So that device uses an infrared grid 50 00:02:32,900 --> 00:02:37,233 to understand where the hands, arms, head, 51 00:02:37,233 --> 00:02:40,533 and other parts of the players' bodies are located and how they're moving. 52 00:02:41,033 --> 00:02:44,700 And it uses machine learning to understand how 53 00:02:44,700 --> 00:02:48,333 the body parts are moving and where exactly they're located in space. 54 00:02:48,666 --> 00:02:53,500 So when Microsoft was developing Kinect, they decided to go with the random forest 55 00:02:53,500 --> 00:02:57,166 algorithm over all of the other machine learning algorithms 56 00:02:57,166 --> 00:03:01,600 that were available to them, and used the random forest to develop 57 00:03:01,833 --> 00:03:04,800 this sophisticated piece of hardware 58 00:03:04,800 --> 00:03:05,933 and software. 59 00:03:05,933 --> 00:03:08,666 And actually, they have an interesting article about it. 60 00:03:08,666 --> 00:03:09,833 So I'm just going to show it to you now. 61 00:03:09,833 --> 00:03:12,766 You can find it on the internet. 62 00:03:12,766 --> 00:03:14,833 It's at microsoft.com. 63 00:03:14,833 --> 00:03:16,500 This is from there, 64 00:03:16,500 --> 00:03:18,900 and you can definitely find it there. 65 00:03:18,900 --> 00:03:21,800 It's called "Real-Time Human Pose Recognition in Parts 66 00:03:21,800 --> 00:03:23,766 from Single Depth Images". 67 00:03:23,766 --> 00:03:25,666 And here it explains exactly how it works. 68 00:03:25,666 --> 00:03:29,633 So you can actually see the random forest in action.
69 00:03:29,633 --> 00:03:30,800 So you can see that 70 00:03:30,800 --> 00:03:34,400 this is similar to what we were talking about before with the classification trees. 71 00:03:34,400 --> 00:03:36,300 But here it's actually using random forests 72 00:03:36,300 --> 00:03:39,166 to understand where body parts are. 73 00:03:39,166 --> 00:03:42,133 And then, based on that, the device 74 00:03:42,133 --> 00:03:45,266 works out what it needs to do in that computer game. 75 00:03:45,700 --> 00:03:47,100 So that's how it works. 76 00:03:47,100 --> 00:03:48,033 And so here, 77 00:03:48,033 --> 00:03:52,266 if I just search for the word "forest", you'll see: decision forest, 78 00:03:52,266 --> 00:03:53,966 decision forest, the decision forest. 79 00:03:53,966 --> 00:03:58,000 And they actually explain that they were able to achieve faster speeds, 80 00:03:58,366 --> 00:04:01,966 faster processing, with the decision forest, and that, you know, 81 00:04:01,966 --> 00:04:05,166 reduced the cost of the hardware that they required for this tool. 82 00:04:05,700 --> 00:04:07,033 It's an interesting article. 83 00:04:07,033 --> 00:04:07,500 Check it out 84 00:04:07,500 --> 00:04:10,966 if you want to learn a bit more about a real-life practical application 85 00:04:10,966 --> 00:04:14,000 of a decision forest or a random forest. 86 00:04:14,266 --> 00:04:15,800 And that's it for today's tutorial. 87 00:04:15,800 --> 00:04:20,333 I hope you enjoyed learning a bit about an ensemble type of machine 88 00:04:20,333 --> 00:04:21,133 learning algorithm. 89 00:04:21,133 --> 00:04:24,366 The practical side is definitely going to be quite interesting as well, 90 00:04:24,666 --> 00:04:26,266 and I look forward to seeing you next time. 91 00:04:26,266 --> 00:04:27,966 Until then, enjoy machine learning.