1 00:00:01,033 --> 00:00:03,433 Hello and welcome back to the course on Machine Learning.
2 00:00:03,433 --> 00:00:06,000 Today we're talking about decision trees.
3 00:00:06,000 --> 00:00:08,033 All right, let's get started.
4 00:00:08,033 --> 00:00:10,233 You might have heard the term CART.
5 00:00:10,233 --> 00:00:13,233 It stands for Classification and Regression Trees.
6 00:00:13,366 --> 00:00:16,366 This is an umbrella term for
7 00:00:16,366 --> 00:00:19,400 two types of trees: classification trees
8 00:00:19,400 --> 00:00:21,333 and regression trees.
9 00:00:21,333 --> 00:00:24,800 The difference is that classification trees help you
10 00:00:24,800 --> 00:00:25,700 classify your data.
11 00:00:25,700 --> 00:00:29,533 So they work with categorical variables such as male or female, apple or orange,
12 00:00:29,533 --> 00:00:32,366 or different colors, and variables of that sort.
13 00:00:32,366 --> 00:00:33,900 Regression trees, on the other hand, are
14 00:00:33,900 --> 00:00:37,233 designed to help you predict outcomes that are real numbers:
15 00:00:37,233 --> 00:00:41,166 for instance, the salary of a person or the temperature
16 00:00:41,166 --> 00:00:43,300 it's going to be outside, and things like that.
17 00:00:43,300 --> 00:00:44,933 Those are the two types, and we're going
18 00:00:44,933 --> 00:00:48,733 to be talking about classification trees in this section of the course.
19 00:00:49,200 --> 00:00:50,700 Here we've got an example with
20 00:00:50,700 --> 00:00:54,266 lots of points on a two-dimensional scatterplot.
21 00:00:54,566 --> 00:00:56,233 Now, how does a decision tree work?
22 00:00:56,233 --> 00:00:58,733 What it's going to do is cut the data up into slices
23 00:00:58,733 --> 00:01:00,900 over several iterations. Let's have a look.
24 00:01:00,900 --> 00:01:02,700 There will be split one.
25 00:01:02,700 --> 00:01:04,333 There'll be split two.
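The difference between the two kinds of CART tree shows up clearly in what a leaf predicts once the splitting is done. The minimal Python sketch below assumes the common textbook setup. A classification leaf returns the majority category of the training points that landed in it. A regression leaf trained with squared error returns their mean. The function names and sample values are hypothetical.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Classification tree leaf: predict the most common category among the leaf's training points."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(targets):
    """Regression tree leaf: predict a real number, here the mean of the leaf's training targets."""
    return mean(targets)


# Hypothetical training points that ended up in one leaf of each kind of tree.
print(classification_leaf(["apple", "orange", "apple"]))  # apple
print(regression_leaf([21.0, 23.0, 25.0]))                # 23.0
```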
26 00:01:04,333 --> 00:01:05,133 Split one
27 00:01:05,133 --> 00:01:09,000 splits the data at X2 = 60.
28 00:01:09,000 --> 00:01:12,200 Split two splits our data at X1 = 50.
29 00:01:12,833 --> 00:01:13,500 Split three
30 00:01:13,500 --> 00:01:15,933 splits our data at X1 = 70.
31 00:01:15,933 --> 00:01:17,100 Then split four
32 00:01:17,100 --> 00:01:19,533 splits our data at X2.
33 00:01:19,533 --> 00:01:21,633 The exact value isn't shown here, but it's about 20.
34 00:01:21,633 --> 00:01:24,166 So that is how our decision tree works.
35 00:01:24,166 --> 00:01:26,433 Now, what is the basis for these splits?
36 00:01:26,433 --> 00:01:28,966 How are these splits selected?
37 00:01:28,966 --> 00:01:32,566 How does the algorithm know where to place the splits?
38 00:01:32,933 --> 00:01:35,533 Well, basically, if you have a look at it now,
39 00:01:35,533 --> 00:01:37,333 each split is done in
40 00:01:37,333 --> 00:01:43,866 such a way as to get as many points of a single category as possible into each of the resulting groups.
41 00:01:43,866 --> 00:01:48,000 For instance, here we want as many red points as possible.
42 00:01:48,233 --> 00:01:50,166 Here it is still the same.
43 00:01:50,166 --> 00:01:53,033 But then the next split maximizes the number of green points here
44 00:01:53,033 --> 00:01:54,333 and red points here.
45 00:01:54,333 --> 00:01:56,533 That's a very basic way to explain it.
46 00:01:56,533 --> 00:01:57,466 In reality, there's some
47 00:01:57,466 --> 00:01:59,733 complex mathematics happening in the background.
48 00:01:59,733 --> 00:02:02,766 Each split is chosen to minimize the entropy of the resulting groups,
49 00:02:03,200 --> 00:02:05,433 specifically information entropy.
50 00:02:05,433 --> 00:02:08,100 Entropy is a very interesting topic.
51 00:02:08,100 --> 00:02:10,933 It would take hours and hours for us to go
52 00:02:10,933 --> 00:02:12,433 through all of it right now.
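The split search described above can be sketched in plain Python. This is a minimal illustration, not the course's code. It scores each candidate threshold on one feature by the weighted entropy of the two resulting groups and keeps the lowest score, which is equivalent to maximizing information gain. The helper names and the six sample points are hypothetical. A full tree would repeat this search over every feature and then recurse into each group. Note that scikit-learn's `DecisionTreeClassifier` uses Gini impurity by default and switches to entropy with `criterion="entropy"`.

```python
import math
from collections import Counter


def entropy(labels):
    """Information (Shannon) entropy in bits: 0.0 for a pure group, 1.0 for a 50/50 mix of two classes."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_entropy(xs, labels, threshold):
    """Average entropy of the groups x < threshold and x >= threshold, weighted by group size."""
    left = [y for x, y in zip(xs, labels) if x < threshold]
    right = [y for x, y in zip(xs, labels) if x >= threshold]
    n = len(labels)
    return sum(len(group) / n * entropy(group) for group in (left, right) if group)


def best_threshold(xs, labels):
    """Try the midpoint between each pair of neighboring distinct values; keep the lowest-entropy one."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: weighted_entropy(xs, labels, t))


# Hypothetical X2 values and colors for six points (not the lecture's actual data).
x2 = [10, 30, 55, 65, 80, 90]
colors = ["red", "red", "red", "green", "green", "green"]
print(best_threshold(x2, colors))  # 60.0
```

With these points, a threshold of 60 leaves two pure groups, so its weighted entropy is 0. That beats every other candidate midpoint.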
53 00:02:12,433 --> 00:02:15,433 So if you want to get into the deeper mathematics
54 00:02:15,433 --> 00:02:18,533 behind this algorithm, you can certainly research that.
55 00:02:18,533 --> 00:02:19,033 But for us,
56 00:02:19,033 --> 00:02:21,666 it's sufficient to know that we're looking for the optimal
57 00:02:21,666 --> 00:02:22,466 split; that is, the
58 00:02:22,466 --> 00:02:26,300 algorithm is going to find optimal splits that maximize
59 00:02:26,300 --> 00:02:30,833 the number of same-category points in each one of these new pockets.
60 00:02:30,833 --> 00:02:32,400 These pockets are actually called leaves.
61 00:02:32,400 --> 00:02:34,200 So you've got the starting scatterplot,
62 00:02:34,200 --> 00:02:36,166 and at the end you've got these leaves.
63 00:02:36,166 --> 00:02:39,166 The final leaves are called terminal leaves.
64 00:02:39,166 --> 00:02:40,266 So that's how the splits occur.
65 00:02:40,266 --> 00:02:44,400 Now let's rewind a bit and do that whole procedure again,
66 00:02:44,400 --> 00:02:46,366 but while we're performing the splits, we're going to
67 00:02:46,366 --> 00:02:48,566 start constructing a decision tree,
68 00:02:48,566 --> 00:02:50,333 an actual decision tree. Let's have a look.
69 00:02:50,333 --> 00:02:52,366 So there's split number one,
70 00:02:52,366 --> 00:02:54,633 and what it's doing is splitting our data
71 00:02:54,633 --> 00:02:56,533 at the 60 level.
72 00:02:56,533 --> 00:03:00,600 So now let's construct a decision tree that's going to ask exactly that question:
73 00:03:00,600 --> 00:03:04,100 is X2 greater than 60 or less than 60?
74 00:03:04,100 --> 00:03:06,800 If it's greater than 60, it falls into one branch.
75 00:03:06,800 --> 00:03:09,433 If it's less than 60, it falls into the other branch.
76 00:03:09,433 --> 00:03:12,466 So there we go: is X2 less than 60?
77 00:03:12,466 --> 00:03:13,833 Yes or no.
78 00:03:13,833 --> 00:03:14,866 Next comes split two.
79 00:03:14,866 --> 00:03:17,033 Split two only splits
80 00:03:17,033 --> 00:03:20,966 the data that is above 60 in the X2 variable.
81 00:03:21,466 --> 00:03:22,833 So let's have a look at that.
82 00:03:22,833 --> 00:03:25,866 We're only dealing with data where X2 is above 60,
83 00:03:25,900 --> 00:03:27,300 so it's over here.
84 00:03:27,300 --> 00:03:29,000 Now we're checking,
85 00:03:29,000 --> 00:03:29,966 going back to the chart for a moment,
86 00:03:29,966 --> 00:03:33,900 that split two happens at 50 for the X1 variable.
87 00:03:34,433 --> 00:03:36,900 So here we go: is X1 less than 50?
88 00:03:36,900 --> 00:03:39,833 Yes or no. And here you can see right away
89 00:03:39,833 --> 00:03:42,066 that this split can already tell us whether
90 00:03:42,066 --> 00:03:43,766 something is green or red.
91 00:03:43,766 --> 00:03:48,000 If we're already above 60
92 00:03:48,000 --> 00:03:51,700 and then below 50, then it's green, which we can see here.
93 00:03:52,433 --> 00:03:55,566 If we're above 50, then it's red, which we can see here.
94 00:03:55,566 --> 00:03:57,600 So that's how this classification works.
95 00:03:57,600 --> 00:03:59,266 Now let's deal with the remainder.
96 00:03:59,266 --> 00:04:02,266 Here we've got split three happening at 70.
97 00:04:02,633 --> 00:04:05,200 If you're below 70, you're going to be red.
98 00:04:05,200 --> 00:04:07,633 Otherwise, we're going to need to do another split.
99 00:04:07,633 --> 00:04:10,433 So if it's below 70, it's red.
100 00:04:10,433 --> 00:04:12,400 If not, we've got to do another split.
101 00:04:12,400 --> 00:04:16,100 Split four: if X2 is above 20, then it's green.
102 00:04:16,100 --> 00:04:18,166 If it's below 20, then it's red.
103 00:04:18,166 --> 00:04:20,366 In the tree, we ask whether X2 is less than 20:
104 00:04:20,366 --> 00:04:23,066 above 20 is a no, so green; below 20 is a yes, so red.
105 00:04:23,066 --> 00:04:24,866 Now, a note on how to lay out these decision trees.
106 00:04:24,866 --> 00:04:29,833 A good way to structure them is to always keep all the yeses on one side.
107 00:04:29,833 --> 00:04:34,033 For example, if you're following the yeses, they always go to the left;
108 00:04:34,500 --> 00:04:36,500 if you're following the nos, they go to the right, or vice versa.
109 00:04:36,500 --> 00:04:37,800 Just don't mix them up.
110 00:04:37,800 --> 00:04:40,900 The terminal leaves then
111 00:04:40,900 --> 00:04:45,166 predict exactly which color, or class, a point belongs to.
112 00:04:45,866 --> 00:04:48,700 At the same time, you don't always have to get to a terminal leaf.
113 00:04:48,700 --> 00:04:51,700 This is a very simple tree, but real trees can be very, very deep,
114 00:04:51,900 --> 00:04:52,866 so sometimes
115 00:04:52,866 --> 00:04:55,000 you might not even get to the bottom.
116 00:04:55,000 --> 00:04:58,133 Say you want to classify an observation, and it falls
117 00:04:58,133 --> 00:05:03,966 into this section over here. Then it would go down this branch, then here,
118 00:05:03,966 --> 00:05:04,900 and then here.
119 00:05:04,900 --> 00:05:07,266 But let's say you don't go all the way to the end.
120 00:05:07,266 --> 00:05:08,733 You stop somewhere over here.
121 00:05:08,733 --> 00:05:12,700 Then some of these boxes still contain a mix of classes:
122 00:05:12,700 --> 00:05:14,766 here you can see a mix of green and red.
123 00:05:14,766 --> 00:05:18,800 The rule here is that a probabilistic classification occurs.
124 00:05:18,800 --> 00:05:22,200 So instead of checking this last condition,
125 00:05:22,200 --> 00:05:25,200 we just check how likely the point is to be green or red.
126 00:05:25,333 --> 00:05:27,500 Here we see that there are more green points than red ones.
127 00:05:27,500 --> 00:05:29,600 So if we stop at this box,
128 00:05:29,600 --> 00:05:33,100 we'll say that it's a green dot.
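Putting the four splits together, the finished tree can be written out as a small data structure. This sketch rests on a few assumptions:

- The thresholds (60, 50, 70, and roughly 20) come from the walkthrough.
- The per-leaf point counts are hypothetical, because the scatterplot isn't reproduced here.
- The `Node`, `leaf`, `split`, `predict_proba`, and `classify` names are made up for the example.

Every "yes" answer (feature below the threshold) goes to the `yes` branch, which follows the keep-the-yeses-on-one-side advice. Passing `max_depth` stops the walk early, so the prediction comes from the class proportions of a mixed box.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    counts: Counter                   # class counts of the training points that reach this node
    feature: Optional[str] = None     # feature the question asks about; None for a terminal leaf
    threshold: Optional[float] = None
    yes: Optional["Node"] = None      # branch taken when x[feature] < threshold ("yes" side)
    no: Optional["Node"] = None       # branch taken otherwise ("no" side)


def leaf(**counts):
    """A terminal leaf with hypothetical class counts, e.g. leaf(red=5)."""
    return Node(Counter(counts))


def split(feature, threshold, yes, no):
    """An internal node asking 'is x[feature] < threshold?'; its counts are its children's combined."""
    return Node(yes.counts + no.counts, feature, threshold, yes, no)


def predict_proba(node, x, max_depth=None):
    """Follow the questions until a terminal leaf (or max_depth); return class proportions there."""
    depth = 0
    while node.feature is not None and (max_depth is None or depth < max_depth):
        node = node.yes if x[node.feature] < node.threshold else node.no
        depth += 1
    total = sum(node.counts.values())
    return {label: n / total for label, n in node.counts.items()}


def classify(node, x, max_depth=None):
    """Pick the most likely class at the node where the walk stopped."""
    proba = predict_proba(node, x, max_depth)
    return max(proba, key=proba.get)


# Split 1 at X2 = 60, split 2 at X1 = 50, split 3 at X1 = 70, split 4 at X2 of about 20.
tree = split("x2", 60,
             yes=split("x1", 70,
                       yes=leaf(red=5),
                       no=split("x2", 20, yes=leaf(red=2), no=leaf(green=4))),
             no=split("x1", 50, yes=leaf(green=4), no=leaf(red=3)))

point = {"x1": 80, "x2": 40}
print(classify(tree, point))               # green: terminal leaf after all four questions
print(classify(tree, point, max_depth=2))  # green: mixed box with 2 red, 4 green
print(classify(tree, point, max_depth=1))  # red: only the first question, 7 red vs 4 green
```

Each point is a dictionary keyed by feature name, so the same structure handles data with more than two columns. Each node simply asks about whichever feature the training algorithm chose.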
129 00:05:33,133 --> 00:05:36,933 Whereas if we stop at this whole box, meaning we do just this part
130 00:05:36,933 --> 00:05:38,800 and only check the first condition,
131 00:05:38,800 --> 00:05:42,733 and then stop here, we would say that it's a red dot,
132 00:05:42,733 --> 00:05:45,966 provided we don't go further down the tree and check more conditions.
133 00:05:45,966 --> 00:05:46,600 So that's another
134 00:05:46,600 --> 00:05:50,033 way of using a decision tree: instead of going down to the very end,
135 00:05:50,400 --> 00:05:53,500 you can stop at any point and just use the probabilities
136 00:05:53,500 --> 00:05:55,500 to predict your classification.
137 00:05:55,500 --> 00:05:57,466 Another thing is that you don't
138 00:05:57,466 --> 00:05:59,600 always have to have just two variables.
139 00:05:59,600 --> 00:06:01,166 For instance, with a decision tree,
140 00:06:01,166 --> 00:06:03,600 just like with any other machine learning algorithm,
141 00:06:03,600 --> 00:06:07,766 you can have a multidimensional data set with lots of different columns.
142 00:06:07,766 --> 00:06:09,300 In our case, we only have two,
143 00:06:09,300 --> 00:06:11,100 but you might have lots and lots of columns,
144 00:06:11,100 --> 00:06:14,733 and then you could have a mix of questions about different columns being asked here.
145 00:06:15,066 --> 00:06:18,066 It's up to the algorithm to come up with those questions.
146 00:06:18,333 --> 00:06:19,666 As a final note,
147 00:06:19,666 --> 00:06:22,666 I wanted to mention a little bit about the history of decision trees.
148 00:06:22,800 --> 00:06:27,500 Decision trees have been around for a very, very long time. In fact,
149 00:06:27,500 --> 00:06:31,666 they are so old that, for a while, they kind of started to die off.
150 00:06:31,666 --> 00:06:34,833 They were still popular about 23 years ago, but
151 00:06:34,966 --> 00:06:38,400 then more sophisticated methods came along to replace them,
152 00:06:38,700 --> 00:06:41,700 and decision trees stopped being so popular.
153 00:06:41,933 --> 00:06:43,700 That continued for a while,
154 00:06:43,700 --> 00:06:47,800 until recently they were reborn with new upgrades,
155 00:06:47,966 --> 00:06:49,033 so to speak.
156 00:06:49,033 --> 00:06:54,700 Those upgrades are additional methods that build on top of decision trees,
157 00:06:54,700 --> 00:07:00,100 such as random forests, gradient boosting, and others.
158 00:07:00,433 --> 00:07:03,600 In this part of the course, we will cover at least one of those
159 00:07:03,866 --> 00:07:05,066 other methods.
160 00:07:05,066 --> 00:07:09,566 The point is that decision trees are a very simple tool.
161 00:07:09,933 --> 00:07:13,933 They aren't very powerful on their own, but they're used in other methods
162 00:07:14,100 --> 00:07:16,966 that leverage their simplicity
163 00:07:16,966 --> 00:07:20,300 to create some very powerful machine learning algorithms.
164 00:07:20,300 --> 00:07:24,666 Such algorithms are even used to perform facial recognition,
165 00:07:24,666 --> 00:07:27,666 like the facial recognition you have on your iPhone.
166 00:07:27,666 --> 00:07:30,900 They're also used in gaming, for example in Kinect, which is kind of like the Wii,
167 00:07:31,200 --> 00:07:34,533 but you can play without actually holding a controller.
168 00:07:34,533 --> 00:07:37,900 It's an add-on for your Xbox,
169 00:07:38,266 --> 00:07:41,300 so you play with your body instead of a controller in your hand,
170 00:07:41,300 --> 00:07:44,700 and it recognizes how you're moving your arms and legs.
171 00:07:45,166 --> 00:07:50,300 For that recognition, Microsoft decided to use random forests.
172 00:07:50,633 --> 00:07:53,633 And random forests are built out of decision trees.
173 00:07:53,700 --> 00:07:55,600 So I hope you enjoyed today's tutorial.
174 00:07:55,600 --> 00:07:58,366 It is quite a simple method, but at the same time, it lies
175 00:07:58,366 --> 00:08:01,366 at the foundation of some of the more modern
176 00:08:01,366 --> 00:08:04,300 and more powerful methods in machine learning.
177 00:08:04,300 --> 00:08:05,566 I look forward to seeing you next time.
178 00:08:05,566 --> 00:08:07,366 Until then, enjoy machine learning.