1 00:00:00,333 --> 00:00:00,900 All right. 2 00:00:00,900 --> 00:00:03,233 And now time for the little extra bonus. 3 00:00:03,233 --> 00:00:05,500 Let's plot the decision tree. 4 00:00:05,500 --> 00:00:08,800 So in order to have the best interpretation, we are going to remove 5 00:00:08,900 --> 00:00:10,133 the feature scaling. 6 00:00:10,133 --> 00:00:15,233 And to do this, we're going to clear everything and re-execute the code, 7 00:00:15,533 --> 00:00:18,166 but without feature scaling. So let's do this. 8 00:00:18,166 --> 00:00:22,600 We're going to clear the data here by clicking on this button here. 9 00:00:22,600 --> 00:00:26,966 And yes. Then clear the plots here. Yes. 10 00:00:27,633 --> 00:00:30,966 And then clear the console as well by pressing Ctrl+L. 11 00:00:31,266 --> 00:00:32,833 Here we go. All right. 12 00:00:32,833 --> 00:00:34,000 Now everything is clear. 13 00:00:34,000 --> 00:00:36,933 We can re-execute the whole script. 14 00:00:36,933 --> 00:00:38,600 So let's do it step by step. 15 00:00:38,600 --> 00:00:42,366 We're first going to take all the pre-processing steps, 16 00:00:42,366 --> 00:00:44,866 but without the feature scaling here. 17 00:00:44,866 --> 00:00:47,033 As you can see, I'm not selecting this. 18 00:00:47,033 --> 00:00:47,400 All right. 19 00:00:47,400 --> 00:00:50,100 So now Command or Control plus Enter to execute. 20 00:00:51,066 --> 00:00:52,033 All good. 21 00:00:52,033 --> 00:00:55,966 And now you can see that if we go to our dataset, training 22 00:00:55,966 --> 00:01:01,000 set and test set, you can see that the features are no longer scaled. 23 00:01:01,600 --> 00:01:04,633 All right, so for example, in the training set we have the real ages, 24 00:01:04,666 --> 00:01:10,500 I mean with their real values, and the real estimated salaries, the real values, okay. 25 00:01:10,500 --> 00:01:11,433 So nothing is scaled. 26 00:01:11,433 --> 00:01:17,333 Now let's select this to fit the decision tree classifier to the training set. 
27 00:01:17,400 --> 00:01:20,400 So Command or Control plus Enter to execute. 28 00:01:20,533 --> 00:01:22,700 Here we go. Now it's fitted. 29 00:01:22,700 --> 00:01:26,033 By the way, I'm just going to replace this by 30 00:01:27,566 --> 00:01:30,500 decision tree 31 00:01:30,500 --> 00:01:33,100 classification. 32 00:01:33,100 --> 00:01:34,633 All right. 33 00:01:34,633 --> 00:01:38,733 What we are really interested in right now is to plot the decision tree. 34 00:01:38,766 --> 00:01:41,766 So we're going to add a new section here below. 35 00:01:42,066 --> 00:01:44,566 And I'm going to call this section Plotting 36 00:01:45,933 --> 00:01:48,933 the Decision Tree. 37 00:01:49,600 --> 00:01:50,333 And as I 38 00:01:50,333 --> 00:01:54,533 told you in the Python tutorial, this is going to take two lines. 39 00:01:54,833 --> 00:01:55,933 This is very simple. 40 00:01:55,933 --> 00:01:57,100 So let's just do it. 41 00:01:57,100 --> 00:01:59,800 It's going to be very quick, okay. 42 00:01:59,800 --> 00:02:03,566 And besides, the two lines that we need to write could not be simpler. 43 00:02:03,833 --> 00:02:06,900 Because indeed, what we need to type here is plot, 44 00:02:07,366 --> 00:02:10,366 and then in parentheses, classifier, 45 00:02:10,700 --> 00:02:16,000 because you know the classifier here is the classifier that we created here 46 00:02:16,000 --> 00:02:19,966 in this part using the rpart function of the rpart library. 47 00:02:20,533 --> 00:02:23,800 And that's all, we just need to type plot(classifier). 48 00:02:23,800 --> 00:02:25,400 And that will plot the tree, 49 00:02:25,400 --> 00:02:29,100 but without the labels, without the conditions written explicitly. 50 00:02:29,733 --> 00:02:33,200 So in order to add these conditions written explicitly, 51 00:02:33,533 --> 00:02:36,200 we just need to add below 52 00:02:37,266 --> 00:02:38,700 text(classifier). 53 00:02:38,700 --> 00:02:40,533 Here we go. And now it's ready. 
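The two lines described above can be tried in a small, self-contained R sketch. Note the assumptions: the data below is synthetic and generated only so the example runs on its own, and the column names `Age`, `EstimatedSalary` and `Purchased` are illustrative guesses, not taken from the script on screen. Only `rpart()`, `plot(classifier)` and `text(classifier)` come from the walkthrough.

```r
# rpart ships with standard R installations as a recommended package.
library(rpart)

# Synthetic, unscaled data (an assumption, used only to make the sketch runnable).
set.seed(123)
n <- 400
dataset <- data.frame(
  Age = sample(18:60, n, replace = TRUE),
  EstimatedSalary = sample(seq(15000, 150000, by = 1000), n, replace = TRUE)
)
# Made-up rule to create a binary outcome; it is not the real dataset.
dataset$Purchased <- factor(
  as.integer(dataset$Age > 44 | dataset$EstimatedSalary > 90000)
)

# Fit the decision tree on the raw features, with no feature scaling.
classifier <- rpart(formula = Purchased ~ ., data = dataset)

# The two plotting lines from the walkthrough:
plot(classifier)   # draws the branches only
text(classifier)   # adds the split conditions and the leaf labels
```

Because the features are left unscaled, the thresholds printed by `text(classifier)` are in the original units (years and dollars), which is what makes the plot readable.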
54 00:02:40,533 --> 00:02:43,100 Isn't that simple? Only two lines 55 00:02:43,100 --> 00:02:46,066 will plot an interpretable decision tree. 56 00:02:46,066 --> 00:02:47,533 Let's check it out. 57 00:02:47,533 --> 00:02:51,600 I'm going to select all this and press 58 00:02:51,600 --> 00:02:54,600 Command or Control plus Enter to execute. 59 00:02:55,333 --> 00:02:56,800 And here is the tree. 60 00:02:56,800 --> 00:02:59,633 So as you can see, we have at each 61 00:02:59,633 --> 00:03:02,633 split the condition that is generating the split. 62 00:03:02,700 --> 00:03:06,300 So for example, the first split is made 63 00:03:06,433 --> 00:03:11,366 based upon the condition age lower than 44.5 years old. 64 00:03:11,866 --> 00:03:17,100 So that means that if the user is below 44.5 years old, 65 00:03:17,133 --> 00:03:20,266 he will go to this subcategory after the split. 66 00:03:20,666 --> 00:03:23,766 And if the user is older than 44.5 years old, 67 00:03:24,100 --> 00:03:27,100 he will end up in this subcategory of this split. 68 00:03:27,866 --> 00:03:29,533 And then we have some new conditions. 69 00:03:29,533 --> 00:03:33,233 So here, a new condition on the other independent variable, 70 00:03:33,266 --> 00:03:36,366 the estimated salary, and this condition is estimated 71 00:03:36,366 --> 00:03:39,866 salary below $90,000. 72 00:03:40,200 --> 00:03:45,300 So what that means is that if the user is younger than 44.5 years old 73 00:03:45,600 --> 00:03:51,266 and has an estimated salary below $90,000, then according to our decision tree 74 00:03:51,266 --> 00:03:56,333 classifier, this user won't buy the SUV because the result here is zero. 
75 00:03:57,033 --> 00:04:00,300 And if the user is younger than 44.5 76 00:04:00,300 --> 00:04:03,300 years old and has an estimated salary over 77 00:04:03,300 --> 00:04:07,266 $90,000, then according to our decision tree classifier, 78 00:04:07,366 --> 00:04:11,000 this user will buy the SUV because the result here is one. 79 00:04:11,700 --> 00:04:12,033 Okay. 80 00:04:12,033 --> 00:04:14,566 And if we go to the other side of the tree, 81 00:04:14,566 --> 00:04:18,500 well, this other side of the tree first contains all the users 82 00:04:18,700 --> 00:04:21,500 that are older than 44.5 years old. 83 00:04:21,500 --> 00:04:25,033 And then we have some new conditions generating new splits. 84 00:04:25,066 --> 00:04:27,233 So another condition on the age here, 85 00:04:27,233 --> 00:04:29,833 then another condition on the estimated salary. 86 00:04:29,833 --> 00:04:33,666 And then again another condition on the estimated salary, and following, yes or no, 87 00:04:33,666 --> 00:04:36,900 these conditions, we end up in some final categories, 88 00:04:36,900 --> 00:04:41,100 the final nodes of the decision tree, where the user is predicted 89 00:04:41,100 --> 00:04:45,733 not to buy the SUV for this node and predicted to buy the SUV 90 00:04:45,900 --> 00:04:48,833 for this node and this node and this node. 91 00:04:48,833 --> 00:04:51,666 Okay, so it was worth looking at this. 92 00:04:51,666 --> 00:04:56,000 We sort of explored this decision tree classifier behind the scenes. 93 00:04:56,233 --> 00:04:59,700 And what's important to understand now when you are using decision trees 94 00:04:59,700 --> 00:05:02,933 is that a big advantage of this classifier is that we can have 95 00:05:02,933 --> 00:05:06,066 these very interpretable results. 96 00:05:06,066 --> 00:05:10,300 Because here on this decision tree plot, we can see everything that's happening. 
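To make the reading of the plot concrete, the branch for users younger than 44.5 can be written out by hand as a tiny R function. This is only a sketch of the rules read aloud above, not the fitted model: the function name `predict_from_plot` is hypothetical, the treatment of a salary of exactly $90,000 is an assumption, and the branch for users older than 44.5 returns `NA` because its exact thresholds are not spelled out in the walkthrough.

```r
# Hand-written version of the rules read off the plotted tree.
# Returns 1L (predicted to buy the SUV), 0L (predicted not to buy),
# or NA when the thresholds were not stated in the walkthrough.
predict_from_plot <- function(age, estimated_salary) {
  if (age < 44.5) {
    # Younger branch: a single split on the estimated salary.
    # Assumption: a salary of exactly 90000 falls on the "buy" side.
    if (estimated_salary < 90000) 0L else 1L
  } else {
    # Older branch: further splits on age and estimated salary exist
    # in the plot, but their exact values are not given here.
    NA_integer_
  }
}

predict_from_plot(30, 50000)    # younger, lower salary
predict_from_plot(30, 120000)   # younger, higher salary
```

In practice the fitted classifier would make these predictions itself; the point of the sketch is only that every path through the plot is a chain of simple, readable conditions.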
97 00:05:10,300 --> 00:05:13,466 We can see how the decision tree decides 98 00:05:13,533 --> 00:05:18,266 whether a user will be predicted to not buy the SUV or to buy the SUV. 99 00:05:18,300 --> 00:05:23,100 We clearly see the whole thinking process of the decision tree, and in some way, 100 00:05:23,100 --> 00:05:26,333 we saw how it learned from the data 101 00:05:26,500 --> 00:05:29,500 how to classify each of the users of our social network. 102 00:05:29,900 --> 00:05:30,266 All right. 103 00:05:30,266 --> 00:05:33,600 So I look forward to seeing you in the next section about Random Forest. 104 00:05:33,900 --> 00:05:36,700 We will implement Random Forest in Python and R, 105 00:05:36,700 --> 00:05:39,700 and I can't wait to show you the final graphic results. 106 00:05:39,866 --> 00:05:41,766 Until next time, enjoy machine learning.