1 00:00:00,400 --> 00:00:03,400 But that's because we have y_pred this way in our templates. 2 00:00:03,566 --> 00:00:07,433 And if you don't want this format of y_pred, well, 3 00:00:07,433 --> 00:00:11,300 you just need to add a simple argument, which is type. Here, 4 00:00:11,933 --> 00:00:15,566 type equals, and you just need to input 'class'. 5 00:00:16,433 --> 00:00:17,700 Okay, let's try. 6 00:00:17,700 --> 00:00:20,700 Let's try to execute this line again. 7 00:00:21,033 --> 00:00:22,033 And here's y. 8 00:00:22,033 --> 00:00:24,933 Let's have a look at y right now, y_pred. 9 00:00:26,700 --> 00:00:27,400 There you go. 10 00:00:27,400 --> 00:00:28,800 Now y is a vector. 11 00:00:28,800 --> 00:00:32,766 As you can see, for each observation of the test set, that is, for each user 12 00:00:32,766 --> 00:00:36,333 of the test set, we have, like before, the prediction 13 00:00:36,333 --> 00:00:39,533 0 or 1 for each user: 0 14 00:00:39,533 --> 00:00:42,300 if the user is predicted not to buy the SUV, 15 00:00:42,300 --> 00:00:45,566 and 1 if the user is predicted to buy the SUV, 16 00:00:45,800 --> 00:00:48,800 according to our decision tree classifier. 17 00:00:49,733 --> 00:00:52,500 Okay, so that's a little thing to change here. 18 00:00:52,500 --> 00:00:57,800 Make sure that, you know, your y is your vector of predicted results, 19 00:00:57,833 --> 00:00:59,433 0 or 1, that we were used to. 20 00:00:59,433 --> 00:01:02,600 Because here, as you can see, we use the same predict function 21 00:01:02,600 --> 00:01:06,266 with the two arguments classifier and newdata equals grid_set. 22 00:01:06,900 --> 00:01:09,333 So that means that this won't work, 23 00:01:09,333 --> 00:01:13,200 because this is supposed to be a vector of prediction results. 24 00:01:13,466 --> 00:01:15,766 Only this time it's for all the pixel points. 25 00:01:15,766 --> 00:01:18,766 You know, the imaginary pixel point users in the grid.
26 00:01:19,066 --> 00:01:22,600 But since the predict function is associated with the classifier, 27 00:01:22,633 --> 00:01:24,766 which is the decision tree classifier, 28 00:01:24,766 --> 00:01:28,333 then if we only keep these two arguments here, this won't make any sense, 29 00:01:28,333 --> 00:01:33,000 because this will return y_grid as a matrix of the two probabilities, 30 00:01:33,666 --> 00:01:35,966 and therefore here we will have some problems, 31 00:01:35,966 --> 00:01:40,300 because it will be a matrix of a matrix, whereas here it's supposed to be a vector. 32 00:01:40,666 --> 00:01:43,733 So the only thing we need to do, and we will do it now 33 00:01:43,733 --> 00:01:47,566 so that we don't forget, is to add this type parameter. 34 00:01:47,933 --> 00:01:50,933 And we will set it equal to 'class'. 35 00:01:51,066 --> 00:01:53,433 And then it will work perfectly. 36 00:01:53,433 --> 00:01:54,600 So we'll copy this 37 00:01:56,266 --> 00:01:59,566 and add it here as well. 38 00:01:59,933 --> 00:02:01,266 Perfect. And now it's ready. 39 00:02:01,266 --> 00:02:04,666 Now it will plot the graph without any errors. 40 00:02:05,033 --> 00:02:07,966 So I know I gave you a template that is supposed to work 41 00:02:07,966 --> 00:02:11,300 without changing anything to plot the classifications. 42 00:02:11,633 --> 00:02:12,166 I'm sorry. 43 00:02:12,166 --> 00:02:15,233 Sometimes we need to change a few little things, and that's why we need 44 00:02:15,233 --> 00:02:20,633 to, you know, execute each of the lines one by one to see if it's as it should be. 45 00:02:21,000 --> 00:02:21,900 And besides, yes, 46 00:02:21,900 --> 00:02:25,800 we would have encountered some issues if we, you know, computed the confusion 47 00:02:25,800 --> 00:02:29,433 matrix this way, with this y as a matrix of probabilities. 48 00:02:29,733 --> 00:02:33,600 But now it will be fine because y is set the correct way.
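[Editor's note] The difference described above, a matrix of two probabilities per user versus a single vector of 0/1 predictions, can be sketched in a few lines. This is only an illustration in plain Python, not the R template from the video: the helper name `to_class_labels` and the probability values are made up, and it only assumes that the predicted class is the one with the highest probability.

```python
# Sketch only: the helper name and the numbers are invented for illustration.

def to_class_labels(prob_matrix, classes):
    """Return, for each row of probabilities, the class with the highest one."""
    labels = []
    for row in prob_matrix:
        best_index = max(range(len(row)), key=lambda i: row[i])
        labels.append(classes[best_index])
    return labels


# Hypothetical output when no type is given: one row per user,
# columns are the probability of class 0 and the probability of class 1.
y_prob = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],
]

# What the confusion matrix and the plotting code expect:
# one 0/1 label per user.
y_pred = to_class_labels(y_prob, classes=[0, 1])
print(y_pred)  # [0, 1, 0]
```

In the R code discussed in the video, this conversion is requested by adding type = 'class' to the predict call, so no helper like this is needed there.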
49 00:02:34,033 --> 00:02:37,333 So we'll execute this and look at the number of incorrect predictions. 50 00:02:37,633 --> 00:02:38,566 All right. 51 00:02:38,566 --> 00:02:41,400 Now let's enter cm here. 52 00:02:41,400 --> 00:02:45,933 And we have 6 plus 11 equals 17 incorrect predictions. 53 00:02:46,233 --> 00:02:48,500 So now let's see if we were right 54 00:02:48,500 --> 00:02:52,700 to change our code this way so that we can plot the graph. 55 00:02:52,700 --> 00:02:54,166 Let's see if it will work. 56 00:02:54,166 --> 00:02:55,966 I hope it will work because I want to show you 57 00:02:55,966 --> 00:02:59,933 the decision tree prediction regions and prediction boundary. 58 00:02:59,966 --> 00:03:01,500 I really want to show you this. 59 00:03:01,500 --> 00:03:04,566 For those of you who didn't follow the Python tutorial, of course. 60 00:03:04,933 --> 00:03:09,433 So let's select this and let's see if we did a good job. 61 00:03:10,900 --> 00:03:12,366 All right, looks good so far. 62 00:03:12,366 --> 00:03:14,300 Looks good. No errors. 63 00:03:14,300 --> 00:03:17,300 Let's see what happens. 64 00:03:17,600 --> 00:03:18,900 And we were right. 65 00:03:18,900 --> 00:03:21,233 This works perfectly well. 66 00:03:21,233 --> 00:03:23,333 That's the decision tree classifier. 67 00:03:23,333 --> 00:03:25,500 That's the prediction boundary. 68 00:03:25,500 --> 00:03:28,866 So as you can see, there are only horizontal and vertical lines. 69 00:03:29,400 --> 00:03:32,033 That's because, as Kirill explains, the decision tree 70 00:03:32,033 --> 00:03:36,400 algorithm is based on some conditions on your independent variables, 71 00:03:36,400 --> 00:03:40,500 finding, you know, each time, intervals that will make conditions 72 00:03:40,500 --> 00:03:43,800 that will classify your observations into rectangles. 73 00:03:44,100 --> 00:03:48,366 And actually, what's funny is that we clearly have less overfitting 74 00:03:48,366 --> 00:03:49,300 than in Python.
75 00:03:49,300 --> 00:03:52,366 And actually that's why we have more incorrect predictions. 76 00:03:52,666 --> 00:03:57,566 Because in Python we had, you know, red rectangles here, red rectangles here. 77 00:03:58,166 --> 00:04:01,166 There was also a red rectangle here 78 00:04:01,700 --> 00:04:02,233 and here. 79 00:04:02,233 --> 00:04:04,900 We didn't actually specify more parameters, 80 00:04:04,900 --> 00:04:08,433 but this amazing rpart library, and that's why it's very popular, 81 00:04:08,800 --> 00:04:11,800 chose the right parameters, the right default parameters to, 82 00:04:12,233 --> 00:04:13,733 you know, prevent overfitting. 83 00:04:13,733 --> 00:04:16,233 Because here we clearly don't have overfitting. 84 00:04:16,233 --> 00:04:19,433 We had overfitting with Python because of all the red rectangles here 85 00:04:19,433 --> 00:04:23,266 that were desperately trying to catch every user in the right category. 86 00:04:23,500 --> 00:04:24,633 But here it's not the case. 87 00:04:24,633 --> 00:04:28,400 And here it's doing a terrific job at, you know, 88 00:04:28,500 --> 00:04:30,933 classifying correctly most of the red points here, 89 00:04:30,933 --> 00:04:33,733 most of the green points here in the red region. 90 00:04:33,733 --> 00:04:38,566 And as well as these green users here, who couldn't be well classified by linear 91 00:04:38,566 --> 00:04:43,366 classifiers such as logistic regression or linear kernel SVM. 92 00:04:44,266 --> 00:04:45,766 So here it's doing a pretty good job. 93 00:04:45,766 --> 00:04:47,333 But still we have some incorrect predictions. 94 00:04:47,333 --> 00:04:49,766 That's because it's difficult to classify 95 00:04:49,766 --> 00:04:52,800 well if you want to prevent overfitting on your data. 96 00:04:53,433 --> 00:04:56,433 So even if we have 17 incorrect predictions, 97 00:04:56,533 --> 00:05:00,566 that's a very good classification we have here. Okay.
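[Editor's note] The reason the prediction boundary is made only of horizontal and vertical lines can be sketched with a hand-written tree. This is a minimal illustration, not the tree fitted in the video: the function name, the variable names x1 and x2, and both thresholds are invented. Each condition compares a single variable with a constant, so every region it carves out is a rectangle.

```python
# Sketch only: a hand-written two-level tree with invented thresholds.

def tree_predict(x1, x2):
    """Classify a point using conditions on one variable at a time."""
    if x1 < 0.5:        # vertical boundary at x1 = 0.5
        if x2 < 0.6:    # horizontal boundary at x2 = 0.6
            return 0
        return 1
    return 1


# Print a tiny grid of predictions, top row first (larger x2 on top).
for x2 in (0.75, 0.25):
    print([tree_predict(x1, x2) for x1 in (0.25, 0.75)])
# [1, 1]
# [0, 1]
```

A deeper tree adds more such conditions, which is how the many small rectangles of an overfitted tree appear.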
98 00:05:00,566 --> 00:05:02,400 But now let's look at the test set results. 99 00:05:02,400 --> 00:05:05,333 And I'm actually not worried about that, because 100 00:05:05,333 --> 00:05:08,600 since we don't have overfitting here, that means that we're 101 00:05:08,600 --> 00:05:11,666 very likely to have some good results as well on the test set. 102 00:05:11,700 --> 00:05:13,466 Let's check it out. 103 00:05:13,466 --> 00:05:16,200 Test set and execute. 104 00:05:16,200 --> 00:05:17,400 Let's see. 105 00:05:17,400 --> 00:05:19,000 And here is the test set. Okay. 106 00:05:19,000 --> 00:05:21,333 So as I told you, this looks very good. 107 00:05:21,333 --> 00:05:25,133 This is the set on which we have those 17 incorrect predictions. 108 00:05:25,133 --> 00:05:27,900 You can count them if you want. You will find 17. 109 00:05:27,900 --> 00:05:31,100 And it's classifying most of the red users in the red region 110 00:05:31,266 --> 00:05:33,633 and most of the green users in the green regions. 111 00:05:33,633 --> 00:05:35,300 That's quite okay. 112 00:05:35,300 --> 00:05:38,400 By the way, we can see that most of the incorrect predictions are here. 113 00:05:38,400 --> 00:05:41,500 We can see that we have many red points in the green region. 114 00:05:41,500 --> 00:05:43,200 So that's unlucky. 115 00:05:43,200 --> 00:05:43,433 Good. 116 00:05:43,433 --> 00:05:47,600 As I told you, we would rather prevent overfitting than try to, 117 00:05:47,833 --> 00:05:50,966 you know, minimize to zero the number of incorrect predictions.
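[Editor's note] The count of incorrect predictions read from the confusion matrix (6 plus 11 equals 17 in the video) is the sum of its two off-diagonal cells. The sketch below shows that arithmetic on a tiny made-up example in plain Python. The function names and the six labels are hypothetical and are not the course data.

```python
# Sketch only: toy labels, not the data set used in the video.

def confusion_matrix(y_true, y_pred):
    """Build a 2x2 matrix: rows are true classes, columns are predictions."""
    cm = [[0, 0], [0, 0]]
    for truth, prediction in zip(y_true, y_pred):
        cm[truth][prediction] += 1
    return cm


def incorrect_predictions(cm):
    """Sum the off-diagonal cells, where truth and prediction disagree."""
    return cm[0][1] + cm[1][0]


y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)                         # [[2, 1], [1, 2]]
print(incorrect_predictions(cm))  # 2
```

This only works when y_pred holds class labels. With a matrix of probabilities in its place, the cells could not be counted this way, which is why the type argument matters.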