So in this video, we are going to learn how to create a classification decision tree in R.

The template that we created for the regression decision tree can be reused for classification; only a few modifications need to be made. I'll point out wherever you need to make those modifications, so you can use the same template that you created for the regression decision tree. So copy-paste that code.

The first part is importing the dataset. Just ensure that you have the correct file name. This file is also attached to the resources, so download that file and import it. I'll run this command now.

Now, if you want to view this data, we can use the View() function. You can see that it is almost identical to the previous data. The only difference is that Collection is now an independent variable; it will also be part of the predictor variables. Start_Tech_Oscar is the dependent variable; this is the variable whose value we are trying to predict.

If there are any missing values, we need to do missing value imputation; that part was covered in the data preprocessing section. I'll run these two commands to impute the missing values. This is similar to what we did earlier.

The next step is to split the data into test and train sets.
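As a rough sketch, the import and imputation steps described above might look like this in R. The file name Movie_classification.csv and the imputed column Time_taken are assumptions for illustration; use the names from the attached resource file:

```r
# Hypothetical file/column names -- match them to the attached dataset
df <- read.csv("Movie_classification.csv")

# Open the data in the spreadsheet viewer (RStudio)
View(df)

# Mean-impute missing values in a numeric column, e.g. a column
# 'Time_taken' that contains NAs (assumed name for illustration)
df$Time_taken[is.na(df$Time_taken)] <- mean(df$Time_taken, na.rm = TRUE)
```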
For that, we will use the package caTools. If it is not already installed, install it first; you can go and check whether caTools is already installed, in which case you do not need to run the install command. You can just run the library(caTools) command to make it active. So now it is active.

Now, in the same way, we will split this data into test and train sets. Let's select these four lines and run them together with Ctrl+Enter. I have changed the names train and test to train_c and test_c so that they do not overwrite the previous test and train sets. So now we have test_c and train_c for the classification model.

Now we are going to install the packages required for creating the decision tree. You know that we used rpart and rpart.plot. So just go ahead and check whether they are installed and active. If they are not installed, run these two install.packages() commands; if they are not active, run these two library() commands. For me, both of them are installed and active, so I'm not going to run these four lines.

Next is building the classification tree. Earlier we had the regtree variable; now we are putting the output into the classtree variable.
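A minimal sketch of the split step with caTools, assuming the dependent variable is named Start_Tech_Oscar and an 80/20 split ratio (both assumptions carried over from the regression template):

```r
# install.packages("caTools")  # run once if the package is not installed
library(caTools)

set.seed(0)  # assumed seed, for a reproducible split
split <- sample.split(df$Start_Tech_Oscar, SplitRatio = 0.8)
train_c <- subset(df, split == TRUE)   # training set for classification
test_c  <- subset(df, split == FALSE)  # test set for classification
```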
The function that we are going to use is the same: rpart. The only change that you need to make is to specify the method: when you are doing classification, you need to use method = "class". All the other settings in the control argument remain the same, for example maxdepth = 3, meaning that the maximum depth of our tree is going to be three levels.

I'll run this, and you can see that the classtree variable is created, which holds the data of the classification tree.

Next, if you want to plot that decision tree, we use the rpart.plot() function. Here you'll just change regtree to classtree, since we want to plot this classification tree. So if I run this, we get the classification tree.

So let us take a look at this classification tree. Initially, 100 percent of my population had an average of 0.53, meaning that there is a 53 percent probability of a movie getting a Start_Tech_Oscar. Now, the first split is based on the Budget variable. If Budget is less than 30,000, which is the case for 8.4 percent of the movies, there is a 93 percent chance of getting an Oscar. For the movies which have a higher budget, we make the next split based on Collection.
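The tree-building and plotting steps above can be sketched as follows; the formula Start_Tech_Oscar ~ . (predict the target from all other columns) is an assumption based on the earlier regression video:

```r
# install.packages(c("rpart", "rpart.plot"))  # run once if needed
library(rpart)
library(rpart.plot)

# method = "class" is the one change from the regression template;
# maxdepth = 3 caps the tree at three levels, as described in the video
classtree <- rpart(Start_Tech_Oscar ~ .,
                   data = train_c,
                   method = "class",
                   control = rpart.control(maxdepth = 3))

# Plot the fitted classification tree
rpart.plot(classtree)
```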
If Collection is more than 63,000, there is a 78.6 percent probability of getting an Oscar, whereas if Collection is less than 63,000, there is a 44 percent probability of getting an Oscar. That node is further split into two parts based on the producer rating.

So using this decision tree, you can predict whether a movie is going to get a Start_Tech_Oscar or not.

Now, when we are trying to predict a value using this classification decision tree, we use the predict() function. This is what we used earlier as well. The difference in this predict function is that the type parameter is not going to be equal to "vector"; the type is going to be equal to "class". So for regression we use type = "vector", and for classification we use type = "class". If I run this, I'll get another variable in my test_c, which has the predictions of whether a movie is going to get an Oscar or not, based on this decision tree.

And lastly, if you want to compare the performance of the classification decision tree, one method is to tabulate the actual values against the predicted values. So if I run this table command, you can see this is what I'm getting. Here, in the rows, we have the actual values.
That is, for the first row, the movie did not get an Oscar. Out of these 47 cases where the movie did not get an Oscar, we predicted correctly for 40 cases. So actually 47 movies did not get the Oscar, and we were right in predicting 40 of those cases.

In the second row, we have 65 observations; that is, 65 movies actually got the Oscar. Out of those 65 movies, we correctly predicted only 23.

So when the movie is actually winning the Oscar, we have a very poor prediction accuracy, whereas when the movie is not actually winning the Oscar, we have a very good prediction accuracy.

If you want to check the overall accuracy: overall, we had 63 correct guesses and 49 incorrect guesses, so 63 out of 112 is the prediction accuracy. If I add this line, you can see that I'm predicting correctly for 56.25 percent of cases. Whereas if, across the whole population, I had simply guessed that each movie would get the Oscar, my prediction accuracy would have been 53.6 percent. But with this model, my accuracy is nearly 56 percent, so there is a very slight improvement in prediction using this decision tree.

In the coming videos, we will learn how to improve the accuracy of decision trees using advanced techniques.
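The prediction and evaluation steps walked through above can be sketched like this (the column name Start_Tech_Oscar inside test_c is an assumption):

```r
# type = "class" returns predicted class labels; the regression
# template used type = "vector" for numeric predictions
test_c$pred <- predict(classtree, newdata = test_c, type = "class")

# Confusion table: actual values in rows, predicted values in columns
tab <- table(actual = test_c$Start_Tech_Oscar, predicted = test_c$pred)
print(tab)

# Overall accuracy = correct predictions / total observations
# (in the video's example, 63 / 112 = 0.5625)
accuracy <- sum(diag(tab)) / sum(tab)
print(accuracy)
```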