1
00:00:00,820 --> 00:00:03,460
So let us see how to do pruning in R.

2
00:00:05,210 --> 00:00:07,670
So the first step is to create the full tree.

3
00:00:08,020 --> 00:00:11,380
That is, we will not stop it at any point.

4
00:00:11,470 --> 00:00:13,930
We will not give it a stopping constraint.

5
00:00:14,710 --> 00:00:19,220
We will let it grow to its maximum, to its maximum length.

6
00:00:20,920 --> 00:00:23,120
So you know how to grow a tree.

7
00:00:23,310 --> 00:00:24,880
You'll use the rpart library.

8
00:00:25,960 --> 00:00:32,120
You need to install it and make it active before using it, like we did in the last video.

9
00:00:35,120 --> 00:00:37,100
So we are using this rpart library.

10
00:00:37,190 --> 00:00:40,530
We will create the tree; the only difference here is

11
00:00:40,730 --> 00:00:43,560
I'm using this control parameter.

12
00:00:44,710 --> 00:00:49,490
The cp is that same tuning parameter, alpha, that we discussed earlier.

13
00:00:52,390 --> 00:00:58,240
Since we have put cp equal to zero, the tree will grow as a normal tree.

14
00:00:58,630 --> 00:01:03,820
There will not be any pruning, and we will get the maximum length for this tree.

15
00:01:04,510 --> 00:01:07,330
So if I run this command, I press Control+Enter.

16
00:01:13,070 --> 00:01:19,850
So we need to tick this check box so that we can build this tree.

17
00:01:20,550 --> 00:01:21,680
So we'll run that command.

18
00:01:26,500 --> 00:01:35,050
And in the fulltree variable, we have the details of our complete regression tree with no pruning.

19
00:01:36,930 --> 00:01:45,210
Now, if you want to just have a look at this tree, we can run this rpart.plot function, and you will

20
00:01:45,210 --> 00:01:46,830
see that it is a huge tree.

21
00:01:50,210 --> 00:01:58,700
Now, when we run this regression tree using rpart, it by default runs the tree on different values

22
00:01:58,700 --> 00:01:59,570
of cp.
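The commands narrated above might look like the following sketch. The variable name `fulltree` and `cp = 0` follow the video; the formula `Price ~ .` and the data frame name `train` are assumptions, since the transcript does not spell them out:

```r
# Install once with install.packages(c("rpart", "rpart.plot")),
# then make the libraries active
library(rpart)
library(rpart.plot)

# Grow the full regression tree with no pruning: cp = 0 means
# no split is rejected for giving too small an improvement,
# so the tree grows to its maximum size
fulltree <- rpart(Price ~ ., data = train,
                  control = rpart.control(cp = 0))

# Plot the full tree -- it will be huge
rpart.plot(fulltree)
```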
23
00:02:01,190 --> 00:02:05,510
That is, this control parameter, which is the tuning parameter alpha.

24
00:02:07,550 --> 00:02:15,590
If you want to look at how the relative error in our tree is changing with the value of cp, we can

25
00:02:16,370 --> 00:02:18,680
run this command, printcp.

26
00:02:20,790 --> 00:02:21,720
You can see here.

27
00:02:23,690 --> 00:02:25,610
The different values of cp.

28
00:02:27,780 --> 00:02:30,670
We get different relative error values.

29
00:02:32,370 --> 00:02:36,270
This xerror value is the cross-validation error.

30
00:02:38,160 --> 00:02:43,920
We want to find out that value of cp for which the cross-validation error is minimum.

31
00:02:47,230 --> 00:02:52,210
That value of cp, our tuning parameter, will be used to prune the tree.

32
00:02:53,260 --> 00:02:56,720
Now you can also plot this.

33
00:02:56,770 --> 00:02:58,910
These are the values of cp and xerror.

34
00:02:59,320 --> 00:03:01,150
By using this plotcp function.

35
00:03:03,400 --> 00:03:10,780
And you can see that as we keep on increasing the value of cp, the relative error is initially decreasing.

36
00:03:11,350 --> 00:03:13,600
Then it starts to increase and so on.

37
00:03:14,110 --> 00:03:21,450
So basically, somewhere around here, there should be some value of cp at which the relative error is minimum.

38
00:03:23,700 --> 00:03:30,500
So to find out that particular point, that particular value of cp at which the cross-validated relative

39
00:03:30,510 --> 00:03:32,980
error is minimum, we will run this command.

40
00:03:34,360 --> 00:03:35,020
In this command,

41
00:03:35,540 --> 00:03:44,030
we are finding that value of the CP variable for which the xerror value in this cp table is minimum.

42
00:03:45,900 --> 00:03:50,940
So that cp value will be stored in this mincp variable.

43
00:03:51,840 --> 00:03:57,230
So if I run this command, a new variable is created, mincp.
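A sketch of the steps just described: printing and plotting the cp table, then picking the cp value with the minimum cross-validated error. The names `fulltree` and `mincp` are the video's own; the cptable column names ("CP", "xerror") are rpart's standard ones:

```r
# Print the cp table: one row per candidate cp value,
# with the cross-validated error in the xerror column
printcp(fulltree)

# Plot cross-validated relative error against cp
plotcp(fulltree)

# Pick the cp value from the row with the smallest xerror
mincp <- fulltree$cptable[which.min(fulltree$cptable[, "xerror"]), "CP"]
mincp
```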
44
00:03:57,720 --> 00:04:00,670
And it has a value of nearly zero point zero one.

45
00:04:02,730 --> 00:04:12,030
Now, using this value of cp, our tuning parameter, we will run the prune command, and the same tree

46
00:04:12,120 --> 00:04:13,140
will be pruned.

47
00:04:16,400 --> 00:04:20,420
We will prune the full tree using this value of cp.

48
00:04:20,990 --> 00:04:22,460
So we will use the prune function.

49
00:04:23,270 --> 00:04:30,720
We will input the fulltree variable, which contains the information of the full regression tree, and cp

50
00:04:30,750 --> 00:04:32,030
equal to this mincp,

51
00:04:32,230 --> 00:04:37,160
the cp value at which we were getting the minimum cross-validated relative error.

52
00:04:40,590 --> 00:04:41,530
And I run this command.

53
00:04:44,280 --> 00:04:47,940
The prunedtree variable now has the information of the pruned tree.

54
00:04:49,950 --> 00:04:55,500
If you want to plot this pruned tree, we will run this command, which is rpart.plot.

55
00:04:57,420 --> 00:04:59,980
You can see that there are only four levels left now.

56
00:05:00,780 --> 00:05:08,460
The tree is much cleaner, more interpretable, and possibly it has less overfitting.

57
00:05:08,660 --> 00:05:15,270
Now, to judge the performance of this tree on the test set,

58
00:05:16,510 --> 00:05:23,050
and compare it with the performance of the full tree and the tree that we created earlier, we will

59
00:05:23,160 --> 00:05:24,820
run these commands. First,

60
00:05:24,820 --> 00:05:35,420
we will predict the values on the test set using the predict function on the full tree. We will

61
00:05:35,600 --> 00:05:35,860
run this.

62
00:05:39,710 --> 00:05:47,210
And now in the test set, we have another variable, fulltree, which contains predictions using the information

63
00:05:47,210 --> 00:05:49,400
in the fulltree regression tree.

64
00:05:50,540 --> 00:05:55,490
So if you want to look at the predictions, you can click on testset.
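The prune and predict steps above might be sketched as follows, using the video's variable names (`fulltree`, `mincp`, `prunedtree`); the test data frame name `testset` follows the transcript, but how it was created is not shown here:

```r
# Prune the full tree at the cp value with the lowest
# cross-validated error
prunedtree <- prune(fulltree, cp = mincp)

# Plot the pruned tree -- far fewer levels than the full tree
rpart.plot(prunedtree)

# Store the full tree's predictions as a new column in the test set
testset$fulltree <- predict(fulltree, newdata = testset)
```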
65
00:05:58,850 --> 00:06:02,460
And you can go to the end and see that there is a new column, fulltree.

66
00:06:03,560 --> 00:06:04,850
It has the predictions in it.

67
00:06:06,740 --> 00:06:15,650
You can see for the first row the predicted value is nearly forty-six thousand. This prediction

68
00:06:15,650 --> 00:06:21,920
value is for that tree which we created last time, which had a maximum depth of three.

69
00:06:22,940 --> 00:06:24,990
This current tree had no such constraint.

70
00:06:25,430 --> 00:06:27,020
It had a lot of levels.

71
00:06:27,710 --> 00:06:31,600
And this tree is predicting nearly fifty-nine thousand.

72
00:06:32,900 --> 00:06:35,770
This seems far from the actual value.

73
00:06:36,410 --> 00:06:38,130
But if you look at the second observation,

74
00:06:39,500 --> 00:06:46,880
the prediction of the previous tree was far from the actual value, but the prediction of this tree, of

75
00:06:46,880 --> 00:06:50,770
the full tree, is very near to the actual value.

76
00:06:54,980 --> 00:07:02,660
So we have the predicted values. Using the predicted values, we can find out the MSE, so that we

77
00:07:02,660 --> 00:07:09,470
are able to compare it with the MSE of the previously created tree. We will create a new variable,

78
00:07:09,470 --> 00:07:09,890
MSEfull.

79
00:07:11,780 --> 00:07:13,070
And we will run this command.

80
00:07:15,630 --> 00:07:22,200
Now, you can see that when we had a depth of three, when we applied the constraint that the depth

81
00:07:22,230 --> 00:07:23,690
should be maximum three,

82
00:07:24,180 --> 00:07:27,670
we had an MSE of one hundred thirteen million.

83
00:07:29,070 --> 00:07:35,550
But when we create the full tree, that is, when we do not give any constraint on the tree,

84
00:07:36,150 --> 00:07:39,480
that tree has an MSE of ninety-nine million.

85
00:07:39,720 --> 00:07:42,900
So there is a considerable decrease in the MSE value.
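The MSE calculation described above might look like this; the variable name `MSEfull` follows the video, while the target column name `Price` is an assumption, since the transcript does not name the response variable:

```r
# Test-set mean squared error of the full tree:
# average squared difference between actual and predicted values
MSEfull <- mean((testset$Price - testset$fulltree)^2)
MSEfull
```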
86
00:07:44,840 --> 00:07:51,590
Now, let us check the performance of the pruned tree. So we'll first find out the predictions using the

87
00:07:51,590 --> 00:07:54,260
prunedtree variable.

88
00:07:56,040 --> 00:08:02,280
Those predicted values are in another column in the test set, so we can again click on testset.

89
00:08:03,150 --> 00:08:09,690
We can go to the last column, and you can see there is an additional column, pruned, which has the predictions

90
00:08:09,780 --> 00:08:11,010
from the pruned tree.

91
00:08:11,460 --> 00:08:15,360
Now let us find out the MSE of this and compare its performance.

92
00:08:17,670 --> 00:08:24,500
We'll run this command, and you can see that we have the MSE of the pruned tree at ninety-seven million.

93
00:08:24,680 --> 00:08:31,770
There is a slight improvement in the prediction accuracy of the pruned tree over the full tree.

94
00:08:32,630 --> 00:08:35,570
This improvement may seem counterintuitive.

95
00:08:36,260 --> 00:08:38,570
That is, this is a smaller tree,

96
00:08:39,020 --> 00:08:41,150
whereas the full tree was a very large tree.

97
00:08:41,330 --> 00:08:45,410
So you may feel that the full tree should have more prediction accuracy.

98
00:08:45,920 --> 00:08:52,040
But the reason here is that in the full tree, we were overfitting on the training data.

99
00:08:52,880 --> 00:09:00,890
So using the alpha value at which we had the lowest cross-validated error, we have pruned this tree

100
00:09:02,420 --> 00:09:05,170
to get less error on the test set.

101
00:09:07,790 --> 00:09:14,690
This is how we do tree pruning: to get a shorter tree with better prediction accuracy on the test set

102
00:09:14,960 --> 00:09:16,940
and more interpretability.
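The pruned-tree evaluation above could be sketched as follows. Again, `prunedtree` and the `pruned` column follow the video, while the target column `Price` is an assumption; the numbers in the comment are the approximate values quoted in the video:

```r
# Predictions from the pruned tree, stored as a new test-set column
testset$pruned <- predict(prunedtree, newdata = testset)

# Test-set MSE of the pruned tree
MSEpruned <- mean((testset$Price - testset$pruned)^2)
MSEpruned

# Comparison from the video (approximate):
#   depth-3 tree ~113 million, full tree ~99 million,
#   pruned tree ~97 million -- smaller, yet more accurate,
#   because the full tree overfits the training data
```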