1 00:00:00,200 --> 00:00:02,600 Hello and welcome to this R tutorial. 2 00:00:02,600 --> 00:00:04,600 Well, I hope the homework went well. 3 00:00:04,600 --> 00:00:06,133 I hope you've 4 00:00:06,133 --> 00:00:09,900 obtained some interesting results and, without waiting, 5 00:00:09,900 --> 00:00:13,800 I'm going to complete the backward elimination here in R, 6 00:00:13,900 --> 00:00:17,300 so that you can see if you obtained the same results as what we are about to 7 00:00:17,300 --> 00:00:20,100 obtain, and mostly so that you can see how we complete 8 00:00:20,100 --> 00:00:20,633 backward 9 00:00:20,633 --> 00:00:23,200 elimination in R up to the end. 10 00:00:23,200 --> 00:00:24,100 So let's do that. 11 00:00:24,100 --> 00:00:26,433 So in the previous tutorial we went up 12 00:00:26,433 --> 00:00:30,500 to building this regressor without the State 13 00:00:30,500 --> 00:00:32,233 independent variable, 14 00:00:32,233 --> 00:00:33,466 and that completed 15 00:00:33,466 --> 00:00:34,800 step five here 16 00:00:34,800 --> 00:00:36,600 of the backward elimination algorithm. 17 00:00:36,600 --> 00:00:37,900 And so now we need to 18 00:00:37,900 --> 00:00:40,333 go back to step three to look 19 00:00:40,333 --> 00:00:43,300 again for the independent variable that has the highest p-value. 20 00:00:43,300 --> 00:00:46,300 And then we need to compare it to the significance level 21 00:00:46,400 --> 00:00:47,133 to decide 22 00:00:47,133 --> 00:00:49,533 if we need to remove this independent variable 23 00:00:49,533 --> 00:00:51,000 with the highest p-value. 24 00:00:51,000 --> 00:00:52,000 So let's do this. 25 00:00:52,000 --> 00:00:54,033 It's actually already ready. 26 00:00:54,033 --> 00:00:56,466 We just need to select this 27 00:00:56,466 --> 00:00:57,933 and press Command and Control 28 00:00:57,933 --> 00:01:00,000 plus Enter to execute. 29 00:01:00,000 --> 00:01:02,033 And here it is. Perfect.
30 00:01:02,033 --> 00:01:03,866 So let's move that up. 31 00:01:04,866 --> 00:01:05,733 All right. 32 00:01:05,733 --> 00:01:08,033 So as you can see here, we have the new formula: 33 00:01:08,033 --> 00:01:09,166 Profit expressed 34 00:01:09,166 --> 00:01:12,233 as a linear combination of all the independent variables 35 00:01:12,233 --> 00:01:14,366 except for the State. 36 00:01:14,366 --> 00:01:15,100 So that's fine. 37 00:01:15,100 --> 00:01:16,100 And here, therefore, 38 00:01:16,100 --> 00:01:18,100 we have the p-values for this new 39 00:01:18,100 --> 00:01:21,100 team of three independent variables. 40 00:01:21,600 --> 00:01:22,000 All right. 41 00:01:22,000 --> 00:01:23,433 And now what do we see here? 42 00:01:23,433 --> 00:01:23,866 Okay. 43 00:01:23,866 --> 00:01:27,133 So we can see that, of course, the R&D spend is still 44 00:01:27,300 --> 00:01:30,633 highly statistically significant, with three stars 45 00:01:30,633 --> 00:01:32,900 here and a very low p-value. 46 00:01:32,900 --> 00:01:34,733 And mostly, what do we need to do now? 47 00:01:34,733 --> 00:01:36,700 We need to look for the highest p-value. 48 00:01:36,700 --> 00:01:39,666 And that's actually this one, 0.602, 49 00:01:39,666 --> 00:01:41,933 so 60%. 50 00:01:41,933 --> 00:01:45,533 60% is definitely a very high p-value, way 51 00:01:45,533 --> 00:01:47,900 above the significance level of 5%. 52 00:01:47,900 --> 00:01:50,466 So definitely Administration is not 53 00:01:50,466 --> 00:01:52,200 statistically significant. 54 00:01:52,200 --> 00:01:56,100 That is, Administration has no statistically significant effect on the dependent variable Profit. 55 00:01:56,700 --> 00:01:58,633 So great. That's pretty clear. 56 00:01:58,633 --> 00:02:02,933 We need to remove Administration from our regression equation. 57 00:02:03,600 --> 00:02:04,766 So let's do this. 58 00:02:04,766 --> 00:02:06,233 As usual, we're going to copy this, 59 00:02:07,400 --> 00:02:10,300 copy, paste.
60 00:02:10,300 --> 00:02:13,300 And here we'll simply remove Administration. 61 00:02:14,400 --> 00:02:15,433 There we go. 62 00:02:15,433 --> 00:02:17,200 And now we have our new regressor 63 00:02:17,200 --> 00:02:18,100 ready with 64 00:02:18,100 --> 00:02:19,933 only two independent variables, 65 00:02:19,933 --> 00:02:23,133 composed of R&D spend, which we already know is highly 66 00:02:23,133 --> 00:02:24,800 statistically significant, 67 00:02:24,800 --> 00:02:27,700 and the marketing spend. For marketing spend, 68 00:02:27,700 --> 00:02:29,700 so far the p-value is 10%. 69 00:02:29,700 --> 00:02:31,900 Let's see what it will become. 70 00:02:31,900 --> 00:02:36,900 So I'm actually going to build this new model by executing this. 71 00:02:36,900 --> 00:02:39,633 So Command and Control plus Enter to execute. 72 00:02:39,633 --> 00:02:40,533 Here we go. 73 00:02:40,533 --> 00:02:43,166 And now let's find out about 74 00:02:43,166 --> 00:02:45,333 the statistical results of this new 75 00:02:45,333 --> 00:02:46,833 regression model. 76 00:02:46,833 --> 00:02:47,866 Let's select this, 77 00:02:47,866 --> 00:02:50,533 press Command and Control plus Enter to execute. 78 00:02:50,533 --> 00:02:52,800 And here are the new statistical results, 79 00:02:52,800 --> 00:02:54,466 with only the p-values 80 00:02:54,466 --> 00:02:57,733 of the two independent variables, R&D spend and marketing spend. 81 00:02:58,366 --> 00:02:59,200 Okay, so let's see. 82 00:02:59,200 --> 00:03:01,700 Wow, there is something interesting here. 83 00:03:01,700 --> 00:03:02,733 Can you see it? 84 00:03:02,733 --> 00:03:06,300 Well, first, R&D spend is still highly statistically significant. 85 00:03:06,500 --> 00:03:08,033 No surprise about that. 86 00:03:08,033 --> 00:03:11,033 But we actually have one surprise here. 87 00:03:11,033 --> 00:03:15,200 Remember, the p-value of the marketing spend was 10% at the previous step. 88 00:03:15,566 --> 00:03:17,366 Now it's 6%.
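The stars and dots shown next to each p-value follow the significance-code legend that R prints under the coefficient table: `0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1`. As a rough sketch, written in Python rather than R, the mapping could look like the following. The function name `signif_code` is hypothetical, and the handling of a value that falls exactly on a cutpoint is an assumption.

```python
def signif_code(p_value: float) -> str:
    """Return the marker for a p-value, following R's printed legend:
    0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Assumption: a value exactly on a cutpoint is placed in the lower
    (more significant) band.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Upper bound of each band, from most to least significant.
    bands = [(0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")]
    for upper, marker in bands:
        if p_value <= upper:
            return marker
    return " "  # above 10%: no marker at all


# Approximate p-values read aloud in this tutorial.
print(repr(signif_code(0.06)))  # '.'  (marketing spend at this step)
print(repr(signif_code(0.60)))  # ' '  (Administration at the previous step)
```

A p-value of about 6% falls in the band between 5% and 10%, which is why marketing spend is now marked with a dot.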
89 00:03:17,366 --> 00:03:19,466 And actually we won a dot here. 90 00:03:19,466 --> 00:03:20,266 You know, we 91 00:03:20,266 --> 00:03:23,333 jumped from this category between 10% and 100% 92 00:03:23,800 --> 00:03:26,400 to this category between 5% and 10%. 93 00:03:26,400 --> 00:03:27,600 So we actually have 94 00:03:27,600 --> 00:03:30,566 a dot now for this marketing spend independent variable. 95 00:03:30,566 --> 00:03:33,600 Plus, it's very close to the significance level. 96 00:03:33,900 --> 00:03:35,633 It's very close to 5%. 97 00:03:35,633 --> 00:03:38,200 You know, for example, if our significance level was 98 00:03:38,200 --> 00:03:40,733 7%, we would have kept this 99 00:03:40,733 --> 00:03:42,633 independent variable, marketing spend. 100 00:03:42,633 --> 00:03:44,600 So it's kind of arbitrary. 101 00:03:44,600 --> 00:03:45,733 You know, we 102 00:03:45,733 --> 00:03:47,700 don't know if we really need to remove 103 00:03:47,700 --> 00:03:50,166 this just because our backward elimination 104 00:03:50,166 --> 00:03:52,266 tells us to, due to the fact that 105 00:03:52,266 --> 00:03:54,166 we made an arbitrary choice of 106 00:03:54,166 --> 00:03:56,700 5% for the significance level. 107 00:03:56,700 --> 00:03:58,433 So since this is a tutorial 108 00:03:58,433 --> 00:04:01,700 about backward elimination, we will actually remove it. 109 00:04:01,866 --> 00:04:03,900 But this will not be our final 110 00:04:03,900 --> 00:04:06,900 word, because at the end of this part there will be this 111 00:04:06,900 --> 00:04:09,733 section about evaluating models' performance, 112 00:04:09,733 --> 00:04:13,033 and we will actually add a criterion to make 113 00:04:13,033 --> 00:04:15,066 a better call at deciding 114 00:04:15,066 --> 00:04:18,300 if we really need to remove this marketing spend.
115 00:04:18,333 --> 00:04:19,766 Because right now, the call 116 00:04:19,766 --> 00:04:23,566 to remove this marketing spend independent variable is arbitrary. 117 00:04:24,100 --> 00:04:28,800 So we will remove it because we want to follow thoroughly 118 00:04:28,800 --> 00:04:30,866 the backward elimination algorithm. 119 00:04:30,866 --> 00:04:32,300 But keep in mind 120 00:04:32,300 --> 00:04:35,800 that it's not the final word on deciding what the final team is going 121 00:04:35,800 --> 00:04:37,766 to be, and we will get back to this 122 00:04:37,766 --> 00:04:40,200 problem later in this course, that is, 123 00:04:40,200 --> 00:04:41,300 later in this part, 124 00:04:41,300 --> 00:04:45,433 to make a better call about whether or not we need to remove marketing spend. 125 00:04:45,433 --> 00:04:47,266 So let's remove it for now. 126 00:04:47,266 --> 00:04:49,266 That's actually the final solution. 127 00:04:49,266 --> 00:04:51,500 But it's really a good choice of yours 128 00:04:51,500 --> 00:04:56,166 if you actually decided to keep marketing spend; you will be proud of this choice 129 00:04:56,166 --> 00:04:57,600 at the end of this part. 130 00:04:57,600 --> 00:04:58,866 So congratulations 131 00:04:58,866 --> 00:05:01,000 anyway if you decided to keep marketing 132 00:05:01,000 --> 00:05:03,666 spend, and of course, also congratulations to the 133 00:05:03,666 --> 00:05:04,833 others who decided 134 00:05:04,833 --> 00:05:06,733 to remove marketing spend, because that means 135 00:05:06,733 --> 00:05:08,700 you just followed thoroughly the 136 00:05:08,700 --> 00:05:10,700 backward elimination algorithm. 137 00:05:10,700 --> 00:05:13,066 So congratulations to both of you. 138 00:05:13,066 --> 00:05:17,233 And let's finish this backward elimination algorithm. 139 00:05:17,233 --> 00:05:20,233 Because actually, you know, this is definitely the final step.
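To make the point about the arbitrary threshold concrete, here is a small Python sketch (the tutorial itself works in R). It applies the keep-or-remove rule to marketing spend's p-value of roughly 6% under the two significance levels mentioned above, 5% and 7%. The function name `should_remove` is hypothetical and introduced only for illustration.

```python
def should_remove(p_value: float, significance_level: float) -> bool:
    """Backward elimination rule: a predictor is removed when its
    p-value is strictly above the chosen significance level."""
    return p_value > significance_level


marketing_p = 0.06  # approximate value read aloud in the tutorial

for sl in (0.05, 0.07):
    decision = "remove" if should_remove(marketing_p, sl) else "keep"
    print(f"SL = {sl:.2f}: {decision} marketing spend")
# SL = 0.05: remove marketing spend
# SL = 0.07: keep marketing spend
```

The same p-value leads to opposite decisions, which is why the removal is treated here as provisional until a further criterion is introduced later in this part.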
140 00:05:20,566 --> 00:05:24,100 We have only one independent variable left, R&D spend, 141 00:05:24,100 --> 00:05:27,033 and we already know it's highly statistically significant. 142 00:05:27,033 --> 00:05:28,766 But let's actually make this model 143 00:05:28,766 --> 00:05:31,633 by selecting and executing this. 144 00:05:31,633 --> 00:05:33,633 Here we go, model created. 145 00:05:33,633 --> 00:05:35,166 And now let's have a final 146 00:05:35,166 --> 00:05:38,100 look at the statistical 147 00:05:38,100 --> 00:05:39,633 information of our model. 148 00:05:39,633 --> 00:05:40,500 So let's press 149 00:05:40,500 --> 00:05:42,600 Command and Control plus Enter to execute. 150 00:05:42,600 --> 00:05:47,033 And here is the final information of the final optimal team, 151 00:05:47,400 --> 00:05:50,900 a team that is actually composed of one independent variable. 152 00:05:50,900 --> 00:05:53,366 So it's actually funny to call it a team, 153 00:05:53,366 --> 00:05:55,500 but anyway, that's just a team of 154 00:05:55,500 --> 00:05:58,366 one. That happens. But wait for the final 155 00:05:58,366 --> 00:06:00,066 part to see if a team 156 00:06:00,066 --> 00:06:02,933 of one can really be the best team. 157 00:06:02,933 --> 00:06:03,866 So that's it. 158 00:06:03,866 --> 00:06:06,266 We actually completed backward elimination. 159 00:06:06,266 --> 00:06:10,866 Our final model is composed of only one independent variable, the R&D spend. 160 00:06:11,366 --> 00:06:12,166 And we can clearly 161 00:06:12,166 --> 00:06:13,966 see that this independent variable 162 00:06:13,966 --> 00:06:15,266 is highly statistically 163 00:06:15,266 --> 00:06:17,000 significant, not only because 164 00:06:17,000 --> 00:06:18,433 we have three stars here, 165 00:06:18,433 --> 00:06:21,133 but also because the p-value is really, really small. 166 00:06:21,133 --> 00:06:22,500 So definitely that's very
167 00:06:22,500 --> 00:06:24,733 valuable information for the investors, 168 00:06:24,733 --> 00:06:26,533 who should really look at 169 00:06:26,533 --> 00:06:28,600 this independent variable to add 170 00:06:28,600 --> 00:06:31,600 another criterion to their investment decisions. 171 00:06:32,366 --> 00:06:33,900 So again, congratulations 172 00:06:33,900 --> 00:06:35,366 to both of you who 173 00:06:35,366 --> 00:06:39,233 either found a final team of only one independent variable, 174 00:06:39,600 --> 00:06:42,300 the R&D spend, or a final team of two 175 00:06:42,300 --> 00:06:45,433 independent variables, the R&D spend and the marketing spend. 176 00:06:45,866 --> 00:06:49,133 And for those of you who found another team, make sure to compare 177 00:06:49,133 --> 00:06:50,666 the steps that you did yourself 178 00:06:50,666 --> 00:06:53,800 with the steps that we did in this tutorial to spot which 179 00:06:53,800 --> 00:06:56,400 difference led you to different results. 180 00:06:56,400 --> 00:06:58,866 You can always ask me some questions in the Q&A. 181 00:06:58,866 --> 00:07:00,900 I'll be happy to help you with your model. 182 00:07:00,900 --> 00:07:04,566 But clearly the final team is either the R&D spend or the R&D 183 00:07:04,566 --> 00:07:06,033 spend plus the marketing spend. 184 00:07:06,033 --> 00:07:08,200 And that's what we also obtain in Python, 185 00:07:08,200 --> 00:07:09,600 and that's what you will also 186 00:07:09,600 --> 00:07:13,533 obtain in some other programming language or machine learning package. 187 00:07:14,133 --> 00:07:15,900 So thank you for watching this tutorial. 188 00:07:15,900 --> 00:07:17,266 Congratulations again. 189 00:07:17,266 --> 00:07:19,200 I hope you enjoyed doing this homework. 190 00:07:19,200 --> 00:07:21,866 You will have some other homework in the other parts, 191 00:07:21,866 --> 00:07:26,266 so you'll definitely practice, which will help you shape your expertise
192 00:07:26,266 --> 00:07:28,000 in machine learning. 193 00:07:28,000 --> 00:07:29,700 And speaking of machine learning, 194 00:07:29,700 --> 00:07:31,800 I look forward to seeing you in the next tutorial. 195 00:07:31,800 --> 00:07:33,733 And until then, enjoy machine learning.
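The procedure followed across this tutorial can be summarised as a loop: fit the model, find the predictor with the highest p-value, remove it if that p-value is above the significance level, then refit. The Python sketch below (the tutorial itself uses R) is a minimal illustration under stated assumptions. The names `backward_elimination` and `lookup_p_values` are hypothetical. Instead of fitting a real regression, the stand-in `lookup_p_values` returns the approximate p-values read aloud in the tutorial, with a placeholder for R&D spend, whose exact p-value is only described as very small.

```python
from typing import Callable, Dict, FrozenSet, List

PValues = Dict[str, float]


def backward_elimination(
    predictors: List[str],
    fit_p_values: Callable[[List[str]], PValues],
    significance_level: float = 0.05,
) -> List[str]:
    """Repeatedly drop the predictor with the highest p-value until
    every remaining p-value is at or below the significance level."""
    remaining = list(predictors)
    while remaining:
        p_values = fit_p_values(remaining)  # refit on the current predictors
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] <= significance_level:
            break  # nothing left to remove: the model is ready
        remaining.remove(worst)
    return remaining


# Stand-in for refitting: approximate p-values quoted in the tutorial.
# The R&D spend value is a placeholder, not a figure from the tutorial.
RD_PLACEHOLDER = 1e-9
_QUOTED: Dict[FrozenSet[str], PValues] = {
    frozenset({"R&D spend", "Administration", "Marketing spend"}): {
        "R&D spend": RD_PLACEHOLDER,
        "Administration": 0.60,
        "Marketing spend": 0.10,
    },
    frozenset({"R&D spend", "Marketing spend"}): {
        "R&D spend": RD_PLACEHOLDER,
        "Marketing spend": 0.06,
    },
    frozenset({"R&D spend"}): {"R&D spend": RD_PLACEHOLDER},
}


def lookup_p_values(predictors: List[str]) -> PValues:
    """Hypothetical replacement for fitting a model and reading its summary."""
    return _QUOTED[frozenset(predictors)]


start = ["R&D spend", "Administration", "Marketing spend"]
print(backward_elimination(start, lookup_p_values, 0.05))
# ['R&D spend']
```

In a real analysis, `lookup_p_values` would be replaced by a function that fits the regression on the given predictors and extracts the p-values from its summary.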