1 00:00:01,000 --> 00:00:03,700 And finally, interpreting coefficients. 2 00:00:03,700 --> 00:00:06,233 In front of us here, we've got the four models that we built 3 00:00:06,233 --> 00:00:09,233 in our backward elimination method. 4 00:00:09,266 --> 00:00:13,500 And as we agreed, model number three is the best fitted model. 5 00:00:13,766 --> 00:00:16,766 And if we were to deliver a model on our project, it would be this one. 6 00:00:17,100 --> 00:00:20,100 So let's go ahead and start interpreting coefficients by looking at this model. 7 00:00:20,666 --> 00:00:24,166 Here we've got two variables, R&D spend and marketing spend. 8 00:00:24,166 --> 00:00:25,600 So it's basically how much 9 00:00:25,600 --> 00:00:29,233 companies are spending on their research and development and on their marketing. 10 00:00:29,700 --> 00:00:32,766 And we've also got the constant, and we're predicting profit. 11 00:00:33,400 --> 00:00:38,466 So these coefficients here, these are B1 and B2. 12 00:00:39,533 --> 00:00:40,566 So how do we 13 00:00:40,566 --> 00:00:44,066 interpret what they're telling us? 14 00:00:44,333 --> 00:00:45,600 First of all, you look at the sign. 15 00:00:45,600 --> 00:00:50,866 If the sign is positive, that means 16 00:00:51,366 --> 00:00:56,100 your independent variable is positively correlated with your dependent variable, 17 00:00:56,733 --> 00:01:01,766 meaning that if you change the value of your independent variable, 18 00:01:02,000 --> 00:01:05,266 then you can see that 19 00:01:05,266 --> 00:01:09,266 the dependent variable will be changing in the same direction. 20 00:01:09,300 --> 00:01:12,333 So basically, if you're increasing spend on R&D, then 21 00:01:12,333 --> 00:01:15,500 your profit will be increasing. If you're increasing spend on marketing, 22 00:01:15,500 --> 00:01:17,966 then your profit is also increasing, which makes sense, right?
23 00:01:17,966 --> 00:01:21,200 So if you're spending more on research and development and making your product 24 00:01:21,200 --> 00:01:26,200 better, then probably your profits should increase eventually as well. 25 00:01:26,400 --> 00:01:27,366 Same thing with marketing. 26 00:01:27,366 --> 00:01:30,666 The more you spend on marketing, the more you sell, and therefore 27 00:01:30,933 --> 00:01:32,166 the profits should also go up. 28 00:01:33,400 --> 00:01:35,100 And that's the sign. 29 00:01:35,100 --> 00:01:37,200 If the sign were negative, 30 00:01:37,200 --> 00:01:38,400 then it's the opposite effect. 31 00:01:38,400 --> 00:01:38,900 So basically, 32 00:01:38,900 --> 00:01:42,433 you increase your independent variable and your dependent variable decreases. 33 00:01:43,733 --> 00:01:45,233 And now let's look at the magnitude. 34 00:01:45,233 --> 00:01:50,200 So here you can see right away that the magnitude is higher for R&D spend. 35 00:01:50,200 --> 00:01:53,100 And the magnitude is lower for marketing spend. 36 00:01:53,100 --> 00:01:56,033 Magnitude is always tricky with regressions. 37 00:01:56,033 --> 00:01:57,533 Be careful with magnitude. 38 00:01:57,533 --> 00:01:59,666 Sign is kind of, you know, pretty obvious. 39 00:01:59,666 --> 00:02:04,100 It's either one way or the other. Magnitude can really trip you up. 40 00:02:04,100 --> 00:02:05,233 And I can give you an example here. 41 00:02:05,233 --> 00:02:08,933 So you might think that, okay, right away, the magnitude is greater. 42 00:02:08,933 --> 00:02:13,166 So this coefficient for R&D spend is bigger than the marketing 43 00:02:13,166 --> 00:02:13,833 spend coefficient. 44 00:02:13,833 --> 00:02:16,833 So definitely R&D spend has a bigger impact.
45 00:02:16,833 --> 00:02:21,166 Well, what if I tell you that, without changing 46 00:02:21,166 --> 00:02:25,000 anything in the regression, I could easily make the marketing coefficient bigger, 47 00:02:26,033 --> 00:02:26,866 by a lot, 48 00:02:26,866 --> 00:02:29,866 even bigger than the R&D spend coefficient? 49 00:02:30,000 --> 00:02:31,233 Well, it's easy to do. 50 00:02:31,233 --> 00:02:35,300 All I have to do is say: marketing spend, instead of looking at it in dollars, 51 00:02:35,433 --> 00:02:38,233 how about I look at it in a bigger unit? 52 00:02:38,233 --> 00:02:40,600 So, marketing spend. 53 00:02:40,600 --> 00:02:43,833 Let's look at it in hundreds of dollars. So marketing spend, instead of 54 00:02:43,833 --> 00:02:48,066 counting it in dollars, let's count it in hundreds of dollars. Right away, because our variable 55 00:02:48,066 --> 00:02:52,100 is going down 100 times, our coefficient will proportionally increase. 56 00:02:52,100 --> 00:02:53,300 So it will go up 100 times. 57 00:02:53,300 --> 00:02:57,266 And if I change my data and replace 58 00:02:57,266 --> 00:03:01,000 marketing spend everywhere with hundreds of dollars and leave everything else the same, 59 00:03:01,266 --> 00:03:02,633 then I rerun this model, 60 00:03:02,633 --> 00:03:07,966 I guarantee you that the coefficient here will actually become 2.99. 61 00:03:07,966 --> 00:03:09,566 So it'll just increase 100 times. 62 00:03:09,566 --> 00:03:11,233 Everything else will stay the same. 63 00:03:11,233 --> 00:03:14,300 And that way, right away you'll see, oh well, the marketing spend 64 00:03:14,300 --> 00:03:16,300 coefficient is greater. So then marketing spend 65 00:03:16,300 --> 00:03:21,033 has a bigger impact on your dependent variable, profit. 66 00:03:21,033 --> 00:03:23,766 And that's a mistake that a lot of beginners make. 67 00:03:23,766 --> 00:03:26,100 You should not fall into this trap. 68 00:03:26,100 --> 00:03:27,533 Magnitude is a tricky thing.
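A minimal sketch of the rescaling effect described above, using made-up data rather than the dataset shown in the video. The `ols` helper, the variable names, and all the numbers are assumptions for illustration only. It shows that expressing a predictor in a larger unit (values 100 times smaller) multiplies its coefficient by 100, that expressing it in a smaller unit such as cents (values 100 times larger) divides its coefficient by 100, and that the other coefficients do not move in either case.

```python
import random


def ols(columns, y):
    """Fit y = b0 + b1*x1 + ... by least squares via the normal equations.

    columns: list of predictor columns (lists of floats); returns [b0, b1, ...].
    Assumes the predictors are not perfectly collinear.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination; X'X is positive definite under the
    # assumption above, so the diagonal pivots are nonzero.
    for c in range(k):
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data (not the course dataset): spends and profit in dollars.
random.seed(0)
n = 50
rd = [random.uniform(20_000, 160_000) for _ in range(n)]
marketing = [random.uniform(50_000, 450_000) for _ in range(n)]
profit = [50_000 + 0.8 * r + 0.03 * m + random.gauss(0, 5_000)
          for r, m in zip(rd, marketing)]

b_dollars = ols([rd, marketing], profit)

# Same spend expressed in hundreds of dollars: values shrink 100x.
b_hundreds = ols([rd, [m / 100 for m in marketing]], profit)

# Same spend expressed in cents: values grow 100x.
b_cents = ols([rd, [m * 100 for m in marketing]], profit)

print("marketing coefficient, dollars:            ", b_dollars[2])
print("marketing coefficient, hundreds of dollars:", b_hundreds[2])
print("marketing coefficient, cents:              ", b_cents[2])
print("R&D coefficient in the three fits:         ",
      b_dollars[1], b_hundreds[1], b_cents[1])
```

Only the unit of one column changes between the three fits, so comparing raw coefficient sizes across predictors says nothing until the units are stated.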
69 00:03:27,533 --> 00:03:30,533 And the way to think about it is to always state 70 00:03:30,900 --> 00:03:35,233 magnitude in terms of units of the independent variable. 71 00:03:35,233 --> 00:03:39,933 So the correct way to analyze this works even without knowing what they're measured in. 72 00:03:39,933 --> 00:03:41,766 Maybe this one's measured in thousands of dollars 73 00:03:41,766 --> 00:03:44,133 and this one's measured in dollars, I don't know. 74 00:03:44,133 --> 00:03:47,666 Although I do know, of course, I've seen the data. 75 00:03:47,666 --> 00:03:51,700 But if, say, I don't know, you can still make a conclusion. 76 00:03:51,900 --> 00:03:55,800 All you have to say is, R&D spend 77 00:03:56,266 --> 00:03:59,266 has a greater impact on profit 78 00:03:59,866 --> 00:04:02,866 per unit of R&D spend 79 00:04:03,100 --> 00:04:07,633 than marketing spend has per unit of marketing spend. 80 00:04:07,933 --> 00:04:08,533 And that's all. 81 00:04:08,533 --> 00:04:12,433 So basically, even if they're measured in different things, by saying per 82 00:04:12,433 --> 00:04:17,366 unit of the underlying variable, you are protecting yourself 83 00:04:17,366 --> 00:04:22,233 from that error, because, you know, they may be measured in different things. 84 00:04:22,800 --> 00:04:25,266 And moreover, imagine if 85 00:04:25,266 --> 00:04:27,933 one's measured in dollars and the other one's measured in, 86 00:04:27,933 --> 00:04:31,733 you know, kilometers or something. You can't compare dollars and kilometers, 87 00:04:31,733 --> 00:04:31,933 right? 88 00:04:31,933 --> 00:04:36,433 But you can always say per unit. And this leads us to the actual 89 00:04:36,433 --> 00:04:40,966 interpretation of these coefficients. 90 00:04:41,200 --> 00:04:42,600 What does it mean?
So 91 00:04:43,766 --> 00:04:47,400 this 0.79 means that for every unit, 92 00:04:47,633 --> 00:04:51,233 if you keep all other variables constant, and here you only have one other variable, 93 00:04:51,233 --> 00:04:56,333 so if you keep everything else constant but you are able to adjust R&D spend 94 00:04:57,033 --> 00:05:00,033 in this model for a hypothetical company, 95 00:05:00,200 --> 00:05:04,166 for every dollar, or for every unit, of R&D spend that you 96 00:05:04,766 --> 00:05:07,766 increase, your profit will increase. 97 00:05:07,866 --> 00:05:11,166 According to this model, your profit will increase by $0.79. 98 00:05:11,866 --> 00:05:15,233 That's exactly what this coefficient is saying. 99 00:05:15,266 --> 00:05:18,633 And so for every unit that you decrease in your R&D spend, 100 00:05:18,766 --> 00:05:24,233 your profit will decrease by 0.79 of a unit of profit. 101 00:05:24,900 --> 00:05:28,233 And because R&D spend is measured in dollars and profit is also measured in dollars, 102 00:05:28,500 --> 00:05:34,133 that means that for every dollar 103 00:05:34,166 --> 00:05:38,733 increase in R&D spend, your profit will increase by $0.79. 104 00:05:39,700 --> 00:05:41,866 So let me just repeat that again: 105 00:05:41,866 --> 00:05:46,733 you are looking at unit increases in R&D spend. 106 00:05:46,866 --> 00:05:51,433 They translate through this coefficient into unit increases in profit. 107 00:05:51,600 --> 00:05:54,600 So even if your profit was measured in apples, 108 00:05:55,100 --> 00:05:59,933 the statement about a one-unit increase is always going to be true. 109 00:05:59,933 --> 00:06:03,533 A one-unit increase in R&D spend will drive 110 00:06:03,800 --> 00:06:07,266 a 0.79-unit increase in profit, 111 00:06:07,266 --> 00:06:09,833 in units of profit, of course. 112 00:06:09,833 --> 00:06:12,466 So let's turn that into dollars and apples.
113 00:06:12,466 --> 00:06:16,100 A $1 increase in R&D spend 114 00:06:16,100 --> 00:06:21,866 will drive a 0.79, or about 80% of an apple, increase in profit. 115 00:06:21,866 --> 00:06:26,200 So in R&D spend, you're talking about units of R&D spend. In profit, 116 00:06:26,200 --> 00:06:28,500 you're talking about units of profit. 117 00:06:28,500 --> 00:06:31,500 And this coefficient links them together. 118 00:06:31,533 --> 00:06:34,533 So as long as you say per unit, you're fine. 119 00:06:34,566 --> 00:06:35,666 Once you know 120 00:06:35,666 --> 00:06:39,433 what these variables are measured in, then you can start comparing them, 121 00:06:39,433 --> 00:06:40,566 if they're on the same scale. 122 00:06:40,566 --> 00:06:42,200 And in this case they are on the same scale, 123 00:06:42,200 --> 00:06:44,433 because everything's measured in dollars here. 124 00:06:44,433 --> 00:06:47,800 And you can say that a dollar increase 125 00:06:47,800 --> 00:06:53,800 in R&D spend drives a $0.79 increase in profit, and a dollar 126 00:06:53,800 --> 00:06:58,200 increase in marketing spend drives only a $0.03 increase in profit. 127 00:06:58,200 --> 00:07:01,300 So basically, if you're a venture capitalist, 128 00:07:01,566 --> 00:07:03,866 which companies are you going to invest in? Well, 129 00:07:03,866 --> 00:07:06,766 from this model you're going to decide that you should invest in 130 00:07:06,766 --> 00:07:09,766 companies that are spending more on their R&D. 131 00:07:10,233 --> 00:07:12,400 There may be lots of reasons behind this. 132 00:07:12,400 --> 00:07:14,400 And this is not just a random fact. 133 00:07:14,400 --> 00:07:18,733 This could be true in reality because profit is revenue minus expenses.
134 00:07:18,733 --> 00:07:22,166 Maybe marketing does drive a lot of revenue to a company, 135 00:07:22,166 --> 00:07:24,900 but at the same time, maybe for these specific companies 136 00:07:24,900 --> 00:07:28,866 that we're looking at, their expense, the cost of marketing, 137 00:07:28,866 --> 00:07:33,600 so the prices that they pay for marketing, is so high that the increase 138 00:07:33,600 --> 00:07:37,233 in profit, the net increase in profit, is actually marginal. 139 00:07:37,233 --> 00:07:40,333 So the marketing is eating up a lot of the revenue that it's creating, 140 00:07:40,333 --> 00:07:43,333 whereas R&D is, you know, creating a lot of 141 00:07:43,733 --> 00:07:46,466 revenue, and a lot of it is actually staying in profit, something like that. 142 00:07:46,466 --> 00:07:48,400 But we're not worried about that right now. 143 00:07:48,400 --> 00:07:49,400 That's more financials. 144 00:07:49,400 --> 00:07:52,200 What we're doing is we're delivering a model. 145 00:07:52,200 --> 00:07:55,866 So that's how you interpret coefficients with linear regression. 146 00:07:55,866 --> 00:07:57,300 It's very simple. 147 00:07:57,300 --> 00:08:02,333 Just remember that per unit trick, or tip, I guess, 148 00:08:03,033 --> 00:08:07,433 because if you forget about that you can, you know, draw the wrong conclusions. 149 00:08:08,200 --> 00:08:10,666 Otherwise it's all pretty simple. 150 00:08:10,666 --> 00:08:13,666 The last thing that I wanted to mention here is that 151 00:08:13,833 --> 00:08:16,833 you can see that every time we run a model, the coefficients change. 152 00:08:17,200 --> 00:08:21,500 So what that is telling us is that coefficients actually only talk 153 00:08:21,500 --> 00:08:24,566 about the additional effect of every single variable, 154 00:08:25,166 --> 00:08:27,266 given that the other variables are already in place.
155 00:08:27,266 --> 00:08:31,166 So for instance, in this example, your coefficient of marketing, 156 00:08:31,166 --> 00:08:33,266 which is 0.0299, 157 00:08:33,266 --> 00:08:36,366 means that given that R&D spend 158 00:08:36,366 --> 00:08:39,833 is already in your model and it's fixed, 159 00:08:40,433 --> 00:08:45,000 then marketing spend, you know, contributes 160 00:08:45,000 --> 00:08:49,700 this additional effect of 0.0299, meaning that if you were to 161 00:08:50,533 --> 00:08:52,833 run a different model and take out R&D spend, 162 00:08:52,833 --> 00:08:54,733 then the coefficient would be completely different. 163 00:08:54,733 --> 00:08:57,266 And that's what we see here, that when we take out marketing, 164 00:08:57,266 --> 00:09:00,466 the coefficient for R&D spend changes. 165 00:09:00,466 --> 00:09:03,200 So that's another thing you should remember: 166 00:09:03,200 --> 00:09:05,000 coefficients merely 167 00:09:06,233 --> 00:09:07,066 portray the 168 00:09:07,066 --> 00:09:10,333 additional effect that every single variable brings into the model. 169 00:09:10,900 --> 00:09:13,666 Hope you enjoyed this tutorial and I look forward to seeing you next time. 170 00:09:13,666 --> 00:09:15,466 Until then, happy analyzing!
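A rough sketch of the last point, that a coefficient only describes a variable's additional effect given the other variables in the model. The data below is made up (it is not the dataset from the video), marketing spend is deliberately generated to be correlated with R&D spend, and the `ols` helper and all names are assumptions for illustration. Refitting with one predictor removed shifts the coefficient of the one that remains.

```python
import random


def ols(columns, y):
    """Fit y = b0 + b1*x1 + ... by least squares via the normal equations.

    columns: list of predictor columns (lists of floats); returns [b0, b1, ...].
    Assumes the predictors are not perfectly collinear.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination on a positive definite matrix.
    for c in range(k):
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: companies that spend more on R&D also tend to
# spend more on marketing, so the two predictors overlap.
random.seed(1)
n = 200
rd = [random.uniform(20_000, 160_000) for _ in range(n)]
marketing = [100_000 + 1.5 * r + random.gauss(0, 40_000) for r in rd]
profit = [50_000 + 0.8 * r + 0.03 * m + random.gauss(0, 5_000)
          for r, m in zip(rd, marketing)]

b_full = ols([rd, marketing], profit)        # [const, R&D, marketing]
b_marketing_only = ols([marketing], profit)  # [const, marketing]
b_rd_only = ols([rd], profit)                # [const, R&D]

print("marketing coefficient with R&D in the model:", b_full[2])
print("marketing coefficient with R&D removed:     ", b_marketing_only[1])
print("R&D coefficient with marketing in the model:", b_full[1])
print("R&D coefficient with marketing removed:     ", b_rd_only[1])
```

With R&D removed, marketing spend picks up part of the effect that R&D was carrying, so its coefficient is much larger than in the two-variable model even though the data is the same.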