1 00:00:00,600 --> 00:00:01,066 All right. 2 00:00:01,066 --> 00:00:04,066 So, the multiple linear regression data set. 3 00:00:04,266 --> 00:00:05,333 Let's get straight into it. 4 00:00:05,333 --> 00:00:08,100 This one's going to be fun: 50 startups. 5 00:00:08,100 --> 00:00:11,400 This is going to be a venture capitalist fund challenge. 6 00:00:11,400 --> 00:00:12,733 All right. 7 00:00:12,733 --> 00:00:14,233 So what do we have here? 8 00:00:14,233 --> 00:00:17,266 We've only got five columns. 9 00:00:17,433 --> 00:00:21,000 But the really interesting thing about this data set is that 10 00:00:21,000 --> 00:00:26,466 this is like a very realistic, real-life business challenge. 11 00:00:26,700 --> 00:00:31,466 So you've got 50 companies here, 50 companies in total. 12 00:00:31,800 --> 00:00:34,633 And what they have is 13 00:00:34,633 --> 00:00:37,633 some extracts from their 14 00:00:38,000 --> 00:00:41,300 profit and loss statements, from their income reports. 15 00:00:41,300 --> 00:00:45,133 So for each company, in the given financial year that you're analyzing, 16 00:00:45,133 --> 00:00:49,133 how much in that year did it spend on research and development? 17 00:00:49,600 --> 00:00:52,500 How much did it spend on administration, like paying employees, 18 00:00:52,500 --> 00:00:55,500 paying executives, and so on, and how much did it spend on marketing? 19 00:00:55,666 --> 00:01:00,233 So those are the three major operational spends, I guess, 20 00:01:01,300 --> 00:01:02,900 and in which state it works. 21 00:01:02,900 --> 00:01:05,666 And finally, what is the profit? 22 00:01:05,666 --> 00:01:09,300 What was the profit of that company for that financial year? 23 00:01:10,200 --> 00:01:13,400 And the challenge here is, this is a data set. 24 00:01:13,400 --> 00:01:15,666 It's totally anonymized, so we don't know the companies. 25 00:01:15,666 --> 00:01:20,400 And also, there's only 50 companies. 
26 00:01:20,400 --> 00:01:25,333 So there's a venture capitalist fund that has hired you as a data scientist 27 00:01:25,333 --> 00:01:29,866 to analyze these 50 companies, to analyze this dataset and create a model 28 00:01:30,500 --> 00:01:35,166 that will tell the venture capitalist fund which types of companies 29 00:01:35,566 --> 00:01:39,200 it is most interested in investing in. 30 00:01:39,200 --> 00:01:41,366 And their main criterion is the profit. 31 00:01:41,366 --> 00:01:44,433 So profit is their dependent variable, 32 00:01:44,433 --> 00:01:45,733 the most important variable for them. 33 00:01:45,733 --> 00:01:47,566 And these are all independent variables. 34 00:01:47,566 --> 00:01:50,700 So you have to create a model which will tell you about profit 35 00:01:50,700 --> 00:01:52,800 based on R&D spend, administration spend, marketing 36 00:01:52,800 --> 00:01:53,800 spend, and state. 37 00:01:53,800 --> 00:01:55,100 And bear in mind that 38 00:01:55,100 --> 00:01:59,266 the venture capitalist fund is not looking to invest in these 50 companies per se. 39 00:01:59,266 --> 00:02:01,800 It's not just giving you a data set, 40 00:02:01,800 --> 00:02:04,100 because then obviously they would invest in this one, 41 00:02:04,100 --> 00:02:06,066 because it's got the highest profit. 42 00:02:06,066 --> 00:02:07,733 But what they're looking for is, 43 00:02:08,833 --> 00:02:09,766 this is a sample. 44 00:02:09,766 --> 00:02:13,666 And they want to understand, for instance, whether companies perform 45 00:02:13,666 --> 00:02:16,666 better in New York or California, all other things held equal, 46 00:02:16,733 --> 00:02:20,533 or which companies perform better if you hold these columns equal. 47 00:02:20,966 --> 00:02:21,400 All right. 48 00:02:21,400 --> 00:02:23,733 Will a company that spends more on marketing perform 49 00:02:23,733 --> 00:02:26,000 better, or a company that spends less on marketing? 
50 00:02:26,000 --> 00:02:29,733 Also, they want to understand, the main thing they probably want to understand is, 51 00:02:30,466 --> 00:02:34,800 when they assess companies, do they look for companies that spend more on R&D, 52 00:02:34,800 --> 00:02:39,100 on research and development, or companies that spend more on marketing? 53 00:02:39,100 --> 00:02:43,233 So which of these two spends yields 54 00:02:43,500 --> 00:02:46,500 better results for profit, brings more profit? 55 00:02:47,100 --> 00:02:50,466 And based on the model that you'll create, 56 00:02:50,766 --> 00:02:54,600 they'll set up a set of guidelines for their own venture capitalist fund. 57 00:02:54,600 --> 00:02:58,033 And they'll be like, okay, so we are more interested in companies, 58 00:02:58,033 --> 00:03:01,300 this is just an example, that work in New York, that operate in New York 59 00:03:01,566 --> 00:03:07,066 and that have a very low administration spend and a very high R&D spend, 60 00:03:07,333 --> 00:03:11,000 so the R&D spend has to be much higher than the administration and marketing spend, 61 00:03:11,100 --> 00:03:11,900 something like that. 62 00:03:11,900 --> 00:03:17,100 So basically you're helping them create a model, based off of this sample, 63 00:03:17,100 --> 00:03:20,100 that will allow them to assess 64 00:03:20,433 --> 00:03:23,700 where and into which companies 65 00:03:23,700 --> 00:03:27,300 they want to invest to achieve their goal of maximizing profit. 66 00:03:28,033 --> 00:03:28,800 So there you go. 67 00:03:28,800 --> 00:03:31,500 It's already not an obvious data set. 68 00:03:31,500 --> 00:03:33,666 It's got many variables, got many records. 69 00:03:33,666 --> 00:03:35,333 You can't just tell off the top of your head. 70 00:03:35,333 --> 00:03:38,200 You can probably see that they're ordered by profit here, right? 71 00:03:38,200 --> 00:03:40,733 But there is kind of a mix all over the place. 
72 00:03:40,733 --> 00:03:44,266 So let's get into it. This is going to be a fun and exciting section.
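
The setup described in this section (profit as the dependent variable; R&D spend, administration spend, marketing spend, and state as independent variables) could be sketched roughly as below. This is only an illustration, not the course's own code: the toy data is randomly generated rather than taken from the 50-startups file, the coefficients in `TRUE_BETA` are made-up numbers, only two states are modeled (New York versus California, as a single 0/1 column), and the helper names `make_toy_startups`, `solve`, and `fit_ols` are hypothetical. The fit uses ordinary least squares through the normal equations, written in plain Python so that it runs without extra libraries.

```python
import random

# Made-up coefficients used only to generate toy data (units: thousands of dollars).
# Order: intercept, R&D spend, administration spend, marketing spend, is_new_york.
TRUE_BETA = [50.0, 0.8, -0.05, 0.03, 2.0]


def make_toy_startups(n=50, seed=0):
    """Generate n synthetic companies; profit is an exact linear function of the features."""
    rng = random.Random(seed)
    rows, profits = [], []
    for _ in range(n):
        rd = rng.uniform(0, 200)
        admin = rng.uniform(50, 200)
        marketing = rng.uniform(0, 500)
        is_new_york = float(rng.random() < 0.5)  # 1.0 = New York, 0.0 = California
        features = [rd, admin, marketing, is_new_york]
        profit = TRUE_BETA[0] + sum(b * x for b, x in zip(TRUE_BETA[1:], features))
        rows.append(features)
        profits.append(profit)
    return rows, profits


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # the leading 1.0 is the intercept term
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)


rows, profits = make_toy_startups()
beta = fit_ols(rows, profits)
names = ["intercept", "rd_spend", "administration", "marketing_spend", "is_new_york"]
for name, value in zip(names, beta):
    print(f"{name:>16}: {value: .4f}")
```

Because the toy profit here is an exact linear function of the features, the fitted coefficients simply recover the made-up ones; with real company data the fit would not be exact. The point of the sketch is the reading of the output: each spend coefficient is the change in profit per extra unit of that spend with the other columns held equal, and the `is_new_york` coefficient is the New York versus California difference with all spends held equal, which is the kind of comparison the fund is asking about.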