The first step in data science and machine learning is always defining the problem that you want to solve and making sure that you're asking the right questions. So in this video, let's specify the exact question that we want to answer. We're going to formulate our goal and set the context for the upcoming videos.

Imagine that you're living in the beautiful city of Boston, Massachusetts, in the United States. You moved there a number of years ago and you're doing your best to pick up the local accent. Now, I am told that you can't really do a Boston accent without liberal use of profanity, so I will resist the temptation of doing an impression of one right now. Anyhow, imagine you're living in Boston, you're working in the real estate business, and you have this friend who recently moved to the city, so you decide to meet up with him in one of the local coffee shops. You two get talking and he asks you, "So, uh, I'm looking to buy a place to live here. How much does a house cost in Boston?" Now, this is the opportunity that you've been waiting for all your life. So you grab your little calculator and your little model house from your bag, put them on the table, and coyly respond: "Well, it depends. What kind of house are we talking about?"
"I don't know," says your friend. "How much does a house cost around here? You work in real estate, you should know." Your friend has completely ignored your request to be more specific and continues to press you for an answer. So you respond: "It depends on the size of the house, mate. What are we talking about? A shoebox of an apartment or an MTV Cribs-style mansion? Also, which part of town do you want to buy a house in: downtown or the suburbs?"

Your friend just stares you down and says, "Just give me a number." What are you gonna say to that? With no information about the kind of house, no information about the home's features, and no information on the location, what's the best possible answer that you can give your friend? What's the most truthful way to answer this question? The answer is just the average home price in Boston, right? So you turn to your friend, smile, and say, "$567,500," and you silently resolve never again to meet anyone who is so poor at phrasing their questions.
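Why is the average the "most truthful" single number? If you must give one price for every house and you judge yourself by squared error, the mean is the constant guess with the lowest error. The minimal sketch below checks this on five made-up sale prices (hypothetical values chosen so that their average happens to equal the $567,500 from the story; they are not real Boston data) by comparing the average against a grid of other constant guesses.

```python
from statistics import mean

# Hypothetical sale prices in dollars (illustrative only, not real Boston data).
prices = [410_000, 525_000, 480_000, 735_000, 687_500]


def mse(guess, values):
    """Mean squared error of predicting the same constant `guess` for every value."""
    return sum((v - guess) ** 2 for v in values) / len(values)


avg = mean(prices)

# Try a grid of other constant guesses; the lowest-error one should be the average.
candidates = range(300_000, 800_001, 500)
best = min(candidates, key=lambda g: mse(g, prices))

print(f"average price: {avg:,.0f}")
print(f"best constant guess on grid: {best:,}")
```

Once you know a home's features, you can do better than this one-number baseline, which is exactly what the rest of the project is about.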
The next morning you roll into your snazzy downtown office, and your boss calls you over and says: "I've got a new project for you. Our real estate agents need a software tool that helps them value homes in the Boston area at the push of a button. This tool needs to output a benchmark price based on the characteristics of the home. And by characteristics, I mean things like how many rooms the home has or how much crime there is in the area, plus a whole bunch of other factors. Also, we want to see the contribution of each factor in your model, so don't create a magic black box that just spits out a number. Our agents each know which features of a house are more important in determining its price and which are less important. Our real estate agents need to know what the premium is for living in a home where parents can send their kids to a good school. In other words, this valuation tool needs to be interpretable. Also, if your tool doesn't completely suck, then we might even put it up on our website so that anyone can get a valuation for their home, the same way that companies like Zillow do, or Zoopla in the UK. Can you do this? Do you accept this mission?"
So you look at your boss and think to yourself, "You know what, this sounds like the kind of thing that my friend from the coffee shop would have needed. And this sounds like an awesome project." So you look your boss square in the eyes, say "Roger that, I can do this," and fist-bump your boss, because that's what people do in Boston and in stock photography. The intern standing next to you joins in on the fist bump, and off you go.