Okay, so we've spec'd out our model and run our first regression. We did a quick sense check of what we got back and we looked at the performance of our model.

Now we're getting to our model evaluation stage. Evaluating and deploying the model is the final part of our workflow. We've done a lot of legwork up till now. We've formulated our question. We've gathered our data. We've cleaned our data. We've explored and visualized our data and made sense of it. We've trained our algorithm, and now we're getting to the point where we can evaluate our results, refine our model, look for problems, look for improvements and deploy our model.

So what kind of things do we do at this stage? Where do we take it from here?

Well, we're going to start by asking a series of questions about our model and about our results. We're going to be checking for problems and looking for improvements, retraining our algorithm as necessary.

The approach that we're going to take is kind of like going for a medical checkup. When you go to see a doctor, the doctor typically runs some tests on you. Right?
They measure your height and your weight, listen to your breathing, take an X-ray, measure your blood pressure, measure your heart rate, run a blood test, hit your knee with a tiny little hammer and so on. The point I'm trying to make is that the doctor will look at various stats to make their diagnosis. They will look at these stats, they will check if they are too high or too low, if there's a problem, all in light of your overall condition.

Now it turns out that, just like your body, a regression model also has a number of stats that we can look at. In fact, there are loads and loads of statistics that we can look at, like a scary amount. But my goal is to show you a few of these statistics that are relevant to our regression model, and also show you how to interpret them and make sense of them.

The first one we're already quite familiar with, because we've looked at r-squared extensively, but as part of this evaluation process I want to introduce you to several other stats as well: for example, the p-values of the coefficients, the variance inflation factor and the Bayesian information criterion. We're going to dip our toes into the murky waters of statistics and get a taster for how to evaluate a regression model.
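
To make that checkup a bit more concrete, here is a rough sketch of how those four stats could be computed by hand. Everything in it is an assumption for illustration: the data is synthetic, the feature names `x1`, `x2`, `x3` and the helpers `solve`, `ols` and `vif` are made up, and the p-values use a normal approximation instead of the t-distribution that a statistics package would typically use. It is not the workflow from this course, just a way to see what each number measures.

```python
import math
import random
from statistics import NormalDist


def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares fit; each row of X must start with a 1.0 for the intercept."""
    n, k = len(X), len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    rss = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    # Standard errors: sqrt(sigma^2 * diagonal of (X'X)^-1)
    sigma2 = rss / (n - k)
    se = [math.sqrt(sigma2 * solve(XtX, [float(i == j) for i in range(k)])[j])
          for j in range(k)]
    # Two-sided p-values (ASSUMPTION: normal approximation, fine for large n)
    z = NormalDist()
    p_values = [2 * (1 - z.cdf(abs(b / s))) for b, s in zip(beta, se)]
    # BIC from the Gaussian log-likelihood; lower is better
    bic = n * math.log(rss / n) + n * (1 + math.log(2 * math.pi)) + k * math.log(n)
    return {"beta": beta, "r2": 1 - rss / tss, "p_values": p_values, "bic": bic}


def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2 of j on the others)."""
    target = [r[j] for r in X]
    others = [r[:j] + r[j + 1:] for r in X]
    return 1 / (1 - ols(others, target)["r2"])


# Synthetic data (hypothetical): x2 is nearly a copy of x1, x3 is pure noise.
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.1) for a in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [2 + 3 * a + random.gauss(0, 1) for a in x1]

X = [[1.0, a, b, c] for a, b, c in zip(x1, x2, x3)]
full = ols(X, y)
reduced = ols([[1.0, a] for a in x1], y)  # same model with x2 and x3 dropped
vifs = [vif(X, j) for j in (1, 2, 3)]

print("r-squared:", round(full["r2"], 3))
print("p-values (const, x1, x2, x3):", [round(p, 4) for p in full["p_values"]])
print("VIF (x1, x2, x3):", [round(v, 1) for v in vifs])
print("BIC full vs reduced:", round(full["bic"], 1), round(reduced["bic"], 1))
```

Reading the output is the same idea as the doctor reading your chart. A high VIF for `x1` and `x2` flags that they carry nearly the same information, a large p-value suggests a coefficient may not be pulling its weight, and comparing the BIC of the full and reduced models shows whether the extra features are worth the added complexity.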