1 00:00:00,270 --> 00:00:01,620 All righty then. 2 00:00:01,620 --> 00:00:06,180 So now we've covered some classification metrics, and by the way, I've just put this little paragraph 3 00:00:06,180 --> 00:00:11,580 here so you can check out when to use different classification metrics, just to kind of break it 4 00:00:11,580 --> 00:00:13,020 down in five dot points for you. 5 00:00:13,800 --> 00:00:18,600 But now we're going to cover some regression model evaluation metrics, and put your hand up if you're 6 00:00:18,600 --> 00:00:20,790 ready. 7 00:00:20,870 --> 00:00:21,920 I'm holding my hand up too. 8 00:00:21,920 --> 00:00:22,900 Don't worry. 9 00:00:22,910 --> 00:00:23,300 All right. 10 00:00:23,510 --> 00:00:35,390 So let's put in a little heading for point two point two, regression model evaluation metrics. So as always, 11 00:00:35,450 --> 00:00:38,060 the documentation is ready to go. 12 00:00:38,060 --> 00:00:39,030 So we've got this here. 13 00:00:39,050 --> 00:00:40,810 I might put this in here. 14 00:00:41,120 --> 00:00:48,510 Another link here: model evaluation metrics documentation. 15 00:00:48,510 --> 00:00:53,090 Now I don't want you to be scared reading the documentation like I was when I first started. 16 00:00:53,100 --> 00:00:58,380 If you read this, and there might be a few times we go through it, and you go, wow, you see something like 17 00:00:58,380 --> 00:01:02,280 this, you see a bunch of different words here that you don't really understand, 18 00:01:02,310 --> 00:01:06,480 you look at a bunch of different examples, you're reading this and it's kind of confusing. 19 00:01:06,690 --> 00:01:07,490 Don't worry. 20 00:01:07,620 --> 00:01:12,480 That's exactly how I started, but after a little bit of practice, after implementing the code like we're 21 00:01:12,480 --> 00:01:13,530 about to do 22 00:01:13,530 --> 00:01:15,900 (that's why I'm such a big fan of implementing code), 
23 00:01:15,930 --> 00:01:20,760 I started to understand a little bit more, started to understand what I needed to use. 24 00:01:20,880 --> 00:01:27,270 So speaking of what we need to use, for regression we're going to look at three different ones. 25 00:01:27,270 --> 00:01:30,170 Now these are three of the most common, three of the most useful. 26 00:01:30,220 --> 00:01:41,560 The first one is R^2 (pronounced R squared), or coefficient of determination. Beautiful. 27 00:01:41,750 --> 00:01:51,380 And the second one is mean absolute error, which is also known as MAE, and the third one is mean squared 28 00:01:53,980 --> 00:01:57,400 error, which is also known as MSE. 29 00:01:57,650 --> 00:01:58,420 Wonderful. 30 00:01:58,430 --> 00:01:59,450 So now we've got that. 31 00:01:59,510 --> 00:02:01,130 Let's bring back our regression model. 32 00:02:01,160 --> 00:02:04,130 So from sklearn.ensemble 33 00:02:07,310 --> 00:02:11,880 import RandomForestRegressor. 34 00:02:13,100 --> 00:02:13,710 Wonderful. 35 00:02:13,880 --> 00:02:15,620 We'll set up a random seed 36 00:02:15,860 --> 00:02:18,070 so our results are reproducible. 37 00:02:18,170 --> 00:02:23,990 Then we'll get our X, which is the feature variables from our Boston DataFrame. 38 00:02:24,060 --> 00:02:30,110 So we want to just drop the target, and axis equals 1. 39 00:02:30,110 --> 00:02:35,600 Labels are the target column, or y is the target column. 40 00:02:35,600 --> 00:02:37,160 Wonderful. 41 00:02:37,160 --> 00:02:40,580 And now we'll split our data into train and test sets 42 00:02:47,090 --> 00:02:52,250 using train_test_split, passing it X and y. 43 00:02:54,850 --> 00:02:56,010 Wonderful. 44 00:02:56,320 --> 00:03:00,730 And then what we're going to do is instantiate our RandomForestRegressor, 45 00:03:01,180 --> 00:03:05,370 because we want to build a regression model so we can evaluate it. 46 00:03:05,680 --> 00:03:10,360 Wonderful. model.fit(X_train, y_train). 
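[Editor's sketch] The setup dictated above could look roughly like the following. This is a minimal sketch, not the notebook from the video: the Boston housing loader has been removed from recent scikit-learn releases, so a synthetic dataset from `make_regression` stands in for it, and the names `housing_df`, `feature_0` ... `feature_4`, the seed value 42 and the 80/20 split are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Random seed so results are reproducible (the value 42 is an assumption)
np.random.seed(42)

# Hypothetical stand-in for the housing DataFrame used in the video
features, target = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=42
)
housing_df = pd.DataFrame(features, columns=[f"feature_{i}" for i in range(5)])
housing_df["target"] = target

# X is every column except the target, y is the target column
X = housing_df.drop("target", axis=1)
y = housing_df["target"]

# Split into training and test sets (test_size=0.2 is an assumption)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Instantiate and fit the regression model we want to evaluate
model = RandomForestRegressor(n_estimators=100)
model.fit(X_train, y_train)
```

Once the model is fitted, `model.score(X_test, y_test)` returns its R^2 on the held-out test data.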
47 00:03:10,420 --> 00:03:15,910 Now this is going to give us a warning because we haven't set the number of estimators, so we'll set n_estimators to be 100 and 48 00:03:15,910 --> 00:03:21,900 we're going to run this. Invalid syntax. Classic. From now on. 49 00:03:22,000 --> 00:03:23,380 Important. 50 00:03:23,460 --> 00:03:24,000 Let get. 51 00:03:24,440 --> 00:03:29,130 Oh, boston_df is not defined, because I've got typos. 52 00:03:29,340 --> 00:03:30,130 There we go. 53 00:03:30,360 --> 00:03:34,050 See, this is what happens, right? When you're writing code, don't expect to get it right the first time. You're 54 00:03:34,050 --> 00:03:35,280 gonna get errors. 55 00:03:35,280 --> 00:03:41,430 So always remember: if in doubt, run the code, and then go back and fix the errors when you need to. Use 56 00:03:41,490 --> 00:03:46,500 .score, we've seen this, X_test, y_test. Wonderful. 57 00:03:46,700 --> 00:03:50,060 And so now R squared can be calculated using... 58 00:03:50,060 --> 00:03:51,990 Well, actually, where am I getting R squared from? 59 00:03:52,460 --> 00:04:00,740 So this is, remember, regression metric number one. I'll put it here in bold so we know that we're looking 60 00:04:00,740 --> 00:04:07,140 at R squared. So that is the default metric here. 61 00:04:07,300 --> 00:04:08,010 Right. 62 00:04:08,020 --> 00:04:10,340 And so, the coefficient of determination, 63 00:04:10,390 --> 00:04:12,610 R squared, of the prediction. 64 00:04:12,610 --> 00:04:21,110 Okay, so if we wanted to figure out what exactly R squared was, where would we do that? We go to Wikipedia. 65 00:04:21,560 --> 00:04:27,080 In statistics, the coefficient of determination, denoted R^2 or r^2 and pronounced "R squared", is 66 00:04:27,080 --> 00:04:32,840 the proportion of the variance in the dependent variable that is predictable from the independent variables. 67 00:04:33,230 --> 00:04:35,500 That's a pretty complicated definition. 68 00:04:35,640 --> 00:04:39,560 Well, after doing some research, I created one of my own, right? 
69 00:04:39,560 --> 00:04:43,850 This is what I'd like you to do, right? If you ever come across something and you see the formal definitions 70 00:04:43,850 --> 00:04:44,190 of them, 71 00:04:44,210 --> 00:04:48,350 and when you look at them at first glance they seem kind of confusing, go in and research and 72 00:04:48,350 --> 00:04:51,760 find a way to explain things in your own words. 73 00:04:51,770 --> 00:04:59,370 So in my own words, an R squared value compares your model's predictions... Here: 74 00:04:59,720 --> 00:05:09,720 what R squared does: compares your model's predictions to the mean of the targets. 75 00:05:10,290 --> 00:05:15,690 The values of R squared can range from negative infinity 76 00:05:15,690 --> 00:05:17,080 (so that's the lowest possible, 77 00:05:17,760 --> 00:05:19,340 that's a very poor model) 78 00:05:21,230 --> 00:05:22,200 to 1. 79 00:05:22,240 --> 00:05:25,530 Now this is where I love having an example, right? For example, 80 00:05:25,900 --> 00:05:39,530 if all your model does is predict the mean of the targets, its R squared value would be 0. 81 00:05:41,500 --> 00:05:55,910 And if your model perfectly predicts a range of numbers, its R squared value would be 1. 82 00:05:56,680 --> 00:06:02,140 If ever you come across something you don't understand, research it: look at different sources, go to 83 00:06:02,140 --> 00:06:08,860 Wikipedia up here, read the documentation, right, see an example, but then, most importantly, implement it 84 00:06:08,870 --> 00:06:11,710 yourself. So I want you not to take my word for it here. 85 00:06:11,830 --> 00:06:16,300 For example, if all your model does is predict the mean of the targets, its R squared value would be 0, 86 00:06:16,720 --> 00:06:20,590 and if your model perfectly predicts a range of numbers, its R squared value would be 1. 87 00:06:20,590 --> 00:06:25,730 So as I said, don't take my word for it; let's see this in action. 
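[Editor's note] For reference, the plain-words description above corresponds to the standard textbook definition of the coefficient of determination. The symbols are not from the video and are introduced here only for illustration: y_i are the true target values, \hat{y}_i the model's predictions, and \bar{y} the mean of the true targets.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

If every prediction equals the mean, the numerator equals the denominator and R^2 = 0. If every prediction is exact, the numerator is 0 and R^2 = 1. Predictions that are worse than the mean make the fraction larger than 1, which is how the value goes negative.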
88 00:06:25,750 --> 00:06:32,360 So from sklearn.metrics import... this is another way to calculate R squared. 89 00:06:32,380 --> 00:06:34,630 You can just use r2_score from there. 90 00:06:34,680 --> 00:06:36,740 So sklearn.metrics. 91 00:06:36,950 --> 00:06:43,650 Then we go here, and we want to fill an array with the y_test mean. 92 00:06:43,680 --> 00:06:48,510 So the mean value from the y_test dataset. We can do that pretty easily with NumPy. 93 00:06:48,540 --> 00:06:56,190 So y_test_mean equals np.full, so, meaning a full array of len 94 00:06:56,370 --> 00:06:57,300 y_test. 95 00:06:57,300 --> 00:07:01,260 So we want it to be the same length as y_test, and then we're gonna fill it with 96 00:07:01,260 --> 00:07:03,100 y_test.mean(). 97 00:07:03,100 --> 00:07:04,080 So does that make sense? 98 00:07:06,180 --> 00:07:06,780 Beautiful. 99 00:07:07,230 --> 00:07:10,230 So if we look at y_test.mean(), 100 00:07:12,940 --> 00:07:21,460 all this array is, is an array full of those values. Well, we'll move on. And so remember what my 101 00:07:21,460 --> 00:07:22,600 example said: 102 00:07:22,600 --> 00:07:28,600 if all your model does is predict the mean of the targets, its R squared value would be 0. 103 00:07:28,620 --> 00:07:34,380 Okay, well, let's test this out, because that's what we do, we're engineers, we test things out: 104 00:07:34,390 --> 00:07:35,090 r2_score, 105 00:07:35,290 --> 00:07:36,260 y_test. 106 00:07:36,310 --> 00:07:41,670 We're going to compare against the true labels; all our model did was predict the mean. 107 00:07:41,810 --> 00:07:47,770 So a very simple model, and it gets an R2 score of 0. 108 00:07:48,450 --> 00:07:50,260 Well, okay. 109 00:07:50,290 --> 00:07:55,660 And now the second part of that example was: and if your model perfectly predicts a range of numbers, 110 00:07:55,810 --> 00:07:57,790 its R squared value would be 1. 111 00:07:58,420 --> 00:07:58,740 Okay. 
112 00:07:58,770 --> 00:08:04,000 So if our model got the exact same predictions as the test values, its R squared value would be 1. 113 00:08:04,000 --> 00:08:06,490 So let's see that. We can do this one pretty easily: 114 00:08:06,490 --> 00:08:07,280 y_test. 115 00:08:07,300 --> 00:08:13,780 Now if it got the exact same predictions, if it predicted the y_test labels perfectly, it would end up 116 00:08:13,780 --> 00:08:15,130 with a score of 1. 117 00:08:15,130 --> 00:08:16,990 So what does this tell us? 118 00:08:16,990 --> 00:08:19,240 Well, this gives us a quick indication. 119 00:08:19,240 --> 00:08:25,420 This score function, which implements the coefficient of determination, a.k.a. R squared, gives us 120 00:08:25,420 --> 00:08:31,120 a quick insight into how close our model's predictions are to perfect predictions. 121 00:08:31,120 --> 00:08:35,940 So of course 1.0 is perfect, and so we've got 1.0. 122 00:08:35,950 --> 00:08:40,120 We saw that here, and if it was predicting nothing but the mean, it would get 0. 123 00:08:40,150 --> 00:08:43,330 And it says here that the value can range from negative infinity. 124 00:08:43,330 --> 00:08:48,970 Well, what this means is that if our model's predictions were completely off the radar, this value can actually 125 00:08:48,970 --> 00:08:50,040 go negative. 126 00:08:50,080 --> 00:08:55,240 So the mean is actually an okay prediction compared to something that was just predicting all zeros, 127 00:08:55,360 --> 00:08:57,420 right? Now, 128 00:08:57,430 --> 00:09:02,920 now we've seen R squared. It can kind of give us quick insight into how well our model may be doing, but it 129 00:09:03,070 --> 00:09:07,170 doesn't really tell us how far off each prediction is. 130 00:09:07,210 --> 00:09:10,030 So to do that, we're going to use mean absolute error. 131 00:09:10,060 --> 00:09:11,410 So we'll look at that in the next video.
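[Editor's sketch] The three cases discussed in this video (predicting only the mean, predicting perfectly, and predicting all zeros) can be checked with a few lines. This is a minimal sketch with made-up target values, not the test split from the video; only `r2_score` and `np.full` come from the transcript, and the variable names are assumptions.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up true values standing in for y_test (assumption, not the video's data)
y_test = np.array([3.0, 5.0, 7.0, 9.0])

# A "model" that only ever predicts the mean of the targets
y_test_mean = np.full(len(y_test), y_test.mean())
r2_mean = r2_score(y_test, y_test_mean)

# A "model" that predicts every label perfectly
r2_perfect = r2_score(y_test, y_test)

# A "model" that predicts all zeros, which is worse than predicting the mean
r2_zeros = r2_score(y_test, np.zeros(len(y_test)))

print(r2_mean)     # 0.0
print(r2_perfect)  # 1.0
print(r2_zeros)    # negative for this data
```

The argument order matters: `r2_score` takes the true values first and the predictions second.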