1 00:00:00,470 --> 00:00:04,710 In this video, we will assess the accuracy of the model that we have created. 2 00:00:07,020 --> 00:00:10,140 Now we have established the relationship between X and Y. 3 00:00:11,030 --> 00:00:12,240 What do we want to know? 4 00:00:12,870 --> 00:00:17,730 How well do the predicted Y values match the actual Y values? 5 00:00:19,400 --> 00:00:23,940 So to assess the quality of fit, we will look at two related quantities. 6 00:00:25,080 --> 00:00:30,020 One is the residual standard error and the other is called R-squared. 7 00:00:32,760 --> 00:00:35,080 Let us first look at residual standard error. 8 00:00:38,110 --> 00:00:45,760 We saw earlier that the residual standard error (RSE) is the square root of RSS by n minus two, which 9 00:00:45,760 --> 00:00:53,050 can also be written like this, with RSS as the summation of the squared difference between actual value and 10 00:00:53,050 --> 00:00:53,860 predicted value. 11 00:00:56,200 --> 00:01:02,770 Roughly speaking, RSE is the average amount that the response will deviate from the true regression 12 00:01:02,770 --> 00:01:03,040 line. 13 00:01:06,060 --> 00:01:09,120 As you can see in the result shown below. 14 00:01:10,380 --> 00:01:13,290 This is the result we got when we ran the model in the software. 15 00:01:14,160 --> 00:01:15,790 It is giving us the residual 16 00:01:15,820 --> 00:01:18,280 standard error, which is 6.597, for 17 00:01:18,690 --> 00:01:21,750 this model on 504 degrees of freedom. 18 00:01:22,320 --> 00:01:23,200 This 504 19 00:01:23,310 --> 00:01:24,980 we are getting from n minus two. 20 00:01:25,000 --> 00:01:26,080 n is 506, 21 00:01:26,930 --> 00:01:28,620 and n minus two is 504. 22 00:01:29,180 --> 00:01:31,020 And this is called the degrees of freedom. 23 00:01:32,610 --> 00:01:37,620 So for these many degrees of freedom, we are getting a residual standard error of 6.597.
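The RSE calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `residual_standard_error` and the toy numbers are hypothetical and are not the house price data from the video.

```python
import math

def residual_standard_error(actual, predicted, n_params=2):
    """RSE = sqrt(RSS / (n - n_params)).

    n_params=2 matches simple linear regression (intercept and slope),
    which gives the "n minus two" degrees of freedom.
    """
    # RSS: sum of squared differences between actual and predicted values
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    degrees_of_freedom = len(actual) - n_params
    return math.sqrt(rss / degrees_of_freedom)

# Hypothetical toy data, chosen only to keep the arithmetic easy to follow
actual = [3.0, 5.0, 7.0, 10.0]
predicted = [2.0, 6.0, 7.0, 9.0]

rse = residual_standard_error(actual, predicted)
print(rse)  # RSS is 3 and n - 2 is 2, so this is sqrt(1.5)
```

With 506 observations, the same function would divide RSS by 504, which is where the degrees of freedom in the model output come from.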
24 00:01:39,080 --> 00:01:45,930 In other words, even if the model was correct and the true values of beta zero and beta one 25 00:01:45,930 --> 00:01:46,800 were known 26 00:01:46,860 --> 00:01:47,610 exactly, 27 00:01:48,480 --> 00:01:52,450 the predicted value of house price from this model 28 00:01:53,380 --> 00:01:59,740 would still be off by 6.597 units on average. 29 00:02:01,790 --> 00:02:09,620 Therefore, RSE can also be considered as a measure of lack of fit of this model to the data. 30 00:02:10,550 --> 00:02:18,470 So this 6.597 value is telling you, on average, by how many units your predicted 31 00:02:18,470 --> 00:02:21,830 value is missing the actual value. 32 00:02:25,830 --> 00:02:27,240 Next is the R-squared statistic. 33 00:02:28,980 --> 00:02:35,460 The RSE provides an absolute measure of lack of fit, but since it is measured in the units of 34 00:02:35,460 --> 00:02:40,610 Y, it is not always clear what constitutes a good RSE. 35 00:02:42,880 --> 00:02:48,030 So R-squared provides us with an alternative, expressed as a proportion: 36 00:02:49,410 --> 00:02:53,010 the proportion of total variance explained by our model. 37 00:02:53,580 --> 00:02:56,160 So it always lies between zero and one. 38 00:02:57,310 --> 00:02:59,370 This is the mathematical formula for R-squared. 39 00:03:00,240 --> 00:03:06,690 R-squared is TSS minus RSS upon TSS, where TSS is the total sum of squares 40 00:03:07,500 --> 00:03:10,170 and RSS is the residual sum of squares. 41 00:03:12,170 --> 00:03:17,730 TSS is measuring the amount of variability inherent in the response. 42 00:03:18,600 --> 00:03:21,360 That is, in our house price data, 43 00:03:21,810 --> 00:03:25,980 the price of each house itself is varying about the mean house price. 44 00:03:27,170 --> 00:03:34,460 So find the difference of each actual house price from the mean of the house price.
45 00:03:35,580 --> 00:03:37,890 Square these values and add them up, 46 00:03:38,040 --> 00:03:39,900 and you get the total sum of squares. 47 00:03:41,440 --> 00:03:48,060 So this total sum of squares value is giving you the total amount of variability in house price. 48 00:03:49,620 --> 00:03:57,420 How much of this is explained by the model that we have constructed? For that we will use RSS. RSS 49 00:03:57,450 --> 00:04:04,650 is measuring the amount of variability that is not explained by our model of prediction, and TSS 50 00:04:04,680 --> 00:04:05,070 minus 51 00:04:05,080 --> 00:04:09,150 RSS is giving us the variability of Y which is explained by our model. 52 00:04:10,740 --> 00:04:16,830 Therefore, R-squared measures the proportion of explained variance from the total variance. 53 00:04:19,810 --> 00:04:25,630 An R-squared value close to one indicates that a large proportion of the variability in the response 54 00:04:25,630 --> 00:04:28,610 variable has been explained by the regression model. 55 00:04:30,070 --> 00:04:34,910 If it is close to zero, it indicates that the regression did not explain much of the variability. 56 00:04:35,890 --> 00:04:43,300 This can occur either because our linear model is wrong or because linear was not the right choice 57 00:04:43,300 --> 00:04:50,290 for this relationship between X and Y, or both of these reasons. For our model, 58 00:04:50,760 --> 00:04:56,020 the result given by the software package is that the residual standard error was 6.59. 59 00:04:58,010 --> 00:05:00,240 The R-squared value is 0.48. 60 00:05:02,530 --> 00:05:04,390 So it is somewhere between zero and one. 61 00:05:05,860 --> 00:05:12,190 Nearly 50 percent of the variability of the response variable is explained by the model that we constructed. 62 00:05:15,550 --> 00:05:16,810 There is another value. 63 00:05:17,170 --> 00:05:21,170 We call it adjusted R-squared, which you can see in the model result.
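The R-squared formula stated above, TSS minus RSS upon TSS, can be sketched as follows. The function name `r_squared` and the toy data are hypothetical and are used only to illustrate the arithmetic; they are not the house price data.

```python
def r_squared(actual, predicted):
    """R-squared = (TSS - RSS) / TSS."""
    mean_y = sum(actual) / len(actual)
    # TSS: total variability of the response about its mean
    tss = sum((y - mean_y) ** 2 for y in actual)
    # RSS: variability left unexplained by the model
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return (tss - rss) / tss

# Hypothetical toy data
actual = [3.0, 5.0, 7.0, 10.0]
predicted = [2.0, 6.0, 7.0, 9.0]

r2 = r_squared(actual, predicted)
print(r2)  # TSS is 26.75 and RSS is 3, so this is 23.75 / 26.75
```

If the predictions were no better than always guessing the mean, RSS would equal TSS and the function would return zero.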
64 00:05:22,630 --> 00:05:31,240 The difference between this R-squared and this adjusted R-squared is that the adjusted R-squared will 65 00:05:31,240 --> 00:05:36,730 be altered, taking into account the total number of variables which are actually impacting the model. 66 00:05:38,230 --> 00:05:46,210 The reason behind doing this is that if you keep on adding variables to your model, the R-squared value simply 67 00:05:46,210 --> 00:05:47,440 keeps on increasing. 68 00:05:48,980 --> 00:05:53,450 Even if the variable is not significantly related to the response variable, 69 00:05:54,470 --> 00:05:58,880 still, the R-squared value will increase, at least by a small amount. 70 00:06:01,520 --> 00:06:07,750 So the adjusted R-squared is a modified version of R-squared that has been adjusted for the number of 71 00:06:07,750 --> 00:06:14,860 predictors in the model. The adjusted R-squared increases only if the new term improves the model 72 00:06:14,950 --> 00:06:16,900 more than would be expected by chance. 73 00:06:17,920 --> 00:06:22,060 It decreases when a predictor improves the model by less than expected by chance. 74 00:06:23,950 --> 00:06:26,410 So adjusted R-squared is a more preferred term 75 00:06:26,410 --> 00:06:26,710 over 76 00:06:26,800 --> 00:06:27,360 R-squared. 77 00:06:30,620 --> 00:06:37,000 We have a value of R-squared, but what value of R-squared will be considered a good value of R-squared? 78 00:06:38,540 --> 00:06:41,870 This will generally depend on the type of application that you have. 79 00:06:42,710 --> 00:06:49,170 If the data is coming from a science experiment and the relationship is supposed to be actually linear, 80 00:06:49,910 --> 00:06:52,930 in such a case, R-squared should be very close to one. 81 00:06:54,350 --> 00:07:01,730 But suppose it is marketing data, and we are missing a lot of unmeasured factors and the linear assumption 82 00:07:01,970 --> 00:07:04,880 is also a rough approximation of the relationship.
83 00:07:06,270 --> 00:07:08,330 The residual errors are going to be large. 84 00:07:09,850 --> 00:07:13,840 In such a case, even smaller R-squared values can be acceptable. 85 00:07:14,650 --> 00:07:19,270 Generally, an R-squared of around 0.5 can be considered a decent model.
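The adjustment for the number of predictors described earlier can be sketched with the commonly used adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - p - 1). The video does not state this formula, so treating it as what the software computes is an assumption, and the function name `adjusted_r_squared` is hypothetical. The inputs below are the rounded values quoted in the video (R-squared of 0.48, n of 506) with one predictor assumed.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    The penalty grows with the number of predictors p, so adding a
    variable only raises the result if R^2 rises enough to offset it.
    """
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Values quoted in the video; one predictor is an assumption based on
# the n - 2 degrees of freedom mentioned for this model
adj_one = adjusted_r_squared(0.48, 506, 1)

# Same R-squared with ten hypothetical predictors: the adjusted value drops
adj_ten = adjusted_r_squared(0.48, 506, 10)

print(adj_one, adj_ten)
```

Because 0.48 is itself rounded, the result is only approximate and may differ slightly from the adjusted R-squared shown in the model output.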