1 00:00:00,390 --> 00:00:08,430 In this video, we will assess the accuracy of the model that we have created. We have established 2 00:00:08,430 --> 00:00:17,040 the relationship between X and Y, but we want to know how well the predicted values match the actual 3 00:00:17,040 --> 00:00:17,720 Y values. 4 00:00:19,380 --> 00:00:21,270 So to assess the quality of fit, 5 00:00:21,780 --> 00:00:23,940 we will look at two related quantities. 6 00:00:25,020 --> 00:00:30,000 One is the residual standard error (RSE) and the other is called R-squared. 7 00:00:32,700 --> 00:00:35,070 Let us first look at the residual standard error. 8 00:00:38,050 --> 00:00:45,790 We saw earlier that the residual standard error is the square root of RSS divided by n minus two, which 9 00:00:45,790 --> 00:00:53,470 can also be written like this, using the summation of the squared differences between the actual value and the predicted 10 00:00:53,470 --> 00:00:53,830 value. 11 00:00:56,130 --> 00:01:02,770 Roughly speaking, RSE is the average amount that the response will deviate from the true regression 12 00:01:02,770 --> 00:01:03,030 line. 13 00:01:05,940 --> 00:01:13,270 As you can see in the results shown below, this is the result we got when we ran the model in the software. 14 00:01:14,020 --> 00:01:19,170 It is straightaway giving us the residual standard error, which is 6.597 for this model, 15 00:01:19,800 --> 00:01:21,700 on 504 degrees of freedom. 16 00:01:22,260 --> 00:01:28,610 This 504 we are getting from n minus two; here n minus two is 504. 17 00:01:29,070 --> 00:01:30,990 And this is called the degrees of freedom. 18 00:01:32,580 --> 00:01:37,610 So for these many degrees of freedom, we are getting a residual standard error of 6.597. 19 00:01:38,940 --> 00:01:47,580 In other words, even if the model was correct and the true values of beta zero and beta one were known exactly, 20 00:01:48,420 --> 00:01:52,410 the predicted value of house price from this model would not be exact.
21 00:01:53,320 --> 00:01:59,710 It would still be off by 6.597 units on average. 22 00:02:01,730 --> 00:02:09,650 Therefore, RSE can also be considered a measure of the lack of fit of this model to the data. 23 00:02:10,460 --> 00:02:18,470 So this 6.597 value is telling you, on average, by how many units your predicted 24 00:02:18,470 --> 00:02:21,830 value is missing the actual value. 25 00:02:25,740 --> 00:02:34,140 Next is the R-squared statistic. The RSE provides an absolute measure of lack of fit, but since 26 00:02:34,140 --> 00:02:40,650 it is measured in the units of Y, it is not always clear what constitutes a good RSE. 27 00:02:42,870 --> 00:02:45,620 So R-squared provides us with an alternative. 28 00:02:46,590 --> 00:02:48,030 R-squared is a proportion, 29 00:02:49,350 --> 00:02:57,390 the proportion of total variance explained by our model, so it always lies between zero and one. This 30 00:02:57,470 --> 00:03:04,560 is the mathematical formula for R-squared: R-squared equals TSS minus RSS upon TSS, where 31 00:03:04,760 --> 00:03:10,140 TSS is the total sum of squares and RSS is the residual sum of squares. 32 00:03:12,030 --> 00:03:21,360 TSS is measuring the amount of variability inherent in the response. That is, for our house price data, 33 00:03:21,720 --> 00:03:25,950 the price of each house itself is varying about the mean house price. 34 00:03:27,110 --> 00:03:34,460 So if you find the difference of each actual house price from the mean of the house price, 35 00:03:35,520 --> 00:03:44,220 square these values and add them up, you get the total sum of squares. So this total sum of squares value 36 00:03:44,400 --> 00:03:48,030 is giving you the total amount of variability in house price.
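The quantities described so far can be written out as a short sketch. The symbols are notation assumed here rather than read off the slides: $y_i$ is the actual value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean response, and $n$ the number of observations (so $n - 2 = 504$ in the quoted output).

```latex
\begin{align*}
\mathrm{RSS} &= \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \\
\mathrm{RSE} &= \sqrt{\frac{\mathrm{RSS}}{n - 2}} \\
\mathrm{TSS} &= \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2 \\
R^2 &= \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}
\end{align*}
```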
37 00:03:49,530 --> 00:03:57,450 How much of this is explained by the model that we constructed? For that, we will use RSS. RSS 38 00:03:57,450 --> 00:04:04,290 is measuring the amount of variability that is not explained by our model after the regression, and 39 00:04:04,350 --> 00:04:05,070 TSS minus 40 00:04:05,070 --> 00:04:09,120 RSS is giving us the variability of Y which is explained by our model. 41 00:04:10,680 --> 00:04:16,770 Therefore, R-squared measures the proportion of explained variance out of the total variance. 42 00:04:19,720 --> 00:04:26,200 An R-squared value close to one indicates that a large proportion of the variability in the response variable 43 00:04:26,260 --> 00:04:33,070 has been explained by the regression model. If it is close to zero, it indicates that the regression 44 00:04:33,070 --> 00:04:34,980 did not explain much of the variability. 45 00:04:35,830 --> 00:04:43,510 This can occur either because our linear model is wrong, or because linear was not the right choice for 46 00:04:43,510 --> 00:04:50,230 this relationship between X and Y, or both of these reasons. For our model, 47 00:04:50,800 --> 00:04:56,020 the result given by the software package is a residual standard error of about 6.6. 48 00:04:57,810 --> 00:05:00,180 The R-squared value is 0.40. 49 00:05:02,470 --> 00:05:09,130 So it is somewhere between zero and one. Nearly 50 percent of the variability of the response variable 50 00:05:09,670 --> 00:05:12,250 is explained by the model that we constructed. 51 00:05:15,490 --> 00:05:21,230 There is another value, which is called adjusted R-squared, which you can see in the model result. 52 00:05:22,570 --> 00:05:31,270 The difference between this R-squared and this adjusted R-squared is that in adjusted R-squared we are 53 00:05:31,270 --> 00:05:36,710 also taking into account the total number of variables which are actually impacting the model.
54 00:05:38,170 --> 00:05:46,210 The reason behind doing this is that if you keep on adding variables to your model, the R-squared value simply 55 00:05:46,210 --> 00:05:47,440 keeps on increasing. 56 00:05:48,890 --> 00:05:56,450 Even if the variable is not significantly related to the response variable, the R-squared value 57 00:05:56,780 --> 00:05:58,890 will still increase by a small amount. 58 00:06:01,840 --> 00:06:07,750 So the adjusted R-squared is a modified version of R-squared that has been adjusted for the number of 59 00:06:07,750 --> 00:06:15,190 predictors in the model. The adjusted R-squared increases only if the new term improves the model more 60 00:06:15,190 --> 00:06:16,210 than would be expected 61 00:06:16,210 --> 00:06:22,010 by chance. It decreases when the predictor improves the model by less than expected by chance. 62 00:06:23,920 --> 00:06:24,760 So adjusted 63 00:06:24,760 --> 00:06:27,370 R-squared is preferred over plain R-squared. 64 00:06:30,180 --> 00:06:36,560 Now we have a value of R-squared, but what value of R-squared will be considered a good value of 65 00:06:36,560 --> 00:06:37,020 R-squared? 66 00:06:38,480 --> 00:06:41,920 This will generally depend on the type of application that you have. 67 00:06:42,620 --> 00:06:49,190 If the data is coming from a science experiment and the relationship is supposed to be actually linear, 68 00:06:49,790 --> 00:06:52,910 in such a case R-squared should be very close to one. 69 00:06:54,300 --> 00:07:01,730 But if it is marketing data and we are missing a lot of unmeasured factors, and the linear assumption 70 00:07:01,850 --> 00:07:04,880 is also a rough approximation of the relationship, 71 00:07:06,170 --> 00:07:08,300 the residual errors are going to be large. 72 00:07:09,730 --> 00:07:17,290 In such a case, even smaller R-squared values can be acceptable. Generally, an R-squared greater than 0.5 73 00:07:17,440 --> 00:07:19,360 can be considered a good fit model.
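To make the three statistics concrete, here is a minimal Python sketch that computes RSE, R-squared and adjusted R-squared by hand. Everything in it is an assumption for illustration: the data is synthetic (not the house price data from the video), the helper names `fit_simple_linear` and `fit_quality` are hypothetical, and the adjusted R-squared formula used, 1 - (1 - R²)(n - 1)/(n - p - 1), is the standard textbook one, which the video does not spell out. The sample size of 506 is chosen only so that n minus two comes out as the 504 degrees of freedom quoted above.

```python
import numpy as np


def fit_simple_linear(x, y):
    """Least-squares intercept (b0) and slope (b1) for y ~ b0 + b1 * x."""
    x_mean, y_mean = x.mean(), y.mean()
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b0 = y_mean - b1 * x_mean
    return b0, b1


def fit_quality(y, y_pred, n_predictors=1):
    """Return RSE, R-squared and adjusted R-squared for a fitted model."""
    n = len(y)
    rss = np.sum((y - y_pred) ** 2)    # variability NOT explained by the model
    tss = np.sum((y - y.mean()) ** 2)  # total variability in the response
    df = n - n_predictors - 1          # n - 2 for one predictor
    rse = np.sqrt(rss / df)
    r2 = (tss - rss) / tss
    # Standard adjusted R-squared formula (assumed, not quoted in the video).
    adj_r2 = 1 - (1 - r2) * (n - 1) / df
    return {"rse": rse, "r2": r2, "adj_r2": adj_r2, "df": df}


# Synthetic stand-in data: a noisy linear relationship with 506 rows.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=506)
y = 3.0 + 2.0 * x + rng.normal(0, 6.0, size=506)

b0, b1 = fit_simple_linear(x, y)
stats = fit_quality(y, b0 + b1 * x)
print(stats)
```

With a single predictor the adjusted R-squared sits only slightly below R-squared; the gap grows as more predictors are added without a matching drop in RSS.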