1
00:00:01,100 --> 00:00:06,570
In this video, I will discuss with you the results of our models so that when you see the results,

2
00:00:06,780 --> 00:00:11,310
you are able to draw business insights for your business problem.

3
00:00:13,500 --> 00:00:19,950
If you remember, when we ran the logistic model, we got a result like this, which contains

4
00:00:20,070 --> 00:00:21,300
all the beta values,

5
00:00:21,600 --> 00:00:26,730
and, in this column, the p-values for all the independent variables.

6
00:00:27,960 --> 00:00:35,000
So the first thing that we should look at when we have this table is the column containing the p-values.

7
00:00:36,600 --> 00:00:43,350
As I told you earlier, the p-value represents our confidence level when we are saying whether

8
00:00:43,350 --> 00:00:46,950
this variable is impacting the response variable or not.

9
00:00:47,400 --> 00:00:52,650
So this variable, residential area, has a p-value which is very small.

10
00:00:52,800 --> 00:00:56,560
It is presented in exponential notation, of the order of ten to the minus 13.

11
00:00:57,300 --> 00:00:59,170
So this value is very small,

12
00:00:59,790 --> 00:01:05,850
which means that we are very confident that this variable is impacting our response variable.

13
00:01:08,250 --> 00:01:13,410
Similarly, all the variables whose p-values are very small are impacting our response variable.

14
00:01:14,250 --> 00:01:21,150
If you look at air quality, it is zero point zero two seven, which is less than five percent,

15
00:01:21,570 --> 00:01:29,010
so we can still take air quality. But for hospitals and all these other variables, which are more than zero

16
00:01:29,010 --> 00:01:32,070
point zero five,

17
00:01:32,310 --> 00:01:39,060
we are not sufficiently confident to make the statement that these variables are actually impacting

18
00:01:39,060 --> 00:01:40,110
our response variable.
19
00:01:41,520 --> 00:01:48,810
So if we want to remove some of the variables, we will pick the variable with the largest p-value and

20
00:01:48,810 --> 00:01:49,940
we can remove it.

21
00:01:53,350 --> 00:01:59,950
So the first thing we do is look at this column and identify the variables with p-values less than zero point

22
00:01:59,950 --> 00:02:00,480
zero five.

23
00:02:02,260 --> 00:02:05,110
The next thing we do is look at the beta values.

24
00:02:06,070 --> 00:02:12,610
If you remember, beta values represent the impact of these individual variables

25
00:02:12,640 --> 00:02:13,720
on the response variable.

26
00:02:14,320 --> 00:02:19,300
So if the beta value is large, the variable will have a large impact.

27
00:02:20,050 --> 00:02:23,780
If the value is small, the variable will have a smaller impact on the response

28
00:02:23,850 --> 00:02:24,140
variable.

29
00:02:25,480 --> 00:02:28,240
The second thing is the importance of the sign.

30
00:02:29,080 --> 00:02:38,070
If the sign is negative, it means that if I increase the price, the chance of the house getting sold decreases.

31
00:02:39,130 --> 00:02:42,880
So a negative sign represents an inverse relationship.

32
00:02:43,870 --> 00:02:50,440
If I increase this variable, the response variable will increase, because it has a positive sign.

33
00:02:51,640 --> 00:02:52,860
This one has a negative sign,

34
00:02:52,900 --> 00:02:58,420
so if I increase the air quality index, the chance of getting sold will decrease.

35
00:02:59,140 --> 00:03:00,700
So there are two things to note

36
00:03:00,940 --> 00:03:02,340
when we have the betas:

37
00:03:02,770 --> 00:03:07,980
first is how large the beta is, and second is the sign of the beta.

38
00:03:08,560 --> 00:03:12,010
So this is how we interpret the coefficient table.

39
00:03:12,460 --> 00:03:17,760
Similarly, in linear discriminant analysis, we get linear discriminant coefficients.

40
00:03:18,490 --> 00:03:19,930
Those also mean the same thing.
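The two things to read off a beta, its magnitude and its sign, can be demonstrated with scikit-learn on synthetic house-sale data. The variables (price, area, air quality index) and their effect sizes below are invented so that the fitted signs are known by construction.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 300
price = rng.normal(size=n)
area = rng.normal(size=n)
air_quality_index = rng.normal(size=n)

# By construction: raising price or the air quality index lowers the
# chance of sale (inverse relationship), while more area raises it.
logits = -2.0 * price + 1.5 * area - 1.0 * air_quality_index
sold = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X = np.column_stack([price, area, air_quality_index])
clf = LogisticRegression().fit(X, sold)
b_price, b_area, b_air = clf.coef_[0]

print(f"beta(price)       = {b_price:+.2f}  negative: inverse relationship")
print(f"beta(area)        = {b_area:+.2f}  positive: direct relationship")
print(f"beta(air quality) = {b_air:+.2f}  negative: inverse relationship")
```

The fitted betas recover the planted directions: increasing a negative-beta variable such as price pushes the predicted probability of the house getting sold down, while a positive beta pushes it up, and the larger the absolute beta, the stronger that push.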
41
00:03:20,140 --> 00:03:24,940
We again look at the values of the coefficients and the signs of the coefficients.

42
00:03:26,320 --> 00:03:29,240
However, this is not available in KNN.

43
00:03:29,830 --> 00:03:32,500
As I told you earlier, KNN is non-parametric.

44
00:03:33,310 --> 00:03:34,990
There is no functional relationship,

45
00:03:35,110 --> 00:03:36,400
so there are no betas.

46
00:03:37,240 --> 00:03:44,320
Therefore, it is very difficult to estimate the effect of the individual predictor variables on the response

47
00:03:44,320 --> 00:03:44,770
variable

48
00:03:45,070 --> 00:03:46,220
when we are doing KNN.

49
00:03:47,370 --> 00:03:53,470
So that is one of the drawbacks of KNN: it does not tell us anything about the relationship of

50
00:03:53,560 --> 00:03:55,190
each variable with the response variable.

51
00:03:57,400 --> 00:04:03,790
The second thing I want to discuss is the comparison of all these three classifiers in terms of accuracy

52
00:04:05,050 --> 00:04:06,240
for our dataset.

53
00:04:07,000 --> 00:04:15,130
We got LDA with the highest accuracy, but in general, we cannot say that LDA will always perform the

54
00:04:15,130 --> 00:04:15,520
best.

55
00:04:16,810 --> 00:04:24,310
Whenever a linear boundary best classifies the dataset, logistic regression and LDA both perform well,

56
00:04:25,330 --> 00:04:32,500
whereas whenever there is a nonlinear boundary, KNN performs better than these linear methods.

57
00:04:34,360 --> 00:04:35,350
Out of logistic regression

58
00:04:35,470 --> 00:04:43,990
and LDA: in LDA we had an assumption that the continuous variables are normally distributed.

59
00:04:44,950 --> 00:04:48,580
If that assumption is met, LDA performs better.

60
00:04:49,360 --> 00:04:53,830
If that assumption is wrong, logistic regression performs better.

61
00:04:55,210 --> 00:04:58,120
You can see these are the three confusion matrices.

62
00:04:58,930 --> 00:05:00,390
This is the confusion matrix
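The three-way comparison described here can be sketched as follows: fit logistic regression, LDA, and KNN on one split and build each confusion matrix on a held-out test set. The two-dimensional dataset is synthetic with a linear class boundary, so on this toy data the linear methods should do well, consistent with the lecture's point; it is not the course's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic data with a clean linear boundary between the two classes.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Train on one part of the data, evaluate on the held-out test part.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic": LogisticRegression(),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
accuracies = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    accuracies[name] = accuracy_score(y_te, pred)
    print(name, "confusion matrix:\n", confusion_matrix(y_te, pred))
    print(name, "test accuracy:", round(accuracies[name], 3))
```

Swapping the label rule for a nonlinear one (for example, a circular boundary such as `x0**2 + x1**2 > 1`) and re-running the same loop is an easy way to see KNN overtake the two linear methods, as the lecture predicts.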
63
00:05:00,400 --> 00:05:01,860
we got for our particular dataset.

64
00:05:02,500 --> 00:05:05,690
This confusion matrix is drawn only on the test set.

65
00:05:05,950 --> 00:05:09,850
We trained the model on one dataset and tested it on another.

66
00:05:10,540 --> 00:05:15,700
If we had drawn the confusion matrix on the training set only, probably KNN would be the

67
00:05:15,790 --> 00:05:16,890
best method coming out.

68
00:05:17,500 --> 00:05:21,550
However, when we do it on the test set, LDA is the best method.

69
00:05:22,390 --> 00:05:30,730
So this accuracy is just the correct predictions, which is 70, out of the total observations, which

70
00:05:30,730 --> 00:05:31,610
was one twenty.

71
00:05:32,480 --> 00:05:34,060
So this is seventy by 120.

72
00:05:34,510 --> 00:05:35,850
This is 80 by 120.

73
00:05:36,670 --> 00:05:38,880
And this is sixty-six by one twenty.

74
00:05:39,520 --> 00:05:47,440
So whenever we have logistic regression or LDA, we can get the individual impact of all the variables on the response

75
00:05:47,440 --> 00:05:47,850
variable.

76
00:05:48,340 --> 00:05:54,490
When we are comparing the performance of three different methods, we will draw the confusion matrices

77
00:05:54,580 --> 00:06:01,030
of all these three using a separate dataset called the test data, and then we will compare their accuracy

78
00:06:01,330 --> 00:06:05,800
to find out which of these classifiers is performing the best for our data.
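The accuracy arithmetic quoted above (70, 80, and 66 correct out of 120 test observations) is just the diagonal of each confusion matrix divided by its total. Only those totals come from the lecture; the cell-by-cell splits below are invented to make the matrices concrete, and 80/120 is assigned to LDA since it came out best.

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Accuracy = trace (correct predictions on the diagonal) / total observations."""
    cm = np.asarray(cm)
    return cm.trace() / cm.sum()

# Illustrative 2x2 confusion matrices matching the lecture's totals.
logistic_cm = np.array([[40, 25], [25, 30]])  # 70 correct out of 120
lda_cm      = np.array([[45, 20], [20, 35]])  # 80 correct out of 120
knn_cm      = np.array([[38, 27], [27, 28]])  # 66 correct out of 120

for name, cm in [("logistic", logistic_cm), ("LDA", lda_cm), ("KNN", knn_cm)]:
    print(f"{name}: accuracy = {accuracy_from_confusion(cm):.3f}")
```

Because all three matrices are built on the same test set of 120 observations, the three ratios are directly comparable, which is exactly why the comparison is done on held-out data rather than on the training set.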