1
00:00:01,100 --> 00:00:06,570
In this video, I will discuss with you the results of our models so that when you see the results,

2
00:00:06,780 --> 00:00:11,310
you are able to draw business insights for your business problem.

3
00:00:13,500 --> 00:00:19,950
If you remember, when we ran the logistic model, we got a result like this, which contains

4
00:00:20,070 --> 00:00:21,300
all the beta values,

5
00:00:21,600 --> 00:00:26,730
and, in this column, the p-values for all the independent variables.

6
00:00:27,960 --> 00:00:35,000
So the first thing that we should look at when we have this table is the column containing the p-values.

7
00:00:36,600 --> 00:00:43,350
As I told you earlier, the p-value represents our confidence level when we are saying whether

8
00:00:43,350 --> 00:00:46,950
this variable is impacting the response variable or not.

9
00:00:47,400 --> 00:00:52,650
So this variable, residential area, has a p-value which is very small.

10
00:00:52,800 --> 00:00:56,560
It is presented in exponential notation, of the order of ten to the minus 13.

11
00:00:57,300 --> 00:00:59,170
So this value is very small,

12
00:00:59,790 --> 00:01:05,850
which means that we are very confident that this variable is impacting our response variable.

13
00:01:08,250 --> 00:01:13,410
Similarly, all the variables whose p-values are very small are impacting our response variable.

14
00:01:14,250 --> 00:01:21,150
If you look at air quality, it is zero point zero two seven, which is less than five percent,

15
00:01:21,570 --> 00:01:29,010
so we can still take air quality. But for hospitals and all these other variables, which are more than zero

16
00:01:29,010 --> 00:01:32,070
point zero five,

17
00:01:32,310 --> 00:01:39,060
we are not sufficiently confident to make the statement that these variables are actually impacting

18
00:01:39,060 --> 00:01:40,110
our response variable.
19
00:01:41,520 --> 00:01:48,810
So if we want to remove some of the variables, we will pick the variable with the largest p-value and

20
00:01:48,810 --> 00:01:49,940
we can remove it.

21
00:01:53,350 --> 00:01:59,950
So the first thing we do is look at this column and identify the variables with p-values less than zero point

22
00:01:59,950 --> 00:02:00,480
zero five.

23
00:02:02,260 --> 00:02:05,110
The next thing we do is look at the beta values.

24
00:02:06,070 --> 00:02:12,610
If you remember, beta values represent the impact of these individual variables

25
00:02:12,640 --> 00:02:13,720
on the response variable.

26
00:02:14,320 --> 00:02:19,300
So if the beta value is large, the variable will have a large impact.

27
00:02:20,050 --> 00:02:23,780
If the value is small, the variable will have a smaller impact on the response

28
00:02:23,850 --> 00:02:24,140
variable.

29
00:02:25,480 --> 00:02:28,240
The second thing is the importance of the sign.

30
00:02:29,080 --> 00:02:38,070
If the sign is negative, it means that if I increase the price, the chance of the house getting sold decreases.

31
00:02:39,130 --> 00:02:42,880
So a negative sign represents an inverse relationship.

32
00:02:43,870 --> 00:02:50,440
If I increase this variable, the response variable will increase, because it has a positive sign.

33
00:02:51,640 --> 00:02:52,860
This one has a negative sign,

34
00:02:52,900 --> 00:02:58,420
so if I increase the air quality index, the chance of getting sold will decrease.

35
00:02:59,140 --> 00:03:00,700
So there are two things to note

36
00:03:00,940 --> 00:03:02,340
when we have the betas:

37
00:03:02,770 --> 00:03:07,980
first is how large the beta is, and second is the sign of the beta.

38
00:03:08,560 --> 00:03:12,010
So this is how we interpret the coefficient table.

39
00:03:12,460 --> 00:03:17,760
Similarly, in linear discriminant analysis, we get linear discriminant coefficients.

40
00:03:18,490 --> 00:03:19,930
Those also mean the same thing.
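The two things to read off a beta, its magnitude and its sign, can be demonstrated with scikit-learn on synthetic house-sale data. The variables (price, area, air quality index) and their effect sizes below are invented so that the fitted signs are known by construction.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 300
price = rng.normal(size=n)
area = rng.normal(size=n)
air_quality_index = rng.normal(size=n)

# By construction: raising price or the air quality index lowers the
# chance of sale (inverse relationship), while more area raises it.
logits = -2.0 * price + 1.5 * area - 1.0 * air_quality_index
sold = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X = np.column_stack([price, area, air_quality_index])
clf = LogisticRegression().fit(X, sold)
b_price, b_area, b_air = clf.coef_[0]

print(f"beta(price)       = {b_price:+.2f}  negative: inverse relationship")
print(f"beta(area)        = {b_area:+.2f}  positive: direct relationship")
print(f"beta(air quality) = {b_air:+.2f}  negative: inverse relationship")
```

The fitted betas recover the planted directions: increasing a negative-beta variable such as price pushes the predicted probability of the house getting sold down, while a positive beta pushes it up, and the larger the absolute beta, the stronger that push.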
41
00:03:20,140 --> 00:03:24,940
We again look at the values of the coefficients and the signs of the coefficients.

42
00:03:26,320 --> 00:03:29,240
However, this is not available in KNN.

43
00:03:29,830 --> 00:03:32,500
As I told you earlier, KNN is non-parametric.

44
00:03:33,310 --> 00:03:34,990
There is no functional relationship,

45
00:03:35,110 --> 00:03:36,400
so there are no betas.

46
00:03:37,240 --> 00:03:44,320
Therefore, it is very difficult to estimate the effect of the individual predictor variables on the response

47
00:03:44,320 --> 00:03:44,770
variable

48
00:03:45,070 --> 00:03:46,220
when we are doing KNN.

49
00:03:47,370 --> 00:03:53,470
So that is one of the drawbacks of KNN: it does not tell us anything about the relationship of

50
00:03:53,560 --> 00:03:55,190
each variable with the response variable.

51
00:03:57,400 --> 00:04:03,790
The second thing I want to discuss is the comparison of all these three classifiers in terms of accuracy

52
00:04:05,050 --> 00:04:06,240
for our dataset.

53
00:04:07,000 --> 00:04:15,130
We got LDA with the highest accuracy, but in general, we cannot say that LDA will always perform the

54
00:04:15,130 --> 00:04:15,520
best.

55
00:04:16,810 --> 00:04:24,310
Whenever a linear boundary best classifies the dataset, logistic regression and LDA both perform well,

56
00:04:25,330 --> 00:04:32,500
whereas whenever there is a nonlinear boundary, KNN performs better than these linear methods.

57
00:04:34,360 --> 00:04:35,350
Out of logistic regression

58
00:04:35,470 --> 00:04:43,990
and LDA: in LDA we had an assumption that the continuous variables are normally distributed.

59
00:04:44,950 --> 00:04:48,580
If that assumption is met, LDA performs better.

60
00:04:49,360 --> 00:04:53,830
If that assumption is wrong, logistic regression performs better.

61
00:04:55,210 --> 00:04:58,120
You can see these are the three confusion matrices.

62
00:04:58,930 --> 00:05:00,390
This is the confusion matrix
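The three-way comparison described here can be sketched as follows: fit logistic regression, LDA, and KNN on one split and build each confusion matrix on a held-out test set. The two-dimensional dataset is synthetic with a linear class boundary, so on this toy data the linear methods should do well, consistent with the lecture's point; it is not the course's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic data with a clean linear boundary between the two classes.
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Train on one part of the data, evaluate on the held-out test part.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic": LogisticRegression(),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
accuracies = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    accuracies[name] = accuracy_score(y_te, pred)
    print(name, "confusion matrix:\n", confusion_matrix(y_te, pred))
    print(name, "test accuracy:", round(accuracies[name], 3))
```

Swapping the label rule for a nonlinear one (for example, a circular boundary such as `x0**2 + x1**2 > 1`) and re-running the same loop is an easy way to see KNN overtake the two linear methods, as the lecture predicts.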
63
00:05:00,400 --> 00:05:01,860
we got for our particular dataset.

64
00:05:02,500 --> 00:05:05,690
This confusion matrix is drawn only on the test set.

65
00:05:05,950 --> 00:05:09,850
We trained the model on one dataset and tested it on another.

66
00:05:10,540 --> 00:05:15,700
If we had drawn the confusion matrix on the training set only, probably KNN would be the

67
00:05:15,790 --> 00:05:16,890
best method coming out.

68
00:05:17,500 --> 00:05:21,550
However, when we do it on the test set, LDA is the best method.

69
00:05:22,390 --> 00:05:30,730
So this accuracy is just the correct predictions, which is 70, out of the total observations, which

70
00:05:30,730 --> 00:05:31,610
was one twenty.

71
00:05:32,480 --> 00:05:34,060
So this is seventy by 120.

72
00:05:34,510 --> 00:05:35,850
This is 80 by 120.

73
00:05:36,670 --> 00:05:38,880
And this is sixty-six by one twenty.

74
00:05:39,520 --> 00:05:47,440
So whenever we have logistic regression or LDA, we can get the individual impact of all the variables on the response

75
00:05:47,440 --> 00:05:47,850
variable.

76
00:05:48,340 --> 00:05:54,490
When we are comparing the performance of three different methods, we will draw the confusion matrices

77
00:05:54,580 --> 00:06:01,030
of all these three using a separate dataset called the test data, and then we will compare their accuracy

78
00:06:01,330 --> 00:06:05,800
to find out which of these classifiers is performing the best for our data.
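The accuracy arithmetic quoted above (70, 80, and 66 correct out of 120 test observations) is just the diagonal of each confusion matrix divided by its total. Only those totals come from the lecture; the cell-by-cell splits below are invented to make the matrices concrete, and 80/120 is assigned to LDA since it came out best.

```python
import numpy as np

def accuracy_from_confusion(cm):
    """Accuracy = trace (correct predictions on the diagonal) / total observations."""
    cm = np.asarray(cm)
    return cm.trace() / cm.sum()

# Illustrative 2x2 confusion matrices matching the lecture's totals.
logistic_cm = np.array([[40, 25], [25, 30]])  # 70 correct out of 120
lda_cm      = np.array([[45, 20], [20, 35]])  # 80 correct out of 120
knn_cm      = np.array([[38, 27], [27, 28]])  # 66 correct out of 120

for name, cm in [("logistic", logistic_cm), ("LDA", lda_cm), ("KNN", knn_cm)]:
    print(f"{name}: accuracy = {accuracy_from_confusion(cm):.3f}")
```

Because all three matrices are built on the same test set of 120 observations, the three ratios are directly comparable, which is exactly why the comparison is done on held-out data rather than on the training set.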