In this lecture, we are going to discuss linear discriminant analysis.

Most courses do not cover this technique because it involves a lot of mathematics and may be difficult to understand for all audiences. But I think I can give you the intuition behind it using a simple example, and then, when we do the analysis, I'll tell you how to interpret the results. By the end, you will be confident in applying this technique and using it to solve your business problems.

As I told you earlier, this is the technique preferred over logistic regression when we have multiclass responses. Secondly, it is still simple enough that we can interpret the importance of the different variables on the output. This is why LDA is one of the popular techniques for people in the marketing research area. LDA is based on Bayes' theorem, so let me show you an example of how this theorem is applied.

Suppose we have this table of students in a class. We have three categories of height: low, medium and high. And we have two categories of fitness level: whether the student is fit or not fit. Now, the prediction problem is this: I want to predict whether a student is fit, given that his height is medium. So consider the probability that a student is fit.
This probability, given that his height is medium, is called a conditional probability, since it is based on the precondition that the height is medium.

You can see that since we have 40 students of medium height, out of which fifteen are fit, fifteen out of forty is the probability of being fit given that the student is of medium height. So this is how we calculate a conditional probability. Similarly, if we want the conditional probability of a student being not fit given that the height is medium, we will get it by dividing 25 by 40.

So the conditional probability of being fit is 15 out of 40, and of being not fit is 25 out of 40. A Bayes classifier finds these two conditional probabilities and assigns the class which has the highest probability. Since 25 out of 40 is higher, it will assign the class "not fit". Our predictor variable was the category of height; given that a student belongs to the medium height category, a model based on the Bayes classifier will classify that student as not fit, because this class has the higher conditional probability.

So this is pretty straightforward, and this is how a Bayes classifier works. But in practice, we do not have numbers like this. Usually this height variable is a continuous variable, and we cannot make this table. Moreover, we do not have just one predictor variable.
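The Bayes-classifier rule described above can be sketched in a few lines of Python. The counts come from the lecture's table (40 medium-height students, 15 fit and 25 not fit); the dictionary names are only for illustration.

```python
# Counts from the lecture's table: of the 40 medium-height students,
# 15 are fit and 25 are not fit.
counts = {"fit": 15, "not fit": 25}
n_medium = sum(counts.values())  # 40 medium-height students

# Conditional probability of each fitness class, given medium height
cond_prob = {cls: n / n_medium for cls, n in counts.items()}
print(cond_prob)  # {'fit': 0.375, 'not fit': 0.625}

# A Bayes classifier assigns the class with the highest conditional probability
predicted = max(cond_prob, key=cond_prob.get)
print(predicted)  # not fit
```

Since 25/40 = 0.625 is larger than 15/40 = 0.375, the classifier outputs "not fit", exactly as in the worked example.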
We have a lot of continuous variables as predictor variables. Calculating the conditional probability in such a scenario becomes difficult, so this straightforward calculation of conditional probability fails in that case. This is where Bayes' theorem of conditional probability is used.

Let me show you how Bayes' theorem is derived. Suppose I want to find the probability that a student is both of medium height and fit; that is, these fifteen students out of a hundred students. So the probability of being both medium and fit is fifteen out of a hundred.

We can get this in two ways. One is by going horizontally: first finding the probability of being fit among the medium-height students, and then multiplying it by the probability of being of medium height, as represented here: fifteen by forty, times forty by a hundred, which is equal to fifteen by a hundred. The other is by going vertically first: the probability of having medium height given that the student is fit, which is fifteen by forty-eight, multiplied by the probability of being fit among the total students, which is forty-eight by a hundred. So that is fifteen by forty-eight times forty-eight by a hundred.

So you can see these two terms are equal.
If I move this term of forty by a hundred into the denominator, I can calculate this fifteen by forty as a combination of three terms. One term is fifteen by forty-eight, which is the conditional probability of being of medium height given that the student is fit. Then there is this forty-eight by a hundred, which is the probability of being fit. And in the denominator we will have the probability of being a medium-height student. So fifteen by forty is equal to the product of these two terms divided by this term. This formula is known as Bayes' theorem of conditional probability.

So why did we go this long way instead of computing fifteen by forty directly? Because usually these variables are continuous variables, and there are a lot of different variables here, so it is difficult to calculate this fifteen by forty straight away. We have to use these three other terms to estimate the value of this conditional probability of fifteen by forty.

When we adapt this formula for continuous predictors, the final formula for linear discriminant analysis looks something like this. As you can see, it looks very complicated. That is why we did not discuss the mathematics behind the derivation of this formula.
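The "long way" round via Bayes' theorem can be checked numerically with the lecture's own counts (100 students, 40 of medium height, 48 fit overall, 15 both medium and fit):

```python
# Numbers from the lecture's table
p_medium_given_fit = 15 / 48   # the "vertical" conditional probability
p_fit = 48 / 100               # prior probability of being fit
p_medium = 40 / 100            # probability of being of medium height

# Bayes' theorem: P(fit | medium) = P(medium | fit) * P(fit) / P(medium)
p_fit_given_medium = p_medium_given_fit * p_fit / p_medium
print(p_fit_given_medium)  # 0.375, the same as the direct 15/40
```

The three terms on the right-hand side reproduce exactly the 15/40 we computed directly, which is the whole point of the theorem: when the direct ratio is unavailable, we estimate those three terms instead.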
I hope you understood from the example how we are calculating conditional probabilities and how we are assigning the response class: basically, we calculate the conditional probability of each response class.

In this formula, these pi terms, the prior probabilities, are known to us from the training dataset; they basically tell us how many of all these students are fit or unfit. The other parts are estimated using some assumptions, and the assumption that we make in linear discriminant analysis is that the continuous variables are normally distributed. That is, if we have the heights of the students, those heights have a normal distribution. Normal distribution can be understood using the link to the video I have given in the resources section.

If this assumption of normal distribution is actually true in reality, linear discriminant analysis predicts brilliantly. If it is not, the prediction accuracy of LDA is not that high. For most practical purposes, this assumption holds, and the prediction accuracy of our model is usually good enough.

So let me summarize: given a set of predictor values, LDA will calculate the probability of that observation belonging to each group.
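The transcript does not show the slide's formula itself, but the standard form it refers to (with the notation here being an assumption on my part) is Bayes' theorem with normal class densities $f_k$:

$$
\Pr(Y = k \mid X = x) \;=\; \frac{\pi_k \, f_k(x)}{\sum_{l=1}^{K} \pi_l \, f_l(x)},
\qquad
f_k(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu_k)^2}{2\sigma^2}\right),
$$

where $\pi_k$ are the priors mentioned above. Taking logarithms and dropping terms common to all classes gives the linear discriminant score

$$
\delta_k(x) \;=\; x\,\frac{\mu_k}{\sigma^2} \;-\; \frac{\mu_k^2}{2\sigma^2} \;+\; \log \pi_k,
$$

and the observation is assigned to the class $k$ with the largest $\delta_k(x)$, which is why the classifier is linear in $x$.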
Then whichever group has the highest probability, that group is assigned to that observation. Since this classification is mathematical, it can be proved that if the assumption of normal distribution is correct, this classifier has the lowest possible theoretical error rate.

Note that we are taking the input values of X as they are, and not making a complex function out of them. This is why it is called linear discriminant analysis. However, if we create a function with X squared, it will be called quadratic discriminant analysis. Since running both linear and quadratic discriminant analysis is simple in software packages, we'll be showing you how to run both of these. After running the models, we will use the confusion matrix to check the quality of the predictions. We can also compare the confusion matrices of LDA, QDA and logistic regression.

One last thing before I close this lecture. I told you earlier that if we want to change the boundary condition in logistic regression, we can do it. We can do it here also, since here too we are computing probabilities. If you want to approve credit only if you are 80 percent sure, you can change the boundary condition to 0.8 instead of 0.5 with LDA as well.

So that's it for this lecture.
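As a preview of what "running both of these in a software package" looks like, here is a minimal sketch using scikit-learn. The dataset is synthetic, generated only for illustration; in the course we would use the actual training data instead.

```python
# Minimal sketch: LDA and QDA on a synthetic two-class dataset,
# compared via confusion matrices, plus a stricter 0.8 boundary condition.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)

# Check the quality of prediction for each model
print(confusion_matrix(y, lda.predict(X)))
print(confusion_matrix(y, qda.predict(X)))

# Changing the boundary condition: assign class 1 only when we are
# at least 80 percent sure, instead of the default 0.5 cut-off
probs = lda.predict_proba(X)[:, 1]
strict_pred = (probs >= 0.8).astype(int)
print(confusion_matrix(y, strict_pred))
```

Raising the threshold to 0.8 trades more false negatives for fewer false positives, which is exactly the credit-approval scenario described above.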
I hope you understood the intuition behind linear discriminant analysis and Bayes classifiers. See you in the next one.