In this lecture, we are going to discuss linear discriminant analysis.

Most courses do not cover this technique because it involves a lot of mathematics and may be difficult to understand for all audiences. But I think I can give you the intuition behind it using a simple example, and then, when we do the analysis, I'll tell you how to interpret the results. By the end, you will be confident in applying this technique and using it to solve your business problems.

As I told you earlier, this is the technique preferred over logistic regression when we have multiclass responses. Secondly, it is still simple enough that we can interpret the importance of the different variables on the output. This is why LDA is one of the popular techniques for people in the marketing research area. LDA is based on Bayes' theorem, so let me show you an example of how this theorem is applied.

Suppose we have this table of students in a class. We have three categories of height: low, medium and high. And we have two categories of fitness level: whether the student is fit or not fit. Now, the prediction problem is this: I want to predict whether a student is fit, given that his height is medium. So consider the probability that a student is fit.
This probability, given that his height is medium, is called a conditional probability, since it is based on the precondition that the height is medium.

You can see that since we have 40 students of medium height, out of which fifteen are fit, fifteen out of forty is the probability of being fit given that the student is of medium height. So this is how we calculate a conditional probability. Similarly, if we want the conditional probability of a student being not fit given that the height is medium, we will get it by dividing 25 by 40.

So the conditional probability of being fit is 15 out of 40, and of being not fit is 25 out of 40. A Bayes classifier finds these two conditional probabilities and assigns the class which has the highest probability. Since 25 out of 40 is higher, it will assign the class "not fit". Our predictor variable was the category of height; given that a student belongs to the medium height category, a model based on the Bayes classifier will classify that student as not fit, because this class has the higher conditional probability.

So this is pretty straightforward, and this is how a Bayes classifier works. But in practice, we do not have numbers like this. Usually this height variable is a continuous variable, and we cannot make this table. Moreover, we do not have just one predictor variable.
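The Bayes-classifier rule described above can be sketched in a few lines of Python. The counts come from the lecture's table (40 medium-height students, 15 fit and 25 not fit); the dictionary names are only for illustration.

```python
# Counts from the lecture's table: of the 40 medium-height students,
# 15 are fit and 25 are not fit.
counts = {"fit": 15, "not fit": 25}
n_medium = sum(counts.values())  # 40 medium-height students

# Conditional probability of each fitness class, given medium height
cond_prob = {cls: n / n_medium for cls, n in counts.items()}
print(cond_prob)  # {'fit': 0.375, 'not fit': 0.625}

# A Bayes classifier assigns the class with the highest conditional probability
predicted = max(cond_prob, key=cond_prob.get)
print(predicted)  # not fit
```

Since 25/40 = 0.625 is larger than 15/40 = 0.375, the classifier outputs "not fit", exactly as in the worked example.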
We have a lot of continuous variables as predictor variables. Calculating the conditional probability in such a scenario becomes difficult, so this straightforward calculation of conditional probability fails in that case. This is where Bayes' theorem of conditional probability is used.

Let me show you how Bayes' theorem is derived. Suppose I want to find the probability that a student is both of medium height and fit; that is, these fifteen students out of a hundred students. So the probability of being both medium and fit is fifteen out of a hundred.

We can get this in two ways. One is by going horizontally: first finding the probability of being fit among the medium-height students, and then multiplying it by the probability of being of medium height, as represented here: fifteen by forty, times forty by a hundred, which is equal to fifteen by a hundred. The other is by going vertically first: the probability of having medium height given that the student is fit, which is fifteen by forty-eight, multiplied by the probability of being fit among the total students, which is forty-eight by a hundred. So that is fifteen by forty-eight times forty-eight by a hundred.

So you can see these two terms are equal.
If I move this term of forty by a hundred into the denominator, I can calculate this fifteen by forty as a combination of three terms. One term is fifteen by forty-eight, which is the conditional probability of being of medium height given that the student is fit. Then there is this forty-eight by a hundred, which is the probability of being fit. And in the denominator we will have the probability of being a medium-height student. So fifteen by forty is equal to the product of these two terms divided by this term. This formula is known as Bayes' theorem of conditional probability.

So why did we go this long way instead of computing fifteen by forty directly? Because usually these variables are continuous variables, and there are a lot of different variables here, so it is difficult to calculate this fifteen by forty straight away. We have to use these three other terms to estimate the value of this conditional probability of fifteen by forty.

When we adapt this formula for continuous predictors, the final formula for linear discriminant analysis looks something like this. As you can see, it looks very complicated. That is why we did not discuss the mathematics behind the derivation of this formula.
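The "long way" round via Bayes' theorem can be checked numerically with the lecture's own counts (100 students, 40 of medium height, 48 fit overall, 15 both medium and fit):

```python
# Numbers from the lecture's table
p_medium_given_fit = 15 / 48   # the "vertical" conditional probability
p_fit = 48 / 100               # prior probability of being fit
p_medium = 40 / 100            # probability of being of medium height

# Bayes' theorem: P(fit | medium) = P(medium | fit) * P(fit) / P(medium)
p_fit_given_medium = p_medium_given_fit * p_fit / p_medium
print(p_fit_given_medium)  # 0.375, the same as the direct 15/40
```

The three terms on the right-hand side reproduce exactly the 15/40 we computed directly, which is the whole point of the theorem: when the direct ratio is unavailable, we estimate those three terms instead.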
I hope you understood from the example how we are calculating conditional probabilities and how we are assigning the response class: basically, we calculate the conditional probability of each response class.

In this formula, these pi terms, the prior probabilities, are known to us from the training dataset; they basically tell us how many of all these students are fit or unfit. The other parts are estimated using some assumptions, and the assumption that we make in linear discriminant analysis is that the continuous variables are normally distributed. That is, if we have the heights of the students, those heights have a normal distribution. Normal distribution can be understood using the link to the video I have given in the resources section.

If this assumption of normal distribution is actually true in reality, linear discriminant analysis predicts brilliantly. If it is not, the prediction accuracy of LDA is not that high. For most practical purposes, this assumption holds, and the prediction accuracy of our model is usually good enough.

So let me summarize: given a set of predictor values, LDA will calculate the probability of that observation belonging to each group.
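The transcript does not show the slide's formula itself, but the standard form it refers to (with the notation here being an assumption on my part) is Bayes' theorem with normal class densities $f_k$:

$$
\Pr(Y = k \mid X = x) \;=\; \frac{\pi_k \, f_k(x)}{\sum_{l=1}^{K} \pi_l \, f_l(x)},
\qquad
f_k(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu_k)^2}{2\sigma^2}\right),
$$

where $\pi_k$ are the priors mentioned above. Taking logarithms and dropping terms common to all classes gives the linear discriminant score

$$
\delta_k(x) \;=\; x\,\frac{\mu_k}{\sigma^2} \;-\; \frac{\mu_k^2}{2\sigma^2} \;+\; \log \pi_k,
$$

and the observation is assigned to the class $k$ with the largest $\delta_k(x)$, which is why the classifier is linear in $x$.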
Then whichever group has the highest probability, that group is assigned to that observation. Since this classification is mathematical, it can be proved that if the assumption of normal distribution is correct, this classifier has the lowest possible theoretical error rate.

Note that we are taking the input values of X as they are, and not making a complex function out of them. This is why it is called linear discriminant analysis. However, if we create a function with X squared, it will be called quadratic discriminant analysis. Since running both linear and quadratic discriminant analysis is simple in software packages, we'll be showing you how to run both of these. After running the models, we will use the confusion matrix to check the quality of the predictions. We can also compare the confusion matrices of LDA, QDA and logistic regression.

One last thing before I close this lecture. I told you earlier that if we want to change the boundary condition in logistic regression, we can do it. We can do it here also, since here too we are computing probabilities. If you want to approve credit only if you are 80 percent sure, you can change the boundary condition to 0.8 instead of 0.5 with LDA as well.

So that's it for this lecture.
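As a preview of what "running both of these in a software package" looks like, here is a minimal sketch using scikit-learn. The dataset is synthetic, generated only for illustration; in the course we would use the actual training data instead.

```python
# Minimal sketch: LDA and QDA on a synthetic two-class dataset,
# compared via confusion matrices, plus a stricter 0.8 boundary condition.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)

# Check the quality of prediction for each model
print(confusion_matrix(y, lda.predict(X)))
print(confusion_matrix(y, qda.predict(X)))

# Changing the boundary condition: assign class 1 only when we are
# at least 80 percent sure, instead of the default 0.5 cut-off
probs = lda.predict_proba(X)[:, 1]
strict_pred = (probs >= 0.8).astype(int)
print(confusion_matrix(y, strict_pred))
```

Raising the threshold to 0.8 trades more false negatives for fewer false positives, which is exactly the credit-approval scenario described above.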
I hope you understood the intuition behind linear discriminant analysis and Bayes classifiers. See you in the next one.