1 00:00:00,666 --> 00:00:03,333 Hi everyone, and welcome to the intuition lecture 2 00:00:03,333 --> 00:00:06,333 for Linear Discriminant Analysis, or LDA. 3 00:00:06,833 --> 00:00:10,333 Now, for those of you who are coming from the previous section 4 00:00:10,333 --> 00:00:14,833 on Principal Component Analysis, or PCA, this may seem a bit similar, 5 00:00:15,166 --> 00:00:18,000 but there is a difference between the two, and we're going to get started 6 00:00:18,000 --> 00:00:20,866 by taking an overall look at what LDA entails. 7 00:00:20,866 --> 00:00:24,066 It's a pretty brief and straightforward intuition lecture, 8 00:00:24,666 --> 00:00:27,666 but we'll get to the main takeaways on PCA versus LDA. 9 00:00:28,333 --> 00:00:31,666 LDA is commonly used as a dimensionality reduction technique, 10 00:00:32,233 --> 00:00:35,233 and we've heard that before with PCA. 11 00:00:36,000 --> 00:00:38,533 It's used in the pre-processing step 12 00:00:38,533 --> 00:00:42,033 for pattern classification and machine learning algorithms, 13 00:00:42,666 --> 00:00:46,666 and its goal is to project a dataset onto a lower-dimensional space. 14 00:00:48,033 --> 00:00:51,400 That sounds similar to PCA, but LDA differs 15 00:00:51,400 --> 00:00:55,833 because, in addition to finding the component axes, 16 00:00:56,333 --> 00:01:00,900 we are interested in the axes that maximize the separation 17 00:01:00,900 --> 00:01:04,700 between multiple classes, and that is the main takeaway. 18 00:01:04,700 --> 00:01:07,266 To restate the main point: with PCA, 19 00:01:07,266 --> 00:01:11,266 we are working with principal component analysis to find 20 00:01:11,533 --> 00:01:14,266 the axes, the principal components, within the data. 21 00:01:14,266 --> 00:01:18,300 Whereas with LDA, we are looking 22 00:01:18,300 --> 00:01:21,933 for the separation of the classes within the data. 23 00:01:23,366 --> 00:01:25,200 And to break it down further: 
24 00:01:25,200 --> 00:01:30,533 The goal of LDA is to project a feature space onto a smaller subspace 25 00:01:31,066 --> 00:01:34,300 while maintaining the class-discriminatory information. 26 00:01:34,900 --> 00:01:38,300 Both PCA and LDA are linear 27 00:01:38,300 --> 00:01:41,500 transformation techniques used for dimensionality reduction. 28 00:01:42,566 --> 00:01:47,600 PCA is an unsupervised algorithm, but LDA is supervised 29 00:01:47,833 --> 00:01:50,833 because of its relation to the dependent variable. 30 00:01:52,166 --> 00:01:54,133 We can see here from this visualization 31 00:01:54,133 --> 00:01:57,533 the main operations and main differences between PCA and LDA. 32 00:01:57,900 --> 00:01:59,866 With PCA, again, we're looking at 33 00:01:59,866 --> 00:02:03,366 the subspace and reducing the dimensionality of the data 34 00:02:03,600 --> 00:02:07,100 by examining how the principal component axes relate to it. 35 00:02:07,133 --> 00:02:10,000 Whereas in LDA, we're looking for that class separation. 36 00:02:10,000 --> 00:02:13,000 And I think this visualization makes the distinction the most clear 37 00:02:13,033 --> 00:02:14,000 between the two. 38 00:02:14,000 --> 00:02:16,633 If you want some additional information, you can always take a look 39 00:02:16,633 --> 00:02:17,766 at the following link. 40 00:02:17,766 --> 00:02:22,900 But here we have PCA and LDA and the main operations related to each. 41 00:02:24,100 --> 00:02:24,900 Again, 42 00:02:24,900 --> 00:02:28,566 LDA is supervised because of its relation to the dependent variable. 43 00:02:28,766 --> 00:02:30,933 And I think when you start working through this 44 00:02:30,933 --> 00:02:34,200 in the upcoming lecture, in the hands-on part, it's going to make more sense. 45 00:02:34,533 --> 00:02:37,533 But that's the main takeaway to focus on for LDA. 46 00:02:37,666 --> 00:02:40,833 And you can accomplish this in five main steps. 
47 00:02:41,033 --> 00:02:44,100 Similar again to PCA, the five main 48 00:02:44,100 --> 00:02:47,100 steps for LDA include the following. 49 00:02:47,266 --> 00:02:51,500 The computation of the d-dimensional mean vectors. 50 00:02:52,133 --> 00:02:54,300 The computation of the scatter matrices. 51 00:02:54,300 --> 00:02:57,000 You also have to compute the eigenvectors and their eigenvalues. 52 00:02:57,000 --> 00:02:59,900 Sort the eigenvectors by decreasing 53 00:02:59,900 --> 00:03:04,633 eigenvalues, and use the d-by-k eigenvector matrix 54 00:03:04,633 --> 00:03:07,633 to transform the samples onto the new subspace. 55 00:03:08,400 --> 00:03:12,900 Overall, it's very similar to PCA: two different types of dimensionality 56 00:03:12,900 --> 00:03:16,600 reduction techniques, one being unsupervised and one being supervised, 57 00:03:16,800 --> 00:03:18,966 but the main distinction to take away with LDA 58 00:03:18,966 --> 00:03:22,000 is that we're looking for that class separation within the data. 59 00:03:22,700 --> 00:03:25,033 Overall, if you're coming from PCA, this should seem familiar 60 00:03:25,033 --> 00:03:26,366 for the majority of operations. 61 00:03:26,366 --> 00:03:29,533 If you're new to this, I advise you to go take a look at PCA as well. 62 00:03:30,066 --> 00:03:33,033 And when you start working through the next part, 63 00:03:33,033 --> 00:03:33,866 it should make more sense. 64 00:03:33,866 --> 00:03:36,333 Just keep in mind that the main takeaway for LDA 65 00:03:36,333 --> 00:03:41,100 is class separation, and that it is a supervised learning technique. 66 00:03:42,800 --> 00:03:45,800 If you have any questions, as always, please feel free to share them, 67 00:03:45,933 --> 00:03:48,933 and enjoy machine learning.
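As a companion to the five steps listed in the lecture, here is a minimal NumPy sketch of how they might fit together. It is not the code used in the hands-on part: the function name `lda_fit_transform`, the toy data, and the choice of a pseudo-inverse for the within-class scatter matrix are all assumptions made for illustration.

```python
import numpy as np


def lda_fit_transform(X, y, k):
    """Hypothetical helper: project X (n_samples x d) onto k discriminant axes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    d = X.shape[1]

    # Step 1: d-dimensional mean vector per class, plus the overall mean.
    overall_mean = X.mean(axis=0)
    class_means = {c: X[y == c].mean(axis=0) for c in classes}

    # Step 2: within-class (S_W) and between-class (S_B) scatter matrices.
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        centered = Xc - class_means[c]
        S_W += centered.T @ centered
        diff = (class_means[c] - overall_mean).reshape(d, 1)
        S_B += Xc.shape[0] * (diff @ diff.T)

    # Step 3: eigenvectors and eigenvalues of pinv(S_W) @ S_B.
    # Assumption: pinv is used so the sketch still runs if S_W is singular.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    eigvals, eigvecs = eigvals.real, eigvecs.real

    # Step 4: sort the eigenvectors by decreasing eigenvalue and keep k of them.
    order = np.argsort(eigvals)[::-1]
    W = eigvecs[:, order[:k]]  # the d-by-k eigenvector matrix

    # Step 5: transform the samples onto the new subspace.
    return X @ W, W


# Made-up toy data: two classes in a 2-dimensional feature space.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 7.0]])
y = np.array([0, 0, 0, 1, 1, 1])

Z, W = lda_fit_transform(X, y, k=1)
print(Z.ravel())
```

Note that the labels `y` are needed in steps 1 and 2, which is where the supervised nature of LDA shows up; a PCA sketch would compute a single covariance matrix from `X` alone.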