1
00:00:00,666 --> 00:00:03,333
Hi everyone, and welcome to the intuition
lecture

2
00:00:03,333 --> 00:00:06,333
for Linear Discriminant Analysis, or LDA.

3
00:00:06,833 --> 00:00:10,333
Now, for those of you
who are coming from the previous section

4
00:00:10,333 --> 00:00:14,833
on Principal Component Analysis or PCA,
this may seem a bit similar,

5
00:00:15,166 --> 00:00:18,000
but there is a difference between the two,
and we're going to get started

6
00:00:18,000 --> 00:00:20,866
by taking an overall look at
what LDA entails.

7
00:00:20,866 --> 00:00:24,066
It's a pretty brief
and straightforward intuition lecture,

8
00:00:24,666 --> 00:00:27,666
but we'll get to the main differences
between PCA and LDA.

9
00:00:28,333 --> 00:00:31,666
LDA is commonly used as a dimensionality
reduction technique,

10
00:00:32,233 --> 00:00:35,233
and we've heard that before with PCA.

11
00:00:36,000 --> 00:00:38,533
It's used in the pre-processing step

12
00:00:38,533 --> 00:00:42,033
for pattern classification and machine
learning algorithms,

13
00:00:42,666 --> 00:00:46,666
and its goal is to project a data set
onto a lower dimensional space.

14
00:00:48,033 --> 00:00:51,400
Sounds similar to PCA, but LDA differs

15
00:00:51,400 --> 00:00:55,833
because in addition
to finding the component axes with LDA,

16
00:00:56,333 --> 00:01:00,900
we are interested in the axes
that maximize the separation

17
00:01:00,900 --> 00:01:04,700
between multiple classes,
and that is the main takeaway.

18
00:01:04,700 --> 00:01:07,266
The main point is that with PCA,

19
00:01:07,266 --> 00:01:11,266
we are looking for
the principal component axes,

20
00:01:11,533 --> 00:01:14,266
the directions of maximum variance
within the data.

21
00:01:14,266 --> 00:01:18,300
Whereas in LDA,
we are looking

22
00:01:18,300 --> 00:01:21,933
for the separation
of those classes within the data.

23
00:01:23,366 --> 00:01:25,200
To break it down further:

24
00:01:25,200 --> 00:01:30,533
The goal of LDA is to project
a feature space onto a smaller subspace

25
00:01:31,066 --> 00:01:34,300
while maintaining the class
discriminatory information.

26
00:01:34,900 --> 00:01:38,300
Both PCA and LDA are linear

27
00:01:38,300 --> 00:01:41,500
transformation techniques
used for dimensionality reduction.

28
00:01:42,566 --> 00:01:47,600
PCA is an unsupervised algorithm,
but LDA is supervised

29
00:01:47,833 --> 00:01:50,833
because it relies on
the dependent variable, the class labels,

30
00:01:52,166 --> 00:01:54,133
and we can see here
from this visualization

31
00:01:54,133 --> 00:01:57,533
the main operations and main differences
between PCA and LDA.

32
00:01:57,900 --> 00:01:59,866
With PCA, again, we're reducing

33
00:01:59,866 --> 00:02:03,366
the dimensionality of the data
to a subspace

34
00:02:03,600 --> 00:02:07,100
defined by the principal
component axes.

35
00:02:07,133 --> 00:02:10,000
Whereas in LDA
we're looking for that class separation.

36
00:02:10,000 --> 00:02:13,000
And I think this visualization
best shows the difference

37
00:02:13,033 --> 00:02:14,000
between the two.

38
00:02:14,000 --> 00:02:16,633
If you want some additional information,
you can always take a look

39
00:02:16,633 --> 00:02:17,766
at the following link.

40
00:02:17,766 --> 00:02:22,900
But here we have the PCA and LDA
and the main operations related to each.

41
00:02:24,100 --> 00:02:24,900
Again,

42
00:02:24,900 --> 00:02:28,566
LDA is supervised because of its relation
to the dependent variable.

43
00:02:28,766 --> 00:02:30,933
And I think when you start working
through this

44
00:02:30,933 --> 00:02:34,200
in the upcoming lecture, in the hands-on
part, it's going to make more sense.

45
00:02:34,533 --> 00:02:37,533
But that's the main takeaway
to focus on for LDA.

46
00:02:37,666 --> 00:02:40,833
And you can accomplish this
in five main steps.

47
00:02:41,033 --> 00:02:44,100
Similar again to PCA, the five main

48
00:02:44,100 --> 00:02:47,100
steps for LDA include the following.

49
00:02:47,266 --> 00:02:51,500
The computation of the d-dimensional mean
vectors for each class.

50
00:02:52,133 --> 00:02:54,300
The computation of the scatter matrices.

51
00:02:54,300 --> 00:02:57,000
You also have to compute the eigenvectors.

52
00:02:57,000 --> 00:02:59,900
Sort the eigenvectors by decreasing

53
00:02:59,900 --> 00:03:04,633
eigenvalues,
and use the d-by-k eigenvector matrix

54
00:03:04,633 --> 00:03:07,633
to transform the samples
onto the new subspace.

55
00:03:08,400 --> 00:03:12,900
Overall, very similar to PCA,
two different types of dimensionality

56
00:03:12,900 --> 00:03:16,600
reduction techniques, one being
unsupervised and one being supervised,

57
00:03:16,800 --> 00:03:18,966
but the main distinction
with LDA to take away

58
00:03:18,966 --> 00:03:22,000
is that we're looking for that
class separation within the data.

59
00:03:22,700 --> 00:03:25,033
Overall, if you're coming from PCA,
this should seem familiar

60
00:03:25,033 --> 00:03:26,366
for the majority of operations.

61
00:03:26,366 --> 00:03:29,533
If you're new to this, I advise you to go
take a look at PCA as well.

62
00:03:30,066 --> 00:03:33,033
But when you start to work
through the upcoming part,

63
00:03:33,033 --> 00:03:33,866
it should make more sense.

64
00:03:33,866 --> 00:03:36,333
But just keep in mind
that the main takeaway for LDA

65
00:03:36,333 --> 00:03:41,100
is that class separation,
and that it is a supervised learning technique.

66
00:03:42,800 --> 00:03:45,800
If you have any questions, as always,
please feel free to share them

67
00:03:45,933 --> 00:03:48,933
and enjoy machine learning.