1
00:00:00,133 --> 00:00:00,733
All right.

2
00:00:00,733 --> 00:00:02,733
Good.
So now I'm going to give you the solution.

3
00:00:02,733 --> 00:00:07,100
So first of all we're going to
fit our scaler,

4
00:00:07,100 --> 00:00:10,166
You know our standardization tool
on the training set.

5
00:00:10,166 --> 00:00:14,300
So I'm taking the training set
first, X_train.

6
00:00:14,300 --> 00:00:15,466
All right.

7
00:00:15,466 --> 00:00:18,200
And then since I've just explained
that we won't

8
00:00:18,200 --> 00:00:21,133
apply feature
scaling on the dummy variables.

9
00:00:21,133 --> 00:00:25,200
Well then that means that we will fit
our standard scaler object

10
00:00:25,400 --> 00:00:30,066
only on these two columns here
containing the ages and the salaries.

11
00:00:30,066 --> 00:00:33,866
And therefore here I'm going to take
only the two columns here

12
00:00:33,866 --> 00:00:36,733
for the age and the salary.
And then of course all the rows.

13
00:00:36,733 --> 00:00:39,033
And remember the trick
to take all the rows

14
00:00:39,033 --> 00:00:42,100
we just need to add a colon,
which means we are taking the range

15
00:00:42,100 --> 00:00:45,100
from the lower bound to the upper bound,
meaning everything.

16
00:00:45,133 --> 00:00:47,233
And then to take the columns well.

17
00:00:47,233 --> 00:00:50,000
Be careful
this is what we have to look at.

18
00:00:50,000 --> 00:00:54,133
Now this because indeed
we want to take this column and this one.

19
00:00:54,300 --> 00:00:57,300
And so now the question is:
what are the indexes of these columns?

20
00:00:57,400 --> 00:01:00,633
Well remember that indexes in Python
start from zero.

21
00:01:00,666 --> 00:01:02,533
So this has index zero.

22
00:01:02,533 --> 00:01:05,333
Then the second column has index
one, the next one index two.

23
00:01:05,333 --> 00:01:08,266
And this one, which is the age
column, has index three.

24
00:01:08,266 --> 00:01:11,266
And the salary column has index four.

25
00:01:11,400 --> 00:01:16,000
But since I want to make this template
as generic as we can,

26
00:01:16,000 --> 00:01:19,566
and since when you one hot encode
your categorical variables,

27
00:01:19,800 --> 00:01:22,633
they always automatically
go as the first columns,

28
00:01:22,633 --> 00:01:24,600
Well we're going to do something
even better.

29
00:01:24,600 --> 00:01:28,200
We're going to specify the indexes
we want here by three,

30
00:01:28,233 --> 00:01:30,400
which is the index of the age column.

31
00:01:30,400 --> 00:01:33,300
And then a simple colon. Right?

32
00:01:33,300 --> 00:01:37,066
Because this will take the range
from the column of index three,

33
00:01:37,066 --> 00:01:40,100
which is the age,
up to all the other columns.

34
00:01:40,100 --> 00:01:41,600
You know, there is not a minus one here.

35
00:01:41,600 --> 00:01:45,600
So this will take all the remaining
columns from the age, meaning the age

36
00:01:45,733 --> 00:01:48,433
and the salary. Basically,
this will take these two columns.

37
00:01:48,433 --> 00:01:52,033
And if you have a larger
matrix of features with numerical values

38
00:01:52,033 --> 00:01:54,700
in your feature as well,
this will just take all the columns.

39
00:01:54,700 --> 00:01:56,133
All right. So that's a little trick.

40
00:01:56,133 --> 00:01:57,966
More elegant let's say.
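
The slicing trick described above can be sketched with NumPy as follows. This is only an illustration: the matrix values are made up, and the layout (three dummy columns first, then age at index 3 and salary at index 4) is assumed from what is said in the video.

```python
import numpy as np

# Hypothetical feature matrix: three one-hot (dummy) columns first,
# then age (index 3) and salary (index 4). All values are made up.
X_train = np.array([
    [1.0, 0.0, 0.0, 44.0, 72000.0],
    [0.0, 1.0, 0.0, 27.0, 48000.0],
    [0.0, 0.0, 1.0, 30.0, 54000.0],
    [0.0, 1.0, 0.0, 38.0, 61000.0],
])

# A lone ":" selects every row; "3:" selects column 3 through the last column.
numeric_part = X_train[:, 3:]
print(numeric_part)
```

With more numerical columns after the dummy variables, the same `3:` slice would pick them all up without any change.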

41
00:01:57,966 --> 00:02:02,066
And so now we are
of course going to use our object

42
00:02:02,066 --> 00:02:05,066
which we called sc, from which,

43
00:02:05,133 --> 00:02:09,266
Well we're going to use that fit method
that will indeed

44
00:02:09,266 --> 00:02:13,966
for each feature of X train compute
the mean of the feature

45
00:02:14,033 --> 00:02:17,300
meaning the mean of the age,
and then the mean of the salary,

46
00:02:17,666 --> 00:02:21,333
and then compute the standard deviation
of the feature, the age and the salary.

47
00:02:21,733 --> 00:02:23,700
And that's exactly what the fit
method will do.

48
00:02:23,700 --> 00:02:28,700
It will only compute the mean and
the standard deviation of all the values.

49
00:02:28,866 --> 00:02:34,233
And then you have the transform method
that will indeed apply this formula by,

50
00:02:34,233 --> 00:02:38,066
you know, transforming each of the values
here of each feature

51
00:02:38,266 --> 00:02:42,000
into this value
resulting from this formula.

52
00:02:42,300 --> 00:02:42,633
All right.

53
00:02:42,633 --> 00:02:46,833
So it's important to understand the
difference between fit and transform. Fit

54
00:02:46,833 --> 00:02:50,066
will just get the mean and standard
deviation of each of your features.

55
00:02:50,266 --> 00:02:54,133
And transform will apply this formula
to indeed transform

56
00:02:54,133 --> 00:02:57,133
your values
so that they can all be on the same scale.
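
The difference between fit and transform can be sketched as follows. This assumes the formula on screen is the usual standardization formula, (x - mean) / standard deviation, which is what scikit-learn's StandardScaler applies; the age and salary values are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical age and salary columns (made-up values).
ages_salaries = np.array([
    [44.0, 72000.0],
    [27.0, 48000.0],
    [30.0, 54000.0],
    [38.0, 61000.0],
])

sc = StandardScaler()

# fit only learns the per-column mean and standard deviation.
sc.fit(ages_salaries)
print(sc.mean_, sc.scale_)

# transform applies (x - mean) / std using the statistics learned by fit.
scaled = sc.transform(ages_salaries)

# The same computation done by hand, for comparison.
manual = (ages_salaries - ages_salaries.mean(axis=0)) / ages_salaries.std(axis=0)
```

After the transform, each column should have a mean of about 0 and a standard deviation of about 1.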

57
00:02:57,266 --> 00:02:57,966
All right.

58
00:02:57,966 --> 00:03:01,066
And now the good news
is that one of the methods

59
00:03:01,066 --> 00:03:04,933
of the StandardScaler
class is actually fit_transform,

60
00:03:05,100 --> 00:03:09,400
which of course will perform
the two steps at the same time, meaning

61
00:03:09,400 --> 00:03:13,333
it will fit your matrix of features
to get the mean and standard deviation.

62
00:03:13,500 --> 00:03:15,566
And then right after that, transform

63
00:03:15,566 --> 00:03:19,100
all the values of the features
according to this formula.

64
00:03:19,366 --> 00:03:19,933
All right.

65
00:03:19,933 --> 00:03:22,700
So let's call this method
right away to make it efficient.

66
00:03:22,700 --> 00:03:28,133
You know, fit underscore transform,
then some parentheses.

67
00:03:28,133 --> 00:03:32,233
And now obviously you know what
to input inside this fit transform method.

68
00:03:32,566 --> 00:03:35,966
Well that's of course
exactly the same as X_train here

69
00:03:36,533 --> 00:03:40,666
because indeed
we will only apply feature scaling

70
00:03:40,700 --> 00:03:45,933
to our numerical columns here
containing non-integer values.

71
00:03:45,933 --> 00:03:47,866
Right? Non-dummy-variable values.
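
Put together, the line being typed in this part of the video probably looks like the sketch below. The names X_train and sc follow the video; the matrix values and the column layout (dummy variables in columns 0 to 2) are assumptions made for illustration, and the matrix is assumed to hold floats so the scaled values can be written back in place.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training matrix: dummy columns 0-2, then age and salary.
X_train = np.array([
    [1.0, 0.0, 0.0, 44.0, 72000.0],
    [0.0, 1.0, 0.0, 27.0, 48000.0],
    [0.0, 0.0, 1.0, 30.0, 54000.0],
    [0.0, 1.0, 0.0, 38.0, 61000.0],
])
dummies_before = X_train[:, :3].copy()

sc = StandardScaler()

# Fit on the numerical columns only, then overwrite them with the scaled values.
X_train[:, 3:] = sc.fit_transform(X_train[:, 3:])
print(X_train)
```

The dummy columns keep their 0 and 1 values, and only the age and salary columns are rescaled.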