1
00:00:00,300 --> 00:00:00,833
All right.

2
00:00:00,833 --> 00:00:01,166
Great.

3
00:00:01,166 --> 00:00:04,400
So this will apply feature scaling
to our two columns in the training set.

4
00:00:04,566 --> 00:00:08,566
And now I hope you know
what the next step is going to be.

5
00:00:08,566 --> 00:00:10,566
And you won't fall into the trap.

6
00:00:10,566 --> 00:00:13,200
Now we also have to transform

7
00:00:13,200 --> 00:00:16,700
the matrix of features of the test set,
meaning X test,

8
00:00:16,700 --> 00:00:19,766
this matrix of features. But

9
00:00:20,033 --> 00:00:24,166
since this data is like new data,
which we get, you know,

10
00:00:24,200 --> 00:00:25,800
later on in production,

11
00:00:25,800 --> 00:00:29,366
well, for this data
we will only apply the transform method

12
00:00:29,666 --> 00:00:33,500
because indeed the features of the test
set need to be scaled

13
00:00:33,533 --> 00:00:37,500
by the same scaler
that was used on the training set.

14
00:00:37,800 --> 00:00:39,400
We cannot get a new scaler.

15
00:00:39,400 --> 00:00:40,833
You know, if we apply the fit_transform

16
00:00:40,833 --> 00:00:44,100
method here on X test,
we would get a new scaler.

17
00:00:44,633 --> 00:00:49,466
And that would absolutely not make sense
because X test will actually be the input

18
00:00:49,566 --> 00:00:53,166
of the predict function
that will return the predictions,

19
00:00:53,166 --> 00:00:55,500
you know, after the machine
learning model is trained.

20
00:00:55,500 --> 00:00:58,433
And since this machine learning
model will be trained

21
00:00:58,433 --> 00:01:02,866
with a particular scaler, you know,
the scaler applied on the training set,

22
00:01:03,233 --> 00:01:07,066
well, in order to make predictions
that will be congruent with the way

23
00:01:07,066 --> 00:01:08,066
the model was trained,

24
00:01:08,066 --> 00:01:11,966
we need to apply the same scaler
that was used on the training set

25
00:01:12,166 --> 00:01:16,366
onto the test set, so that we can get
indeed the same transformation

26
00:01:16,600 --> 00:01:18,033
and therefore, in the end,

27
00:01:18,033 --> 00:01:22,400
some relevant predictions with the predict
method applied to X test.

28
00:01:22,733 --> 00:01:26,300
So here, clearly, only the transform method
must be applied.

29
00:01:26,500 --> 00:01:28,500
And therefore
what we're going to do to make it

30
00:01:28,500 --> 00:01:31,800
efficient is, well,
we're going to copy this line of code.

31
00:01:32,000 --> 00:01:33,800
And just below we're going to paste it.

32
00:01:33,800 --> 00:01:37,033
We're going to replace,
of course, X train by X test.

33
00:01:37,366 --> 00:01:40,466
And then here as well, X train by X test.

34
00:01:40,833 --> 00:01:44,233
And then just call of course
the transform method

35
00:01:44,233 --> 00:01:49,166
from that same scaler
that was applied on the training set.
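As a minimal sketch of what was just described, assuming scikit-learn's StandardScaler is the scaler in question: the scaler is fitted on the training set and then only its transform method is called on the test set. The toy age and salary values and the variable names ending in _scaled are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: the two columns are age and salary (made-up values).
X_train = np.array([[25.0, 40000.0],
                    [35.0, 60000.0],
                    [45.0, 80000.0],
                    [55.0, 100000.0]])
X_test = np.array([[30.0, 50000.0],
                   [50.0, 90000.0]])

sc = StandardScaler()

# fit_transform learns the mean and standard deviation from the training set
# and scales the training set with them.
X_train_scaled = sc.fit_transform(X_train)

# transform reuses the training mean and standard deviation on the test set.
X_test_scaled = sc.transform(X_test)

# The trap: fitting a second scaler on the test set gives different values,
# so the inputs would no longer match what the model was trained on.
wrong = StandardScaler().fit_transform(X_test)

print(X_test_scaled)
print(wrong)
```

Printing both arrays shows that the test set scaled with the training scaler and the test set scaled with its own new scaler do not agree.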

36
00:01:49,400 --> 00:01:52,100
Because indeed
this is part of the training. Right.

37
00:01:52,100 --> 00:01:53,900
Even if we haven't started the training.

38
00:01:53,900 --> 00:01:57,766
Well, this operation
that we apply here on our training set

39
00:01:58,100 --> 00:02:00,866
is, you know,
the preparation of the training.

40
00:02:00,866 --> 00:02:01,200
All right.

41
00:02:01,200 --> 00:02:04,100
So I hope it's clear. It's
very important that you understand this.

42
00:02:04,100 --> 00:02:06,900
And now, well,
I have to say congratulations,

43
00:02:06,900 --> 00:02:10,733
because we're actually done
implementing our final tool.

44
00:02:10,733 --> 00:02:14,766
And of course I'm going to show you
the result of feature scaling here.

45
00:02:14,766 --> 00:02:18,566
So let me create
two more code cells inside

46
00:02:18,566 --> 00:02:22,633
which we're going to print first X train.

47
00:02:22,966 --> 00:02:25,366
And then let me copy this.

48
00:02:25,366 --> 00:02:28,700
And then we're going to print X test.

49
00:02:29,100 --> 00:02:30,233
All right.

50
00:02:30,233 --> 00:02:33,533
So let's first run this to apply
feature scaling.

51
00:02:33,833 --> 00:02:36,200
Perfect. There we go. No execution error.

52
00:02:36,200 --> 00:02:37,833
Then let's print X train.

53
00:02:38,833 --> 00:02:42,300
And of
course we get, well, still the same values

54
00:02:42,300 --> 00:02:46,466
for the dummy variables which are indeed
still between minus three and plus three.

55
00:02:46,466 --> 00:02:51,166
But then our age
and salary variables were transformed

56
00:02:51,266 --> 00:02:55,300
so that they take new values
between minus two and plus two.

57
00:02:55,633 --> 00:02:58,166
Sometimes you will see values
between minus three and plus three.

58
00:02:58,166 --> 00:02:59,666
Here it's minus two and plus two.

59
00:02:59,666 --> 00:03:02,800
But anyway, now all our variables

60
00:03:02,800 --> 00:03:06,433
are on the same scale
and this will be perfect to improve

61
00:03:06,433 --> 00:03:10,266
or optimize the training of certain
machine learning models.

62
00:03:10,266 --> 00:03:13,800
And of course you will see exactly
which ones they're going to be

63
00:03:13,966 --> 00:03:16,966
the further we progress in this machine
learning course.

64
00:03:17,300 --> 00:03:18,400
So now you know everything.

65
00:03:18,400 --> 00:03:21,633
Let's also execute this cell
to print X test.

66
00:03:21,633 --> 00:03:22,800
And once again, well,

67
00:03:22,800 --> 00:03:27,600
you still have your dummy variables here
for the same two customers that were here.

68
00:03:27,733 --> 00:03:30,300
But then the age
and the salary were scaled

69
00:03:30,300 --> 00:03:33,300
so that they take once again
values between minus two and plus two.

70
00:03:33,633 --> 00:03:36,533
All right, okay. So great.

71
00:03:36,533 --> 00:03:38,900
I'm really happy that we are now done

72
00:03:38,900 --> 00:03:42,866
with this data preprocessing toolkit,
because that means only one thing.

73
00:03:42,933 --> 00:03:47,100
That means that we are ready to start
the exciting steps of the journey,

74
00:03:47,300 --> 00:03:51,566
which is to build machine learning models
that will perform amazing predictions.

75
00:03:51,966 --> 00:03:54,600
And we're going to start
with the regression models,

76
00:03:54,600 --> 00:03:57,633
which will predict
some continuous numerical values.

77
00:03:57,966 --> 00:04:00,966
And we will learn how to do that
on different data sets.

78
00:04:00,966 --> 00:04:05,400
But before we move on to this next part,
I just want to show you the data

79
00:04:05,400 --> 00:04:09,333
preprocessing template,
which will be so useful for us

80
00:04:09,466 --> 00:04:12,866
to tackle, in a flash of light,
the data preprocessing

81
00:04:12,866 --> 00:04:15,866
phase for each of our future machine
learning models.

82
00:04:15,900 --> 00:04:19,166
Because indeed,
you will see that this template was made

83
00:04:19,166 --> 00:04:22,166
so that we will only have each time

84
00:04:22,200 --> 00:04:26,000
one or two things to change,
and most of the time one thing to change.

85
00:04:26,300 --> 00:04:31,666
Because indeed, in this template
I included the three always used tools

86
00:04:31,666 --> 00:04:35,600
that we will use for our machine learning models,
which are importing the libraries.

87
00:04:35,600 --> 00:04:35,833
Right?

88
00:04:35,833 --> 00:04:39,066
We will always need these libraries,
then importing the data set.

89
00:04:39,300 --> 00:04:40,733
And here appreciate that

90
00:04:40,733 --> 00:04:44,466
we will only have one thing to change,
which will be the name of the data set,

91
00:04:44,766 --> 00:04:47,566
because indeed, this line of code
will automatically

92
00:04:47,566 --> 00:04:51,733
take all the columns except the last one,
meaning all your features.

93
00:04:51,900 --> 00:04:55,966
And this line of code will take
automatically the dependent variable.

94
00:04:55,966 --> 00:04:59,233
So here you will only have
the name of the data set to change.

95
00:04:59,533 --> 00:05:03,866
And then of course I included this tool
because for most of our machine learning models

96
00:05:03,866 --> 00:05:07,333
we will have to split the data
set into these two separate sets.

97
00:05:07,500 --> 00:05:08,433
One training set

98
00:05:08,433 --> 00:05:12,900
to train our machine learning model and one
test set to evaluate its performance.

99
00:05:12,900 --> 00:05:16,866
And here, once again,
we will actually have nothing to change.

100
00:05:17,100 --> 00:05:19,166
So in the whole template,

101
00:05:19,166 --> 00:05:23,333
we will only have one thing to change,
which will be the name of the data set.
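The template described here could look roughly like the sketch below. It assumes pandas and scikit-learn's train_test_split; the in-memory CSV stands in for a real file, and the column names, values, test_size and random_state are illustrative assumptions, not taken from the course notebook.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Importing the data set. The in-memory text below is a hypothetical stand-in;
# with a real file this would be pd.read_csv('<name of the data set>.csv'),
# which is the one thing to change.
csv_text = """Age,Salary,Purchased
44,72000,No
27,48000,Yes
30,54000,No
38,61000,No
40,63000,Yes
35,58000,Yes
48,79000,No
50,83000,Yes
37,67000,Yes
29,52000,No
"""
dataset = pd.read_csv(io.StringIO(csv_text))

# All the columns except the last one: the features.
X = dataset.iloc[:, :-1].values
# The last column: the dependent variable.
y = dataset.iloc[:, -1].values

# Splitting the data set into the training set and the test set.
# test_size and random_state are assumed values for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```

Because the features and the dependent variable are selected by position, nothing below the read_csv line depends on the column names of a particular data set.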

102
00:05:23,333 --> 00:05:27,400
And that's why this data preprocessing
template will be so useful for us,

103
00:05:27,600 --> 00:05:32,400
because we will each time tackle the data
preprocessing phase in a flash of light.

104
00:05:32,666 --> 00:05:36,000
So make sure to have this template ready
each time

105
00:05:36,000 --> 00:05:38,566
we're going to build our future machine
learning models.

106
00:05:38,566 --> 00:05:40,133
And now take a good break.

107
00:05:40,133 --> 00:05:41,100
You really deserve it

108
00:05:41,100 --> 00:05:43,200
after this data preprocessing phase

109
00:05:43,200 --> 00:05:46,900
and these answers to the questions
that reduce any confusion.

110
00:05:47,100 --> 00:05:48,266
So digest it well.

111
00:05:48,266 --> 00:05:49,600
And as soon as you're ready

112
00:05:49,600 --> 00:05:53,366
to tackle the first branch of machine
learning model, which is regression,

113
00:05:53,566 --> 00:05:56,733
well, let's continue our journey together
in this next part.

114
00:05:56,933 --> 00:05:58,866
And until then, enjoy machine learning.