1
00:00:00,333 --> 00:00:03,000
And now the next one is going to be...

2
00:00:03,000 --> 00:00:06,000
Well, the other vector
we want to concatenate

3
00:00:06,000 --> 00:00:09,933
to that vector of predicted profits,
which is the vector of real profits.

4
00:00:10,100 --> 00:00:10,400
All right.

5
00:00:10,400 --> 00:00:13,833
So here we can do this very efficiently
because this is exactly the same trick.

6
00:00:14,233 --> 00:00:17,900
I'm going to copy all this and paste

7
00:00:17,900 --> 00:00:21,600
that here and just replace y_pred by...

8
00:00:21,966 --> 00:00:24,400
What do we have to replace y_pred by?

9
00:00:24,400 --> 00:00:27,300
Well, of course,
we have to replace it by y_test

10
00:00:27,300 --> 00:00:31,933
because y_test contains,
of course, the real profits in the test set.

11
00:00:32,100 --> 00:00:32,566
All right.

12
00:00:32,566 --> 00:00:34,800
Here we are evaluating our model
on the test set.

13
00:00:34,800 --> 00:00:36,766
So here we go.

14
00:00:36,766 --> 00:00:39,333
Replacing y_pred by y_test.

15
00:00:39,333 --> 00:00:43,000
And here as well.
Actually, we could keep that

16
00:00:43,000 --> 00:00:47,033
because the length of y_test
is the same as the length of y_pred.

17
00:00:47,500 --> 00:00:48,300
But there we go.

18
00:00:48,300 --> 00:00:52,766
Now we have a beautiful concatenation
of two vertical vectors.

19
00:00:53,100 --> 00:00:54,333
But remember this.

20
00:00:54,333 --> 00:00:58,500
You know, this up to here is actually

21
00:00:58,500 --> 00:01:02,000
the first argument
of the concatenate function.

22
00:01:02,333 --> 00:01:05,333
And therefore
we need to add the second one

23
00:01:05,466 --> 00:01:08,400
which is the axis, as you can see.

24
00:01:08,400 --> 00:01:11,633
So axis here can take two values: 0 or 1.

25
00:01:11,866 --> 00:01:14,466
Zero means that we want to do
a vertical concatenation.

26
00:01:14,466 --> 00:01:17,466
And one means that we want to do
a horizontal concatenation.

27
00:01:17,766 --> 00:01:21,600
And since here we want to concatenate
two vertical vectors together.

28
00:01:21,733 --> 00:01:24,433
Well, that concatenation
is actually horizontal.

29
00:01:24,433 --> 00:01:27,833
And therefore we have to input here
axis equals one.

30
00:01:27,833 --> 00:01:28,900
And we don't have to specify

31
00:01:28,900 --> 00:01:32,166
the name of the argument
because this is input in the same order.

32
00:01:32,933 --> 00:01:33,600
All right.

33
00:01:33,600 --> 00:01:34,233
Okay good.

34
00:01:34,233 --> 00:01:37,966
So now we're going
to observe the final result and see

35
00:01:38,200 --> 00:01:41,666
if our model was able
to return some predictions.

36
00:01:41,666 --> 00:01:45,433
You know, some predicted profits
close to the real profit.

37
00:01:45,600 --> 00:01:46,433
So there we go.

38
00:01:46,433 --> 00:01:49,200
Let's press play to run the cell.

39
00:01:49,200 --> 00:01:50,500
And awesome.

40
00:01:50,500 --> 00:01:52,233
I didn't make any mistakes. Perfect.

41
00:01:52,233 --> 00:01:54,133
So let's recap.

42
00:01:54,133 --> 00:01:56,500
We can clearly see that
we have two vectors here.

43
00:01:56,500 --> 00:01:58,800
That's the first one
and that's the second one.

44
00:01:58,800 --> 00:02:02,066
On the left
we have the vector of predicted profits.

45
00:02:02,066 --> 00:02:03,400
So that's y_pred.

46
00:02:03,400 --> 00:02:06,700
And on the right
we have the vector of real profits for

47
00:02:06,966 --> 00:02:10,500
of course,
the ten startups of the test set.

48
00:02:11,066 --> 00:02:13,166
All right.
And so let's see what we get.

49
00:02:13,166 --> 00:02:16,233
Let's see if our predicted profits
are close to the real profits.

50
00:02:16,600 --> 00:02:22,433
So for the first startup of the test set, well,
the predicted profit is around 103,000.

51
00:02:22,433 --> 00:02:25,933
And the real profit
is actually 103,200.

52
00:02:25,933 --> 00:02:26,700
So very close.

53
00:02:26,700 --> 00:02:29,633
That's perfect.
That's an amazing first prediction.

54
00:02:29,633 --> 00:02:31,400
Then second startup of the test set.

55
00:02:31,400 --> 00:02:35,400
The predicted profit is 132,582.

56
00:02:35,666 --> 00:02:39,000
And the real profit is actually 144,000.

57
00:02:39,000 --> 00:02:42,333
So not as great a prediction as before,
but still not too bad.

58
00:02:42,733 --> 00:02:45,733
Third startup: 132, 146.

59
00:02:45,833 --> 00:02:48,900
Still not great,
but not too bad either. Fourth startup:

60
00:02:48,900 --> 00:02:52,400
71, actually 72, and 78.

61
00:02:52,666 --> 00:02:58,300
All right,
so pretty close. Then 178, 191. Okay.

62
00:02:58,666 --> 00:03:01,300
116, 105.

63
00:03:01,300 --> 00:03:03,433
So actually
the first prediction was amazing.

64
00:03:03,433 --> 00:03:06,433
But then, you know,
the other ones are still quite good.

65
00:03:06,900 --> 00:03:10,500
Then 67, 68 actually, and 81. Okay.

66
00:03:10,866 --> 00:03:13,866
98,000, 97,000.

67
00:03:13,866 --> 00:03:14,733
Very good.

68
00:03:14,733 --> 00:03:19,300
113,000, 114,000, and 110,000.

69
00:03:19,433 --> 00:03:20,600
Very, very good.

70
00:03:20,600 --> 00:03:26,000
And 167,000 and 166,000.
Amazing predictions.

71
00:03:26,000 --> 00:03:29,333
So we have some, you know, amazing
predictions, very close

72
00:03:29,333 --> 00:03:32,500
to the real profits
and some okay predictions.

73
00:03:32,500 --> 00:03:35,766
You know, okay as in they are not too far
from the real results.

74
00:03:35,766 --> 00:03:37,933
So here from what we see.

75
00:03:37,933 --> 00:03:39,166
Well we could say that

76
00:03:39,166 --> 00:03:42,700
the multiple linear regression
is well adapted to this data set.

77
00:03:43,033 --> 00:03:47,433
The data set does not necessarily
have some perfect linear correlations.

78
00:03:47,700 --> 00:03:51,400
However, you can be assured that
with this linear regression class, well,

79
00:03:51,400 --> 00:03:53,100
it was able to select the right features

80
00:03:53,100 --> 00:03:55,866
with the right parameters
to make these predictions.

81
00:03:55,866 --> 00:03:59,866
And even if you tune your linear
regression model by, for example, applying

82
00:03:59,866 --> 00:04:04,533
backward elimination to select, you know,
a team of more statistically significant

83
00:04:04,533 --> 00:04:08,133
features, you will actually get
similar results. You can try.

84
00:04:08,133 --> 00:04:10,300
That actually would be a good practice.

85
00:04:10,300 --> 00:04:12,500
We actually do that in the R section.

86
00:04:12,500 --> 00:04:15,333
But in terms of performance
this won't change much.

87
00:04:15,333 --> 00:04:19,433
And remember your goal is to be efficient
when building

88
00:04:19,433 --> 00:04:21,333
and testing your machine learning models.

89
00:04:21,333 --> 00:04:24,866
So when you get such results
with your multiple linear regression,

90
00:04:25,133 --> 00:04:28,100
you know, in real life
you will actually try other models,

91
00:04:28,100 --> 00:04:31,100
you will actually try other models
which you can tune also.

92
00:04:31,233 --> 00:04:33,600
And then in the
end you will compare the performance

93
00:04:33,600 --> 00:04:36,400
of each of these models
and select the best one.

94
00:04:36,400 --> 00:04:39,400
So we'll talk about this again
at the end of this section.

95
00:04:39,400 --> 00:04:43,100
And also a lot more
on model selection.

96
00:04:43,500 --> 00:04:47,666
And so now I have to say congratulations
because you now know how to build

97
00:04:47,666 --> 00:04:51,400
another machine learning model which is
multiple linear regression and therefore

98
00:04:51,400 --> 00:04:54,866
which you can add to your toolkit
thanks to this new code template.

99
00:04:55,600 --> 00:04:58,433
Perfect.
So now we're going to move on to R.

100
00:04:58,433 --> 00:05:01,900
I remind you that you don't have to master
the two programming languages.

101
00:05:01,900 --> 00:05:03,633
If you want to master the two, that's fine.

102
00:05:03,633 --> 00:05:05,233
Join me in the R tutorials.

103
00:05:05,233 --> 00:05:07,466
And otherwise
if you want to stick to Python,

104
00:05:07,466 --> 00:05:11,366
well, feel free to skip the R section
and join us, Carol and I, in

105
00:05:11,366 --> 00:05:15,533
the next section on polynomial regression,
where you will learn

106
00:05:15,800 --> 00:05:19,233
how to make predictions on a nonlinear
data set.

107
00:05:19,233 --> 00:05:23,400
You know, on a data set with nonlinear
relationships, therefore,

108
00:05:23,400 --> 00:05:27,300
for which a multiple linear regression
model would not be relevant.

109
00:05:27,766 --> 00:05:30,733
So it's an absolutely necessary model
to add to your toolkit.

110
00:05:30,733 --> 00:05:32,866
And you will add it in the next section.

111
00:05:32,866 --> 00:05:34,633
Until then, enjoy machine learning.
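
The concatenation step described in this tutorial can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact cell: it assumes NumPy is imported as np, and the three values in y_pred and y_test are placeholders rounded from the figures read aloud, not the real model output. The name comparison is hypothetical.

```python
import numpy as np

# Placeholder data (assumption): rounded from the first three rows read aloud.
y_pred = np.array([103000.0, 132582.0, 132000.0])  # predicted profits
y_test = np.array([103200.0, 144000.0, 146000.0])  # real profits

np.set_printoptions(precision=2, suppress=True)

# Reshape each 1-D vector into a vertical (n, 1) column, then place the two
# columns side by side. The second positional argument is the axis:
# 1 means horizontal concatenation, 0 means vertical concatenation.
comparison = np.concatenate(
    (y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)),
    1,
)
print(comparison)  # left column: predictions, right column: real values

# For contrast, axis 0 stacks the second column underneath the first.
stacked = np.concatenate(
    (y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)),
    0,
)
print(stacked.shape)
```

Passing the axis positionally works because it is the second parameter of np.concatenate; writing axis=1 explicitly is equivalent and arguably easier to read.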