1
00:00:00,066 --> 00:00:01,033
All right, my friends,

2
00:00:01,033 --> 00:00:04,866
here we are
at the final step of this implementation

3
00:00:04,866 --> 00:00:08,966
and actually the most exciting one,
because this is the step where

4
00:00:08,966 --> 00:00:13,166
we're going to visualize,
on a nice 2D plot, the prediction curve

5
00:00:13,166 --> 00:00:16,900
and the prediction
regions of the logistic regression model.

6
00:00:17,466 --> 00:00:17,966
All right.

7
00:00:17,966 --> 00:00:21,366
So more specifically,
what we're about to plot

8
00:00:21,800 --> 00:00:24,600
is a two-dimensional plot with, therefore,

9
00:00:24,600 --> 00:00:27,766
two axes, x and y. On the x axis,

10
00:00:27,766 --> 00:00:32,466
you will have the first feature,
corresponding to the age, and on the y axis

11
00:00:32,466 --> 00:00:36,833
you'll have the second feature
corresponding to the estimated salary,

12
00:00:37,233 --> 00:00:41,200
and therefore each of the observation
points you will see on the 2D

13
00:00:41,200 --> 00:00:44,666
plot will correspond
to a specific customer.

14
00:00:44,966 --> 00:00:47,933
It will either be a customer
of the training set,

15
00:00:47,933 --> 00:00:50,400
you know,
on the plot of the training set results,

16
00:00:50,400 --> 00:00:54,333
or a customer of the test
set, on the plot of the test set results.

17
00:00:54,800 --> 00:00:58,133
And what is most interesting to see in

18
00:00:58,133 --> 00:01:02,966
this plot will be the prediction regions,
meaning the regions

19
00:01:03,100 --> 00:01:07,133
where our logistic regression
model predicts the class zero,

20
00:01:07,133 --> 00:01:11,533
meaning the customers
didn't buy the SUV, and the other region

21
00:01:11,533 --> 00:01:15,366
where our logistic regression
model predicts the class one,

22
00:01:15,366 --> 00:01:18,366
meaning the customer bought the SUV.

23
00:01:18,366 --> 00:01:21,366
And lastly,
what will be really interesting to see

24
00:01:21,433 --> 00:01:25,533
is the curve separating these two regions.

25
00:01:25,533 --> 00:01:29,666
You know, the region of the prediction
zero and the region of the prediction

26
00:01:29,666 --> 00:01:33,966
one. And this is exactly how we are going
to see the difference

27
00:01:33,966 --> 00:01:38,066
between linear classifiers
and nonlinear classifiers.

28
00:01:38,066 --> 00:01:42,033
So here we're only starting with one
classification model, logistic regression.

29
00:01:42,266 --> 00:01:44,466
So we won't compare that yet.

30
00:01:44,466 --> 00:01:48,900
But you will see in the next sections
of this part that the prediction

31
00:01:48,900 --> 00:01:52,833
boundary between these two prediction
regions will be different

32
00:01:52,966 --> 00:01:56,700
depending on whether or not
your classifier is linear.

33
00:01:56,866 --> 00:01:59,166
All right.
So I can't wait to show you this.

34
00:01:59,166 --> 00:02:02,900
And let's start
first by visualizing the training

35
00:02:02,900 --> 00:02:05,966
set and test set results
for the logistic regression model.

36
00:02:06,700 --> 00:02:07,266
All right.

37
00:02:07,266 --> 00:02:11,266
So the code to visualize
this is actually pretty advanced.

38
00:02:11,400 --> 00:02:14,600
And not only is it pretty advanced,
but also you will

39
00:02:14,600 --> 00:02:17,600
probably never use it again
in your career.

40
00:02:17,600 --> 00:02:20,600
Or let's say you will never have
to implement that again.

41
00:02:20,700 --> 00:02:21,466
Why is that?

42
00:02:21,466 --> 00:02:23,933
It's because in your career
you will mostly work

43
00:02:23,933 --> 00:02:27,300
with data sets having many features,
you know, more than two.

44
00:02:27,733 --> 00:02:31,633
And here the only reason why
we have a data set of two features

45
00:02:31,833 --> 00:02:36,333
is so that we can visualize
these prediction regions

46
00:02:36,333 --> 00:02:37,566
and prediction boundary,

47
00:02:37,566 --> 00:02:41,900
because indeed, in order to visualize
this, we need at most two features,

48
00:02:41,900 --> 00:02:46,100
because one feature corresponds
to one dimension in this plot.

49
00:02:46,500 --> 00:02:50,400
So what I suggest is
that we don't waste too much time,

50
00:02:50,533 --> 00:02:54,200
you know, understanding the whole code
and re-implementing it ourselves.

51
00:02:54,500 --> 00:02:57,300
Because really, I'm going
to show it to you right away, you know,

52
00:02:57,300 --> 00:03:00,500
on the original logistic regression
implementation,

53
00:03:00,933 --> 00:03:03,166
you will see that
the code is pretty advanced.

54
00:03:03,166 --> 00:03:07,700
You know, it's not like plotting a
regression curve like we did in part two.

55
00:03:08,133 --> 00:03:10,233
So that's the test results.

56
00:03:10,233 --> 00:03:12,066
Let me show you the training set results.

57
00:03:12,066 --> 00:03:12,466
All right.

58
00:03:12,466 --> 00:03:13,533
So that's the code.

59
00:03:13,533 --> 00:03:18,066
You see, it uses a lot of tricks
to plot all these observation points,

60
00:03:18,066 --> 00:03:20,400
prediction regions,
and prediction boundary.

61
00:03:20,400 --> 00:03:23,400
So if you want to have a look at it
and understand it, fine.

62
00:03:23,400 --> 00:03:28,266
But really, for the others, it's totally
okay if we don't cover this code in detail

63
00:03:28,300 --> 00:03:30,933
because this is only for training
purposes.

64
00:03:30,933 --> 00:03:32,166
Just so that I can show you

65
00:03:32,166 --> 00:03:35,600
the differences between linear classifiers
and nonlinear classifiers,

66
00:03:35,800 --> 00:03:39,600
and you will probably never use that again
in your future machine learning projects.

67
00:03:39,933 --> 00:03:43,166
However, what I will do
just now is explain how it's done.

68
00:03:43,500 --> 00:03:47,066
Basically, what we do is we create,
as you can see, a grid

69
00:03:47,266 --> 00:03:51,133
which is basically this frame here
containing all the ages

70
00:03:51,133 --> 00:03:54,133
and all the estimated salaries,
you know, the ranges,

71
00:03:54,266 --> 00:03:57,300
and you create this grid
with a high density, meaning that

72
00:03:57,300 --> 00:04:02,933
the pixels of this grid are not spaced
one unit apart, but every 0.25.

73
00:04:02,933 --> 00:04:05,400
So here, for example, for the age,
it goes this way.

74
00:04:05,400 --> 00:04:11,366
It goes from 10 to 10.25
to 10.5 to 10.75 to 11, et cetera.

75
00:04:11,366 --> 00:04:15,400
Up to 69, 69.25, 69.5,

76
00:04:15,400 --> 00:04:20,100
69.75, 70. Okay,
and same for the estimated salary.

77
00:04:20,300 --> 00:04:25,800
It goes from 20,000,
then 20,000.25, 20,000.5, etc.

78
00:04:25,800 --> 00:04:31,100
up to somewhere around 149,000, 149,000.25.

79
00:04:31,100 --> 00:04:36,000
You see, this results in super
dense points inside this grid,

80
00:04:36,400 --> 00:04:39,700
and then, you know, what we did
first is that

81
00:04:39,700 --> 00:04:43,133
we plotted all the real observation
points in the grid.

82
00:04:43,133 --> 00:04:47,266
So all the points that you see here are
the customers of either your training set

83
00:04:47,266 --> 00:04:50,666
or, later on, your test set.
The green points are, of course,

84
00:04:50,666 --> 00:04:54,766
the customers who bought the SUV,
you know, represented by one here.

85
00:04:55,166 --> 00:04:58,400
And the red points are,
of course, the customers who didn't buy

86
00:04:58,566 --> 00:05:01,433
the SUV represented by zero here.

87
00:05:01,433 --> 00:05:05,066
Okay, so all the points
are your observation points,

88
00:05:05,066 --> 00:05:06,166
your customers.

89
00:05:06,166 --> 00:05:10,033
And then so the trick in order
to plot the prediction regions

90
00:05:10,033 --> 00:05:14,300
and therefore that prediction
boundary here separating the two regions

91
00:05:14,800 --> 00:05:17,433
is to apply the predict method

92
00:05:17,433 --> 00:05:20,900
onto each of these dense points
in the grid,

93
00:05:21,166 --> 00:05:24,800
so that all the dense points here,
you know, in this region

94
00:05:24,900 --> 00:05:29,600
were actually predicted to be zero,
meaning all the customers,

95
00:05:29,600 --> 00:05:34,800
you know, all the customers inside this
region are predicted not to buy the SUV.

96
00:05:35,100 --> 00:05:38,266
And all the observation points
in this green

97
00:05:38,266 --> 00:05:41,733
region
are actually predicted to buy the SUV.

98
00:05:42,066 --> 00:05:43,800
So you see how this works.
That's the trick.

99
00:05:43,800 --> 00:05:45,900
And then really
you don't have to understand

100
00:05:45,900 --> 00:05:50,000
all the techniques used to implement this,
because once again,

101
00:05:50,000 --> 00:05:53,766
you will probably never have to implement
that kind of code in your career.
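
For those who do want to look at it, the grid trick described above can be sketched roughly as follows. This is only a minimal sketch, not the course's actual code: the toy data, the helper names (`make_grid`, `predict_regions`, `toy_predict`, `plot_regions`) and the hand-picked weights are all assumptions made for illustration, and the salary is expressed in thousands so that the 0.25-step grid stays small. A trained scikit-learn classifier's `predict` method could presumably be passed in place of `toy_predict`.

```python
import numpy as np

# Toy stand-in data (made up): columns are age and salary in thousands.
X = np.array([[20.0, 30.0], [25.0, 40.0], [45.0, 90.0], [50.0, 120.0]])
y = np.array([0, 0, 1, 1])  # 0 = didn't buy the SUV, 1 = bought it


def make_grid(X, step=0.25, margin=1.0):
    """Build a dense grid covering the range of both feature columns."""
    x1 = np.arange(X[:, 0].min() - margin, X[:, 0].max() + margin, step)
    x2 = np.arange(X[:, 1].min() - margin, X[:, 1].max() + margin, step)
    return np.meshgrid(x1, x2)


def predict_regions(predict, X1, X2):
    """Apply `predict` to every grid point and reshape back to the grid."""
    points = np.column_stack([X1.ravel(), X2.ravel()])
    return predict(points).reshape(X1.shape)


def toy_predict(points):
    """Hypothetical linear classifier standing in for the trained model."""
    weights = np.array([0.1, 0.05])  # hand-picked, not learned
    bias = -7.0
    return (points @ weights + bias > 0).astype(int)


X1, X2 = make_grid(X)
Z = predict_regions(toy_predict, X1, X2)  # one predicted class per grid point


def plot_regions(X1, X2, Z, X, y):
    """Color the two prediction regions, then overlay the real observations."""
    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap

    colors = ["red", "green"]
    plt.contourf(X1, X2, Z, alpha=0.25, cmap=ListedColormap(colors))
    for label, color in enumerate(colors):
        mask = y == label
        plt.scatter(X[mask, 0], X[mask, 1], c=color, label=str(label))
    plt.xlabel("Age")
    plt.ylabel("Estimated Salary (thousands)")
    plt.legend()
    plt.show()
```

Calling `plot_regions(X1, X2, Z, X, y)` draws the red and green regions with the customers on top. The line where the two colors meet is the prediction boundary, and it comes out straight here because the stand-in classifier is linear. If the real model was trained on scaled features, the grid points would need the same scaling before calling `predict`.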