1
00:00:00,566 --> 00:00:03,300
All right, so now you understand
it better.

2
00:00:03,300 --> 00:00:07,300
But remember that
these are the training set results.

3
00:00:07,366 --> 00:00:12,233
Therefore all these customers that we see
here are actually in the training set.

4
00:00:12,266 --> 00:00:17,366
Therefore these are customers with which
our logistic regression model was trained.

5
00:00:17,500 --> 00:00:20,200
And therefore that's kind of easy
to provide such results

6
00:00:20,200 --> 00:00:23,166
because these are exactly
the observations of the training.

7
00:00:23,166 --> 00:00:27,600
But now what we would like to see
is how our logistic regression model

8
00:00:27,600 --> 00:00:31,566
was able to perform on new observations,
meaning

9
00:00:31,566 --> 00:00:34,566
on the observations of the test set,
the customers of the test set.

10
00:00:34,766 --> 00:00:39,300
Because indeed the customers of the test
set are new customers

11
00:00:39,300 --> 00:00:42,766
with which our logistic regression
model wasn't trained.

12
00:00:43,033 --> 00:00:46,066
And so we have to see
if our logistic regression

13
00:00:46,066 --> 00:00:49,033
model was still able to separate

14
00:00:49,033 --> 00:00:53,000
well the two classes,
meaning the customers who bought the SUV

15
00:00:53,000 --> 00:00:56,433
and the customers who didn't buy the SUV,
even despite the fact

16
00:00:56,433 --> 00:00:59,833
that these are new customers
on which the model wasn't trained.

17
00:01:00,033 --> 00:01:05,200
And that's exactly what we're about to see
now when visualizing the test results.

18
00:01:05,366 --> 00:01:07,600
We already executed the cell here.

19
00:01:07,600 --> 00:01:09,900
And so these are the test results.

20
00:01:09,900 --> 00:01:14,166
And still our logistic regression
model was perfectly able to separate

21
00:01:14,166 --> 00:01:16,666
well the two classes: zero,

22
00:01:16,666 --> 00:01:18,233
you know, all those red points here,

23
00:01:18,233 --> 00:01:20,400
and one, all those green points.

24
00:01:20,400 --> 00:01:23,833
There are still some incorrect predictions
of course, like this customer

25
00:01:23,833 --> 00:01:28,200
who in reality didn't buy
the brand new beautiful SUV

26
00:01:28,533 --> 00:01:30,033
but was predicted to buy it.

27
00:01:30,033 --> 00:01:33,066
And a few incorrect predictions
here of the other class.

28
00:01:33,233 --> 00:01:36,366
Meaning these customers who in reality
bought the SUV

29
00:01:36,633 --> 00:01:39,600
but were predicted not to
because they fall in the red region.

30
00:01:40,733 --> 00:01:41,400
All right.

31
00:01:41,400 --> 00:01:43,266
And so how can we conclude here?

32
00:01:43,266 --> 00:01:46,166
What should we conclude
and what are the takeaways

33
00:01:46,166 --> 00:01:49,533
we should get for our future
classification models?

34
00:01:49,833 --> 00:01:54,000
Well, first the logistic regression
model does a very good job

35
00:01:54,000 --> 00:01:55,933
at separating our two classes

36
00:01:55,933 --> 00:01:59,600
and therefore at predicting
whether the customers bought the SUV.

37
00:01:59,933 --> 00:02:03,700
But we actually would hope
to build a model

38
00:02:03,900 --> 00:02:06,900
that has fewer prediction errors.

39
00:02:07,033 --> 00:02:08,200
And how can we build one?

40
00:02:08,200 --> 00:02:11,900
What would we need to get,
you know, as the prediction curve in order

41
00:02:11,900 --> 00:02:16,033
not to make
all these wrong predictions here?

42
00:02:16,033 --> 00:02:18,333
You know, all these customers here.

43
00:02:18,333 --> 00:02:21,133
Well, we actually would need
a prediction boundary

44
00:02:21,133 --> 00:02:23,300
that is something else
than a straight line.

45
00:02:23,300 --> 00:02:27,466
Because even if you try to rotate
your prediction line, for example,

46
00:02:27,466 --> 00:02:31,766
to be like that, well it will still catch
many incorrect predictions.

47
00:02:32,033 --> 00:02:36,700
So what we would need to get, you know,
ultimately is some kind of curve,

48
00:02:36,700 --> 00:02:40,100
some kind of prediction curve
that goes this way, catches

49
00:02:40,100 --> 00:02:43,100
all the red points,
you know, all the red customers here

50
00:02:43,100 --> 00:02:47,100
and then goes around like this
in order to catch all the red points,

51
00:02:47,100 --> 00:02:48,800
the red customers, and leaves

52
00:02:48,800 --> 00:02:52,600
all the green points, the green customers,
inside the green region.

53
00:02:53,033 --> 00:02:56,433
And well, as you might guess,
this is what we might be able

54
00:02:56,433 --> 00:02:59,266
to get with nonlinear classifiers.

55
00:02:59,266 --> 00:03:03,666
I won't tell you more now, but be ready
for some even more performant

56
00:03:03,833 --> 00:03:06,700
classification models
that manage to separate

57
00:03:06,700 --> 00:03:09,000
these two classes even better.

58
00:03:09,000 --> 00:03:10,100
So there we go.

59
00:03:10,100 --> 00:03:12,066
That was the big part of the job.
You did it.

60
00:03:12,066 --> 00:03:13,000
And now follow me

61
00:03:13,000 --> 00:03:16,800
in the next sections to implement
the other classification models.

62
00:03:17,033 --> 00:03:18,966
And until then, enjoy machine learning.