1
00:00:00,066 --> 00:00:01,200
All right, so that's the problem.

2
00:00:01,200 --> 00:00:01,900
I hope you like it.

3
00:00:01,900 --> 00:00:04,233
I hope you're excited to work on it.

4
00:00:04,233 --> 00:00:07,800
And so now we're gonna, without further
ado, start

5
00:00:07,800 --> 00:00:12,566
our logistic regression
implementation on your favorite IDE.

6
00:00:12,966 --> 00:00:17,466
Whether it is Google Colaboratory
or Jupyter Notebook, you have the choice.

7
00:00:17,700 --> 00:00:20,900
But my favorite is by far Google
Colaboratory.

8
00:00:20,900 --> 00:00:26,366
So if you'd like to, follow me here,
and now let's re-implement

9
00:00:26,366 --> 00:00:29,900
this logistic regression
implementation step by step.

10
00:00:30,033 --> 00:00:33,100
Right now it is laying out the notebook
and we are about to have it in a second.

11
00:00:33,100 --> 00:00:34,100
There we go.

12
00:00:34,100 --> 00:00:34,433
All right.

13
00:00:34,433 --> 00:00:36,333
So that's the whole notebook.

14
00:00:36,333 --> 00:00:37,933
It is in Read-Only mode.

15
00:00:37,933 --> 00:00:42,600
So right now what we have to do
is to create a copy of this notebook.

16
00:00:42,866 --> 00:00:46,766
And to do this we just have to click
"Save a copy in Drive".

17
00:00:47,066 --> 00:00:48,633
And this will create a copy,

18
00:00:48,633 --> 00:00:52,200
as you can see, of this notebook
in which we will be able

19
00:00:52,200 --> 00:00:55,200
to re-implement
the whole model from scratch.

20
00:00:55,800 --> 00:00:56,733
All right. Great.

21
00:00:56,733 --> 00:01:00,600
So as usual, the first thing we're going
to do is to delete all the code cells.

22
00:01:00,600 --> 00:01:02,833
Right. Because I want you to take action.

23
00:01:02,833 --> 00:01:04,666
I want you to learn by doing, so

24
00:01:04,666 --> 00:01:08,500
I really, really want you to reimplement
all these code cells from scratch.

25
00:01:08,500 --> 00:01:10,466
So we're going to delete all of them.

26
00:01:10,466 --> 00:01:13,733
To do this we just have to click them
and then click the trash button

27
00:01:13,733 --> 00:01:16,500
here. Just do as I do.

28
00:01:16,500 --> 00:01:16,800
All right.

29
00:01:16,800 --> 00:01:18,900
And make sure not to delete the text cells

30
00:01:18,900 --> 00:01:22,433
because we want to keep that
well-highlighted structure.

31
00:01:22,933 --> 00:01:24,833
All right, feature scaling.

32
00:01:24,833 --> 00:01:29,000
So yes, there will be feature
scaling for logistic regression.

33
00:01:29,000 --> 00:01:30,933
And I will explain why. All right.

34
00:01:30,933 --> 00:01:35,033
So now we train the logistic regression
model, predict a new result.

35
00:01:35,100 --> 00:01:35,666
All right.

36
00:01:35,666 --> 00:01:38,400
And you really have everything
in this implementation.

37
00:01:38,400 --> 00:01:41,700
You'll see that you will learn
how to predict an ensemble of results,

38
00:01:41,700 --> 00:01:42,533
you know, in the test set.

39
00:01:42,533 --> 00:01:45,033
You will also learn
how to predict a single result.

40
00:01:45,033 --> 00:01:45,533
Like, you know,

41
00:01:45,533 --> 00:01:47,233
when you deploy your model in production,

42
00:01:47,233 --> 00:01:49,833
when you want to predict
a single observation.

43
00:01:49,833 --> 00:01:51,366
So now, the confusion matrix:

44
00:01:51,366 --> 00:01:54,966
that's to evaluate your model,
and of course the visualizations at the end.

45
00:01:55,133 --> 00:01:59,033
And once again, I chose a data
set of only two features, right?

46
00:01:59,033 --> 00:02:03,433
The age and the estimated salary,
so that we can indeed visualize

47
00:02:03,600 --> 00:02:06,433
the results in the end on the training set
and on the test set.

48
00:02:06,433 --> 00:02:11,200
Because remember, in the plot,
each dimension corresponds to one feature,

49
00:02:11,200 --> 00:02:14,366
and therefore there are as many dimensions
as there are features.

50
00:02:14,633 --> 00:02:17,933
And so since we have two features,
we'll have a nice 2D plot.

51
00:02:17,933 --> 00:02:20,866
And that's exactly the reason
why I needed to take two features.

52
00:02:20,866 --> 00:02:24,300
But no worries, the implementations
we're about to make work

53
00:02:24,300 --> 00:02:27,366
for any data set, regardless
of the number of features.

54
00:02:27,533 --> 00:02:30,100
And I will prove this to you
at the end of this part,

55
00:02:30,100 --> 00:02:32,966
when deploying all our classification
models

56
00:02:32,966 --> 00:02:36,900
on a brand new generic data
set with more features.

57
00:02:37,233 --> 00:02:41,700
And this is how I will also teach you
how to select the best model.

58
00:02:41,800 --> 00:02:43,800
All right, so there you go.

59
00:02:43,800 --> 00:02:44,933
I hope you're excited.

60
00:02:44,933 --> 00:02:48,600
You know, both by the problem case study
and this implementation.

61
00:02:48,966 --> 00:02:51,966
And now before we finish
and move on to the next tutorial,

62
00:02:52,066 --> 00:02:55,133
well, I would like you
to do a little exercise.

63
00:02:55,433 --> 00:02:58,066
Now that you saw the data set
and understand it,

64
00:02:58,066 --> 00:03:02,400
and since you also have your data
preprocessing template, well, there you go.

65
00:03:02,400 --> 00:03:05,500
The exercise is:
I would like you to implement

66
00:03:05,500 --> 00:03:09,066
on your own the data preprocessing phase
up to this step,

67
00:03:09,066 --> 00:03:10,900
you know, feature scaling.

68
00:03:10,900 --> 00:03:11,600
So basically

69
00:03:11,600 --> 00:03:15,533
I would like you to implement on your own
this step, importing the libraries.

70
00:03:15,666 --> 00:03:17,700
Then this step, importing the data set,

71
00:03:17,700 --> 00:03:20,766
and this step, splitting the data
set into the training set and test set.

72
00:03:21,033 --> 00:03:25,200
And finally, this last step of the data
preprocessing phase, feature scaling.

73
00:03:25,266 --> 00:03:25,900
All right.

74
00:03:25,900 --> 00:03:30,033
So please try. Use,
of course your data preprocessing template

75
00:03:30,333 --> 00:03:33,133
and of course
your data preprocessing toolkit.

76
00:03:33,133 --> 00:03:37,500
Because indeed in order to implement
that step you will need to grab a tool

77
00:03:37,533 --> 00:03:40,766
from your data preprocessing toolkit,
and I'm sure you will find it.

78
00:03:41,033 --> 00:03:43,300
So you can totally do this on your own.

79
00:03:43,300 --> 00:03:45,866
There is no trap.
It's actually super easy.

80
00:03:45,866 --> 00:03:48,900
And of course
we will implement the solution together

81
00:03:49,200 --> 00:03:52,766
in the next tutorial, so I can't
wait to see what you end up with.

82
00:03:52,833 --> 00:03:56,100
And I'm
sure we will end up with the same thing.

83
00:03:56,100 --> 00:03:58,966
So let's see.
And until then, enjoy machine learning.