1
00:00:00,133 --> 00:00:01,633
Hello my friends, and welcome

2
00:00:01,633 --> 00:00:06,000
to this new practical activity
on support vector machines.

3
00:00:06,533 --> 00:00:06,900
All right.

4
00:00:06,900 --> 00:00:11,466
So we already built two classification
models: logistic regression and K-NN.

5
00:00:11,466 --> 00:00:14,466
And then we got the best results
so far with K-NN.

6
00:00:14,500 --> 00:00:17,500
And now let's see if SVM can beat it.

7
00:00:17,966 --> 00:00:18,266
All right.

8
00:00:18,266 --> 00:00:21,800
So before we start, as usual, let's make
sure everyone here is on the same page.

9
00:00:22,033 --> 00:00:26,333
And if that's the case then follow me
into part three classification

10
00:00:26,466 --> 00:00:30,466
and then section 16 Support
Vector machine SVM.

11
00:00:30,866 --> 00:00:33,466
And we're going to start with Python
of course as usual.

12
00:00:33,466 --> 00:00:36,300
And in this Python folder
you will get two files.

13
00:00:36,300 --> 00:00:40,666
The first is the same dataset,
Social_Network_Ads.csv,

14
00:00:40,900 --> 00:00:46,000
containing 400 observations, where each
observation is actually a customer

15
00:00:46,166 --> 00:00:51,533
who bought, yes or no, an SUV
that is advertised on social networks.

16
00:00:51,766 --> 00:00:55,566
And for each of these customers
you have the age, the estimated salary.

17
00:00:55,566 --> 00:00:57,533
So these are the two features.

18
00:00:57,533 --> 00:01:00,966
And with these two features
you will predict the dependent variable

19
00:01:00,966 --> 00:01:05,366
Purchased, meaning
whether or not the customer bought the SUV.

20
00:01:05,366 --> 00:01:09,633
So one means yes, the customer
bought a previous SUV, and zero means no.

21
00:01:09,700 --> 00:01:12,166
The customer didn't buy any SUV.

22
00:01:12,166 --> 00:01:14,133
All right so same data set.

23
00:01:14,133 --> 00:01:18,466
And of course the second file is this
support vector machine implementation,

24
00:01:18,633 --> 00:01:20,133
in the .ipynb format,

25
00:01:20,133 --> 00:01:23,733
which you can either open with Google
Colaboratory or Jupyter Notebook.

26
00:01:24,000 --> 00:01:27,766
And as far as I'm concerned, I'm going
to open it with Google Colaboratory.

27
00:01:27,766 --> 00:01:30,766
But feel free to choose your favorite IDE.

28
00:01:31,033 --> 00:01:31,533
All right.

29
00:01:31,533 --> 00:01:35,000
So let's put that file
here. Actually here.

30
00:01:35,200 --> 00:01:37,900
And right now it is loading the notebook
laying it out.

31
00:01:37,900 --> 00:01:40,566
And in a second we should have it open.

32
00:01:40,566 --> 00:01:41,766
There we go.

33
00:01:41,766 --> 00:01:42,033
All right.

34
00:01:42,033 --> 00:01:45,100
So that's the whole support
vector machine implementation.

35
00:01:45,100 --> 00:01:48,033
And of course it is
exactly the same as before.

36
00:01:48,033 --> 00:01:51,766
In order to re-implement this
we will only have to change

37
00:01:51,900 --> 00:01:54,900
the code cell
where we build and train this model.

38
00:01:55,100 --> 00:01:59,466
Because indeed this implementation results
from the exact same classification

39
00:01:59,466 --> 00:02:03,366
template that we made when we built
the logistic regression model.

40
00:02:03,666 --> 00:02:05,500
We saw clearly when implementing

41
00:02:05,500 --> 00:02:09,266
the K-nearest neighbors model,
how indeed we only had to change one cell

42
00:02:09,266 --> 00:02:12,733
and how this template worked super
well for that model.

43
00:02:12,900 --> 00:02:16,100
So here for SVM
we're going to do exactly the same.

44
00:02:16,233 --> 00:02:17,900
We're just going to leave all the cells

45
00:02:17,900 --> 00:02:21,166
as they are, as they actually were
in the logistic regression model.

46
00:02:21,366 --> 00:02:25,400
And we will only re-implement
the cell where we built the SVM.

47
00:02:25,933 --> 00:02:26,400
All right.

48
00:02:26,400 --> 00:02:27,300
So let's do this.

49
00:02:27,300 --> 00:02:29,966
Let's create a new copy of this file.

50
00:02:29,966 --> 00:02:32,000
Because this file is in read-only mode.

51
00:02:32,000 --> 00:02:34,800
So let's click
here: Save a copy in Drive.

52
00:02:34,800 --> 00:02:39,600
And this will create a copy inside
which we will indeed be able to modify

53
00:02:39,600 --> 00:02:40,500
the implementation.

54
00:02:40,500 --> 00:02:44,533
And mostly to re-implement
that code cell to build the SVM model.

55
00:02:45,033 --> 00:02:46,500
All right. Perfect.

56
00:02:46,500 --> 00:02:50,466
So at the beginning of course,
we start with the data preprocessing phase

57
00:02:50,466 --> 00:02:53,733
with all the same outputs
displayed on the notebook.

58
00:02:53,966 --> 00:02:55,266
So that's all good.

59
00:02:55,266 --> 00:02:57,100
Then we apply feature scaling
because you know

60
00:02:57,100 --> 00:02:59,100
it improves the training performance.

61
00:02:59,100 --> 00:03:02,066
And anyway it's never bad
to apply feature scaling.

62
00:03:02,066 --> 00:03:04,100
And finally there we go.

63
00:03:04,100 --> 00:03:08,500
That's the cell
we have to re-implement together.

64
00:03:08,500 --> 00:03:09,300
Because indeed

65
00:03:09,300 --> 00:03:12,833
it is the one that differs with respect
to the previous implementations.

66
00:03:13,066 --> 00:03:17,000
So let's click this trash button here to,
you know, re-implement it again.

67
00:03:17,000 --> 00:03:18,633
Let's create a new code cell.

68
00:03:18,633 --> 00:03:23,033
And now my friends, over to you
once again I would like you to please

69
00:03:23,033 --> 00:03:26,233
press pause on the video
and try to implement that code

70
00:03:26,233 --> 00:03:26,900
cell yourself.

71
00:03:26,900 --> 00:03:29,400
And that's
because I not only want to train you

72
00:03:29,400 --> 00:03:32,566
in machine learning, but also train you on
how to be independent

73
00:03:32,700 --> 00:03:33,900
with machine learning.

74
00:03:33,900 --> 00:03:38,133
So right now, the exercise
I want you to do is to do some research

75
00:03:38,133 --> 00:03:41,166
in the scikit-learn API to figure out

76
00:03:41,233 --> 00:03:44,333
which class allows us to build the SVM model.

77
00:03:44,433 --> 00:03:46,933
So you will find it very easily actually,
because

78
00:03:46,933 --> 00:03:50,400
there is no trap in the name of the class
or the name of the module.

79
00:03:50,566 --> 00:03:51,400
So I trust

80
00:03:51,400 --> 00:03:56,100
you will totally be able to do this
exercise successfully and mostly know

81
00:03:56,100 --> 00:04:00,700
which method to use at the end to train
that SVM model on the training set.

82
00:04:01,266 --> 00:04:02,766
All right, so please press pause.

83
00:04:02,766 --> 00:04:05,566
And now in two seconds
I'm going to give you the solution.

84
00:04:07,733 --> 00:04:09,133
All right let's do this.

85
00:04:09,133 --> 00:04:12,566
So I already have the scikit-learn API open.

86
00:04:12,600 --> 00:04:15,266
You know
that was for the nearest neighbors.

87
00:04:15,266 --> 00:04:18,066
The K-nearest neighbors,
which we implemented previously.

88
00:04:18,066 --> 00:04:21,800
In the previous section
we used this class, KNeighborsClassifier.

89
00:04:22,033 --> 00:04:26,466
And now the next thing we would like
to find in this API documentation

90
00:04:26,700 --> 00:04:31,533
is the module that contains the class
that allows us to build the SVM model.

91
00:04:31,900 --> 00:04:34,066
So naturally where can we find it?

92
00:04:34,066 --> 00:04:37,333
You know here
should we scroll back up or scroll down?

93
00:04:37,566 --> 00:04:41,333
Well, let's hope that you know,
the name of the module starts with an S,

94
00:04:41,333 --> 00:04:45,200
because here, you know the modules
are organized by alphabetical order.

95
00:04:45,200 --> 00:04:49,866
So since here we are at N, for neighbors,
let's hope that the name of the module

96
00:04:49,866 --> 00:04:53,400
we're looking for starts with an S,
like support vector machine.

97
00:04:53,400 --> 00:04:57,900
So let's scroll down: random
projections, semi-supervised learning.

98
00:04:57,900 --> 00:05:01,300
And there we go: support vector machines.

99
00:05:01,300 --> 00:05:02,033
Hello.

100
00:05:02,033 --> 00:05:04,200
That's exactly what we were looking for.

101
00:05:04,200 --> 00:05:06,633
Support vector machine.
So that's not the name of the module.

102
00:05:06,633 --> 00:05:08,966
The name of the module is svm. It's svm.

103
00:05:08,966 --> 00:05:11,700
That stands for support vector machines.

104
00:05:11,700 --> 00:05:12,133
All right.

105
00:05:12,133 --> 00:05:14,500
And then
well, you know, the hardest part is done.

106
00:05:14,500 --> 00:05:17,966
Now, according to you, which estimator is...
you know, because here you have

107
00:05:17,966 --> 00:05:22,333
basically all the support vector machine
based machine learning models.

108
00:05:22,566 --> 00:05:25,200
And so according to you
which one do we need to take here?

109
00:05:25,200 --> 00:05:28,133
Well, we actually have two options.

110
00:05:28,133 --> 00:05:31,800
We could either take LinearSVC,
which will directly

111
00:05:31,800 --> 00:05:34,800
build the linear support vector
machine model,

112
00:05:34,800 --> 00:05:39,500
or we can take this one,
SVC, and choose a linear kernel.

113
00:05:39,966 --> 00:05:40,600
All right.

114
00:05:40,600 --> 00:05:44,566
And we will actually go for this option
because in the next section

115
00:05:44,566 --> 00:05:48,600
we will study the kernel SVM models,
which as you might guess

116
00:05:48,600 --> 00:05:53,233
allow us to choose some different kernels
in our SVM, including the linear one

117
00:05:53,233 --> 00:05:57,833
and the nonlinear ones,
like, for example, the very famous one, RBF.