In this video, we are going to learn how to train a regression SVM model on our data.

The only difference between classification SVM and regression SVM is the type of dependent variable we are trying to predict. If that variable has classes, we run a classification model; if that variable has continuous values, we run a regression SVM.

So to run a regression SVM, we will be using this dataset, which is the movie regression dataset. This dataset is also attached in the resources section, and you can download it from there.

We need to do all the initial steps again. That is, we will import the data, we will get the data ready for analysis, split the data into test and train sets, use the training data to train the regression model, and use the test data to check its performance.

So the first step is importing the data. You know how to import data from a CSV file: we just need to give the location of the file, indicate that it has headers, and use the read.csv function. Run this command to import the data, and the data frame df is created. This data frame has 18 variables.

Let us look at this data. It has all the same variables, only that the last variable is now Collection.
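As a sketch of this import step: the actual filename and location come from the course resources, so here a tiny stand-in CSV is written to a temporary file first, just so the snippet is self-contained and runnable.

```r
# Sketch of the import step. The real tutorial reads the movie regression CSV
# from the resources section; here we write a small stand-in file so this runs.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Marketing_expense,Production_expense,Budget,Collection",
             "20.1,59.6,36.5,48000",
             "19.6,69.1,35.4,43200"), tmp)

df <- read.csv(tmp, header = TRUE)  # header = TRUE: first row holds column names
str(df)                             # inspect the variables and their types
```

With the real file you would pass its path to read.csv in the same way.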
We are trying to predict the total box office collection of the movies. Using all these variables, that is, how much was spent on marketing, how much was spent on production, what the budget was, the lead actor and actress ratings, and so on, we are trying to find out the total box office collection of that particular movie.

So this data is imported. Now we have to get this data ready for analysis. We did this earlier also. I am only covering missing value imputation in data preprocessing here. There are other parts also, like outlier treatment, looking at correlations, plotting histograms, and so on. Here we are doing only missing value imputation.

So to check which variable has missing values, we run summary. And we see that the Time_taken variable has NAs. To remove the NAs, we run this command, in which we find the cells in the Time_taken variable which have NA, and we replace the value in those cells with the mean of all the other values in that variable. So we run this command. Then we go back and run summary to confirm that no variable now has NAs.

The next step we did was the test-train split.
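The mean imputation described above can be sketched like this on a toy column (the Time_taken name comes from the transcript; the values here are made up):

```r
# Mean imputation sketch: replace NA cells in Time_taken with the column mean.
df <- data.frame(Time_taken = c(110, NA, 130, 150, NA))
summary(df$Time_taken)   # summary() reports the NA count before imputation

df$Time_taken[is.na(df$Time_taken)] <- mean(df$Time_taken, na.rm = TRUE)
summary(df$Time_taken)   # confirm no NAs remain
```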
So we have a dataset of 506 observations. If we use all of them to train the model, we will not know how the model will perform on previously unseen data. So we will make two parts of this data: 80 percent of the data will be used to train the model, and 20 percent of the data we will keep aside and use to check the performance of the model, so that we know how this model performs on previously unseen data.

To do the test-train split, we use this package called caTools. It is already installed for me, but it is not active, so I will run this library command. If it is not installed for you, you can run the install.packages command. So I will run this command, and the caTools package is now active.

Then we will set a seed so that we both get the same split on this dataset. So I set a seed of 0. And then we create a variable called split. This split variable has values TRUE and FALSE: 80 percent of the values in this split variable will be TRUE, and nearly 20 percent will be FALSE. Wherever the value is TRUE, we put those observations of the df dataset into train.
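A minimal sketch of this split, assuming the caTools package is installed (the data frame here is synthetic; the course applies the same calls to the movie data frame df):

```r
library(caTools)  # install.packages("caTools") first if it is not installed

set.seed(0)                                   # same seed, same split
df <- data.frame(x = rnorm(100), Collection = rnorm(100))

split <- sample.split(df$Collection, SplitRatio = 0.8)  # logical: ~80% TRUE
train <- df[split, ]          # rows where split is TRUE
test  <- df[!split, ]         # rows where split is FALSE
c(nrow(train), nrow(test))    # roughly 80 and 20
```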
Wherever the split values are FALSE, we put those observations of the df dataset into the test dataset. So I will run both of these commands. And you can see that in my train set I have 399 observations, and in my test set I have 107 observations. So we will use this train set to train the model and this test set to test its performance.

Now the data is ready; we just need to train the SVM model on it. We use the svm function, so we need this package called e1071. It is both installed and active in my session of R. If it is not installed or active, you can run these commands. I am not going to run them because it is active for me.

Next comes the svm function, which we use to train the model in support vector machines. The only difference from classification is that now the dependent variable is Collection instead of Start_Tech_Oscar. This Collection variable has numeric values, so by default, this svm function will run a regression model. If it had factor values, it would have run a classification model. So either you can explicitly specify that it is to be regression by using the type parameter, or you can just go with the default values. This is the dependent variable, the data we are going to use is the train dataset that we created, and the kernel for now is linear.
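To illustrate the default behaviour described here, a sketch on synthetic data (e1071 assumed installed; x and Collection are stand-in names):

```r
library(e1071)

set.seed(0)
train <- data.frame(x = runif(50))
train$Collection <- 3 * train$x + rnorm(50, sd = 0.1)  # numeric response

# With a numeric dependent variable, svm() defaults to eps-regression;
# a factor response would give C-classification instead. The type can
# also be forced explicitly with type = "eps-regression".
fit <- svm(Collection ~ ., data = train, kernel = "linear")
summary(fit)   # the summary reports "SVM-Type: eps-regression"
```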
I would suggest that you try out the polynomial and radial kernels on your own.

Cost we are going to set as 0.01, although you can use the tune function to find the value of cost that gives you the minimum error.

So I save this regression model as svmfit, and we will look at the summary of this svmfit object so that we know what is contained in it. So this is a regression type SVM, with a linear kernel and cost set at 0.01. And these are some regression parameters. And this model has 326 support vectors.

Now, using this svmfit, we are going to predict the values of Collection on the test set. So we use the predict function: the model is svmfit, and the data to be used is test. And these values will be saved in ypred. So when we run this command, ypred has the predicted Collection values. If you want to look at the predictions, you can just run ypred, and you can see all the predicted values there.
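Putting the fit and the prediction step together as a sketch on synthetic train and test frames (e1071 assumed installed; the course runs this on the movie train and test sets):

```r
library(e1071)

set.seed(0)
make_data <- function(n) {
  d <- data.frame(x = runif(n))
  d$Collection <- 3 * d$x + rnorm(n, sd = 0.1)
  d
}
train <- make_data(80)
test  <- make_data(20)

svmfit <- svm(Collection ~ ., data = train, kernel = "linear", cost = 0.01)
summary(svmfit)                 # type, kernel, cost, number of support vectors

ypred <- predict(svmfit, test)  # predicted Collection values for the test set
head(ypred)
```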
However, if we need to compare the performance of this model with any other model, in regression one of the metrics that we use is mean squared error. Mean squared error is the mean of the squared differences between the predicted values and the actual values.

So here we find the difference between the predicted values, which are stored in ypred, and the actual values, which are test$Collection. We square it, and we find the mean of all these values. When we are comparing MSE values, whichever model has the lower MSE value is the better-performing one.

So when you are trying it out with the polynomial kernel or the radial kernel, or you are trying out different values of cost, you should compute the predicted values on the test set and find out the MSE using those predicted values. Whichever model has the lower MSE value, that model should be used.

So here I am getting an MSE value of nearly 99 million. If you want to compare it with a radial kernel, just set the kernel to radial, specify a gamma value of, say, 1, train the model, predict the values, and find out the MSE of the radial regression. And you can see that the radial model has an MSE of 259 million. Although you can tune the values of cost and gamma to find a much better MSE, this is suggesting that there is more of a linear relationship between the variables than a nonlinear relationship.

So in this way, by importing the data, getting it ready, splitting it into two parts, training the model on the train set, and testing its performance on the test set.
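The MSE comparison can be sketched like this (synthetic data again; on the real movie data the linear kernel came out ahead, roughly 99 million versus 259 million):

```r
library(e1071)

set.seed(0)
make_data <- function(n) {
  d <- data.frame(x = runif(n))
  d$Collection <- 3 * d$x + rnorm(n, sd = 0.1)   # roughly linear relationship
  d
}
train <- make_data(80)
test  <- make_data(20)

lin <- svm(Collection ~ ., data = train, kernel = "linear", cost = 0.01)
rad <- svm(Collection ~ ., data = train, kernel = "radial", gamma = 1)

# MSE: mean of squared differences between predicted and actual values
mse <- function(fit) mean((predict(fit, test) - test$Collection)^2)
c(linear = mse(lin), radial = mse(rad))   # the model with the lower MSE wins
```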
These are all the steps that we take to create a regression model using SVM.