In this video, we are going to learn how to train a regression SVM model on our data.

The only difference between classification SVM and regression SVM is the type of dependent variable we are trying to predict. If that variable has classes, we run a classification model; if that variable has continuous values, we run a regression SVM.

So to run a regression SVM, we will be using this dataset, which is the movie regression dataset. This dataset is also attached in the resources section, and you can download it from there.

We need to do all the initial steps again. That is, we will import the data, we will get the data ready for analysis, split the data into test and train sets, use the training data to train the regression model, and use the test data to check its performance.

So the first step is importing the data. You know how to import data from a CSV file: we just need to give the location of the file, indicate that it has headers, and use the read.csv function. Run this command to import the data, and the data frame df is created. This data frame has 18 variables.

Let us look at this data. It has all the same variables, only that the last variable is now Collection.
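As a sketch of this import step: the actual filename and location come from the course resources, so here a tiny stand-in CSV is written to a temporary file first, just so the snippet is self-contained and runnable.

```r
# Sketch of the import step. The real tutorial reads the movie regression CSV
# from the resources section; here we write a small stand-in file so this runs.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Marketing_expense,Production_expense,Budget,Collection",
             "20.1,59.6,36.5,48000",
             "19.6,69.1,35.4,43200"), tmp)

df <- read.csv(tmp, header = TRUE)  # header = TRUE: first row holds column names
str(df)                             # inspect the variables and their types
```

With the real file you would pass its path to read.csv in the same way.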
We are trying to predict the total box office collection of the movies. Using all these variables, that is, how much was spent on marketing, how much was spent on production, what the budget was, the lead actor and actress ratings, and so on, we are trying to find out the total box office collection of that particular movie.

So this data is imported. Now we have to get this data ready for analysis. We did this earlier also. I am only covering missing value imputation in data preprocessing here. There are other parts also, like outlier treatment, looking at correlations, plotting histograms, and so on. Here we are doing only missing value imputation.

So to check which variable has missing values, we run summary. And we see that the Time_taken variable has NAs. To remove the NAs, we run this command, in which we find the cells in the Time_taken variable which have NA, and we replace the value in those cells with the mean of all the other values in that variable. So we run this command. Then we go back and run summary to confirm that no variable now has NAs.

The next step we did was the test-train split.
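The mean imputation described above can be sketched like this on a toy column (the Time_taken name comes from the transcript; the values here are made up):

```r
# Mean imputation sketch: replace NA cells in Time_taken with the column mean.
df <- data.frame(Time_taken = c(110, NA, 130, 150, NA))
summary(df$Time_taken)   # summary() reports the NA count before imputation

df$Time_taken[is.na(df$Time_taken)] <- mean(df$Time_taken, na.rm = TRUE)
summary(df$Time_taken)   # confirm no NAs remain
```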
So we have a dataset of 506 observations. If we use all of them to train the model, we will not know how the model will perform on previously unseen data. So we will make two parts of this data: 80 percent of the data will be used to train the model, and 20 percent of the data we will keep aside and use to check the performance of the model, so that we know how this model performs on previously unseen data.

To do the test-train split, we use this package called caTools. It is already installed for me, but it is not active, so I will run this library command. If it is not installed for you, you can run the install.packages command. So I will run this command, and the caTools package is now active.

Then we will set a seed so that we both get the same split on this dataset. So I set a seed of 0. And then we create a variable called split. This split variable has values TRUE and FALSE: 80 percent of the values in this split variable will be TRUE, and nearly 20 percent will be FALSE. Wherever the value is TRUE, we put those observations of the df dataset into train.
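A minimal sketch of this split, assuming the caTools package is installed (the data frame here is synthetic; the course applies the same calls to the movie data frame df):

```r
library(caTools)  # install.packages("caTools") first if it is not installed

set.seed(0)                                   # same seed, same split
df <- data.frame(x = rnorm(100), Collection = rnorm(100))

split <- sample.split(df$Collection, SplitRatio = 0.8)  # logical: ~80% TRUE
train <- df[split, ]          # rows where split is TRUE
test  <- df[!split, ]         # rows where split is FALSE
c(nrow(train), nrow(test))    # roughly 80 and 20
```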
Wherever the split values are FALSE, we put those observations of the df dataset into the test dataset. So I will run both of these commands. And you can see that in my train set I have 399 observations, and in my test set I have 107 observations. So we will use this train set to train the model and this test set to test its performance.

Now the data is ready; we just need to train the SVM model on it. We use the svm function, so we need this package called e1071. It is both installed and active in my session of R. If it is not installed or active, you can run these commands. I am not going to run them because it is active for me.

Next comes the svm function, which we use to train the model in support vector machines. The only difference from classification is that now the dependent variable is Collection instead of Start_Tech_Oscar. This Collection variable has numeric values, so by default, this svm function will run a regression model. If it had factor values, it would have run a classification model. So either you can explicitly specify that it is to be regression by using the type parameter, or you can just go with the default values. This is the dependent variable, the data we are going to use is the train dataset that we created, and the kernel for now is linear.
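To illustrate the default behaviour described here, a sketch on synthetic data (e1071 assumed installed; x and Collection are stand-in names):

```r
library(e1071)

set.seed(0)
train <- data.frame(x = runif(50))
train$Collection <- 3 * train$x + rnorm(50, sd = 0.1)  # numeric response

# With a numeric dependent variable, svm() defaults to eps-regression;
# a factor response would give C-classification instead. The type can
# also be forced explicitly with type = "eps-regression".
fit <- svm(Collection ~ ., data = train, kernel = "linear")
summary(fit)   # the summary reports "SVM-Type: eps-regression"
```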
I would suggest that you try out the polynomial and radial kernels on your own.

Cost we are going to set as 0.01, although you can use the tune function to find the value of cost that gives you the minimum error.

So I save this regression model as svmfit, and we will look at the summary of this svmfit object so that we know what is contained in it. So this is a regression type SVM, with a linear kernel and cost set at 0.01. And these are some regression parameters. And this model has 326 support vectors.

Now, using this svmfit, we are going to predict the values of Collection on the test set. So we use the predict function: the model is svmfit, and the data to be used is test. And these values will be saved in ypred. So when we run this command, ypred has the predicted Collection values. If you want to look at the predictions, you can just run ypred, and you can see all the predicted values there.
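Putting the fit and the prediction step together as a sketch on synthetic train and test frames (e1071 assumed installed; the course runs this on the movie train and test sets):

```r
library(e1071)

set.seed(0)
make_data <- function(n) {
  d <- data.frame(x = runif(n))
  d$Collection <- 3 * d$x + rnorm(n, sd = 0.1)
  d
}
train <- make_data(80)
test  <- make_data(20)

svmfit <- svm(Collection ~ ., data = train, kernel = "linear", cost = 0.01)
summary(svmfit)                 # type, kernel, cost, number of support vectors

ypred <- predict(svmfit, test)  # predicted Collection values for the test set
head(ypred)
```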
However, if we need to compare the performance of this model with any other model, in regression one of the metrics that we use is mean squared error. Mean squared error is the mean of the squared differences between the predicted values and the actual values.

So here we find the difference between the predicted values, which are stored in ypred, and the actual values, which are test$Collection. We square it, and we find the mean of all these values. When we are comparing MSE values, whichever model has the lower MSE value is the better-performing one.

So when you are trying it out with the polynomial kernel or the radial kernel, or you are trying out different values of cost, you should compute the predicted values on the test set and find out the MSE using those predicted values. Whichever model has the lower MSE value, that model should be used.

So here I am getting an MSE value of nearly 99 million. If you want to compare it with a radial kernel, just set the kernel to radial, specify a gamma value of, say, 1, train the model, predict the values, and find out the MSE of the radial regression. And you can see that the radial model has an MSE of 259 million. Although you can tune the values of cost and gamma to find a much better MSE, this is suggesting that there is more of a linear relationship between the variables than a nonlinear relationship.

So in this way, by importing the data, getting it ready, splitting it into two parts, training the model on the train set, and testing its performance on the test set.
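The MSE comparison can be sketched like this (synthetic data again; on the real movie data the linear kernel came out ahead, roughly 99 million versus 259 million):

```r
library(e1071)

set.seed(0)
make_data <- function(n) {
  d <- data.frame(x = runif(n))
  d$Collection <- 3 * d$x + rnorm(n, sd = 0.1)   # roughly linear relationship
  d
}
train <- make_data(80)
test  <- make_data(20)

lin <- svm(Collection ~ ., data = train, kernel = "linear", cost = 0.01)
rad <- svm(Collection ~ ., data = train, kernel = "radial", gamma = 1)

# MSE: mean of squared differences between predicted and actual values
mse <- function(fit) mean((predict(fit, test) - test$Collection)^2)
c(linear = mse(lin), radial = mse(rad))   # the model with the lower MSE wins
```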
These are all the steps that we take to create a regression model using SVM.