So we have learned all these steps and all the models in our course. I am summarizing all the steps that we took and what you should do when you face a business problem in which you have to classify the results.

The first step is data collection. You have to identify all the relevant variables and collect data for them.

Once you've collected all the relevant data, you have to pre-process it. You learned how to do data preprocessing. One of the major steps that we took was outlier treatment, in which we found the outlying values and treated those values so that they do not harmfully impact our analysis. Then we have missing value imputation, where we replaced blank values with harmless values such as the mean or median. We also did variable transformation: we combined four different distance variables into one variable, and so on. So data preprocessing is a very important part. You have to clean your data and put it into a tabular format, with all the values of the variables in the proper format, so that your model can work on it.

Next is model training. If you have only one dataset, you have to split it into test and train datasets. You will use the training dataset to train the model, and we will use the test set to test its performance.
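The preprocessing and splitting steps above can be sketched as follows. This is a minimal illustration, not code from the course: the dataset, column names, and thresholds here are all hypothetical stand-ins, built with pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: column names and values are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "income": rng.normal(50, 10, 100),
    "dist1": rng.uniform(1, 5, 100),
    "dist2": rng.uniform(1, 5, 100),
})
df.loc[::10, "income"] = np.nan   # simulate blank (missing) values
df.loc[0, "dist1"] = 100.0        # simulate an outlying value

# Outlier treatment: cap values beyond the 1st/99th percentiles.
lo, hi = df["dist1"].quantile([0.01, 0.99])
df["dist1"] = df["dist1"].clip(lo, hi)

# Missing value imputation: replace blanks with the column mean (or median).
df["income"] = df["income"].fillna(df["income"].mean())

# Variable transformation: combine the distance variables into one average.
df["avg_dist"] = df[["dist1", "dist2"]].mean(axis=1)

# Train/test split: 80% of rows to train the model, the rest to test it.
train = df.sample(frac=0.8, random_state=1)
test = df.drop(train.index)
```

The same pattern applies whatever the actual columns are: treat outliers, impute blanks, derive combined variables, then split before any model sees the data.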
We have created templates for logistic regression, linear discriminant analysis, and KNN. Save those templates. Whenever you face any business problem, just replace the dataset and you are good to go: you can train your model with those same templates.

The next point I would like to add here is to do iterations. The point is, before we trained our model, we took some decisions on our data. For example, we decided that we would impute the missing values using the mean. What would be the impact of using the median? Would that perform better? In variable transformation, we decided that we would replace these four distances by the average distance. Well, maybe it would make more business sense if we replace them by the largest distance or the smallest distance. So we should do iterations of all these changes, wherever we made a decision.

Lastly, when we are training the model, we should also compare the performance of different methods. For example, here we learned three methods, so we should compare the performance of all three methods using the test set. We already know how to do that: we use the confusion matrix for classification problems. Draw the confusion matrix on the test set for all the different models that you have created and select the best one. So that's the last point.
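Comparing the three methods on the test set can be sketched like this. This is a generic illustration assuming scikit-learn, with a synthetic dataset standing in for the course data; it is not the course template itself.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic stand-in for the preprocessed course dataset.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# The three methods from the course, trained with the same template.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # Confusion matrix and accuracy on the test set for each method.
    print(name, accuracy_score(y_te, pred))
    print(confusion_matrix(y_te, pred))
```

Because each decision (mean vs. median imputation, average vs. largest distance) changes the dataset, rerunning this same comparison loop after each iteration shows whether the change helped.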
We have to select the best model. As I told you, there are two types of business problems. One is the prediction problem, where the aim is to have maximum prediction accuracy; in such a case, we should use the model with the best accuracy. The second type of problem is the interpretation problem: we want to identify the relationship between a particular predictor variable and the response variable. For that, we can use the coefficient values of the parametric models.

Once we have selected the best model, for example, say linear discriminant analysis is giving us the best prediction results and we have selected that model, then whenever we get new data or new observations, we can feed those observations as a test set to our model and find out the predicted classes for those observations.

So this is the whole process. For a given dataset, we train the model to start predicting, we identify the model which gives us the best predictions, and once we have that model, we use it to predict for future observations.

Thank you for being with us in this course.
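The last step, feeding new observations to the selected model, looks like this in a minimal sketch. Again the data is synthetic and the choice of LDA as "best" is just the example from above; with scikit-learn, any of the three models exposes the same `predict` call.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Suppose LDA gave the best test-set results; train it on the available data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # illustrative class labels
best_model = LinearDiscriminantAnalysis().fit(X, y)

# New observations arrive later: feed them in and read off predicted classes.
new_obs = rng.normal(size=(5, 3))
predicted_classes = best_model.predict(new_obs)
print(predicted_classes)
```

The new observations must go through the same preprocessing (imputation, transformation) as the training data before being passed to `predict`, otherwise the model sees them in a different format than it was trained on.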