So we have learned all these steps and all the models in our course. I am summarizing all the steps that we took and what you should do when you face a business problem in which you have to classify the results.

The first step is data collection. You have to identify all the relevant variables and collect data for them.

Once you've collected all the relevant data, you have to pre-process it. You learned how to do data preprocessing. One of the major steps that we took was outlier treatment, in which we found the outlying values and treated those values so that they do not harmfully impact our analysis. Then we have missing value imputation, where we replaced blank values with harmless values such as the mean or median. We also did variable transformation: we combined four different distance variables into one variable, and so on. So data preprocessing is a very important part. You have to clean your data and put it into a tabular format, with all the values of the variables in the proper format, so that your model can work on it.

Next is model training. If you have only one dataset, you have to split it into test and train datasets. You will use the training dataset to train the model, and we will use the test set to test its performance.
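The preprocessing and splitting steps above can be sketched as follows. This is a minimal illustration, not code from the course: the dataset, column names, and thresholds here are all hypothetical stand-ins, built with pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: column names and values are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "income": rng.normal(50, 10, 100),
    "dist1": rng.uniform(1, 5, 100),
    "dist2": rng.uniform(1, 5, 100),
})
df.loc[::10, "income"] = np.nan   # simulate blank (missing) values
df.loc[0, "dist1"] = 100.0        # simulate an outlying value

# Outlier treatment: cap values beyond the 1st/99th percentiles.
lo, hi = df["dist1"].quantile([0.01, 0.99])
df["dist1"] = df["dist1"].clip(lo, hi)

# Missing value imputation: replace blanks with the column mean (or median).
df["income"] = df["income"].fillna(df["income"].mean())

# Variable transformation: combine the distance variables into one average.
df["avg_dist"] = df[["dist1", "dist2"]].mean(axis=1)

# Train/test split: 80% of rows to train the model, the rest to test it.
train = df.sample(frac=0.8, random_state=1)
test = df.drop(train.index)
```

The same pattern applies whatever the actual columns are: treat outliers, impute blanks, derive combined variables, then split before any model sees the data.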
We have created templates for logistic regression, linear discriminant analysis, and KNN. Save those templates. Whenever you face any business problem, just replace the dataset and you are good to go: you can train your model with those same templates.

The next point I would like to add here is to do iterations. The point is, before we trained our model, we took some decisions on our data. For example, we decided that we would impute the missing values using the mean. What would be the impact of using the median? Would that perform better? In variable transformation, we decided that we would replace these four distances by the average distance. Well, maybe it would make more business sense if we replace them by the largest distance or the smallest distance. So we should do iterations of all these changes, wherever we made a decision.

Lastly, when we are training the model, we should also compare the performance of different methods. For example, here we learned three methods, so we should compare the performance of all three methods using the test set. We already know how to do that: we use the confusion matrix for classification problems. Draw the confusion matrix on the test set for all the different models that you have created and select the best one. So that's the last point.
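Comparing the three methods on the test set can be sketched like this. This is a generic illustration assuming scikit-learn, with a synthetic dataset standing in for the course data; it is not the course template itself.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic stand-in for the preprocessed course dataset.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# The three methods from the course, trained with the same template.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    # Confusion matrix and accuracy on the test set for each method.
    print(name, accuracy_score(y_te, pred))
    print(confusion_matrix(y_te, pred))
```

Because each decision (mean vs. median imputation, average vs. largest distance) changes the dataset, rerunning this same comparison loop after each iteration shows whether the change helped.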
We have to select the best model. As I told you, there are two types of business problems. One is the prediction problem, where the aim is to have maximum prediction accuracy; in such a case, we should use the model with the best accuracy. The second type of problem is the interpretation problem: we want to identify the relationship between a particular predictor variable and the response variable. For that, we can use the coefficient values of the parametric models.

Once we have selected the best model, for example, say linear discriminant analysis is giving us the best prediction results and we have selected that model, then whenever we get new data or new observations, we can feed those observations as a test set to our model and find out the predicted classes for those observations.

So this is the whole process. For a given dataset, we train the model to start predicting, we identify the model which gives us the best predictions, and once we have that model, we use it to predict for future observations.

Thank you for being with us in this course.
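The last step, feeding new observations to the selected model, looks like this in a minimal sketch. Again the data is synthetic and the choice of LDA as "best" is just the example from above; with scikit-learn, any of the three models exposes the same `predict` call.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Suppose LDA gave the best test-set results; train it on the available data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # illustrative class labels
best_model = LinearDiscriminantAnalysis().fit(X, y)

# New observations arrive later: feed them in and read off predicted classes.
new_obs = rng.normal(size=(5, 3))
predicted_classes = best_model.predict(new_obs)
print(predicted_classes)
```

The new observations must go through the same preprocessing (imputation, transformation) as the training data before being passed to `predict`, otherwise the model sees them in a different format than it was trained on.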