1 00:00:00,800 --> 00:00:01,100 All right. 2 00:00:01,100 --> 00:00:01,566 Great. 3 00:00:01,566 --> 00:00:02,866 So now they're all copies. 4 00:00:02,866 --> 00:00:04,566 Therefore we can modify them. 5 00:00:04,566 --> 00:00:09,366 Let me just show you once again how I modified the original classification code 6 00:00:09,366 --> 00:00:12,766 we made before into these new, you know, more simplified ones, 7 00:00:12,766 --> 00:00:15,766 which will get you the accuracy quickly and efficiently. 8 00:00:15,866 --> 00:00:19,566 So the data preprocessing phase I kept exactly the same, including, 9 00:00:19,566 --> 00:00:22,566 you know, the feature scaling applied to X_train and X_test. 10 00:00:22,600 --> 00:00:26,566 But here I put a template name and I highlighted 11 00:00:26,566 --> 00:00:30,400 that you have to enter the name of your data set here. 12 00:00:30,466 --> 00:00:32,100 So that's what we'll do; I'll show you. 13 00:00:32,100 --> 00:00:34,133 But that's the only thing that was changed. 14 00:00:34,133 --> 00:00:34,566 Indeed, 15 00:00:34,566 --> 00:00:37,733 we don't have to do much else because this will automatically select 16 00:00:37,733 --> 00:00:40,500 all the features and not your dependent variable. 17 00:00:40,500 --> 00:00:43,533 And this will automatically select your dependent variable. 18 00:00:43,533 --> 00:00:46,500 And that's provided, of course, you have in your data set, 19 00:00:46,500 --> 00:00:49,100 first, the features, you know, in the first columns 20 00:00:49,100 --> 00:00:52,033 and last, the dependent variable in the last column. 21 00:00:52,033 --> 00:00:53,766 Right. Make sure of this. 22 00:00:53,766 --> 00:00:57,333 What we're going to do now works for any data set, regardless 23 00:00:57,333 --> 00:00:58,466 of the number of features, 24 00:00:58,466 --> 00:01:01,466 as long as it has the features in the first columns 25 00:01:01,466 --> 00:01:03,333 and the dependent variable in the last column.
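[Editor's note] The column convention described above (features in the first columns, dependent variable in the last one) can be sketched as follows. This is only a minimal illustration, assuming the template loads the file with pandas and slices by position. The helper name `load_features_and_target` and the in-memory CSV text are hypothetical stand-ins for a real file such as data.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file: three feature columns first,
# the dependent variable in the last column.
csv_text = """f1,f2,f3,label
1.0,2.0,3.0,0
4.0,5.0,6.0,1
7.0,8.0,9.0,0
"""


def load_features_and_target(source):
    """Split a data set into features (all but the last column) and target (last column)."""
    dataset = pd.read_csv(source)
    X = dataset.iloc[:, :-1].values  # every column except the last one
    y = dataset.iloc[:, -1].values   # only the last column
    return X, y


# A file path could be passed instead of the StringIO object.
X, y = load_features_and_target(io.StringIO(csv_text))
print(X.shape, y.shape)
```

Because the slicing is positional, the same two lines work whatever the number of feature columns, which is why only the file name needs to change between data sets.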
26 00:01:03,333 --> 00:01:05,666 Make sure to remember this. 27 00:01:05,666 --> 00:01:07,433 All right. And then all good here. 28 00:01:07,433 --> 00:01:10,266 If you want, you can change that from 0.25 to 0.2. 29 00:01:10,266 --> 00:01:10,900 But that's fine. 30 00:01:10,900 --> 00:01:14,100 Both values work well. Then feature scaling. 31 00:01:14,733 --> 00:01:17,833 All right. Then for each of the classification models, 32 00:01:17,833 --> 00:01:20,900 I kept, you know, the code to implement and train it. 33 00:01:21,100 --> 00:01:24,900 And finally, what I did in the last cells is simply I removed, 34 00:01:24,933 --> 00:01:29,200 you know, the prints that displayed the vector of predictions and the vector 35 00:01:29,233 --> 00:01:31,166 of real results next to each other, because, 36 00:01:31,166 --> 00:01:33,833 you know, we don't really need it for our model selection process. 37 00:01:33,833 --> 00:01:36,800 However, what I did is that I kept this, of course, 38 00:01:36,800 --> 00:01:40,533 because in order to compute the confusion matrix and the accuracy, 39 00:01:40,800 --> 00:01:44,933 I had to create that y_pred vector containing all the predictions 40 00:01:45,166 --> 00:01:49,233 by calling the predict method, applied to X_test, from our classifier. 41 00:01:49,466 --> 00:01:50,533 And that's all I did. 42 00:01:50,533 --> 00:01:53,033 And I did the same in all the different files. 43 00:01:53,033 --> 00:01:53,533 Right? 44 00:01:53,533 --> 00:01:57,966 K-nearest neighbors: data preprocessing phase, training and confusion matrix. 45 00:01:58,266 --> 00:02:00,000 Same for support vector machine: 46 00:02:00,000 --> 00:02:03,000 data preprocessing, training and confusion matrix. 47 00:02:03,366 --> 00:02:08,266 Then kernel SVM, same: data preprocessing phase, training and confusion matrix. 48 00:02:08,700 --> 00:02:11,733 Naive Bayes: data preprocessing phase, training 49 00:02:11,733 --> 00:02:15,266 and confusion matrix. And decision tree classification:
50 00:02:15,400 --> 00:02:16,300 Data preprocessing 51 00:02:16,300 --> 00:02:20,100 phase, training, confusion matrix. And finally random forest: 52 00:02:20,400 --> 00:02:23,600 data preprocessing phase, training and confusion matrix. See? 53 00:02:23,800 --> 00:02:28,033 So you have the exact same code templates for each of the classification models 54 00:02:28,033 --> 00:02:29,100 we built together. 55 00:02:29,100 --> 00:02:31,700 The only thing that changed is actually this cell, 56 00:02:31,700 --> 00:02:35,066 because this cell actually builds and trains the classification model 57 00:02:35,066 --> 00:02:38,033 you want to try through this model selection process. 58 00:02:38,033 --> 00:02:39,200 All right. Perfect. 59 00:02:39,200 --> 00:02:42,233 So now we're getting very close to the demo. 60 00:02:42,366 --> 00:02:46,233 And so just to recap, this demo works for any data set, 61 00:02:46,233 --> 00:02:48,000 regardless of the number of features, 62 00:02:48,000 --> 00:02:50,633 as long as you have your features in the first columns 63 00:02:50,633 --> 00:02:52,966 and your dependent variable in the last column, 64 00:02:52,966 --> 00:02:54,300 and also as long as you don't have 65 00:02:54,300 --> 00:02:58,066 some special data preprocessing tools to use on your data set. 66 00:02:58,266 --> 00:03:01,333 If you have any categorical variables, you know, in strings, 67 00:03:01,333 --> 00:03:04,633 or categorical variables where you have to perform one-hot encoding, 68 00:03:04,766 --> 00:03:08,966 well, don't forget to use your data preprocessing toolkit to preprocess 69 00:03:08,966 --> 00:03:10,100 your data set the right way, 70 00:03:10,100 --> 00:03:14,266 and then you can just deploy all your classification code templates here. 71 00:03:14,400 --> 00:03:18,200 And that, my friends, is exactly what I'm about to show you right now. 72 00:03:18,200 --> 00:03:20,400 So now the demo is going to start. 73 00:03:20,400 --> 00:03:21,300 Are you ready?
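[Editor's note] The shared structure of the templates recapped above (split, feature scaling, one cell that builds and trains the classifier, then predictions, confusion matrix and accuracy) might look roughly like the sketch below. It assumes scikit-learn. The `evaluate` helper, the synthetic data and the `random_state` value are illustrative choices, not taken from the course files, and only two of the six models are shown.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def evaluate(classifier, X, y, test_size=0.25):
    """Run the common template steps for one classifier and return (confusion matrix, accuracy)."""
    # Split into training and test sets (0.25 or 0.2 both work as test size).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    # Feature scaling: fit on the training set only, then apply to both sets.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    # The only part that differs between templates: the classifier being trained.
    classifier.fit(X_train, y_train)
    # Vector of predictions on the test set, needed for both metrics.
    y_pred = classifier.predict(X_test)
    return confusion_matrix(y_test, y_pred), accuracy_score(y_test, y_pred)


# Synthetic stand-in data: 200 rows, 4 numeric features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

cm, acc = evaluate(KNeighborsClassifier(n_neighbors=5), X, y)
cm_tree, acc_tree = evaluate(DecisionTreeClassifier(random_state=0), X, y)
print(cm, acc)
print(cm_tree, acc_tree)
```

Keeping everything except the classifier inside one function mirrors the idea of the templates: to compare models, only the object passed in changes, and the accuracies can then be compared directly.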
74 00:03:21,300 --> 00:03:24,400 Three, two, one, go. All right. 75 00:03:24,400 --> 00:03:25,200 So I'm going to do this 76 00:03:25,200 --> 00:03:28,866 as efficiently as I can in order to show you the power of code templates. 77 00:03:29,200 --> 00:03:34,033 So the first step is to upload the data set inside the notebook. 78 00:03:34,033 --> 00:03:36,966 Right now it is connecting to a runtime to enable file browsing. 79 00:03:36,966 --> 00:03:38,700 Actually, I'm going to do this for 80 00:03:38,700 --> 00:03:42,266 each of the models here because, you know, it always takes a few seconds. 81 00:03:42,533 --> 00:03:45,033 So let's do it this way to be efficient. Right. 82 00:03:45,033 --> 00:03:48,866 So I'm just, you know, loading all the files here. 83 00:03:49,500 --> 00:03:50,966 All right. Perfect. 84 00:03:50,966 --> 00:03:52,466 And everything, 85 00:03:52,466 --> 00:03:55,466 you know, every file is now connecting to a runtime. 86 00:03:55,600 --> 00:03:56,433 Now be careful. 87 00:03:56,433 --> 00:03:59,233 If you don't see the sample data here, you have to refresh. 88 00:03:59,233 --> 00:04:02,033 Otherwise you will have issues uploading your data set. 89 00:04:02,033 --> 00:04:03,400 Good. Now it's good. 90 00:04:03,400 --> 00:04:05,966 So the next step: we upload the data set. 91 00:04:05,966 --> 00:04:06,233 All right. 92 00:04:06,233 --> 00:04:09,900 So this is the model selection folder and more precisely the classification folder. 93 00:04:10,100 --> 00:04:11,866 But let me show you the path again. 94 00:04:11,866 --> 00:04:15,800 I put this machine learning model selection folder on my desktop. 95 00:04:15,800 --> 00:04:18,666 But make sure to find it on your machine, wherever it is. 96 00:04:18,666 --> 00:04:20,933 If you have not downloaded it already, 97 00:04:20,933 --> 00:04:23,366 make sure to download it right before this tutorial. 98 00:04:23,366 --> 00:04:26,700 You will find the link at the bottom of the article.
99 00:04:27,100 --> 00:04:30,733 Then together we're going to go inside, then inside classification, 100 00:04:30,733 --> 00:04:31,600 and there we go. 101 00:04:31,600 --> 00:04:34,666 We select data.csv, then we click open. 102 00:04:34,966 --> 00:04:36,266 Then we press okay. 103 00:04:36,266 --> 00:04:39,300 And then what we simply need to do inside this code 104 00:04:39,333 --> 00:04:42,833 template is just to put here the name of the data set. 105 00:04:42,833 --> 00:04:46,966 So you just double click this and then enter data.csv 106 00:04:46,966 --> 00:04:49,966 or, you know, the name of your future data set. 107 00:04:50,166 --> 00:04:51,266 All right. And that's it. 108 00:04:51,266 --> 00:04:54,400 That's all we have to do in each code template. 109 00:04:54,400 --> 00:04:58,266 Only one thing to change, so that we can really call it a code template. 110 00:04:58,633 --> 00:04:59,166 All right. Great. 111 00:04:59,166 --> 00:05:02,266 So now we're going to do the same in each of the other implementations. 112 00:05:02,533 --> 00:05:04,066 So now k-nearest neighbors. 113 00:05:04,066 --> 00:05:06,900 Let's refresh this because we need to see this. There we go. 114 00:05:06,900 --> 00:05:10,933 Then upload, then data.csv, then open. 115 00:05:11,266 --> 00:05:12,966 All right. Perfect. Okay. 116 00:05:12,966 --> 00:05:17,000 And then we replace the name here by data.csv. 117 00:05:17,233 --> 00:05:18,100 Perfect. 118 00:05:18,100 --> 00:05:21,900 Then next one, support vector machine: refresh, upload, 119 00:05:22,433 --> 00:05:26,866 then data.csv, then open, and perfect. 120 00:05:27,000 --> 00:05:28,133 We have the data set. 121 00:05:28,133 --> 00:05:33,466 Now we replace this by data.csv and all good, SVM is ready. Now
122 00:05:33,466 --> 00:05:35,200 Kernel SVM: refresh, 123 00:05:36,233 --> 00:05:38,200 upload, data.csv, 124 00:05:38,200 --> 00:05:41,200 open, okay, 125 00:05:42,066 --> 00:05:45,300 replacing this by data.csv or the name of your future data set. 126 00:05:45,566 --> 00:05:47,833 And there we go. Kernel SVM is ready. 127 00:05:47,833 --> 00:05:51,266 All right, then Naive Bayes: refresh, 128 00:05:51,766 --> 00:05:55,966 upload, data.csv, open, okay. 129 00:05:55,966 --> 00:05:58,966 And then replacing this by data.csv.