1 00:00:00,610 --> 00:00:07,740 We continue with our course, and in this video I will cover data splitting. Training the 2 00:00:07,740 --> 00:00:14,040 parameters of a prediction function and testing it on the same data is an incorrect procedure 3 00:00:14,040 --> 00:00:16,850 from a methodological point of view. 4 00:00:17,190 --> 00:00:23,280 A model that simply repeats the labels of the samples it saw during training 5 00:00:24,240 --> 00:00:32,580 would have a perfect score, but would not be able to predict anything useful on data that hasn't previously 6 00:00:33,210 --> 00:00:34,020 been explored. 7 00:00:34,500 --> 00:00:36,920 This situation is called overfitting. 8 00:00:37,260 --> 00:00:44,460 To avoid this, it is common practice to run machine learning experiments that split 9 00:00:44,460 --> 00:00:48,180 the data that is available into a training set and a test set. 10 00:00:49,140 --> 00:00:56,930 So data splitting is an operation that allows us to divide the available data into two sets, generally for 11 00:00:57,390 --> 00:01:00,320 cross-validation purposes. 12 00:01:00,810 --> 00:01:08,520 One set is used to train a predictive model and the other to test the model's performance. Training and testing the model 13 00:01:08,520 --> 00:01:14,790 forms the basis for further usage of the model for prediction 14 00:01:14,790 --> 00:01:21,900 in predictive analytics. For example, suppose a dataset has 100 rows of data, 15 00:01:23,480 --> 00:01:26,350 which includes the predictor and response variables. 16 00:01:26,690 --> 00:01:36,950 We split it in a convenient ratio, say 70 to 30, and allocate 70 rows for training and 30 17 00:01:36,950 --> 00:01:37,930 rows for testing. 18 00:01:38,510 --> 00:01:44,150 The rows are selected randomly to reduce bias. Once the training data is available,
19 00:01:44,450 --> 00:01:49,730 the data is fed to the neural network to get the massive universal function in place. 20 00:01:50,330 --> 00:01:57,680 The training data determines the weights, biases and activation functions to be used so that we can get 21 00:01:58,190 --> 00:01:59,770 to the output from the input. 22 00:02:00,500 --> 00:02:07,970 Once sufficient convergence is achieved, the model is stored in memory and the next step is testing 23 00:02:07,970 --> 00:02:08,560 the model. 24 00:02:09,320 --> 00:02:12,590 We pass the 30 rows of data to check whether 25 00:02:12,620 --> 00:02:21,110 the actual output matches the predicted output from the model. The evaluation is used to get the various metrics 26 00:02:21,470 --> 00:02:25,880 that can validate the model. If the accuracy is too low, 27 00:02:27,020 --> 00:02:34,070 the model has to be rebuilt with a change in the training data and the other parameters passed to the 28 00:02:34,070 --> 00:02:41,510 neural network builder. To split the data, the scikit-learn library has been used, more specifically 29 00:02:41,750 --> 00:02:43,480 the sklearn.model_selection module's 30 00:02:44,000 --> 00:02:48,510 train_test_split function has been used. 31 00:02:49,340 --> 00:02:51,470 This function quickly computes 32 00:02:52,370 --> 00:02:57,320 a random split into training and test sets. So let's write the code. 33 00:03:01,970 --> 00:03:03,920 First we have to import the library. 34 00:03:07,840 --> 00:03:09,940 So it is from sklearn 35 00:03:13,070 --> 00:03:14,780 dot model 36 00:03:16,650 --> 00:03:19,230 underscore selection import 37 00:03:20,100 --> 00:03:25,330 train_test_split. And to make it easier for us, 38 00:03:25,380 --> 00:03:32,340 we will divide the data_scaled data frame into two, which are the predictors X and the target Y. 39 00:03:35,620 --> 00:03:36,310 Divide 40 00:03:38,350 --> 00:03:39,410 the predictors 41 00:03:40,930 --> 00:03:41,290 and 42 00:03:43,580 --> 00:03:44,360 the target, which is Y.
43 00:03:51,460 --> 00:03:52,480 And to do this, 44 00:03:54,320 --> 00:03:57,590 we will use the pandas DataFrame drop function. 45 00:04:00,780 --> 00:04:03,090 So, X equals 46 00:04:04,110 --> 00:04:07,230 data_scaled dot drop, 47 00:04:09,090 --> 00:04:09,480 medv, 48 00:04:13,380 --> 00:04:14,130 um, uh, 49 00:04:15,590 --> 00:04:17,060 axis equals one. 50 00:04:18,730 --> 00:04:23,290 And then print X dot 51 00:04:24,930 --> 00:04:25,470 describe. 52 00:04:30,910 --> 00:04:33,220 I think it should be a capital X. 53 00:04:35,360 --> 00:04:43,250 And to make it easier, the same with a capital Y, so we won't have any confusion. 54 00:04:46,000 --> 00:04:49,390 Data_scaled, 55 00:04:51,800 --> 00:04:52,280 medv. 56 00:04:54,600 --> 00:05:00,480 And then print Y dot describe. 57 00:05:16,890 --> 00:05:21,570 I did make a mistake in here, because it needs square brackets. 58 00:05:23,400 --> 00:05:26,280 I ran the cell and we got our result. 59 00:05:28,200 --> 00:05:37,380 So the pandas DataFrame drop function drops specified labels from rows or columns. We remove rows 60 00:05:37,380 --> 00:05:45,750 or columns by specifying label names and the corresponding axis, or by specifying the index or column names directly. 61 00:05:46,200 --> 00:05:52,920 When using a multi-index, labels on different levels can be removed by specifying the level. 62 00:05:53,370 --> 00:05:54,560 So to extract X, 63 00:05:54,960 --> 00:05:59,610 we had to remove the target column, medv, from the starting data_scaled data frame. 64 00:06:00,150 --> 00:06:00,900 So now 65 00:06:01,800 --> 00:06:02,910 let's print the 66 00:06:04,780 --> 00:06:05,380 X. 67 00:06:12,340 --> 00:06:13,000 So. 68 00:06:16,390 --> 00:06:18,370 Now, let's try the same code for the Y. 69 00:06:21,080 --> 00:06:25,820 So X has the predictor columns, and Y is only the medv column. 70 00:06:27,790 --> 00:06:28,540 So.
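The step just described, separating the predictors from the target with the pandas drop function, can be sketched as follows. This is only a minimal illustration under stated assumptions: the small data frame and the predictor column names crim and rm are hypothetical stand-ins for the data_scaled data frame used in the video; only the target column name medv comes from the narration.

```python
import pandas as pd

# Hypothetical stand-in for the data_scaled data frame from the video:
# two made-up predictor columns plus the target column 'medv'.
data_scaled = pd.DataFrame({
    "crim": [0.1, 0.2, 0.3, 0.4],
    "rm": [6.5, 6.1, 7.0, 5.9],
    "medv": [24.0, 21.6, 34.7, 18.9],
})

# Predictors: every column except the target (axis=1 means "drop a column").
X = data_scaled.drop("medv", axis=1)

# Target: the 'medv' column on its own, selected with square brackets.
Y = data_scaled["medv"]

print(X.shape)  # (4, 2)
print(Y.shape)  # (4,)
```

Note that drop returns a new data frame by default, so data_scaled itself still contains the medv column afterwards.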
71 00:06:31,010 --> 00:06:37,730 X has thirteen columns and Y has only one column, which is the target. Now we can split the data frame. 72 00:06:39,370 --> 00:06:41,560 So the code for that is 73 00:06:42,750 --> 00:06:44,370 X underscore train, 74 00:06:46,070 --> 00:06:55,520 comma, X underscore test, comma, Y underscore train, comma, Y underscore test, equals train 75 00:06:56,850 --> 00:07:02,370 test split, with X, Y and test underscore size, 76 00:07:03,330 --> 00:07:12,420 which equals zero point three, because we split 70 percent for the training and 77 00:07:15,000 --> 00:07:19,290 thirty percent for the testing, and then we give random underscore state 78 00:07:20,380 --> 00:07:21,220 equals five. 79 00:07:22,460 --> 00:07:26,690 Now it is very simple, we just print 80 00:07:29,490 --> 00:07:30,570 X train 81 00:07:32,660 --> 00:07:34,700 shape equals, 82 00:07:35,940 --> 00:07:40,110 then X underscore train dot shape. 83 00:07:42,450 --> 00:07:45,630 And then the same could be 84 00:07:48,250 --> 00:07:50,050 done for 85 00:07:53,100 --> 00:07:54,780 X test. 86 00:07:56,840 --> 00:07:57,610 And then 87 00:08:00,970 --> 00:08:03,250 Y train, so 88 00:08:05,200 --> 00:08:06,760 X test, and then 89 00:08:08,420 --> 00:08:09,140 Y train. 90 00:08:11,390 --> 00:08:16,730 And then we should print the Y test, so Y 91 00:08:18,380 --> 00:08:18,920 test. 92 00:08:24,340 --> 00:08:25,150 And then Y. 93 00:08:27,410 --> 00:08:32,070 So let me explain it before we run the code. 94 00:08:32,750 --> 00:08:34,670 So in here, in the 95 00:08:36,360 --> 00:08:39,610 train_test_split function, four parameters are passed: 96 00:08:40,880 --> 00:08:47,090 X, Y, test_size and random_state. X and Y are the predictor and target data frames. 97 00:08:47,340 --> 00:08:49,610 So X is the predictor, 98 00:08:49,650 --> 00:08:55,580 Y is the target. The test_size parameter can take the following types: float, int or None.
99 00:08:55,950 --> 00:09:04,860 It is optional, with a default of 0.25. A float should be between 0.0 and 1.0 and represents 100 00:09:04,860 --> 00:09:07,170 the proportion of the dataset to include 101 00:09:08,530 --> 00:09:17,830 in the test split. If the parameter is an int, it represents the absolute number of test samples. If the parameter 102 00:09:17,830 --> 00:09:20,980 is None, the value is set to the complement of the 103 00:09:21,900 --> 00:09:26,010 train size. By default, the value is set to 0.25. 104 00:09:26,340 --> 00:09:33,000 So in our case, we set test_size to 0.3, which means 30 percent of the data is set aside as test 105 00:09:33,000 --> 00:09:33,450 data. 106 00:09:34,080 --> 00:09:42,540 The last one, the random_state parameter, is used to set the seed used by the random number generator. 107 00:09:42,930 --> 00:09:44,610 So in this way, the 108 00:09:46,480 --> 00:09:53,080 repeatability of the splitting operation is guaranteed. Now I execute it to see what 109 00:09:53,090 --> 00:09:53,830 result we get. 110 00:09:58,340 --> 00:09:59,090 And 111 00:10:01,000 --> 00:10:02,860 I got an error here because 112 00:10:04,070 --> 00:10:05,320 it has to be a comma, 113 00:10:06,360 --> 00:10:07,110 not a dot. 114 00:10:15,240 --> 00:10:16,730 And we got our result. 115 00:10:17,830 --> 00:10:18,550 So 116 00:10:20,000 --> 00:10:28,580 the starting data frame is split into two datasets that have three hundred and fifty-four rows for X_train and 117 00:10:29,210 --> 00:10:36,650 one hundred and fifty-two rows for X_test. A similar subdivision was made for Y. 118 00:10:37,920 --> 00:10:39,670 And that is the end of this video. 119 00:10:39,930 --> 00:10:43,760 I hope you enjoyed it, and I will see you in the next video.
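As a recap of the splitting step in this video, here is a minimal, self-contained sketch of train_test_split. It is not the exact notebook from the video: the 100-row data frame and the column names a, b, c and target are made up for illustration (they mirror the 100-row, 70 to 30 example from the start of the video), while test_size=0.30 and random_state=5 are the values given in the narration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 rows, three predictors and one target.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((100, 4)), columns=["a", "b", "c", "target"])

X = data.drop("target", axis=1)
Y = data["target"]

# test_size as a float: the proportion of rows that goes to the test set.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.30, random_state=5
)
print(X_train.shape, X_test.shape)  # (70, 3) (30, 3)
print(Y_train.shape, Y_test.shape)  # (70,) (30,)

# test_size as an int: the absolute number of test rows.
_, X_test_int, _, _ = train_test_split(X, Y, test_size=25, random_state=5)
print(X_test_int.shape)  # (25, 3)

# The same random_state reproduces the same split.
X_train_again, _, _, _ = train_test_split(X, Y, test_size=0.30, random_state=5)
print(X_train.index.equals(X_train_again.index))  # True
```

If random_state is left unset, each run can select different rows, so the shapes stay the same but the contents of the training and test sets change from run to run.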