All right. Welcome back to our discussion of k-fold cross-validation, a very important tool in your toolkit for assessing how well your model is working with the data that you have.

Okay. So here is what we normally do. We have a dataset, and we usually split it into a training set and a test set. From here we're going to talk about k-fold cross-validation, but first I want to make a quick note. There are two schools of thought. One school of thought is that when you're doing k-fold cross-validation, you don't need the test set: k-fold cross-validation on its own is enough. In the second school of thought, you still create the test set, you do k-fold cross-validation on the training set, and then you still use the test set later on. So those are two different approaches, and we'll talk more about them at the end of this tutorial. Throughout this tutorial we're going to use the second school of thought, because it's more general. Then, at the end of the tutorial, we'll be able to simplify it to make it appropriate for the first school of thought when we discuss it.
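The usual first step is to split the data into a training set and a held-out test set. Below is a minimal Python sketch of that step using only the standard library. It is not the course's code. The function name `train_test_split_indices`, the 20% test fraction and the fixed seed are illustrative assumptions.

```python
import random

def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle the indices 0..n-1 and hold out a fraction of them as the test set.

    Hypothetical helper for illustration; the 20% default is an assumption.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded RNG so the split is reproducible
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train indices, test indices)

train_idx, test_idx = train_test_split_indices(100, test_fraction=0.2)
print(len(train_idx), len(test_idx))  # 80 20
```

In the second school of thought, the test indices are set aside and not touched again until the very end.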
So for now, let's stick to breaking the dataset into a training set and a test set. Once you've split it, what do you do next for k-fold cross-validation? Normally, you would just train your model on the training set and then test it on the test set. Right. And from that you would get a result. Yes, the model hasn't seen the test data, so you would be able to tell how well it performs on this test set. But what if you just got lucky on this test set? What if it just so happens that the model does well on the test set, but on future data it won't do well at all? That's what k-fold cross-validation is for. It's here to combat the scenario where you just got lucky on the test set, and to make sure, with more certainty, that your model is doing well.

So what we're going to do is take the training set and split it into k folds. For simplicity's sake, in this tutorial we'll assume k equals ten, so we split it into ten folds. A fold is just a fancy word for a part: we split the training set into ten parts that are all about the same size and don't overlap.
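Splitting the training set into k non-overlapping folds of about the same size is simple bookkeeping. Here is a minimal sketch. It assumes the training indices have already been shuffled (for example, by the holdout split), so each fold is a random sample. `make_folds` is a hypothetical name.

```python
def make_folds(indices, k=10):
    """Partition `indices` into k non-overlapping folds whose sizes differ by at most one."""
    if not 1 < k <= len(indices):
        raise ValueError("need 2 <= k <= number of samples")
    base, extra = divmod(len(indices), k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more item
        folds.append(indices[start:start + size])
        start += size
    return folds

folds = make_folds(list(range(23)), k=10)
print([len(f) for f in folds])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```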
Then we're going to train the model on nine of these folds and keep one fold unseen, for validation. So those nine folds are our training data, and that one fold is our validation data. Think of it basically as training data and testing data, but we call it validation so we don't confuse it with the test set. So we train the model on the data in these nine folds, and then we validate: we calculate whatever metrics we need to see how well the model performs on the validation fold, because it has not seen that fold before.

Great. Then we do that again, but now we shift the validation fold, so the validation fold becomes the next fold over. Now we train on a different set of nine folds. As you can see, the training data has changed slightly, and the validation fold is completely new. Again, it's not seen during training. So we get a new trained model as a result of this training, and we validate it on this new fold.
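The rotation described above can be written as a generator that yields one (training, validation) split per round. This is a minimal sketch, and the name `kfold_splits` is hypothetical. Each fold serves as the validation fold exactly once, and the remaining k - 1 folds form that round's training data.

```python
def kfold_splits(indices, k=10):
    """Yield (train_idx, val_idx) pairs, one per fold.

    Each fold serves as the validation fold exactly once; the other k - 1 folds
    together form the training data for that round.
    """
    base, extra = divmod(len(indices), k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    for i, val_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx

for round_no, (train_idx, val_idx) in enumerate(kfold_splits(list(range(20)), k=10), start=1):
    # The validation fold is never part of that round's training data.
    assert set(train_idx).isdisjoint(val_idx)
    print(f"round {round_no}: validate on {val_idx}, train on {len(train_idx)} samples")
```

Round 1 validates on `[0, 1]`, round 2 on `[2, 3]`, and so on. Every round trains on the other 18 samples.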
And note: every time we do this, for every combination of folds, we have to use the same hyperparameters. This is very important. We've decided on our hyperparameters, and now we're just training the model again and again on slightly different training data and validating it on the validation fold, which keeps shifting, as you can see. Here's our sixth training round: we train on all of this data and then validate on this fold, which is not seen during training. We keep doing that and keep shifting. So if we have ten folds, we're going to train ten models. Each time, the model type and the hyperparameters are the same during training. Of course, each round produces a slightly different trained model and slightly different results, and then we validate it on its validation fold. As a result, we will have ten sets of metrics. Remember, if we just used the training set and the test set, the normal two-way split, we would have one set of metrics, and we could have gotten lucky. Here, instead, we're going to have ten sets of metrics.
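Here is a minimal, self-contained sketch of the loop just described. The same hyperparameter is used in every round, and each round produces one trained model and one validation score. The model is a toy assumption chosen for brevity: a one-feature linear regression with an L2 penalty `lam` on the slope, scored by mean squared error. The data are synthetic, and none of the names come from the course.

```python
import random
import statistics

def kfold_splits(n, k):
    """Yield (train_idx, val_idx) over range(n); fold sizes differ by at most one."""
    base, extra = divmod(n, k)
    start = 0
    for i in range(k):
        stop = start + base + (1 if i < extra else 0)
        yield list(range(0, start)) + list(range(stop, n)), list(range(start, stop))
        start = stop

def fit_ridge_1d(xs, ys, lam):
    """Toy model: y = w*x + b, with an L2 penalty `lam` on the slope only."""
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + lam)
    return w, y_mean - w * x_mean

def mse(model, xs, ys):
    w, b = model
    return statistics.fmean((w * x + b - y) ** 2 for x, y in zip(xs, ys))

def cross_validate(xs, ys, lam, k=10):
    """Train k models with the SAME hyperparameter `lam`; score each on its held-out fold."""
    models, scores = [], []
    for train_idx, val_idx in kfold_splits(len(xs), k):
        model = fit_ridge_1d([xs[i] for i in train_idx], [ys[i] for i in train_idx], lam)
        models.append(model)
        scores.append(mse(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return models, scores

# Synthetic data (an assumption): y = 2x + 1 plus Gaussian noise, already in random order.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
models, scores = cross_validate(xs, ys, lam=1.0, k=10)
print(len(models), len(scores))  # 10 10
```

The ten fitted slopes in `models` differ slightly because each was fitted on slightly different training data, even though `lam` never changes.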
It's much less likely that we got lucky on all ten folds. Right? So the assessment is much more reliable now that we have ten sets of metrics and can look at them in aggregate. And that's exactly what we're going to do. Let's make some space. Here we assess these metrics and look at them in aggregate. If they look good in aggregate, then the modeling approach is valid: the model you've selected and the hyperparameters you've selected are good for this data. What we do next is train the model one last time, this time on all of the training data, and then test it on the test set as usual. That's our final step.

On the other hand, if the aggregate metrics don't look good, then something's wrong. We need to adjust the model's hyperparameters, or change the model entirely, and repeat this whole process of k-fold cross-validation. So that's what k-fold cross-validation is, and that's how it works.
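The decision step can be sketched as follows. Look at the fold scores in aggregate (here the mean, with the standard deviation reported alongside). Only if they look good, retrain once on the full training set and evaluate on the test set. This is a minimal sketch under several assumptions. The toy model is a line through the origin with an L2-penalized slope. The synthetic data, the acceptance threshold `max_cv_mse=2.0` and the deliberately bad `lam=10_000.0` are all illustrative choices, and the function names are hypothetical.

```python
import random
import statistics

def kfold_splits(n, k):
    """Yield (train_idx, val_idx) over range(n); fold sizes differ by at most one."""
    base, extra = divmod(n, k)
    start = 0
    for i in range(k):
        stop = start + base + (1 if i < extra else 0)
        yield list(range(0, start)) + list(range(stop, n)), list(range(start, stop))
        start = stop

def fit_slope(xs, ys, lam):
    """Toy model: a line through the origin whose slope is shrunk by an L2 penalty `lam`."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    return statistics.fmean((w * x - y) ** 2 for x, y in zip(xs, ys))

def pick(seq, idx):
    return [seq[i] for i in idx]

def assess_and_finalize(train_x, train_y, test_x, test_y, lam, k=10, max_cv_mse=2.0):
    """Look at the k validation scores in aggregate; only if they look good,
    retrain once on ALL the training data and evaluate on the test set."""
    scores = [
        mse(fit_slope(pick(train_x, tr), pick(train_y, tr), lam), pick(train_x, va), pick(train_y, va))
        for tr, va in kfold_splits(len(train_x), k)
    ]
    summary = {"cv_mean": statistics.fmean(scores), "cv_std": statistics.stdev(scores)}
    if summary["cv_mean"] > max_cv_mse:
        # Aggregate looks bad: adjust the hyperparameters or change the model, then rerun CV.
        return {**summary, "accepted": False}
    final_w = fit_slope(train_x, train_y, lam)  # one last fit on the full training set
    return {**summary, "accepted": True, "test_mse": mse(final_w, test_x, test_y)}

# Synthetic data (an assumption): y = 3x plus Gaussian noise; the last 20 points are the test set.
rng = random.Random(0)
xs = [rng.uniform(0, 5) for _ in range(120)]
ys = [3 * x + rng.gauss(0, 1) for x in xs]
train_x, train_y, test_x, test_y = xs[:100], ys[:100], xs[100:], ys[100:]
for lam in (0.0, 10_000.0):
    result = assess_and_finalize(train_x, train_y, test_x, test_y, lam)
    print(lam, result["accepted"], round(result["cv_mean"], 2))
```

With `lam=0.0` the cross-validation mean stays near the noise variance, so the function goes on to the single test-set evaluation. The heavily over-penalized setting is rejected before the test set is ever touched.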
As we discussed at the beginning, there are a few schools of thought, and they don't really have numbers; I've been calling this one the second school. This school of thought says we should keep the training set, apply k-fold cross-validation to it, assess the metrics, retrain the model on the whole training set once we're happy, and then still test it on the test set. The other school of thought says: let's get rid of this testing step. We've already run the model many times here, and we've evaluated it on these validation folds. So we just train the model on the training set, and we don't need to test it anymore: we've tested it, and we know it works.

Then there's a modification of that school of thought where you don't even retrain on the training set. You say: okay, we've trained the model ten times here, and we've got ten different models as a result. We've tested all of them, so we don't need this test, and we're not going to do the final training. We're just going to pick one of these models. That is a little bit challenging, in my view. How do you pick which one to go with?
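Here is a minimal sketch of two possible selection rules for that choice. One takes the fold model with the best validation score. The other takes the fold model whose score is closest to the average. The sketch assumes lower scores are better (an error metric), and `pick_fold_model` is a hypothetical name. The model labels and scores in the usage example are made-up placeholders, not results from any real run.

```python
import statistics

def pick_fold_model(fold_models, fold_scores, rule="closest_to_mean"):
    """Choose one of the k models trained during cross-validation.

    Assumes lower scores are better (e.g., an error such as MSE).
    rule="best": lowest validation error.
    rule="closest_to_mean": error nearest the average across folds.
    """
    positions = range(len(fold_scores))
    if rule == "best":
        i = min(positions, key=lambda j: fold_scores[j])
    elif rule == "closest_to_mean":
        mean = statistics.fmean(fold_scores)
        i = min(positions, key=lambda j: abs(fold_scores[j] - mean))
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    return fold_models[i], fold_scores[i]

# Placeholder inputs for illustration only.
models = ["model_fold_1", "model_fold_2", "model_fold_3", "model_fold_4"]
scores = [0.9, 1.3, 1.0, 1.1]
print(pick_fold_model(models, scores, rule="best"))             # ('model_fold_1', 0.9)
print(pick_fold_model(models, scores, rule="closest_to_mean"))  # ('model_fold_4', 1.1)
```

Since the k models differ slightly, whichever rule you pick remains a judgment call.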
Are you going to take the one with the best metrics, or the one with metrics closest to the average? It creates a little bit of extra work to pick a model out of these, because they're all going to be slightly different models, since the underlying training data was slightly different. So that's also an option.

And then there's yet another way to think about k-fold cross-validation. What you could do is take the classic part and do it first. Split your data into a training set and a test set, train your model on the training set, and test it on the test set. Then, if you're happy with the results of this classic approach, you could go an extra step and apply k-fold cross-validation: do all of this, but after you've done the training and testing. Once you've done all of this, if your aggregate metrics still look good, then you can confirm and say: I'm happy, because I didn't just get lucky on the test set. The fact that the model works on the test set is not just chance. I tested it afterwards with k-fold cross-validation, and it still works.
So I'm just going to keep my original model that I trained with this first approach. In this case, k-fold cross-validation acts as an add-on to your classic method. It's essentially the same as the most general form of k-fold cross-validation we discussed, where you do all of this first and then train and test; it's just done in the opposite order. So you can do that too, whatever works for you, as long as you know why you're doing it, what results you're aiming for, and how to interpret what k-fold cross-validation is telling you. The rest is detail. There's no one hard-and-fast way you have to do it, as long as you get the benefits of k-fold cross-validation that you're aiming for. And of course, don't let the validation data leak into the training data: don't let the model see the validation data during training, or the test set during training, if you're using one. Okay. So that's k-fold cross-validation. Congratulations on adding a powerful new tool for model assessment to your toolkit.
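One common way validation data leaks into training is preprocessing, for example computing feature-scaling statistics over all the data before splitting. Here is a minimal sketch of the leak-free alternative. It assumes simple standardization of a single feature, and the function name is hypothetical. The mean and standard deviation are fitted on the training folds only, then applied unchanged to the validation fold inside every cross-validation round.

```python
import statistics

def standardize_fold(train_vals, val_vals):
    """Fit scaling statistics on the training folds ONLY, then apply them to both sides.

    Using train + validation values to compute mu/sigma would let the validation
    fold influence training, which is exactly the leakage to avoid.
    """
    mu = statistics.fmean(train_vals)
    sigma = statistics.pstdev(train_vals) or 1.0  # guard against a constant feature

    def scale(values):
        return [(v - mu) / sigma for v in values]

    return scale(train_vals), scale(val_vals)

train_scaled, val_scaled = standardize_fold([1.0, 2.0, 3.0], [100.0])
print(train_scaled)  # centered on 0 using training statistics only
print(val_scaled)    # the unusual validation point stays far from the training data
```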
And I look forward to seeing you back here next time. Until then, enjoy machine learning.