So now we've got a fully trained model on the full training dataset, a.k.a. ten thousand images. Because if we look at X, we've got 10,000 images and labels. Now let's use our fully trained model to make some predictions on the test dataset, which is also about 10,000 images; we'll have a look at that in a second. But first, let's come up to Kaggle and see how they want our predictions to look. So, the submission file: for each image in the test set, you must predict the probability for each of the different breeds, and the file should contain a header and have the following format. So we need id, dog breed one (I'm not even going to pretend I'm going to try to pronounce that one), Afghan hound, Yorkshire terrier, and then the prediction probabilities for all of those. All right. Now, it says multi-class log loss; that's just scored on the prediction probabilities for each dog breed. Let's create another header section: making predictions on the test dataset. Now, I want you to have a think before we go through the code: what did we have to do to train our model on the training images? Pretend this is data agnostic, whatever data we're working with, whether it's text or audio or images or video or something like that.
What is our rule before we can use it with a machine learning model, or in our case a deep learning neural network? We've got to convert it into numbers. So that's what we have to do: get it into the exact same data type that our model was trained on. So the data batch up here, the data batch of images: we need to create one of those out of the test images so that we can make predictions on those images. So we'll write ourselves a little note: since our model has been trained on images in the form of Tensor batches, to make predictions on the test data we'll have to get it into the same format. So if we come back to our graphic in Keynote, remember how I always said we're focusing on the inputs: getting our data into the right shape for our machine learning algorithm, and then making sure the outputs are correct. This is what we focus on in the beginning. The middle part is already implemented for us in quite a large way. As you get deeper into further projects, you might want to start diving into this, but for the time being, and for approaching your first projects like we're doing now, you're going to be focused on inputs and outputs. Now, a little tidbit here is, again, our functions coming in handy: luckily, we created create_data_batches earlier, which can take a list of file names as input.
If you don't remember, go back up to where we created create_data_batches, which converts file names into Tensor batches. So, what steps do we have to do to make predictions on the test data? Let's write those down: to make predictions on the test data... Now, this is kind of what I do with a lot of my projects: I just talk myself through it, the rubber ducky technique. So, step one: get the test image file names. That'll involve us writing just a little list comprehension, like we've done before, to go through all the files in the test folder and save the file names to some sort of list. Yep, we can do that. Step two: convert the file names into test data batches using create_data_batches, setting the test_data parameter to True. Remember how we created parameters in create_data_batches? If you don't, that's okay; we're going to see it in a second. Why True? What's different about our test data compared to our training data? There are no labels with the test data. So: since the test data doesn't have labels... And then finally, step three: we make a predictions array by passing the test data batches to the predict method called on our model. We've got three steps there, but we can tackle them one by one. How about we start with the first one? That's pretty logical: load the test image file names. What could we do?
We could go test_filenames, nice and simple, equals... Actually, we might set up the test path first. How about we do that: test_path equals, and we'll just copy it here to save us writing it out twice. Copy path (that's a little handy tip, you can do that) and paste. We don't need 'content'; we could just keep it there if we wanted to, but I'm going to remove it so it's consistent with the other path that we have, and I'm going to put this one here. So that's my path to the test folder. So now we're going to create a little list comprehension: equals test_path plus fname, for fname in os.listdir(test_path). Now, we've seen this before when we were creating the original training file names, but let's just remind ourselves what it's doing. os.listdir is short for 'list directory'. It's just saying, hey Python, get me all of the file names in this folder, and then we're creating a list which adds together the test path plus every single file name in the test folder. Let's have a look. If in doubt, run the code. Wonderful. So now we've got the file names for our test data. We can take that off and get our green tick in; this notebook is going to be filled with them by the end. Okay.
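As a minimal, self-contained sketch of that step (the temporary folder and the file names here are stand-ins; in the notebook, test_path would point at your actual test folder):

```python
import os
import tempfile

# Stand-in for the real test folder (in the notebook this would be the
# path to the downloaded test images; these file names are made up)
test_path = tempfile.mkdtemp()
for name in ["dog_0001.jpg", "dog_0002.jpg"]:
    open(os.path.join(test_path, name), "w").close()

# List comprehension: prepend the folder path to every file name in it
test_filenames = [os.path.join(test_path, fname)
                  for fname in os.listdir(test_path)]

print(len(test_filenames))  # how many test images we found
```

If in doubt, run the code: checking `len(test_filenames)` is the same quick sanity check done in the video (10,357 on the real dataset).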
Now we've got it, what's our next step? Convert the file names into test data batches using create_data_batches. Oh, I said 'date' there; that's wrong, let's fix it up. First of all, let's just do a little length check of our test file names to see how many images we have: ten thousand three hundred and fifty-seven. Wonderful. So now, to create a data batch, it's pretty simple. We just bring in our function. We go test_data, and we'll write a comment here before we get started: create test data batch. And we're going to call create_data_batches; it knows what we're talking about. Then we're going to pass it the test file names, and we're also going to set the test_data parameter. So we go test_data=True, and we'll run that. Wonderful. We get a little printout saying 'Creating test data batches...'. Such a great function that we wrote. Let's come up here and have a look at it again: turning our data into batches. So we've got a function, create_data_batches, and what does it do? If it's test data, which it is, it prints out this little statement, then it goes, hey, turn our file names into tensors, uses from_tensor_slices to convert them into a tensor Dataset, and then creates a data batch by mapping the process_image function.
Remember, we don't map get_image_label over the test dataset, because there are no labels with the test dataset. So we process our images just like we did with the training data, and then we turn them into batches, in our case batches of size 32, so that they can be computed really fast with our GPU. So let's come back to where we were. That's what we've got. And if we go test_data... it's a BatchDataset of shapes; there are no labels here, these are just images of 224 by 224 by 3 (for the colour channels), of dtype float32. So, beautiful: we've got our test data in the form of Tensor batches. That's what we're after. And now, the finale. We can come up here; see how handy writing our functions to begin with has helped us out? We've loaded a full model and we've created a data batch, each in one hit. Now we're going to make a predictions array. So how can we do that? Make predictions on the test data batch using the loaded full model. All right: test_predictions equals loaded_full_model.predict(test_data), and we're going to set verbose equal to 1. Excellent.
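For reference, the test branch of create_data_batches described above might look roughly like this. This is a sketch, not the exact notebook code: the process_image internals are an assumption based on the 224 by 224 by 3 float32 shapes mentioned in the transcript.

```python
import tensorflow as tf

IMG_SIZE = 224    # images resized to 224 x 224 (as in the transcript)
BATCH_SIZE = 32   # batch size mentioned in the transcript

def process_image(image_path):
    """Read an image file and turn it into a normalised, resized tensor."""
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=3)          # 3 colour channels
    image = tf.image.convert_image_dtype(image, tf.float32)  # 0-255 -> 0.0-1.0
    return tf.image.resize(image, size=[IMG_SIZE, IMG_SIZE])

def create_data_batches(x, test_data=False, batch_size=BATCH_SIZE):
    """Create batches out of image file paths (test branch only: no labels)."""
    if test_data:
        print("Creating test data batches...")
        data = tf.data.Dataset.from_tensor_slices(tf.constant(x))
        # No get_image_label here: the test set has no labels to map
        return data.map(process_image).batch(batch_size)
    # (training/validation branches, which also map get_image_label, omitted)
```

Calling `create_data_batches(test_filenames, test_data=True)` on this sketch would print the message and return a BatchDataset of (32, 224, 224, 3) float32 image batches, matching the shapes seen in the video.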
Now, I kind of have to warn you that this is going to take a fairly long time. The reason being... well, let's put a little note here before we run this cell. Note: even though we're running on a GPU, this is just like training our full model, so calling predict on our full model and passing it the test data batch will take a long time to run; it's going to be something like, we'll just say, about one hour. That's what I found, anyway. The reason is that we have another 10,000-odd images here, and our loaded full model, with all of the patterns it has learned from the training dataset, has to go through and process the 10,000 images in our test file names, or rather our test data batch, and then make predictions on those based on the patterns it finds in that test data. We've set verbose equal to 1 so that when we run this, it's going to give us another little progress readout to update us about what's going on. So without any further ado, let's run this and wait for it to load, and then we'll do the same thing again. I'll speed up the video so it'll be, like, instantaneously loaded for you. In reality, it'll take you a while to run this cell.
In reality, it took me a while to run this cell, but I'm just doing it to show you it loading up, creating an ETA message, and then going through making some predictions on the test data. So we'll just wait a little while for this to show up... there we go. So, ETA: about 38 minutes. It has to go through 324 batches, and that is because if you take ten thousand three hundred and fifty-seven divided by 32, it rounds up to 324. That's where that number comes from. Okay. So I'm going to wait for this to go through. You might have to wait for it to go through as well... actually, you don't have to wait. I'll be back in a few seconds, so I'm going to pause my video here, and I'll be back in three, two, one... and we're back. Now, I've gone and done something a little bit cheeky here. As you see, we've got about an hour left, but I figured, rather than sit here and wait for this, I've done what any good chef has done: I've prepared a dish earlier. So I've actually gone through this process and waited the hour for it to go through. Now, you could wait for yours to go through, but if you'd like to download a premade CSV of the test predictions, I'll attach that in the resources section, so you don't have to wait that full hour.
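That 324 figure is nothing mysterious; it's just the number of test images divided by the batch size, rounded up. A quick sanity check:

```python
import math

num_test_images = 10357  # images in the test set
batch_size = 32

# Keras reports progress per batch, so the 324 in the progress bar is
# simply a ceiling division of images by batch size
num_batches = math.ceil(num_test_images / batch_size)
print(num_batches)  # -> 324
```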
Now, I probably should have told you this before running this cell, but it's a good thing to sort of just get your feet wet and then figure out a better way of doing things. We'd want to save the test predictions to a CSV file after the cell runs through, because if we waited a full hour for it to go through, and then our runtime somehow disconnected and it didn't save, that would be pretty disappointing. So what we can do is use np.savetxt, this function here, to save... will the docstring load while this is running? I don't think it will... but here's what this is going to do. Oh, we forgot the preds_array file name; let's add that. So let's just say that the predict cell finished. If I ran this cell after it, it's going to save test_predictions, this NumPy array, which is a prediction probabilities array, to this file as a CSV, with the delimiter set to a comma, for comma-separated values. And then once it's saved, it's going to appear in our files as preds_array.csv, because that's the file name we gave it, and then we can use np.loadtxt to load it back in. So rather than wait for this to finish, I'll just show you that in action. These are some prediction probabilities that I've made in the past with a full model. So I'm going to stop this; it had about an hour to go, so again, my estimation that it would take about an hour was incorrect.
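The save-and-reload round trip described above can be sketched like this, using a temporary path and a small random array in place of the real 10,357 by 120 predictions:

```python
import os
import tempfile
import numpy as np

# Small stand-in for the real predictions array (really 10357 x 120)
test_predictions = np.random.rand(4, 5)

# Save the prediction probabilities so a runtime disconnect can't lose them
preds_path = os.path.join(tempfile.mkdtemp(), "preds_array.csv")
np.savetxt(preds_path, test_predictions, delimiter=",")

# Later (even in a fresh runtime), load them straight back in
loaded_preds = np.loadtxt(preds_path, delimiter=",")
print(loaded_preds.shape)  # -> (4, 5)
```

np.savetxt writes each row of the array as one comma-separated line, which is exactly the shape np.loadtxt expects when reading it back.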
So, keyboard interrupt. There we go. We're not going to run that, because I've already saved this to preds_array.csv, but I am going to run this next cell. So let's pretend I've run it after the predict cell completed, so the ETA has reached 0: I'm going to load in this array from a CSV file... beautiful. Now let's have a look at the first 10... there we go. So these are all prediction probabilities for the 10,000-odd images. Let's have a look at the shape of test_predictions: ten thousand three hundred and fifty-seven, so that's how many test images we have, and each one of them has 120 different prediction probabilities. Now, we've seen these values before. So remember, I've just taken this, let it run to its full extent, saved the CSV to a file (as if we've cooked something earlier), and then I've just reloaded it in here. If my runtime disconnected, I'd want this file to be saved somewhere I can access it later. And so, what's our next mission? If we wanted to submit this predictions array to Kaggle, what would it have to look like? So, the sample submission is here... 'something went wrong loading your data'; that's not very fun. But luckily, I've got this file in Dog Vision, and it's a CSV.
And so all I've done is download it and open it in Google Sheets. So this is what it looks like; this is the sample submission. We need to get our predictions array into something like this: an id column, then a column for each of the different dog breeds, and then their prediction probabilities. So that's what we'll work towards in the next video: we'll get our prediction probabilities array for Dog Vision (or, really, the Kaggle dog breed identification format) set up like that, so we can make a submission to Kaggle with our predictions.
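Jumping ahead slightly, here is one hedged sketch of how a predictions array could be shaped into that id-plus-breed-columns format. The breed names, ids, and output file name below are all made up for illustration; the next video walks through the real version.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy stand-ins: 3 test images, 4 breeds (the real case is 10357 x 120)
breeds = ["affenpinscher", "afghan_hound", "beagle", "yorkshire_terrier"]
test_ids = ["img001", "img002", "img003"]  # hypothetical image ids
preds = np.random.rand(3, len(breeds))
preds = preds / preds.sum(axis=1, keepdims=True)  # each row sums to 1

# One "id" column first, then one probability column per breed,
# matching the sample submission's header layout
submission = pd.DataFrame(preds, columns=breeds)
submission.insert(0, "id", test_ids)

out_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
submission.to_csv(out_path, index=False)  # file ready to upload to Kaggle
```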