So now we've got a fully trained model on the full training dataset, a.k.a. ten thousand images. Because if we look at X, we've got 10,000 images and labels. Now let's use our fully trained model to make some predictions on the test dataset, which is also about 10,000 images; we'll have a look at that in a second. But first, let's come up to Kaggle and see how they want our predictions to look. So, the submission file: for each image in the test set, you must predict the probability for each of the different breeds, and the file should contain a header and have the following format. So we need id, dog breed one (I'm not even going to pretend I'm going to try to pronounce that one), Afghan hound, Yorkshire terrier, and then the prediction probabilities for all of those. All right. Now, it says multi-class log loss; that's just scored on the prediction probabilities for each dog breed. Let's create another header section: making predictions on the test dataset. Now, I want you to have a think before we go through the code: what did we have to do to train our model on the training images? Pretend this is data agnostic, whatever data we're working with, whether it's text or audio or images or video or something like that.
What is our rule before we can use it with a machine learning model, or in our case a deep learning neural network? We've got to convert it into numbers. So that's what we have to do: get it into the exact same data type that our model was trained on. So the data batch up here, the data batch of images: we need to create one of those out of the test images so that we can make predictions on those images. So we'll write ourselves a little note: since our model has been trained on images in the form of Tensor batches, to make predictions on the test data we'll have to get it into the same format. So if we come back to our graphic in Keynote, remember how I always said we're focusing on the inputs: getting our data into the right shape for our machine learning algorithm, and then making sure the outputs are correct. This is what we focus on in the beginning. The middle part is already implemented for us in quite a large way. As you get deeper into further projects, you might want to start diving into this, but for the time being, and for approaching your first projects like we're doing now, you're going to be focused on inputs and outputs. Now, a little tidbit here is, again, our functions coming in handy: luckily, we created create_data_batches earlier, which can take a list of file names as input.
If you don't remember, go back up to where we created create_data_batches, which converts file names into Tensor batches. So, what steps do we have to do to make predictions on the test data? Let's write those down: to make predictions on the test data... Now, this is kind of what I do with a lot of my projects: I just talk myself through it, the rubber ducky technique. So, step one: get the test image file names. That'll involve us writing just a little list comprehension, like we've done before, to go through all the files in the test folder and save the file names to some sort of list. Yep, we can do that. Step two: convert the file names into test data batches using create_data_batches, setting the test_data parameter to True. Remember how we created parameters in create_data_batches? If you don't, that's okay; we're going to see it in a second. Why True? What's different about our test data compared to our training data? There are no labels with the test data. So: since the test data doesn't have labels... And then finally, step three: we make a predictions array by passing the test data batches to the predict method called on our model. We've got three steps there, but we can tackle them one by one. How about we start with the first one? That's pretty logical: load the test image file names. What could we do?
We could go test_filenames, nice and simple, equals... Actually, we might set up the test path first. How about we do that: test_path equals, and we'll just copy it here to save us writing it out twice. Copy path (that's a little handy tip, you can do that) and paste. We don't need 'content'; we could just keep it there if we wanted to, but I'm going to remove it so it's consistent with the other path that we have, and I'm going to put this one here. So that's my path to the test folder. So now we're going to create a little list comprehension: equals test_path plus fname, for fname in os.listdir(test_path). Now, we've seen this before when we were creating the original training file names, but let's just remind ourselves what it's doing. os.listdir is short for 'list directory'. It's just saying, hey Python, get me all of the file names in this folder, and then we're creating a list which adds together the test path plus every single file name in the test folder. Let's have a look. If in doubt, run the code. Wonderful. So now we've got the file names for our test data. We can take that off and get our green tick in; this notebook is going to be filled with them by the end. Okay.
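As a minimal, self-contained sketch of that step (the temporary folder and the file names here are stand-ins; in the notebook, test_path would point at your actual test folder):

```python
import os
import tempfile

# Stand-in for the real test folder (in the notebook this would be the
# path to the downloaded test images; these file names are made up)
test_path = tempfile.mkdtemp()
for name in ["dog_0001.jpg", "dog_0002.jpg"]:
    open(os.path.join(test_path, name), "w").close()

# List comprehension: prepend the folder path to every file name in it
test_filenames = [os.path.join(test_path, fname)
                  for fname in os.listdir(test_path)]

print(len(test_filenames))  # how many test images we found
```

If in doubt, run the code: checking `len(test_filenames)` is the same quick sanity check done in the video (10,357 on the real dataset).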
Now we've got it, what's our next step? Convert the file names into test data batches using create_data_batches. Oh, I said 'date' there; that's wrong, let's fix it up. First of all, let's just do a little length check of our test file names to see how many images we have: ten thousand three hundred and fifty-seven. Wonderful. So now, to create a data batch, it's pretty simple. We just bring in our function. We go test_data, and we'll write a comment here before we get started: create test data batch. And we're going to call create_data_batches; it knows what we're talking about. Then we're going to pass it the test file names, and we're also going to set the test_data parameter. So we go test_data=True, and we'll run that. Wonderful. We get a little printout saying 'Creating test data batches...'. Such a great function that we wrote. Let's come up here and have a look at it again: turning our data into batches. So we've got a function, create_data_batches, and what does it do? If it's test data, which it is, it prints out this little statement, then it goes, hey, turn our file names into tensors, uses from_tensor_slices to convert them into a tensor Dataset, and then creates a data batch by mapping the process_image function.
Remember, we don't map get_image_label over the test dataset, because there are no labels with the test dataset. So we process our images just like we did with the training data, and then we turn them into batches, in our case batches of size 32, so that they can be computed really fast with our GPU. So let's come back to where we were. That's what we've got. And if we go test_data... it's a BatchDataset of shapes; there are no labels here, these are just images of 224 by 224 by 3 (for the colour channels), of dtype float32. So, beautiful: we've got our test data in the form of Tensor batches. That's what we're after. And now, the finale. We can come up here; see how handy writing our functions to begin with has helped us out? We've loaded a full model and we've created a data batch, each in one hit. Now we're going to make a predictions array. So how can we do that? Make predictions on the test data batch using the loaded full model. All right: test_predictions equals loaded_full_model.predict(test_data), and we're going to set verbose equal to 1. Excellent.
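For reference, the test branch of create_data_batches described above might look roughly like this. This is a sketch, not the exact notebook code: the process_image internals are an assumption based on the 224 by 224 by 3 float32 shapes mentioned in the transcript.

```python
import tensorflow as tf

IMG_SIZE = 224    # images resized to 224 x 224 (as in the transcript)
BATCH_SIZE = 32   # batch size mentioned in the transcript

def process_image(image_path):
    """Read an image file and turn it into a normalised, resized tensor."""
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=3)          # 3 colour channels
    image = tf.image.convert_image_dtype(image, tf.float32)  # 0-255 -> 0.0-1.0
    return tf.image.resize(image, size=[IMG_SIZE, IMG_SIZE])

def create_data_batches(x, test_data=False, batch_size=BATCH_SIZE):
    """Create batches out of image file paths (test branch only: no labels)."""
    if test_data:
        print("Creating test data batches...")
        data = tf.data.Dataset.from_tensor_slices(tf.constant(x))
        # No get_image_label here: the test set has no labels to map
        return data.map(process_image).batch(batch_size)
    # (training/validation branches, which also map get_image_label, omitted)
```

Calling `create_data_batches(test_filenames, test_data=True)` on this sketch would print the message and return a BatchDataset of (32, 224, 224, 3) float32 image batches, matching the shapes seen in the video.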
Now, I kind of have to warn you that this is going to take a fairly long time. The reason being... well, let's put a little note here before we run this cell. Note: even though we're running on a GPU, this is just like training our full model, so calling predict on our full model and passing it the test data batch will take a long time to run; it's going to be something like, we'll just say, about one hour. That's what I found, anyway. The reason is that we have another 10,000-odd images here, and our loaded full model, with all of the patterns it has learned from the training dataset, has to go through and process the 10,000 images in our test file names, or rather our test data batch, and then make predictions on those based on the patterns it finds in that test data. We've set verbose equal to 1 so that when we run this, it's going to give us another little progress readout to update us about what's going on. So without any further ado, let's run this and wait for it to load, and then we'll do the same thing again. I'll speed up the video so it'll be, like, instantaneously loaded for you. In reality, it'll take you a while to run this cell.
In reality, it took me a while to run this cell, but I'm just doing it to show you it loading up, creating an ETA message, and then going through making some predictions on the test data. So we'll just wait a little while for this to show up... there we go. So, ETA: about 38 minutes. It has to go through 324 batches, and that is because if you take ten thousand three hundred and fifty-seven divided by 32, it rounds up to 324. That's where that number comes from. Okay. So I'm going to wait for this to go through. You might have to wait for it to go through as well... actually, you don't have to wait. I'll be back in a few seconds, so I'm going to pause my video here, and I'll be back in three, two, one... and we're back. Now, I've gone and done something a little bit cheeky here. As you see, we've got about an hour left, but I figured, rather than sit here and wait for this, I've done what any good chef has done: I've prepared a dish earlier. So I've actually gone through this process and waited the hour for it to go through. Now, you could wait for yours to go through, but if you'd like to download a premade CSV of the test predictions, I'll attach that in the resources section, so you don't have to wait that full hour.
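That 324 figure is nothing mysterious; it's just the number of test images divided by the batch size, rounded up. A quick sanity check:

```python
import math

num_test_images = 10357  # images in the test set
batch_size = 32

# Keras reports progress per batch, so the 324 in the progress bar is
# simply a ceiling division of images by batch size
num_batches = math.ceil(num_test_images / batch_size)
print(num_batches)  # -> 324
```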
Now, I probably should have told you this before running this cell, but it's a good thing to sort of just get your feet wet and then figure out a better way of doing things. We'd want to save the test predictions to a CSV file after the cell runs through, because if we waited a full hour for it to go through, and then our runtime somehow disconnected and it didn't save, that would be pretty disappointing. So what we can do is use np.savetxt, this function here, to save... will the docstring load while this is running? I don't think it will... but here's what this is going to do. Oh, we forgot the preds_array file name; let's add that. So let's just say that the predict cell finished. If I ran this cell after it, it's going to save test_predictions, this NumPy array, which is a prediction probabilities array, to this file as a CSV, with the delimiter set to a comma, for comma-separated values. And then once it's saved, it's going to appear in our files as preds_array.csv, because that's the file name we gave it, and then we can use np.loadtxt to load it back in. So rather than wait for this to finish, I'll just show you that in action. These are some prediction probabilities that I've made in the past with a full model. So I'm going to stop this; it had about an hour to go, so again, my estimation that it would take about an hour was incorrect.
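The save-and-reload round trip described above can be sketched like this, using a temporary path and a small random array in place of the real 10,357 by 120 predictions:

```python
import os
import tempfile
import numpy as np

# Small stand-in for the real predictions array (really 10357 x 120)
test_predictions = np.random.rand(4, 5)

# Save the prediction probabilities so a runtime disconnect can't lose them
preds_path = os.path.join(tempfile.mkdtemp(), "preds_array.csv")
np.savetxt(preds_path, test_predictions, delimiter=",")

# Later (even in a fresh runtime), load them straight back in
loaded_preds = np.loadtxt(preds_path, delimiter=",")
print(loaded_preds.shape)  # -> (4, 5)
```

np.savetxt writes each row of the array as one comma-separated line, which is exactly the shape np.loadtxt expects when reading it back.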
So, keyboard interrupt. There we go. We're not going to run that, because I've already saved this to preds_array.csv, but I am going to run this next cell. So let's pretend I've run it after the predict cell completed, so the ETA has reached 0: I'm going to load in this array from a CSV file... beautiful. Now let's have a look at the first 10... there we go. So these are all prediction probabilities for the 10,000-odd images. Let's have a look at the shape of test_predictions: ten thousand three hundred and fifty-seven, so that's how many test images we have, and each one of them has 120 different prediction probabilities. Now, we've seen these values before. So remember, I've just taken this, let it run to its full extent, saved the CSV to a file (as if we've cooked something earlier), and then I've just reloaded it in here. If my runtime disconnected, I'd want this file to be saved somewhere I can access it later. And so, what's our next mission? If we wanted to submit this predictions array to Kaggle, what would it have to look like? So, the sample submission is here... 'something went wrong loading your data'; that's not very fun. But luckily, I've got this file in Dog Vision, and it's a CSV.
And so all I've done is download it and open it in Google Sheets. So this is what it looks like; this is the sample submission. We need to get our predictions array into something like this: an id column, then a column for each of the different dog breeds, and then their prediction probabilities. So that's what we'll work towards in the next video: we'll get our prediction probabilities array for Dog Vision (or, really, the Kaggle dog breed identification format) set up like that, so we can make a submission to Kaggle with our predictions.
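Jumping ahead slightly, here is one hedged sketch of how a predictions array could be shaped into that id-plus-breed-columns format. The breed names, ids, and output file name below are all made up for illustration; the next video walks through the real version.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy stand-ins: 3 test images, 4 breeds (the real case is 10357 x 120)
breeds = ["affenpinscher", "afghan_hound", "beagle", "yorkshire_terrier"]
test_ids = ["img001", "img002", "img003"]  # hypothetical image ids
preds = np.random.rand(3, len(breeds))
preds = preds / preds.sum(axis=1, keepdims=True)  # each row sums to 1

# One "id" column first, then one probability column per breed,
# matching the sample submission's header layout
submission = pd.DataFrame(preds, columns=breeds)
submission.insert(0, "id", test_ids)

out_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
submission.to_csv(out_path, index=False)  # file ready to upload to Kaggle
```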