So we've seen how we can evaluate our model's performance, or at least its initial performance, on things like epoch accuracy and epoch loss, and how we can track different experiments. But that's only on the model's training and validation sets, and it only really reflects what the model has learned from the dataset.

Another really good way to evaluate what our model has learned is to use it for what we're building it for. We're building Dog Vision, and we always want to keep in mind what the end goal is for whatever project we're working on: we're building Dog Vision to see if we can build a machine learning model that can identify a dog's breed from a photo. So it's going to make some predictions given a photo; let's do that.

If we come back and have a look at our workflow, we've fit the model to the data, and now it's time to make a prediction. We're still in step 3, and we've touched a little on step 4. And remember, these aren't really linear steps; I've just laid them out like this because it's easier to understand, but they aren't necessarily linear, so you can do them out of order.

We'll come back and make a little heading, "Making and evaluating predictions using a trained model" (Cmd+M then M to turn the cell into Markdown, then Shift+Enter).

Okay, so making predictions with a trained model is very similar to how we did it in scikit-learn: we can call the predict function and pass it data in the same form that the model was trained on. That's the important thing. Let's see the code first and then we'll discuss what's happening: make predictions on the validation data.

Remember, we created a validation data batch, and it was not used for training; that's the important point. If we come back to our three sets, the way we evaluate a model is that it trains on the training set, and then we check its initial results on the validation set. Both of those exams, the practice exam (validation set) and the final exam (test set), are datasets the model hasn't seen before.

So coming back, let's go... actually, we'll save it to a variable called predictions: predictions = model.predict(val_data). And let's just remind ourselves of what val_data is: a batched dataset. There we go.
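Here's a minimal sketch of what that cell looks like, assuming `model` is the trained Keras model and `val_data` is the batched validation dataset we built earlier in the notebook:

```python
# Make predictions on the validation data (which the model has never trained on).
# verbose=1 prints a progress bar while the predictions are being made.
predictions = model.predict(val_data, verbose=1)

predictions  # an array of prediction probabilities, one row per validation image
```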
We've got some images and some labels, so we know what the true labels of the validation set are. When we pass this batched dataset to model.predict, because of the nature of the predict function, it's only going to look at the images in our data and make predictions based on those images. Then we can compare those predictions, which are going to be in the form of labels (well, not exactly; we'll see what they look like in a second), to the actual labels from the validation images.

Now, that was a lot of talking; it's much better to see this in code. The verbose argument just says, "Hey, when you're making predictions, show me your progress."

Wow, that was really quick, because we've only got about 200 images. That's the beauty of working on a GPU: predictions are really fast as well. That only took about 187 milliseconds, so not even a whole second. And if we didn't set verbose equal to 1, we wouldn't get this little progress bar output.

You might be looking at this and going, "What is going on here? Lots of different numbers." So let's look at the shape: predictions.shape. Remember, where does this line up? If we go len(y_val), we get 200, so there's the first dimension: 200 images. Where do you think the second dimension, 120, is coming from? Where have we seen that before? len(unique_breeds) is 120.

So this means we've got an array of 200 by 120. Let's see the first element, predictions[0]: we have 200 arrays of 120 different numbers that are all really small, and we're wondering what's going on here. Well, let's find out the length of this: 120. Wonderful.

So what this actually is, is an associated probability for each label; basically, what our model thinks a certain image is. There's a probability value here for every single label, and the predictions[0] array holds the probabilities for one image.
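As a quick sketch of those checks (assuming `predictions`, `y_val` and `unique_breeds` were all created earlier in the notebook):

```python
# One row per validation image, one column per dog breed.
print(predictions.shape)       # (200, 120) in our case

print(len(y_val))              # 200 validation labels
print(len(unique_breeds))      # 120 unique dog breeds

# A single row: 120 prediction probabilities for the first validation image.
print(predictions[0])
print(len(predictions[0]))     # 120
```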
That's the first image in the validation data, and the highest value in that array is going to correspond to the index of the label that the model thinks is most likely. Again, this is a lot of talking; it'll make a bit more sense once we start to put it together with code, but bear with me for a second.

If we sum up all of these values, they're going to equal 1, or at least very close to 1. I say very close because of the way computers store numbers: it isn't exact. If you keep going to more and more decimal places, the sum gets very close to 1, but it might not be exactly 1. And you might be saying, "Daniel, what are you telling me, the way computers store numbers isn't exact?" Well, I don't fully understand all the details myself, but you just have to know that when a computer stores a number with lots of decimals, because of the way it's most efficient to store it on a computer chip, it's not going to be perfectly exact. That's why I say that when you sum up a single prediction array, it's going to be very close to 1.

Now I'm going to tie this back to our model. This is why we've used a softmax activation. You might be saying, "Daniel, far out, that was a few videos ago. You've already told us about softmax, it didn't really make sense then and it's really not making sense now." But if we search "what is softmax" and come back to our friendly Wikipedia page, it tells us that after applying softmax, each component will be in the interval between 0 and 1, and the components will add up to 1. And that's exactly what we're getting here.

We're getting these components because we've used a softmax activation in the last layer of our network. It outputs a prediction array that is 120 in length, and all of the values in predictions[0], and in every other prediction array, will add up to 1 or very close to 1. So predictions[1] adds up to 1, or very close to it... there we go. See, this one is just over 1, so very close: the more decimals there are, the less exact a number is on a computer.

So let's put this all together, because right now it's just been a lot of talking. I want to show you a concrete example. Let's take predictions[0].
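If you want to check that summing behaviour for yourself, here's a quick sketch using NumPy (assuming `predictions` is the array from model.predict above):

```python
import numpy as np

# Each row of predictions comes out of a softmax layer, so its values
# should sum to 1, give or take a tiny floating-point error.
print(np.sum(predictions[0]))
print(np.sum(predictions[1]))
```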
So we're going to remove these cells, and one little tidbit as well: remember our MobileNetV2 architecture? All our softmax layer is doing is taking those 1,280 outputs from the base model, that array of numbers, and turning it into an array like this one, of length 120, using a softmax activation. Remember, for multi-class classification problems we want a softmax activation, but if we were doing binary classification we'd use sigmoid.

So, coming back here, let's check it out. For the first prediction, I'm going to type out a few little indexing tricks, and I just want you to watch along. I'll talk through it a little as we go, but I want you to start thinking about how these things all tie together. Let's do it.

Our model has made some predictions, but our goal now is to understand those predictions further. Our computer already understands them, right, because they're all numeric. Remember at the start how we spent a lot of time turning our data into numbers? The same kind of thing happens at the output of a machine learning model. We spent a lot of time preparing our inputs; now we're going to spend some time making sense of our outputs, which is what we're doing now.

So let's print, for predictions[0], the max value, the prediction probability. Remember the function in scikit-learn called predict_proba, which outputs a prediction probability? This is what predict does by default in TensorFlow. So when we write np.max(predictions[0]), and let's remind ourselves what predictions[0] looks like, we're finding the max value in that array. That's all that line does.

Then what are we going to do next? We also want the sum. Actually, we might set this up with index = 0 so we can run it on multiple examples. So the sum is np.sum(predictions[index]); nice and simple. Then (wow, that was a terrible typo, Daniel, come on) we want the max index: find the index in this array which holds the max value, so np.argmax(predictions[index]). And of course, what we're doing now is a little bit tedious.
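Putting those pieces together, the cell looks something like this (a sketch; the exact wording of the print statements is up to you):

```python
import numpy as np

index = 0  # change this to inspect a different validation image

print(predictions[index])
print(f"Max value (probability of prediction): {np.max(predictions[index])}")
print(f"Sum: {np.sum(predictions[index])}")
print(f"Max index: {np.argmax(predictions[index])}")
```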
We're going to functionize this later on. You could imagine if we just kept writing cells like this; that's why we functionize as much of our code as we can, so we don't have to continually write these kinds of print statements to explore our data or our predictions.

So, predictions[index]... wonderful. Now, the predicted label. Before we run this cell, we're going against our cardinal rule of "if in doubt, run the code", so just have a think about what this line is doing. Let me remind you of what unique_breeds is: these are all of our dog breeds. So if we take our predictions, find the index of the max value, and then find where that index occurs in unique_breeds, what do you think that will return? Hint: we've typed it out right here.

All right, let's delete this bad boy and then Shift+Enter. Wonderful. So here's index 0. This is that same array, the output of our model when we call predict. The max value, the probability of the prediction: the maximum this can be is 1, because remember the output of a softmax function is between 0 and 1, so the highest a single value can be is 1. This is saying we're predicting the highest-probability label here with about a 21.6 percent prediction probability. The sum of all of these values is very close to 1, again in line with the softmax rule (each value between 0 and 1, all adding up to 1), but because of how computers store numbers, not exactly 1. And the max index, where that value occurs, is 17. So if we counted through, we'd find 17 somewhere... maybe there. That's it. The predicted label is border terrier. So for index 0, or for validation image 0, the predicted label is Border Terrier.

Now let's change this up. What if we used our favourite number, index 42? So we get a max prediction probability that's a lot higher, about 75 percent, so the model is pretty confident with this one. You could say that the closer this value is to 1, the more confident the model is about a certain prediction. The sum is very close to 1, the max index is 113, and the predicted label is walker hound. So if we go to unique_breeds and look up index 113, we should find walker hound.
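The predicted-label line from that cell looks something like this, and it's the seed of the helper we'll build next (the function name here is just a sketch of where we're heading):

```python
import numpy as np

# Look up the breed name sitting at the index of the highest probability.
print(f"Predicted label: {unique_breeds[np.argmax(predictions[index])]}")

# The same idea wrapped in a small helper (a preview of the kind of function we'll write):
def get_pred_label(prediction_probabilities):
    """Turn an array of prediction probabilities into a breed label."""
    return unique_breeds[np.argmax(prediction_probabilities)]

get_pred_label(predictions[42])  # the walker hound example from above
```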
There we go: index 113, walker hound.

So this is how we convert our prediction probabilities from numeric form, something our computer understands, into something that we understand: a predicted label. Now that we've got a way to turn our predictions into something we can understand, what we'll do next is create a function that finds the predicted label for every sample in the validation set, so about 200 or so images. Then we're going to unbatch the validation dataset and compare the predictions to the truth labels in a visual way.

Again, that's a lot of talking, but in the next video we'll start to build out that functionality and see what it looks like in practice. And I'm so excited, because one of the most fun parts is comparing your model's predictions with what's actually going on. Dog Vision is coming to life!
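As a rough preview of that unbatching step, here's a sketch assuming `val_data` is a tf.data.Dataset of (image, label) batches and the labels are still one-hot boolean arrays, as we set them up earlier:

```python
import numpy as np

# Unbatch the validation dataset so we can access individual images and labels,
# then turn each one-hot label back into a breed name for comparison.
val_images, val_labels = [], []
for image, label in val_data.unbatch().as_numpy_iterator():
    val_images.append(image)
    val_labels.append(unique_breeds[np.argmax(label)])  # assumes one-hot labels

val_labels[:5]  # first few truth labels, ready to compare against the predictions
```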