1
00:00:00,300 --> 00:00:01,033
Hi, my friends.

2
00:00:01,033 --> 00:00:03,800
Are you ready to build
that convolutional neural network?

3
00:00:03,800 --> 00:00:06,933
We have a long
but very exciting journey ahead of us.

4
00:00:07,000 --> 00:00:09,566
Let's do this. Let's kick this off.

5
00:00:09,566 --> 00:00:09,900
All right,

6
00:00:09,900 --> 00:00:11,866
so we're going to start
with the very essential step

7
00:00:11,866 --> 00:00:15,933
importing the libraries, which will only
consist of importing TensorFlow.

8
00:00:15,933 --> 00:00:19,266
And actually, the preprocessing module
of the Keras library.

9
00:00:19,400 --> 00:00:23,166
So let's do this first quickly
and efficiently in a new code cell.

10
00:00:23,166 --> 00:00:26,166
So actually you know how to import
TensorFlow.

11
00:00:26,200 --> 00:00:29,133
We start with the import command.

12
00:00:29,133 --> 00:00:32,133
Then we specify
the name of the library TensorFlow.

13
00:00:32,400 --> 00:00:35,333
And we add the alias tf.

14
00:00:35,333 --> 00:00:36,900
Just like before with the ANN.

15
00:00:36,900 --> 00:00:40,966
And then I would like to import
something else which will allow us to do

16
00:00:41,100 --> 00:00:45,200
the image preprocessing in Part 1,
and that is the image

17
00:00:45,233 --> 00:00:48,600
submodule of the preprocessing module
of the Keras library.

18
00:00:48,966 --> 00:00:53,300
And therefore here we're going to start
from, well, the Keras library

19
00:00:53,633 --> 00:00:58,166
from which we're going to get access
to the preprocessing module,

20
00:00:58,166 --> 00:01:02,500
and from there
to the image submodule.

21
00:01:02,933 --> 00:01:06,266
And the reason why we want to import
this is because we want to import

22
00:01:06,266 --> 00:01:10,800
a specific class,
which is ImageDataGenerator.

23
00:01:11,133 --> 00:01:13,333
And I will explain very quickly
what this is about.

24
00:01:13,333 --> 00:01:17,866
But this is absolutely compulsory in Part 1,
Data Preprocessing,

25
00:01:17,866 --> 00:01:20,500
you know, when preprocessing your images.

26
00:01:20,500 --> 00:01:21,500
So let's import it.

27
00:01:21,500 --> 00:01:27,466
Here we just need to add import
and then ImageDataGenerator.

28
00:01:28,500 --> 00:01:30,000
Okay, good.

29
00:01:30,000 --> 00:01:33,333
I will explain very soon
what this is about and how we will use it.

30
00:01:33,566 --> 00:01:34,300
All right.

31
00:01:34,300 --> 00:01:37,033
And then, you know, there's this
other thing that I like doing,

32
00:01:37,033 --> 00:01:40,533
just to show you that indeed
we are working with TensorFlow 2.0.

33
00:01:40,800 --> 00:01:44,800
I just want to print the version
of TensorFlow we're using right now.

34
00:01:44,900 --> 00:01:47,900
And remember, to do this
we need to call tf first.

35
00:01:48,033 --> 00:01:52,166
And then, after a dot, a double underscore,
then version,

36
00:01:52,266 --> 00:01:53,800
and double underscore again.

37
00:01:53,800 --> 00:01:56,900
And this will print,
in the output,

38
00:01:56,900 --> 00:01:59,000
the version of TensorFlow we're using.

39
00:01:59,000 --> 00:02:01,866
This is just to make sure
we're working with TensorFlow 2.0.

40
00:02:01,866 --> 00:02:03,500
However, depending on

41
00:02:03,500 --> 00:02:06,233
when you run this code,
you know, after I recorded this tutorial,

42
00:02:06,233 --> 00:02:09,800
you might have a different version,
but you will definitely get a TensorFlow

43
00:02:09,833 --> 00:02:11,800
2 version. Okay.
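For reference, here is a sketch of that first cell. It assumes a TensorFlow 2.x install in which `keras.preprocessing.image.ImageDataGenerator` can still be imported; the class is deprecated in recent TensorFlow releases, so whether the import works depends on your version. The TensorFlow lines are kept as comments so the runnable part, a hypothetical helper `is_tensorflow_2`, works without TensorFlow: it only checks the major version in a string like the one `tf.__version__` prints.

```python
# The notebook cell itself (needs TensorFlow installed; the ImageDataGenerator
# import may fail on newer Keras versions where the class is unavailable):
#   import tensorflow as tf
#   from keras.preprocessing.image import ImageDataGenerator
#   print(tf.__version__)

def is_tensorflow_2(version: str) -> bool:
    """Hypothetical helper: True if a version string such as '2.15.0' is from the 2.x line."""
    major = version.strip().split(".")[0]
    return major == "2"


print(is_tensorflow_2("2.15.0"))  # True
print(is_tensorflow_2("1.15.5"))  # False
```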

44
00:02:11,800 --> 00:02:12,166
All right.

45
00:02:12,166 --> 00:02:14,233
So here I just execute the first cell

46
00:02:14,233 --> 00:02:18,066
importing TensorFlow and the image
preprocessing module of Keras.

47
00:02:18,233 --> 00:02:22,966
And now let's run this
to indeed reassure ourselves

48
00:02:22,966 --> 00:02:26,766
that we are working with TensorFlow 2.0,
which is

49
00:02:26,766 --> 00:02:29,600
so much better than TensorFlow 1.

50
00:02:29,600 --> 00:02:30,733
All right. Good.

51
00:02:30,733 --> 00:02:33,600
So now we can move on to Part 1,
Data Preprocessing,

52
00:02:33,600 --> 00:02:35,366
which will be done in two steps.

53
00:02:35,366 --> 00:02:37,433
First, preprocessing the training set.

54
00:02:37,433 --> 00:02:40,300
And second, preprocessing the test set.

55
00:02:40,300 --> 00:02:41,866
So let's start with the training set.

56
00:02:41,866 --> 00:02:43,833
And let's create a new code cell.

57
00:02:43,833 --> 00:02:46,966
And now let me explain
how we're going to do this.

58
00:02:47,633 --> 00:02:48,000
All right.

59
00:02:48,000 --> 00:02:51,466
So how are we going to preprocess
our images?

60
00:02:51,833 --> 00:02:54,600
Well we're actually going to do
multiple things.

61
00:02:54,600 --> 00:02:58,966
The first thing we'll do is
we will apply some transformations

62
00:02:58,966 --> 00:03:01,766
on all the images of the training set.

63
00:03:01,766 --> 00:03:03,700
The images of the training set only.

64
00:03:03,700 --> 00:03:06,700
We won't apply these same transformations
on the test set.

65
00:03:06,733 --> 00:03:09,933
The reason why we want to apply
some transformations

66
00:03:09,933 --> 00:03:13,533
on the images of the training set
is for only one purpose.

67
00:03:13,733 --> 00:03:16,166
It is to avoid overfitting.

68
00:03:16,166 --> 00:03:18,900
Indeed,
if we don't apply these transformations

69
00:03:18,900 --> 00:03:22,466
well,
when training our CNN on the training set,

70
00:03:22,666 --> 00:03:27,033
we will get a huge difference
between the accuracy on the training set

71
00:03:27,033 --> 00:03:30,200
and the one on the test set,
you know, on the evaluation set.

72
00:03:30,433 --> 00:03:33,866
Actually, we will get very high accuracies
on the training set,

73
00:03:33,866 --> 00:03:38,333
you know, close to 98%
and much lower accuracies on the test set.

74
00:03:38,600 --> 00:03:40,233
And that is called overfitting.

75
00:03:40,233 --> 00:03:43,233
And that's something
we absolutely need to avoid.

76
00:03:43,300 --> 00:03:44,700
Anyway, you know, whether you're

77
00:03:44,700 --> 00:03:49,133
working on a classic dataset or on
computer vision, you need to avoid it. And for computer

78
00:03:49,133 --> 00:03:52,266
vision, well, one way to avoid overfitting

79
00:03:52,400 --> 00:03:55,400
is, as I said, to apply transformations.

80
00:03:55,700 --> 00:03:56,600
So that was the why.

81
00:03:56,600 --> 00:04:01,000
And now let me explain the what:
what these transformations are.

82
00:04:01,000 --> 00:04:04,100
And then I will proceed, in the end, to the
how: how we are going to implement them.

83
00:04:04,500 --> 00:04:07,500
So, the what:
what are these transformations?

84
00:04:07,633 --> 00:04:10,733
Well,
some simple geometrical transformations

85
00:04:10,733 --> 00:04:14,766
or some zooms
or some rotations on your images.

86
00:04:14,966 --> 00:04:18,466
So basically we're going to apply
some geometrical transformations

87
00:04:18,466 --> 00:04:21,766
like translations
to shift some of the pixels.

88
00:04:22,200 --> 00:04:24,333
Then we're going to rotate
the images a bit.

89
00:04:24,333 --> 00:04:26,366
We're going to do some horizontal flips.

90
00:04:26,366 --> 00:04:28,800
We're going to do some zoom
in and zoom out.

91
00:04:28,800 --> 00:04:32,933
Well, you know, we're going to apply
a series of transformations so as to modify

92
00:04:32,933 --> 00:04:36,233
the images
and get them, as we say, augmented.

93
00:04:36,466 --> 00:04:38,300
In fact, the technical term

94
00:04:38,300 --> 00:04:41,300
of what we're going to do now,
you know, with all these transformations

95
00:04:41,300 --> 00:04:46,333
is called image augmentation,
which consists basically of transforming

96
00:04:46,333 --> 00:04:51,666
the images of your training set
so that your CNN model doesn't overlearn,

97
00:04:51,666 --> 00:04:54,966
you know, so that it's not
overtrained on the existing images,

98
00:04:54,966 --> 00:04:58,800
because by applying these transformations,
we will get new images,

99
00:04:59,000 --> 00:05:02,100
which is the reason why we call this image
augmentation.

100
00:05:02,100 --> 00:05:06,166
We basically augment the variety,
you know, the diversity of the training

101
00:05:06,166 --> 00:05:07,533
set images.
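To make "transforming the images" concrete, here is a minimal pure-Python sketch on a tiny 3x3 grayscale image stored as nested lists. It shows two of the geometric transformations mentioned above, a horizontal flip and a one-pixel translation. This is an illustration of the idea only, not how Keras implements it: the helper names are hypothetical, and filling the vacated column with 0 is my own choice (Keras's `ImageDataGenerator` has a `fill_mode` argument that defaults to `'nearest'`).

```python
# A tiny 3x3 grayscale "image": each number is a pixel intensity in [0, 255].
image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]

def horizontal_flip(img):
    """Mirror the image left-to-right by reversing each row."""
    return [row[::-1] for row in img]

def shift_right(img, pixels=1, fill=0):
    """Translate the image to the right, padding the vacated columns with `fill`."""
    width = len(img[0])
    return [[fill] * pixels + row[: width - pixels] for row in img]

flipped = horizontal_flip(image)
shifted = shift_right(image)
print(flipped)  # [[30, 20, 10], [60, 50, 40], [90, 80, 70]]
print(shifted)  # [[0, 10, 20], [0, 40, 50], [0, 70, 80]]
```

Each transformed copy is a "new" training image with the same label, which is what increases the diversity of the training set.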

102
00:05:07,533 --> 00:05:09,233
All right. So that is the what.

103
00:05:09,233 --> 00:05:12,566
And now we're going to proceed to the how
and to proceed to the how.

104
00:05:12,566 --> 00:05:16,400
I'm going to take you to the Keras API
because you have to see it.

105
00:05:16,600 --> 00:05:18,266
You know,
just like what we did with scikit

106
00:05:18,266 --> 00:05:21,366
learn, I'm going to show you
and guide you through the Keras API

107
00:05:21,500 --> 00:05:24,500
to find the exact tool
we're going to use for this.

108
00:05:24,733 --> 00:05:27,433
So let's open a new tab here.

109
00:05:27,433 --> 00:05:28,066
There we go.

110
00:05:28,066 --> 00:05:32,233
And in the search bar
let's just enter Keras, like that.

111
00:05:32,400 --> 00:05:33,366
Let's press enter.

112
00:05:33,366 --> 00:05:35,233
And let's just get the first link.

113
00:05:35,233 --> 00:05:36,600
There is only one Keras.

114
00:05:36,600 --> 00:05:39,600
And that's of course
the deep learning library in Python

115
00:05:40,100 --> 00:05:41,533
developed by François Chollet.

116
00:05:41,533 --> 00:05:44,866
By the way,
a very talented French data scientist.

117
00:05:45,300 --> 00:05:45,600
All right.

118
00:05:45,600 --> 00:05:48,600
So let's go now to API docs.

119
00:05:48,900 --> 00:05:52,200
And now, my friends,
welcome to the Keras API.

120
00:05:52,233 --> 00:05:55,233
This is probably my favorite
deep learning library.

121
00:05:55,233 --> 00:05:57,100
It's absolutely fantastic.

122
00:05:57,100 --> 00:05:58,233
And now where we want to go is

123
00:05:58,233 --> 00:06:02,300
of course to data preprocessing
which includes of course three things.

124
00:06:02,300 --> 00:06:03,633
Actually, you should know it:

125
00:06:03,633 --> 00:06:07,333
image data preprocessing, which is what
we're about to use right now, but then

126
00:06:07,333 --> 00:06:11,533
also time series data preprocessing
and also text data preprocessing.

127
00:06:11,533 --> 00:06:16,633
You can also do some deep NLP,
you know, NLP with deep learning, with Keras.

128
00:06:17,100 --> 00:06:20,733
But now of course we're looking
for something in image data preprocessing.

129
00:06:20,933 --> 00:06:23,966
And let me show you
exactly what that something is.

130
00:06:24,200 --> 00:06:25,300
We just need to scroll down.

131
00:06:25,300 --> 00:06:27,100
Well actually
you already know what this is

132
00:06:27,100 --> 00:06:29,966
because we already imported the class,
but there it is.

133
00:06:29,966 --> 00:06:35,700
I'm talking, of course, about the
ImageDataGenerator class, which will indeed

134
00:06:35,800 --> 00:06:40,866
generate batches of tensor image data
with real-time data augmentation,

135
00:06:40,866 --> 00:06:45,200
which is exactly what I've just explained,
although I haven't mentioned the batches yet.

136
00:06:45,200 --> 00:06:46,166
Well, that's because, you know,

137
00:06:46,166 --> 00:06:50,133
we will create different batches
of actually 32 images.

138
00:06:50,500 --> 00:06:53,833
And these images will
either be the original ones or, you know,

139
00:06:53,833 --> 00:06:57,933
the augmented ones, the transformed ones
after we apply the transformations.
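The batching idea can be sketched in plain Python: split the training images into consecutive groups of 32, where the last group holds whatever is left over. The helper and the 100 file names below are hypothetical and only show the batch sizes; the real generator also shuffles by default and applies fresh random transformations each time it produces a batch.

```python
def make_batches(items, batch_size=32):
    """Split a sequence into consecutive batches; the last batch may be smaller."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]

# 100 hypothetical image file names, just to show the batch sizes.
files = [f"img_{i:03d}.jpg" for i in range(100)]
batches = make_batches(files)
print([len(b) for b in batches])  # [32, 32, 32, 4]
```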

140
00:06:58,533 --> 00:07:01,066
And speaking of applying
these transformations, well,

141
00:07:01,066 --> 00:07:04,066
we're going to do that
exactly with this ImageDataGenerator

142
00:07:04,066 --> 00:07:07,666
class, for which
you will find all the arguments here.

143
00:07:07,666 --> 00:07:11,233
And, you know, most of them correspond
to different transformations.

144
00:07:11,466 --> 00:07:14,533
I can already tell you
that we will use the zoom range,

145
00:07:14,533 --> 00:07:18,566
which consists of zooming in or
zooming out on the images, but also we'll

146
00:07:18,566 --> 00:07:23,033
use the horizontal flip, which consists
of flipping the images horizontally.

147
00:07:23,333 --> 00:07:26,500
And then we will also use this one,
the shear range,

148
00:07:26,633 --> 00:07:28,233
which is some kind of transvection.

149
00:07:28,233 --> 00:07:31,733
You can check it online, but there's no need
to understand it in all the details.

150
00:07:31,733 --> 00:07:34,566
Just know
that it's a geometrical transformation.

151
00:07:34,566 --> 00:07:38,233
And, if you want to go further, that it's
actually some kind of transvection.

152
00:07:38,500 --> 00:07:39,466
But there we go.

153
00:07:39,466 --> 00:07:41,466
These are the three transformations.

154
00:07:41,466 --> 00:07:45,800
We'll use the shear range, the zoom range
and the horizontal flip.
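For readers who want to go a bit further on the shear: geometrically, a horizontal shear (a transvection) slides each point sideways in proportion to its height, x' = x + tan(θ)·y and y' = y. The sketch below applies that map to the corners of a unit square. It is a generic illustration, not Keras's internal code, and the function name and the 45° angle are my own choices; how Keras turns the `shear_range` value into an angle is not shown here.

```python
import math

def shear_point(x, y, angle_degrees):
    """Apply a horizontal shear (a transvection): x' = x + tan(angle) * y, y' = y."""
    k = math.tan(math.radians(angle_degrees))
    return x + k * y, y

# Corners of a unit square; the shear slants it into a parallelogram.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
sheared = [shear_point(x, y, 45) for x, y in square]
print(sheared)
```

The bottom edge (y = 0) stays put while the top edge moves right by about one unit, so the square becomes a parallelogram: the same content, seen at a slant.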

155
00:07:46,066 --> 00:07:50,133
And now I'm sure some of you are asking
why do we use these transformations?

156
00:07:50,400 --> 00:07:52,033
Well, I'll be honest with you.

157
00:07:52,033 --> 00:07:56,566
The reason I'm using them is
because I simply took, you know, the code

158
00:07:56,566 --> 00:08:01,433
snippet example from Keras,
which is right below exactly here.

159
00:08:01,733 --> 00:08:06,666
This is the code snippet example
using the ImageDataGenerator class.

160
00:08:06,666 --> 00:08:09,666
And as you can see,
we use a shearing transformation,

161
00:08:09,666 --> 00:08:13,000
a zoom transformation,
and a horizontal flip transformation.

162
00:08:13,000 --> 00:08:14,566
And we're just going to do the same.

163
00:08:14,566 --> 00:08:17,500
But of course feel free
to try some other transformations.

164
00:08:17,500 --> 00:08:20,533
Who knows, maybe you'll get better
accuracy in the end.

165
00:08:20,600 --> 00:08:22,366
Okay. But let's just trust this.

166
00:08:22,366 --> 00:08:26,166
And actually I trust this because
of course I tried it on our future CNN,

167
00:08:26,166 --> 00:08:27,200
which we're about to build.

168
00:08:27,200 --> 00:08:30,833
And you're going to see that the results
in the end will be absolutely amazing.

169
00:08:31,033 --> 00:08:32,633
Okay. So let's just take this.

170
00:08:32,633 --> 00:08:35,933
Let's just take this code snippet to,
you know,

171
00:08:35,933 --> 00:08:39,566
actually get the tool
that will apply these transformations.

172
00:08:39,833 --> 00:08:42,833
Then of course we'll have to connect
the tool to our training set.

173
00:08:43,166 --> 00:08:45,266
So back into our implementation.

174
00:08:45,266 --> 00:08:48,000
Well, let's paste that right here.

175
00:08:48,000 --> 00:08:51,933
And this, as you can see, creates an object
which we call train_datagen,

176
00:08:51,966 --> 00:08:55,133
of the ImageDataGenerator
class.

177
00:08:55,133 --> 00:08:59,466
So train_datagen is an instance
of that ImageDataGenerator class,

178
00:08:59,466 --> 00:09:02,700
and it represents, of course,
the tool that will apply

179
00:09:02,700 --> 00:09:06,566
all the transformations
on the images of the training set.

180
00:09:06,866 --> 00:09:08,533
And there is one I haven't mentioned.

181
00:09:08,533 --> 00:09:12,233
You know, I mentioned and explained these
three, which are the transformations.

182
00:09:12,566 --> 00:09:14,333
But we also notice this one.

183
00:09:14,333 --> 00:09:17,800
Rescale equals one divided by 255.

184
00:09:18,133 --> 00:09:20,100
Can you guess what this is about?

185
00:09:20,100 --> 00:09:23,900
You know, we already saw this many times
on our classic data set.

186
00:09:24,100 --> 00:09:26,933
Well,
this is of course about feature scaling.

187
00:09:26,933 --> 00:09:31,966
This will apply feature scaling to each
and every single one of your pixels

188
00:09:32,166 --> 00:09:35,366
by dividing their value by 255.

189
00:09:35,366 --> 00:09:40,733
Because remember that each pixel
takes a value between 0 and 255.

190
00:09:40,733 --> 00:09:45,900
So by dividing all of them by 255,
we indeed get all the pixel values

191
00:09:46,066 --> 00:09:49,366
between 0 and 1,
which is just like a normalization.

192
00:09:49,600 --> 00:09:53,166
And once again, feature scaling
is absolutely compulsory

193
00:09:53,166 --> 00:09:56,400
for neural networks,
you know, in training neural networks.
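The arithmetic behind `rescale=1./255` is easy to check by hand. This small sketch multiplies a few example pixel values by the same factor, which maps the range 0 to 255 onto 0 to 1. The pixel values are arbitrary examples.

```python
RESCALE = 1.0 / 255  # the same factor as rescale=1./255

pixels = [0, 51, 128, 255]
scaled = [p * RESCALE for p in pixels]
print([round(s, 3) for s in scaled])  # [0.0, 0.2, 0.502, 1.0]
```

Black (0) stays at 0, white (255) becomes 1, and everything in between lands proportionally inside that interval.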

194
00:09:56,700 --> 00:09:59,133
All right.
So basically this is feature scaling.

195
00:09:59,133 --> 00:10:02,466
And these are the transformations
that will perform

196
00:10:02,666 --> 00:10:05,900
image augmentation
on the images of the training set.

197
00:10:05,900 --> 00:10:10,100
And this, I remind you, is in order to prevent
overfitting. In the end,

198
00:10:10,100 --> 00:10:13,566
you can actually try running the future
training without these,

199
00:10:13,566 --> 00:10:16,266
And you will see what I mean
by overfitting.

200
00:10:16,266 --> 00:10:18,500
All right. Good. So that's not it.

201
00:10:18,500 --> 00:10:19,100
That's not it.

202
00:10:19,100 --> 00:10:22,366
You know, for the training set
preprocessing, we need to of course now

203
00:10:22,366 --> 00:10:26,933
connect that train_datagen
object to our training set.

204
00:10:26,933 --> 00:10:28,566
You know to our training set images.

205
00:10:28,566 --> 00:10:30,633
So far this is just the object.

206
00:10:30,633 --> 00:10:34,166
And so the way we're going to do
this is we will

207
00:10:34,166 --> 00:10:37,166
of course go back to our Keras API.

208
00:10:37,366 --> 00:10:41,600
Because indeed the way to do
this is just to take this next code here

209
00:10:41,833 --> 00:10:45,500
that will actually import the training set

210
00:10:45,733 --> 00:10:48,733
by accessing it from,
you know, our directory.

211
00:10:49,000 --> 00:10:52,566
And at the same time
creating these batches and resizing

212
00:10:52,700 --> 00:10:56,166
the images, you know, in case
we need to resize them in order to

213
00:10:56,333 --> 00:11:00,000
reduce the computations of the machine,
you know, to make it less compute

214
00:11:00,000 --> 00:11:03,200
intensive, which is what we'll do,
because we will see that

215
00:11:03,200 --> 00:11:06,233
with a lower size
we'll still get amazing results in the end.

216
00:11:06,633 --> 00:11:07,600
So let's get this.

217
00:11:07,600 --> 00:11:10,600
And once again
I will explain all this code.

218
00:11:10,600 --> 00:11:13,700
And mostly
we will have to change it the right way

219
00:11:13,700 --> 00:11:16,700
so that we can adapt
it indeed to our situation.

220
00:11:17,133 --> 00:11:19,133
So let's take it step by step.

221
00:11:19,133 --> 00:11:22,333
This is actually the name
you want to give to

222
00:11:22,366 --> 00:11:25,433
your training set,
which you are importing in the notebook.

223
00:11:25,666 --> 00:11:27,566
And let's just give the usual names.

224
00:11:27,566 --> 00:11:32,100
We're going to call that training
underscore set just like before.

225
00:11:32,500 --> 00:11:35,833
Then we take indeed our train_datagen
object,

226
00:11:35,833 --> 00:11:38,833
that instance of the ImageDataGenerator
class.

227
00:11:38,866 --> 00:11:43,733
And from this object we're going to call
a method of this class.

228
00:11:43,733 --> 00:11:46,633
Right? Because this class,
like every class, contains methods.

229
00:11:46,633 --> 00:11:50,366
And one of them is this flow_from_directory
method, which will just simply,

230
00:11:50,533 --> 00:11:55,733
you know, connect this image augmentation
tool to the images of your training set.

231
00:11:56,333 --> 00:11:56,800
All right.

232
00:11:56,800 --> 00:11:59,433
Then let's have a look
at the different parameters.

233
00:11:59,433 --> 00:12:05,300
So the first one here is actually
the path leading to your training set.

234
00:12:05,633 --> 00:12:09,233
And so of course we have to change this
because we have a different path

235
00:12:09,233 --> 00:12:10,600
to our data set.

236
00:12:10,600 --> 00:12:12,100
So this is a whole folder

237
00:12:12,100 --> 00:12:15,100
which I've shared with you
at the beginning of this section.

238
00:12:15,133 --> 00:12:17,033
And this is also the root folder.

239
00:12:17,033 --> 00:12:18,433
You know this is the base of the folder.

240
00:12:18,433 --> 00:12:20,300
You know the beginning of the path.

241
00:12:20,300 --> 00:12:23,100
And so now in order
to access the training set

242
00:12:23,100 --> 00:12:26,233
well, we first need to specify
that we want to go into this dataset

243
00:12:26,233 --> 00:12:29,233
folder
and then into this training set folder.

244
00:12:29,233 --> 00:12:32,533
And that's exactly the path
leading to the training set.

245
00:12:32,833 --> 00:12:36,233
And therefore here,
in this parameter of the flow_from_directory

246
00:12:36,233 --> 00:12:37,666
method,

247
00:12:37,666 --> 00:12:41,266
well, we simply need to replace data here
by dataset

248
00:12:41,466 --> 00:12:44,466
and then train here by training_set.

249
00:12:44,700 --> 00:12:45,000
All right.

250
00:12:45,000 --> 00:12:48,700
This is a simple path
leading to the training set folder, starting

251
00:12:48,700 --> 00:12:52,433
from the root of our project folder.
Okay, good.
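It may help to see what `flow_from_directory` expects to find at that path: one subfolder per class. According to the Keras documentation, each subdirectory is treated as a separate class and the classes are ordered alphanumerically when mapped to label indices (the resulting mapping is available on the generator as `class_indices`). The sketch below mimics that rule with a hypothetical helper on a temporary folder; the subfolder names `cats` and `dogs` are assumptions for illustration.

```python
import tempfile
from pathlib import Path

def infer_class_indices(directory):
    """Hypothetical helper: map each subfolder name to an integer label, sorted alphanumerically."""
    names = sorted(p.name for p in Path(directory).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(names)}

with tempfile.TemporaryDirectory() as root:
    # Assumed layout: <root>/dataset/training_set/<class>/<images...>
    training_dir = Path(root) / "dataset" / "training_set"
    for class_name in ("dogs", "cats"):
        (training_dir / class_name).mkdir(parents=True)
    class_indices = infer_class_indices(training_dir)

print(class_indices)  # {'cats': 0, 'dogs': 1}
```

Creating the folders in the order dogs, then cats, shows that the labels depend on the sorted names, not on creation order.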

252
00:12:52,600 --> 00:12:54,900
Now, the next argument: target size.

253
00:12:54,900 --> 00:12:58,166
That's indeed
the final size of your images

254
00:12:58,166 --> 00:13:02,233
when they, you know, will be fed
into the convolutional neural network.

255
00:13:02,633 --> 00:13:06,266
And actually I tried with 150 by 150.

256
00:13:06,500 --> 00:13:09,233
And that actually made the training
very, very long.

257
00:13:09,233 --> 00:13:14,500
So I actually wanted to reduce
that to, you know, 64 by

258
00:13:15,466 --> 00:13:17,666
64. And that's totally fine.

259
00:13:17,666 --> 00:13:19,566
This will make the training much faster.

260
00:13:19,566 --> 00:13:21,800
And still we will have amazing results.

261
00:13:21,800 --> 00:13:23,366
You'll see that at the end.
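A quick way to see why 64 by 64 trains faster is to count the input values per image. This sketch assumes RGB images with 3 channels, which is the `color_mode` default of `flow_from_directory`; the helper name is hypothetical.

```python
def values_per_image(height, width, channels=3):
    """Number of input values the network sees for one colour image."""
    return height * width * channels

large = values_per_image(150, 150)  # 67500
small = values_per_image(64, 64)    # 12288
print(large, small, round(large / small, 2))  # 67500 12288 5.49
```

Each 64 by 64 image carries roughly 5.5 times fewer values than a 150 by 150 one, so every convolution pass has proportionally less work to do.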

262
00:13:23,366 --> 00:13:26,666
Then the batch size is,
you know, the size of the batches, meaning

263
00:13:26,666 --> 00:13:29,200
how many images
we want to have in each batch.

264
00:13:29,200 --> 00:13:31,500
And 32 is a classic default value.

265
00:13:31,500 --> 00:13:33,833
And we're going to keep that.
That will be totally fine.

266
00:13:33,833 --> 00:13:34,600
And finally

267
00:13:34,600 --> 00:13:38,966
we have to specify the class mode,
which is either binary or categorical.

268
00:13:39,300 --> 00:13:42,866
And of course since now
we have a binary outcome, you know, cat

269
00:13:42,866 --> 00:13:46,800
or dog, well, we have to choose of course
class mode equals binary.
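Putting the training-set pieces together, here is a sketch of the cell as described. Several details are assumptions: the `0.2` shear and zoom values are those of the Keras documentation example as I recall it, the default path `dataset/training_set` assumes the folder layout discussed above, and the import path may differ (or the deprecated class may be missing) depending on your TensorFlow/Keras version. The code is wrapped in a hypothetical function with a local import so that defining it does not require TensorFlow; calling it does, and it also needs the images on disk.

```python
def build_training_set(directory="dataset/training_set"):
    """Sketch of the training-set cell; requires TensorFlow/Keras and the dataset on disk."""
    # Imported here so that merely defining this function does not need TensorFlow.
    from keras.preprocessing.image import ImageDataGenerator

    # Augmentation (shear, zoom, horizontal flip) plus feature scaling, training set only.
    train_datagen = ImageDataGenerator(
        rescale=1.0 / 255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
    )
    # Reads images from one subfolder per class, resizes them to 64x64 and
    # yields batches of 32 with 0/1 labels for the two-class problem.
    return train_datagen.flow_from_directory(
        directory,
        target_size=(64, 64),
        batch_size=32,
        class_mode="binary",
    )
```

In the notebook you would then write `training_set = build_training_set()`, or simply run the body at top level as in the lecture.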

270
00:13:47,233 --> 00:13:48,533
Okay, perfect.

271
00:13:48,533 --> 00:13:52,133
And that closes
the preprocessing of the training set.

272
00:13:52,133 --> 00:13:55,266
We are done with this
first step of data preprocessing.

273
00:13:55,266 --> 00:14:00,000
And so now we're going to move on to
the next step: preprocessing the test set.

274
00:14:00,366 --> 00:14:04,200
And of course, in the spirit of always
being as efficient as we can,

275
00:14:04,233 --> 00:14:07,200
well,
we're going to go back to our Keras API.

276
00:14:07,200 --> 00:14:10,433
And we're just going to take this time,
this line of code

277
00:14:10,433 --> 00:14:14,266
to, you know, get that same image data
generator

278
00:14:14,266 --> 00:14:18,133
object to, you know, apply
the transformations to the test images.

279
00:14:18,366 --> 00:14:19,833
But be careful.

280
00:14:19,833 --> 00:14:22,200
We're not going to apply
the same transformations here,

281
00:14:22,200 --> 00:14:24,900
such as the shearing, the zoom
and the horizontal flip,

282
00:14:24,900 --> 00:14:27,666
because of course
we don't want to touch the test images

283
00:14:27,666 --> 00:14:31,466
because they're like new images, like when
deploying our model in production.

284
00:14:31,633 --> 00:14:35,433
And therefore, of course, we have to
keep them intact like the original ones.

285
00:14:35,733 --> 00:14:41,200
However, what we have to do to them
is indeed to rescale their pixels.

286
00:14:41,200 --> 00:14:43,133
And that's the same as before, you know,

287
00:14:43,133 --> 00:14:47,366
remember when we were applying feature
scaling to our training set and test set?

288
00:14:47,566 --> 00:14:50,833
Well, we used the fit_transform
method on the training set,

289
00:14:50,833 --> 00:14:53,833
but only the transform
method on the test set.

290
00:14:54,000 --> 00:14:57,633
And that was of course to avoid
information leakage from the test set.

291
00:14:57,966 --> 00:14:59,800
And well, here that's exactly the same.

292
00:14:59,800 --> 00:15:00,566
We have to keep

293
00:15:00,566 --> 00:15:04,466
the images of the test set intact
by not applying any transformation.

294
00:15:04,666 --> 00:15:08,566
However, we have to feature scale them
because, once again, the future predict

295
00:15:08,566 --> 00:15:12,566
method of the CNN
will have to be applied to the same scale

296
00:15:12,833 --> 00:15:15,466
as the one that was applied
on the training set.

297
00:15:15,466 --> 00:15:17,300
So you see,
this is exactly the same as before.

298
00:15:17,300 --> 00:15:17,766
It's just that

299
00:15:17,766 --> 00:15:22,200
we are using some different classes,
but which are, after all, the same tools.
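The asymmetry between the two pipelines can be sketched in plain Python: training images get a random transformation (here only a horizontal flip, to keep it short) and then rescaling, while test images get the rescaling only, so they stay deterministic like unseen images in production. The function names are hypothetical, and this is an illustration of the principle, not Keras code.

```python
import random

RESCALE = 1.0 / 255

def preprocess_train(img, rng):
    """Training images: random horizontal flip (augmentation), then rescaling."""
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]
    return [[p * RESCALE for p in row] for row in img]

def preprocess_test(img):
    """Test images: rescaling only, so they stay like unseen production images."""
    return [[p * RESCALE for p in row] for row in img]

image = [[0, 255], [51, 102]]
rng = random.Random(0)
print(preprocess_test(image) == preprocess_test(image))  # True: always the same output
print([preprocess_train(image, rng) for _ in range(3)])  # each may or may not be flipped
```

The rescaling factor is identical in both functions, which is the point made above: the model must see test pixels on the same scale as training pixels.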

300
00:15:22,700 --> 00:15:23,000
All right.

301
00:15:23,000 --> 00:15:24,333
So let's get this and let's

302
00:15:24,333 --> 00:15:27,900
put that back into our implementation
in a new code cell.

303
00:15:28,200 --> 00:15:29,600
So we're going to paste that.

304
00:15:29,600 --> 00:15:31,866
We're going to keep the same name
for the object.

305
00:15:31,866 --> 00:15:33,066
That's totally fine.

306
00:15:33,066 --> 00:15:37,733
And then well same we're going to go back
to this and we're going to get exactly

307
00:15:37,733 --> 00:15:41,200
this which will actually import the test

308
00:15:41,200 --> 00:15:44,200
set images into our notebook.

309
00:15:44,266 --> 00:15:44,600
All right.

310
00:15:44,600 --> 00:15:46,366
So let's paste that.

311
00:15:46,366 --> 00:15:48,366
And now let's do the required change.

312
00:15:48,366 --> 00:15:51,966
Actually please press pause on the video
and do the changes yourself.

313
00:15:51,966 --> 00:15:54,000
I'm sure you're going to 
do this successfully

314
00:15:54,000 --> 00:15:56,466
because this is exactly
the same as before.

315
00:15:56,466 --> 00:15:59,133
All right. So now let's do it together.

316
00:15:59,133 --> 00:16:01,833
The first thing I would like to do
is just change its name,

317
00:16:01,833 --> 00:16:05,333
which is exactly the name of the variable
that will contain the test set.

318
00:16:05,500 --> 00:16:09,800
And just to be consistent with before,
well, let's just call it test_set.

319
00:16:11,000 --> 00:16:11,833
All right.

320
00:16:11,833 --> 00:16:13,800
So, test_set. Then this is correct.

321
00:16:13,800 --> 00:16:17,000
We call our test_datagen here,
which will only apply

322
00:16:17,033 --> 00:16:20,033
feature scaling
to the pixels of the test images.

323
00:16:20,133 --> 00:16:21,900
Then we call that same function

324
00:16:21,900 --> 00:16:25,500
flow_from_directory
to access the test set from our directory.

325
00:16:25,800 --> 00:16:29,100
And here, once again,
we need to replace data here by dataset

326
00:16:29,400 --> 00:16:32,700
and then validation by, you know, remember:

327
00:16:33,666 --> 00:16:36,500
now we want to get the path
that leads to the test set.

328
00:16:36,500 --> 00:16:39,633
And therefore that's first dataset
and then test_set.

329
00:16:39,866 --> 00:16:40,500
All right.

330
00:16:40,500 --> 00:16:45,633
So here we just need to replace validation
by test_set.

331
00:16:46,100 --> 00:16:46,966
Good.

332
00:16:46,966 --> 00:16:49,800
Then of course
we need to have the same target size.

333
00:16:49,800 --> 00:16:53,200
Because basically the predict
method has to be called on the exact

334
00:16:53,200 --> 00:16:56,733
same format as the one that was used
for the images of the training set.

335
00:16:56,733 --> 00:17:00,533
So here we need to get the same size
as in the training set.

336
00:17:00,533 --> 00:17:06,566
Therefore, 64 by 64
and the same batch size.

337
00:17:06,566 --> 00:17:10,433
Basically, our model will be evaluated
on batches of 32 images.

338
00:17:10,700 --> 00:17:13,500
And of course the same class mode:
binary.
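The matching test-set cell can be sketched the same way. Only rescaling is applied, and the target size, batch size and class mode mirror the training set. As before, the default path `dataset/test_set` is an assumption about the folder layout, the import path depends on your TensorFlow/Keras version, and the hypothetical function wrapper with a local import means TensorFlow and the images are only needed when you call it.

```python
def build_test_set(directory="dataset/test_set"):
    """Sketch of the test-set cell; requires TensorFlow/Keras and the dataset on disk."""
    # Imported here so that merely defining this function does not need TensorFlow.
    from keras.preprocessing.image import ImageDataGenerator

    # Feature scaling only: no shear, zoom or flip on the test images.
    test_datagen = ImageDataGenerator(rescale=1.0 / 255)
    # Same 64x64 size, batch size and binary labels as the training set.
    return test_datagen.flow_from_directory(
        directory,
        target_size=(64, 64),
        batch_size=32,
        class_mode="binary",
    )
```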

339
00:17:13,500 --> 00:17:14,166
Good.

340
00:17:14,166 --> 00:17:15,066
Well there you go.

341
00:17:15,066 --> 00:17:17,433
We're done with data preprocessing.

342
00:17:17,433 --> 00:17:19,800
It was very different.
It was actually brand new.

343
00:17:19,800 --> 00:17:23,933
But we recognize some of the same
processing steps as what we did before.

344
00:17:24,666 --> 00:17:27,266
And so now I'm very excited
because we can move on

345
00:17:27,266 --> 00:17:30,900
to the exciting part,
which is about building the CNN.

346
00:17:31,200 --> 00:17:35,033
Yes, we're ready for part two now, which
we're going to tackle in several steps.

347
00:17:35,300 --> 00:17:37,633
And so make sure
to get good energy for this.

348
00:17:37,633 --> 00:17:40,866
And as soon as that's the case,
join me in the next tutorial

349
00:17:40,900 --> 00:17:43,900
to smash, this time,
Part 2, Building the CNN.

350
00:17:44,266 --> 00:17:46,200
And until then, enjoy machine learning.