1 00:00:00,630 --> 00:00:09,490 Now, in the last part of this project, we're going to use a pretrained model, which is named 2 00:00:09,490 --> 00:00:10,520 VGG16. 3 00:00:12,030 --> 00:00:15,390 This model performed very well in the 2014 ImageNet competition. 4 00:00:17,160 --> 00:00:20,600 However, we now have more advanced architectures also. 5 00:00:21,870 --> 00:00:30,250 But for this problem, we are going to consider a large CNN model which was trained on the ImageNet 6 00:00:30,300 --> 00:00:30,810 dataset. 7 00:00:32,070 --> 00:00:39,450 This ImageNet dataset had 1.4 million labeled images, and there were a thousand different classes 8 00:00:39,840 --> 00:00:44,020 into which the images were to be classified. 9 00:00:44,350 --> 00:00:48,960 It contained animal classes, including different species of cats and dogs. 10 00:00:49,740 --> 00:00:55,590 And thus we expect this model to perform well on the cats versus dogs classification problem 11 00:00:55,670 --> 00:00:56,100 also. 12 00:00:58,790 --> 00:01:01,770 So we'll use this VGG16 architecture. 13 00:01:02,800 --> 00:01:10,390 It was developed in 2014, and it is a simple and widely used convolutional network architecture 14 00:01:10,630 --> 00:01:12,100 for image problems. 15 00:01:14,640 --> 00:01:22,470 Although it is a little bit older and there are some more advanced and somewhat heavier 16 00:01:22,470 --> 00:01:29,160 recent models, this architecture is easier to understand, and that is why we are going to use 17 00:01:29,400 --> 00:01:31,620 this architecture for this particular problem. 18 00:01:35,370 --> 00:01:40,620 Also, this is one of the architectures which comes prepackaged with Keras. 19 00:01:41,610 --> 00:01:48,670 So we do not need to download the weights separately; Keras has these weights stored in it. 20 00:01:52,920 --> 00:01:59,130 So let's see how to use the convolutional layers of this pretrained model. 21 00:02:00,840 --> 00:02:02,410 Again, I created a new project.
22 00:02:02,670 --> 00:02:06,060 If you are working on the same project, you need not 23 00:02:06,170 --> 00:02:06,500 reload 24 00:02:07,110 --> 00:02:07,650 Keras, 25 00:02:07,800 --> 00:02:10,500 and you'll need not create these directories again. 26 00:02:11,490 --> 00:02:12,540 This is my new project 27 00:02:12,840 --> 00:02:13,590 where I'm going to do it. 28 00:02:22,920 --> 00:02:27,790 Now, to instantiate the VGG16 model, or to load it, 29 00:02:28,580 --> 00:02:31,000 we use this applications dot 30 00:02:31,180 --> 00:02:32,510 VGG16 function. 31 00:02:33,890 --> 00:02:40,070 This will load all the weights of all the layers in this VGG16 architecture 32 00:02:40,520 --> 00:02:48,890 as it was trained on the ImageNet dataset. The other parameter here is include_top; include_top refers 33 00:02:48,890 --> 00:02:54,380 to including or excluding the densely connected classifier 34 00:02:54,980 --> 00:02:56,120 on top of the network. 35 00:02:57,470 --> 00:03:07,070 So as I told you, VGG16 was actually used to classify images into a thousand classes. Since 36 00:03:07,140 --> 00:03:07,980 in our problem 37 00:03:08,150 --> 00:03:09,410 we have only two classes, 38 00:03:10,460 --> 00:03:14,060 the densely connected classifier is not required. 39 00:03:14,450 --> 00:03:18,590 So we'll have our own densely connected layer at the end of the model. 40 00:03:19,430 --> 00:03:23,690 We are just going to use the convolutional part of this model. 41 00:03:26,120 --> 00:03:31,850 The input_shape parameter is an optional parameter by which we are telling that the images that we are going to 42 00:03:31,880 --> 00:03:33,560 input are of this shape: 43 00:03:33,800 --> 00:03:37,130 150 by 150 pixels with three channels. 44 00:03:38,060 --> 00:03:44,630 However, if we do not give this input_shape, it will still accept shapes of any dimensions. 45 00:03:46,780 --> 00:03:49,660 So let's load the convolutional weights. 46 00:03:57,360 --> 00:03:57,710 Now, the weights are
47 00:03:58,290 --> 00:03:58,670 downloaded. 48 00:04:00,440 --> 00:04:02,180 It took nearly five to six minutes for me. 49 00:04:03,400 --> 00:04:08,990 Now, since the data is downloaded, we can look at how this convolutional base is structured. 50 00:04:10,220 --> 00:04:11,450 Let's run this 51 00:04:12,510 --> 00:04:16,110 and look at the architecture of this model. 52 00:04:18,090 --> 00:04:21,370 VGG16 is a big architecture. 53 00:04:22,620 --> 00:04:26,630 We have already discussed the entire architecture in an earlier lecture. 54 00:04:28,130 --> 00:04:31,560 It takes in an input of 150 by 150 by 3. 55 00:04:31,680 --> 00:04:34,080 This is the input shape that we have specified. 56 00:04:36,300 --> 00:04:41,510 Then there is a convolutional layer which is giving out 64 feature maps. 57 00:04:42,330 --> 00:04:45,710 Then there is another convolutional layer with 64 feature maps. 58 00:04:46,910 --> 00:04:52,080 Then there is a pooling layer, which is doing a two by two pooling, 59 00:04:52,800 --> 00:04:54,860 that is, reducing the height and width by two. 60 00:04:56,130 --> 00:04:57,660 Then there are two convolutional layers, 61 00:04:57,960 --> 00:04:59,250 again with a max pooling. 62 00:05:00,420 --> 00:05:04,080 Then again, three convolutional layers with one max pooling. 63 00:05:04,650 --> 00:05:07,330 Then three convolutional layers with one max pooling. 64 00:05:08,040 --> 00:05:11,650 And then there are three more convolutional layers with one max pooling. 65 00:05:12,750 --> 00:05:19,750 So at the end of it, we have 512 feature maps of four by four size. 66 00:05:20,560 --> 00:05:26,730 So it is a very small four by four matrix, with 512 features extracted. 67 00:05:29,660 --> 00:05:37,460 Overall, there are 14.7 million parameters. On the laptops that we use for day-to-day 68 00:05:37,460 --> 00:05:41,060 work, training such a model would not have been possible.
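The loading and summary steps described above can be sketched as follows. This is a minimal sketch with tf.keras; note that `weights=None` is used here only so the snippet builds the architecture without any download — in the lecture, `weights='imagenet'` is passed, which fetches the pretrained ImageNet weights on first use.

```python
from tensorflow.keras.applications import VGG16

# In the lecture weights='imagenet' is used, which downloads the
# pretrained ImageNet weights; weights=None builds the same
# architecture with random weights so this sketch runs offline.
conv_base = VGG16(weights=None,        # 'imagenet' for pretrained weights
                  include_top=False,   # drop the 1000-class dense classifier
                  input_shape=(150, 150, 3))

conv_base.summary()

# Five 2x2 max-pooling stages halve the spatial size each time:
# 150 -> 75 -> 37 -> 18 -> 9 -> 4, leaving 4 x 4 x 512 feature maps.
print(conv_base.output_shape)    # (None, 4, 4, 512)
print(conv_base.count_params())  # 14714688 parameters in the conv base
```

The parameter and shape counts match the 14.7 million parameters and the four-by-four, 512-feature output discussed above.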
69 00:05:42,230 --> 00:05:51,050 That is why, for routine jobs, using a pretrained model makes more sense than creating a new model 70 00:05:51,410 --> 00:05:54,770 with so many parameters and weights to be trained. 71 00:05:54,860 --> 00:06:01,600 In fact, if you are not running a GPU system, training such a model would not be possible on a 72 00:06:01,600 --> 00:06:02,360 CPU-based system. 73 00:06:04,520 --> 00:06:08,500 Now, this is only the convolutional base. After the base, 74 00:06:08,540 --> 00:06:12,020 we need to attach a fully connected classifier. 75 00:06:13,280 --> 00:06:17,690 That is, there should be a few layers of a normal neural network, 76 00:06:18,440 --> 00:06:24,680 and the final output layer should have only one neuron with a sigmoid activation function, 77 00:06:24,950 --> 00:06:28,940 telling us which binary class that image belongs to. 78 00:06:30,150 --> 00:06:34,680 So the output of this convolutional base is flattened. 79 00:06:36,660 --> 00:06:38,010 So we have a flattening layer. 80 00:06:39,460 --> 00:06:43,480 Then we have a hidden layer with 256 neurons 81 00:06:46,840 --> 00:06:53,170 and activation function relu. The output of these 256 neurons goes into only one neuron. 82 00:06:53,290 --> 00:06:58,930 And this neuron has a sigmoid activation function, which basically tells us the probability of belonging 83 00:06:58,930 --> 00:07:00,430 to one class. 84 00:07:02,910 --> 00:07:05,130 So we've run this also. 85 00:07:05,870 --> 00:07:13,380 And our model is now the entire convolutional base, which is the VGG16, 86 00:07:14,950 --> 00:07:16,660 plus the three layers that we have added. 87 00:07:19,130 --> 00:07:25,970 But note that all these 16.8 million parameters are right now trainable parameters. 88 00:07:27,610 --> 00:07:34,280 What we are going to do is we are going to fix the weights of the convolutional base. 89 00:07:34,610 --> 00:07:37,700 We are not going to retrain the convolutional base.
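The three layers described above can be attached on top of the convolutional base like this — a sketch under the same assumption as before, with `weights=None` standing in for the downloaded ImageNet weights:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# weights=None stands in for weights='imagenet' so the sketch runs offline.
conv_base = VGG16(weights=None, include_top=False, input_shape=(150, 150, 3))

model = models.Sequential([
    conv_base,                              # VGG16 convolutional base
    layers.Flatten(),                       # 4*4*512 -> 8192 values
    layers.Dense(256, activation='relu'),   # hidden layer: 8192*256 + 256 params
    layers.Dense(1, activation='sigmoid'),  # probability of belonging to one class
])

# 14,714,688 (base) + 2,097,408 + 257 (new head) = 16,812,353 parameters,
# all of them trainable at this point.
print(model.count_params())  # 16812353
```

This matches the 16.8 million trainable parameters noted above.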
90 00:07:38,590 --> 00:07:45,030 We are assuming that the convolutional base of the VGG model, since it was also trained on a similar 91 00:07:45,140 --> 00:07:48,530 dataset, which had cat breeds and dog breeds, 92 00:07:49,040 --> 00:07:51,800 and it was actually working on a thousand different classes, 93 00:07:52,780 --> 00:08:00,470 the kind of features that it was taking out of the images, the same features could be used to just 94 00:08:00,470 --> 00:08:02,150 distinguish between cats and dogs. 95 00:08:03,590 --> 00:08:11,080 So the idea is: the convolutional layers are giving us the features, the same features which were identified 96 00:08:11,680 --> 00:08:18,980 by VGG on the ImageNet dataset, and the same features can be used for our dogs versus cats classifier. 97 00:08:20,320 --> 00:08:25,060 Those features will be input into our own neural network 98 00:08:25,150 --> 00:08:25,750 at the end, 99 00:08:26,240 --> 00:08:28,590 and only that neural network has to be trained. 100 00:08:29,660 --> 00:08:34,450 So to freeze these weights in the convolutional base, we use this function to 101 00:08:34,520 --> 00:08:34,800 freeze the 102 00:08:34,900 --> 00:08:35,340 weights. 103 00:08:37,030 --> 00:08:45,810 And once we have done this, the weights of the neurons in the convolutional base, that is, in this VGG16 model, 104 00:08:45,820 --> 00:08:46,750 will be fixed. 105 00:08:47,460 --> 00:08:51,080 It had 14.7 million parameters that were to be trained. 106 00:08:52,360 --> 00:08:59,050 Now, if we look at our model, only these two million parameters remain which are to be trained. The 107 00:08:59,200 --> 00:09:03,250 other 14.7 million parameters, which were part of VGG16, 108 00:09:03,850 --> 00:09:06,700 are shown as non-trainable parameters. 109 00:09:06,940 --> 00:09:07,690 These are fixed. 110 00:09:10,870 --> 00:09:15,430 So with that, our model is ready; the architecture is set.
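In standard Keras, freezing the base comes down to setting its `trainable` attribute to `False` before compiling; the exact helper used in the lecture may be named differently, but the effect is the same. A sketch, again with `weights=None` standing in for the ImageNet weights:

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

conv_base = VGG16(weights=None, include_top=False, input_shape=(150, 150, 3))
model = models.Sequential([
    conv_base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])

conv_base.trainable = False  # freeze all 14.7M weights of the VGG16 base

# Count trainable vs frozen parameters by summing weight-tensor sizes.
trainable = int(sum(np.prod(tuple(w.shape)) for w in model.trainable_weights))
frozen = int(sum(np.prod(tuple(w.shape)) for w in model.non_trainable_weights))
print(trainable)  # 2097665  -- only the new dense layers will train
print(frozen)     # 14714688 -- the VGG16 base stays fixed
```

These are the roughly two million trainable and 14.7 million non-trainable parameters the summary reports after freezing.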
111 00:09:16,300 --> 00:09:24,730 We just have to repeat whatever we did earlier. Now we have the train datagen, which will be giving images in 112 00:09:24,730 --> 00:09:26,140 batches of 20. 113 00:09:27,520 --> 00:09:34,300 It is also doing the data augmentation; that is, new artificial images will be created by randomly changing these 114 00:09:34,300 --> 00:09:43,060 parameters: giving it a rotation of some degrees, shifting it by some pixels, zooming in or 115 00:09:43,070 --> 00:09:44,320 zooming out, and so on. 116 00:09:47,120 --> 00:09:55,210 So this is what we did earlier. Then we feed this train datagen into the train generator, 117 00:09:55,540 --> 00:09:58,000 which will give us images in batches of 20. 118 00:10:01,360 --> 00:10:08,860 Then we define this test datagen, which will only rescale the test images by 1/255. 119 00:10:10,570 --> 00:10:15,850 And we use it in the validation generator, which will do the same for validation images. 120 00:10:17,800 --> 00:10:18,910 Then we compile the model. 121 00:10:19,480 --> 00:10:24,150 In compiling, the loss function is again going to be binary cross-entropy. 122 00:10:25,210 --> 00:10:31,960 However, in the optimizer, we again have the RMSprop optimizer, but we are using a very small 123 00:10:31,960 --> 00:10:33,700 learning rate in this model. 124 00:10:35,140 --> 00:10:39,190 So earlier we had 10 to the minus 4 as the learning rate. 125 00:10:39,790 --> 00:10:44,110 Now we have reduced that even further to 2 into 10 to the minus 5. 126 00:10:44,940 --> 00:10:51,820 This is because we want to take very small steps in the direction of the optimum. The metric that we are 127 00:10:51,820 --> 00:10:52,420 going to monitor 128 00:10:52,460 --> 00:10:53,440 is accuracy. 129 00:10:53,950 --> 00:10:59,380 So we compile the model. Note that once you have compiled, 130 00:10:59,560 --> 00:11:00,890 do not change the architecture.
131 00:11:01,270 --> 00:11:06,100 Otherwise, these parameters that we froze will not be frozen anymore. 132 00:11:06,460 --> 00:11:12,880 So if you change the architecture, you have to freeze these parameters again and then compile again. 133 00:11:14,070 --> 00:11:17,620 So compiling should be the last step before you train your model. 134 00:11:19,910 --> 00:11:23,810 Now we are going to train our model. For this, 135 00:11:23,900 --> 00:11:29,830 we are again going to use the train generator, which will input 20 training images at each step. 136 00:11:30,040 --> 00:11:37,030 Steps per epoch is set to 100, which means we are going to use 2000 images in one epoch. 137 00:11:37,510 --> 00:11:37,940 Twenty 138 00:11:38,180 --> 00:11:38,960 into one hundred, 139 00:11:39,230 --> 00:11:40,550 so 2000 images 140 00:11:40,960 --> 00:11:42,970 per epoch will be input 141 00:11:43,250 --> 00:11:47,710 for training. Epochs is set to 30. Again, 142 00:11:48,080 --> 00:11:54,500 if we do not see the solution converging, we can run this model for a greater number of epochs. 143 00:11:57,120 --> 00:12:02,940 Validation data is the validation generator, which will again give validation images in batches of 20. 144 00:12:04,170 --> 00:12:11,100 Validation steps is 50, because we are going to use 1000 images for validation purposes. 145 00:12:12,630 --> 00:12:19,800 So, again, I'm going to warn you that it is very important that this be run on your computer 146 00:12:20,130 --> 00:12:23,160 only if your computer has sufficient computational power. 147 00:12:23,850 --> 00:12:28,790 So if you're running TensorFlow on a GPU, or you have good RAM with your CPU, 148 00:12:29,310 --> 00:12:31,950 only then will this model be able to train. 149 00:12:32,490 --> 00:12:37,020 Otherwise, it is going to take an awfully long time and you won't be able to get the results. 150 00:12:37,650 --> 00:12:39,060 So run this only
151 00:12:39,450 --> 00:12:41,670 if you have good computational power in your computer.
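The compile and fit settings walked through above can be sketched as follows. The data-directory paths and generator objects come from the earlier parts of the project, so the actual training call is shown only as a comment; the snippet itself demonstrates the compile settings on a tiny stand-in model and checks the epoch arithmetic (20 images per batch times the step counts).

```python
from tensorflow.keras import layers, models, optimizers

# Tiny stand-in model: only the compile settings matter for this sketch.
model = models.Sequential([layers.Dense(1, activation='sigmoid')])

model.compile(loss='binary_crossentropy',  # two-class problem
              optimizer=optimizers.RMSprop(learning_rate=2e-5),  # very small steps
              metrics=['accuracy'])

batch_size = 20
steps_per_epoch = 100   # 20 * 100 = 2000 training images per epoch
validation_steps = 50   # 20 * 50  = 1000 validation images
print(batch_size * steps_per_epoch)   # 2000
print(batch_size * validation_steps)  # 1000

# The actual training call, with the generators built in earlier lessons:
# history = model.fit(train_generator,
#                     steps_per_epoch=100, epochs=30,
#                     validation_data=validation_generator,
#                     validation_steps=50)
```

The 2 x 10^-5 learning rate and the 2000-image epochs match the settings described in the lecture.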