1
00:00:00,510 --> 00:00:07,650
In this lecture, we are going to see the impact of the pooling layer on the number of parameters we have

2
00:00:07,650 --> 00:00:12,090
to train, and on the execution time of our CNN model.

3
00:00:13,440 --> 00:00:19,890
First, we will run the CNN model that we discussed in our last lecture, with the pooling layer.

4
00:00:20,900 --> 00:00:29,850
Then we will remove the pooling layer from that model to see its impact on the execution time.

5
00:00:32,740 --> 00:00:37,360
So this is the architecture of the model that we developed in the last lecture.

6
00:00:38,890 --> 00:00:42,880
First we have the input layer, then the conv layer, then the pooling layer.

7
00:00:43,690 --> 00:00:46,990
Then we have two dense layers and then the output layer.

8
00:00:49,280 --> 00:00:55,090
We are going to remove this pooling layer to notice its impact on the execution time.

9
00:00:57,850 --> 00:00:59,650
So the code will remain the same.

10
00:01:01,570 --> 00:01:03,040
First we have the conv layer.

11
00:01:03,520 --> 00:01:04,540
Then the pooling layer.

12
00:01:04,720 --> 00:01:05,770
Then the flatten layer.

13
00:01:06,010 --> 00:01:06,850
Then the dense layers.

14
00:01:07,720 --> 00:01:10,790
We are calling this model model_a.

15
00:01:11,500 --> 00:01:16,270
This is the same as the model we developed in our last lecture.

16
00:01:16,960 --> 00:01:22,390
And then we have a second model, in which we are not taking this pooling layer.

17
00:01:23,080 --> 00:01:31,000
So first we have the conv layer, then the flatten layer, then two dense layers and one output layer.

18
00:01:32,380 --> 00:01:35,790
We are calling this model model_b.

19
00:01:37,520 --> 00:01:43,320
And later on, we will compare the performance of model A versus the performance of model B.

20
00:01:45,490 --> 00:01:47,170
Let's just run this.

21
00:01:49,750 --> 00:01:58,200
We can also look at the summary to get an idea of how many parameters our model is optimizing.
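[The two architectures described above can be sketched in Keras roughly as follows. The lecture does not show its exact layer sizes, so the input shape (32x32x3), filter count (32), dense widths (128 and 64), and 10 output classes here are illustrative assumptions, not the lecture's actual numbers.]

```python
from tensorflow.keras import layers, models

# Model A: conv -> pooling -> flatten -> two dense layers -> output.
# All layer sizes below are assumed for illustration.
model_a = models.Sequential([
    layers.Input(shape=(32, 32, 3)),       # assumed input shape
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),           # the pooling layer under discussion
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Model B: identical, except the pooling layer is removed.
model_b = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model_a.summary()
model_b.summary()
```

[With a 2x2 pool, the flattened feature map is a quarter of the size, so the first dense layer, which dominates the parameter count, shrinks by roughly 4x.]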
22
00:02:13,950 --> 00:02:24,030
So here you can see, for the first dense layer, we have around 1.6 million parameters to train.

23
00:02:25,640 --> 00:02:30,780
Now let's compare it with model B's values.

24
00:02:38,740 --> 00:02:46,510
You can see, in our second model, where we do not have any pooling layer, the number of trainable parameters

25
00:02:47,200 --> 00:02:49,060
is around 6.5 million.

26
00:02:50,500 --> 00:02:58,810
So you can say that there are four times more trainable parameters in the model without the pooling

27
00:02:58,810 --> 00:02:59,080
layer.

28
00:03:00,790 --> 00:03:07,750
And we know that the execution time is directly dependent on the number of parameters that we are going

29
00:03:07,750 --> 00:03:08,230
to train.

30
00:03:09,700 --> 00:03:15,850
So obviously, we can expect that model B will take a lot more time than model A.

31
00:03:18,010 --> 00:03:26,590
Let's just compile both of these models, and then we will run model A for three epochs.

32
00:03:26,740 --> 00:03:28,990
Then we will run model B for three epochs.

33
00:03:29,530 --> 00:03:33,130
And after that, we'll compare the execution time for both models.

34
00:03:33,490 --> 00:03:35,170
Let's first run model A.

35
00:03:46,350 --> 00:03:47,900
Now we have trained our model.

36
00:03:49,080 --> 00:03:54,870
And as you can see, for each epoch, the execution time is around 31 seconds.

37
00:03:55,740 --> 00:04:00,800
And after the completion of the third epoch, we are getting a validation accuracy of 82 percent.

38
00:04:01,230 --> 00:04:04,470
And the same is the accuracy for the training data as well.

39
00:04:06,210 --> 00:04:11,400
Now, let's run the model without the pooling layer.

40
00:04:13,080 --> 00:04:13,710
Let's run this.

41
00:04:34,670 --> 00:04:37,520
So now we have trained our second model as well.
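[The roughly four-to-one parameter ratio observed above follows directly from how a dense layer's parameters are counted: one weight per input-unit pair, plus one bias per unit. A small sketch, using a hypothetical feature-map size rather than the lecture's exact one:]

```python
def dense_params(n_inputs, n_units):
    # one weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units

# Hypothetical conv output: 30x30 feature maps with 32 channels,
# feeding a 128-unit dense layer (illustrative values only).
h, w, channels, units = 30, 30, 32, 128

no_pool = dense_params(h * w * channels, units)                  # flatten directly
with_pool = dense_params((h // 2) * (w // 2) * channels, units)  # after 2x2 pooling

print(no_pool / with_pool)  # just under 4: pooling quarters the flattened size
```

[A 2x2 pool halves each spatial dimension, so the flattened input to the first dense layer shrinks by a factor of four, and so, almost exactly, do that layer's parameters.]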
42
00:04:38,600 --> 00:04:46,460
And as you can see, the execution time for each epoch is around 62 to 63 seconds,

43
00:04:47,270 --> 00:04:53,480
whereas for model A, the execution time for each epoch was around 30 to 31 seconds.

44
00:04:54,740 --> 00:04:59,320
So the execution time is almost double for our model B,

45
00:04:59,990 --> 00:05:06,070
that is, the model without the pooling layer, as compared to the model with the pooling layer.

46
00:05:07,760 --> 00:05:16,490
You can also look at the accuracy score: on the training set, the accuracy of our model B is higher, and on

47
00:05:16,490 --> 00:05:17,530
the validation set,

48
00:05:17,720 --> 00:05:21,680
the accuracy is almost the same for both the models.

49
00:05:23,690 --> 00:05:32,240
This is because when we are pooling four pixels into one pixel, there is some information loss, and

50
00:05:32,420 --> 00:05:36,410
that information loss is resulting in the lower accuracy.

51
00:05:37,160 --> 00:05:44,900
So if you are using a pooling layer, your execution time will be less, and the accuracy will also be a

52
00:05:44,900 --> 00:05:48,470
little less as compared to a model without a pooling layer.

53
00:05:52,150 --> 00:06:00,370
In this case, we have only used one convolutional layer, but in a real-life scenario, you may have

54
00:06:00,370 --> 00:06:03,030
to use multiple convolutional layers.

55
00:06:03,490 --> 00:06:09,520
And in such cases, the use of a pooling layer becomes much more important.

56
00:06:11,080 --> 00:06:19,810
So using a pooling layer, you can significantly reduce your execution time without impacting the accuracy much.
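[The information loss mentioned above, where four pixels are collapsed into one, can be seen in a tiny hand-rolled 2x2 max-pooling sketch (plain Python, no framework; the input values are made up for illustration):]

```python
def max_pool_2x2(img):
    # collapse each non-overlapping 2x2 block into its maximum value
    h, w = len(img), len(img[0])
    return [[max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

img = [[1, 3, 2, 0],
       [5, 4, 1, 1],
       [0, 2, 9, 6],
       [7, 8, 3, 2]]

pooled = max_pool_2x2(img)
print(pooled)  # [[5, 2], [8, 9]] -- three of every four values are discarded
```

[Sixteen input values become four: the downstream layers see a quarter as many activations (hence the 4x parameter saving), but the discarded values are the information loss that can cost a little accuracy.]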