The last foundational concept we need to understand before we start building our CNN model in this section is that of a pooling layer, and since you know how convolutional layers work, pooling layers are going to be easy to understand. We use pooling layers in our network to reduce the computational load, the memory usage, and the number of parameters to be estimated.

Just like in a convolutional layer, each neuron in a pooling layer also has a small rectangular receptive field. We have to define the size of this rectangular receptive field, the stride, and the padding type, just like before. However, pooling neurons have no weights. All they do is aggregate the input using an aggregation function such as max or mean.

In this image, the layer on top is a pooling layer. You can see that each neuron is looking at a two-by-two set of neurons in the lower layer. The first neuron is looking at these four cells, which have a red boundary. The next is looking at these four, which have a dotted blue boundary. This means that the stride here is two. By default, the stride in a pooling layer is the same as the width of the receptive field.

Now, if we use the max function, or max pooling as it is called, only the maximum input value out of the four values in this receptive field makes it to the next layer; the other three inputs are dropped. For example, if the four outputs of these four cells are 1, 5, 3, and 2, then out of these four, 5 is the largest value, so this neuron in the top layer will have 5 as its output.

So it is very simple: no weights, no filters to be trained. It just finds the maximum value out of the four values that it sees, and it outputs that.

If you look at the GIF at the bottom, this is the feature map that our max pooling layer is looking at. For the first square of four neurons, the largest value is 6, so we enter 6 here, in the first cell of the max pooling layer. Similar to max pooling, there is average pooling. In average pooling, we find the mean of the four values. So if we are doing average pooling, it will be the average of these four values: 6, 6, 4, and 5, which comes to 5.25. In the next stride, we look at the next four cells, find their max and their average value, and store those in the next neuron.
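To make this concrete, here is a minimal sketch of both operations, assuming tf.keras as the framework (the course's actual code is not shown in this clip). It reproduces the two worked examples above: the max of the window 6, 6, 4, 5 is 6, and its mean is 5.25.

import numpy as np
import tensorflow as tf

# A toy 4x4 feature map; the top-left 2x2 window holds the lecture's
# example values 6, 6, 4, 5.
feature_map = np.array([[6., 6., 1., 2.],
                        [4., 5., 3., 0.],
                        [1., 2., 0., 3.],
                        [2., 1., 4., 5.]], dtype=np.float32)

# Keras pooling layers expect input of shape (batch, height, width, channels).
x = feature_map.reshape(1, 4, 4, 1)

max_pool = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=2, strides=2)

print(max_pool(x)[0, :, :, 0].numpy())  # top-left cell is 6.0  (max of 6, 6, 4, 5)
print(avg_pool(x)[0, :, :, 0].numpy())  # top-left cell is 5.25 (mean of 6, 6, 4, 5)

Note that leaving strides unset gives the same result, since the stride defaults to the pool size, exactly as described above.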
Also notice that since we are using a stride of two here, the pooling layer has half the width and half the height of the previous layer. You can now imagine how this will reduce the computations and the memory usage. Instead of a pooling layer, if we had the next convolutional layer straight away, and that layer had a receptive field with six as height and eight as width, then six into eight is forty-eight input neurons, so each neuron in the next layer would have forty-eight parameters to be trained. But if we have this pooling layer in between, then each neuron gets only three into four, that is, twelve input neurons. So only twelve parameters per neuron are to be trained; instead of forty-eight, we get twelve to train. So the amount of computation goes down significantly.

In this example we saw that we can do both max pooling and mean pooling, but commonly max pooling works better than the alternative options, because it highlights the main features instead of averaging them all. So in our model, most often we will be using max pooling only. Note that there is a cost to using max pooling: it is a tradeoff. We give away some information from the previous layer to reduce the computational load on our system. I will highlight this impact on computation when we write the code.
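As a rough preview of that impact, here is a hypothetical tf.keras comparison (not the course's model) of trainable-parameter counts with and without a 2x2 max pooling layer. Halving the feature map's height and width shrinks what the downstream layer sees by a factor of four, just like the forty-eight-versus-twelve arithmetic above.

import numpy as np
import tensorflow as tf

def param_count(use_pooling):
    # A tiny stand-in network: conv layer, optional 2x2 max pooling,
    # then a dense layer whose weight count depends on the feature-map size.
    layers = [tf.keras.layers.Conv2D(8, 3, activation="relu")]
    if use_pooling:
        layers.append(tf.keras.layers.MaxPooling2D(pool_size=2))
    layers += [tf.keras.layers.Flatten(), tf.keras.layers.Dense(10)]
    model = tf.keras.Sequential(layers)
    model(np.zeros((1, 28, 28, 1), dtype=np.float32))  # build the model once
    return model.count_params()

print(param_count(use_pooling=False))  # 54170 trainable parameters
print(param_count(use_pooling=True))   # 13610 trainable parameters

The pooling layer itself adds zero parameters, which is the whole point: the savings come entirely from shrinking the input that the next layer has to look at.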