All right, so in this lesson the rubber is going to meet the road: in this lesson we're going to build our model out in Keras.

Now, the thing is, any time you're doing anything in TensorFlow (and in this case Keras is using TensorFlow as the backend), it involves a three-step process to build and construct your model.

The first step is defining your model. You have to set out the structure of the model. That's step one. Step two is compiling the model. That means telling TensorFlow in advance how you'd want to measure the loss and what kind of calculations you'd want to run to adjust the weights. This is still part of the setup: we need to tell TensorFlow how it should optimize our model. Once it comes to training, TensorFlow needs to know about the calculations it needs to run ahead of time. Speaking of training the model, that's actually step three. Training the model is what you can do after you've done the setup, and the word you'll see to describe this in Keras is "fit". During this step TensorFlow will actually crunch through the data and do all the calculations.
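Before we get to our own model, here's a minimal sketch of that define-compile-fit workflow, just to make the three steps concrete. The layer sizes, the optimizer and loss choices, and the dummy data are all placeholders for illustration, not the values we'll use in this lesson:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Step 1: define the structure (placeholder sizes, not this lesson's values)
model = Sequential([
    Dense(units=32, input_dim=100, activation='relu'),
    Dense(units=10, activation='softmax'),
])

# Step 2: compile, i.e. tell TensorFlow up front how to measure the loss
# and which optimizer should adjust the weights (choices here are examples)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# Step 3: fit, i.e. actually crunch through the data
# (dummy inputs and one-hot labels, purely for illustration)
x_dummy = np.random.rand(200, 100)
y_dummy = np.eye(10)[np.random.randint(0, 10, size=200)]
model.fit(x_dummy, y_dummy, epochs=2)
```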
So let's talk through step 1, defining the model. Because we're going to use Keras, I want to talk you through the code before we actually write it in the Jupyter notebook. Let's look at a simple example for image classification. Suppose we're aiming for the structure that I'm showing here on the slide. This very, very simple artificial neural network that we're going to build actually has a name: it's called a multilayer perceptron. And since step one is defining our model, we have to set up the architecture for this perceptron, and that means creating the layers.

The very first layer that we'll create in our code is actually not the input layer. The very first layer that we need to create is the first hidden layer. Keras only needs a little bit of help from us to work out what the input layer is. So what is this help that we need to provide Keras? Well, we have to tell Keras how many inputs there are to this first layer. If we're working with an image, then the number of inputs is going to be determined by the resolution of that image and the color space of that image.

So if I've got this image right here, and it's 32 pixels by 32 pixels, and it's a color image, then the total number of inputs is going to be 32 × 32 × 3. Three, because we've got a red, a green, and a blue value for every single pixel. So 32 × 32 × 3 comes out to 3,072. That's the information we're going to provide in our code when we create our first hidden layer.

The function call from Keras to create this layer is called Dense. Dense will create all the neurons for us in this layer, and the arguments that we have to feed to the Dense function are the number of outputs that we want, in this case the number of units, or the number of neurons, in the layer itself. And because we're calling this for the very first time, and we're creating our very first hidden layer, we also have to specify the input dimensions. This is where that 3,072 comes in. The final thing that we should specify is the activation function, in other words, how the neurons are going to behave. Again, we have a choice here, and I'm going to talk about this in detail in a minute.

So now that we've set up the first hidden layer, what would our code look like for a second hidden layer? This hidden layer has five outputs, because there are five nodes, so the number five has to be specified as the units in the Dense function. Again, we specify an activation function, but in this case Keras is smart enough to know how many inputs the second hidden layer is going to have. It only needed to know the number of inputs explicitly for the very, very first hidden layer. For that second layer, it knows to look to the first hidden layer that we've already created to work out the number of inputs.

Let me ask you this: what's the code roughly going to look like for that final output layer? Unsurprisingly, we're also going to create this layer using the Dense function from Keras, and for the number of units we've got four, because we've got four neurons on the slide. The only difference here is that for the output layer we've got a different activation function: instead of ReLU, we're using softmax.
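Putting those three Dense calls together, a sketch of the slide's network could look like the following. The 8 units in the first hidden layer are an assumption (no count is stated for that layer); the 3,072 inputs, the 5-unit second hidden layer, and the 4-unit softmax output come straight from the example:

```python
from keras.models import Sequential
from keras.layers import Dense

# Sketch of the slide's multilayer perceptron.
# The first hidden layer's size (8) is assumed, not given on the slide.
slide_model = Sequential([
    # first hidden layer: must be told about the 32 x 32 x 3 = 3072 inputs
    Dense(units=8, input_dim=32 * 32 * 3, activation='relu'),
    # second hidden layer: 5 neurons, input size inferred from the layer above
    Dense(units=5, activation='relu'),
    # output layer: 4 neurons, softmax instead of ReLU
    Dense(units=4, activation='softmax'),
])
```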
Now, I promised to talk a little bit more about activation functions, so let's do that now. This is a really good opportunity to dive into this topic in a little bit more detail. One of the things that we talked about with neurons is that they may activate and pass a signal down to all the neurons that are connected downstream. The very first models of these neurons imagined a sort of binary output for the neuron: they imagined that the neuron would either activate or not activate, and this can be modeled with an activation function that takes the form of a step function. In other words, if the activation function looks like this, then you get either a 0 or a 1 from the neuron.

Now, these were the early models, from the 1940s. Since then we've found that, well, maybe a neuron can activate strongly, or weakly, or somewhere in the middle, so we really want a range of different signals from a neuron. And if this activation function were shaped a little bit differently, then we could accomplish exactly that. Enter a different activation function: in this case we've got a sigmoid activation function, which will give us a signal between 0 and 1. Now our neuron can send a strong or a weak signal, depending on where on the curve it lies. Looking at this chart, if it's getting an input value of 6, then it's going to send a strong signal of (nearly) 1. But if it's getting an input value of -6, then it's not going to activate; it's not going to send anything. And if the inputs are somewhere in between, it's going to activate by, say, 25 percent or 75 percent. This is the idea behind an activation function that's a curve, like the sigmoid function.

But a few minutes earlier we saw two other functions: we saw the ReLU function and the softmax function as the activation functions for our neurons. So what are those, and what do they look like? Well, ReLU stands for rectified linear unit, which is quite a mouthful, which is why everybody abbreviates it to ReLU. The function itself actually has a very simple shape: it's a flat line at zero for all the negative values, and it's a 45-degree line for all the positive values.

Now, we've seen three different activation functions at this point, so which one should you choose? Well, there are actually many more. If you go to the Keras documentation, we're presented with a menu. You've got the available activations: softmax, ELU, SELU, softplus, softsign, ReLU (which we've talked about), tanh, sigmoid, hard sigmoid, and so on.
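To make those shapes concrete, here's a small sketch of the step, sigmoid, and ReLU functions in plain NumPy; the sample inputs are chosen to reproduce the values mentioned above:

```python
import numpy as np

def step(x):
    # 1940s-style binary activation: the neuron fires (1) or it doesn't (0)
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    # smooth curve: signals anywhere between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # rectified linear unit: flat at 0 for negatives, 45-degree line above
    return np.maximum(0.0, x)

x = np.array([-6.0, -1.1, 0.0, 1.1, 6.0])
print(step(x))     # [0. 0. 1. 1. 1.]
print(sigmoid(x))  # ~[0.0025 0.25 0.5 0.75 0.9975]
print(relu(x))     # [0.  0.  0.  1.1 6. ]
```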
So we've got quite a few different activations to choose from for our neurons. Now, at first this seems really intimidating, but there's an easy solution. What you'll often find is that a particular field, a particular type of problem, actually favors a particular activation function. So all you have to do is look at what kind of activation functions are used for the problem that you're trying to solve, and stick with that one. That's going to be a really good starting point.

In our case, we're going to go for the ReLU function. This is going to be our activation function of choice for all the neurons in our hidden layers. The big exception was the output layer: this is where we saw that word softmax. We had ReLU activation functions in our first hidden layer and our second hidden layer, but we had the softmax activation function in our output layer. Why is that? Well, softmax is a mathematical function that will transform our outputs into probabilities, and this means that we get a really nice interpretation for our output. In other words, the model can tell us something like: there is a 20 percent chance of this image containing a cat. And the reason that we can interpret the output of a model like this is thanks to the softmax activation function in the output layer. Softmax will give us a distribution over all our output numbers: all these numbers are going to be between 0 and 1, and they're all going to sum to 1. Softmax is basically going to rescale our output. This is why you often see softmax in the output layer for all of these multi-class classification problems.
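As a quick illustration of that rescaling, here's a hedged sketch of softmax in NumPy; the raw output values are made up purely for the example:

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability, then rescale so the
    # outputs all land between 0 and 1 and sum to exactly 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

raw_outputs = np.array([2.0, 1.0, 0.1, -1.0])  # made-up raw layer outputs
probs = softmax(raw_outputs)
print(probs)        # ~[0.64 0.23 0.10 0.03], one probability per class
print(probs.sum())  # 1.0
```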
But enough about the theory. Let's head back into our Jupyter notebook and configure our multilayer perceptron. I'll add a markdown cell here that's going to read "Define the Neural Network using Keras". Next, we're going to go to the very top, to our import statements, and import the Keras functionality that we need. That includes from keras.models import Sequential; also, from keras.layers, we're going to import, you guessed it, Dense, and we're also going to import Activation. Let's hit Shift-Enter on the cell, scroll all the way back down, and add our code to create our perceptron.

I'm going to call this one model_1, because we're going to have more than one model that we're going to try out. Next, I'm going to use Sequential, parentheses, and then square brackets, and on this line, between the two square brackets, I'm going to call Dense, open parentheses, and I'll say units is equal to... and here I get to choose how many output units my first hidden layer will have. I'm going to go with 128. I'll put a comma after that, and then I'm going to specify my input dimensions: input_dim, and set that equal to... we said 3,072, right? But we don't have to use this magic number here. We've already added a constant, so let's use that one instead: TOTAL_INPUTS. We'll put that right here, and then we can specify something else, namely the activation. Here we're going to go with, in single quotes, relu, all lowercase. That's our very first hidden layer in our neural network.

Now that we've got our first hidden layer done, let's add a couple more layers to this neural network. To do that, we just have to call that Dense function a couple more times. So I'll add a comma here on this line, go down to the next line, and add our second hidden layer here. This one is going to have 64 neurons in it, and as the activation function we're also going to go with relu, the rectified linear unit. But we don't have to specify the input dimensions, because Keras is smart enough to work this out from the previous layer. That's our second hidden layer done.

Let's add a third hidden layer; we're really, really going deep, Inception style. On this one we'll have 16 units, or 16 neurons. We'll add the activation function again, specify relu, and add a comma at the end. Go to the next line, and here we're going to specify our output layer. In this case we're going to have ten different outputs, right? We've got ten different categories that we're differentiating in the model, so we'll have 10 neurons, or 10 units. And for the activation we won't have relu; because this is our output layer, we're going to be using softmax. Softmax is ideal for giving us a probability interpretation in a classification problem, so this is what we'll use. When creating the layers, I've added the units parameter name for the first two, and I've left it out for the last two.
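Assembled into a single cell, the code we just dictated looks roughly like this. TOTAL_INPUTS is the 3,072-input constant defined earlier in the notebook (the exact casing of that name is assumed here, and it's redefined below only so the sketch runs on its own):

```python
from keras.models import Sequential
from keras.layers import Dense, Activation  # Activation imported as in the
                                            # lesson, though unused in this cell

TOTAL_INPUTS = 32 * 32 * 3  # 3,072; mirrors the constant defined earlier

model_1 = Sequential([
    # first hidden layer: 128 neurons, told the input size explicitly
    Dense(units=128, input_dim=TOTAL_INPUTS, activation='relu'),
    # second hidden layer: 64 neurons, input size inferred by Keras
    Dense(units=64, activation='relu'),
    # third hidden layer: 16 neurons, the `units` name left out this time
    Dense(16, activation='relu'),
    # output layer: 10 classes, softmax for a probability interpretation
    Dense(10, activation='softmax'),
])
```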
This is how you'll often see it written in the documentation and on Stack Overflow. Now let's hit Shift-Enter on the cell, and that's it: we've created our Keras model. If we check the type of model_1, we see that this object is a Sequential model. Fantastic.

So we've figured out quite a lot of stuff. We've written some very terse code where a lot is happening behind the scenes, and we've demystified a lot of the vocabulary that we're seeing in this code here. Having laid out the structure of our model, in this case three hidden layers and an output layer, it's time to compile our model. It's time to specify what calculations are going to take place during the training step, how the weights are adjusted, and how the losses are measured. All that and more in the next lesson. I'll see you there.