All right. So in this lesson we're going to set up our loss function, we're going to set up our optimizer, and we're going to set up some metrics like the accuracy. So I'll add a markdown cell here: Loss, Optimization and Metrics.

The very first thing we're going to tackle is the loss function. Now, the loss function is key for optimizing the neural network. In this example we'll continue to use the cross-entropy loss that we talked about in the last module. So let's have a look at the TensorFlow documentation for cross-entropy loss. If we Google "tensorflow cross entropy", then the first result I'm getting here is this one. If I click on this result and scroll down a little bit, I can see that this function is deprecated. So apparently there's already a newer version of this function out, and the new function has pretty much the same name, except at the end it's got "v2".

So if we click on that, then in the description we can see that this function calculates the softmax cross entropy, which is what we're after, between two things: the logits and the labels. The logits are the outputs from our output layer, and the labels are the actual labels that we're going to supply, our y values.

So let's use this function in our Jupyter notebook. Our loss is going to be equal to tf.nn.softmax_cross_entropy_with_logits_v2. All we need to supply in the parentheses are the labels, which is going to be our placeholder tensor for our y values, and our logits, which is another name for the outputs that we're going to get from our last layer. So I'm going to set logits equal to output. Output here, remember, is what we're getting out of the softmax activation function from our last layer.

Now, there's only one tiny modification I want to make to this line of code, and that modification is due to the fact that we're going to be training our model in batches. We've got a big training dataset and we're going to split it up into smaller pieces, and we're going to train on these smaller pieces one at a time until we chew through the entire dataset. And that means that while this loss calculation works for calculating the loss on the entire dataset, it's not going to give us a good result when we've got individual batches.
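Just so you can see it in code, here's roughly what that first version of the line looks like, assuming, as set up earlier in the notebook, that Y is the placeholder tensor for the labels and output is the tensor coming out of the last layer:

    import tensorflow as tf  # TensorFlow 1.x

    # per-example cross entropy between the true labels and the network's outputs
    loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=output)

This gives one loss value per example, which is exactly why we'll wrap it in an average in a moment.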
So what we need to do when we have individual batches is take the average of the losses, and the good thing is TensorFlow has a fantastic function for us called reduce_mean that will do just that. So I'm going to wrap the output from softmax_cross_entropy_with_logits_v2 inside the parentheses for reduce_mean.

Now that TensorFlow knows what loss function we're going to use, we can move on to the next step, which is telling TensorFlow which optimizer we want to use. We'll store this in a variable called optimizer, and that's going to be equal to... well, let's Google for it, actually. So "tensorflow optimizer" will bring up several optimizers that we can use, like the gradient descent optimizer or the Adam optimizer. Adam, as we've discussed in the previous module, is a state-of-the-art optimizer, so I'll tell you what, let's use this one again.

We can see from the documentation here that there are default values for all the parameters of the Adam optimizer, and we wouldn't actually have to specify anything of our own. So if we say tf.train.AdamOptimizer(), that's good enough; that just sticks with the default values. I can even hit Shift+Tab on my keyboard to bring up what those are in the quick documentation. But suppose we want to use our own learning rate instead of the default one; then we can simply specify that learning rate here. We can say learning_rate=learning_rate, or if you want to use a positional argument instead of a named one, we can delete that and just leave it like this. I think that's perfectly fine.

Looking back at the documentation for the optimizer, the key thing that it needs to do is minimize our loss, and the optimizer's method that will do that is called minimize. So in this case we will call this function and we will supply our loss. Everything else we will keep the same. Scrolling down, we can actually see that this minimize function has an output as well: it outputs an operation that updates our variables in TensorFlow. Which variables? Well, our weights and our biases, right? So let's store the output of minimize in a variable called train_step, and that will be equal to optimizer.minimize(loss).

So in these two lines of code we've told TensorFlow which optimizer we want to use; we've initialized our optimizer here.
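Here's a rough sketch of how those lines might look together in the notebook. The names Y, output and learning_rate are the ones we've been using; learning_rate is assumed to be a plain Python number defined earlier, for example 0.001:

    # average the per-example losses so the value makes sense per batch
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=output))

    # Adam optimizer with our own learning rate (leave the parentheses
    # empty to stick with the default values instead)
    optimizer = tf.train.AdamOptimizer(learning_rate)

    # the operation we'll run over and over again during training
    train_step = optimizer.minimize(loss)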
And we've said what the learning rate should be for the Adam optimizer. The next thing we've done is we've nailed down what the operation is that the optimizer will run to minimize the loss. We've said which loss it should minimize, namely this one right here, and we're storing this work in something we're calling train_step. The reason we're doing this is so that we can call this training step again and again and again in a loop when we're training our model. That way we will minimize our loss as we iterate over our data.

Now, I know there's quite a bit of setup that we're doing here with TensorFlow, and this is why Keras exists, right? This is why there's a bridge to make all of this easier when you first get started. But essentially what we're doing is laying out all the calculations and all the variables ahead of time for TensorFlow, so that when it comes to running the calculations it knows what those are.

Another calculation that it will need to run, for example, is calculating the accuracy of the model. So this is another thing that we're going to have to outline ahead of time. So what I'll do is add a markdown cell here. It's going to be very small and it's just going to read "Accuracy Metric". Then another markdown cell here that reads "Defining Optimizer", and yet another markdown cell here that's going to read "Defining Loss Function". There we go.

To calculate the accuracy, we need to compare two things: our prediction and the true label, right? And we need to check if they're equal. Once we know if they are equal, then we have a correct prediction. And once we know how many correct predictions we have, we can work out the accuracy of our model.

So let's start with that. correct_pred, our correct prediction, is going to be equal to the result of a comparison, the result of a calculation. TensorFlow has a function called equal which will make this comparison for us. We want to compare two quantities in this case: one of them is going to be our output, and the other one is going to be the true label. The output was the output from our final layer, which we've called output, right? And the true labels will be stored in a tensor called capital Y.

Now, we have to make one more modification to this code, because the y values will actually look something like the example below.
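As an illustration, a one-hot encoded label for an example belonging to class 5 would look like this (the particular digit is just for the sake of the example):

    # ten entries, all zero except a single 1
    # the index of that 1 (here, index 5) is the class of the example
    y_example = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]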
We're going to have 10 different values; most of them are going to be zero and one of them is going to be equal to one. So what we'll have to do is pick out the index of the largest one, right? This one is going to be at index 5: 0, 1, 2, 3, 4, 5. The index will correspond to the class of the label. And we're going to have to do something very, very similar for the outputs from our final layer, right? Because we're getting 10 outputs, each one of them will be a probability between 0 and 1, and we have to pick the largest probability out of the 10 and then pick out the index where that largest probability occurs. The reason we've got these nice probabilities is because our output is using the softmax activation function.

So what's the easiest way to get the index of the largest probability? TensorFlow has a function called argmax, and argmax will pull out the index of the maximum value from our output; it just has to know where to look, and it has to look along a row. So we'll say axis is equal to 1, and we'll do the same thing for our y values: tf.argmax(Y, axis=1). Now we can compare whether this value is equal to this value: is our highest-probability prediction equal to the actual label?

And once we've got these predictions, we can calculate the accuracy. The accuracy is going to be equal to the average of the correct predictions across the batch we're doing the calculation on. So once again we're using TensorFlow's reduce_mean, and here we're going to supply the correct prediction. Now, I want to make another small modification here, because I'd like to be 100 percent sure that I've got a decimal number here. So I'm going to just convert this correct prediction into a decimal number with tf.cast(correct_pred, tf.float32). So now I've converted my correct predictions into a decimal number, and I'm averaging them across all the batches that we're going to be doing the training on, and that way we've calculated the accuracy.

The downside of having to do all the setup ahead of time is that we can't press Shift+Enter on these cells and see if we've done it all correctly. We'll only find out when we're actually training our model, and we're getting closer to that in every lesson. So stay tuned.
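Putting that together, a sketch of the accuracy cells could look like this, again assuming the output tensor and the Y placeholder from before:

    # index of the highest output probability vs. index of the true label
    correct_pred = tf.equal(tf.argmax(output, axis=1), tf.argmax(Y, axis=1))

    # cast the booleans to floats and average them: the fraction of
    # correct predictions is our accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))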
It's coming right up. I'll see you in the next lesson. Take care.