All right. So in this lesson we're going to set up our loss function, we're going to set up our optimizer, and we're going to set up some metrics like the accuracy. So I'll add a markdown cell here: Loss, Optimization and Metrics.

The very first thing we're going to tackle is the loss function. Now, the loss function is key for optimizing the neural network. In this example we'll continue to use the cross-entropy loss that we talked about in the last module. So let's have a look at the TensorFlow documentation for cross-entropy loss. If we Google "tensorflow cross entropy", then the first result I'm getting here is this one. If I click on this result and scroll down a little bit, I can see that this function is deprecated. So apparently there's already a newer version of this function out, and the new function has pretty much the same name, except at the end it's got "v2".

So if we click on that, then in the description we can see that this function calculates the softmax cross entropy, which is what we're after, between two things: the logits and the labels. The logits are the outputs from our output layer, and the labels are the actual labels that we're going to supply, our y values.

So let's use this function in our Jupyter notebook. Our loss is going to be equal to tf.nn.softmax_cross_entropy_with_logits_v2. All we need to supply in the parentheses are the labels, which is going to be our placeholder tensor for our y values, and our logits, which is another name for the outputs that we're going to get from our last layer. So I'm going to set logits equal to output. Output here, remember, is what we're getting out of the softmax activation function from our last layer.

Now, there's only one tiny modification I want to make to this line of code, and that modification is due to the fact that we're going to be training our model in batches. We've got a big training dataset and we're going to split it up into smaller pieces, and we're going to train on these smaller pieces one at a time until we chew through the entire dataset. And that means that while this loss calculation works for calculating the loss on the entire dataset, it's not going to give us a good result when we've got individual batches.
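Just so you can see it in code, here's roughly what that first version of the line looks like, assuming, as set up earlier in the notebook, that Y is the placeholder tensor for the labels and output is the tensor coming out of the last layer:

    import tensorflow as tf  # TensorFlow 1.x

    # per-example cross entropy between the true labels and the network's outputs
    loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=output)

This gives one loss value per example, which is exactly why we'll wrap it in an average in a moment.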
So what we need to do when we have individual batches is take the average of the losses, and the good thing is TensorFlow has a fantastic function for us called reduce_mean that will do just that. So I'm going to wrap the output from softmax_cross_entropy_with_logits_v2 inside the parentheses for reduce_mean.

Now that TensorFlow knows what loss function we're going to use, we can move on to the next step, which is telling TensorFlow which optimizer we want to use. We'll store this in a variable called optimizer, and that's going to be equal to... well, let's Google for it, actually. So "tensorflow optimizer" will bring up several optimizers that we can use, like the gradient descent optimizer or the Adam optimizer. Adam, as we've discussed in the previous module, is a state-of-the-art optimizer, so I'll tell you what, let's use this one again.

We can see from the documentation here that there are default values for all the parameters of the Adam optimizer, and we wouldn't actually have to specify anything of our own. So if we say tf.train.AdamOptimizer(), that's good enough; that just sticks with the default values. I can even hit Shift+Tab on my keyboard to bring up what those are in the quick documentation. But suppose we want to use our own learning rate instead of the default one; then we can simply specify that learning rate here. We can say learning_rate=learning_rate, or if you want to use a positional argument instead of a named one, we can delete that and just leave it like this. I think that's perfectly fine.

Looking back at the documentation for the optimizer, the key thing that it needs to do is minimize our loss, and the optimizer's method that will do that is called minimize. So in this case we will call this function and we will supply our loss. Everything else we will keep the same. Scrolling down, we can actually see that this minimize function has an output as well: it outputs an operation that updates our variables in TensorFlow. Which variables? Well, our weights and our biases, right? So let's store the output of minimize in a variable called train_step, and that will be equal to optimizer.minimize(loss).

So in these two lines of code we've told TensorFlow which optimizer we want to use; we've initialized our optimizer here.
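Here's a rough sketch of how those lines might look together in the notebook. The names Y, output and learning_rate are the ones we've been using; learning_rate is assumed to be a plain Python number defined earlier, for example 0.001:

    # average the per-example losses so the value makes sense per batch
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=output))

    # Adam optimizer with our own learning rate (leave the parentheses
    # empty to stick with the default values instead)
    optimizer = tf.train.AdamOptimizer(learning_rate)

    # the operation we'll run over and over again during training
    train_step = optimizer.minimize(loss)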
And we've said what the learning rate should be for the Adam optimizer. The next thing we've done is we've nailed down what the operation is that the optimizer will run to minimize the loss. We've said which loss it should minimize, namely this one right here, and we're storing this work in something we're calling train_step. The reason we're doing this is so that we can call this training step again and again and again in a loop when we're training our model. That way we will minimize our loss as we iterate over our data.

Now, I know there's quite a bit of setup that we're doing here with TensorFlow, and this is why Keras exists, right? This is why there's a bridge to make all of this easier when you first get started. But essentially what we're doing is laying out all the calculations and all the variables ahead of time for TensorFlow, so that when it comes to running the calculations it knows what those are.

Another calculation that it will need to run, for example, is calculating the accuracy of the model. So this is another thing that we're going to have to outline ahead of time. So what I'll do is add a markdown cell here. It's going to be very small and it's just going to read "Accuracy Metric". Then another markdown cell here that reads "Defining Optimizer", and yet another markdown cell here that's going to read "Defining Loss Function". There we go.

To calculate the accuracy, we need to compare two things: our prediction and the true label, right? And we need to check if they're equal. Once we know if they are equal, then we have a correct prediction. And once we know how many correct predictions we have, we can work out the accuracy of our model.

So let's start with that. correct_pred, our correct prediction, is going to be equal to the result of a comparison, the result of a calculation. TensorFlow has a function called equal which will make this comparison for us. We want to compare two quantities in this case: one of them is going to be our output, and the other one is going to be the true label. The output was the output from our final layer, which we've called output, right? And the true labels will be stored in a tensor called capital Y.

Now, we have to make one more modification to this code, because the y values will actually look something like the example below.
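As an illustration, a one-hot encoded label for an example belonging to class 5 would look like this (the particular digit is just for the sake of the example):

    # ten entries, all zero except a single 1
    # the index of that 1 (here, index 5) is the class of the example
    y_example = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]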
We're going to have 10 different values; most of them are going to be zero and one of them is going to be equal to one. So what we'll have to do is pick out the index of the largest one, right? This one is going to be at index 5: 0, 1, 2, 3, 4, 5. The index will correspond to the class of the label. And we're going to have to do something very, very similar for the outputs from our final layer, right? Because we're getting 10 outputs, each one of them will be a probability between 0 and 1, and we have to pick the largest probability out of the 10 and then pick out the index where that largest probability occurs. The reason we've got these nice probabilities is because our output is using the softmax activation function.

So what's the easiest way to get the index of the largest probability? TensorFlow has a function called argmax, and argmax will pull out the index of the maximum value from our output; it just has to know where to look, and it has to look along a row. So we'll say axis is equal to 1, and we'll do the same thing for our y values: tf.argmax(Y, axis=1). Now we can compare whether this value is equal to this value: is our highest-probability prediction equal to the actual label?

And once we've got these predictions, we can calculate the accuracy. The accuracy is going to be equal to the average of the correct predictions across the batch we're doing the calculation on. So once again we're using TensorFlow's reduce_mean, and here we're going to supply the correct prediction. Now, I want to make another small modification here, because I'd like to be 100 percent sure that I've got a decimal number here. So I'm going to just convert this correct prediction into a decimal number with tf.cast(correct_pred, tf.float32). So now I've converted my correct predictions into a decimal number, and I'm averaging them across all the batches that we're going to be doing the training on, and that way we've calculated the accuracy.

The downside of having to do all the setup ahead of time is that we can't press Shift+Enter on these cells and see if we've done it all correctly. We'll only find out when we're actually training our model, and we're getting closer to that in every lesson. So stay tuned.
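Putting that together, a sketch of the accuracy cells could look like this, again assuming the output tensor and the Y placeholder from before:

    # index of the highest output probability vs. index of the true label
    correct_pred = tf.equal(tf.argmax(output, axis=1), tf.argmax(Y, axis=1))

    # cast the booleans to floats and average them: the fraction of
    # correct predictions is our accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))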
It's coming right up. I'll see you in the next lesson. Take care.