In the last video we broke down what's going on here. What we're doing is essentially just passing Keras's Sequential model an input, a.k.a. our image of a dog, and then MobileNetV2, a.k.a. the model URL that we're using, is going to find the patterns inside it. And then we're going, hey, we don't want the same output shape as MobileNetV2, we'd rather convert it to our own output shape, which is the number of unique labels that we have.

So now let's talk about what's going on in compile. And by the way, if you do want to dive deeper on any of this, I encourage you to search it up and try it out yourself. We're going to see the outputs of this later on, so we're going to actually see the code running. But whenever you look something up and try to figure it out, even if you don't understand it the first time, trying to figure things out for yourself is a way to really cement your knowledge. So if there's anything here you want to look up and learn more about, don't be afraid to ask questions, and don't be afraid to look it up and check out what's going on behind the scenes for yourself.
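To make that shape conversion concrete, here's a minimal, framework-free sketch of the last step: taking a feature vector (standing in for MobileNetV2's output, which is really 1000+ values) and converting it to one probability per label. The feature values, random weights, and label count are made-up numbers for illustration; in the real model this is a trainable Dense layer.

```python
import math
import random

def softmax(logits):
    # Exponentiate and normalise so the outputs sum to 1 (one probability per label)
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def output_layer(features, num_labels, seed=42):
    # A dense layer: one weighted sum of the features per label.
    # Random weights stand in for the values the model would learn during training.
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in features] for _ in range(num_labels)]
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(logits)

features = [0.2, 0.9, 0.1, 0.5]               # pretend MobileNetV2 feature output
probs = output_layer(features, num_labels=3)  # 3 stands in for our unique labels
print(probs)
```

Whatever size MobileNetV2's output is, this step squashes it down to exactly one probability per unique label, which is the output shape we actually want.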
Now, what is happening with model.compile? This is something else you'll see whenever we're building a model with Keras, and I think this one is best explained with a story. Let's say you're at the International Hill Descending Championships. That's right, you're one of the world's best at going down hills. You start standing on top of a hill, and your goal is to get to the bottom, but the catch is that you're blindfolded. Luckily, your friend Adam is standing at the bottom of the hill, shouting instructions at you on how to get down. And at the bottom of the hill there's a judge evaluating how well you're doing. They know where you need to end up, so they can compare how you're doing to where you're supposed to be, and their comparison is how you get scored, a.k.a. your accuracy at getting down the hill.

You might be wondering, Daniel, why am I at the International Hill Descending Championships? Well, let me break it down, because this is where model.compile comes into play. Let's tie the story into the terminology, because it can seem very confusing when you first begin. The loss is the height of the hill. Our model's goal is to minimize the loss, getting it to zero, a.k.a. getting to the bottom of the hill, which would mean the model is learning perfectly. Loss is a measure of how well the model is learning.
So as the model goes through the training set, comparing each image to its label, the loss is a measure of how well the model is guessing. The higher the loss, the worse the predictions are, so the worse the model is learning patterns; the lower the loss, the better the model is learning patterns. Just like you descending the hill: at the International Hill Descending Championships, the higher you are on the hill, the worse you're doing, so your goal is to get to the bottom.

And now, if we come back, let's discuss what the optimizer is in our hill story. Your friend Adam at the bottom of the hill, who's telling you how to get down, is the optimizer. He's the one telling you how to navigate the hill, a.k.a. lower the loss function (how high you are on the hill), because he can see what's going on. He can see your movements, so he's basing his instructions on what you've done so far. And your friend's name is Adam because the Adam optimizer (yes, it's actually called the Adam optimizer) is a great general optimizer which performs well on most models. If we have a look, we can search for the Adam optimizer and find articles like a gentle introduction to Adam optimization, and if we search for machine learning model optimizers, that's going to show us a few more types of optimization algorithms.
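The hill story can be sketched in a few lines of plain Python. This uses basic gradient descent (a simpler relative of Adam, so the example stays dependency-free) to walk down a one-dimensional "hill", the loss curve loss(x) = x²; the starting point and learning rate are made-up numbers for illustration.

```python
def loss(x):
    # Height of the hill: zero at the bottom (x = 0), larger the further away you are
    return x ** 2

def gradient(x):
    # Slope of the hill at x (derivative of x**2) -- the "instruction" shouted to you
    return 2 * x

x = 5.0              # start near the top of the hill, blindfolded
learning_rate = 0.1  # how big a step to take on each instruction
history = [loss(x)]
for _ in range(50):
    x -= learning_rate * gradient(x)  # step downhill, against the slope
    history.append(loss(x))

print(history[0], history[-1])  # the loss shrinks towards zero
```

Adam follows the same idea, but it also adapts the step size for each parameter based on the gradients it has seen so far, which is part of why it works well on most problems out of the box.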
So this is what's going on here. The loss function is how high you are on the hill, and you're trying to minimize that because you're at the International Hill Descending Championships. The optimizer is your friend Adam at the bottom. You could also use another optimizer, such as RMSprop or stochastic gradient descent, but generally, to begin with, Adam is pretty good on most problems. So Adam's telling you how to get down the hill, because remember, you're blindfolded: you're a model going through the training data for the first time and trying to learn patterns, or trying to get down to the bottom of the hill for the first time blindfolded.

And then finally, the metrics. This is the onlooker at the bottom of the hill, the judge evaluating your performance and telling you how you're going at the championship. In our case, it's giving us the accuracy of how well our model is predicting the correct image label.

So that's a fair bit to go through, but these are three parts of most deep learning models, basically all of them: you're going to have some sort of loss function (how well your model is guessing), an optimizer (a function that helps your model improve its guesses), and then a metric, which is a way of evaluating those guesses after it's learned.
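The judge's scorecard, the accuracy metric, can be sketched just as simply: it's the fraction of predictions that match their true labels. The label names here are made up for illustration.

```python
def accuracy(predictions, labels):
    # Fraction of predictions that exactly match their true label
    correct = sum(1 for p, l in zip(predictions, labels) if p == l)
    return correct / len(labels)

labels      = ["labrador", "poodle", "beagle", "poodle"]
predictions = ["labrador", "poodle", "poodle", "poodle"]
print(accuracy(predictions, labels))  # 3 of 4 correct -> 0.75
```

Note how the metric differs from the loss: the loss scores how confident the guesses were (and drives learning), while accuracy just reports how many guesses ended up right.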
So this is a step we're going to have to take when we're building any Keras deep learning model: we define the model as some sort of layers, and then we define how the model is going to learn. So hold onto that little story. If you have any questions about the story, or if you want to figure out which loss function to use, search up something like "what loss function should I use". "How to Choose Loss Functions When Training Deep Learning Models", beautiful, that's a great resource, or it might be; I'll leave a resource there on how you can choose a loss function. But mostly it's going to depend on what problem you're dealing with. So if you're doing binary classification, a.k.a. predicting whether something is one thing or another, such as images of cats or dogs, you would want to change your activation function to sigmoid and your loss function to binary cross-entropy, so we'd change this to binary cross-entropy. But because we're doing multi-class classification, we keep it at categorical cross-entropy, because if we come here: multi-class classification, activation softmax, loss categorical cross-entropy. Now, this is a lot to take on, so don't worry if you don't get it to begin with. But remember the whole story. Remember what the loss is.
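For a bit of intuition on those two loss functions, here's a minimal sketch of each in plain Python (the probabilities are made-up numbers for illustration). Binary cross-entropy scores a single "one thing or another" probability, while categorical cross-entropy scores one probability per class against a one-hot label; both shrink towards zero as predictions get more confidently correct.

```python
import math

def binary_cross_entropy(y_true, y_pred):
    # y_true is 0 or 1 (e.g. cat vs dog); y_pred is the predicted probability of class 1
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

def categorical_cross_entropy(y_true, y_pred):
    # y_true is a one-hot label; y_pred is one probability per class (softmax output)
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

# A confident correct guess scores a much lower loss than an unsure one
print(binary_cross_entropy(1, 0.95))  # confident and correct: small loss
print(binary_cross_entropy(1, 0.55))  # barely correct: larger loss
print(categorical_cross_entropy([0, 1, 0], [0.05, 0.90, 0.05]))  # small loss
```

That pairing is why the activation and the loss change together: sigmoid produces one probability for binary cross-entropy, softmax produces a probability per class for categorical cross-entropy.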
Remember what the optimizer is: just your friend Adam telling you how to walk down the hill. And the metrics: you could use a bunch of different metrics here. If we search for tf.keras metrics, that's going to show you some metrics you can use. Accuracy is the default one for classification, but we've got area under the curve (AUC), categorical accuracy, mean, precision, recall, a whole bunch of different options. I'll be sure to link those as well.

So with that, we've gone through what compiling the model does, and then finally we can finish off with this one: build. I think you can imagine what's happening here; it's just another little way to say, hey, this is the input shape we're going to pass to our model. We're using a Keras layer from TensorFlow Hub, and if we come back here, it says that if we want to use this layer, we set up a Sequential model and then build the model with our input shape. And it's this shape because that is the size of images that MobileNetV2 was trained on.

All right, that has been enough talking. We've broken down what our create_model function does. If you have any questions, be sure to leave them in the Q&A or in the Discord chat. But let's have a look at what's going on; well, probably, actually, in the next video we'll just quickly debrief what's going on in summary.
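Two of the metrics mentioned above, precision and recall, can be sketched just as simply, treating one class (say "labrador") as the positive class; the labels here are made up for illustration.

```python
def precision(predictions, labels, positive):
    # Of everything we predicted as the positive class, how much really was?
    predicted_pos = [l for p, l in zip(predictions, labels) if p == positive]
    return sum(1 for l in predicted_pos if l == positive) / len(predicted_pos)

def recall(predictions, labels, positive):
    # Of everything that really was the positive class, how much did we catch?
    actual_pos = [p for p, l in zip(predictions, labels) if l == positive]
    return sum(1 for p in actual_pos if p == positive) / len(actual_pos)

labels      = ["labrador", "poodle", "labrador", "beagle"]
predictions = ["labrador", "labrador", "poodle", "beagle"]
print(precision(predictions, labels, "labrador"))  # 1 of 2 predicted labradors correct -> 0.5
print(recall(predictions, labels, "labrador"))     # 1 of 2 real labradors caught -> 0.5
```

Accuracy is fine to start with, but precision and recall become important when the classes are imbalanced, for example if one dog breed is far rarer than the others.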
But the next thing we want to do is create some callbacks for our model. Callbacks are functions that implement a few little helpful things our model can do while it's training, while it's learning patterns in the data. So I'll see you in the next video.