Now we've got a beautiful function to create a model for us, and we've actually used it already: we've instantiated a model and got the summary. You might be looking at this like, what's going on here? So let's go through it step by step and figure out what's actually happening.

First of all, we instantiate a model, so we set up the model layers using `model = tf.keras.Sequential`. What's happening here? A linear stack of layers. When it says that, what it actually means is that it's just a stack of layers that's going to take some sort of input, find patterns in that input, and then produce some sort of output. That's what the linear stack of layers does.

So if we look here, what's the first layer? It's going to run from top to bottom. Here we go... here's another Colab error. You might see this; hopefully it fixes itself, otherwise we can fix it later. The first layer is `hub.KerasLayer(MODEL_URL)`. What this is saying is that it's telling TensorFlow Hub to create a Keras layer. So what we're doing is using Keras to build our deep learning model, and we're telling it to create a Keras layer from MODEL_URL, which is our MobileNetV2 architecture. Now, you might be wondering what's going on with MobileNetV2, and why we only have two layers.
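To build a bit of intuition for what "a linear stack of layers" means, here's a conceptual sketch in plain NumPy (not the actual Keras code; the layer functions and weights here are made up for illustration). A Sequential model is just function composition: the output of each layer becomes the input of the next.

```python
import numpy as np

def feature_extractor(x):
    # Stand-in for the MobileNetV2 hub layer: flatten the image and
    # pretend the first 1280 values are the learned features.
    flat = x.reshape(x.shape[0], -1)   # (batch, 224*224*3)
    return flat[:, :1280]              # (batch, 1280)

def dense_output(features):
    # Stand-in for the Dense output layer: 1280 features -> 120 labels.
    w = np.ones((1280, 120)) * 1e-3    # made-up weights
    return features @ w                # (batch, 120)

# A Sequential model just runs its layers top to bottom.
layers = [feature_extractor, dense_output]
x = np.ones((1, 224, 224, 3))          # one fake 224x224 colour image
for layer in layers:
    x = layer(x)
print(x.shape)  # (1, 120)
```

The real model works the same way, except the two layers are `hub.KerasLayer(MODEL_URL)` and `tf.keras.layers.Dense(...)`, and their weights are learned rather than made up.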
Well, let's check that out. Let's search for "MobileNetV2 architecture"... so here we go: a review of MobileNetV2, a lightweight model for image classification. This is what I do when I'm trying to figure out what's going on with a model I'm using. Maybe I've found a model on TensorFlow Hub, or maybe I've seen a tutorial using some sort of model, and I'm trying to figure out what's happening.

So we've got MobileNetV2 convolutional blocks. If you want to check that out, you might want to look it up; "conv" is short for convolution, so you might search "what is a convolutional neural network". I'll leave a great resource for you to check that out, but here's essentially what's happening. That's a good image, actually... oh, another Medium article, beautiful. So here's what's happening: we have an input image, and MobileNetV2 has a whole bunch of these types of layers built into it. They're going to look at the input image, do some data transformations on it, and then output something, a.k.a. a list of numbers.
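To give a rough idea of what a single convolution does (a toy NumPy sketch, not how MobileNetV2 actually implements it), a convolution slides a small filter over the image and produces a large value wherever the image matches the filter's pattern:

```python
import numpy as np

image = np.array([
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
])  # a tiny image with a vertical line down the middle

kernel = np.array([
    [-1, 1, -1],
    [-1, 1, -1],
    [-1, 1, -1],
])  # a "vertical line" detector

# Slide the 3x3 kernel over every 3x3 patch of the image.
h, w = kernel.shape
out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        out[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)

print(out)  # biggest values where the filter lines up with the line
```

The middle column of the output is 3 (strong match) and the sides are -3 (no match): the filter has "found" the vertical line. MobileNetV2 stacks many layers of learned filters like this.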
So this is what our job is. If we come back to the keynote: we've prepared our inputs, we're passing them to the machine learning algorithm, in our case MobileNetV2, and it's going to output something. The beautiful thing about transfer learning is that all of these parts are taken care of for us by MobileNetV2; we've defined our inputs, and this is what we want at the end: some sort of output.

Now let's come back to this review of MobileNetV2, and it's okay if you don't fully understand what's going on here. This is part of the experimentation of deep learning and machine learning: figuring out what's going on, what's happening to our data when we transform it. Our first focus is always to write working code, and then to figure out what's happening after that; if we want to dive deeper, we can.

So the overall architecture of MobileNetV2: it takes an input of size 224 x 224 x 3, a.k.a. the height and width of our images times the number of colour channels. It does a whole bunch of transformations, like this, okay, and then finally it outputs a single list of numbers of size 1280. That's the last layer. Now, you'll see this kind of overall architecture diagram in papers and whatnot; hopefully, if it's a good paper, they've included it, and if it's an even better paper, they've included code. And you'll see this, and you'll be wondering, what is this?
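Just to put those sizes into plain arithmetic: each input image is 224 x 224 x 3 numbers, and MobileNetV2 condenses all of that into a feature vector of 1280 numbers.

```python
# Input size: height x width x colour channels.
height, width, colour_channels = 224, 224, 3
input_values = height * width * colour_channels
print(input_values)  # 150528 numbers per image going in

# Output size: MobileNetV2's feature vector.
feature_vector = 1280  # numbers per image coming out
```

So over 150,000 raw pixel values get compressed into 1,280 learned features.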
Well, it's something like this. And again, you'll be wondering, what is this? So let's have a look at what it might look like in NumPy. If we go `outputs = np.ones(shape=(1, 1, 1280))` and then `outputs`, it's just going to be a long list of different numbers, but 1280 of them. We don't actually need that many. This is where our Dense layer comes in; this is where our output is coming from. We want our outputs in the shape of 120, because that's how many labels we have.

So let's put this all together. We have an input of an image; MobileNetV2 is going to go through it for us, find all the patterns, and then condense them into one long array of numbers, 1280 long. But this is the beautiful thing about transfer learning: because all of this has been implemented for us, all we have to do is say, hey, actually, instead of that 1280, we want our output to be in the shape of however many labels we have.

And you might be wondering now, okay, what does Dense do? Let's look at the docstring: "Just your regular densely-connected NN layer." Well, thanks, docstring, that actually doesn't make too much sense. But you might also be wondering what the activation is: softmax. Let's check that out. What is softmax?
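As a rough sketch of what a Dense layer does under the hood (plain NumPy with random stand-in weights; a real Dense layer learns its weights during training), it's essentially a matrix multiply plus a bias that maps the 1280 features down to 120 label scores:

```python
import numpy as np

rng = np.random.default_rng(42)

features = np.ones((1, 1280))           # MobileNetV2's output for one image
weights = rng.normal(size=(1280, 120))  # learned in training (random here)
bias = np.zeros(120)

# Dense layer: one weighted sum of all 1280 features per label.
label_scores = features @ weights + bias
print(label_scores.shape)  # (1, 120) -- one score per dog breed label
```

That's all "densely connected" means: every one of the 1280 inputs is connected, via a weight, to every one of the 120 outputs.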
This is where you're going to get a whole bunch of mathematical jargon, but let's just go to Wikipedia. Now, if you read that first line, it might sound pretty confusing, unless you've got a mathematics degree or something like that. But the main thing here is that after applying softmax, each component will be in the interval [0, 1], and the components will add up to 1. We'll see this later on, but that's all you have to know for now.

And if you're wondering when to use softmax, i.e. which activation (and which loss function) to use: if we're working with binary classification, the activation function is sigmoid, so we come here, sigmoid; and if we're working with multi-class classification, the activation is softmax.

So that is what's happening in here: we're creating a Keras model, it's going to run in sequential fashion, and the first layer it's going to call is MODEL_URL, which is actually the MobileNetV2 architecture, which has been implemented for us. Within that MobileNetV2 architecture there's going to be a series of convolutions, which are going to find patterns in our input images and learn the features of those images. If you're wondering what a feature is, let's come up to an image. Let's take this one, for example: a convolution may look at each pixel in this image and go, okay, there's a vertical line here, there's a circle here, there's a circle here, there's a horizontal line here.
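Here's a minimal NumPy softmax (just a sketch of the idea; it uses the common max-subtraction trick so `np.exp` doesn't overflow on large scores) showing exactly the two properties mentioned above:

```python
import numpy as np

def softmax(scores):
    # Subtracting the max doesn't change the result but keeps exp stable.
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])  # raw label scores from a Dense layer
probs = softmax(scores)

print(probs)        # every component is between 0 and 1
print(probs.sum())  # and they all add up to 1
```

Notice the biggest raw score gets the biggest probability: softmax just squashes the scores into something we can read as "how confident is the model in each label".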
Now, the thing is, we don't tell the model which of these features to learn; it figures them out on its own. So that's the important takeaway there. That's the whole premise of machine learning: it figures out the patterns in these images for us.

So, coming back: once it's gone through the MobileNetV2 architecture, it's going to output a single array of size 1280 with all of the patterns it's learned in an image. But we want to tell it, hey, no, we don't need 1280 patterns; we need 120. And then we want to use the softmax activation to convert those patterns into numbers between 0 and 1. So all of the outputs, it'll be an array of 120, all of those will add up to one, and the highest value tells us which one is our label.

Now, that was a lot to take in, but over the next few videos we're going to break it down even more. In the next one, we're going to go through what's happening when we compile a model. So I'll see you there.
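Putting that last step into code (a toy sketch with three made-up breed names instead of the full 120): the index of the highest probability in the softmax output tells us which label the model is predicting.

```python
import numpy as np

labels = ["beagle", "poodle", "pug"]  # stand-ins for the 120 breed labels
probs = np.array([0.1, 0.7, 0.2])     # softmax output: sums to 1

# The highest probability wins: its index picks the predicted label.
predicted = labels[int(np.argmax(probs))]
print(predicted)  # poodle
```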