Now we've got a beautiful function to create a model for us, and we've actually used it already: we've instantiated a model and got the summary. You might be looking at this like, what's going on here? So let's go through it step by step and figure out what's actually happening.

First of all, we instantiate a model, so we set up the model layers using `model = tf.keras.Sequential`. What's happening here? A linear stack of layers. When it says that, what it actually means is that it's just a stack of layers that's going to take some sort of input, find patterns in that input, and then produce some sort of output. That's what the linear stack of layers does.

So if we look here, what's the first layer? It's going to run from top to bottom. Here we go... here's another Colab error. You might see this; hopefully it fixes itself, otherwise we can fix it later. The first layer is `hub.KerasLayer(MODEL_URL)`. What this is saying is that it's telling TensorFlow Hub to create a Keras layer. So what we're doing is using Keras to build our deep learning model, and we're telling it to create a Keras layer from MODEL_URL, which is our MobileNetV2 architecture. Now, you might be wondering what's going on with MobileNetV2, and why we only have two layers.
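To build a bit of intuition for what "a linear stack of layers" means, here's a conceptual sketch in plain NumPy (not the actual Keras code; the layer functions and weights here are made up for illustration). A Sequential model is just function composition: the output of each layer becomes the input of the next.

```python
import numpy as np

def feature_extractor(x):
    # Stand-in for the MobileNetV2 hub layer: flatten the image and
    # pretend the first 1280 values are the learned features.
    flat = x.reshape(x.shape[0], -1)   # (batch, 224*224*3)
    return flat[:, :1280]              # (batch, 1280)

def dense_output(features):
    # Stand-in for the Dense output layer: 1280 features -> 120 labels.
    w = np.ones((1280, 120)) * 1e-3    # made-up weights
    return features @ w                # (batch, 120)

# A Sequential model just runs its layers top to bottom.
layers = [feature_extractor, dense_output]
x = np.ones((1, 224, 224, 3))          # one fake 224x224 colour image
for layer in layers:
    x = layer(x)
print(x.shape)  # (1, 120)
```

The real model works the same way, except the two layers are `hub.KerasLayer(MODEL_URL)` and `tf.keras.layers.Dense(...)`, and their weights are learned rather than made up.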
Well, let's check that out. Let's search for "MobileNetV2 architecture"... so here we go: a review of MobileNetV2, a lightweight model for image classification. This is what I do when I'm trying to figure out what's going on with a model I'm using. Maybe I've found a model on TensorFlow Hub, or maybe I've seen a tutorial using some sort of model, and I'm trying to figure out what's happening.

So we've got MobileNetV2 convolutional blocks. If you want to check that out, you might want to look it up; "conv" is short for convolution, so you might search "what is a convolutional neural network". I'll leave a great resource for you to check that out, but here's essentially what's happening. That's a good image, actually... oh, another Medium article, beautiful. So here's what's happening: we have an input image, and MobileNetV2 has a whole bunch of these types of layers built into it. They're going to look at the input image, do some data transformations on it, and then output something, a.k.a. a list of numbers.
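To give a rough idea of what a single convolution does (a toy NumPy sketch, not how MobileNetV2 actually implements it), a convolution slides a small filter over the image and produces a large value wherever the image matches the filter's pattern:

```python
import numpy as np

image = np.array([
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
])  # a tiny image with a vertical line down the middle

kernel = np.array([
    [-1, 1, -1],
    [-1, 1, -1],
    [-1, 1, -1],
])  # a "vertical line" detector

# Slide the 3x3 kernel over every 3x3 patch of the image.
h, w = kernel.shape
out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        out[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)

print(out)  # biggest values where the filter lines up with the line
```

The middle column of the output is 3 (strong match) and the sides are -3 (no match): the filter has "found" the vertical line. MobileNetV2 stacks many layers of learned filters like this.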
So this is what our job is. If we come back to the keynote: we've prepared our inputs, we're passing them to the machine learning algorithm, in our case MobileNetV2, and it's going to output something. The beautiful thing about transfer learning is that all of these parts are taken care of for us by MobileNetV2; we've defined our inputs, and this is what we want at the end: some sort of output.

Now let's come back to this review of MobileNetV2, and it's okay if you don't fully understand what's going on here. This is part of the experimentation of deep learning and machine learning: figuring out what's going on, what's happening to our data when we transform it. Our first focus is always to write working code, and then to figure out what's happening after that; if we want to dive deeper, we can.

So the overall architecture of MobileNetV2: it takes an input of size 224 x 224 x 3, a.k.a. the height and width of our images times the number of colour channels. It does a whole bunch of transformations, like this, okay, and then finally it outputs a single list of numbers of size 1280. That's the last layer. Now, you'll see this kind of overall architecture diagram in papers and whatnot; hopefully, if it's a good paper, they've included it, and if it's an even better paper, they've included code. And you'll see this, and you'll be wondering, what is this?
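Just to put those sizes into plain arithmetic: each input image is 224 x 224 x 3 numbers, and MobileNetV2 condenses all of that into a feature vector of 1280 numbers.

```python
# Input size: height x width x colour channels.
height, width, colour_channels = 224, 224, 3
input_values = height * width * colour_channels
print(input_values)  # 150528 numbers per image going in

# Output size: MobileNetV2's feature vector.
feature_vector = 1280  # numbers per image coming out
```

So over 150,000 raw pixel values get compressed into 1,280 learned features.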
Well, it's something like this. And again, you'll be wondering, what is this? So let's have a look at what it might look like in NumPy. If we go `outputs = np.ones(shape=(1, 1, 1280))` and then `outputs`, it's just going to be a long list of different numbers, but 1280 of them. We don't actually need that many. This is where our Dense layer comes in; this is where our output is coming from. We want our outputs in the shape of 120, because that's how many labels we have.

So let's put this all together. We have an input of an image; MobileNetV2 is going to go through it for us, find all the patterns, and then condense them into one long array of numbers, 1280 long. But this is the beautiful thing about transfer learning: because all of this has been implemented for us, all we have to do is say, hey, actually, instead of that 1280, we want our output to be in the shape of however many labels we have.

And you might be wondering now, okay, what does Dense do? Let's look at the docstring: "Just your regular densely-connected NN layer." Well, thanks, docstring, that actually doesn't make too much sense. But you might also be wondering what the activation is: softmax. Let's check that out. What is softmax?
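As a rough sketch of what a Dense layer does under the hood (plain NumPy with random stand-in weights; a real Dense layer learns its weights during training), it's essentially a matrix multiply plus a bias that maps the 1280 features down to 120 label scores:

```python
import numpy as np

rng = np.random.default_rng(42)

features = np.ones((1, 1280))           # MobileNetV2's output for one image
weights = rng.normal(size=(1280, 120))  # learned in training (random here)
bias = np.zeros(120)

# Dense layer: one weighted sum of all 1280 features per label.
label_scores = features @ weights + bias
print(label_scores.shape)  # (1, 120) -- one score per dog breed label
```

That's all "densely connected" means: every one of the 1280 inputs is connected, via a weight, to every one of the 120 outputs.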
This is where you're going to get a whole bunch of mathematical jargon, but let's just go to Wikipedia. Now, if you read that first line, it might sound pretty confusing, unless you've got a mathematics degree or something like that. But the main thing here is that after applying softmax, each component will be in the interval [0, 1], and the components will add up to 1. We'll see this later on, but that's all you have to know for now.

And if you're wondering when to use softmax, i.e. which activation (and which loss function) to use: if we're working with binary classification, the activation function is sigmoid, so we come here, sigmoid; and if we're working with multi-class classification, the activation is softmax.

So that is what's happening in here: we're creating a Keras model, it's going to run in sequential fashion, and the first layer it's going to call is MODEL_URL, which is actually the MobileNetV2 architecture, which has been implemented for us. Within that MobileNetV2 architecture there's going to be a series of convolutions, which are going to find patterns in our input images and learn the features of those images. If you're wondering what a feature is, let's come up to an image. Let's take this one, for example: a convolution may look at each pixel in this image and go, okay, there's a vertical line here, there's a circle here, there's a circle here, there's a horizontal line here.
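Here's a minimal NumPy softmax (just a sketch of the idea; it uses the common max-subtraction trick so `np.exp` doesn't overflow on large scores) showing exactly the two properties mentioned above:

```python
import numpy as np

def softmax(scores):
    # Subtracting the max doesn't change the result but keeps exp stable.
    exps = np.exp(scores - np.max(scores))
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])  # raw label scores from a Dense layer
probs = softmax(scores)

print(probs)        # every component is between 0 and 1
print(probs.sum())  # and they all add up to 1
```

Notice the biggest raw score gets the biggest probability: softmax just squashes the scores into something we can read as "how confident is the model in each label".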
Now, the thing is, we don't tell the model which of these features to learn; it figures them out on its own. So that's the important takeaway there. That's the whole premise of machine learning: it figures out the patterns in these images for us.

So, coming back: once it's gone through the MobileNetV2 architecture, it's going to output a single array of size 1280 with all of the patterns it's learned in an image. But we want to tell it, hey, no, we don't need 1280 patterns; we need 120. And then we want to use the softmax activation to convert those patterns into numbers between 0 and 1. So all of the outputs, it'll be an array of 120, all of those will add up to one, and the highest value tells us which one is our label.

Now, that was a lot to take in, but over the next few videos we're going to break it down even more. In the next one, we're going to go through what's happening when we compile a model. So I'll see you there.
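Putting that last step into code (a toy sketch with three made-up breed names instead of the full 120): the index of the highest probability in the softmax output tells us which label the model is predicting.

```python
import numpy as np

labels = ["beagle", "poodle", "pug"]  # stand-ins for the 120 breed labels
probs = np.array([0.1, 0.7, 0.2])     # softmax output: sums to 1

# The highest probability wins: its index picks the predicted label.
predicted = labels[int(np.argmax(probs))]
print(predicted)  # poodle
```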