1 00:00:00,630 --> 00:00:09,490 Now, in the last part of this project, we're going to use a pretrained model, which is named 2 00:00:09,490 --> 00:00:10,520 VGG16. 3 00:00:12,030 --> 00:00:15,390 This model performed very well in the 2014 ImageNet competition. 4 00:00:17,160 --> 00:00:20,600 However, we now have more advanced architectures also. 5 00:00:21,870 --> 00:00:30,250 But for this problem, we are going to consider a large CNN model which was trained on the ImageNet 6 00:00:30,300 --> 00:00:30,810 dataset. 7 00:00:32,070 --> 00:00:39,450 This ImageNet dataset had 1.4 million labeled images, and there were a thousand different classes 8 00:00:39,840 --> 00:00:44,020 into which the images were to be classified. 9 00:00:44,350 --> 00:00:48,960 It contained animal classes, including different species of cats and dogs. 10 00:00:49,740 --> 00:00:55,590 And thus we expect this model to perform well on the cats versus dogs classification problem 11 00:00:55,670 --> 00:00:56,100 also. 12 00:00:58,790 --> 00:01:01,770 So we'll use this VGG16 architecture. 13 00:01:02,800 --> 00:01:10,390 It was developed in 2014, and it is a simple and widely used convolutional network architecture 14 00:01:10,630 --> 00:01:12,100 for image problems. 15 00:01:14,640 --> 00:01:22,470 Although it is a little bit older and there are some more advanced and somewhat heavier 16 00:01:22,470 --> 00:01:29,160 recent models, this architecture is easier to understand, and that is why we are going to use 17 00:01:29,400 --> 00:01:31,620 this architecture for this particular problem. 18 00:01:35,370 --> 00:01:40,620 Also, this is one of the architectures which comes prepackaged with Keras. 19 00:01:41,610 --> 00:01:48,670 So we do not need to download the weights separately; Keras has these weights stored in it. 20 00:01:52,920 --> 00:01:59,130 So let's see how to use the convolutional layers of this pretrained model. 21 00:02:00,840 --> 00:02:02,410 Again, I created a new project.
22 00:02:02,670 --> 00:02:06,060 If you are working on the same project, you need not 23 00:02:06,170 --> 00:02:06,500 reload 24 00:02:07,110 --> 00:02:07,650 Keras, 25 00:02:07,800 --> 00:02:10,500 and you'll need not create these directories again. 26 00:02:11,490 --> 00:02:12,540 This is my new project 27 00:02:12,840 --> 00:02:13,590 where I'm going to do it. 28 00:02:22,920 --> 00:02:27,790 Now, to instantiate the VGG16 model, or to load it, 29 00:02:28,580 --> 00:02:31,000 we use this applications dot 30 00:02:31,180 --> 00:02:32,510 VGG16 function. 31 00:02:33,890 --> 00:02:40,070 This will load all the weights of all the layers in this VGG16 architecture 32 00:02:40,520 --> 00:02:48,890 as it was trained on the ImageNet dataset. The other parameter here is include_top; include_top refers 33 00:02:48,890 --> 00:02:54,380 to including or excluding the densely connected classifier 34 00:02:54,980 --> 00:02:56,120 on top of the network. 35 00:02:57,470 --> 00:03:07,070 So as I told you, VGG16 was actually used to classify images into a thousand classes. Since 36 00:03:07,140 --> 00:03:07,980 in our problem 37 00:03:08,150 --> 00:03:09,410 we have only two classes, 38 00:03:10,460 --> 00:03:14,060 the densely connected classifier is not required. 39 00:03:14,450 --> 00:03:18,590 So we'll have our own densely connected layer at the end of the model. 40 00:03:19,430 --> 00:03:23,690 We are just going to use the convolutional part of this model. 41 00:03:26,120 --> 00:03:31,850 The input_shape parameter is an optional parameter by which we are telling that the images that we are going to 42 00:03:31,880 --> 00:03:33,560 input are of this shape: 43 00:03:33,800 --> 00:03:37,130 150 by 150 pixels with three channels. 44 00:03:38,060 --> 00:03:44,630 However, if we do not give this input_shape, it will still accept shapes of any dimensions. 45 00:03:46,780 --> 00:03:49,660 So let's load the convolutional weights. 46 00:03:57,360 --> 00:03:57,710 Now, the weights are
47 00:03:58,290 --> 00:03:58,670 downloaded. 48 00:04:00,440 --> 00:04:02,180 It took nearly five to six minutes for me. 49 00:04:03,400 --> 00:04:08,990 Now, since the data is downloaded, we can look at how this convolutional base is structured. 50 00:04:10,220 --> 00:04:11,450 Let's run this 51 00:04:12,510 --> 00:04:16,110 and look at the architecture of this model. 52 00:04:18,090 --> 00:04:21,370 VGG16 is a big architecture. 53 00:04:22,620 --> 00:04:26,630 We have already discussed the entire architecture in an earlier lecture. 54 00:04:28,130 --> 00:04:31,560 It takes in an input of 150 by 150 by 3. 55 00:04:31,680 --> 00:04:34,080 This is the input shape that we have specified. 56 00:04:36,300 --> 00:04:41,510 Then there is a convolutional layer which is giving out 64 feature maps. 57 00:04:42,330 --> 00:04:45,710 Then there is another convolutional layer with 64 feature maps. 58 00:04:46,910 --> 00:04:52,080 Then there is a pooling layer, which is doing a two by two pooling, 59 00:04:52,800 --> 00:04:54,860 that is, reducing the height and width by two. 60 00:04:56,130 --> 00:04:57,660 Then there are two convolutional layers, 61 00:04:57,960 --> 00:04:59,250 again with a max pooling. 62 00:05:00,420 --> 00:05:04,080 Then again, three convolutional layers with one max pooling. 63 00:05:04,650 --> 00:05:07,330 Then three convolutional layers with one max pooling. 64 00:05:08,040 --> 00:05:11,650 And then there are three more convolutional layers with one max pooling. 65 00:05:12,750 --> 00:05:19,750 So at the end of it, we have 512 feature maps of four by four size. 66 00:05:20,560 --> 00:05:26,730 So it is a very small four by four matrix, with 512 features extracted. 67 00:05:29,660 --> 00:05:37,460 Overall, there are 14.7 million parameters. On the laptops that we use for day-to-day 68 00:05:37,460 --> 00:05:41,060 work, training such a model would not have been possible.
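The loading and summary steps described above can be sketched as follows. This is a minimal sketch with tf.keras; note that `weights=None` is used here only so the snippet builds the architecture without any download — in the lecture, `weights='imagenet'` is passed, which fetches the pretrained ImageNet weights on first use.

```python
from tensorflow.keras.applications import VGG16

# In the lecture weights='imagenet' is used, which downloads the
# pretrained ImageNet weights; weights=None builds the same
# architecture with random weights so this sketch runs offline.
conv_base = VGG16(weights=None,        # 'imagenet' for pretrained weights
                  include_top=False,   # drop the 1000-class dense classifier
                  input_shape=(150, 150, 3))

conv_base.summary()

# Five 2x2 max-pooling stages halve the spatial size each time:
# 150 -> 75 -> 37 -> 18 -> 9 -> 4, leaving 4 x 4 x 512 feature maps.
print(conv_base.output_shape)    # (None, 4, 4, 512)
print(conv_base.count_params())  # 14714688 parameters in the conv base
```

The parameter and shape counts match the 14.7 million parameters and the four-by-four, 512-feature output discussed above.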
69 00:05:42,230 --> 00:05:51,050 That is why, for routine jobs, using a pretrained model makes more sense than creating a new model 70 00:05:51,410 --> 00:05:54,770 with so many parameters and weights to be trained. 71 00:05:54,860 --> 00:06:01,600 In fact, if you are not running a GPU system, training such a model would not be possible on a 72 00:06:01,600 --> 00:06:02,360 CPU-based system. 73 00:06:04,520 --> 00:06:08,500 Now, this is only the convolutional base. After the base, 74 00:06:08,540 --> 00:06:12,020 we need to attach a fully connected classifier. 75 00:06:13,280 --> 00:06:17,690 That is, there should be a few layers of a normal neural network, 76 00:06:18,440 --> 00:06:24,680 and the final output layer should have only one neuron with a sigmoid activation function, 77 00:06:24,950 --> 00:06:28,940 telling us which binary class that image belongs to. 78 00:06:30,150 --> 00:06:34,680 So the output of this convolutional base is flattened. 79 00:06:36,660 --> 00:06:38,010 So we have a flattening layer. 80 00:06:39,460 --> 00:06:43,480 Then we have a hidden layer with 256 neurons 81 00:06:46,840 --> 00:06:53,170 and activation function relu. The output of these 256 neurons goes into only one neuron. 82 00:06:53,290 --> 00:06:58,930 And this neuron has a sigmoid activation function, which basically tells us the probability of belonging 83 00:06:58,930 --> 00:07:00,430 to one class. 84 00:07:02,910 --> 00:07:05,130 So we've run this also. 85 00:07:05,870 --> 00:07:13,380 And our model is now the entire convolutional base, which is the VGG16, 86 00:07:14,950 --> 00:07:16,660 plus the three layers that we have added. 87 00:07:19,130 --> 00:07:25,970 But note that all these 16.8 million parameters are right now trainable parameters. 88 00:07:27,610 --> 00:07:34,280 What we are going to do is we are going to fix the weights of the convolutional base. 89 00:07:34,610 --> 00:07:37,700 We are not going to retrain the convolutional base.
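The three layers described above can be attached on top of the convolutional base like this — a sketch under the same assumption as before, with `weights=None` standing in for the downloaded ImageNet weights:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# weights=None stands in for weights='imagenet' so the sketch runs offline.
conv_base = VGG16(weights=None, include_top=False, input_shape=(150, 150, 3))

model = models.Sequential([
    conv_base,                              # VGG16 convolutional base
    layers.Flatten(),                       # 4*4*512 -> 8192 values
    layers.Dense(256, activation='relu'),   # hidden layer: 8192*256 + 256 params
    layers.Dense(1, activation='sigmoid'),  # probability of belonging to one class
])

# 14,714,688 (base) + 2,097,408 + 257 (new head) = 16,812,353 parameters,
# all of them trainable at this point.
print(model.count_params())  # 16812353
```

This matches the 16.8 million trainable parameters noted above.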
90 00:07:38,590 --> 00:07:45,030 We are assuming that the convolutional base of the VGG model, since it was also trained on a similar 91 00:07:45,140 --> 00:07:48,530 dataset, which had cat breeds and dog breeds, 92 00:07:49,040 --> 00:07:51,800 and it was actually working on a thousand different classes, 93 00:07:52,780 --> 00:08:00,470 the kind of features that it was taking out of the images, the same features could be used to just 94 00:08:00,470 --> 00:08:02,150 distinguish between cats and dogs. 95 00:08:03,590 --> 00:08:11,080 So the idea is: the convolutional layers are giving us the features, the same features which were identified 96 00:08:11,680 --> 00:08:18,980 by VGG on the ImageNet dataset, and the same features can be used for our dogs versus cats classifier. 97 00:08:20,320 --> 00:08:25,060 Those features will be input into our own neural network 98 00:08:25,150 --> 00:08:25,750 at the end, 99 00:08:26,240 --> 00:08:28,590 and only that neural network has to be trained. 100 00:08:29,660 --> 00:08:34,450 So to freeze these weights in the convolutional base, we use this function to 101 00:08:34,520 --> 00:08:34,800 freeze the 102 00:08:34,900 --> 00:08:35,340 weights. 103 00:08:37,030 --> 00:08:45,810 And once we have done this, the weights of the neurons in the convolutional base, that is, in this VGG16 model, 104 00:08:45,820 --> 00:08:46,750 will be fixed. 105 00:08:47,460 --> 00:08:51,080 It had 14.7 million parameters that were to be trained. 106 00:08:52,360 --> 00:08:59,050 Now, if we look at our model, only these two million parameters remain which are to be trained. The 107 00:08:59,200 --> 00:09:03,250 other 14.7 million parameters, which were part of VGG16, 108 00:09:03,850 --> 00:09:06,700 are shown as non-trainable parameters. 109 00:09:06,940 --> 00:09:07,690 These are fixed. 110 00:09:10,870 --> 00:09:15,430 So with that, our model is ready; the architecture is set.
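In standard Keras, freezing the base comes down to setting its `trainable` attribute to `False` before compiling; the exact helper used in the lecture may be named differently, but the effect is the same. A sketch, again with `weights=None` standing in for the ImageNet weights:

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

conv_base = VGG16(weights=None, include_top=False, input_shape=(150, 150, 3))
model = models.Sequential([
    conv_base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])

conv_base.trainable = False  # freeze all 14.7M weights of the VGG16 base

# Count trainable vs frozen parameters by summing weight-tensor sizes.
trainable = int(sum(np.prod(tuple(w.shape)) for w in model.trainable_weights))
frozen = int(sum(np.prod(tuple(w.shape)) for w in model.non_trainable_weights))
print(trainable)  # 2097665  -- only the new dense layers will train
print(frozen)     # 14714688 -- the VGG16 base stays fixed
```

These are the roughly two million trainable and 14.7 million non-trainable parameters the summary reports after freezing.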
111 00:09:16,300 --> 00:09:24,730 We just have to repeat whatever we did earlier. Now we have the train datagen, which will be giving images in 112 00:09:24,730 --> 00:09:26,140 batches of 20. 113 00:09:27,520 --> 00:09:34,300 It is also doing the data augmentation; that is, new artificial images will be created by randomly changing these 114 00:09:34,300 --> 00:09:43,060 parameters: giving it a rotation of some degrees, shifting it by some pixels, zooming in or 115 00:09:43,070 --> 00:09:44,320 zooming out, and so on. 116 00:09:47,120 --> 00:09:55,210 So this is what we did earlier. Then we feed this train datagen into the train generator, 117 00:09:55,540 --> 00:09:58,000 which will give us images in batches of 20. 118 00:10:01,360 --> 00:10:08,860 Then we define this test datagen, which will only rescale the test images by 1/255. 119 00:10:10,570 --> 00:10:15,850 And we use it in the validation generator, which will do the same for validation images. 120 00:10:17,800 --> 00:10:18,910 Then we compile the model. 121 00:10:19,480 --> 00:10:24,150 In compiling, the loss function is again going to be binary cross-entropy. 122 00:10:25,210 --> 00:10:31,960 However, in the optimizer, we again have the RMSprop optimizer, but we are using a very small 123 00:10:31,960 --> 00:10:33,700 learning rate in this model. 124 00:10:35,140 --> 00:10:39,190 So earlier we had 10 to the minus 4 as the learning rate. 125 00:10:39,790 --> 00:10:44,110 Now we have reduced that even further to 2 into 10 to the minus 5. 126 00:10:44,940 --> 00:10:51,820 This is because we want to take very small steps in the direction of the optimum. The metric that we are 127 00:10:51,820 --> 00:10:52,420 going to monitor 128 00:10:52,460 --> 00:10:53,440 is accuracy. 129 00:10:53,950 --> 00:10:59,380 So we compile the model. Note that once you have compiled, 130 00:10:59,560 --> 00:11:00,890 do not change the architecture.
131 00:11:01,270 --> 00:11:06,100 Otherwise, these parameters that we froze will not be frozen anymore. 132 00:11:06,460 --> 00:11:12,880 So if you change the architecture, you have to freeze these parameters again and then compile again. 133 00:11:14,070 --> 00:11:17,620 So compiling should be the last step before you train your model. 134 00:11:19,910 --> 00:11:23,810 Now we are going to train our model. For this, 135 00:11:23,900 --> 00:11:29,830 we are again going to use the train generator, which will input 20 training images at each step. 136 00:11:30,040 --> 00:11:37,030 Steps per epoch is set to 100, which means we are going to use 2000 images in one epoch. 137 00:11:37,510 --> 00:11:37,940 Twenty 138 00:11:38,180 --> 00:11:38,960 into one hundred, 139 00:11:39,230 --> 00:11:40,550 so 2000 images 140 00:11:40,960 --> 00:11:42,970 per epoch will be input 141 00:11:43,250 --> 00:11:47,710 for training. Epochs is set to 30. Again, 142 00:11:48,080 --> 00:11:54,500 if we do not see the solution converging, we can run this model for a greater number of epochs. 143 00:11:57,120 --> 00:12:02,940 Validation data is the validation generator, which will again give validation images in batches of 20. 144 00:12:04,170 --> 00:12:11,100 Validation steps is 50, because we are going to use 1000 images for validation purposes. 145 00:12:12,630 --> 00:12:19,800 So, again, I'm going to warn you that it is very important that this be run on your computer 146 00:12:20,130 --> 00:12:23,160 only if your computer has sufficient computational power. 147 00:12:23,850 --> 00:12:28,790 So if you're running TensorFlow on a GPU, or you have good RAM with your CPU, 148 00:12:29,310 --> 00:12:31,950 only then will this model be able to train. 149 00:12:32,490 --> 00:12:37,020 Otherwise, it is going to take an awfully long time and you won't be able to get the results. 150 00:12:37,650 --> 00:12:39,060 So run this only
151 00:12:39,450 --> 00:12:41,670 if you have good computational power in your computer.
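The compile and fit settings walked through above can be sketched as follows. The data-directory paths and generator objects come from the earlier parts of the project, so the actual training call is shown only as a comment; the snippet itself demonstrates the compile settings on a tiny stand-in model and checks the epoch arithmetic (20 images per batch times the step counts).

```python
from tensorflow.keras import layers, models, optimizers

# Tiny stand-in model: only the compile settings matter for this sketch.
model = models.Sequential([layers.Dense(1, activation='sigmoid')])

model.compile(loss='binary_crossentropy',  # two-class problem
              optimizer=optimizers.RMSprop(learning_rate=2e-5),  # very small steps
              metrics=['accuracy'])

batch_size = 20
steps_per_epoch = 100   # 20 * 100 = 2000 training images per epoch
validation_steps = 50   # 20 * 50  = 1000 validation images
print(batch_size * steps_per_epoch)   # 2000
print(batch_size * validation_steps)  # 1000

# The actual training call, with the generators built in earlier lessons:
# history = model.fit(train_generator,
#                     steps_per_epoch=100, epochs=30,
#                     validation_data=validation_generator,
#                     validation_steps=50)
```

The 2 x 10^-5 learning rate and the 2000-image epochs match the settings described in the lecture.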