Now, we are going to start with a complete end-to-end project. In this project, we will try to classify colored images of cats and dogs.

We take up a dataset from Kaggle. Kaggle is a website where a lot of data science competitions are held. There was a competition held in 2013 in which thousands of images of cats and dogs were given, and a model was to be built to classify those images into cats and dogs. The best accuracy achieved in that competition was nearly ninety-eight percent. We are going to use a subset of that data, try to build our own model, and really try to achieve over 90 percent accuracy with it.

Here are some of the details of this project. This is a binary classification problem, unlike Fashion MNIST, in which there were 10 categories to be predicted. Here we have only two: either the image is of a cat or it is of a dog. So there are only two classes; that is why it is a binary classification problem.

Then, this is a dataset of coloured images. That is, we will have three channels, RGB, instead of only one channel as we had in Fashion MNIST. Then, we do not have a standard dimension for all these images. As you saw in the previous project, we were using grayscale 28 by 28 pixel images.
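The shape difference is easy to see in NumPy. Here is a minimal sketch; the 499x375 colour size is just an illustrative example, since the photos in this dataset all have different dimensions:

```python
import numpy as np

# A Fashion MNIST-style image: grayscale, so a single channel,
# and every image has the same fixed 28x28 size.
gray = np.zeros((28, 28), dtype=np.uint8)

# A colour photo from a cats-vs-dogs-style dataset: three channels
# (R, G, B), and the height/width vary from photo to photo.
color = np.zeros((499, 375, 3), dtype=np.uint8)

print(gray.shape)        # (28, 28)
print(color.shape)       # (499, 375, 3)
print(color.shape[-1])   # 3 -> the RGB channels
```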
But here our dataset does not have one standard dimension. So when we are feeding the data to our model, we will have to convert the images to one standard dimension. So that is one additional step.

Then, we are using a Kaggle dataset. If you are interested, you can go to the Kaggle website and see this Cats vs. Dogs competition. You can also see the leaderboard there: how much accuracy people have achieved. And you can compare your model with other people's models.

And the last point is, we are going to use a subset of the total data. The total data had over 50,000 images. In our model, we are going to use only 4,000 images: 2,000 of them to train, 1,000 for the validation dataset and 1,000 for testing. So using only this small part of the data, we are still going to achieve accuracies which are comparable to the other models built by people in the competition.

So here is how we have structured the data. The zip file that you download from the link that we have provided has 4,000 images, and those images are structured in this format. The first folder will have three folders inside of it. These three folders will be titled train, valid and test. The train folder will further have two folders. These folders will be cats and dogs.
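That extra resizing step can be sketched in plain NumPy. In practice you would let Keras or Pillow do the resizing, and the 150x150 target size is only an assumed choice, but a nearest-neighbour version shows the idea of mapping every photo to one standard dimension:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Resize an (H, W, C) image to (out_h, out_w, C) by nearest-neighbour
    sampling: each output pixel copies the closest input pixel."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # which input row each output row reads
    cols = np.arange(out_w) * w // out_w   # which input column each output column reads
    return img[rows[:, None], cols[None, :]]

# Two photos with different sizes, both mapped to one standard 150x150 shape.
a = np.random.randint(0, 256, (499, 375, 3), dtype=np.uint8)
b = np.random.randint(0, 256, (320, 480, 3), dtype=np.uint8)
print(resize_nearest(a, 150, 150).shape)  # (150, 150, 3)
print(resize_nearest(b, 150, 150).shape)  # (150, 150, 3)
```

Once every image has the same shape, they can be stacked into one batch array and fed to the model.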
So class A here is cats and class B is dogs. In this folder, we will have a thousand images of cats, and in this folder we will have a thousand images of dogs. Similarly, in the validation dataset, there will be two folders, one containing 500 images of cats, the other containing 500 images of dogs. In the testing dataset, we'll have a thousand images. So in total, there are 4,000 images: 2,000 will be used for training the model, 1,000 will be used for the validation set, and the last thousand images will be used for testing the accuracy on previously unseen data.

So the process we are going to follow while building this project is this. First, we will be creating a CNN model with four convolutional layers. So it will have four different convolutional layers paired with pooling layers. And this model will be able to achieve accuracy in the range of 70 to 75 percent. I'm talking about validation accuracy here. So this model will be able to achieve somewhere between 70 to 75.

Then, because we have a small dataset, we can improve the performance of our model by doing data augmentation. Data augmentation is the process of creating artificial images using the small dataset that you have. So in the second step, we will augment our data.
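A model of the kind described above, four convolutional layers each paired with a pooling layer, might look like this in Keras. The 150x150 input size, the filter counts and the optimizer are assumed illustrative choices, not values fixed by the lecture; the single sigmoid unit at the end is what makes it a binary classifier:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Four Conv2D layers, each followed by a MaxPooling2D layer, then a small
# dense head ending in one sigmoid unit (cat vs. dog).
model = keras.Sequential([
    layers.Input(shape=(150, 150, 3)),       # standard RGB input dimension
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # probability that the image is class B
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",    # the standard loss for two classes
              metrics=["accuracy"])
model.summary()
```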
And then train our model again. For example, if you have this image of a cat, you can create a new image by zooming in on a small part of this image, or you can create a new image by rotating this image of a cat. And there are many more transformations that you can do to this image to create a similar image of a cat using an existing image. So using one image, you'll be able to create multiple images just by transforming the image a little bit. Transformations include linear transformations, rotations, zooming in, zooming out, etc. So after you do this and you run the model again, you'll be able to achieve an accuracy of over 80 percent.

Lastly, we'll use one of the architectures that we have discussed previously, and we will try to implement those pretrained architectures to classify this cats vs. dogs dataset. Using that pretrained architecture, we'll be able to achieve an accuracy over 90 percent.

So after this project, you'll have an understanding of how to import images, how to run binary or multiclass classification using a CNN, and how to use pretrained architectures to solve the problem that you have with you.
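The augmentation idea above can be sketched in plain NumPy. A real pipeline would typically use a Keras augmentation utility with random parameters, but flipping, rotating and zoom-cropping an array already shows how one image becomes several:

```python
import numpy as np

def augment(img):
    """Create a few artificial variants of one (H, W, C) image."""
    h, w = img.shape[:2]
    return [
        np.fliplr(img),                 # mirror the image left-to-right
        np.rot90(img),                  # rotate by 90 degrees
        img[h // 4: 3 * h // 4,         # "zoom in": keep the central half;
            w // 4: 3 * w // 4],        # this crop would then be resized back
    ]                                   # to the standard input dimension

cat = np.random.randint(0, 256, (200, 200, 3), dtype=np.uint8)
for variant in augment(cat):
    print(variant.shape)
# The flip and the rotation keep the 200x200 size; the zoom crop is 100x100.
```

Each variant still clearly shows a cat, so the labels carry over for free, which is what makes augmentation useful on a small dataset.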