Now, we are going to start with a complete end-to-end project. In this project, we will try to classify colored images of cats and dogs.

We take up a dataset from Kaggle. Kaggle is a website where a lot of data science competitions are held. There was a competition held in 2013 in which thousands of images of cats and dogs were given, and a model was to be built to classify those images into cats and dogs. The best accuracy achieved in that competition was nearly ninety-eight percent. We are going to use a subset of that data, try to build our own model, and really try to achieve over 90 percent accuracy with it.

Here are some of the details of this project. This is a binary classification problem, unlike Fashion MNIST, in which there were 10 categories to be predicted. Here we have only two: either the image is of a cat or it is of a dog. So there are only two classes; that is why it is a binary classification problem.

Then, this is a dataset of coloured images. That is, we will have three channels, RGB, instead of only one channel as we had in Fashion MNIST. Then, we do not have a standard dimension for all these images. As you saw in the previous project, we were using grayscale 28 by 28 pixel images.
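The shape difference is easy to see in NumPy. Here is a minimal sketch; the 499x375 colour size is just an illustrative example, since the photos in this dataset all have different dimensions:

```python
import numpy as np

# A Fashion MNIST-style image: grayscale, so a single channel,
# and every image has the same fixed 28x28 size.
gray = np.zeros((28, 28), dtype=np.uint8)

# A colour photo from a cats-vs-dogs-style dataset: three channels
# (R, G, B), and the height/width vary from photo to photo.
color = np.zeros((499, 375, 3), dtype=np.uint8)

print(gray.shape)        # (28, 28)
print(color.shape)       # (499, 375, 3)
print(color.shape[-1])   # 3 -> the RGB channels
```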
But here our dataset does not have one standard dimension. So when we are feeding the data to our model, we will have to convert the images to one standard dimension. So that is one additional step.

Then, we are using a Kaggle dataset. If you are interested, you can go to the Kaggle website and see this Cats vs. Dogs competition. You can also see the leaderboard there: how much accuracy people have achieved. And you can compare your model with other people's models.

And the last point is, we are going to use a subset of the total data. The total data had over 50,000 images. In our model, we are going to use only 4,000 images: 2,000 of them to train, 1,000 for the validation dataset and 1,000 for testing. So using only this small part of the data, we are still going to achieve accuracies which are comparable to the other models built by people in the competition.

So here is how we have structured the data. The zip file that you download from the link that we have provided has 4,000 images, and those images are structured in this format. The first folder will have three folders inside of it. These three folders will be titled train, valid and test. The train folder will further have two folders. These folders will be cats and dogs.
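That extra resizing step can be sketched in plain NumPy. In practice you would let Keras or Pillow do the resizing, and the 150x150 target size is only an assumed choice, but a nearest-neighbour version shows the idea of mapping every photo to one standard dimension:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Resize an (H, W, C) image to (out_h, out_w, C) by nearest-neighbour
    sampling: each output pixel copies the closest input pixel."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # which input row each output row reads
    cols = np.arange(out_w) * w // out_w   # which input column each output column reads
    return img[rows[:, None], cols[None, :]]

# Two photos with different sizes, both mapped to one standard 150x150 shape.
a = np.random.randint(0, 256, (499, 375, 3), dtype=np.uint8)
b = np.random.randint(0, 256, (320, 480, 3), dtype=np.uint8)
print(resize_nearest(a, 150, 150).shape)  # (150, 150, 3)
print(resize_nearest(b, 150, 150).shape)  # (150, 150, 3)
```

Once every image has the same shape, they can be stacked into one batch array and fed to the model.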
So class A here is cats and class B is dogs. In this folder, we will have a thousand images of cats, and in this folder we will have a thousand images of dogs. Similarly, in the validation dataset, there will be two folders, one containing 500 images of cats, the other containing 500 images of dogs. In the testing dataset, we'll have a thousand images. So in total, there are 4,000 images: 2,000 will be used for training the model, 1,000 will be used for the validation set, and the last thousand images will be used for testing the accuracy on previously unseen data.

So the process we are going to follow while building this project is this. First, we will be creating a CNN model with four convolutional layers. So it will have four different convolutional layers paired with pooling layers. And this model will be able to achieve accuracy in the range of 70 to 75 percent. I'm talking about validation accuracy here. So this model will be able to achieve somewhere between 70 to 75.

Then, because we have a small dataset, we can improve the performance of our model by doing data augmentation. Data augmentation is the process of creating artificial images using the small dataset that you have. So in the second step, we will augment our data.
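A model of the kind described above, four convolutional layers each paired with a pooling layer, might look like this in Keras. The 150x150 input size, the filter counts and the optimizer are assumed illustrative choices, not values fixed by the lecture; the single sigmoid unit at the end is what makes it a binary classifier:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Four Conv2D layers, each followed by a MaxPooling2D layer, then a small
# dense head ending in one sigmoid unit (cat vs. dog).
model = keras.Sequential([
    layers.Input(shape=(150, 150, 3)),       # standard RGB input dimension
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # probability that the image is class B
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",    # the standard loss for two classes
              metrics=["accuracy"])
model.summary()
```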
And then train our model again. For example, if you have this image of a cat, you can create a new image by zooming in on a small part of this image, or you can create a new image by rotating this image of a cat. And there are many more transformations that you can do to this image to create a similar image of a cat using an existing image. So using one image, you'll be able to create multiple images just by transforming the image a little bit. Transformations include linear transformations, rotations, zooming in, zooming out, etc. So after you do this and you run the model again, you'll be able to achieve an accuracy of over 80 percent.

Lastly, we'll use one of the architectures that we have discussed previously, and we will try to implement those pretrained architectures to classify this cats vs. dogs dataset. Using that pretrained architecture, we'll be able to achieve an accuracy over 90 percent.

So after this project, you'll have an understanding of how to import images, how to run binary or multiclass classification using a CNN, and how to use pretrained architectures to solve the problem that you have with you.
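The augmentation idea above can be sketched in plain NumPy. A real pipeline would typically use a Keras augmentation utility with random parameters, but flipping, rotating and zoom-cropping an array already shows how one image becomes several:

```python
import numpy as np

def augment(img):
    """Create a few artificial variants of one (H, W, C) image."""
    h, w = img.shape[:2]
    return [
        np.fliplr(img),                 # mirror the image left-to-right
        np.rot90(img),                  # rotate by 90 degrees
        img[h // 4: 3 * h // 4,         # "zoom in": keep the central half;
            w // 4: 3 * w // 4],        # this crop would then be resized back
    ]                                   # to the standard input dimension

cat = np.random.randint(0, 256, (200, 200, 3), dtype=np.uint8)
for variant in augment(cat):
    print(variant.shape)
# The flip and the rotation keep the 200x200 size; the zoom crop is 100x100.
```

Each variant still clearly shows a cat, so the labels carry over for free, which is what makes augmentation useful on a small dataset.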