Now, in this lecture, you're going to build the model using Keras. But before that, we will do one small thing: we will set aside another small part of our training examples as a validation set. The validation set is used to tune the model's hyperparameters. So we'll just take our first 5,000 examples as the validation set, and the remaining examples we will keep in another variable called the partial training set.

Here I am storing the indices of the first 5,000 examples in a variable called val_indices. Now I use this variable to pick out the first 5,000 images and store them in val_images. So the first 5,000 training images are now stored in val_images. The remaining part of the training images is stored in partial_train_images.

We do the same thing with the labels. The first 5,000 labels are stored in val_labels, and the rest of the labels are stored in partial_train_labels.

So we have created two parts of the training examples: one is the validation set and the other is the partial training set. We will use the partial training set to train the model, and the validation set will be used to tune the hyperparameters. You will see the use of the validation set in the coming lectures.

Now, there are two ways in which we can define the model using Keras.
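The split described above can be sketched in a few lines. The lecture does this in R with Keras; this is a plain-Python equivalent with made-up data, and the function name and the tiny demo values are illustrative, not from the course code.

```python
def split_validation(images, labels, n_val=5000):
    """Return (val_images, val_labels, partial_train_images, partial_train_labels).

    The first n_val examples become the validation set; the rest form the
    partial training set, exactly as described in the lecture.
    """
    val_images = images[:n_val]
    val_labels = labels[:n_val]
    partial_train_images = images[n_val:]
    partial_train_labels = labels[n_val:]
    return val_images, val_labels, partial_train_images, partial_train_labels

# Tiny demo: 8 fake examples, validation size of 3.
imgs = list(range(8))
labs = [i % 2 for i in range(8)]
v_img, v_lab, t_img, t_lab = split_validation(imgs, labs, n_val=3)
print(v_img, t_img)  # [0, 1, 2] [3, 4, 5, 6, 7]
```

The same slicing, with n_val=5000, produces the val_images / partial_train_images pair used throughout the rest of the lecture.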
One is using the sequential API, and the other is using the functional API. The sequential API is used when you want to build a normal neural network with a linear stack of layers. The functional API is used for big, complex network structures where you have multiple usages of several smaller ones. We will see an example of the functional API when we build the regression model; for this classification example, we will use the sequential API.

Now, we'll be taking three steps in defining the model. The first step is defining the network structure. This includes setting up the number of layers, the number of neurons in each layer, and the activation function to be used in each layer. This is captured in this part of the code.

The second step is configuring the learning process. This includes selecting the loss function, the optimizer, and some metrics to be monitored. This is this part of the code.

The third step is feeding in the data and training the model. This is done in the last part of the code.

So let's start discussing each line of code, one by one. Here we start by creating a new variable called model. This variable will contain the information about the structure of our network.
This is the function we use for the sequential API: keras_model_sequential(). Then, to start defining the structure, we use this pipe symbol. This pipe operator comes with the magrittr package, which is automatically installed when we install the keras package. It is used for passing values as arguments to a function. We could do away with this symbol, but using it makes the code more readable and compact. So, as a good practice, we will use this operator. This is the pipe operator.

And if you remember, the operator we used earlier for assigning to train images and train labels, the multiple assignment operator, comes from the zeallot package, which was also installed as part of the keras setup. We are using that one, too, because it makes the code compact.

So first, we work on the input layer by flattening. I mean that we have a 28 by 28 2D image, and we can turn it into one dimension by putting all the pixels in one line. What happens is, if you have a 3 by 3 two-dimensional array, you can flatten it by putting the rows one after another, so it becomes a one-dimensional array.

This step is important because we have to give a straight sequence of input values in place of a 2D array. You could convert it to one dimension using the array_reshape function as well.
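The 3 by 3 flattening described above can be shown concretely. This is a plain-Python sketch of the idea (the lecture uses R and layer_flatten; the helper name here is made up for illustration):

```python
def flatten(grid):
    """Lay the rows of a 2-D list end to end, in row-major order."""
    return [pixel for row in grid for pixel in row]

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(flatten(grid))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Applied to a 28 by 28 image, the same operation yields 28 * 28 = 784 pixel values in one line, which is exactly what the flatten layer produces for the next layer.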
But when we have Keras, why should we bother? Just specify layer_flatten here, like this, and specify the input shape, that is, the kind of input this layer receives. It will automatically convert this 28 by 28 input into 784 pixel values for the next layer.

Next, we specify the details of the layer that comes after the flatten layer. That is, we are telling Keras that this layer is dense, meaning each of its neurons is connected to every neuron of the previous layer. In this layer, we want 128 neurons, and the activation function for all these neurons will be ReLU, that is, the Rectified Linear Unit. In this way we have defined one hidden layer: it is a dense layer, it has 128 neurons, and it uses the ReLU activation function.

Next, we specify the output layer. You can add more layers as well, but here I am using only one hidden layer and one output layer. In this last layer, we have 10 neurons, because we have 10 classes to be predicted. Each of these neurons will be predicting the probability of one class, such as whether it is a shirt or a boot. And if you remember from the theory lecture, the softmax activation makes sure that the sum of all the probabilities comes out to one.
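The softmax property just mentioned, that the 10 output probabilities always sum to one, is easy to verify with a small sketch (plain Python, standing in for the R/Keras activation; the example scores are arbitrary):

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(round(sum(probs), 6))  # 1.0 -- the probabilities always sum to one
```

Whichever 10 raw scores the output layer produces, softmax rescales them so they can be read as class probabilities, and the largest score keeps the largest probability.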
So this last layer has 10 neurons with softmax activation. That's all for the structure. In short, it is a 784-128-10 neural network.

Once we have run the entire code, I would suggest that you come back to this point and experiment a little bit here, to see what is the effect of having more layers, and what is the effect of increasing or decreasing the number of neurons in these layer_dense calls.

You can see that a new variable called model is created, and it has the structure stored in it.

Now, let's look at the second step. At this step, we configure the learning process. Here, the first thing is specifying the optimizer. We have discussed the concept behind stochastic gradient descent. SGD is not the only one; there are other optimizers as well, with small differences. Other optimizers include Adam, RMSprop and a few others. In fact, in the coming years, we may see a few more added to this list.

But to answer the question of which should be used when: ideally, the choice of optimizer depends on the shape of the error function curve. But we do not know that shape, so we do not know the ideal optimizer. Practically, though, in most scenarios all of these work very well.
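As a quick sanity check on the 784-128-10 structure just summarized, we can count its trainable parameters by hand. This is a back-of-envelope sketch (weights plus one bias per neuron), not tied to any framework:

```python
def dense_params(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus one bias per neuron."""
    return n_in * n_out + n_out

hidden = dense_params(784, 128)   # 784*128 weights + 128 biases = 100480
output = dense_params(128, 10)    # 128*10 weights + 10 biases  = 1290
print(hidden + output)            # 101770 parameters in total
```

This is also a useful number to watch while experimenting: adding layers or neurons, as suggested above, grows the parameter count and therefore the training cost.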
It's just that for some scenarios SGD converges faster, and for some situations RMSprop converges faster. So my suggestion would be to simply train the model once with SGD. If you think it is taking too long to converge, or you are not getting much improvement in training, it is always worth a shot to try the RMSprop optimizer as well.

Let's move on to the second parameter, which is the loss function. We have discussed this in the theory part: for classification models we use cross-entropy, and for regression models we usually use mean squared error.

But within cross-entropy you will find three options. Which of these three you should use depends on the type of problem you have. The three names are: sparse categorical cross-entropy, binary cross-entropy and categorical cross-entropy.

If your problem has two classes to be predicted, like whether an email is spam or not spam, use binary cross-entropy.

If you have multiple classes, such as this problem where we have fashion objects, and each example is exclusive, meaning each image contains only one object to be predicted, then we use sparse categorical cross-entropy. That is why I have written loss equal to sparse categorical cross-entropy here.
Sparse categorical cross-entropy expects the class labels as plain integers, which is what we have here. If instead your labels are one-hot encoded, that is, each label is a vector with a one in the position of its class, then you use categorical cross-entropy; the two compute the same loss, they just read the labels in different formats.

And if one observation can belong to many classes at the same time, for example, if we are labelling whether an email is from someone you know or not, and we are also labelling whether the email is important or not, then one email can be both: it can be from someone you know, and it can be important. So it may belong to two classes at the same time. Such multi-label problems are handled with binary cross-entropy applied to each label separately.

I hope you understood this. Here's a summary of what I just said; you can look at this comparison chart to understand the three cross-entropies.

The third parameter is metrics. This is not mandatory, but we specify it to monitor the performance of the model during training. Basically, we would like to see the improvement in the accuracy of our classification model, or in the mean squared error of our regression model, over each epoch. As I told you, we go over the entire training dataset several times; each time, we will calculate the accuracy of our model at that instant and store it, so that we can see whether the learning process is producing any improvement in accuracy or not.

So with these three parameters set, we can run this part of the code. Now we have configured the learning process as well.
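The point that sparse and (one-hot) categorical cross-entropy differ only in label format can be checked directly. This is a single-example sketch in plain Python (the real Keras losses average over a batch; the probabilities here are made up):

```python
import math

def categorical_ce(one_hot, probs):
    """Cross-entropy with a one-hot label vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_ce(label_index, probs):
    """Cross-entropy with an integer class label."""
    return -math.log(probs[label_index])

probs = [0.1, 0.7, 0.2]   # model's predicted class probabilities
one_hot = [0, 1, 0]       # categorical form of the label...
label = 1                 # ...and its sparse (integer) form
print(categorical_ce(one_hot, probs) == sparse_categorical_ce(label, probs))  # True
```

Both calls pick out -log of the probability assigned to the true class; only the way the true class is written down differs.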
This brings us to the third part, where we actually train the model. The training is done using the fit function. Within the fit function, we have to specify the input variable first; that is the partial training dataset that we will input. Then comes the actual output corresponding to those inputs; the actual output is stored in partial_train_labels, so that is the second parameter.

Next, we specify the epochs number. This is the number of times the entire training data will be put through the model. We set this to 30 for this example.

Then we have the batch size. This is the number of observations which will be used during each forward and backward propagation step. Here we take a batch size of 100.

Lastly, we tell it that we also have separate validation data, which is a list of val_images and val_labels, and that we would like to see the accuracy scores on this validation data as well.

Keep in mind that only partial_train_images and partial_train_labels will be used to train the model; the validation data is like test data in this scenario. That is, our model will not have seen the validation data when it calculates the accuracy on it.

Now, let's run this line of code as well.
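A quick bit of bookkeeping makes the epochs and batch-size settings above concrete. Assuming the full Fashion-MNIST training set of 60,000 images minus the 5,000-example validation split (both numbers as described earlier in the course), we can count how many weight-update steps fit() will perform:

```python
import math

n_train = 60000 - 5000   # partial training examples after the validation split
batch_size = 100         # observations per forward/backward propagation step
epochs = 30              # passes over the entire partial training set

steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 550 16500
```

So each epoch is 550 batches, and over 30 epochs the weights are updated 16,500 times, which is why training takes a noticeable amount of time in the next step.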
Well, you can see that the neural network model is getting trained, and the accuracy and loss values are being recorded for each epoch. In the next video, we will see the performance of this trained model.