Here is a summary table of a classification neural network architecture.

In the second table, you can see I have put four columns. The first column, in both the first table and the second table, is for hyperparameters. These are the values that we have to set prior to training our model. For example, how many layers our neural network will have is something we have to decide and give beforehand. So the common classification neural network hyperparameters are mentioned in the first column of tables one and two.

The second, third and fourth columns in the second table are for three types of classification. The first is binary classification, which is classifying into two classes, like marking a mail as spam or not spam. Second is multilabel binary classification, which means there are multiple binary variables. For example, the first variable is whether a mail is spam or not, and the second variable could be whether a mail is important or not. The third one is multiclass classification, which we discussed in the last lecture. If we have four classes, such as trousers, shirts, socks and ties, this scenario falls under multiclass classification.

Now, let's see what values of hyperparameters we usually use for these three types of classification scenarios.

The first hyperparameter is the number of input neurons. The number of input neurons is always equal to the number of input features.
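To make this concrete, here is a minimal NumPy sketch (not the software we will use in the practicals; the hidden-layer size of 30 is a made-up number for illustration). It shows that 16 input features mean 16 input neurons, which in code is simply the second dimension of the data and weight matrices.

```python
import numpy as np

# A batch of 3 examples, each with 16 input features:
# the input layer therefore has 16 neurons, one per feature.
X = np.random.rand(3, 16)

n_inputs = X.shape[1]   # 16 input neurons
n_hidden = 30           # a hypothetical hidden-layer size

# Weights connecting the 16 input neurons to 30 hidden neurons.
W = np.random.rand(n_inputs, n_hidden)
b = np.zeros(n_hidden)

hidden = X @ W + b
print(hidden.shape)  # (3, 30): one value per hidden neuron, per example
```

The shape of the weight matrix makes the rule visible: one row per input neuron, so the number of input neurons is forced to match the number of input features.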
So if you have 16 input variables, you will have 16 input neurons. You will always have one neuron per input feature.

The second hyperparameter is how many hidden layers we want in our network. Ideally, this depends on the problem, but typically we keep the number of hidden layers between one and five. Keeping more than five hidden layers usually only increases the computational effort for our system.

The third hyperparameter is the hidden activation, that is, the activation function that we put on the neurons in the hidden layers. This is usually ReLU, the rectified linear unit we discussed. It is a very common function which is used for hidden layer activation. As I told you earlier, we use ReLU because it is very fast to execute on our systems.

The other three hyperparameters vary with the type of classification that we are doing.

The number of output neurons in binary classification is one, but in multilabel binary classification, it is one per label. For example, if we are classifying an email as spam or not spam, and the other label is important or not important, we will need two neurons in the output layer. One neuron will tell us whether it is spam or not spam, and the other neuron will tell us whether it is important or not important.
For multiclass classification, we have one output neuron per class. So for example, if we are classifying images into shirts, trousers, socks and ties, we will have four different output neurons, one for each of these classes. And we will put a softmax activation layer on top of it to get the probability of each class.

The next hyperparameter is the output layer activation. In binary classification, the logistic or sigmoid function works very well. You can use a step function also, but as we have discussed, the logistic function performs much better than a step function. So for binary classification and multilabel binary classification, we use the sigmoid function. But in multiclass classification, instead of the sigmoid function, we put a layer of softmax activation.

The last hyperparameter is the loss function. We will be using cross entropy as the loss function for all types of classification.

So these are the hyperparameters that you have to specify when you are running a multilayer neural network model in the software. The values that are given here are typical values; that is, these are commonly used values, but it is not a hard and fast rule to use these values only. You can customize your neural network by using any other hyperparameter value.

Next is the summary table of the regression neural network architecture.
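Before we look at that regression table, here is a minimal NumPy sketch of the three classification output setups we just described. The raw output scores (logits) here are made-up numbers for illustration.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Shift by the max for numerical stability; result sums to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Binary classification: one output neuron with a sigmoid.
spam_score = sigmoid(np.array([1.3]))          # P(spam)

# Multilabel binary: one output neuron per label, sigmoid on each.
label_scores = sigmoid(np.array([1.3, -0.4]))  # P(spam), P(important)

# Multiclass: one output neuron per class, softmax on top.
logits = np.array([2.0, 1.0, 0.5, 0.1])        # shirts, trousers, socks, ties
probs = softmax(logits)

# Cross-entropy loss if the true class is "shirts" (index 0).
loss = -np.log(probs[0])
print(probs.sum())   # ~1.0: softmax gives a probability for each class
```

Notice that the sigmoid outputs are independent of each other (each label gets its own yes/no probability), while the softmax outputs compete: they always sum to one, which is what we want when exactly one class is correct.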
On the left we have the hyperparameters, and on the right we have the typical values that we use for these hyperparameters.

The first one is the number of input neurons. Again, it is one per input feature. The number of hidden layers depends on the problem, but usually we keep one to five hidden layers.

Then comes the number of neurons per hidden layer. Again, this depends on the problem, but typically we take 10 to 100 neurons in each hidden layer.

Then, the number of output neurons. If we are predicting only one thing, we need only one output neuron. If we are predicting multiple things, we need one output neuron for each thing that we want to predict. For example, if you are predicting house price, that requires only one output neuron. On the other hand, if you are predicting the length and the width of a petal of a flower from the image of the flower, that requires two output neurons, one for the length of the petal and the second for the width of the petal.

Next is the hidden activation hyperparameter, which means what will be the activation function in the hidden layers. The most commonly used activation function in the hidden layers is ReLU. In the output layer, though, we do not really need any activation function as such for a regression neural network.
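Putting those choices together, here is a minimal NumPy sketch of one forward pass through such a regression network. All the sizes and the random weights are made up for illustration; the point is ReLU in the hidden layer and no activation on the output.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: passes positives through, zeroes out negatives.
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

n_inputs, n_hidden = 4, 10            # hypothetical sizes
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))   # one output neuron: one predicted value
b2 = np.zeros(1)

x = rng.normal(size=(1, n_inputs))    # one example with 4 features

hidden = relu(x @ W1 + b1)   # ReLU in the hidden layer
y_hat = hidden @ W2 + b2     # no activation on the output layer
print(y_hat.shape)           # (1, 1): a single real-valued prediction
```

Because the output has no activation, `y_hat` can be any real number, positive or negative, which is exactly what an unconstrained regression target needs.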
If you want to apply any particular boundary condition on the output, for example, if you want the output to be only positive, then you can apply a ReLU kind of function on top of it. Otherwise, there is no requirement for an activation function on the output layer.

The last hyperparameter is the loss function. For a regression neural network, the squared error works very well as a loss function. You cannot use cross entropy here. So we often use mean squared error, which is the mean of the squared errors that we calculate for the individual training examples.

So these are all the hyperparameters that you need to specify while running a regression neural network in the software. On the right, you see the typical values. These are not fixed; you can still customize your neural network by changing these hyperparameter values.

That's all. See you in the practical activities.