Here is a summary table of a classification neural network architecture. In the second table, you can see, I have put four columns. The first column, in bold, in both the first table and the second table is for hyperparameters. These are the values that we have to set prior to training our model. For example, how many layers our neural network will have is something we have to decide and give beforehand. So the common classification neural network hyperparameters are mentioned in the first column of tables 1 and 2.

The second, third, and fourth columns in the second table are for three classification scenarios. The first is binary classification, which is classifying into two classes, like marking an email as spam or not spam. The second is multi-label binary classification, which means there are multiple binary variables. For example, the first variable is whether an email is spam or not, and the second variable could be whether an email is important or not. The third is multi-class classification, which we discussed in the last lecture. For example, if we have four classes (trousers, shirts, socks, and ties), this scenario falls under multi-class classification.

Now let's see what values of hyperparameters we usually use for these three types of classification scenarios. The first hyperparameter is the number of input neurons. The number of input neurons is always equal to the number of input features.
So if you have 16 input variables, you will have 16 input neurons; you will always have one neuron per input feature.

The second hyperparameter is how many hidden layers we want in our network. Ideally, this depends on the problem, but typically we keep the number of hidden layers between 1 and 5; keeping more than 5 usually only increases the computational effort for our system.

The third hyperparameter is hidden activation, that is, the activation function that we put on the neurons in the hidden layers. This is usually ReLU, the rectified linear unit we discussed; it is a very common function which is used for hidden layer activation. As I told you earlier, we use ReLU because it is very fast to execute on our systems.

The other three hyperparameters vary with the type of classification that we are doing. The number of output neurons in binary classification is 1, but in multi-label binary classification it is one per label. For example, if we are classifying an email as spam or not spam, and the other label is important or not important, we will need two neurons in the output layer. One neuron would tell us whether it is spam or not spam, and the other neuron would tell us whether it is important or not important. For multi-class classification, we have one output neuron per class.
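The architecture described so far can be sketched as a single forward pass in NumPy. This is a minimal illustration with random, untrained weights; the layer width of 32 neurons and the 1/sqrt(fan-in) weight scaling are my own choices for the sketch, not values from the lecture.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: cheap to compute, used in the hidden layers
    return np.maximum(0.0, z)

def sigmoid(z):
    # Logistic function: squashes the output into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = rng.normal(size=16)  # 16 input features -> 16 input neurons

# Two hidden layers (within the typical 1-to-5 range) of 32 neurons each;
# random weights scaled by 1/sqrt(fan-in) to keep activations moderate
W1 = rng.normal(size=(32, 16)) / np.sqrt(16)
W2 = rng.normal(size=(32, 32)) / np.sqrt(32)
W3 = rng.normal(size=(1, 32)) / np.sqrt(32)  # binary task: 1 output neuron

h1 = relu(W1 @ x)
h2 = relu(W2 @ h1)
p_spam = sigmoid(W3 @ h2)  # probability that the email is spam
print(p_spam)              # a single value between 0 and 1
```

An untrained network like this outputs an arbitrary probability; training adjusts the weight matrices so the output becomes meaningful.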
So, for example, if we are classifying images into shirts, trousers, socks, and ties, we will have four different output neurons, one for each of these classes, and we will put a softmax activation layer on top to get the probability of each class.

The next hyperparameter is the output layer activation. In binary classification, the logistic or sigmoid function works very well. You can also use a step function, but as we have discussed, the logistic function performs much better than a step function. So for binary classification and multi-label binary classification we use the sigmoid function, but in multi-class classification, after the sigmoid function, we have to put an additional layer of softmax activation. The last hyperparameter is the loss function; we will be using cross entropy as the loss function for all types of classification.

So these are the hyperparameters that you have to specify when you are running a neural network model in a software. The values that are given here are typical values, that is, these are commonly used values, but it is not a hard and fast rule to use only these values; you can customize your neural network by using any other hyperparameter value.

Next is the summary table of the regression neural network architecture. Here, on the left, we have the hyperparameters, and on the right, we have the typical values that we use for these hyperparameters.
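Before moving on to the regression table, the softmax output and cross-entropy loss for the four-class example can be sketched in a few lines of NumPy. One caveat: in most implementations, softmax is applied directly to the raw output values (the logits) rather than after a sigmoid; the numbers below are made up for illustration.

```python
import numpy as np

def softmax(z):
    # Exponentiate each output neuron and normalize so probabilities sum to 1
    e = np.exp(z - z.max())  # subtracting the max avoids numerical overflow
    return e / e.sum()

def cross_entropy(p, y_one_hot):
    # Penalizes putting low probability on the true class;
    # the small constant guards against log(0)
    return -np.sum(y_one_hot * np.log(p + 1e-12))

# Raw outputs (logits) of four output neurons: shirts, trousers, socks, ties
logits = np.array([2.0, 1.0, 0.1, -1.0])
p = softmax(logits)
print(p, p.sum())  # four probabilities that sum to 1

y_true = np.array([1.0, 0.0, 0.0, 0.0])  # the image is actually a shirt
print(cross_entropy(p, y_true))          # smaller when p[0] is close to 1
```

The class with the largest logit gets the largest probability, and the loss shrinks as that probability approaches 1 for the true class.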
The first one is the number of input neurons. Again, it is one per input feature. The number of hidden layers depends on the problem, but usually we keep 1 to 5 hidden layers. Then comes the number of neurons per hidden layer. Again, this depends on the problem, but typically we take 10 to 200 neurons per hidden layer.

Then we have output neurons. If we are predicting only one thing, we need only one output neuron. If we are predicting multiple things, we need one output neuron per thing that we want to predict. For example, if you are predicting house price, that requires only one output neuron. On the other hand, if you are predicting the length and breadth of a petal of a flower from the image of the flower, that requires two output neurons: one for the length of the petal and the other for its breadth.

Next is the hidden activation hyperparameter, which means what the activation function in the hidden layers will be. The most commonly used activation function in hidden layers is ReLU, and ReLU can also be used as the activation function in the output layer. For a regression neural network, though, we do not really need any activation function in the output as such. If you want to apply any particular boundary condition on the output, for example if you want the output to be only positive, then you can apply a ReLU kind of function on top of it.
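That boundary-condition idea can be shown directly. The petal measurements below are made-up numbers, not real data:

```python
import numpy as np

# Hypothetical raw outputs of a two-neuron regression head:
# predicted petal length and petal breadth (values invented for illustration)
raw_output = np.array([4.7, -0.3])

# In general a regression output needs no activation function, but
# lengths cannot be negative, so a ReLU-style clamp enforces positivity
constrained = np.maximum(0.0, raw_output)
print(constrained)  # the negative breadth is clamped to 0
```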
Otherwise, there is no requirement for an activation function on the output.

The last hyperparameter is the loss function. For a regression neural network, the squared error works very well as a loss function; you cannot use cross entropy here. So we often use mean squared error, which is the mean of the squared errors that we calculate for the individual training examples.

So these are all the hyperparameters that you need to specify while running a regression neural network in the software. On the right, you see the typical values; these are not fixed. You can still customize your neural network by changing these hyperparameter values. That's all. See you in the practical activities.
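The mean squared error just described takes only a few lines to write out. The house prices below are invented for illustration:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Square the error of each training example, then average over examples
    return np.mean((y_true - y_pred) ** 2)

# Invented house prices (in thousands) for three training examples
y_true = np.array([200.0, 350.0, 120.0])
y_pred = np.array([210.0, 340.0, 125.0])

# Errors are -10, 10 and -5, so MSE = (100 + 100 + 25) / 3 = 75.0
print(mean_squared_error(y_true, y_pred))
```

Because the errors are squared, large misses are penalized much more heavily than small ones, which is why this loss suits regression problems.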