Here is a summary table of a classification neural network architecture. In the second table, you can see, I have put four columns. The first column, in bold, in both the first table and the second table is for hyperparameters. These are the values that we have to set prior to training our model. For example, how many layers our neural network will have is something we have to decide and give beforehand. So the common classification neural network hyperparameters are mentioned in the first column of tables 1 and 2.

The second, third, and fourth columns in the second table are for three classification scenarios. The first is binary classification, which is classifying into two classes, like marking an email as spam or not spam. The second is multi-label binary classification, which means there are multiple binary variables. For example, the first variable is whether an email is spam or not, and the second variable could be whether an email is important or not. The third is multi-class classification, which we discussed in the last lecture. For example, if we have four classes (trousers, shirts, socks, and ties), this scenario falls under multi-class classification.

Now let's see what values of hyperparameters we usually use for these three types of classification scenarios. The first hyperparameter is the number of input neurons. The number of input neurons is always equal to the number of input features.
So if you have 16 input variables, you will have 16 input neurons; you will always have one neuron per input feature.

The second hyperparameter is how many hidden layers we want in our network. Ideally, this depends on the problem, but typically we keep the number of hidden layers between 1 and 5; keeping more than 5 usually only increases the computational effort for our system.

The third hyperparameter is hidden activation, that is, the activation function that we put on the neurons in the hidden layers. This is usually ReLU, the rectified linear unit we discussed; it is a very common function which is used for hidden layer activation. As I told you earlier, we use ReLU because it is very fast to execute on our systems.

The other three hyperparameters vary with the type of classification that we are doing. The number of output neurons in binary classification is 1, but in multi-label binary classification it is one per label. For example, if we are classifying an email as spam or not spam, and the other label is important or not important, we will need two neurons in the output layer. One neuron would tell us whether it is spam or not spam, and the other neuron would tell us whether it is important or not important. For multi-class classification, we have one output neuron per class.
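The architecture described so far can be sketched as a single forward pass in NumPy. This is a minimal illustration with random, untrained weights; the layer width of 32 neurons and the 1/sqrt(fan-in) weight scaling are my own choices for the sketch, not values from the lecture.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: cheap to compute, used in the hidden layers
    return np.maximum(0.0, z)

def sigmoid(z):
    # Logistic function: squashes the output into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = rng.normal(size=16)  # 16 input features -> 16 input neurons

# Two hidden layers (within the typical 1-to-5 range) of 32 neurons each;
# random weights scaled by 1/sqrt(fan-in) to keep activations moderate
W1 = rng.normal(size=(32, 16)) / np.sqrt(16)
W2 = rng.normal(size=(32, 32)) / np.sqrt(32)
W3 = rng.normal(size=(1, 32)) / np.sqrt(32)  # binary task: 1 output neuron

h1 = relu(W1 @ x)
h2 = relu(W2 @ h1)
p_spam = sigmoid(W3 @ h2)  # probability that the email is spam
print(p_spam)              # a single value between 0 and 1
```

An untrained network like this outputs an arbitrary probability; training adjusts the weight matrices so the output becomes meaningful.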
So, for example, if we are classifying images into shirts, trousers, socks, and ties, we will have four different output neurons, one for each of these classes, and we will put a softmax activation layer on top to get the probability of each class.

The next hyperparameter is the output layer activation. In binary classification, the logistic or sigmoid function works very well. You can also use a step function, but as we have discussed, the logistic function performs much better than a step function. So for binary classification and multi-label binary classification we use the sigmoid function, but in multi-class classification, after the sigmoid function, we have to put an additional layer of softmax activation. The last hyperparameter is the loss function; we will be using cross entropy as the loss function for all types of classification.

So these are the hyperparameters that you have to specify when you are running a neural network model in a software. The values that are given here are typical values, that is, these are commonly used values, but it is not a hard and fast rule to use only these values; you can customize your neural network by using any other hyperparameter value.

Next is the summary table of the regression neural network architecture. Here, on the left, we have the hyperparameters, and on the right, we have the typical values that we use for these hyperparameters.
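Before moving on to the regression table, the softmax output and cross-entropy loss for the four-class example can be sketched in a few lines of NumPy. One caveat: in most implementations, softmax is applied directly to the raw output values (the logits) rather than after a sigmoid; the numbers below are made up for illustration.

```python
import numpy as np

def softmax(z):
    # Exponentiate each output neuron and normalize so probabilities sum to 1
    e = np.exp(z - z.max())  # subtracting the max avoids numerical overflow
    return e / e.sum()

def cross_entropy(p, y_one_hot):
    # Penalizes putting low probability on the true class;
    # the small constant guards against log(0)
    return -np.sum(y_one_hot * np.log(p + 1e-12))

# Raw outputs (logits) of four output neurons: shirts, trousers, socks, ties
logits = np.array([2.0, 1.0, 0.1, -1.0])
p = softmax(logits)
print(p, p.sum())  # four probabilities that sum to 1

y_true = np.array([1.0, 0.0, 0.0, 0.0])  # the image is actually a shirt
print(cross_entropy(p, y_true))          # smaller when p[0] is close to 1
```

The class with the largest logit gets the largest probability, and the loss shrinks as that probability approaches 1 for the true class.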
The first one is the number of input neurons. Again, it is one per input feature. The number of hidden layers depends on the problem, but usually we keep 1 to 5 hidden layers. Then comes the number of neurons per hidden layer. Again, this depends on the problem, but typically we take 10 to 200 neurons per hidden layer.

Then we have output neurons. If we are predicting only one thing, we need only one output neuron. If we are predicting multiple things, we need one output neuron per thing that we want to predict. For example, if you are predicting house price, that requires only one output neuron. On the other hand, if you are predicting the length and breadth of a petal of a flower from the image of the flower, that requires two output neurons: one for the length of the petal and the other for its breadth.

Next is the hidden activation hyperparameter, which means what the activation function in the hidden layers will be. The most commonly used activation function in hidden layers is ReLU, and ReLU can also be used as the activation function in the output layer. For a regression neural network, though, we do not really need any activation function in the output as such. If you want to apply any particular boundary condition on the output, for example if you want the output to be only positive, then you can apply a ReLU kind of function on top of it.
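That boundary-condition idea can be shown directly. The petal measurements below are made-up numbers, not real data:

```python
import numpy as np

# Hypothetical raw outputs of a two-neuron regression head:
# predicted petal length and petal breadth (values invented for illustration)
raw_output = np.array([4.7, -0.3])

# In general a regression output needs no activation function, but
# lengths cannot be negative, so a ReLU-style clamp enforces positivity
constrained = np.maximum(0.0, raw_output)
print(constrained)  # the negative breadth is clamped to 0
```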
Otherwise, there is no requirement for an activation function on the output.

The last hyperparameter is the loss function. For a regression neural network, the squared error works very well as a loss function; you cannot use cross entropy here. So we often use mean squared error, which is the mean of the squared errors that we calculate for the individual training examples.

So these are all the hyperparameters that you need to specify while running a regression neural network in the software. On the right, you see the typical values; these are not fixed. You can still customize your neural network by changing these hyperparameter values. That's all. See you in the practical activities.
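The mean squared error just described takes only a few lines to write out. The house prices below are invented for illustration:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Square the error of each training example, then average over examples
    return np.mean((y_true - y_pred) ** 2)

# Invented house prices (in thousands) for three training examples
y_true = np.array([200.0, 350.0, 120.0])
y_pred = np.array([210.0, 340.0, 125.0])

# Errors are -10, 10 and -5, so MSE = (100 + 100 + 25) / 3 = 75.0
print(mean_squared_error(y_true, y_pred))
```

Because the errors are squared, large misses are penalized much more heavily than small ones, which is why this loss suits regression problems.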