Here is a summary table of a classification neural network architecture.

In the second table, you can see I have put four columns. The first column, in both the first table and the second table, is for hyperparameters. These are the values that we have to set prior to training our model. For example, how many layers our neural network will have is something we have to decide and give beforehand. So the common classification neural network hyperparameters are mentioned in the first column of tables one and two.

The second, third and fourth columns in the second table are for three types of classification. The first is binary classification, which is classifying into two classes, like marking a mail as spam or not spam. Second is multilabel binary classification, which means there are multiple binary variables. For example, the first variable is whether a mail is spam or not, and the second variable could be whether a mail is important or not. The third one is multiclass classification, which we discussed in the last lecture. If we have four classes, such as trousers, shirts, socks and ties, this scenario falls under multiclass classification.

Now, let's see what values of hyperparameters we usually use for these three types of classification scenarios.

The first hyperparameter is the number of input neurons. The number of input neurons is always equal to the number of input features.
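To make this concrete, here is a minimal NumPy sketch (not the software we will use in the practicals; the hidden-layer size of 30 is a made-up number for illustration). It shows that 16 input features mean 16 input neurons, which in code is simply the second dimension of the data and weight matrices.

```python
import numpy as np

# A batch of 3 examples, each with 16 input features:
# the input layer therefore has 16 neurons, one per feature.
X = np.random.rand(3, 16)

n_inputs = X.shape[1]   # 16 input neurons
n_hidden = 30           # a hypothetical hidden-layer size

# Weights connecting the 16 input neurons to 30 hidden neurons.
W = np.random.rand(n_inputs, n_hidden)
b = np.zeros(n_hidden)

hidden = X @ W + b
print(hidden.shape)  # (3, 30): one value per hidden neuron, per example
```

The shape of the weight matrix makes the rule visible: one row per input neuron, so the number of input neurons is forced to match the number of input features.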
So if you have 16 input variables, you will have 16 input neurons. You will always have one neuron per input feature.

The second hyperparameter is how many hidden layers we want in our network. Ideally, this depends on the problem, but typically we keep the number of hidden layers between one and five. Keeping more than five hidden layers usually only increases the computational effort for our system.

The third hyperparameter is the hidden activation, that is, the activation function that we put on the neurons in the hidden layers. This is usually ReLU, the rectified linear unit we discussed. It is a very common function which is used for hidden layer activation. As I told you earlier, we use ReLU because it is very fast to execute on our systems.

The other three hyperparameters vary with the type of classification that we are doing.

The number of output neurons in binary classification is one, but in multilabel binary classification, it is one per label. For example, if we are classifying an email as spam or not spam, and the other label is important or not important, we will need two neurons in the output layer. One neuron will tell us whether it is spam or not spam, and the other neuron will tell us whether it is important or not important.
For multiclass classification, we have one output neuron per class. So for example, if we are classifying images into shirts, trousers, socks and ties, we will have four different output neurons, one for each of these classes. And we will put a softmax activation layer on top of it to get the probability of each class.

The next hyperparameter is the output layer activation. In binary classification, the logistic or sigmoid function works very well. You can use a step function also, but as we have discussed, the logistic function performs much better than a step function. So for binary classification and multilabel binary classification, we use the sigmoid function. But in multiclass classification, instead of the sigmoid function, we put a layer of softmax activation.

The last hyperparameter is the loss function. We will be using cross entropy as the loss function for all types of classification.

So these are the hyperparameters that you have to specify when you are running a multilayer neural network model in the software. The values that are given here are typical values; that is, these are commonly used values, but it is not a hard and fast rule to use these values only. You can customize your neural network by using any other hyperparameter value.

Next is the summary table of the regression neural network architecture.
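Before we look at that regression table, here is a minimal NumPy sketch of the three classification output setups we just described. The raw output scores (logits) here are made-up numbers for illustration.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Shift by the max for numerical stability; result sums to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Binary classification: one output neuron with a sigmoid.
spam_score = sigmoid(np.array([1.3]))          # P(spam)

# Multilabel binary: one output neuron per label, sigmoid on each.
label_scores = sigmoid(np.array([1.3, -0.4]))  # P(spam), P(important)

# Multiclass: one output neuron per class, softmax on top.
logits = np.array([2.0, 1.0, 0.5, 0.1])        # shirts, trousers, socks, ties
probs = softmax(logits)

# Cross-entropy loss if the true class is "shirts" (index 0).
loss = -np.log(probs[0])
print(probs.sum())   # ~1.0: softmax gives a probability for each class
```

Notice that the sigmoid outputs are independent of each other (each label gets its own yes/no probability), while the softmax outputs compete: they always sum to one, which is what we want when exactly one class is correct.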
On the left we have the hyperparameters, and on the right we have the typical values that we use for these hyperparameters.

The first one is the number of input neurons. Again, it is one per input feature. The number of hidden layers depends on the problem, but usually we keep one to five hidden layers.

Then comes the number of neurons per hidden layer. Again, this depends on the problem, but typically we take 10 to 100 neurons in each hidden layer.

Then, the number of output neurons. If we are predicting only one thing, we need only one output neuron. If we are predicting multiple things, we need one output neuron for each thing that we want to predict. For example, if you are predicting house price, that requires only one output neuron. On the other hand, if you are predicting the length and the width of a petal of a flower from the image of the flower, that requires two output neurons, one for the length of the petal and the second for the width of the petal.

Next is the hidden activation hyperparameter, which means what will be the activation function in the hidden layers. The most commonly used activation function in the hidden layers is ReLU. In the output layer, though, we do not really need any activation function as such for a regression neural network.
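Putting those choices together, here is a minimal NumPy sketch of one forward pass through such a regression network. All the sizes and the random weights are made up for illustration; the point is ReLU in the hidden layer and no activation on the output.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: passes positives through, zeroes out negatives.
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

n_inputs, n_hidden = 4, 10            # hypothetical sizes
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))   # one output neuron: one predicted value
b2 = np.zeros(1)

x = rng.normal(size=(1, n_inputs))    # one example with 4 features

hidden = relu(x @ W1 + b1)   # ReLU in the hidden layer
y_hat = hidden @ W2 + b2     # no activation on the output layer
print(y_hat.shape)           # (1, 1): a single real-valued prediction
```

Because the output has no activation, `y_hat` can be any real number, positive or negative, which is exactly what an unconstrained regression target needs.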
If you want to apply any particular boundary condition on the output, for example, if you want the output to be only positive, then you can apply a ReLU kind of function on top of it. Otherwise, there is no requirement for an activation function on the output layer.

The last hyperparameter is the loss function. For a regression neural network, the squared error works very well as a loss function. You cannot use cross entropy here. So we often use mean squared error, which is the mean of the squared errors that we calculate for the individual training examples.

So these are all the hyperparameters that you need to specify while running a regression neural network in the software. On the right, you see the typical values. These are not fixed; you can still customize your neural network by changing these hyperparameter values.

That's all. See you in the practical activities.