In the last lecture we discussed a single cell called the perceptron. Now in this lecture we are going to extend the concepts that we learned in the last one.

I told you that a perceptron takes in binary inputs, that is 1 and 0, and gives out a single binary output. But there is no logical reason to keep this limitation. We can easily extend it to any real input values. So instead of having only black and white, or only zero and one, we can have different shades of grey as well. That is, we accept any real value as input; the weights and the threshold still function in the same way.

Next we will take a look at the equation of the perceptron. We will slightly modify it to arrive at the generally used form. In this equation we are multiplying the weights with the inputs, adding these terms, and comparing the sum with the threshold. We will make a small change here: bring the threshold to the left side and write this new term as b. Basically it means that we have b equal to minus the threshold. People usually call this constant the bias. The name doesn't really make any difference, but this is the mathematical representation of the perceptron as you would find in most books.

Now let's move on and look at the graphical representation of this function. If you look at this graph: if the calculated value of the left part, that is the summation of weights multiplied by features plus the bias, is less than zero, the output comes out to be zero. So you can see in the graph that below zero the output of the function is also zero. When this left part is greater than or equal to zero, the function suddenly activates and gives an output of 1.

This type of function is called a simple step function. It is one type of activation function. Activation functions are basically those functions which take into account some threshold value; here the threshold value is 0. This function takes a sudden step at the threshold value, which is why it is called a step activation function.

There are many other types of activation functions. The most popular one is the sigmoid function. Here is a pictorial representation of how the sigmoid function looks. It is a smooth S-shaped curve. It also has a minimum of zero at minus infinity and a maximum of one at plus infinity. But instead of taking a sudden step, this function rises gradually and continuously.
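To make this concrete, here is a minimal sketch in Python of a perceptron with a step activation, written in the bias form b = -threshold described above. The names (step_perceptron, x, w, b) are my own for illustration, not from the lecture:

# A perceptron with a step activation, in the bias form b = -threshold.
def step_perceptron(x, w, b):
    # Weighted sum of the inputs plus the bias: z = sum_j w_j * x_j + b
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    # Step activation: output 1 if z >= 0, otherwise 0
    return 1 if z >= 0 else 0

# Real-valued inputs (shades of grey), not just 0 and 1:
print(step_perceptron(x=[0.7, 0.2], w=[0.5, -1.0], b=-0.1))  # prints 1, since z = 0.05 >= 0

Note that the inputs here are arbitrary real numbers, exactly the generalization just described; only the output is still a hard 0 or 1.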
This function is also called the logistic function, and it is also used in logistic regression, which is a very basic classification algorithm.

Now, the sigmoid function solves a major problem that we have with the step function. When we are training our perceptron using historical data to find the values of the weights and the threshold, the step function is very sensitive to individual observations. For example, suppose we are classifying fashion objects in our Fashion-MNIST dataset and our algorithm is misclassifying a particular image of boots as trousers. To rectify this, the model will need to find new weight and bias values. This is where the problem comes in: a small change in the weight and bias values can completely flip the output for a lot of the other observations. This makes the step function very hard to control. With the sigmoid function the change is gradual, so it is easier to control the behaviour.

Now, when we replace the step function with a sigmoid activation function, we call this new cell a sigmoid neuron or a logistic neuron instead of a perceptron.

Mathematically, the sigmoid function formula looks like this: sigmoid(z) = 1 / (1 + e^(-z)). If you plot this function on a graph, that is, if you have z on the x axis, calculate the value of this function using this formula and plot it on the y axis, this is how the formula looks.

Now we will replace z with the summation plus the bias. The sum of w_j times x_j plus b was the input to our activation function, so we substitute it in place of z. This is what the output of our neuron looks like: 1 / (1 + e^(-(sum_j w_j x_j + b))). If you calculate this value it will always lie between 0 and 1, and it will have a shape like this.

You can compare it with the step function. In the step function we calculated the output using the earlier formula: we got 0 if the summation was less than zero, and we got 1 if the summation was greater than or equal to zero. We have replaced that step with the sigmoid function. The sigmoid is a continuous function; we do not need two separate cases for it. So we just input the values of w_j, x_j and the bias to calculate the output, which is a continuous function.

Now with this, our artificial neural cell is ready, which takes in any number of real-valued inputs and gives an output between 0 and 1.
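As a sketch of the sigmoid neuron just described (again with illustrative names of my own), the only change from the step perceptron above is the activation applied to the same weighted sum:

import math

def sigmoid(z):
    # The logistic function: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(x, w, b):
    # Same weighted sum as the perceptron, passed through sigmoid;
    # the output always lies strictly between 0 and 1.
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)

# The same inputs as before now give a graded value instead of a hard 1:
print(sigmoid_neuron(x=[0.7, 0.2], w=[0.5, -1.0], b=-0.1))  # ~0.51

This graded output is exactly what makes training easier: nudging a weight nudges the output, instead of flipping it.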
It is time to create an artificial neural network, which is basically a network of these individual cells.

So, just a brief recap of this class. Initially I said that we took in binary inputs and gave out one single binary output. We replaced the binary inputs with any real values, and we replaced the binary output with a value between 0 and 1. So in this generalized form we take in inputs which can have any real value, and we get one output which lies between 0 and 1.
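To close, here is a small illustration (my own, not from the lecture) of the controllability point made earlier: a tiny change in a weight flips the step function's output completely, while the sigmoid neuron's output barely moves.

import math

def step(z):
    return 1 if z >= 0 else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, b = 1.0, 0.0
for w in (0.01, -0.01):  # nudge the single weight by a tiny amount
    z = w * x + b
    print(f"w={w:+.2f}  step={step(z)}  sigmoid={sigmoid(z):.2f}")
# w=+0.01  step=1  sigmoid=0.50
# w=-0.01  step=0  sigmoid=0.50

The step output jumps from 1 to 0, while the sigmoid output stays near 0.5, which is why gradual activations are easier to train.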