1 00:00:01,630 --> 00:00:05,250 In the last lecture, we discussed a single cell called the Perceptron. 2 00:00:06,280 --> 00:00:10,470 Now, in this lecture, we are going to extend the concepts that we learned in the last one. 3 00:00:12,360 --> 00:00:19,950 I told you that a Perceptron takes in binary inputs, that is, ones and zeros, and gives out a single binary 4 00:00:19,950 --> 00:00:20,410 output. 5 00:00:21,900 --> 00:00:24,840 But there is no logical reason to keep this limitation. 6 00:00:25,980 --> 00:00:29,670 We can easily extend this to any real input values. 7 00:00:31,860 --> 00:00:39,960 So instead of having black and white only, or zero and one only, we can have different shades of grey 8 00:00:39,960 --> 00:00:40,410 as well. 9 00:00:40,890 --> 00:00:48,330 That is, we accept any real values as input, and the weights and threshold still function in the same way. 10 00:00:52,420 --> 00:00:59,320 Next, we will take a look at the equation of the Perceptron and slightly modify it to get the generally 11 00:00:59,320 --> 00:01:02,530 used equation. In this equation, 12 00:01:02,980 --> 00:01:07,930 we are multiplying weights with inputs, adding these terms, and comparing them with the threshold. 13 00:01:10,510 --> 00:01:16,060 We will make a small change here: bring this threshold to the left and write 14 00:01:16,180 --> 00:01:23,950 this new term as b. Basically, it means that we have b equal to minus the threshold. 15 00:01:25,510 --> 00:01:31,270 People usually call this constant the bias. It doesn't really make any difference, 16 00:01:31,420 --> 00:01:36,970 but this is the mathematical representation of the Perceptron, as you would find in most of the books. 17 00:01:38,920 --> 00:01:39,790 Now, let's move on 18 00:01:39,880 --> 00:01:42,430 and look at the graphical representation of this function. 
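The Perceptron rule described above, rewritten with the bias b standing in for minus the threshold, can be sketched in Python (a minimal illustration; the function and variable names are my own, not from the lecture):

```python
# Perceptron rule with the threshold moved to the left-hand side as a bias term.
# Here b = -threshold, so the cell fires when the weighted sum plus bias is >= 0.

def perceptron(x, w, b):
    """Return 1 if sum(w_i * x_i) + b >= 0, else 0."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else 0

# Real-valued inputs work just as well as binary ones:
print(perceptron([0.5, 0.8], [0.4, 0.6], -0.5))  # 0.68 - 0.5 = 0.18 >= 0, prints 1
```

Nothing else in the cell changes when inputs become real-valued; only the allowed range of x is extended.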
19 00:01:45,920 --> 00:01:54,410 If you look at this graph: if the calculated value of this left part, that is, the summation of weights 20 00:01:54,440 --> 00:02:02,870 multiplied by features, plus the bias, is less than zero, the 21 00:02:02,870 --> 00:02:04,370 output comes out to be zero. 22 00:02:05,870 --> 00:02:11,270 So you can see in the graph: below zero, the output of the function is also zero. 23 00:02:14,210 --> 00:02:17,390 When this left part is greater than zero, 24 00:02:17,990 --> 00:02:22,010 this function suddenly activates and gives an output of one. 25 00:02:25,030 --> 00:02:28,930 This type of function is called a simple step function. 26 00:02:30,640 --> 00:02:36,910 This is one type of activation function. Activation functions are basically those functions which take 27 00:02:36,910 --> 00:02:41,610 into account some threshold value. Here, 28 00:02:42,520 --> 00:02:43,930 the threshold value is zero, 29 00:02:44,680 --> 00:02:52,450 and this function takes a sudden step at this threshold value, which is why it is called a step activation 30 00:02:52,450 --> 00:02:52,870 function. 31 00:02:57,180 --> 00:02:59,820 There are many other types of activation functions. 32 00:03:01,200 --> 00:03:03,570 The most popular one is the sigmoid function. 33 00:03:06,120 --> 00:03:09,630 Here is a pictorial representation of how the sigmoid function looks. 34 00:03:11,070 --> 00:03:13,550 It is a smooth S-shaped curve. 35 00:03:14,430 --> 00:03:21,780 It also has a minimum of zero at minus infinity and a maximum of one at plus infinity. 36 00:03:22,950 --> 00:03:31,110 But instead of having a step and rising suddenly, this function rises gradually and continuously. 37 00:03:32,490 --> 00:03:38,100 This function is also called the logistic function and is also used in logistic regression, which is a 38 00:03:38,100 --> 00:03:39,990 very basic classification algorithm. 
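The two activation functions discussed so far, the step function and the sigmoid, can be written down directly (a sketch under the lecture's conventions; the threshold for the step function is zero):

```python
import math

def step(z):
    # Jumps suddenly from 0 to 1 at the threshold (here, zero).
    return 1 if z >= 0 else 0

def sigmoid(z):
    # Smooth S-shaped curve: approaches 0 at minus infinity, 1 at plus infinity.
    return 1 / (1 + math.exp(-z))

print(step(-2), step(3))  # 0 1
print(sigmoid(0))         # 0.5, exactly halfway between the two extremes
```

Both functions map the same input z, the weighted sum plus bias, into an output; only the shape of the mapping differs.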
39 00:03:43,420 --> 00:03:50,700 Now, the sigmoid function solves a major problem that we have with the step function. When we are training 40 00:03:50,730 --> 00:03:55,490 our Perceptron using historical data to find the values of the weights and bias, 41 00:03:56,600 --> 00:04:00,180 the step function is very sensitive to individual observations. 42 00:04:01,230 --> 00:04:09,480 For example, suppose we are classifying fashion objects in our Fashion MNIST dataset and the algorithm 43 00:04:09,510 --> 00:04:18,000 is misclassifying a particular image of boots as trousers. To rectify this, the model will need to find 44 00:04:18,000 --> 00:04:19,800 new weight and bias values. 45 00:04:21,450 --> 00:04:22,770 This is where the problem comes in. 46 00:04:23,430 --> 00:04:30,720 A small change in the weight and bias values can completely flip the output for a lot of the other observations. 47 00:04:31,620 --> 00:04:37,530 This makes the step function very hard to control. With the sigmoid function, 48 00:04:37,710 --> 00:04:41,110 the change is gradual, so it is easier to control the behavior. 49 00:04:43,350 --> 00:04:50,340 Now, when we replace the step function with a sigmoid activation function, we call this new cell 50 00:04:50,460 --> 00:04:55,390 a sigmoid neuron, or a logistic neuron, instead of a Perceptron. 51 00:04:57,090 --> 00:05:00,840 Mathematically, the sigmoid function formula looks like this: 52 00:05:01,650 --> 00:05:03,780 it is sigmoid 53 00:05:03,800 --> 00:05:06,870 of z is equal to one upon one 54 00:05:06,870 --> 00:05:09,840 plus e to the power of minus z. 55 00:05:10,760 --> 00:05:17,340 And if you plot this function on a graph, that is, if you have z on the x-axis and you calculate 56 00:05:17,340 --> 00:05:21,420 the value of this function using this formula and plot it on the y-axis, 57 00:05:21,930 --> 00:05:23,880 this is how the formula looks. 58 00:05:25,620 --> 00:05:27,430 Now we will replace the value of z 
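The sensitivity argument in this part of the lecture can be illustrated numerically: near the threshold, a tiny change in the weighted sum flips the step function's output completely, while the sigmoid's output barely moves (an illustrative sketch, not from the lecture):

```python
import math

def step(z):
    return 1 if z >= 0 else 0

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# A weighted sum just above zero, then nudged just below it,
# as might happen after a small weight or bias update during training.
z_before, z_after = 0.01, -0.01

print(step(z_before), step(z_after))        # 1 0: the output flips entirely
print(sigmoid(z_before), sigmoid(z_after))  # both remain very close to 0.5
```

This gradual response is what makes the sigmoid neuron's behavior easier to control during training.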
59 00:05:27,780 --> 00:05:30,090 with the summation plus the bias value. 60 00:05:30,870 --> 00:05:37,050 So the summation of w times x, plus b, was the input to our activation function. 61 00:05:37,890 --> 00:05:40,630 So we input this in place of z. 62 00:05:41,220 --> 00:05:44,700 So this is what the output of our neuron looks like. 63 00:05:45,060 --> 00:05:51,570 It is one upon one plus the exponential of minus the summation of weights times features 64 00:05:51,780 --> 00:06:01,530 minus b. If you calculate this value, it will always lie between zero and one, and it will have a shape 65 00:06:01,530 --> 00:06:02,160 like this. 66 00:06:03,060 --> 00:06:06,930 So you can compare it with the step function as well. In the step function, 67 00:06:07,050 --> 00:06:14,280 we calculated the output using a formula with two parts, where we got zero 68 00:06:14,400 --> 00:06:19,320 if the summation was less than zero, and we got one if the summation was greater than or equal to zero. 69 00:06:20,280 --> 00:06:23,640 We have replaced the step with a sigmoid function. 70 00:06:23,850 --> 00:06:25,200 This is a continuous function; 71 00:06:25,260 --> 00:06:27,030 we do not need two parts to it. 72 00:06:27,750 --> 00:06:35,730 So we just input the values of the weights, the x's, and the bias to calculate the output, which is a continuous 73 00:06:35,730 --> 00:06:36,090 function. 74 00:06:37,270 --> 00:06:45,270 Now, with this, our artificial neural cell is ready, which takes in any number of real-valued inputs and 75 00:06:45,270 --> 00:06:47,760 gives an output between zero and one. 76 00:06:49,610 --> 00:06:56,240 It is time to create an artificial neural network, which is basically a network of these individual 77 00:06:56,240 --> 00:06:56,660 cells. 78 00:06:58,520 --> 00:07:00,890 So, just a brief recap of this class. 79 00:07:01,910 --> 00:07:07,760 Initially, I said that we took in binary inputs and gave out a single binary output. 
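Putting the pieces together, the sigmoid neuron described here just feeds the weighted sum plus bias into the sigmoid (a minimal sketch; the names are my own):

```python
import math

def sigmoid_neuron(x, w, b):
    """Output 1 / (1 + exp(-(sum(w_i * x_i) + b))), always strictly between 0 and 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

out = sigmoid_neuron([0.5, 0.8], [0.4, 0.6], -0.5)
print(out)  # about 0.545: a continuous value instead of a hard 0 or 1
```

Unlike the step function, there is no two-part case analysis; one continuous formula covers every input.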
80 00:07:08,570 --> 00:07:18,860 We replaced the binary inputs with any real values, and we replaced the binary output with a value 81 00:07:18,860 --> 00:07:20,240 between zero and one. 82 00:07:21,680 --> 00:07:28,460 So in this generalized form, we take in inputs which have any real value, and we get one output which 83 00:07:28,460 --> 00:07:30,020 lies between zero and one.