In the last lecture we discussed a single cell called the perceptron. Now in this lecture we are going to extend the concepts that we learned in the last one.

I told you that a perceptron takes in binary inputs, that is 1 and 0, and gives out a single binary output. But there is no logical reason to keep this limitation. We can easily extend it to any real input values. So instead of having only black and white, or only zero and one, we can have different shades of grey as well. That is, we accept any real value as input; the weights and the threshold still function in the same way.

Next we will take a look at the equation of the perceptron. We will slightly modify it to arrive at the generally used form. In this equation we are multiplying the weights with the inputs, adding these terms, and comparing the sum with the threshold. We will make a small change here: bring the threshold to the left side and write this new term as b. Basically it means that we have b equal to minus the threshold. People usually call this constant the bias. The name doesn't really make any difference, but this is the mathematical representation of the perceptron as you would find in most books.

Now let's move on and look at the graphical representation of this function. If you look at this graph: if the calculated value of the left part, that is the summation of weights multiplied by features plus the bias, is less than zero, the output comes out to be zero. So you can see in the graph that below zero the output of the function is also zero. When this left part is greater than or equal to zero, the function suddenly activates and gives an output of 1.

This type of function is called a simple step function. It is one type of activation function. Activation functions are basically those functions which take into account some threshold value; here the threshold value is 0. This function takes a sudden step at the threshold value, which is why it is called a step activation function.

There are many other types of activation functions. The most popular one is the sigmoid function. Here is a pictorial representation of how the sigmoid function looks. It is a smooth S-shaped curve. It also has a minimum of zero at minus infinity and a maximum of one at plus infinity. But instead of taking a sudden step, this function rises gradually and continuously.
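To make this concrete, here is a minimal sketch in Python of a perceptron with a step activation, written in the bias form b = -threshold described above. The names (step_perceptron, x, w, b) are my own for illustration, not from the lecture:

# A perceptron with a step activation, in the bias form b = -threshold.
def step_perceptron(x, w, b):
    # Weighted sum of the inputs plus the bias: z = sum_j w_j * x_j + b
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    # Step activation: output 1 if z >= 0, otherwise 0
    return 1 if z >= 0 else 0

# Real-valued inputs (shades of grey), not just 0 and 1:
print(step_perceptron(x=[0.7, 0.2], w=[0.5, -1.0], b=-0.1))  # prints 1, since z = 0.05 >= 0

Note that the inputs here are arbitrary real numbers, exactly the generalization just described; only the output is still a hard 0 or 1.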
This function is also called the logistic function, and it is also used in logistic regression, which is a very basic classification algorithm.

Now, the sigmoid function solves a major problem that we have with the step function. When we are training our perceptron using historical data to find the values of the weights and the threshold, the step function is very sensitive to individual observations. For example, suppose we are classifying fashion objects in our Fashion-MNIST dataset and our algorithm is misclassifying a particular image of boots as trousers. To rectify this, the model will need to find new weight and bias values. This is where the problem comes in: a small change in the weight and bias values can completely flip the output for a lot of the other observations. This makes the step function very hard to control. With the sigmoid function the change is gradual, so it is easier to control the behaviour.

Now, when we replace the step function with a sigmoid activation function, we call this new cell a sigmoid neuron or a logistic neuron instead of a perceptron.

Mathematically, the sigmoid function formula looks like this: sigmoid(z) = 1 / (1 + e^(-z)). If you plot this function on a graph, that is, if you have z on the x axis, calculate the value of this function using this formula and plot it on the y axis, this is how the formula looks.

Now we will replace z with the summation plus the bias. The sum of w_j times x_j plus b was the input to our activation function, so we substitute it in place of z. This is what the output of our neuron looks like: 1 / (1 + e^(-(sum_j w_j x_j + b))). If you calculate this value it will always lie between 0 and 1, and it will have a shape like this.

You can compare it with the step function. In the step function we calculated the output using the earlier formula: we got 0 if the summation was less than zero, and we got 1 if the summation was greater than or equal to zero. We have replaced that step with the sigmoid function. The sigmoid is a continuous function; we do not need two separate cases for it. So we just input the values of w_j, x_j and the bias to calculate the output, which is a continuous function.

Now with this, our artificial neural cell is ready, which takes in any number of real-valued inputs and gives an output between 0 and 1.
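As a sketch of the sigmoid neuron just described (again with illustrative names of my own), the only change from the step perceptron above is the activation applied to the same weighted sum:

import math

def sigmoid(z):
    # The logistic function: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(x, w, b):
    # Same weighted sum as the perceptron, passed through sigmoid;
    # the output always lies strictly between 0 and 1.
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)

# The same inputs as before now give a graded value instead of a hard 1:
print(sigmoid_neuron(x=[0.7, 0.2], w=[0.5, -1.0], b=-0.1))  # ~0.51

This graded output is exactly what makes training easier: nudging a weight nudges the output, instead of flipping it.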
It is time to create an artificial neural network, which is basically a network of these individual cells.

So, just a brief recap of this class. Initially I said that we took in binary inputs and gave out one single binary output. We replaced the binary inputs with any real values, and we replaced the binary output with a value between 0 and 1. So in this generalized form we take in inputs which can have any real value, and we get one output which lies between 0 and 1.
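To close, here is a small illustration (my own, not from the lecture) of the controllability point made earlier: a tiny change in a weight flips the step function's output completely, while the sigmoid neuron's output barely moves.

import math

def step(z):
    return 1 if z >= 0 else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, b = 1.0, 0.0
for w in (0.01, -0.01):  # nudge the single weight by a tiny amount
    z = w * x + b
    print(f"w={w:+.2f}  step={step(z)}  sigmoid={sigmoid(z):.2f}")
# w=+0.01  step=1  sigmoid=0.50
# w=-0.01  step=0  sigmoid=0.50

The step output jumps from 1 to 0, while the sigmoid output stays near 0.5, which is why gradual activations are easier to train.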