1 00:00:00,166 --> 00:00:02,633 Hello and welcome back to the course on Deep Learning. 2 00:00:02,633 --> 00:00:03,000 All right. 3 00:00:03,000 --> 00:00:05,033 Today we're talking about the activation function. 4 00:00:05,033 --> 00:00:06,900 Let's get straight into it. 5 00:00:06,900 --> 00:00:08,100 So this is where we left off. 6 00:00:08,100 --> 00:00:11,766 Previously we talked about the structure of one neuron. 7 00:00:11,833 --> 00:00:13,200 So there it is in the middle. 8 00:00:13,200 --> 00:00:15,666 We know that it has some input values coming in. 9 00:00:15,666 --> 00:00:17,000 It's got some weights. 10 00:00:17,000 --> 00:00:19,533 Then it adds them up: 11 00:00:19,533 --> 00:00:21,966 it calculates a weighted sum of those inputs 12 00:00:21,966 --> 00:00:23,633 and then applies the activation function. 13 00:00:23,633 --> 00:00:28,466 In step three it passes on the signal to the next neuron. 14 00:00:28,466 --> 00:00:29,700 And that's what we're talking about today. 15 00:00:29,700 --> 00:00:32,766 We're talking about the value that is going to be passed on. 16 00:00:32,766 --> 00:00:35,766 So we're talking about the activation function that's being applied. 17 00:00:36,266 --> 00:00:39,166 So what options do we have for the activation function? 18 00:00:39,166 --> 00:00:42,166 Well, we're going to look at four different types of activation functions 19 00:00:42,400 --> 00:00:43,300 that you could choose from. 20 00:00:43,300 --> 00:00:45,533 Of course there are more types of activation functions, 21 00:00:45,533 --> 00:00:47,166 but these are the predominant ones 22 00:00:47,166 --> 00:00:50,166 that you'll be hearing about and that we'll be using in this course. 23 00:00:50,233 --> 00:00:52,966 So here is the threshold function. 24 00:00:52,966 --> 00:00:54,133 This is what it looks like. 25 00:00:54,133 --> 00:00:58,466 So on the x axis you have the weighted sum of inputs.
26 00:00:58,800 --> 00:01:03,500 On the y axis you have the values from 0 to 1. 27 00:01:03,866 --> 00:01:08,066 And basically the threshold function is a very simple type of function where, 28 00:01:08,533 --> 00:01:13,233 if the value is less than zero, 29 00:01:13,300 --> 00:01:16,400 then the threshold function passes on zero. 30 00:01:16,733 --> 00:01:19,733 If the value is greater than or equal 31 00:01:19,733 --> 00:01:22,833 to zero, then the threshold function passes on a one. 32 00:01:22,833 --> 00:01:26,100 So it's basically a yes/no type of function. 33 00:01:26,300 --> 00:01:32,466 Very straightforward, a very rigid type of function: either 34 00:01:32,466 --> 00:01:34,900 yes or no, no other options. 35 00:01:34,900 --> 00:01:36,066 So there you go. That's how it works. 36 00:01:36,066 --> 00:01:37,266 Very simple function. 37 00:01:37,266 --> 00:01:39,900 Let's move on to something a bit more complex. 38 00:01:39,900 --> 00:01:42,400 Now the sigmoid function. 39 00:01:42,400 --> 00:01:44,166 It's a very interesting formula 40 00:01:44,166 --> 00:01:45,900 that we have here. 41 00:01:45,900 --> 00:01:51,166 As you can see, it is one divided by one plus e to the power of minus x. 42 00:01:51,333 --> 00:01:56,333 In this case, of course, x is the value 43 00:01:56,966 --> 00:01:58,433 of the weighted sum. 44 00:01:58,433 --> 00:02:00,466 And so yeah. 45 00:02:00,466 --> 00:02:02,466 So this is what the sigmoid looks like. 46 00:02:02,466 --> 00:02:06,433 It's a function which is used in logistic regression, 47 00:02:06,433 --> 00:02:09,366 if you recall, from the machine learning course. 48 00:02:09,366 --> 00:02:13,033 So what is good about this function is that it is smooth. Unlike the 49 00:02:13,533 --> 00:02:17,966 threshold function, this one doesn't have those kinks in its curve. 50 00:02:17,966 --> 00:02:21,600 And therefore it's just a nice, smooth, gradual progression.
51 00:02:21,600 --> 00:02:25,566 So for anything below zero it just drops off toward zero, and 52 00:02:25,566 --> 00:02:29,700 above zero it approaches one. 53 00:02:30,000 --> 00:02:35,133 And the sigmoid function is very useful in the final layer, in the output 54 00:02:35,133 --> 00:02:38,833 layer, especially when you're trying to predict probabilities. 55 00:02:38,833 --> 00:02:41,066 And we'll see that throughout this course. 56 00:02:41,066 --> 00:02:43,066 And then we've got the rectifier function. 57 00:02:43,066 --> 00:02:45,900 The rectifier function, even though it has a kink, 58 00:02:45,900 --> 00:02:50,833 is one of the most popular functions for artificial neural networks. 59 00:02:50,833 --> 00:02:53,700 So it is zero all the way up to zero. 60 00:02:53,700 --> 00:02:55,000 It is zero for negative inputs. 61 00:02:55,000 --> 00:02:58,066 And then from there it gradually increases 62 00:02:58,066 --> 00:03:01,400 as the input value increases. 63 00:03:01,600 --> 00:03:03,300 And we'll see that throughout the course. 64 00:03:03,300 --> 00:03:04,933 We'll see that in other intuition tutorials. 65 00:03:04,933 --> 00:03:06,466 And we'll also see 66 00:03:06,466 --> 00:03:09,600 how we use this function in the practical side of the course. 67 00:03:09,600 --> 00:03:13,166 And I will comment on this a bit more in a few slides from now. 68 00:03:13,433 --> 00:03:15,033 So just remember that the rectifier function is 69 00:03:15,033 --> 00:03:18,466 one of the most used functions in artificial neural networks. 70 00:03:18,900 --> 00:03:22,600 And finally we've got one more function that you will probably hear about. 71 00:03:22,600 --> 00:03:25,133 It's the hyperbolic tangent function. 72 00:03:25,133 --> 00:03:27,333 It's very similar to the sigmoid function. 73 00:03:27,333 --> 00:03:32,333 But here the hyperbolic tangent function goes below zero.
74 00:03:32,333 --> 00:03:36,233 So the values go from 0 up to approximately 1, 75 00:03:36,233 --> 00:03:39,266 and from 0 down to approximately -1 on the other side. 76 00:03:39,600 --> 00:03:42,233 And that can be useful in some applications. 77 00:03:42,233 --> 00:03:45,700 So we're not going to go into too much depth on each one of these functions. 78 00:03:45,700 --> 00:03:50,066 I just wanted to acquaint you with them so that you know what they look like 79 00:03:50,066 --> 00:03:51,633 and what they're called. 80 00:03:51,633 --> 00:03:54,266 If you'd like to do some additional reading, 81 00:03:54,266 --> 00:03:59,033 then check out this paper by Xavier Glorot 82 00:03:59,033 --> 00:04:01,200 et al., 83 00:04:01,200 --> 00:04:05,400 called Deep Sparse Rectifier Neural Networks, a 2011 paper. 84 00:04:05,633 --> 00:04:10,833 And there you will find out exactly why the rectifier function 85 00:04:10,833 --> 00:04:16,000 is such a valuable function, why it's so popular to use. 86 00:04:16,200 --> 00:04:20,566 But nevertheless, for now, you don't really need to know all of those things. 87 00:04:20,566 --> 00:04:22,333 For now, we're just going to start applying them. 88 00:04:22,333 --> 00:04:24,100 We'll just start using them more and more. 89 00:04:24,100 --> 00:04:28,200 And so when you feel comfortable with the practical side of things, 90 00:04:28,466 --> 00:04:33,866 then you can go and refer to this paper and then you will be able to soak in 91 00:04:33,866 --> 00:04:36,866 that knowledge much quicker, and it'll make much more sense. 92 00:04:37,100 --> 00:04:40,166 So just keep in mind that when you're ready, when you feel that 93 00:04:40,166 --> 00:04:42,700 you're ready, then you can go and refer to this paper 94 00:04:42,700 --> 00:04:45,433 and get some valuable knowledge from there.
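As a rough sketch of the four functions described so far (this is not part of the original lecture), here is one way they might be written in plain Python. The function names and the sample inputs are illustrative assumptions. The sigmoid is written in a numerically stable form that is mathematically equivalent to 1 / (1 + e^(-x)), where x is the weighted sum of the inputs.

```python
import math


def threshold(x: float) -> float:
    """Threshold (step) function: 0 below zero, 1 at zero and above."""
    return 1.0 if x >= 0 else 0.0


def sigmoid(x: float) -> float:
    """Sigmoid: 1 / (1 + e^(-x)), with output between 0 and 1.

    The two branches are algebraically identical; splitting them only
    avoids overflow in math.exp for large negative x.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rectifier(x: float) -> float:
    """Rectifier: zero for negative inputs, the input itself otherwise."""
    return max(0.0, x)


def hyperbolic_tangent(x: float) -> float:
    """Hyperbolic tangent: like the sigmoid, but ranging from -1 to 1."""
    return math.tanh(x)


if __name__ == "__main__":
    # Illustrative weighted sums only; any real numbers would do.
    for x in (-2.0, 0.0, 2.0):
        print(x, threshold(x), sigmoid(x), rectifier(x), hyperbolic_tangent(x))
```

Running the script prints each sample weighted sum next to the value every function would pass on, which makes the differences in output range easy to compare.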
95 00:04:45,433 --> 00:04:48,266 So, just to quickly recap, 96 00:04:48,266 --> 00:04:51,433 we have the threshold activation function, which looks like this, 97 00:04:52,066 --> 00:04:55,066 the sigmoid activation function, which looks like this. 98 00:04:55,600 --> 00:05:00,166 We have the rectifier function and we have the hyperbolic tangent function. 99 00:05:00,366 --> 00:05:04,933 And now to finish off this tutorial, let's quickly do a few exercises. 100 00:05:04,933 --> 00:05:09,033 So we'll just do two quick exercises to help that knowledge sink in. 101 00:05:09,033 --> 00:05:14,466 So in the first one we've got an example here of a neural network with just one neuron. 102 00:05:14,466 --> 00:05:16,000 And then right away the output layer. 103 00:05:16,000 --> 00:05:19,900 And the question is: assuming that your dependent variable is binary, 104 00:05:19,900 --> 00:05:23,600 so it's either zero or one, which activation function would you use? 105 00:05:23,600 --> 00:05:28,533 So out of the ones that we've discussed we have the threshold function, 106 00:05:28,866 --> 00:05:32,466 the sigmoid function, the rectifier function. 107 00:05:32,666 --> 00:05:35,200 And we've got the hyperbolic tangent function, 108 00:05:35,200 --> 00:05:37,833 in their raw forms. 109 00:05:37,833 --> 00:05:41,200 Which ones would you be able to use 110 00:05:41,733 --> 00:05:42,866 for a binary variable? 111 00:05:43,833 --> 00:05:44,300 Okay. 112 00:05:44,300 --> 00:05:49,233 So there are two options that we can approach this with. 113 00:05:49,233 --> 00:05:52,200 So number one is the threshold activation function. 114 00:05:52,200 --> 00:05:54,633 Because we know that it's between 0 and 1. 115 00:05:54,633 --> 00:05:57,533 It gives you a zero for values below zero. 116 00:05:57,533 --> 00:06:00,000 And otherwise it gives you a one. So it can only give you two values. 117 00:06:00,000 --> 00:06:01,800 It fits
118 00:06:01,800 --> 00:06:04,300 this requirement perfectly. 119 00:06:04,300 --> 00:06:08,466 And therefore you can just say y equals the 120 00:06:09,100 --> 00:06:13,466 threshold function of your weighted sum, and that's it. 121 00:06:13,833 --> 00:06:18,133 And then the second option you could use is the sigmoid activation function. 122 00:06:18,300 --> 00:06:21,600 It is actually also between 0 and 1, just what we need. 123 00:06:21,600 --> 00:06:25,333 But at the same time, what you want is just zero or one. 124 00:06:25,333 --> 00:06:25,533 Right. 125 00:06:25,533 --> 00:06:28,866 So it's not exactly what we need. 126 00:06:28,866 --> 00:06:37,366 But in this case you could use it as the probability of y being yes or no. 127 00:06:37,366 --> 00:06:40,000 So we want y to be zero or one. 128 00:06:40,000 --> 00:06:45,400 But instead we'll say that the sigmoid function, the sigmoid activation 129 00:06:45,400 --> 00:06:48,766 function, 130 00:06:48,766 --> 00:06:51,766 tells us the probability of y being equal to one. 131 00:06:51,766 --> 00:06:56,100 So basically, the closer you get to the top, the more likely 132 00:06:56,333 --> 00:06:59,766 it is that this is indeed a one or a yes rather than a no. 133 00:07:00,600 --> 00:07:01,233 And yeah. 134 00:07:01,233 --> 00:07:04,366 So that's very similar to the logistic regression approach. 135 00:07:04,800 --> 00:07:07,433 And those are just two examples 136 00:07:07,433 --> 00:07:09,900 for when you have a binary variable. 137 00:07:09,900 --> 00:07:12,700 And now let's have a look at another practical application. 138 00:07:12,700 --> 00:07:13,733 Let's have a look at 139 00:07:13,733 --> 00:07:16,833 how all this would play out if we had a neural network like this. 140 00:07:17,266 --> 00:07:20,266 So in the first layer, the input layer, we have some inputs. 141 00:07:20,466 --> 00:07:23,466 They are sent off to our first hidden layer.
142 00:07:23,666 --> 00:07:26,000 And then an activation function is applied. 143 00:07:26,000 --> 00:07:29,033 And usually what you would apply here, and what you'll see throughout this course, 144 00:07:29,033 --> 00:07:32,266 is a rectifier activation function. 145 00:07:32,700 --> 00:07:34,433 So it would look something like that. 146 00:07:34,433 --> 00:07:36,466 We apply the rectifier activation function. 147 00:07:36,466 --> 00:07:40,433 And then from there the signals would be passed on to 148 00:07:40,666 --> 00:07:44,700 the output layer, where the sigmoid activation function would be applied. 149 00:07:44,933 --> 00:07:46,733 And that would be our final output. 150 00:07:46,733 --> 00:07:48,933 And that could predict a probability, for instance. 151 00:07:48,933 --> 00:07:50,366 So this combination is going to be 152 00:07:50,366 --> 00:07:53,900 quite common, where in the hidden layers we apply the rectifier function. 153 00:07:54,400 --> 00:07:58,333 And then in the output layer we apply the sigmoid function. 154 00:07:58,733 --> 00:07:59,633 So there we go. 155 00:07:59,633 --> 00:08:01,333 I hope you enjoyed today's tutorial. 156 00:08:01,333 --> 00:08:04,933 Now you are quite well versed in the four different types of activation functions, 157 00:08:04,933 --> 00:08:08,400 and you will get some hands-on practical experience with them 158 00:08:08,400 --> 00:08:12,166 throughout this course. We will be using them all over the place, 159 00:08:12,166 --> 00:08:13,866 so you'll get to know them quite intimately 160 00:08:13,866 --> 00:08:16,466 and you should be quite comfortable with them. 161 00:08:16,466 --> 00:08:20,600 But for now, this is the knowledge that you need to progress and understand 162 00:08:20,866 --> 00:08:23,800 what is going to be happening further down in this course. 163 00:08:23,800 --> 00:08:26,800 And on that note, I will look forward to seeing you next time.
164 00:08:26,800 --> 00:08:28,500 Until then, enjoy deep learning.
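The second exercise from this tutorial (inputs, then a hidden layer with the rectifier function, then an output layer with the sigmoid function) can be sketched in a few lines of plain Python. This is only a minimal illustration and is not taken from the lecture: the helper names, the layer sizes, and every weight value below are made-up assumptions, and no training is involved.

```python
import math


def rectifier(x: float) -> float:
    """Rectifier: zero for negative inputs, the input itself otherwise."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Sigmoid, 1 / (1 + e^(-x)), written to avoid overflow for negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def weighted_sum(values, weights):
    """Weighted sum of the incoming values for a single neuron."""
    return sum(v * w for v, w in zip(values, weights))


def hidden_layer(inputs, hidden_weights):
    """Each hidden neuron applies the rectifier to its own weighted sum."""
    return [rectifier(weighted_sum(inputs, w)) for w in hidden_weights]


def forward(inputs, hidden_weights, output_weights):
    """Hidden layer with the rectifier, then one sigmoid output neuron.

    The result can be read as the probability that y equals one.
    """
    hidden = hidden_layer(inputs, hidden_weights)
    return sigmoid(weighted_sum(hidden, output_weights))


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    inputs = [1.0, 2.0]
    hidden_weights = [[0.5, 0.5], [-1.0, 0.25]]  # one weight list per hidden neuron
    output_weights = [2.0, -3.0]
    print(forward(inputs, hidden_weights, output_weights))
```

With these made-up weights the second hidden neuron receives a negative weighted sum, so the rectifier passes on zero for it, and only the first hidden neuron contributes to the sigmoid output.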