1 00:00:00,566 --> 00:00:03,000 Hello and welcome back to the course on Deep Learning. 2 00:00:03,000 --> 00:00:05,700 In the previous tutorial, we found out what Convolutional 3 00:00:05,700 --> 00:00:07,166 Neural Networks are all about. 4 00:00:07,166 --> 00:00:09,600 And today we're going to dive into step one: 5 00:00:09,600 --> 00:00:10,933 convolution. 6 00:00:10,933 --> 00:00:14,600 So this is the convolution function. 7 00:00:14,866 --> 00:00:18,433 I know we try to stay away from mathematics and keep things intuitive, 8 00:00:18,433 --> 00:00:23,000 but I couldn't help but share this formula with you because it is so simple. 9 00:00:23,066 --> 00:00:26,933 A convolution is basically a combined integration of two functions. 10 00:00:27,200 --> 00:00:30,900 And it shows you how one function modifies 11 00:00:30,900 --> 00:00:32,633 the other, modifies the shape of the other. 12 00:00:32,633 --> 00:00:36,300 And if you've done any signal processing or electrical engineering 13 00:00:36,300 --> 00:00:39,300 or a profession where signal processing is required, 14 00:00:39,466 --> 00:00:42,300 you would have inevitably come across the convolution function. 15 00:00:42,300 --> 00:00:43,700 It is quite popular. 16 00:00:43,700 --> 00:00:49,400 Now once again, we're going to keep the mathematics light, or keep it separate. 17 00:00:49,400 --> 00:00:54,833 And if you'd like to get into the math behind convolutional neural networks, 18 00:00:54,833 --> 00:00:59,433 a great additional read is Introduction to Convolutional Neural Networks 19 00:00:59,666 --> 00:01:05,500 by Jianxin Wu, who is a professor at Nanjing University in China. 20 00:01:05,666 --> 00:01:10,400 This paper was published literally days ago, like 5 or 6 days ago. 21 00:01:10,500 --> 00:01:13,933 And it is oriented specifically at people who are starting out, 22 00:01:13,933 --> 00:01:17,300 at beginners who are getting to know convolutional neural networks.
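For reference, the convolution function mentioned at the start of this tutorial is usually written as below. This is the standard textbook form for two functions; the symbols f, g, t and \tau are the usual convention and are an assumption here, since the slide itself is not part of the transcript and may use a different operator symbol.

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
```

Read it as: slide one function across the other, multiply them point by point, and add up (integrate) the products.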
23 00:01:17,300 --> 00:01:20,100 So the mathematics there should be accessible. 24 00:01:20,100 --> 00:01:24,000 I actually emailed Professor Jianxin Wu, and, 25 00:01:24,266 --> 00:01:27,266 yeah, he said his whole goal is to 26 00:01:27,266 --> 00:01:30,500 break the complex things down 27 00:01:30,500 --> 00:01:33,566 so that people who are new to this field can understand. And 28 00:01:34,100 --> 00:01:38,866 also, he mentioned that he's got some materials available on his homepage. 29 00:01:38,866 --> 00:01:40,500 So in the URL, 30 00:01:40,500 --> 00:01:44,533 if you just remove the last two parts and you just go 31 00:01:45,966 --> 00:01:49,233 to that part, that's his homepage, and you'll be able to find 32 00:01:49,233 --> 00:01:52,900 more additional tutorials and materials which haven't been published as papers, 33 00:01:53,233 --> 00:01:55,800 but he uses them in his tutorials. 34 00:01:55,800 --> 00:01:57,866 So you might find those useful. 35 00:01:57,866 --> 00:02:01,433 So browse around there if you'd like to get an introduction 36 00:02:01,433 --> 00:02:05,266 into the mathematics behind convolutional neural networks and kind of 37 00:02:05,766 --> 00:02:07,600 build a solid base around that 38 00:02:07,600 --> 00:02:12,400 area. But we're going to move on, and we're going to talk about the convolution. 39 00:02:12,400 --> 00:02:15,800 So what is a convolution in intuitive terms? 40 00:02:16,400 --> 00:02:18,900 Here on the left we've got an input image. As we discussed, 41 00:02:18,900 --> 00:02:20,900 that's how we're going to look at images: 42 00:02:20,900 --> 00:02:22,800 just ones and zeros to simplify things. 43 00:02:22,800 --> 00:02:24,900 And you can see the smiley face there. 44 00:02:24,900 --> 00:02:26,300 Then we've got a feature detector. 45 00:02:26,300 --> 00:02:30,000 So the feature detector is a three by three matrix. Does it have to be three by three? 46 00:02:30,000 --> 00:02:31,766 No, it doesn't.
47 00:02:31,766 --> 00:02:34,000 AlexNet, I think, uses 11 48 00:02:34,000 --> 00:02:34,666 by 11 in its first layer, 49 00:02:35,700 --> 00:02:37,933 and then one of those other 50 00:02:37,933 --> 00:02:40,933 famous ones uses like five by five feature detectors. 51 00:02:41,266 --> 00:02:45,666 They can be different, but usually you'll see that they are three by three. 52 00:02:45,900 --> 00:02:49,266 And there are, you know, reasons to make them three by three. 53 00:02:49,266 --> 00:02:52,100 So we're going to stick to the conventional way: 54 00:02:52,100 --> 00:02:54,066 having a three by three feature detector. 55 00:02:54,066 --> 00:02:56,333 Also, the feature detector is called, 56 00:02:56,333 --> 00:02:58,633 and these are important terms because you might come across them, 57 00:02:58,633 --> 00:03:01,700 there are many different terms for the feature detector. 58 00:03:01,700 --> 00:03:04,033 But the most common ones are feature detector, 59 00:03:04,033 --> 00:03:09,166 or you might hear it being called kernel, or you might hear it being called filter. 60 00:03:09,366 --> 00:03:13,066 So in this course we're going to be using filter and feature detector 61 00:03:13,100 --> 00:03:14,366 interchangeably. 62 00:03:14,366 --> 00:03:16,900 But just bear in mind that it has those names. 63 00:03:16,900 --> 00:03:21,433 And a convolution operation is signified by an X 64 00:03:21,433 --> 00:03:25,266 in a circle, just as you saw in the formulas before. 65 00:03:25,633 --> 00:03:29,600 And here what happens is, on an intuitive level, 66 00:03:29,666 --> 00:03:33,633 or just to think of it in terms of what is actually happening in the background 67 00:03:33,633 --> 00:03:36,633 rather than the mathematics, well, you take this feature detector, 68 00:03:36,933 --> 00:03:40,433 or filter, and you put it on your image like you see on the left.
69 00:03:40,533 --> 00:03:44,133 So you cover, for instance, in this case 70 00:03:44,366 --> 00:03:48,033 the top left corner, nine pixels in the top left corner. 71 00:03:48,300 --> 00:03:51,600 And you basically multiply 72 00:03:52,400 --> 00:03:54,900 each value by each value, so the respective values. 73 00:03:54,900 --> 00:03:59,733 So the top left value by the top left value. 74 00:03:59,733 --> 00:04:02,100 Then basically it's position number 75 00:04:02,100 --> 00:04:03,433 (1, 1) by position number (1, 1), 76 00:04:03,433 --> 00:04:08,600 position (0, 1) by (0, 1), (0, 2) by (0, 2), and so on. 77 00:04:08,600 --> 00:04:13,066 So it's just element-wise multiplication of these matrices, 78 00:04:13,233 --> 00:04:14,400 and then you add up the results. 79 00:04:14,400 --> 00:04:16,600 So in this case nothing matches up. 80 00:04:16,600 --> 00:04:19,633 It's always either a zero by zero or a zero by one. 81 00:04:19,800 --> 00:04:21,600 So the result is zero. 82 00:04:21,600 --> 00:04:23,000 Here you can see that one 83 00:04:23,000 --> 00:04:25,800 of them matched up. The one on the left 84 00:04:25,800 --> 00:04:28,066 matched up, and therefore we got a one here. 85 00:04:28,066 --> 00:04:30,733 Nothing matched up. Nothing matched up, nothing matched up. 86 00:04:30,733 --> 00:04:32,066 Then we move on to the next row. 87 00:04:32,066 --> 00:04:35,500 And the step at which we're moving 88 00:04:35,500 --> 00:04:38,500 this whole filter is called the stride. 89 00:04:38,500 --> 00:04:40,433 So here we have a stride of one pixel. 90 00:04:40,433 --> 00:04:43,033 So here you can see again something matched up; the bottom right corner 91 00:04:43,033 --> 00:04:44,033 matched up. 92 00:04:44,033 --> 00:04:45,366 Again, a stride. 93 00:04:45,366 --> 00:04:48,200 The bottom one in the middle matched up. 94 00:04:48,200 --> 00:04:50,000 Here, the top right one matched up.
95 00:04:50,000 --> 00:04:52,133 Then nothing. As mentioned, the stride is one. 96 00:04:52,133 --> 00:04:53,366 You can change the stride. 97 00:04:53,366 --> 00:04:55,933 You can make it one, two, 98 00:04:55,933 --> 00:04:58,566 you can make it three, whatever you like. 99 00:04:58,566 --> 00:05:02,633 Conventionally, the one that works well is usually a two. 100 00:05:02,666 --> 00:05:04,433 So that's what people stick to. 101 00:05:04,433 --> 00:05:08,733 And we'll talk about what the stride is towards the end of this tutorial. 102 00:05:09,466 --> 00:05:11,700 So here we're matching up. 103 00:05:11,700 --> 00:05:12,600 So we just keep our eye here. 104 00:05:12,600 --> 00:05:15,600 You can see we've got a two because two of them matched up. 105 00:05:15,633 --> 00:05:17,800 And so on and so on. 106 00:05:17,800 --> 00:05:19,300 There we go. There's another one that matched up. 107 00:05:21,300 --> 00:05:23,566 There we go. 108 00:05:23,566 --> 00:05:24,666 And there we're done. 109 00:05:24,666 --> 00:05:27,633 So what have we created? 110 00:05:27,633 --> 00:05:28,066 Right. 111 00:05:28,066 --> 00:05:31,066 A couple of important things here. 112 00:05:31,533 --> 00:05:34,666 The image on the right is called a feature map. 113 00:05:35,200 --> 00:05:36,566 It also has several terms. 114 00:05:36,566 --> 00:05:40,100 It can also sometimes be called a convolved feature. 115 00:05:40,833 --> 00:05:43,833 So when you apply a convolution operator to something, 116 00:05:44,100 --> 00:05:45,666 it doesn't become convoluted. 117 00:05:45,666 --> 00:05:46,866 It becomes convolved. 118 00:05:46,866 --> 00:05:49,933 And yeah, sometimes I 119 00:05:50,333 --> 00:05:51,233 think of it myself in the 120 00:05:51,233 --> 00:05:54,233 wrong way, but the correct term is convolved. 121 00:05:54,633 --> 00:05:55,700 It's a convolved feature.
122 00:05:55,700 --> 00:05:57,900 Or it can also be called an activation map, 123 00:05:57,900 --> 00:06:00,900 but we're going to be calling it a feature map in this course. 124 00:06:01,000 --> 00:06:01,666 So it can be called 125 00:06:01,666 --> 00:06:03,333 any one of those things. 126 00:06:03,333 --> 00:06:06,200 And what have we done here? 127 00:06:06,200 --> 00:06:09,833 Well, as you can see, we've reduced the size of the image. 128 00:06:09,833 --> 00:06:10,533 That's number one. 129 00:06:10,533 --> 00:06:13,533 And that's the important thing I wanted to mention about 130 00:06:13,866 --> 00:06:16,800 your input image and the feature detector and the stride. 131 00:06:16,800 --> 00:06:17,266 Right. 132 00:06:17,266 --> 00:06:19,933 If you have a stride of one, you can see the image reduced a bit. 133 00:06:19,933 --> 00:06:23,100 But if you have a stride of two, the image is going to reduce more. 134 00:06:23,100 --> 00:06:25,466 So the feature map is going to be even smaller. 135 00:06:25,466 --> 00:06:30,400 And that's a very important function of the feature 136 00:06:30,400 --> 00:06:35,400 detector, of this whole convolution step: to make the image smaller, 137 00:06:35,633 --> 00:06:38,900 because it'll be easier to process it 138 00:06:39,966 --> 00:06:41,966 and it'll be just faster. 139 00:06:41,966 --> 00:06:44,966 It will, and, 140 00:06:45,966 --> 00:06:48,266 it'll be just faster, because imagine, like here 141 00:06:48,266 --> 00:06:51,766 we've got, what, a seven by seven image. 142 00:06:51,766 --> 00:06:54,933 But imagine if you have a proper photo, right, 143 00:06:54,933 --> 00:06:59,200 or you have, like, a 256 by 256 pixel image. 144 00:06:59,200 --> 00:07:03,433 That's a huge number of pixels: 256 squared. 145 00:07:03,766 --> 00:07:06,766 Or let's say you have 300 by 300 pixels, 146 00:07:06,800 --> 00:07:09,800 so we don't get confused with the RGB 256.
147 00:07:09,800 --> 00:07:14,400 Let's just say we have a 300 by 300 image in terms of size in pixels. 148 00:07:14,633 --> 00:07:17,300 Then you have 300 squared pixels. 149 00:07:17,300 --> 00:07:19,066 That's a huge number. 150 00:07:19,066 --> 00:07:24,433 And therefore feature detectors will reduce the size of the image, 151 00:07:24,433 --> 00:07:27,433 and therefore a stride of two is actually beneficial. 152 00:07:27,600 --> 00:07:29,866 But then the question is, do we lose information, 153 00:07:29,866 --> 00:07:34,233 or are we losing information when we're applying the feature detector? 154 00:07:34,400 --> 00:07:36,800 Well, some information we are losing, 155 00:07:36,800 --> 00:07:40,300 of course, because we have fewer values in our resulting matrix. 156 00:07:40,566 --> 00:07:44,166 But at the same time, the purpose of the feature detector is to detect certain 157 00:07:44,166 --> 00:07:47,733 features, certain parts of the image that are integral. 158 00:07:48,466 --> 00:07:50,966 And so, for instance, if you think about it this way, 159 00:07:50,966 --> 00:07:53,966 the feature detector has a certain pattern on it, 160 00:07:54,000 --> 00:07:57,866 and the highest number in your feature map is when that pattern matches up. 161 00:07:57,866 --> 00:08:00,833 In fact, the highest number you can get, in 162 00:08:00,833 --> 00:08:05,500 our simplified example, is when the feature matches exactly. 163 00:08:05,500 --> 00:08:09,166 And you can see that with the number four we have in our feature map. 164 00:08:09,400 --> 00:08:10,466 That's exactly, 165 00:08:10,466 --> 00:08:15,666 so if you look over here, that's exactly where this feature detector, 166 00:08:15,666 --> 00:08:19,000 because there's only four ones in it, matched perfectly. 167 00:08:19,000 --> 00:08:21,300 So you can see this part over here. 168 00:08:21,300 --> 00:08:23,300 So the feature was detected here.
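The sliding procedure described in this tutorial can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the 7 by 7 image of ones and zeros, the detector with four ones, and the function name `convolve2d` are all made up for this sketch and are not taken from the slides. Like the walkthrough, it lays the detector on the image without flipping it, multiplies element-wise, adds up the results, and then moves on by the stride.

```python
def convolve2d(image, detector, stride=1):
    """Slide `detector` over `image` (no padding) and return the feature map."""
    size = len(detector)                              # detector is size x size
    out_rows = (len(image) - size) // stride + 1
    out_cols = (len(image[0]) - size) // stride + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            top, left = i * stride, j * stride        # top-left corner of the window
            # Element-wise multiplication, then add up the results.
            row.append(sum(
                image[top + a][left + b] * detector[a][b]
                for a in range(size)
                for b in range(size)
            ))
        feature_map.append(row)
    return feature_map


# Hypothetical 7x7 input image (ones and zeros only).
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# Hypothetical 3x3 feature detector containing four ones.
detector = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
]

feature_map = convolve2d(image, detector, stride=1)   # 5x5 feature map
smaller_map = convolve2d(image, detector, stride=2)   # 3x3 feature map

for row in feature_map:
    print(row)
print(max(max(row) for row in feature_map))           # 4: all four ones matched
```

With a stride of one the 7 by 7 image becomes a 5 by 5 feature map, and with a stride of two it becomes 3 by 3. The largest value, 4, appears where all four ones of the detector line up with ones in the image.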
169 00:08:23,300 --> 00:08:27,066 And as we discussed at the very start of this section, 170 00:08:28,200 --> 00:08:29,966 features are 171 00:08:29,966 --> 00:08:33,000 how we see things, how we recognize things. 172 00:08:33,000 --> 00:08:35,333 We don't look at every single 173 00:08:35,333 --> 00:08:37,300 pixel, so to speak, 174 00:08:37,300 --> 00:08:40,233 in what we see in an image or in real life. 175 00:08:40,233 --> 00:08:41,766 We don't look at every single pixel. 176 00:08:41,766 --> 00:08:46,333 We look at features: we look at the nose, the hat, the feather, 177 00:08:47,000 --> 00:08:49,766 the eyes, 178 00:08:49,766 --> 00:08:53,766 or the little black marks under the cheetah's eyes to distinguish 179 00:08:53,766 --> 00:08:57,266 between a cheetah and a leopard, or the shape of the train 180 00:08:57,300 --> 00:09:00,600 to distinguish between a bullet train, a normal train, and so on. 181 00:09:00,600 --> 00:09:02,533 So we don't look at everything, 182 00:09:02,533 --> 00:09:04,500 we look at features, and that's what we're preserving. 183 00:09:04,500 --> 00:09:08,033 And that's what the feature map helps us preserve. 184 00:09:08,033 --> 00:09:12,500 Actually, that's what it allows us to bring forward, 185 00:09:12,500 --> 00:09:16,366 and get rid of all of the unnecessary things. Even as humans, 186 00:09:16,366 --> 00:09:19,366 we don't process everything. There's so much information 187 00:09:19,466 --> 00:09:23,966 going into your eyes at any given time, like gigabytes of information, 188 00:09:24,133 --> 00:09:28,200 if you look at every single dot, if not terabytes of information 189 00:09:28,200 --> 00:09:32,133 going into your eyes per second, and still we're able 190 00:09:32,133 --> 00:09:35,233 to process that because we get rid of what is unnecessary. 191 00:09:35,233 --> 00:09:38,233 We only focus on the features that are important to us.
192 00:09:38,800 --> 00:09:41,866 And that is exactly what the feature map does. 193 00:09:42,133 --> 00:09:44,533 So now moving on. 194 00:09:44,533 --> 00:09:46,166 This is our input image. 195 00:09:46,166 --> 00:09:49,400 And we create a feature map. 196 00:09:49,400 --> 00:09:52,400 So the front one, let's say the front one is the one we just created. 197 00:09:52,600 --> 00:09:54,166 But then how come there are many of them? 198 00:09:54,166 --> 00:09:57,100 Well, we create multiple feature maps 199 00:09:57,100 --> 00:10:00,166 because we use different filters. 200 00:10:00,166 --> 00:10:00,500 Right. 201 00:10:00,500 --> 00:10:01,933 And that's another way 202 00:10:01,933 --> 00:10:03,766 that we preserve lots of the information. 203 00:10:03,766 --> 00:10:07,733 So we don't just have one feature map. We look for certain features, 204 00:10:07,733 --> 00:10:12,266 or basically the network decides through its training, 205 00:10:12,266 --> 00:10:14,333 and this is something we'll discuss towards the end of this section, 206 00:10:14,333 --> 00:10:18,000 through its training it decides which features 207 00:10:18,000 --> 00:10:21,733 are important for certain types or certain categories, and 208 00:10:21,733 --> 00:10:23,166 it looks for them, and therefore 209 00:10:23,166 --> 00:10:26,033 we'll have different filters. And we'll talk about filters just now. 210 00:10:26,033 --> 00:10:27,700 But basically it'll apply these filters. 211 00:10:27,700 --> 00:10:32,466 So to get this feature map it applied a filter like the one we saw. 212 00:10:32,466 --> 00:10:34,633 But then to get this feature map it applied a different filter, 213 00:10:34,633 --> 00:10:37,233 to get this feature map it applied a different filter, and so on. 214 00:10:38,200 --> 00:10:40,166 And so 215 00:10:40,166 --> 00:10:43,166 basically it just creates these feature maps.
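To make the idea of several feature maps concrete, here is a small sketch that applies two different feature detectors to the same image. Everything in it is an assumption for illustration: the 5 by 5 image, the two hand-written diagonal detectors and the helper name `convolve2d` are invented here, whereas in a real network the detectors are decided through training.

```python
def convolve2d(image, detector):
    """Stride-1 sliding window: element-wise multiply, then add up."""
    size = len(detector)
    return [
        [
            sum(image[i + a][j + b] * detector[a][b]
                for a in range(size) for b in range(size))
            for j in range(len(image[0]) - size + 1)
        ]
        for i in range(len(image) - size + 1)
    ]


# Hypothetical 5x5 image: two diagonal strokes forming an X.
image = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]

# Two hand-written detectors, each looking for a different feature.
detectors = {
    "diagonal_down": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "diagonal_up":   [[0, 0, 1], [0, 1, 0], [1, 0, 0]],
}

# One feature map per detector, all computed from the same input image.
feature_maps = {name: convolve2d(image, det) for name, det in detectors.items()}

for name, fmap in feature_maps.items():
    print(name, fmap)
```

Each detector gives its own feature map of the same image, and the high values in each map mark where that particular feature was found.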
216 00:10:43,500 --> 00:10:47,700 And actually that's why personally I think the term feature detector 217 00:10:47,933 --> 00:10:49,500 is better than filter. 218 00:10:49,500 --> 00:10:50,400 So remember over here 219 00:10:50,400 --> 00:10:55,000 we have this filter, which we can also call a feature detector. 220 00:10:55,000 --> 00:10:59,366 Well, actually the term feature detector I think is better suited. 221 00:10:59,366 --> 00:11:03,300 And the reason for that is that's what the purpose is, right? 222 00:11:03,300 --> 00:11:06,400 We don't want to just filter our image, 223 00:11:06,400 --> 00:11:07,633 even though that's the whole, 224 00:11:07,633 --> 00:11:10,133 that's the same thing, just a question of terminology. 225 00:11:10,133 --> 00:11:12,166 But basically we want to detect features. All right. 226 00:11:12,166 --> 00:11:15,666 In this 227 00:11:16,700 --> 00:11:20,200 feature map we've detected where certain features are in the image. 228 00:11:20,200 --> 00:11:22,133 In this feature map we've detected where certain 229 00:11:22,133 --> 00:11:25,133 other features are, where a certain specific feature is located. 230 00:11:25,266 --> 00:11:28,200 And in this feature map we've detected where 231 00:11:28,200 --> 00:11:31,200 a certain other feature is located on the image. 232 00:11:31,266 --> 00:11:33,300 So that's what we were doing. 233 00:11:33,300 --> 00:11:34,500 And let's have a look at a couple of examples. 234 00:11:34,500 --> 00:11:40,133 So here we're using, and this is from gimp.org, 235 00:11:40,566 --> 00:11:41,666 their documentation. 236 00:11:41,666 --> 00:11:45,000 It's a free kind of tool, like Paint. 237 00:11:45,200 --> 00:11:47,000 And you can 238 00:11:47,000 --> 00:11:50,300 use it to adjust your images or work with your images, but basically they have 239 00:11:50,300 --> 00:11:53,466 some valuable examples in their documentation.
240 00:11:53,466 --> 00:11:57,066 And here they have a picture of the Taj Mahal. 241 00:11:57,066 --> 00:11:59,700 And you can choose which filter you want to apply. 242 00:11:59,700 --> 00:12:02,700 So if you download this program and you upload a photo into it, 243 00:12:02,700 --> 00:12:06,566 then you can actually start a convolution matrix 244 00:12:06,566 --> 00:12:10,700 and apply filters, and you will see that these things, 245 00:12:10,800 --> 00:12:13,800 these convolution matrices, are actually applied in image processing and 246 00:12:14,100 --> 00:12:15,133 design and so on. 247 00:12:15,133 --> 00:12:16,700 So let's have a look at what we get. 248 00:12:16,700 --> 00:12:19,133 So if we apply this filter, five in the 249 00:12:19,133 --> 00:12:21,000 middle and minus one, minus one, minus one, minus 250 00:12:21,000 --> 00:12:23,700 one, you can see that it sharpens the image. 251 00:12:23,700 --> 00:12:25,566 And yeah. 252 00:12:25,566 --> 00:12:28,733 So this is quite intuitive if you think of it. 253 00:12:28,800 --> 00:12:32,833 So five is the main pixel, in the middle of the 254 00:12:32,833 --> 00:12:36,233 filter or the feature detector. 255 00:12:36,433 --> 00:12:37,966 And then minus one, minus one and minus 256 00:12:37,966 --> 00:12:43,333 one just kind of reduce the pixels around it, in an intuitive 257 00:12:44,300 --> 00:12:46,100 sense. 258 00:12:46,100 --> 00:12:46,866 Then blur. 259 00:12:46,866 --> 00:12:50,300 So basically it 260 00:12:50,300 --> 00:12:54,500 gives equal significance to all of the pixels around the one in the center. 261 00:12:54,500 --> 00:12:58,966 And therefore it combines them together and you get a blur. Edge enhance. 262 00:12:58,966 --> 00:13:00,666 So here you can see 263 00:13:00,666 --> 00:13:02,833 minus one and one, and then 264 00:13:02,833 --> 00:13:03,800 you get zeros. Right.
265 00:13:03,800 --> 00:13:09,566 So you remove the pixels around the main one in the middle, 266 00:13:09,566 --> 00:13:12,000 and you only keep this one and a minus one. 267 00:13:12,000 --> 00:13:12,733 And it gives you an edge. 268 00:13:12,733 --> 00:13:14,233 And this one is a bit harder to understand, 269 00:13:14,233 --> 00:13:16,200 how it works. 270 00:13:16,200 --> 00:13:19,200 Probably a hard one, just to think of it intuitively. 271 00:13:19,200 --> 00:13:20,833 Edge detect. Right. 272 00:13:20,833 --> 00:13:23,500 So this one probably makes more sense. Right? 273 00:13:23,500 --> 00:13:27,366 You take the middle one, you reduce the middle one, 274 00:13:28,900 --> 00:13:32,433 probably like the strength of the middle pixel. 275 00:13:32,433 --> 00:13:35,600 And then you look for the ones, you look for 276 00:13:35,600 --> 00:13:37,333 these ones, you 277 00:13:38,400 --> 00:13:40,433 increase the strength 278 00:13:40,433 --> 00:13:42,000 of the ones around it. 279 00:13:42,000 --> 00:13:44,566 So you have the ones there. 280 00:13:44,566 --> 00:13:45,833 Yeah. So that 281 00:13:45,833 --> 00:13:49,100 gives you, like, an edge detection, and you can see what you get there. And 282 00:13:49,100 --> 00:13:51,700 emboss, another one. So 283 00:13:51,700 --> 00:13:55,566 the key here is that it's asymmetrical. 284 00:13:55,566 --> 00:13:58,066 And you can see the image becomes asymmetrical as well. 285 00:13:58,066 --> 00:13:58,233 So you 286 00:13:58,233 --> 00:14:02,566 get that kind of feeling that it's standing out 287 00:14:02,566 --> 00:14:03,633 towards you. 288 00:14:03,633 --> 00:14:06,466 And that's what you get when you have, like, minuses here 289 00:14:06,466 --> 00:14:07,100 and pluses here. 290 00:14:07,100 --> 00:14:09,933 Again, this is getting a bit technical now. 291 00:14:09,933 --> 00:14:12,700 But at least we can get some kind of intuitive understanding.
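The five filters just described can be tried out on a tiny image with a short sketch. The matrices below are commonly used 3 by 3 versions that fit the descriptions above (five in the middle with minus ones around it for sharpen, equal weights for blur, and so on); the exact values on the slide are not in the transcript, so treat them, the `divisor` argument, the helper name `apply_kernel` and the small test image as assumptions.

```python
def apply_kernel(image, kernel, divisor=1):
    """Stride-1 convolution matrix: multiply element-wise, add up, divide."""
    size = len(kernel)
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(size) for b in range(size)) / divisor
            for j in range(len(image[0]) - size + 1)
        ]
        for i in range(len(image) - size + 1)
    ]


# Assumed 3x3 kernels, each paired with a divisor that normalises the sum.
kernels = {
    "sharpen":      ([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], 1),
    "blur":         ([[1, 1, 1], [1, 1, 1], [1, 1, 1]], 9),
    "edge_enhance": ([[0, 0, 0], [-1, 1, 0], [0, 0, 0]], 1),
    "edge_detect":  ([[0, 1, 0], [1, -4, 1], [0, 1, 0]], 1),
    "emboss":       ([[-2, -1, 0], [-1, 1, 1], [0, 1, 2]], 1),
}

# Hypothetical grayscale strip: dark on the left, bright on the right,
# so there is one vertical edge between the third and fourth columns.
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

results = {name: apply_kernel(image, k, d) for name, (k, d) in kernels.items()}

for name, output in results.items():
    print(name, output)
```

In this sketch blur turns the sharp jump into a gradual ramp, while edge detect gives zero in the flat area and non-zero values only next to the edge. The raw sums can go negative or above the original range, so an image program would still have to map them back to valid pixel values.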
292 00:14:12,700 --> 00:14:14,066 Let's just go quickly through them again. 293 00:14:14,066 --> 00:14:16,966 So there's sharpen. There's blur. 294 00:14:16,966 --> 00:14:19,733 There's edge enhance. There's edge detect. 295 00:14:19,733 --> 00:14:20,733 There's emboss. 296 00:14:20,733 --> 00:14:24,433 And so as you can see, these are great examples of the same image, 297 00:14:24,566 --> 00:14:27,233 but we're getting different feature maps. 298 00:14:27,233 --> 00:14:28,066 So we use different 299 00:14:28,066 --> 00:14:31,500 feature detectors to get different feature maps of the same image. 300 00:14:31,733 --> 00:14:36,266 And therefore now we have lots of versions of this image 301 00:14:37,666 --> 00:14:40,566 where in each one we've tried to detect certain 302 00:14:40,566 --> 00:14:44,766 things. These terms, they're not all applicable to us. 303 00:14:44,766 --> 00:14:47,833 We can say, like, emboss is probably not applicable to us 304 00:14:47,833 --> 00:14:51,533 in terms of convolutional neural networks, but edge detect, that's important. 305 00:14:51,533 --> 00:14:53,166 We want to detect the edges. 306 00:14:53,166 --> 00:14:56,400 Edge enhance, probably not. Blur, sharpen. 307 00:14:56,400 --> 00:14:59,700 So certain things, like edge detect, are probably the most important ones 308 00:15:00,000 --> 00:15:02,366 for our type of work. 309 00:15:02,366 --> 00:15:04,800 And in terms of understanding, computers, 310 00:15:04,800 --> 00:15:06,266 they will decide for themselves. 311 00:15:06,266 --> 00:15:08,933 The neural network will decide for itself what's important, what's not. 312 00:15:08,933 --> 00:15:12,833 And it probably won't even be recognizable to the human eye. 313 00:15:12,833 --> 00:15:14,600 You won't be able to understand what those features 314 00:15:14,600 --> 00:15:16,700 mean, but the computer will decide.
315 00:15:16,700 --> 00:15:19,733 And that's the beauty of neural networks, 316 00:15:19,733 --> 00:15:22,766 that they can process so many different 317 00:15:22,766 --> 00:15:24,400 things and understand, without 318 00:15:24,400 --> 00:15:27,566 even having that intuition, without having that 319 00:15:28,066 --> 00:15:30,700 explanation why, they will understand which features are 320 00:15:30,700 --> 00:15:34,266 important to them, whether we have a name for them or not. 321 00:15:35,133 --> 00:15:38,666 That's an irrelevant question for the artificial neural 322 00:15:38,666 --> 00:15:39,866 network. 323 00:15:39,866 --> 00:15:43,566 And my favorite one, here's an image of Geoffrey Hinton, 324 00:15:43,566 --> 00:15:45,933 a photo of Geoffrey Hinton 325 00:15:45,933 --> 00:15:50,400 passed through one of these filters. 326 00:15:50,800 --> 00:15:52,933 All right, so that brings us to the end of today's tutorial. 327 00:15:52,933 --> 00:15:55,300 I hope you enjoyed learning about convolution. 328 00:15:55,300 --> 00:16:00,100 The key takeaway is that the primary purpose of a convolution 329 00:16:00,300 --> 00:16:04,466 is to find features in your image using the feature detector, 330 00:16:04,500 --> 00:16:08,200 put them into a feature map, and by having them in a feature map, 331 00:16:08,200 --> 00:16:11,500 it still preserves the spatial relationships 332 00:16:12,000 --> 00:16:15,633 between pixels, which is very important for us, you know, 333 00:16:15,633 --> 00:16:19,033 because if they're completely jumbled up, then we've lost the pattern. 334 00:16:19,200 --> 00:16:23,333 And at the same time, it's important to understand that most of the time, 335 00:16:23,333 --> 00:16:29,300 the features a neural network will detect and use to recognize certain images 336 00:16:29,300 --> 00:16:32,766 and classes will mean nothing to humans, but nevertheless, they work.
337 00:16:33,000 --> 00:16:34,300 And that's what convolution is. 338 00:16:34,300 --> 00:16:36,133 And I look forward to seeing you in the next tutorial. 339 00:16:36,133 --> 00:16:37,933 Until then, enjoy deep learning.