1 00:00:00,180 --> 00:00:02,440 So we have seen asynchronous processing, 2 00:00:02,440 --> 00:00:04,170 we have seen synchronous processing, 3 00:00:04,170 --> 00:00:06,890 and now we are going to see Event Source Mapping. 4 00:00:06,890 --> 00:00:08,730 So this is the last category of how Lambda 5 00:00:08,730 --> 00:00:11,200 can process events in AWS. 6 00:00:11,200 --> 00:00:15,010 So it applies to Kinesis Data Streams, 7 00:00:15,010 --> 00:00:19,040 SQS and SQS FIFO queues, and DynamoDB Streams. 8 00:00:19,040 --> 00:00:22,490 So the common denominator of all of these things together 9 00:00:22,490 --> 00:00:25,423 is that records need to be polled from the source. 10 00:00:25,423 --> 00:00:28,480 So Lambda needs to ask the service to get some records 11 00:00:28,480 --> 00:00:30,210 and then the records will be returned. 12 00:00:30,210 --> 00:00:34,160 So that means that Lambda needs to poll from these services. 13 00:00:34,160 --> 00:00:36,260 So in this case the Lambda function 14 00:00:36,260 --> 00:00:37,600 is invoked synchronously. 15 00:00:37,600 --> 00:00:38,970 So let's see this. 16 00:00:38,970 --> 00:00:41,300 We have Kinesis and the Lambda service, 17 00:00:41,300 --> 00:00:44,570 and if we configure Lambda to read from Kinesis, 18 00:00:44,570 --> 00:00:46,590 then there will be internally 19 00:00:46,590 --> 00:00:49,150 an Event Source Mapping that will be created. 20 00:00:49,150 --> 00:00:52,370 And that is responsible for polling Kinesis, 21 00:00:52,370 --> 00:00:55,110 and getting the results back from Kinesis. 22 00:00:55,110 --> 00:00:57,483 So Kinesis will return a batch to us. 23 00:00:58,326 --> 00:00:59,910 And then once this Event Source Mapping 24 00:00:59,910 --> 00:01:02,370 has some data for Lambda to process, 25 00:01:02,370 --> 00:01:05,500 it's going to invoke our Lambda function synchronously, 26 00:01:05,500 --> 00:01:07,290 with an event batch.
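To make the "invoked synchronously with an event batch" idea concrete, here is a minimal sketch of what a function receiving a Kinesis batch might look like. It assumes the producers wrote JSON payloads, which the lecture does not say; the `make_record` helper is hypothetical and only exists to build sample input.

```python
import base64
import json


def handler(event, context):
    """Process one batch handed over by the event source mapping."""
    payloads = []
    for record in event["Records"]:
        # Kinesis record data arrives base64-encoded inside the event.
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumption: the producer wrote JSON; adapt this to your own format.
        payloads.append(json.loads(raw))
    # Raising an exception anywhere above would mark the whole batch as failed.
    return {"processed": len(payloads), "payloads": payloads}


def make_record(payload, partition_key="user-1"):
    """Hypothetical helper: build a sample record shaped like a Kinesis event entry."""
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {
        "eventSource": "aws:kinesis",
        "kinesis": {"partitionKey": partition_key, "data": data},
    }
```

Calling `handler({"Records": [make_record({"n": 1})]}, None)` locally is a quick way to check the decoding logic before wiring up a real stream.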
27 00:01:07,290 --> 00:01:09,320 So at its core, this is how it works. 28 00:01:09,320 --> 00:01:11,970 So there are two categories of Event Source Mapping. 29 00:01:11,970 --> 00:01:14,240 The first one is streams and the second one is queues. 30 00:01:14,240 --> 00:01:15,260 So let's deal with streams, 31 00:01:15,260 --> 00:01:17,600 and streams apply to Kinesis Data Streams 32 00:01:17,600 --> 00:01:18,880 and DynamoDB Streams, 33 00:01:18,880 --> 00:01:20,240 and we'll see what DynamoDB Streams are 34 00:01:20,240 --> 00:01:21,370 very, very soon. 35 00:01:21,370 --> 00:01:24,780 So in the case of streams, there will be an Event Source Mapping. 36 00:01:24,780 --> 00:01:27,320 It will create an iterator for each shard, 37 00:01:27,320 --> 00:01:30,250 so each Kinesis shard or DynamoDB Stream shard. 38 00:01:30,250 --> 00:01:34,510 And the items will be processed in order at the shard level. 39 00:01:34,510 --> 00:01:37,080 And so we can configure where to start to read from. 40 00:01:37,080 --> 00:01:39,350 You can read just the new items, 41 00:01:39,350 --> 00:01:41,250 or from the beginning of the shard 42 00:01:41,250 --> 00:01:43,580 or from a specific timestamp. 43 00:01:43,580 --> 00:01:46,980 Whenever an item is processed from a shard, 44 00:01:46,980 --> 00:01:48,710 whether it be from Kinesis or DynamoDB, 45 00:01:48,710 --> 00:01:50,450 it is not removed from the stream. 46 00:01:50,450 --> 00:01:53,190 That means that other consumers can read the data 47 00:01:53,190 --> 00:01:54,980 in Kinesis or DynamoDB, 48 00:01:54,980 --> 00:01:57,260 which is how they work in the first place. 49 00:01:57,260 --> 00:01:59,713 But I just wanted to reemphasize this. 50 00:02:01,149 --> 00:02:02,330 So the use case for this 51 00:02:02,330 --> 00:02:04,520 is either low traffic or high traffic.
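The three places to start reading from (new items only, beginning of the shard, a specific timestamp) correspond to the `StartingPosition` values `LATEST`, `TRIM_HORIZON` and `AT_TIMESTAMP` on an event source mapping. Below is a small sketch that builds the parameters without calling AWS; the function name `stream_mapping_params` and the ARN in the usage note are made up for illustration. As far as I know, `AT_TIMESTAMP` is accepted for Kinesis but not for DynamoDB Streams, so treat that option with care.

```python
from datetime import datetime, timezone

VALID_POSITIONS = {"LATEST", "TRIM_HORIZON", "AT_TIMESTAMP"}


def stream_mapping_params(stream_arn, function_name, start="LATEST", timestamp=None):
    """Build keyword arguments for an event source mapping on a stream."""
    if start not in VALID_POSITIONS:
        raise ValueError(f"unknown starting position: {start}")
    params = {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": start,
    }
    if start == "AT_TIMESTAMP":
        if timestamp is None:
            raise ValueError("AT_TIMESTAMP needs a timestamp")
        # Only sent when reading from a specific point in time.
        params["StartingPositionTimestamp"] = timestamp
    return params
```

With boto3 installed, the resulting dictionary could be passed along as `boto3.client("lambda").create_event_source_mapping(**params)`, for example with `stream_mapping_params("arn:aws:kinesis:us-east-1:123456789012:stream/demo", "my-function", "TRIM_HORIZON")`.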
52 00:02:04,520 --> 00:02:06,710 So if you have a low traffic stream, 53 00:02:06,710 --> 00:02:09,850 you can use a batch window to accumulate records 54 00:02:09,850 --> 00:02:10,683 before processing. 55 00:02:10,683 --> 00:02:11,890 So to make sure that 56 00:02:11,890 --> 00:02:14,730 you invoke your function efficiently. 57 00:02:14,730 --> 00:02:17,500 And then if you have a very high throughput stream 58 00:02:17,500 --> 00:02:19,210 and you want to speed up processing, 59 00:02:19,210 --> 00:02:22,900 you can set up Lambda to process multiple batches 60 00:02:22,900 --> 00:02:24,900 in parallel at the shard level. 61 00:02:24,900 --> 00:02:28,050 So here's a diagram from the AWS blog. 62 00:02:28,050 --> 00:02:31,680 And so we have a shard and there's a record processor and 63 00:02:31,680 --> 00:02:33,920 there's a way to have parallelism, 64 00:02:33,920 --> 00:02:36,880 to have multiple Lambda functions process batches 65 00:02:36,880 --> 00:02:38,380 within the same shard. 66 00:02:38,380 --> 00:02:42,740 So you can have up to 10 batch processors per shard. 67 00:02:42,740 --> 00:02:46,360 And for each batch there will be in-order processing, 68 00:02:46,360 --> 00:02:48,263 at the partition key level. 69 00:02:49,316 --> 00:02:51,335 So if you specify a partition key, 70 00:02:51,335 --> 00:02:53,070 it will not be read in order entirely for the shard. 71 00:02:53,070 --> 00:02:55,660 But each key within the shard will be read in order. 72 00:02:55,660 --> 00:02:58,240 So this is how we can parallelize the processing 73 00:02:58,240 --> 00:03:00,730 for your Lambda function and your streams. 74 00:03:00,730 --> 00:03:01,830 What about errors? 75 00:03:01,830 --> 00:03:04,220 By default, if your function returns an error, 76 00:03:04,220 --> 00:03:06,270 the entire batch is going to be reprocessed 77 00:03:06,270 --> 00:03:07,790 until the function succeeds 78 00:03:07,790 --> 00:03:09,640 or the items in the batch expire.
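Going back to the parallelization point: the idea that order is kept per partition key, but not across the whole shard, can be illustrated with a small local simulation. This is only a sketch under assumptions. The MD5-based lane choice is invented for the example and is not how Lambda actually distributes keys, and `assign_to_batchers` is a hypothetical name.

```python
import hashlib
from collections import defaultdict


def assign_to_batchers(records, parallelization_factor):
    """Spread (partition_key, value) records from one shard over parallel lanes.

    Every record with the same partition key lands in the same lane, in
    arrival order, so ordering holds per key but not across the whole shard.
    """
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("the lecture mentions up to 10 batch processors per shard")
    lanes = defaultdict(list)
    for seq, (key, value) in enumerate(records):
        # Illustrative hashing only; the real distribution is internal to Lambda.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        lane = int.from_bytes(digest[:4], "big") % parallelization_factor
        lanes[lane].append((key, seq, value))
    return dict(lanes)
```

On a real mapping, the two knobs from this part of the lecture are the `MaximumBatchingWindowInSeconds` setting (the batch window for low traffic streams) and the `ParallelizationFactor` setting (1 to 10, for high throughput streams).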
79 00:03:09,640 --> 00:03:11,010 So this is very important. 80 00:03:11,010 --> 00:03:14,590 Having an error in a batch can block your processing. 81 00:03:14,590 --> 00:03:17,420 So to ensure in-order processing, 82 00:03:17,420 --> 00:03:19,470 processing for the affected shard is paused 83 00:03:19,470 --> 00:03:21,100 until the error is resolved. 84 00:03:21,100 --> 00:03:24,150 And so you can manage that in several ways. 85 00:03:24,150 --> 00:03:26,070 You can configure the event source mapping 86 00:03:26,070 --> 00:03:28,990 to, number one, discard old events, 87 00:03:28,990 --> 00:03:31,180 or restrict the number of retries, 88 00:03:31,180 --> 00:03:33,330 or split the batch on errors. 89 00:03:33,330 --> 00:03:36,160 This last case is around Lambda timeout issues. 90 00:03:36,160 --> 00:03:37,640 So if your Lambda function doesn't have enough time 91 00:03:37,640 --> 00:03:38,930 to process the entire batch, 92 00:03:38,930 --> 00:03:41,620 maybe you will have enough time to process half the batch. 93 00:03:41,620 --> 00:03:44,870 And then in case you want to discard old events, 94 00:03:44,870 --> 00:03:46,940 the old events can go to a destination. 95 00:03:46,940 --> 00:03:50,138 And we'll see destinations in the next lectures. 96 00:03:50,138 --> 00:03:51,120 So this is all about streams. 97 00:03:51,120 --> 00:03:53,290 Now you need to know about queues. 98 00:03:53,290 --> 00:03:56,290 So for the queues, it's for SQS and SQS FIFO. 99 00:03:56,290 --> 00:03:57,920 And so you have the same idea. 100 00:03:57,920 --> 00:03:59,410 The SQS queue will be polled 101 00:03:59,410 --> 00:04:01,390 by a Lambda Event Source Mapping. 102 00:04:01,390 --> 00:04:03,320 And then whenever a batch is returned, 103 00:04:03,320 --> 00:04:05,800 your Lambda function will be invoked synchronously 104 00:04:05,800 --> 00:04:07,193 with the event batch.
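For the stream error handling options described a moment ago (discard old events, restrict retries, split the batch, send failures to a destination), here is a rough sketch in two parts. The first function collects the matching event source mapping settings. The second is a purely local simulation of the "split the batch" idea; unlike the real service it does not pause the shard or count retries, and both function names are hypothetical.

```python
def error_handling_params(max_record_age_seconds, max_retry_attempts, on_failure_arn=None):
    """Collect the stream error-handling settings of an event source mapping."""
    params = {
        "MaximumRecordAgeInSeconds": max_record_age_seconds,  # discard old events
        "MaximumRetryAttempts": max_retry_attempts,            # restrict retries
        "BisectBatchOnFunctionError": True,                    # split batch on error
    }
    if on_failure_arn is not None:
        # Details of discarded batches go to this destination.
        params["DestinationConfig"] = {"OnFailure": {"Destination": on_failure_arn}}
    return params


def process_with_bisect(batch, process):
    """Simulate batch splitting: return the records that still fail alone."""
    try:
        process(batch)
        return []
    except Exception:
        if len(batch) == 1:
            return list(batch)
        mid = len(batch) // 2
        # Retry each half separately so a bad record ends up isolated.
        return process_with_bisect(batch[:mid], process) + process_with_bisect(batch[mid:], process)
```

The simulation shows why splitting helps: a batch that keeps failing as a whole gets narrowed down until only the problematic records remain, while the rest of the batch goes through.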
105 00:04:08,430 --> 00:04:09,300 So in the case of SQS, 106 00:04:09,300 --> 00:04:11,350 the Event Source Mapping will poll SQS 107 00:04:11,350 --> 00:04:12,280 using Long Polling. 108 00:04:12,280 --> 00:04:13,690 So it's going to be efficient. 109 00:04:13,690 --> 00:04:15,160 And we can specify the batch size 110 00:04:15,160 --> 00:04:17,180 from one to 10 messages. 111 00:04:17,180 --> 00:04:18,610 And here is the configuration. 112 00:04:18,610 --> 00:04:21,023 So just the batch size and the SQS queue. 113 00:04:22,691 --> 00:04:23,524 Then there are some recommendations 114 00:04:23,524 --> 00:04:24,470 from the AWS website, 115 00:04:24,470 --> 00:04:26,760 which is to set the queue visibility timeout 116 00:04:26,760 --> 00:04:28,840 to six times the timeout of your Lambda function, 117 00:04:28,840 --> 00:04:29,940 which is configurable. 118 00:04:31,086 --> 00:04:32,130 And then if you want to use a DLQ, 119 00:04:32,130 --> 00:04:34,423 so if you want to make sure that 120 00:04:34,423 --> 00:04:36,490 if there's a problem reading or processing a message 121 00:04:36,490 --> 00:04:38,400 in SQS, it goes to a dead-letter queue, 122 00:04:38,400 --> 00:04:41,640 then you set up the DLQ on the SQS queue, 123 00:04:41,640 --> 00:04:42,473 not on Lambda. 124 00:04:42,473 --> 00:04:45,470 So we'd set up a DLQ on SQS 125 00:04:45,470 --> 00:04:47,030 but not on Lambda. Why? 126 00:04:47,030 --> 00:04:48,350 Because the DLQ for Lambda 127 00:04:48,350 --> 00:04:50,940 only works for asynchronous invocations. 128 00:04:50,940 --> 00:04:53,350 And this is a synchronous invocation. 129 00:04:53,350 --> 00:04:55,220 Or as we'll see, we can also use 130 00:04:55,220 --> 00:04:57,200 a Lambda destination for failures. 131 00:04:57,200 --> 00:04:59,660 And this is going to be seen in the next lectures.
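The two queue-side recommendations above (visibility timeout at six times the function timeout, and the DLQ configured on the SQS queue itself) can be sketched as a set of queue attributes. `VisibilityTimeout` and `RedrivePolicy` are real SQS attribute names; the function name, the default `max_receive_count` of 5 and the ARN in the usage note are assumptions for illustration.

```python
import json

SQS_MAX_VISIBILITY_SECONDS = 43200  # SQS allows at most 12 hours


def source_queue_attributes(function_timeout_seconds, dlq_arn, max_receive_count=5):
    """Build attributes for the source queue that Lambda polls."""
    visibility = 6 * function_timeout_seconds
    if visibility > SQS_MAX_VISIBILITY_SECONDS:
        raise ValueError("visibility timeout would exceed the SQS maximum")
    return {
        # SQS attribute values are passed as strings.
        "VisibilityTimeout": str(visibility),
        # The dead-letter queue is attached to the queue, not to the function.
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
        ),
    }
```

For a function with a 30 second timeout, `source_queue_attributes(30, "arn:aws:sqs:us-east-1:123456789012:orders-dlq")` yields a 180 second visibility timeout, and the result could be handed to the SQS `set_queue_attributes` call as its `Attributes` argument.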
132 00:04:59,660 --> 00:05:03,440 So now a few more bits of information on queues and Lambda, 133 00:05:03,440 --> 00:05:06,230 and I'm sorry this is quite boring but I have to say it. 134 00:05:06,230 --> 00:05:08,430 So Lambda supports in-order processing, 135 00:05:08,430 --> 00:05:10,900 if you have a FIFO queue, so First-In, First-Out. 136 00:05:10,900 --> 00:05:12,930 And the number of Lambda functions 137 00:05:12,930 --> 00:05:15,080 that will be scaling to process your queue 138 00:05:15,080 --> 00:05:17,520 will be equal to the number of active message groups. 139 00:05:17,520 --> 00:05:19,490 So this is the group ID setting. 140 00:05:19,490 --> 00:05:21,020 If you use a standard queue, 141 00:05:21,020 --> 00:05:23,453 then the items will not be processed in order. 142 00:05:24,587 --> 00:05:26,150 And for a standard queue, Lambda will scale 143 00:05:26,150 --> 00:05:28,320 as fast as possible to read all the messages 144 00:05:28,320 --> 00:05:30,180 in your standard queue. 145 00:05:30,180 --> 00:05:32,510 If there is an error while processing a batch, 146 00:05:32,510 --> 00:05:34,150 then the batches are going to be returned 147 00:05:34,150 --> 00:05:36,060 to the queue as individual items 148 00:05:36,060 --> 00:05:38,340 and might be processed in a different grouping 149 00:05:38,340 --> 00:05:40,030 than the original batch. 150 00:05:40,030 --> 00:05:41,990 Occasionally, the Event Source Mapping 151 00:05:41,990 --> 00:05:44,240 might receive the same item twice from the queue, 152 00:05:44,240 --> 00:05:46,270 even if no function error occurred. 153 00:05:46,270 --> 00:05:47,470 So you need to make sure 154 00:05:48,383 --> 00:05:49,610 to have idempotent processing 155 00:05:49,610 --> 00:05:52,120 for your Lambda function in case that happens.
156 00:05:52,120 --> 00:05:55,120 Finally, when they're processed by Lambda, 157 00:05:55,120 --> 00:05:57,440 Lambda will delete the items from the queue, 158 00:05:57,440 --> 00:05:59,450 and then they will never be seen again. 159 00:05:59,450 --> 00:06:00,570 And finally, as I said, 160 00:06:00,570 --> 00:06:02,890 you can configure the source queue to send the items 161 00:06:02,890 --> 00:06:05,720 to a dead-letter queue if they can't be processed. 162 00:06:05,720 --> 00:06:07,560 So hopefully all of this makes sense. 163 00:06:07,560 --> 00:06:09,610 We'll see in the hands-on how we can set it up. 164 00:06:09,610 --> 00:06:10,940 And so what about the scaling? 165 00:06:10,940 --> 00:06:11,780 So I already said it, 166 00:06:11,780 --> 00:06:13,930 but we can go again to summarize the scaling 167 00:06:13,930 --> 00:06:14,980 for the Event Source Mapping. 168 00:06:17,181 --> 00:06:18,810 So for Kinesis Data Streams and DynamoDB Streams, 169 00:06:18,810 --> 00:06:21,770 you get one Lambda invocation per stream shard. 170 00:06:21,770 --> 00:06:24,170 Or if you use parallelization, 171 00:06:24,170 --> 00:06:26,490 you can have up to 10 batches processed 172 00:06:26,490 --> 00:06:28,240 per shard simultaneously. 173 00:06:28,240 --> 00:06:29,870 And for SQS Standard, 174 00:06:29,870 --> 00:06:31,440 I said Lambda will scale up pretty quickly. 175 00:06:31,440 --> 00:06:32,273 So yes it does. 176 00:06:32,273 --> 00:06:35,600 It adds 60 more instances per minute to scale up. 177 00:06:35,600 --> 00:06:36,810 So it's pretty quick. 178 00:06:36,810 --> 00:06:39,420 And then the maximum number of batches 179 00:06:39,420 --> 00:06:42,570 processed simultaneously is 1000. 180 00:06:42,570 --> 00:06:44,550 For SQS FIFO, it's a bit different. 181 00:06:44,550 --> 00:06:46,020 So the messages with the same group ID 182 00:06:46,020 --> 00:06:47,960 will be processed in order no matter what.
183 00:06:47,960 --> 00:06:49,680 And the Lambda function will scale up 184 00:06:49,680 --> 00:06:51,950 to the number of active message groups. 185 00:06:51,950 --> 00:06:54,370 Again, they're defined by the group ID. 186 00:06:54,370 --> 00:06:57,560 So that's all for Lambda Event Source Mapping, 187 00:06:57,560 --> 00:07:00,640 and I know this is a pretty long, boring theory lecture. 188 00:07:00,640 --> 00:07:02,440 It will make a lot of sense in the next lecture 189 00:07:02,440 --> 00:07:03,570 when we do the hands-on, 190 00:07:03,570 --> 00:07:04,810 but we had to go over it 191 00:07:04,810 --> 00:07:06,480 and I would recommend you look again 192 00:07:06,480 --> 00:07:09,350 at this lecture before your exam 193 00:07:09,350 --> 00:07:11,110 because it's possible the exam will ask you 194 00:07:11,110 --> 00:07:14,630 some pretty pointy details about Event Source Mappings. 195 00:07:14,630 --> 00:07:17,480 All right, that's it. I will see you in the next lecture.