1
00:00:00,180 --> 00:00:02,440
‫So we have seen asynchronous processing,

2
00:00:02,440 --> 00:00:04,170
‫we have seen synchronous processing,

3
00:00:04,170 --> 00:00:06,890
‫and now we are going to see Event Source Mapping.

4
00:00:06,890 --> 00:00:08,730
‫So this is the last category of how Lambda

5
00:00:08,730 --> 00:00:11,200
‫can process events in AWS.

6
00:00:11,200 --> 00:00:15,010
‫So it applies to Kinesis Data Streams,

7
00:00:15,010 --> 00:00:19,040
‫SQS and SQS FIFO queue and DynamoDB Streams.

8
00:00:19,040 --> 00:00:22,490
‫So the common denominator of all of these things together

9
00:00:22,490 --> 00:00:25,423
‫is that records need to be polled from the source.

10
00:00:25,423 --> 00:00:28,480
‫So Lambda needs to ask the service to get some records

11
00:00:28,480 --> 00:00:30,210
‫and then the records will be returned.

12
00:00:30,210 --> 00:00:34,160
‫So that means that Lambda needs to poll from these services.

13
00:00:34,160 --> 00:00:36,260
‫So in this case the Lambda function

14
00:00:36,260 --> 00:00:37,600
‫is invoked synchronously.

15
00:00:37,600 --> 00:00:38,970
‫So let's see this,

16
00:00:38,970 --> 00:00:41,300
‫We have Kinesis and the Lambda service,

17
00:00:41,300 --> 00:00:44,570
‫and if we configure Lambda to read from Kinesis,

18
00:00:44,570 --> 00:00:46,590
‫then they will be internally

19
00:00:46,590 --> 00:00:49,150
‫an Event Source Mapping that will be created.

20
00:00:49,150 --> 00:00:52,370
‫And that is responsible for polling Kinesis,

21
00:00:52,370 --> 00:00:55,110
‫and returning and getting the results back from Kinesis.

22
00:00:55,110 --> 00:00:57,483
‫So Kinesis will return a batch to us.

23
00:00:58,326 --> 00:00:59,910
‫And then once this Event Source Mapping

24
00:00:59,910 --> 00:01:02,370
‫has some data for Lambda to process,

25
00:01:02,370 --> 00:01:05,500
‫it's going to invoke our Lambda function synchronously,

26
00:01:05,500 --> 00:01:07,290
‫with an event batch.

27
00:01:07,290 --> 00:01:09,320
‫So at its core, this is how it works.

28
00:01:09,320 --> 00:01:11,970
‫So there are two categories of Event Source Mapper.

29
00:01:11,970 --> 00:01:14,240
‫The first one is streams and the second one is queues.

30
00:01:14,240 --> 00:01:15,260
‫So let's deal with streams,

31
00:01:15,260 --> 00:01:17,600
‫and streams apply to Kinesis Data Streams

32
00:01:17,600 --> 00:01:18,880
‫and DynamoDB Streams,

33
00:01:18,880 --> 00:01:20,240
‫and we'll see what DynamoDB Streams are

34
00:01:20,240 --> 00:01:21,370
‫very, very soon.

35
00:01:21,370 --> 00:01:24,780
‫So in case of streams, they will be an Event Source Mapping.

36
00:01:24,780 --> 00:01:27,320
‫They will create an iterator for each shard,

37
00:01:27,320 --> 00:01:30,250
‫so each Kinesis shard or DynamoDB Stream shard.

38
00:01:30,250 --> 00:01:34,510
‫And the items will be processed in order at the shard level.

39
00:01:34,510 --> 00:01:37,080
‫And so we can configure where to start to read from.

40
00:01:37,080 --> 00:01:39,350
‫You can read it with just the new items,

41
00:01:39,350 --> 00:01:41,250
‫or from the beginning of the shard

42
00:01:41,250 --> 00:01:43,580
‫or from a specific timestamp.

43
00:01:43,580 --> 00:01:46,980
‫Whenever an item is processed from a shard,

44
00:01:46,980 --> 00:01:48,710
‫whether it be from Kinesis or DynamoDB,

45
00:01:48,710 --> 00:01:50,450
‫they are not removed from the streams.

46
00:01:50,450 --> 00:01:53,190
‫That means that other consumers can read the data

47
00:01:53,190 --> 00:01:54,980
‫in Kinesis or DynamoDB,

48
00:01:54,980 --> 00:01:57,260
‫which is how they work in the first place.

49
00:01:57,260 --> 00:01:59,713
‫But I just wanted to reemphasize this.

50
00:02:01,149 --> 00:02:02,330
‫So the use case for this

51
00:02:02,330 --> 00:02:04,520
‫is either low traffic or high traffic.

52
00:02:04,520 --> 00:02:06,710
‫So if you have a low traffic stream,

53
00:02:06,710 --> 00:02:09,850
‫you can use a batch window to accumulate records

54
00:02:09,850 --> 00:02:10,683
‫before processing.

55
00:02:10,683 --> 00:02:11,890
‫So to make sure that

56
00:02:11,890 --> 00:02:14,730
‫you invoke another function efficiently.

57
00:02:14,730 --> 00:02:17,500
‫And then if you have a very high throughput stream

58
00:02:17,500 --> 00:02:19,210
‫and you want to speed up processing,

59
00:02:19,210 --> 00:02:22,900
‫you can set up Lambda to process multiple batches

60
00:02:22,900 --> 00:02:24,900
‫in parallel at the shard level.

61
00:02:24,900 --> 00:02:28,050
‫So here's a diagram from AWS blog.

62
00:02:28,050 --> 00:02:31,680
‫And so we have a shard and there's a record processor and

63
00:02:31,680 --> 00:02:33,920
‫there's a way to have a parallelism

64
00:02:33,920 --> 00:02:36,880
‫to have multiple Lambda functions process the batch

65
00:02:36,880 --> 00:02:38,380
‫within the same shard.

66
00:02:38,380 --> 00:02:42,740
‫So you can have up to 10 batch processors per shard.

67
00:02:42,740 --> 00:02:46,360
‫And for each batch they will be in-order processing,

68
00:02:46,360 --> 00:02:48,263
‫at the partition key level.

69
00:02:49,316 --> 00:02:51,335
‫So if you specify a partition key,

70
00:02:51,335 --> 00:02:53,070
‫it will not be read in order entirely for the shard.

71
00:02:53,070 --> 00:02:55,660
‫But each key within the shard will be read in order.

72
00:02:55,660 --> 00:02:58,240
‫So this is how we can parallelize the processing

73
00:02:58,240 --> 00:03:00,730
‫for Lambda function and your streams.

74
00:03:00,730 --> 00:03:01,830
‫What about errors?

75
00:03:01,830 --> 00:03:04,220
‫By default, if your function returns an error,

76
00:03:04,220 --> 00:03:06,270
‫the entire batch is going to be reprocessed

77
00:03:06,270 --> 00:03:07,790
‫until the function succeeds

78
00:03:07,790 --> 00:03:09,640
‫or the items in the batch expire.

79
00:03:09,640 --> 00:03:11,010
‫So this is very important.

80
00:03:11,010 --> 00:03:14,590
‫Having an error in a batch can block your processing.

81
00:03:14,590 --> 00:03:17,420
‫So to ensure in-order processing,

82
00:03:17,420 --> 00:03:19,470
‫processing for the effected batch is paused

83
00:03:19,470 --> 00:03:21,100
‫until the error is resolved.

84
00:03:21,100 --> 00:03:24,150
‫And so you can manage that in several ways.

85
00:03:24,150 --> 00:03:26,070
‫You can configure the event source mapping

86
00:03:26,070 --> 00:03:28,990
‫to number one, discard old events.

87
00:03:28,990 --> 00:03:31,180
‫or restrict the number of retries

88
00:03:31,180 --> 00:03:33,330
‫or split the batch on errors.

89
00:03:33,330 --> 00:03:36,160
‫This case, this is around Lambda timeout issue.

90
00:03:36,160 --> 00:03:37,640
‫So if your Lambda function doesn't have enough time

91
00:03:37,640 --> 00:03:38,930
‫to process the entire batch,

92
00:03:38,930 --> 00:03:41,620
‫maybe you will have enough time to process half the batch.

93
00:03:41,620 --> 00:03:44,870
‫And then in case you want to discard old events,

94
00:03:44,870 --> 00:03:46,940
‫all the events can go to a destination.

95
00:03:46,940 --> 00:03:50,138
‫And we'll see destinations in the next lectures.

96
00:03:50,138 --> 00:03:51,120
‫So this is all about streams.

97
00:03:51,120 --> 00:03:53,290
‫Now you need to know about queues.

98
00:03:53,290 --> 00:03:56,290
‫So for the queues, it's for SQS and SQS FIFO.

99
00:03:56,290 --> 00:03:57,920
‫And so you have the same idea.

100
00:03:57,920 --> 00:03:59,410
‫The SQS queue will be polled

101
00:03:59,410 --> 00:04:01,390
‫by a Lambda Event Source Mapping.

102
00:04:01,390 --> 00:04:03,320
‫And then whenever a batch is returned,

103
00:04:03,320 --> 00:04:05,800
‫your Lambda function will be invoked synchronously

104
00:04:05,800 --> 00:04:07,193
‫with the event batch.

105
00:04:08,430 --> 00:04:09,300
‫So in the case of SQS,

106
00:04:09,300 --> 00:04:11,350
‫the Event Source Mapping will poll SQS

107
00:04:11,350 --> 00:04:12,280
‫using Long Polling.

108
00:04:12,280 --> 00:04:13,690
‫So it's going to be efficient.

109
00:04:13,690 --> 00:04:15,160
‫And we can specify the batch size

110
00:04:15,160 --> 00:04:17,180
‫from one to 10 messages.

111
00:04:17,180 --> 00:04:18,610
‫And here is the configuration.

112
00:04:18,610 --> 00:04:21,023
‫So just the batch size and the SQS queue.

113
00:04:22,691 --> 00:04:23,524
‫Then there's some recommendations

114
00:04:23,524 --> 00:04:24,470
‫from the websites of AWS,

115
00:04:24,470 --> 00:04:26,760
‫which is to set the queue visibility timeouts

116
00:04:26,760 --> 00:04:28,840
‫to six times the time out of your Lambda function,

117
00:04:28,840 --> 00:04:29,940
‫which is configurable.

118
00:04:31,086 --> 00:04:32,130
‫And then if you want to use a DLQ,

119
00:04:32,130 --> 00:04:34,423
‫so if you want to make sure that

120
00:04:34,423 --> 00:04:36,490
‫if there's a problem reading or processing a message

121
00:04:36,490 --> 00:04:38,400
‫in SQS, it goes to dead-letter queue,

122
00:04:38,400 --> 00:04:41,640
‫then you set up the DLQ on the SQS queue,

123
00:04:41,640 --> 00:04:42,473
‫not on Lambda.

124
00:04:42,473 --> 00:04:45,470
‫So we'd set up a DLQ on SQS

125
00:04:45,470 --> 00:04:47,030
‫but not on Lambda. Why?

126
00:04:47,030 --> 00:04:48,350
‫Because the DLQ for Lambda

127
00:04:48,350 --> 00:04:50,940
‫only works for asynchronous invocations.

128
00:04:50,940 --> 00:04:53,350
‫And this is a synchronous invocations.

129
00:04:53,350 --> 00:04:55,220
‫Or as we'll see, we can also use

130
00:04:55,220 --> 00:04:57,200
‫a Lambda destination for failures.

131
00:04:57,200 --> 00:04:59,660
‫And this is going to be seen in the next lectures.

132
00:04:59,660 --> 00:05:03,440
‫So now the bates of information on queues and Lambda,

133
00:05:03,440 --> 00:05:06,230
‫and I'm sorry this is quite boring but I have to say it.

134
00:05:06,230 --> 00:05:08,430
‫So Lambda supports in-order processing,

135
00:05:08,430 --> 00:05:10,900
‫if you have a FIFO queue, so First-In, First-Out.

136
00:05:10,900 --> 00:05:12,930
‫And the number of Lambda functions

137
00:05:12,930 --> 00:05:15,080
‫that will be scaling to process your queue

138
00:05:15,080 --> 00:05:17,520
‫will be equal to the number of active message groups.

139
00:05:17,520 --> 00:05:19,490
‫So this is the group ID setting.

140
00:05:19,490 --> 00:05:21,020
‫If you use a standard queue,

141
00:05:21,020 --> 00:05:23,453
‫then the items will not be processed in order.

142
00:05:24,587 --> 00:05:26,150
‫And for a standard queue, Lambda will scale

143
00:05:26,150 --> 00:05:28,320
‫as fast as possible to read all the messages

144
00:05:28,320 --> 00:05:30,180
‫in your standard queue.

145
00:05:30,180 --> 00:05:32,510
‫If there is an error happening in your queue,

146
00:05:32,510 --> 00:05:34,150
‫then the batches are going to be returned

147
00:05:34,150 --> 00:05:36,060
‫to the queue as individual items

148
00:05:36,060 --> 00:05:38,340
‫and might be processed in a different grouping

149
00:05:38,340 --> 00:05:40,030
‫than the original batch.

150
00:05:40,030 --> 00:05:41,990
‫Occasionally, the Events Source Mapping

151
00:05:41,990 --> 00:05:44,240
‫might receive the same item twice from the queue,

152
00:05:44,240 --> 00:05:46,270
‫even if no function error occurred.

153
00:05:46,270 --> 00:05:47,470
‫So you need to make sure

154
00:05:48,383 --> 00:05:49,610
‫to have item potent processing

155
00:05:49,610 --> 00:05:52,120
‫for Lambda function in case that happens.

156
00:05:52,120 --> 00:05:55,120
‫Finally, when they're processed by Lambda,

157
00:05:55,120 --> 00:05:57,440
‫Lambda will delete the items from the queue,

158
00:05:57,440 --> 00:05:59,450
‫and then they will never be seen again.

159
00:05:59,450 --> 00:06:00,570
‫And finally, as I said,

160
00:06:00,570 --> 00:06:02,890
‫you can configure the source queue to send the items

161
00:06:02,890 --> 00:06:05,720
‫to a dead-letter queue if they can't be processed.

162
00:06:05,720 --> 00:06:07,560
‫So hopefully all of this makes sense.

163
00:06:07,560 --> 00:06:09,610
‫We'll see in the hands on how we can set it up.

164
00:06:09,610 --> 00:06:10,940
‫And so what about the scaling?

165
00:06:10,940 --> 00:06:11,780
‫So I already said it,

166
00:06:11,780 --> 00:06:13,930
‫but we can go again to summarize the scaling

167
00:06:13,930 --> 00:06:14,980
‫for the Event Mapper.

168
00:06:17,181 --> 00:06:18,810
‫So for Kinesis Data Streams and DynamoDB Streams,

169
00:06:18,810 --> 00:06:21,770
‫you get one Lambda invocation per stream shard.

170
00:06:21,770 --> 00:06:24,170
‫Or if you use parallelization,

171
00:06:24,170 --> 00:06:26,490
‫you can have up to 10 batches processed

172
00:06:26,490 --> 00:06:28,240
‫per shard simultaneously.

173
00:06:28,240 --> 00:06:29,870
‫And for SQS Standard,

174
00:06:29,870 --> 00:06:31,440
‫I said Lambda will scale up pretty quickly.

175
00:06:31,440 --> 00:06:32,273
‫So yes it does.

176
00:06:32,273 --> 00:06:35,600
‫It adds 16 more instances per minute to scale up.

177
00:06:35,600 --> 00:06:36,810
‫So it's pretty quick.

178
00:06:36,810 --> 00:06:39,420
‫And then the maximum amount of batches per second

179
00:06:39,420 --> 00:06:42,570
‫processed simultaneously, is 1000.

180
00:06:42,570 --> 00:06:44,550
‫For SQS FIFO, it's a bit different.

181
00:06:44,550 --> 00:06:46,020
‫So the messaging with the same group ID

182
00:06:46,020 --> 00:06:47,960
‫will be processed in order no matter what.

183
00:06:47,960 --> 00:06:49,680
‫And the Lambda function will scale up

184
00:06:49,680 --> 00:06:51,950
‫to the number of active message groups.

185
00:06:51,950 --> 00:06:54,370
‫Again, they're defined by the group ID.

186
00:06:54,370 --> 00:06:57,560
‫So that's all for Lambda Events Mapping Sourcing,

187
00:06:57,560 --> 00:07:00,640
‫and I know this is a pretty long boring theory lecture.

188
00:07:00,640 --> 00:07:02,440
‫It will make a lot of sense in the next lecture

189
00:07:02,440 --> 00:07:03,570
‫when we do the hands on,

190
00:07:03,570 --> 00:07:04,810
‫but we had to go over it

191
00:07:04,810 --> 00:07:06,480
‫and I would recommend you look again

192
00:07:06,480 --> 00:07:09,350
‫at this lecture before your exam

193
00:07:09,350 --> 00:07:11,110
‫because it's possible the exam will ask you

194
00:07:11,110 --> 00:07:14,630
‫some pretty pointy details about Event Mappers.

195
00:07:14,630 --> 00:07:17,480
‫All right, that's it. I will see you in the next lecture.