So the first service you need to know about is Kinesis Data Streams. Kinesis Data Streams is a way for you to stream big data in your systems. A Kinesis data stream is made of multiple shards, and shards are numbered: number one, number two, all the way to number N. This is something you have to provision ahead of time. So when you start with Kinesis Data Streams, you're saying, hey, I want a stream with six shards. The data is going to be split across all the shards, and the shards define your stream capacity in terms of ingestion and consumption rates. For now, let's just start with this.

Then we have producers. Producers send data into Kinesis Data Streams, and producers can take many forms.
They could be applications; they could be clients, such as desktop or mobile clients; they could be leveraging the AWS SDK at a very low level, or the Kinesis Producer Library (KPL) at a higher level, and we'll have a deeper dive into producers in the next lectures; or it could be the Kinesis Agent running on a server to stream, for example, application logs into Kinesis Data Streams. All these producers do the exact same thing: they rely on the SDK at a very low level, and they produce records into our Kinesis data stream.

A record, fundamentally, is made of two things: a partition key, and a data blob, the value itself, which is up to one megabyte. The partition key helps determine which shard the record will go to, and the data blob is the value itself. When producers send data to Kinesis Data Streams, they can send data at a rate of one megabyte per second, or a thousand messages per second, per shard. So if you have six shards, you get six megabytes per second, or 6,000 messages per second, overall, okay?
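The record structure just described can be sketched in a few lines. This is only an illustration: the stream name "my-stream" is hypothetical, and instead of actually calling the AWS SDK (with boto3 this would be `kinesis.put_record(...)`), we just build and validate the request payload so the example is self-contained.

```python
# A record is a partition key plus a data blob of up to 1 MB.
# "my-stream" is a hypothetical stream name used for illustration.

MAX_BLOB_BYTES = 1024 * 1024  # the data blob is limited to 1 MB per record

def build_record(partition_key: str, data: bytes) -> dict:
    """Build a PutRecord-style payload: a partition key plus a data blob."""
    if len(data) > MAX_BLOB_BYTES:
        raise ValueError("data blob exceeds the 1 MB per-record limit")
    return {
        "StreamName": "my-stream",  # hypothetical stream name
        "PartitionKey": partition_key,
        "Data": data,
    }

record = build_record("user-123", b'{"event": "click"}')
print(record["PartitionKey"])  # user-123
```

In a real producer, this payload is what the SDK sends to Kinesis; the KPL and the Kinesis Agent build the same kind of records for you at a higher level.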
Now, once the data is in Kinesis Data Streams, it can be consumed by many consumers, and these consumers, again, can take many forms, and we'll explore them in detail in this section. We have applications, and they could be relying on the SDK, or at a higher level, the Kinesis Client Library (KCL). They could be Lambda functions, if you want to do serverless processing on top of Kinesis Data Streams. It could be Kinesis Data Firehose, as we'll see in this section, or Kinesis Data Analytics.

When a consumer receives a record, it receives, again, the partition key, plus a sequence number, which represents where the record was in the shard, as well as the data blob, so the data itself. Now, we have different consumption modes for Kinesis Data Streams. We have two megabytes per second of throughput per shard, shared across all the consumers, okay? Or you get two megabytes per second, per shard, per consumer, if you enable the enhanced consumer mode, called enhanced fan-out. We will look at this again in this section in greater detail. So again, producers send data to Kinesis Data Streams.
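The two consumption modes above can be captured with some back-of-the-envelope arithmetic, using the per-shard limits from the lecture (2 MB/s out per shard, shared in classic mode, per consumer with enhanced fan-out). The numbers are illustrative, not a capacity-planning tool.

```python
# Per-shard throughput math: 1 MB/s in; 2 MB/s out, either shared
# across all consumers (classic) or per consumer (enhanced fan-out).

def ingest_mb_per_s(shards: int) -> int:
    return shards * 1  # 1 MB/s ingestion per shard

def consume_mb_per_s(shards: int, consumers: int, enhanced_fan_out: bool) -> float:
    if enhanced_fan_out:
        return shards * 2.0  # each consumer gets its own 2 MB/s per shard
    return shards * 2.0 / consumers  # 2 MB/s per shard shared by all consumers

print(ingest_mb_per_s(6))              # 6 MB/s in for six shards
print(consume_mb_per_s(6, 3, False))   # 4.0 MB/s per consumer, shared
print(consume_mb_per_s(6, 3, True))    # 12.0 MB/s per consumer, fan-out
```

So with six shards and three classic consumers, each consumer effectively reads at 4 MB/s; with enhanced fan-out, each one gets the full 12 MB/s.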
It stays in there for a while, and then it is read by many different consumers.

Okay, some properties of Kinesis Data Streams. The first one is that retention can be set between 1 day and 365 days. That means you have the ability to reprocess or replay data. And once data is inserted into Kinesis, it cannot be deleted; that's called immutability. Also, when you send messages to Kinesis Data Streams, you add a partition key, and messages that share the same partition key will go to the same shard, and that gives you key-based ordering. For producers, you can send data using the SDK, the Kinesis Producer Library (KPL), or the Kinesis Agent. And for consumers, you can write your own using the Kinesis Client Library (KCL) or the SDK, or you can use a managed consumer on AWS, such as AWS Lambda, Kinesis Data Firehose, or Kinesis Data Analytics.

Now, for capacity modes, you have two options for Kinesis Data Streams. The first one, the historical capacity mode, is called provisioned mode.
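The key-based ordering just mentioned comes from how Kinesis routes records: it hashes the partition key with MD5 into a 128-bit number and sends the record to the shard whose hash-key range contains it. The toy version below assumes shards split the hash space evenly, which a resharded stream may not; it's a sketch of the idea, not the actual service logic.

```python
# Sketch of partition-key routing: MD5(partition key) -> 128-bit number
# -> shard whose hash-key range contains it. Assumes evenly split ranges.

import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    hash_key = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    range_size = 2**128 // num_shards
    return min(hash_key // range_size, num_shards - 1)

# The same key always lands on the same shard, which is what preserves
# ordering per key; different keys spread across the shards.
assert shard_for_key("user-123", 6) == shard_for_key("user-123", 6)
print(shard_for_key("user-123", 6))
```

This is also why a hot partition key can overload a single shard: every record with that key goes to the same place.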
So you choose a number of shards to provision, and then you can scale them manually or using an API. Each shard in Kinesis Data Streams is going to get one megabyte per second in, or 1,000 records per second. And then for the outbound throughput, each shard gets two megabytes per second, and this is applicable to classic or enhanced fan-out consumers. You also pay per shard provisioned, per hour. So you need to plan a lot in advance, and that's why it's called provisioned mode.

The second mode is a newer mode called on-demand mode. With it, you don't need to provision or manage the capacity. That means the capacity will be adjusted over time, on demand. You get a default capacity provisioned, which is four megabytes per second, or 4,000 records per second, and then there will be automatic scaling based on the observed throughput peak during the last 30 days. In this mode, you're still going to pay per stream per hour, and per data in/out per gigabyte. So it's a different pricing model.
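For provisioned mode, the shard count you need follows directly from the per-shard limits above: 1 MB/s and 1,000 records/s in, 2 MB/s out per shard. A small helper makes the planning concrete; the input figures in the usage line are made up for illustration.

```python
# Provisioned-mode sizing: take the max of the shard counts implied by
# each per-shard limit (1 MB/s in, 1,000 records/s in, 2 MB/s out).

import math

def shards_needed(in_mb_s: float, in_records_s: float, out_mb_s: float) -> int:
    return max(
        math.ceil(in_mb_s / 1.0),         # ingestion bandwidth limit
        math.ceil(in_records_s / 1000),   # ingestion record-rate limit
        math.ceil(out_mb_s / 2.0),        # classic consumption limit
    )

# e.g. 5 MB/s in, 12,000 records/s in, 8 MB/s out (hypothetical workload)
print(shards_needed(in_mb_s=5, in_records_s=12000, out_mb_s=8))  # 12
```

Here the record rate is the binding constraint, so you'd provision 12 shards even though the bandwidth alone would only need 5.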
So if you don't know your capacity in advance, go for on-demand mode, but if you can plan your capacity, you should go for provisioned mode.

In terms of security for Kinesis Data Streams, it is deployed within a region, and so you have your shards. You can control access to produce and read from the shards using IAM policies. There is encryption in flight using HTTPS, and encryption at rest using KMS. You can also implement your own encryption and decryption of data on the client side, which is called client-side encryption. It is harder to implement, because you need to encrypt and decrypt the data yourself, but it enhances security. VPC endpoints are available for Kinesis, which allow you to access Kinesis directly over HTTPS from, for instance, a private subnet, without going through the internet. And finally, all the API calls can be monitored using CloudTrail.

So that's it for an overview of Kinesis Data Streams. I hope you liked it, and I will see you in the next lecture for a deeper dive into all the moving parts of Kinesis Data Streams.