In the exam you will see the API calls of DynamoDB referred to by their names, so it's good for us to see them once.

If you want to write data, you have a few options. You have PutItem, and when you do a PutItem, it creates a new item or fully replaces an existing item that has the same Primary Key. It consumes Write Capacity Units. So the idea is that you use PutItem when you want a full replace or to write a new item.

The second one is UpdateItem, which is a bit different from PutItem. This one edits an existing item's attributes, or adds a new item if it does not exist. The idea is that with UpdateItem, we only edit a few attributes, not all of them. So this is the difference between PutItem and UpdateItem. And we can use it with Atomic Counters, which we'll see in this section as well.

Then you have Conditional Writes, which accept a write, update, or delete only if a condition is met. This helps with concurrent access to items, and we'll see this as well in this section.

To read data, we have GetItem. GetItem is very simple and easy to understand: you read based on the Primary Key, and the Primary Key, again, can be HASH or HASH+RANGE, so you have the two options. And you get two read modes: Eventually Consistent Reads, or Strongly Consistent Reads, but you need to specify it, and it will consume more RCU and maybe add a little bit more latency. You can also specify a ProjectionExpression in your API call, and this ProjectionExpression helps you receive only some of the attributes back from DynamoDB.

Next we have Query. Query returns items based on a KeyConditionExpression. The Partition Key must use the equality operator, so you're saying, "Hey, I want to query for user John 123," and optionally you add a Sort Key condition. Because you can sort, the Sort Key condition supports equal, less than, greater than, begins_with, between, and so on.
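To make these calls concrete, here is a minimal sketch using the boto3 DynamoDB client in Python. The table name "Users" and the attribute names ("user_id", "first_name", "score", "balance") are assumptions for illustration only; put_item, update_item, and get_item are the boto3 forms of the PutItem, UpdateItem, and GetItem calls described above.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# PutItem: creates the item or fully replaces any existing item
# that has the same Primary Key.
dynamodb.put_item(
    TableName="Users",                      # assumed table name
    Item={
        "user_id": {"S": "john_123"},       # partition key (HASH)
        "first_name": {"S": "John"},
        "score": {"N": "0"},
    },
)

# UpdateItem: edits only the listed attributes (here an atomic counter
# on "score"), and creates the item if it does not exist yet.
dynamodb.update_item(
    TableName="Users",
    Key={"user_id": {"S": "john_123"}},
    UpdateExpression="ADD score :inc",
    ExpressionAttributeValues={":inc": {"N": "1"}},
)

# Conditional write: the update is accepted only if the condition is met.
dynamodb.update_item(
    TableName="Users",
    Key={"user_id": {"S": "john_123"}},
    UpdateExpression="SET balance = :zero",
    ConditionExpression="balance > :zero",
    ExpressionAttributeValues={":zero": {"N": "0"}},
)

# GetItem: read one item by Primary Key, here strongly consistent,
# with a ProjectionExpression to return only a few attributes.
resp = dynamodb.get_item(
    TableName="Users",
    Key={"user_id": {"S": "john_123"}},
    ConsistentRead=True,
    ProjectionExpression="first_name, score",
)
print(resp.get("Item"))
```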
Then you can specify a FilterExpression, which adds additional filtering after the Query operation has run, but before the data is returned to you. It is used with non-key attributes only, so you cannot use a FilterExpression on the HASH or RANGE key attributes.

What the Query returns is, obviously, a list of items. You have a limit on how many items you retrieve, based on the Limit query parameter, so either you reach that limit in number of items, or you get up to one megabyte of data. If you want more data beyond that, you can do pagination on the results and keep asking for more and more.

You can query a table, a Local Secondary Index, or a Global Secondary Index, and we'll see those in the next lectures as well.

And finally, you have Scan. So GetItem was for one item, Query was for a specific Partition Key and a Sort Key condition, and Scan reads an entire table. If you want, you can then filter the data, but the filter is only applied after the table has been read, so this is very inefficient. Scan is really there to export the entire table. Each Scan call returns up to one megabyte of data, and if you want to keep on reading you need to use pagination techniques: page one, page two, page three, and so on. It will consume a lot of RCU because you are reading your entire table. Therefore, if you don't want to impact your normal operations, you can soften the Scan using the Limit parameter, or reduce the size of the result and then pause a little bit between calls.

If instead you are willing to consume a lot of RCU and want the scan to finish as fast as possible, then for faster performance you would use a Parallel Scan. In this case, multiple workers that you define scan multiple data segments at the same time, which increases both the throughput and the RCU consumed.
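Here is a sketch of Query with a FilterExpression, pagination, and a Parallel Scan worker, still in boto3. The "GameScores" table, its "user_id"/"game_ts" key schema, and the attribute values are hypothetical; the pagination loop on LastEvaluatedKey/ExclusiveStartKey and the Segment/TotalSegments parameters are the standard boto3 mechanisms for these calls.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Query: the Partition Key must use equality; the Sort Key condition can use
# =, <, <=, >, >=, BETWEEN or begins_with. The FilterExpression only targets
# non-key attributes and is applied after the query, before results come back.
query_args = {
    "TableName": "GameScores",                           # assumed table name
    "KeyConditionExpression": "user_id = :uid AND game_ts BETWEEN :start AND :end",
    "FilterExpression": "score > :min_score",            # non-key attribute only
    "ExpressionAttributeValues": {
        ":uid": {"S": "john_123"},
        ":start": {"S": "2023-01-01"},
        ":end": {"S": "2023-12-31"},
        ":min_score": {"N": "100"},
    },
    "Limit": 50,                                          # cap items per call
}

# Pagination: each call returns up to Limit items or 1 MB of data; keep going
# with ExclusiveStartKey until LastEvaluatedKey is no longer returned.
items = []
resp = dynamodb.query(**query_args)
items.extend(resp["Items"])
while "LastEvaluatedKey" in resp:
    resp = dynamodb.query(**query_args, ExclusiveStartKey=resp["LastEvaluatedKey"])
    items.extend(resp["Items"])

# Parallel Scan: this worker (segment 0 of 4) reads its own portion of the
# table while other workers read theirs, multiplying throughput and RCU usage.
scan_resp = dynamodb.scan(
    TableName="GameScores",
    Segment=0,
    TotalSegments=4,
    Limit=100,          # a Limit can still be used to soften the impact
)
```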
And if you still want a Parallel Scan but want to limit its impact, you can combine it with the Limit parameter and similar settings, just like a regular Scan. Scans can also be used with a ProjectionExpression and a FilterExpression: the ProjectionExpression to retrieve only certain attributes, and the FilterExpression to filter the items that come back.

Now, if you need to delete data out of DynamoDB, you have DeleteItem, which deletes an individual item. You can also do a conditional delete: for example, delete this item only if money equals zero. And if you need to delete everything in your table, you have DeleteTable. This deletes a whole table and its items, and it's much quicker than doing a Scan and then deleting every single item in the table. So DeleteTable just drops everything, and this is something that can come up in the exam: if you want to delete everything, do not do a Scan, just use the DeleteTable API.

Now, for efficiency purposes, you can batch operations in DynamoDB. You save on latency and gain efficiency by reducing the number of API calls you make to the database. All the operations that are part of a batch are applied in parallel by DynamoDB for better efficiency. But because you have a batch of operations, part of the batch can fail. In that case you will receive the failed items back, and you can retry only those failed items.

For the write side you have BatchWriteItem, which allows you to perform up to 25 PutItem and/or DeleteItem operations in one call, with up to 16 megabytes of data written, and still the same limit of 400 kilobytes of data per item. Note that you cannot update items with it; you can only do PutItem or DeleteItem.
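A minimal sketch of the delete and batch-write calls, again with assumed table and attribute names ("Users", "GameScores", "money"); delete_item, delete_table, and batch_write_item are the boto3 forms of DeleteItem, DeleteTable, and BatchWriteItem, and the batch accepts only Put and Delete requests.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Conditional delete: remove the item only if "money" equals zero.
dynamodb.delete_item(
    TableName="Users",                                   # assumed table name
    Key={"user_id": {"S": "john_123"}},
    ConditionExpression="money = :zero",
    ExpressionAttributeValues={":zero": {"N": "0"}},
)

# DeleteTable: drops the whole table and all of its items at once,
# much faster than scanning and deleting item by item.
dynamodb.delete_table(TableName="Users")

# BatchWriteItem: up to 25 PutRequest/DeleteRequest entries per call
# (no updates), up to 16 MB per batch, 400 KB per item.
resp = dynamodb.batch_write_item(
    RequestItems={
        "GameScores": [                                  # assumed table name
            {"PutRequest": {"Item": {
                "user_id": {"S": "u1"},
                "game_ts": {"S": "2023-01-01"},
                "score": {"N": "10"},
            }}},
            {"DeleteRequest": {"Key": {
                "user_id": {"S": "u2"},
                "game_ts": {"S": "2023-01-02"},
            }}},
        ]
    }
)
```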
Now, if you have items that could not be written, for whatever reason, usually because of a lack of write capacity, then you will receive back something called UnprocessedItems, and you can retry the items within UnprocessedItems. There are two options to process them correctly: either you use an exponential backoff strategy, retrying with longer and longer delays until it succeeds, or, if you consistently get these UnprocessedItems and have scaling issues, then of course you need to add Write Capacity Units to allow your batch operations to complete efficiently.

For BatchGetItem, you retrieve items from one or more tables, and you can receive up to 100 items and up to 16 megabytes of data. All these items are retrieved in parallel to minimize latency. Again, if you are missing some items, it's because you may have some UnprocessedKeys, due to failed read operations when you don't have enough read capacity. In which case, same idea: you use exponential backoff to retry, or you add Read Capacity Units to increase your read capacity.

Also, we have PartiQL. So we've seen that in DynamoDB we have specific API calls to do specific things, but sometimes all that you know, as a data engineer or as a developer, may be SQL. And so you can use SQL on DynamoDB by using PartiQL. Here you have a standard SQL query where you select the OrderID and the Total from your Orders table, with a filtering condition and an ordering condition. So, using PartiQL, you can do the exact same operations we saw before, which is to select, insert, update, and delete data in DynamoDB. But this time, instead of using the DynamoDB-specific APIs, you can just use SQL. And you can run your queries across multiple DynamoDB tables, but you cannot do joins, okay?
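Here is a sketch of how that retry logic might look in boto3, under the same assumed table names. The UnprocessedItems and UnprocessedKeys fields are what BatchWriteItem and BatchGetItem actually return; the backoff parameters and the helper function name are arbitrary choices for illustration.

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")

def batch_write_with_backoff(request_items, max_retries=5):
    """Retry UnprocessedItems with exponential backoff until they succeed."""
    resp = dynamodb.batch_write_item(RequestItems=request_items)
    unprocessed = resp.get("UnprocessedItems", {})
    attempt = 0
    while unprocessed and attempt < max_retries:
        time.sleep(2 ** attempt)                 # exponential backoff: 1s, 2s, 4s, ...
        resp = dynamodb.batch_write_item(RequestItems=unprocessed)
        unprocessed = resp.get("UnprocessedItems", {})
        attempt += 1
    return unprocessed                           # anything left over likely needs more WCU

# BatchGetItem: up to 100 items / 16 MB from one or more tables, fetched in parallel.
resp = dynamodb.batch_get_item(
    RequestItems={
        "Users": {"Keys": [{"user_id": {"S": "john_123"}}]},
        "GameScores": {"Keys": [{"user_id": {"S": "u1"}, "game_ts": {"S": "2023-01-01"}}]},
    }
)
items_by_table = resp["Responses"]
missing = resp.get("UnprocessedKeys", {})        # retry these with backoff, or add RCU
```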
You can just do select, insert, update, and delete: everything you can do with the API, but you use SQL to write these calls. You can run PartiQL queries from the Management Console, from the NoSQL Workbench for DynamoDB, from the DynamoDB APIs, from the CLI, or from the SDK. And the goal of it is really not to add new capabilities to DynamoDB, because you have the exact same capabilities; it's just to use SQL to write these API calls against DynamoDB.

So hopefully that makes sense. I hope you liked it, and I will see you in the next lecture.
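For reference, a sketch of the kind of PartiQL statement shown on the slide, run through the boto3 execute_statement call (the ExecuteStatement API). The Orders table and its OrderID/Total attributes mirror the slide example and are assumptions.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# ExecuteStatement runs a PartiQL statement; it maps onto the same underlying
# read/write operations and does not add new capabilities to DynamoDB.
resp = dynamodb.execute_statement(
    Statement=(
        "SELECT OrderID, Total FROM Orders "
        "WHERE OrderID IN [1, 2, 3] ORDER BY OrderID DESC"
    )
)
for item in resp["Items"]:
    print(item)

# Inserts, updates, and deletes can be written in SQL as well:
dynamodb.execute_statement(
    Statement="INSERT INTO Orders VALUE {'OrderID': 4, 'Total': 100}"
)
```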