1 00:00:00,150 --> 00:00:01,170 ‫Few months ago. 2 00:00:01,170 --> 00:00:11,160 ‫Shopify's engineering have posted this interesting blog and brilliant in fact, talking about Top ten 3 00:00:11,160 --> 00:00:15,450 ‫tips for Building Resilient Payment System. 4 00:00:15,450 --> 00:00:22,170 ‫And I'll reference the blog for you guys and the description and the show notes. 5 00:00:22,470 --> 00:00:30,180 ‫But each of these ten tips is absolutely well crafted just for their use cases. 6 00:00:30,180 --> 00:00:41,550 ‫And in this particular episode, I'd like to focus on just one of these tips, because, frankly speaking, 7 00:00:41,550 --> 00:00:45,000 ‫each one of these steps is its own content. 8 00:00:45,030 --> 00:00:52,380 ‫It is on article no, and there is not much details, but boy, you can extract so much if you understand 9 00:00:52,380 --> 00:00:53,300 ‫the fundamentals. 10 00:00:53,310 --> 00:01:02,160 ‫So for this particular show, I'll focus on the database engineering aspects, specifically step number 11 00:01:02,160 --> 00:01:08,520 ‫six, which is to use item potency keys. 12 00:01:08,940 --> 00:01:09,300 ‫Right. 13 00:01:09,300 --> 00:01:11,250 ‫And how they. 14 00:01:12,390 --> 00:01:21,720 ‫Use a unique key that optimizes their inserts and select and queries for ident potency keys. 15 00:01:21,750 --> 00:01:22,870 ‫How about we jump into it? 16 00:01:22,890 --> 00:01:23,220 ‫All right. 17 00:01:23,220 --> 00:01:28,320 ‫So I talked about what I dub Potency is in another video. 18 00:01:28,320 --> 00:01:37,230 ‫Basically in a nutshell, either important request or other important backend is when you send a request 19 00:01:37,350 --> 00:01:43,860 ‫and this request is repeatable such that it doesn't change the state on the back end. 20 00:01:44,340 --> 00:01:45,720 ‫An example is a gift request. 21 00:01:45,720 --> 00:01:54,510 ‫A gift request by definition must be out important because if I do a read on a specific endpoint, if 22 00:01:54,510 --> 00:01:57,360 ‫I send that read twice, it doesn't matter. 23 00:01:57,360 --> 00:01:58,710 ‫Nothing changes on the backend. 24 00:01:58,710 --> 00:01:59,880 ‫Nothing should change. 25 00:01:59,880 --> 00:02:00,320 ‫Right? 26 00:02:01,230 --> 00:02:06,740 ‫Post, on the other hand, by definition is always not important, right? 27 00:02:06,780 --> 00:02:08,190 ‫Unless you make it to be. 28 00:02:08,190 --> 00:02:17,430 ‫If you post, if you insert a row repeating that insert is basically will change the state. 29 00:02:17,430 --> 00:02:18,210 ‫You don't want that. 30 00:02:18,210 --> 00:02:20,280 ‫It's not a desired behavior. 31 00:02:20,670 --> 00:02:27,420 ‫So if your endpoint says, okay, slash post and that creates a new entry, for example, let me fix 32 00:02:27,420 --> 00:02:27,920 ‫my mike. 33 00:02:27,930 --> 00:02:37,140 ‫And of course Shopify being a payment system, you want the ability to retry a payment without actually 34 00:02:37,140 --> 00:02:38,760 ‫causing a double spend. 35 00:02:38,760 --> 00:02:41,040 ‫You don't want to pay for something twice. 36 00:02:41,040 --> 00:02:42,900 ‫That's never fun, right? 37 00:02:42,900 --> 00:02:51,960 ‫And the opposite side for merchants you don't want if that happens, you don't want an accidental twice 38 00:02:51,960 --> 00:02:52,920 ‫or a refund. 39 00:02:52,920 --> 00:02:53,220 ‫Right. 40 00:02:53,370 --> 00:02:56,610 ‫So that's why I don't want is here as a critical concept. 41 00:02:56,610 --> 00:03:01,320 ‫I'm going to reference the video for you guys if you're interested to learn more about that item. 42 00:03:01,320 --> 00:03:03,420 ‫Potency is a very critical concept. 43 00:03:03,840 --> 00:03:05,670 ‫You have to build it yourself. 44 00:03:05,670 --> 00:03:10,860 ‫You have to configure your back in to be it important doesn't it's not for free. 45 00:03:11,340 --> 00:03:15,990 ‫That's why something called an absurd is a thing, right where you insert. 46 00:03:15,990 --> 00:03:19,920 ‫But if this exist, it becomes an update. 47 00:03:19,920 --> 00:03:22,530 ‫An absurd is an important concept. 48 00:03:23,640 --> 00:03:26,790 ‫So let's go ahead and read this blurb. 49 00:03:27,120 --> 00:03:33,930 ‫And in this particular blurb they talk about the importance of item potency, blah, blah, blah. 50 00:03:33,930 --> 00:03:34,950 ‫We know, right? 51 00:03:34,950 --> 00:03:42,450 ‫But here's the important thing that I am going to spend the most of the show about just talking about 52 00:03:42,450 --> 00:03:45,630 ‫that particular thing because. 53 00:03:47,380 --> 00:03:55,990 ‫It fascinated me the brilliance of Shopify engineering when it comes to database level tuning and data 54 00:03:55,990 --> 00:03:58,420 ‫modeling, which is very underrated. 55 00:03:59,140 --> 00:03:59,380 ‫All right. 56 00:03:59,650 --> 00:04:00,910 ‫Let's go ahead and read this. 57 00:04:01,810 --> 00:04:05,350 ‫An item potency key needs to be unique. 58 00:04:06,330 --> 00:04:08,020 ‫Well, that's that's important, right? 59 00:04:08,040 --> 00:04:15,540 ‫Because in this particular case, when they send a request, they add a key to uniquely identify payment 60 00:04:15,540 --> 00:04:16,200 ‫requests. 61 00:04:16,560 --> 00:04:19,880 ‫That's how you identify a payment or a request, Right. 62 00:04:19,920 --> 00:04:25,950 ‫If someone retried the same request with the same key that, you know, this is an actual retry. 63 00:04:25,980 --> 00:04:32,730 ‫That's whether happening from the user or from a proxy or for a reverse proxy or from API gateway, 64 00:04:32,730 --> 00:04:35,550 ‫some any middle layer that does that retried. 65 00:04:35,580 --> 00:04:36,540 ‫It doesn't matter. 66 00:04:36,540 --> 00:04:43,830 ‫We know that a retrial has happened right sometimes the user go back and then forward and then hit refresh 67 00:04:43,830 --> 00:04:46,170 ‫and then you get this message or do you want to resend it again? 68 00:04:46,170 --> 00:04:47,100 ‫You say yes. 69 00:04:47,100 --> 00:04:54,150 ‫And that sends the same technical request ID unless you went all the way and generated a brand new request 70 00:04:54,150 --> 00:04:57,540 ‫ID and physically wanted to pay. 71 00:04:57,540 --> 00:04:59,100 ‫Again, that's a different story. 72 00:04:59,100 --> 00:05:08,610 ‫But most of the the item potency key requests are sent within within a few seconds. 73 00:05:08,880 --> 00:05:09,340 ‫Right. 74 00:05:09,810 --> 00:05:12,340 ‫But what they do here is another point. 75 00:05:12,360 --> 00:05:15,600 ‫The key needs to be unique for the time. 76 00:05:15,600 --> 00:05:21,330 ‫We want the request to be retrievable, and that's a very critical use case for them. 77 00:05:21,330 --> 00:05:24,690 ‫They don't want the request to be retrievable infinitely. 78 00:05:24,690 --> 00:05:24,990 ‫Right. 79 00:05:24,990 --> 00:05:32,550 ‫If you send a request in 2018, a payment request in 2018, it's not going to live until 2022. 80 00:05:32,580 --> 00:05:33,870 ‫That doesn't make any sense. 81 00:05:33,900 --> 00:05:39,210 ‫It should live within, they estimate, a payment request to live within a 24 hour. 82 00:05:39,210 --> 00:05:41,690 ‫If you never made a payment within a 24 hour. 83 00:05:41,700 --> 00:05:47,040 ‫It was a failed, for example, on the back end for any reason. 84 00:05:47,160 --> 00:05:50,610 ‫We can try to retry it within this amount. 85 00:05:50,610 --> 00:05:54,810 ‫But if after that it says, hey, you know what, all bets are off, just do it again, right? 86 00:05:54,870 --> 00:05:57,330 ‫We will email you and say, hey, we could not retry that. 87 00:05:57,330 --> 00:06:04,950 ‫So typically 24 hours or less, that's say that's something you add as a as a designer, an architect. 88 00:06:05,310 --> 00:06:07,530 ‫We prefer using. 89 00:06:07,980 --> 00:06:09,480 ‫And here's the interesting part. 90 00:06:10,260 --> 00:06:19,410 ‫We prefer using an end universally that sounds like a typo, a universally unique lexis graphically 91 00:06:19,410 --> 00:06:23,280 ‫sortable identifier, or this thing that's called the U. 92 00:06:23,280 --> 00:06:24,510 ‫L IDs. 93 00:06:24,510 --> 00:06:25,050 ‫Right? 94 00:06:25,350 --> 00:06:33,720 ‫So this is called ul ID, right for these item potency keys instead of a random version for UID. 95 00:06:33,720 --> 00:06:40,320 ‫So if you don't know universal unique identifiers is or sometimes in Microsoft, we call them guedes 96 00:06:40,320 --> 00:06:46,080 ‫or globally unique identifier is a certain number of bits for gas. 97 00:06:46,080 --> 00:06:49,680 ‫Then 128 bits random mistaken 128 bits. 98 00:06:49,680 --> 00:06:55,980 ‫And these ones are guaranteed to be unique if you can generate them on the device. 99 00:06:55,980 --> 00:07:02,670 ‫And then you are 99% sure that's going to be unique, which is powerful concept. 100 00:07:02,670 --> 00:07:04,500 ‫Why do you want to use those? 101 00:07:05,520 --> 00:07:13,320 ‫You want to use those because you want the client to generate a unique ID as opposed of a database or 102 00:07:13,320 --> 00:07:18,390 ‫a back end to generate a sequential unique identifier. 103 00:07:18,480 --> 00:07:25,710 ‫Because you see, sequential identifiers are very powerful because sequence is beautiful in databases, 104 00:07:25,720 --> 00:07:28,560 ‫database is liked or like order things. 105 00:07:28,560 --> 00:07:33,540 ‫It likes things that are ordered because they can put them on the same page and you can query them and 106 00:07:33,540 --> 00:07:40,350 ‫you can they be tucked in nicely to each other and problems generating sequences of expensive because 107 00:07:40,350 --> 00:07:44,290 ‫you have to talk to the database to give you a unique sequence, right? 108 00:07:44,310 --> 00:07:52,200 ‫So there is a central point to generating like almost a central point of failure where we ask someone 109 00:07:52,200 --> 00:07:58,140 ‫to give us a unique ID versus the client just generates it and we know it's unique. 110 00:07:58,140 --> 00:08:01,440 ‫So that's why you UID is are very powerful. 111 00:08:01,440 --> 00:08:08,250 ‫The problem with the UID is or they are random or why does that why is that a problem? 112 00:08:08,250 --> 00:08:11,970 ‫Let's continue reading and continue and explain that a little bit more. 113 00:08:12,300 --> 00:08:19,980 ‫You will ID's contain a 48 bit stamp time stamps followed by 80 bit. 114 00:08:21,150 --> 00:08:21,860 ‫So I was right. 115 00:08:21,870 --> 00:08:23,520 ‫128 is that right? 116 00:08:23,700 --> 00:08:24,120 ‫Yeah. 117 00:08:24,120 --> 00:08:26,220 ‫128 if I can do math. 118 00:08:26,430 --> 00:08:27,180 ‫So. 119 00:08:28,410 --> 00:08:30,240 ‫Followed by 80 bits of random data. 120 00:08:30,240 --> 00:08:31,890 ‫So you'll ID's. 121 00:08:31,890 --> 00:08:35,580 ‫Ellie ID has some sort of an order to them. 122 00:08:35,580 --> 00:08:40,500 ‫So the first 48 bit has a time stamp and this time stamp. 123 00:08:41,510 --> 00:08:43,180 ‫Well, I. 124 00:08:44,340 --> 00:08:44,610 ‫Uh. 125 00:08:46,090 --> 00:08:51,450 ‫Will inject some sort of an order to these random UI IDs. 126 00:08:51,460 --> 00:08:53,230 ‫What is the benefit of these? 127 00:08:53,860 --> 00:09:01,270 ‫The time stamp allow u l IDs to be sorted, unlike random UI IDs which are not sorted. 128 00:09:01,390 --> 00:09:10,660 ‫Which works much better with ab3 data structure databases user for indexes in one high throughput system. 129 00:09:10,660 --> 00:09:19,630 ‫As Shopify, we've seen a 50% decrease in insert statement duration by switching from a UUID version 130 00:09:19,630 --> 00:09:23,600 ‫four to a u UL IDs for item keys. 131 00:09:23,680 --> 00:09:25,420 ‫And that's all what they say. 132 00:09:25,510 --> 00:09:26,980 ‫They don't tell you how. 133 00:09:26,980 --> 00:09:28,330 ‫They don't tell you why. 134 00:09:28,330 --> 00:09:35,800 ‫But I'm here to actually explain why this is the case and why it is faster because everything once you 135 00:09:35,800 --> 00:09:43,630 ‫understand how databases work and how the fundamentals of first principle of databases, this is just 136 00:09:43,630 --> 00:09:45,910 ‫like reading one plus one equal to. 137 00:09:46,150 --> 00:09:47,290 ‫So let's explain that. 138 00:09:48,050 --> 00:09:48,280 ‫Yeah. 139 00:09:48,920 --> 00:09:55,070 ‫So you see, you, you they're databases here. 140 00:09:55,340 --> 00:09:57,860 ‫They don't spell it out, but it's my SQL. 141 00:09:58,610 --> 00:09:59,870 ‫My SQL. 142 00:10:02,390 --> 00:10:11,420 ‫Primary keys are called a clustered index, which means that if you pick a primary key, that is, for 143 00:10:11,420 --> 00:10:13,430 ‫example, an integer. 144 00:10:15,290 --> 00:10:20,420 ‫A clustered primary key index is the table itself. 145 00:10:20,720 --> 00:10:21,650 ‫So what does that mean? 146 00:10:21,650 --> 00:10:27,260 ‫If you're integer as the primary key, then the index structure. 147 00:10:27,530 --> 00:10:37,280 ‫At the end there is the leaf pages where basically the pointers of where this integer points to is the 148 00:10:37,280 --> 00:10:40,910 ‫actual pages of data. 149 00:10:41,390 --> 00:10:49,370 ‫So if you have a row, one row to row three or four or five or six or seven, eight, nine, ten, these 150 00:10:49,730 --> 00:10:53,650 ‫are tucked in nicely together in a single page. 151 00:10:53,660 --> 00:10:59,600 ‫And not only there are the values, but every single column in the table is in the leaf page. 152 00:10:59,600 --> 00:11:09,500 ‫So if you search for the value of seven, right, you will find seven and you will find the row seven. 153 00:11:09,500 --> 00:11:12,710 ‫You find that all the columns that belongs to row seven. 154 00:11:13,220 --> 00:11:15,590 ‫That's how clustered indexes work. 155 00:11:16,450 --> 00:11:16,810 ‫Mm hmm. 156 00:11:16,840 --> 00:11:22,730 ‫And not only you find row seven, you get a four hour, find row eight, row nine, ten. 157 00:11:22,750 --> 00:11:25,780 ‫If 11 doesn't exist, you can find 12 next to it. 158 00:11:25,780 --> 00:11:33,420 ‫And not only just the values, 12 at ten and 12 and 13, you're going to find all the columns. 159 00:11:33,430 --> 00:11:39,310 ‫And because of this order, if you look up four rows seven, you will traverse the B three. 160 00:11:39,310 --> 00:11:43,630 ‫You find row seven and you're going to find any rows that is next to it. 161 00:11:43,630 --> 00:11:45,670 ‫So if you're doing range scans. 162 00:11:46,560 --> 00:11:50,070 ‫It is really beautiful because it's like how? 163 00:11:50,070 --> 00:11:53,160 ‫Give me all the rows between seven and 12. 164 00:11:53,190 --> 00:11:58,050 ‫Oh, that's that's a cheap query for B-plus trees. 165 00:11:58,770 --> 00:12:07,710 ‫And because we all searching on an indexed clustered index, not only we find it's almost like an index 166 00:12:07,710 --> 00:12:10,800 ‫only scan, it is an index only scan in this particular case. 167 00:12:11,130 --> 00:12:11,520 ‫All right. 168 00:12:11,520 --> 00:12:12,030 ‫That's nice. 169 00:12:12,030 --> 00:12:13,590 ‫We're reading an integer value. 170 00:12:14,070 --> 00:12:14,520 ‫Right? 171 00:12:14,520 --> 00:12:19,250 ‫And integer values are ordered in this particular clustered index. 172 00:12:19,250 --> 00:12:19,710 ‫So nice. 173 00:12:19,710 --> 00:12:22,320 ‫I get all the next values next to it. 174 00:12:22,320 --> 00:12:27,450 ‫But what if I'm inserting if you're inserting rows one and then. 175 00:12:28,600 --> 00:12:30,420 ‫Ten and then 1000. 176 00:12:30,430 --> 00:12:36,400 ‫They are not ordered and you're inserting 20,000 and then you turn around and insert. 177 00:12:37,220 --> 00:12:37,870 ‫Three. 178 00:12:38,840 --> 00:12:48,200 ‫The database must insert the row three in the same page that has the one right because it needs to order 179 00:12:48,200 --> 00:12:48,740 ‫them. 180 00:12:48,740 --> 00:12:54,380 ‫And not only it needs to order the index, it needs to order the whole row, right? 181 00:12:54,380 --> 00:12:59,600 ‫So the pages must be ordered and the. 182 00:12:59,930 --> 00:13:02,840 ‫That's the, that's the both the. 183 00:13:03,810 --> 00:13:06,150 ‫Advantages, disadvantages of a clustered index. 184 00:13:06,150 --> 00:13:13,560 ‫So now if we take if we move this into the primary key concept with a UUID, which is a random one, 185 00:13:13,560 --> 00:13:18,900 ‫you generate a UID, which is random, it's the table is empty, you get an insert in the first page, 186 00:13:18,900 --> 00:13:21,030 ‫create a brand new page, and you insert it right. 187 00:13:21,330 --> 00:13:25,230 ‫And then the second UID is also random. 188 00:13:25,680 --> 00:13:27,570 ‫Well, we don't have anything else. 189 00:13:27,570 --> 00:13:30,540 ‫We just have this one and we order it. 190 00:13:30,540 --> 00:13:33,510 ‫And this one happened to be right after it. 191 00:13:33,510 --> 00:13:36,660 ‫And then you keep inserting and inserting random, random goods. 192 00:13:36,660 --> 00:13:37,170 ‫Right. 193 00:13:37,590 --> 00:13:45,330 ‫And shifting the results as you find out as the table starts to grow, as you start inserting these 194 00:13:45,330 --> 00:13:49,290 ‫random codes, you will find yourself pointing to random pages. 195 00:13:49,290 --> 00:13:49,920 ‫Because guess what? 196 00:13:50,250 --> 00:13:54,210 ‫There is no order of the way you're inserting these things. 197 00:13:54,210 --> 00:14:01,710 ‫They are not ordered at all, so you need to find where this random goods should live based on their 198 00:14:01,710 --> 00:14:06,870 ‫order and it's exactly identical to inserting random integer values. 199 00:14:06,870 --> 00:14:07,680 ‫Identical. 200 00:14:07,680 --> 00:14:14,790 ‫So if you insert value one and then value thousand and then value of 3 million and then 7 million, 201 00:14:14,790 --> 00:14:17,460 ‫and then you turn around, insert two. 202 00:14:17,610 --> 00:14:22,470 ‫You need to find the page where two lives exactly next to the one. 203 00:14:22,470 --> 00:14:23,010 ‫Right. 204 00:14:23,160 --> 00:14:27,630 ‫So random insertion will just. 205 00:14:28,360 --> 00:14:31,260 ‫Be will cause random iOS. 206 00:14:31,270 --> 00:14:38,680 ‫So we will do an IO, fetch that page, get an memory, which is called also the buffer pool in my SQL 207 00:14:38,680 --> 00:14:45,460 ‫and then put it nicely in memory and let's just hope that this page will receive another write which 208 00:14:45,460 --> 00:14:50,560 ‫will never write, which will almost never receive another write because everything is random, right? 209 00:14:50,860 --> 00:14:56,680 ‫So you're going to get this page, you insert, insert that random word, and then you insert another 210 00:14:56,680 --> 00:14:58,390 ‫random UUID. 211 00:14:58,390 --> 00:14:59,440 ‫And guess what? 212 00:14:59,440 --> 00:15:00,250 ‫It's not on the same page. 213 00:15:00,250 --> 00:15:02,620 ‫You have to fetch the page that needs to live in. 214 00:15:02,620 --> 00:15:04,990 ‫You fetch another page, you write it and put it in memory. 215 00:15:04,990 --> 00:15:10,390 ‫So now we have two pages in memory and then another random, I guess it's not in these pages, it's 216 00:15:10,390 --> 00:15:11,230 ‫another page. 217 00:15:11,230 --> 00:15:19,660 ‫So you end up filling the buffer pool, which is the memory with pages that almost receive just one 218 00:15:19,660 --> 00:15:21,070 ‫or two writes. 219 00:15:21,280 --> 00:15:28,870 ‫That's bad because what happens is you will fill up your memory, the buffer pool, with pages that 220 00:15:28,870 --> 00:15:31,120 ‫are almost never used. 221 00:15:32,590 --> 00:15:36,940 ‫It's almost you going to read the entire data with with UID. 222 00:15:36,940 --> 00:15:38,200 ‫That's why it's bad. 223 00:15:38,200 --> 00:15:40,240 ‫That's why it's inserts or slow. 224 00:15:40,630 --> 00:15:41,020 ‫Right. 225 00:15:41,020 --> 00:15:47,980 ‫And I'm focusing on inserts because reads are also bad, but inserts are the worst because now inserts 226 00:15:47,980 --> 00:15:53,640 ‫you have to read the page and then write to the page and then write to the wall, which is the right 227 00:15:53,650 --> 00:15:57,010 ‫ahead lock and then flush the page with checkpoints. 228 00:15:57,010 --> 00:16:05,560 ‫So not only you slowed down rights because to receive a write, you have to read, put it in memory 229 00:16:05,560 --> 00:16:06,520 ‫and then right way. 230 00:16:06,520 --> 00:16:09,220 ‫And if it's not in memory, you have to go and do an IO. 231 00:16:09,220 --> 00:16:13,750 ‫So inserts are always almost causing an IO write. 232 00:16:13,750 --> 00:16:14,830 ‫And guess what? 233 00:16:15,360 --> 00:16:18,150 ‫What if the buffer pool fills up, which will. 234 00:16:18,910 --> 00:16:24,270 ‫And give it a few million requests, which Shopify easily do in an hour. 235 00:16:24,280 --> 00:16:24,910 ‫Right. 236 00:16:25,000 --> 00:16:27,510 ‫And then this buffer will fill up. 237 00:16:27,520 --> 00:16:28,560 ‫So what does that mean? 238 00:16:28,570 --> 00:16:31,150 ‫If it fills up, you can't even write. 239 00:16:31,420 --> 00:16:31,800 ‫Right. 240 00:16:31,810 --> 00:16:32,800 ‫So what does that mean? 241 00:16:32,800 --> 00:16:36,960 ‫You have to flush existing buffer pool pages back to disk. 242 00:16:36,970 --> 00:16:40,930 ‫That is a cost that is an expensive thing called check pointing. 243 00:16:40,930 --> 00:16:46,330 ‫And it's not exactly check pointing, but it's part of the writing. 244 00:16:46,330 --> 00:16:47,170 ‫Flushing back. 245 00:16:47,170 --> 00:16:52,150 ‫The changes that you do in the data pages to disk check pointing is something else. 246 00:16:52,150 --> 00:16:58,510 ‫But when you flush these things, you have to flush them to disk, persist them and. 247 00:16:59,330 --> 00:17:05,900 ‫And then now that we wrote, wrote these dirty pages to this, now we have some free memory on the buffer. 248 00:17:06,080 --> 00:17:12,290 ‫Let's read now that random page that this good has happened to live in and then insert it. 249 00:17:12,320 --> 00:17:18,500 ‫And you see this thrashing that the database keeps doing, which is awful. 250 00:17:18,800 --> 00:17:21,330 ‫So what those guys did was Shopify. 251 00:17:21,380 --> 00:17:25,160 ‫It is like they realized this big problem with the UID. 252 00:17:25,160 --> 00:17:26,720 ‫So it said, Hey, we still like you. 253 00:17:26,780 --> 00:17:30,050 ‫It is we, we like the uniqueness of the OIDs. 254 00:17:30,050 --> 00:17:34,460 ‫I'm not going to introduce a centralized system for to generate unique IDs. 255 00:17:34,460 --> 00:17:39,950 ‫That's just an I mean, you can but you created a bottleneck, right? 256 00:17:39,950 --> 00:17:46,660 ‫All these requests you can create a microservice that its sole job is to generate unique IDs that aren't 257 00:17:46,670 --> 00:17:49,260 ‫integers that works right? 258 00:17:49,280 --> 00:17:56,600 ‫And it will guarantee that no to services or or to request will get the same unique ID. 259 00:17:56,630 --> 00:18:00,020 ‫You can do that, but you have to serialize them, right? 260 00:18:00,020 --> 00:18:02,330 ‫And it becomes a bottleneck. 261 00:18:02,330 --> 00:18:08,420 ‫So they still want to use the UID, but they will use the UL ID which has a time stamp base. 262 00:18:08,420 --> 00:18:16,250 ‫So these there is, there is there is an order to this request and the order by time. 263 00:18:16,250 --> 00:18:23,210 ‫And it works perfectly for Shopify y when you generate a request, they are time based. 264 00:18:23,210 --> 00:18:28,130 ‫They are absolutely time based requests that are generated is definitely time based. 265 00:18:28,130 --> 00:18:34,430 ‫So if I generate a new request and I want to write it right in t zero, the next request is PT one. 266 00:18:34,430 --> 00:18:35,840 ‫Then this request is RT three. 267 00:18:35,840 --> 00:18:40,910 ‫The next request for these requests will definitely will be one after the other. 268 00:18:40,910 --> 00:18:41,930 ‫What does that mean? 269 00:18:41,930 --> 00:18:44,270 ‫And B three speak in. 270 00:18:44,270 --> 00:18:45,440 ‫B three speak. 271 00:18:45,440 --> 00:18:50,540 ‫If I my primary key is the UL IDs then. 272 00:18:51,760 --> 00:18:53,620 ‫I generate the t zero request. 273 00:18:53,620 --> 00:18:54,100 ‫Right. 274 00:18:54,310 --> 00:19:00,220 ‫And again, I'm saying t zero, but it's t zero followed by a random number, which the most important 275 00:19:00,220 --> 00:19:02,980 ‫part is the first part for the database. 276 00:19:02,980 --> 00:19:03,410 ‫Right. 277 00:19:03,430 --> 00:19:04,720 ‫It's always left to right. 278 00:19:05,260 --> 00:19:11,200 ‫And just like indexes combine indexes, compound indexes have to go left to right as well. 279 00:19:11,200 --> 00:19:11,350 ‫Right. 280 00:19:11,380 --> 00:19:11,790 ‫Same thing. 281 00:19:11,800 --> 00:19:20,860 ‫So now you generate request to zero request RT one request RT three RT 45 and all of these T zero Guess 282 00:19:20,860 --> 00:19:21,070 ‫what? 283 00:19:21,190 --> 00:19:22,720 ‫Go to this page. 284 00:19:22,840 --> 00:19:28,570 ‫And then where is T one goes right in the same page, which is exactly what you want. 285 00:19:28,600 --> 00:19:31,730 ‫So t zero first we'll say, okay, I don't have this page. 286 00:19:31,750 --> 00:19:35,830 ‫Let me fetch it from disk, put it there and then write to it. 287 00:19:35,950 --> 00:19:38,000 ‫And then t request t one comes in. 288 00:19:38,020 --> 00:19:39,190 ‫It's not random. 289 00:19:39,190 --> 00:19:40,680 ‫It's ordered. 290 00:19:40,690 --> 00:19:42,280 ‫So t one. 291 00:19:42,280 --> 00:19:46,600 ‫All the requests that comes in almost always goes to the tail. 292 00:19:47,390 --> 00:19:48,410 ‫Of the big three. 293 00:19:48,920 --> 00:19:49,480 ‫Right. 294 00:19:49,520 --> 00:19:50,330 ‫Which is. 295 00:19:50,330 --> 00:19:51,340 ‫Which is good. 296 00:19:51,350 --> 00:19:55,340 ‫But there's also another problem that I'm going to mention, that they don't talk about it here. 297 00:19:56,030 --> 00:19:59,300 ‫That page will receive many rights. 298 00:19:59,480 --> 00:20:05,540 ‫So t one will go to a t, three will go to a T for all of these budget pages will until it gets full 299 00:20:06,050 --> 00:20:06,800 ‫done. 300 00:20:06,980 --> 00:20:07,810 ‫Leave it. 301 00:20:07,820 --> 00:20:10,370 ‫It's almost impossible. 302 00:20:10,370 --> 00:20:11,270 ‫Improbable. 303 00:20:11,270 --> 00:20:17,810 ‫Let's say that another request in in the past will just come in out of the blue and then. 304 00:20:18,940 --> 00:20:22,540 ‫Fitch is a page from the past and you cannot insert a request in the past. 305 00:20:22,540 --> 00:20:22,840 ‫Right? 306 00:20:22,840 --> 00:20:24,730 ‫All requests will come real time. 307 00:20:25,030 --> 00:20:29,230 ‫And I guess I suppose there is there might be a bug. 308 00:20:29,230 --> 00:20:36,910 ‫I will call it a bug in the client where the client generates the UL ID, but it got disconnected, 309 00:20:36,910 --> 00:20:37,330 ‫right? 310 00:20:37,330 --> 00:20:42,460 ‫And then later it was connected after an hour and then use that ID. 311 00:20:42,490 --> 00:20:44,050 ‫That's fine, right? 312 00:20:44,380 --> 00:20:44,890 ‫That's fine. 313 00:20:44,890 --> 00:20:45,370 ‫Sure. 314 00:20:45,370 --> 00:20:53,410 ‫That is kind of an anomaly where an old request ID with an old timestamp will pull an old page to write 315 00:20:53,410 --> 00:20:56,410 ‫to it because we have moved on already. 316 00:20:56,410 --> 00:21:02,440 ‫But the goal here is all the request that comes in will be nicely ordered. 317 00:21:02,920 --> 00:21:06,460 ‫The buffer pool will almost have one or two or three pages. 318 00:21:06,460 --> 00:21:06,980 ‫Right. 319 00:21:07,000 --> 00:21:08,830 ‫And we'll get right to the tail. 320 00:21:08,830 --> 00:21:10,030 ‫Always right to the tail. 321 00:21:10,030 --> 00:21:11,830 ‫So right or fast. 322 00:21:11,860 --> 00:21:12,760 ‫How about reads? 323 00:21:12,760 --> 00:21:13,750 ‫Reads are also fast. 324 00:21:13,750 --> 00:21:14,470 ‫Because guess what? 325 00:21:14,500 --> 00:21:20,590 ‫If you're going to read a request, chances that this request you ID that you just generated is just 326 00:21:20,590 --> 00:21:21,850 ‫you just generated it. 327 00:21:21,850 --> 00:21:29,140 ‫So if you just generated it, chances that it is almost always in memory, it's a dirty page, right? 328 00:21:29,140 --> 00:21:34,600 ‫So yeah, you're going to read this page that is effectively it's committed in the wall, but it's still 329 00:21:35,020 --> 00:21:37,390 ‫in the memory so you're going to read it from. 330 00:21:37,570 --> 00:21:38,920 ‫So it's already there. 331 00:21:38,920 --> 00:21:43,960 ‫So it reads out fast as opposed to random your IDs, which you just random. 332 00:21:43,960 --> 00:21:48,490 ‫They have no absolutely no order to them. 333 00:21:48,550 --> 00:21:53,560 ‫And then you're going to query that and then you have to find the page where it lives and then pull 334 00:21:53,560 --> 00:21:57,070 ‫it in memory and just hope that someone else will ask for the same page. 335 00:21:57,850 --> 00:22:01,480 ‫But what, because of their unique use case. 336 00:22:01,480 --> 00:22:07,660 ‫So before, because of this request, the order of the request and the order they are come in and the 337 00:22:07,660 --> 00:22:15,880 ‫fact that it's 24 hours or less, they built this so tight such that this works perfectly for them, 338 00:22:15,880 --> 00:22:18,490 ‫other IDs work perfectly for them. 339 00:22:18,490 --> 00:22:19,750 ‫What's the problem here? 340 00:22:19,760 --> 00:22:25,930 ‫The other problem is a little thing called the mutex and. 341 00:22:27,300 --> 00:22:30,310 ‫Suppose it's an operating system thing. 342 00:22:30,330 --> 00:22:32,610 ‫It's a computer science thing where it's a lock. 343 00:22:32,820 --> 00:22:36,840 ‫I talk about pages in my medium and I'm like, I'm making another video about it. 344 00:22:36,840 --> 00:22:38,730 ‫Just talk about database pages. 345 00:22:39,420 --> 00:22:46,090 ‫Very critical concept to understand whether it's not equal graph doesn't matter. 346 00:22:46,110 --> 00:22:48,480 ‫Every database has a concept of a page and. 347 00:22:49,670 --> 00:22:54,650 ‫It's different from the file system page and it's different from the PSD page has nothing to do with 348 00:22:54,650 --> 00:22:58,460 ‫each other, and page is nothing but an in-memory structure. 349 00:22:58,880 --> 00:23:06,230 ‫And if you have multithreading in your database, which almost you do, then multiple threads will try 350 00:23:06,230 --> 00:23:08,060 ‫to write on the same page. 351 00:23:08,300 --> 00:23:08,780 ‫Right. 352 00:23:09,140 --> 00:23:15,530 ‫And if you don't do it correctly, you can corrupt your page, write, raise conditions and all. 353 00:23:15,890 --> 00:23:21,640 ‫So unless you build your structure, set that such that you can have two threads right in the same memory 354 00:23:21,650 --> 00:23:25,040 ‫location, which is very hard, you have to acquire something called a mutex. 355 00:23:25,040 --> 00:23:33,200 ‫And I think if I'm mistaken in my SQL, they call them latches or there might be SQL Server I can't 356 00:23:33,200 --> 00:23:33,630 ‫remember. 357 00:23:33,650 --> 00:23:35,060 ‫So you have to latch on the page. 358 00:23:35,060 --> 00:23:40,280 ‫If a thread wants to write something, it latches to it and then writes and then an UN latches. 359 00:23:40,580 --> 00:23:42,680 ‫All right, I'll release the mutex. 360 00:23:43,700 --> 00:23:44,120 ‫Right. 361 00:23:44,240 --> 00:23:51,830 ‫And that constant if you're writing to the tale always, if you're like there's like thousands of requests 362 00:23:51,830 --> 00:23:57,440 ‫always competing for the tail, you're going to start seeing serialization, right? 363 00:23:57,500 --> 00:23:59,950 ‫As as as threads being serialized. 364 00:23:59,960 --> 00:24:04,160 ‫So you'll see slight slow down for that particular problem. 365 00:24:04,160 --> 00:24:08,660 ‫But it's not as bad as having the buffer pool filled up. 366 00:24:08,660 --> 00:24:13,010 ‫I think they already realize this problem where this latching happen in the tail. 367 00:24:13,460 --> 00:24:15,650 ‫I don't know if there's a solution to it, to be honest. 368 00:24:15,650 --> 00:24:16,130 ‫Right. 369 00:24:17,450 --> 00:24:20,990 ‫I guess solutions to me having a little bit of randomness to it. 370 00:24:21,440 --> 00:24:24,050 ‫But for for this particular case. 371 00:24:24,440 --> 00:24:25,160 ‫Yeah. 372 00:24:25,760 --> 00:24:33,680 ‫Maybe make the make the page size a little bit smaller so that you have a little bit more pages. 373 00:24:33,680 --> 00:24:34,550 ‫But I don't know. 374 00:24:35,090 --> 00:24:42,020 ‫So that if you have a large page size, I mean, because my SQL, A.B. is 16 and I have no idea if they 375 00:24:42,020 --> 00:24:43,430 ‫configure that and changed it. 376 00:24:43,430 --> 00:24:49,280 ‫16 KB So Skip can fill up really quick with with UI IDs. 377 00:24:49,280 --> 00:24:50,890 ‫UL IDs, I suppose. 378 00:24:50,900 --> 00:24:54,440 ‫And I have no idea what their table structure looks like. 379 00:24:55,220 --> 00:24:58,700 ‫But yeah, it's, it's interesting looking at all these things, right? 380 00:24:59,130 --> 00:25:05,480 ‫They actually took full advantage of ul ID the first time I've seen a full advantage of you all IDs 381 00:25:05,480 --> 00:25:08,120 ‫right now we're going to be careful with this. 382 00:25:08,120 --> 00:25:08,450 ‫Right? 383 00:25:08,450 --> 00:25:12,590 ‫So now we know there is something called ul ID, a new UID is. 384 00:25:13,630 --> 00:25:14,920 ‫Is usually you. 385 00:25:14,920 --> 00:25:15,340 ‫You. 386 00:25:15,610 --> 00:25:17,890 ‫But what I want to say is. 387 00:25:18,690 --> 00:25:22,920 ‫Don't just use your ID because it's a new thing. 388 00:25:22,950 --> 00:25:24,690 ‫I think you still need to. 389 00:25:25,460 --> 00:25:32,270 ‫Fix, make sure that it's your use case actually fits it right nicely. 390 00:25:32,270 --> 00:25:33,320 ‫Let's take an example. 391 00:25:33,320 --> 00:25:43,430 ‫So let's say I want to make a URL shorter such that I don't want the user to select anything. 392 00:25:43,430 --> 00:25:43,910 ‫Right. 393 00:25:44,420 --> 00:25:45,740 ‫I just wanted. 394 00:25:46,710 --> 00:25:53,300 ‫To generate a unique short URL based on the utility. 395 00:25:53,310 --> 00:25:59,130 ‫So when you tell me, Hey, this is wrong, you ought to make it short, I'll generate a utility for 396 00:25:59,130 --> 00:25:59,520 ‫you. 397 00:26:00,090 --> 00:26:02,970 ‫And based on that, this is now the short order. 398 00:26:03,210 --> 00:26:07,620 ‫This is if you use normal random UI IDs. 399 00:26:08,480 --> 00:26:09,530 ‫Inserts, right? 400 00:26:09,920 --> 00:26:12,980 ‫A fleet of people will be creating short URLs. 401 00:26:12,980 --> 00:26:13,430 ‫Right. 402 00:26:13,700 --> 00:26:20,180 ‫And in that particular case, randomness will hurt performance, especially in write performance. 403 00:26:20,540 --> 00:26:28,310 ‫If you use UL IDs, then you can control the inserts because you know that people who generate the short 404 00:26:28,310 --> 00:26:31,490 ‫yardage will be generated in game one after the other. 405 00:26:31,490 --> 00:26:38,090 ‫You're going to get a nice boost and insert performance with your slides generating short URLs. 406 00:26:38,120 --> 00:26:39,230 ‫Again, if you have like. 407 00:26:39,940 --> 00:26:45,790 ‫Many, many thousands of requests because this early will come in and. 408 00:26:46,330 --> 00:26:53,470 ‫And the beauty here is actually you can either have the client generate them or have the database generate 409 00:26:53,470 --> 00:26:53,860 ‫them. 410 00:26:53,860 --> 00:27:00,690 ‫And if you do that, all these keys that come in will, we'll look up where should they fit? 411 00:27:00,700 --> 00:27:09,700 ‫And most probably they're going to be ordered because right requests to generate new URLs will be in 412 00:27:09,700 --> 00:27:16,030 ‫order and as a result they can all fit in in the nice nicely tucked in page instead of randomly. 413 00:27:16,030 --> 00:27:16,170 ‫Right. 414 00:27:16,180 --> 00:27:17,110 ‫So that's nice. 415 00:27:17,110 --> 00:27:18,330 ‫But reads. 416 00:27:18,370 --> 00:27:20,440 ‫I don't think it's going to benefit you at all. 417 00:27:20,440 --> 00:27:22,840 ‫Just the same thing as you ID read. 418 00:27:23,610 --> 00:27:25,470 ‫Or will be so random. 419 00:27:25,470 --> 00:27:31,070 ‫You have absolutely no guarantee in this case to optimize read requests. 420 00:27:31,080 --> 00:27:31,950 ‫Unlike Shopify. 421 00:27:32,220 --> 00:27:39,900 ‫Shopify actually optimize both write and read because reading they will almost read requests that are 422 00:27:39,900 --> 00:27:41,460 ‫within this 24 hours. 423 00:27:41,460 --> 00:27:46,750 ‫There's no point reading request from seven years right ago. 424 00:27:46,840 --> 00:27:50,100 ‫Doesn't make sense because hey, we're building an item potency token. 425 00:27:50,100 --> 00:27:50,940 ‫Here you are. 426 00:27:50,940 --> 00:27:53,130 ‫Else you have no control over that. 427 00:27:53,130 --> 00:27:59,120 ‫Someone might request a URL that you have created a year ago. 428 00:27:59,130 --> 00:27:59,610 ‫Right. 429 00:27:59,880 --> 00:28:02,010 ‫And in that case, you're going to pull that page. 430 00:28:02,010 --> 00:28:06,480 ‫So with your ID unfortunately short, you are a short note. 431 00:28:06,600 --> 00:28:16,590 ‫This randomness, the use case is random and there is absolutely no way I can think of to make it better 432 00:28:16,590 --> 00:28:17,910 ‫for read request. 433 00:28:18,680 --> 00:28:19,220 ‫Time. 434 00:28:19,490 --> 00:28:19,870 ‫Because. 435 00:28:20,150 --> 00:28:21,770 ‫Because the the. 436 00:28:21,770 --> 00:28:24,530 ‫The use case is random. 437 00:28:24,980 --> 00:28:27,290 ‫The read request is random. 438 00:28:27,530 --> 00:28:29,180 ‫You have no control of you. 439 00:28:29,180 --> 00:28:31,220 ‫What users you are else visit. 440 00:28:31,220 --> 00:28:31,700 ‫Right. 441 00:28:31,700 --> 00:28:32,690 ‫So this is a result. 442 00:28:32,690 --> 00:28:37,250 ‫It's just a random and I have you have to configure your hardware based on that so just this is just 443 00:28:37,250 --> 00:28:38,810 ‫an understanding of this. 444 00:28:38,810 --> 00:28:42,700 ‫I don't don't think that you will you will fix everything for you. 445 00:28:42,710 --> 00:28:42,930 ‫Right. 446 00:28:43,100 --> 00:28:49,620 ‫So another disadvantage, if you will, is the size, right? 447 00:28:49,640 --> 00:28:50,750 ‫128. 448 00:28:51,490 --> 00:28:52,570 ‫Is huge. 449 00:28:52,570 --> 00:29:00,820 ‫And if you are using my SQL in particular, not necessarily postcodes, but my school will secondary 450 00:29:00,820 --> 00:29:01,840 ‫indexes. 451 00:29:02,770 --> 00:29:05,590 ‫Point back to the. 452 00:29:07,230 --> 00:29:10,520 ‫To the to the primary index key. 453 00:29:10,530 --> 00:29:14,730 ‫So if you are like indexing another field, that has nothing to do with Euclid. 454 00:29:14,940 --> 00:29:15,180 ‫Right. 455 00:29:15,240 --> 00:29:16,140 ‫But I don't know. 456 00:29:16,140 --> 00:29:24,630 ‫You're interested in the date on which this created, then this date will create an index and the values, 457 00:29:24,630 --> 00:29:26,250 ‫the keys is the date. 458 00:29:26,250 --> 00:29:32,580 ‫The values is what is the UID ID because that's how my SQL. 459 00:29:33,490 --> 00:29:42,770 ‫Actually cluster indexes work right as this is the same recently in MongoDB as well with clustered collections. 460 00:29:42,790 --> 00:29:49,090 ‫So if you're using MongoDB and you decided to turn on cluster collections on your collections, that 461 00:29:49,090 --> 00:29:50,890 ‫is identical to MySQL right now. 462 00:29:51,940 --> 00:29:55,330 ‫So the secondary indexes can blow it up. 463 00:29:56,250 --> 00:29:58,670 ‫Based on the size of the primary key. 464 00:29:58,680 --> 00:30:02,370 ‫So just just something a fruit of thought to understand these things. 465 00:30:02,490 --> 00:30:06,720 ‫What's the problem of large indexes like I have all the space in the world. 466 00:30:06,720 --> 00:30:10,710 ‫Well, you have all the space in the world, but you don't have all the all the memory in the world, 467 00:30:10,710 --> 00:30:11,220 ‫Right. 468 00:30:11,640 --> 00:30:13,290 ‫If you do, kudos. 469 00:30:13,710 --> 00:30:19,710 ‫But large indexes effectively need to be in memory to be effective in reading. 470 00:30:19,710 --> 00:30:26,820 ‫So if you have a large indexes with a lot of bloated secondary indexes and you have ten indexes can 471 00:30:26,820 --> 00:30:35,160 ‫really add up and it can slow down writes as well and reads mostly, but right guys. 472 00:30:35,160 --> 00:30:36,810 ‫So this is what I wanted to discuss. 473 00:30:36,900 --> 00:30:38,250 ‫Very interesting concept. 474 00:30:38,250 --> 00:30:44,340 ‫You alluded and I'll keep thinking more about it is like where can this break? 475 00:30:44,340 --> 00:30:46,470 ‫But most probably you will. 476 00:30:46,470 --> 00:30:50,520 ‫It is almost always better than you uid right. 477 00:30:50,520 --> 00:30:53,640 ‫But I'm yet to think about a case where. 478 00:30:55,090 --> 00:30:58,030 ‫Both are actually not a good idea. 479 00:30:59,050 --> 00:31:00,820 ‫Size is one of them, I suppose. 480 00:31:01,300 --> 00:31:08,590 ‫But yeah, but this is one of the first use cases that I saw that actually takes full advantage of your 481 00:31:08,620 --> 00:31:10,240 ‫IDs, Right. 482 00:31:11,070 --> 00:31:12,450 ‫Brilliantly, might I say. 483 00:31:12,450 --> 00:31:14,250 ‫So again, good article. 484 00:31:14,250 --> 00:31:15,930 ‫I'm going to reference it below the shownotes. 485 00:31:16,050 --> 00:31:19,470 ‫I hope you enjoy this video podcast. 486 00:31:20,140 --> 00:31:20,760 ‫See you in the next one. 487 00:31:20,790 --> 00:31:21,150 ‫You guys stay. 488 00:31:21,150 --> 00:31:21,420 ‫Awesome. 489 00:31:21,450 --> 00:31:21,840 ‫Goodbye.