1 00:00:00,120 --> 00:00:05,160 ‫Welcome to another Q&A of the month of January 2022. 2 00:00:05,220 --> 00:00:10,710 ‫This is the series where we pick one of your wonderful questions that you submit here and the fundamentals 3 00:00:10,710 --> 00:00:11,970 ‫of database engineering. 4 00:00:12,210 --> 00:00:19,890 ‫Udemy course and try to answer them in depth as much as possible to my extent and experience. 5 00:00:20,130 --> 00:00:21,090 ‫How about we jump into it? 6 00:00:21,500 --> 00:00:26,190 ‫And today I said I thought we're going to pick two questions. 7 00:00:27,060 --> 00:00:28,710 ‫This one comes from Chris. 8 00:00:30,980 --> 00:00:36,830 ‫And says, Doug, about Reed, only transactions, when do you want to perform only reads in a transaction. 9 00:00:36,830 --> 00:00:40,370 ‫So this is effectively the question relates to read only transaction. 10 00:00:40,370 --> 00:00:46,010 ‫So read only transactions are what are those first? 11 00:00:46,550 --> 00:00:48,950 ‫Those are transactions that you. 12 00:00:52,400 --> 00:01:01,370 ‫Issue from the client to the database and any queries that execute in that transaction doesn't do any 13 00:01:01,370 --> 00:01:04,970 ‫changes to the data in the database. 14 00:01:05,780 --> 00:01:12,620 ‫So the benefits of this is really very minimal. 15 00:01:13,220 --> 00:01:19,580 ‫Right from my my own experience and just researching this topic is the first thing is it will protect 16 00:01:19,580 --> 00:01:20,780 ‫you as a client. 17 00:01:20,780 --> 00:01:30,260 ‫So let's say you have a bunch of code and you're trying to, you know, execute a transaction, but 18 00:01:30,590 --> 00:01:33,110 ‫you want you or you're not sure what's beneath you. 19 00:01:33,110 --> 00:01:35,090 ‫You're calling them a bunch of methods. 20 00:01:35,090 --> 00:01:38,030 ‫And these methods are executing queries. 21 00:01:38,270 --> 00:01:42,980 ‫But you want to make sure that you know that whatever you're about to do, this operation that the user, 22 00:01:42,980 --> 00:01:48,110 ‫the high level operation that the user will do, is it only reading it shouldn't do any writing. 23 00:01:48,410 --> 00:01:54,320 ‫So if you do, I read only transaction that will protect you in case any of these underlying method 24 00:01:54,320 --> 00:02:00,440 ‫that you're calling accidentally tries to, you know, try to change something, it will fail. 25 00:02:01,820 --> 00:02:05,990 ‫So so you'll find the bad code and stop it right there. 26 00:02:06,500 --> 00:02:13,910 ‫And so the other reason so this is one reason, at least from the client protection perspective. 27 00:02:14,270 --> 00:02:21,920 ‫The other reason is that the database level, you know, if the database know that the transaction that 28 00:02:21,920 --> 00:02:31,730 ‫you're about to start will never change anything, it can sometimes take shortcuts and do certain performance 29 00:02:32,840 --> 00:02:33,880 ‫optimizations. 30 00:02:33,890 --> 00:02:35,610 ‫Not all the time, but sometimes. 31 00:02:35,620 --> 00:02:41,810 ‫Yeah, because when you start a transaction, you can either read or you can read and write. 32 00:02:42,020 --> 00:02:47,390 ‫Writing is more expensive than reading, and just by that chance that you're actually modifying the 33 00:02:47,390 --> 00:02:50,540 ‫data, you need to obtain certain types of locks. 34 00:02:50,960 --> 00:02:55,700 ‫You need to aversion that data if you want to support multi version concurrency control. 35 00:02:55,910 --> 00:02:59,690 ‫So there is some more work that you need to do that you need to generate transaction IDs. 36 00:02:59,690 --> 00:03:02,930 ‫In case of Postgres, you need, you need utility. 37 00:03:03,050 --> 00:03:10,430 ‫So if if it if the database know in advance that you're never going to change, it can skip all that 38 00:03:10,430 --> 00:03:11,240 ‫stuff, right? 39 00:03:12,650 --> 00:03:16,910 ‫Doesn't mean that you're going to gain performance because most databases are now smart enough. 40 00:03:17,120 --> 00:03:27,410 ‫Like, I'll give you example in Postgres, even if you say if you start the transaction where you say 41 00:03:27,410 --> 00:03:33,990 ‫it's read only or not, it's by default read only until you make your first right. 42 00:03:34,220 --> 00:03:37,870 ‫And that is when it will become a right transaction. 43 00:03:37,880 --> 00:03:38,480 ‫What does that mean? 44 00:03:38,750 --> 00:03:43,190 ‫It means that it will issue a new transaction ID for that transaction. 45 00:03:43,310 --> 00:03:49,630 ‫Issuing these transaction IDs, we talked about them many times to the lecturers of this course is expensive. 46 00:03:49,640 --> 00:03:51,140 ‫Is going to be expensive, right? 47 00:03:51,320 --> 00:03:57,290 ‫Because this is how we identify that all this rule was alive from this transaction ID until this transaction 48 00:03:57,290 --> 00:04:00,950 ‫idea that killed it was that whether there was an update on an actual delete. 49 00:04:01,410 --> 00:04:07,490 ‫And so issuing this transaction, I.D. keeping them in sync, keeping the state of those transaction 50 00:04:07,790 --> 00:04:12,790 ‫and obviously the post process, which is vacuum needs to clean up, right? 51 00:04:13,730 --> 00:04:14,600 ‫Take an example. 52 00:04:14,720 --> 00:04:23,140 ‫If you if you have 100 million transactions and let's say 50 million of those or just read only we, 53 00:04:23,150 --> 00:04:25,070 ‫we only need 50 million. 54 00:04:26,750 --> 00:04:29,510 ‫Transaction ID is you don't need 100 million transaction ideas. 55 00:04:29,990 --> 00:04:32,990 ‫So as the scale read, only our is better. 56 00:04:33,260 --> 00:04:39,410 ‫But whether you specified or not really depends on the database that the or a small enough it knows 57 00:04:39,410 --> 00:04:42,020 ‫that you're now trading effectively, right? 58 00:04:43,280 --> 00:04:48,650 ‫But again, if if you don't if you know you're not reading or you, you're not going to write. 59 00:04:48,870 --> 00:04:53,180 ‫Sure, it's always a good practice to do a read only transaction just to protect you. 60 00:04:53,450 --> 00:04:57,590 ‫So the next question comes from Vitaly Is this Italian? 61 00:04:58,890 --> 00:04:59,570 ‫I believe it is. 62 00:05:00,790 --> 00:05:06,510 ‫You idea versus sequential primary keys, and let's be very careful, what do we mean by a primary key? 63 00:05:06,600 --> 00:05:07,290 ‫We talked about that. 64 00:05:07,290 --> 00:05:14,250 ‫That's why I want you to watch that original fundamental section that I talk about this building block, 65 00:05:14,250 --> 00:05:14,580 ‫right? 66 00:05:14,880 --> 00:05:15,750 ‫Primary key. 67 00:05:16,260 --> 00:05:22,200 ‫It's not just unique, it's the idea of clustering your entire table based on that key. 68 00:05:22,410 --> 00:05:24,630 ‫And we talked also about what a clustering of you mean. 69 00:05:25,980 --> 00:05:34,320 ‫So he's asking here, they're asking whether it's better to use sequential keys like one, two, three, 70 00:05:34,320 --> 00:05:36,960 ‫four or five integers versus you. 71 00:05:37,500 --> 00:05:44,280 ‫This is a very loaded question, and I can't possibly tell you, Oh, always you is sequential, always 72 00:05:44,280 --> 00:05:44,510 ‫you. 73 00:05:45,210 --> 00:05:52,380 ‫There are cases where you are very useful, you know, because it's a universally unique identifier, 74 00:05:53,100 --> 00:05:57,030 ‫the client can generate it like that and then just submit it. 75 00:05:57,330 --> 00:06:02,310 ‫And we almost guarantee that nobody and the war, we're going to have the same idiom. 76 00:06:02,320 --> 00:06:03,540 ‫That's the beauty here. 77 00:06:04,320 --> 00:06:10,410 ‫And when you do that, you don't have to incur the cost of having the database to issue a new sequence 78 00:06:10,410 --> 00:06:10,620 ‫ID. 79 00:06:11,220 --> 00:06:11,380 ‫Right. 80 00:06:11,400 --> 00:06:19,310 ‫Because there is a slight delay, slight, almost insignificant to issue a sequence because what how, 81 00:06:19,400 --> 00:06:20,820 ‫how the sequences work, right? 82 00:06:21,600 --> 00:06:26,730 ‫The sequence of the database level need the database need to ask, OK, what's the next idea I can generate? 83 00:06:27,000 --> 00:06:31,320 ‫Imagine 100 times asking the same sequence RAM. 84 00:06:31,830 --> 00:06:33,910 ‫These need to be cute, right? 85 00:06:33,930 --> 00:06:38,760 ‫Because you cannot just generate concurrent sequence ideas, it has to be in sequence, right? 86 00:06:40,710 --> 00:06:45,450 ‫So so if if two transaction asks for the sequence, I need the first one. 87 00:06:46,680 --> 00:06:47,490 ‫They have to be cute. 88 00:06:47,520 --> 00:06:52,800 ‫The first one will get one, the second will get two, but nobody should get the same idea. 89 00:06:52,800 --> 00:06:55,800 ‫Otherwise, the whole thing will be bust, right? 90 00:06:56,670 --> 00:07:00,780 ‫So that's why building sequences and counts are very difficult. 91 00:07:01,020 --> 00:07:02,460 ‫You need to obtain locks. 92 00:07:02,550 --> 00:07:04,260 ‫The database takes care of that stuff. 93 00:07:04,530 --> 00:07:05,970 ‫It does a small things. 94 00:07:05,970 --> 00:07:13,170 ‫You know, you can either lock, you know, and weigh a block other transactions so this waiting can 95 00:07:13,170 --> 00:07:13,890 ‫slow you down. 96 00:07:14,290 --> 00:07:21,330 ‫Know if you have a flux of things that it depends on on the way you can test this yourself and versus 97 00:07:21,330 --> 00:07:21,960 ‫your ID. 98 00:07:22,290 --> 00:07:28,680 ‫If you have, if they're called generating that, you just slam that insert and it immediately doesn't 99 00:07:28,680 --> 00:07:29,550 ‫need to do anything. 100 00:07:29,550 --> 00:07:30,510 ‫Just insert the. 101 00:07:31,790 --> 00:07:36,330 ‫Doesn't need to do anything, of course, it need to go and update the indexes and all that stuff, 102 00:07:36,330 --> 00:07:41,390 ‫but it's it doesn't have that extra auto sequence right now. 103 00:07:41,400 --> 00:07:42,810 ‫Obviously, databases are smart enough. 104 00:07:42,810 --> 00:07:44,870 ‫They they try to optimize that as much as possible. 105 00:07:44,880 --> 00:07:50,790 ‫But just think about them now because you it is unique and universally unique. 106 00:07:51,940 --> 00:07:52,840 ‫It means. 107 00:07:54,500 --> 00:07:55,400 ‫It is expensive. 108 00:07:55,490 --> 00:07:58,850 ‫Why, if you put it as a primary key? 109 00:07:59,720 --> 00:08:02,600 ‫Well, let's think about that for a minute. 110 00:08:03,960 --> 00:08:05,280 ‫How big is that unity? 111 00:08:05,910 --> 00:08:07,230 ‫Let me check a little bit quick. 112 00:08:07,860 --> 00:08:11,130 ‫So it's 128 bit, 128 bit. 113 00:08:11,580 --> 00:08:13,650 ‫So that's 16 byte. 114 00:08:14,710 --> 00:08:20,430 ‫It depends how you actually store the unity on all of these details. 115 00:08:20,430 --> 00:08:22,890 ‫You need to know about this, guys. 116 00:08:23,370 --> 00:08:27,120 ‫You cannot just assume everything will be done for you. 117 00:08:27,330 --> 00:08:29,820 ‫You are responsible to understand all this stuff. 118 00:08:30,270 --> 00:08:31,710 ‫So how is this going to start? 119 00:08:31,740 --> 00:08:33,180 ‫Are we going to start as a strength? 120 00:08:33,450 --> 00:08:36,270 ‫If it's a string, it's even worse, right? 121 00:08:36,510 --> 00:08:38,550 ‫Because it's going to have these dashes with it. 122 00:08:39,150 --> 00:08:41,700 ‫If it's a string, then how many? 123 00:08:41,700 --> 00:08:43,560 ‫How many of these letters do the math? 124 00:08:43,890 --> 00:08:51,030 ‫So if you start as a string that that unit that you your idea will have the dashes will have the characters 125 00:08:51,030 --> 00:08:56,250 ‫in hexadecimal and you're going to store it in, what, 36 bytes. 126 00:08:56,640 --> 00:09:01,620 ‫So even more so if you start this thing, it's even more expensive than storing it. 127 00:09:01,620 --> 00:09:05,580 ‫As a native, you use it if the if your database actually supports them. 128 00:09:05,790 --> 00:09:10,650 ‫So 16 by minimum, 36 if it's a string, so that takes. 129 00:09:11,810 --> 00:09:12,800 ‫Size, right? 130 00:09:13,100 --> 00:09:14,300 ‫You might say, I don't care. 131 00:09:15,320 --> 00:09:18,110 ‫I have all the disk storage on the war. 132 00:09:18,470 --> 00:09:21,110 ‫No, nobody cares about storage anymore. 133 00:09:21,110 --> 00:09:22,280 ‫We know we have that. 134 00:09:22,610 --> 00:09:27,410 ‫It's the cost of putting that -- thing in memory and scanning it. 135 00:09:28,280 --> 00:09:36,800 ‫It's the cost of copying this primary key and all the indexes that are secondary because secondary indexes. 136 00:09:37,070 --> 00:09:40,370 ‫Again, we're talking about general primary key secondary. 137 00:09:40,820 --> 00:09:43,000 ‫In a sense that is not post-Christmas. 138 00:09:43,000 --> 00:09:44,270 ‫Chris does things differently. 139 00:09:44,990 --> 00:09:45,260 ‫Right? 140 00:09:45,350 --> 00:09:55,790 ‫Secondary Index says if you have a secondary index, the value of the keys or what you index, but the 141 00:09:55,790 --> 00:09:59,510 ‫value of that secondary index is the primary key. 142 00:09:59,540 --> 00:10:09,110 ‫So if you have a primary key on you, it then would get us land there 16 Bhide or 32 36 buy in every 143 00:10:09,110 --> 00:10:09,770 ‫entry. 144 00:10:10,100 --> 00:10:13,550 ‫In the secondary index, Dow bloats the secondary indexes. 145 00:10:13,850 --> 00:10:16,700 ‫What happens when we both as a contender, says I don't care. 146 00:10:16,880 --> 00:10:18,100 ‫We have all the size of the world. 147 00:10:18,110 --> 00:10:19,160 ‫We have all the time in the world. 148 00:10:19,160 --> 00:10:19,490 ‫No. 149 00:10:20,000 --> 00:10:26,420 ‫If you build the secondary indexes, then scanning the B3, which we have a whole section on becomes 150 00:10:26,420 --> 00:10:27,440 ‫slower, right? 151 00:10:27,440 --> 00:10:31,790 ‫Because at a read of a page, we'll have fewer rows. 152 00:10:32,210 --> 00:10:38,570 ‫If you have that kind of fat keys, right, it's just it's just logic. 153 00:10:38,960 --> 00:10:46,190 ‫You know, if you have a one byte, if I pitch at four pages, I have four thousand bytes right versus 154 00:10:46,190 --> 00:10:50,990 ‫if I have 16 or I have much, much fewer roads, right? 155 00:10:50,990 --> 00:10:56,630 ‫So the value of an IOU becomes less and less valuable. 156 00:10:56,720 --> 00:11:02,810 ‫You don't get as many keys, so your you find yourself going back to the memory. 157 00:11:03,020 --> 00:11:05,060 ‫Going back to desk, doing a lot of I. 158 00:11:05,540 --> 00:11:07,060 ‫So that's the cost of you. 159 00:11:07,070 --> 00:11:07,940 ‫It is right. 160 00:11:08,570 --> 00:11:11,510 ‫Another cost of new ideas is the randomness. 161 00:11:11,840 --> 00:11:16,100 ‫People don't like databases, don't like random stuff at all. 162 00:11:16,610 --> 00:11:25,100 ‫And the reason is when you issue this, you insert specifically rhyme and reads, for that matter. 163 00:11:25,440 --> 00:11:25,670 ‫Right? 164 00:11:26,060 --> 00:11:26,630 ‫What do we do? 165 00:11:26,660 --> 00:11:27,530 ‫Let's talk about it. 166 00:11:27,830 --> 00:11:32,370 ‫If I go, Hey, pitch me this role where you idea, call blah. 167 00:11:32,690 --> 00:11:38,300 ‫Yeah, well, the database will do a plan, then find out that you have an index beautiful and will 168 00:11:38,300 --> 00:11:43,370 ‫go and will find based on the B three find page. 169 00:11:43,790 --> 00:11:49,220 ‫So in that page, you have not only that you do, you have a bunch of other new ideas that happens to 170 00:11:49,220 --> 00:11:55,160 ‫have the same order RAM because indexes are always order, right? 171 00:11:55,850 --> 00:11:59,990 ‫So now we have these ideas in order and then we pull the whole page. 172 00:12:00,650 --> 00:12:01,580 ‫That's how we do things. 173 00:12:02,030 --> 00:12:04,250 ‫So we have a beautiful page that we just read. 174 00:12:04,700 --> 00:12:07,370 ‫So let's say if someone else just read. 175 00:12:08,680 --> 00:12:09,640 ‫Some other you idea. 176 00:12:11,320 --> 00:12:15,730 ‫In that case, we're going to scan the battery and almost never. 177 00:12:16,180 --> 00:12:20,790 ‫We're going to hit a page that is already cached because it's not, it's not. 178 00:12:21,220 --> 00:12:22,510 ‫It's not an order at all. 179 00:12:23,110 --> 00:12:26,140 ‫Then what is the chances that they're you? 180 00:12:26,440 --> 00:12:30,050 ‫I'm going to read is next to the one that I just read them. 181 00:12:30,340 --> 00:12:36,070 ‫You might you can say something about the same thing about sequencer, but it's all it's not as common. 182 00:12:36,170 --> 00:12:37,180 ‫They were going to come to that. 183 00:12:37,750 --> 00:12:40,300 ‫So you have a lot of random reads. 184 00:12:40,510 --> 00:12:47,230 ‫So you're phishing pages and you only pick one row and then another one wants to do something and then 185 00:12:47,230 --> 00:12:49,450 ‫you pick another page that doesn't. 186 00:12:49,720 --> 00:12:50,110 ‫It's not. 187 00:12:50,350 --> 00:12:55,720 ‫It's not in the RAM, it's not in the cache as on the cache that you find yourself doing nil, which 188 00:12:55,720 --> 00:12:56,410 ‫are expensive. 189 00:12:57,040 --> 00:12:59,380 ‫So these random reads sales performance. 190 00:12:59,680 --> 00:13:00,550 ‫Same thing with insert. 191 00:13:00,760 --> 00:13:06,910 ‫When you insert something, we fetch where we want to insert it and we inserted and we keep that dirty 192 00:13:06,910 --> 00:13:08,830 ‫page in cache, right? 193 00:13:08,830 --> 00:13:14,800 ‫So that hopefully we're hoping that someone else will insert something that goes into the same dirty 194 00:13:14,800 --> 00:13:16,240 ‫page before we flushed. 195 00:13:16,240 --> 00:13:24,070 ‫2-Disc This way, how we can obviously get it better performance with with with sequential IDs. 196 00:13:24,850 --> 00:13:29,350 ‫Every time you construct one, two or three or four or five, you're going to pull that page where one 197 00:13:29,350 --> 00:13:34,360 ‫exists and the next 10 or 20 IDs and then you've got to insert them. 198 00:13:35,320 --> 00:13:38,650 ‫So the two comes in, Oh, we have a nice and hot and memory. 199 00:13:38,650 --> 00:13:39,970 ‫Write it right to drive. 200 00:13:40,000 --> 00:13:45,550 ‫Obviously, also right to the wall and the the the the reader reader logs. 201 00:13:45,840 --> 00:13:46,120 ‫Right? 202 00:13:46,840 --> 00:13:47,860 ‫And you also write on the page. 203 00:13:48,100 --> 00:13:55,450 ‫So now you don't have to go and fetch that dirty another page where that idea should exist, right? 204 00:13:55,720 --> 00:13:57,910 ‫So you have nice in memory, right? 205 00:13:57,940 --> 00:13:58,500 ‫Beautiful. 206 00:13:58,510 --> 00:14:02,110 ‫We love those with reads with sequential IDs. 207 00:14:02,230 --> 00:14:07,120 ‫You can do between right there because i o between the I.D. one and 10 is very common. 208 00:14:07,330 --> 00:14:09,610 ‫Oh, give me all the students between one on 10. 209 00:14:10,090 --> 00:14:10,600 ‫It's none of that. 210 00:14:12,250 --> 00:14:15,120 ‫It's not really a good query, but you get my point. 211 00:14:15,490 --> 00:14:21,640 ‫Sometimes the IDs you do arranges right, and the Rangers are the best thing you can do because ranges 212 00:14:21,640 --> 00:14:23,860 ‫are usually consequently. 213 00:14:25,050 --> 00:14:31,530 ‫Tucked in together, unless obviously you have gaps and right, if you have these gaps and fragmented 214 00:14:31,540 --> 00:14:37,920 ‫index, then that's another question, but it's probably another topic, right? 215 00:14:38,190 --> 00:14:38,700 ‫But that's it. 216 00:14:38,730 --> 00:14:40,570 ‫That's I think that's a that's basically it. 217 00:14:41,040 --> 00:14:43,500 ‫You idea versus a sequential primary key. 218 00:14:43,510 --> 00:14:45,720 ‫I think that gives you kind of an idea. 219 00:14:46,290 --> 00:14:46,650 ‫Right? 220 00:14:46,860 --> 00:14:51,000 ‫Just understand I don't have an answer just like, Oh, you always use this, always use that. 221 00:14:52,230 --> 00:14:52,980 ‫Play with it. 222 00:14:53,040 --> 00:14:53,790 ‫Try. 223 00:14:53,850 --> 00:14:54,810 ‫Understand. 224 00:14:54,810 --> 00:14:56,220 ‫That's the most important important thing. 225 00:14:56,340 --> 00:15:00,570 ‫Just understand the only to do anything if you know what's happening. 226 00:15:01,050 --> 00:15:02,270 ‫You don't need any of us. 227 00:15:02,280 --> 00:15:04,980 ‫You don't need help from anyone because you don't. 228 00:15:05,010 --> 00:15:06,180 ‫You know what's going on. 229 00:15:06,960 --> 00:15:09,840 ‫If you know what's going on, you don't need help from anyone. 230 00:15:09,840 --> 00:15:10,990 ‫You don't even stack overflow. 231 00:15:11,010 --> 00:15:11,790 ‫You don't need housing. 232 00:15:11,800 --> 00:15:14,070 ‫You don't need anyone because this is house. 233 00:15:14,340 --> 00:15:15,180 ‫This is how it works. 234 00:15:16,250 --> 00:15:21,570 ‫Once you understand how things work, right, you will follow your own path. 235 00:15:21,590 --> 00:15:23,150 ‫You will just find an answer. 236 00:15:23,840 --> 00:15:26,510 ‫This is, of course, I'm for my use case. 237 00:15:26,930 --> 00:15:31,550 ‫I'm going to use your IDs because, hey, I don't insert a lot and I'm fine. 238 00:15:31,550 --> 00:15:34,700 ‫If if we have some random reads, who cares? 239 00:15:35,400 --> 00:15:38,210 ‫I have a I have all the rounds in the world, right? 240 00:15:38,690 --> 00:15:42,080 ‫So it all depends on you just understand. 241 00:15:42,080 --> 00:15:43,190 ‫And that's the most important thing. 242 00:15:43,220 --> 00:15:44,150 ‫Hope it helps. 243 00:15:44,570 --> 00:15:45,530 ‫Can to see you in the next one. 244 00:15:45,590 --> 00:15:46,640 ‫Enjoy the course. 245 00:15:47,000 --> 00:15:47,270 ‫Goodbye.