1 00:00:00,060 --> 00:00:01,210 What is going on, guys? 2 00:00:01,230 --> 00:00:11,910 My name is Hussein, and this is an old but gold article, Uber Engineering switched from Postgres 3 00:00:11,910 --> 00:00:19,140 to MySQL, and this article was published on July 26, 2016. 4 00:00:19,140 --> 00:00:29,150 And this article explains why Uber moved from Postgres to MySQL back in the day. 5 00:00:29,160 --> 00:00:35,580 I remember that this article got a lot of backlash from the Postgres community and actually the whole 6 00:00:35,580 --> 00:00:45,720 database community, to be honest, because of how the language used in this article severely criticized 7 00:00:45,720 --> 00:00:48,090 Postgres as if it's a bad database. 8 00:00:48,120 --> 00:00:48,450 Right. 9 00:00:48,510 --> 00:00:50,160 They don't even mention that thing. 10 00:00:50,190 --> 00:00:53,310 That is, hey, by the way, guys, this just didn't work for us. 11 00:00:53,670 --> 00:00:55,370 Doesn't mean it won't work for you. 12 00:00:55,800 --> 00:01:01,440 So that was the main reason this article was heavily criticized. 13 00:01:01,440 --> 00:01:04,350 I'm going to reference this article and the Hacker News 14 00:01:07,050 --> 00:01:13,290 thread. You're going to have a lot of discussions; some discussions go into the 15 00:01:13,290 --> 00:01:17,480 deep, some of the discussions kind of point out the flaws of this article. 16 00:01:17,880 --> 00:01:24,360 But what I want to do in this video podcast is go through this article and through the main 17 00:01:24,360 --> 00:01:27,570 pain points that Uber had and then discuss them. 18 00:01:27,720 --> 00:01:33,760 Give you my personal opinion, whether I think moving from Postgres, did 19 00:01:33,780 --> 00:01:38,380 they have to move from Postgres to MySQL or not, all of that stuff. 20 00:01:38,460 --> 00:01:39,990 How about we jump into it, guys? 
21 00:01:41,010 --> 00:01:48,150 So, guys, first they explain their architecture right here. They have the monolithic backend 22 00:01:48,150 --> 00:01:54,450 application written in Python that used Postgres for data persistence, and they are moving, again, 23 00:01:54,450 --> 00:01:56,000 this is 2016. 24 00:01:56,020 --> 00:02:02,580 Things change and change, but they are moving to a microservices architecture and, surprisingly, to 25 00:02:02,580 --> 00:02:09,500 a new system using Schemaless, a novel database sharding layer built on top of MySQL. 26 00:02:09,510 --> 00:02:11,610 So we're going to talk about that a little bit. 27 00:02:11,610 --> 00:02:13,860 That is a little bit of flak. 28 00:02:14,310 --> 00:02:18,090 You might say, Hussein, why Schemaless and MySQL? 29 00:02:18,090 --> 00:02:19,310 That doesn't make any sense. 30 00:02:19,320 --> 00:02:20,540 Right, exactly. 31 00:02:20,550 --> 00:02:22,110 That confused a lot of people. 32 00:02:22,120 --> 00:02:22,760 Oh, OK. 33 00:02:23,190 --> 00:02:25,170 Why would you pick MySQL? 34 00:02:26,110 --> 00:02:30,610 Let's just go with Cassandra, CockroachDB. 35 00:02:30,640 --> 00:02:34,480 I don't think CockroachDB was born back then, but, 36 00:02:34,480 --> 00:02:35,680 but... 37 00:02:37,440 --> 00:02:39,070 Mongo, right? 38 00:02:39,240 --> 00:02:42,810 Anything. But, yeah, again, they have their own reasons. 39 00:02:43,650 --> 00:02:45,300 And that's another article. 40 00:02:45,300 --> 00:02:49,790 But what I want to focus on here is the architecture of Postgres, as they claim. 41 00:02:50,280 --> 00:02:52,710 So, for the people listening, 42 00:02:52,740 --> 00:02:55,620 we're reading now the architecture of Postgres. 43 00:02:55,620 --> 00:03:06,630 And I'm going to read the article, the five main pain points that led Uber to move from Postgres to 44 00:03:06,630 --> 00:03:06,960 MySQL. 
45 00:03:06,960 --> 00:03:07,260 So. 46 00:03:08,530 --> 00:03:13,810 So this is the article now: we encountered many Postgres limitations. 47 00:03:15,390 --> 00:03:22,320 The first one, inefficient architecture for writes; the second one, inefficient data replication; the 48 00:03:22,320 --> 00:03:29,940 third one, issues with table corruption, issues, not issue. And... 49 00:03:30,390 --> 00:03:36,150 The fourth one, poor replica MVCC, multi-version concurrency control, support. 50 00:03:37,230 --> 00:03:42,020 And the fifth one, the final one, difficulty upgrading to newer releases. 51 00:03:42,030 --> 00:03:47,560 I can agree with some of them because I use Postgres and I know how painful it is to upgrade Postgres, 52 00:03:47,580 --> 00:03:50,010 so I can relate to that. 53 00:03:50,250 --> 00:03:54,000 I understand that this is a little bit easier process right now. 54 00:03:54,900 --> 00:03:57,300 But nevertheless, how about we... 55 00:03:57,570 --> 00:04:01,590 I don't agree with all the points, by the way, but I'm just reading to you. 56 00:04:01,590 --> 00:04:02,730 I agree with some of them. 57 00:04:02,850 --> 00:04:06,150 Some of them are just, to me, preposterous. 58 00:04:07,170 --> 00:04:08,370 So how about we jump into it? 59 00:04:08,370 --> 00:04:14,700 So they looked through the limitations and they decided to move to MySQL because it solves most of these 60 00:04:14,700 --> 00:04:15,120 problems. 61 00:04:15,120 --> 00:04:21,180 So how about we jump into one point after another? 62 00:04:21,180 --> 00:04:25,170 So the first point here, that's called the on-disk format. 63 00:04:25,500 --> 00:04:35,490 And they are describing in this section the on-disk format of Postgres that implements 64 00:04:35,490 --> 00:04:37,220 multi-version concurrency control. 
65 00:04:37,260 --> 00:04:45,150 And we talked about it many times on this channel, how the actual indexes are stored, how secondary 66 00:04:45,150 --> 00:04:47,380 indexes are stored in Postgres. 67 00:04:47,400 --> 00:04:51,540 How do they implement multi-version concurrency control using the transaction ID? 68 00:04:51,540 --> 00:04:52,160 That's the idea. 69 00:04:52,170 --> 00:04:52,500 Right. 70 00:04:53,220 --> 00:04:55,290 And then xmax and xmin. 71 00:04:55,290 --> 00:04:57,690 And how a row becomes visible. 72 00:04:57,690 --> 00:05:04,710 My transaction, once I go out of scope of the transaction, I need to do a vacuum to clean up those 73 00:05:05,040 --> 00:05:08,370 rows that no longer need to be seen by any other transaction. 74 00:05:08,380 --> 00:05:15,540 So it all comes down to the isolation and all that stuff that we talked about many times on this 75 00:05:15,540 --> 00:05:15,830 channel. 76 00:05:15,990 --> 00:05:19,890 So check out the ACID video here to learn about isolation, ACID, atomicity. 77 00:05:19,890 --> 00:05:21,270 I'm not going to explain it right here. 78 00:05:21,990 --> 00:05:23,720 So that's what they explain here. 79 00:05:23,850 --> 00:05:32,340 So what they are going through here is they have a table called users and they're showing you how Postgres 80 00:05:32,340 --> 00:05:35,550 works. So, well, for the people listening on the podcast, 81 00:05:35,820 --> 00:05:38,880 we're looking at the table with four columns: 82 00:05:38,880 --> 00:05:42,420 id, first, last, and birth year. 83 00:05:43,050 --> 00:05:45,720 So first name, last name, and then birth year. 84 00:05:46,620 --> 00:05:48,690 And then the ID is a number. 
85 00:05:48,750 --> 00:05:54,060 First name is obviously a string, last name's a string, character, and then birth year. And they show 86 00:05:54,060 --> 00:06:02,460 you how this is on disk, and there is a ctid, which is a tuple ID that is stored. 87 00:06:03,300 --> 00:06:08,360 That's basically the tuple reference on the disk. 88 00:06:08,360 --> 00:06:10,230 And this is very, very important. 89 00:06:10,620 --> 00:06:14,070 So these tuples are A, B, C, D, E, F, G and so on. 90 00:06:14,370 --> 00:06:18,600 And so the primary key, they have an index on the primary key, which is the ID. 91 00:06:18,600 --> 00:06:22,770 They have a secondary index on the first, last and birth year. 92 00:06:22,770 --> 00:06:24,920 So they have indexes on all of them. 93 00:06:24,930 --> 00:06:26,520 And again, this is just an example. 94 00:06:26,520 --> 00:06:30,930 They didn't show us their architecture, for security reasons probably. 95 00:06:31,320 --> 00:06:33,600 So they don't show us their schema, 96 00:06:33,600 --> 00:06:34,770 nothing like that. 97 00:06:35,250 --> 00:06:41,760 But this example tells me that they have a lot of indexes, so pay attention to that. 98 00:06:42,120 --> 00:06:50,280 So in PostgreSQL, the primary key and secondary key always point to the tuple ID, which is the 99 00:06:50,280 --> 00:06:52,560 physical representation on disk. 100 00:06:53,820 --> 00:06:57,060 And here's how Postgres works. 101 00:06:57,300 --> 00:07:07,350 So if you now go ahead and update a row, any row in this table, what we do is we essentially 102 00:07:07,350 --> 00:07:12,630 insert a duplicate row with a new tuple ID this time. 103 00:07:13,800 --> 00:07:20,490 And now that we have a new ID, we need to point the indexes, 104 00:07:21,460 --> 00:07:27,370 the secondary indexes and pretty much everything that uses the tuple ID, to the new representation. 
105 00:07:27,780 --> 00:07:28,020 Right. 106 00:07:28,570 --> 00:07:34,420 And that takes a finite amount of time, a finite amount of writes. 107 00:07:34,700 --> 00:07:40,030 A finite amount of work for Postgres to do, right, because everything points directly to the disk, 108 00:07:40,240 --> 00:07:48,520 just like MyISAM in MySQL, that's exactly the same architecture, where everything 109 00:07:48,520 --> 00:07:49,720 points directly to the disk. 110 00:07:49,720 --> 00:07:51,610 And you might say, what's bad about this? 111 00:07:53,080 --> 00:07:54,310 There is good and bad. 112 00:07:54,580 --> 00:07:56,740 The bad thing is what they are explaining. 113 00:07:56,740 --> 00:07:57,790 What they're explaining here is this: 114 00:07:57,850 --> 00:08:06,550 Hey, the moment we touch any row, I have to update all the indexes, including the primary key, 115 00:08:06,880 --> 00:08:16,420 because now all these indexes, those entries have a new ID that I need to pick up, the tuple 116 00:08:16,420 --> 00:08:16,780 ID. 117 00:08:16,930 --> 00:08:17,950 So I have to update that. 118 00:08:17,950 --> 00:08:21,160 And that obviously has a ripple effect. 119 00:08:21,330 --> 00:08:22,330 They called it 120 00:08:22,330 --> 00:08:22,660 write 121 00:08:22,660 --> 00:08:24,610 amplification in the coming slides. 122 00:08:24,610 --> 00:08:24,860 Right. 123 00:08:25,540 --> 00:08:33,610 And this logical write, I updated a single field in a single row, yet it results in five, six, 124 00:08:33,610 --> 00:08:39,850 seven physical writes to disk because you're updating the secondary index, the secondary, and 125 00:08:39,850 --> 00:08:40,320 the second. 126 00:08:40,330 --> 00:08:43,720 If you have a lot of indexes, this gets even slower and slower and slower. 127 00:08:44,330 --> 00:08:44,700 Right. 128 00:08:45,220 --> 00:08:47,380 So bear with me. 
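The ripple effect described above can be sketched with a toy model. This is only an illustration under stated assumptions, not how Postgres is actually implemented: `ToyTable`, its counters, and the sample row are hypothetical names made up for this sketch. Every index stores tuple IDs, and an update writes a new tuple version, so each index needs a new entry.

```python
# Toy model (hypothetical, not Postgres internals): every index maps a key
# to tuple IDs, and an update writes a new tuple version with a new tuple ID.

class ToyTable:
    def __init__(self, indexed_columns):
        self.heap = []  # tuple ID = position in this list
        self.indexes = {col: {} for col in indexed_columns}
        self.physical_writes = 0

    def _write_version(self, row):
        self.heap.append(dict(row))  # one write for the new tuple version
        tid = len(self.heap) - 1
        self.physical_writes += 1
        for col, index in self.indexes.items():  # one write per index
            index.setdefault(row[col], []).append(tid)
            self.physical_writes += 1
        return tid

    def insert(self, row):
        return self._write_version(row)

    def update(self, tid, **changes):
        # The old version stays in the heap until some cleanup removes it.
        new_row = {**self.heap[tid], **changes}
        return self._write_version(new_row)


table = ToyTable(["id", "first", "last", "birth_year"])
tid = table.insert({"id": 4, "first": "Ada", "last": "Lovelace", "birth_year": 1815})

before = table.physical_writes
new_tid = table.update(tid, birth_year=1816)  # one logical write
writes_for_update = table.physical_writes - before
print(writes_for_update)  # one heap write plus one write per index
```

With four indexes, the single logical update in this toy costs five writes, and the count grows with every index you add.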
129 00:08:47,380 --> 00:08:50,470 This is just explaining their point now. 130 00:08:50,770 --> 00:08:54,010 So they go through all of that, that's exactly what I said. 131 00:08:54,910 --> 00:08:57,640 And that, as a result, 132 00:08:59,380 --> 00:09:06,610 slows things down, because, first of all, writes are not just, writes are not slow per se, because 133 00:09:06,610 --> 00:09:08,290 you do a flush, right? 134 00:09:08,300 --> 00:09:12,070 You have a lot of writes to do and then you do all of them at once. 135 00:09:12,940 --> 00:09:16,930 But the side effect of the writes, we're going to explain it in a minute. 136 00:09:16,990 --> 00:09:21,910 It's like one single thing translates to a lot of physical writes. 137 00:09:22,240 --> 00:09:23,860 You have to update this index, this index, this. 138 00:09:23,890 --> 00:09:28,810 This is the write-ahead log, which is something we're going to explain in this article. 139 00:09:28,810 --> 00:09:34,790 The log gets large when you want to apply these changes. 140 00:09:35,260 --> 00:09:36,560 So that's the first thing, on disk. 141 00:09:37,300 --> 00:09:41,680 They go through the on-disk representation of PostgreSQL. 142 00:09:42,220 --> 00:09:43,240 Then they explain, 143 00:09:43,240 --> 00:09:48,370 the second point is replication. And replication here, 144 00:09:48,370 --> 00:09:54,190 guys, as I discussed, the write-ahead log is basically, if I do an insert or an update, 145 00:09:54,520 --> 00:09:59,470 this statement is translated into physical changes. 146 00:09:59,740 --> 00:10:05,980 Hey, go to this block and change this location and replace this value with this value. 147 00:10:06,310 --> 00:10:06,700 Right. 148 00:10:06,940 --> 00:10:10,060 Or go to this index and change this value to this value. 149 00:10:10,060 --> 00:10:11,900 Go to this index, on this position, change this. 
150 00:10:12,160 --> 00:10:18,910 These changes are written in the write-ahead log as actual disk changes. 151 00:10:20,110 --> 00:10:24,530 OK, so this is a very, very, very important thing to know. 152 00:10:25,240 --> 00:10:26,770 Now, this is the write-ahead 153 00:10:26,770 --> 00:10:31,720 log, we have this, and also the write-ahead log has its own structure. 154 00:10:32,140 --> 00:10:32,490 Right. 155 00:10:32,500 --> 00:10:35,740 It's somewhere else and it's being maintained. 156 00:10:35,750 --> 00:10:38,380 So the write-ahead log 157 00:10:38,380 --> 00:10:44,680 also has a physical representation on disk when you translate it into, and this is the write. 158 00:10:44,920 --> 00:10:46,660 So there is a lot that's going on. 159 00:10:47,560 --> 00:10:52,690 So now when you come to replication, which we talked about right here, guys, and also discussed 160 00:10:52,690 --> 00:10:58,570 in my course Introduction to Database Engineering, the idea of having a primary database accept the writes 161 00:10:58,960 --> 00:11:04,660 and standby replicas for reads, that can take reads. 162 00:11:04,690 --> 00:11:10,480 You need to push these changes, and the way you push them is you push the write-ahead log, which 163 00:11:10,480 --> 00:11:18,010 is a very consistent thing, down to the standby databases so they can get up to date. 164 00:11:18,670 --> 00:11:18,990 Right. 165 00:11:19,080 --> 00:11:21,340 That sounds simple, you know. 166 00:11:21,940 --> 00:11:26,650 And what they explain in the replication format, 167 00:11:26,650 --> 00:11:31,810 and this is where kind of their point about the limitation of Postgres is. 168 00:11:32,620 --> 00:11:33,550 You guys have this. 169 00:11:33,910 --> 00:11:38,140 First of all, you have this WAL, or write-ahead log, which is quite large. 170 00:11:38,140 --> 00:11:38,500 Why? 
171 00:11:38,500 --> 00:11:46,900 Because a single update statement translated into multiple writes, and these writes made their way 172 00:11:46,900 --> 00:11:48,160 to the write-ahead log. 173 00:11:48,290 --> 00:11:49,080 Hmm. 174 00:11:49,570 --> 00:11:54,130 So the write-ahead log doesn't have "update this table". 175 00:11:54,160 --> 00:12:02,830 It's not like statement-based replication, although Postgres supports statement-based replication through 176 00:12:02,830 --> 00:12:03,460 a third party. 177 00:12:03,460 --> 00:12:12,370 I believe MySQL supports both, statement-based replication and also row-based 178 00:12:12,370 --> 00:12:12,910 replication. 179 00:12:13,330 --> 00:12:15,700 There are, again, pros and cons for both. 180 00:12:17,080 --> 00:12:23,920 So now when we try to apply that... I came back, that dog was barking. 181 00:12:23,920 --> 00:12:24,310 All right. 182 00:12:25,150 --> 00:12:26,620 So the master 183 00:12:27,560 --> 00:12:35,090 database pushes the WAL changes down to the standby databases so they can get updated, but you might 184 00:12:35,090 --> 00:12:38,720 say, Hussein, what if the standby is actually executing a query? 185 00:12:39,080 --> 00:12:42,100 Do we just stop this query? 186 00:12:42,890 --> 00:12:43,300 Right. 187 00:12:43,440 --> 00:12:49,400 Say I'm executing a query on the standby to read something that happened to be deleted on the master, 188 00:12:49,820 --> 00:12:51,830 and it's being written directly. 189 00:12:51,830 --> 00:12:53,000 So do I stop there? 190 00:12:53,420 --> 00:12:54,440 Do I stop the query? 191 00:12:54,440 --> 00:12:55,220 Do I wait? 192 00:12:55,490 --> 00:13:02,660 All these questions are going to get answered in a minute, and as a result it will shape their decision 193 00:13:02,780 --> 00:13:03,860 to move to MySQL. 194 00:13:04,010 --> 00:13:06,360 I'm going to just get them to explain it to you now. 
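The earlier point about why the write-ahead log gets large compared with shipping the statement itself can be sketched roughly like this. Everything here is an assumption for illustration: the record strings, the index names, and `physical_records` are made up and are not the real WAL layout or any real replication format.

```python
# Toy comparison (made-up record format, not the real WAL layout).

def physical_records(table, old_tid, new_tid, index_names):
    """One record for the new heap tuple, plus one record per index entry."""
    records = [f"heap {table}: write tuple {new_tid} (replaces {old_tid})"]
    records += [f"index {name}: add entry -> tuple {new_tid}" for name in index_names]
    return records


# Hypothetical index names for the users example table.
index_names = ["users_pkey", "users_first_idx", "users_last_idx", "users_birth_year_idx"]

statement = "UPDATE users SET birth_year = 1816 WHERE id = 4"
records = physical_records("users", 0, 1, index_names)

physical_bytes = sum(len(r.encode()) for r in records)
statement_bytes = len(statement.encode())

print(len(records), physical_bytes, statement_bytes)
```

The statement is a single short string, while the physical form is one record per touched structure, so the stream a replica has to receive grows with the number of indexes.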
195 00:13:07,430 --> 00:13:12,890 So now we talked about the on-disk representation. 196 00:13:12,890 --> 00:13:19,180 We talked about replication, and now we're going to talk about the consequences of Postgres' design. 197 00:13:19,190 --> 00:13:24,920 That's the third point here, where the problems of Postgres are. 198 00:13:24,920 --> 00:13:29,120 So let's enjoy this reading. The first problem is write 199 00:13:29,120 --> 00:13:32,000 amplification, and write amplification is apparently something 200 00:13:32,000 --> 00:13:41,840 that is in SSDs, where a single write that you think is logical translates to many, many, many physical 201 00:13:41,840 --> 00:13:44,750 writes, especially in SSDs, as the SSD does its own thing. 202 00:13:44,760 --> 00:13:54,390 So when you update versus insert, the SSD does a little bit different thing. The SSD loves to insert 203 00:13:54,440 --> 00:14:02,090 new things, it likes to logically just insert new things and create new pages. 204 00:14:02,840 --> 00:14:12,680 SSD does not do well with updates because the goal of the SSD is to have a page and flush it. 205 00:14:12,680 --> 00:14:15,290 In order to update an existing page, 206 00:14:15,290 --> 00:14:17,900 you have to invalidate that existing page. 207 00:14:18,530 --> 00:14:23,170 You take it and then you copy it, change it and then write it. 208 00:14:23,180 --> 00:14:28,150 So there is a little bit more work when it comes to an update versus an insert, which is faster. 209 00:14:28,160 --> 00:14:35,780 So that's the reason why Google invented the LevelDB database and then 210 00:14:35,780 --> 00:14:38,030 why Facebook invented RocksDB 
211 00:14:39,260 --> 00:14:45,830 on top of that, I think, to take advantage of SSDs. And they built a completely different structure 212 00:14:45,830 --> 00:14:52,220 called the log-structured merge tree, where it's optimized for inserts instead of updates; 213 00:14:52,820 --> 00:14:56,030 everything is an insert, almost, in the log-structured 214 00:14:56,050 --> 00:14:59,960 merge tree. So that's the idea of write 215 00:14:59,960 --> 00:15:00,600 amplification. 216 00:15:00,620 --> 00:15:04,940 Now take that and amplify it at the Postgres level. 217 00:15:05,210 --> 00:15:14,030 I'm a client, I'm doing a single update statement to my table, and if I have like 700 indexes, I 218 00:15:14,030 --> 00:15:17,750 just made 700 updates, physical updates, 219 00:15:18,970 --> 00:15:22,780 as a result of my single logical update. 220 00:15:24,270 --> 00:15:33,630 So these 700 updates also, at the SSD level, translate to many, many physically amplified SSD 221 00:15:33,630 --> 00:15:40,530 updates because you're doing writes on pages, and the SSD thing, they have a limited shelf life. 222 00:15:40,530 --> 00:15:45,630 So if you have a limited shelf life, it is, you can only write so much. 223 00:15:45,630 --> 00:15:49,980 I think there is a number that varies between a disk and another. 224 00:15:50,400 --> 00:15:55,320 But in general it's essentially, I think, 12000 times or something like that. 225 00:15:55,470 --> 00:15:56,760 That's most of them. 226 00:15:57,790 --> 00:15:59,740 So they explain this here. 227 00:15:59,760 --> 00:16:01,200 I just summarized it to you. 228 00:16:01,290 --> 00:16:01,650 Write 229 00:16:01,650 --> 00:16:03,270 amplification is a problem for them. 
230 00:16:03,280 --> 00:16:11,670 So their SSDs, the life span of their SSDs is getting lower and lower because of the 231 00:16:11,670 --> 00:16:16,740 write amplification, because those guys have hundreds and hundreds of indexes. 232 00:16:16,770 --> 00:16:18,840 Why would you have this many indexes? 233 00:16:19,200 --> 00:16:19,800 Beats me. 234 00:16:19,980 --> 00:16:21,880 Do you really query on all of them? 235 00:16:22,410 --> 00:16:28,510 Do you really query on first name, right, or last name? 236 00:16:29,640 --> 00:16:31,890 That's why, adding indexes is great. 237 00:16:32,160 --> 00:16:34,840 Adding too many indexes is just a bad idea. 238 00:16:35,400 --> 00:16:37,310 So that's the write amplification problem. 239 00:16:40,400 --> 00:16:47,420 The second problem they want to discuss here is the replication problem. Guys, take the same thing 240 00:16:47,420 --> 00:16:47,870 that we did. 241 00:16:47,870 --> 00:16:54,740 We did a single update that translated to lots of updates to all the indexes because all of the indexes 242 00:16:54,740 --> 00:16:56,840 point to the row directly. 243 00:16:56,840 --> 00:16:59,160 So, and the row ID changes. 244 00:16:59,510 --> 00:17:02,580 So we have to make them aware of this row ID change. 245 00:17:02,960 --> 00:17:06,020 So all of these indexes point to the row directly. 246 00:17:06,350 --> 00:17:08,970 So these changes are just amplified. 247 00:17:09,500 --> 00:17:13,700 Now, these changes translate to a WAL, right? 248 00:17:14,060 --> 00:17:14,310 Write- 249 00:17:14,310 --> 00:17:16,730 ahead log. Hey, update 250 00:17:16,740 --> 00:17:20,020 this row, go to this index and this index and this second, and this, this. 251 00:17:20,150 --> 00:17:25,160 And by the way, there is a row here, change this to this, and it's all physical 252 00:17:25,160 --> 00:17:27,740 writes to the disk. 
253 00:17:30,320 --> 00:17:35,090 What they complain about in the replication is this WAL 254 00:17:36,510 --> 00:17:38,310 translates to a large, 255 00:17:39,430 --> 00:17:46,960 big-sized bandwidth when it comes to their master, 256 00:17:48,340 --> 00:17:57,040 worker, or standby, or replica, and those are interstate. They have their replicas 257 00:17:57,040 --> 00:18:07,780 across states, across different countries, so they had to buy expensive bandwidth to kind of transmit 258 00:18:07,780 --> 00:18:12,000 their WAL changes from this replica to this replica. 259 00:18:12,520 --> 00:18:16,480 And I believe they also have child, grandchild replicas. 260 00:18:16,480 --> 00:18:18,350 So take that into consideration. 261 00:18:18,370 --> 00:18:24,670 So the WAL changes, as they grow large, the bandwidth becomes expensive because they are very large 262 00:18:24,670 --> 00:18:26,800 and they are not making small updates. 263 00:18:26,800 --> 00:18:27,850 They're making large updates, 264 00:18:27,850 --> 00:18:29,300 for which the WAL sizes are even larger. 265 00:18:29,530 --> 00:18:32,010 So that's the replication problem here. 266 00:18:32,020 --> 00:18:39,790 I'm going to read this section for you guys so you can learn more about it: in cases where 267 00:18:40,300 --> 00:18:44,380 Postgres replication happens purely within a single data center, 268 00:18:44,380 --> 00:18:46,590 the replication bandwidth may not be a problem. 269 00:18:46,990 --> 00:18:52,420 Modern network equipment and switches can handle a large amount of bandwidth, and many hosting providers 270 00:18:52,420 --> 00:18:55,510 offer free or cheap intra-data-center bandwidth. 271 00:18:56,650 --> 00:18:56,950 Right. 272 00:18:57,670 --> 00:19:03,010 If you're internal, I can transfer one gig of WAL easily. 
273 00:19:03,340 --> 00:19:08,410 However, when replication must happen between data centers, issues can quickly escalate. 274 00:19:08,420 --> 00:19:15,270 For instance, Uber originally used physical servers in a colocation space. 275 00:19:15,280 --> 00:19:21,880 I don't know what the heck is a colocation space. On the West Coast. For disaster recovery 276 00:19:21,880 --> 00:19:22,330 purposes, 277 00:19:22,330 --> 00:19:28,960 we added servers in a second East Coast colocation space. In this design, 278 00:19:28,960 --> 00:19:36,160 we had a master Postgres instance plus replicas in our western data center and a set of replicas in the East. So 279 00:19:36,160 --> 00:19:43,390 that's kind of the constraint you can see: from east to west, it just did not scale for all of them. 280 00:19:43,900 --> 00:19:50,350 Right, because, you see how one single problem can lead to a lot of bigger problems, can lead 281 00:19:50,350 --> 00:19:51,030 to another problem. 282 00:19:51,040 --> 00:19:52,450 You see the pattern, guys. 283 00:19:54,790 --> 00:19:58,040 Writes are big because they have a lot of indexes. 284 00:19:58,450 --> 00:20:00,200 That's where you should start. 285 00:20:00,490 --> 00:20:04,760 Why do you have this many indexes? 286 00:20:06,500 --> 00:20:09,190 You might say, hey, Hussein, I cannot, 287 00:20:09,230 --> 00:20:15,830 I have to have 350 indexes on all my fields because I query against them. 288 00:20:17,820 --> 00:20:24,810 Well, in this case, I say, OK, maybe that's not a choice for you then, but try to avoid that in 289 00:20:24,810 --> 00:20:25,170 the first place. 290 00:20:25,170 --> 00:20:28,680 But that's what I didn't see. 291 00:20:28,680 --> 00:20:30,170 And that's why people are pissed. 292 00:20:30,180 --> 00:20:36,880 It's like, wow, you really didn't explain why you guys have a lot of indexes? 
293 00:20:37,260 --> 00:20:46,230 Can you explain why you need them? I bet if you go into the actual architecture, most things don't need 294 00:20:46,230 --> 00:20:47,300 this many indexes. 295 00:20:48,360 --> 00:20:54,600 As a result, you will not translate to huge write amplification consequences. 296 00:20:54,600 --> 00:21:00,030 You will not have that because you'll not have a lot of indexes to update. 297 00:21:00,240 --> 00:21:00,620 Right. 298 00:21:01,470 --> 00:21:09,030 But well, we're not at Uber, so we don't know their architecture, but that might be a valid use. 299 00:21:09,240 --> 00:21:11,550 So let's go to the data corruption. 300 00:21:11,550 --> 00:21:16,620 This is the most dumb section in this whole article. 301 00:21:16,740 --> 00:21:17,880 I'll save you some time. 302 00:21:18,720 --> 00:21:27,010 What they say here is, hey, during replication, Postgres 9.2 had a bug in it and our table 303 00:21:27,210 --> 00:21:28,560 was corrupted as a result. 304 00:21:30,980 --> 00:21:33,590 Seriously, seriously, Uber? 305 00:21:34,600 --> 00:21:43,780 What software doesn't have bugs? You're adding a bug as a reason to move from Postgres to MySQL, 306 00:21:43,780 --> 00:21:45,510 like MySQL's perfect. 307 00:21:46,150 --> 00:21:48,350 That's just odd. 308 00:21:48,940 --> 00:21:50,020 That's just, to me, 309 00:21:50,020 --> 00:21:50,650 I'm sorry. 310 00:21:50,650 --> 00:21:51,450 That's just odd. 311 00:21:51,970 --> 00:21:59,440 So they said during their replication process, the replicas were not in sync for some reason. 312 00:21:59,690 --> 00:22:05,650 And as a result, when you query for a unique value, let's say select star from users where ID equals 313 00:22:05,650 --> 00:22:06,490 four, you should get one. 314 00:22:06,760 --> 00:22:07,600 They were getting two. 315 00:22:07,600 --> 00:22:12,700 They were getting the old retired row for some reason. 
316 00:22:13,480 --> 00:22:17,800 And that causes their application to fall down and fall apart. 317 00:22:17,800 --> 00:22:23,050 So they had to add defensive programming to catch this stuff. 318 00:22:23,050 --> 00:22:29,320 But it's a bug. Postgres, immediately, if they notified Postgres, I mean, they would immediately have fixed 319 00:22:29,320 --> 00:22:31,330 it, fixed that bug immediately. 320 00:22:31,480 --> 00:22:33,040 But that's a good bug. 321 00:22:33,040 --> 00:22:40,150 But I don't see bugs as a reason to move, as a show stopper, in my opinion. 322 00:22:40,210 --> 00:22:43,200 So that's what they talk about there, and they talk about here, 323 00:22:43,270 --> 00:22:50,980 one section is the B-tree rebalancing, which is, by the way, B-tree rebalancing adds 324 00:22:50,980 --> 00:22:52,420 to the write amplification. 325 00:22:52,420 --> 00:22:59,710 They don't mention that, but it's just implied because a lot of people know that when you start 326 00:23:00,010 --> 00:23:07,180 to grow and you have a lot of indexes, you keep updating those indexes naturally if your value 327 00:23:07,690 --> 00:23:09,190 touches that index. 328 00:23:09,370 --> 00:23:09,640 Right. 329 00:23:11,260 --> 00:23:21,520 However, as a result of inserting, the B-tree structure might need to rebalance itself, and when 330 00:23:21,520 --> 00:23:23,560 it needs to rebalance itself, 331 00:23:24,590 --> 00:23:33,150 it actually does an update, a physical update to the tree, and updates are updates, not inserts, right? 332 00:23:33,350 --> 00:23:37,260 So updates translate to actual SSD 333 00:23:37,310 --> 00:23:37,670 write 334 00:23:37,670 --> 00:23:40,480 amplification, because SSDs do not like updates. 335 00:23:41,630 --> 00:23:44,720 So that's another thing that can amplify the writes. 336 00:23:45,200 --> 00:23:48,590 I'm talking if you go to the millions of rows. 
337 00:23:48,740 --> 00:23:49,040 Right. 338 00:23:49,040 --> 00:23:49,510 Obviously. 339 00:23:51,820 --> 00:23:52,130 Right. 340 00:23:52,730 --> 00:23:53,680 Let's move to the next one. 341 00:23:53,810 --> 00:23:55,760 Replica MVCC. 342 00:23:56,130 --> 00:23:56,480 All right. 343 00:23:56,750 --> 00:24:00,560 Replica MVCC, or replica multi-version 344 00:24:00,560 --> 00:24:06,320 concurrency control. It says Postgres does not have true replica MVCC support. 345 00:24:08,150 --> 00:24:10,580 Well, why? 346 00:24:10,580 --> 00:24:20,900 Because the fact that replicas apply WAL updates directly meant... because if you think about it, Postgres 347 00:24:21,260 --> 00:24:27,020 by default, again, by default, takes the on-disk representation of the WAL changes 348 00:24:27,380 --> 00:24:29,000 and that's what gets transmitted. 349 00:24:29,030 --> 00:24:36,020 So it's often higher bandwidth, but if you think about it, it's faster. 350 00:24:36,590 --> 00:24:36,920 Right. 351 00:24:37,160 --> 00:24:40,760 The alternative is just statement-based replication. 352 00:24:40,760 --> 00:24:41,030 Right. 353 00:24:41,150 --> 00:24:50,300 Where, instead of sending the results of the execution of the queries, you send the queries themselves, 354 00:24:51,320 --> 00:24:55,310 like, hey, I just did an insert, I just did an update, 355 00:24:55,310 --> 00:24:55,940 I just did a... 356 00:24:56,420 --> 00:25:03,440 So the actual string of the statements, the SQL statements, just send them to the replica. 357 00:25:04,040 --> 00:25:07,680 This will be way slower, right? 358 00:25:07,700 --> 00:25:18,050 Because, yeah, the bandwidth of transmitting these WAL changes in the form of statements is smaller 359 00:25:18,380 --> 00:25:20,970 than the actual physical changes that happen. 
360 00:25:21,320 --> 00:25:27,680 However, applying them to the replica, now you have to actually execute them. It's not straightforward. An insert 361 00:25:27,680 --> 00:25:28,310 might be OK. 362 00:25:28,310 --> 00:25:33,350 But what if you do an update, for example? An update could scan, could touch 363 00:25:33,350 --> 00:25:36,490 the index. It actually does work. 364 00:25:37,160 --> 00:25:39,020 So you did double the work, technically. 365 00:25:40,580 --> 00:25:45,740 Right, because you did the work to execute the statement on the master, and now you're doing the same 366 00:25:45,740 --> 00:25:46,220 work. 367 00:25:46,710 --> 00:25:47,420 Exactly. 368 00:25:47,480 --> 00:25:49,470 And if that statement is expensive, 369 00:25:49,470 --> 00:25:54,080 you are going to pay the same cost on the source and the destination. 370 00:25:54,620 --> 00:25:59,540 So there are pros and cons for both, but they are complaining here that 371 00:26:00,610 --> 00:26:01,660 Postgres's 372 00:26:04,170 --> 00:26:11,280 WAL replication just doesn't give them MVCC support. Let's clarify that. So let's say I am 373 00:26:11,280 --> 00:26:19,440 on a replica, a standby, and I'm executing a query, and one of the WAL changes 374 00:26:20,830 --> 00:26:24,350 affects that query that is being executed on the standby. 375 00:26:24,520 --> 00:26:27,090 So, again, on the master, I 376 00:26:27,110 --> 00:26:29,680 deleted a thing. 377 00:26:29,680 --> 00:26:32,320 I deleted a table. That's a little bit harsh. 378 00:26:32,320 --> 00:26:34,900 But let's say I deleted a few rows. 379 00:26:34,900 --> 00:26:35,220 Right. 380 00:26:36,480 --> 00:26:44,520 And now, on the standby, I'm actually querying those rows that are being deleted on the master. 381 00:26:44,520 --> 00:26:46,030 I am on a different replica.
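Going back to the statement-based versus physical trade-off from a moment ago, here is a minimal sketch in Python, not how either database actually implements replication. The table is just a dict, and names like `run_update` and `apply_row_changes` are made up for illustration. It only shows the shape of the trade-off: the statement is tiny to ship but the replica repeats the scan, while shipping the row changes is bulkier but the replica only touches the changed rows.

```python
# Toy model only: a "table" is a dict of row id -> value, and an UPDATE is a
# predicate plus a new value. None of these names come from MySQL or Postgres.

def run_update(table, predicate, new_value):
    """Execute the statement: scan every row and change the matching ones.

    Returns (row_changes, rows_examined).
    """
    row_changes = [(row_id, new_value)
                   for row_id, value in table.items() if predicate(value)]
    for row_id, value in row_changes:
        table[row_id] = value
    return row_changes, len(table)


def apply_row_changes(table, row_changes):
    """Apply already-computed changes: no scan, no predicate evaluation."""
    for row_id, value in row_changes:
        table[row_id] = value
    return len(row_changes)


def make_table():
    return {i: i % 100 for i in range(1000)}


def matches(value):
    return value == 7


primary = make_table()
changes, _ = run_update(primary, matches, -1)

# Statement-style replica: receives one small statement, redoes the full scan.
statement_replica = make_table()
_, statement_rows_examined = run_update(statement_replica, matches, -1)

# Physical-style replica: receives every changed row, touches only those rows.
physical_replica = make_table()
physical_rows_touched = apply_row_changes(physical_replica, changes)

print(statement_rows_examined, physical_rows_touched)  # 1000 10
```

Both replicas end up identical to the primary. What differs is where the cost lands: bytes on the wire for the physical style, repeated execution work for the statement style.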
382 00:26:46,650 --> 00:26:54,420 So now the master is pushing the WAL changes to the standby while that query 383 00:26:54,420 --> 00:26:57,550 that's querying those deleted rows is being executed. 384 00:26:57,720 --> 00:26:59,780 What should Postgres do? 385 00:27:00,090 --> 00:27:05,230 You tell me, as the viewer or listener, what do you think should happen here? 386 00:27:06,750 --> 00:27:10,320 Should Postgres immediately cancel the query, 387 00:27:12,040 --> 00:27:14,380 right, and write the changes? 388 00:27:15,500 --> 00:27:22,700 Or should the WAL changes be paused until the query finishes? 389 00:27:24,320 --> 00:27:27,350 If you think about it, there are no other choices, right? You have to pause it. 390 00:27:27,620 --> 00:27:30,170 Obviously you're not pausing all WAL changes. 391 00:27:30,860 --> 00:27:34,880 You're only pausing changes that affect a running transaction. 392 00:27:34,880 --> 00:27:37,250 And that's another thing to worry about. 393 00:27:37,250 --> 00:27:42,580 How the heck do I know that the query being executed is actually affected by my WAL changes? 394 00:27:43,160 --> 00:27:45,220 Building databases is not easy. 395 00:27:45,230 --> 00:27:47,420 Guys, look at all this complexity. 396 00:27:48,050 --> 00:27:54,560 So they are complaining here that, you guys don't have MVCC support, because what you're doing is, 397 00:27:55,450 --> 00:28:02,260 what Postgres does effectively is essentially have a timeout. It says, hey, we're going 398 00:28:02,260 --> 00:28:06,970 to block the WAL changes for a given time, and they make this time configurable. 399 00:28:07,970 --> 00:28:15,670 If the query didn't finish in this amount of time, we are sorry, we're going to cancel, not those changes, 400 00:28:16,180 --> 00:28:21,580 we're going to cancel that query that is actually doing the reading.
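The wait-then-cancel behaviour just described can be written down as a tiny decision function. This is a toy sketch, not Postgres's actual logic: the function name and its arguments are made up, and it measures the delay from the moment the conflict starts, which is a simplification. The real knob on a streaming standby is the `max_standby_streaming_delay` setting.

```python
# Toy model of a standby resolving a conflict between WAL replay and a
# running read query. Names are illustrative, not Postgres internals.

def resolve_conflict(query_remaining_s, max_standby_delay_s):
    """Return (outcome, replay_paused_for_s).

    WAL replay waits for the conflicting query, but only up to the configured
    delay. After that the query is cancelled and replay continues.
    """
    if query_remaining_s <= max_standby_delay_s:
        return "query_finished", query_remaining_s
    return "query_cancelled", max_standby_delay_s


short_query = resolve_conflict(query_remaining_s=5, max_standby_delay_s=30)
long_query = resolve_conflict(query_remaining_s=120, max_standby_delay_s=30)

print(short_query)  # ('query_finished', 5)
print(long_query)   # ('query_cancelled', 30)
```

Raising the delay protects long reads on the standby at the price of more replication lag. Lowering it keeps the standby fresher and kills more queries.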
401 00:28:22,000 --> 00:28:25,930 And we're going to force applying the changes. 402 00:28:25,930 --> 00:28:26,200 Why? 403 00:28:26,200 --> 00:28:36,490 Because Postgres is designed to favor eventual consistency over, let's say, just read queries in 404 00:28:36,490 --> 00:28:37,030 this case. 405 00:28:37,690 --> 00:28:39,860 So I'd rather be eventually consistent. 406 00:28:40,060 --> 00:28:43,150 Remember, this is eventual consistency as well. 407 00:28:43,150 --> 00:28:47,590 So stop saying that NoSQL is the only kind of database that has eventual consistency. 408 00:28:47,670 --> 00:28:50,200 Every database has it, as long as it's between replicas. 409 00:28:50,200 --> 00:28:50,470 Right. 410 00:28:51,130 --> 00:28:53,320 Relational doesn't within the same, 411 00:28:53,440 --> 00:28:54,280 same instance. 412 00:28:54,280 --> 00:28:55,750 Yes, there it's completely consistent. 413 00:28:55,750 --> 00:28:59,860 But across replicas there is always this idea of eventual consistency. 414 00:29:00,310 --> 00:29:04,630 So what this does is actually kill the transaction, and they did not like that. 415 00:29:05,230 --> 00:29:07,360 So let's read this a little bit. 416 00:29:07,570 --> 00:29:14,560 I kind of don't agree with the statement, but: this design means that replicas can routinely 417 00:29:14,560 --> 00:29:20,370 lag seconds behind master, obviously, and therefore it is easy to write code that results in killed transactions. 418 00:29:20,640 --> 00:29:21,120 Hmm. 419 00:29:21,880 --> 00:29:22,540 What does that mean? 420 00:29:23,200 --> 00:29:28,120 This problem might not be apparent to application developers writing code that obscures where the 421 00:29:28,120 --> 00:29:33,070 transactions start and end. For instance, say a developer has some code that has to email a receipt to a 422 00:29:33,070 --> 00:29:33,460 user.
423 00:29:33,640 --> 00:29:41,170 Depending on how it's written, the code may implicitly have a database transaction that's held open 424 00:29:41,170 --> 00:29:42,610 until the email finishes. 425 00:29:43,070 --> 00:29:44,590 That's just a bad idea, right? 426 00:29:46,210 --> 00:29:53,770 You don't hold a transaction open while you do stuff that has nothing to do with 427 00:29:53,770 --> 00:29:54,670 the transaction itself. 428 00:29:54,700 --> 00:29:56,500 Try to avoid that as much as possible. 429 00:29:56,510 --> 00:29:58,630 So that's just a best practice. 430 00:30:00,010 --> 00:30:06,710 While it's always bad form to let your code hold open database transactions while performing unrelated 431 00:30:06,730 --> 00:30:08,090 blocking I/O. OK. 432 00:30:08,480 --> 00:30:13,120 Thankfully, they mention that. The reality is that most engineers are not database experts and may not 433 00:30:13,120 --> 00:30:14,440 always understand this problem. 434 00:30:15,640 --> 00:30:18,060 I have to disagree with this one again, guys. 435 00:30:18,550 --> 00:30:24,400 If you know me from this channel or the podcast, you 436 00:30:24,400 --> 00:30:31,690 know that, as an engineer, you have to take pride in your work and the things that you interface 437 00:30:31,690 --> 00:30:31,930 with. 438 00:30:32,050 --> 00:30:34,910 I believe that you have to understand what you're communicating with. 439 00:30:35,230 --> 00:30:42,700 So, yeah, engineers are not database experts, but this does not qualify as database expertise. 440 00:30:42,730 --> 00:30:46,120 This is just basic transaction management, in my opinion. 441 00:30:46,540 --> 00:30:49,720 And I believe engineers have to understand this. 442 00:30:50,500 --> 00:30:50,830 Right. 443 00:30:51,220 --> 00:30:57,040 And engineers have to understand, you know. You might not be as radical as me.
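Back to the email receipt example for a second: the fix is mostly about ordering. Here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for whatever database driver you use. The `receipts` table and the `send_receipt_email` helper are hypothetical, made up for this illustration. The idea is to commit first and only then do the slow, unrelated I/O.

```python
import sqlite3

outbox = []  # stands in for a mail service in this sketch


def send_receipt_email(user_id, amount):
    """Hypothetical stand-in for a slow network call to a mail service."""
    outbox.append((user_id, amount))


def record_and_email_receipt(conn, user_id, amount):
    # Keep the transaction as short as possible: only database work inside.
    with conn:  # sqlite3 commits on success and rolls back on an exception
        conn.execute(
            "INSERT INTO receipts (user_id, amount) VALUES (?, ?)",
            (user_id, amount),
        )
    # The transaction is already committed here, so the slow call below
    # cannot keep it open.
    still_open = conn.in_transaction
    send_receipt_email(user_id, amount)
    return still_open


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipts (user_id INTEGER, amount REAL)")

held_open = record_and_email_receipt(conn, 42, 19.99)
receipt_count = conn.execute("SELECT COUNT(*) FROM receipts").fetchone()[0]
print(held_open, receipt_count, outbox)
```

The trade-off is that the email can now fail after the row is committed, so a real system would need a retry or an outbox table for that case.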
444 00:30:57,340 --> 00:30:59,830 I don't like to work with anything that I don't understand. 445 00:30:59,980 --> 00:31:02,200 If it's a black box, I don't like to work with it. 446 00:31:02,470 --> 00:31:10,570 Before I pick a tool, I have to understand fully how it actually works, fully, from zero to 100 447 00:31:10,570 --> 00:31:10,950 percent. 448 00:31:11,470 --> 00:31:17,710 If I'm just working with it, if I'm connecting to it, if I'm interfacing with it, it's OK if I 449 00:31:17,710 --> 00:31:19,510 understand 80, 70 percent of the tool. 450 00:31:20,500 --> 00:31:23,170 But again, I'm not going to understand every single thing in that case. 451 00:31:23,170 --> 00:31:23,400 Right. 452 00:31:23,710 --> 00:31:24,580 But that's just me. 453 00:31:24,830 --> 00:31:27,100 You might have a different opinion. Postgres upgrades. 454 00:31:27,100 --> 00:31:29,440 So I kind of agree with them on this one. 455 00:31:29,740 --> 00:31:32,230 I tried to upgrade Postgres many times. 456 00:31:32,230 --> 00:31:37,930 I never found the right tutorial, or it was so complicated that I gave up. 457 00:31:37,930 --> 00:31:38,320 Right. 458 00:31:38,620 --> 00:31:41,400 And they kind of reiterate the same problem. 459 00:31:41,410 --> 00:31:48,630 So I have to agree with them 100 percent on this. Postgres upgrades are really painful, really painful. 460 00:31:48,640 --> 00:31:54,430 I've been there. I've been there, from 9.3 to 9.4, 9.4 to 9.5. 461 00:31:54,760 --> 00:31:56,860 I then just gave up. 462 00:31:56,860 --> 00:32:00,700 I'd rather just recreate my databases from scratch after that. 463 00:32:00,820 --> 00:32:08,800 Obviously I'm running a test database here. I didn't run a production database that I had 464 00:32:08,800 --> 00:32:09,490 to upgrade. 465 00:32:09,490 --> 00:32:15,010 But what I'm going to do in this case is just, obviously there is a way, but
466 00:32:15,910 --> 00:32:18,440 apparently this way sometimes works, sometimes it doesn't. 467 00:32:19,030 --> 00:32:24,070 So there is also the pglogical way, right. 468 00:32:24,660 --> 00:32:27,190 There are some tools that allow you to do upgrades. 469 00:32:28,030 --> 00:32:28,410 Right. 470 00:32:28,420 --> 00:32:35,170 And guys, if you know any of that stuff, if you have ever upgraded a Postgres database smoothly, 471 00:32:35,170 --> 00:32:37,000 let me know in the comments section below. 472 00:32:37,030 --> 00:32:38,840 I'd love to know how to do it. 473 00:32:39,550 --> 00:32:44,820 I tried twice, I believe, and I gave up and said, you know, this is not straightforward at all. 474 00:32:45,520 --> 00:32:48,070 And I wasn't forced to do it. 475 00:32:48,070 --> 00:32:49,390 So I took the 476 00:32:49,390 --> 00:32:57,940 easy way out of recreating my databases. OK, the architecture of MySQL and InnoDB, and whatever. We talked 477 00:32:57,940 --> 00:33:01,840 about it. I'd have you guys check out the video right here if you want to learn more about it. 478 00:33:01,840 --> 00:33:03,270 But maybe what? 479 00:33:04,090 --> 00:33:07,300 So they go now through their own description of MySQL compared to Postgres. 480 00:33:07,300 --> 00:33:08,620 So InnoDB, 481 00:33:08,620 --> 00:33:10,360 or just MySQL in general. 482 00:33:11,890 --> 00:33:18,620 MySQL InnoDB in general, that's the right way of saying it. You have the primary key, and the primary 483 00:33:18,640 --> 00:33:26,530 key has a pointer directly to the row, to the physical data on disk. All the indexes that you 484 00:33:26,530 --> 00:33:30,220 create point back to the primary key.
485 00:33:30,340 --> 00:33:37,540 And that's the powerful thing here for them, because now if I update anything on the row, 486 00:33:38,200 --> 00:33:43,930 only the primary key index needs to be updated to know the new kind of row ID, and even that, 487 00:33:44,470 --> 00:33:44,860 right, 488 00:33:44,860 --> 00:33:50,140 it's a little bit different, but I don't have to touch my secondary indexes. 489 00:33:50,140 --> 00:33:50,380 Right. 490 00:33:51,970 --> 00:33:54,710 That being said, guys, 491 00:33:55,300 --> 00:34:00,160 that's not always true. If you're updating a field that has no index, 492 00:34:01,200 --> 00:34:01,770 then, 493 00:34:03,750 --> 00:34:04,290 right, 494 00:34:05,220 --> 00:34:10,500 you get to touch only the primary key. But if you updated a field that has an index, you've got 495 00:34:10,500 --> 00:34:11,270 to touch both. 496 00:34:11,490 --> 00:34:13,470 So they didn't mention that, but, 497 00:34:14,830 --> 00:34:19,430 yeah, right, because it's a very defensive article, right? MySQL is perfect. 498 00:34:20,230 --> 00:34:25,930 Yeah, if you updated the actual field that has a secondary index, you have to update that index. 499 00:34:25,930 --> 00:34:27,760 You just updated a value. 500 00:34:27,940 --> 00:34:30,580 So you have to go to your index and change the tree 501 00:34:30,590 --> 00:34:32,380 so that it includes this value, right. 502 00:34:32,800 --> 00:34:35,380 So, yeah, you might touch a lot of fields. 503 00:34:35,380 --> 00:34:35,640 Right. 504 00:34:36,160 --> 00:34:40,210 And if you touch a lot of fields, then you have to update all their indexes. 505 00:34:40,210 --> 00:34:40,420 Right. 506 00:34:40,600 --> 00:34:46,540 It's just that, by design, it's less. If you have a lot of indexes, you have fewer changes in general. 507 00:34:46,720 --> 00:34:46,960 Right.
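To put numbers on that, here is a rough counting sketch in Python. It is a toy model under stated assumptions, not either engine's real behaviour. The first function models indexes that point at a physical row location, where every update creates a new row version, the way the article describes Postgres (it ignores Postgres's heap-only tuple optimization, which can skip index writes when no indexed column changes). The second models secondary indexes that point at the primary key, InnoDB style. The function and column names are invented for the example.

```python
# Toy model: count how many secondary indexes must be written for one UPDATE.

def index_writes_with_location_pointers(indexed_columns, changed_columns):
    """Indexes store a physical row location. A new row version lives at a
    new location, so every index needs a new entry, whatever changed."""
    return len(indexed_columns)


def index_writes_with_primary_key_pointers(indexed_columns, changed_columns):
    """Secondary indexes store the primary key. Only indexes whose own column
    changed need to be rewritten."""
    return len(set(indexed_columns) & set(changed_columns))


indexed = ["email", "city", "last_login"]

# Update a column that has no index.
unindexed_update = (
    index_writes_with_location_pointers(indexed, ["balance"]),
    index_writes_with_primary_key_pointers(indexed, ["balance"]),
)

# Update a column that does have an index.
indexed_update = (
    index_writes_with_location_pointers(indexed, ["city"]),
    index_writes_with_primary_key_pointers(indexed, ["city"]),
)

print(unindexed_update)  # (3, 0)
print(indexed_update)    # (3, 1)
```

So the primary-key-pointer design is never worse in this count, but it is not zero either once the update hits an indexed field, which is the caveat above.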
508 00:34:47,320 --> 00:34:54,070 So as a result, this translates to, obviously, fewer raw WAL changes, because they don't have as 509 00:34:54,070 --> 00:34:57,490 many changes in the logical-to-physical translation. 510 00:34:58,150 --> 00:35:04,480 And now they talk about the rollback mechanism here. In MySQL they have this concept of rollback 511 00:35:04,480 --> 00:35:05,100 segments. 512 00:35:05,110 --> 00:35:10,450 So instead of inserting a row in the heap itself, 513 00:35:12,910 --> 00:35:19,330 when you update a row in Postgres, you insert a row in the heap, in the table itself, 514 00:35:19,480 --> 00:35:19,780 right? 515 00:35:20,320 --> 00:35:22,720 MySQL InnoDB does it differently. 516 00:35:22,900 --> 00:35:28,840 It copies the old version to some other place called the rollback segments, 517 00:35:29,020 --> 00:35:30,070 the undo logs. 518 00:35:30,250 --> 00:35:30,550 Right. 519 00:35:30,790 --> 00:35:31,930 And they keep it all there. 520 00:35:32,540 --> 00:35:37,180 And then, based on that, they point to that location in the rollback segments. 521 00:35:37,510 --> 00:35:37,760 All right. 522 00:35:37,780 --> 00:35:40,240 So it's a little bit different architecture. 523 00:35:40,630 --> 00:35:44,830 So if you query now, if you want the latest, the latest is always there. 524 00:35:45,010 --> 00:35:46,060 So that's the beautiful thing. 525 00:35:46,300 --> 00:35:51,340 But based on your transaction ID, if you are coming from the past, you're querying from the past, 526 00:35:51,550 --> 00:35:52,900 you want the old results. 527 00:35:53,140 --> 00:35:59,770 You have to do the jump to go back to get the old version. This jump doesn't exist in Postgres. 528 00:35:59,780 --> 00:36:04,450 So queries that are concurrent are fast on Postgres.
529 00:36:04,690 --> 00:36:11,140 They are technically slower in MySQL, because now you have to jump back and go through different 530 00:36:11,140 --> 00:36:13,570 places to do the query. 531 00:36:13,960 --> 00:36:14,410 Right. 532 00:36:15,010 --> 00:36:16,780 And vice versa. 533 00:36:18,340 --> 00:36:22,300 So they explain how the secondary indexes point to the primary index, and the primary 534 00:36:22,300 --> 00:36:23,080 index points to the disk. 535 00:36:23,110 --> 00:36:25,360 This is for people listening on the podcast. 536 00:36:25,510 --> 00:36:26,380 We're listening. 537 00:36:26,680 --> 00:36:27,610 We're, what? 538 00:36:28,210 --> 00:36:28,690 What is it? 539 00:36:28,900 --> 00:36:35,230 We're looking at a picture of a secondary index pointing to the primary index and then the primary index pointing 540 00:36:35,230 --> 00:36:35,830 to the disk. 541 00:36:36,190 --> 00:36:37,150 That's just an extra layer. 542 00:36:37,660 --> 00:36:44,950 And then they say here, in the replication section, that MySQL supports multiple replication 543 00:36:45,340 --> 00:36:47,590 modes: statement-based and 544 00:36:48,530 --> 00:36:55,520 WAL changes. And the moment you implement any of these, but say you implement 545 00:36:55,520 --> 00:37:03,470 statement-based replication, you have true MVCC support, because now the statement, the change 546 00:37:03,470 --> 00:37:12,350 that is coming to you from the master to the standby, is just another write. Consider it 547 00:37:12,350 --> 00:37:15,040 another transaction trying to be executed. 548 00:37:15,380 --> 00:37:19,700 So you will have true MVCC support. 549 00:37:19,700 --> 00:37:21,920 And in that case it will not be blocking. 550 00:37:22,070 --> 00:37:22,390 Right. 551 00:37:22,580 --> 00:37:26,090 Because you can technically query and write at the same time.
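The rollback segment idea from a minute ago, where the latest version sits in place and older versions are a jump away, can be sketched like this. It is a toy model in Python, assuming that visibility is a plain comparison of transaction IDs and that every transaction has committed. Real engines track much more than that, and the `UndoRow` class is made up for illustration.

```python
# Toy model of undo-log style versioning: the newest version lives in the row,
# older versions live in an undo chain that readers walk backwards.

class UndoRow:
    def __init__(self, value, txid):
        self.value = value
        self.txid = txid
        self.undo = []  # older versions as (value, txid), newest first

    def update(self, value, txid):
        # Copy the current version into the undo chain, then overwrite in place.
        self.undo.insert(0, (self.value, self.txid))
        self.value = value
        self.txid = txid

    def read(self, snapshot_txid):
        """Return (value, hops) for a reader whose snapshot is snapshot_txid."""
        if self.txid <= snapshot_txid:
            return self.value, 0  # latest version is visible: no jump
        for hops, (value, txid) in enumerate(self.undo, start=1):
            if txid <= snapshot_txid:
                return value, hops  # older reader: pays for the jumps
        raise LookupError("row did not exist for this snapshot")


row = UndoRow("v1", txid=1)
row.update("v2", txid=5)
row.update("v3", txid=9)

latest = row.read(snapshot_txid=10)
older = row.read(snapshot_txid=6)
oldest = row.read(snapshot_txid=2)
print(latest, older, oldest)  # ('v3', 0) ('v2', 1) ('v1', 2)
```

A reader on a current snapshot pays nothing extra. A reader on an old snapshot pays one hop per newer version, which is the cost being described for MySQL.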
552 00:37:26,310 --> 00:37:32,330 And now, as a result, you can implement the same exact thing that you're doing, because you have a logical 553 00:37:32,330 --> 00:37:34,360 view of what is changing. 554 00:37:34,370 --> 00:37:39,980 As a result, the database is aware of the change. It can implement MVCC at the higher level. 555 00:37:40,240 --> 00:37:40,580 All right. 556 00:37:41,270 --> 00:37:42,830 Even through replication. 557 00:37:45,020 --> 00:37:50,000 Postgres does support that. There is a third-party tool that you can install that does exactly that. You can 558 00:37:50,000 --> 00:37:52,470 do that. They just didn't mention that. 559 00:37:52,520 --> 00:37:54,230 Oh, and this is an old article. 560 00:37:54,230 --> 00:37:57,710 So things can change, obviously, right. 561 00:37:58,520 --> 00:37:59,390 In my article. 562 00:38:00,230 --> 00:38:07,640 And they say that, oh, by the way, even the WAL sizes are so small, because 563 00:38:08,030 --> 00:38:10,880 we're changing very few things. 564 00:38:10,880 --> 00:38:12,410 You know, they go through all of that stuff. 565 00:38:13,940 --> 00:38:18,120 I'm not going to go through that, but that's essentially their advantage. 566 00:38:18,590 --> 00:38:25,520 They go through another advantage of MySQL, the buffer pool. The buffer pool is the caching 567 00:38:25,520 --> 00:38:26,150 mechanism 568 00:38:26,630 --> 00:38:32,780 in PostgreSQL, no, the buffer pool is the caching mechanism in 569 00:38:33,820 --> 00:38:39,450 MySQL, compared to the cache mechanism in Postgres, which is basically the RSS memory. 570 00:38:39,460 --> 00:38:39,790 Right. 571 00:38:40,750 --> 00:38:42,860 And they explain the difference here.
572 00:38:42,880 --> 00:38:49,810 They claim that Postgres uses different operating system calls, 573 00:38:49,810 --> 00:38:52,070 like they are using two calls instead of one. 574 00:38:52,810 --> 00:38:54,340 I don't know much about that, to be honest. 575 00:38:54,340 --> 00:38:59,620 I'm not an expert in operating systems, but a lot of people say that here you should use one call 576 00:38:59,620 --> 00:39:03,250 to seek and read at the same time instead of seeking and then reading. 577 00:39:03,670 --> 00:39:08,460 I don't know, maybe Postgres actually changed this. A lot of people here listening and watching this channel, 578 00:39:08,490 --> 00:39:14,320 some people actually are experts in this thing and might correct that part, but I'm not aware of 579 00:39:14,320 --> 00:39:15,880 that. 580 00:39:15,880 --> 00:39:18,190 So I can't comment much more on that part. 581 00:39:19,690 --> 00:39:26,410 Then, the InnoDB storage engine implements a least-recently-used buffer pool, which you 582 00:39:26,410 --> 00:39:28,490 can apparently control. 583 00:39:28,540 --> 00:39:30,490 I'm surprised that you cannot control the 584 00:39:30,490 --> 00:39:31,870 cache size in Postgres. 585 00:39:31,870 --> 00:39:33,700 I need to read more about that a little bit. 586 00:39:34,480 --> 00:39:36,230 But that's another thing that they said. 587 00:39:36,250 --> 00:39:38,050 Oh, there's another advantage of MySQL. 588 00:39:38,830 --> 00:39:45,970 Another thing they mention is the connection handling. In MySQL, there's a thread per connection. Every TCP connection 589 00:39:45,970 --> 00:39:46,480 you 590 00:39:46,480 --> 00:39:50,470 open to MySQL is a thread on the server side. 591 00:39:50,470 --> 00:39:52,630 However, in Postgres it's an actual process. 592 00:39:52,630 --> 00:39:57,880 So technically, now, they claim, obviously:
593 00:39:57,940 --> 00:40:01,780 A thread is cheaper to spin up than a process. 594 00:40:02,800 --> 00:40:09,640 I read that this is no longer true, because spinning up a process costs almost the same now, but it could be, 595 00:40:09,970 --> 00:40:11,050 back in the days, 596 00:40:11,050 --> 00:40:12,580 it could be that was true. 597 00:40:13,270 --> 00:40:18,640 But now, if you think about it, to scale to 10,000 connections, right, if you think about it, 598 00:40:19,690 --> 00:40:24,650 opening a lot of TCP connections is just a bad idea. 599 00:40:24,850 --> 00:40:28,500 So that's why we have the idea of connection pooling, right? 600 00:40:28,570 --> 00:40:31,900 We build our application so that it uses a pool. 601 00:40:33,090 --> 00:40:39,210 Reserve a connection from the pool, execute the transaction, and then return it 602 00:40:39,210 --> 00:40:39,710 to the pool. 603 00:40:40,260 --> 00:40:48,060 And if you're doing a single atomic statement, you can just execute on the pool directly, 604 00:40:48,060 --> 00:40:53,160 say, hey, pick any connection, any instance in the pool, execute, and then return. 605 00:40:53,160 --> 00:41:00,660 Immediately. This reserve and release also goes back to their queries, if they have a case that spans 606 00:41:00,660 --> 00:41:02,370 three, four, five, seven minutes. 607 00:41:04,000 --> 00:41:04,390 And, 608 00:41:05,710 --> 00:41:12,820 again, nothing wrong with a query, with a transaction that runs long, if you're actually doing all 609 00:41:12,830 --> 00:41:19,030 database work. I've seen transactions that take 30 minutes. 610 00:41:19,180 --> 00:41:24,100 Just because it does a lot of work, it changes a lot. 611 00:41:24,100 --> 00:41:26,830 And these changes have to be atomic, right?
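The reserve, execute, release cycle described a moment ago is small enough to sketch. This is a minimal pool in Python built on the standard `queue` module, with `sqlite3` in-memory connections standing in for real database connections. The `ConnectionPool` class is an illustration, not the API of PgBouncer or of any driver.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def reserve(self, timeout=None):
        # Blocks (up to timeout) when every connection is already reserved.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back

    def idle_count(self):
        return self._idle.qsize()


pool = ConnectionPool(size=2, connect=lambda: sqlite3.connect(":memory:"))

with pool.reserve() as conn:
    idle_while_reserved = pool.idle_count()
    result = conn.execute("SELECT 1").fetchone()[0]

idle_after_release = pool.idle_count()
print(idle_while_reserved, result, idle_after_release)  # 1 1 2
```

The longer a transaction holds its reservation, the fewer connections are left for everyone else, which is why long transactions and small pools do not mix well.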
612 00:41:27,040 --> 00:41:33,220 Yeah, you can argue that you can break it up, even that you have to break this transaction into smaller and 613 00:41:33,220 --> 00:41:38,980 smaller pieces so that each piece can be executed in its own 614 00:41:40,320 --> 00:41:43,230 atomic manner, right, so you can minimize the transaction. 615 00:41:43,710 --> 00:41:49,500 So this also means that if you have long-running transactions, then you have to really think 616 00:41:49,500 --> 00:41:53,300 about how the reservation and connection pooling works, 617 00:41:53,310 --> 00:41:53,440 right, 618 00:41:53,550 --> 00:41:55,200 so, the number of connections. 619 00:41:55,880 --> 00:41:56,210 Right. 620 00:41:56,880 --> 00:41:57,560 Think about it. 621 00:41:57,660 --> 00:42:03,540 So, if the client's not using a connection, then don't let them open a connection and 622 00:42:03,540 --> 00:42:05,400 just leave it open. 623 00:42:05,740 --> 00:42:06,630 Use connection pooling. 624 00:42:06,660 --> 00:42:13,620 And they say they use, I believe, PgBouncer. They're using some service that actually does 625 00:42:13,620 --> 00:42:15,850 that connection pooling. 626 00:42:15,900 --> 00:42:17,340 But a lot of applications do it. 627 00:42:17,340 --> 00:42:20,340 Even if you don't, you can build your own layer on top. 628 00:42:20,340 --> 00:42:25,680 And I showed connection pooling on Postgres many times on this channel, right, this idea. 629 00:42:25,680 --> 00:42:28,530 Guys, and hopefully, hopefully in the future, 630 00:42:29,160 --> 00:42:32,380 and we're almost at the end of the article, obviously, guys, 631 00:42:32,410 --> 00:42:32,690 right, 632 00:42:34,400 --> 00:42:41,930 almost at the end of the article, but hopefully, when it comes to connection pooling, I really hope that QUIC 633 00:42:41,930 --> 00:42:43,040 as a protocol,
634 00:42:44,250 --> 00:42:49,920 and MASQUE, I believe they're working on a new protocol right now called MASQUE, that will allow 635 00:42:49,920 --> 00:42:59,850 you to kind of open multiple streams on a given TCP connection, or UDP connection in the 636 00:42:59,850 --> 00:43:00,570 case of QUIC, 637 00:43:01,840 --> 00:43:03,130 that represent your 638 00:43:04,280 --> 00:43:11,390 database connections. So if MySQL or Postgres supported QUIC, and I don't see 639 00:43:11,390 --> 00:43:18,710 a reason why not, then the clients can open a single, and remember, the clients are always a web server or something 640 00:43:18,710 --> 00:43:23,930 like that, open a single connection and have up to 200, 641 00:43:23,930 --> 00:43:28,060 even more than that, streams concurrently in a single connection. 642 00:43:28,910 --> 00:43:32,340 The only trick here is the database has to understand the idea of a stream. 643 00:43:32,360 --> 00:43:33,620 So that's a lot of work. 644 00:43:33,620 --> 00:43:35,000 But I believe it's going to be 645 00:43:36,570 --> 00:43:44,830 really lucrative for a database to implement a protocol like that. Just, like, I don't really need TCP 646 00:43:44,850 --> 00:43:45,550 anymore, right? 647 00:43:45,690 --> 00:43:54,150 It's just a wasteful thing to have a single TCP connection for a given client, or connection 648 00:43:54,150 --> 00:43:54,450 pooling. 649 00:43:54,480 --> 00:43:58,680 This has to go away and we have to move to a model where we multiplex
650 00:43:59,680 --> 00:44:06,730 queries in a single TCP connection using this protocol. Even if they implemented 651 00:44:06,730 --> 00:44:11,320 their own. They don't have to use QUIC, just implement your own protocol that supports multiplexing, 652 00:44:11,830 --> 00:44:19,720 true multiplexing, so that every request, every session, every channel has its own logical representation 653 00:44:19,720 --> 00:44:21,660 in that TCP connection that you open. 654 00:44:21,670 --> 00:44:23,170 So this way you don't have to open 655 00:44:23,170 --> 00:44:24,280 many connections. 656 00:44:24,280 --> 00:44:26,770 You just have to open one or a few of them. 657 00:44:27,040 --> 00:44:30,370 And each one of them has basically some limit. 658 00:44:30,910 --> 00:44:37,300 Obviously, that doesn't come for free, because now you just increased the CPU usage at the back 659 00:44:37,300 --> 00:44:42,890 end and the front end, because now you have to assemble these channels and streams. 660 00:44:42,910 --> 00:44:45,460 That's the problem with HTTP/2 and QUIC. People like 661 00:44:46,000 --> 00:44:49,360 Lucas Pardue and, what's his name, 662 00:44:49,360 --> 00:44:50,230 Chris Wood, 663 00:44:50,230 --> 00:44:52,420 and people working on the QUIC protocol, 664 00:44:52,420 --> 00:44:59,020 they're trying to solve this problem with the CPU usage, because now you're not 665 00:44:59,020 --> 00:45:03,090 working with just a stream of content coming from a TCP socket. 666 00:45:03,100 --> 00:45:08,560 No, you have to actually look at the data and then arrange the packets so they are in logical streams 667 00:45:08,560 --> 00:45:11,710 or channels, and then deliver to the app. 668 00:45:11,710 --> 00:45:14,950 So the operating system or the application.
669 00:45:15,940 --> 00:45:24,130 Wherever this thing lives, it's doing extra work. So again, I'm sorry about that segue, but I wanted to 670 00:45:24,130 --> 00:45:26,440 discuss that a little bit. I think that's just an idea 671 00:45:26,440 --> 00:45:29,050 that is just great. Conclusion. 672 00:45:29,320 --> 00:45:35,890 Obviously, they say, hey, Postgres served us well in the early days of Uber, but we ran into 673 00:45:35,890 --> 00:45:38,730 significant problems scaling Postgres with our growth. 674 00:45:39,190 --> 00:45:44,770 Today we have some legacy Postgres instances, but the bulk of our databases are either built on top of 675 00:45:44,770 --> 00:45:47,400 MySQL, typically using our Schemaless layer. 676 00:45:47,440 --> 00:45:48,250 That's another point. 677 00:45:49,450 --> 00:45:51,640 You have, you know, Schemaless. 678 00:45:54,500 --> 00:46:02,630 You have Schemaless, and it's using MySQL. Maybe there is something I'm missing here, but it does not 679 00:46:02,630 --> 00:46:05,140 seem natural to me. 680 00:46:07,430 --> 00:46:13,610 A lot of people use Postgres as schemaless, where they put a chunk of JSON in a single field 681 00:46:13,610 --> 00:46:17,690 as JSONB, and they work on that. 682 00:46:17,690 --> 00:46:23,660 But maybe that's just the way forward, because if they have a lot of fields and they have a lot 683 00:46:23,660 --> 00:46:26,600 of indexes on those fields, maybe that's the way to go. 684 00:46:27,230 --> 00:46:27,830 Who knows? 685 00:46:28,590 --> 00:46:28,900 Right. 686 00:46:29,270 --> 00:46:31,390 Again, guys, what do you think? 687 00:46:31,400 --> 00:46:32,590 What do you think about all this stuff? 688 00:46:33,200 --> 00:46:34,520 Let me know in the comments section below. 689 00:46:35,060 --> 00:46:36,180 I'm going to see you in the next one. 690 00:46:36,210 --> 00:46:37,560 Hope you enjoyed this video.
691 00:46:37,880 --> 00:46:40,730 Give it a like if you do and share with your friends. 692 00:46:41,060 --> 00:46:43,220 I'm going to see you in the next one. 693 00:46:43,880 --> 00:46:44,450 Thank you, 694 00:46:44,600 --> 00:46:46,400 Evan Klitzke. 695 00:46:46,430 --> 00:46:50,120 Klitzke, a staff engineer 696 00:46:50,120 --> 00:46:50,760 at Uber Engineering. 697 00:46:50,810 --> 00:46:52,150 This is a great article again. 698 00:46:52,700 --> 00:46:53,120 Yeah. 699 00:46:53,360 --> 00:47:00,500 And things have been changing a lot in the Uber world. 700 00:47:01,340 --> 00:47:06,290 But this is, again, a historical article that goes back years and years, 701 00:47:06,290 --> 00:47:08,630 and we had to discuss it, so, 702 00:47:09,930 --> 00:47:10,810 thank you so much. 703 00:47:10,880 --> 00:47:13,050 Appreciate you. I'm going to see you in the next one. 704 00:47:13,080 --> 00:47:13,740 You guys stay awesome.