1
00:00:00,060 --> 00:00:01,210
‫What is going on, guys?

2
00:00:01,230 --> 00:00:11,910
My name is Hussein, and this is an old but gold article: Uber Engineering switched from Postgres

3
00:00:11,910 --> 00:00:19,140
to MySQL. This article was published on July 26, 2016.

4
00:00:19,140 --> 00:00:29,150
And this article explains why Uber moved from Postgres to MySQL back in the day.

5
00:00:29,160 --> 00:00:35,580
I remember that this article got a lot of backlash from the Postgres community and actually the whole

6
00:00:35,580 --> 00:00:45,720
database community, to be honest, because of how the language used in this article severely criticized

7
00:00:45,720 --> 00:00:48,090
Postgres, as if it's a bad database.

8
00:00:48,120 --> 00:00:48,450
‫Right.

9
00:00:48,510 --> 00:00:50,160
‫They don't even mention that thing.

10
00:00:50,190 --> 00:00:53,310
That is, hey, by the way, guys, this just didn't work for us.

11
00:00:53,670 --> 00:00:55,370
Doesn't mean it won't work for you.

12
00:00:55,800 --> 00:01:01,440
So that was the main reason this article was heavily criticized.

13
00:01:01,440 --> 00:01:04,350
I'm going to reference this article and the Hacker News

14
00:01:07,050 --> 00:01:13,290
thread that discusses it. You're going to find a lot of discussions there; some discussions go into

15
00:01:13,290 --> 00:01:17,480
depth, some of the discussions kind of point out the flaws of this article.

16
00:01:17,880 --> 00:01:24,360
But what I want to do in this video podcast is go through this article and through the main

17
00:01:24,360 --> 00:01:27,570
pain points that Uber had and then discuss them,

18
00:01:27,720 --> 00:01:33,760
and give you my personal opinion: moving from Postgres, did they,

19
00:01:33,780 --> 00:01:38,380
did they have to move from Postgres to MySQL or not? All of that stuff.

20
00:01:38,460 --> 00:01:39,990
‫How about we jump into it, guys?

21
00:01:41,010 --> 00:01:48,150
So, guys, first they explain their architecture right here: they have a monolithic backend

22
00:01:48,150 --> 00:01:54,450
application written in Python that used Postgres for data persistence, and they are moving, again,

23
00:01:54,450 --> 00:01:56,000
this is 2016.

24
00:01:56,020 --> 00:02:02,580
Things change, but they are moving to a microservices architecture and, surprisingly, to

25
00:02:02,580 --> 00:02:09,500
a new system using Schemaless, a novel database sharding layer built on top of MySQL.

26
00:02:09,510 --> 00:02:11,610
‫So we're going to talk about that a little bit.

27
00:02:11,610 --> 00:02:13,860
That, that is a little bit of flak.

28
00:02:14,310 --> 00:02:18,090
You might say, Hussein, why Schemaless and MySQL?

29
00:02:18,090 --> 00:02:19,310
That doesn't make any sense.

30
00:02:19,320 --> 00:02:20,540
Right, exactly.

31
00:02:20,550 --> 00:02:22,110
That confused a lot of people.

32
00:02:22,120 --> 00:02:22,760
Oh, OK.

33
00:02:23,190 --> 00:02:25,170
Why would you pick MySQL?

34
00:02:26,110 --> 00:02:30,610
For that, let's just go with Cassandra, CockroachDB.

35
00:02:30,640 --> 00:02:34,480
I don't think CockroachDB was born back then, but.

36
00:02:34,480 --> 00:02:35,680
‫But Farnam.

37
00:02:37,440 --> 00:02:39,070
Mongo, right?

38
00:02:39,240 --> 00:02:42,810
Anything. But, yeah, again, they have their own reasons.

39
00:02:43,650 --> 00:02:45,300
And that's another article.

40
00:02:45,300 --> 00:02:49,790
But what I want to focus on here is the architecture of Postgres, as they claim.

41
00:02:50,280 --> 00:02:52,710
So, for the people listening,

42
00:02:52,740 --> 00:02:55,620
we're reading now the architecture of Postgres.

43
00:02:55,620 --> 00:03:06,630
And I'm going to read from the article the five main pain points that led Uber to move from Postgres to

44
00:03:06,630 --> 00:03:06,960
MySQL.

45
00:03:06,960 --> 00:03:07,260
‫So.

46
00:03:08,530 --> 00:03:13,810
So this is the article now: we encountered many Postgres limitations.

47
00:03:15,390 --> 00:03:22,320
The first one, inefficient architecture for writes; the second one, inefficient data replication; the

48
00:03:22,320 --> 00:03:29,940
third one, issue with table corruption. Issues with table corruption, issues, not issue. And poor,

49
00:03:30,390 --> 00:03:36,150
the fourth one, poor replica MVCC, multi-version concurrency control, support.

50
00:03:37,230 --> 00:03:42,020
And the fifth one, the final one: difficulty upgrading to newer releases.

51
00:03:42,030 --> 00:03:47,560
I can agree with some of them because I use Postgres and I know how painful it is to upgrade Postgres,

52
00:03:47,580 --> 00:03:50,010
so I can relate to that.

53
00:03:50,250 --> 00:03:54,000
I understand that this is a little bit of an easier process right now.

54
00:03:54,900 --> 00:03:57,300
‫But nevertheless, how about we.

55
00:03:57,570 --> 00:04:01,590
‫I don't agree with all the points, by the way, but I'm just reading to you.

56
00:04:01,590 --> 00:04:02,730
‫I agree with some of them.

57
00:04:02,850 --> 00:04:06,150
Some of them are just, to me, preposterous.

58
00:04:07,170 --> 00:04:08,370
So how about we jump into it?

59
00:04:08,370 --> 00:04:14,700
So they looked through the limitations and they decided to move to MySQL because it solves most of these

60
00:04:14,700 --> 00:04:15,120
problems.

61
00:04:15,120 --> 00:04:21,180
So how about we jump into them, one point after another?

62
00:04:21,180 --> 00:04:25,170
So the first point here is called the on-disk format.

63
00:04:25,500 --> 00:04:35,490
And they are describing in this section the on-disk format of Postgres, which implements

64
00:04:35,490 --> 00:04:37,220
multi-version concurrency control.

65
00:04:37,260 --> 00:04:45,150
And we talked about it many times on this channel: how the actual indexes are stored, how secondary

66
00:04:45,150 --> 00:04:47,380
indexes are stored in Postgres.

67
00:04:47,400 --> 00:04:51,540
How do they implement multi-version concurrency control using the transaction ID?

68
00:04:51,540 --> 00:04:52,160
‫That's the idea.

69
00:04:52,170 --> 00:04:52,500
‫Right.

70
00:04:53,220 --> 00:04:55,290
And then xmax and xmin.

71
00:04:55,290 --> 00:04:57,690
And how a row becomes visible

72
00:04:57,690 --> 00:05:04,710
to my transaction. Once I go out of scope of the transaction, I need to do a vacuum to clean up those

73
00:05:05,040 --> 00:05:08,370
rows that are no longer seen by any other transaction.

74
00:05:08,380 --> 00:05:15,540
So it all comes down to isolation and all that stuff that we talked about many times on this

75
00:05:15,540 --> 00:05:15,830
channel.

76
00:05:15,990 --> 00:05:19,890
So check out the ACID video here to learn about isolation, ACID, atomicity.

77
00:05:19,890 --> 00:05:21,270
‫I'm not going to explain it right here.

78
00:05:21,990 --> 00:05:23,720
So that's what they explain here.

79
00:05:23,850 --> 00:05:32,340
So what they are going through here is they have a table called users and they're showing you how Postgres

80
00:05:32,340 --> 00:05:35,550
works. So, well, for the people listening on the podcast,

81
00:05:35,820 --> 00:05:38,880
we're looking at a table with four columns:

82
00:05:38,880 --> 00:05:42,420
ID, first, last, and birth year.

83
00:05:43,050 --> 00:05:45,720
So first name, last name and then birth year.

84
00:05:46,620 --> 00:05:48,690
And then the ID is a number.

85
00:05:48,750 --> 00:05:54,060
First name is obviously a string, last name is a string, characters, and then birth year. And they show

86
00:05:54,060 --> 00:06:02,460
you how this is on disk, and there is a ctid, which is a tuple ID that is stored.

87
00:06:03,300 --> 00:06:08,360
That's basically the tuple reference on the disk.

88
00:06:08,360 --> 00:06:10,230
‫And this is very, very important.

89
00:06:10,620 --> 00:06:14,070
So these tuples are A, B, C, D, E, F, G and so on.

90
00:06:14,370 --> 00:06:18,600
And so the primary key: they have an index on the primary key, which is the ID.

91
00:06:18,600 --> 00:06:22,770
They have secondary indexes on the first, last and birth year.

92
00:06:22,770 --> 00:06:24,920
‫So they have indexes on all of them.

93
00:06:24,930 --> 00:06:26,520
‫And again, this is just an example.

94
00:06:26,520 --> 00:06:30,930
They didn't show us their actual architecture, for security reasons, probably.

95
00:06:31,320 --> 00:06:33,600
So they don't show us their schema

96
00:06:33,600 --> 00:06:34,770
or anything like that.

97
00:06:35,250 --> 00:06:41,760
But this example tells me that they have a lot of indexes, so pay attention to that.

98
00:06:42,120 --> 00:06:50,280
So in PostgreSQL, the primary key and secondary key indexes always point to the tuple ID, which is the

99
00:06:50,280 --> 00:06:52,560
physical representation on disk.

100
00:06:53,820 --> 00:06:57,060
And here's how Postgres works.

101
00:06:57,300 --> 00:07:07,350
So if you now go ahead and update a row, any row in this table, what we do is we essentially

102
00:07:07,350 --> 00:07:12,630
insert a duplicate row with a new tuple ID this time.

103
00:07:13,800 --> 00:07:20,490
And now that we have a new ID, we need to point the indexes,

104
00:07:21,460 --> 00:07:27,370
the secondary indexes and pretty much everything that uses this tuple ID, to the new representation.

105
00:07:27,780 --> 00:07:28,020
‫Right.

106
00:07:28,570 --> 00:07:34,420
And that takes a finite amount of time, a finite amount of writes.

107
00:07:34,700 --> 00:07:40,030
A finite amount of work for Postgres to do, right, because everything points directly to the disk,

108
00:07:40,240 --> 00:07:48,520
just like MyISAM in MySQL; that's exactly the same architecture, where everything

109
00:07:48,520 --> 00:07:49,720
points directly to the disk.

110
00:07:49,720 --> 00:07:51,610
‫And you might say, what's bad about this?

111
00:07:53,080 --> 00:07:54,310
‫There is good and bad.

112
00:07:54,580 --> 00:07:56,740
‫The bad thing is what they are explaining.

113
00:07:56,740 --> 00:07:57,790
What they're explaining here is this:

114
00:07:57,850 --> 00:08:06,550
Hey, the moment we touch any row, I have to update all the indexes, including the primary key,

115
00:08:06,880 --> 00:08:16,420
because now, in all these indexes, those entries have a new ID that I need to pick up, a new tuple

116
00:08:16,420 --> 00:08:16,780
ID.

117
00:08:16,930 --> 00:08:17,950
So I have to update that.

118
00:08:17,950 --> 00:08:21,160
And that obviously has a ripple effect.

119
00:08:21,330 --> 00:08:22,330
They called it

120
00:08:22,330 --> 00:08:22,660
write

121
00:08:22,660 --> 00:08:24,610
amplification, in the coming slides.

122
00:08:24,610 --> 00:08:24,860
Right.

123
00:08:25,540 --> 00:08:33,610
And this is one logical write, I updated a single field in a single row, yet it results in five, six,

124
00:08:33,610 --> 00:08:39,850
seven physical writes to disk, because you're updating the secondary index, the secondary, and

125
00:08:39,850 --> 00:08:40,320
the secondary.

126
00:08:40,330 --> 00:08:43,720
If you have a lot of indexes, this gets even slower and slower and slower.
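
To make that counting concrete, here is a minimal Python sketch. It is not from the article or the video: `ToyTable` and its fields are hypothetical names, and the model deliberately ignores real Postgres details such as pages, the write-ahead log, vacuum, and heap-only tuple (HOT) updates. It only illustrates the idea described above: every update writes a new row version, and every index has to be pointed at the new tuple ID.

```python
class ToyTable:
    """Toy model of a table whose index entries point straight at tuple IDs.

    Hypothetical illustration only; this is not how Postgres is implemented.
    """

    def __init__(self, indexed_columns):
        self.indexed_columns = list(indexed_columns)  # assumed to include "id"
        self.heap = {}  # tuple ID -> row version
        # Simplification: each index maps the row's id to its current tuple ID.
        self.indexes = {col: {} for col in self.indexed_columns}
        self.next_tuple_id = 0
        self.physical_writes = 0

    def _write_version(self, row):
        # One physical write for the new row version itself.
        tuple_id = self.next_tuple_id
        self.next_tuple_id += 1
        self.heap[tuple_id] = dict(row)
        self.physical_writes += 1
        # One more physical write per index, so each index learns the new tuple ID.
        for col in self.indexed_columns:
            self.indexes[col][row["id"]] = tuple_id
            self.physical_writes += 1
        return tuple_id

    def insert(self, row):
        return self._write_version(row)

    def update(self, row_id, **changes):
        old_tuple_id = self.indexes["id"][row_id]
        new_row = {**self.heap[old_tuple_id], **changes}
        # The old version stays in the heap until something like vacuum removes it.
        return self._write_version(new_row)


table = ToyTable(["id", "first", "last", "birth_year"])
table.insert({"id": 4, "first": "Ada", "last": "Lovelace", "birth_year": 1815})
writes_before = table.physical_writes
table.update(4, birth_year=1816)  # one logical change to one field
writes_for_update = table.physical_writes - writes_before
print(writes_for_update)  # 1 heap write + 4 index writes
```

In this toy model the cost of one update grows linearly with the number of indexes, which is the ripple effect being described.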

127
00:08:44,330 --> 00:08:44,700
‫Right.

128
00:08:45,220 --> 00:08:47,380
‫So bear with me.

129
00:08:47,380 --> 00:08:50,470
Here, this is just explaining their point now.

130
00:08:50,770 --> 00:08:54,010
So they go through all of that; that's exactly what I said.

131
00:08:54,910 --> 00:08:57,640
And that, as a result,

132
00:08:59,380 --> 00:09:06,610
slows things down, because, first of all, writes are not slow per se, because

133
00:09:06,610 --> 00:09:08,290
you do flush, right?

134
00:09:08,300 --> 00:09:12,070
You have a lot of writes to do and then you do all of them at once.

135
00:09:12,940 --> 00:09:16,930
But the side effect of the writes, we're going to explain it in a minute.

136
00:09:16,990 --> 00:09:21,910
It's like one single thing translates to a lot of physical writes.

137
00:09:22,240 --> 00:09:23,860
I have to update this index, this index, this,

138
00:09:23,890 --> 00:09:28,810
this index, the write-ahead log, which is something we're going to explain in this article.

139
00:09:28,810 --> 00:09:34,790
The log gets large when you want to apply these changes.

140
00:09:35,260 --> 00:09:36,560
So that's the first thing, on disk.

141
00:09:37,300 --> 00:09:41,680
They go through the on-disk representation of PostgreSQL.

142
00:09:42,220 --> 00:09:43,240
Then they explain.

143
00:09:43,240 --> 00:09:48,370
The second point is replication. And replication here,

144
00:09:48,370 --> 00:09:54,190
guys, as I discussed: the write-ahead log is basically, if I do an insert or an update,

145
00:09:54,520 --> 00:09:59,470
this statement is translated into physical changes.

146
00:09:59,740 --> 00:10:05,980
Hey, go to this block and change this location and replace this value with this value.

147
00:10:06,310 --> 00:10:06,700
‫Right.

148
00:10:06,940 --> 00:10:10,060
‫Or go to this index and change this value to this value.

149
00:10:10,060 --> 00:10:11,900
Go to this index, on this position, change this.

150
00:10:12,160 --> 00:10:18,910
These changes are written in the write-ahead log as actual disk changes.

151
00:10:20,110 --> 00:10:24,530
‫OK, so this is a very, very, very important thing to know.

152
00:10:25,240 --> 00:10:26,770
Now, this is the write-ahead

153
00:10:26,770 --> 00:10:31,720
log. We have this, and also the write-ahead log has its own structure.

154
00:10:32,140 --> 00:10:32,490
Right.

155
00:10:32,500 --> 00:10:35,740
It's somewhere else and it's being maintained.

156
00:10:35,750 --> 00:10:38,380
So that's also the write-ahead log.

157
00:10:38,380 --> 00:10:44,680
The write-ahead log also has a physical representation on disk, and this is a write too.

158
00:10:44,920 --> 00:10:46,660
‫So there is a lot that's going on.

159
00:10:47,560 --> 00:10:52,690
So now when you come to replication, which we talked about right here, guys, and also discussed

160
00:10:52,690 --> 00:10:58,570
in my course, Introduction to Database Engineering: the idea of having a primary database that accepts the writes

161
00:10:58,960 --> 00:11:04,660
and standby replicas for reads, that can take reads.

162
00:11:04,690 --> 00:11:10,480
You need to push these changes, and the way you push them is you push the write-ahead log, which

163
00:11:10,480 --> 00:11:18,010
is a very consistent thing, down to the standby databases so they can get up to date.

164
00:11:18,670 --> 00:11:18,990
‫Right.

165
00:11:19,080 --> 00:11:21,340
‫That sounds simple, you know.

166
00:11:21,940 --> 00:11:26,650
And what they explain in the replication format,

167
00:11:26,650 --> 00:11:31,810
and this is where their point about the limitation of Postgres comes in:

168
00:11:32,620 --> 00:11:33,550
You guys have this.

169
00:11:33,910 --> 00:11:38,140
First of all, you have this WAL, or write-ahead log, which is quite large.

170
00:11:38,140 --> 00:11:38,500
Why?

171
00:11:38,500 --> 00:11:46,900
Because a single update statement translates into multiple writes, and these writes make their way

172
00:11:46,900 --> 00:11:48,160
to the write-ahead log.

173
00:11:48,290 --> 00:11:49,080
Hmm.

174
00:11:49,570 --> 00:11:54,130
So the write-ahead log doesn't have "update this table".

175
00:11:54,160 --> 00:12:02,830
It's not like statement-based replication, although Postgres supports statement-based replication through

176
00:12:02,830 --> 00:12:03,460
a third party.

177
00:12:03,460 --> 00:12:12,370
I believe MySQL supports both statement-based replication and also row-based

178
00:12:12,370 --> 00:12:12,910
replication.

179
00:12:13,330 --> 00:12:15,700
There are, again, pros and cons for both.
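
As a rough illustration of the difference, here is a small Python sketch under stated assumptions. The record shapes and the function names `physical_change_records` and `statement_record` are made up for this example; they are not the actual Postgres WAL format or the MySQL binary log format. The sketch only contrasts "one record per physical structure touched" with "one record holding the statement text".

```python
def physical_change_records(table, new_tuple_id, new_row, indexed_columns):
    """Physical-style log (hypothetical format): one record per structure that changed."""
    # The new row version written to the table's storage.
    records = [("heap", table, new_tuple_id, dict(new_row))]
    # One entry per index that must now point at the new tuple ID.
    for col in indexed_columns:
        records.append(("index", f"{table}_{col}_idx", new_row[col], new_tuple_id))
    return records


def statement_record(table, row_id, changes):
    """Statement-style log (hypothetical format): just the SQL text of the logical change."""
    assignments = ", ".join(f"{col} = {value!r}" for col, value in changes.items())
    return f"UPDATE {table} SET {assignments} WHERE id = {row_id}"


row = {"id": 4, "first": "Ada", "last": "Lovelace", "birth_year": 1816}
physical = physical_change_records("users", 1, row, ["id", "first", "last", "birth_year"])
statement = statement_record("users", 4, {"birth_year": 1816})

print(len(physical), "physical records versus one statement:")
print(statement)
```

The physical list grows with the number of indexes, while the statement stays one short string; the trade-off is that a replica receiving the statement has to execute it again itself.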

180
00:12:17,080 --> 00:12:23,920
So now, when we try to apply that... I came back, that dog was barking.

181
00:12:23,920 --> 00:12:24,310
‫All right.

182
00:12:25,150 --> 00:12:26,620
‫So the master.

183
00:12:27,560 --> 00:12:35,090
database pushes the WAL changes down to the standby databases so they can get updated, but you might

184
00:12:35,090 --> 00:12:38,720
say, Hussein, what if the standby is actually executing a query?

185
00:12:39,080 --> 00:12:42,100
Do we just stop this query?

186
00:12:42,890 --> 00:12:43,300
Right.

187
00:12:43,440 --> 00:12:49,400
I'm executing a query on the standby to read something that happened to be deleted on the master,

188
00:12:49,820 --> 00:12:51,830
and it's being written directly.

189
00:12:51,830 --> 00:12:53,000
So do I stop there?

190
00:12:53,420 --> 00:12:54,440
Do I stop the query?

191
00:12:54,440 --> 00:12:55,220
Do I wait?

192
00:12:55,490 --> 00:13:02,660
All these questions are going to get answered in a minute, and the answers shaped their decision

193
00:13:02,780 --> 00:13:03,860
to move to MySQL.

194
00:13:04,010 --> 00:13:06,360
I'm just going to let them explain it to you now.

195
00:13:07,430 --> 00:13:12,890
So now, we talked about the on-disk format and the on-disk representation.

196
00:13:12,890 --> 00:13:19,180
We talked about replication, and now we're going to talk about the consequences of the Postgres design.

197
00:13:19,190 --> 00:13:24,920
That's the third point here, where the problems of Postgres are.

198
00:13:24,920 --> 00:13:29,120
So let's continue reading. The first problem is write

199
00:13:29,120 --> 00:13:32,000
amplification, and write amplification is apparently something

200
00:13:32,000 --> 00:13:41,840
that exists in SSDs, where a single write that you think is logical translates to many, many, many physical

201
00:13:41,840 --> 00:13:44,750
writes, especially since an SSD does its own thing.

202
00:13:44,760 --> 00:13:54,390
So when you update versus insert, the SSD does a slightly different thing. SSDs love to insert

203
00:13:54,440 --> 00:14:02,090
new things; they like to logically just insert new things and create new pages.

204
00:14:02,840 --> 00:14:12,680
An SSD does not do well with updates, because the goal of an SSD is to have a page and flush it.

205
00:14:12,680 --> 00:14:15,290
In order to update an existing page,

206
00:14:15,290 --> 00:14:17,900
you have to invalidate that existing page.

207
00:14:18,530 --> 00:14:23,170
You take it, you copy it, change it, and then write it.

208
00:14:23,180 --> 00:14:28,150
So there is a little bit more work when it comes to an update versus an insert, which is faster.

209
00:14:28,160 --> 00:14:35,780
So that's the whole reason why Google invented the LevelDB database and then

210
00:14:35,780 --> 00:14:38,030
why Facebook invented RocksDB

211
00:14:39,260 --> 00:14:45,830
on top of that, I think, to take advantage of SSDs. And they built a completely different structure

212
00:14:45,830 --> 00:14:52,220
called the log-structured merge tree, where it's optimized for inserts instead of updates;

213
00:14:52,820 --> 00:14:56,030
everything is an insert, almost, in the log-structured

214
00:14:56,050 --> 00:14:59,960
merge tree. So that's the idea of write

215
00:14:59,960 --> 00:15:00,600
amplification.

216
00:15:00,620 --> 00:15:04,940
Now take that and amplify it at the Postgres level.

217
00:15:05,210 --> 00:15:14,030
I, as a client, am doing a single update statement to my table, and if I have, like, 700 indexes, I

218
00:15:14,030 --> 00:15:17,750
just made 700 updates, physical updates,

219
00:15:18,970 --> 00:15:22,780
as a result of my single logical update.

220
00:15:24,270 --> 00:15:33,630
So these 700 updates also, at the SSD level, translate to many, many physical, amplified SSD

221
00:15:33,630 --> 00:15:40,530
updates, because you're writing pages, and the thing with SSDs is they have a limited shelf life.

222
00:15:40,530 --> 00:15:45,630
So if you have a limited shelf life, you can only write so much.

223
00:15:45,630 --> 00:15:49,980
I think there is a number that varies between one disk and another.

224
00:15:50,400 --> 00:15:55,320
But in general it's, I think, 12,000 times or something like that.

225
00:15:55,470 --> 00:15:56,760
That's most of them.

226
00:15:57,790 --> 00:15:59,740
So they explain this here.

227
00:15:59,760 --> 00:16:01,200
I just summarized it for you.

228
00:16:01,290 --> 00:16:01,650
Write

229
00:16:01,650 --> 00:16:03,270
amplification is a problem for them.

230
00:16:03,280 --> 00:16:11,670
So the life span of their SSDs is getting lower and lower because of the

231
00:16:11,670 --> 00:16:16,740
write amplification, because those guys have hundreds and hundreds of indexes.

232
00:16:16,770 --> 00:16:18,840
Why would you have this many indexes?

233
00:16:19,200 --> 00:16:19,800
Beats me.

234
00:16:19,980 --> 00:16:21,880
Do you really query on all of them?

235
00:16:22,410 --> 00:16:28,510
Do you really query on first name, right, or last name?

236
00:16:29,640 --> 00:16:31,890
That's why adding indexes is great,

237
00:16:32,160 --> 00:16:34,840
but adding too many indexes is just a bad idea.

238
00:16:35,400 --> 00:16:37,310
So that's the write amplification problem.

239
00:16:40,400 --> 00:16:47,420
The second problem they want to discuss here is the replication problem. Guys, take the same thing

240
00:16:47,420 --> 00:16:47,870
that we did.

241
00:16:47,870 --> 00:16:54,740
We did a single update that translated to lots of updates to all the indexes, because all of the indexes

242
00:16:54,740 --> 00:16:56,840
point to the row directly.

243
00:16:56,840 --> 00:16:59,160
So, and the row ID changes.

244
00:16:59,510 --> 00:17:02,580
So we have to make them aware of these row changes.

245
00:17:02,960 --> 00:17:06,020
So all of these indexes point to the row directly.

246
00:17:06,350 --> 00:17:08,970
So these changes are just amplified.

247
00:17:09,500 --> 00:17:13,700
Now, these changes translate to a WAL, right?

248
00:17:14,060 --> 00:17:14,310
Write-ahead

249
00:17:14,310 --> 00:17:16,730
log: hey, update

250
00:17:16,740 --> 00:17:20,020
this, and this index, and this index, and this second one, and this, this.

251
00:17:20,150 --> 00:17:25,160
And by the way, there is a row here, change this to this. And it's all physical

252
00:17:25,160 --> 00:17:27,740
writes to the disk.

253
00:17:30,320 --> 00:17:35,090
What they complain about in their replication is that this WAL

254
00:17:36,510 --> 00:17:38,310
translates to a large,

255
00:17:39,430 --> 00:17:46,960
big-sized bandwidth when it comes to their master,

256
00:17:48,340 --> 00:17:57,040
worker, or standby, or replica, and those are interstate: they have their replication, replicas

257
00:17:57,040 --> 00:18:07,780
across states, across different countries, so they had to buy expensive bandwidth to kind of transmit

258
00:18:07,780 --> 00:18:12,000
their WAL changes from this replica to this replica.

259
00:18:12,520 --> 00:18:16,480
And I believe they also have child and grandchild replication.

260
00:18:16,480 --> 00:18:18,350
So take that into consideration.

261
00:18:18,370 --> 00:18:24,670
So the WAL changes, as they grow large, the bandwidth becomes expensive, because they are very large

262
00:18:24,670 --> 00:18:26,800
and they are not making small updates.

263
00:18:26,800 --> 00:18:27,850
They're making large updates,

264
00:18:27,850 --> 00:18:29,300
which makes the sizes even larger.

265
00:18:29,530 --> 00:18:32,010
So that's the replication problem here.

266
00:18:32,020 --> 00:18:39,790
I'm going to read this section for you guys so you can learn more about it: In cases

267
00:18:40,300 --> 00:18:44,380
where Postgres replication happens purely within a single data center,

268
00:18:44,380 --> 00:18:46,590
the replication bandwidth may not be a problem.

269
00:18:46,990 --> 00:18:52,420
Modern networking equipment and switches can handle a large amount of bandwidth, and many hosting providers

270
00:18:52,420 --> 00:18:55,510
offer free or cheap intra-data center bandwidth.

271
00:18:56,650 --> 00:18:56,950
Right.

272
00:18:57,670 --> 00:19:03,010
If you're internal, I can transfer one gig of WAL size easily.

273
00:19:03,340 --> 00:19:08,410
However, when replication must happen between data centers, issues can quickly escalate.

274
00:19:08,420 --> 00:19:15,270
For instance, Uber originally used physical servers in a colocation space.

275
00:19:15,280 --> 00:19:21,880
I don't know what the heck a colocation space is. A colocation space on the West Coast. For disaster recovery

276
00:19:21,880 --> 00:19:22,330
purposes,

277
00:19:22,330 --> 00:19:28,960
we added servers in a second East Coast colocation space. In this design,

278
00:19:28,960 --> 00:19:36,160
we had a master Postgres instance plus replicas in our western data center and a set of replicas in the eastern one. So that,

279
00:19:36,160 --> 00:19:43,390
that's kind of the constraint you can see: from east to west, it just did not scale for them.

280
00:19:43,900 --> 00:19:50,350
Right? Because, you see, how one single problem can lead to a lot of bigger problems, can lead

281
00:19:50,350 --> 00:19:51,030
to another problem.

282
00:19:51,040 --> 00:19:52,450
You see the pattern, guys?

283
00:19:54,790 --> 00:19:58,040
Writes are big because they have a lot of indexes.

284
00:19:58,450 --> 00:20:00,200
‫That's where you should start.

285
00:20:00,490 --> 00:20:04,760
Why do you have this many indexes?

286
00:20:06,500 --> 00:20:09,190
You might say, hey, Hussein, I cannot live,

287
00:20:09,230 --> 00:20:15,830
I have to have 350 indexes on all my fields because I query against them.

288
00:20:17,820 --> 00:20:24,810
Well, in this case, I say, OK, maybe that's not a choice for you then, but try to avoid that in

289
00:20:24,810 --> 00:20:25,170
the first place.

290
00:20:25,170 --> 00:20:28,680
But that's what I didn't see.

291
00:20:28,680 --> 00:20:30,170
And that's why people are pissed.

292
00:20:30,180 --> 00:20:36,880
It's like, wow, can't you, didn't you explain why you guys have a lot of indexes?

293
00:20:37,260 --> 00:20:46,230
Can you explain why you need them? I bet if you go into the actual architecture, most things don't need

294
00:20:46,230 --> 00:20:47,300
this many indexes.

295
00:20:48,360 --> 00:20:54,600
As a result, that will not translate into huge write amplification consequences.

296
00:20:54,600 --> 00:21:00,030
You will not have that, because you'll not have a lot of indexes to update.

297
00:21:00,240 --> 00:21:00,620
Right.

298
00:21:01,470 --> 00:21:09,030
But, well, we're not at Uber, so we don't know their architecture; that might be a valid use.

299
00:21:09,240 --> 00:21:11,550
‫So let's go to the data corruption.

300
00:21:11,550 --> 00:21:16,620
This is the dumbest section in this whole article.

301
00:21:16,740 --> 00:21:17,880
I'll save you some time.

302
00:21:18,720 --> 00:21:27,010
What they say here is, hey, during replication, Postgres 9.2 had a bug in it and our table

303
00:21:27,210 --> 00:21:28,560
was corrupted as a result.

304
00:21:30,980 --> 00:21:33,590
Seriously? Seriously, Uber?

305
00:21:34,600 --> 00:21:43,780
What software doesn't have bugs? You're giving a bug as a reason to move from Postgres to MySQL,

306
00:21:43,780 --> 00:21:45,510
like MySQL's perfect.

307
00:21:46,150 --> 00:21:48,350
‫That's just odd.

308
00:21:48,940 --> 00:21:50,020
‫That's just to me.

309
00:21:50,020 --> 00:21:50,650
‫I'm sorry.

310
00:21:50,650 --> 00:21:51,450
‫That's just odd.

311
00:21:51,970 --> 00:21:59,440
So they said during their replication process, the replicas were not in sync for some reason.

312
00:21:59,690 --> 00:22:05,650
And as a result, when you query for a unique value, let's say select star from users where ID equals

313
00:22:05,650 --> 00:22:06,490
four, you should get one.

314
00:22:06,760 --> 00:22:07,600
They were getting two.

315
00:22:07,600 --> 00:22:12,700
They were getting the old, retired row for some reason.

316
00:22:13,480 --> 00:22:17,800
And that caused their application to fall down and fall apart.

317
00:22:17,800 --> 00:22:23,050
So they had to add defensive programming to catch this stuff.

318
00:22:23,050 --> 00:22:29,320
But it's a bug. Postgres, if they had notified the Postgres team, they would immediately have fixed

319
00:22:29,320 --> 00:22:31,330
it, fixed that bug immediately.

320
00:22:31,480 --> 00:22:33,040
But that's a good bug.

321
00:22:33,040 --> 00:22:40,150
But I don't see bugs as a reason to move, as a showstopper, in my opinion.

322
00:22:40,210 --> 00:22:43,200
‫So that's why they talk about that and they talk about here.

323
00:22:43,270 --> 00:22:50,980
‫One section is the brief rebalancing, which is, by the way, by the way, B3 three rebalancing adds

324
00:22:50,980 --> 00:22:52,420
‫to the right amplification.

325
00:22:52,420 --> 00:22:59,710
‫I just they don't mention that, but it's just implied because a lot of people know that when you start

326
00:23:00,010 --> 00:23:07,180
‫to grow and you have a lot of indexes, you keep updating those indexes naturally if you if your value

327
00:23:07,690 --> 00:23:09,190
‫touches that index.

328
00:23:09,370 --> 00:23:09,640
‫Right.

329
00:23:11,260 --> 00:23:21,520
‫However, as a result of inserting that might the B3 structure might need to rebalance itself and when

330
00:23:21,520 --> 00:23:23,560
‫it needs to rebalance itself.

331
00:23:24,590 --> 00:23:33,150
‫It actually doesn't update physical update to the three and updates or updates, not inserts, right.

332
00:23:33,350 --> 00:23:37,260
‫So updates translate to word to actual SSD.

333
00:23:37,310 --> 00:23:37,670
‫Right.

334
00:23:37,670 --> 00:23:40,480
‫Amplifications, because this is the do not like updates.

335
00:23:41,630 --> 00:23:44,720
‫So that's another thing that can amplify the writes.

336
00:23:45,200 --> 00:23:48,590
‫I'm talking if you go to millions of rows.

337
00:23:48,740 --> 00:23:49,040
‫Right.

338
00:23:49,040 --> 00:23:49,510
‫Obviously.

339
00:23:51,820 --> 00:23:52,130
‫Right.

340
00:23:52,730 --> 00:23:53,680
‫Let's move to the next one.

341
00:23:53,810 --> 00:23:55,760
‫Replica MVCC.

342
00:23:56,130 --> 00:23:56,480
‫All right.

343
00:23:56,750 --> 00:24:00,560
‫Replica MVCC, or replica multi-version

344
00:24:00,560 --> 00:24:06,320
‫concurrency control. It says Postgres does not have true replica MVCC support.

345
00:24:08,150 --> 00:24:10,580
‫Well, why?

346
00:24:10,580 --> 00:24:20,900
‫Because of the fact that replicas apply WAL updates directly. Because if you think about it, Postgres,

347
00:24:21,260 --> 00:24:27,020
‫by default, again, by default, takes the on-disk representation of the WAL changes

348
00:24:27,380 --> 00:24:29,000
‫and that's what gets transmitted.

349
00:24:29,030 --> 00:24:36,020
‫So it's often higher bandwidth, but if you think about it, it's faster.

350
00:24:36,590 --> 00:24:36,920
‫Right.

351
00:24:37,160 --> 00:24:40,760
‫The alternative is just statement-based replication.

352
00:24:40,760 --> 00:24:41,030
‫Right.

353
00:24:41,150 --> 00:24:50,300
‫Where, instead of sending the results of the execution of the queries, you send the queries themselves,

354
00:24:51,320 --> 00:24:55,310
‫like, hey, I just did an insert, I just did an update.

355
00:24:55,310 --> 00:24:55,940
‫I just did a delete.

356
00:24:56,420 --> 00:25:03,440
‫So the actual strings of the statements, these SQL statements, just send them to the replica.

357
00:25:04,040 --> 00:25:07,680
‫This will be way slower, right.

358
00:25:07,700 --> 00:25:18,050
‫Because, yeah, the bandwidth of transmitting these changes in the form of statements is smaller

359
00:25:18,380 --> 00:25:20,970
‫than the actual physical changes that happen.

360
00:25:21,320 --> 00:25:27,680
‫However, applying them on the replica, now you have to actually execute them. It's not straightforward. An insert

361
00:25:27,680 --> 00:25:28,310
‫might be OK.

362
00:25:28,310 --> 00:25:33,350
‫But what if you do an update? For example, an update could scan, could touch

363
00:25:33,350 --> 00:25:36,490
‫the index. It actually does work.

364
00:25:37,160 --> 00:25:39,020
‫So you did double the work technically.

365
00:25:40,580 --> 00:25:45,740
‫Right, because you did the work to execute the statement on the master, and now you're doing the same

366
00:25:45,740 --> 00:25:46,220
‫work.

367
00:25:46,710 --> 00:25:47,420
‫Exactly.

368
00:25:47,480 --> 00:25:49,470
‫And if that statement is expensive,

369
00:25:49,470 --> 00:25:54,080
‫you're going to pay the same cost on the source and the destination.

370
00:25:54,620 --> 00:25:59,540
‫So there are pros and cons for using both, but they are complaining here that

371
00:26:00,610 --> 00:26:01,660
‫Postgres's

372
00:26:04,170 --> 00:26:11,280
‫WAL update approach just doesn't give them MVCC support. Let's clear that up. So let's say I am

373
00:26:11,280 --> 00:26:19,440
‫on a replica, a standby, and I'm executing a query, and one of my WAL changes

374
00:26:20,830 --> 00:26:24,350
‫affects that query that is being executed on the standby.

375
00:26:24,520 --> 00:26:27,090
‫So, again, I'm on the master.

376
00:26:27,110 --> 00:26:29,680
‫I deleted the thing.

377
00:26:29,680 --> 00:26:32,320
‫I deleted a table. That's just a little bit harsh.

378
00:26:32,320 --> 00:26:34,900
‫But let's say I deleted a few rows.

379
00:26:34,900 --> 00:26:35,220
‫Right.

380
00:26:36,480 --> 00:26:44,520
‫And now, on the standby, I'm actually querying those rows that are being deleted on the master.

381
00:26:44,520 --> 00:26:46,030
‫I am on a different replica.

382
00:26:46,650 --> 00:26:54,420
‫So now the master is pushing the WAL changes to the standby while that query

383
00:26:54,420 --> 00:26:57,550
‫that's querying those deleted rows is being executed.

384
00:26:57,720 --> 00:26:59,780
‫What should Postgres do?

385
00:27:00,090 --> 00:27:05,230
‫You tell me, as the viewer or listener, what do you think should happen here?

386
00:27:06,750 --> 00:27:10,320
‫Should Postgres immediately cancel the query,

387
00:27:12,040 --> 00:27:14,380
‫right, and write the changes?

388
00:27:15,500 --> 00:27:22,700
‫Or should the WAL changes be paused until the query finishes?

389
00:27:24,320 --> 00:27:27,350
‫If you think about it, there are no other choices, right? You have to pause it.

390
00:27:27,620 --> 00:27:30,170
‫Obviously, you're not pausing all WAL changes.

391
00:27:30,860 --> 00:27:34,880
‫You're only pausing changes that affect a running transaction.

392
00:27:34,880 --> 00:27:37,250
‫And that's another thing to worry about.

393
00:27:37,250 --> 00:27:42,580
‫How the heck do I know that the query being executed is actually affected by my WAL changes?

394
00:27:43,160 --> 00:27:45,220
‫Building databases is not easy.

395
00:27:45,230 --> 00:27:47,420
‫Guys, look at all this complexity.

396
00:27:48,050 --> 00:27:54,560
‫So they are complaining here that you guys don't have MVCC support, because what you're doing is...

397
00:27:55,450 --> 00:28:02,260
‫What Postgres does, effectively, is have a timeout. It says, hey, we're going

398
00:28:02,260 --> 00:28:06,970
‫to block the WAL changes for a given time, and this time is configurable.

399
00:28:07,970 --> 00:28:15,670
‫If the query didn't finish in this amount of time, we are sorry, we're going to cancel, not the changes,

400
00:28:16,180 --> 00:28:21,580
‫we're going to cancel that query that is actually reading.

401
00:28:22,000 --> 00:28:25,930
‫And we're going to force applying the changes.
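
The wait-then-cancel behavior described above can be sketched in a few lines of Python. This is only a toy model, not Postgres code: the function name and its parameters are made up for illustration. As far as I know, the real knob on a streaming standby is the `max_standby_streaming_delay` setting.

```python
def resolve_standby_conflict(query_remaining_s, max_delay_s):
    """Toy model of a standby deciding between a running query and WAL replay.

    query_remaining_s: how long the conflicting read query still needs (assumed known here).
    max_delay_s: how long WAL replay is allowed to wait (hypothetical stand-in for the timeout).
    Returns (outcome, seconds_wal_replay_waited).
    """
    if query_remaining_s <= max_delay_s:
        # The query finishes inside the grace period, then the WAL change is applied.
        return ("query_finished", query_remaining_s)
    # Grace period exhausted: the query is cancelled so that replay can proceed.
    return ("query_cancelled", max_delay_s)
```

So a short read survives and only delays replay a little, while a long read is the one that gets killed, which is exactly the trade-off the article is complaining about.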

402
00:28:25,930 --> 00:28:26,200
‫Why?

403
00:28:26,200 --> 00:28:36,490
‫Because Postgres is designed to favor eventual consistency over, let's say, just read queries in

404
00:28:36,490 --> 00:28:37,030
‫this case.

405
00:28:37,690 --> 00:28:39,860
‫So I'd rather be eventually consistent.

406
00:28:40,060 --> 00:28:43,150
‫Remember, this is eventual consistency as well.

407
00:28:43,150 --> 00:28:47,590
‫So stop saying that NoSQL is the only kind of database that has eventual consistency.

408
00:28:47,670 --> 00:28:50,200
‫Every database has it, as long as it's between replicas.

409
00:28:50,200 --> 00:28:50,470
‫Right.

410
00:28:51,130 --> 00:28:53,320
‫Relational doesn't within the same thing,

411
00:28:53,440 --> 00:28:54,280
‫the same instance.

412
00:28:54,280 --> 00:28:55,750
‫Yes, it's completely consistent.

413
00:28:55,750 --> 00:28:59,860
‫But across replicas there is always this idea of eventual consistency.

414
00:29:00,310 --> 00:29:04,630
‫So what this does is actually kill the transaction, and they did not like that.

415
00:29:05,230 --> 00:29:07,360
‫So let's read this a little bit.

416
00:29:07,570 --> 00:29:14,560
‫I kind of don't agree with the statement, but: this design means that replicas can routinely

417
00:29:14,560 --> 00:29:20,370
‫lag seconds behind master, obviously, and therefore it is easy to write code that results in killed transactions.

418
00:29:20,640 --> 00:29:21,120
‫Hmm.

419
00:29:21,880 --> 00:29:22,540
‫What does that mean?

420
00:29:23,200 --> 00:29:28,120
‫This problem might not be apparent to application developers writing code that obscures where

421
00:29:28,120 --> 00:29:33,070
‫transactions start and end. For instance, say a developer has some code that has to email a receipt to a

422
00:29:33,070 --> 00:29:33,460
‫user.

423
00:29:33,640 --> 00:29:41,170
‫Depending on how it's written, the code may implicitly have a database transaction that's held open

424
00:29:41,170 --> 00:29:42,610
‫until the email finishes.

425
00:29:43,070 --> 00:29:44,590
‫That's just a bad idea, right?

426
00:29:46,210 --> 00:29:53,770
‫You don't hold a transaction open while you do stuff that has nothing to do with

427
00:29:53,770 --> 00:29:54,670
‫the transaction itself.

428
00:29:54,700 --> 00:29:56,500
‫Try to avoid that as much as possible.

429
00:29:56,510 --> 00:29:58,630
‫So that's just a best practice.
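
A minimal Python sketch of that best practice, using the standard-library `sqlite3` module as a stand-in for a real database. The `receipts` table, `send_email`, and `charge_and_notify` are hypothetical names made up for this example; the point is only that the transaction is committed before the slow, unrelated I/O starts.

```python
import sqlite3

def send_email(to, body, outbox):
    # Stand-in for slow network I/O; a real sender could block for seconds.
    outbox.append((to, body))

def charge_and_notify(conn, username, amount, outbox):
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "INSERT INTO receipts (username, amount) VALUES (?, ?)",
            (username, amount),
        )
    # The transaction is already closed before the unrelated I/O begins.
    open_during_email = conn.in_transaction
    send_email(username, f"Receipt for {amount}", outbox)
    return open_during_email
```

If the email call sat inside the `with conn:` block instead, the transaction would stay open for as long as the email takes, and on a standby that is the kind of transaction that can end up cancelled.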

430
00:30:00,010 --> 00:30:06,710
‫While it's always bad form to let your code hold open database transactions while performing unrelated

431
00:30:06,730 --> 00:30:08,090
‫blocking I/O. OK.

432
00:30:08,480 --> 00:30:13,120
‫Thankfully, they mention that. The reality is that most engineers are not database experts and may not

433
00:30:13,120 --> 00:30:14,440
‫always understand this problem.

434
00:30:15,640 --> 00:30:18,060
‫I have to disagree with this one again, guys.

435
00:30:18,550 --> 00:30:24,400
‫If you know me from this channel or the podcast, you

436
00:30:24,400 --> 00:30:31,690
‫know that, as an engineer, you have to take pride in your work and in the thing that you interface

437
00:30:31,690 --> 00:30:31,930
‫with.

438
00:30:32,050 --> 00:30:34,910
‫I believe that you have to understand what you're communicating with.

439
00:30:35,230 --> 00:30:42,700
‫So, yeah, engineers are not database experts, but this does not qualify as database expertise.

440
00:30:42,730 --> 00:30:46,120
‫This is just basic transaction management, in my opinion.

441
00:30:46,540 --> 00:30:49,720
‫And I believe engineers have to understand this.

442
00:30:50,500 --> 00:30:50,830
‫Right.

443
00:30:51,220 --> 00:30:57,040
‫And engineers have to understand. You know, you might not be as radical as me.

444
00:30:57,340 --> 00:30:59,830
‫I don't like to work with anything that I don't understand.

445
00:30:59,980 --> 00:31:02,200
‫If it's a black box, I don't like to work with it.

446
00:31:02,470 --> 00:31:10,570
‫Before I pick a tool, I have to understand fully how it actually works, fully, from zero to 100

447
00:31:10,570 --> 00:31:10,950
‫percent.

448
00:31:11,470 --> 00:31:17,710
‫If I'm working on it, if I'm connecting with it, if I'm interfacing with it. It's OK if I

449
00:31:17,710 --> 00:31:19,510
‫understand 80, 70 percent of the tool.

450
00:31:20,500 --> 00:31:23,170
‫But again, I'm not going to understand every single thing in that case.

451
00:31:23,170 --> 00:31:23,400
‫Right.

452
00:31:23,710 --> 00:31:24,580
‫But that's just me.

453
00:31:24,830 --> 00:31:27,100
‫You might have a different opinion. Postgres upgrades.

454
00:31:27,100 --> 00:31:29,440
‫So I kind of agree with them on this one.

455
00:31:29,740 --> 00:31:32,230
‫I tried to upgrade Postgres many times.

456
00:31:32,230 --> 00:31:37,930
‫I never found the right tutorial, or it was so complicated that I gave up.

457
00:31:37,930 --> 00:31:38,320
‫Right.

458
00:31:38,620 --> 00:31:41,400
‫And they kind of reiterate the same problem.

459
00:31:41,410 --> 00:31:48,630
‫So I have to agree with them 100 percent on this. Postgres upgrades are really painful, really painful.

460
00:31:48,640 --> 00:31:54,430
‫I've been there, from 9.3 to 9.4, 9.4 to 9.5.

461
00:31:54,760 --> 00:31:56,860
‫I then just gave up.

462
00:31:56,860 --> 00:32:00,700
‫I'd just rather recreate my databases from scratch after that.

463
00:32:00,820 --> 00:32:08,800
‫Obviously, I'm running a test database here. I didn't run a production database that I had

464
00:32:08,800 --> 00:32:09,490
‫to upgrade.

465
00:32:09,490 --> 00:32:15,010
‫But what am I going to do in this case? Obviously there is a way, but

466
00:32:15,910 --> 00:32:18,440
‫Apparently this way sometimes works, sometimes it doesn't.

467
00:32:19,030 --> 00:32:24,070
‫So there is also the pglogical way, right.

468
00:32:24,660 --> 00:32:27,190
‫There are some tools that allow you to do upgrades.

469
00:32:28,030 --> 00:32:28,410
‫Right.

470
00:32:28,420 --> 00:32:35,170
‫And guys, if you know any of that stuff, if you have ever upgraded a Postgres database smoothly,

471
00:32:35,170 --> 00:32:37,000
‫let me know in the comments section below.

472
00:32:37,030 --> 00:32:38,840
‫I'd love to know how to do it.

473
00:32:39,550 --> 00:32:44,820
‫I tried twice, I believe, and I gave up and said, you know, this is not straightforward at all.

474
00:32:45,520 --> 00:32:48,070
‫And I wasn't forced to do it.

475
00:32:48,070 --> 00:32:49,390
‫So I took the

476
00:32:49,390 --> 00:32:57,940
‫easy way out of recreating my databases. OK, the architecture of MySQL and InnoDB, which we talked

477
00:32:57,940 --> 00:33:01,840
‫about. I'd have you guys check out the video right here if you want to learn more about it.

478
00:33:01,840 --> 00:33:03,270
‫But maybe what?

479
00:33:04,090 --> 00:33:07,300
‫So they go now through and describe it compared to Postgres.

480
00:33:07,300 --> 00:33:08,620
‫So InnoDB,

481
00:33:08,620 --> 00:33:10,360
‫or just MySQL in general.

482
00:33:11,890 --> 00:33:18,620
‫MySQL InnoDB in general, that's the right way of saying it. You have the primary key, and the primary key

483
00:33:18,640 --> 00:33:26,530
‫has a pointer to the row, directly to the physical data on disk. All the indexes that you

484
00:33:26,530 --> 00:33:30,220
‫create point back to the primary key.

485
00:33:30,340 --> 00:33:37,540
‫And that's the powerful thing here for them, because now if I update anything on the row,

486
00:33:38,200 --> 00:33:43,930
‫only the primary key needs to be updated to know the new kind of row ID, and even that,

487
00:33:44,470 --> 00:33:44,860
‫Right.

488
00:33:44,860 --> 00:33:50,140
‫It's a little bit different, but I don't have to touch my secondary indexes.

489
00:33:50,140 --> 00:33:50,380
‫Right.

490
00:33:51,970 --> 00:33:54,710
‫That being said, guys, they didn't.

491
00:33:55,300 --> 00:34:00,160
‫That's not always true. If you're updating a field that has no index,

492
00:34:01,200 --> 00:34:01,770
‫Then.

493
00:34:03,750 --> 00:34:04,290
‫Right.

494
00:34:05,220 --> 00:34:10,500
‫you get to touch only the primary key. But if you updated a field that has an index, you've got

495
00:34:10,500 --> 00:34:11,270
‫to touch both.
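
Here is a toy Python model of that point, just to make it concrete. It is not how InnoDB is actually implemented: the `ClusteredTable` class and its counter are made up for illustration. The primary index holds the row, every secondary index maps a value back to primary keys, and the counter shows how many secondary index entries an update has to touch.

```python
class ClusteredTable:
    """Toy model: rows live in the primary index; secondary indexes map value -> primary keys."""

    def __init__(self, indexed_columns):
        self.rows = {}                                   # primary key -> row
        self.secondary = {col: {} for col in indexed_columns}
        self.secondary_writes = 0                        # secondary index entries written

    def insert(self, pk, row):
        self.rows[pk] = dict(row)
        for col, index in self.secondary.items():
            index.setdefault(row[col], set()).add(pk)
            self.secondary_writes += 1

    def update(self, pk, column, value):
        old = self.rows[pk][column]
        self.rows[pk][column] = value                    # the primary index is always touched
        if column in self.secondary and old != value:
            index = self.secondary[column]
            index[old].discard(pk)                       # remove the stale entry
            if not index[old]:
                del index[old]
            index.setdefault(value, set()).add(pk)       # add the new entry
            self.secondary_writes += 1
```

Updating a column with no index leaves `secondary_writes` unchanged, while updating the indexed column bumps it by one, which is the case the article skips over.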

496
00:34:11,490 --> 00:34:13,470
‫So they didn't mention that, but.

497
00:34:14,830 --> 00:34:19,430
‫Yeah, right, because it's a very defensive article, right? MySQL is perfect.

498
00:34:20,230 --> 00:34:25,930
‫Yeah, if you updated the actual field that has a secondary index, you have to update that index.

499
00:34:25,930 --> 00:34:27,760
‫You just updated a value.

500
00:34:27,940 --> 00:34:30,580
‫So you have to go to your index and change the tree.

501
00:34:30,590 --> 00:34:32,380
‫so that it includes this value, right.

502
00:34:32,800 --> 00:34:35,380
‫So, yeah, you touch a lot of fields.

503
00:34:35,380 --> 00:34:35,640
‫Right.

504
00:34:36,160 --> 00:34:40,210
‫And if you touch a lot of fields, then you have to update all the indexes.

505
00:34:40,210 --> 00:34:40,420
‫Right.

506
00:34:40,600 --> 00:34:46,540
‫It's just that, by design, if you have a lot of indexes, you have fewer changes in general.

507
00:34:46,720 --> 00:34:46,960
‫Right.

508
00:34:47,320 --> 00:34:54,070
‫So as a result, this translates to, obviously, fewer raw WAL changes, because they don't have as

509
00:34:54,070 --> 00:34:57,490
‫many changes in the logical-to-physical translation.

510
00:34:58,150 --> 00:35:04,480
‫And now they talk about the rollback mechanism here. MySQL has this concept of rollback

511
00:35:04,480 --> 00:35:05,100
‫segments.

512
00:35:05,110 --> 00:35:10,450
‫So instead of inserting a row in the heap itself...

513
00:35:12,910 --> 00:35:19,330
‫When you update a row in Postgres, you insert a row in the heap, on the table itself,

514
00:35:19,480 --> 00:35:19,780
‫right?

515
00:35:20,320 --> 00:35:22,720
‫MySQL InnoDB does it differently.

516
00:35:22,900 --> 00:35:28,840
‫It just copies that onto some other place called the rollback segments,

517
00:35:29,020 --> 00:35:30,070
‫the undo logs.

518
00:35:30,250 --> 00:35:30,550
‫Right.

519
00:35:30,790 --> 00:35:31,930
‫And they keep it all there.

520
00:35:32,540 --> 00:35:37,180
‫And then based on that, they point to that location in the rollback segments.

521
00:35:37,510 --> 00:35:37,760
‫All right.

522
00:35:37,780 --> 00:35:40,240
‫So it's a slightly different architecture.

523
00:35:40,630 --> 00:35:44,830
‫So if you query now, if you want the latest, the latest is always there.

524
00:35:45,010 --> 00:35:46,060
‫So that's the beautiful thing.

525
00:35:46,300 --> 00:35:51,340
‫But based on your transaction ID, if you are coming from the past, you're querying the past.

526
00:35:51,550 --> 00:35:52,900
‫You want the old results.

527
00:35:53,140 --> 00:35:59,770
‫You have to do the jump to go back to get the old row. This jump doesn't exist in Postgres.

528
00:35:59,780 --> 00:36:04,450
‫So queries that are concurrent are fast on Postgres.

529
00:36:04,690 --> 00:36:11,140
‫They are technically slower in MySQL, because now you have to jump back and go through different

530
00:36:11,140 --> 00:36:13,570
‫places to do the query.
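
To picture that jump, here is a small Python sketch of a row with an undo chain. It is a simplification under stated assumptions: the `VersionedRow` class is hypothetical, every transaction is treated as committed, and a version is visible if its transaction ID is at or below the reader's snapshot ID.

```python
class VersionedRow:
    """Toy model: the latest version sits in place, older versions go to an undo list."""

    def __init__(self, value, txid):
        self.value, self.txid = value, txid
        self.undo = []                      # older versions, newest first

    def update(self, value, txid):
        # Copy the current version aside, then overwrite in place.
        self.undo.insert(0, (self.value, self.txid))
        self.value, self.txid = value, txid

    def read(self, snapshot_txid):
        """Return (value, hops), where hops counts jumps into the undo list."""
        if self.txid <= snapshot_txid:
            return self.value, 0            # latest version is visible: no jump
        for hops, (value, txid) in enumerate(self.undo, start=1):
            if txid <= snapshot_txid:
                return value, hops          # an older reader pays for the extra hops
        raise LookupError("row not visible to this snapshot")
```

A reader that wants the latest version pays zero hops, while a reader with an older snapshot walks back through the undo list, which is the extra cost described above.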

531
00:36:13,960 --> 00:36:14,410
‫Right.

532
00:36:15,010 --> 00:36:16,780
‫And vice versa.

533
00:36:18,340 --> 00:36:22,300
‫So they explain that the secondary indexes point to the primary index, and the primary

534
00:36:22,300 --> 00:36:23,080
‫index points to the disk.

535
00:36:23,110 --> 00:36:25,360
‫This is for people listening on the podcast.

536
00:36:25,510 --> 00:36:26,380
‫We're listening...

537
00:36:26,680 --> 00:36:27,610
‫We're, what?

538
00:36:28,210 --> 00:36:28,690
‫What is it?

539
00:36:28,900 --> 00:36:35,230
‫We're looking at a picture of a secondary index pointing to the primary index, and then the primary index pointing

540
00:36:35,230 --> 00:36:35,830
‫to the disk.

541
00:36:36,190 --> 00:36:37,150
‫That's just an extra layer.

542
00:36:37,660 --> 00:36:44,950
‫And then they say here, in the replication section, that MySQL supports multiple replication modes:

543
00:36:45,340 --> 00:36:47,590
‫statement-based and

544
00:36:48,530 --> 00:36:55,520
‫WAL changes. And the moment you implement any of these, if you implement

545
00:36:55,520 --> 00:37:03,470
‫statement-based replication, you have true MVCC support, because now the statement, the change

546
00:37:03,470 --> 00:37:12,350
‫that is coming to you from the master to the standby, is just another write. Consider it

547
00:37:12,350 --> 00:37:15,040
‫another transaction trying to be executed.

548
00:37:15,380 --> 00:37:19,700
‫So it will have true MVCC support.

549
00:37:19,700 --> 00:37:21,920
‫And in that case it will not be blocking.

550
00:37:22,070 --> 00:37:22,390
‫Right.

551
00:37:22,580 --> 00:37:26,090
‫Because you can technically query and write at the same time.

552
00:37:26,310 --> 00:37:32,330
‫And now as a result, you can implement the same exact thing that you're doing because you have a logical

553
00:37:32,330 --> 00:37:34,360
‫view of what is changing.

554
00:37:34,370 --> 00:37:39,980
‫As a result, the database is aware of the change. It can implement MVCC at the higher level.

555
00:37:40,240 --> 00:37:40,580
‫All right.

556
00:37:41,270 --> 00:37:42,830
‫Even through replication.

557
00:37:45,020 --> 00:37:50,000
‫Postgres does support that. There is a third-party extension that you can install that does exactly that. You can

558
00:37:50,000 --> 00:37:52,470
‫do that. They just didn't mention that.

559
00:37:52,520 --> 00:37:54,230
‫Oh, and this is an old article.

560
00:37:54,230 --> 00:37:57,710
‫So things can change, obviously, right.

561
00:37:58,520 --> 00:37:59,390
‫In my article.

562
00:38:00,230 --> 00:38:07,640
‫And they say that, oh, by the way, even the WAL sizes are so small, because

563
00:38:08,030 --> 00:38:10,880
‫we're changing very few things.

564
00:38:10,880 --> 00:38:12,410
‫You know, they go through all of that stuff.

565
00:38:13,940 --> 00:38:18,120
‫I'm not going to go through that, but that's essentially their advantage.

566
00:38:18,590 --> 00:38:25,520
‫They go through another advantage of MySQL, the buffer pool. The buffer pool is the caching

567
00:38:25,520 --> 00:38:26,150
‫mechanism.

568
00:38:26,630 --> 00:38:32,780
‫The buffer pool is the caching mechanism in

569
00:38:33,820 --> 00:38:39,450
‫MySQL, compared to the caching mechanism in Postgres, which is basically the RSS memory.

570
00:38:39,460 --> 00:38:39,790
‫Right.

571
00:38:40,750 --> 00:38:42,860
‫And they're explaining the difference here.

572
00:38:42,880 --> 00:38:49,810
‫They claim that Postgres uses different operating system calls,

573
00:38:49,810 --> 00:38:52,070
‫like they are using two calls instead of one.

574
00:38:52,810 --> 00:38:54,340
‫I don't know much about that, to be honest.

575
00:38:54,340 --> 00:38:59,620
‫I'm not an expert in operating systems, but a lot of people say that here you have to use one call

576
00:38:59,620 --> 00:39:03,250
‫to seek and read at the same time instead of seeking and reading.

577
00:39:03,670 --> 00:39:08,460
‫I don't know, maybe Postgres actually changed. A lot of people here are listening and watching this channel.

578
00:39:08,490 --> 00:39:14,320
‫Some people actually are experts in this thing and might correct that part, but I'm not aware of

579
00:39:14,320 --> 00:39:15,880
‫that as a result.

580
00:39:15,880 --> 00:39:18,190
‫So I can't comment much more on that part.

581
00:39:19,690 --> 00:39:26,410
‫Then there is the InnoDB storage engine, which implements a least-recently-used buffer pool, which you

582
00:39:26,410 --> 00:39:28,490
‫can apparently control.

583
00:39:28,540 --> 00:39:30,490
‫I'm surprised that you cannot control the

584
00:39:30,490 --> 00:39:31,870
‫cache size in Postgres.

585
00:39:31,870 --> 00:39:33,700
‫I need to read more about that a little bit.

586
00:39:34,480 --> 00:39:36,230
‫But that's another thing that they said.

587
00:39:36,250 --> 00:39:38,050
‫Oh, there's another advantage of MySQL.

588
00:39:38,830 --> 00:39:45,970
‫Another thing they mention is connection handling. In MySQL, there's a thread per connection. Every TCP connection

589
00:39:45,970 --> 00:39:46,480
‫that you

590
00:39:46,480 --> 00:39:50,470
‫open to MySQL is a thread on the server side.

591
00:39:50,470 --> 00:39:52,630
‫However, in Postgres it's an actual process.

592
00:39:52,630 --> 00:39:57,880
‫So technically, they claim, obviously,

593
00:39:57,940 --> 00:40:01,780
‫a thread is cheaper to spin up than a process.

594
00:40:02,800 --> 00:40:09,640
‫I read that this is no longer true, because the two are almost identical now, but it could be that,

595
00:40:09,970 --> 00:40:11,050
‫back in the days,

596
00:40:11,050 --> 00:40:12,580
‫that was true.

597
00:40:13,270 --> 00:40:18,640
‫But now, if you think about it, to scale to 10,000 connections, right, if you think about it,

598
00:40:19,690 --> 00:40:24,650
‫opening a lot of TCP connections is just a bad idea.

599
00:40:24,850 --> 00:40:28,500
‫So that's why we have the idea of connection pooling, right?

600
00:40:28,570 --> 00:40:31,900
‫We build our application so that it uses a pool.

601
00:40:33,090 --> 00:40:39,210
‫Reserve a connection from the pool, execute the transaction, and then return it

602
00:40:39,210 --> 00:40:39,710
‫to the pool.
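
The reserve, execute, return cycle above can be sketched in Python like this. It is a minimal sketch, not a production pool and not how PgBouncer or any driver does it: the `ConnectionPool` class is hypothetical, and `sqlite3` in-memory connections stand in for real database connections.

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """Toy fixed-size pool: connections are opened once and then reused."""

    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def reserve(self, timeout=None):
        # Blocks (or raises queue.Empty after the timeout) when every connection is busy.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)            # always hand the connection back

    def idle_count(self):
        return self._idle.qsize()
```

Usage looks like `with pool.reserve() as conn: conn.execute(...)`. The shorter the block inside the `with`, the fewer connections the application needs, which is why long-held transactions and pooling do not mix well.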

603
00:40:40,260 --> 00:40:48,060
‫And if you're doing a single atomic statement, you can just execute it on the pool directly,

604
00:40:48,060 --> 00:40:53,160
‫say, hey, pick any connection, any instance in the pool, execute, and then return.

605
00:40:53,160 --> 00:41:00,660
‫Immediately. This reserve and release also goes back to their queries, if they have a query that spans

606
00:41:00,660 --> 00:41:02,370
‫three, four, five, seven minutes.

607
00:41:04,000 --> 00:41:04,390
‫And.

608
00:41:05,710 --> 00:41:12,820
‫Again, there's nothing wrong with a query or a transaction that runs long if you're actually doing all

609
00:41:12,830 --> 00:41:19,030
‫database work. I've seen transactions that take 30 minutes.

610
00:41:19,180 --> 00:41:24,100
‫Just because it does a lot of work, it changes a lot.

611
00:41:24,100 --> 00:41:26,830
‫And these changes have to be atomic, right?

612
00:41:27,040 --> 00:41:33,220
‫Yeah, you can argue that you can break it up, even that you have to break this transaction into smaller and

613
00:41:33,220 --> 00:41:38,980
‫smaller pieces so that each piece can be executed in its own

614
00:41:40,320 --> 00:41:43,230
‫atomic manner, right, so you can minimize the transaction.

615
00:41:43,710 --> 00:41:49,500
‫So this also means that if you have long-running transactions, then you have to really think

616
00:41:49,500 --> 00:41:53,300
‫about how the reservation and connection pooling works.

617
00:41:53,310 --> 00:41:53,440
‫Right.

618
00:41:53,550 --> 00:41:55,200
‫So, the number of connections.

619
00:41:55,880 --> 00:41:56,210
‫Right.

620
00:41:56,880 --> 00:41:57,560
‫Think about it.

621
00:41:57,660 --> 00:42:03,540
‫If the client's not using a connection, then don't let them open a connection and

622
00:42:03,540 --> 00:42:05,400
‫just keep it open.

623
00:42:05,740 --> 00:42:06,630
‫You just use connection pooling.

624
00:42:06,660 --> 00:42:13,620
‫And they say they use, I believe, PgBouncer. They're using some service that actually does

625
00:42:13,620 --> 00:42:15,850
‫that connection pooling.

626
00:42:15,900 --> 00:42:17,340
‫But a lot of applications do it.

627
00:42:17,340 --> 00:42:20,340
‫Even if you don't, you can build your own layer on top.

628
00:42:20,340 --> 00:42:25,680
‫And I showed connection pooling on Postgres many times on this channel, right, through this idea,

629
00:42:25,680 --> 00:42:28,530
‫guys. And hopefully, hopefully in the future...

630
00:42:29,160 --> 00:42:32,380
‫And we're almost at the end of the article, obviously, guys.

631
00:42:32,410 --> 00:42:32,690
‫Right.

632
00:42:34,400 --> 00:42:41,930
‫Or the end of the article, but hopefully, when it comes to connection pooling, I really hope that QUIC

633
00:42:41,930 --> 00:42:43,040
‫as a protocol.

634
00:42:44,250 --> 00:42:49,920
‫and MASQUE, I believe they're working on a new protocol right now called MASQUE, that will allow

635
00:42:49,920 --> 00:42:59,850
‫you to kind of open multiple streams on a given TCP connection, or UDP connection in

636
00:42:59,850 --> 00:43:00,570
‫the case of QUIC,

637
00:43:01,840 --> 00:43:03,130
‫that represent

638
00:43:04,280 --> 00:43:11,390
‫your database connections. So if MySQL or Postgres supported QUIC, and I don't see

639
00:43:11,390 --> 00:43:18,710
‫a reason why not, then the clients can open a single connection. And remember, the client is always a web server or something

640
00:43:18,710 --> 00:43:23,930
‫like that. It opens a single connection and has up to 200,

641
00:43:23,930 --> 00:43:28,060
‫even more than that, streams running concurrently in a single connection.

642
00:43:28,910 --> 00:43:32,340
‫The only trick here is the database has to understand the idea of a stream.

643
00:43:32,360 --> 00:43:33,620
‫So that's a lot of work.

644
00:43:33,620 --> 00:43:35,000
‫But I believe it's going to be

645
00:43:36,570 --> 00:43:44,830
‫really lucrative for a database to implement a protocol like that. It's like, I don't really need TCP

646
00:43:44,850 --> 00:43:45,550
‫anymore, right?

647
00:43:45,690 --> 00:43:54,150
‫It's just a wasteful thing to have a single TCP connection for a given client, or connection

648
00:43:54,150 --> 00:43:54,450
‫pooling.

649
00:43:54,480 --> 00:43:58,680
‫This has to go away, and we have to move to a model where we multiplex

650
00:43:59,680 --> 00:44:06,730
‫queries in a single TCP connection using this protocol. Even if they implemented

651
00:44:06,730 --> 00:44:11,320
‫their own. They don't have to use QUIC. Just implement your own protocol that supports multiplexing,

652
00:44:11,830 --> 00:44:19,720
‫true multiplexing, so that every request, every session, every channel has its own logical representation

653
00:44:19,720 --> 00:44:21,660
‫in that TCP connection that you open.
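
To show what "a logical representation per stream" could look like on the wire, here is a rough Python sketch. The framing is completely made up for illustration (a 4-byte stream ID plus a 4-byte length before each payload). It is not QUIC's frame format and not the wire protocol of MySQL or Postgres.

```python
import struct

# Hypothetical frame header: stream ID and payload length, both unsigned 32-bit, network byte order.
HEADER = struct.Struct("!II")

def encode_frame(stream_id, payload):
    """Prefix a payload with its stream ID and length so frames can share one connection."""
    return HEADER.pack(stream_id, len(payload)) + payload

def demultiplex(data):
    """Split a byte string of back-to-back frames into per-stream lists of payloads."""
    streams = {}
    offset = 0
    while offset < len(data):
        stream_id, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        streams.setdefault(stream_id, []).append(data[offset:offset + length])
        offset += length
    return streams
```

The receiver has to parse every header and sort payloads into streams, and that parsing is where the extra CPU cost of multiplexing comes from.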

654
00:44:21,670 --> 00:44:23,170
‫So this way you don't have to open

655
00:44:23,170 --> 00:44:24,280
‫many connections.

656
00:44:24,280 --> 00:44:26,770
‫You just have to open one or a few of them.

657
00:44:27,040 --> 00:44:30,370
‫And each one of them has basically some limit.

658
00:44:30,910 --> 00:44:37,300
‫Obviously, that doesn't come for free, because now you've just increased the CPU usage at the back

659
00:44:37,300 --> 00:44:42,890
‫end and the front end, because now you have to assemble these channels and streams.

660
00:44:42,910 --> 00:44:45,460
‫That's the problem with HTTP/2 and QUIC. People like

661
00:44:46,000 --> 00:44:49,360
‫Lucas Pardue and, what's his name,

662
00:44:49,360 --> 00:44:50,230
‫Chris Wood.

663
00:44:50,230 --> 00:44:52,420
‫And people working on the QUIC protocol.

664
00:44:52,420 --> 00:44:59,020
‫They're trying to solve this problem with the CPU usage, because now you're not just

665
00:44:59,020 --> 00:45:03,090
‫working with a stream of content coming from a TCP socket.

666
00:45:03,100 --> 00:45:08,560
‫No, you have to actually look at the data and then arrange the packets so they are in logical streams

667
00:45:08,560 --> 00:45:11,710
‫or channels and then deliver them to the app.

668
00:45:11,710 --> 00:45:14,950
‫So the operating system or the application,

669
00:45:15,940 --> 00:45:24,130
‫wherever this thing lives, is doing extra work. So again, I'm sorry about that segue, but I wanted to

670
00:45:24,130 --> 00:45:26,440
‫discuss that a little bit. I think that's just an idea

671
00:45:26,440 --> 00:45:29,050
‫that is just great. Conclusion.

672
00:45:29,320 --> 00:45:35,890
‫Obviously, they say, hey, Postgres served us well in the early days of Uber, but we ran into

673
00:45:35,890 --> 00:45:38,730
‫significant problems scaling Postgres with our growth.

674
00:45:39,190 --> 00:45:44,770
‫Today we have some legacy Postgres instances, but the bulk of our databases are either built on top of

675
00:45:44,770 --> 00:45:47,400
‫MySQL, typically using our Schemaless layer.

676
00:45:47,440 --> 00:45:48,250
‫That's another point.

677
00:45:49,450 --> 00:45:51,640
‫You have, you know, Schemaless.

678
00:45:54,500 --> 00:46:02,630
‫You have Schemaless and you're using MySQL. Maybe there is something I'm missing here, but it does not

679
00:46:02,630 --> 00:46:05,140
‫seem natural to me.

680
00:46:07,430 --> 00:46:13,610
‫A lot of people use Postgres as a schemaless store, where they put a chunk of JSON in a single field

681
00:46:13,610 --> 00:46:17,690
‫as JSONB, and they work on that.

682
00:46:17,690 --> 00:46:23,660
‫But maybe that's just the way forward, because if they have a lot of fields and they have a lot

683
00:46:23,660 --> 00:46:26,600
‫of indexes on those fields, maybe that's the way to go.

684
00:46:27,230 --> 00:46:27,830
‫Who knows?

685
00:46:28,590 --> 00:46:28,900
‫Right.

686
00:46:29,270 --> 00:46:31,390
‫Again, guys, what do you think?

687
00:46:31,400 --> 00:46:32,590
‫What do you think about all this stuff?

688
00:46:33,200 --> 00:46:34,520
‫Let me know in the comments section below.

689
00:46:35,060 --> 00:46:36,180
‫I'm going to see you in the next one.

690
00:46:36,210 --> 00:46:37,560
‫Hope you enjoyed this video.

691
00:46:37,880 --> 00:46:40,730
‫Give it a like if you do and share with your friends.

692
00:46:41,060 --> 00:46:43,220
‫I'm going to see you in the next one.

693
00:46:43,880 --> 00:46:44,450
‫Thank you.

694
00:46:44,600 --> 00:46:46,400
‫Evan Klitzke.

695
00:46:46,430 --> 00:46:50,120
‫Evan Klitzke, a staff engineer

696
00:46:50,120 --> 00:46:50,760
‫at Uber Engineering.

697
00:46:50,810 --> 00:46:52,150
‫This is a great article again.

698
00:46:52,700 --> 00:46:53,120
‫Yeah.

699
00:46:53,360 --> 00:47:00,500
‫And things have been changing a lot in the Uber world.

700
00:47:01,340 --> 00:47:06,290
‫But again, this is a historical article that goes back years and years.

701
00:47:06,290 --> 00:47:08,630
‫And we had to discuss it, so.

702
00:47:09,930 --> 00:47:10,810
‫Thank you so much.

703
00:47:10,880 --> 00:47:13,050
‫Appreciate you. I'm going to see you in the next one.

704
00:47:13,080 --> 00:47:13,740
‫You guys stay awesome.