What's going on, guys? My name is Hussein. And in this video, I want to discuss the idea of working with billion-row tables in general.

This is a very, very interesting topic, because as you design a system, it forces you to ask a question about your table design, your document design, whatever flavor of database you use: how do you anticipate this table growing one, three, four years from now? Do you anticipate these tables growing so large that they reach the billion-row level?

So in this video, I want to discuss how we work with such a huge volume of data, because there are many ways of tackling this problem, and I want to cover some of them. I have three concepts to discuss here. Obviously, guys, if I missed anything, let me know in the comment section below.

Before we jump into the video, guys, this video was basically inspired by my comment section, specifically by Vinnie. He's a great database engineer, he always finds mistakes in things that I say wrong, and he obviously gives great feedback in general on my videos — so this is just a shout-out to him. It was spawned as a discussion on my Twitter system design video, where I came up with an arbitrary design for the follow feature, which I'm going to reference right here. Go check it out.

And so, as a result, that design generated a table which is very, very huge. What I basically did for the follow feature — this person follows that person — is put the whole thing in one table: every follow adds a row with the two user IDs, who follows whom. And I said in the video, this is going to be a huge table, so what do we do? We had a back-and-forth discussion about that intuition, which inspired me to actually make a video to discuss: what do we do? What do people do today if they have a large table? How about we discuss it?

So I have basically three concepts. The first one is brute-forcing your way through processing the table, or working on the table.
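To make the starting point concrete, here's a minimal sketch of the kind of follow table I'm describing. The table and column names are my own illustration, not any real Twitter schema:

```python
import sqlite3

# A minimal sketch of the "follows" table from the example design.
# One row per follow relationship: a billion follows means a billion rows.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE follows (
        follower_id INTEGER NOT NULL,  -- the user doing the following
        followee_id INTEGER NOT NULL,  -- the user being followed
        PRIMARY KEY (follower_id, followee_id)
    )
""")

# User 1 follows user 2.
conn.execute("INSERT INTO follows VALUES (?, ?)", (1, 2))
conn.commit()
```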
So if you are trying to find a row inside this table, what you can do — without the concept of indexing, without the assistance of anything else — is brute-force your way through it: use multithreading, use multiprocessing, chunk the table into multiple segments, and search them in parallel. That's basically how big data and Hadoop work, right? Map and reduce: break the table into smaller subsets so you can run in parallel, brute-force your way through, find what you're looking for, and do the processing yourself. So that's the idea: can I break this table into hundreds of pieces and search those pieces in parallel, concurrently, throwing this problem at a 100-machine cluster? That will work — sometimes.

That's why I want to discuss the second point, which is: can I avoid processing the entire table, and instead process only a subset of it? How do I do that? The best approach is indexing, because that's what we do. If you index a column on a table, you essentially create a structure on disk — it's a B-tree or an LSM tree — that helps you reduce the subset of what you're searching. So instead of searching the entire table for what you want, you search only a small subset, which is the index; even that on its own is a scan to find what you want, but by finding it, you've narrowed down what you were looking for.

It's like the binders in a secretary's office, right? There's a book for the letter A, and any contract that starts with A goes there; anything that starts with B goes here; anything with C goes there — you see them color-coded. So if your contract is for, I don't know, a company called Zebra, you immediately go to the Z binder and only then start searching. You minimize what you're searching through. That's indexing.

So let's search a smaller set: instead of working with the whole table, let's reduce the set. Instead of working with a billion rows, maybe we're working with a few million in this case. I'll sketch both of these ideas below. Now — can I reduce that set even more?
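As a rough illustration of the brute-force idea, here's a minimal sketch that splits the key space into chunks and scans them in parallel with a process pool. The database file, target ID, and chunk sizes are all hypothetical, and a real MapReduce job would run across machines, not local processes:

```python
from concurrent.futures import ProcessPoolExecutor
import sqlite3

DB_PATH = "app.db"  # hypothetical database file containing the follows table

def scan_chunk(id_range):
    """Scan one segment of the follows table for an example followee (ID 42)."""
    lo, hi = id_range
    conn = sqlite3.connect(DB_PATH)
    rows = conn.execute(
        "SELECT follower_id FROM follows "
        "WHERE follower_id BETWEEN ? AND ? AND followee_id = ?",
        (lo, hi, 42),
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    # Chunk a billion-ID space into 100 segments and scan them concurrently.
    step = 10_000_000
    chunks = [(i, i + step - 1) for i in range(0, 1_000_000_000, step)]
    with ProcessPoolExecutor() as pool:
        results = [r for chunk in pool.map(scan_chunk, chunks) for r in chunk]
```

And the indexing version — one hedged example, continuing with the same hypothetical database, of creating an index so the database narrows the search for us (sqlite here, but the idea is the same in any relational database):

```python
import sqlite3

conn = sqlite3.connect("app.db")  # same hypothetical database as above

# Index the followee column; the database builds a B-tree over it on disk.
conn.execute("CREATE INDEX IF NOT EXISTS idx_followee ON follows (followee_id)")

# Now this query walks the index instead of scanning a billion rows.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT follower_id FROM follows WHERE followee_id = ?",
    (42,),
).fetchall()
print(plan)  # should mention searching follows USING INDEX idx_followee
```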
That's where database people do tricks like partitioning. Partitioning means that, on disk, this huge table is broken up — and I'm talking here about horizontal partitioning, not vertical partitioning. Horizontal partitioning means slicing the table, almost like cutting it in the middle: you say, okay, rows from this range live in this location on disk, and rows from that range live in that location. This is different from indexing. The whole thing can still be indexed, but we're literally partitioning the table into multiple parts.

So now: how do I know which partition to search? You need another concept that tells you which partition to hit, and if you're lucky, you might search only one partition, or a couple. This is called the partition key — you always partition on a key. It's very similar to indexing, except indexing works on the whole table. Partitioning also works on the whole table, but it breaks the table down into smaller and smaller pieces, and you will have a different index per partition. Usually the database takes care of all that stuff for you, so it's almost transparent: working with partitions, like working with indexes, is transparent to you as the client querying the table. So it's incredibly fast, right? If you know what to search for, you hit the right partition — hoping the row you want is actually in it — and the index within that partition narrows the set even more. So that's pretty cool, right? And we still have one database, one machine; we've just broken this table into multiple partitions, and now I can search exactly where I want to.

Now, you can distribute that even further, across multiple hosts, by doing sharding. Similarly to the concept of partitioning — you can keep partitioning and also add sharding on top of it — sharding adds a bit of complexity to your system. But now you put the first 100,000 customers in one database and the second 100,000 in another database, and they don't talk to each other. And here comes the problem of transactions, because they are two different databases.
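To make the partitioning idea concrete, here's a hedged sketch using PostgreSQL's declarative range partitioning, driven from Python with the psycopg 3 driver. The connection string, table, and range boundaries are made up for illustration, and it assumes a running Postgres server:

```python
import psycopg  # assumes a running PostgreSQL server and the psycopg 3 driver

# Range-partition the follows table on follower_id (PostgreSQL syntax).
STATEMENTS = [
    """CREATE TABLE follows (
           follower_id BIGINT NOT NULL,
           followee_id BIGINT NOT NULL
       ) PARTITION BY RANGE (follower_id)""",
    # Each partition is a separate physical table on disk, with its own indexes.
    """CREATE TABLE follows_p0 PARTITION OF follows
           FOR VALUES FROM (0) TO (500000000)""",
    """CREATE TABLE follows_p1 PARTITION OF follows
           FOR VALUES FROM (500000000) TO (1000000000)""",
]

with psycopg.connect("dbname=app") as conn:  # hypothetical connection string
    for stmt in STATEMENTS:
        conn.execute(stmt)
```

And for sharding, the client (or a routing layer) has to know which host owns which key range. A minimal routing sketch, with made-up hosts and boundaries:

```python
# Hypothetical shard map: each shard owns a contiguous customer-ID range.
SHARDS = [
    (0, 100_000, "db-host-1.example.com"),
    (100_000, 200_000, "db-host-2.example.com"),
]

def shard_for(customer_id: int) -> str:
    """Return the database host that owns this customer's rows."""
    for lo, hi, host in SHARDS:
        if lo <= customer_id < hi:
            return host
    raise KeyError(f"no shard owns customer {customer_id}")

print(shard_for(500))  # -> db-host-1.example.com, per the example in the video
```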
You've just reduced the size of the table, obviously, but now you've complicated the client, because the client has to be aware of the shards. It's like: okay, I'm searching for customer number 500 — which shard should I hit? Oh, you hit shard one, because that's where it lives, right? And going down deep into that shard, there are partitions of that table, and going down into each partition, there are indexes — or an index. And now you've basically narrowed the billions down to maybe a few thousand or a few hundred thousand rows, which is pretty good. So that's the idea of what we do: shard, partition, then index, and then find exactly the row we're looking for. That's the idea of limiting what we have to work with.

And the final thing — as I started thinking about it — is: okay, maybe we can avoid all of this altogether. Why do you have a billion-row table to begin with? That comes down to the database design, which was mine in that case, right? It's like: okay, maybe it's not a good idea to have a table so large. Can we solve the problem so that I don't need a billion-row table at all?

In the case of the Twitter following example, we might actually be able to — I still haven't completed the thought yet — but if you have a profile table that says this is my ID, this is my name, this is my picture, we can add a field called followers count. It's an integer; we'll come back to that. Now there's another field: most relational databases support JSON these days, so we can put a JSON field there, or a list field, and store your followers in your profile. So now we don't have a relational table that tells you this guy is following this guy, this guy is following this gal. Now we have one profile, and if you want to get your followers, you go to your profile, fetch that row, and pull that information — it's all in the profile.

But every time someone follows you, or you follow someone, the hit is now on the write path. If I want to write "hey, someone just followed me," I need to update those two fields: bump the followers count and append to the followers list — there's a sketch of this below.
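Here's a minimal sketch of that denormalized design, assuming a hypothetical profiles table with the followers stored as a JSON array (sqlite just to illustrate; a production system would use a database with first-class JSON support and make this read-modify-write atomic):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE profiles (
        user_id         INTEGER PRIMARY KEY,
        name            TEXT,
        followers_count INTEGER NOT NULL DEFAULT 0,
        followers       TEXT NOT NULL DEFAULT '[]'  -- JSON array of user IDs
    )
""")
conn.execute("INSERT INTO profiles (user_id, name) VALUES (1, 'Hussein')")

def add_follower(followee_id: int, follower_id: int) -> None:
    """The write path: one row, two fields updated together."""
    (raw,) = conn.execute(
        "SELECT followers FROM profiles WHERE user_id = ?", (followee_id,)
    ).fetchone()
    followers = json.loads(raw)
    followers.append(follower_id)
    conn.execute(
        "UPDATE profiles SET followers = ?, "
        "followers_count = followers_count + 1 WHERE user_id = ?",
        (json.dumps(followers), followee_id),
    )
    conn.commit()

add_follower(1, 42)  # user 42 follows user 1
```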
I didn't have this problem in the first design, but the first design wouldn't scale as well as this one, in my opinion. Now we start worrying about write throughput — I don't want to go deep into that stuff here, but we can use message queues: okay, let's write it asynchronously and update the count later. Yeah, there will be a little bit of delay — who cares? It's a follower count anyway. We'll pick the update off the queue and slowly apply these things. So we have many ways to solve a problem.

To summarize the whole video: instead of working with the whole billion-row table — actually, let me flip the order a little bit. First, try to avoid having a billion-row table to begin with; I kind of said that last, but it should come first. Second, if you can't avoid it, can you index it? You obviously need to know which field to index. Then, can you partition it, on the same table, on the same disk, so the pieces are smaller? And if indexing and partitioning aren't enough — and ask yourself whether you really need to — shard it, so the pieces get smaller still, spread across multiple hosts, keeping in mind that if a host dies, that's a problem. So you partition it horizontally, essentially across multiple databases: shards. That creates complexity, which I try to avoid; I talked about that a little bit. And then finally, if you can't do any of that stuff, do a MapReduce-style job: run your work in parallel so you process the billion rows concurrently with a massive army of machines, if possible. If your database is transactional, though, that's kind of pointless, because the moment you set that army loose searching or working on your huge table — partitioned, spliced — the results go stale as soon as you start, because people are editing, people are changing things all the time.

All right, guys, I'm going to see you in the next one. You guys stay awesome. Goodbye.