What's going on, guys? My name is Hussein. And in this video, I want to discuss the idea of working with billion-row tables in general.

This is a very, very interesting topic, because as you design a system, it forces you to ask a question about your table design, your document design, whatever flavor of database you use: how do you anticipate this table growing one, three, four years from now? Do you anticipate these tables growing so large that they reach the billion-row level?

So in this video, I want to discuss how we work with such a huge volume of data, because there are many ways of tackling this problem, and I want to cover some of them. I have three concepts to discuss here. Obviously, guys, if I missed anything, let me know in the comment section below.

Before we jump into the video, guys, this video was basically inspired by my comment section, specifically by Vinnie. He's a great database engineer, he always finds mistakes in things that I say wrong, and he obviously gives great feedback in general on my videos — so this is just a shout-out to him. It was spawned as a discussion on my Twitter system design video, where I came up with an arbitrary design for the follow feature, which I'm going to reference right here. Go check it out.

And so, as a result, that design generated a table which is very, very huge. What I basically did for the follow feature — this person follows that person — is put the whole thing in one table: every follow adds a row with the two user IDs, who follows whom. And I said in the video, this is going to be a huge table, so what do we do? We had a back-and-forth discussion about that intuition, which inspired me to actually make a video to discuss: what do we do? What do people do today if they have a large table? How about we discuss it?

So I have basically three concepts. The first one is brute-forcing your way through processing the table, or working on the table.
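To make the starting point concrete, here's a minimal sketch of the kind of follow table I'm describing. The table and column names are my own illustration, not any real Twitter schema:

```python
import sqlite3

# A minimal sketch of the "follows" table from the example design.
# One row per follow relationship: a billion follows means a billion rows.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE follows (
        follower_id INTEGER NOT NULL,  -- the user doing the following
        followee_id INTEGER NOT NULL,  -- the user being followed
        PRIMARY KEY (follower_id, followee_id)
    )
""")

# User 1 follows user 2.
conn.execute("INSERT INTO follows VALUES (?, ?)", (1, 2))
conn.commit()
```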
So if you are trying to find a row inside this table, what you can do — without the concept of indexing, without the assistance of anything else — is brute-force your way through it: use multithreading, use multiprocessing, chunk the table into multiple segments, and search them in parallel. That's basically how big data and Hadoop work, right? Map and reduce: break the table into smaller subsets so you can run in parallel, brute-force your way through, find what you're looking for, and do the processing yourself. So that's the idea: can I break this table into hundreds of pieces and search those pieces in parallel, concurrently, throwing this problem at a 100-machine cluster? That will work — sometimes.

That's why I want to discuss the second point, which is: can I avoid processing the entire table, and instead process only a subset of it? How do I do that? The best approach is indexing, because that's what we do. If you index a column on a table, you essentially create a structure on disk — it's a B-tree or an LSM tree — that helps you reduce the subset of what you're searching. So instead of searching the entire table for what you want, you search only a small subset, which is the index; even that on its own is a scan to find what you want, but by finding it, you've narrowed down what you were looking for.

It's like the binders in a secretary's office, right? There's a book for the letter A, and any contract that starts with A goes there; anything that starts with B goes here; anything with C goes there — you see them color-coded. So if your contract is for, I don't know, a company called Zebra, you immediately go to the Z binder and only then start searching. You minimize what you're searching through. That's indexing.

So let's search a smaller set: instead of working with the whole table, let's reduce the set. Instead of working with a billion rows, maybe we're working with a few million in this case. I'll sketch both of these ideas below. Now — can I reduce that set even more?
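As a rough illustration of the brute-force idea, here's a minimal sketch that splits the key space into chunks and scans them in parallel with a process pool. The database file, target ID, and chunk sizes are all hypothetical, and a real MapReduce job would run across machines, not local processes:

```python
from concurrent.futures import ProcessPoolExecutor
import sqlite3

DB_PATH = "app.db"  # hypothetical database file containing the follows table

def scan_chunk(id_range):
    """Scan one segment of the follows table for an example followee (ID 42)."""
    lo, hi = id_range
    conn = sqlite3.connect(DB_PATH)
    rows = conn.execute(
        "SELECT follower_id FROM follows "
        "WHERE follower_id BETWEEN ? AND ? AND followee_id = ?",
        (lo, hi, 42),
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    # Chunk a billion-ID space into 100 segments and scan them concurrently.
    step = 10_000_000
    chunks = [(i, i + step - 1) for i in range(0, 1_000_000_000, step)]
    with ProcessPoolExecutor() as pool:
        results = [r for chunk in pool.map(scan_chunk, chunks) for r in chunk]
```

And the indexing version — one hedged example, continuing with the same hypothetical database, of creating an index so the database narrows the search for us (sqlite here, but the idea is the same in any relational database):

```python
import sqlite3

conn = sqlite3.connect("app.db")  # same hypothetical database as above

# Index the followee column; the database builds a B-tree over it on disk.
conn.execute("CREATE INDEX IF NOT EXISTS idx_followee ON follows (followee_id)")

# Now this query walks the index instead of scanning a billion rows.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT follower_id FROM follows WHERE followee_id = ?",
    (42,),
).fetchall()
print(plan)  # should mention searching follows USING INDEX idx_followee
```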
That's where database people do tricks like partitioning. Partitioning means that, on disk, this huge table is broken up — and I'm talking here about horizontal partitioning, not vertical partitioning. Horizontal partitioning means slicing the table, almost like cutting it in the middle: you say, okay, rows from this range live in this location on disk, and rows from that range live in that location. This is different from indexing. The whole thing can still be indexed, but we're literally partitioning the table into multiple parts.

So now: how do I know which partition to search? You need another concept that tells you which partition to hit, and if you're lucky, you might search only one partition, or a couple. This is called the partition key — you always partition on a key. It's very similar to indexing, except indexing works on the whole table. Partitioning also works on the whole table, but it breaks the table down into smaller and smaller pieces, and you will have a different index per partition. Usually the database takes care of all that stuff for you, so it's almost transparent: working with partitions, like working with indexes, is transparent to you as the client querying the table. So it's incredibly fast, right? If you know what to search for, you hit the right partition — hoping the row you want is actually in it — and the index within that partition narrows the set even more. So that's pretty cool, right? And we still have one database, one machine; we've just broken this table into multiple partitions, and now I can search exactly where I want to.

Now, you can distribute that even further, across multiple hosts, by doing sharding. Similarly to the concept of partitioning — you can keep partitioning and also add sharding on top of it — sharding adds a bit of complexity to your system. But now you put the first 100,000 customers in one database and the second 100,000 in another database, and they don't talk to each other. And here comes the problem of transactions, because they are two different databases.
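To make the partitioning idea concrete, here's a hedged sketch using PostgreSQL's declarative range partitioning, driven from Python with the psycopg 3 driver. The connection string, table, and range boundaries are made up for illustration, and it assumes a running Postgres server:

```python
import psycopg  # assumes a running PostgreSQL server and the psycopg 3 driver

# Range-partition the follows table on follower_id (PostgreSQL syntax).
STATEMENTS = [
    """CREATE TABLE follows (
           follower_id BIGINT NOT NULL,
           followee_id BIGINT NOT NULL
       ) PARTITION BY RANGE (follower_id)""",
    # Each partition is a separate physical table on disk, with its own indexes.
    """CREATE TABLE follows_p0 PARTITION OF follows
           FOR VALUES FROM (0) TO (500000000)""",
    """CREATE TABLE follows_p1 PARTITION OF follows
           FOR VALUES FROM (500000000) TO (1000000000)""",
]

with psycopg.connect("dbname=app") as conn:  # hypothetical connection string
    for stmt in STATEMENTS:
        conn.execute(stmt)
```

And for sharding, the client (or a routing layer) has to know which host owns which key range. A minimal routing sketch, with made-up hosts and boundaries:

```python
# Hypothetical shard map: each shard owns a contiguous customer-ID range.
SHARDS = [
    (0, 100_000, "db-host-1.example.com"),
    (100_000, 200_000, "db-host-2.example.com"),
]

def shard_for(customer_id: int) -> str:
    """Return the database host that owns this customer's rows."""
    for lo, hi, host in SHARDS:
        if lo <= customer_id < hi:
            return host
    raise KeyError(f"no shard owns customer {customer_id}")

print(shard_for(500))  # -> db-host-1.example.com, per the example in the video
```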
You've just reduced the size of the table, obviously, but now you've complicated the client, because the client has to be aware of the shards. It's like: okay, I'm searching for customer number 500 — which shard should I hit? Oh, you hit shard one, because that's where it lives, right? And going down deep into that shard, there are partitions of that table, and going down into each partition, there are indexes — or an index. And now you've basically narrowed the billions down to maybe a few thousand or a few hundred thousand rows, which is pretty good. So that's the idea of what we do: shard, partition, then index, and then find exactly the row we're looking for. That's the idea of limiting what we have to work with.

And the final thing — as I started thinking about it — is: okay, maybe we can avoid all of this altogether. Why do you have a billion-row table to begin with? That comes down to the database design, which was mine in that case, right? It's like: okay, maybe it's not a good idea to have a table so large. Can we solve the problem so that I don't need a billion-row table at all?

In the case of the Twitter following example, we might actually be able to — I still haven't completed the thought yet — but if you have a profile table that says this is my ID, this is my name, this is my picture, we can add a field called followers count. It's an integer; we'll come back to that. Now there's another field: most relational databases support JSON these days, so we can put a JSON field there, or a list field, and store your followers in your profile. So now we don't have a relational table that tells you this guy is following this guy, this guy is following this gal. Now we have one profile, and if you want to get your followers, you go to your profile, fetch that row, and pull that information — it's all in the profile.

But every time someone follows you, or you follow someone, the hit is now on the write path. If I want to write "hey, someone just followed me," I need to update those two fields: bump the followers count and append to the followers list — there's a sketch of this below.
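Here's a minimal sketch of that denormalized design, assuming a hypothetical profiles table with the followers stored as a JSON array (sqlite just to illustrate; a production system would use a database with first-class JSON support and make this read-modify-write atomic):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE profiles (
        user_id         INTEGER PRIMARY KEY,
        name            TEXT,
        followers_count INTEGER NOT NULL DEFAULT 0,
        followers       TEXT NOT NULL DEFAULT '[]'  -- JSON array of user IDs
    )
""")
conn.execute("INSERT INTO profiles (user_id, name) VALUES (1, 'Hussein')")

def add_follower(followee_id: int, follower_id: int) -> None:
    """The write path: one row, two fields updated together."""
    (raw,) = conn.execute(
        "SELECT followers FROM profiles WHERE user_id = ?", (followee_id,)
    ).fetchone()
    followers = json.loads(raw)
    followers.append(follower_id)
    conn.execute(
        "UPDATE profiles SET followers = ?, "
        "followers_count = followers_count + 1 WHERE user_id = ?",
        (json.dumps(followers), followee_id),
    )
    conn.commit()

add_follower(1, 42)  # user 42 follows user 1
```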
I didn't have this problem in the first design, but the first design wouldn't scale as well as this one, in my opinion. Now we start worrying about write throughput — I don't want to go deep into that stuff here, but we can use message queues: okay, let's write it asynchronously and update the count later. Yeah, there will be a little bit of delay — who cares? It's a follower count anyway. We'll pick the update off the queue and slowly apply these things. So we have many ways to solve a problem.

To summarize the whole video: instead of working with the whole billion-row table — actually, let me flip the order a little bit. First, try to avoid having a billion-row table to begin with; I kind of said that last, but it should come first. Second, if you can't avoid it, can you index it? You obviously need to know which field to index. Then, can you partition it, on the same table, on the same disk, so the pieces are smaller? And if indexing and partitioning aren't enough — and ask yourself whether you really need to — shard it, so the pieces get smaller still, spread across multiple hosts, keeping in mind that if a host dies, that's a problem. So you partition it horizontally, essentially across multiple databases: shards. That creates complexity, which I try to avoid; I talked about that a little bit. And then finally, if you can't do any of that stuff, do a MapReduce-style job: run your work in parallel so you process the billion rows concurrently with a massive army of machines, if possible. If your database is transactional, though, that's kind of pointless, because the moment you set that army loose searching or working on your huge table — partitioned, spliced — the results go stale as soon as you start, because people are editing, people are changing things all the time.

All right, guys, I'm going to see you in the next one. You guys stay awesome. Goodbye.