1 00:00:00,330 --> 00:00:08,550 Row versus column oriented database storages are two styles in which databases store their 2 00:00:08,550 --> 00:00:13,130 tables on disk. Each has pros and cons. 3 00:00:13,440 --> 00:00:14,650 Nothing is perfect. 4 00:00:14,670 --> 00:00:17,130 There are use cases for column oriented. 5 00:00:17,220 --> 00:00:21,110 There are use cases for row oriented databases. 6 00:00:22,020 --> 00:00:29,790 In this video, I want to discuss what a row oriented database is, also called a row store database, 7 00:00:29,790 --> 00:00:31,920 because that's the storage type. 8 00:00:31,920 --> 00:00:35,550 How is it stored on disk? Right. Now, 9 00:00:35,610 --> 00:00:41,940 I'm going to also describe the column oriented database, also known as a column store. 10 00:00:42,270 --> 00:00:46,290 Another fancy name is columnar database, because, 11 00:00:46,290 --> 00:00:50,700 yeah, why not just confuse everybody by inventing new names? 12 00:00:51,030 --> 00:00:52,500 So it's all the same thing. 13 00:00:52,950 --> 00:00:56,280 We'll see how things are stored on disk. 14 00:00:56,280 --> 00:00:58,200 We're going to talk about the differences too. 15 00:00:58,410 --> 00:01:04,740 And then finally we're going to discuss the pros and cons of both. 16 00:01:05,110 --> 00:01:06,480 How about we jump into it, guys? 17 00:01:06,600 --> 00:01:07,020 All right. 18 00:01:07,020 --> 00:01:09,420 So let's say we have a beautiful table like this. 19 00:01:10,140 --> 00:01:15,930 It's an employees table. It doesn't only have eight rows, it has thousands and thousands of rows, 20 00:01:16,560 --> 00:01:18,330 but because I'm lazy we only have these. 21 00:01:18,360 --> 00:01:19,290 These are the columns. 22 00:01:20,340 --> 00:01:26,520 Straightforward. And the first thing I want to bring attention to is this: this is the row ID. 
This is how 23 00:01:26,910 --> 00:01:33,720 databases work: it's a unique identifier that identifies the row. It's different from the primary key or this 24 00:01:33,720 --> 00:01:40,260 ID. It's just something that, most of the time, most databases have: this internal mechanism to identify 25 00:01:40,260 --> 00:01:40,820 the row. 26 00:01:41,520 --> 00:01:43,230 And I want you to pay attention to this. 27 00:01:43,350 --> 00:01:50,970 So if I'm going to submit a bunch of queries on this table, I want to do this without any indexes because 28 00:01:50,970 --> 00:01:53,970 indexes will just complicate the matter. 29 00:01:53,970 --> 00:01:56,990 And by the way, when we think about column versus row, 30 00:01:57,050 --> 00:02:00,780 again, I want to take indexes out of the equation and think about it as if 31 00:02:01,080 --> 00:02:03,030 we just don't have any indexes. 32 00:02:03,060 --> 00:02:07,860 We're just scanning the table to try to answer the following questions. 33 00:02:08,100 --> 00:02:12,240 So these are the queries that I'm going to execute on our database. 34 00:02:12,360 --> 00:02:14,580 And then I'm going to execute the same exact queries 35 00:02:14,580 --> 00:02:17,070 on the column database and see which one does best, 36 00:02:17,670 --> 00:02:22,590 for each style of how we're going to store these tables. 37 00:02:22,600 --> 00:02:22,970 Right. 38 00:02:22,980 --> 00:02:28,470 The same table we're going to store in row storage, the same table we're going to store in column storage, and 39 00:02:28,470 --> 00:02:30,800 then we're going to look at how these queries perform. 40 00:02:30,810 --> 00:02:35,640 So: select first name from employee where Social Security number equals 666, the devil. 41 00:02:36,090 --> 00:02:41,310 And then we're going to select star from employee where ID equals one. Very simple again. 42 00:02:41,610 --> 00:02:43,200 And the final one is a select sum. 
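As a rough illustration, the three queries read out above could be written and run like this. It is only a sketch: SQLite is used because it ships with Python, and the table name, column names, and sample rows are assumptions made up for the example, not the table from the video.

```python
import sqlite3

# Made-up sample data; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER, first_name TEXT, last_name TEXT, "
    "ssn INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Smith", 111, 101),
        (2, "Melissa", "Jones", 222, 102),
        (6, "Paul", "Brown", 666, 106),
    ],
)

# Query 1: one column, filtered on another column.
first_name = conn.execute(
    "SELECT first_name FROM employee WHERE ssn = 666"
).fetchall()

# Query 2: every column of one row.
whole_row = conn.execute("SELECT * FROM employee WHERE id = 1").fetchall()

# Query 3: an aggregate over a single column.
total = conn.execute("SELECT SUM(salary) FROM employee").fetchone()[0]
```

The rest of the walk-through is about how many disk reads each of these three shapes of query costs under the two storage layouts.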
43 00:02:43,590 --> 00:02:46,560 We're just going to sum all the salaries of the employees for some reason. 44 00:02:48,720 --> 00:02:51,180 So the first one: row oriented databases. 45 00:02:51,600 --> 00:02:58,440 So for row oriented databases, tables are stored as rows on disk. 46 00:02:58,950 --> 00:03:02,280 And this might sound a little bit confusing when you first read it. 47 00:03:02,610 --> 00:03:07,260 Of course, Hussein, how else are you going to store stuff in the database? You store it row by row. 48 00:03:07,260 --> 00:03:07,670 All right. 49 00:03:07,950 --> 00:03:15,600 So if you think of the disk as a blob, a huge blob store of just a bunch of 50 00:03:15,600 --> 00:03:22,250 bits and bytes, then the rows are just contiguous, one after the other. 51 00:03:22,260 --> 00:03:27,750 So the ID followed by the first name, followed by the last name, followed by the 52 00:03:27,750 --> 00:03:31,500 Social Security number and so on, and then a comma, and then the second row starts. 53 00:03:31,500 --> 00:03:31,960 Right. 54 00:03:32,010 --> 00:03:33,030 And then so on. 55 00:03:33,030 --> 00:03:35,520 Each row obviously has a different size. 56 00:03:35,640 --> 00:03:38,030 It could be variable size because, I don't know, 57 00:03:38,220 --> 00:03:42,570 say one name is John Smith and someone else is named Hussein Mohamed. 58 00:03:43,110 --> 00:03:43,360 Right. 59 00:03:43,360 --> 00:03:44,460 So that's a longer name. 60 00:03:44,460 --> 00:03:47,040 So it's going to occupy more storage on disk, obviously. 61 00:03:47,190 --> 00:03:49,920 So the rows can vary, but for simplicity, 62 00:03:49,920 --> 00:03:52,070 just think of them as fixed-size rows. 63 00:03:52,530 --> 00:03:52,910 All right. 64 00:03:53,220 --> 00:03:56,760 When I want to read a particular row, 
65 00:03:58,030 --> 00:04:08,920 I do an operation on disk called an I/O, a read. When I read this stuff, the controller, whether this 66 00:04:08,920 --> 00:04:16,230 is an SSD drive or a hard disk, reads in blocks. It doesn't read bit by bit, 67 00:04:16,240 --> 00:04:19,350 like zero one zero one. It doesn't read like that. 68 00:04:19,360 --> 00:04:20,160 It reads a block. 69 00:04:20,200 --> 00:04:28,030 So let's say 512 bytes at a time, or, I don't know, 1,024 bytes at a 70 00:04:28,030 --> 00:04:29,920 time. It depends on the block size. 71 00:04:29,930 --> 00:04:31,270 There's something called block size. 72 00:04:31,570 --> 00:04:37,410 And once you read that, you're going to get not only one row. You could get one, two, three, four, 73 00:04:37,420 --> 00:04:39,550 five rows in a given block. 74 00:04:39,550 --> 00:04:42,340 So the act of doing one I/O 75 00:04:42,550 --> 00:04:48,970 can give you five rows, and not just these five rows: five rows and all their columns, all their 76 00:04:48,970 --> 00:04:49,440 values. 77 00:04:49,900 --> 00:04:50,200 So. 78 00:04:50,350 --> 00:04:53,290 So whether you want them or not, you're going to get everything. 79 00:04:53,320 --> 00:04:54,560 You're going to get all the columns. 80 00:04:54,610 --> 00:04:56,490 So think about that. 81 00:04:56,920 --> 00:05:02,450 So technically, you need more I/Os to find the row you're looking for. 82 00:05:03,070 --> 00:05:08,020 So imagine you're doing a sequential scan on the entire table. 83 00:05:08,320 --> 00:05:16,810 So if you're looking for the employee with Social Security number 666, you're going to have to do a block 84 00:05:16,810 --> 00:05:18,760 I/O. You get, I don't know, seven rows. 85 00:05:19,060 --> 00:05:21,430 Then you search these rows one by one. 86 00:05:21,430 --> 00:05:25,450 You're only interested in the Social Security number, so you're only going to pick that field. 
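The point about block reads can be sketched in a few lines. This is only an illustration under stated assumptions: the "disk" is a bytes buffer, rows are fixed size, and the row format, `BLOCK_SIZE`, `build_disk`, and `read_block` are all invented for the example.

```python
import struct

# Hypothetical fixed-size row: id, ssn, salary as 4-byte ints plus a 20-byte name.
ROW = struct.Struct("<iii20s")
BLOCK_SIZE = 512  # bytes returned by one read; real block sizes vary
ROWS_PER_BLOCK = BLOCK_SIZE // ROW.size


def build_disk(rows):
    """Pack rows back to back, padding the last block up to BLOCK_SIZE."""
    disk = bytearray()
    for start in range(0, len(rows), ROWS_PER_BLOCK):
        chunk = b"".join(ROW.pack(*r) for r in rows[start:start + ROWS_PER_BLOCK])
        disk += chunk.ljust(BLOCK_SIZE, b"\0")
    return bytes(disk)


def read_block(disk, block_no):
    """One I/O: returns every row in the block, with every column."""
    raw = disk[block_no * BLOCK_SIZE:(block_no + 1) * BLOCK_SIZE]
    rows = []
    for i in range(ROWS_PER_BLOCK):
        rec = ROW.unpack_from(raw, i * ROW.size)
        if rec[0] != 0:  # id 0 marks padding in this sketch
            rows.append(rec[:3] + (rec[3].rstrip(b"\0").decode(),))
    return rows


# Twenty made-up employees: (id, ssn, salary, name).
employees = [(i, 111 * i, 100 + i, f"emp{i}".encode()) for i in range(1, 21)]
disk = build_disk(employees)
block0 = read_block(disk, 0)  # a single read hands back many complete rows
```

Even if the caller only cares about one field, `read_block` has no way to return less than the whole block.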
87 00:05:25,610 --> 00:05:28,060 Unfortunately, you got all the fields. 88 00:05:28,480 --> 00:05:29,770 Not saying that's bad. 89 00:05:29,770 --> 00:05:30,580 Could be bad. 90 00:05:30,580 --> 00:05:31,150 Could be good. 91 00:05:31,150 --> 00:05:34,750 Depends what you're doing, but that's what you have to work with. 92 00:05:34,960 --> 00:05:42,190 So you have to do a lot of I/Os to actually get to the row that you are looking for because 93 00:05:42,190 --> 00:05:49,270 you spent a lot of wasteful reads pulling columns that you didn't need, and you don't have a choice. 94 00:05:49,720 --> 00:05:49,970 Right. 95 00:05:49,990 --> 00:05:55,120 So that's just one thing you have to think about with row oriented databases. 96 00:05:55,120 --> 00:05:56,920 But no, they are beautiful. 97 00:05:56,920 --> 00:05:59,980 They are elegantly designed because they are so simple. 98 00:06:00,010 --> 00:06:00,850 They are just rows. 99 00:06:01,180 --> 00:06:03,820 And you can easily think about them. 100 00:06:03,820 --> 00:06:07,750 You can easily implement them compared to what's going to come up next. 101 00:06:08,200 --> 00:06:09,550 So let's dive deep into this. 102 00:06:09,580 --> 00:06:14,680 So we're going to take our table and then we're going to convert it into a row-based layout on disk. 103 00:06:14,680 --> 00:06:16,570 What will this look like? 104 00:06:16,900 --> 00:06:17,620 Let's take a look. 105 00:06:18,730 --> 00:06:23,820 This is how it's going to look. So imagine this gray box as a block, right? 106 00:06:24,070 --> 00:06:28,480 And for simplicity, I put two rows in the same block. 107 00:06:28,870 --> 00:06:29,470 Could be three. 108 00:06:29,470 --> 00:06:30,010 Could be four. 109 00:06:30,010 --> 00:06:30,630 Could be five. 110 00:06:30,820 --> 00:06:31,620 Who knows, right? 111 00:06:31,630 --> 00:06:32,710 Depends on the block size. 
112 00:06:32,710 --> 00:06:43,150 But think about this as a given block. Each gray box takes one I/O, one trip to the controller, to 113 00:06:43,150 --> 00:06:44,860 pull that block. 114 00:06:45,040 --> 00:06:47,740 So once you ask for this block, 115 00:06:47,950 --> 00:06:49,300 you got everything, son. 116 00:06:49,390 --> 00:06:50,800 You got everything, right? 117 00:06:51,040 --> 00:06:53,040 So now this is how it is stored. 118 00:06:53,050 --> 00:06:54,100 Basically the row, 119 00:06:54,310 --> 00:06:55,270 comma, row, 120 00:06:55,270 --> 00:06:55,630 comma. 121 00:06:55,630 --> 00:06:58,600 It's just literally consecutive, right? 122 00:06:58,600 --> 00:06:59,020 Values. 123 00:06:59,020 --> 00:07:07,720 But I just added a comma for you guys to understand: that's the ID, first name, last name, SSN, then the salary, 124 00:07:08,050 --> 00:07:11,950 birth date, I guess, then the job, the occupation, and so on. 125 00:07:11,980 --> 00:07:17,370 And so you can see that. And this, I added this so it can indicate the second row. 126 00:07:17,860 --> 00:07:19,660 It doesn't really exist. 127 00:07:19,660 --> 00:07:22,510 It's just an identifier to show that, 128 00:07:22,510 --> 00:07:27,820 oh, this is actually a second row that just started and a row just ended. 129 00:07:27,820 --> 00:07:31,720 And you can see that these are the two rows. Two rows, two rows. 130 00:07:32,710 --> 00:07:37,860 So let's execute our first query against a row oriented database. 131 00:07:38,710 --> 00:07:46,920 I'm looking for the first name of the employee whose Social Security number is 666. 132 00:07:47,470 --> 00:07:47,890 All right. 133 00:07:48,310 --> 00:07:49,030 So let's pull it. 134 00:07:50,830 --> 00:07:51,410 What do we do? 
135 00:07:52,000 --> 00:07:57,040 I have to start from the top, right, because I don't know where employee 136 00:07:57,040 --> 00:08:00,460 666 is. 137 00:08:00,760 --> 00:08:07,510 This field doesn't have any relation to the row itself. It's just a random value. 138 00:08:07,680 --> 00:08:08,010 Right. 139 00:08:08,290 --> 00:08:10,960 So I have to pull the first block of the table. 140 00:08:11,590 --> 00:08:14,230 Does 666 exist in this? Nope. 141 00:08:15,340 --> 00:08:20,800 This is 222, and this guy is 111. That's the Social Security number. So let's pull the second 142 00:08:20,800 --> 00:08:22,050 block, right? 143 00:08:23,230 --> 00:08:24,430 Nope, it doesn't exist. 144 00:08:24,460 --> 00:08:26,370 This is 444, 333, right? 145 00:08:26,650 --> 00:08:29,620 And yeah, it might look sorted, but it doesn't have to be sorted. 146 00:08:29,620 --> 00:08:34,300 Guys, I'm just walking through the idea with you. 147 00:08:34,840 --> 00:08:35,070 Right. 148 00:08:35,080 --> 00:08:36,430 Again, no indexes, nothing. 149 00:08:37,800 --> 00:08:38,760 Now, I'm going to 150 00:08:39,740 --> 00:08:46,700 pull the next block. So I did three reads and then I happen to find that 666. 151 00:08:47,060 --> 00:08:49,890 So I found what I'm looking for. Now 152 00:08:50,000 --> 00:08:52,570 I want the first name. 153 00:08:52,940 --> 00:09:00,470 For that, I don't need to do any extra read. Why? Because the first name is right there in memory 154 00:09:00,470 --> 00:09:06,770 now, because once I pull this block, it's in memory, it's in the RAM, and when I ask for it, I don't 155 00:09:06,770 --> 00:09:09,580 have to go back to the disk because, yeah, guess what, I just pulled it. 156 00:09:09,710 --> 00:09:10,750 I found it. 157 00:09:10,760 --> 00:09:12,470 Now I just say this is the first name. 
158 00:09:12,480 --> 00:09:17,030 And I know because of the position: position number two is the first name, so I pull it. 159 00:09:17,330 --> 00:09:25,700 So once I find something, asking for extra columns is really cheap because we already got them, 160 00:09:26,390 --> 00:09:27,500 right? Again, 161 00:09:27,510 --> 00:09:28,820 guys, you have to think about this. 162 00:09:28,820 --> 00:09:34,400 Whatever I'm going to explain to you, there is no bad or good. It depends on your use case, depending 163 00:09:34,400 --> 00:09:36,110 on your query. 164 00:09:36,110 --> 00:09:37,670 That depends on your clauses. 165 00:09:37,940 --> 00:09:40,610 And based on that, you pick row versus column. 166 00:09:40,700 --> 00:09:41,410 I'm going to come to that. 167 00:09:41,660 --> 00:09:42,720 So that's the first query. 168 00:09:43,760 --> 00:09:44,690 Let's do another query. 169 00:09:46,020 --> 00:09:46,960 Is there another one? 170 00:09:47,790 --> 00:09:50,100 No, I guess it executed immediately. 171 00:09:50,520 --> 00:09:55,410 All right, so we're going to do a select star from employee where ID equals one, right? 172 00:09:55,740 --> 00:10:03,630 And since it's ID equals one, if this is a sequence, the database can do tricks where it 173 00:10:03,630 --> 00:10:10,140 can link this sequence with that row ID and say, OK, ID one is on row 1001. 174 00:10:10,140 --> 00:10:16,410 So I know exactly where to jump and pull this row, this block. 175 00:10:16,740 --> 00:10:17,040 Right. 176 00:10:17,130 --> 00:10:21,480 It doesn't have to read other things, but it happens to be that the first block is what we're 177 00:10:21,480 --> 00:10:22,040 looking for. 178 00:10:22,050 --> 00:10:24,150 So yeah, we're lucky, I guess. 179 00:10:24,510 --> 00:10:29,790 But even if we don't have this trick where we link the ID to the row ID, we still have to go through 180 00:10:29,790 --> 00:10:30,480 each block. 
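The two row-store lookups walked through so far can be sketched as a plain sequential scan that counts block reads. The rows, names, and the `scan` helper are made up for illustration; the only things taken from the walk-through are two rows per block, no indexes, and SSN 666 sitting in the third block.

```python
# Made-up rows as (row_id, id, first_name, last_name, ssn, salary).
ROWS = [
    (1001, 1, "John", "Smith", 111, 101),
    (1002, 2, "Melissa", "Jones", 222, 102),
    (1003, 3, "Rick", "Lee", 333, 103),
    (1004, 4, "Hussein", "Ali", 444, 104),
    (1005, 5, "Sara", "Khan", 555, 105),
    (1006, 6, "Paul", "Brown", 666, 106),
    (1007, 7, "Nina", "Diaz", 777, 107),
    (1008, 8, "Omar", "Said", 888, 108),
]
ROWS_PER_BLOCK = 2  # two rows per block, as in the diagram
BLOCKS = [ROWS[i:i + ROWS_PER_BLOCK] for i in range(0, len(ROWS), ROWS_PER_BLOCK)]
COL = {"row_id": 0, "id": 1, "first_name": 2, "last_name": 3, "ssn": 4, "salary": 5}


def scan(where_col, value):
    """Sequential scan without indexes: returns (matching row, block reads)."""
    reads = 0
    for block in BLOCKS:  # each iteration stands for one block I/O
        reads += 1
        for row in block:  # the whole row is now in memory
            if row[COL[where_col]] == value:
                return row, reads
    return None, reads


# Query 1: SELECT first_name ... WHERE ssn = 666
row, reads = scan("ssn", 666)
first_name = row[COL["first_name"]]  # already in memory, no extra read

# Query 2: SELECT * ... WHERE id = 1
star_row, star_reads = scan("id", 1)  # all columns come along with the block
```

In this layout the cost is driven by how many blocks sit in front of the match, not by how many columns the query asks for.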
181 00:10:30,480 --> 00:10:34,050 And once we find the block we want, guess what? 182 00:10:34,680 --> 00:10:36,420 Now I want all the columns. 183 00:10:37,550 --> 00:10:38,240 Can I do this? 184 00:10:38,540 --> 00:10:39,320 Absolutely. 185 00:10:39,680 --> 00:10:48,530 That is so cheap because all the columns are already in memory, assuming they are not vertically partitioned 186 00:10:48,530 --> 00:10:50,600 into another location. 187 00:10:50,600 --> 00:10:52,280 Right, vertical partitioning. 188 00:10:52,310 --> 00:10:55,970 I've talked about that many times on my channel and in my courses. 189 00:10:56,330 --> 00:11:00,250 So vertical partitioning is where we take a column and put it somewhere else. 190 00:11:00,260 --> 00:11:00,550 Right. 191 00:11:00,740 --> 00:11:03,970 This is not the case here, where everything is in one beautiful block. 192 00:11:03,980 --> 00:11:10,670 So I pulled the block, I asked for the ID, I found the ID, and now the user is asking for every 193 00:11:10,670 --> 00:11:11,020 field. 194 00:11:11,090 --> 00:11:15,350 I always tell people that, hey, select star is bad, guys, stay away from it. 195 00:11:15,350 --> 00:11:18,080 But it really depends what you're trying to do. 196 00:11:18,080 --> 00:11:22,310 If you have indexes, try to do less of that stuff. 197 00:11:22,370 --> 00:11:27,260 Next, we're going to learn that it's the worst thing you can do in a column oriented database. 198 00:11:27,500 --> 00:11:29,020 So select star in a row store, 199 00:11:29,960 --> 00:11:31,400 you can live with it, I guess. 200 00:11:31,400 --> 00:11:31,880 Still, 201 00:11:32,960 --> 00:11:37,370 just be aware of that. So that's cheap, that's relatively cheap, because I'm going to pull all 202 00:11:37,370 --> 00:11:42,080 the rows and everything is in memory. Beautiful. Assuming it fits in memory. 203 00:11:42,080 --> 00:11:45,480 So could it be only one block? 
204 00:11:45,500 --> 00:11:50,500 Not necessarily, because a row can span multiple blocks if it's too huge. 205 00:11:50,510 --> 00:11:57,920 Say we have, I don't know, some blob field here. It's a bad idea to store blobs inline 206 00:11:57,920 --> 00:12:01,280 anyway, but let's assume you do. 207 00:12:01,280 --> 00:12:06,680 That means this row can span seven blocks, for example. 208 00:12:06,860 --> 00:12:09,830 So do you need to read all the blocks? 209 00:12:09,830 --> 00:12:10,460 Not necessarily. 210 00:12:10,460 --> 00:12:11,350 The databases are smart. 211 00:12:11,540 --> 00:12:12,700 It's going to read the first block, 212 00:12:12,710 --> 00:12:15,020 find the ID, and only for 213 00:12:15,020 --> 00:12:19,970 the columns you asked for is it going to ask for the rest of the blocks to fetch your stuff. 214 00:12:20,210 --> 00:12:21,850 Databases have been doing this for years. 215 00:12:21,860 --> 00:12:23,120 They know what they're doing. 216 00:12:23,120 --> 00:12:27,220 But I just want to explain how it's doing this stuff. 217 00:12:27,680 --> 00:12:28,090 Awesome. 218 00:12:29,060 --> 00:12:30,380 So let's do the same thing, 219 00:12:30,380 --> 00:12:33,530 but with an aggregate. This is called an aggregate function. What is an aggregate? 220 00:12:33,530 --> 00:12:36,080 It's just grouping by something. 221 00:12:36,080 --> 00:12:36,350 Right. 222 00:12:36,500 --> 00:12:42,410 So you work on one or a few columns, mainly one or so. 223 00:12:42,750 --> 00:12:45,440 You don't work with all of them. That's just 224 00:12:48,590 --> 00:12:49,140 the idea. 225 00:12:49,420 --> 00:12:53,520 OK, so you work with a few columns or, let's say, one in this example. 226 00:12:53,660 --> 00:12:56,960 OK, so select sum of salary from employee. 227 00:12:56,960 --> 00:12:57,740 Just sum 228 00:12:57,740 --> 00:12:58,400 every salary. 229 00:12:59,850 --> 00:13:00,510 Let's do this. 
230 00:13:01,540 --> 00:13:02,840 So we pick the first block. 231 00:13:03,700 --> 00:13:06,790 Well, tough luck. I only wanted the salary. Tough luck, 232 00:13:06,820 --> 00:13:08,350 I got everything. Again, 233 00:13:08,380 --> 00:13:10,050 no indexes here. With indexes, 234 00:13:10,060 --> 00:13:15,820 this could be extremely fast, but I'm just telling you what the database does without indexes. 235 00:13:16,060 --> 00:13:17,560 So you pull this. OK, 236 00:13:17,710 --> 00:13:18,430 pick the salary. 237 00:13:18,430 --> 00:13:21,710 It's 101 and 102. 238 00:13:21,730 --> 00:13:23,160 So we sum those two. 239 00:13:23,500 --> 00:13:26,520 Now let's pull the next block so we get more rows. 240 00:13:26,710 --> 00:13:32,260 OK, 103, 104. OK, sum them. 241 00:13:32,470 --> 00:13:36,330 And then pull 105, 106, sum them. 242 00:13:36,910 --> 00:13:45,220 So guys, every block we pull, we're pulling row IDs, we're pulling first names and last names, 243 00:13:45,220 --> 00:13:49,050 and we're pulling everything else in there, which we never use. 244 00:13:49,540 --> 00:13:49,890 Right. 245 00:13:50,020 --> 00:13:57,280 So imagine if I only asked for the salaries and they were grouped nicely, that 246 00:13:57,280 --> 00:13:58,560 would be awesome, wouldn't it? 247 00:13:59,080 --> 00:14:03,820 But now, unfortunately, we're asking for salary, but we're pulling all the rows. 248 00:14:03,820 --> 00:14:05,360 So that's a lot of I/O. 249 00:14:05,410 --> 00:14:07,930 So again, this is not just one read. 250 00:14:08,380 --> 00:14:09,940 It can appear in this 251 00:14:10,210 --> 00:14:10,930 diagram as one read. 252 00:14:11,110 --> 00:14:14,110 If the row is long, you can read seven blocks. 253 00:14:14,110 --> 00:14:15,880 So that's seven I/Os, right? 
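A small sketch of the summing walk-through above, counting how much data a row store pulls versus how much the query actually uses. The rows are generated placeholders and `sum_salary` is a made-up helper; two rows per block follows the diagram described in the talk.

```python
# Placeholder rows as (row_id, id, first_name, last_name, ssn, salary).
ROWS = [(1000 + i, i, f"first{i}", f"last{i}", 111 * i, 100 + i) for i in range(1, 9)]
ROWS_PER_BLOCK = 2
BLOCKS = [ROWS[i:i + ROWS_PER_BLOCK] for i in range(0, len(ROWS), ROWS_PER_BLOCK)]
SALARY = 5  # position of the salary field inside a row


def sum_salary(blocks):
    """SELECT SUM(salary) over a row store: every block has to be read."""
    reads = fields_pulled = total = 0
    for block in blocks:
        reads += 1                    # one I/O per block
        for row in block:
            fields_pulled += len(row)  # every column comes along
            total += row[SALARY]       # only this one is used
    return total, reads, fields_pulled


total, reads, fields_pulled = sum_salary(BLOCKS)
fields_used = len(ROWS)  # one salary per row
```

With these placeholder rows the scan pulls six fields per row and uses one of them, which is the waste being described.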
254 00:14:16,120 --> 00:14:22,960 I mean, databases can do a trick where they send one request and asynchronously read multiple 255 00:14:22,960 --> 00:14:23,410 blocks. 256 00:14:23,410 --> 00:14:31,660 But regardless, you're hitting the disk, and every time you hit the disk, a duck dies, essentially. 257 00:14:31,690 --> 00:14:31,960 Right. 258 00:14:31,960 --> 00:14:34,930 So try to save ducks as much as possible. 259 00:14:34,930 --> 00:14:36,060 Save the ducks, guys. 260 00:14:36,130 --> 00:14:36,820 Save the ducks. 261 00:14:37,330 --> 00:14:39,160 So yeah. 262 00:14:39,340 --> 00:14:42,370 Reading, reading, and summing, summing, all of that stuff. 263 00:14:42,970 --> 00:14:45,190 That's not so bad, I guess. 264 00:14:45,190 --> 00:14:45,400 Yeah. 265 00:14:45,790 --> 00:14:48,370 But you read unnecessary information. 266 00:14:48,370 --> 00:14:51,550 That means you almost read the entire table 267 00:14:53,080 --> 00:14:55,180 but used a very, 268 00:14:56,780 --> 00:15:03,190 very small portion of it. That's extremely inefficient. So row storage doesn't do well with this kind of query, 269 00:15:03,190 --> 00:15:08,600 if you think about it. It doesn't mean it's slow. It really depends what you mean by slow. 270 00:15:08,890 --> 00:15:14,590 I mean, these kinds of databases do all kinds of tricks to be fast. 271 00:15:14,590 --> 00:15:20,770 But if you think about the logic, and I'm not saying this is how databases absolutely work, they have 272 00:15:20,770 --> 00:15:21,640 all sorts of tricks. 273 00:15:21,640 --> 00:15:23,010 Again, multithreading. 274 00:15:23,110 --> 00:15:27,460 They send multiple threads to read asynchronously. 275 00:15:27,460 --> 00:15:30,490 So they do all sorts of work to be fast. 276 00:15:30,820 --> 00:15:34,750 But let's discuss how column oriented databases work. 277 00:15:35,530 --> 00:15:36,910 So, column oriented databases. 
278 00:15:36,910 --> 00:15:41,480 If you think about them, tables are stored as columns first on disk. 279 00:15:41,500 --> 00:15:43,090 So think about 280 00:15:44,070 --> 00:15:45,270 the first name column, 281 00:15:46,670 --> 00:15:48,670 the ID column, the last name column. 282 00:15:48,830 --> 00:15:53,450 So what they do is they take the first name and they take all possible values, 283 00:15:54,170 --> 00:15:56,240 John, Melissa, 284 00:15:58,600 --> 00:16:08,080 Rick, Paul, Hussein, everybody, and put all of these in and save them on disk as consecutive 285 00:16:08,080 --> 00:16:13,990 values. And then once that column is done and the last value of the first name is done, they 286 00:16:13,990 --> 00:16:17,830 start with the second one, last name, and so on. 287 00:16:18,250 --> 00:16:20,740 So you might say, oh, why are we doing this? 288 00:16:20,740 --> 00:16:21,710 And we're going to come to that. 289 00:16:22,390 --> 00:16:31,740 So a single block I/O read to the table fetches multiple columns with all matching rows. 290 00:16:32,290 --> 00:16:35,080 So it fetches a column. 291 00:16:35,990 --> 00:16:42,920 It could fetch one column or multiple, it depends. Like if you have a small table, you can 292 00:16:42,920 --> 00:16:43,790 fetch multiple ones. 293 00:16:43,880 --> 00:16:46,250 If you ask the database, hey, give me this, 294 00:16:46,430 --> 00:16:48,110 you're going to get a lot of rows. 295 00:16:48,710 --> 00:16:50,140 That's the trick here. 296 00:16:50,450 --> 00:16:53,660 So a single column with a lot of rows. Fewer 297 00:16:53,660 --> 00:16:58,910 I/Os are required to get more values of a given column. 298 00:16:59,630 --> 00:17:00,530 Right, because 
299 00:17:01,600 --> 00:17:09,520 if you want to get, as we said, the first hundred salaries, that is extremely efficient in column 300 00:17:09,520 --> 00:17:15,970 oriented because they just go to the place where we know the salary column starts and just walk 301 00:17:15,970 --> 00:17:18,960 through there, because the values are just consecutive. 302 00:17:19,000 --> 00:17:23,620 We're going to go to the example. And they are great for online analytical processing, as we're going to see 303 00:17:23,620 --> 00:17:24,910 in the coming slides. 304 00:17:25,540 --> 00:17:25,840 All right. 305 00:17:25,840 --> 00:17:29,370 So let's take the same exact table and store it as columns. 306 00:17:29,380 --> 00:17:30,250 Here's how it looks. 307 00:17:30,250 --> 00:17:32,290 Again, the row ID is very critical here. 308 00:17:32,800 --> 00:17:34,270 The ID field is this. 309 00:17:34,450 --> 00:17:36,430 This is the first name, then the last name. 310 00:17:36,430 --> 00:17:39,760 I keep saying second name. Social Security number. 311 00:17:40,060 --> 00:17:44,440 This is the salary, the date of birth, the title, I think. 312 00:17:44,440 --> 00:17:45,610 And this is the join date. 313 00:17:46,030 --> 00:17:46,720 Look at this. 314 00:17:47,050 --> 00:17:48,490 So look at this. 315 00:17:48,770 --> 00:17:53,230 We have the ID one and this, which is row 1001. 316 00:17:53,590 --> 00:17:57,400 ID three, row 1003. 317 00:17:57,520 --> 00:17:59,650 Why? We're going to see. Look at this. 318 00:17:59,830 --> 00:18:02,140 The row ID is duplicated in every column. 319 00:18:03,620 --> 00:18:10,310 So just like that, what is the first thing that comes to mind? Changing anything is going to be painful. 320 00:18:11,390 --> 00:18:19,510 Because if you delete row 1004, you have to boop, boop, boop, boop, boop, boop, 
321 00:18:20,630 --> 00:18:25,550 you have to go and mark all these stupid columns, all these blocks. 322 00:18:25,560 --> 00:18:26,780 Remember, these are blocks, right? 323 00:18:27,140 --> 00:18:33,680 So, and for fun, I split some of them into multiple blocks. It doesn't mean anything, like, I know this is 324 00:18:33,680 --> 00:18:35,690 an integer, for example, or a string. 325 00:18:36,050 --> 00:18:43,910 I just split them to show that, OK, we couldn't store all four rows here in one block, so we had 326 00:18:43,910 --> 00:18:45,320 to split it into another block. 327 00:18:45,320 --> 00:18:45,580 Right. 328 00:18:45,590 --> 00:18:47,660 So I can show you different kinds of examples. 329 00:18:48,110 --> 00:18:49,850 Again, take it with a grain of salt. 330 00:18:49,850 --> 00:18:51,790 I just want to explain these things to you. 331 00:18:51,920 --> 00:18:57,530 So this is one block, two blocks, two blocks, two blocks, one block, one block, one block, one block. 332 00:18:57,540 --> 00:18:58,460 OK, got it. 333 00:18:59,600 --> 00:19:00,700 So this is how it works. 334 00:19:00,860 --> 00:19:02,810 So they just store all this stuff in columns. 335 00:19:02,810 --> 00:19:08,150 So all the IDs for the entire table are here. 336 00:19:08,300 --> 00:19:08,600 Right. 337 00:19:08,750 --> 00:19:09,980 All the possible values. 338 00:19:10,280 --> 00:19:13,190 So this is not just one block. 339 00:19:13,190 --> 00:19:17,510 It could be hundreds of thousands of blocks if you have a lot of rows. 340 00:19:18,170 --> 00:19:20,270 Remember, everything needs to be updated. 341 00:19:20,450 --> 00:19:26,480 If you add a new row, you have to get that updated in all of these logical structures. 342 00:19:26,840 --> 00:19:31,310 So with that being said, now we know how this works. 343 00:19:32,180 --> 00:19:33,860 Let's put it to the test. 344 00:19:33,980 --> 00:19:38,170 Select first name from employee where Social Security number equals 666. 
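The column layout described above, with the row ID repeated next to every value, can be sketched as follows. Everything here is illustrative: the rows are placeholders, `to_column_store` and `delete_row` are invented helpers, and the delete physically removes entries, whereas the talk describes marking them.

```python
COLUMNS = ("id", "first_name", "last_name", "ssn", "salary")
# Placeholder table keyed by row ID.
ROWS = {1000 + i: (i, f"first{i}", f"last{i}", 111 * i, 100 + i) for i in range(1, 9)}


def to_column_store(rows, values_per_block=4):
    """Store each column separately as blocks of (row_id, value) pairs."""
    store = {}
    for pos, name in enumerate(COLUMNS):
        pairs = [(row_id, row[pos]) for row_id, row in rows.items()]
        store[name] = [
            pairs[i:i + values_per_block]
            for i in range(0, len(pairs), values_per_block)
        ]
    return store


def delete_row(store, row_id):
    """Remove row_id from every column; returns how many blocks were touched."""
    touched = 0
    for blocks in store.values():
        for block in blocks:
            kept = [pair for pair in block if pair[0] != row_id]
            if len(kept) != len(block):
                block[:] = kept  # rewrite this block in place
                touched += 1
    return touched


store = to_column_store(ROWS)
touched = delete_row(store, 1004)  # one block per column has to change
```

A single-row delete touches one block in every column, which is why writes are the painful side of this layout.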
345 00:19:38,360 --> 00:19:43,220 So now the database will say, wait a second, you're looking for a Social Security number. 346 00:19:43,580 --> 00:19:50,810 I don't need to look at any of the other logical structures except for the Social Security one, which is this 347 00:19:50,810 --> 00:19:51,170 puppy. 348 00:19:51,320 --> 00:19:55,730 This is the Social Security data on disk. 349 00:19:55,730 --> 00:19:58,280 This is how I know this is the location. 350 00:19:58,580 --> 00:20:03,890 So on disk, I know where to point my needle and read this stuff. 351 00:20:04,310 --> 00:20:10,010 So to search for a Social Security number, I only need to pull this, right? 352 00:20:10,130 --> 00:20:12,590 So now I start reading block by block. 353 00:20:12,770 --> 00:20:16,970 I read the first block and I got a lot of beautiful values. 354 00:20:16,970 --> 00:20:19,550 I got 555, 444, 333, 222, 111. 355 00:20:21,230 --> 00:20:22,670 We did not get 666. 356 00:20:23,300 --> 00:20:23,960 So no. 357 00:20:24,020 --> 00:20:24,860 OK, no problem. 358 00:20:24,860 --> 00:20:26,810 Let's pull the second block. 359 00:20:26,960 --> 00:20:30,530 This is almost like the row-based one, but now at the column level. 360 00:20:30,530 --> 00:20:31,220 Does that make sense? 361 00:20:31,970 --> 00:20:33,500 So now we got it. 362 00:20:33,860 --> 00:20:34,610 We found it. 363 00:20:36,260 --> 00:20:38,210 But now I need the first name. 364 00:20:39,170 --> 00:20:41,480 What? The first name is not here, son. 365 00:20:42,530 --> 00:20:47,230 You only got the 666. Now, you found that this is actually row 1006. 366 00:20:47,450 --> 00:20:50,390 That's almost like how indexing works. 367 00:20:50,390 --> 00:20:53,480 If you think about it, this is how Postgres stores indexes. 368 00:20:53,660 --> 00:20:57,680 Actually, if you think about this as an index, it's almost very similar. 369 00:20:57,870 --> 00:20:58,220 Right? 
370 00:20:58,610 --> 00:21:01,760 It just points to the row. Now, 1006. 371 00:21:02,730 --> 00:21:11,190 Row 1006. I know this value, I know the row I'm looking for. This is very critical. And I 372 00:21:11,250 --> 00:21:13,980 am asked for the first name. Where's the first name? 373 00:21:13,980 --> 00:21:15,540 The first name is right here. 374 00:21:15,540 --> 00:21:16,680 This is the first name. 375 00:21:16,680 --> 00:21:19,620 So I'm not going to read these blocks. 376 00:21:19,860 --> 00:21:21,660 I'm only going to jump right here. 377 00:21:22,110 --> 00:21:22,440 Right. 378 00:21:22,570 --> 00:21:26,220 So obviously we found this row, as we said, and then we can jump. 379 00:21:26,220 --> 00:21:28,980 Right, not to the first block. 380 00:21:28,980 --> 00:21:31,290 I'm going to jump to this block. You might say, 381 00:21:31,290 --> 00:21:35,940 Hussein, how did you know that it's in this block, and you didn't pull this block? Because I know 382 00:21:35,940 --> 00:21:36,600 the row number. 383 00:21:36,990 --> 00:21:44,490 And the database does all sorts of tricks to say, OK, this block only has rows from 1001 384 00:21:44,490 --> 00:21:45,900 to 1004. 385 00:21:45,930 --> 00:21:46,280 Right. 386 00:21:46,560 --> 00:21:54,240 So I'm going to jump to block number seven hundred and three because it exactly locates the row I want, 387 00:21:54,540 --> 00:21:58,020 because they have all this metadata mumbo-jumbo. 388 00:21:58,020 --> 00:21:58,410 Right. 389 00:21:58,860 --> 00:22:00,750 Again, that's back to our question. 390 00:22:00,960 --> 00:22:04,740 The link between row 1006 and a block is all there. 391 00:22:04,770 --> 00:22:07,380 They know, and they pull it and they find it. 392 00:22:07,380 --> 00:22:09,330 So we had to do one jump. 393 00:22:09,630 --> 00:22:10,400 One block read, 394 00:22:10,650 --> 00:22:15,140 second block read, jumped to another block read, and then we got it. 
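The column-store lookup just described can be sketched like this: scan only the SSN column to learn the row ID, then jump to the one first-name block that covers that row ID. The data, the five-values-per-block split, and the helpers `find_row_id` and `fetch_value` are assumptions for illustration; the per-block row-ID range check stands in for the metadata the talk mentions.

```python
COLUMNS = ("id", "first_name", "last_name", "ssn", "salary")
# Placeholder table keyed by row ID.
ROWS = {1000 + i: (i, f"first{i}", f"last{i}", 111 * i, 100 + i) for i in range(1, 9)}
VALUES_PER_BLOCK = 5  # so the first ssn block holds 111 through 555


def column_blocks(name):
    """All (row_id, value) pairs of one column, split into blocks."""
    pos = COLUMNS.index(name)
    pairs = [(row_id, row[pos]) for row_id, row in ROWS.items()]
    return [pairs[i:i + VALUES_PER_BLOCK] for i in range(0, len(pairs), VALUES_PER_BLOCK)]


STORE = {name: column_blocks(name) for name in COLUMNS}


def find_row_id(where_col, value):
    """Scan a single column block by block; returns (row_id, block reads)."""
    for reads, block in enumerate(STORE[where_col], start=1):
        for row_id, stored in block:
            if stored == value:
                return row_id, reads
    return None, len(STORE[where_col])


def fetch_value(col, row_id):
    """Read only the block whose row-ID range covers row_id; returns (value, reads)."""
    for block in STORE[col]:
        # Range check stands in for block metadata, so it is not counted as a read.
        if block[0][0] <= row_id <= block[-1][0]:
            return dict(block)[row_id], 1
    raise KeyError(row_id)


row_id, scan_reads = find_row_id("ssn", 666)
first_name, jump_reads = fetch_value("first_name", row_id)
total_reads = scan_reads + jump_reads
```

Two reads in the SSN column plus one jump into the first-name column gives the same total the walk-through arrives at.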
395 00:22:15,150 --> 00:22:15,600 ‫So three. 396 00:22:16,080 --> 00:22:17,190 ‫Not that bad, 397 00:22:17,190 --> 00:22:21,930 ‫I guess. Let's do select star from employee where ID equals 398 00:22:21,930 --> 00:22:28,410 ‫one. Let's see how our beautiful column storage does. 399 00:22:28,470 --> 00:22:28,890 ‫All right. 400 00:22:29,190 --> 00:22:30,300 ‫So ID equals one. Again, 401 00:22:30,300 --> 00:22:35,040 ‫I don't have any knowledge of this, so I'm going to start with ID. Where is ID? This is ID. 402 00:22:35,040 --> 00:22:38,700 ‫So I only need to read these structures. 403 00:22:38,700 --> 00:22:40,020 ‫Right, let's pull it. 404 00:22:41,480 --> 00:22:49,490 ‫Found 1001, so now I have knowledge of the row. I know which block to read. 405 00:22:50,480 --> 00:22:51,260 ‫But. 406 00:22:53,320 --> 00:23:02,920 ‫The user asked me for everything. OK, let's jump. OK, we found that, so jump to the first name. 407 00:23:03,930 --> 00:23:09,480 ‫We know which block to read, so this one, pull it. That's one read. Let's jump, one more I/O. 408 00:23:10,510 --> 00:23:15,250 ‫Last name, because they want the last name, they want everything. Read. OK, found it, because I know. 409 00:23:15,760 --> 00:23:22,300 ‫I don't need to read this because I know exactly which block in each column to read to find my 410 00:23:22,300 --> 00:23:22,690 ‫value. 411 00:23:22,960 --> 00:23:23,310 ‫All right. 412 00:23:23,970 --> 00:23:26,140 ‫SSN, find, read. 413 00:23:27,980 --> 00:23:37,200 ‫Salary, read. Birthday, read. Engineer... I have a bug here, I didn't stretch it enough. 414 00:23:37,700 --> 00:23:38,780 ‫Engineer, read. 415 00:23:40,020 --> 00:23:41,340 ‫And joined date, read. 416 00:23:44,550 --> 00:23:45,520 ‫What have we done, guys? 417 00:23:46,480 --> 00:23:55,390 ‫This is the worst query. There is so much thrashing happening on disk that this is the worst thing 418 00:23:55,390 --> 00:23:55,880 ‫you can do.
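To make the cost of that select star concrete, here is a rough sketch that counts block reads for the same query against a column layout and a row layout. Everything in it is a simplifying assumption: the eight-column sample table, the block size of four, and the function names are hypothetical, and the model pretends that four whole rows fit in one row-store block.

```python
BLOCK_SIZE = 4  # entries per block; an assumption for illustration

# Hypothetical employee table, one Python list per column.
table = {
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "first_name": ["John", "Melissa", "Rick", "Paul", "Dean", "Hal", "Kary", "Mike"],
    "last_name": ["Smith", "Jones", "Ray", "Hill", "Fox", "Lee", "Cole", "West"],
    "ssn": [111, 222, 333, 444, 555, 666, 777, 888],
    "salary": [101000, 102000, 103000, 104000, 105000, 106000, 107000, 108000],
    "dob": ["1991", "1992", "1993", "1994", "1995", "1996", "1997", "1998"],
    "title": ["eng", "eng", "mgr", "eng", "qa", "eng", "mgr", "qa"],
    "joined": ["2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018"],
}


def select_star_column_store(table, wanted_id):
    """Scan the id column, then read one block from every other column."""
    reads = 0
    ids = table["id"]
    row_index = None
    for start in range(0, len(ids), BLOCK_SIZE):
        reads += 1  # one block of the id column
        block = ids[start:start + BLOCK_SIZE]
        if wanted_id in block:
            row_index = start + block.index(wanted_id)
            break
    if row_index is None:
        return None, reads
    row = {"id": wanted_id}
    for name, values in table.items():
        if name != "id":
            reads += 1  # one jump and one block read per remaining column
            row[name] = values[row_index]
    return row, reads


def select_star_row_store(table, wanted_id):
    """Scan blocks of whole rows; the matching block already has every column."""
    names = list(table)
    rows = list(zip(*table.values()))
    id_pos = names.index("id")
    reads = 0
    for start in range(0, len(rows), BLOCK_SIZE):
        reads += 1
        for row in rows[start:start + BLOCK_SIZE]:
            if row[id_pos] == wanted_id:
                return dict(zip(names, row)), reads
    return None, reads


col_row, col_reads = select_star_column_store(table, 1)
row_row, row_reads = select_star_row_store(table, 1)
print(col_reads, row_reads)
```

Both functions return the same row; the difference is only in how many blocks had to be touched, and the gap grows with the number of columns.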
419 00:23:56,350 --> 00:24:03,160 ‫Column databases just tank when you do this, especially if you have a lot of columns and you're 420 00:24:03,160 --> 00:24:09,550 ‫asking for everything, and especially if you have a lot of AND queries or OR queries here where you 421 00:24:09,550 --> 00:24:12,550 ‫have to jump to find that in multiple rows. 422 00:24:14,240 --> 00:24:22,850 ‫Dudes and dudettes, I'm talking to both of you: do 423 00:24:22,850 --> 00:24:23,540 ‫not do this. 424 00:24:24,020 --> 00:24:26,710 ‫So select star: with row stores and column 425 00:24:26,720 --> 00:24:28,210 ‫stores, do not do select star. 426 00:24:28,730 --> 00:24:34,790 ‫Do you know how many ducks die when you do this kind of query? 427 00:24:34,790 --> 00:24:40,010 ‫Guys, you go to the database, you read all these damn blocks, and every I/O kills a duck. 428 00:24:40,250 --> 00:24:40,580 ‫Right. 429 00:24:40,580 --> 00:24:41,950 ‫Save the ducks please, guys. 430 00:24:43,280 --> 00:24:46,190 ‫All right, let's do the select sum salary from employees. 431 00:24:46,190 --> 00:24:49,700 ‫Now that we saw how horrible column-oriented databases can be, 432 00:24:49,940 --> 00:24:51,620 ‫let's see how great they are. 433 00:24:53,120 --> 00:24:55,940 ‫I want to sum all the salaries. 434 00:24:55,940 --> 00:24:56,630 ‫Yes, sir. 435 00:24:57,050 --> 00:24:58,100 ‫Where is the salary field? 436 00:24:58,370 --> 00:24:59,090 ‫There it is. 437 00:24:59,900 --> 00:25:05,930 ‫One read. Done. Done. 438 00:25:06,680 --> 00:25:10,130 ‫Well, if the values live in multiple blocks, you're going to read the multiple blocks. 439 00:25:10,130 --> 00:25:11,210 ‫But done. 440 00:25:11,690 --> 00:25:12,470 ‫That's it. 441 00:25:13,070 --> 00:25:13,880 ‫That's it. 442 00:25:14,510 --> 00:25:17,150 ‫And if you have caching and stuff like that. 443 00:25:17,390 --> 00:25:17,960 ‫Hmm.
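A back-of-the-envelope sketch of why the sum is so cheap on a column layout. The numbers are assumptions chosen for illustration: a block that holds 16 fields, a table with 8 columns and 8 rows, and made-up salaries. The function names are hypothetical too.

```python
import math

BLOCK_CAPACITY = 16  # fields that fit in one block; an assumption
NUM_COLUMNS = 8      # columns in the hypothetical employee table

# Hypothetical salary values, one per row.
salaries = [101000, 102000, 103000, 104000, 105000, 106000, 107000, 108000]


def blocks_read_row_store(num_rows):
    """Every block holds whole rows, so summing one column reads them all."""
    rows_per_block = BLOCK_CAPACITY // NUM_COLUMNS
    return math.ceil(num_rows / rows_per_block)


def blocks_read_column_store(num_rows):
    """Only the salary column is read, and it is packed densely."""
    return math.ceil(num_rows / BLOCK_CAPACITY)


total = sum(salaries)
row_reads = blocks_read_row_store(len(salaries))
column_reads = blocks_read_column_store(len(salaries))
print(total, row_reads, column_reads)
```

Under these assumptions the row layout reads four blocks, dragging seven unwanted columns along, while the column layout answers the query from a single block.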
444 00:25:18,980 --> 00:25:21,800 ‫And by the way, I didn't mention something. 445 00:25:21,950 --> 00:25:27,770 ‫If multiple people have the same salary, column-oriented databases do something called 446 00:25:27,770 --> 00:25:37,730 ‫compression: they aggregate. Let's say three people have a 100,000 salary, OK, 447 00:25:38,690 --> 00:25:40,190 ‫all of them have 100K salaries. 448 00:25:40,190 --> 00:25:40,420 ‫Right. 449 00:25:40,640 --> 00:25:42,080 ‫They have 100K salaries. 450 00:25:42,170 --> 00:25:49,010 ‫And then you can just have one entry, one hundred thousand, with a comma beside it: one thousand 451 00:25:49,010 --> 00:25:51,200 ‫and three, one thousand and four, one thousand and five. 452 00:25:51,200 --> 00:25:52,300 ‫All of them have the same one. 453 00:25:52,310 --> 00:25:55,490 ‫So this is even more compact. 454 00:25:55,490 --> 00:26:00,590 ‫So this will give me a lot of bang for my buck by just reading one block. 455 00:26:00,590 --> 00:26:06,170 ‫Sometimes, if there is duplication, this is the best thing for a column database: 456 00:26:06,170 --> 00:26:10,520 ‫you just shove it in the same entry, because they're going to store it as an array. 457 00:26:10,520 --> 00:26:14,330 ‫We talked about that a little bit in my Postgres 13 video. In Postgres 13, 458 00:26:14,660 --> 00:26:21,080 ‫they just started doing this index deduplication; I believe they call it B-tree deduplication. 459 00:26:21,080 --> 00:26:22,550 ‫Postgres version 13. 460 00:26:22,730 --> 00:26:31,580 ‫They started shoving duplicated values into a single leaf node so they can compress. 461 00:26:32,540 --> 00:26:35,000 ‫So yeah, pros and cons. 462 00:26:35,480 --> 00:26:36,620 ‫Is this thing perfect? 463 00:26:36,740 --> 00:26:38,240 ‫Nope, nope, nope, nope. 464 00:26:38,240 --> 00:26:38,480 ‫Nothing is. 465 00:26:38,490 --> 00:26:40,260 ‫But it depends on your use case, guys.
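The "one value, many row IDs" idea can be sketched like this. It is a simplified illustration, not the on-disk format of any particular column store or of the Postgres B-tree: the `deduplicate` helper and the sample salaries are hypothetical, and only the 100000 entry shared by rows 1003, 1004 and 1005 comes from the example above.

```python
from collections import defaultdict


def deduplicate(column):
    """Group row IDs under each distinct value, in first-seen order."""
    grouped = defaultdict(list)
    for row_id, value in column:
        grouped[value].append(row_id)
    return dict(grouped)


# Hypothetical salary column stored as (row_id, value) pairs.
salary_column = [
    (1001, 101000),
    (1002, 102000),
    (1003, 100000),
    (1004, 100000),
    (1005, 100000),
    (1006, 104000),
]

compact = deduplicate(salary_column)

# The sum can be computed from the compact form: value times number of rows.
total = sum(value * len(row_ids) for value, row_ids in compact.items())
print(compact[100000], total)
```

Six stored pairs shrink to four entries here, and the aggregate still comes out the same as summing the original column.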
466 00:26:40,580 --> 00:26:44,990 ‫And if you ask me, I still prefer row-based, to be honest. 467 00:26:45,020 --> 00:26:46,970 ‫Just simplicity for writes and reads. 468 00:26:47,360 --> 00:26:52,250 ‫But there are some cases where you need column-oriented, to summarize. 469 00:26:52,250 --> 00:26:53,930 ‫Right? For writes 470 00:26:54,770 --> 00:26:57,770 ‫and reads, you want a simple structure. 471 00:26:57,770 --> 00:27:01,610 ‫And we saw how complex column stores are. 472 00:27:01,730 --> 00:27:02,050 ‫Right. 473 00:27:02,840 --> 00:27:03,790 ‫They're very complex. 474 00:27:03,800 --> 00:27:04,970 ‫You have to duplicate, 475 00:27:04,970 --> 00:27:09,690 ‫you have to store the row IDs everywhere. 476 00:27:10,400 --> 00:27:13,430 ‫That can be a little bit messy for writes, and slow. 477 00:27:13,480 --> 00:27:13,850 ‫Right. 478 00:27:13,850 --> 00:27:19,340 ‫And that's why a lot of people use column-based in warehouses and stuff like that, and lakes, data 479 00:27:19,340 --> 00:27:22,310 ‫lakes, where things are almost static. 480 00:27:22,310 --> 00:27:26,720 ‫They don't edit and they just do a lot of analytics on a single column. 481 00:27:26,720 --> 00:27:28,820 ‫So that's powerful stuff. 482 00:27:28,820 --> 00:27:29,210 ‫Right. 483 00:27:29,570 --> 00:27:32,540 ‫So let's go through these pros and cons in general. 484 00:27:32,810 --> 00:27:37,430 ‫So for row-based, it's optimal for reads and writes, right? 485 00:27:37,430 --> 00:27:37,850 ‫And... 486 00:27:38,210 --> 00:27:39,290 ‫Well, when I say reads and writes, 487 00:27:39,290 --> 00:27:44,960 ‫I mean transactions in general, or online transactional processing. They are very good for 488 00:27:44,960 --> 00:27:49,340 ‫this because they are very simple in their implementation.
489 00:27:50,320 --> 00:27:56,650 ‫So if something is simple, we can enhance it and make it more and more efficient as we go, right? That's 490 00:27:56,650 --> 00:27:59,990 ‫why writes are fast and reads are also fast. 491 00:27:59,990 --> 00:28:03,060 ‫It depends on what your query looks like; we saw some examples, right? 492 00:28:03,400 --> 00:28:11,410 ‫Online transaction processing: transactions are great for row-based 493 00:28:11,410 --> 00:28:16,510 ‫because when you start a transaction we know exactly which rows, which blocks to touch, and we can 494 00:28:16,720 --> 00:28:23,800 ‫write the WAL, the write-ahead log, efficiently because we know exactly what we're changing, right? 495 00:28:24,100 --> 00:28:26,290 ‫Versus column, it's a little bit... 496 00:28:26,290 --> 00:28:27,880 ‫the structure is all over 497 00:28:27,880 --> 00:28:34,330 ‫the place, so we have to do a scattershot. Compression isn't really as effective for row-based, though, because, 498 00:28:34,330 --> 00:28:34,630 ‫well, 499 00:28:36,450 --> 00:28:36,990 ‫think about it. 500 00:28:37,100 --> 00:28:40,030 ‫Now, I'm not talking about the deduplication. 501 00:28:40,080 --> 00:28:41,780 ‫That's something you can do in the index. 502 00:28:41,830 --> 00:28:50,910 ‫I'm talking about just the idea of having the values in the row. 503 00:28:52,020 --> 00:28:55,210 ‫The row itself is almost distinct, right? 504 00:28:55,590 --> 00:28:59,520 ‫It's almost like a hash because it consists of different fields. 505 00:28:59,620 --> 00:29:03,300 ‫And so you can't really easily compress it. 506 00:29:03,360 --> 00:29:03,750 ‫Right. 507 00:29:04,410 --> 00:29:11,630 ‫The compression algorithm is not going to find a lot of tricks to compress because adjacent values are not 508 00:29:12,120 --> 00:29:12,980 ‫similar. 509 00:29:13,020 --> 00:29:13,220 ‫Right. 510 00:29:13,290 --> 00:29:16,200 ‫Compared to the column-oriented, 511 00:29:16,200 --> 00:29:17,390 ‫where
512 00:29:17,460 --> 00:29:22,080 ‫the consecutive values are the same type and almost the same thing. 513 00:29:22,080 --> 00:29:22,390 ‫Right. 514 00:29:22,590 --> 00:29:28,980 ‫So you're going to get a lot of chances of duplication, like you get a lot of people named John. 515 00:29:29,010 --> 00:29:29,390 ‫Right. 516 00:29:29,640 --> 00:29:33,470 ‫So the Johns are going to be squashed together. 517 00:29:33,630 --> 00:29:41,160 ‫So compression is not as efficient with rows because you get all these different things that are next 518 00:29:41,160 --> 00:29:42,630 ‫to each other and are completely different. 519 00:29:42,640 --> 00:29:45,240 ‫So we can't compress as effectively. Aggregation: 520 00:29:45,240 --> 00:29:51,020 ‫we saw how poorly the row-based database did with the aggregation. 521 00:29:51,300 --> 00:29:55,620 ‫I mean, we had to query the whole table, and we pulled 522 00:29:56,490 --> 00:30:02,820 ‫lots of information that we didn't need. So column-oriented databases are more efficient when it comes 523 00:30:02,820 --> 00:30:08,920 ‫to aggregation because we fetch only what we need and we work on that. 524 00:30:08,940 --> 00:30:17,130 ‫So if you start fetching only the data that you need, then you're 525 00:30:17,130 --> 00:30:23,880 ‫going to be very, very efficient in your query, because you only fetch the information that 526 00:30:23,880 --> 00:30:24,270 ‫you need. 527 00:30:24,570 --> 00:30:24,800 ‫Right. 528 00:30:25,050 --> 00:30:31,170 ‫So obviously you're going to have a lot fewer reads and you're going to finish faster than row-oriented databases. 529 00:30:31,530 --> 00:30:33,480 ‫Efficient queries with multiple columns. 530 00:30:33,480 --> 00:30:33,730 ‫Right. 531 00:30:33,990 --> 00:30:40,680 ‫So if you're picking multiple columns in a query or working with multiple columns, usually 532 00:30:42,780 --> 00:30:44,130 ‫row-oriented are better.
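Going back to the compression point for a moment, here is a tiny sketch of why values squashed together by column compress better than values interleaved by row. It uses plain run-length encoding as a stand-in for whatever compression a real engine applies, and the four sample rows and the `run_length_encode` helper are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of equal neighbours into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Hypothetical rows of (first_name, salary).
rows = [("John", 100000), ("John", 100000), ("John", 100000), ("Mary", 90000)]

# Row layout: fields of one row sit next to each other.
row_layout = [field for row in rows for field in row]

# Column layout: all first names first, then all salaries.
column_layout = [row[0] for row in rows] + [row[1] for row in rows]

row_runs = run_length_encode(row_layout)
column_runs = run_length_encode(column_layout)
print(len(row_runs), len(column_runs))
```

In the row layout no two neighbours are equal, so nothing collapses; in the column layout the three Johns and the three identical salaries each become a single run.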
533 00:30:44,380 --> 00:30:49,720 ‫Right? And again, I'm talking only about the queries themselves, right? 534 00:30:50,600 --> 00:30:56,420 ‫So you can have a lot of columns if you want to, but if you're only working with a few of them, that's 535 00:30:56,660 --> 00:30:59,600 ‫absolutely fine with column-oriented databases, but 536 00:31:01,130 --> 00:31:09,860 ‫if you're actually touching all these columns, it could be bad, right, with column-oriented. With 537 00:31:10,010 --> 00:31:16,880 ‫row-oriented, we saw it, right? A fetch of the row gives us all the columns at once, unless 538 00:31:16,880 --> 00:31:18,190 ‫you have vertical partitioning. 539 00:31:18,680 --> 00:31:23,870 ‫So, column-based: writes are slower because we have to update all these beautiful structures. 540 00:31:23,870 --> 00:31:26,570 ‫Every column has almost a structure of its own. 541 00:31:26,600 --> 00:31:27,440 ‫Think of it this way. 542 00:31:27,440 --> 00:31:28,360 ‫How is this stored? 543 00:31:28,580 --> 00:31:31,880 ‫So I need to know where it is located and touch it, touch it, touch it. 544 00:31:31,880 --> 00:31:32,690 ‫Every column. 545 00:31:32,720 --> 00:31:39,260 ‫Right? Very similar to indexes, if you think about it. They are perfect for online analytical processing. 546 00:31:39,260 --> 00:31:43,820 ‫So if you're doing analysis, you're not touching, you're not writing: perfect. 547 00:31:43,820 --> 00:31:45,010 ‫Column-oriented database. 548 00:31:45,650 --> 00:31:51,830 ‫It compresses greatly because all similar values are together. 549 00:31:52,070 --> 00:31:58,220 ‫So the compression algorithms can do magic on these things and compress them. 550 00:31:58,700 --> 00:32:03,500 ‫Amazing for aggregation. We saw it: done, done. 551 00:32:03,930 --> 00:32:04,210 ‫Right. 552 00:32:04,220 --> 00:32:05,860 ‫That was amazing. 553 00:32:05,870 --> 00:32:11,320 ‫I got tired of it. And finally, it's inefficient for queries
554 00:32:11,480 --> 00:32:16,580 ‫with multiple columns. If you're asking for all columns, don't do a column, don't 555 00:32:16,580 --> 00:32:17,060 ‫do a column. 556 00:32:17,310 --> 00:32:22,550 ‫I know it's a little bit weird; you'd think that column-oriented databases, or column stores, 557 00:32:22,550 --> 00:32:26,540 ‫or columnar, will be great if you're asking for a lot of columns. 558 00:32:26,540 --> 00:32:28,700 ‫Nope, they are the worst at this stuff. 559 00:32:29,150 --> 00:32:37,160 ‫Now guys, Postgres, MySQL, other databases: most of them are row-based storage. 560 00:32:37,310 --> 00:32:45,620 ‫However, SAP HANA and others, Oracle, I believe... most databases by default 561 00:32:45,620 --> 00:32:48,910 ‫go with row-based storage. 562 00:32:49,700 --> 00:32:57,020 ‫However, if you think about it, they have database storage engines, and we talked about database storage 563 00:32:57,260 --> 00:33:01,160 ‫engines in my database course and on my YouTube channel as well. 564 00:33:01,160 --> 00:33:07,400 ‫I talked about how, with a database engine, you can switch the database engine for a given table, 565 00:33:07,610 --> 00:33:15,230 ‫so you can pick a table and store it as a column store, and you can pick another 566 00:33:15,230 --> 00:33:16,730 ‫table and store it as a row store. 567 00:33:17,240 --> 00:33:19,430 ‫You can switch the database engine. 568 00:33:19,610 --> 00:33:19,940 ‫Right. 569 00:33:20,600 --> 00:33:26,270 ‫Based on that, you can just change your database engine for every table based on the needs. 570 00:33:26,270 --> 00:33:30,110 ‫If you have a table that only gets queried, make it column, for example. 571 00:33:30,410 --> 00:33:30,680 ‫Right. 572 00:33:30,800 --> 00:33:33,920 ‫If you're doing a lot of analytics on it, make it a column store. 573 00:33:34,100 --> 00:33:34,350 ‫Right. 574 00:33:34,550 --> 00:33:38,660 ‫If you're doing a lot of writes, you probably want to make it a row store.
575 00:33:38,660 --> 00:33:39,050 ‫Right. 576 00:33:39,470 --> 00:33:47,900 ‫And think about all of this stuff. Like, you should not, for example, join a row-based table 577 00:33:48,080 --> 00:33:50,510 ‫with a column-based table. 578 00:33:50,510 --> 00:33:50,960 ‫Right. 579 00:33:51,600 --> 00:33:52,580 ‫That's just bad. 580 00:33:52,850 --> 00:33:58,100 ‫I mean, some databases do support it, but give them a break, guys. 581 00:33:58,100 --> 00:34:01,220 ‫Give them a break, because that will be really disastrous. 582 00:34:01,220 --> 00:34:03,220 ‫I can't even imagine what the database will do. 583 00:34:03,230 --> 00:34:05,690 ‫It's going to freak out, I guess. All right, guys, 584 00:34:05,980 --> 00:34:06,860 ‫that's it for me today. 585 00:34:07,010 --> 00:34:09,110 ‫What do you think about these two puppies? 586 00:34:09,320 --> 00:34:11,510 ‫Let me know in the comments section below. 587 00:34:11,540 --> 00:34:13,820 ‫I'm going to see you in the next one. 588 00:34:14,090 --> 00:34:14,840 ‫You guys stay awesome.