1 00:00:00,060 --> 00:00:10,080 ‫One of the most important fundamentals to understand and databases are the back end of tables and indexes, 2 00:00:10,080 --> 00:00:18,060 ‫and how are they stored on disk and you going to see me mention these fundamental, you know, concepts 3 00:00:18,060 --> 00:00:18,810 ‫and pieces? 4 00:00:19,140 --> 00:00:23,400 ‫And I want you to understand them before jumping into deep into the course. 5 00:00:23,700 --> 00:00:31,020 ‫So this lecture will discuss these different concepts, how tables are and indexes are stored in disk 6 00:00:31,320 --> 00:00:32,670 ‫and how they are queried. 7 00:00:32,880 --> 00:00:38,670 ‫And what is the cost that is associated with these knowing these concepts? 8 00:00:39,330 --> 00:00:40,110 ‫Let's jump into it. 9 00:00:40,140 --> 00:00:46,020 ‫So here's all of the storage concepts and not really just storage is just the operations that are performed 10 00:00:46,020 --> 00:00:47,100 ‫on the storage itself. 11 00:00:47,490 --> 00:00:50,550 ‫So we're going to assess what is a table from a logical point of view. 12 00:00:50,550 --> 00:00:56,790 ‫We're going to discuss the role I'd consider the unique identifier of the role we'll get to discuss 13 00:00:56,940 --> 00:00:58,230 ‫what is a page. 14 00:00:58,410 --> 00:01:03,120 ‫We're going to discuss what's in IEO, a heap index, data structure, Beatriz. 15 00:01:03,390 --> 00:01:11,520 ‫And if I know going to give you an example of how a query performs on the heap versus an index, let's 16 00:01:11,520 --> 00:01:12,110 ‫jump into it. 17 00:01:12,120 --> 00:01:15,990 ‫So we have here a beautiful table with a bunch of columns, right? 18 00:01:16,230 --> 00:01:21,270 ‫And so these are the actual field, the columns, and this is a row. 19 00:01:21,420 --> 00:01:26,910 ‫Obviously, you might say that I work with databases that don't have rows. 20 00:01:26,910 --> 00:01:30,690 ‫They have documented the same Briody same concept. 21 00:01:31,000 --> 00:01:34,950 ‫Once you take that idea off row, you can convert to back into. 22 00:01:34,950 --> 00:01:42,630 ‫At the end of the day, it it's all bits and bytes and and the storage model just understands that. 23 00:01:42,870 --> 00:01:44,820 ‫And that's where I want you to understand this. 24 00:01:45,120 --> 00:01:46,760 ‫This rule, this is a call. 25 00:01:47,010 --> 00:01:50,730 ‫So what is the concept of photo ID here? 26 00:01:51,150 --> 00:01:52,740 ‫Usually databases. 27 00:01:53,900 --> 00:02:00,680 ‫Don't work with the fields that you provide them, some do right, but most databases create their own 28 00:02:00,680 --> 00:02:04,520 ‫system maintained row. 29 00:02:04,820 --> 00:02:05,270 ‫Right. 30 00:02:05,600 --> 00:02:14,330 ‫And this is the real idea effectively, and this is a position in the role itself to a uniquely identified 31 00:02:14,630 --> 00:02:17,290 ‫this particular role ideal right in Postgres. 32 00:02:17,300 --> 00:02:20,150 ‫This is also called tuple I.D. effectively. 33 00:02:20,430 --> 00:02:20,780 ‫Right? 34 00:02:21,250 --> 00:02:24,050 ‫There's another important concept to understand, right? 35 00:02:24,140 --> 00:02:24,560 ‫Nothing. 36 00:02:24,560 --> 00:02:25,460 ‫Nothing so special. 37 00:02:25,460 --> 00:02:32,750 ‫In my score, for example, the primary key becomes that pseudo rule IDF actively. 38 00:02:32,960 --> 00:02:33,300 ‫Right. 39 00:02:33,560 --> 00:02:41,100 ‫But in Postgres and some other databases as well and you roll ideas created on your column. 40 00:02:41,360 --> 00:02:44,270 ‫And that is how it is referenced effectively. 41 00:02:44,600 --> 00:02:46,100 ‫So what is a page, really? 42 00:02:46,110 --> 00:02:52,940 ‫So if we really thought about this stable right as we're looking at this logical table, a bunch of 43 00:02:52,940 --> 00:02:59,360 ‫columns and rows, what depends really on the storage model, whether it's a role store and a columnist 44 00:02:59,360 --> 00:03:04,850 ‫that I really want you to watch that other lecture where I talked about role versus columnist or how 45 00:03:04,850 --> 00:03:07,640 ‫are they physically storing it does really matter. 46 00:03:08,240 --> 00:03:17,960 ‫Roles are stored on these logical pages and pages are really nothing but a fixed size memory location, 47 00:03:18,290 --> 00:03:23,420 ‫which translates to desk location as well of of a of a bunch of bytes. 48 00:03:24,020 --> 00:03:24,440 ‫That's it. 49 00:03:24,560 --> 00:03:25,310 ‫It's a fixed size. 50 00:03:25,310 --> 00:03:26,480 ‫Usually it's fixed size. 51 00:03:26,960 --> 00:03:27,290 ‫Right? 52 00:03:27,590 --> 00:03:30,020 ‫And then you can put the row in the page. 53 00:03:30,020 --> 00:03:33,710 ‫So technically, a page can fit many rows. 54 00:03:34,160 --> 00:03:36,530 ‫Now it depends how many rows. 55 00:03:36,860 --> 00:03:40,040 ‫It all depends on the size of the row, right? 56 00:03:40,040 --> 00:03:45,770 ‫And the size of the columns and the size of the data types of these columns. 57 00:03:46,580 --> 00:03:51,470 ‫So the database really doesn't really read just a single row say, Oh, go, really read me. 58 00:03:51,890 --> 00:03:53,270 ‫Wrote them 4000 now. 59 00:03:53,450 --> 00:03:54,860 ‫It's not like a ram. 60 00:03:55,130 --> 00:04:00,800 ‫RAM is byte address where you say, Oh God, give me that particular byte location and just go read 61 00:04:00,800 --> 00:04:00,980 ‫it. 62 00:04:01,610 --> 00:04:04,370 ‫That's a random access memory, right? 63 00:04:04,970 --> 00:04:05,390 ‫RAM. 64 00:04:06,140 --> 00:04:10,760 ‫No disk, on the other hand, actually just reads a page, right? 65 00:04:11,180 --> 00:04:17,270 ‫And you tell me, Oh, go, go make an Io and fetch me this much data, right? 66 00:04:17,660 --> 00:04:24,080 ‫And when you do an issue, this input output, you get a page or more than one page. 67 00:04:24,230 --> 00:04:26,300 ‫It really depends on that. 68 00:04:26,570 --> 00:04:32,630 ‫The disk partition, how did you partition your, your format, your formats, your your hard drive 69 00:04:32,630 --> 00:04:33,860 ‫or SSD? 70 00:04:34,520 --> 00:04:39,110 ‫And it depends on so many other factors like that that the operating system as well. 71 00:04:39,110 --> 00:04:44,330 ‫But when you actually read, you get so much stuff, you get one page or more. 72 00:04:44,450 --> 00:04:45,320 ‫It really depends. 73 00:04:45,680 --> 00:04:49,490 ‫But the beauty of this is one page can have many, many roads and write. 74 00:04:49,820 --> 00:04:54,180 ‫Each piece has a fixed size like guess the default size is eighth-grader by. 75 00:04:54,410 --> 00:04:57,890 ‫It might be quite, I believe, a 16 year old, and you can configure those. 76 00:04:58,070 --> 00:05:04,610 ‫So if I, for example, say each page holds three rows pride in this particular example, that means 77 00:05:04,850 --> 00:05:08,750 ‫for one thousand one rows, you divide it by three. 78 00:05:08,930 --> 00:05:12,680 ‫You get around 333 pages, right? 79 00:05:12,800 --> 00:05:16,160 ‫And these pages are stored on disk, right? 80 00:05:16,400 --> 00:05:21,380 ‫And when you look at them in the disk, it's just a bunch of little ones and zeros, right? 81 00:05:21,890 --> 00:05:28,280 ‫But they physically have a start and an end, which you define as a database. 82 00:05:28,550 --> 00:05:34,100 ‫So this is how you look, and this is very critical to understand it is not just theoretical stuff. 83 00:05:34,520 --> 00:05:36,770 ‫This is what makes some breaks a database. 84 00:05:37,010 --> 00:05:39,650 ‫This is what makes a query slow or fast. 85 00:05:39,800 --> 00:05:42,610 ‫Understanding how many pages are you pulling? 86 00:05:42,620 --> 00:05:44,270 ‫How many ideas are you making? 87 00:05:44,630 --> 00:05:48,160 ‫Because that's what you what it would actually have pages. 88 00:05:49,280 --> 00:05:49,970 ‫What does it tell you? 89 00:05:50,390 --> 00:05:53,600 ‫And I was an operation that reads request to disk. 90 00:05:53,950 --> 00:05:56,360 ‫We try to minimize this as much as possible. 91 00:05:56,370 --> 00:05:58,850 ‫This is the currency of a database. 92 00:05:59,090 --> 00:06:01,640 ‫The list i o you make the faster you records. 93 00:06:01,910 --> 00:06:03,140 ‫That's what we need to make. 94 00:06:03,350 --> 00:06:04,400 ‫So what does that mean? 95 00:06:04,730 --> 00:06:07,910 ‫And i o can fetch one page or more, depending on the disk partitions. 96 00:06:07,910 --> 00:06:08,720 ‫I don't know about that. 97 00:06:09,740 --> 00:06:11,660 ‫And I cannot read a single row. 98 00:06:11,900 --> 00:06:14,360 ‫It's just a page with many laws in them. 99 00:06:14,570 --> 00:06:18,560 ‫When and when you do that, you get many rows for free. 100 00:06:18,800 --> 00:06:23,450 ‫You don't get to say, Oh, I want this page, but I don't want this stuff. 101 00:06:24,110 --> 00:06:25,580 ‫Sorry, you don't get to do that. 102 00:06:25,760 --> 00:06:27,950 ‫Oh, I want this page, but I don't want the date of birth. 103 00:06:28,310 --> 00:06:29,210 ‫I want this page. 104 00:06:29,210 --> 00:06:30,440 ‫But I don't want the names. 105 00:06:30,770 --> 00:06:32,240 ‫No, you don't get to do that. 106 00:06:32,540 --> 00:06:34,640 ‫That's why select Start is very expensive. 107 00:06:34,910 --> 00:06:38,120 ‫That's why I select a name is very expensive as well. 108 00:06:38,570 --> 00:06:38,800 ‫Right. 109 00:06:38,810 --> 00:06:45,260 ‫Because if you do select name from table, because what you're already going to the table, you're doing 110 00:06:45,260 --> 00:06:45,740 ‫the page. 111 00:06:45,740 --> 00:06:49,340 ‫If it's a roll store, you're getting everything right. 112 00:06:49,760 --> 00:06:52,490 ‫We're going to talk more about that in the the and the course. 113 00:06:53,180 --> 00:07:00,140 ‫In the database has to filter out what you don't want and throw it, um, throw it away. 114 00:07:00,380 --> 00:07:07,490 ‫The cost of this realizing the bites back to the memory in memory structure for the database to process 115 00:07:07,490 --> 00:07:08,600 ‫is expensive, right? 116 00:07:09,380 --> 00:07:13,130 ‫So you want to minimize these ills about expensive same ideas. 117 00:07:13,130 --> 00:07:18,320 ‫An operating system Some eosin operating system goes to the operating system cache, and this is specifically 118 00:07:18,320 --> 00:07:22,820 ‫true for in case of Postgres, post-Christian rely heavily on the operating system cache, so an i o 119 00:07:22,820 --> 00:07:27,170 ‫doesn't really mean going to disk so it could go to a cache that the operating system defines. 120 00:07:27,260 --> 00:07:29,720 ‫So you're going to hear me as well. 121 00:07:30,440 --> 00:07:32,980 ‫Talk about this a lot in the course heap. 122 00:07:33,040 --> 00:07:33,710 ‫What is a heap? 123 00:07:33,710 --> 00:07:35,990 ‫A heap is exactly what we talked about here. 124 00:07:36,230 --> 00:07:42,020 ‫A heap is a collection of pages that represent points to your data table. 125 00:07:42,230 --> 00:07:47,660 ‫It has everything above the table itself, so everything in the table is in the heap. 126 00:07:48,260 --> 00:07:53,960 ‫We call it the here because it's a special data structure that has pretty much all the data, everything 127 00:07:54,260 --> 00:07:54,650 ‫right? 128 00:07:54,920 --> 00:07:59,840 ‫And that that makes it expensive to query because it has a lot of data, right? 129 00:08:00,320 --> 00:08:04,370 ‫So this is where the actual data is stored and everything. 130 00:08:06,480 --> 00:08:12,960 ‫Traversing the heap is expensive, as we need to read so many data to find what exactly we want, right? 131 00:08:13,200 --> 00:08:17,910 ‫Because usually when you look, when you read stuff from the heap, when you try to find something you 132 00:08:17,910 --> 00:08:20,100 ‫find, you're trying to be looking for few things. 133 00:08:20,670 --> 00:08:22,890 ‫But the heap gives you everything. 134 00:08:22,890 --> 00:08:24,360 ‫And that's costly, right? 135 00:08:24,870 --> 00:08:29,080 ‫This is why we need indexes that help that help tell you. 136 00:08:29,100 --> 00:08:31,760 ‫Tell us exactly what part of the heap we need to read. 137 00:08:31,770 --> 00:08:34,800 ‫We don't need to read every single page in the heap. 138 00:08:34,950 --> 00:08:39,450 ‫I want to know what page exactly to jump to, to read just that page. 139 00:08:39,600 --> 00:08:40,350 ‫You can do that. 140 00:08:40,470 --> 00:08:44,970 ‫And the desk is a go read me page three, three, three, three, three and above. 141 00:08:45,240 --> 00:08:52,470 ‫You can do that, but you can't say, Oh, go read me raw seven row if if you say, go read me Ross 142 00:08:52,470 --> 00:08:57,540 ‫Avenue, get a good page two and maybe Page three and anything that's next to it effectively. 143 00:08:57,600 --> 00:09:02,790 ‫Let's talk about the indexes and index is nothing but another data structure. 144 00:09:03,090 --> 00:09:10,500 ‫No, magic is just a bunch of other data structure that helps you pinpoint exactly where to go into 145 00:09:10,500 --> 00:09:10,980 ‫the heap. 146 00:09:11,520 --> 00:09:13,830 ‫So you can fetch that information in an efficient manner. 147 00:09:14,130 --> 00:09:18,420 ‫So an index is another data structure separate from the heap and has pointers to the heap. 148 00:09:18,840 --> 00:09:25,710 ‫And these pointers is just literally numbers that point to the row ID that we talked about, and that 149 00:09:25,720 --> 00:09:31,720 ‫already has more metadata about the page in the heap that we need to pull it. 150 00:09:32,310 --> 00:09:35,940 ‫It has part of the data and used to quickly search for something. 151 00:09:36,180 --> 00:09:42,840 ‫You can index one or more column, uh, in the index effectively to create this data structure. 152 00:09:42,840 --> 00:09:47,400 ‫And this data structure uses something called bigotries, which I have an entire section talking about 153 00:09:47,490 --> 00:09:48,090 ‫effectively. 154 00:09:50,940 --> 00:09:55,890 ‫So, yeah, because you cannot index everything you index exactly with the things you need to search 155 00:09:55,890 --> 00:09:56,150 ‫for. 156 00:09:57,000 --> 00:10:03,180 ‫Once you find a value of the in the index, you go to the heap to fetch more information where everything 157 00:10:03,180 --> 00:10:03,750 ‫is there, right? 158 00:10:03,750 --> 00:10:05,700 ‫That's that's the point of the heap. 159 00:10:06,300 --> 00:10:09,300 ‫Index tells you exactly which base to fetch in the heap. 160 00:10:09,300 --> 00:10:15,840 ‫And instead of taking the head to scan every page in the heap, the index is also stored as pages and 161 00:10:15,840 --> 00:10:18,660 ‫cost i o to pull the entries of the index. 162 00:10:18,660 --> 00:10:23,160 ‫Because guess what, when you when you think about the index, it's not something magic, right? 163 00:10:23,340 --> 00:10:29,490 ‫It's a bunch of guess what a bunch of another data structure usually be tree and that be tree. 164 00:10:29,490 --> 00:10:34,440 ‫You need to read it in order to understand to pass this content, right? 165 00:10:34,980 --> 00:10:42,150 ‫And this leaves on desk because when you spin up the database, you need to read the index from desk 166 00:10:42,150 --> 00:10:43,380 ‫and pull it in memory. 167 00:10:43,590 --> 00:10:46,670 ‫Some indexes fit in memory, others don't. 168 00:10:46,680 --> 00:10:52,920 ‫If you have an index that is so large, it doesn't have to fit in memory that makes in searching the 169 00:10:52,920 --> 00:10:59,820 ‫index to know what to pull right from the heap can be expensive, and this is what I want you to kind 170 00:10:59,820 --> 00:11:02,250 ‫of understand and pinpoint here. 171 00:11:03,510 --> 00:11:09,240 ‫More and as the more can fit in the memory, popular data structures for indexes as Trig can be listed 172 00:11:09,240 --> 00:11:09,990 ‫at a structure. 173 00:11:10,320 --> 00:11:13,380 ‫Brennan indexes, you know, many other type of indexes. 174 00:11:13,710 --> 00:11:15,660 ‫Learn more on the B3 section. 175 00:11:16,650 --> 00:11:20,820 ‫All right, so this is how it looks like on index on disk. 176 00:11:20,970 --> 00:11:26,310 ‫This is a very simplified if it was guys, so take it with a grain of salt. 177 00:11:26,520 --> 00:11:29,010 ‫This is not exactly how it looks like, but it helps. 178 00:11:29,010 --> 00:11:29,450 ‫Interesting. 179 00:11:29,460 --> 00:11:30,150 ‫So this is the heap. 180 00:11:30,480 --> 00:11:31,350 ‫So we have a page. 181 00:11:31,350 --> 00:11:33,790 ‫We have a row ID one. 182 00:11:33,810 --> 00:11:39,210 ‫This is the price and probably their employee ID, the name, the date of birth, the salary and then 183 00:11:39,210 --> 00:11:43,040 ‫immediately the next employer or stocks, right? 184 00:11:43,050 --> 00:11:43,890 ‫It's in the same page. 185 00:11:44,520 --> 00:11:49,470 ‫Roll it to twenty nine and then Ali five two. 186 00:11:50,080 --> 00:11:51,380 ‫And then this ends page. 187 00:11:51,410 --> 00:11:51,780 ‫Image one. 188 00:11:52,020 --> 00:11:52,830 ‫I simplified it. 189 00:11:53,070 --> 00:11:58,560 ‫It's exactly identical to this, but I just said roll five, four or five six because of three rows 190 00:11:58,560 --> 00:11:58,950 ‫in a page. 191 00:11:58,960 --> 00:12:03,000 ‫Remember that that that was our condition here in this particular example. 192 00:12:03,450 --> 00:12:04,680 ‫Seven, eight nine based two. 193 00:12:04,860 --> 00:12:08,160 ‫So 10, 11, 12 and Page three ended it. 194 00:12:08,160 --> 00:12:08,560 ‫It added it. 195 00:12:08,580 --> 00:12:14,690 ‫A Page 33 will have will have three rows the thousand nine, nine, nine and 998. 196 00:12:15,480 --> 00:12:17,010 ‫Well, this is the heap, right? 197 00:12:17,250 --> 00:12:23,290 ‫But in the index, if you think about in the index, it like, let's say, I indexed on employee I.D., 198 00:12:23,340 --> 00:12:24,660 ‫which is this 10, right? 199 00:12:24,930 --> 00:12:27,590 ‫So you you're going to see only the employee ID here. 200 00:12:27,630 --> 00:12:31,830 ‫So 10, 20, 30, 40, 50, 60, 70, 80, 90, right? 201 00:12:32,010 --> 00:12:37,200 ‫And then did it add it up until 10000, which is the last employee employee 10000. 202 00:12:37,470 --> 00:12:41,280 ‫OK, I have all the employees here, but that's just their ideas. 203 00:12:41,280 --> 00:12:42,750 ‫And look at what do we have here? 204 00:12:43,230 --> 00:12:45,690 ‫This is the point we talked about. 205 00:12:46,050 --> 00:12:46,760 ‫Is this OK? 206 00:12:47,220 --> 00:12:55,440 ‫Employee I.D. 10 lives in row ID one has our I.D. one and lives in Page Zero, right? 207 00:12:55,740 --> 00:12:59,220 ‫Employee ID 20 lives on Page Zero. 208 00:12:59,220 --> 00:13:02,350 ‫Android Auto Employee number one one zero. 209 00:13:02,350 --> 00:13:05,430 ‫It lives on Page three and he has a. 210 00:13:05,740 --> 00:13:08,180 ‫ID 11 and so on. 211 00:13:08,200 --> 00:13:15,130 ‫Right this way, if you search the index and searching the index is just another Io, as we talked about 212 00:13:15,130 --> 00:13:17,200 ‫it now, it's not sequential. 213 00:13:17,200 --> 00:13:21,280 ‫As I am showing it here, it's more a little bit complicated than that. 214 00:13:21,280 --> 00:13:22,560 ‫It's a B3 structure. 215 00:13:22,580 --> 00:13:28,930 ‫So it's that actual three where you go to the left, if it's less and go to the right of it's more are 216 00:13:28,930 --> 00:13:30,280 ‫going to talk about that in details. 217 00:13:30,280 --> 00:13:33,250 ‫But take this with a grain of salt as an example. 218 00:13:33,490 --> 00:13:38,820 ‫So let's say I want to search for employee, I don't know, 40, right? 219 00:13:39,160 --> 00:13:41,410 ‫So 40, I build this page. 220 00:13:41,560 --> 00:13:41,980 ‫So I did. 221 00:13:41,980 --> 00:13:42,880 ‫Didn't I own the index? 222 00:13:43,150 --> 00:13:45,310 ‫Oh, I found four employee 40. 223 00:13:45,310 --> 00:13:46,420 ‫But hey, wait a second. 224 00:13:46,660 --> 00:13:47,950 ‫I don't want employee 40. 225 00:13:47,950 --> 00:13:50,130 ‫I want their name, for example, or their salary. 226 00:13:50,140 --> 00:13:51,480 ‫Or you need to go to the heap. 227 00:13:51,490 --> 00:13:52,090 ‫What do you do? 228 00:13:52,190 --> 00:13:52,420 ‫How? 229 00:13:52,510 --> 00:13:54,790 ‫Well, how do I know which page to go to the hip? 230 00:13:55,030 --> 00:14:00,150 ‫Well, employee, 40, have his a raw 84 and it's page one. 231 00:14:00,160 --> 00:14:07,660 ‫So immediately you do and another I own the heap says, Oh, give me page for page one and give me employee 232 00:14:07,660 --> 00:14:08,390 ‫a story. 233 00:14:08,440 --> 00:14:09,720 ‫Raw ID four. 234 00:14:09,730 --> 00:14:11,710 ‫So you go to the whole idea for you. 235 00:14:11,710 --> 00:14:15,640 ‫Discard five of the six because guess what you did and I know you got five and six. 236 00:14:15,970 --> 00:14:16,870 ‫You cannot. 237 00:14:17,050 --> 00:14:18,880 ‫You cannot escape that. 238 00:14:19,180 --> 00:14:21,790 ‫With that, I'll give you all the stuff right? 239 00:14:22,120 --> 00:14:27,100 ‫And you gave you five and six and and their birthday, you got the salary of six. 240 00:14:27,100 --> 00:14:29,050 ‫You get you get the d'urbervilles there. 241 00:14:29,050 --> 00:14:34,570 ‫You get it their name of six, you get all this stuff, but you only take exactly what you want from 242 00:14:34,570 --> 00:14:35,410 ‫for effectively. 243 00:14:35,670 --> 00:14:37,660 ‫The database does all this magic behind this. 244 00:14:37,960 --> 00:14:39,700 ‫So think about it just that. 245 00:14:40,210 --> 00:14:43,770 ‫So this is much more efficient, right? 246 00:14:44,200 --> 00:14:51,670 ‫If this query is fast and if this square is fast furious, asking for more is going to do more jumps 247 00:14:51,670 --> 00:14:52,530 ‫here, right? 248 00:14:52,960 --> 00:14:53,770 ‫Let's take an example. 249 00:14:53,980 --> 00:14:55,600 ‫So let's say I don't have an index. 250 00:14:55,600 --> 00:15:02,300 ‫I only have a heap and I want to do select star from employee table, where employee ID equals 10000. 251 00:15:02,320 --> 00:15:04,900 ‫Well, in this case, guess what? 252 00:15:04,900 --> 00:15:06,610 ‫Employee doesn't have an index, right? 253 00:15:06,970 --> 00:15:08,410 ‫So what do I do? 254 00:15:08,470 --> 00:15:09,820 ‫Well, I don't have a choice. 255 00:15:10,120 --> 00:15:13,660 ‫I have to go query Page Zero, as is. 256 00:15:14,230 --> 00:15:15,160 ‫How many rolls do we have? 257 00:15:15,340 --> 00:15:22,390 ‫We have Raw ID one, which is 10, 20 and 30 employed, 10 employee I.D. 20 and 30. 258 00:15:22,690 --> 00:15:23,440 ‫Oh, none. 259 00:15:23,440 --> 00:15:24,070 ‫None, none. 260 00:15:24,820 --> 00:15:25,570 ‫Discard the page. 261 00:15:25,570 --> 00:15:26,310 ‫Go to the next page. 262 00:15:26,320 --> 00:15:29,530 ‫Page one four five six seven eight nine. 263 00:15:29,540 --> 00:15:29,850 ‫Nope. 264 00:15:30,310 --> 00:15:35,110 ‫Up until doing fitting all these pages, this is so expensive. 265 00:15:35,470 --> 00:15:42,850 ‫Some, some databases does a parallel threading multiprocessing to kind of walk this structure from 266 00:15:43,570 --> 00:15:48,670 ‫by using multiple threads from down and bottom, from down to top. 267 00:15:49,000 --> 00:15:51,460 ‫And in parallel, to scan. 268 00:15:51,610 --> 00:15:57,250 ‫But in general, it is expensive because the last page happened to be the one that have employed ten 269 00:15:57,250 --> 00:15:57,640 ‫thousand. 270 00:15:58,150 --> 00:15:59,910 ‫So we did a lot of ways to fix that. 271 00:15:59,920 --> 00:16:03,320 ‫So that's that's what our index lets doing with that index. 272 00:16:03,340 --> 00:16:10,210 ‫So now the query with the index, in this case, we're going to query Employee ID ten thousand. 273 00:16:10,540 --> 00:16:21,580 ‫And here's with the B3 query actually takes place and it goes through a three until we find exactly 274 00:16:21,580 --> 00:16:27,670 ‫the page that we need to look at and one the moment we find a page in the index o is page and we go 275 00:16:27,670 --> 00:16:33,250 ‫to the page and then we find that exactly employee ten thousand. 276 00:16:33,430 --> 00:16:33,940 ‫Right. 277 00:16:34,330 --> 00:16:37,750 ‫And this is ORide Thousand and its page 333. 278 00:16:38,080 --> 00:16:45,970 ‫So once I got that tuple that information, now I go to the heap and they say, OK, go page three, 279 00:16:45,970 --> 00:16:53,830 ‫three three, and Paul wrote 10000 and returned to the user so immediately we know where to which page 280 00:16:53,830 --> 00:16:57,280 ‫to pull and immediately we get all the information. 281 00:16:57,610 --> 00:17:04,330 ‫And then finally, I have some notes here that I just added so we can discuss and end this lecture. 282 00:17:04,720 --> 00:17:09,180 ‫Sometimes the heap table can be organized around this single index, right? 283 00:17:09,180 --> 00:17:12,490 ‫So we did show that index is a separate data structure from the heap. 284 00:17:12,730 --> 00:17:18,580 ‫But sometimes the heap, as you can, as you saw in this lecture, is not really organized. 285 00:17:18,580 --> 00:17:20,470 ‫It's not ordered right. 286 00:17:20,770 --> 00:17:23,950 ‫But the heap can be ordered around the index. 287 00:17:23,950 --> 00:17:25,750 ‫And in that case, it's called the cluster index. 288 00:17:25,750 --> 00:17:28,060 ‫And I talk about it in this lecture as what in this course? 289 00:17:28,060 --> 00:17:28,330 ‫Sorry. 290 00:17:28,780 --> 00:17:35,390 ‫And it's called also called the oracle terminologies called iottie or sorry. 291 00:17:35,390 --> 00:17:38,110 ‫It's called index organized table iottie, right? 292 00:17:38,320 --> 00:17:39,070 ‫So I just fixed it. 293 00:17:39,550 --> 00:17:45,520 ‫So, yeah, so this is called an iottie write and index organized table where you have the table organized 294 00:17:45,520 --> 00:17:47,470 ‫around a single index of activity. 295 00:17:47,470 --> 00:17:50,830 ‫And this is called a cluster index and has a pros and cons. 296 00:17:50,830 --> 00:17:52,000 ‫Cango can. 297 00:17:52,300 --> 00:17:57,100 ‫I can't go into this in this lecture, but I'm going to talk about it in other lectures as well. 298 00:17:57,410 --> 00:18:03,100 ‫Primary key is usually what is called primary as a cluster index, unless otherwise specified and whenever 299 00:18:03,100 --> 00:18:05,080 ‫you hear the term primary key there. 300 00:18:05,170 --> 00:18:09,690 ‫Means it's a cluster, and next, usually unless his boss, Chris Paul, post-GST primary keys, is 301 00:18:09,700 --> 00:18:11,710 ‫just the secondary key, right? 302 00:18:12,040 --> 00:18:14,590 ‫So my school a.B always have a primary key. 303 00:18:14,590 --> 00:18:16,360 ‫It's always have a clustered index. 304 00:18:16,360 --> 00:18:23,920 ‫So if you define a primary key, the table will be organized around that index. 305 00:18:23,920 --> 00:18:28,150 ‫So be very careful about what data type you pick. 306 00:18:28,420 --> 00:18:30,610 ‫I talked about this a lot. 307 00:18:31,690 --> 00:18:39,730 ‫The idea of having a, for example, a new idea which is truly random in a cluster index just kills 308 00:18:39,730 --> 00:18:41,650 ‫the performance when it comes to writing. 309 00:18:41,740 --> 00:18:45,100 ‫Because now if you insert a new idea, random thing. 310 00:18:45,880 --> 00:18:51,130 ‫Now the heap have to be organized around the index, which is your idea, which is random. 311 00:18:51,370 --> 00:18:58,780 ‫So you're going to be he be hitting multiple pages because every new idea is random, so it will and 312 00:18:58,780 --> 00:19:02,170 ‫will live in another completely random page. 313 00:19:02,620 --> 00:19:06,370 ‫And this scattershot is very, very expensive. 314 00:19:06,370 --> 00:19:09,270 ‫And I talk about it and many other ask questions, guys. 315 00:19:09,520 --> 00:19:16,480 ‫If you want me to explore more of these products have only secondary indexes, and all indexes point 316 00:19:16,480 --> 00:19:19,330 ‫directly to the Real ID, which lives in the heap. 317 00:19:19,330 --> 00:19:22,420 ‫So this is different than the primary key, right? 318 00:19:22,690 --> 00:19:28,840 ‫So in the case of if you have a primary key where you have that kind of clustered index, the primary 319 00:19:28,840 --> 00:19:32,680 ‫key itself is a kind of almost like a user defined field, right? 320 00:19:33,070 --> 00:19:38,830 ‫But the lack of any other index will point to that primary key here. 321 00:19:39,310 --> 00:19:42,570 ‫But in Postgres, everything is a secondary index. 322 00:19:42,580 --> 00:19:48,460 ‫We have this role idea, which is a system maintained field that every index point to the Real ID, 323 00:19:48,910 --> 00:19:54,280 ‫which means if you edit anything in focus, all the indexes get updated. 324 00:19:54,730 --> 00:19:57,910 ‫That's a cost that you have to be in to understand, right? 325 00:19:58,390 --> 00:20:02,070 ‫It's just all makes sense when you understand these low level things. 326 00:20:02,080 --> 00:20:04,600 ‫That's why I wanted to make this lecture and find an to make. 327 00:20:04,750 --> 00:20:09,880 ‫I mean, we talked about what a table is talked about there wide, which is really it's a system maintained 328 00:20:09,880 --> 00:20:12,130 ‫field that some databases at, some don't. 329 00:20:12,550 --> 00:20:17,410 ‫Some rely on an existing primary key that the user defined lookup a page a page has. 330 00:20:17,950 --> 00:20:19,780 ‫It's just a fixed size, right? 331 00:20:19,810 --> 00:20:24,820 ‫Some pages are dynamic, but let's talk about fixed size just easier to deal with fixed size pages, 332 00:20:25,360 --> 00:20:28,030 ‫8K and Postgres 16K and equal. 333 00:20:28,270 --> 00:20:30,130 ‫And you can put arrows in them. 334 00:20:30,150 --> 00:20:31,600 ‫You can put columns on them. 335 00:20:31,790 --> 00:20:35,740 ‫I can organize the page around columns, which is something I'm going to talk about. 336 00:20:35,770 --> 00:20:37,200 ‫And it looks like you're right. 337 00:20:37,210 --> 00:20:39,460 ‫I o gives you one or more pages. 338 00:20:39,460 --> 00:20:43,150 ‫You don't you don't get a roll or a byte in an IO. 339 00:20:43,150 --> 00:20:44,290 ‫You get a lot of stuff. 340 00:20:44,620 --> 00:20:51,160 ‫So the key here is if you do it and you get everything you want in a single i o, that's the beauty. 341 00:20:51,520 --> 00:20:53,020 ‫That's the efficient dial. 342 00:20:53,170 --> 00:20:56,890 ‫You want your audio to be efficient, heap data structures, right? 343 00:20:56,890 --> 00:21:00,790 ‫We talked about that index did as Dr B threes, and we gave you some examples. 344 00:21:01,060 --> 00:21:02,440 ‫All right, guys, I'm going to see you in the next month. 345 00:21:02,440 --> 00:21:03,280 ‫Thank you so much.