1 00:00:00,090 --> 00:00:01,450 ‫What's going on, guys? 2 00:00:01,470 --> 00:00:07,470 ‫My name is Hosain, and I wanted to do a quick video to discuss this topic, which is very interesting 3 00:00:07,470 --> 00:00:08,640 ‫to me and I think. 4 00:00:09,940 --> 00:00:18,130 ‫Anyone who works with databases should really understand the underlining data structure of how databases 5 00:00:18,130 --> 00:00:24,730 ‫store stuff, because it will really become really handy when you scale, it will really become handy 6 00:00:24,730 --> 00:00:36,220 ‫when you design your backend application, how your queries actually access the path of your path, 7 00:00:36,220 --> 00:00:40,540 ‫and then you're pulling the data how how the plans are actually makes decision. 8 00:00:40,720 --> 00:00:43,680 ‫And all of this really comes down to indexes. 9 00:00:44,590 --> 00:00:51,130 ‫And this is really I want to discuss how specifically just postscripts and how. 10 00:00:52,020 --> 00:01:01,380 ‫My sequel specifically, A.B., as a database engine in my school stores, their indexes, before we 11 00:01:01,380 --> 00:01:06,510 ‫jump into that, we need to understand the basic architecture of tables. 12 00:01:06,810 --> 00:01:11,630 ‫So if I create a beautiful table on my database, what does that mean? 13 00:01:12,540 --> 00:01:15,210 ‫A table has well, it has columns. 14 00:01:15,210 --> 00:01:24,210 ‫And then and each column will have a specific data type, which an integer string and no dates. 15 00:01:25,170 --> 00:01:30,450 ‫Jason, I heard this is a good data type these days, and you guys convince me that's actually a good 16 00:01:30,450 --> 00:01:37,680 ‫idea, too, to store Jason as a as a just a table will have multiple columns and each column will have 17 00:01:37,680 --> 00:01:45,030 ‫a specific size and that decide the row size and that Rossides will become either dynamic or static 18 00:01:45,270 --> 00:01:46,950 ‫based on how you define it. 19 00:01:47,310 --> 00:01:47,630 ‫Right. 20 00:01:47,720 --> 00:01:50,520 ‫Morcha vs. just normal text and so on. 21 00:01:51,070 --> 00:01:53,820 ‫And then the table is is huge. 22 00:01:53,940 --> 00:01:56,760 ‫Data structure is big and usually stored in disk. 23 00:01:56,760 --> 00:02:02,880 ‫And unless you can put it in memory, which is awesome for fast access queries, but usually it's in 24 00:02:02,880 --> 00:02:07,050 ‫disk because it's a blob huge of just blob. 25 00:02:07,050 --> 00:02:07,440 ‫Right. 26 00:02:08,820 --> 00:02:13,880 ‫So now it comes to indexes index and I mean a variable index. 27 00:02:14,340 --> 00:02:16,500 ‫Check out this video right here to learn more about them. 28 00:02:16,500 --> 00:02:22,860 ‫But an index is is basically a date and other data structure different than the table that tells us 29 00:02:24,240 --> 00:02:28,800 ‫where exactly the rules I'm searching for are. 30 00:02:29,130 --> 00:02:29,450 ‫Right. 31 00:02:29,940 --> 00:02:38,400 ‫And then the easiest way I understand the indexes is like, you know, these alphabetical dividers, 32 00:02:38,400 --> 00:02:42,930 ‫binder dividers, so that A, B, C, D until Z, right. 33 00:02:43,170 --> 00:02:48,240 ‫And then you're searching for specific like company-owned phone number. 34 00:02:48,840 --> 00:02:57,240 ‫So if you're searching for a company Xebra, I don't know those companies or you will jump to the Z 35 00:02:57,270 --> 00:02:59,760 ‫divider and then start searching for that. 36 00:02:59,760 --> 00:03:01,720 ‫So you have a smaller subset. 37 00:03:02,010 --> 00:03:11,790 ‫So what the index does in this case is actually stores a pointer to the first row in where the XYZ starts, 38 00:03:11,790 --> 00:03:12,370 ‫essentially. 39 00:03:12,420 --> 00:03:15,240 ‫So that's essentially how the index is always like a pointer. 40 00:03:15,600 --> 00:03:17,790 ‫So the row, the table, the big stuff. 41 00:03:17,910 --> 00:03:23,810 ‫And there is the other data structure that you can have a quick and access index and it points to that. 42 00:03:24,810 --> 00:03:26,550 ‫Not every database does that. 43 00:03:26,560 --> 00:03:26,850 ‫So. 44 00:03:28,370 --> 00:03:33,560 ‫How about we start with PostgreSQL, every index you create in Postgres. 45 00:03:34,880 --> 00:03:36,080 ‫Does exactly that. 46 00:03:36,560 --> 00:03:43,520 ‫So if you create an index on column A is going to create a data structure for you, that beautiful data 47 00:03:43,520 --> 00:03:55,640 ‫structure and each leaf, each entry will be pointing directly to that table directly to to a to a unique 48 00:03:55,640 --> 00:03:58,300 ‫identifier and the and the in the table. 49 00:03:58,340 --> 00:04:00,230 ‫That's not the primary key necessarily. 50 00:04:00,470 --> 00:04:03,140 ‫That's just a role that is most of the time it's invisible. 51 00:04:03,950 --> 00:04:06,180 ‫And Postgres, I think it's called that Topal idea. 52 00:04:06,950 --> 00:04:08,330 ‫So that's that's the data structure. 53 00:04:08,360 --> 00:04:12,630 ‫So if you're at an index point to the role, you might say that was the problem with that? 54 00:04:12,920 --> 00:04:13,760 ‫Not really, no. 55 00:04:13,770 --> 00:04:14,390 ‫No problem. 56 00:04:14,390 --> 00:04:21,260 ‫But pay attention to that and another index, because what what is does do it also still points to the 57 00:04:21,260 --> 00:04:21,650 ‫table. 58 00:04:23,170 --> 00:04:23,860 ‫Directly. 59 00:04:26,220 --> 00:04:27,260 ‫You see a pattern here, right? 60 00:04:27,840 --> 00:04:33,150 ‫The more indexes you create, all of these are actually put into the table, which is pretty neat. 61 00:04:33,170 --> 00:04:35,930 ‫So and that's what we that makes sense, right? 62 00:04:35,940 --> 00:04:36,090 ‫Right. 63 00:04:36,090 --> 00:04:37,490 ‫I'm searching something in this index. 64 00:04:37,710 --> 00:04:38,400 ‫I find it. 65 00:04:38,670 --> 00:04:40,320 ‫I jump to the table, I pull. 66 00:04:41,460 --> 00:04:47,410 ‫If I need to jump to the table, that's what's called index only scan versus an index scan, right. 67 00:04:47,440 --> 00:04:52,460 ‫And if I only need to jump the table, I know where to find my data. 68 00:04:53,670 --> 00:04:56,760 ‫So index is imposter's all point to the table directly. 69 00:05:00,770 --> 00:05:09,440 ‫My second and specifically A.B., in my second, because my uncle has other database engines that he 70 00:05:09,440 --> 00:05:13,400 ‫can swizzle in can change, which is something I love in my cycle. 71 00:05:13,730 --> 00:05:17,450 ‫And Maria, DBI, something you don't see in other databases. 72 00:05:17,660 --> 00:05:21,390 ‫We're going to swizzle the Davis engine of a table to be something completely different. 73 00:05:21,890 --> 00:05:26,060 ‫My Assim is a is a different database engine that it works. 74 00:05:26,060 --> 00:05:29,150 ‫Exactly almost like like Postgres. 75 00:05:29,150 --> 00:05:31,180 ‫It points directly to the table. 76 00:05:31,190 --> 00:05:38,170 ‫Each index we created posted a table and E.V., which is the default MySQL engine doesn't do that. 77 00:05:39,560 --> 00:05:42,350 ‫There is always a primary key in the table. 78 00:05:43,910 --> 00:05:50,840 ‫And when you heard that primarily if you don't A.B. or may seek to create one for you and then that 79 00:05:51,110 --> 00:05:54,320 ‫primary key index points to that table. 80 00:05:54,500 --> 00:05:54,840 ‫Right. 81 00:05:55,830 --> 00:06:02,850 ‫But any secondary index you create or any other index you create does not point to the table directly, 82 00:06:03,310 --> 00:06:09,630 ‫it points to the primary key value which corresponds to the table. 83 00:06:12,090 --> 00:06:13,020 ‫Multiple Hoppes. 84 00:06:16,180 --> 00:06:22,930 ‫There is a design, careful design choice for both, and let's let's explore both of them so we know. 85 00:06:23,880 --> 00:06:26,760 ‫Progress, every index points to the table directly. 86 00:06:28,200 --> 00:06:28,890 ‫Michael. 87 00:06:29,760 --> 00:06:34,620 ‫Every index points to the primary key and the primary key is the only index that points the table. 88 00:06:35,010 --> 00:06:38,010 ‫So if I did an update and both databases. 89 00:06:39,300 --> 00:06:39,810 ‫And I. 90 00:06:41,020 --> 00:06:41,890 ‫Updated Ororo. 91 00:06:43,900 --> 00:06:45,280 ‫Right, or deleted or. 92 00:06:47,810 --> 00:06:49,350 ‫In postcrisis, I deleted her. 93 00:06:49,360 --> 00:06:50,960 ‫All right. 94 00:06:52,350 --> 00:06:54,220 ‫And that road doesn't exist anymore. 95 00:06:54,510 --> 00:06:59,100 ‫I have to tell index number one, index number two, index number three in December form, December 96 00:06:59,100 --> 00:07:02,060 ‫six, that that road doesn't exist because guess what? 97 00:07:02,400 --> 00:07:07,830 ‫All of them actually are aware of our role and we are aware of the stable. 98 00:07:08,190 --> 00:07:09,630 ‫So there are dependencies. 99 00:07:10,110 --> 00:07:14,880 ‫So false because when you did this, it actually goes on update all indexes. 100 00:07:15,960 --> 00:07:20,180 ‫The mean it's slower or bad is just that's what it does, right? 101 00:07:22,280 --> 00:07:25,190 ‫And my Michael, when you update. 102 00:07:26,170 --> 00:07:27,550 ‫I wrote a literal. 103 00:07:29,420 --> 00:07:30,660 ‫Just update the primary. 104 00:07:31,760 --> 00:07:37,130 ‫Right, because that's that's the only thing we need to do, because these guys will still point to 105 00:07:37,130 --> 00:07:39,470 ‫the to the to the to the primary in. 106 00:07:39,650 --> 00:07:45,290 ‫And that said, they they don't have they are not aware that our role has been deleted. 107 00:07:45,290 --> 00:07:45,620 ‫Right. 108 00:07:46,100 --> 00:07:54,110 ‫So that's as long as you don't touch a specific value, like for for example, if you if you updated 109 00:07:54,620 --> 00:07:58,100 ‫another column in our role that. 110 00:07:59,320 --> 00:08:06,220 ‫The index, which you know, on which this is a created need to be aware, right, and that is in that 111 00:08:06,220 --> 00:08:13,510 ‫case, you will need to update that index anyway, but only the index that you actually touched for 112 00:08:13,510 --> 00:08:14,500 ‫the columns you touched. 113 00:08:14,980 --> 00:08:17,000 ‫So that's that's how things work in general. 114 00:08:17,020 --> 00:08:17,290 ‫So. 115 00:08:18,580 --> 00:08:25,090 ‫Hey, guys, saying from editing and I forgot to mention one thing in my school, remember, every index 116 00:08:25,090 --> 00:08:29,080 ‫points to the primary key and primary points of the table. 117 00:08:29,320 --> 00:08:34,050 ‫So if you did a lot of updates on the primary key, guess what will happen? 118 00:08:34,450 --> 00:08:37,470 ‫Every single index will need to be updated. 119 00:08:37,660 --> 00:08:38,140 ‫So. 120 00:08:39,250 --> 00:08:47,470 ‫Really careful while updating primary keys in my school, so that's another kind of fun thing to be 121 00:08:47,470 --> 00:08:48,080 ‫aware of. 122 00:08:48,100 --> 00:08:49,630 ‫I'm not going to say limitation. 123 00:08:50,750 --> 00:08:52,300 ‫But just something to be aware of. 124 00:08:53,420 --> 00:08:54,110 ‫Back to video. 125 00:08:54,910 --> 00:09:02,050 ‫And updates my sexual versus progress, you can see that it's a little bit less scattered. 126 00:09:04,290 --> 00:09:07,110 ‫Reads, however, if I read. 127 00:09:08,210 --> 00:09:11,600 ‫In in in in Postgres, using an index. 128 00:09:13,280 --> 00:09:19,460 ‫I'm going to find the value I want, and if I ask for a road that doesn't exist in an index, it's not 129 00:09:19,460 --> 00:09:25,610 ‫including the next I have to jump to the table and I can jump very quickly because I immediately look 130 00:09:25,610 --> 00:09:26,960 ‫at the other table. 131 00:09:26,990 --> 00:09:29,900 ‫I have the road that points to the table directly and I read it. 132 00:09:30,110 --> 00:09:32,570 ‫So Rieser and Postgres are extremely fast. 133 00:09:33,020 --> 00:09:34,220 ‫Does not mean that. 134 00:09:35,750 --> 00:09:40,650 ‫Reads on my sequel's laws, just just understand how they all things work, right? 135 00:09:41,030 --> 00:09:52,760 ‫So now if I'm reading a particular table in my school and I'm using a specific index and I found what 136 00:09:52,760 --> 00:09:54,710 ‫I am looking for, I cannot jump. 137 00:09:54,710 --> 00:09:57,640 ‫And I need another value to select from the table. 138 00:09:58,280 --> 00:10:00,890 ‫I cannot jump to the table directly. 139 00:10:01,310 --> 00:10:07,550 ‫I have to jump to the primary index and then from the primary key index. 140 00:10:08,720 --> 00:10:20,300 ‫I get the roar and then I jump to that there is an extra hop, so raids are a little bit tiny, bit 141 00:10:20,300 --> 00:10:20,810 ‫slower. 142 00:10:22,210 --> 00:10:28,270 ‫Sometimes it's unnoticeable, but just understand, if you're doing this a lot, I understand what's 143 00:10:28,270 --> 00:10:28,630 ‫going on. 144 00:10:28,840 --> 00:10:34,090 ‫So if you're but but the beauty of this, if you're selecting the primary here a lot in my school, 145 00:10:34,090 --> 00:10:40,310 ‫that's really beneficial because the work is almost always included with every index. 146 00:10:40,660 --> 00:10:41,050 ‫So. 147 00:10:42,090 --> 00:10:48,960 ‫That choice of what becomes the primary key and A.B. becomes really critical and it can boost your performance 148 00:10:48,960 --> 00:10:56,000 ‫in a in a in an amazing manner while impulse reads our fast right. 149 00:10:56,010 --> 00:11:06,060 ‫Because and any Ahepe value pulls like pulling from the heap or pulling from the table also pretty, 150 00:11:06,450 --> 00:11:09,990 ‫extremely good because you're pulling to the table directly. 151 00:11:10,200 --> 00:11:15,750 ‫I'm obviously doing it over, overdoing it, overdoing everything is just basically slower. 152 00:11:15,750 --> 00:11:21,840 ‫But that's just very something that is very interesting to understand when you when you deal with these 153 00:11:21,840 --> 00:11:22,260 ‫things. 154 00:11:22,670 --> 00:11:33,510 ‫The suppose was updating a lot of fros or deleting a lot of rows might trigger a lot of update to a 155 00:11:33,510 --> 00:11:34,400 ‫lot of indexes. 156 00:11:34,410 --> 00:11:36,950 ‫So just pay attention to that when you do that. 157 00:11:39,940 --> 00:11:41,870 ‫And that's it, that's what I wanted to talk about. 158 00:11:42,310 --> 00:11:44,510 ‫Obviously, I want to make this video short. 159 00:11:44,860 --> 00:11:45,790 ‫What do you guys think? 160 00:11:46,300 --> 00:11:49,750 ‫There is always a choice between Posterous and Moissac on any level. 161 00:11:49,750 --> 00:11:52,240 ‫The database doesn't mean postcrisis bad. 162 00:11:52,240 --> 00:11:57,100 ‫Doesn't mean my cycle is as good, doesn't mean anything. 163 00:11:57,130 --> 00:11:58,810 ‫It really depends on what you're trying to do. 164 00:11:59,950 --> 00:12:06,250 ‫But I believe one of their main concerns of moving from Posterous to to miscall is exactly that. 165 00:12:06,580 --> 00:12:08,570 ‫They had tons of indexes. 166 00:12:08,830 --> 00:12:15,820 ‫I was surprised, like, why do you have this much indexes, like too much indexes, too many indexes, 167 00:12:16,030 --> 00:12:20,390 ‫and as a result, touching Rose if they touch a lot of rose Abdeh. 168 00:12:20,450 --> 00:12:21,280 ‫A lot of rose. 169 00:12:23,100 --> 00:12:28,710 ‫It just just thrashed all the indexes because you stopped it all lenders, you need to tell everybody 170 00:12:29,010 --> 00:12:32,700 ‫that points to the table know that something has changed, right? 171 00:12:33,400 --> 00:12:35,320 ‫Especially if you update it. 172 00:12:35,430 --> 00:12:41,280 ‫If you insert the update, these these things are all scatting everywhere and postcrisis. 173 00:12:41,280 --> 00:12:47,650 ‫Like it's a little bit different when it comes to updating versus deleting the they keep the roads alive 174 00:12:47,650 --> 00:12:49,470 ‫for NBCC reasons. 175 00:12:49,830 --> 00:12:50,310 ‫All right, guys. 176 00:12:50,430 --> 00:12:53,540 ‫Kind of gave them a very video shot I'm going to see on the next one. 177 00:12:54,030 --> 00:12:56,130 ‫What do you think about these two beautiful databases? 178 00:12:56,140 --> 00:12:57,030 ‫What do you prefer? 179 00:12:57,240 --> 00:13:00,000 ‫I mean, in that comment section below, I'm going to see. 180 00:13:00,000 --> 00:13:01,070 ‫And that's what I say. 181 00:13:01,500 --> 00:13:01,770 ‫Good bye.