1 00:00:00,090 --> 00:00:00,970 ‫What's going on, guys? 2 00:00:00,990 --> 00:00:07,620 ‫My name is Hussein, and I want to take a few minutes to explain the difference between sequential table 3 00:00:07,620 --> 00:00:11,040 ‫scan and an index scan. 4 00:00:12,000 --> 00:00:19,110 ‫And a bitmap index scan, which is something specific to Posterous, but regardless, it's very, very 5 00:00:19,110 --> 00:00:19,700 ‫clever. 6 00:00:19,710 --> 00:00:22,770 ‫And I want to take a few minutes to explain the difference between three. 7 00:00:24,090 --> 00:00:29,430 ‫You might have a more familiar with the sequence of sequential sequential tables scan as full table 8 00:00:29,430 --> 00:00:31,830 ‫scan, but that's just about terminology. 9 00:00:31,920 --> 00:00:35,150 ‫How about we jump into it and play with this little bit? 10 00:00:35,160 --> 00:00:42,390 ‫So I have a table here called grades and it has an ID, which is a primary key and it's a LTA increments 11 00:00:42,450 --> 00:00:45,500 ‫serial have a primary key, which is an index. 12 00:00:45,510 --> 00:00:51,000 ‫I have an index on the ID field is just a one, two or three incremental value of a name which not index 13 00:00:51,000 --> 00:00:51,810 ‫or anything like that. 14 00:00:52,050 --> 00:01:00,630 ‫I have a grade field G which indicate this is thinks of all this like a grade of students and G is a 15 00:01:00,630 --> 00:01:03,120 ‫value between zero and one hundred. 16 00:01:03,660 --> 00:01:10,380 ‫And there is an index on that grade field that includes as a Nunk key value. 17 00:01:10,380 --> 00:01:15,510 ‫Also the ID feel, which is pretty, something we talked about in many videos. 18 00:01:16,470 --> 00:01:24,690 ‫So now what I want to do here explain many aspects of how can I query this table to retrieve certain 19 00:01:24,690 --> 00:01:26,220 ‫data and pay attention to this. 20 00:01:26,500 --> 00:01:31,190 ‫I'm going to explain, do and explain, because I don't really interested in the results. 21 00:01:31,200 --> 00:01:35,310 ‫I want I just want to know what postcrisis is about to do when I want to ask you this. 22 00:01:35,760 --> 00:01:42,110 ‫So I'm going to select name the name of this student from the grades where ID is equal a thousand. 23 00:01:42,120 --> 00:01:42,430 ‫Right. 24 00:01:42,630 --> 00:01:44,100 ‫Whether there is a specific ID. 25 00:01:44,250 --> 00:01:50,010 ‫And the moment you do that, you can see that that postcrisis is going to use an index scan using the 26 00:01:50,010 --> 00:01:56,430 ‫grades primary key, which is the ID, and then provide the values using the index condition on the 27 00:01:56,430 --> 00:01:57,080 ‫index. 28 00:01:57,510 --> 00:02:00,390 ‫That's the ID equal thousand and is going to pull. 29 00:02:00,390 --> 00:02:02,100 ‫We know that is going to build one value. 30 00:02:02,100 --> 00:02:02,400 ‫Right. 31 00:02:02,700 --> 00:02:06,470 ‫And that explains a lot why we have one goal once we pull that row. 32 00:02:06,840 --> 00:02:13,320 ‫Now the index is responsible to go back to the table to bring the name field. 33 00:02:13,320 --> 00:02:13,670 ‫Why? 34 00:02:13,680 --> 00:02:16,260 ‫Because the name is part of the heap. 35 00:02:17,070 --> 00:02:22,320 ‫It's part of the table, which is part of the hip, which is another data structure, the index is a 36 00:02:22,320 --> 00:02:23,210 ‫different data structure. 37 00:02:23,800 --> 00:02:29,040 ‫Now, if I if I if I do this, for example, I like less than that's a hundred. 38 00:02:29,620 --> 00:02:31,680 ‫So I'm bringing around ninety nine value. 39 00:02:31,920 --> 00:02:36,010 ‫So Polska still decided to do an index scan. 40 00:02:36,030 --> 00:02:37,950 ‫So what, what, what happened here. 41 00:02:37,980 --> 00:02:39,590 ‫Let's go through exactly what happened. 42 00:02:39,600 --> 00:02:41,850 ‫So we brought ninety eight rows in this case. 43 00:02:41,880 --> 00:02:44,790 ‫What Postgres did, it says OK you want anything. 44 00:02:44,790 --> 00:02:46,290 ‫Any student that has an idea. 45 00:02:46,290 --> 00:02:50,370 ‫Less than one hundred, which is kind of a silly query if you think about I don't what database, what 46 00:02:50,370 --> 00:02:55,890 ‫the database decides says OK, I'm going to use the index to do an index scan on the on the ID on that 47 00:02:56,160 --> 00:03:01,470 ‫index and then go scan one by one and find find anything that has less than one hundred. 48 00:03:01,800 --> 00:03:09,120 ‫And that's very quick to do in a B three, especially that we kind of have a clue of how many rows is 49 00:03:09,120 --> 00:03:09,810 ‫going to get back. 50 00:03:10,320 --> 00:03:14,730 ‫So it's OK, I want anything that's hundred, anything that is greater than one hundred, just toss 51 00:03:14,730 --> 00:03:23,160 ‫it away and so it knows that is going to get values ninety nine or less because maybe something was 52 00:03:23,160 --> 00:03:25,680 ‫deleted and as a result it used an index. 53 00:03:26,130 --> 00:03:35,400 ‫And for each value that it finds, it finds the page where that rule exists and goes back to the heap 54 00:03:35,640 --> 00:03:41,990 ‫to pull that page which kind of have one or more rows and then pulls that value. 55 00:03:42,000 --> 00:03:43,410 ‫And this is called random access. 56 00:03:43,620 --> 00:03:52,170 ‫So four ninety nine, it went to the page of the heap that it did that four ninety eight for all the 57 00:03:52,200 --> 00:03:52,620 ‫rows. 58 00:03:53,490 --> 00:03:55,260 ‫So this is called random access. 59 00:03:55,260 --> 00:03:59,550 ‫So you can see that this can get slow if you have a lot of rows. 60 00:03:59,700 --> 00:04:01,740 ‫So let's go do something ridiculous. 61 00:04:01,750 --> 00:04:07,140 ‫By the way, I have around fifty million rows in this table, so I'm going to do it greater than one 62 00:04:07,140 --> 00:04:07,440 ‫hundred. 63 00:04:08,720 --> 00:04:16,420 ‫Polska is actually smart enough to know, OK, I expect at this table this query is going to be insane. 64 00:04:17,030 --> 00:04:21,560 ‫So look what that database did here says, OK, it is better than hundred. 65 00:04:22,670 --> 00:04:24,380 ‫I have what is the maximum? 66 00:04:24,380 --> 00:04:26,150 ‫It it has them statistics. 67 00:04:26,150 --> 00:04:26,410 ‫Right. 68 00:04:26,510 --> 00:04:34,040 ‫And I Kalka this a statistic by doing analyses grades and when you do analyze grades it will calculate 69 00:04:34,040 --> 00:04:39,260 ‫some statistics on the table, says, OK, this table has 50 million and this guy wants anything that's 70 00:04:39,260 --> 00:04:44,000 ‫greater than hundred that's going to bring 50 million rows back. 71 00:04:44,000 --> 00:04:46,210 ‫Is that 50 million to five million? 72 00:04:46,320 --> 00:04:46,510 ‫Right. 73 00:04:46,730 --> 00:04:48,550 ‫I forgot how many rows I have in this table. 74 00:04:48,920 --> 00:04:49,990 ‫So five million. 75 00:04:50,030 --> 00:04:52,400 ‫So bring all of those properties back. 76 00:04:53,090 --> 00:04:53,910 ‫Well, wait a second. 77 00:04:54,410 --> 00:04:56,300 ‫Scanning the index. 78 00:04:57,380 --> 00:05:03,440 ‫And every time I find a ruined the index, they're satisfied 100, which, guess what, almost all the 79 00:05:03,440 --> 00:05:07,670 ‫rules will satisfy that except the hundred that is listed in the hundred. 80 00:05:07,670 --> 00:05:08,000 ‫Right. 81 00:05:08,840 --> 00:05:10,190 ‫I'm going to go and jump. 82 00:05:10,220 --> 00:05:13,920 ‫That is extremely expensive to jump back and do a random exit. 83 00:05:14,100 --> 00:05:17,570 ‫So both of us decided to do a sequential scan. 84 00:05:18,580 --> 00:05:21,290 ‫What was that sequential scan and says you don't want. 85 00:05:21,460 --> 00:05:26,220 ‫Let me just let me just jump to the table and do a sequential scan on the table. 86 00:05:26,500 --> 00:05:34,390 ‫It's way cheaper to do a sequential scan, full table scan on the table there and pull the results. 87 00:05:34,600 --> 00:05:35,750 ‫Now, we have two things. 88 00:05:35,950 --> 00:05:43,330 ‫If the book is so that you have fumaroles is going to use an index scan, if you if I thought that is 89 00:05:43,330 --> 00:05:48,160 ‫going to have a lot of rows is going to say, you know what, it's way cheaper for me to do a sequential 90 00:05:48,160 --> 00:05:51,240 ‫scan on the table because I'm going to go there anyway. 91 00:05:51,370 --> 00:05:55,120 ‫I don't want to jump read the index and then read a table that's expensive. 92 00:05:55,600 --> 00:06:00,910 ‫So if I'm reading at a table and the index all the time, might as well just go and read the table fully. 93 00:06:00,910 --> 00:06:01,180 ‫Right. 94 00:06:01,180 --> 00:06:01,870 ‫That's easier. 95 00:06:02,320 --> 00:06:09,010 ‫So now that compromise between the two is when you have a case where you don't have a lot of rows, 96 00:06:09,010 --> 00:06:10,840 ‫you don't have a lot of rows. 97 00:06:10,840 --> 00:06:17,020 ‫So that justifies a full table scan, but you have quality enough that it's actually efficient to use 98 00:06:17,020 --> 00:06:17,450 ‫the index. 99 00:06:17,710 --> 00:06:20,680 ‫So what this does is actually does a bitmap scan. 100 00:06:20,690 --> 00:06:22,270 ‫Let's give an example of a witness. 101 00:06:23,050 --> 00:06:29,650 ‫A witness scan is when you do when you do something a little bit slightly less ridiculous, like give 102 00:06:29,650 --> 00:06:31,780 ‫me the grades that are greater than ninety five. 103 00:06:31,870 --> 00:06:32,200 ‫Right. 104 00:06:33,030 --> 00:06:33,330 ‫Like. 105 00:06:34,510 --> 00:06:39,790 ‫Course doesn't really know how much, but it knows this a lot, but since there is an index on. 106 00:06:41,400 --> 00:06:48,990 ‫Which is the grades it's worth doing an index scan, but doing an index card and jumping to the table 107 00:06:48,990 --> 00:06:52,700 ‫all the time for every value I find is expensive. 108 00:06:52,710 --> 00:06:56,040 ‫So it's going to do something called the bitmap index. 109 00:06:56,970 --> 00:06:58,080 ‫Let's talk about that a little bit. 110 00:06:58,290 --> 00:07:05,190 ‫So the bitmap is good index and you're going to see something some some graphics that I put on the video 111 00:07:05,190 --> 00:07:05,550 ‫here. 112 00:07:05,760 --> 00:07:06,240 ‫Where? 113 00:07:07,340 --> 00:07:14,780 ‫Polska is going to build a bitmap and it's basically an array of bits and the values represent the page 114 00:07:14,780 --> 00:07:21,420 ‫number, so bit number zero is page number zero, but no one is number one and so on and below that 115 00:07:21,420 --> 00:07:26,600 ‫number, total number of pages that is in this relation or this table in this case. 116 00:07:27,350 --> 00:07:36,020 ‫OK, so with that, with the Postgres will do it will do a scan, a bitmap scan G and says, you know 117 00:07:36,020 --> 00:07:41,430 ‫what, when I find a value that satisfies this condition, I'm going to do an index scan. 118 00:07:41,450 --> 00:07:42,770 ‫This is a normal scan. 119 00:07:42,770 --> 00:07:49,010 ‫But when I find something, I'm not going to jump directly to the table when I find a page. 120 00:07:49,370 --> 00:07:50,050 ‫No, no, no. 121 00:07:50,060 --> 00:07:53,330 ‫I'm going to set a bit on that bitmap. 122 00:07:53,540 --> 00:07:58,490 ‫So, for example, let's say I found a right and that rule belongs to page number nine. 123 00:07:58,820 --> 00:08:03,630 ‫So I'm going to go to that bitmap and sit with number nine as one. 124 00:08:03,680 --> 00:08:09,650 ‫That means, hey, by the way, I found a role that and Bajou right on page nine and then does goes 125 00:08:09,650 --> 00:08:12,740 ‫goes into all of the index and scans the whole index. 126 00:08:12,830 --> 00:08:20,870 ‫You have a beautiful bitmap that you can use to jump once to the heap table and pull all these pages 127 00:08:21,230 --> 00:08:21,520 ‫get. 128 00:08:21,710 --> 00:08:24,010 ‫So that's that's what we're going to do right here. 129 00:08:24,380 --> 00:08:26,990 ‫Now we're going to do a bitmap scan. 130 00:08:27,110 --> 00:08:31,730 ‫So we're using the number of pages then I know exactly to fetch. 131 00:08:31,970 --> 00:08:36,500 ‫I need to jump to the heap once and all those pages. 132 00:08:37,050 --> 00:08:37,350 ‫OK. 133 00:08:37,850 --> 00:08:44,390 ‫And when I finish all those pages, that's what every page is going to have one or more. 134 00:08:44,540 --> 00:08:44,990 ‫Right. 135 00:08:45,020 --> 00:08:46,600 ‫That's how Polska starts things. 136 00:08:46,940 --> 00:08:53,950 ‫That's what some of those rules will not satisfy the G greater than nine ninety five criteria so that 137 00:08:54,530 --> 00:09:02,600 ‫the Postgres does every check on the condition only to filter out the rows that has been filtered and 138 00:09:02,600 --> 00:09:03,860 ‫drops the rows that have been put. 139 00:09:03,860 --> 00:09:10,970 ‫In this case, we were lucky that that exact number of rows actually fit exactly in the and the pages. 140 00:09:11,420 --> 00:09:13,940 ‫So that's a bitmap index. 141 00:09:14,090 --> 00:09:16,010 ‫So this is very, very interesting, guys. 142 00:09:16,220 --> 00:09:17,860 ‫So let's spice things up a little bit. 143 00:09:17,870 --> 00:09:20,210 ‫So let's spice things up a little bit with a bitmap. 144 00:09:20,210 --> 00:09:29,330 ‫And so the beauty of the bitmap index is you can scan the multiple indexes or indices of them from the 145 00:09:29,330 --> 00:09:29,840 ‫UK. 146 00:09:30,260 --> 00:09:36,350 ‫You can scan multiple indexes, build those Bittner's and them together, and then get one beautiful 147 00:09:36,350 --> 00:09:38,070 ‫bitmap and then jump to the heap. 148 00:09:38,090 --> 00:09:38,950 ‫Let's take an example. 149 00:09:39,230 --> 00:09:39,740 ‫I'm going to do it. 150 00:09:39,740 --> 00:09:43,790 ‫Explain where greater is greater than 95 and ideas list then. 151 00:09:43,790 --> 00:09:44,210 ‫I don't know. 152 00:09:44,210 --> 00:09:46,940 ‫Ten thousand when I do that. 153 00:09:47,090 --> 00:09:48,140 ‫Look what happened here. 154 00:09:49,500 --> 00:09:52,090 ‫I.D. has an index and she has an index. 155 00:09:52,110 --> 00:09:54,840 ‫So what does it do then? 156 00:09:55,400 --> 00:10:01,220 ‫An index can a bitmap index can on G and calculated that beautiful bitmap that we talked about. 157 00:10:01,410 --> 00:10:09,240 ‫And then another bitmap scan on grades, Primerica, which is the ID field and then had now I have to 158 00:10:09,240 --> 00:10:12,120 ‫bitmap what it did it just and then. 159 00:10:13,100 --> 00:10:19,710 ‫You know how smart this is, guys, by ending then any page that exists in one hand doesn't exist another. 160 00:10:20,060 --> 00:10:25,560 ‫I don't have to visit it because one hundred percent that Raul is not going to satisfy my criteria. 161 00:10:26,360 --> 00:10:27,390 ‫This is genius. 162 00:10:27,710 --> 00:10:36,770 ‫So all of a sudden, instead of having nine thousand rules and whatever this 20 thousand rows, you 163 00:10:36,770 --> 00:10:39,100 ‫only get three hundred ninety four rules. 164 00:10:39,500 --> 00:10:40,300 ‫Beautiful. 165 00:10:40,310 --> 00:10:46,700 ‫And then once we have this beautiful final thank you, go to that heap, do a bitmap heap, scan this 166 00:10:46,700 --> 00:10:52,070 ‫time with this final result and the result, and then go and do a recheck. 167 00:10:52,070 --> 00:10:52,550 ‫Why? 168 00:10:52,560 --> 00:10:58,250 ‫Because you have to do a recheck because you're pitching some pages, some rows on those pages that 169 00:10:58,250 --> 00:11:00,440 ‫you're pitching might not satisfy your criteria. 170 00:11:00,650 --> 00:11:06,070 ‫So you're doing a recheck again on those thing that's not so expensive because once you figure page, 171 00:11:06,080 --> 00:11:11,780 ‫you get you get those calls for free in memory, hot right in the shared memory, to be specific. 172 00:11:12,170 --> 00:11:12,690 ‫All right, guys. 173 00:11:12,830 --> 00:11:19,280 ‫So that was April scan, full table scan, sequential scan and scan and bitmap and scan and bitmap. 174 00:11:19,280 --> 00:11:20,840 ‫Hebes can help you enjoy this video. 175 00:11:20,870 --> 00:11:21,980 ‫I'm going to see on the next one. 176 00:11:21,980 --> 00:11:24,280 ‫You guys, the state by.