1 00:00:00,060 --> 00:00:06,330 ‫Welcome to another lecture in this Q&A section of the database Engineering Fundamentals. 2 00:00:06,330 --> 00:00:11,510 ‫And today we have a question from karate. 3 00:00:11,520 --> 00:00:14,010 ‫Sorry if I didn't answer your name correctly. 4 00:00:14,100 --> 00:00:15,630 ‫It's a very good question, actually. 5 00:00:16,240 --> 00:00:23,000 ‫It says, Hey, I have a pulse, Chris, Using sequential scan rather than index only scan. 6 00:00:23,010 --> 00:00:24,360 ‫Why is that? 7 00:00:24,360 --> 00:00:26,100 ‫And that's a very good question. 8 00:00:26,100 --> 00:00:30,120 ‫Let's take a look at what was what was a curator's question here. 9 00:00:31,290 --> 00:00:32,820 ‫I'm using PostgreSQL. 10 00:00:32,820 --> 00:00:39,810 ‫I have created a table with four columns ID, first name, last name, date of birth, and I created 11 00:00:39,810 --> 00:00:40,860 ‫an index. 12 00:00:42,860 --> 00:00:45,740 ‫On the ID field, which is a primary key as well. 13 00:00:45,770 --> 00:00:54,260 ‫When I do a select query like this, which is a select ID where from, obviously he forgot to add the 14 00:00:54,260 --> 00:01:01,670 ‫table here, it's like from blah blah, the table where id equal to he's getting it is using a sequential 15 00:01:01,670 --> 00:01:04,190 ‫scan rather than an index only scan. 16 00:01:04,220 --> 00:01:05,720 ‫What's the issue here? 17 00:01:05,750 --> 00:01:08,330 ‫The table has only seven rows. 18 00:01:08,510 --> 00:01:12,800 ‫Even if he didn't say that, I would have guessed that his table is small. 19 00:01:13,960 --> 00:01:14,240 ‫All right. 20 00:01:14,240 --> 00:01:15,680 ‫So my guess is here. 21 00:01:15,680 --> 00:01:17,600 ‫So what is happening here? 22 00:01:17,810 --> 00:01:23,540 ‫All right, so I already answered here, but I'll give you I'll give you a hint of what's happening 23 00:01:23,540 --> 00:01:23,870 ‫here. 24 00:01:23,900 --> 00:01:24,470 ‫The. 25 00:01:25,580 --> 00:01:32,720 ‫You see, Postgres or any database planner will get to understand how things work, right? 26 00:01:33,830 --> 00:01:44,150 ‫We have the data of the table in a file and we created also an index, which is another file, and the 27 00:01:44,150 --> 00:01:49,720 ‫index has the structure of we talked about the B3 structure for an entire section. 28 00:01:49,730 --> 00:01:54,290 ‫It's a complex structure with so many intensities, right? 29 00:01:54,380 --> 00:02:00,470 ‫Where we have a root, we have intermediate nodes, we have deltas there, we have leaf nodes, we have 30 00:02:00,470 --> 00:02:06,710 ‫complex structure here and we have the data file, which is just a dump of everything that is in the 31 00:02:06,710 --> 00:02:07,280 ‫table. 32 00:02:07,370 --> 00:02:15,170 ‫So when we load the database and we start querying the table, what will the database will look is like, 33 00:02:15,170 --> 00:02:15,800 ‫okay. 34 00:02:16,840 --> 00:02:21,370 ‫We have we have the data file and it's a single page. 35 00:02:21,400 --> 00:02:22,930 ‫Why is it a single page? 36 00:02:23,080 --> 00:02:31,630 ‫Because it has only seven rows and the seven rows happened to fit neatly inside that page. 37 00:02:31,960 --> 00:02:35,650 ‫A page is around, what, eight K and postcodes? 38 00:02:35,650 --> 00:02:36,160 ‫I think. 39 00:02:36,160 --> 00:02:36,550 ‫So. 40 00:02:36,550 --> 00:02:39,100 ‫This is way too small of a table. 41 00:02:39,100 --> 00:02:39,390 ‫Right? 42 00:02:39,490 --> 00:02:40,870 ‫So it's a single page. 43 00:02:41,780 --> 00:02:43,530 ‫And here's what the database says. 44 00:02:43,550 --> 00:02:44,180 ‫All right. 45 00:02:44,600 --> 00:02:52,880 ‫What I'm going to do is actually, yes, you start using the ID field, but guess what? 46 00:02:52,910 --> 00:02:55,820 ‫It's not even worth it to for me to go to the index. 47 00:02:55,820 --> 00:02:59,260 ‫Because to go to the index, I have to do two things. 48 00:02:59,270 --> 00:03:02,210 ‫I have to go to the index, I have to fetch the index. 49 00:03:02,210 --> 00:03:08,540 ‫I have to build the B three index in my memory and then I have to traverse it just to find everything 50 00:03:08,540 --> 00:03:10,940 ‫that has ID equal to versus. 51 00:03:10,940 --> 00:03:15,650 ‫I know the whole table is one page and that will cost me one. 52 00:03:16,370 --> 00:03:17,780 ‫So here it is. 53 00:03:17,780 --> 00:03:21,710 ‫I'm going to do an IEO and I'm going to pull it in memory. 54 00:03:21,710 --> 00:03:27,560 ‫And that goes to the heap as we talked about it, the postgres heap, and then that page in the heap 55 00:03:27,560 --> 00:03:34,310 ‫was, etc. Here is our heritage user because we just read all the things that are ID equal to and it's 56 00:03:34,310 --> 00:03:37,700 ‫a single read, it's a sequential scan in this case. 57 00:03:37,790 --> 00:03:39,830 ‫That's why tables. 58 00:03:40,430 --> 00:03:42,950 ‫In this particular case, there are too tiny. 59 00:03:42,950 --> 00:03:46,170 ‫The database will just do a sequential scan on the heap, right. 60 00:03:46,200 --> 00:03:47,480 ‫It will just fetch one page. 61 00:03:47,480 --> 00:03:50,270 ‫And that's zero cost almost to false. 62 00:03:50,270 --> 00:03:50,570 ‫Correct. 63 00:03:51,680 --> 00:03:53,990 ‫Now, here is where it gets interesting. 64 00:03:53,990 --> 00:03:59,510 ‫And I got bitten many times that I have count with this, actually. 65 00:03:59,510 --> 00:04:06,440 ‫And so I have a tiny table and I start querying this like like what karate did here, right? 66 00:04:06,560 --> 00:04:09,460 ‫I was querying it and the plan started getting cached. 67 00:04:09,470 --> 00:04:10,040 ‫This is it. 68 00:04:10,070 --> 00:04:13,490 ‫Oh, this table is too tiny, so sequential scan is the best. 69 00:04:13,490 --> 00:04:18,520 ‫I don't want to go to the complex fee tree and scan many indexes to get my results. 70 00:04:18,530 --> 00:04:22,610 ‫I was going to do a sequential scan because it's just it's a simple table. 71 00:04:22,610 --> 00:04:23,540 ‫It's a small table. 72 00:04:24,590 --> 00:04:31,160 ‫And what I do is immediately after doing all the queries, I would insert my data. 73 00:04:32,290 --> 00:04:34,390 ‫And when I do that right. 74 00:04:35,560 --> 00:04:39,670 ‫What will happen is I started like inserting 3 million rows, right? 75 00:04:39,760 --> 00:04:44,230 ‫And immediately after 3 million, I didn't give the database times to breathe. 76 00:04:44,260 --> 00:04:52,210 ‫I would start querying for a few minutes and my queries are so slow. 77 00:04:52,990 --> 00:04:54,850 ‫Why is that right? 78 00:04:55,060 --> 00:04:56,770 ‫Why are more queries slow? 79 00:04:57,340 --> 00:05:03,010 ‫Because the database thinks the table is small. 80 00:05:03,190 --> 00:05:09,100 ‫So if I have a query, the statistics and we talked about statistics in the course. 81 00:05:09,710 --> 00:05:15,410 ‫That the statistics, which tells the table how many rows are in there, are not up to date with the 82 00:05:15,410 --> 00:05:17,260 ‫3 million rolls that are inserted. 83 00:05:17,270 --> 00:05:20,390 ‫So the planner thinks, Oh, it's a tables that is small. 84 00:05:20,510 --> 00:05:22,250 ‫Let me go and query it. 85 00:05:22,760 --> 00:05:30,470 ‫And when it does that, it says, Oh, it starts doing a sequential scan of an actual instead of an 86 00:05:30,470 --> 00:05:31,700 ‫actual performance. 87 00:05:31,730 --> 00:05:32,960 ‫B three scan. 88 00:05:32,990 --> 00:05:39,200 ‫Because remember B, three scans or an index scans are very amazing when you have large amounts of data 89 00:05:39,200 --> 00:05:41,870 ‫because you can traverse a few blocks. 90 00:05:41,990 --> 00:05:47,810 ‫But then you're going to find exactly what you want in the B three immediately go to that. 91 00:05:48,320 --> 00:05:48,900 ‫Right. 92 00:05:48,920 --> 00:05:50,360 ‫That's the problem here. 93 00:05:50,600 --> 00:05:53,540 ‫The problem is the planner is out of date. 94 00:05:53,840 --> 00:05:54,170 ‫Right. 95 00:05:54,170 --> 00:05:55,820 ‫The cache, the stats that already. 96 00:05:55,820 --> 00:05:58,390 ‫So the planner thinks, let me do a sequential scan. 97 00:05:58,400 --> 00:06:04,430 ‫So start scanning then toyour heap fetching millions of pages. 98 00:06:05,480 --> 00:06:10,520 ‫I'm exaggerating thousands of pages to find what I'm looking for. 99 00:06:10,700 --> 00:06:11,390 ‫Why? 100 00:06:11,420 --> 00:06:13,280 ‫Because it was out of date. 101 00:06:13,520 --> 00:06:18,880 ‫Obviously, this is a very common problem when you just insert the data and immediately you query it, 102 00:06:18,890 --> 00:06:21,620 ‫That's that's something you have to watch out for. 103 00:06:21,650 --> 00:06:25,990 ‫That's why after inserting a lot of data, you have to do a vacuum in case of Bosphorus. 104 00:06:26,000 --> 00:06:29,260 ‫And in the case of Oracle, you have to do a. 105 00:06:30,310 --> 00:06:38,320 ‫Vacuum analyser, I suppose, will will do computing the stats again, it will just do a count against 106 00:06:38,320 --> 00:06:38,770 ‫its OC. 107 00:06:38,800 --> 00:06:41,020 ‫This is how many active rows are there? 108 00:06:41,410 --> 00:06:45,370 ‫I'm going to update my stats so the planner will make smarter decisions. 109 00:06:45,610 --> 00:06:52,090 ‫And in Oracle I think it's called gather schema stats and in SQL Server it's called something else and 110 00:06:52,090 --> 00:06:56,110 ‫that's updating the schema stats are very, very critical in this particular case. 111 00:06:56,110 --> 00:06:57,640 ‫So I thought a very good question. 112 00:06:58,060 --> 00:07:02,740 ‫I thought I'll talk about it here because it's it's really something you have to watch out for. 113 00:07:02,740 --> 00:07:07,000 ‫And very good observation correctly. 114 00:07:07,000 --> 00:07:10,270 ‫I like that you guys are actually looking at the plans. 115 00:07:10,270 --> 00:07:12,670 ‫It says, wait a minute, I have I'm using an index. 116 00:07:12,670 --> 00:07:12,970 ‫Right. 117 00:07:12,970 --> 00:07:14,370 ‫Why is it not using an index? 118 00:07:14,380 --> 00:07:16,420 ‫It's a very smart thing. 119 00:07:16,660 --> 00:07:19,480 ‫Even with an index only scan is not fast enough. 120 00:07:19,480 --> 00:07:20,140 ‫Right. 121 00:07:20,140 --> 00:07:24,880 ‫To to to fetch the index compared to just having the one page. 122 00:07:24,880 --> 00:07:25,240 ‫Right. 123 00:07:25,480 --> 00:07:31,030 ‫If you add a few rows to the table, more zeros at 10,000 rolls, you'll immediately feel. 124 00:07:31,600 --> 00:07:35,290 ‫I hope you guys enjoyed this Q&A lecture. 125 00:07:35,460 --> 00:07:36,280 ‫See in the next one. 126 00:07:36,790 --> 00:07:37,270 ‫Goodbye.