1 00:00:00,120 --> 00:00:00,930 ‫Welcome, everybody. 2 00:00:00,930 --> 00:00:11,580 ‫This is the Q&A for April 2022 for the the ABC engineering course. 3 00:00:12,690 --> 00:00:18,360 ‫We've been doing this for a few months now and it's been very successful and very useful for people. 4 00:00:18,960 --> 00:00:24,000 ‫And I believe it's always good to be like Samir Sumaidaie, interactive like this. 5 00:00:25,980 --> 00:00:34,080 ‫So we receive a good amount of questions on this course because it became very popular and it's only 6 00:00:34,380 --> 00:00:40,740 ‫fair to pick one or two questions every month to answer them in depth. 7 00:00:41,040 --> 00:00:43,140 ‫So I answer almost every question. 8 00:00:43,140 --> 00:00:52,460 ‫I to my to my extent and to my time allocated, you know, but some some answers you can just write 9 00:00:52,470 --> 00:01:00,120 ‫just he needs time to think about it and write about and some some of them, like, deserve, you know, 10 00:01:00,600 --> 00:01:03,930 ‫time like this and to to explain it better. 11 00:01:04,800 --> 00:01:05,700 ‫And this is one of them. 12 00:01:05,760 --> 00:01:07,710 ‫So this comes from Dominique. 13 00:01:07,710 --> 00:01:18,600 ‫And Dominique is asking about the implications of none key column indexes in real world database systems. 14 00:01:18,990 --> 00:01:23,790 ‫Would it be best practice to create and drop new indexes with non key columns for short term purposes? 15 00:01:25,090 --> 00:01:31,440 ‫Or would this be a decision I make once the setting up the table when I launch a new application utilizing 16 00:01:31,440 --> 00:01:31,920 ‫the table? 17 00:01:32,580 --> 00:01:38,330 ‫And the second question is, is it really advisable to drop indexes of tables which might have various 18 00:01:38,340 --> 00:01:41,670 ‫different use cases to different users? 19 00:01:43,080 --> 00:01:44,470 ‫So here's, here's what I want to say. 20 00:01:45,340 --> 00:01:52,830 ‫The the question, while we'll kind of relate to non key indexes, it's kind of it's a it covers all 21 00:01:52,830 --> 00:01:53,760 ‫type of indexes. 22 00:01:54,570 --> 00:01:59,940 ‫Is it do you need some time to create indexes just for specific applications? 23 00:02:01,800 --> 00:02:12,690 ‫Sure, you can you can create an index if you seen an app follows a path, a query path that is not 24 00:02:12,690 --> 00:02:14,010 ‫covered by an index. 25 00:02:14,730 --> 00:02:16,470 ‫You might want to create an index to do that. 26 00:02:18,620 --> 00:02:23,630 ‫The question is how often this AB is executed because the moment you get an index, you've got to go 27 00:02:23,630 --> 00:02:27,740 ‫with the maintenance of the index while the maintenance is not really that bad. 28 00:02:28,130 --> 00:02:30,470 ‫If that, if you have like a few million rows. 29 00:02:30,710 --> 00:02:38,660 ‫But if you go to two multimillion rows and more than them, the index maintenance becomes really a problematic. 30 00:02:38,870 --> 00:02:43,730 ‫What I want to mention here is the real world implications of non key indexes. 31 00:02:45,250 --> 00:02:49,010 ‫So nike.com indexes are usually created when you. 32 00:02:49,990 --> 00:02:55,300 ‫You have an index that is used in equity path and that query path. 33 00:02:56,370 --> 00:03:02,010 ‫Will efficiently use the index to get to collection of results. 34 00:03:02,580 --> 00:03:02,880 ‫No. 35 00:03:03,510 --> 00:03:06,810 ‫But then the query asked for. 36 00:03:08,020 --> 00:03:11,920 ‫Certain columns that are obviously not indexed most of the time. 37 00:03:12,250 --> 00:03:12,570 ‫And. 38 00:03:14,850 --> 00:03:20,130 ‫If this collection that is returned is significantly large. 39 00:03:21,310 --> 00:03:29,680 ‫Then with the database has to do is take the squeeze that have been resulted because we found that the 40 00:03:29,680 --> 00:03:31,090 ‫results are from the index. 41 00:03:31,750 --> 00:03:41,980 ‫Now we have to go to the heap or to the clustered table and query to pull the full pages. 42 00:03:42,460 --> 00:03:42,750 ‫No. 43 00:03:43,810 --> 00:03:47,230 ‫And in a heap based. 44 00:03:49,060 --> 00:03:54,850 ‫Databases such as Postgres, where by default there is no clustering. 45 00:03:55,800 --> 00:03:57,980 ‫You know what the index will do? 46 00:03:57,990 --> 00:04:06,000 ‫It will say, okay, here are the pages that I need, because it knows that this this index, this topologies 47 00:04:06,000 --> 00:04:07,140 ‫belong to this page. 48 00:04:07,920 --> 00:04:14,580 ‫So it needs to fetch all the pages in the heap and then pull these pages and then. 49 00:04:15,880 --> 00:04:16,960 ‫From these pages. 50 00:04:18,050 --> 00:04:24,530 ‫It will take the columns it needs effectively write. 51 00:04:25,490 --> 00:04:30,670 ‫In addition to the things that is selected and that is slow, right? 52 00:04:30,840 --> 00:04:35,780 ‫Because because you didn't you just use the index as a path to get to the rules. 53 00:04:35,780 --> 00:04:42,140 ‫And if these things are random and believe me, they will, because what would you find is a bunch of 54 00:04:42,140 --> 00:04:42,770 ‫tuples. 55 00:04:43,610 --> 00:04:45,410 ‫And these table are all over the place. 56 00:04:45,650 --> 00:04:48,770 ‫They're not as nicely ordered as in the index. 57 00:04:49,190 --> 00:04:50,420 ‫They'll be all over the place. 58 00:04:50,990 --> 00:04:53,300 ‫So you'll be calling all over the place. 59 00:04:53,810 --> 00:04:58,370 ‫You'll be having a lot of random reads to the heap. 60 00:04:59,660 --> 00:05:10,070 ‫And that is very expensive, and especially expensive in a clustered table where you're going to do 61 00:05:10,070 --> 00:05:19,460 ‫ab3 scans right to two because the on the primary key in order to fetch these pages or the results. 62 00:05:20,090 --> 00:05:26,390 ‫So it is it is quite expensive to do that and especially if it's one or two results, not big deal. 63 00:05:26,390 --> 00:05:32,000 ‫But if you're furnishing hundreds or thousands of results, then you are doing a see this random reads 64 00:05:32,510 --> 00:05:38,960 ‫on the table which starts as the random IO, which obviously hits the disk and slows things down. 65 00:05:41,660 --> 00:05:45,920 ‫You probably get a miss the cache or escalation other caches in the process. 66 00:05:47,000 --> 00:05:56,660 ‫So what do you do is take the columns that you would have selected and recreate the index with those 67 00:05:56,660 --> 00:05:58,960 ‫columns as non key. 68 00:06:00,350 --> 00:06:09,590 ‫This way when you do your nice scan of the index right and you found the candidate rows, you don't 69 00:06:10,160 --> 00:06:12,470 ‫you just didn't just find the candidate rows. 70 00:06:12,470 --> 00:06:21,230 ‫You found all the columns you need in line in the index itself, which is very, very powerful. 71 00:06:21,680 --> 00:06:23,450 ‫Now you're code is done. 72 00:06:23,870 --> 00:06:25,790 ‫It's an index only scan. 73 00:06:26,210 --> 00:06:32,330 ‫You don't have to go back to the heap to do do those random zeros. 74 00:06:32,810 --> 00:06:33,050 ‫Right. 75 00:06:33,050 --> 00:06:41,450 ‫Or the primary key clustered index to do these random files and your performance will be tripled if 76 00:06:41,450 --> 00:06:45,560 ‫not even more high, depending on the size of the results they're using. 77 00:06:45,800 --> 00:06:48,980 ‫And then you might say, I don't really return 10,000 results. 78 00:06:49,370 --> 00:06:49,940 ‫You don't? 79 00:06:50,120 --> 00:06:57,590 ‫But don't you have some queries that has select in Select and the inner selector returns a lot of rows. 80 00:06:57,590 --> 00:07:00,110 ‫But the other select just picks few one of them. 81 00:07:00,830 --> 00:07:06,530 ‫Right and then returns those some of these inner select queries can with a lot of rows. 82 00:07:06,530 --> 00:07:16,760 ‫So a few if you can do this efficiently, you will have minimize the is and so you things that would 83 00:07:16,760 --> 00:07:22,250 ‫have exacerbate the problem will be much much isolated and minimized. 84 00:07:22,910 --> 00:07:28,850 ‫So yeah and and key is definitely find it to close this and key indexes. 85 00:07:29,910 --> 00:07:33,090 ‫Said he was in production all the time that we have been. 86 00:07:33,090 --> 00:07:36,480 ‫I've been in this for 17 years and I always see none. 87 00:07:36,480 --> 00:07:39,240 ‫Key indexes in production systems. 88 00:07:39,510 --> 00:07:39,810 ‫Yeah. 89 00:07:41,940 --> 00:07:42,480 ‫It just. 90 00:07:42,480 --> 00:07:43,330 ‫It just makes sense. 91 00:07:43,680 --> 00:07:43,920 ‫Yeah. 92 00:07:44,400 --> 00:07:45,420 ‫What's the cost? 93 00:07:45,600 --> 00:07:47,130 ‫That is all the cost of everything. 94 00:07:48,420 --> 00:07:52,080 ‫The cost is space, and the cost is I. 95 00:07:52,090 --> 00:07:57,170 ‫Oh, because now the token will be plus threes. 96 00:07:57,180 --> 00:08:01,350 ‫And if you're listening to this, you probably watch the whole course, which is good. 97 00:08:02,070 --> 00:08:04,500 ‫So B plus three, everything in the leaf. 98 00:08:04,500 --> 00:08:06,090 ‫So I ose. 99 00:08:06,090 --> 00:08:08,010 ‫Now you're going to need more. 100 00:08:08,010 --> 00:08:08,970 ‫I always. 101 00:08:10,650 --> 00:08:11,400 ‫Two Fitch. 102 00:08:11,700 --> 00:08:14,220 ‫The B three lineage. 103 00:08:14,520 --> 00:08:15,540 ‫The leaf trees. 104 00:08:17,560 --> 00:08:17,950 ‫Is it? 105 00:08:17,950 --> 00:08:18,280 ‫A bit. 106 00:08:18,910 --> 00:08:19,510 ‫Is that. 107 00:08:19,720 --> 00:08:20,590 ‫Is it a big deal? 108 00:08:21,460 --> 00:08:22,480 ‫Not necessarily. 109 00:08:22,600 --> 00:08:23,380 ‫Yeah, it is. 110 00:08:23,720 --> 00:08:24,790 ‫How bad it is? 111 00:08:25,180 --> 00:08:25,880 ‫I wouldn't tell. 112 00:08:25,900 --> 00:08:28,110 ‫I can't really tell you how bad it is. 113 00:08:28,480 --> 00:08:28,810 ‫Month. 114 00:08:29,020 --> 00:08:32,410 ‫It's probably insignificant, but it is always a cost. 115 00:08:32,500 --> 00:08:32,770 ‫Right. 116 00:08:33,020 --> 00:08:37,810 ‫And then to the need to try and see was the cost of everything happened there. 117 00:08:38,260 --> 00:08:45,040 ‫I got a very quick video today for the April 20, 22 cook you're going to see in the next one.