1 00:00:00,210 --> 00:00:06,210 ‫In this video, we'll look through how the database optimizer decides which index to use, especially 2 00:00:06,210 --> 00:00:08,070 ‫if there are multiple indexes. 3 00:00:08,670 --> 00:00:15,450 ‫This is very useful if you want to know whether your table needs an extra index or not, because if 4 00:00:15,450 --> 00:00:21,030 ‫you have multiple indexes, how do you know which index that database is going to use? 5 00:00:21,030 --> 00:00:23,150 ‫Is not going to use all of them all the time. 6 00:00:23,160 --> 00:00:24,240 ‫It really depends. 7 00:00:24,480 --> 00:00:30,270 ‫This video again explained this kind of footprint kind of cases where this video is not going to discuss 8 00:00:30,270 --> 00:00:34,260 ‫the combined index is one index that have multiple columns. 9 00:00:34,260 --> 00:00:35,940 ‫Right, because I already discussed that. 10 00:00:35,940 --> 00:00:39,360 ‫So let's say we have a beautiful table with just two columns. 11 00:00:39,690 --> 00:00:48,240 ‫If one if two column one has an index, if one index and column two has an index, F2 index, and we're 12 00:00:48,240 --> 00:00:54,000 ‫going to do a select star from the table where if one equals one and if two equal four. 13 00:00:54,190 --> 00:01:00,420 ‫So let's go through some scenarios to see how the database optimizer decide to play. 14 00:01:00,600 --> 00:01:03,720 ‫Should I use if one index or if two index? 15 00:01:03,990 --> 00:01:08,610 ‫There are so many ways to solve this to look for these values. 16 00:01:08,610 --> 00:01:08,820 ‫Right. 17 00:01:09,210 --> 00:01:13,920 ‫So let's take a little case number one, because number one is when the database optimizer decides to 18 00:01:13,920 --> 00:01:15,060 ‫use both of them. 19 00:01:15,090 --> 00:01:24,660 ‫So what it does is it queries if one idea looking for the value one in this case, finding all the row 20 00:01:24,660 --> 00:01:30,360 ‫IDs that matches and tuples in case of Postgres Reuters, in case of Oracle, and it goes over and it 21 00:01:30,360 --> 00:01:31,950 ‫collects all these are all ideas. 22 00:01:32,520 --> 00:01:40,590 ‫And then it does the same exact search, but with a value for on if IDEX so searches and collects a 23 00:01:40,590 --> 00:01:46,740 ‫different set of priorities and in case of an end it will essentially do an intersection like, OK, 24 00:01:46,740 --> 00:01:52,110 ‫what is the values that are in here, not in here, and then essentially merges the results and or is 25 00:01:52,110 --> 00:01:54,430 ‫going to result in essentially unioned them. 26 00:01:54,990 --> 00:01:58,110 ‫So that's that's how it does the effect of ResultSet. 27 00:01:58,110 --> 00:02:00,660 ‫The row ideas are are collected. 28 00:02:00,660 --> 00:02:01,520 ‫That's the final result. 29 00:02:01,530 --> 00:02:07,740 ‫You go to the table if necessary, obviously, and you collect the actual values from the table, from 30 00:02:07,740 --> 00:02:08,250 ‫the heap. 31 00:02:08,250 --> 00:02:08,620 ‫Right. 32 00:02:08,640 --> 00:02:11,220 ‫So, one, does it actually do this? 33 00:02:11,400 --> 00:02:19,530 ‫It really depends if the values are not too small of the if we know that we're not going to get so little 34 00:02:19,530 --> 00:02:24,960 ‫of a result, because if we're going to get a little result, it's not worth searching both indexes. 35 00:02:25,080 --> 00:02:26,330 ‫One is enough, right. 36 00:02:26,340 --> 00:02:30,930 ‫Especially in case of an end in cases gets really tedious. 37 00:02:31,650 --> 00:02:40,060 ‫And if we get too large of a result, we know that the end is going to turn so many rows, so many row 38 00:02:40,080 --> 00:02:40,590 ‫ideas. 39 00:02:40,770 --> 00:02:43,770 ‫It's just not worth looking through the index in this case. 40 00:02:43,770 --> 00:02:45,590 ‫We're going to go through the table index. 41 00:02:45,870 --> 00:02:46,110 ‫Right. 42 00:02:46,380 --> 00:02:48,450 ‫So it really, really depends on. 43 00:02:50,100 --> 00:02:54,210 ‫And that's all heuristics the database uses to decide what what of this. 44 00:02:54,210 --> 00:02:54,460 ‫Right. 45 00:02:54,480 --> 00:02:55,460 ‫So that's case one. 46 00:02:55,470 --> 00:03:01,110 ‫So let's talk about case to the case, too, is when the optimizer decides to use one or index over 47 00:03:01,110 --> 00:03:01,950 ‫the other. 48 00:03:01,980 --> 00:03:02,860 ‫So let's take an example. 49 00:03:02,860 --> 00:03:04,260 ‫We're going to show an example right here. 50 00:03:05,310 --> 00:03:13,530 ‫The database decides to use if one index only search all the rows for the value of one and then we collect 51 00:03:13,530 --> 00:03:17,310 ‫all the rows and then don't use the second index. 52 00:03:17,720 --> 00:03:25,620 ‫And let's just go immediately to the table, which those rows and then do a recondition really filter 53 00:03:25,620 --> 00:03:28,530 ‫the results based on F two equal four. 54 00:03:28,530 --> 00:03:28,870 ‫Right. 55 00:03:29,220 --> 00:03:38,400 ‫So databases do usually do that when they know that if two index returns so many rolls while F one returns 56 00:03:38,400 --> 00:03:44,520 ‫very few rows and the condition is an end, because if I know it's an end, then I know, of course 57 00:03:44,520 --> 00:03:50,790 ‫I'm going to go with a smaller one because anything that doesn't match if one right is out of the ResultSet. 58 00:03:50,790 --> 00:03:51,060 ‫Right. 59 00:03:51,060 --> 00:03:53,550 ‫So it's going to go off of one example is a primary. 60 00:03:54,000 --> 00:04:00,930 ‫If I have a primary, the database almost always going to use the Primerica index in case of an and 61 00:04:00,930 --> 00:04:04,530 ‫again or then or it's a different situation because you get more results. 62 00:04:05,220 --> 00:04:05,550 ‫Right. 63 00:04:05,700 --> 00:04:10,190 ‫And then go back to the table directly and fetch the results we want to write. 64 00:04:10,410 --> 00:04:18,750 ‫It's just not worth searching to indexes because trees are not cheap guys and even cemeteries are not 65 00:04:18,750 --> 00:04:19,020 ‫cheap. 66 00:04:19,020 --> 00:04:20,600 ‫Indexes are not cheap to search. 67 00:04:20,850 --> 00:04:21,210 ‫Yeah. 68 00:04:21,210 --> 00:04:26,730 ‫If you're searching, if you know that you're going to get so, so few of a values, then it's worth 69 00:04:26,730 --> 00:04:27,150 ‫searching. 70 00:04:27,150 --> 00:04:32,700 ‫Then if you know that you're going to get a lot of rows back, then it's not worth searching. 71 00:04:32,700 --> 00:04:34,860 ‫The index and maybe table scan is actually better. 72 00:04:35,070 --> 00:04:40,320 ‫And you might say, how do you know how does it know that, hey, this index, again, results so many 73 00:04:40,320 --> 00:04:43,440 ‫rows versus this search is going to result into fewer? 74 00:04:43,740 --> 00:04:45,480 ‫Well, there's something called the statistics. 75 00:04:45,810 --> 00:04:48,720 ‫The stats real, really powerful. 76 00:04:48,720 --> 00:04:52,260 ‫The database keeps stats on the table, says, okay, here's the table. 77 00:04:52,380 --> 00:04:54,810 ‫Here's approximately how many rows there. 78 00:04:55,050 --> 00:04:57,200 ‫Here's approximately how many ones are there. 79 00:04:57,360 --> 00:04:58,740 ‫Here's some of how many. 80 00:05:00,030 --> 00:05:06,960 ‫Wreaths are there and all this kind of value, so they are not 100 percent correct, but they give really 81 00:05:07,200 --> 00:05:13,140 ‫a lot of values and you can also always update these statistics with a common goal and allies or sometimes 82 00:05:13,410 --> 00:05:15,390 ‫I think gather statistics and oracle. 83 00:05:16,020 --> 00:05:16,200 ‫Right. 84 00:05:16,320 --> 00:05:18,060 ‫So that's the idea really. 85 00:05:18,060 --> 00:05:20,260 ‫Again, depends on an and or here. 86 00:05:20,280 --> 00:05:27,390 ‫But here's one example where we use one index over the other, just one and then filter. 87 00:05:27,540 --> 00:05:29,170 ‫So an index with a filter, that's case two. 88 00:05:29,220 --> 00:05:34,650 ‫And case number three is just the database decides, you know what, you guys suck. 89 00:05:34,650 --> 00:05:35,870 ‫Both indexes suck. 90 00:05:35,880 --> 00:05:40,470 ‫I'm not going to allow the search to get for the value of one equal one. 91 00:05:40,470 --> 00:05:46,080 ‫And if two equals four is just going to return to the whole table, almost the whole table, three quarters 92 00:05:46,080 --> 00:05:46,730 ‫of the table. 93 00:05:47,250 --> 00:05:49,500 ‫I did my math and this is what I think. 94 00:05:49,770 --> 00:05:50,970 ‫So guess what? 95 00:05:51,120 --> 00:05:52,340 ‫I'm not going to use both of you. 96 00:05:52,500 --> 00:05:57,150 ‫I'm going to already go to the index because I need to go there anyway, because I'm going to select 97 00:05:57,150 --> 00:06:00,750 ‫other columns that Airbus decides to search the social list. 98 00:06:00,780 --> 00:06:04,590 ‫So many rules, just like let's go through the table directly into a filter which is much cheaper. 99 00:06:04,830 --> 00:06:10,830 ‫And some databases like Bullsbrook does a threatening multithreaded and issue so many workers so he 100 00:06:10,830 --> 00:06:14,580 ‫can do multiple workers stood to scan the table. 101 00:06:14,820 --> 00:06:15,050 ‫Right. 102 00:06:15,210 --> 00:06:18,920 ‫So table statistics are very, very critical here. 103 00:06:18,930 --> 00:06:27,930 ‫So if you got some wrong result, right, let's say here's where things get nasty sometimes. 104 00:06:29,670 --> 00:06:31,890 ‫I got beat so many times on this. 105 00:06:32,320 --> 00:06:33,240 ‫Here's an example. 106 00:06:33,510 --> 00:06:36,840 ‫You bring a table, very fresh table. 107 00:06:36,840 --> 00:06:37,230 ‫Right. 108 00:06:37,230 --> 00:06:38,230 ‫Empty, right. 109 00:06:38,310 --> 00:06:40,320 ‫So that is statistics, almost zero. 110 00:06:40,320 --> 00:06:41,850 ‫It knows that a table is empty. 111 00:06:42,000 --> 00:06:46,620 ‫You them to enroll you in two rows, three rows, and then the tables get updated. 112 00:06:46,620 --> 00:06:50,310 ‫Every asynchronously right now knows it has only three rows. 113 00:06:50,670 --> 00:06:56,850 ‫And then you do an operation to insert three million or 300 million rows just in bulk. 114 00:06:57,000 --> 00:07:00,600 ‫And you're just immediately after you've inserted you just do queries. 115 00:07:00,870 --> 00:07:03,000 ‫You know what will happen in this case? 116 00:07:03,270 --> 00:07:09,090 ‫The is if you don't do the statistics, it will you will execute some something like this and we'll 117 00:07:09,090 --> 00:07:11,720 ‫say, hey, what should I do? 118 00:07:12,240 --> 00:07:13,460 ‫Let's let's look at this. 119 00:07:13,680 --> 00:07:19,800 ‫Oh, this table has only three rows, so it's always easier to scan it fully. 120 00:07:19,980 --> 00:07:27,240 ‫And as this case is going to do a full table scan, pulling all the 300 million rows to look through 121 00:07:27,240 --> 00:07:33,740 ‫this, I've I've got beaten in so many times by this because I forgot to update this or immediately 122 00:07:33,740 --> 00:07:36,000 ‫I, I did my queries after this. 123 00:07:36,000 --> 00:07:42,720 ‫So always, if you're using Postgres, do analyze was the other command vacuum, fold, vacuum to clean 124 00:07:42,720 --> 00:07:44,010 ‫up all that garbage. 125 00:07:44,310 --> 00:07:50,790 ‫And then, and then after that just I do an analyze in Oracle use gathers that the schema statistics 126 00:07:50,790 --> 00:07:53,970 ‫and SQL Server I forgot was the correct execute planner. 127 00:07:53,970 --> 00:07:56,190 ‫I forgot what it was called for the SQL Server. 128 00:07:56,190 --> 00:07:59,580 ‫But yeah, you can do all this kind of stuff guys, essentially. 129 00:07:59,940 --> 00:08:06,360 ‫And for those who want to advance stuff, if you know what you're doing, if you really know what you're 130 00:08:06,360 --> 00:08:08,220 ‫doing, you can use database hinting. 131 00:08:09,000 --> 00:08:16,890 ‫You can in the query include some sort of comments like this one is the one you see on the screen to 132 00:08:16,890 --> 00:08:23,910 ‫kind of force that database to say, hey, you database, you're dumb, you don't know anything. 133 00:08:24,240 --> 00:08:25,230 ‫Please trust me. 134 00:08:25,530 --> 00:08:35,130 ‫I want you to always use the index of EF1 or use the index of F2 because I have more knowledge than 135 00:08:35,130 --> 00:08:41,340 ‫you do because of certain situation, because my application I know what I just did. 136 00:08:42,030 --> 00:08:48,120 ‫You are not caught up to what I just did as a as a database. 137 00:08:48,450 --> 00:08:53,430 ‫So trust me, your stats are not up to date to what I am about to do. 138 00:08:53,490 --> 00:08:57,210 ‫Plus maybe a human in this case are way smarter than machines. 139 00:08:57,600 --> 00:09:00,690 ‫And unless I picks up anytime soon, I don't know. 140 00:09:00,900 --> 00:09:02,010 ‫I don't think it will. 141 00:09:02,130 --> 00:09:03,510 ‫But anytime soon. 142 00:09:03,510 --> 00:09:04,620 ‫So it is going to take time. 143 00:09:05,070 --> 00:09:05,370 ‫Right, guys? 144 00:09:05,570 --> 00:09:06,450 ‫That's it for me today. 145 00:09:06,450 --> 00:09:11,040 ‫Very quick video talking about how these databases use the indexing. 146 00:09:11,040 --> 00:09:12,270 ‫Very, very critical. 147 00:09:12,270 --> 00:09:16,500 ‫Again, I miss I omitted so many details, for example. 148 00:09:16,500 --> 00:09:23,910 ‫So we'll give we'll give this will open up for questions and we'll have some discussions on on what 149 00:09:23,910 --> 00:09:25,230 ‫if there is an order? 150 00:09:25,230 --> 00:09:26,250 ‫There is a join. 151 00:09:26,460 --> 00:09:28,350 ‫It completely changed that dynamic. 152 00:09:28,350 --> 00:09:28,560 ‫Right. 153 00:09:28,590 --> 00:09:31,620 ‫Is the building a database is not something easy. 154 00:09:31,620 --> 00:09:33,480 ‫It's really, really difficult. 155 00:09:33,870 --> 00:09:40,620 ‫But these tools will help you while you're scanning query at least you know, hey, I know that if one 156 00:09:40,620 --> 00:09:46,800 ‫is not going to have a lot of value, is going to have a let's say is going to have a lot of values. 157 00:09:46,800 --> 00:09:48,450 ‫I say have one, two, three. 158 00:09:48,450 --> 00:09:51,200 ‫That's only the unique value is going to have an aura. 159 00:09:51,540 --> 00:09:51,840 ‫I don't. 160 00:09:51,890 --> 00:09:53,550 ‫Let's take a real example. 161 00:09:56,570 --> 00:10:01,340 ‫Let's say you're building a customer database, right, and you have the states right in the United 162 00:10:01,340 --> 00:10:05,570 ‫States, like California, Texas, whatever, all of them. 163 00:10:05,570 --> 00:10:05,820 ‫Right. 164 00:10:06,110 --> 00:10:08,540 ‫And let's say your database is almost. 165 00:10:09,770 --> 00:10:15,770 ‫Exclusively, all people from California, right, and you have an index on the state field, obviously, 166 00:10:15,770 --> 00:10:20,690 ‫if all of them are California, that's just that's just a useless index right there. 167 00:10:20,690 --> 00:10:26,420 ‫Because, like, what is going to give me is if I search where state is California, that's the entire 168 00:10:26,420 --> 00:10:26,810 ‫table. 169 00:10:26,870 --> 00:10:27,320 ‫Right. 170 00:10:27,650 --> 00:10:37,190 ‫So but let's say you have three customers on Texas and one from Florida searching that one, that Florida 171 00:10:37,460 --> 00:10:42,620 ‫is way faster than searching the 300 million, 300 million. 172 00:10:42,780 --> 00:10:44,450 ‫That's the entire United States. 173 00:10:45,020 --> 00:10:49,400 ‫But do you get an idea if you have one value or few values? 174 00:10:49,670 --> 00:10:56,780 ‫Indexes are beautiful for this rhyme, but if you have so much rolls that you're coming back, that's 175 00:10:56,780 --> 00:10:59,200 ‫where you play the game of multiple indexes. 176 00:10:59,450 --> 00:10:59,930 ‫You know what? 177 00:11:00,230 --> 00:11:04,400 ‫I know that this rule, this index is going to give me this much rows back. 178 00:11:04,670 --> 00:11:08,060 ‫I'm going to slash it down with another index. 179 00:11:08,270 --> 00:11:09,650 ‫So let's put that. 180 00:11:09,660 --> 00:11:14,690 ‫But you have to think of how sparse your table looks like, right. 181 00:11:14,690 --> 00:11:15,960 ‫And in these kind of cases. 182 00:11:15,980 --> 00:11:16,180 ‫All right. 183 00:11:16,200 --> 00:11:17,510 ‫I'm going to leave you there, guys. 184 00:11:17,510 --> 00:11:19,730 ‫I'm going to see in the next one, you guys say goodbye.