1 00:00:00,180 --> 00:00:07,410 ‫So I want to spend a few moments to talk about the differences between a primary key and secondary key 2 00:00:07,710 --> 00:00:11,760 ‫or something called a primary index or a secondary index. 3 00:00:11,790 --> 00:00:13,320 ‫They are almost analogous. 4 00:00:15,790 --> 00:00:23,200 ‫You might think you know the differences, but they are very subtle and I want to talk about those two 5 00:00:23,200 --> 00:00:31,440 ‫concepts and database system within the context of actual database platforms out there, Posterous, 6 00:00:31,840 --> 00:00:42,310 ‫MySQL, Oracle, and how these how and how when we actually say a primary key, you get so much knowledge 7 00:00:42,310 --> 00:00:49,190 ‫out of that word that will actually make you ask questions. 8 00:00:49,720 --> 00:00:50,770 ‫It's not just 9 00:00:53,380 --> 00:00:57,610 ‫that the fact that the primary key is unique, that is true. 10 00:00:58,510 --> 00:01:09,130 ‫But it also gives you other kind of implicit information that will give you the necessary understanding 11 00:01:09,700 --> 00:01:16,630 ‫to optimize your queries to to know what you're doing when you actually executing queries against a 12 00:01:16,630 --> 00:01:18,940 ‫primary key or a secondary key. 13 00:01:19,390 --> 00:01:23,290 ‫How about we jump into it in order to explain what a primary key is? 14 00:01:23,590 --> 00:01:32,350 ‫We really need to first explain the concept of the table spaces or the heap. 15 00:01:32,860 --> 00:01:42,760 ‫When we're talking about a store where you have rows in a table, we store, we dedicate area and desk. 16 00:01:43,390 --> 00:01:52,270 ‫We call it usually the heap in both squares or even in database as well, which is usually a slow access 17 00:01:52,570 --> 00:01:53,140 ‫space. 18 00:01:53,140 --> 00:02:01,000 ‫Not much these days, but where where all the data expensive data, large it is, sits there and the 19 00:02:01,000 --> 00:02:02,650 ‫table is organized. 20 00:02:02,650 --> 00:02:06,340 ‫Basically, Roblero, there is no. 21 00:02:07,620 --> 00:02:09,850 ‫Order to a table. 22 00:02:10,230 --> 00:02:17,340 ‫We don't enforce ordering by default, assuming no primaries or anything like that, you're just inserting 23 00:02:17,340 --> 00:02:20,170 ‫data and that rose come after each other. 24 00:02:20,190 --> 00:02:26,010 ‫So if you insert the value of seven goes top, if you add to the value of one goes to the bottom. 25 00:02:26,250 --> 00:02:28,230 ‫And then if you add that another value of. 26 00:02:29,780 --> 00:02:33,750 ‫Two is still goes to the bottom, it gets keeps spending. 27 00:02:34,300 --> 00:02:37,680 ‫So there is no order that is maintained by default. 28 00:02:39,410 --> 00:02:50,540 ‫Now, when we add the concept of a primary key, what we do to the table is we do something called clustering. 29 00:02:51,230 --> 00:03:01,070 ‫And clustering is the idea of organizing the table around that key so beautiful we have to maintain 30 00:03:01,310 --> 00:03:01,970 ‫order. 31 00:03:02,360 --> 00:03:08,570 ‫And by default, the rows that you insert must fit that order. 32 00:03:09,470 --> 00:03:15,050 ‫That means there is an additional cost associated with this ordering. 33 00:03:16,610 --> 00:03:19,790 ‫The good thing we get because of this is like. 34 00:03:21,010 --> 00:03:26,910 ‫It's an almost like an index in the table itself, if you think about it, but it's organized that way. 35 00:03:29,180 --> 00:03:39,590 ‫That is why in Oracle, they call it Iot or Iot index organized table that I'm not to be confused with 36 00:03:39,590 --> 00:03:46,610 ‫the Internet of Things that other Iot and in in Postgres. 37 00:03:47,710 --> 00:03:54,210 ‫Or other data, because they call it a clustered index, or when you say a primary key in general, 38 00:03:54,520 --> 00:03:56,020 ‫that's what we understand. 39 00:03:56,020 --> 00:04:00,820 ‫We there is there is an index and then in the heap. 40 00:04:01,210 --> 00:04:01,540 ‫Right. 41 00:04:01,780 --> 00:04:05,650 ‫And the heap is organized around that index. 42 00:04:05,860 --> 00:04:06,790 ‫Let's take an example. 43 00:04:07,220 --> 00:04:09,610 ‫Let's say Ion's of the value of one. 44 00:04:09,610 --> 00:04:10,060 ‫Right. 45 00:04:10,060 --> 00:04:15,140 ‫And that value of one is my unique primary key. 46 00:04:15,340 --> 00:04:20,770 ‫So there will be the value of one and all the the rest of the row, like you have a first name and last 47 00:04:20,770 --> 00:04:20,980 ‫name. 48 00:04:21,220 --> 00:04:23,250 ‫It just goes there as well. 49 00:04:23,260 --> 00:04:25,570 ‫It will be organized in that manner. 50 00:04:25,990 --> 00:04:34,480 ‫So now if I insert the value of eight right, it will go right under it. 51 00:04:34,480 --> 00:04:34,860 ‫Right. 52 00:04:35,200 --> 00:04:46,300 ‫But if you now and so the value of to that value of two has to to sit right after the value of one. 53 00:04:46,840 --> 00:04:47,220 ‫Right. 54 00:04:47,470 --> 00:04:47,890 ‫So. 55 00:04:49,180 --> 00:04:56,590 ‫You you have to kind of push the value of eight below so you can sneak in the value of two because they 56 00:04:56,590 --> 00:05:01,300 ‫have to be clustered together, that you have to maintain order. 57 00:05:01,450 --> 00:05:06,970 ‫And as a result, if you obviously databases don't shift and left like this. 58 00:05:06,970 --> 00:05:07,150 ‫Right. 59 00:05:07,160 --> 00:05:08,080 ‫They are smart. 60 00:05:08,080 --> 00:05:08,580 ‫They do. 61 00:05:08,830 --> 00:05:11,920 ‫They have the page that they inserted. 62 00:05:11,920 --> 00:05:13,780 ‫They know, OK, value for in value of eight. 63 00:05:13,780 --> 00:05:15,160 ‫I'm not going to put it in the same page. 64 00:05:15,160 --> 00:05:20,160 ‫I'm going to leave some space for the value of two, three, four or five, six and seven because there 65 00:05:20,190 --> 00:05:21,750 ‫are some values that need to be there. 66 00:05:21,970 --> 00:05:23,400 ‫So the databases are smart. 67 00:05:23,410 --> 00:05:23,860 ‫They're not. 68 00:05:24,040 --> 00:05:27,850 ‫This is I'm thinking about the naive implementation of things. 69 00:05:28,750 --> 00:05:32,020 ‫And then when you do that, you will get that. 70 00:05:33,460 --> 00:05:40,290 ‫You will get that extra cost, but the best thing that you can do with clustered indexes or primary 71 00:05:40,290 --> 00:05:44,280 ‫keys is speed because especially with the range. 72 00:05:44,640 --> 00:05:49,320 ‫So let's say if I if I asked you to give me all the values from one to nine. 73 00:05:49,620 --> 00:05:50,100 ‫Right. 74 00:05:50,610 --> 00:05:54,750 ‫Especially range, that is almost like a one single eye. 75 00:05:54,760 --> 00:05:59,810 ‫Or you go to the heap, which is the in this case, the index, and it's beautifully organized. 76 00:06:00,000 --> 00:06:07,170 ‫So if you ask for these values, you get one Io, you get all these calls because they are nicely talk 77 00:06:07,170 --> 00:06:07,690 ‫together. 78 00:06:08,310 --> 00:06:19,350 ‫However, in a configuration where you have a secondary index, that means the table is a jumbled mess, 79 00:06:19,740 --> 00:06:21,990 ‫the value of one and then seven and an eighth. 80 00:06:21,990 --> 00:06:22,710 ‫And then seven. 81 00:06:22,950 --> 00:06:24,390 ‫Seven hundred and three. 82 00:06:24,660 --> 00:06:26,040 ‫And then the better value one. 83 00:06:26,490 --> 00:06:28,600 ‫Then they are not organized. 84 00:06:28,920 --> 00:06:29,130 ‫Right. 85 00:06:29,390 --> 00:06:33,300 ‫Again, I'm talking about a primary key that is random in nature here. 86 00:06:33,750 --> 00:06:35,560 ‫Not all primary keys are random. 87 00:06:36,090 --> 00:06:43,590 ‫That's why I get is a bad idea to have as a primary key, especially in as a primary key. 88 00:06:44,040 --> 00:06:44,370 ‫Right. 89 00:06:45,000 --> 00:06:50,340 ‫Because the randomness nature of things, you will start jumping through the heap left and right. 90 00:06:50,520 --> 00:07:00,810 ‫You will not benefit of the memory caching that the database is doing to answer the Suros, which is 91 00:07:01,410 --> 00:07:07,980 ‫which is something I talked about some of the limitations of my school, which is, by the way, a primary 92 00:07:07,980 --> 00:07:10,530 ‫based primary care database. 93 00:07:10,860 --> 00:07:16,740 ‫You have by default, you have to have a primary key in my school and you get one for free, I believe, 94 00:07:16,740 --> 00:07:17,790 ‫which is sequential. 95 00:07:17,790 --> 00:07:23,400 ‫But you have to have one and you have to have that means you have to have your table clustered in that 96 00:07:23,400 --> 00:07:23,750 ‫way. 97 00:07:24,660 --> 00:07:26,730 ‫Get Oracle, give you that option. 98 00:07:27,860 --> 00:07:35,720 ‫To have the index organize table, but you don't have to write SQL Server again, you have the option 99 00:07:35,720 --> 00:07:41,960 ‫to have the index organize table or they call the clustered index, in that case, a primary clustered 100 00:07:41,960 --> 00:07:42,390 ‫index. 101 00:07:43,220 --> 00:07:44,020 ‫You can have that. 102 00:07:44,210 --> 00:07:49,640 ‫So a clustered index that is sequential is not that bad, but. 103 00:07:51,420 --> 00:07:57,780 ‫You can get into other things that I don't want to spend more time in this video, make it we can we 104 00:07:57,780 --> 00:07:59,210 ‫can leave it to the lecture. 105 00:08:00,150 --> 00:08:03,990 ‫So that's that's a primary loss to the second or the second. 106 00:08:04,020 --> 00:08:13,770 ‫The key is having the table as a jumbled mess, but having an additional outside structure. 107 00:08:13,770 --> 00:08:14,820 ‫That is your key. 108 00:08:14,820 --> 00:08:17,120 ‫That is your index as your tree. 109 00:08:17,440 --> 00:08:18,300 ‫It's not in the table. 110 00:08:18,300 --> 00:08:20,010 ‫That table is a jumbled mess. 111 00:08:20,220 --> 00:08:25,380 ‫So you have the value of one seven and then three hundred and then seven and then eight, assuming there 112 00:08:25,380 --> 00:08:26,220 ‫are no order. 113 00:08:27,480 --> 00:08:36,360 ‫And now all the indexes or the index that is pointing to your column has no order in itself in the table, 114 00:08:36,360 --> 00:08:39,140 ‫but it is ordering another structure. 115 00:08:39,150 --> 00:08:43,740 ‫So there is you're maintaining a separate structure for your indexes. 116 00:08:44,490 --> 00:08:47,520 ‫OK, that's called a secondary index. 117 00:08:47,930 --> 00:08:51,120 ‫Now, if you want to search, you have to jump to this index first. 118 00:08:51,120 --> 00:08:58,800 ‫Do the search, collect the tuples or the pages that you want to query, because guess what? 119 00:08:59,640 --> 00:09:02,390 ‫You're not going to find everything in the index. 120 00:09:02,610 --> 00:09:07,020 ‫Yes, you're going to find the things that you're looking for. 121 00:09:07,020 --> 00:09:11,220 ‫But not all the information is located in the index. 122 00:09:11,490 --> 00:09:13,830 ‫You're going to find the row IDs. 123 00:09:13,830 --> 00:09:16,220 ‫You're going to find that rows wrong. 124 00:09:16,230 --> 00:09:18,750 ‫Number A is the value. 125 00:09:18,780 --> 00:09:22,640 ‫Look at a lower number BNZ are the values that you're looking for. 126 00:09:23,670 --> 00:09:29,460 ‫But you still have to go to the HEB, to the table to actually fetch the row itself, especially if 127 00:09:29,460 --> 00:09:31,140 ‫you drink select star on things. 128 00:09:31,680 --> 00:09:33,540 ‫So you have to go that do that jump. 129 00:09:34,200 --> 00:09:39,990 ‫That's the that's the disadvantage of secondary indexes. 130 00:09:40,320 --> 00:09:44,600 ‫Posterous all indexes are postgres are secondary indexes. 131 00:09:45,180 --> 00:09:46,830 ‫You don't have primary indexes. 132 00:09:47,370 --> 00:09:50,790 ‫And in pulseless can you have one. 133 00:09:51,210 --> 00:09:52,220 ‫I'm not sure. 134 00:09:52,320 --> 00:09:55,820 ‫Can you create an index that is a clustered index. 135 00:09:56,430 --> 00:09:57,060 ‫Maybe. 136 00:09:57,060 --> 00:09:57,930 ‫I need to check. 137 00:09:57,930 --> 00:10:04,500 ‫I think there's an option to to call to cluster your table so that they are organized around that index. 138 00:10:04,500 --> 00:10:17,250 ‫I like I prefer the Oracle term better than the index, better than than the clustered index naming 139 00:10:17,250 --> 00:10:18,930 ‫or the primary key for that matter. 140 00:10:19,470 --> 00:10:25,050 ‫Index organize table is is its self descriptive and I love it. 141 00:10:25,050 --> 00:10:25,380 ‫Right. 142 00:10:25,680 --> 00:10:28,500 ‫As opposed to the opposite heap organized table. 143 00:10:29,560 --> 00:10:36,120 ‫A hot heap organize table is when when you have the heap organize the tables is just random. 144 00:10:36,120 --> 00:10:42,600 ‫There is no index that organizes the table, there's no primary queue that fits that. 145 00:10:44,310 --> 00:10:46,620 ‫And yeah, that's what I wanted to talk. 146 00:10:46,620 --> 00:10:48,660 ‫I want to keep this lecture short and sweet. 147 00:10:48,660 --> 00:10:50,730 ‫I hope you enjoyed this lecture. 148 00:10:50,730 --> 00:10:56,620 ‫I see you in the next one and say awesome, enjoy the course.