1 00:00:00,090 --> 00:00:06,420 ‫In this video, I'll explain why you should avoid using sexual offset when implementing any kind of 2 00:00:06,420 --> 00:00:07,090 ‫paging. 3 00:00:07,320 --> 00:00:14,140 ‫I'll explain what offset does why is it slow and what is the alternative for better performance? 4 00:00:14,580 --> 00:00:22,500 ‫This video inspired by use the index look and I'll have a link to his blog and slides to learn more. 5 00:00:22,530 --> 00:00:26,430 ‫And if you're not subscribe to do use the index. 6 00:00:26,430 --> 00:00:28,570 ‫Look, log what are doing. 7 00:00:28,620 --> 00:00:34,390 ‫This is an extremely mandatory log to read for all database engineers. 8 00:00:34,410 --> 00:00:35,290 ‫So check it out. 9 00:00:35,820 --> 00:00:40,470 ‫So let's say you have a Web application with an API that supports paging your user. 10 00:00:40,680 --> 00:00:44,960 ‫Want to request 10 news article in page ten. 11 00:00:45,270 --> 00:00:49,800 ‫This is performed via a simple request as shown here. 12 00:00:51,010 --> 00:00:56,710 ‫The API server receives the guest request and builds a sequel in order to send it to the database, 13 00:00:56,740 --> 00:01:02,110 ‫hopefully there should be a pool of is already established connection to your databases so you don't 14 00:01:02,110 --> 00:01:04,380 ‫have to hit that cold start problem here. 15 00:01:05,840 --> 00:01:13,520 ‫Page 10 translates to offset hundred, assuming each page has ten records and now the database is ready 16 00:01:13,520 --> 00:01:17,390 ‫to execute the query against this beautiful table. 17 00:01:18,740 --> 00:01:24,260 ‫Offset by design means fetch and drop the first X number of rows. 18 00:01:24,290 --> 00:01:31,760 ‫So in this case, the database will fetch the first hundred and ten rows and physically drop the first 19 00:01:31,760 --> 00:01:32,870 ‫hundred, because guess what? 20 00:01:32,870 --> 00:01:34,460 ‫You said it's offset, right? 21 00:01:34,730 --> 00:01:42,650 ‫And then leaving the limit of 10 rows, which the user will be actually seeing as the offset increase. 22 00:01:42,650 --> 00:01:48,290 ‫The database is doing more work, which makes the operation extremely, extremely expensive. 23 00:01:49,520 --> 00:01:54,050 ‫Another problem with the offset is you might accidentally read duplicate records. 24 00:01:54,380 --> 00:02:00,610 ‫Consider this the user has now requested page 11, which is now offset one hundred and ten. 25 00:02:00,620 --> 00:02:05,390 ‫And meanwhile, someone inserted a new record in your table. 26 00:02:05,440 --> 00:02:14,120 ‫So now offsetting one hundred and ten rows will accidentally push your existing rule that you have read 27 00:02:14,120 --> 00:02:17,780 ‫before to your ResultSet accidentally. 28 00:02:17,810 --> 00:02:24,050 ‫Now, all of a sudden, you just read a row that you have read before, which is row 11 in this case, 29 00:02:24,140 --> 00:02:28,970 ‫does jump to the practical side and test this thing on Postgres guys. 30 00:02:29,120 --> 00:02:37,580 ‫All right, guys, I have a postscripts database and I have a table called News Desk and there is an 31 00:02:37,580 --> 00:02:47,780 ‫index on the ID and I'm going to use the ID here as sense of it as an auto incrementing ID for my news 32 00:02:47,780 --> 00:02:48,590 ‫articles here. 33 00:02:48,740 --> 00:02:51,090 ‫And don't pay attention to these three fields. 34 00:02:51,380 --> 00:02:53,210 ‫This is the title of the article. 35 00:02:53,240 --> 00:02:58,940 ‫Essentially what I want to do here is I'm going to use Offset to start paging this and I'm going I want 36 00:02:58,940 --> 00:03:01,520 ‫to see how the database perform as a result. 37 00:03:01,730 --> 00:03:02,030 ‫All right. 38 00:03:02,030 --> 00:03:12,780 ‫So let's start with explain, analyze, select title from news ordered by ID descending because I want 39 00:03:12,780 --> 00:03:18,980 ‫the fresh news first and then I'm going to start with offset zero and then I'm going to limit the result 40 00:03:18,980 --> 00:03:19,460 ‫was ten. 41 00:03:19,640 --> 00:03:22,510 ‫And then what I do that instantly I get the results. 42 00:03:22,640 --> 00:03:24,190 ‫Obviously we have two results. 43 00:03:24,200 --> 00:03:25,370 ‫Explain what happens here. 44 00:03:25,580 --> 00:03:34,130 ‫We are using index scan backwards because I have to order by ID descending and index is already ordered 45 00:03:34,130 --> 00:03:34,840 ‫by default. 46 00:03:34,880 --> 00:03:39,140 ‫So the index was scanned back or from bottom up I guess. 47 00:03:39,320 --> 00:03:42,350 ‫And then there is this much in the index. 48 00:03:42,350 --> 00:03:44,060 ‫But this is not really what we were. 49 00:03:44,060 --> 00:03:46,190 ‫We actually pulled is ten rows. 50 00:03:46,430 --> 00:03:46,760 ‫Why. 51 00:03:46,760 --> 00:03:49,910 ‫Because of the zero plus ten rows. 52 00:03:49,910 --> 00:03:54,980 ‫That is exactly ten and applying a limit on top of that give you ten. 53 00:03:55,100 --> 00:03:57,500 ‫So that's not so bad to point to millisecond. 54 00:03:57,710 --> 00:03:59,320 ‫But let's just buy things a little bit. 55 00:03:59,690 --> 00:04:07,490 ‫Let's start with offset a thousand and all of a sudden now look at the rows here guys we have pulled. 56 00:04:08,400 --> 00:04:16,830 ‫From the index, thousand and ten thousand plus then and that on top of that, we apply the limit and 57 00:04:16,830 --> 00:04:18,570 ‫then we only pull ten. 58 00:04:19,050 --> 00:04:21,950 ‫You're seeing the pattern so called millisecond, right? 59 00:04:22,050 --> 00:04:22,910 ‫That's not so bad. 60 00:04:23,370 --> 00:04:24,390 ‫Let's make it. 61 00:04:25,780 --> 00:04:32,410 ‫A hundred thousand who you started to feel like seventy nine millisecond and the guys I run this query 62 00:04:32,410 --> 00:04:34,990 ‫so many times, so there is a lot of page caches. 63 00:04:35,690 --> 00:04:39,310 ‫So this might be slower for first time, but you get the idea. 64 00:04:39,640 --> 00:04:40,690 ‫So what did happen? 65 00:04:40,690 --> 00:04:41,350 ‫What happened here? 66 00:04:42,160 --> 00:04:43,900 ‫One hundred thousand and ten rows. 67 00:04:44,050 --> 00:04:48,670 ‫And then only from those stinking hundred thousand rows, we pulled ten. 68 00:04:49,510 --> 00:04:52,780 ‫Can you imagine the work that that service is doing? 69 00:04:53,200 --> 00:04:55,300 ‫Guys, think about this this much. 70 00:04:55,300 --> 00:05:02,860 ‫Rose, if if this was SQL Server, SQL Server will immediately escalate the lot, because that's all 71 00:05:03,490 --> 00:05:06,130 ‫we're touching this montreaux but was going to die. 72 00:05:06,280 --> 00:05:09,190 ‫But Postgres, thank God it doesn't lock escalation. 73 00:05:09,670 --> 00:05:14,470 ‫So, look, it's going to go and we're working with ten to one hundred thousand ten rows. 74 00:05:14,740 --> 00:05:17,140 ‫We only actually need ten. 75 00:05:17,680 --> 00:05:22,840 ‫And that the the unfortunate thing is the database doesn't know doesn't know to do this because it's 76 00:05:22,840 --> 00:05:23,890 ‫too late by then. 77 00:05:26,490 --> 00:05:32,410 ‫So you get the idea, do a million and you can start feeling it, it's going to be slower and slower. 78 00:05:32,430 --> 00:05:32,640 ‫Right. 79 00:05:32,700 --> 00:05:35,720 ‫And if I thought that this is going to go so much slower. 80 00:05:35,730 --> 00:05:37,350 ‫Six hundred twenty. 81 00:05:37,710 --> 00:05:39,930 ‫And the first time I took like six seconds. 82 00:05:40,440 --> 00:05:42,870 ‫But you get the idea right now. 83 00:05:43,140 --> 00:05:44,900 ‫What can I do to fix this, guys? 84 00:05:44,940 --> 00:05:47,700 ‫So what I'm going to do, I was going to do a little bit. 85 00:05:47,950 --> 00:05:51,080 ‫Nice trick to actually accomplish that. 86 00:05:51,450 --> 00:05:55,650 ‫I'm going to remove that, explain, analyze and I'm going to remove the offset. 87 00:05:56,550 --> 00:06:02,910 ‫And I'm going to simulate an offset by using the ID itself, and then since the I.T. field has a beautiful 88 00:06:02,910 --> 00:06:05,400 ‫index, I'm going to use that as a page. 89 00:06:05,640 --> 00:06:06,270 ‫Look at this. 90 00:06:06,720 --> 00:06:09,800 ‫So it's not really rocket science, what I'm doing here. 91 00:06:09,810 --> 00:06:15,990 ‫I'm going to select the title the and let's say I'm going to pull the first in road and you return this 92 00:06:15,990 --> 00:06:16,560 ‫to the user. 93 00:06:16,560 --> 00:06:16,950 ‫Right. 94 00:06:16,950 --> 00:06:18,660 ‫And the user is responsible. 95 00:06:18,660 --> 00:06:21,940 ‫Now to send you one was the last ID. 96 00:06:21,990 --> 00:06:24,470 ‫It's so it's overseas or it. 97 00:06:24,490 --> 00:06:25,650 ‫So it's best. 98 00:06:25,890 --> 00:06:26,120 ‫Yes. 99 00:06:26,400 --> 00:06:30,720 ‫So now what do you do is basically execute the same query but says hey. 100 00:06:31,680 --> 00:06:34,680 ‫Don't bother returning anything. 101 00:06:35,590 --> 00:06:43,050 ‫After this idea, this is the last idea I saw, so I want older pages because you're going back, right? 102 00:06:43,060 --> 00:06:45,610 ‫And it depends like how how you go forward or back. 103 00:06:46,120 --> 00:06:48,250 ‫You can play with this in your algorithm. 104 00:06:48,490 --> 00:06:52,120 ‫But if I do this now, I got only these roles. 105 00:06:52,970 --> 00:06:54,360 ‫And this is much, much faster. 106 00:06:54,650 --> 00:06:58,020 ‫Let's take a look and explain what is what are we doing? 107 00:06:58,100 --> 00:06:59,000 ‫Explain, analyze. 108 00:06:59,950 --> 00:07:06,040 ‫And if I do that, look at the number of rows that the actual database works with only 10. 109 00:07:06,050 --> 00:07:06,550 ‫Why? 110 00:07:06,830 --> 00:07:12,140 ‫Because if you do just limit 10, that's the best thing you can do. 111 00:07:12,170 --> 00:07:12,880 ‫This is good. 112 00:07:13,520 --> 00:07:21,680 ‫You don't you don't use offset with it using limit with two parameter like at 10 comma 10 or offset. 113 00:07:22,220 --> 00:07:27,820 ‫That kills the performance because now you have to jump and then read limit 10 tells the database, 114 00:07:27,830 --> 00:07:31,460 ‫hey, I'm looking only on the first ten rows that you can pull. 115 00:07:31,490 --> 00:07:37,490 ‫There's another component called fetch the first two rows only whatever the first 10 rows. 116 00:07:37,490 --> 00:07:40,460 ‫That's another thing you can use, which is exactly the same as limit. 117 00:07:40,770 --> 00:07:41,170 ‫All right. 118 00:07:41,420 --> 00:07:47,150 ‫So if you do that, then you get back 10 rows, the index will be scanned backward. 119 00:07:47,160 --> 00:07:56,630 ‫But whereas again, we only 10 plus we have used an index to actually filter that even down even better. 120 00:07:56,640 --> 00:07:56,910 ‫Right. 121 00:07:56,930 --> 00:07:59,460 ‫We just essentially filter down the results. 122 00:07:59,820 --> 00:07:59,960 ‫Right. 123 00:08:00,110 --> 00:08:05,750 ‫We're working with a little bit of a more nice result, which is awesome. 124 00:08:06,710 --> 00:08:10,020 ‫And now and the limit applied after the limits only took ten. 125 00:08:10,250 --> 00:08:16,670 ‫Now, if I do this, if I if I imagine go wins back all the way to page, I don't know, thousand. 126 00:08:16,670 --> 00:08:17,090 ‫Right. 127 00:08:17,420 --> 00:08:19,330 ‫Page thousand, which is very, very old. 128 00:08:19,730 --> 00:08:21,350 ‫That's what an instant. 129 00:08:21,350 --> 00:08:21,730 ‫Right. 130 00:08:22,400 --> 00:08:27,440 ‫We have worked now the database worked with 89 rolls and instead of whatever. 131 00:08:27,440 --> 00:08:27,770 ‫Right. 132 00:08:28,040 --> 00:08:33,130 ‫But still the actual results said from the end from the index came out as ten. 133 00:08:33,170 --> 00:08:39,170 ‫Now we jump back to the heap, we pull the title field and then we limit and it becomes a 10. 134 00:08:39,440 --> 00:08:43,460 ‫So then the amount of IO is much, much less. 135 00:08:43,700 --> 00:08:46,700 ‫The amount of work that databases do is way less. 136 00:08:46,970 --> 00:08:51,340 ‫And you saved a lot of CPU cycle and the database will. 137 00:08:51,350 --> 00:08:51,800 ‫Thank you. 138 00:08:52,300 --> 00:08:54,460 ‫I hope you guys enjoy this with your guys. 139 00:08:54,470 --> 00:08:58,540 ‫What are the other approaches that you personally do when paging? 140 00:08:58,580 --> 00:09:04,970 ‫I'd love to see them in the comments section below so we all can learn and advance and become better 141 00:09:04,970 --> 00:09:05,930 ‫software engineers. 142 00:09:06,350 --> 00:09:07,550 ‫I'm going to see you on the next one. 143 00:09:08,440 --> 00:09:09,660 ‫You guys stay out of the.