1 00:00:00,090 --> 00:00:06,270 ‫Aggregating large entries in the database to perform a count, for example, is a lot of work, the 2 00:00:06,270 --> 00:00:12,810 ‫database has to sort through a large number of records, whether it's in an index or on their raw heap 3 00:00:12,810 --> 00:00:13,740 ‫table itself. 4 00:00:14,160 --> 00:00:20,880 ‫Doing this too often can impact the performance of both your database and your application that is discussed. 5 00:00:20,880 --> 00:00:26,630 ‫Why count can be slow and an alternative if you really don't want the actual company. 6 00:00:26,640 --> 00:00:29,370 ‫Yes, you want an estimate that's just jump into it. 7 00:00:29,400 --> 00:00:29,740 ‫Origo. 8 00:00:29,780 --> 00:00:36,780 ‫So there are many ways you can execute a count on your database or avea a great stable or famous student. 9 00:00:36,780 --> 00:00:37,500 ‫Great sabl. 10 00:00:37,740 --> 00:00:44,490 ‫So there is a G which the grade itself does the idea of the students and I think it has like around 11 00:00:44,490 --> 00:00:46,170 ‫60 million rows are created. 12 00:00:46,260 --> 00:00:50,220 ‫So what I'm going to do here is do a select count G. 13 00:00:51,560 --> 00:01:00,860 ‫From grades where I'd say between 3000 and 4000 grades of those people, but I just want to count if 14 00:01:00,860 --> 00:01:06,950 ‫you execute that it's so fast right now, don't pay attention to the speed because I have Cash-in going 15 00:01:06,950 --> 00:01:07,190 ‫on. 16 00:01:07,190 --> 00:01:12,640 ‫I ask you this question many times, but I want to pay attention to that number that come back 1900. 17 00:01:12,860 --> 00:01:20,750 ‫And the reason this number is little bit low from comparing 2000 and 4000 should be around 3000, because 18 00:01:21,140 --> 00:01:25,220 ‫Count G will return entries that are not null. 19 00:01:25,220 --> 00:01:26,450 ‫And just just think about it. 20 00:01:26,600 --> 00:01:33,950 ‫I have an index on the ID field that means the database have used that index to pull the rows in order 21 00:01:33,950 --> 00:01:34,610 ‫to count them. 22 00:01:34,610 --> 00:01:41,270 ‫So we're on the index, but we asked the database to do a count on, gee, that means we'd have to go 23 00:01:41,270 --> 00:01:44,120 ‫to the table to check those. 24 00:01:44,990 --> 00:01:49,040 ‫The value of gee, whether it's null or not, does do and explain. 25 00:01:49,040 --> 00:01:52,370 ‫It was just add and explain and analyze before these puppies. 26 00:01:52,940 --> 00:01:54,110 ‫Let's see what happened here. 27 00:01:54,560 --> 00:01:55,370 ‫I do that. 28 00:01:55,790 --> 00:01:57,290 ‫Let's pay attention to what happened here. 29 00:01:57,320 --> 00:02:04,520 ‫We're doing an index, a scan of this, and we have a return of three thousand and one rose because 30 00:02:04,520 --> 00:02:04,970 ‫guess what? 31 00:02:05,330 --> 00:02:08,990 ‫We're in the index and the actual index entries. 32 00:02:08,990 --> 00:02:11,480 ‫I don't have any deleted rolls, by the way, or so. 33 00:02:11,480 --> 00:02:13,100 ‫Three thousand and one is about. 34 00:02:14,090 --> 00:02:21,620 ‫What that that final count has been reduced because we went back to the table to get the value, because 35 00:02:21,620 --> 00:02:22,580 ‫that's where we ask that. 36 00:02:22,580 --> 00:02:24,040 ‫We ask the value. 37 00:02:24,530 --> 00:02:30,440 ‫And when you when you do count here in any fields, the database will filter through the fields that 38 00:02:30,440 --> 00:02:32,150 ‫only count the not null values. 39 00:02:32,180 --> 00:02:33,470 ‫And I have few nulls there. 40 00:02:33,560 --> 00:02:40,970 ‫OK, so now it uses an index to scan and that tells me that a isco index. 41 00:02:40,970 --> 00:02:42,200 ‫But I have to go to the table. 42 00:02:42,200 --> 00:02:46,310 ‫It's not an index only scan the spice things up and see. 43 00:02:47,620 --> 00:02:53,830 ‫Do the same thing here, but I'm going to do a select star this time, and a lot of people have the 44 00:02:53,830 --> 00:03:02,050 ‫misconception that Count Star actually goes to the table and Fitch all the fields and count the fields. 45 00:03:02,050 --> 00:03:05,450 ‫No, almost no database do this anymore. 46 00:03:05,460 --> 00:03:05,730 ‫Right. 47 00:03:05,890 --> 00:03:10,450 ‫Count star essentially means just count whatever entries you have. 48 00:03:10,450 --> 00:03:10,760 ‫Right. 49 00:03:11,140 --> 00:03:13,390 ‫This will include null. 50 00:03:13,390 --> 00:03:16,630 ‫That is if you're scouting an index, given the values. 51 00:03:16,630 --> 00:03:16,980 ‫Right. 52 00:03:16,990 --> 00:03:19,120 ‫And you can see that we got a higher number. 53 00:03:19,420 --> 00:03:25,230 ‫Let's take a look at that plan that did that that Posterous used to do that stuff. 54 00:03:25,240 --> 00:03:26,140 ‫What did you use? 55 00:03:26,140 --> 00:03:26,740 ‫Postgres. 56 00:03:27,160 --> 00:03:27,670 ‫Alimi. 57 00:03:29,260 --> 00:03:36,940 ‫A few we look at the plan now, look at this, it's an index only scan and always index only scan always 58 00:03:36,940 --> 00:03:43,870 ‫trumps and always it's better than the actual index scan because I don't really need to go back to the 59 00:03:43,870 --> 00:03:44,710 ‫table again. 60 00:03:44,900 --> 00:03:49,000 ‫Don't pay attention to these numbers because I have caches all over the place. 61 00:03:49,510 --> 00:03:52,780 ‫I just wanted to the most important thing is to understand that plan. 62 00:03:52,900 --> 00:03:58,900 ‫Running these numbers don't mean anything right now because first of all, I'm in a container. 63 00:03:59,350 --> 00:04:01,180 ‫I have a large amount of money. 64 00:04:01,180 --> 00:04:05,700 ‫So the database will start cashing these pages if I execute them over and over again. 65 00:04:06,340 --> 00:04:09,460 ‫But just understand that plan is the most important thing. 66 00:04:09,580 --> 00:04:15,430 ‫And you, as a as a result, the larger the number of rows come back, the more work the database is 67 00:04:15,430 --> 00:04:15,580 ‫doing. 68 00:04:15,580 --> 00:04:18,240 ‫So 118 index won't be scanned. 69 00:04:18,460 --> 00:04:18,850 ‫All right. 70 00:04:18,860 --> 00:04:20,470 ‫I say, what are you trying to do here? 71 00:04:20,660 --> 00:04:21,340 ‫Here's what I'm going to do. 72 00:04:22,180 --> 00:04:28,570 ‫I'm going to do an update now, I'm going to do an update grades, ciggy equal. 73 00:04:29,640 --> 00:04:39,630 ‫20, where I.D. as between 2000 and 4000, so those roads between 3000 and 4000, I'm going to slam 74 00:04:39,630 --> 00:04:41,070 ‫all of them and update them. 75 00:04:42,420 --> 00:04:47,320 ‫And change this, let's see how the baskets are all freaked out now, what will happen? 76 00:04:47,580 --> 00:04:53,370 ‫I'm going to ask this can't start and then let's see what will happen all of a sudden. 77 00:04:53,370 --> 00:04:59,460 ‫Guys, look, this number jumped again, not by much, but it is significant. 78 00:04:59,460 --> 00:05:02,460 ‫The more rose, the more actual real data. 79 00:05:02,460 --> 00:05:03,090 ‫You have this. 80 00:05:03,840 --> 00:05:04,490 ‫Got that. 81 00:05:04,500 --> 00:05:05,460 ‫But but look at that. 82 00:05:06,260 --> 00:05:11,260 ‫It still says index only scan, but look at this fitchett. 83 00:05:11,690 --> 00:05:13,250 ‫I always to pay attention to this. 84 00:05:13,490 --> 00:05:17,720 ‫The moment you start seeing he pushes that mean index game. 85 00:05:17,750 --> 00:05:24,530 ‫Yeah, we only scan the index, but we had to go back to the table six thousand and two times. 86 00:05:25,040 --> 00:05:25,350 ‫Right. 87 00:05:26,920 --> 00:05:28,240 ‫Or these amount of values. 88 00:05:29,700 --> 00:05:35,400 ‫I'm not sure this these are the blocks of actual roads I have to have to go back and check, but we 89 00:05:35,400 --> 00:05:39,000 ‫have to go back this amount of time, which is expensive. 90 00:05:39,480 --> 00:05:39,950 ‫Why? 91 00:05:40,260 --> 00:05:43,470 ‫Because we have updated the values. 92 00:05:43,740 --> 00:05:48,210 ‫The visibility map told the index scanner that, hey, by the way. 93 00:05:48,330 --> 00:05:49,890 ‫Yeah, you're scanning the index. 94 00:05:49,890 --> 00:05:55,650 ‫And I have I'm going to give you only values in the index, which is usually fast, again, if you're 95 00:05:55,650 --> 00:05:56,820 ‫not scanning the whole index. 96 00:05:57,570 --> 00:06:04,140 ‫But these rules that I'm scanning in the index might have been updated, might have been deleted. 97 00:06:05,330 --> 00:06:13,950 ‫So I have to go back to the heap where the actual visibility of the rule exists to check if the rule 98 00:06:13,950 --> 00:06:14,850 ‫is actually deleted. 99 00:06:15,390 --> 00:06:21,500 ‫Because when you delete something in both cases, the index is not immediately marked as deleted. 100 00:06:21,500 --> 00:06:25,020 ‫It just adds that adds a new record and just. 101 00:06:26,610 --> 00:06:34,740 ‫And keeps the old records for four NBCC reasons, so multiple currency controls so other transactions 102 00:06:34,740 --> 00:06:36,390 ‫can see those old temples. 103 00:06:36,420 --> 00:06:38,430 ‫All right, so how do we feel? 104 00:06:38,670 --> 00:06:39,930 ‫How do you solve this problem? 105 00:06:39,960 --> 00:06:40,710 ‫Very simple. 106 00:06:40,710 --> 00:06:42,840 ‫You just vacuum the table. 107 00:06:43,110 --> 00:06:48,240 ‫Vacuuming the table will update the visibility map, saying that, by the way, those Staib, those 108 00:06:48,240 --> 00:06:51,030 ‫old rules that you just updated, nobody's reading them. 109 00:06:51,150 --> 00:06:55,860 ‫No, there are no running transactions that we're reading them in our production system. 110 00:06:55,870 --> 00:06:57,990 ‫There might be, but no now. 111 00:06:58,200 --> 00:07:00,880 ‫So if I do now the same Equidae again. 112 00:07:00,900 --> 00:07:06,160 ‫So if we do it again, you can see that we got he features zero. 113 00:07:06,390 --> 00:07:06,750 ‫All right. 114 00:07:06,870 --> 00:07:13,950 ‫And guys, every time I increase that number three thousand four, you can see that this is going to 115 00:07:13,950 --> 00:07:19,650 ‫get slower and slower and slower just to just to show you that, for example, now it have one and a 116 00:07:19,650 --> 00:07:20,780 ‫half second to execute. 117 00:07:21,030 --> 00:07:25,320 ‫And then if I put that number a little bit because they oh, you can feel it. 118 00:07:25,500 --> 00:07:27,660 ‫So count is not cheap, right? 119 00:07:27,810 --> 00:07:30,810 ‫It was cheap for the three rolls that I'm going to present. 120 00:07:30,810 --> 00:07:37,380 ‫But every time you do that, the plan now change says, OK, we're still going to do an index, but 121 00:07:37,440 --> 00:07:42,280 ‫I'm going to use threading the data decided to do multiple threads, multiple workers, administrative 122 00:07:42,280 --> 00:07:47,670 ‫to multiple workers to scan that index so I can give you the results. 123 00:07:47,670 --> 00:07:47,890 ‫Right. 124 00:07:48,030 --> 00:07:48,950 ‫So we're good. 125 00:07:48,990 --> 00:07:53,820 ‫We didn't do any searches, but it took six seconds to return this many rows. 126 00:07:53,820 --> 00:07:54,100 ‫Right. 127 00:07:54,480 --> 00:08:00,500 ‫So every time you increase that number goes larger, the operation is going to go slower. 128 00:08:00,510 --> 00:08:03,120 ‫It's just it's a proportionate was just so. 129 00:08:03,840 --> 00:08:04,110 ‫So. 130 00:08:04,110 --> 00:08:04,430 ‫All right. 131 00:08:04,710 --> 00:08:08,370 ‫So what if I say I don't care about the actual roll count. 132 00:08:08,380 --> 00:08:08,700 ‫Right. 133 00:08:08,850 --> 00:08:13,710 ‫And really care about giving the actual exact value. 134 00:08:13,890 --> 00:08:14,240 ‫Right. 135 00:08:15,350 --> 00:08:16,880 ‫But here's what I want to do. 136 00:08:17,240 --> 00:08:21,170 ‫I want you just to give me an estimate, if you do just then explain. 137 00:08:21,830 --> 00:08:22,210 ‫Right. 138 00:08:22,370 --> 00:08:23,540 ‫And let's just format this. 139 00:08:23,540 --> 00:08:25,090 ‫So it's Jasni, right? 140 00:08:25,490 --> 00:08:26,420 ‫You do that. 141 00:08:27,620 --> 00:08:33,680 ‫That without it and analyze again, without and analyze, analyze, actually execute the query, explain 142 00:08:33,680 --> 00:08:39,290 ‫will not execute that way, but it will estimated it will estimate that, hey, I'm going to use the 143 00:08:39,290 --> 00:08:39,830 ‫index. 144 00:08:40,160 --> 00:08:49,610 ‫I am planning that I might get this much road back two thousand eight hundred sixty eight. 145 00:08:49,880 --> 00:08:50,150 ‫Right. 146 00:08:50,360 --> 00:08:52,310 ‫So compared to. 147 00:08:53,470 --> 00:08:57,190 ‫The select count, so it's not an accurate number. 148 00:08:58,290 --> 00:09:05,520 ‫It's it's 200 values up and a lot of people use this, and I actually got this trick from a blog that 149 00:09:05,520 --> 00:09:06,450 ‫I'm going to reference below. 150 00:09:06,540 --> 00:09:12,450 ‫Let's remember that the table itself has some statistics to to update itself. 151 00:09:12,450 --> 00:09:12,580 ‫Right. 152 00:09:12,610 --> 00:09:19,320 ‫So that is actually without actually looking at the table, it knows roughly how many rows or in table, 153 00:09:19,470 --> 00:09:24,540 ‫roughly how much raw data will come back of this or these actual correct numbers. 154 00:09:24,600 --> 00:09:25,500 ‫Absolutely not. 155 00:09:25,680 --> 00:09:30,510 ‫But if you're building Instagram or building something that law account accountant likes or something 156 00:09:30,510 --> 00:09:33,300 ‫like that, this is way better. 157 00:09:33,300 --> 00:09:33,590 ‫Right. 158 00:09:33,840 --> 00:09:36,230 ‫So I can quickly estimate this stuff. 159 00:09:36,240 --> 00:09:43,980 ‫And as you update your table, obviously a good idea to do analyze on your grades, Abel, or your table 160 00:09:44,220 --> 00:09:47,420 ‫that will update the statistics to the correct numbers. 161 00:09:47,430 --> 00:09:49,470 ‫And obviously, this operation is going to take a long time. 162 00:09:49,650 --> 00:09:51,280 ‫All right, guys, that's it for me today. 163 00:09:51,300 --> 00:09:55,980 ‫That was count and how count is as exponentially a lot of work for the database. 164 00:09:55,980 --> 00:10:02,790 ‫And if you really don't need an actual count, correct number, especially if it's in the millions, 165 00:10:02,790 --> 00:10:10,290 ‫why would you show the user 60 million and 320 exact number? 166 00:10:10,290 --> 00:10:10,490 ‫Right. 167 00:10:10,500 --> 00:10:13,500 ‫Almost no one does that, Alysse. 168 00:10:14,430 --> 00:10:14,650 ‫Yeah. 169 00:10:14,680 --> 00:10:15,930 ‫So it's an estimate. 170 00:10:15,930 --> 00:10:17,130 ‫Always matter. 171 00:10:17,280 --> 00:10:20,490 ‫If you're working with a few arrows, you can do an actual count. 172 00:10:20,640 --> 00:10:27,420 ‫But if you expect the table to grow avoided using an Excel account, does that do an estimate with that 173 00:10:27,420 --> 00:10:30,110 ‫planner like that if that works for you? 174 00:10:30,220 --> 00:10:33,060 ‫Hope that that's way more than enough. 175 00:10:33,270 --> 00:10:33,400 ‫Right. 176 00:10:33,420 --> 00:10:35,330 ‫Again, guys, I'm going to see you on the next one. 177 00:10:35,550 --> 00:10:36,310 ‫You guys say awesome. 178 00:10:36,430 --> 00:10:36,770 ‫Goodbye.