1
00:00:00,120 --> 00:00:00,930
‫Welcome, everybody.

2
00:00:00,930 --> 00:00:11,580
‫This is the Q&A for April 2022 for the the ABC engineering course.

3
00:00:12,690 --> 00:00:18,360
‫We've been doing this for a few months now and it's been very successful and very useful for people.

4
00:00:18,960 --> 00:00:24,000
‫And I believe it's always good to be like Samir Sumaidaie, interactive like this.

5
00:00:25,980 --> 00:00:34,080
‫So we receive a good amount of questions on this course because it became very popular and it's only

6
00:00:34,380 --> 00:00:40,740
‫fair to pick one or two questions every month to answer them in depth.

7
00:00:41,040 --> 00:00:43,140
‫So I answer almost every question.

8
00:00:43,140 --> 00:00:52,460
‫I to my to my extent and to my time allocated, you know, but some some answers you can just write

9
00:00:52,470 --> 00:01:00,120
‫just he needs time to think about it and write about and some some of them, like, deserve, you know,

10
00:01:00,600 --> 00:01:03,930
‫time like this and to to explain it better.

11
00:01:04,800 --> 00:01:05,700
‫And this is one of them.

12
00:01:05,760 --> 00:01:07,710
‫So this comes from Dominique.

13
00:01:07,710 --> 00:01:18,600
‫And Dominique is asking about the implications of none key column indexes in real world database systems.

14
00:01:18,990 --> 00:01:23,790
‫Would it be best practice to create and drop new indexes with non key columns for short term purposes?

15
00:01:25,090 --> 00:01:31,440
‫Or would this be a decision I make once the setting up the table when I launch a new application utilizing

16
00:01:31,440 --> 00:01:31,920
‫the table?

17
00:01:32,580 --> 00:01:38,330
‫And the second question is, is it really advisable to drop indexes of tables which might have various

18
00:01:38,340 --> 00:01:41,670
‫different use cases to different users?

19
00:01:43,080 --> 00:01:44,470
‫So here's, here's what I want to say.

20
00:01:45,340 --> 00:01:52,830
‫The the question, while we'll kind of relate to non key indexes, it's kind of it's a it covers all

21
00:01:52,830 --> 00:01:53,760
‫type of indexes.

22
00:01:54,570 --> 00:01:59,940
‫Is it do you need some time to create indexes just for specific applications?

23
00:02:01,800 --> 00:02:12,690
‫Sure, you can you can create an index if you seen an app follows a path, a query path that is not

24
00:02:12,690 --> 00:02:14,010
‫covered by an index.

25
00:02:14,730 --> 00:02:16,470
‫You might want to create an index to do that.

26
00:02:18,620 --> 00:02:23,630
‫The question is how often this AB is executed because the moment you get an index, you've got to go

27
00:02:23,630 --> 00:02:27,740
‫with the maintenance of the index while the maintenance is not really that bad.

28
00:02:28,130 --> 00:02:30,470
‫If that, if you have like a few million rows.

29
00:02:30,710 --> 00:02:38,660
‫But if you go to two multimillion rows and more than them, the index maintenance becomes really a problematic.

30
00:02:38,870 --> 00:02:43,730
‫What I want to mention here is the real world implications of non key indexes.

31
00:02:45,250 --> 00:02:49,010
‫So nike.com indexes are usually created when you.

32
00:02:49,990 --> 00:02:55,300
‫You have an index that is used in equity path and that query path.

33
00:02:56,370 --> 00:03:02,010
‫Will efficiently use the index to get to collection of results.

34
00:03:02,580 --> 00:03:02,880
‫No.

35
00:03:03,510 --> 00:03:06,810
‫But then the query asked for.

36
00:03:08,020 --> 00:03:11,920
‫Certain columns that are obviously not indexed most of the time.

37
00:03:12,250 --> 00:03:12,570
‫And.

38
00:03:14,850 --> 00:03:20,130
‫If this collection that is returned is significantly large.

39
00:03:21,310 --> 00:03:29,680
‫Then with the database has to do is take the squeeze that have been resulted because we found that the

40
00:03:29,680 --> 00:03:31,090
‫results are from the index.

41
00:03:31,750 --> 00:03:41,980
‫Now we have to go to the heap or to the clustered table and query to pull the full pages.

42
00:03:42,460 --> 00:03:42,750
‫No.

43
00:03:43,810 --> 00:03:47,230
‫And in a heap based.

44
00:03:49,060 --> 00:03:54,850
‫Databases such as Postgres, where by default there is no clustering.

45
00:03:55,800 --> 00:03:57,980
‫You know what the index will do?

46
00:03:57,990 --> 00:04:06,000
‫It will say, okay, here are the pages that I need, because it knows that this this index, this topologies

47
00:04:06,000 --> 00:04:07,140
‫belong to this page.

48
00:04:07,920 --> 00:04:14,580
‫So it needs to fetch all the pages in the heap and then pull these pages and then.

49
00:04:15,880 --> 00:04:16,960
‫From these pages.

50
00:04:18,050 --> 00:04:24,530
‫It will take the columns it needs effectively write.

51
00:04:25,490 --> 00:04:30,670
‫In addition to the things that is selected and that is slow, right?

52
00:04:30,840 --> 00:04:35,780
‫Because because you didn't you just use the index as a path to get to the rules.

53
00:04:35,780 --> 00:04:42,140
‫And if these things are random and believe me, they will, because what would you find is a bunch of

54
00:04:42,140 --> 00:04:42,770
‫tuples.

55
00:04:43,610 --> 00:04:45,410
‫And these table are all over the place.

56
00:04:45,650 --> 00:04:48,770
‫They're not as nicely ordered as in the index.

57
00:04:49,190 --> 00:04:50,420
‫They'll be all over the place.

58
00:04:50,990 --> 00:04:53,300
‫So you'll be calling all over the place.

59
00:04:53,810 --> 00:04:58,370
‫You'll be having a lot of random reads to the heap.

60
00:04:59,660 --> 00:05:10,070
‫And that is very expensive, and especially expensive in a clustered table where you're going to do

61
00:05:10,070 --> 00:05:19,460
‫ab3 scans right to two because the on the primary key in order to fetch these pages or the results.

62
00:05:20,090 --> 00:05:26,390
‫So it is it is quite expensive to do that and especially if it's one or two results, not big deal.

63
00:05:26,390 --> 00:05:32,000
‫But if you're furnishing hundreds or thousands of results, then you are doing a see this random reads

64
00:05:32,510 --> 00:05:38,960
‫on the table which starts as the random IO, which obviously hits the disk and slows things down.

65
00:05:41,660 --> 00:05:45,920
‫You probably get a miss the cache or escalation other caches in the process.

66
00:05:47,000 --> 00:05:56,660
‫So what do you do is take the columns that you would have selected and recreate the index with those

67
00:05:56,660 --> 00:05:58,960
‫columns as non key.

68
00:06:00,350 --> 00:06:09,590
‫This way when you do your nice scan of the index right and you found the candidate rows, you don't

69
00:06:10,160 --> 00:06:12,470
‫you just didn't just find the candidate rows.

70
00:06:12,470 --> 00:06:21,230
‫You found all the columns you need in line in the index itself, which is very, very powerful.

71
00:06:21,680 --> 00:06:23,450
‫Now you're code is done.

72
00:06:23,870 --> 00:06:25,790
‫It's an index only scan.

73
00:06:26,210 --> 00:06:32,330
‫You don't have to go back to the heap to do do those random zeros.

74
00:06:32,810 --> 00:06:33,050
‫Right.

75
00:06:33,050 --> 00:06:41,450
‫Or the primary key clustered index to do these random files and your performance will be tripled if

76
00:06:41,450 --> 00:06:45,560
‫not even more high, depending on the size of the results they're using.

77
00:06:45,800 --> 00:06:48,980
‫And then you might say, I don't really return 10,000 results.

78
00:06:49,370 --> 00:06:49,940
‫You don't?

79
00:06:50,120 --> 00:06:57,590
‫But don't you have some queries that has select in Select and the inner selector returns a lot of rows.

80
00:06:57,590 --> 00:07:00,110
‫But the other select just picks few one of them.

81
00:07:00,830 --> 00:07:06,530
‫Right and then returns those some of these inner select queries can with a lot of rows.

82
00:07:06,530 --> 00:07:16,760
‫So a few if you can do this efficiently, you will have minimize the is and so you things that would

83
00:07:16,760 --> 00:07:22,250
‫have exacerbate the problem will be much much isolated and minimized.

84
00:07:22,910 --> 00:07:28,850
‫So yeah and and key is definitely find it to close this and key indexes.

85
00:07:29,910 --> 00:07:33,090
‫Said he was in production all the time that we have been.

86
00:07:33,090 --> 00:07:36,480
‫I've been in this for 17 years and I always see none.

87
00:07:36,480 --> 00:07:39,240
‫Key indexes in production systems.

88
00:07:39,510 --> 00:07:39,810
‫Yeah.

89
00:07:41,940 --> 00:07:42,480
‫It just.

90
00:07:42,480 --> 00:07:43,330
‫It just makes sense.

91
00:07:43,680 --> 00:07:43,920
‫Yeah.

92
00:07:44,400 --> 00:07:45,420
‫What's the cost?

93
00:07:45,600 --> 00:07:47,130
‫That is all the cost of everything.

94
00:07:48,420 --> 00:07:52,080
‫The cost is space, and the cost is I.

95
00:07:52,090 --> 00:07:57,170
‫Oh, because now the token will be plus threes.

96
00:07:57,180 --> 00:08:01,350
‫And if you're listening to this, you probably watch the whole course, which is good.

97
00:08:02,070 --> 00:08:04,500
‫So B plus three, everything in the leaf.

98
00:08:04,500 --> 00:08:06,090
‫So I ose.

99
00:08:06,090 --> 00:08:08,010
‫Now you're going to need more.

100
00:08:08,010 --> 00:08:08,970
‫I always.

101
00:08:10,650 --> 00:08:11,400
‫Two Fitch.

102
00:08:11,700 --> 00:08:14,220
‫The B three lineage.

103
00:08:14,520 --> 00:08:15,540
‫The leaf trees.

104
00:08:17,560 --> 00:08:17,950
‫Is it?

105
00:08:17,950 --> 00:08:18,280
‫A bit.

106
00:08:18,910 --> 00:08:19,510
‫Is that.

107
00:08:19,720 --> 00:08:20,590
‫Is it a big deal?

108
00:08:21,460 --> 00:08:22,480
‫Not necessarily.

109
00:08:22,600 --> 00:08:23,380
‫Yeah, it is.

110
00:08:23,720 --> 00:08:24,790
‫How bad it is?

111
00:08:25,180 --> 00:08:25,880
‫I wouldn't tell.

112
00:08:25,900 --> 00:08:28,110
‫I can't really tell you how bad it is.

113
00:08:28,480 --> 00:08:28,810
‫Month.

114
00:08:29,020 --> 00:08:32,410
‫It's probably insignificant, but it is always a cost.

115
00:08:32,500 --> 00:08:32,770
‫Right.

116
00:08:33,020 --> 00:08:37,810
‫And then to the need to try and see was the cost of everything happened there.

117
00:08:38,260 --> 00:08:45,040
‫I got a very quick video today for the April 20, 22 cook you're going to see in the next one.