1
00:00:04,130 --> 00:00:09,620
OK, let's talk about the final thing, I think: the storage cost in Postgres versus MySQL.

2
00:00:10,070 --> 00:00:17,580
So, B+tree secondary index values. I talked about secondary indexes versus a primary index, and the

3
00:00:17,590 --> 00:00:18,920
difference is very important to know.

4
00:00:18,920 --> 00:00:21,350
This is the key difference between the two.

5
00:00:21,950 --> 00:00:22,310
‫Right.

6
00:00:23,450 --> 00:00:27,650
A secondary index can either point directly to the tuple,

7
00:00:28,430 --> 00:00:31,520
as in Postgres, or to the primary key.

8
00:00:31,550 --> 00:00:35,690
And this is one of the reasons that Uber moved from Postgres to MySQL.

9
00:00:35,750 --> 00:00:36,230
‫So they.

10
00:00:37,270 --> 00:00:43,750
Postgres points to the tuple; as a result, write amplification explodes. And I have a write amplification

11
00:00:43,750 --> 00:00:49,900
lecture: go to the section where you have the various database discussions, and you're going to

12
00:00:49,900 --> 00:00:50,990
see the write amplification lecture.

13
00:00:51,010 --> 00:00:52,600
‫Very critical to understand that.

14
00:00:53,410 --> 00:00:53,730
‫Right.

15
00:00:54,730 --> 00:01:03,100
So if you have a secondary index and the secondary index points to the tuple, then the tuple pointer size

16
00:01:03,100 --> 00:01:07,290
is really not that large because it's a fixed size.

17
00:01:07,300 --> 00:01:11,230
I believe it's... I might be wrong.

18
00:01:11,240 --> 00:01:19,960
It might be 64-bit, but that is the pointer, while the MySQL secondary indexes point to the primary

19
00:01:19,960 --> 00:01:20,240
‫keys.

20
00:01:20,240 --> 00:01:27,220
So what if the primary key is large? If it's an integer, you don't have a problem; it's really just tiny.

21
00:01:27,910 --> 00:01:41,200
But if it's a GUID or a UUID, it is a really bad idea to put a GUID or a UUID as the primary key

22
00:01:41,800 --> 00:01:52,570
in InnoDB, in MySQL, because any secondary key unfortunately will point to the primary key, and plus

23
00:01:52,570 --> 00:01:53,830
‫the whole thing is clustered.

24
00:01:53,830 --> 00:02:00,730
So inserts are so slow because of the randomness of the UUID; it's just not worth it at all.
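
The cost of random keys in a clustered index can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: a sorted Python list stands in for the key order of the clustered B+tree, and `count_non_append_inserts` is a hypothetical helper, not part of any database API. It counts how many inserts land somewhere other than the right-hand end of the key order.

```python
import bisect
import uuid


def count_non_append_inserts(keys):
    """Insert keys into a sorted list; count inserts that did not go at the end."""
    ordered = []
    non_append = 0
    for key in keys:
        position = bisect.bisect_left(ordered, key)
        if position != len(ordered):
            non_append += 1          # landed in the middle of existing keys
        ordered.insert(position, key)
    return non_append


N = 10_000                                            # hypothetical row count
sequential_keys = list(range(N))                      # auto-increment style keys
random_keys = [uuid.uuid4().bytes for _ in range(N)]  # random 16-byte UUIDs

print("sequential:", count_non_append_inserts(sequential_keys))
print("random UUID:", count_non_append_inserts(random_keys))
```

Sequential keys always append at the end, while almost every random UUID lands in the middle of the existing order. In a real clustered index that would mean touching arbitrary leaf pages rather than only the rightmost one.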

25
00:02:02,050 --> 00:02:10,630
OK, so as a result the secondary indexes will be so large, because they have all these values that point

26
00:02:10,630 --> 00:02:17,380
to primary keys, which are effectively UUIDs, which are these large things.

27
00:02:18,220 --> 00:02:26,080
And you can do all sorts of tricks to convert the UUID instead of using a string,

28
00:02:26,080 --> 00:02:27,790
which is, I forgot,

29
00:02:27,790 --> 00:02:31,630
what's the size of the UUID, so many bytes.

30
00:02:31,630 --> 00:02:31,960
‫Right.

31
00:02:32,590 --> 00:02:33,700
128, I believe.

32
00:02:34,030 --> 00:02:41,960
But you can trick it to use 64 bytes or less than that using the binary representation of the UUID.

33
00:02:41,990 --> 00:02:45,700
But still, it is still large.

34
00:02:46,300 --> 00:02:50,200
If it's large on disk, less of it can fit in memory.
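
To put rough numbers on this, here is a minimal sketch that counts only key bytes in a secondary index whose entries carry the primary key. The row count, the indexed column width, and the helper name `secondary_index_bytes` are assumptions made for illustration; page headers, record overhead, and fill factor are deliberately ignored. Python's `uuid` module is used only to measure the two UUID representations.

```python
import uuid

sample = uuid.uuid4()

# Width in bytes of each candidate primary key type.
PRIMARY_KEY_BYTES = {
    "INT": 4,
    "BIGINT": 8,
    "UUID, binary form": len(sample.bytes),  # raw 128-bit value
    "UUID, text form": len(str(sample)),     # hex digits plus hyphens
}


def secondary_index_bytes(rows, indexed_column_bytes, primary_key_bytes):
    """Key bytes stored across all entries: indexed value plus primary key."""
    return rows * (indexed_column_bytes + primary_key_bytes)


ROWS = 100_000_000        # hypothetical table size
INDEXED_COLUMN_BYTES = 8  # hypothetical width of the indexed column

for name, width in PRIMARY_KEY_BYTES.items():
    total = secondary_index_bytes(ROWS, INDEXED_COLUMN_BYTES, width)
    print(f"{name:20s} {total / 1e9:5.1f} GB per secondary index")
```

The binary form of a UUID is 16 bytes and the text form is 36 characters, so even the compact form is several times wider than an integer key, and that width is repeated in every entry of every secondary index.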

35
00:02:50,680 --> 00:02:50,980
‫Right.

36
00:02:51,000 --> 00:02:56,980
Your memory is precious. You might say, OK, I'm going to add one terabyte worth of memory for

37
00:02:56,980 --> 00:02:57,980
‫my private database.

38
00:02:58,780 --> 00:02:59,410
‫Sure.

39
00:02:59,410 --> 00:03:02,560
But you have to really think about this.

40
00:03:02,560 --> 00:03:02,840
‫Right.

41
00:03:03,580 --> 00:03:10,630
‫Scaling and database engineering is not something that you take lightly.

42
00:03:10,630 --> 00:03:16,120
You have to think about all this stuff, and it all makes sense when you understand these basic fundamentals.

43
00:03:16,570 --> 00:03:19,540
‫That is what I want to convey in this lecture.

44
00:03:20,380 --> 00:03:23,920
‫The B trees are not something you do math on.

45
00:03:24,610 --> 00:03:29,590
Every decision you make costs you, right?

46
00:03:29,590 --> 00:03:37,840
And it is important to understand how these different DBMSs make these design choices, because every

47
00:03:37,840 --> 00:03:42,850
‫design choice can lead to a completely different outcome.

48
00:03:43,750 --> 00:03:49,690
If a primary key data type is expensive, this can cause bloat in all the secondary indexes.

49
00:03:49,690 --> 00:03:57,010
As I talked about, the leaf nodes in MySQL InnoDB contain the full row, since it's an index-organized

50
00:03:57,010 --> 00:03:59,710
table, or a clustered index, too.

51
00:04:00,550 --> 00:04:01,760
‫So that's another thing, right?

52
00:04:02,470 --> 00:04:04,750
Clustered indexes in general: SQL Server

53
00:04:04,750 --> 00:04:12,250
also has this idea of clustered indexes. A clustered index, or clustered table, is sometimes called an

54
00:04:12,250 --> 00:04:13,060
index-organized table.

55
00:04:13,270 --> 00:04:14,230
‫So this is it.

56
00:04:14,630 --> 00:04:19,600
And this is an index where the index is the table.

57
00:04:19,630 --> 00:04:22,780
So if you think about it, this is just the whole thing.

58
00:04:22,960 --> 00:04:34,000
The leaf node has the whole row and all the columns in it, so that everything is really clustered

59
00:04:34,000 --> 00:04:34,860
‫nicely.

60
00:04:34,900 --> 00:04:39,450
It has disadvantages and advantages that I am just not going to go through.

61
00:04:39,460 --> 00:04:41,860
‫But it's very important to understand that.
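
The two layouts discussed in this lecture can be sketched as a toy model. Plain dictionaries stand in for the B+trees and the heap, and every name here (`heap`, `clustered`, `heap_lookup`, `clustered_lookup`) is hypothetical and for illustration only; real engines work on pages, not dictionaries.

```python
# Heap layout (Postgres-style): rows live in a heap addressed by a tuple id,
# and the secondary index stores that tuple id.
heap = {
    (0, 1): {"id": 10, "email": "a@example.com"},
    (0, 2): {"id": 20, "email": "b@example.com"},
}
heap_email_index = {"a@example.com": (0, 1), "b@example.com": (0, 2)}


def heap_lookup(email):
    tuple_id = heap_email_index[email]  # secondary index gives a tuple pointer
    return heap[tuple_id]               # direct fetch from the heap


# Clustered layout (InnoDB-style): the primary index is the table, so its
# leaves hold full rows, and the secondary index stores the primary key.
clustered = {
    10: {"id": 10, "email": "a@example.com"},
    20: {"id": 20, "email": "b@example.com"},
}
clustered_email_index = {"a@example.com": 10, "b@example.com": 20}


def clustered_lookup(email):
    primary_key = clustered_email_index[email]  # secondary index gives the key
    return clustered[primary_key]               # second lookup, by primary key


print(heap_lookup("a@example.com"))
print(clustered_lookup("b@example.com"))
```

In the clustered layout, whatever type the primary key has is copied into every secondary index entry, which is why a wide key inflates all of them.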