1 00:00:00,700 --> 00:00:02,620 So in this lesson, we are going to talk 2 00:00:02,620 --> 00:00:06,800 about how to understand schema and table design. 3 00:00:06,800 --> 00:00:09,180 Specifically, we are going to take a look 4 00:00:09,180 --> 00:00:12,930 at the basics of database schemas. 5 00:00:12,930 --> 00:00:14,840 We're going to take a look at a few schemas 6 00:00:14,840 --> 00:00:16,740 that you need to know about. 7 00:00:16,740 --> 00:00:18,220 And then we're going to wrap that up 8 00:00:18,220 --> 00:00:20,330 by talking about some of the basics 9 00:00:20,330 --> 00:00:22,423 of design just briefly. 10 00:00:23,500 --> 00:00:24,850 Let's start off though by talking 11 00:00:24,850 --> 00:00:27,630 about the basics of database schemas. 12 00:00:27,630 --> 00:00:29,160 So over here on the right, 13 00:00:29,160 --> 00:00:33,020 I have illustrated just a sample database to look at. 14 00:00:33,020 --> 00:00:35,610 So what is a database schema? 15 00:00:35,610 --> 00:00:37,787 Well, it's how we organize our data. 16 00:00:37,787 --> 00:00:39,930 And so a way to think about this is, 17 00:00:39,930 --> 00:00:42,700 if I had 10,000 sheets of paper 18 00:00:42,700 --> 00:00:45,270 and I just threw all of that into a room, 19 00:00:45,270 --> 00:00:47,590 it would be extremely frustrating 20 00:00:47,590 --> 00:00:49,680 to find anything. 21 00:00:49,680 --> 00:00:51,680 If we went instead though 22 00:00:51,680 --> 00:00:53,470 and got files and folders 23 00:00:53,470 --> 00:00:56,210 and we organized them into a cabinet, 24 00:00:56,210 --> 00:00:58,370 it would be much easier to figure out 25 00:00:58,370 --> 00:00:59,880 and find the piece of paper 26 00:00:59,880 --> 00:01:01,540 that we were looking for. 27 00:01:01,540 --> 00:01:03,480 Well, that's kind of what a schema does. 28 00:01:03,480 --> 00:01:05,960 A schema helps us to provide meaning 29 00:01:05,960 --> 00:01:08,160 to our database. 30 00:01:08,160 --> 00:01:11,460 Now, this could be both physical or logical. 31 00:01:11,460 --> 00:01:12,690 So when we think about logical, 32 00:01:12,690 --> 00:01:15,920 we can talk about constraints, and views, and tables. 33 00:01:15,920 --> 00:01:17,630 And then if we were talking about physical, 34 00:01:17,630 --> 00:01:20,113 we could talk about files and indices. 35 00:01:21,690 --> 00:01:25,200 Dynamic schemas are non-relational 36 00:01:25,200 --> 00:01:28,210 and fixed schemas are relational. 37 00:01:28,210 --> 00:01:32,220 So we talk about NoSQL and SQL databases. 38 00:01:32,220 --> 00:01:33,923 This is what we're talking about. 39 00:01:35,270 --> 00:01:36,900 So if we have a fixed schema, 40 00:01:36,900 --> 00:01:39,410 our columns are not going to change. 41 00:01:39,410 --> 00:01:41,200 If we have a dynamic schema, 42 00:01:41,200 --> 00:01:43,950 our columns might change over time 43 00:01:43,950 --> 00:01:45,710 or from different tables. 44 00:01:45,710 --> 00:01:47,690 And so it's more complicated. 45 00:01:47,690 --> 00:01:50,433 And so you view those 2 things differently. 46 00:01:53,660 --> 00:01:56,040 So some schemas that you need to know about. 47 00:01:56,040 --> 00:01:58,120 First is star schema. 48 00:01:58,120 --> 00:02:00,890 So let's say that we have a couple of tables 49 00:02:00,890 --> 00:02:03,170 over here on the right. 50 00:02:03,170 --> 00:02:06,060 Now, the first thing is our fact table. 51 00:02:06,060 --> 00:02:08,240 Now, our fact table is here. 52 00:02:08,240 --> 00:02:09,830 This is the central table 53 00:02:09,830 --> 00:02:12,060 and it's full of countable items. 54 00:02:12,060 --> 00:02:14,263 So numbers, basically. 55 00:02:16,050 --> 00:02:19,710 A star schema has 1 dimension table level. 56 00:02:19,710 --> 00:02:22,390 So that's this, those are our 4 tables, 57 00:02:22,390 --> 00:02:25,530 and there's only 1 level of those tables. 58 00:02:25,530 --> 00:02:27,680 If we had a 2 dimension table level, 59 00:02:27,680 --> 00:02:29,320 and we'll look at that here in a second, 60 00:02:29,320 --> 00:02:32,900 you would have another set of tables 61 00:02:32,900 --> 00:02:35,700 off of this first set, where the arrows are pointing at. 62 00:02:37,160 --> 00:02:41,010 All of the tables tie into your fact table 63 00:02:41,010 --> 00:02:42,253 in a star schema. 64 00:02:43,350 --> 00:02:45,940 And it's not normalized. 65 00:02:45,940 --> 00:02:47,930 And when we talk about normalization, 66 00:02:47,930 --> 00:02:50,390 what we think about is copies of data. 67 00:02:50,390 --> 00:02:53,980 So multiple copies of data 68 00:02:53,980 --> 00:02:56,023 could be in each of those tables. 69 00:02:57,640 --> 00:02:59,870 So star schemas are very easy 70 00:02:59,870 --> 00:03:01,960 to query for simple queries. 71 00:03:01,960 --> 00:03:04,683 And that's what you would design a star schema around. 72 00:03:07,730 --> 00:03:09,820 A snowflake schema, on the other hand, 73 00:03:09,820 --> 00:03:11,520 is like this. 74 00:03:11,520 --> 00:03:14,640 And so we have multiple table dimension levels. 75 00:03:14,640 --> 00:03:15,840 And so you can see that here, 76 00:03:15,840 --> 00:03:17,250 that's our first dimension 77 00:03:17,250 --> 00:03:19,860 and second dimension of tables. 78 00:03:19,860 --> 00:03:22,160 And so basically we're breaking that down 79 00:03:22,160 --> 00:03:22,993 a little further. 80 00:03:22,993 --> 00:03:25,640 We're breaking our information down further. 81 00:03:25,640 --> 00:03:29,060 And snowflake schemas have high cardinality, 82 00:03:29,060 --> 00:03:32,063 which means there's very little repetition. 83 00:03:33,380 --> 00:03:36,060 They're normalized, which means 84 00:03:36,060 --> 00:03:39,130 that our data is not going to be repeated. 85 00:03:39,130 --> 00:03:41,660 Again, that's little repetition. 86 00:03:41,660 --> 00:03:43,030 It's not going to be repeated 87 00:03:43,030 --> 00:03:45,460 across the different tables. 88 00:03:45,460 --> 00:03:47,840 And so snowflake schemas are much better 89 00:03:47,840 --> 00:03:50,520 for complex queries. 90 00:03:50,520 --> 00:03:53,130 However, they're not as good for simple queries 91 00:03:53,130 --> 00:03:54,490 like the star schema 92 00:03:54,490 --> 00:03:57,023 because of those multiple table levels. 93 00:03:58,160 --> 00:04:00,290 They also take up less storage base 94 00:04:00,290 --> 00:04:02,480 and the reason it takes up less storage base 95 00:04:02,480 --> 00:04:05,660 is because there's not the repetition 96 00:04:05,660 --> 00:04:08,530 that you see with star schemas. 97 00:04:12,030 --> 00:04:13,600 So let's kind of wrap this up 98 00:04:13,600 --> 00:04:16,980 and talk about some database design ideas. 99 00:04:16,980 --> 00:04:19,120 The first thing that you want to ask yourself is, 100 00:04:19,120 --> 00:04:21,540 what's the purpose of your database? 101 00:04:21,540 --> 00:04:25,600 So what kind of queries are you going to expect to run? 102 00:04:25,600 --> 00:04:26,710 So if you understand that, 103 00:04:26,710 --> 00:04:28,350 that's going to help you to understand things 104 00:04:28,350 --> 00:04:30,500 like star versus snowflake 105 00:04:30,500 --> 00:04:31,850 or other schema designs 106 00:04:31,850 --> 00:04:33,810 because there's more than just those 2. 107 00:04:33,810 --> 00:04:35,840 Those are just the 2 most common. 108 00:04:35,840 --> 00:04:37,090 And it will help you to figure out 109 00:04:37,090 --> 00:04:39,083 how you want to organize your data. 110 00:04:40,130 --> 00:04:41,700 Then you want to map out 111 00:04:41,700 --> 00:04:44,220 all of your information on a whiteboard. 112 00:04:44,220 --> 00:04:45,930 Yes, you can do it on something else. 113 00:04:45,930 --> 00:04:48,730 But honestly, I have found a whiteboard 114 00:04:48,730 --> 00:04:51,350 to be an extremely helpful tool 115 00:04:51,350 --> 00:04:53,090 to just kind of pull everything back 116 00:04:53,090 --> 00:04:54,310 and write out the ideas 117 00:04:54,310 --> 00:04:56,310 of what you're going to be querying, 118 00:04:56,310 --> 00:04:57,720 write out kind of the basics 119 00:04:57,720 --> 00:04:59,910 of how that data would then be organized 120 00:04:59,910 --> 00:05:00,980 into tables. 121 00:05:00,980 --> 00:05:02,460 So that's what you want to think about here 122 00:05:02,460 --> 00:05:04,530 is kind of putting those into boxes 123 00:05:04,530 --> 00:05:06,730 for your columns and tables. 124 00:05:06,730 --> 00:05:08,980 And then you want to think about your relationships 125 00:05:08,980 --> 00:05:10,640 as you do that as well. 126 00:05:10,640 --> 00:05:12,910 So think about that fact table, 127 00:05:12,910 --> 00:05:15,110 and the fact table is going to connect 128 00:05:15,110 --> 00:05:16,720 to all of those other tables 129 00:05:16,720 --> 00:05:19,390 via a relationship or a key. 130 00:05:19,390 --> 00:05:22,260 And how are we going to tie 131 00:05:22,260 --> 00:05:23,690 that relationship together. 132 00:05:23,690 --> 00:05:25,340 So think about the relationships 133 00:05:25,340 --> 00:05:27,540 and establish those relationships 134 00:05:27,540 --> 00:05:29,860 on that whiteboard drawing. 135 00:05:29,860 --> 00:05:31,740 And then finally go back and review 136 00:05:31,740 --> 00:05:33,210 and tweak the design. 137 00:05:33,210 --> 00:05:35,570 If you've done that, you're not done 138 00:05:35,570 --> 00:05:37,530 but that is going to give you the basics 139 00:05:37,530 --> 00:05:40,660 of what you need to have a pretty solid foundation 140 00:05:40,660 --> 00:05:42,550 as you then dive down further 141 00:05:42,550 --> 00:05:45,150 into your database design. 142 00:05:45,150 --> 00:05:47,560 I've found that a lot of architects 143 00:05:47,560 --> 00:05:50,890 seem to skip past this first kind of basic step 144 00:05:50,890 --> 00:05:52,690 and they want to immediately go down 145 00:05:52,690 --> 00:05:54,140 into the details of the data 146 00:05:54,140 --> 00:05:55,635 and where it's going to live and how, 147 00:05:55,635 --> 00:05:57,770 rather than kind of stepping back 148 00:05:57,770 --> 00:05:59,520 and thinking about the business need 149 00:05:59,520 --> 00:06:01,610 and the queries that you're going to be running on it, 150 00:06:01,610 --> 00:06:03,370 the purpose of your database, 151 00:06:03,370 --> 00:06:05,550 how large you think it's going to get, you know, 152 00:06:05,550 --> 00:06:07,880 and then establishing those relationships. 153 00:06:07,880 --> 00:06:09,330 If you do that at the beginning 154 00:06:09,330 --> 00:06:11,480 you're going to save yourself a lot of headache 155 00:06:11,480 --> 00:06:12,730 years down the road 156 00:06:12,730 --> 00:06:14,923 as your database continues to grow. 157 00:06:15,830 --> 00:06:18,620 Alright, so for the DP-203, 158 00:06:18,620 --> 00:06:20,270 let's wrap all this up. 159 00:06:20,270 --> 00:06:22,710 First, this should be a review. 160 00:06:22,710 --> 00:06:24,970 Some of these concepts I've talked about, 161 00:06:24,970 --> 00:06:26,340 actually most of these concepts 162 00:06:26,340 --> 00:06:28,780 I've talked about at a very high level. 163 00:06:28,780 --> 00:06:29,940 I did that for a reason. 164 00:06:29,940 --> 00:06:32,330 You don't need to know a ton of information 165 00:06:32,330 --> 00:06:34,730 in order to pass the DP-203, 166 00:06:34,730 --> 00:06:36,870 and this should be a review. 167 00:06:36,870 --> 00:06:39,450 If it's not, you're probably going to want to dive in 168 00:06:39,450 --> 00:06:42,120 a little further into schemas 169 00:06:42,120 --> 00:06:44,890 and some basics of data engineering, 170 00:06:44,890 --> 00:06:46,980 not necessarily for the DP-203 171 00:06:46,980 --> 00:06:49,350 but honestly, just for your sake 172 00:06:49,350 --> 00:06:51,363 as a data architect as you grow. 173 00:06:52,600 --> 00:06:55,330 Next, there are definitely more schemas, 174 00:06:55,330 --> 00:06:58,270 but star and snowflake are the most common 175 00:06:58,270 --> 00:06:59,810 and the ones that you're most likely 176 00:06:59,810 --> 00:07:03,080 to encounter on something like the DP-203. 177 00:07:03,080 --> 00:07:04,440 Now, there are other scenarios 178 00:07:04,440 --> 00:07:06,320 that you might get into as the data architect. 179 00:07:06,320 --> 00:07:08,900 But, like I said, these are the most common. 180 00:07:08,900 --> 00:07:12,220 And then finally, table design is complex 181 00:07:12,220 --> 00:07:14,320 but it is very important. 182 00:07:14,320 --> 00:07:17,130 As you get in and start building databases 183 00:07:17,130 --> 00:07:19,610 and looking at table design, 184 00:07:19,610 --> 00:07:21,230 don't skip past it. 185 00:07:21,230 --> 00:07:22,640 Make sure that you take the time 186 00:07:22,640 --> 00:07:24,810 at the front end on the design 187 00:07:24,810 --> 00:07:26,810 because that is going to save you hours 188 00:07:26,810 --> 00:07:29,070 and hours and hours of headache 189 00:07:29,070 --> 00:07:30,460 on the backend 190 00:07:30,460 --> 00:07:32,900 as your tables and your databases 191 00:07:32,900 --> 00:07:35,750 continue to grow and become more complex. 192 00:07:35,750 --> 00:07:38,200 So make sure that you argue for time 193 00:07:38,200 --> 00:07:39,610 and make sure that you spend the time 194 00:07:39,610 --> 00:07:42,110 to get really good designs up front. 195 00:07:42,110 --> 00:07:43,583 You'll thank yourself later. 196 00:07:45,360 --> 00:07:48,000 So as far as the DP-203 is concerned, 197 00:07:48,000 --> 00:07:49,710 that's it for this lesson 198 00:07:49,710 --> 00:07:51,160 and I'll see you in the next.