1 00:00:01,330 --> 00:00:07,060 Our pivot is all about creating mini databases with an excel and then letting the database do the heavy 2 00:00:07,060 --> 00:00:13,000 lifting for you, you don't need to know any database languages or install any crazy systems to make 3 00:00:13,000 --> 00:00:13,630 this work. 4 00:00:13,960 --> 00:00:19,220 However, you will need to be able to structure your data in certain ways in traditional data work. 5 00:00:19,270 --> 00:00:23,020 What I'm going to discuss in this video is a type of data normalization. 6 00:00:23,650 --> 00:00:29,290 When you normalize the data set, you look for unique identifiers that describe how pieces of data relate 7 00:00:29,290 --> 00:00:29,950 to each other. 8 00:00:30,550 --> 00:00:35,920 In many ways, normalizing a data set is about defining the types of lookups that you might want to 9 00:00:35,920 --> 00:00:38,110 be able to perform with a given data set. 10 00:00:38,740 --> 00:00:44,530 Let's start by looking at our sales data, our sales data as a single data table so we're not working 11 00:00:44,530 --> 00:00:45,720 within a database yet. 12 00:00:46,540 --> 00:00:49,990 What are some ways that we can describe and explain this table to someone? 13 00:00:51,170 --> 00:00:57,620 As I describe this, I'm going to create an entity relationship diagram or database diagram that's related 14 00:00:57,620 --> 00:00:58,340 to this table. 15 00:00:59,550 --> 00:01:06,240 The simplest thing we can say is that it has four fields store, date, department and sales. 16 00:01:07,350 --> 00:01:12,750 That's a start, but we need to be more explicit in describing what our table looks like conceptually 17 00:01:13,560 --> 00:01:19,110 to define this in our diagram, I'll create a little box for a table called sales and add these four 18 00:01:19,110 --> 00:01:20,820 fields along the left side. 19 00:01:22,200 --> 00:01:27,900 I can go a step further and say that the sales column is a number the department and store fields are 20 00:01:27,900 --> 00:01:31,650 both text and then the date field contains date values. 21 00:01:32,690 --> 00:01:36,560 Our place in identifier for each of these next to the field name. 22 00:01:38,820 --> 00:01:44,220 This box defines the contents of our table fairly succinctly, but it's still very flat. 23 00:01:44,400 --> 00:01:49,440 There's no context about how the fields within the table relate to each other, aside from the simple 24 00:01:49,440 --> 00:01:50,880 fact that we know they do. 25 00:01:51,900 --> 00:01:57,360 Conceptually, we know that the starfield has a certain set of store names, if we define that unique 26 00:01:57,360 --> 00:02:03,620 set elsewhere, then we specified a unique store dimension table that's related to our sales table. 27 00:02:04,440 --> 00:02:08,250 So I'm going to create a little table called stores here to the side. 28 00:02:10,120 --> 00:02:13,310 This table looks dumb and pointless, but it's incredibly important. 29 00:02:13,720 --> 00:02:15,650 This is our first dimensional table. 30 00:02:15,700 --> 00:02:20,920 It defines all of the options that a store can be and it relates directly to the store column on our 31 00:02:20,920 --> 00:02:21,700 sales table. 32 00:02:22,820 --> 00:02:27,890 Visually, I can represent this with an arrow drawn from the store name field in the store's table to 33 00:02:27,890 --> 00:02:34,190 the store name field and our sales table, the arrow points into the sales table intentionally because 34 00:02:34,190 --> 00:02:39,320 for one entry in the store's table, like, say, Kroger, there are many copies of that name in the 35 00:02:39,320 --> 00:02:40,100 sales table. 36 00:02:41,980 --> 00:02:47,890 This is called a one to many relationship and will be a major component of this course, let's do the 37 00:02:47,890 --> 00:02:50,770 same thing with the department field this time just faster. 38 00:02:51,340 --> 00:02:57,490 We get our unique department values table and our data base diagram and we create a connector arrow. 39 00:02:57,970 --> 00:03:01,900 Note that the department and store both appear to be independent of one another. 40 00:03:02,590 --> 00:03:07,660 What this means is that each store should have every department and every department should be in every 41 00:03:07,660 --> 00:03:08,080 store. 42 00:03:09,060 --> 00:03:14,340 This type of a cross relationship can be called a cross joint or cross product, since we have three 43 00:03:14,340 --> 00:03:15,870 stores and seven departments. 44 00:03:15,900 --> 00:03:17,730 This means we have twenty one combinations. 45 00:03:17,760 --> 00:03:19,850 The two, the product of three and seven. 46 00:03:20,370 --> 00:03:22,560 We want to treat the date field the same way. 47 00:03:22,860 --> 00:03:26,820 We'll pull the unique dates into its own table and call this the calendar. 48 00:03:27,540 --> 00:03:33,090 When we build more robust reports will probably want a calendar table that extends outside of strictly 49 00:03:33,090 --> 00:03:34,540 our existing data range. 50 00:03:34,860 --> 00:03:36,930 However, the general concept is the same. 51 00:03:38,020 --> 00:03:43,900 I'll add this to our diagram with its fields and then I'll give it a relationship out to date, and 52 00:03:43,900 --> 00:03:47,110 now we have a spider web diagram that describes our sales table. 53 00:03:47,830 --> 00:03:51,690 With this diagram laid out, we fully described our singular sales table. 54 00:03:51,970 --> 00:03:56,890 We need to do this for each of our other tables to instead of walking through the creation of each one, 55 00:03:57,200 --> 00:03:59,160 let's go ahead and just fade to each of them. 56 00:03:59,830 --> 00:04:01,450 Here's the purchase orders table. 57 00:04:01,460 --> 00:04:02,530 It's very similar. 58 00:04:03,550 --> 00:04:07,000 Now, here's our labor expenses again, very similar. 59 00:04:08,520 --> 00:04:14,790 What commonalities do we see across all three of our data tables, they all had relationships to a calendar 60 00:04:14,790 --> 00:04:17,910 table and they all had relationships to a store table. 61 00:04:18,980 --> 00:04:23,630 Now, can you imagine what our data base diagram might look like if we put all of our tables in one 62 00:04:23,630 --> 00:04:24,080 place? 63 00:04:25,760 --> 00:04:30,680 Well, if we put them all in one diagram, we can begin to see how each different data set relates to 64 00:04:30,680 --> 00:04:33,410 the others through the different dimensional tables. 65 00:04:34,310 --> 00:04:39,590 One interesting notation to make is that our labor expenses has labor departments that are different 66 00:04:39,590 --> 00:04:43,490 from the other departments will address this detail in later lessons. 67 00:04:45,100 --> 00:04:50,350 At this stage, we've drawn out the structure for a database based upon our three tables going through 68 00:04:50,350 --> 00:04:56,190 the effort of drawing out every data structure you use is a bit tedious and not always strictly necessary. 69 00:04:56,470 --> 00:05:01,630 However, you should draw out a few to get used to breaking your data down into its components and tying 70 00:05:01,660 --> 00:05:02,350 them together. 71 00:05:03,410 --> 00:05:08,540 In just a few lessons, we'll have power pivot building and showing this diagram to us with a little 72 00:05:08,540 --> 00:05:09,440 bit less work. 73 00:05:10,190 --> 00:05:15,680 Before that, we will take this data structure and build it with our relationships tool in the next 74 00:05:15,680 --> 00:05:16,160 lesson.