1 00:00:00,090 --> 00:00:00,420 Okay. 2 00:00:00,420 --> 00:00:01,180 Welcome back. 3 00:00:01,200 --> 00:00:02,480 So we're finally here. 4 00:00:02,490 --> 00:00:04,500 This is a bit of a turning point in the course. 5 00:00:04,950 --> 00:00:12,180 Really important section not to scare you too much, but really fundamental to how SQL and my SQL are 6 00:00:12,180 --> 00:00:13,470 used in the real world. 7 00:00:14,200 --> 00:00:19,930 So this is a section all about relationships and also something called joins, which we'll get to in 8 00:00:19,930 --> 00:00:21,730 the second part of this section. 9 00:00:21,880 --> 00:00:24,460 So let's focus on the relationships part first. 10 00:00:24,610 --> 00:00:31,270 So far in this course, we've been working with very simple data, things like books or users where 11 00:00:31,270 --> 00:00:36,450 we only have a couple of columns per table and that's been by design. 12 00:00:36,460 --> 00:00:42,820 We need that simplicity in order to focus on what's important, to isolate the different features that 13 00:00:42,820 --> 00:00:46,720 we're talking about, whether it's functions or logical operators. 14 00:00:46,720 --> 00:00:51,520 It's much easier to teach those, and we don't have 20 different tables to worry about and we can just 15 00:00:51,970 --> 00:00:55,780 basically drill down into one table and focus on that. 16 00:00:56,140 --> 00:01:02,890 But that's typically not how the real world works, where, yeah, we might have a user's table, but 17 00:01:02,890 --> 00:01:08,430 there's no site that really can just get by on having users unless it's featureless. 18 00:01:08,440 --> 00:01:14,590 There is plenty of other things that need to go along with users, whether it's comments or likes or 19 00:01:14,590 --> 00:01:19,300 tweets or posts or photos or any sort of, I don't know, order transaction. 20 00:01:19,300 --> 00:01:22,360 There are all these different things that we could have alongside users. 21 00:01:22,360 --> 00:01:27,340 So we've been working with simple data, but that's about to change. 22 00:01:29,360 --> 00:01:35,000 Real world data, unlike what we've been working with, is really messy, typically and interrelated. 23 00:01:35,000 --> 00:01:37,220 There's a lot of connections between data. 24 00:01:37,220 --> 00:01:42,440 So some of those things I talked about with, let's say, users, if we're working with just a simple 25 00:01:42,440 --> 00:01:47,090 blog site, so nothing like a Facebook or a complex social network, just a simple blog. 26 00:01:47,090 --> 00:01:48,950 We still need to have user data. 27 00:01:48,980 --> 00:01:52,610 We need to have information about posts, blog posts. 28 00:01:52,610 --> 00:01:56,630 We also need to somehow keep track of comments and then tags. 29 00:01:56,990 --> 00:01:59,120 And that can get pretty complicated already. 30 00:01:59,120 --> 00:02:05,060 Not to mention things like advertising and tracking, not only ads on their own, but how ads are related 31 00:02:05,060 --> 00:02:07,940 to users and what users are clicking on and not clicking on. 32 00:02:07,940 --> 00:02:13,670 There's so much data that just from a simple blog site, something that may not seem that complex. 33 00:02:13,670 --> 00:02:20,270 There's so much stuff that we could store, so we're going to focus on how we work with interconnected 34 00:02:20,270 --> 00:02:22,430 data in the next couple of sections. 35 00:02:22,430 --> 00:02:24,620 So we're no longer focusing on one table. 36 00:02:24,650 --> 00:02:28,940 We're going to see how do we work with two tables or three or four that are related. 37 00:02:28,940 --> 00:02:34,880 So not just tables that exist independently, but table set, reference one another and are interconnected. 38 00:02:34,880 --> 00:02:35,960 So let's get going. 39 00:02:35,960 --> 00:02:39,830 And the first thing that I want to do is go back to our books data. 40 00:02:40,100 --> 00:02:43,220 So remember our books table, don't worry, we're done with it. 41 00:02:43,220 --> 00:02:47,210 We're not going to be typing any books code, but on its own it was really simple. 42 00:02:47,210 --> 00:02:54,110 We had a books table with author first name and last name, a book title released here, a page count 43 00:02:54,110 --> 00:02:58,820 in a stock inventory, I believe a stock quantity, and I think that was it. 44 00:02:59,150 --> 00:03:04,130 So again, on its own, that just represents a single book, but you can't really do much with that. 45 00:03:04,130 --> 00:03:09,440 But if we are running a sort of book site where we sold books, at a minimum, we would need to keep 46 00:03:09,440 --> 00:03:11,510 track of things like versions. 47 00:03:11,630 --> 00:03:16,010 So we didn't even talk about this, but there's often multiple versions of a single book. 48 00:03:16,010 --> 00:03:22,400 If you go on Amazon and you look at a book, let's say Harry Potter, the second book, Chamber of Secrets. 49 00:03:22,400 --> 00:03:27,650 Well, there's going to be the initial British release, the UK version as a hardcover, and then there's 50 00:03:27,650 --> 00:03:31,220 a paperback, and then there's the American version and then there's a paperback. 51 00:03:31,220 --> 00:03:36,050 There is a full color illustrated version that came out relatively, relatively recently. 52 00:03:36,050 --> 00:03:39,200 It looks pretty cool that has these giant illustrations. 53 00:03:39,200 --> 00:03:40,490 It's the same title. 54 00:03:41,240 --> 00:03:42,740 How do you keep track of that? 55 00:03:42,740 --> 00:03:46,940 Is that a different table or do you just make a separate book for every one of those? 56 00:03:46,940 --> 00:03:49,790 But if you do that, then how do you make sure that they're connected? 57 00:03:49,790 --> 00:03:54,890 Because when I go on Amazon and I look at the hardcover, it also will show me below. 58 00:03:55,160 --> 00:04:03,320 Some people also buy paperback version or it's also available and I don't know, 2540 different languages. 59 00:04:03,320 --> 00:04:06,710 So there's all sorts of things to worry about just with versions. 60 00:04:07,700 --> 00:04:09,050 But then we have authors. 61 00:04:09,350 --> 00:04:11,120 Well, that might seem straightforward. 62 00:04:11,150 --> 00:04:17,360 We have an author, first name and last name, two columns on our books table, but that's not sufficient 63 00:04:17,360 --> 00:04:18,980 when we have multiple authors. 64 00:04:19,519 --> 00:04:22,310 So some books a lot of books have multiple authors. 65 00:04:22,310 --> 00:04:28,370 What about if it's a research paper that we're selling or a journal that has dozens of authors potentially? 66 00:04:28,910 --> 00:04:32,750 So there's a lot of things to keep track of there that we can't really do right now. 67 00:04:32,750 --> 00:04:35,510 We only have room for one author in our columns. 68 00:04:36,630 --> 00:04:39,270 Then a whole big thing is customers. 69 00:04:39,690 --> 00:04:43,470 If we're doing a book shop, online books on their own is great. 70 00:04:43,470 --> 00:04:48,060 But we also need ways to keep track of users or customers is what I'm calling them in this case. 71 00:04:48,240 --> 00:04:54,840 So customers on their own also can't do that much, but we might keep track of their email, log in 72 00:04:54,840 --> 00:04:55,830 information. 73 00:04:56,280 --> 00:05:00,420 Of course, we wouldn't just store their password directly, but for simplicity's sake, let's say we're 74 00:05:00,420 --> 00:05:03,780 keeping track of password and email so that they can log in. 75 00:05:04,110 --> 00:05:06,450 And then there's a whole bunch of other things. 76 00:05:06,480 --> 00:05:07,220 Orders. 77 00:05:07,230 --> 00:05:11,040 So if customers want to order books, how do we store that? 78 00:05:11,130 --> 00:05:16,530 We need to create a new order, and it needs to be associated with a customer who created it. 79 00:05:16,890 --> 00:05:18,750 It needs to have address information. 80 00:05:18,750 --> 00:05:21,630 Maybe it needs to have the contents of the order. 81 00:05:21,870 --> 00:05:22,920 Are there books? 82 00:05:22,920 --> 00:05:23,920 One book? 83 00:05:23,940 --> 00:05:24,830 How do we support that? 84 00:05:24,840 --> 00:05:25,890 How do we handle that? 85 00:05:26,340 --> 00:05:33,480 What about if we're working with dates and times and we need to keep track of when a return period expires? 86 00:05:33,480 --> 00:05:39,840 So we need to have a an order date and then keep track of 30 days or 60 days after that or whatever 87 00:05:39,840 --> 00:05:40,380 it is. 88 00:05:40,500 --> 00:05:47,550 Then we also need to store or somehow keep track of the transaction information addresses for billing 89 00:05:47,550 --> 00:05:52,320 address, not only shipping address, the type of shipping, the billing information, there's so much 90 00:05:52,320 --> 00:05:56,270 to store and then a whole other thing is reviews. 91 00:05:56,280 --> 00:06:02,700 So if we go on Amazon or pretty much any book site, any book that you look at will have dozens or hundreds, 92 00:06:02,700 --> 00:06:04,410 sometimes thousands of reviews. 93 00:06:04,620 --> 00:06:11,850 So we've got reviews, their ratings, that's the number of stars, but then also a title for the review 94 00:06:12,510 --> 00:06:18,060 and then a caption and then the actual content of the review, which may be ten characters. 95 00:06:18,240 --> 00:06:20,850 I hate this book, however many characters that is. 96 00:06:20,850 --> 00:06:24,950 Or it could be a giant paragraph, someone's love letter that they're writing to a book. 97 00:06:24,960 --> 00:06:26,730 So how do we account for all of that? 98 00:06:26,730 --> 00:06:29,700 And then the last thing that we'll talk about are genres. 99 00:06:30,060 --> 00:06:34,990 So you may think that we could just get away with storing this in a book, which we absolutely could. 100 00:06:35,010 --> 00:06:39,900 We could just have one genre, but many books don't cleanly fit into one genre. 101 00:06:40,260 --> 00:06:46,170 We might have a vampire novel that's also historical fiction because it's set in Victorian period, 102 00:06:46,170 --> 00:06:46,800 England. 103 00:06:47,190 --> 00:06:54,570 Or we might have a sci fi book that has elements of fantasy, or who knows, you can have all sorts 104 00:06:54,570 --> 00:06:59,460 of cross-pollinated genres, so only assigning one genre might not be that useful. 105 00:06:59,460 --> 00:07:01,110 We may want to have a bunch more. 106 00:07:01,110 --> 00:07:06,210 And in fact on Amazon you can see there are books that will have two or three different genres or different 107 00:07:06,210 --> 00:07:07,830 tags associated with them. 108 00:07:08,070 --> 00:07:12,600 So what I'm trying to show you here is that we were working with books on their own, but in a typical 109 00:07:12,600 --> 00:07:16,320 app or a typical website that has to do with books and a bookstore. 110 00:07:16,320 --> 00:07:22,170 In our case, there's a lot more that goes along with books to make anything functional, to store any 111 00:07:22,170 --> 00:07:23,250 important information. 112 00:07:23,250 --> 00:07:28,490 And this is, I don't know, maybe half of the tables that you would need to get away with kind of the 113 00:07:28,560 --> 00:07:30,240 the MVP or the bare minimum. 114 00:07:30,240 --> 00:07:32,670 There's still many other things. 115 00:07:32,670 --> 00:07:36,870 Something like orders, for instance, might actually need to be split into a couple of tables. 116 00:07:36,870 --> 00:07:41,070 We might need to have an address section or an address table. 117 00:07:41,070 --> 00:07:44,460 We might need to then have a separate billing table. 118 00:07:44,550 --> 00:07:46,950 We might need to have shipping. 119 00:07:46,950 --> 00:07:49,560 There could be all different tables that we could split it up into. 120 00:07:49,590 --> 00:07:52,170 So this isn't to say this is the end all, be all. 121 00:07:52,170 --> 00:07:56,730 This is just that there's a couple of other tables that we need to consider at the bare minimum. 122 00:07:56,880 --> 00:07:58,530 Okay, so we'll stop here. 123 00:07:58,530 --> 00:08:03,600 In the next video, we're going to identify some of the different types of relationships between data 124 00:08:03,600 --> 00:08:07,080 at a high level, what are the different types of associations? 125 00:08:07,440 --> 00:08:08,520 And I'm back. 126 00:08:08,670 --> 00:08:10,500 I totally forgot to show you this. 127 00:08:10,500 --> 00:08:13,770 Even though I wrote a note to myself, I totally skipped this. 128 00:08:13,770 --> 00:08:20,400 So I'm back after recording this, adding this to the end, I want to show the schema diagram that I 129 00:08:20,400 --> 00:08:23,430 really think highlights how messy data can be. 130 00:08:23,820 --> 00:08:27,000 So here's I think I showed this early on in the course. 131 00:08:27,090 --> 00:08:31,770 This is just a potential schema for Facebook and it's definitely incomplete. 132 00:08:31,770 --> 00:08:35,039 It's maybe a 10th of the number of tables that Facebook has. 133 00:08:35,039 --> 00:08:42,600 But you can see just to store basic things like events and users, we have an event, Facebook event, 134 00:08:42,600 --> 00:08:49,680 we have event membership, which is related to RSVP states, which is also related to venues, which 135 00:08:49,680 --> 00:08:53,670 is also related to groups and to workplaces and to Facebook profiles. 136 00:08:53,670 --> 00:08:55,530 And it just goes on and on and on. 137 00:08:55,860 --> 00:08:58,710 So what I'm trying to show you is that data can get messy. 138 00:08:58,710 --> 00:09:02,640 And in fact, here's another super intimidating schema. 139 00:09:02,670 --> 00:09:08,460 This one is a little bit crazy because it has to do with storing protein database. 140 00:09:08,610 --> 00:09:14,580 I think it was I saw some article on IBM and I believe MIT and fortunately it's very low quality image. 141 00:09:14,580 --> 00:09:20,220 So you can actually see the tables, but each one of these little rectangles is a separate table and 142 00:09:20,220 --> 00:09:23,520 every line is a connection between the tables. 143 00:09:23,790 --> 00:09:25,320 Now, this is the worst I've ever seen. 144 00:09:25,320 --> 00:09:29,820 The craziest, and I think that was the point, is that they're trying to just show you they're never 145 00:09:29,820 --> 00:09:32,670 going to actually look at this to try and understand anything. 146 00:09:32,670 --> 00:09:36,180 But they're trying to show you how complex protein information is when you have. 147 00:09:36,320 --> 00:09:39,560 All these different combinations and permutations for a given protein. 148 00:09:39,680 --> 00:09:44,960 Without going into too much detail here, this isn't a bio course, but the way that a protein is folded, 149 00:09:45,260 --> 00:09:50,480 the same structure, basically the same amino acids, depending on how it's folded, can drastically 150 00:09:50,480 --> 00:09:53,390 change the features, if you will. 151 00:09:53,420 --> 00:09:55,820 Basically what the protein does, how it behaves. 152 00:09:55,820 --> 00:10:05,030 And so this is a way of documenting, using SQL in some way the structure of a protein using a database. 153 00:10:05,570 --> 00:10:11,000 So not to scare you there, I just think it's kind of a powerful image, although grainy, it is powerful 154 00:10:11,000 --> 00:10:18,590 to see basically some really messy, ugly gross data that can be supported using tables. 155 00:10:18,590 --> 00:10:20,270 It's just doesn't look pretty. 156 00:10:20,420 --> 00:10:21,380 But that's the important part. 157 00:10:21,380 --> 00:10:26,240 And kind of the point of this video is I wanted to get you prepared mentally for working with some more 158 00:10:26,240 --> 00:10:28,130 complex data, although nothing that looks like this.