1 00:00:00,150 --> 00:00:06,210 Now then I don't really like now, then I'll use. 2 00:00:06,210 --> 00:00:06,540 All right. 3 00:00:06,540 --> 00:00:07,170 Too much. 4 00:00:07,440 --> 00:00:07,710 Okay. 5 00:00:07,710 --> 00:00:10,830 How about up next, we're going to talk about distinct. 6 00:00:11,070 --> 00:00:15,210 So distinct is something that we use in conjunction with Select. 7 00:00:15,210 --> 00:00:21,480 So when we're selecting data out, let's say we're doing something like give me all book titles from 8 00:00:21,480 --> 00:00:24,240 the database and we may have duplicates in there. 9 00:00:24,240 --> 00:00:27,900 And we would get all those duplicates because we just asked for all titles. 10 00:00:27,900 --> 00:00:32,189 But if I say give me all distinct titles, it will do what that sounds like. 11 00:00:32,189 --> 00:00:35,550 It will only give us the distinct or unique titles. 12 00:00:36,030 --> 00:00:39,420 So without further ado, let's take a look at it in use. 13 00:00:39,750 --> 00:00:41,010 So here's an example. 14 00:00:41,010 --> 00:00:46,140 The first thing that you should notice is that distinct comes right after Select. 15 00:00:46,440 --> 00:00:49,560 So we're saying select author last name. 16 00:00:49,560 --> 00:00:56,010 So there's a bunch of authors and in our table some books have or some authors have multiple books. 17 00:00:56,010 --> 00:00:59,460 So David Foster Wallace, for example, has two books in there. 18 00:00:59,460 --> 00:01:03,000 So when we say select author last name, we get two results. 19 00:01:03,000 --> 00:01:04,200 We get David Foster Wallace. 20 00:01:04,200 --> 00:01:08,760 And then again, David, if we're only doing last name, we get Foster Wallace and Foster Wallace. 21 00:01:08,760 --> 00:01:16,170 So if I wanted to know only the unique or the distinct author last names in my books table, I would 22 00:01:16,170 --> 00:01:18,420 write this query here select. 23 00:01:18,420 --> 00:01:23,460 And the only thing that's different is the distinct so select distinct author name from books. 24 00:01:23,970 --> 00:01:25,140 So let's try it out. 25 00:01:25,740 --> 00:01:30,450 We'll go over here, make some space, and let's just start just to show you what I'm talking about 26 00:01:30,450 --> 00:01:38,520 by select author without the distinct from books just like that. 27 00:01:39,430 --> 00:01:43,060 And as you can see, we have gay men who's in here twice. 28 00:01:43,120 --> 00:01:44,800 Eggers is in there twice. 29 00:01:45,220 --> 00:01:46,420 Carver is in there twice. 30 00:01:46,450 --> 00:01:48,010 Foster Wallace is in there twice. 31 00:01:48,010 --> 00:01:50,770 And Harris is in there twice. 32 00:01:51,400 --> 00:01:54,910 However, there's something kind of weird about Harris that we'll take a look at in just a moment. 33 00:01:55,390 --> 00:02:02,010 So if we only want to just get a survey of all the authors in our database, but we don't want duplicates. 34 00:02:02,020 --> 00:02:10,240 It's as simple as select distinct author l name from books just like that. 35 00:02:10,270 --> 00:02:10,990 Hit Enter. 36 00:02:11,380 --> 00:02:14,170 Now it's much shorter because those duplicates are gone. 37 00:02:14,830 --> 00:02:19,450 So now we only have one game and one Eggers, one Foster Wallace and one Harris. 38 00:02:20,080 --> 00:02:22,000 And of course, we could do that for other things. 39 00:02:22,000 --> 00:02:23,470 So let's try it. 40 00:02:23,470 --> 00:02:24,460 Four years. 41 00:02:24,460 --> 00:02:28,180 So we have select released year from books. 42 00:02:30,220 --> 00:02:30,940 You can see. 43 00:02:30,940 --> 00:02:31,840 Let's see. 44 00:02:32,210 --> 00:02:33,800 There are some books that have duplicates. 45 00:02:33,820 --> 00:02:35,830 So, for example, 2003. 46 00:02:37,180 --> 00:02:43,450 There's two 2003 se at least I think there's multiple there's at least one more. 47 00:02:43,450 --> 00:02:47,440 When I designed this or when I created the data, let's just focus on 2003. 48 00:02:47,440 --> 00:02:48,640 So there's two of them there. 49 00:02:48,640 --> 00:02:51,250 So if we do a select distinct. 50 00:02:53,110 --> 00:02:56,920 Released year from books. 51 00:02:57,730 --> 00:03:01,750 Our list is shorter and now we only have one 2003. 52 00:03:02,200 --> 00:03:07,690 So the reason I show it to you is not just to show you another way or another example, but also we're 53 00:03:07,690 --> 00:03:13,270 working with numbers here and that's important to remember that released year is an INT, it's not a 54 00:03:13,270 --> 00:03:15,820 bar chart, but distinct still works. 55 00:03:16,060 --> 00:03:22,210 There's one more advanced thing that I want to talk about here, which has to do with when we did our 56 00:03:22,210 --> 00:03:23,010 select. 57 00:03:23,020 --> 00:03:30,580 Let's start again with author LL Name from books when we did this remember that there's and actually 58 00:03:30,580 --> 00:03:35,830 let's do author f name and l name so. 59 00:03:38,030 --> 00:03:42,770 When we did this, you know, remember, we have Neil Gaiman twice and Dave Eggers and so on. 60 00:03:42,770 --> 00:03:50,000 But then there was Dan Harris and Frieda Harris, two different people, two different authors with 61 00:03:50,000 --> 00:03:52,520 the same last name, which is a pretty common situation. 62 00:03:52,910 --> 00:04:00,980 So when I do distinct select distinct author l name from books. 63 00:04:02,680 --> 00:04:07,720 It reduces them to just the distinct last name, which is what you would expect. 64 00:04:07,750 --> 00:04:10,240 It only gives us Harris once, which makes sense. 65 00:04:10,240 --> 00:04:12,760 We're asking four distinct author last names. 66 00:04:13,120 --> 00:04:16,540 But what about distinct full names? 67 00:04:16,570 --> 00:04:20,890 What if I want every distinct author's full name printed out? 68 00:04:20,920 --> 00:04:25,690 So basically, what if I just want to know how many distinct authors there are not just based off of 69 00:04:25,690 --> 00:04:29,350 last names or just first names, but both need to be distinct. 70 00:04:29,470 --> 00:04:34,930 So that Dan Harris is distinct from Frieda Harris. 71 00:04:35,500 --> 00:04:37,170 There are a couple of ways of doing it. 72 00:04:37,180 --> 00:04:41,680 One, you already know at this point, you can use Ken Cat. 73 00:04:42,010 --> 00:04:42,940 So I'll show you that. 74 00:04:42,940 --> 00:04:51,220 Now, what we could do is can Kat author f name and author ll name and then ask or select distinct versions 75 00:04:51,220 --> 00:04:51,910 of those. 76 00:04:52,270 --> 00:04:55,210 So if I make a new file, this one will be won't be too long. 77 00:04:55,210 --> 00:04:58,090 But I'll make a new directory first. 78 00:04:59,230 --> 00:05:02,080 And I'll just call it refining selection. 79 00:05:02,140 --> 00:05:07,120 You don't have to do this, but I'm getting a little messy in here, so I want to make a new file in 80 00:05:07,120 --> 00:05:12,150 that directory, and I'll just call it distinct dot SQL. 81 00:05:12,160 --> 00:05:13,300 The most important part. 82 00:05:14,110 --> 00:05:23,650 Okay, so in here right now, we have our select distinct author l name from books. 83 00:05:23,740 --> 00:05:25,300 So let's start off by just running this. 84 00:05:25,300 --> 00:05:27,070 Make sure that our file is working. 85 00:05:27,160 --> 00:05:31,690 One thing I will point out, I don't recommend putting spaces in your directory names. 86 00:05:32,590 --> 00:05:33,750 It's just not a great idea. 87 00:05:33,760 --> 00:05:35,510 So I am going to edit that. 88 00:05:35,530 --> 00:05:36,760 It won't break anything. 89 00:05:36,760 --> 00:05:39,160 We can still do it, but it's just not great. 90 00:05:42,950 --> 00:05:47,870 So let's just do like that refining election. 91 00:05:47,930 --> 00:05:51,070 Well, that might work too, but refining selections. 92 00:05:51,080 --> 00:05:51,650 All righty. 93 00:05:51,650 --> 00:06:02,270 So if we want to run it, it's a source refining underscore selections slash distinct dot SQL. 94 00:06:03,070 --> 00:06:03,860 Semicolon. 95 00:06:04,180 --> 00:06:05,110 And it works. 96 00:06:05,560 --> 00:06:07,000 So back to the problem at hand. 97 00:06:07,000 --> 00:06:11,890 How do we actually get just distinct authors, first name and last name? 98 00:06:12,250 --> 00:06:18,790 So what we could do, I'll comment this out for now is a select and then we can contact author first 99 00:06:18,790 --> 00:06:19,300 name. 100 00:06:22,380 --> 00:06:24,600 And Arthur last name. 101 00:06:24,930 --> 00:06:27,900 Just like that from books. 102 00:06:28,650 --> 00:06:31,980 And if we save and run that, now we get this. 103 00:06:32,880 --> 00:06:36,390 And if we try and do distinct on that whole thing. 104 00:06:39,090 --> 00:06:40,140 That works. 105 00:06:40,410 --> 00:06:46,830 We get Dan Harris and Frieda Harris are distinct and we only have one Neil Gaiman, one Dave Eggers 106 00:06:46,830 --> 00:06:47,510 and so on. 107 00:06:47,520 --> 00:06:53,490 And of course, we could go add the space that we want there and try doing it again. 108 00:06:53,490 --> 00:06:56,130 And that gives us distinct, full names like this. 109 00:06:56,310 --> 00:07:02,460 That's one way, but there's actually an easier way and that's using distinct. 110 00:07:02,460 --> 00:07:08,790 We just do a select distinct and if you do author we can do F name first or L name. 111 00:07:11,710 --> 00:07:13,270 Just separate them by a comma. 112 00:07:15,460 --> 00:07:17,110 And we tried running that now. 113 00:07:18,220 --> 00:07:21,510 You see, we only get unique rows back. 114 00:07:21,520 --> 00:07:25,120 So it's basically applying distinct to the entire row. 115 00:07:25,510 --> 00:07:29,680 So in this case, it's basically saying, I want the combination of the first name and last name to 116 00:07:29,680 --> 00:07:30,520 be distinct. 117 00:07:31,330 --> 00:07:38,370 So that's why we end up with one Neil Gaiman, one Dave Eggers and one Dan Harris and one Frieda Harris. 118 00:07:38,380 --> 00:07:40,240 So it works in that way. 119 00:07:40,630 --> 00:07:45,310 The only difference here, aside from this being longer, is that here we're actually just combining 120 00:07:45,310 --> 00:07:48,010 them and checking if it's distinct here. 121 00:07:48,130 --> 00:07:50,590 SQL, Stewart or MySQL is doing that for us. 122 00:07:50,950 --> 00:07:52,630 And it depends just how you want your results. 123 00:07:52,630 --> 00:07:53,290 Print it out. 124 00:07:53,290 --> 00:07:56,620 If you want them combined as a full name, then this method works better. 125 00:07:56,620 --> 00:08:00,910 And if you just want to keep your data separate in two different columns, then this is much easier 126 00:08:00,910 --> 00:08:01,990 and a little bit shorter. 127 00:08:02,170 --> 00:08:02,800 All right. 128 00:08:02,800 --> 00:08:05,740 So that's pretty much all there is to distinct at its core. 129 00:08:05,740 --> 00:08:07,780 It's pretty simple, and it's powerful, too. 130 00:08:07,780 --> 00:08:10,120 It's something we'll use quite a bit throughout the course. 131 00:08:10,120 --> 00:08:16,120 As our data gets more complicated, we'll also have more duplicated data, which isn't a bad thing. 132 00:08:16,120 --> 00:08:21,580 But sometimes when you're trying to understand it or work with it, you only want distinct data, so 133 00:08:21,580 --> 00:08:22,630 you use to sync. 134 00:08:23,230 --> 00:08:23,800 All right. 135 00:08:23,800 --> 00:08:24,610 I should shut up.