1 00:00:00,210 --> 00:00:00,810 Okay. 2 00:00:00,810 --> 00:00:06,930 So next up, we're talking about something called group B and Group B is going to be essential to everything 3 00:00:06,930 --> 00:00:08,580 we do in the rest of this section. 4 00:00:09,330 --> 00:00:14,250 And I'll warn you right now, take a deep breath that it's a little bit weird. 5 00:00:14,250 --> 00:00:16,470 It's different than the things that we've seen so far. 6 00:00:16,470 --> 00:00:23,340 And in fact, for me personally, it's a challenging one to teach because like like I just said, it 7 00:00:23,340 --> 00:00:24,210 is different. 8 00:00:24,690 --> 00:00:29,400 It's not just a single little function that we can throw in there, like Kin, Cat or uPAR, where we 9 00:00:29,400 --> 00:00:35,760 get a nice immediate impact, where we can see that was uppercase or that was concatenated or that's 10 00:00:35,760 --> 00:00:36,420 account. 11 00:00:36,780 --> 00:00:41,670 Rather, we have to use it alongside some other functions. 12 00:00:41,670 --> 00:00:43,560 So it's going to take a little bit to explain. 13 00:00:43,560 --> 00:00:46,590 So hang in there and just know that we're going to get there. 14 00:00:46,590 --> 00:00:48,420 There is a payoff and it is very useful. 15 00:00:48,420 --> 00:00:50,760 It's kind of the core of aggregating our data. 16 00:00:50,940 --> 00:00:57,750 It's how we can do things eventually, like average or some or find the minimum or the maximum and so 17 00:00:57,750 --> 00:00:58,140 on. 18 00:00:58,140 --> 00:00:59,130 So here we go. 19 00:01:00,210 --> 00:01:07,560 This is a little blurb that I wrote about grouped by group, by summarizes or aggregates identical data 20 00:01:07,560 --> 00:01:09,060 into single rows. 21 00:01:09,150 --> 00:01:11,460 So it's actually pretty self-explanatory. 22 00:01:11,460 --> 00:01:17,310 The name group by we can say things like take all of our books and group them by the year they were 23 00:01:17,310 --> 00:01:23,980 released or grouped them by the author last name or take all of our TS or actually let's say take all 24 00:01:23,980 --> 00:01:29,670 of the movies in our database and group them by genre, and that on its own is not that useful. 25 00:01:29,670 --> 00:01:36,630 But then we can say things like group them by genre and tell me how many each genre has or group are 26 00:01:36,630 --> 00:01:43,560 teas by variety, like green tea, white tea, black tea and so on, group arts by variety. 27 00:01:43,710 --> 00:01:46,980 And find the average sales price for each one. 28 00:01:47,040 --> 00:01:53,220 For each variety that is, or add together all the sales for each variety and figure out which tea type 29 00:01:53,220 --> 00:01:54,330 is selling best. 30 00:01:54,330 --> 00:01:55,710 So we'll get there in time. 31 00:01:55,710 --> 00:01:59,310 But for now, let's start with a little example. 32 00:01:59,310 --> 00:02:00,330 Again, here's a reminder. 33 00:02:00,330 --> 00:02:01,380 Stay with me, please. 34 00:02:02,890 --> 00:02:06,570 Let's say we're working with a very small subset of our book data. 35 00:02:06,580 --> 00:02:08,710 So I just grabbed five books. 36 00:02:08,710 --> 00:02:14,530 So when we do select title and author L name from books, right now we'll get 19 in our actual database. 37 00:02:14,530 --> 00:02:18,010 But this is a fictional database where we get five books. 38 00:02:18,370 --> 00:02:20,650 We've got The Namesake by Jhumpa Lahiri. 39 00:02:20,770 --> 00:02:22,420 Norse Mythology by Neil Gaiman. 40 00:02:22,420 --> 00:02:23,950 American Gods Parnell Gaiman. 41 00:02:23,950 --> 00:02:26,320 Interpreter of Maladies by Jhumpa Lahiri. 42 00:02:26,320 --> 00:02:28,510 And A Hologram for the King by Dave Eggers. 43 00:02:28,720 --> 00:02:33,370 All right, so we've get these five books out when we just select just like that, we don't do anything 44 00:02:33,370 --> 00:02:33,790 fancy. 45 00:02:33,790 --> 00:02:35,950 We're just getting title and author last name. 46 00:02:36,190 --> 00:02:41,920 So then we introduce Group Buy and we'll talk about the syntax a little bit. 47 00:02:41,920 --> 00:02:45,670 But all that we've done, this is the exact same line as before. 48 00:02:46,060 --> 00:02:52,330 Select Title An Author, Last name from books and we've added on at the end group by author name. 49 00:02:52,480 --> 00:02:56,260 And I've just done it on the second line just to make it so that this is all easier to read. 50 00:02:56,260 --> 00:02:57,610 Again, it doesn't matter. 51 00:02:58,240 --> 00:03:04,840 So group by author last name and what we get looks like this, which is bizarre. 52 00:03:05,920 --> 00:03:07,360 It's probably not what you would expect. 53 00:03:07,360 --> 00:03:11,530 It's not what to me makes sense on a visual level. 54 00:03:11,530 --> 00:03:13,900 At least it does make sense once you learn what's happening. 55 00:03:13,900 --> 00:03:22,240 But on a visual level, it looks like basically it's only just giving us the unique authors like Lahiri, 56 00:03:22,240 --> 00:03:25,360 Gaiman and Eggers, and then just picking their first book title. 57 00:03:26,680 --> 00:03:27,010 Right? 58 00:03:27,010 --> 00:03:32,590 So we're getting The Namesake for Lahiri, Norse mythology for Gaiman, and a hologram for the King 59 00:03:32,590 --> 00:03:33,070 for Eggers. 60 00:03:33,070 --> 00:03:34,750 But then we're missing these two books. 61 00:03:35,740 --> 00:03:37,480 And that's that is what's happening. 62 00:03:37,480 --> 00:03:40,180 That's what we're seeing here, which isn't really useful. 63 00:03:40,180 --> 00:03:41,320 Why would you want that? 64 00:03:41,650 --> 00:03:48,190 So what's actually happening is that even though it's just prints out The Namesake by Lahiri, it's 65 00:03:48,190 --> 00:03:51,310 basically grouping based off of last name. 66 00:03:51,880 --> 00:04:00,400 And so it creates one big, let's call it a super row with all of Lahiri's information or all of the 67 00:04:00,400 --> 00:04:02,290 books written by Lahiri. 68 00:04:03,040 --> 00:04:08,050 So that's the namesake and interpreter of maladies, and it treats that as one row. 69 00:04:09,550 --> 00:04:11,800 And it does the same thing for Neil Gaiman. 70 00:04:12,760 --> 00:04:14,110 And then the same thing for Eggers. 71 00:04:14,110 --> 00:04:16,510 And there's only one Dave Eggers book in our data set. 72 00:04:17,660 --> 00:04:23,720 So even though this is all that we see behind the scenes, they are grouped together in these super 73 00:04:23,720 --> 00:04:25,820 mega whatever we want to call them rows. 74 00:04:26,720 --> 00:04:32,180 And that's important because the data is now grouped and we can do things like this. 75 00:04:32,300 --> 00:04:35,120 Count how many books each author has written. 76 00:04:37,410 --> 00:04:39,720 So our query would look something like this. 77 00:04:39,720 --> 00:04:42,260 Select author LastName Comma. 78 00:04:42,270 --> 00:04:48,750 We don't have to have this here either, but it's like the author last name comma count star from books 79 00:04:49,110 --> 00:04:51,090 grouped by author last name. 80 00:04:51,090 --> 00:04:53,910 So it's going to group them like we've already seen. 81 00:04:54,720 --> 00:05:01,350 And then when we do a count star, it's not going to count every row in our entire table, but it's 82 00:05:01,350 --> 00:05:05,010 counting is for each of these mega rows. 83 00:05:05,010 --> 00:05:07,260 How many sub rows are in there? 84 00:05:07,260 --> 00:05:10,130 So basically how many things are under the theory? 85 00:05:10,140 --> 00:05:11,580 And in this case, there's two. 86 00:05:12,120 --> 00:05:15,210 How many things, how many books are grouped under? 87 00:05:15,210 --> 00:05:16,530 Gaiman Two. 88 00:05:17,340 --> 00:05:18,850 And how many are under Eggers? 89 00:05:18,870 --> 00:05:19,590 One. 90 00:05:20,550 --> 00:05:21,480 So then what? 91 00:05:21,480 --> 00:05:29,880 We get spit out when we're actually selecting the author last name and the count is this Lahiri to Gaiman 92 00:05:29,880 --> 00:05:31,860 to Eggers one. 93 00:05:32,100 --> 00:05:34,440 So we're only getting started here with Group B. 94 00:05:34,470 --> 00:05:37,620 I'm going to now dive into Cloud nine and do some more. 95 00:05:37,860 --> 00:05:42,270 But just know that we will be using Group B to do things like average. 96 00:05:42,540 --> 00:05:47,550 So once we have, let's say, grouped by authors, we could average the page count, like I said, or 97 00:05:47,550 --> 00:05:49,430 the average year that they wrote a book. 98 00:05:49,440 --> 00:05:53,970 Or we could find the first year they wrote a book using Min, which we'll see in just a moment. 99 00:05:54,120 --> 00:05:58,380 Or we could sum the total page count, for instance, for every author. 100 00:05:58,800 --> 00:06:01,110 So group is very important and versatile. 101 00:06:01,350 --> 00:06:04,350 For now, we're only using it in conjunction with Count. 102 00:06:05,100 --> 00:06:08,160 So now let's dive into Cloud nine and do just that. 103 00:06:08,670 --> 00:06:16,290 So let's start with something simple like select title comma, author l name. 104 00:06:16,320 --> 00:06:19,320 Let's do actually first name and author. 105 00:06:19,350 --> 00:06:21,180 Last name from books. 106 00:06:21,180 --> 00:06:24,810 So no count or anything like that yet. 107 00:06:25,200 --> 00:06:27,210 So we have a much bigger data set, right? 108 00:06:27,990 --> 00:06:30,090 Let's do a group by now. 109 00:06:32,840 --> 00:06:33,140 Group. 110 00:06:33,140 --> 00:06:33,470 Bye. 111 00:06:33,470 --> 00:06:35,240 And we'll do author last name. 112 00:06:35,990 --> 00:06:37,790 And we're going to just do the exact same thing. 113 00:06:37,790 --> 00:06:42,920 I want to know how many books each author has written and we will run into problems. 114 00:06:42,920 --> 00:06:44,120 And just telling you now with this. 115 00:06:44,120 --> 00:06:44,840 HARRIS Right. 116 00:06:44,840 --> 00:06:48,170 Because we have two different Harris's free two. 117 00:06:48,200 --> 00:06:49,070 Harris and Dan Harris. 118 00:06:49,070 --> 00:06:51,800 But when we grouped by last name, it just treats them as one. 119 00:06:51,800 --> 00:06:55,240 So it will say Harris has two books, which technically is true. 120 00:06:55,260 --> 00:06:57,650 The last name Harris has two books. 121 00:06:58,040 --> 00:07:02,390 Okay, so if we do this and just hit enter, it's not useful. 122 00:07:02,750 --> 00:07:02,990 You. 123 00:07:03,350 --> 00:07:04,790 I don't know why you would ever do that. 124 00:07:04,790 --> 00:07:10,610 Honestly, it just is giving you sort of a preview of the the mega rows, if you will. 125 00:07:10,640 --> 00:07:14,810 Basically, it's just taking the first book by Carver, the first book by Harris. 126 00:07:14,960 --> 00:07:17,270 But we don't see how they're actually grouped. 127 00:07:17,390 --> 00:07:18,560 So it's not that useful. 128 00:07:18,560 --> 00:07:20,030 It's not useful at all, honestly. 129 00:07:20,630 --> 00:07:26,390 So what we want to do is, rather than just title, author, first name, author, last name, let's 130 00:07:26,390 --> 00:07:29,750 do Count Star. 131 00:07:29,750 --> 00:07:34,940 And remember, Count Star in this case is referring to the groups rows. 132 00:07:35,240 --> 00:07:40,190 So it's going to go to each of those group rows and count how many rows have been grouped together. 133 00:07:41,720 --> 00:07:45,860 Oh, and this is kind of a mess to look at, so let's actually get rid of title. 134 00:07:48,730 --> 00:07:52,560 And just to author first name and last name, now we can see. 135 00:07:52,570 --> 00:07:55,570 So Raymond Carver has to Don DeLillo has one. 136 00:07:55,570 --> 00:07:57,040 Dave Eggers has three. 137 00:07:57,070 --> 00:07:58,650 David Foster Wallace has two. 138 00:07:58,660 --> 00:08:04,090 And then, as I mentioned, we have this problem with Harris because it says Dan Harris has two, but 139 00:08:04,090 --> 00:08:04,780 that's not true. 140 00:08:04,780 --> 00:08:06,010 Dan Harris has one book. 141 00:08:06,040 --> 00:08:07,390 Frieda Harris has one book. 142 00:08:07,840 --> 00:08:13,840 So to get around that, of course, we can do the same thing where rather than just doing group by author 143 00:08:13,840 --> 00:08:15,940 ll name hopefully you've seen it. 144 00:08:15,940 --> 00:08:19,750 This is something that's kind of standardized across MySQL. 145 00:08:19,780 --> 00:08:26,950 We can do author f name as well so it will group where both of these are unique. 146 00:08:28,180 --> 00:08:31,430 So then we do that and everything else is the same. 147 00:08:31,450 --> 00:08:35,080 Raymond Carver has two, Dave Eggers has three, Foster Wallace has two. 148 00:08:35,559 --> 00:08:36,490 But we go down here. 149 00:08:36,490 --> 00:08:41,110 We now have Dan Harris with one and Frieda Harris with one. 150 00:08:41,530 --> 00:08:42,159 All right. 151 00:08:42,159 --> 00:08:47,260 So just to show you another thing that we could do, we could also group buy released year. 152 00:08:47,560 --> 00:08:51,460 So we could regroup by title, of course, but we don't have any books that have the same title. 153 00:08:51,460 --> 00:08:53,830 But we do have a couple of books that have the same year. 154 00:08:54,190 --> 00:09:00,010 So let's say we want to get a nice printout that shows us in 2017, one book was released. 155 00:09:00,010 --> 00:09:03,790 In 2016, two books or however many books were released. 156 00:09:04,000 --> 00:09:15,580 So we can do that pretty easily with just a select and let's do released year comma count star. 157 00:09:15,790 --> 00:09:22,510 So if we just do select release year from books, you can see where it's an example. 158 00:09:22,510 --> 00:09:29,020 I think 2003, we've got at least two, 2000 threes and I know there's a couple more. 159 00:09:29,300 --> 00:09:30,910 I hope there's a couple more. 160 00:09:31,030 --> 00:09:37,480 So let's say we wanted to condense them all together and then put a count next to it as well. 161 00:09:38,770 --> 00:09:41,860 We would do that just like this select release year from books. 162 00:09:42,190 --> 00:09:52,570 We would group by release year and rather than only selecting the release year, we also want to select 163 00:09:52,570 --> 00:09:54,220 Count Star. 164 00:09:55,260 --> 00:09:56,160 Let's hit enter. 165 00:09:57,030 --> 00:10:01,230 Now you can see we've got 1945 one book, 2001. 166 00:10:01,230 --> 00:10:03,690 There were actually three releases, 2003. 167 00:10:03,690 --> 00:10:04,860 There were two releases. 168 00:10:05,700 --> 00:10:07,140 So you can see that it's working. 169 00:10:07,140 --> 00:10:11,580 And if you want to get really fancy, go ahead, move on to the next video if you'd like. 170 00:10:11,580 --> 00:10:17,220 But I'm just going to show if we want to get fancy and print it out nicely, we can combine it with 171 00:10:17,550 --> 00:10:18,690 concatenation. 172 00:10:18,690 --> 00:10:20,400 We could do something like this. 173 00:10:20,490 --> 00:10:23,070 So rather than just selecting the count. 174 00:10:24,150 --> 00:10:25,320 We could do. 175 00:10:27,000 --> 00:10:35,460 Let's see a select kitten cat and we're going to concatenate the release here comma. 176 00:10:36,090 --> 00:10:39,720 So let's do actually let's add a string before. 177 00:10:39,720 --> 00:10:46,500 So I'd like it to say like in 2010, two books released in one book, I guess in 2001, three books 178 00:10:46,500 --> 00:10:46,800 released. 179 00:10:46,800 --> 00:10:51,330 I actually want that string so we can do in the release year. 180 00:10:52,650 --> 00:10:53,430 Comma. 181 00:10:56,390 --> 00:10:57,470 The Count. 182 00:11:01,850 --> 00:11:02,690 Comma. 183 00:11:05,100 --> 00:11:06,930 Books released. 184 00:11:07,500 --> 00:11:13,410 And technically, just to make sure let's add parentheses around the SE because there's in a lot of 185 00:11:13,410 --> 00:11:14,760 cases one book release. 186 00:11:14,760 --> 00:11:20,610 So I don't want to say one books anyways, so let's make sure we have our correct parentheses here. 187 00:11:21,720 --> 00:11:26,610 Select the concatenation of in the released year. 188 00:11:27,720 --> 00:11:29,880 We probably want to space there, don't we? 189 00:11:29,880 --> 00:11:33,690 This is where it gets kind of annoying that we're not using a separate file. 190 00:11:33,690 --> 00:11:40,320 So in 2001, let's say space three space books released. 191 00:11:40,620 --> 00:11:48,180 And then lastly to clean this whole thing up, let's add an as and we'll just call this as let's just 192 00:11:48,180 --> 00:11:49,680 say year or something silly. 193 00:11:50,910 --> 00:11:57,450 We hit enter and we get this nice little table in 1945, one book released in 2003. 194 00:11:57,450 --> 00:12:00,390 Two books released in 2001 three books released. 195 00:12:00,840 --> 00:12:05,670 Okay, so that's it for now, at least with Group By. 196 00:12:05,700 --> 00:12:08,940 We're going to be working with it in every video from here on out in this section. 197 00:12:08,940 --> 00:12:14,640 Like I said, we're going to move on to things like Min and Max average, some fun things, hopefully 198 00:12:14,640 --> 00:12:15,300 sort of fun. 199 00:12:15,810 --> 00:12:17,820 Congratulations if you made it through this video. 200 00:12:18,090 --> 00:12:19,890 Like I said, it's a little bit weird. 201 00:12:19,890 --> 00:12:21,480 It's different than what we've done before. 202 00:12:21,510 --> 00:12:22,740 Hopefully it wasn't too boring.