1 00:00:00,090 --> 00:00:01,320 Welcome back, everyone. 2 00:00:01,320 --> 00:00:03,630 We've got a very important section here. 3 00:00:03,630 --> 00:00:06,230 We're talking about aggregate functions. 4 00:00:06,240 --> 00:00:11,790 Essentially, we're going to learn about functions that can operate on multiple rows or multiple pieces 5 00:00:11,790 --> 00:00:17,910 of data at once to tell us the min or the max or the average or the sum or the count. 6 00:00:17,910 --> 00:00:20,250 And we're going to start with count. 7 00:00:20,520 --> 00:00:26,460 So the count function is one of the builtin functions we get in my SQL in the same way that we had can 8 00:00:26,460 --> 00:00:30,030 cat and upper and replace and all those. 9 00:00:30,510 --> 00:00:32,970 We don't have to do anything fancy to have access to those. 10 00:00:32,970 --> 00:00:36,330 It's the same concept with count we automatically have access to it. 11 00:00:36,330 --> 00:00:42,000 It's part of the syntax and we would use it in situations like if I wanted to find out how many books 12 00:00:42,000 --> 00:00:47,250 are in the database, that's a very simple one, especially because we know there's like what, 19 or 13 00:00:47,250 --> 00:00:47,820 20? 14 00:00:47,820 --> 00:00:48,870 There's not very many. 15 00:00:49,050 --> 00:00:56,010 We can find it out though, using the count function, if you ever see count and then in parentheses 16 00:00:56,010 --> 00:00:58,560 Star this is telling my SQL. 17 00:00:58,560 --> 00:01:01,860 We want to count the rows that we get back from this query. 18 00:01:01,860 --> 00:01:07,230 So think of this as select the number of rows from books. 19 00:01:07,230 --> 00:01:09,180 Let's try that and see what we get. 20 00:01:09,630 --> 00:01:13,560 Select not star, but count star. 21 00:01:14,310 --> 00:01:16,350 From books. 22 00:01:16,970 --> 00:01:18,800 And we get 19. 23 00:01:19,130 --> 00:01:22,250 So 19 rows in our books database. 24 00:01:22,580 --> 00:01:27,980 Now, it's important to know that this count function doesn't really play nice with other things. 25 00:01:27,980 --> 00:01:37,910 For example, what if I did select title comma, author, f name, comma, count star from books? 26 00:01:37,910 --> 00:01:39,650 What will our results look like? 27 00:01:40,130 --> 00:01:47,090 They don't really look like anything because when we do select count from books, it's boiling the whole 28 00:01:47,090 --> 00:01:49,600 result down into a single value. 29 00:01:49,610 --> 00:01:54,650 But when we do select title and author F name, that's asking for a bunch of rows. 30 00:01:54,650 --> 00:01:58,670 So we're asking for two things that don't work together, at least not easily. 31 00:01:58,670 --> 00:02:00,500 There is a setting we can change. 32 00:02:00,620 --> 00:02:02,870 It tells you about it here in this error message. 33 00:02:03,110 --> 00:02:11,870 It says in your aggregated query this expression number one called title right there is incompatible 34 00:02:11,870 --> 00:02:14,050 with SQL mode only for group. 35 00:02:14,240 --> 00:02:15,110 Don't worry about it. 36 00:02:15,110 --> 00:02:15,680 Really. 37 00:02:15,680 --> 00:02:16,630 It's not a big deal. 38 00:02:16,640 --> 00:02:24,110 But my point is that you don't often even want to do something like count all the rows and at the same 39 00:02:24,110 --> 00:02:25,610 time select title. 40 00:02:26,000 --> 00:02:31,010 But if you wanted to, it doesn't play nicely, at least not with modern SQL. 41 00:02:31,040 --> 00:02:33,890 You have to change some settings if you want to have that behavior. 42 00:02:34,670 --> 00:02:39,680 Okay, so how about if I want to count the number of author first names? 43 00:02:39,770 --> 00:02:44,930 Well, instead of doing Count Star, Count Star is going to select all the rows. 44 00:02:45,080 --> 00:02:51,740 Count author F name will count all of the first names or all the values in that column. 45 00:02:51,740 --> 00:03:00,650 So select count author F name from books, and we still get 19. 46 00:03:00,650 --> 00:03:02,210 So what is that counting? 47 00:03:02,390 --> 00:03:09,590 It's counting every time that a value is present in an author f name in that column. 48 00:03:09,590 --> 00:03:12,140 So all of our author f names are there. 49 00:03:12,140 --> 00:03:20,780 I don't think we have any null, but if I were to insert one into books, insert into books is anything 50 00:03:20,780 --> 00:03:21,500 required. 51 00:03:21,500 --> 00:03:25,790 I think it can just be empty values empty. 52 00:03:27,310 --> 00:03:28,570 I'll do it twice. 53 00:03:28,690 --> 00:03:33,070 Now what we have, if I do a select count star. 54 00:03:33,100 --> 00:03:36,830 This again counts the number of rows from books. 55 00:03:36,850 --> 00:03:38,860 There's 21 total rows. 56 00:03:38,860 --> 00:03:43,600 But if I count the author f names, there's only 19. 57 00:03:43,600 --> 00:03:49,000 Because two of those rows are empty and there is no author f name, they have null in them. 58 00:03:49,510 --> 00:03:54,370 So that could be useful if you just want to know how many times do we even have a value for author f 59 00:03:54,370 --> 00:03:54,850 name? 60 00:03:54,970 --> 00:04:01,330 But more often than not, what we probably want is to know how many distinct author first names are 61 00:04:01,330 --> 00:04:01,620 there. 62 00:04:01,630 --> 00:04:04,500 Maybe not, but at least that's what my slide says. 63 00:04:04,510 --> 00:04:05,800 That's what we want to find out. 64 00:04:05,800 --> 00:04:08,970 How many distinct or unique author first names. 65 00:04:08,980 --> 00:04:13,780 And the way that we do this is we actually pass distinct into the parentheses. 66 00:04:14,050 --> 00:04:17,740 So I want you to count the distinct first names. 67 00:04:17,740 --> 00:04:18,940 So let's try that. 68 00:04:19,630 --> 00:04:25,450 Instead of counting all of them, count the distinct first names and there's only 12. 69 00:04:25,450 --> 00:04:30,160 And that makes sense because we know that there's multiple. 70 00:04:32,100 --> 00:04:36,160 David Rose and Raymond Rose and Neil and Dave. 71 00:04:36,180 --> 00:04:42,600 So when we get rid of those duplicates and we only count the unique names, it's Jhumpa, Neal and then 72 00:04:42,600 --> 00:04:44,820 Dave, Michael, Patti and so on. 73 00:04:45,480 --> 00:04:50,670 These multiple days are only counted one time as an example, so we could do the same thing. 74 00:04:50,670 --> 00:04:54,360 Let's see how many years we had where books were released. 75 00:04:54,600 --> 00:04:59,850 So I'll just start with released year from books. 76 00:05:01,250 --> 00:05:08,690 And I know that there's some duplicates, so let's instead count the distinct released year values. 77 00:05:09,870 --> 00:05:17,880 And there are 16 total, distinct released years because I know at least 2003 shows up twice and I think 78 00:05:17,880 --> 00:05:20,490 2000 something else is in here. 79 00:05:20,520 --> 00:05:23,460 2001 is in here twice right there. 80 00:05:23,670 --> 00:05:25,330 So not a huge impact. 81 00:05:25,350 --> 00:05:31,590 Not that many duplicates, but we're getting the total number of distinct released year values. 82 00:05:31,590 --> 00:05:34,220 So remember, the role of that Distinct is important. 83 00:05:34,230 --> 00:05:35,970 This is valid code. 84 00:05:35,970 --> 00:05:41,070 This tells you how many values are present in the author f name column. 85 00:05:41,370 --> 00:05:44,700 They could be null so they wouldn't be included if they are null. 86 00:05:44,700 --> 00:05:50,040 Whereas here this tells us how many unique values there are for author f name. 87 00:05:50,040 --> 00:05:51,660 So this sort of thing can be really useful. 88 00:05:51,660 --> 00:05:53,790 Let's do one or two more examples here. 89 00:05:53,790 --> 00:06:03,540 We're counting the distinct last names in our dataset, so select count distinct author l name from 90 00:06:03,540 --> 00:06:04,350 books. 91 00:06:05,620 --> 00:06:10,600 And there's 11 distinct other last names because there's Harris in there twice. 92 00:06:10,600 --> 00:06:15,910 I believe there's two different Harris's Frieda Harris and Dan Harris, I think. 93 00:06:16,300 --> 00:06:17,530 And then how about this? 94 00:06:17,530 --> 00:06:18,640 How would we find out? 95 00:06:18,910 --> 00:06:22,180 Find out how many titles contain the word the. 96 00:06:22,660 --> 00:06:23,860 Well, I have the answer. 97 00:06:23,860 --> 00:06:25,450 But let's think about it for a moment. 98 00:06:25,450 --> 00:06:34,360 We know how to select the titles, select title where, and then we would do title like if we want it 99 00:06:34,360 --> 00:06:39,820 to contain the word the and then we'll just have our wild card percent the percent. 100 00:06:41,020 --> 00:06:46,390 And I have to say from books, of course, somehow I forgot that. 101 00:06:47,980 --> 00:06:48,760 Like this. 102 00:06:49,120 --> 00:06:53,080 The namesake, a hologram for the king, the Circle, blah, blah, blah. 103 00:06:53,230 --> 00:06:57,010 Consider the lobster Lincoln in the bardo so we get the correct results. 104 00:06:57,010 --> 00:07:00,000 But I want to count how many rows are here. 105 00:07:00,010 --> 00:07:01,250 So how do I do that? 106 00:07:01,270 --> 00:07:05,210 How do I count how many times or how many rows we got back? 107 00:07:05,230 --> 00:07:07,690 It's actually the first thing we saw in this video. 108 00:07:07,720 --> 00:07:11,260 Instead of selecting title, we'll select count. 109 00:07:12,090 --> 00:07:16,950 Star Count Starr means count the rose and it tells us six. 110 00:07:16,950 --> 00:07:17,640 And that's correct. 111 00:07:17,640 --> 00:07:17,850 Right. 112 00:07:17,850 --> 00:07:24,630 One, two, three, four, five, six, six results from books that had a title that contains the. 113 00:07:25,140 --> 00:07:25,680 All right. 114 00:07:25,680 --> 00:07:31,350 So there's a bit more to learn about count, but we kind of have to learn a whole other topic next which 115 00:07:31,350 --> 00:07:32,100 is grouped by. 116 00:07:32,130 --> 00:07:33,300 Really important stuff.