1 00:00:00,090 --> 00:00:03,360 Next up, we've got a brand new topic, window functions. 2 00:00:03,390 --> 00:00:06,570 Now, I'm going to warn you up front, this isn't terribly hard. 3 00:00:06,960 --> 00:00:11,430 However, it can be a little confusing when you're first starting out. 4 00:00:11,430 --> 00:00:14,260 It has a lot of moving pieces that have to be explained. 5 00:00:14,280 --> 00:00:15,850 So bear with me. 6 00:00:15,870 --> 00:00:19,980 We're going to conquer window functions in this section, and the first thing we have to do is try and 7 00:00:19,980 --> 00:00:21,570 understand what they are. 8 00:00:21,840 --> 00:00:29,640 So if you go to the docs and you look at the window function section, we get a little tiny bit of information 9 00:00:29,640 --> 00:00:30,540 on this first page. 10 00:00:30,540 --> 00:00:36,300 It says, My SQL supports window functions that for each row from a query, perform a calculation using 11 00:00:36,300 --> 00:00:38,220 rows related to that row. 12 00:00:39,060 --> 00:00:45,810 So it's a little confusing if you go to what I what I recommend is go into the concepts and syntax page 13 00:00:45,810 --> 00:00:48,540 to get a better explanation of how it works. 14 00:00:48,870 --> 00:00:54,750 Because if you just go to this page that has the descriptions, you'll see not all that much in the 15 00:00:54,750 --> 00:00:57,420 way of explanation and examples. 16 00:00:57,420 --> 00:00:58,920 It's more of just documentation. 17 00:00:58,920 --> 00:01:00,480 Here's how this one function works. 18 00:01:00,480 --> 00:01:02,280 Here's how this other function works. 19 00:01:02,280 --> 00:01:08,370 But it doesn't really explain what this partition by or over is some of the new syntax that you'll see. 20 00:01:08,370 --> 00:01:14,730 But if you go to this other page that I mentioned, the concepts and syntax, it's a decent reference, 21 00:01:14,730 --> 00:01:16,020 although it is a little brief. 22 00:01:16,110 --> 00:01:21,690 So with that out of the way, let's actually talk about what window functions are and how they work. 23 00:01:21,930 --> 00:01:28,200 So I'm going to be using a very simple data set in these slides just for rows where we have employees. 24 00:01:28,200 --> 00:01:31,170 This would be the employees table that each have an employee number. 25 00:01:31,170 --> 00:01:37,590 So this is from a real data set that I have that we'll be working with shortly that has, I don't know, 26 00:01:37,590 --> 00:01:39,870 fif 20, 30 different employees. 27 00:01:39,870 --> 00:01:42,000 But for the slides I only put four of them. 28 00:01:42,000 --> 00:01:47,190 So they have an employee number, their ID, what department they're in and what their salary is. 29 00:01:47,460 --> 00:01:53,130 Okay, so we have two departments here, sales and customer service, and this is going to be a lot 30 00:01:53,130 --> 00:01:53,730 to look at. 31 00:01:53,730 --> 00:01:56,580 But if I used group B, this is nothing new. 32 00:01:56,580 --> 00:02:03,090 If I did a group by department when I tried to calculate the average salary, well, this is kind of 33 00:02:03,090 --> 00:02:04,950 a visualization of what happens, right? 34 00:02:04,980 --> 00:02:10,440 We would get two groups, one based off of the department sales, another based off of the department 35 00:02:10,440 --> 00:02:11,730 customer service. 36 00:02:12,030 --> 00:02:17,370 We grouped by department, and each of those groups is then reduced down to a single row where we're 37 00:02:17,370 --> 00:02:18,570 selecting the department name. 38 00:02:18,570 --> 00:02:23,880 And the average salary again should be recap even if we don't know this data set. 39 00:02:23,880 --> 00:02:24,630 Exactly. 40 00:02:24,660 --> 00:02:27,210 It's a very simple group by group by department. 41 00:02:27,210 --> 00:02:33,270 Calculate the average salary so we get one row per group and that's how group BI works. 42 00:02:33,540 --> 00:02:37,230 We get one row that is collapsed down from a group of rows. 43 00:02:37,320 --> 00:02:37,920 Here. 44 00:02:37,920 --> 00:02:41,310 We had two sales rose to customer service rows. 45 00:02:41,310 --> 00:02:44,520 They get collapsed down into a single row each. 46 00:02:45,060 --> 00:02:51,300 So that's grouped by, we can call all sorts of aggregate functions and reduce down these groups into 47 00:02:51,300 --> 00:02:52,200 a single row. 48 00:02:52,200 --> 00:02:59,070 Now window functions are very similar to group by in some ways we can perform these aggregate operations 49 00:02:59,070 --> 00:03:00,720 on groups of rows. 50 00:03:00,720 --> 00:03:07,350 But the big difference is that window functions will produce a result for every row in a group or what 51 00:03:07,350 --> 00:03:09,840 we call in the window or windows. 52 00:03:09,840 --> 00:03:14,340 So I'm going to show you an example of one, and then we're going to dive into the syntax in future 53 00:03:14,340 --> 00:03:15,480 videos in this section. 54 00:03:15,480 --> 00:03:17,130 So it's a little overwhelming up front. 55 00:03:17,130 --> 00:03:20,910 Like I said, we're not going to worry about the syntax, but here's my data set. 56 00:03:20,910 --> 00:03:22,590 Imagine this is the whole table. 57 00:03:22,800 --> 00:03:25,950 Maybe there's thousands of rows, but we're just looking at for. 58 00:03:26,620 --> 00:03:30,670 And I run this code here that has a window function in it. 59 00:03:30,670 --> 00:03:33,700 Average salary over partitioned by department. 60 00:03:33,700 --> 00:03:35,320 So this is all new here. 61 00:03:35,710 --> 00:03:38,830 We've seen average but this over partition by. 62 00:03:38,830 --> 00:03:40,090 We're going to get to that. 63 00:03:40,090 --> 00:03:45,520 But think of it as basically saying calculate the average salary by department. 64 00:03:45,670 --> 00:03:50,590 But instead of doing a group by there is no group by happening here, it's very different. 65 00:03:50,590 --> 00:03:56,500 We still get these windows, these basically these groups we partition by department. 66 00:03:56,500 --> 00:04:02,020 So sales is grouped together, customer service is grouped together and the average is calculated for 67 00:04:02,020 --> 00:04:02,860 each of those. 68 00:04:02,980 --> 00:04:05,200 But here's the really big difference. 69 00:04:05,200 --> 00:04:06,460 Look at our output. 70 00:04:06,760 --> 00:04:09,910 Our output has every single row present. 71 00:04:09,910 --> 00:04:13,060 It's not been reduced or collapsed down like a group by. 72 00:04:13,150 --> 00:04:15,370 So we still have the same four rows. 73 00:04:15,640 --> 00:04:22,029 But what we have is the department average that we calculated next to all the other information from 74 00:04:22,029 --> 00:04:22,720 each row. 75 00:04:23,140 --> 00:04:29,230 So if we go back to this definition of window functions, we performed an aggregate operation. 76 00:04:29,230 --> 00:04:34,270 We calculated the average by department, but we produced a result for each row. 77 00:04:34,270 --> 00:04:37,030 Now, a lot of these rows have the same value. 78 00:04:37,030 --> 00:04:39,280 I mean, that's that's how these window functions work. 79 00:04:39,280 --> 00:04:46,330 Often if I'm calculating the average department salary, well, any row that's in customer service will 80 00:04:46,330 --> 00:04:48,100 have that same department average. 81 00:04:48,100 --> 00:04:53,020 Any row from sales has the same sales department average one more time. 82 00:04:53,020 --> 00:04:59,200 Look at the end result here where every row has the department average calculated next to it for that 83 00:04:59,200 --> 00:05:00,610 corresponding department. 84 00:05:00,610 --> 00:05:04,930 If we had a group by like on this slide here we have the same numbers. 85 00:05:04,930 --> 00:05:09,090 Customer service average was five or $55,500. 86 00:05:09,220 --> 00:05:12,130 Sales department average was 59,500. 87 00:05:13,300 --> 00:05:17,410 Same thing here as you can see, 59 555 five. 88 00:05:17,590 --> 00:05:23,260 The difference, of course, is that we took these groups or SQL took those groups from group by and 89 00:05:23,260 --> 00:05:25,570 collapses them into a single row. 90 00:05:25,750 --> 00:05:32,410 So there could have been 1000 sales salaries, sales employees, their average is calculated and collapsed 91 00:05:32,410 --> 00:05:34,600 into a single value one row. 92 00:05:34,600 --> 00:05:36,220 And sometimes that's what we want. 93 00:05:36,220 --> 00:05:43,270 But what window functions allow us to do is to look at aggregate information alongside individual rows 94 00:05:43,270 --> 00:05:45,940 that that information was calculated based upon. 95 00:05:46,420 --> 00:05:51,610 So if you're still confused at is very normal, we're going to take time to go through how this works 96 00:05:51,610 --> 00:05:55,510 and look at the syntax we're going to talk about over in partition by and order by. 97 00:05:55,510 --> 00:05:59,650 And we'll look at some special window functions in the coming videos. 98 00:05:59,650 --> 00:06:05,200 But for now, the main takeaway is when we use a window function, whatever the operation is that we're 99 00:06:05,200 --> 00:06:09,400 performing, it will be performed on some windows of data. 100 00:06:09,400 --> 00:06:14,830 Think of it as just groups, groups of data that is somehow grouped together in some way, in our case 101 00:06:14,830 --> 00:06:22,840 by department name, and then some value is produced that is then re integrated alongside the initial 102 00:06:22,840 --> 00:06:24,760 rows as our output. 103 00:06:24,880 --> 00:06:26,530 So don't feel bad if you're confused. 104 00:06:26,530 --> 00:06:27,190 Hang in there. 105 00:06:27,190 --> 00:06:28,390 We'll learn more in the next video.