1 00:00:00,180 --> 00:00:05,220 The next thing we're going to cover is the partition by clause, which we put inside of over. 2 00:00:05,250 --> 00:00:09,570 If we want to have a window function, work with multiple windows. 3 00:00:09,900 --> 00:00:11,760 We call them windows, not groups. 4 00:00:11,760 --> 00:00:14,220 It's the same concept though, right now. 5 00:00:14,220 --> 00:00:19,200 When we don't have anything in those prisons, we are finding the min salary and the max salary for 6 00:00:19,200 --> 00:00:20,640 all of our salaries. 7 00:00:20,640 --> 00:00:26,790 But if I instead want to find the minimum salary for each one of these departments in the maximum salary 8 00:00:26,790 --> 00:00:33,540 for each department, I would then partition by department when I use partition by instead of having 9 00:00:33,540 --> 00:00:38,850 one massive window with all the rows, we get smaller, presumably multiple windows. 10 00:00:38,850 --> 00:00:44,070 If we have different values for department, one window might have 20 rows, another one might have 11 00:00:44,070 --> 00:00:44,700 five rows. 12 00:00:44,700 --> 00:00:47,310 It's just basically the same concept as grouping. 13 00:00:47,310 --> 00:00:52,830 We call it Windows, and then each one of those windows will have whatever operation performed on its 14 00:00:52,830 --> 00:01:00,630 values, like average across the sales department window, average salary across the customer service 15 00:01:00,630 --> 00:01:01,620 department window. 16 00:01:01,650 --> 00:01:06,330 These chunks of rows average for that, average for these salaries. 17 00:01:06,330 --> 00:01:14,160 And then just like as we saw before, our output will include every row alongside whatever we calculated 18 00:01:14,160 --> 00:01:15,480 using a window function. 19 00:01:15,480 --> 00:01:17,130 So let's take a look at this. 20 00:01:17,790 --> 00:01:22,200 Let's write a select employee number. 21 00:01:22,200 --> 00:01:33,420 We'll look at every employees number and their department and their salary alongside the average salary. 22 00:01:33,990 --> 00:01:40,530 But instead of saying over and having empty print, meaning the entire all the rows is one window, 23 00:01:40,560 --> 00:01:48,060 let's partition by and this doesn't have to be capitalized, but I do it partition by department. 24 00:01:49,250 --> 00:01:51,320 From employees. 25 00:01:52,010 --> 00:01:53,780 All right, so if I run this one. 26 00:01:55,880 --> 00:01:57,190 Take a look at what we get now. 27 00:01:57,500 --> 00:01:58,790 Let's rename this. 28 00:01:58,790 --> 00:02:00,110 That's a mess to look at. 29 00:02:00,800 --> 00:02:03,560 We could put these on different lines, clean it up a tad. 30 00:02:05,660 --> 00:02:09,229 And we'll call this as department average. 31 00:02:12,090 --> 00:02:12,510 All right. 32 00:02:12,510 --> 00:02:18,420 So now for every single row we see the employee number, the department and that individual's salary 33 00:02:18,450 --> 00:02:20,400 and then the department average. 34 00:02:20,400 --> 00:02:26,460 So all of the rows that have Department of Customer service, their average is 46,571. 35 00:02:27,420 --> 00:02:30,330 All the engineering rose, all those employees. 36 00:02:30,330 --> 00:02:34,860 The average salary is 81,285 sales. 37 00:02:34,860 --> 00:02:37,840 Once again, 77,228. 38 00:02:37,860 --> 00:02:44,670 We see individual rows with an individual salary alongside some aggregate information, in this case, 39 00:02:44,670 --> 00:02:46,340 the average by department. 40 00:02:46,350 --> 00:02:48,240 So we can't do this with group buy. 41 00:02:48,540 --> 00:02:53,550 I could, of course calculate the average by department very easily. 42 00:02:53,730 --> 00:03:04,440 We could do this select department and then average salary from employees group by department and we 43 00:03:04,440 --> 00:03:05,490 get the same numbers. 44 00:03:05,490 --> 00:03:07,960 But the output format is very different. 45 00:03:07,980 --> 00:03:12,600 We're not seeing this alongside each of those individual rows. 46 00:03:12,600 --> 00:03:15,600 Instead, all of our groups are collapsed to a single row. 47 00:03:16,080 --> 00:03:19,680 So this lets me see this person here made 80,000. 48 00:03:20,220 --> 00:03:23,720 As an engineer, how does that compare to the department average? 49 00:03:23,730 --> 00:03:24,720 It's pretty close. 50 00:03:25,230 --> 00:03:26,910 But then this person in sales. 51 00:03:26,910 --> 00:03:29,010 Wow, $159,000. 52 00:03:29,010 --> 00:03:30,990 How does that compare to the department average? 53 00:03:31,020 --> 00:03:32,310 It's not close at all. 54 00:03:32,310 --> 00:03:33,390 They're an outlier. 55 00:03:33,900 --> 00:03:40,590 So we could also do things like include the average salary for all employees. 56 00:03:40,590 --> 00:03:46,620 If once again, I don't put anything in over, we'll call this as company average. 57 00:03:48,220 --> 00:03:52,900 And I can have multiple of these window functions in the same output. 58 00:03:53,080 --> 00:03:56,530 So now we have the department average and the company average. 59 00:03:57,240 --> 00:03:59,030 So let's try another example. 60 00:03:59,040 --> 00:04:05,340 If I want to calculate the I don't know how about how many employees are in each department? 61 00:04:05,580 --> 00:04:07,290 This is easy enough. 62 00:04:08,100 --> 00:04:10,950 I can do a select employee number. 63 00:04:10,950 --> 00:04:19,470 I don't know if we need all this department and salary, but then I can also do a count and we'll just 64 00:04:19,470 --> 00:04:23,700 count the rows over and we'll partition by. 65 00:04:24,460 --> 00:04:25,660 Department again. 66 00:04:27,710 --> 00:04:32,480 And we'll call this as department count. 67 00:04:32,810 --> 00:04:33,320 Sure. 68 00:04:34,080 --> 00:04:36,510 From employees. 69 00:04:38,870 --> 00:04:43,280 And if I run that, we see how many individuals we really have. 70 00:04:43,280 --> 00:04:45,110 Seven in every single department, huh? 71 00:04:45,200 --> 00:04:51,320 Well, that's anticlimactic, but you could see our count per department. 72 00:04:51,350 --> 00:04:52,910 It's just hard to know that it's working. 73 00:04:52,910 --> 00:04:54,470 But I guess it is. 74 00:04:54,470 --> 00:04:56,690 We have seven in every single department. 75 00:04:57,320 --> 00:04:58,610 Let's try another example. 76 00:04:58,700 --> 00:04:59,810 Let's do some. 77 00:04:59,810 --> 00:05:02,630 Let's sum up the payroll for each department. 78 00:05:02,630 --> 00:05:06,290 And I'm using department because that's really all we have in this very simple data set. 79 00:05:06,620 --> 00:05:08,870 So I'll keep this same select. 80 00:05:08,870 --> 00:05:15,440 We'll take every employee number, department and salary, but then instead of average, we'll do some 81 00:05:16,070 --> 00:05:17,900 let's do our total payroll. 82 00:05:17,900 --> 00:05:19,370 So that would be some. 83 00:05:20,170 --> 00:05:23,590 Salary over one massive window. 84 00:05:23,590 --> 00:05:28,420 No partitions, just one big window with all the rows as total payroll. 85 00:05:28,990 --> 00:05:32,380 And then let's do a similar sum salary. 86 00:05:32,380 --> 00:05:33,880 But we'll partition it. 87 00:05:33,910 --> 00:05:37,030 Partition by department. 88 00:05:37,980 --> 00:05:40,140 As department payroll. 89 00:05:44,090 --> 00:05:44,890 And here we are. 90 00:05:44,900 --> 00:05:50,450 We see each employee, their number, their department, their salary, then the department payroll. 91 00:05:50,450 --> 00:05:54,980 So customer services payroll is $326,000 a year. 92 00:05:55,370 --> 00:05:59,240 And then the total company payroll is $1.4 million a year. 93 00:05:59,920 --> 00:06:00,970 Same thing down here. 94 00:06:00,970 --> 00:06:06,310 Engineering, we see 569,000 is the department payroll for each individual. 95 00:06:06,310 --> 00:06:09,580 We see their salary and then we also see the total payroll. 96 00:06:09,580 --> 00:06:12,130 And this is going to be the same for every single row. 97 00:06:13,270 --> 00:06:17,470 So I hope I know you're able to see sort of how this works. 98 00:06:17,470 --> 00:06:24,790 Maybe wrap your head around this concept of taking our data, forming it into these windows that we 99 00:06:24,790 --> 00:06:26,560 partition by in some way. 100 00:06:26,560 --> 00:06:29,260 We don't have to partition, but often we do. 101 00:06:29,410 --> 00:06:35,560 And then some calculation is run on each of those windows average salary for each window. 102 00:06:35,770 --> 00:06:42,400 And then instead of being collapsed down to a single row, all of our information is preserved as individual 103 00:06:42,400 --> 00:06:45,400 rows and then we get aggregate information added on. 104 00:06:46,190 --> 00:06:53,000 Okay, so that's how we can use partition by to form these smaller sub windows to group our rows together.