Welcome back to Backspace Academy, in this lecture on AWS auto scaling. We're going to build on what you already know about auto scaling EC2 instances in an Auto Scaling group, look further into the AWS Application Auto Scaling service and how those same auto scaling features can be applied to databases and to Lambda functions, and then finally we'll have a look at how we can implement some best practices around auto scaling.

As we already know, EC2 Auto Scaling allows us to create Auto Scaling groups of EC2 instances that can scale up or down depending on what conditions you set, and that enables elasticity. It does that by scaling horizontally, not vertically: rather than deleting a server and putting in a bigger one, it scales horizontally by adding or terminating EC2 instances. It enables fault tolerance through health checks: if an EC2 instance fails a health check, it can be replaced with a healthy instance. An Auto Scaling group can span multiple Availability Zones, but it cannot span multiple regions, so that redundancy is across Availability Zones, not regions. If a region goes down then your infrastructure will also go down, but if an Availability Zone goes down and you have multi-AZ auto scaling, your infrastructure will still operate as it should.

The basic parameters are the minimum size and maximum size that we set, and then the desired capacity, and the desired capacity is generally what we start with when we launch an Auto Scaling group. The benefits: fault tolerance, availability, and much better cost management, because by scaling horizontally we make sure we're getting the best out of those EC2 instances, and running the right number of them as well. On the right there we can see an Auto Scaling group where we've defined the desired capacity. When that Auto Scaling group is first launched, it will have that desired capacity, but it will scale out as needed when demand increases, up to its maximum size, and as demand decreases it will scale in and terminate instances down to the minimum size and not below that.

Before we create an Auto Scaling group, the first thing we need to do is define a launch configuration, or else a launch template. A launch configuration describes the configuration of the EC2 instances that are going to be launched within the Auto Scaling group.
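As a minimal sketch of those basic capacity parameters, here's how a group like this might be created with boto3, the Python SDK. This is illustrative only: the group name, launch template name, and subnet IDs are placeholders, and it assumes a launch template (which we cover next) already exists.

import boto3

autoscaling = boto3.client("autoscaling")

# Create a group that starts at the desired capacity and scales
# horizontally between the minimum and maximum sizes.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",             # hypothetical name
    LaunchTemplate={
        "LaunchTemplateName": "web-template",   # assumed to exist already
        "Version": "$Latest",
    },
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=4,
    # Subnets in multiple Availability Zones (not regions) for redundancy.
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",
    HealthCheckType="EC2",
    HealthCheckGracePeriod=300,
)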
In particular, a launch configuration will describe the AMI to be used, the instance type, the key pair used to connect to those instances, and the security groups that will be applied to the instances that are launched. You can also use a launch configuration to describe spot instance bid pricing, and that will help you reduce the cost of your instances.

A launch template is similar to a launch configuration. We use it to define what the instances launched within an Auto Scaling group are going to look like, in the same way that we do with a launch configuration, but the difference is that we get version control. We can have multiple versions of the same template, and that goes with the AWS philosophy of managing all of our infrastructure in the same way that we would manage software. Another advantage of using a launch template is that you can provision the capacity within your Auto Scaling group using multiple instance types, using both On-Demand and Spot Instances, and combinations of all of these.

You can modify your existing Auto Scaling groups to use launch templates instead, and you can do that by simply going into the console, selecting that Auto Scaling group, and changing from the launch configuration to a launch template. You can also do it using the command-line interface or one of the many software development kits, using the update auto scaling group command; we'll sketch that in code in a moment. Launch templates are recommended by AWS over launch configurations, so if you're creating a new Auto Scaling group, make sure that you use a launch template instead of a launch configuration.

An Auto Scaling group is a collection of EC2 instances organized as a group, and the capacity of that group can expand and contract by automatically adding or terminating EC2 instances as needed. The starting point when you first create an Auto Scaling group will be the desired capacity. Health checks achieve fault tolerance in your Auto Scaling group: they ensure that unhealthy instances are quickly terminated and replaced with healthy instances to maintain that desired capacity. Scaling plans define the ways that we want our Auto Scaling group to scale in and out. We can first maintain a desired capacity and then have that vary within a minimum and maximum number of instances.
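Going back to that switch from a launch configuration to a launch template: here's a hedged sketch of the update call with boto3. The group and template names are placeholders, and the template is assumed to exist already.

import boto3

autoscaling = boto3.client("autoscaling")

# Point an existing group at a launch template instead of its
# launch configuration; "$Latest" always uses the newest version.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchTemplate={
        "LaunchTemplateName": "web-template",
        "Version": "$Latest",
    },
)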
We can also manually scale an Auto Scaling group by changing the desired capacity or the minimum or maximum number of instances in that scaling plan. So the number of instances can be increased or decreased automatically based on the conditions specified within a scaling policy, and that will define the metric we're going to use to make those scale-in and scale-out decisions. If we expect demand in the future at a certain time and date, or on a regular schedule, we can base our capacity on that schedule. We do that by putting the time and date and the action we want to take into the Auto Scaling group as a scheduled action, and we can do that quite simply in the EC2 Auto Scaling console.

Our scaling policy will define how much we want to scale based on some defined conditions. After we have defined the CloudWatch metric we're going to use to scale, and the conditions around it, our Auto Scaling group will use CloudWatch alarms and the associated policies to determine what that scaling will be. For example, we could scale in and out based on the CPU utilization across all of the EC2 instances. The types of adjustments we can make include a change in capacity, so adding one or two or three instances; an exact capacity, maintaining a specific capacity; or a percent change in capacity, so we could add 20 percent capacity, for example, to our Auto Scaling group.

There are a number of different scaling policy types that you can select for your Auto Scaling group. The first one is target tracking scaling, and that's where AWS takes the most control over your scaling strategy. All you do is define a target value, and the scaling will be based on that target value for a specific metric that you define. From there, auto scaling will create and do the ongoing management of the CloudWatch alarms that trigger the scaling policy, and it will calculate, completely for you, the scaling adjustment based on that metric and the target value, basing that decision on the demand on your Auto Scaling group.
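As a hedged sketch of that target tracking case with boto3: the group and policy names are placeholders, and ASGAverageCPUUtilization is one of the predefined metric types.

import boto3

autoscaling = boto3.client("autoscaling")

# Hold average CPU across the group at roughly 50 percent; the
# service creates and manages the CloudWatch alarms for us.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,
    },
)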
A step scaling policy allows us to define our own scaling adjustment values based on different bands of conditions, so it allows us to define small changes in our capacity for small changes in demand, and large changes in capacity for large changes in that demand. For example, you might want to increase your capacity by 25 percent if your CPU utilization falls between 25 and 50 percent, and to handle a big demand on your Auto Scaling group you can set it up so that where CPU utilization is between 50 and 75 percent, you increase the capacity by 200 percent; that way you can quickly manage a spike in demand. We'll sketch a step policy like this in code after this slide.

The last one there is the simple scaling policy type, and that simply increases or decreases the capacity of your Auto Scaling group by a single scaling adjustment. So, simply, if your CPU utilization goes above, say, 50 percent, then you would add X amount of instances. You can define a cooldown period to make sure that you don't double up, so that when the next check comes in, you don't scale again before the instances have had time to actually launch and be registered. Simple scaling may react very slowly to large spikes in demand, because you have that cooldown period to go through and can then only adjust by that single scaling adjustment, but it may also overreact if the adjustment is too large. These are things that you really need to fine-tune with simple scaling, and that you don't have to fine-tune if you use something like a target tracking scaling policy.

In the same way that we can auto scale our EC2 instances, we can also auto scale our ECS services. ECS auto scaling uses the AWS Application Auto Scaling service, and we'll talk more about that in the next slide. It allows you to increase or decrease the number of tasks, as opposed to the number of instances with EC2. We can increase or decrease the number of ECS tasks based on a scaling policy. Again we've got target tracking, where we can base it on a target value for a specific CloudWatch metric; we have step scaling again, where we can base it on a series of step adjustments that vary with the size of the alarm breach within CloudWatch; and finally, we can also set up scheduled changes in our capacity based on date and time.
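Going back to step scaling: for EC2 Auto Scaling, a step policy is created and then invoked by a CloudWatch alarm. Here's a hedged sketch with boto3 that echoes the 25 percent / 200 percent bands from this slide; the names and the 25 percent alarm threshold are placeholders.

import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# Step bands are offsets from the alarm threshold (25% CPU here),
# so these cover roughly 25-50% and 50%+ CPU utilization.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-step-up",
    PolicyType="StepScaling",
    AdjustmentType="PercentChangeInCapacity",
    StepAdjustments=[
        {"MetricIntervalLowerBound": 0.0,
         "MetricIntervalUpperBound": 25.0,
         "ScalingAdjustment": 25},     # small spike: +25% capacity
        {"MetricIntervalLowerBound": 25.0,
         "ScalingAdjustment": 200},    # big spike: +200% capacity
    ],
)

# The alarm that triggers the policy when group CPU breaches 25%.
cloudwatch.put_metric_alarm(
    AlarmName="web-asg-cpu-high",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "web-asg"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=1,
    Threshold=25.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=[policy["PolicyARN"]],
)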
AWS Application Auto Scaling allows us to apply auto scaling, and all of the benefits that come from it, to many other services besides EC2: for example, as we've seen, ECS, but also EC2 Spot Fleets, EMR clusters, AppStream fleets, DynamoDB tables and global secondary indexes, and Aurora Replicas, so we can scale a series of RDS Aurora replicas as well. There's also SageMaker, Comprehend, and Lambda functions, where we can automatically provision the concurrency of those Lambda functions, and we can also apply auto scaling to our own custom resources. For the auto scaling service to automatically change the capacity of those resources, it needs a service-linked IAM role, so that it has the permissions to call those AWS services.

Application auto scaling can be configured with the console, but you can also use the command-line interface or one of the many software development kits. The commands there are register scalable target, to register the target resource that you're going to be scaling; put scaling policy, to upload your scaling policy, generally in JSON; and put scheduled action, if you want to increase or decrease the capacity at a time in the future.

When we set up application auto scaling, we can select a number of different scaling strategies. We can optimize for availability, and that will maintain resource utilization at 40 percent; or balance availability and cost, and that will maintain 50 percent resource utilization; or optimize for cost, and that will maintain 70 percent. If we like, we can also define our own custom strategy, and we do that by defining the scaling metric that will measure those individual resources, which could be CPU utilization for example; a target value for that scaling metric that we want to achieve; and a load metric, which measures the load on the entire group. That load metric is normally used for predictive scaling: the auto scaling service will look at the history of load on the group and make adjustments to the scaling strategy based on that.
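As a hedged sketch of those three commands with boto3, using an ECS service as the target; the cluster, service, policy, and schedule values are placeholders.

import boto3

aas = boto3.client("application-autoscaling")

# Register the ECS service's task count as a scalable target.
aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=20,
)

# Target tracking on the service's average CPU utilization.
aas.put_scaling_policy(
    PolicyName="ecs-cpu-target",
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 50.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
        },
    },
)

# A scheduled action to raise the capacity floor ahead of a known peak.
aas.put_scheduled_action(
    ServiceNamespace="ecs",
    ScheduledActionName="friday-peak",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    Schedule="cron(0 17 ? * FRI *)",
    ScalableTargetAction={"MinCapacity": 10, "MaxCapacity": 30},
)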
We can also use the Application Auto Scaling service with DynamoDB, and we can use it to adjust the provisioned throughput capacity of both our tables and any global secondary indexes. What that does is reduce throttling of requests when demand on our DynamoDB back end gets high, and by doing that we provide a better experience for the clients connected to this back end and reduce the latency of those requests. We define a scaling policy which, again, will consist of a scalable target, and that could be the read capacity or the write capacity of that DynamoDB table or global secondary index, or both read and write capacity as scalable targets. Then all we need to do is define a target utilization of between 20 and 90 percent. If we would like a lot of spare capacity up our sleeve, then we would define 20 percent; if we want to reduce costs and maximize the utilization of these tables, then we could define anything up to 90 percent.

Okay, so here is how it works. On the left there we've got our clients that will be connecting to this DynamoDB table, and the demand on that table will vary depending on the number of clients and the types of requests. That variation in demand will be picked up by Amazon CloudWatch as a change in a CloudWatch metric. If that change exceeds an alarm level, Amazon CloudWatch will notify the Application Auto Scaling service, and optionally you could also have Amazon CloudWatch send an SNS message to someone. When the Application Auto Scaling service receives a notification from Amazon CloudWatch, it will issue an UpdateTable operation against the DynamoDB table, and that will increase or decrease the provisioned throughput capacity of that table or global secondary index.

We can also use the Application Auto Scaling service to dynamically adjust the number of Aurora Replicas within an Aurora provisioned DB cluster. So this is a provisioned DB cluster, as opposed to Aurora Serverless: you will have a real cluster, and you'll have replicas within that cluster. It's available for both the MySQL and PostgreSQL database engines. Your scaling policy will consist of a target metric, for example CPU utilization, and you will define a minimum and maximum number of Aurora Replicas that you would like.
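As a hedged sketch of both of those with boto3; the table name, cluster name, policy names, and capacity numbers are placeholders.

import boto3

aas = boto3.client("application-autoscaling")

# DynamoDB: scale the table's read capacity, targeting 70 percent
# utilization (anywhere from 20 to 90 percent is allowed).
aas.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/my-table",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=500,
)
aas.put_scaling_policy(
    PolicyName="table-read-70",
    ServiceNamespace="dynamodb",
    ResourceId="table/my-table",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization",
        },
    },
)

# Aurora: scale the replica count of a provisioned cluster on reader CPU.
aas.register_scalable_target(
    ServiceNamespace="rds",
    ResourceId="cluster:my-aurora-cluster",
    ScalableDimension="rds:cluster:ReadReplicaCount",
    MinCapacity=1,
    MaxCapacity=8,
)
aas.put_scaling_policy(
    PolicyName="aurora-cpu-target",
    ServiceNamespace="rds",
    ResourceId="cluster:my-aurora-cluster",
    ScalableDimension="rds:cluster:ReadReplicaCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "RDSReaderAverageCPUUtilization",
        },
    },
)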
Back on that Aurora scaling policy, you can also define a cooldown period, and that way you can make sure scaling operations are finished before you invoke another scaling operation, so you don't double up on those. You can also enable or disable scale-in activity, so you can leave it permanently scaled up, or you can enable it to scale back in when that demand goes down.

When an AWS Lambda function is invoked in response to a request for something to be computed, an instance of the function will handle that request. If you get many requests coming in at the same time, then there will be concurrent instances handling those multiple requests. So when we get a large initial burst of traffic, the concurrency, meaning the concurrent instances available within a region, can reach between 500 and 3,000, depending on which region that function is operating in. Once it's reached that initial burst limit, it will have to throttle requests, but after that it can scale by an additional 500 concurrent instances every minute, above the burst limit, up to the regional concurrency limit, which defaults to 1,000. Now, that 1,000 is a limit shared across all functions in the account and region, but you can contact AWS support and put in a case to have it increased if you need to. Obviously, if demand still exceeds that capacity of 500 additional instances per minute, then again that will cause latency through throttling of requests.

Okay, so in the gray there we've got the open requests that need to be handled by this Lambda function, and in the orange we've got the instances that have been invoked to handle those requests for compute capacity. As we can see, as those requests come in, they will all be matched by function instances up until that burst limit. When we exceed the burst limit, we can only add up to 500 extra instances every minute, up until we reach that concurrency limit of 1,000. Anything beyond what can be added between the burst limit and the concurrency limit will need to be throttled, and that will cause latency for your application. Then, as those open requests are closed out, we can see that the number of instances will slowly reduce back down to the minimum level.
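To make that arithmetic concrete, here's a small illustrative model in Python. This is not an AWS API, just the numbers from this slide: a 500-instance burst limit (the low end of the regional range), a 500-per-minute ramp, and the default 1,000 account concurrency limit.

def available_concurrency(minutes_after_burst,
                          burst_limit=500,      # region-dependent, 500-3,000
                          ramp_per_minute=500,  # additional instances per minute
                          account_limit=1000):  # default; raisable via support
    """Illustrative only: capacity is the burst limit plus the
    per-minute ramp, capped at the account concurrency limit."""
    return min(burst_limit + ramp_per_minute * minutes_after_burst,
               account_limit)

for minute in range(3):
    print(minute, available_concurrency(minute))
# 0 -> 500, 1 -> 1000, 2 -> 1000 (capped at the account limit)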
One way of reducing throttling within your Lambda architecture is to use Lambda provisioned concurrency. What that will do is initialize the requested number of execution environments, those invoked instances, that you specify, and that allows you to reduce throttling and reduce latency. Up until the burst behavior it will act exactly the same as standard concurrency, then when you exceed that burst it will scale up to the provisioned concurrency, and once the provisioned concurrency has been exceeded, it will scale up normally above that, which will be of the order of 500 additional instances per minute.

Again, here we see our open requests in gray and the instances invoked for this function in orange. When we first start, we have our burst limit, and from that point, as more open requests come in, more instances are invoked until the requested provisioned concurrency has been reached, and it will maintain that. Then, at the point where those open requests exceed the requested provisioned concurrency, from that point onwards it will be adding 500 additional function instances per minute, the same as it would in a standard concurrency arrangement. The difference here is that above the first burst limit we have ready-available provisioned concurrency that prevents throttling from occurring. That said, if we exceed that provisioned concurrency line and get another big burst, it is still possible that we would get throttling.

In order to handle any throttling that may occur if we exceed both our burst limit and our provisioned concurrency limit by a significant amount, we can implement provisioned concurrency auto scaling. That uses the AWS Application Auto Scaling service, and it will adjust that provisioned concurrency level automatically depending on demand on the function, using a target tracking scaling policy based on a utilization metric. As the function becomes more utilized, the provisioned concurrency level will be adjusted to accommodate that.

Okay, so here we have auto scaling with provisioned concurrency. We've got the open requests again in gray there, and we've got our function instances in orange there.
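Before we read the chart, here's a hedged sketch of wiring that up with boto3; the function name, alias, and capacity numbers are placeholders, and LambdaProvisionedConcurrencyUtilization is the predefined utilization metric.

import boto3

lam = boto3.client("lambda")
aas = boto3.client("application-autoscaling")

# Keep a baseline of warm execution environments on the alias.
lam.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="live",                  # alias (or version) to target
    ProvisionedConcurrentExecutions=100,
)

# Let Application Auto Scaling move that baseline with demand.
aas.register_scalable_target(
    ServiceNamespace="lambda",
    ResourceId="function:my-function:live",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    MinCapacity=100,
    MaxCapacity=500,
)
aas.put_scaling_policy(
    PolicyName="pc-utilization-target",
    ServiceNamespace="lambda",
    ResourceId="function:my-function:live",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 0.7,  # scale when ~70% of provisioned capacity is in use
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization",
        },
    },
)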
Looking at the chart: as the demand increases and those open requests increase, our provisioned concurrency is going to change; it steps up, and up, until we reach the maximum of the scaling range that we defined, and above that it will be using standard concurrency. So if we want to exceed that, it'll again be adding up to 500 additional instances per minute. Then, as demand goes down and those open requests are closed out, you will see the provisioned capacity change, and the function instances will slowly come down to a level that can manage the lower number of open requests.

There are a number of best practices recommended by AWS. First off, make sure that you base your scaling on a one-minute metric frequency; with EC2, the standard CloudWatch frequency is five minutes, so for auto scaling AWS recommends a one-minute frequency. Enable Auto Scaling group metrics rather than relying on individual instances, and that way you'll be taking a metric of the entire aggregate group, not just individual instances. We'll sketch those two in code in a moment. And use an appropriate instance type: for example, if you're using EC2 T2-type burstable instances, you may run out of CPU credits, and they may not behave how you expected when those credit limits are exceeded, so take that into consideration if you're going to use burstable instances in an Auto Scaling group.

There are some additional things you may want to take into consideration as well. A predictive scaling plan, for example a target tracking scaling plan based on a forecast, depends on a history of demand. So when you first implement this scaling plan on a newly launched Auto Scaling group, you can set your scaling plan up as forecast only, view how well it's working, and then change it to forecast and scale when you're confident that the forecast quality is what you require. With custom predictive scaling, you need to define both the scaling metric and the group load metric: the scaling metric is what is used to scale the Auto Scaling group in and out, and the group load metric is used for the forecasting. You need to make sure that they are strongly correlated to the load you are measuring on those instances.
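Going back to those first two best practices, here's a hedged sketch with boto3; the group name is a placeholder. Group metrics are published at one-minute granularity, which also satisfies the one-minute frequency recommendation.

import boto3

autoscaling = boto3.client("autoscaling")

# Publish aggregate group metrics to CloudWatch at one-minute
# granularity, rather than watching individual instances.
autoscaling.enable_metrics_collection(
    AutoScalingGroupName="web-asg",
    Granularity="1Minute",
    Metrics=[
        "GroupDesiredCapacity",
        "GroupInServiceInstances",
        "GroupTotalInstances",
    ],
)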
When you are implementing a new scaling plan for an Auto Scaling group, make sure that you remove any previously scheduled scaling actions when you do so, otherwise they may interfere with your new scaling plan. If you are getting an "active with problems" error with your predictive scaling strategy, that means the scaling configuration you set up for the resources inside that Auto Scaling group could not be applied, and there are a couple of reasons for that. The first reason would be that the resource has already been added to another scaling policy, so you need to make sure it is only added to that one scaling policy. The next cause is that the Auto Scaling group does not meet the minimum requirements for predictive scaling: if you're using a target tracking strategy for the group, there may not be enough information available to make a prediction on what that level should be. The way around that is to wait 24 hours after creating the group so the service can gather that information, and once it has, you will be able to configure it for predictive scaling.

Okay, so that brings us to the end of the lecture. I hope you've enjoyed it, and I look forward to seeing you in the next one.