Welcome back to Backspace Academy, in this lecture on AWS auto scaling. We're going to build on what you already know about auto scaling EC2 instances in an Auto Scaling group, look further into the AWS Application Auto Scaling service and how those same auto scaling features can be applied to databases and to Lambda functions, and then finally we'll have a look at how we can implement some best practices around auto scaling.

As we already know, EC2 Auto Scaling allows us to create Auto Scaling groups of EC2 instances that can scale up or down depending on what conditions you set, and that enables elasticity. It does that by scaling horizontally, not vertically: rather than deleting a server and putting in a bigger one, it scales horizontally by adding or terminating EC2 instances. It enables fault tolerance through health checks: if an EC2 instance fails a health check, it can be replaced with a healthy instance. An Auto Scaling group can span multiple Availability Zones, but it cannot span multiple regions, so that redundancy is across Availability Zones, not regions. If a region goes down then your infrastructure will also go down, but if an Availability Zone goes down and you have multi-AZ auto scaling, your infrastructure will still operate as it should.

The basic parameters are the minimum size and maximum size that we set, and then the desired capacity, and the desired capacity is generally what we start with when we launch an Auto Scaling group. The benefits: fault tolerance, availability, and much better cost management, because by scaling horizontally we make sure we're getting the best out of those EC2 instances, and running the right number of them as well. On the right there we can see an Auto Scaling group where we've defined the desired capacity. When that Auto Scaling group is first launched, it will have that desired capacity, but it will scale out as needed when demand increases, up to its maximum size, and as demand decreases it will scale in and terminate instances down to the minimum size and not below that.

Before we create an Auto Scaling group, the first thing we need to do is define a launch configuration, or else a launch template. A launch configuration describes the configuration of the EC2 instances that are going to be launched within the Auto Scaling group.
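As a minimal sketch of those basic capacity parameters, here's how a group like this might be created with boto3, the Python SDK. This is illustrative only: the group name, launch template name, and subnet IDs are placeholders, and it assumes a launch template (which we cover next) already exists.

import boto3

autoscaling = boto3.client("autoscaling")

# Create a group that starts at the desired capacity and scales
# horizontally between the minimum and maximum sizes.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-asg",             # hypothetical name
    LaunchTemplate={
        "LaunchTemplateName": "web-template",   # assumed to exist already
        "Version": "$Latest",
    },
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=4,
    # Subnets in multiple Availability Zones (not regions) for redundancy.
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",
    HealthCheckType="EC2",
    HealthCheckGracePeriod=300,
)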
In particular, a launch configuration will describe the AMI to be used, the instance type, the key pair used to connect to those instances, and the security groups that will be applied to the instances that are launched. You can also use a launch configuration to describe spot instance bid pricing, and that will help you reduce the cost of your instances.

A launch template is similar to a launch configuration. We use it to define what the instances launched within an Auto Scaling group are going to look like, in the same way that we do with a launch configuration, but the difference is that we get version control. We can have multiple versions of the same template, and that goes with the AWS philosophy of managing all of our infrastructure in the same way that we would manage software. Another advantage of using a launch template is that you can provision the capacity within your Auto Scaling group using multiple instance types, using both On-Demand and Spot Instances, and combinations of all of these.

You can modify your existing Auto Scaling groups to use launch templates instead, and you can do that by simply going into the console, selecting that Auto Scaling group, and changing from the launch configuration to a launch template. You can also do it using the command-line interface or one of the many software development kits, using the update auto scaling group command; we'll sketch that in code in a moment. Launch templates are recommended by AWS over launch configurations, so if you're creating a new Auto Scaling group, make sure that you use a launch template instead of a launch configuration.

An Auto Scaling group is a collection of EC2 instances organized as a group, and the capacity of that group can expand and contract by automatically adding or terminating EC2 instances as needed. The starting point when you first create an Auto Scaling group will be the desired capacity. Health checks achieve fault tolerance in your Auto Scaling group: they ensure that unhealthy instances are quickly terminated and replaced with healthy instances to maintain that desired capacity. Scaling plans define the ways that we want our Auto Scaling group to scale in and out. We can first maintain a desired capacity and then have that vary within a minimum and maximum number of instances.
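Going back to that switch from a launch configuration to a launch template: here's a hedged sketch of the update call with boto3. The group and template names are placeholders, and the template is assumed to exist already.

import boto3

autoscaling = boto3.client("autoscaling")

# Point an existing group at a launch template instead of its
# launch configuration; "$Latest" always uses the newest version.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="web-asg",
    LaunchTemplate={
        "LaunchTemplateName": "web-template",
        "Version": "$Latest",
    },
)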
We can also manually scale an Auto Scaling group by changing the desired capacity or the minimum or maximum number of instances in that scaling plan. So the number of instances can be increased or decreased automatically based on the conditions specified within a scaling policy, and that will define the metric we're going to use to make those scale-in and scale-out decisions. If we expect demand in the future at a certain time and date, or on a regular schedule, we can base our capacity on that schedule. We do that by putting the time and date and the action we want to take into the Auto Scaling group as a scheduled action, and we can do that quite simply in the EC2 Auto Scaling console.

Our scaling policy will define how much we want to scale based on some defined conditions. After we have defined the CloudWatch metric we're going to use to scale, and the conditions around it, our Auto Scaling group will use CloudWatch alarms and the associated policies to determine what that scaling will be. For example, we could scale in and out based on the CPU utilization across all of the EC2 instances. The types of adjustments we can make include a change in capacity, so adding one or two or three instances; an exact capacity, maintaining a specific capacity; or a percent change in capacity, so we could add 20 percent capacity, for example, to our Auto Scaling group.

There are a number of different scaling policy types that you can select for your Auto Scaling group. The first one is target tracking scaling, and that's where AWS takes the most control over your scaling strategy. All you do is define a target value, and the scaling will be based on that target value for a specific metric that you define. From there, auto scaling will create and do the ongoing management of the CloudWatch alarms that trigger the scaling policy, and it will calculate, completely for you, the scaling adjustment based on that metric and the target value, basing that decision on the demand on your Auto Scaling group.
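As a hedged sketch of that target tracking case with boto3: the group and policy names are placeholders, and ASGAverageCPUUtilization is one of the predefined metric types.

import boto3

autoscaling = boto3.client("autoscaling")

# Hold average CPU across the group at roughly 50 percent; the
# service creates and manages the CloudWatch alarms for us.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,
    },
)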
A step scaling policy allows us to define our own scaling adjustment values based on different bands of conditions, so it allows us to define small changes in our capacity for small changes in demand, and large changes in capacity for large changes in that demand. For example, you might want to increase your capacity by 25 percent if your CPU utilization falls between 25 and 50 percent, and to handle a big demand on your Auto Scaling group you can set it up so that where CPU utilization is between 50 and 75 percent, you increase the capacity by 200 percent; that way you can quickly manage a spike in demand. We'll sketch a step policy like this in code after this slide.

The last one there is the simple scaling policy type, and that simply increases or decreases the capacity of your Auto Scaling group by a single scaling adjustment. So, simply, if your CPU utilization goes above, say, 50 percent, then you would add X amount of instances. You can define a cooldown period to make sure that you don't double up, so that when the next check comes in, you don't scale again before the instances have had time to actually launch and be registered. Simple scaling may react very slowly to large spikes in demand, because you have that cooldown period to go through and can then only adjust by that single scaling adjustment, but it may also overreact if the adjustment is too large. These are things that you really need to fine-tune with simple scaling, and that you don't have to fine-tune if you use something like a target tracking scaling policy.

In the same way that we can auto scale our EC2 instances, we can also auto scale our ECS services. ECS auto scaling uses the AWS Application Auto Scaling service, and we'll talk more about that in the next slide. It allows you to increase or decrease the number of tasks, as opposed to the number of instances with EC2. We can increase or decrease the number of ECS tasks based on a scaling policy. Again we've got target tracking, where we can base it on a target value for a specific CloudWatch metric; we have step scaling again, where we can base it on a series of step adjustments that vary with the size of the alarm breach within CloudWatch; and finally, we can also set up scheduled changes in our capacity based on date and time.
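Going back to step scaling: for EC2 Auto Scaling, a step policy is created and then invoked by a CloudWatch alarm. Here's a hedged sketch with boto3 that echoes the 25 percent / 200 percent bands from this slide; the names and the 25 percent alarm threshold are placeholders.

import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# Step bands are offsets from the alarm threshold (25% CPU here),
# so these cover roughly 25-50% and 50%+ CPU utilization.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-step-up",
    PolicyType="StepScaling",
    AdjustmentType="PercentChangeInCapacity",
    StepAdjustments=[
        {"MetricIntervalLowerBound": 0.0,
         "MetricIntervalUpperBound": 25.0,
         "ScalingAdjustment": 25},     # small spike: +25% capacity
        {"MetricIntervalLowerBound": 25.0,
         "ScalingAdjustment": 200},    # big spike: +200% capacity
    ],
)

# The alarm that triggers the policy when group CPU breaches 25%.
cloudwatch.put_metric_alarm(
    AlarmName="web-asg-cpu-high",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "web-asg"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=1,
    Threshold=25.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=[policy["PolicyARN"]],
)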
AWS Application Auto Scaling allows us to apply auto scaling, and all of the benefits that come from it, to many other services besides EC2: for example, as we've seen, ECS, but also EC2 Spot Fleets, EMR clusters, AppStream fleets, DynamoDB tables and global secondary indexes, and Aurora Replicas, so we can scale a series of RDS Aurora replicas as well. There's also SageMaker, Comprehend, and Lambda functions, where we can automatically provision the concurrency of those Lambda functions, and we can also apply auto scaling to our own custom resources. For the auto scaling service to automatically change the capacity of those resources, it needs a service-linked IAM role, so that it has the permissions to call those AWS services.

Application auto scaling can be configured with the console, but you can also use the command-line interface or one of the many software development kits. The commands there are register scalable target, to register the target resource that you're going to be scaling; put scaling policy, to upload your scaling policy, generally in JSON; and put scheduled action, if you want to increase or decrease the capacity at a time in the future.

When we set up application auto scaling, we can select a number of different scaling strategies. We can optimize for availability, and that will maintain resource utilization at 40 percent; or balance availability and cost, and that will maintain 50 percent resource utilization; or optimize for cost, and that will maintain 70 percent. If we like, we can also define our own custom strategy, and we do that by defining the scaling metric that will measure those individual resources, which could be CPU utilization for example; a target value for that scaling metric that we want to achieve; and a load metric, which measures the load on the entire group. That load metric is normally used for predictive scaling: the auto scaling service will look at the history of load on the group and make adjustments to the scaling strategy based on that.
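As a hedged sketch of those three commands with boto3, using an ECS service as the target; the cluster, service, policy, and schedule values are placeholders.

import boto3

aas = boto3.client("application-autoscaling")

# Register the ECS service's task count as a scalable target.
aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=20,
)

# Target tracking on the service's average CPU utilization.
aas.put_scaling_policy(
    PolicyName="ecs-cpu-target",
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 50.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
        },
    },
)

# A scheduled action to raise the capacity floor ahead of a known peak.
aas.put_scheduled_action(
    ServiceNamespace="ecs",
    ScheduledActionName="friday-peak",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    Schedule="cron(0 17 ? * FRI *)",
    ScalableTargetAction={"MinCapacity": 10, "MaxCapacity": 30},
)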
We can also use the Application Auto Scaling service with DynamoDB, and we can use it to adjust the provisioned throughput capacity of both our tables and any global secondary indexes. What that does is reduce throttling of requests when demand on our DynamoDB back end gets high, and by doing that we provide a better experience for the clients connected to this back end and reduce the latency of those requests. We define a scaling policy which, again, will consist of a scalable target, and that could be the read capacity or the write capacity of that DynamoDB table or global secondary index, or both read and write capacity as scalable targets. Then all we need to do is define a target utilization of between 20 and 90 percent. If we would like a lot of spare capacity up our sleeve, then we would define 20 percent; if we want to reduce costs and maximize the utilization of these tables, then we could define anything up to 90 percent.

Okay, so here is how it works. On the left there we've got our clients that will be connecting to this DynamoDB table, and the demand on that table will vary depending on the number of clients and the types of requests. That variation in demand will be picked up by Amazon CloudWatch as a change in a CloudWatch metric. If that change exceeds an alarm level, Amazon CloudWatch will notify the Application Auto Scaling service, and optionally you could also have Amazon CloudWatch send an SNS message to someone. When the Application Auto Scaling service receives a notification from Amazon CloudWatch, it will issue an UpdateTable operation against the DynamoDB table, and that will increase or decrease the provisioned throughput capacity of that table or global secondary index.

We can also use the Application Auto Scaling service to dynamically adjust the number of Aurora Replicas within an Aurora provisioned DB cluster. So this is a provisioned DB cluster, as opposed to Aurora Serverless: you will have a real cluster, and you'll have replicas within that cluster. It's available for both the MySQL and PostgreSQL database engines. Your scaling policy will consist of a target metric, for example CPU utilization, and you will define a minimum and maximum number of Aurora Replicas that you would like.
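As a hedged sketch of both of those with boto3; the table name, cluster name, policy names, and capacity numbers are placeholders.

import boto3

aas = boto3.client("application-autoscaling")

# DynamoDB: scale the table's read capacity, targeting 70 percent
# utilization (anywhere from 20 to 90 percent is allowed).
aas.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/my-table",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=500,
)
aas.put_scaling_policy(
    PolicyName="table-read-70",
    ServiceNamespace="dynamodb",
    ResourceId="table/my-table",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization",
        },
    },
)

# Aurora: scale the replica count of a provisioned cluster on reader CPU.
aas.register_scalable_target(
    ServiceNamespace="rds",
    ResourceId="cluster:my-aurora-cluster",
    ScalableDimension="rds:cluster:ReadReplicaCount",
    MinCapacity=1,
    MaxCapacity=8,
)
aas.put_scaling_policy(
    PolicyName="aurora-cpu-target",
    ServiceNamespace="rds",
    ResourceId="cluster:my-aurora-cluster",
    ScalableDimension="rds:cluster:ReadReplicaCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "RDSReaderAverageCPUUtilization",
        },
    },
)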
Back on that Aurora scaling policy, you can also define a cooldown period, and that way you can make sure scaling operations are finished before you invoke another scaling operation, so you don't double up on those. You can also enable or disable scale-in activity, so you can leave it permanently scaled up, or you can enable it to scale back in when that demand goes down.

When an AWS Lambda function is invoked in response to a request for something to be computed, an instance of the function will handle that request. If you get many requests coming in at the same time, then there will be concurrent instances handling those multiple requests. So when we get a large initial burst of traffic, the concurrency, meaning the concurrent instances available within a region, can reach between 500 and 3,000, depending on which region that function is operating in. Once it's reached that initial burst limit, it will have to throttle requests, but after that it can scale by an additional 500 concurrent instances every minute, above the burst limit, up to the regional concurrency limit, which defaults to 1,000. Now, that 1,000 is a limit shared across all functions in the account and region, but you can contact AWS support and put in a case to have it increased if you need to. Obviously, if demand still exceeds that capacity of 500 additional instances per minute, then again that will cause latency through throttling of requests.

Okay, so in the gray there we've got the open requests that need to be handled by this Lambda function, and in the orange we've got the instances that have been invoked to handle those requests for compute capacity. As we can see, as those requests come in, they will all be matched by function instances up until that burst limit. When we exceed the burst limit, we can only add up to 500 extra instances every minute, up until we reach that concurrency limit of 1,000. Anything beyond what can be added between the burst limit and the concurrency limit will need to be throttled, and that will cause latency for your application. Then, as those open requests are closed out, we can see that the number of instances will slowly reduce back down to the minimum level.
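To make that arithmetic concrete, here's a small illustrative model in Python. This is not an AWS API, just the numbers from this slide: a 500-instance burst limit (the low end of the regional range), a 500-per-minute ramp, and the default 1,000 account concurrency limit.

def available_concurrency(minutes_after_burst,
                          burst_limit=500,      # region-dependent, 500-3,000
                          ramp_per_minute=500,  # additional instances per minute
                          account_limit=1000):  # default; raisable via support
    """Illustrative only: capacity is the burst limit plus the
    per-minute ramp, capped at the account concurrency limit."""
    return min(burst_limit + ramp_per_minute * minutes_after_burst,
               account_limit)

for minute in range(3):
    print(minute, available_concurrency(minute))
# 0 -> 500, 1 -> 1000, 2 -> 1000 (capped at the account limit)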
One way of reducing throttling within your Lambda architecture is to use Lambda provisioned concurrency. What that will do is initialize the requested number of execution environments, those invoked instances, that you specify, and that allows you to reduce throttling and reduce latency. Up until the burst behavior it will act exactly the same as standard concurrency, then when you exceed that burst it will scale up to the provisioned concurrency, and once the provisioned concurrency has been exceeded, it will scale up normally above that, which will be of the order of 500 additional instances per minute.

Again, here we see our open requests in gray and the instances invoked for this function in orange. When we first start, we have our burst limit, and from that point, as more open requests come in, more instances are invoked until the requested provisioned concurrency has been reached, and it will maintain that. Then, at the point where those open requests exceed the requested provisioned concurrency, from that point onwards it will be adding 500 additional function instances per minute, the same as it would in a standard concurrency arrangement. The difference here is that above the first burst limit we have ready-available provisioned concurrency that prevents throttling from occurring. That said, if we exceed that provisioned concurrency line and get another big burst, it is still possible that we would get throttling.

In order to handle any throttling that may occur if we exceed both our burst limit and our provisioned concurrency limit by a significant amount, we can implement provisioned concurrency auto scaling. That uses the AWS Application Auto Scaling service, and it will adjust that provisioned concurrency level automatically depending on demand on the function, using a target tracking scaling policy based on a utilization metric. As the function becomes more utilized, the provisioned concurrency level will be adjusted to accommodate that.

Okay, so here we have auto scaling with provisioned concurrency. We've got the open requests again in gray there, and we've got our function instances in orange there.
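Before we read the chart, here's a hedged sketch of wiring that up with boto3; the function name, alias, and capacity numbers are placeholders, and LambdaProvisionedConcurrencyUtilization is the predefined utilization metric.

import boto3

lam = boto3.client("lambda")
aas = boto3.client("application-autoscaling")

# Keep a baseline of warm execution environments on the alias.
lam.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="live",                  # alias (or version) to target
    ProvisionedConcurrentExecutions=100,
)

# Let Application Auto Scaling move that baseline with demand.
aas.register_scalable_target(
    ServiceNamespace="lambda",
    ResourceId="function:my-function:live",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    MinCapacity=100,
    MaxCapacity=500,
)
aas.put_scaling_policy(
    PolicyName="pc-utilization-target",
    ServiceNamespace="lambda",
    ResourceId="function:my-function:live",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 0.7,  # scale when ~70% of provisioned capacity is in use
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization",
        },
    },
)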
Looking at the chart: as the demand increases and those open requests increase, our provisioned concurrency is going to change; it steps up, and up, until we reach the maximum of the scaling range that we defined, and above that it will be using standard concurrency. So if we want to exceed that, it'll again be adding up to 500 additional instances per minute. Then, as demand goes down and those open requests are closed out, you will see the provisioned capacity change, and the function instances will slowly come down to a level that can manage the lower number of open requests.

There are a number of best practices recommended by AWS. First off, make sure that you base your scaling on a one-minute metric frequency; with EC2, the standard CloudWatch frequency is five minutes, so for auto scaling AWS recommends a one-minute frequency. Enable Auto Scaling group metrics rather than relying on individual instances, and that way you'll be taking a metric of the entire aggregate group, not just individual instances. We'll sketch those two in code in a moment. And use an appropriate instance type: for example, if you're using EC2 T2-type burstable instances, you may run out of CPU credits, and they may not behave how you expected when those credit limits are exceeded, so take that into consideration if you're going to use burstable instances in an Auto Scaling group.

There are some additional things you may want to take into consideration as well. A predictive scaling plan, for example a target tracking scaling plan based on a forecast, depends on a history of demand. So when you first implement this scaling plan on a newly launched Auto Scaling group, you can set your scaling plan up as forecast only, view how well it's working, and then change it to forecast and scale when you're confident that the forecast quality is what you require. With custom predictive scaling, you need to define both the scaling metric and the group load metric: the scaling metric is what is used to scale the Auto Scaling group in and out, and the group load metric is used for the forecasting. You need to make sure that they are strongly correlated to the load you are measuring on those instances.
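Going back to those first two best practices, here's a hedged sketch with boto3; the group name is a placeholder. Group metrics are published at one-minute granularity, which also satisfies the one-minute frequency recommendation.

import boto3

autoscaling = boto3.client("autoscaling")

# Publish aggregate group metrics to CloudWatch at one-minute
# granularity, rather than watching individual instances.
autoscaling.enable_metrics_collection(
    AutoScalingGroupName="web-asg",
    Granularity="1Minute",
    Metrics=[
        "GroupDesiredCapacity",
        "GroupInServiceInstances",
        "GroupTotalInstances",
    ],
)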
When you are implementing a new scaling plan for an Auto Scaling group, make sure that you remove any previously scheduled scaling actions when you do so, otherwise they may interfere with your new scaling plan. If you are getting an "active with problems" error with your predictive scaling strategy, that means the scaling configuration you set up for the resources inside that Auto Scaling group could not be applied, and there are a couple of reasons for that. The first reason would be that the resource has already been added to another scaling policy, so you need to make sure it is only added to that one scaling policy. The next cause is that the Auto Scaling group does not meet the minimum requirements for predictive scaling: if you're using a target tracking strategy for the group, there may not be enough information available to make a prediction on what that level should be. The way around that is to wait 24 hours after creating the group so the service can gather that information, and once it has, you will be able to configure it for predictive scaling.

Okay, so that brings us to the end of the lecture. I hope you've enjoyed it, and I look forward to seeing you in the next one.