WEBVTT

00:00:00.740 --> 00:00:05.140
The business impact analysis is the thing that begins our

00:00:05.140 --> 00:00:08.580
process of setting up the BCDR strategy.

00:00:08.590 --> 00:00:13.530
This is going to be based upon the business's tolerance for the loss of

00:00:13.530 --> 00:00:17.610
their most critical assets. If you're like me and you've asked business

00:00:17.610 --> 00:00:20.210
partners in the past, what's most important?

00:00:20.220 --> 00:00:24.110
Sometimes the first answer that comes back from them is, everything. In

00:00:24.110 --> 00:00:27.450
a jovial kind of way I have then followed up with,

00:00:27.450 --> 00:00:31.440
well, what is the thing that you would lose your job for first? And then there's

00:00:31.440 --> 00:00:36.960
the response of, oh, the most important thing. Just as if you were underwater

00:00:36.970 --> 00:00:42.090
for 10 minutes, but you only had 5 minutes of air, that helps you to see

00:00:42.100 --> 00:00:48.720
that there should be some definition for a critical asset that says based upon a

00:00:48.720 --> 00:00:54.700
predetermined amount of time, I know that I can tolerate the failure of a

00:00:54.700 --> 00:00:59.030
business resource or a business process. That's what the business is going to

00:00:59.030 --> 00:00:59.960
return to you.

00:01:00.290 --> 00:01:03.650
Another feature of the business impact analysis is this thing

00:01:03.650 --> 00:01:08.470
called an RTO. ISO 27031 states that there is a recovery time

00:01:08.470 --> 00:01:12.260
objective per resource, service, or activity.

00:01:12.270 --> 00:01:14.360
It's the amount of time the organization has as a

00:01:14.360 --> 00:01:17.550
goal to recover a disrupted product.

00:01:17.560 --> 00:01:21.360
So, think of the MTPD, the maximum tolerable period of disruption, as being a timer and think of

00:01:21.360 --> 00:01:24.160
the RTO as being a stopwatch.

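NOTE
A minimal sketch of the timer and stopwatch idea, not taken from the course: the MTPD is the fixed budget, the RTO is the recovery goal, and the goal has to fit inside the budget. The asset names and durations below are hypothetical.
```python
from datetime import timedelta
#
# Hypothetical BIA results: asset name mapped to (MTPD, RTO).
bia = {
    "payments-api": (timedelta(hours=4), timedelta(hours=1)),
    "reporting": (timedelta(days=3), timedelta(days=4)),
}
#
def rto_fits_mtpd(mtpd, rto):
    """Return True when the recovery goal (RTO) fits inside the tolerance (MTPD)."""
    return rto <= mtpd
#
# Assets whose recovery goal runs past what the business said it can tolerate.
violations = [name for name, (mtpd, rto) in bia.items() if not rto_fits_mtpd(mtpd, rto)]
print(violations)
```
Any asset that shows up in the list would need either a tighter RTO or a conversation with the business about its stated tolerance.
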
00:01:24.440 --> 00:01:29.260
The RPO is related to the recovery of data to a point that

00:01:29.260 --> 00:01:32.420
meets the organization's requirements for validity and integrity.

00:01:32.430 --> 00:01:35.700
So the RPO is related to how far back into the past an

00:01:35.710 --> 00:01:39.700
organization must go in order to have a reliable replica or a

00:01:39.700 --> 00:01:43.890
copy of reliable data to continue the business.

00:01:43.890 --> 00:01:47.560
This could be information that is inside of a backup, but

00:01:47.560 --> 00:01:51.510
it's not measured in storage amount, it's measured in time.

00:01:51.530 --> 00:01:54.170
Here are some things to think about as you're

00:01:54.170 --> 00:01:57.470
questioning what the BCDR strategy looks like.

00:01:57.480 --> 00:02:02.410
What is the recovery point objective as defined by the business?

00:02:02.420 --> 00:02:05.620
What is the amount of data that they can tolerate losing? In a

00:02:05.620 --> 00:02:10.610
financial institution, it may be measured in seconds of time or

00:02:10.620 --> 00:02:14.910
minutes of time. In some other organization that does not have the

00:02:14.910 --> 00:02:18.990
need to have the last known good transaction so tightly bound, it

00:02:18.990 --> 00:02:21.730
could be measured in hours or in days.

00:02:21.740 --> 00:02:27.300
What is the RTO? Here again, we're focusing on the recovery time of a system.

00:02:27.300 --> 00:02:31.570
Think of the RTO as how long it takes to restore a virtual machine.

00:02:31.840 --> 00:02:36.450
Think of the RPO, recovery point objective, as the age of the data

00:02:36.460 --> 00:02:39.390
that is restored from some type of backup.

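NOTE
A small worked example may help here, assuming a hypothetical incident with made-up timestamps and objectives: the RPO is checked against how far back the restored data goes, and the RTO is checked against how long the system was down. Both are measured in time.
```python
from datetime import datetime, timedelta
#
def evaluate_recovery(last_backup, outage_start, service_restored, rpo, rto):
    """Compare one recovery against its objectives; both are measured in time."""
    data_loss_window = outage_start - last_backup   # how far back the restored data goes
    downtime = service_restored - outage_start      # how long the system was unavailable
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }
#
# Hypothetical incident: nightly backup at 02:00, outage at 14:30, service back at 16:00.
result = evaluate_recovery(
    last_backup=datetime(2024, 1, 10, 2, 0),
    outage_start=datetime(2024, 1, 10, 14, 30),
    service_restored=datetime(2024, 1, 10, 16, 0),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=2),
)
print(result["rpo_met"], result["rto_met"])
```
In this made-up case the system came back within its RTO, but a nightly backup could not satisfy a four-hour RPO, which is the kind of gap the BIA is meant to expose.
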
00:02:39.400 --> 00:02:44.630
What are the kinds of disasters that you should include in your threat analysis?

00:02:44.640 --> 00:02:49.810
Remember how we talked about this idea of a qualitative or a

00:02:49.810 --> 00:02:53.460
quantitative analysis? Both should be employed here.

00:02:53.540 --> 00:02:59.860
So now, let's consider what the cloud‑based BCDR options could look like.

00:03:00.340 --> 00:03:00.970
Again,

00:03:00.980 --> 00:03:04.680
it's not exhaustive, but it's at least three general ways of thinking

00:03:04.680 --> 00:03:09.610
about maintaining the resiliency of your organization in the cloud.

00:03:10.040 --> 00:03:15.220
Cloud‑based BCDR options are three in number, again,

00:03:15.230 --> 00:03:19.240
generally speaking. You have on‑premises operations and then you're

00:03:19.240 --> 00:03:25.940
using the cloud as BCDR. You are a complete cloud consumer, so your

00:03:25.940 --> 00:03:29.830
primary provider of BCDR would be the cloud, you've left the

00:03:29.830 --> 00:03:35.440
datacenter. Or, you are a cloud consumer with an alternate cloud

00:03:35.440 --> 00:03:38.320
provider as your BCDR option.

00:03:38.330 --> 00:03:44.360
So, let's take a look at what that looks like. On‑premises, Cloud as BCDR

00:03:44.370 --> 00:03:51.290
means that your primary mode of operation is the datacenter. Your BIA,

00:03:51.290 --> 00:03:56.010
your business impact analysis, has shown that based on a

00:03:56.020 --> 00:03:59.040
threat of actually losing this datacenter,

00:03:59.040 --> 00:04:02.380
the recovery point objective and the recovery time

00:04:02.380 --> 00:04:05.820
objective are actually going to be things that we measure

00:04:05.830 --> 00:04:08.740
out to some type of cloud service.

00:04:08.750 --> 00:04:11.850
So, we're backing up those critical things to the cloud.

00:04:11.940 --> 00:04:17.380
If in fact there is a disaster that takes place with that datacenter,

00:04:17.390 --> 00:04:20.899
then we would go to our alternate location and we

00:04:20.899 --> 00:04:23.940
would do our recovery from the cloud.

00:04:24.540 --> 00:04:29.110
How about if you are primarily consuming from a cloud service provider?

00:04:29.120 --> 00:04:29.500
Well,

00:04:29.510 --> 00:04:33.070
in this case we're actually using the native

00:04:33.070 --> 00:04:35.310
replication capabilities of the cloud.

00:04:35.320 --> 00:04:39.620
You'll recall an availability zone represents one or more datacenters.

00:04:39.630 --> 00:04:44.030
If you onboard workloads into a particular availability zone,

00:04:44.040 --> 00:04:46.700
then there is replication, obviously, that is

00:04:46.700 --> 00:04:48.960
occurring within that availability zone.

00:04:49.440 --> 00:04:53.220
But, between availability zones within a region,

00:04:53.230 --> 00:04:56.300
you could also have replicas, depending on the type

00:04:56.300 --> 00:04:58.320
of workloads that you are using.

00:04:58.420 --> 00:05:02.550
If one of those availability zones should go offline,

00:05:02.560 --> 00:05:07.660
you could without any hesitation be consuming and retrieving

00:05:07.660 --> 00:05:10.230
services from another availability zone.

00:05:10.240 --> 00:05:15.640
In fact, if you set up some type of load balancer across availability zones,

00:05:15.650 --> 00:05:20.360
you would not even have to know that you lost a particular availability zone.

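NOTE
A toy sketch of why a load balancer can hide the loss of a zone, assuming a simple round-robin policy and hypothetical zone names. It does not model any particular provider's load balancer.
```python
import itertools
#
class ZoneBalancer:
    """Toy round-robin balancer that skips zones marked unhealthy."""
    def __init__(self, zones):
        self.zones = list(zones)
        self.healthy = set(self.zones)
        self._cycle = itertools.cycle(self.zones)
    def mark_down(self, zone):
        self.healthy.discard(zone)
    def route(self):
        # Try each zone at most once per call so an all-down region fails fast.
        for _ in range(len(self.zones)):
            zone = next(self._cycle)
            if zone in self.healthy:
                return zone
        raise RuntimeError("no healthy availability zone")
#
balancer = ZoneBalancer(["zone-a", "zone-b", "zone-c"])  # hypothetical zone names
balancer.mark_down("zone-b")                            # simulate losing one zone
served_by = [balancer.route() for _ in range(4)]
print(served_by)
```
Callers only ever see a zone name come back from route(), so the loss of one zone is invisible to them as long as at least one healthy zone remains.
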
00:05:21.040 --> 00:05:25.780
If I am a cloud consumer and then I'm using an alternate

00:05:25.790 --> 00:05:30.300
location as my BCDR, then what I'm going to have to do is

00:05:30.300 --> 00:05:33.600
make sure that the interoperability, the formats, and

00:05:33.600 --> 00:05:39.720
the semantics of the data and also of the systems have

00:05:39.720 --> 00:05:44.070
some kind of congruency between those two providers. Because if I

00:05:44.070 --> 00:05:48.530
lose my primary, I should be able to automatically be consuming

00:05:48.530 --> 00:05:51.360
services from my alternate provider.

00:05:53.140 --> 00:05:58.760
Primary activities that are associated with BCDR would include recovery,

00:05:58.770 --> 00:06:03.510
which means that the team that gets activated is actually focused on the

00:06:03.520 --> 00:06:08.460
alternate location and they're trying to bring up systems from the most

00:06:08.460 --> 00:06:12.850
critical to the least critical. And restoration,

00:06:12.860 --> 00:06:16.870
which means that you have a team that's actually doing work at the

00:06:16.870 --> 00:06:22.450
primary location to bring that primary location back online.

00:06:22.450 --> 00:06:26.000
Recovery activities begin with prevention.

00:06:26.010 --> 00:06:30.100
We're trying to protect against threats such as environmental threats, hardware failures,

00:06:30.100 --> 00:06:35.210
operational errors or malicious attacks. Detection says that I'm detecting

00:06:35.220 --> 00:06:38.780
incidents at the earliest opportunity to minimize the impact to the

00:06:38.780 --> 00:06:44.850
services. Response is the imperative part of the first action that says that

00:06:44.850 --> 00:06:49.210
the recovery team may need to communicate on a regular basis with the

00:06:49.210 --> 00:06:53.910
executive emergency management team to give them an analysis of how bad

00:06:53.910 --> 00:06:58.080
things are. And then, moving into the recovery activities, where you follow

00:06:58.080 --> 00:07:02.830
the appropriate strategy and you prioritize so that the most critical services get

00:07:02.840 --> 00:07:08.450
reinstated first, and an ongoing improvement of all of these capabilities.

00:07:09.240 --> 00:07:14.540
Working in parallel to recovery would be restoration. Here, in the

00:07:14.540 --> 00:07:19.670
restoration activities, we're thinking about what the original location looks

00:07:19.670 --> 00:07:23.330
like. If it was a disaster, it may mean that number one,

00:07:23.330 --> 00:07:25.520
it was uninhabitable for people.

00:07:25.530 --> 00:07:31.210
So getting a certification of restoration to a new normal may

00:07:31.220 --> 00:07:35.430
actually mean that the focus is not the original

00:07:35.430 --> 00:07:39.170
location, we're actually operating in a new space,

00:07:39.330 --> 00:07:44.900
sort of like what happened in downtown New York City after 9/11 for many

00:07:44.900 --> 00:07:50.930
companies. Why is it that we go in reverse order and return the critical systems

00:07:50.930 --> 00:07:54.310
last and the least critical systems first? Number one,

00:07:54.310 --> 00:07:58.970
because we don't want to impose some kind of conflict with the site

00:07:58.980 --> 00:08:01.710
that is in operation at the recovery location.

00:08:01.720 --> 00:08:04.430
And number two, even though it's been certified,

00:08:04.430 --> 00:08:08.860
we don't fully trust that previously uninhabitable environment.

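NOTE
The two orderings can be sketched side by side, assuming hypothetical system names and a simple numeric criticality rank: recovery at the alternate site goes most critical first, and restoration back to the primary site goes least critical first.
```python
# Hypothetical systems with a criticality rank (1 = most critical).
systems = [("email", 3), ("payments", 1), ("intranet-wiki", 4), ("order-entry", 2)]
#
def recovery_order(items):
    """Order for bringing systems up at the alternate site: most critical first."""
    return [name for name, rank in sorted(items, key=lambda s: s[1])]
#
def restoration_order(items):
    """Order for moving systems back to the primary site: least critical first."""
    return [name for name, rank in sorted(items, key=lambda s: s[1], reverse=True)]
#
print(recovery_order(systems))
print(restoration_order(systems))
```
Restoring in reverse keeps the most critical systems running at the recovery site until the restored primary has proven itself with lower-risk workloads.
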
00:08:09.040 --> 00:08:11.430
It's also in concert with the recovery.

00:08:11.530 --> 00:08:13.840
So these are parallel activities.

00:08:13.840 --> 00:08:17.370
It may seem counterintuitive, why would we do this?

00:08:17.380 --> 00:08:23.360
One of the things is that we no longer have BCDR capabilities, do we?

00:08:23.370 --> 00:08:29.460
So, we want to ensure that we get that original location back up as quickly as

00:08:29.460 --> 00:08:33.169
possible so that we can still have BCDR capabilities.

00:08:34.090 --> 00:08:37.580
Next, join me over in the process of creating,

00:08:37.580 --> 00:08:40.440
implementing, and then testing the BCDR.