WEBVTT

00:00.000 --> 00:01.860
>> That leads into talking about

00:01.860 --> 00:04.740
disaster recovery and
business continuity.

00:04.740 --> 00:06.735
In the event that a
disaster strikes,

00:06.735 --> 00:09.240
redundancy is going to be
key to getting back online,

00:09.240 --> 00:10.440
back up and running, and

00:10.440 --> 00:12.945
enabling the continuity
of the business.

00:12.945 --> 00:15.210
You hear the terms BCP for

00:15.210 --> 00:17.040
business continuity planning and

00:17.040 --> 00:19.465
DRP for disaster
recovery planning.

00:19.465 --> 00:20.990
We often hear them together

00:20.990 --> 00:22.850
or sometimes used
interchangeably.

00:22.850 --> 00:24.090
We want to make sure that we

00:24.090 --> 00:25.785
know the difference
between them.

00:25.785 --> 00:27.750
A business continuity plan is

00:27.750 --> 00:29.880
an overarching
umbrella document that

00:29.880 --> 00:31.860
includes many other
plans that help sustain

00:31.860 --> 00:34.455
the organization in
case of a disaster.

00:34.455 --> 00:37.170
The DRP is more of a
short-term document

00:37.170 --> 00:39.940
that is focused on the
immediacy of the disaster.

00:39.940 --> 00:43.100
I've heard people say the
DRP is "the sky is falling"

00:43.100 --> 00:46.940
and the BCP is "the sky has
fallen; how do we keep going?"

00:46.940 --> 00:49.925
Disaster recovery is really
focused on restoring

00:49.925 --> 00:51.875
IT services to operation based

00:51.875 --> 00:54.310
on their criticality as
quickly as possible.

00:54.310 --> 00:56.150
When we talk about criticality,

00:56.150 --> 00:57.920
we mean time sensitivity.

00:57.920 --> 00:59.510
There are certain services where,

00:59.510 --> 01:01.465
while they are offline,
we lose money.

01:01.465 --> 01:04.055
If we have an e-commerce
site, for instance,

01:04.055 --> 01:06.575
the longer the e-commerce
site is unavailable,

01:06.575 --> 01:08.165
the less money I'm generating.

01:08.165 --> 01:10.915
That would be a very
critical service.

01:10.915 --> 01:13.310
There are seven
stages or phases of

01:13.310 --> 01:15.440
business continuity
planning, and lots of

01:15.440 --> 01:16.700
different organizations have

01:16.700 --> 01:18.455
their own documents they use.

01:18.455 --> 01:21.675
This is NIST 800-34.

01:21.675 --> 01:26.460
ISO 27031 has a framework
for business continuity.

01:26.460 --> 01:28.475
There are various
plans available

01:28.475 --> 01:31.134
and they're all performing
the same functions.

01:31.134 --> 01:33.780
We start out with
project initiation.

01:33.780 --> 01:35.750
Writing a business
continuity plan is

01:35.750 --> 01:38.180
a project and it should
be managed as such.

01:38.180 --> 01:40.100
We start with the
project, then we

01:40.100 --> 01:42.380
move into the business
impact analysis.

01:42.380 --> 01:43.610
This is probably the most

01:43.610 --> 01:45.170
critical step because
it is where we

01:45.170 --> 01:46.370
identify what elements are

01:46.370 --> 01:48.665
critical and how
critical they are.

01:48.665 --> 01:51.620
That's going to be the driver
for what we recover

01:51.620 --> 01:54.830
and how quickly we do so.

01:54.830 --> 01:57.500
We identify our
recovery strategies

01:57.500 --> 01:59.465
then move into design
and development.

01:59.465 --> 02:01.310
We look to implement the plan,

02:01.310 --> 02:03.560
we test it and maintain it.

02:03.560 --> 02:05.560
Those are the seven phases.

02:05.560 --> 02:09.715
Again, this goes
back to NIST 800-34.

02:09.715 --> 02:11.630
There are other
frameworks out there that

02:11.630 --> 02:14.430
support business
continuity planning.

02:14.980 --> 02:18.065
If we look at the
project initiation,

02:18.065 --> 02:20.120
you're going to manage
this as a project.

02:20.120 --> 02:21.440
We have to have support and

02:21.440 --> 02:23.285
buy-in from senior management.

02:23.285 --> 02:25.880
A business continuity plan
isn't something that you

02:25.880 --> 02:28.490
write one afternoon over
margaritas at Chili's.

02:28.490 --> 02:32.090
This is a lengthy process that
needs funding and support.

02:32.090 --> 02:34.460
Senior management is
going to put their buy-in

02:34.460 --> 02:36.875
in writing and they're
going to sign off.

02:36.875 --> 02:38.810
That's committing to support and

02:38.810 --> 02:41.945
funding. The project
manager should be named.

02:41.945 --> 02:44.270
That's going to be the
person who coordinates

02:44.270 --> 02:46.695
the business continuity
planning processes.

02:46.695 --> 02:48.755
We figure out the
scope of the plan.

02:48.755 --> 02:51.305
We select members
of the BCP team.

02:51.305 --> 02:53.180
The business continuity
planning team

02:53.180 --> 02:55.130
should come from a
diverse background.

02:55.130 --> 02:57.020
You should have representation
from throughout

02:57.020 --> 03:00.050
the organization, including
senior management.

03:00.050 --> 03:02.780
Our next phase
is the big one

03:02.780 --> 03:05.375
because it is the
business impact analysis.

03:05.375 --> 03:08.000
This is where we do our
research and identify and

03:08.000 --> 03:09.530
prioritize all of our business

03:09.530 --> 03:11.585
processes based on
their criticality.

03:11.585 --> 03:14.750
Again, criticality
is time sensitivity.

03:14.750 --> 03:16.280
This document is
going to give us

03:16.280 --> 03:17.510
metrics to determine how

03:17.510 --> 03:21.000
quickly these critical
devices need to be back online.
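
NOTE A minimal sketch of the BIA prioritization step, not taken from the lecture.
It assumes each business process has already been assigned a maximum tolerable
downtime (MTD) in hours and ranks the most time-sensitive processes first.
The class name, process names, and hour values are all hypothetical.
```python
from dataclasses import dataclass
@dataclass
class Process:
    name: str
    mtd_hours: float  # hypothetical maximum tolerable downtime from the BIA
def prioritize(processes):
    """Order processes for recovery: shortest tolerable downtime (most critical) first."""
    return sorted(processes, key=lambda p: p.mtd_hours)
inventory = [
    Process("payroll", 72.0),
    Process("e-commerce checkout", 1.0),
    Process("internal wiki", 168.0),
]
for rank, proc in enumerate(prioritize(inventory), start=1):
    print(rank, proc.name, proc.mtd_hours)
```
A real BIA weighs more than one number, but sorting on time sensitivity mirrors how criticality drives recovery order.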

03:21.320 --> 03:24.660
We'll talk about things like
recovery point objectives

03:24.660 --> 03:26.685
and service level objectives.

03:26.685 --> 03:30.235
We've already talked
about MTBF and MTTR.

03:30.235 --> 03:31.850
Let me just take
a minute and talk

03:31.850 --> 03:33.650
about service level objectives,

03:33.650 --> 03:37.030
SLOs, not to be
confused with SLAs.

03:37.030 --> 03:38.955
With service level objectives,

03:38.955 --> 03:42.425
the idea is that if we're in
disaster operations,

03:42.425 --> 03:43.700
we're not going to be providing

03:43.700 --> 03:46.595
100 percent of our normal
service to our customers.

03:46.595 --> 03:48.500
What we might say
is in the event

03:48.500 --> 03:50.255
that these services
are unavailable,

03:50.255 --> 03:52.730
we'd like to operate at
a minimum of 80 percent.

03:52.730 --> 03:54.740
That's a service
level objective.

03:54.740 --> 03:56.510
It takes into consideration that

03:56.510 --> 03:58.600
you can't operate
at 100 percent.

03:58.600 --> 04:00.195
What are we looking for,

04:00.195 --> 04:02.975
striving for, in a
reduced capacity?
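
NOTE A minimal sketch of checking a service level objective during disaster operations,
assuming capacity can be measured as a single number such as orders processed per hour
(a hypothetical metric). The 80 percent default comes from the example in this lecture;
the function name is hypothetical.
```python
def meets_slo(degraded_capacity, normal_capacity, slo_fraction=0.80):
    """Return True if reduced-capacity service is at least slo_fraction of normal service."""
    if normal_capacity <= 0:
        raise ValueError("normal_capacity must be positive")
    return degraded_capacity / normal_capacity >= slo_fraction
print(meets_slo(850, 1000))  # running at 85 percent of normal
print(meets_slo(700, 1000))  # running at 70 percent of normal
```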

04:02.975 --> 04:06.500
A recovery point objective
is tolerance for data loss.

04:06.500 --> 04:08.540
How current must data be?

04:08.540 --> 04:11.285
If I say I have an RPO of one hour,

04:11.285 --> 04:14.300
you need to restore all
files up until an hour ago.

04:14.300 --> 04:16.820
How much data am I
willing to lose?
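
NOTE A minimal sketch of the one-hour RPO example, assuming the only question is how old
the most recent usable backup is at the moment the disaster hits. The function name and
timestamps are hypothetical.
```python
from datetime import datetime, timedelta
def rpo_met(last_backup, disaster_time, rpo):
    """True if the data lost (changes made since the last good backup) falls within the RPO."""
    return disaster_time - last_backup <= rpo
disaster = datetime(2024, 1, 15, 14, 0)  # hypothetical time of the outage
rpo = timedelta(hours=1)
print(rpo_met(datetime(2024, 1, 15, 13, 20), disaster, rpo))  # 40 minutes of data lost
print(rpo_met(datetime(2024, 1, 15, 12, 30), disaster, rpo))  # 90 minutes of data lost
```
In practice, meeting a one-hour RPO means backups or replication must run at least once an hour.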

04:16.820 --> 04:18.515
Recovery time objective,

04:18.515 --> 04:22.255
RTO and MTD are sometimes
used interchangeably.

04:22.255 --> 04:26.730
Recovery time objective,
maximum tolerable downtime.

04:26.730 --> 04:28.730
This is the
maximum amount

04:28.730 --> 04:29.930
of time we can be without

04:29.930 --> 04:33.290
the service before we suffer
loss that's unacceptable.

04:33.290 --> 04:35.185
What's our maximum time?
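
NOTE A minimal sketch for sanity-checking recovery targets. Even where RTO and MTD are used
loosely, a common convention is that the RTO is set at or below the MTD, and the estimated
time to restore a service must fit within its RTO. The function name and durations are hypothetical.
```python
from datetime import timedelta
def check_recovery_targets(estimated_recovery, rto, mtd):
    """Return a list of problems with a service's recovery targets (empty list means OK)."""
    problems = []
    if rto > mtd:
        problems.append("RTO exceeds MTD")
    if estimated_recovery > rto:
        problems.append("estimated recovery time exceeds RTO")
    return problems
print(check_recovery_targets(timedelta(hours=3), rto=timedelta(hours=4), mtd=timedelta(hours=8)))
print(check_recovery_targets(timedelta(hours=6), rto=timedelta(hours=4), mtd=timedelta(hours=8)))
```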

04:35.185 --> 04:38.014
We've already talked about
mean time between failures,

04:38.014 --> 04:40.090
the average amount of time
the device will run between failures:

04:40.090 --> 04:42.155
it fails, we repair it,

04:42.155 --> 04:44.060
it runs, then it fails again.

04:44.060 --> 04:46.130
MTTR, again, is

04:46.130 --> 04:48.725
mean time to repair,
just what it sounds like.
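
NOTE MTBF and MTTR combine into a standard steady-state availability estimate,
availability = MTBF / (MTBF + MTTR). A minimal sketch with hypothetical figures: a device
that runs 990 hours between failures on average and takes 10 hours to repair is up
about 99 percent of the time.
```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the device is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)
print(f"{availability(mtbf_hours=990, mttr_hours=10):.2%}")  # 99.00%
```
Shortening MTTR raises availability just as lengthening MTBF does.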

04:48.725 --> 04:50.570
Also, we need to determine

04:50.570 --> 04:52.730
minimum operating
requirements because when

04:52.730 --> 04:54.965
we restore these
devices, for instance,

04:54.965 --> 04:56.420
if I have software that has to

04:56.420 --> 04:58.265
be up and running
within nine minutes,

04:58.265 --> 04:59.930
you'd better make sure
I have the hardware

04:59.930 --> 05:02.090
that will run that
software, so to speak.

05:02.090 --> 05:05.420
Any sort of environmental or
application type requirement

05:05.420 --> 05:07.820
should also be in the BIA.

05:07.820 --> 05:11.855
The next phase is to identify
my recovery strategies.

05:11.855 --> 05:13.340
In the event of a disaster,

05:13.340 --> 05:15.475
there has assuredly
been some loss.

05:15.475 --> 05:16.880
Let me just say,
though it should go

05:16.880 --> 05:18.200
without saying, that we always

05:18.200 --> 05:19.460
place the physical safety of

05:19.460 --> 05:21.030
our employees above
anything else.

05:21.030 --> 05:22.220
If there were ever to be

05:22.220 --> 05:23.540
a decision to make

05:23.540 --> 05:25.250
where human life may be at risk,

05:25.250 --> 05:27.620
we always have to choose
something different.

05:27.620 --> 05:30.425
After human life, we start
to think about our facility

05:30.425 --> 05:31.730
because that is
an area where

05:31.730 --> 05:33.470
we could suffer a great loss.

05:33.470 --> 05:35.570
If our facility is damaged or

05:35.570 --> 05:37.565
is unavailable for
a period of time,

05:37.565 --> 05:39.245
we may need somewhere to work.

05:39.245 --> 05:42.820
Maybe our employees can work
from home, but maybe not.

05:42.820 --> 05:46.665
If not, we generally lease
an off-site facility.

05:46.665 --> 05:48.210
We might lease a cold site.

05:48.210 --> 05:49.790
A cold site is really

05:49.790 --> 05:51.320
just a bare-bones facility that

05:51.320 --> 05:53.225
has heating and
air conditioning.

05:53.225 --> 05:56.240
There's nothing beyond that.
It's just an empty building

05:56.240 --> 05:57.790
or an empty space.

05:57.790 --> 06:00.110
Obviously, coming
into a cold site is

06:00.110 --> 06:02.345
going to take a while to
get back up and running.

06:02.345 --> 06:04.265
Cold sites are the
cheapest option.

06:04.265 --> 06:06.620
With a warm site,
you get the basics,

06:06.620 --> 06:08.080
but there's also furniture.

06:08.080 --> 06:10.885
There are computer systems,
there are telephones.

06:10.885 --> 06:14.255
Again, that's just generic
equipment, nothing of my own.

06:14.255 --> 06:15.860
That will still take a bit of

06:15.860 --> 06:17.555
time to get back up and rolling.

06:17.555 --> 06:19.085
Speaking of rolling,

06:19.085 --> 06:20.675
there's a rolling hot site.

06:20.675 --> 06:22.150
Sometimes you see these.

06:22.150 --> 06:24.410
They pull up in the
event of a disaster like

06:24.410 --> 06:25.610
a little mobile home on

06:25.610 --> 06:27.725
wheels containing
computer equipment,

06:27.725 --> 06:29.570
perhaps, but at least
something where we can

06:29.570 --> 06:32.320
run some of our data
center operations.

06:32.320 --> 06:34.655
It's really a
short-term solution.

06:34.655 --> 06:36.305
We can pay for a hot site.

06:36.305 --> 06:38.780
That's a location that
isn't under our ownership,

06:38.780 --> 06:41.745
but we have
exclusive use of it.

06:41.745 --> 06:44.480
It's fully configured
and has my equipment and

06:44.480 --> 06:45.740
we just really need
to come in and

06:45.740 --> 06:47.525
restore from the latest backup.

06:47.525 --> 06:50.105
You can get back up and
running pretty quickly.

06:50.105 --> 06:52.760
A mirrored site is usually
under our ownership.

06:52.760 --> 06:54.200
It's a branch office.

06:54.200 --> 06:57.140
We can switch operations
to the northwest region.

06:57.140 --> 06:59.330
They've got access to our
data. They're staffed.

06:59.330 --> 07:01.375
They have all of the
equipment that they need.

07:01.375 --> 07:02.810
Making sure that it's

07:02.810 --> 07:04.415
fully redundant in every way

07:04.415 --> 07:06.710
can be very expensive.
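
NOTE A minimal sketch of weighing the site options against an RTO, assuming the cost ordering
described here (cold cheapest, then warm, hot, and mirrored as the most expensive) and
recovery-time estimates you would supply yourself. The function name and hour values are
hypothetical placeholders, not industry figures; the rolling hot site is left out because
it is a short-term measure.
```python
# Site types ordered from cheapest to most expensive, following the lecture's description.
SITE_TYPES = ["cold", "warm", "hot", "mirrored"]
def cheapest_site(rto_hours, est_recovery_hours):
    """Pick the cheapest site type whose estimated recovery time fits within the RTO.
    est_recovery_hours maps each site type to an estimate you supply."""
    for site in SITE_TYPES:
        if est_recovery_hours[site] <= rto_hours:
            return site
    return None  # no site type meets the RTO; revisit the RTO or the strategy
estimates = {"cold": 336, "warm": 72, "hot": 8, "mirrored": 0.5}  # hypothetical hours
print(cheapest_site(24, estimates))  # hot
print(cheapest_site(100, estimates))  # warm
```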

07:06.710 --> 07:09.080
There are certainly some
recovery strategies

07:09.080 --> 07:10.970
in relation to our facilities.

07:10.970 --> 07:12.320
We also have to think about

07:12.320 --> 07:13.850
personnel, where job rotation

07:13.850 --> 07:15.200
and training
would help in any

07:15.200 --> 07:17.490
of our processes as well.

