WEBVTT

00:00:01.010 --> 00:00:03.450
Let's continue with this Business Continuity,

00:00:03.450 --> 00:00:04.580
Disaster Recovery,

00:00:04.580 --> 00:00:09.530
and Incident Response for the Certified in Cybersecurity certification

00:00:09.670 --> 00:00:12.930
with a more detailed look at business continuity.

00:00:13.770 --> 00:00:18.320
Earlier on, we saw this definition, business resilience,

00:00:18.330 --> 00:00:20.470
a common word being used today,

00:00:20.870 --> 00:00:25.160
and it can be defined as the ability to continue operations,

00:00:25.200 --> 00:00:27.970
even during adverse circumstances,

00:00:28.210 --> 00:00:32.369
so this is the heartbeat or the main thrust of some type of

00:00:32.369 --> 00:00:36.630
business continuity program: continuing operations,

00:00:36.770 --> 00:00:38.020
not just recovering.

00:00:39.170 --> 00:00:44.230
We saw before that incident response is very often the first step,

00:00:44.240 --> 00:00:46.250
but when it's a severe incident,

00:00:46.310 --> 00:00:50.760
it might trigger the need to implement and start to

00:00:50.760 --> 00:00:53.350
use business continuity plans.

00:00:54.720 --> 00:00:59.540
The outcomes of the business continuity management system were to have an

00:00:59.550 --> 00:01:03.540
incident response plan focused on life safety, containment,

00:01:03.720 --> 00:01:06.060
documentation, and return to normal,

00:01:06.700 --> 00:01:12.330
but then to have a business continuity plan focused on business impact analysis,

00:01:12.510 --> 00:01:16.510
critical business functions, recovery time objective,

00:01:16.800 --> 00:01:21.040
the data recovery point objective, and the recovery requirements.

00:01:21.900 --> 00:01:25.090
When we looked at disaster recovery planning,

00:01:25.100 --> 00:01:30.250
we were looking at a catastrophic event that meant we had to relocate

00:01:30.480 --> 00:01:33.780
IT and other services to an alternate location.

00:01:35.170 --> 00:01:38.590
Business continuity is just simply project management.

00:01:38.940 --> 00:01:44.910
It starts with project initiation, then moves on to business impact analysis.

00:01:45.500 --> 00:01:50.340
Based on the business impact analysis, we'll select our recovery strategy.

00:01:50.340 --> 00:01:55.170
Then we write plans for how to implement that recovery

00:01:55.170 --> 00:01:57.990
strategy in the event of a serious incident,

00:01:58.580 --> 00:02:01.610
but we know that all plans need to be tested.

00:02:01.920 --> 00:02:03.600
We need to roll it out,

00:02:03.610 --> 00:02:08.030
communicate it so that everyone is aware of what to do in a crisis,

00:02:08.160 --> 00:02:11.890
and certainly through testing, we train our staff,

00:02:11.900 --> 00:02:14.610
and we also find any flaws in the plan.

00:02:15.550 --> 00:02:20.360
Every type of use of the plan, whether it's a test or a real incident,

00:02:20.570 --> 00:02:23.610
will allow us also to learn more about how to make the

00:02:23.610 --> 00:02:26.480
plans better and maintain the plan.

00:02:28.140 --> 00:02:32.880
The heartbeat of business continuity is understanding the business,

00:02:33.240 --> 00:02:38.250
and this is a process known as business impact analysis,

00:02:38.250 --> 00:02:39.800
or BIA,

00:02:40.150 --> 00:02:46.820
and it could easily be said this is the critical and most important step

00:02:46.940 --> 00:02:50.330
in the actual business continuity planning process.

00:02:51.140 --> 00:02:56.630
Through business impact analysis, we identify what is critical,

00:02:56.980 --> 00:02:59.980
the critical business functions, processes,

00:02:59.980 --> 00:03:05.830
for example, that are going to have the most impact on the profitability,

00:03:05.930 --> 00:03:09.700
the reputation, and operations of the organization.

00:03:10.340 --> 00:03:13.130
Some departments are more important than others.

00:03:13.670 --> 00:03:15.950
For a while, I worked in internal audit,

00:03:15.950 --> 00:03:18.830
and believe me, we weren't a critical process.

00:03:19.020 --> 00:03:21.780
Most of the business thought they'd run better without us,

00:03:22.130 --> 00:03:27.600
but the ones that are important need to be identified so

00:03:27.600 --> 00:03:29.400
that's where we set our priorities.

00:03:29.990 --> 00:03:34.500
We also need to know what are the critical supporting processes in

00:03:34.500 --> 00:03:38.890
order to support those critical business functions.

00:03:39.230 --> 00:03:39.980
In other words,

00:03:39.990 --> 00:03:45.440
the dependencies that critical business functions have on supporting processes.

00:03:47.150 --> 00:03:50.180
When we want to recover a business process,

00:03:50.340 --> 00:03:52.970
we need to know what we need in resources,

00:03:53.430 --> 00:03:59.200
people, data, facilities, equipment, and supply chain.

00:04:00.850 --> 00:04:05.720
The BIA allows us to determine our priorities for recovery.

00:04:07.170 --> 00:04:09.420
Let's look at how this all works.

00:04:09.990 --> 00:04:13.960
We have the element of time, and business impact

00:04:13.960 --> 00:04:18.140
analysis is all about impact over time.

00:04:18.480 --> 00:04:19.170
In that way,

00:04:19.170 --> 00:04:22.830
it's different from risk management because when we looked at risk

00:04:22.830 --> 00:04:25.690
management back in the Security Principles course,

00:04:26.110 --> 00:04:30.450
we saw that risk was based on impact and likelihood.

00:04:30.940 --> 00:04:33.580
So here, we're looking at impact over time,

00:04:33.740 --> 00:04:37.560
so very much an overlapping type of supporting process,

00:04:37.640 --> 00:04:40.090
but slightly different from a risk assessment.

00:04:41.210 --> 00:04:43.880
Over time, the business is running as normal,

00:04:43.890 --> 00:04:49.470
normal operations, but then one day, we encounter a crisis.

00:04:49.920 --> 00:04:54.700
As a result of that crisis, our level of business drops to zero.

00:04:55.400 --> 00:05:00.290
We're no longer producing a product, we're no longer meeting our mission.

00:05:00.920 --> 00:05:01.560
Now,

00:05:01.570 --> 00:05:06.340
immediately we should start to determine what is the impact of

00:05:06.340 --> 00:05:11.380
that inability to operate our business over time,

00:05:11.950 --> 00:05:16.980
and we can see that that quite often will grow kind of exponentially at the end.

00:05:17.530 --> 00:05:19.480
Over the first few hours,

00:05:19.480 --> 00:05:22.540
people understand if we've got a little bit of an outage,

00:05:22.680 --> 00:05:24.210
but the longer it goes,

00:05:24.210 --> 00:05:27.990
the greater the damage to our reputation and finance becomes.

00:05:28.630 --> 00:05:31.260
Now, this is different for different business processes.

00:05:31.560 --> 00:05:34.630
Obviously, if this is the life support system,

00:05:34.770 --> 00:05:37.840
this is measured in minutes, not in hours or days.

00:05:38.520 --> 00:05:42.340
One of the things we try to determine through all of this is

00:05:42.340 --> 00:05:45.690
when the level of impact would be high enough that we

00:05:45.690 --> 00:05:50.330
actually encounter business failure: the business has to shut down.

00:05:50.540 --> 00:05:53.930
We are unable to continue business operations.

00:05:54.360 --> 00:05:57.010
We've lost the confidence of our customers,

00:05:57.020 --> 00:06:00.000
our owners, our bankers, for example,

00:06:00.630 --> 00:06:04.610
and that point in time at which we would encounter business failure

00:06:04.850 --> 00:06:07.980
can be called the maximum tolerable downtime.

00:06:08.600 --> 00:06:12.790
Sometimes we'll hear that called the maximum tolerable period of disruption.

00:06:13.480 --> 00:06:17.590
In the old days, we used to hear it called maximum allowable downtime.

00:06:17.940 --> 00:06:21.950
I think sometimes they change the name just to keep us all a little confused.

00:06:23.490 --> 00:06:27.180
So we look at all the business processes of the organization.

00:06:27.550 --> 00:06:30.360
We said that some are more critical than others,

00:06:30.660 --> 00:06:35.210
and we want to know what are the critical supporting processes for

00:06:35.210 --> 00:06:38.390
each of the critical business processes as well.

00:06:39.140 --> 00:06:41.240
We'll quite often then group them.

00:06:41.390 --> 00:06:44.710
There is no way to recover a business process without also

00:06:44.940 --> 00:06:47.240
recovering its supporting processes,

00:06:47.530 --> 00:06:51.770
so our recovery plan should look at recovering both of them,

00:06:51.890 --> 00:06:53.600
should we say, concurrently.

00:06:54.610 --> 00:06:56.490
We can say it simply this way,

00:06:56.490 --> 00:07:02.260
you cannot recover essential services without recovering supporting processes.

00:07:04.120 --> 00:07:08.220
One of the things we need to learn is what will our owners,

00:07:08.410 --> 00:07:13.120
what will regulators, and what will our customers tolerate?

00:07:13.750 --> 00:07:16.480
These would be tolerable levels of outage.

00:07:16.950 --> 00:07:20.740
We all know that, in some cases, the customer will say,

00:07:20.740 --> 00:07:24.410
oh yeah, sure, your systems are down, I'll call back in an hour.

00:07:25.060 --> 00:07:27.640
In other cases, we will lose the customer.

00:07:28.390 --> 00:07:33.180
So this is where we have to understand what our customers expect.

00:07:33.190 --> 00:07:37.040
Are there regulations that say we must provide a

00:07:37.040 --> 00:07:41.680
certain level of service bound by, say, government regulations?

00:07:42.240 --> 00:07:46.020
All of these can help us determine the point of business failure,

00:07:46.440 --> 00:07:51.090
something we called before the maximum tolerable downtime for those

00:07:51.090 --> 00:07:54.160
critical processes and their supporting processes.

00:07:55.390 --> 00:08:00.290
Then we want to determine what is our ideal time of recovery,

00:08:00.890 --> 00:08:04.740
and this is known as the recovery time objective, and we'll have

00:08:04.740 --> 00:08:08.270
different recovery time objectives for different processes.

00:08:09.630 --> 00:08:15.330
The requirement, of course, is that the recovery time objective must be,

00:08:15.340 --> 00:08:20.430
in fact, we could say, significantly less than the maximum tolerable downtime.

00:08:20.920 --> 00:08:24.530
I don't want to write a plan that would have me recover my critical

00:08:24.530 --> 00:08:27.900
business process an hour before the business would fail.
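
NOTE
A minimal Python sketch of the rule just described: each recovery time objective (RTO) should sit well below the maximum tolerable downtime (MTD).
The process names, the hour values, and the 50 percent safety margin are illustrative assumptions, not figures from the course.
```python
# Hypothetical BIA figures in hours; every name and number here is an assumption.
SAFETY_MARGIN = 0.5  # assumed policy: the RTO may use at most half of the MTD
processes = {
    "order_processing": {"mtd": 24, "rto": 8},
    "payroll": {"mtd": 120, "rto": 100},
}
def rto_is_acceptable(mtd, rto, margin=SAFETY_MARGIN):
    """Return True when the RTO leaves the assumed margin before the MTD."""
    return rto <= mtd * margin
# Flag any process whose planned recovery lands too close to business failure.
too_close = [name for name, p in processes.items()
             if not rto_is_acceptable(p["mtd"], p["rto"])]
print(too_close)
```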

00:08:29.660 --> 00:08:33.720
The other thing we have to look at is the recovery point objective,

00:08:34.169 --> 00:08:38.510
and I always call it the data recovery point objective because what

00:08:38.510 --> 00:08:43.140
this refers to is what is my data recovery point.

00:08:43.970 --> 00:08:47.730
I'm really saying that if I have a major interruption,

00:08:47.850 --> 00:08:50.560
how much data can I afford to lose?

00:08:50.920 --> 00:08:54.670
So really what this measures is the amount of data that

00:08:54.670 --> 00:09:00.260
can be lost in the case of an outage and how old the data

00:09:00.260 --> 00:09:02.020
would be when it's restored.

00:09:04.110 --> 00:09:06.390
When we look at the resource requirements,

00:09:06.390 --> 00:09:12.340
we need to identify what would be required in order to restore systems.

00:09:12.830 --> 00:09:16.940
Now that, as we said, also included some of our supporting processes,

00:09:16.940 --> 00:09:18.970
our dependencies,

00:09:19.260 --> 00:09:23.990
but also it includes things like the controls we put in

00:09:23.990 --> 00:09:28.030
place that could be added to try to make sure that this

00:09:28.040 --> 00:09:30.440
doesn't just happen again right away.

00:09:32.360 --> 00:09:36.000
So let's go back to that diagram we looked at before.

00:09:36.410 --> 00:09:41.750
The idea here of BIA was that we determine what was the level of

00:09:41.750 --> 00:09:45.890
impact over time until the point of business failure.

00:09:47.090 --> 00:09:52.480
Then we want to say, okay, what would it cost for us to recover the business?

00:09:53.060 --> 00:09:59.710
Now, the cost of recovery is often inversely related to the duration of the outage.

00:10:00.060 --> 00:10:04.400
In other words, I could have a very minimal amount of,

00:10:04.410 --> 00:10:09.190
should we say, outage time, but then the cost of the recovery is very high.

00:10:09.830 --> 00:10:12.620
So in most cases, instead,

00:10:12.620 --> 00:10:17.120
we will try to find more of that crossover point at which point

00:10:17.130 --> 00:10:20.180
we could say the cost of recovery is sort of,

00:10:20.850 --> 00:10:23.620
I should say, in line with the impact.

00:10:25.290 --> 00:10:29.830
This is where we want to set our recovery time objective.

00:10:29.910 --> 00:10:33.790
So we write plans to try to recover these critical

00:10:33.790 --> 00:10:37.290
business processes by this point in time.

00:10:38.500 --> 00:10:42.950
But when I recover, say after a fire that wiped out my head office,

00:10:43.670 --> 00:10:49.570
I have to go to my data backups and maybe I did data backups on a regular basis,

00:10:50.530 --> 00:10:57.130
but the time of the failure was not the same as the time of my last data backup.

00:10:57.830 --> 00:11:01.200
So I, when I rebuild my systems,

00:11:01.210 --> 00:11:05.820
am going to have to use the most recent backup I have,

00:11:06.090 --> 00:11:10.230
which quite simply means that quite likely all of the data

00:11:10.230 --> 00:11:13.170
from the time of the last backup until the time of the

00:11:13.170 --> 00:11:15.890
crisis will actually be lost data.
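
NOTE
A small Python sketch of the point above: data written between the last backup and the failure is lost, and that window can be compared with the recovery point objective (RPO).
The timestamps and the four-hour RPO are hypothetical values chosen only for illustration.
```python
from datetime import datetime, timedelta
# Hypothetical timestamps and RPO; none of these values come from the course.
last_backup = datetime(2024, 3, 1, 2, 0)     # assumed nightly backup at 02:00
failure_time = datetime(2024, 3, 1, 15, 30)  # assumed time of the crisis
rpo = timedelta(hours=4)                     # assumed recovery point objective
def data_loss_window(last_backup, failure_time):
    """Everything written between the last backup and the failure is lost."""
    return failure_time - last_backup
lost = data_loss_window(last_backup, failure_time)
meets_rpo = lost <= rpo  # False here means backups are too infrequent for this RPO
print(lost, meets_rpo)
```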

00:11:17.370 --> 00:11:22.600
All of this allows me to set out my priorities and plans for recovery.

00:11:23.100 --> 00:11:27.920
I establish the priorities for system recovery based on cost,

00:11:28.320 --> 00:11:31.470
as well as the level of impact to the business.

00:11:32.190 --> 00:11:36.020
And of course, I must have a plan which is feasible,

00:11:36.480 --> 00:11:42.050
not unrealistic; I can't recover a major system in a few minutes.

00:11:42.980 --> 00:11:45.470
It must be something which is acceptable,

00:11:45.790 --> 00:11:49.370
acceptable to, should we say, our customers,

00:11:49.370 --> 00:11:51.710
our owners, management,

00:11:52.340 --> 00:11:56.030
something which is suitable for the type of business we're in.

00:11:56.650 --> 00:12:00.840
And of course, this is something that quite often is a little bit contentious.

00:12:01.180 --> 00:12:03.180
We'll have a lot of different people think,

00:12:03.180 --> 00:12:08.040
well, my department is most important so you should recover my department first.

00:12:09.160 --> 00:12:10.150
In the end,

00:12:10.310 --> 00:12:13.880
we need to go back to senior management and hope that they will

00:12:13.880 --> 00:12:18.380
approve the actual choices we've made for which parts of the

00:12:18.380 --> 00:12:20.710
business should be recovered first.

00:12:22.470 --> 00:12:24.020
The key points review.

00:12:25.170 --> 00:12:28.420
It's business impact analysis that provides us the

00:12:28.420 --> 00:12:33.130
information we need in order to move ahead with selecting

00:12:33.130 --> 00:12:35.920
our recovery strategies and writing plans.

00:12:36.210 --> 00:12:40.240
It's critical to the business continuity planning process.

00:12:40.770 --> 00:12:45.200
It identifies all of the critical business processes,

00:12:45.690 --> 00:12:49.830
documents the resources required to restore those processes,

00:12:50.110 --> 00:12:55.390
and gives us now the ability to choose restoration timelines.

00:12:56.110 --> 00:12:59.510
It sets out our priorities, and through this,

00:12:59.520 --> 00:13:05.750
helps us to move on so we can write effective plans for business continuity.