WEBVTT

00:00:00.980 --> 00:00:04.560
One of the essential resources required to recover our

00:00:04.570 --> 00:00:07.220
IT systems today is really data,

00:00:07.570 --> 00:00:12.110
and we have to be prepared so we are able to recover

00:00:12.110 --> 00:00:14.770
the data if we have a major outage.

00:00:14.780 --> 00:00:18.110
That means we have some type of data preservation plan.

00:00:20.120 --> 00:00:24.050
We use the term recovery point objective.

00:00:24.540 --> 00:00:28.510
The recovery point objective defines how much data we can afford to lose

00:00:28.850 --> 00:00:32.970
and that recovery point objective determines,

00:00:32.970 --> 00:00:38.120
or in many cases, influences or drives what our backup strategy should be.

00:00:38.810 --> 00:00:42.940
Do we back up our data to the cloud so that it would be available from

00:00:42.940 --> 00:00:46.530
an offsite location if our head office burned down?

00:00:47.240 --> 00:00:51.370
Do we actually have some type of storage area network with say

00:00:51.370 --> 00:00:55.910
internal hard drives or some type of removable storage that we

00:00:55.910 --> 00:00:59.330
could move to a secure location, say every day?

00:01:00.080 --> 00:01:03.790
Do we mirror our data on two different

00:01:03.790 --> 00:01:08.600
systems, maybe even in geographically dispersed locations?

00:01:09.450 --> 00:01:11.040
Do we take all of our data,

00:01:11.040 --> 00:01:16.170
say once an hour, and write it out to an electronic vault, or

00:01:16.170 --> 00:01:19.720
maybe every 1,000 transactions, so it goes offsite?

00:01:20.050 --> 00:01:22.100
Then if there was a problem with the primary site,

00:01:22.100 --> 00:01:25.880
the most I would ever lose is the 1,000 transactions that

00:01:25.890 --> 00:01:28.750
have happened since the last time I vaulted.
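
NOTE
Editor's sketch, not part of the spoken lecture: a rough way to check a vaulting
interval against a recovery point objective. The function names and the
transaction rate are hypothetical; only the 1,000-transaction interval comes
from the talk.
```python
# Hypothetical helpers: names and figures are illustrative assumptions.
def worst_case_loss(vault_every_n_transactions, transactions_per_hour):
    """Return (transactions, minutes) that could be lost just before the next vault."""
    minutes = vault_every_n_transactions / transactions_per_hour * 60
    return vault_every_n_transactions, minutes
def meets_rpo(vault_every_n_transactions, transactions_per_hour, rpo_minutes):
    """True if the worst-case loss window fits inside the recovery point objective."""
    _, minutes = worst_case_loss(vault_every_n_transactions, transactions_per_hour)
    return minutes <= rpo_minutes
```
For example, meets_rpo(1000, 4000, 60) asks whether vaulting every 1,000
transactions at an assumed 4,000 transactions per hour stays inside a one-hour RPO.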

00:01:29.600 --> 00:01:34.460
When we deal with databases, whenever we make a change to a database,

00:01:34.460 --> 00:01:39.030
we write a little journal entry that allows us to recover

00:01:39.030 --> 00:01:41.570
the actual changes made to the database,

00:01:41.580 --> 00:01:44.690
even if the database is corrupted or fails.

00:01:45.230 --> 00:01:49.760
The thing is that if that journal is just kept on the same system that failed,

00:01:49.940 --> 00:01:51.970
it's probably going to be lost as well.

00:01:52.280 --> 00:01:55.500
So we write that journal out to another location.

00:01:55.820 --> 00:01:59.190
We'll take a full database backup on a regular basis,

00:01:59.250 --> 00:02:03.350
and we can apply those journals to bring the database

00:02:03.350 --> 00:02:08.440
right up to the time of the failure, minimizing the amount of actual data loss.
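
NOTE
Editor's sketch, not part of the spoken lecture: restoring from a full backup and
then replaying journal entries, modeled in memory. The dict-based database, the
(key, value) journal format, and the use of None for deletions are all
assumptions made for illustration; real database journals are far richer.
```python
# Hypothetical in-memory model: a "database" is a dict, a journal entry is (key, value).
def restore(full_backup, journal):
    """Rebuild the database from the last full backup plus journaled changes."""
    database = dict(full_backup)      # start from a copy of the full backup
    for key, value in journal:        # replay changes in the order they were made
        if value is None:             # assumption: None marks a deletion
            database.pop(key, None)
        else:
            database[key] = value
    return database
```
The closer the offsite journal is to the moment of failure, the less data the
replay leaves unrecovered.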

00:02:09.380 --> 00:02:12.490
We want to build our systems to be resilient.

00:02:12.970 --> 00:02:15.130
Quite often, that means fault tolerant.

00:02:15.320 --> 00:02:17.680
We put in things like, for example,

00:02:17.680 --> 00:02:23.660
duplication and redundancy of equipment and networks so that if one fails,

00:02:24.430 --> 00:02:26.620
the others will still be able to keep going,

00:02:26.780 --> 00:02:29.200
and one of the solutions to that is a cluster.

00:02:29.200 --> 00:02:32.555
Maybe I have a number of servers working together

00:02:32.665 --> 00:02:35.220
and all of them sharing the load.

00:02:35.220 --> 00:02:36.520
If one goes down,

00:02:36.525 --> 00:02:39.845
the others just keep on processing, and there should be very

00:02:39.845 --> 00:02:43.095
minimal impact on our customers and users.
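
NOTE
Editor's sketch, not part of the spoken lecture: servers in a cluster sharing load,
with the survivors absorbing the work when one goes down. The round-robin policy
and the node and request names are hypothetical; real clusters use their own
scheduling and health checks.
```python
# Hypothetical cluster model; node names and the round-robin policy are assumptions.
def assign(requests, nodes, down=()):
    """Spread requests round-robin over the nodes that are still up."""
    healthy = [n for n in nodes if n not in down]
    if not healthy:
        raise RuntimeError("no healthy nodes left in the cluster")
    return {r: healthy[i % len(healthy)] for i, r in enumerate(requests)}
```
Calling assign with down={"b"} sends every request to the remaining nodes, so
users still get served.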

00:02:43.980 --> 00:02:46.540
We build high‑availability systems,

00:02:46.800 --> 00:02:50.280
systems where we've built in the ability to fail over

00:02:50.390 --> 00:02:53.070
if a piece of equipment fails, for example.

00:02:53.720 --> 00:02:58.270
We also make sure we have the appropriate levels of quality of

00:02:58.270 --> 00:03:01.090
service, which ensures that we have the bandwidth

00:03:01.090 --> 00:03:05.840
and the storage we need to actually handle our processing.

00:03:07.520 --> 00:03:10.390
In summary, in this module,

00:03:10.400 --> 00:03:14.050
we set out the foundation for continuity of operations.

00:03:14.220 --> 00:03:19.690
Our goal is to ensure the organization is prepared to deal with and

00:03:19.690 --> 00:03:23.800
manage disruption to the business mission and operations.

00:03:24.510 --> 00:03:28.760
This is so that we can sustain the critical business

00:03:28.760 --> 00:03:32.880
operations through proper preparation and planning.