WEBVTT

00:00:01.120 --> 00:00:03.580
Let's take a look at data disposition.

00:00:04.740 --> 00:00:08.060
The idea is that data has incredible value,

00:00:08.060 --> 00:00:11.960
not just on its own, but often in aggregate as well.

00:00:12.840 --> 00:00:19.070
We accumulate vast amounts of data within most organizations, and we can

00:00:19.070 --> 00:00:23.810
use that data to do different types of data analysis.

00:00:24.040 --> 00:00:27.180
These days, we often refer to that as things like

00:00:27.180 --> 00:00:30.030
machine learning and artificial intelligence.

00:00:30.440 --> 00:00:35.100
But the idea is that it's a form of data mining. It's searching and

00:00:35.100 --> 00:00:39.860
doing analysis of our data in order to support business functions such

00:00:39.860 --> 00:00:42.950
as marketing or doing research, for example.

00:00:43.640 --> 00:00:47.100
And one of the things that the cloud is especially good at

00:00:47.300 --> 00:00:52.080
is giving us a platform to host and

00:00:52.080 --> 00:00:54.460
process vast amounts of data.

00:00:55.040 --> 00:01:00.170
We even saw this, for example, in the contract awarded by the CIA in the

00:01:00.170 --> 00:01:05.940
United States. Because they had so much data that came in from all

00:01:05.940 --> 00:01:10.110
different sources and it needed to be processed quickly,

00:01:10.400 --> 00:01:16.450
they needed to do real‑time or at least near real‑time analysis. And

00:01:16.450 --> 00:01:21.580
for this, they found that using the algorithms and the capabilities

00:01:21.580 --> 00:01:24.360
of the cloud worked very well for that.

00:01:25.840 --> 00:01:30.420
You can be sure that's very much a private cloud though because the data is

00:01:30.420 --> 00:01:37.320
extremely sensitive. And so even though it has a vast amount of data, that's very

00:01:37.320 --> 00:01:41.190
much only for one customer with very limited access,

00:01:41.190 --> 00:01:45.580
both in a physical sense and, of course, in a network sense as

00:01:45.580 --> 00:01:53.430
well. The idea is that by using our data effectively, we are able to be

00:01:53.430 --> 00:01:56.760
more efficient and more effective in what we do.

00:01:57.340 --> 00:02:01.240
And the cloud works very well for collaboration,

00:02:01.250 --> 00:02:03.810
things like business intelligence,

00:02:03.820 --> 00:02:08.669
being able to tap into all of the information we have

00:02:08.669 --> 00:02:10.669
from different parts of the world.

00:02:10.680 --> 00:02:15.960
For example, even one large organization is always looking at weather.

00:02:16.240 --> 00:02:20.950
So even though they are a retail company that is selling products to

00:02:20.960 --> 00:02:27.280
customers, by feeding weather reports into their data analysis,

00:02:27.540 --> 00:02:32.440
they are able to predict what is going to be required in a local store

00:02:32.450 --> 00:02:35.660
based on the weather coming in over the next few days.

00:02:36.240 --> 00:02:42.430
So, the cloud has, in many ways, enabled this by being able to

00:02:42.430 --> 00:02:46.950
process and store vast amounts of data that were a little bit

00:02:46.960 --> 00:02:50.550
impractical for most organizations in the past.

00:02:52.130 --> 00:02:56.560
The challenge, of course, always is that when I have vast amounts of data,

00:02:56.940 --> 00:03:01.190
I need to know how to store it, process it, and

00:03:01.190 --> 00:03:03.630
in some cases, we'll need to archive it,

00:03:03.640 --> 00:03:08.180
keeping it maybe even off site for a number of years,

00:03:08.180 --> 00:03:14.690
depending on regulations. And that data that's stored offsite

00:03:14.690 --> 00:03:17.850
or archived also needs to be retrievable.

00:03:18.440 --> 00:03:22.760
That means that if I need to retrieve archived data, there should be

00:03:22.760 --> 00:03:26.960
clear procedures on how that data can be obtained.

00:03:27.640 --> 00:03:31.690
And this quite often can mean working with a cloud service provider.

00:03:32.240 --> 00:03:37.160
We have seen even how large cloud service providers are archiving

00:03:37.160 --> 00:03:45.150
vast amounts of data into systems that are not online as such and

00:03:45.150 --> 00:03:49.320
allow us to reduce the amount of data that's sitting there

00:03:49.320 --> 00:03:50.960
in production systems.

00:03:51.540 --> 00:03:55.920
But we need to work with a cloud service provider to enable that data access.

00:03:56.340 --> 00:04:00.000
Usually, it's going to be a lot slower than the data, of course, that

00:04:00.000 --> 00:04:04.910
is online all the time. But there should be some type of backup and

00:04:04.910 --> 00:04:06.950
archive schedule that's used for this.

00:04:06.960 --> 00:04:09.960
Do we automatically back everything up every day?

00:04:10.540 --> 00:04:15.550
Do we archive data that hasn't been accessed in 30 days or 60 days?

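NOTE
Editor's sketch, not part of the spoken lecture: one way the archive question above
could be written down as a rule. The 30-day threshold, the constant ARCHIVE_AFTER,
and the function name should_archive are assumptions chosen for illustration only.
```python
from datetime import datetime, timedelta
# Hypothetical threshold; the lecture mentions 30 or 60 days as candidates.
ARCHIVE_AFTER = timedelta(days=30)
def should_archive(last_accessed: datetime, now: datetime) -> bool:
    """Return True when the data has gone unaccessed for at least ARCHIVE_AFTER."""
    return now - last_accessed >= ARCHIVE_AFTER
```
A scheduled job could call should_archive for each object and move matches to slower storage.
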
00:04:16.040 --> 00:04:19.470
And these are things that we have to build into our data handling

00:04:19.470 --> 00:04:24.990
procedures. When it comes to data retention, we know that this is based a

00:04:24.990 --> 00:04:30.850
lot on what the business needs, but also, as we've seen, on legal

00:04:30.850 --> 00:04:35.670
requirements. We have a lot of data we may need to keep on our customers.

00:04:35.670 --> 00:04:41.920
Say we're a telecom, and we've had a customer

00:04:41.920 --> 00:04:47.890
living in a house for 40 years. We need to keep a record of their old bills,

00:04:47.890 --> 00:04:50.830
all our interactions with them, but we certainly don't

00:04:50.830 --> 00:04:52.370
want to keep that in production.

00:04:52.940 --> 00:04:54.670
We archive it off.

00:04:54.680 --> 00:04:57.960
But in most cases, we can't just delete that until that

00:04:57.960 --> 00:05:00.260
customer is no longer a customer.

00:05:00.640 --> 00:05:02.390
It's the same with health information.

00:05:02.390 --> 00:05:05.650
If we've provided health care for a person,

00:05:06.640 --> 00:05:11.500
we are usually required to keep that available as long as that person is alive.

00:05:11.660 --> 00:05:13.120
So there are,

00:05:13.130 --> 00:05:16.690
we can say here, considerations that have to be built into

00:05:16.690 --> 00:05:19.580
our archiving and retention process.

00:05:20.530 --> 00:05:23.180
There's one other term we should bring in here, and that's the

00:05:23.180 --> 00:05:26.910
term legal hold. Let's say, for example,

00:05:26.910 --> 00:05:32.850
I have a customer who wants to initiate some type of lawsuit because

00:05:32.850 --> 00:05:36.360
they're not happy with the service that my company provided.

00:05:37.140 --> 00:05:42.450
And the moment I'm advised that there could be a lawsuit coming,

00:05:42.940 --> 00:05:46.680
one of the things we're often required to do is immediately then

00:05:46.950 --> 00:05:52.230
ensure that any data related to that customer or that area that's

00:05:52.240 --> 00:05:57.970
subject to this type of court action is being preserved, and

00:05:57.970 --> 00:05:59.460
that's called legal hold.

00:05:59.940 --> 00:06:03.750
And, of course, that, in some cases, can mean that normally I would

00:06:03.750 --> 00:06:06.700
have deleted data I didn't need after three months.

00:06:07.240 --> 00:06:10.880
I might have to keep this for several years because it could be the

00:06:10.880 --> 00:06:15.550
subject of a pending type of court action, so that's a legal hold.

00:06:15.550 --> 00:06:20.000
And that's where we have to have what would be an exception to maybe

00:06:20.000 --> 00:06:21.850
our data deletion policy.

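NOTE
Editor's sketch, not part of the spoken lecture: how a legal hold might act as an
exception to a routine deletion policy. The 90-day RETENTION value stands in for the
"three months" mentioned above; the names may_delete and on_legal_hold are hypothetical.
```python
from datetime import date, timedelta
# Hypothetical routine policy: data is normally deleted after about three months.
RETENTION = timedelta(days=90)
def may_delete(created: date, today: date, on_legal_hold: bool) -> bool:
    """Allow deletion only when retention has expired and no legal hold applies."""
    if on_legal_hold:
        # A legal hold overrides the normal schedule, however old the data is.
        return False
    return today - created > RETENTION
```
The hold check comes first so that no later rule can accidentally permit deletion.
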
00:06:23.740 --> 00:06:27.640
When I have a paper document in my hand,

00:06:27.640 --> 00:06:30.850
it's quite easy to make sure I have defensible destruction.

00:06:31.240 --> 00:06:35.240
I stand, and I watch it go through a shredder, and I see that it's been

00:06:35.250 --> 00:06:39.530
ground up into little paper fragments and so on.

00:06:39.530 --> 00:06:44.750
I know it was destroyed. But when I've stored something on the cloud,

00:06:45.440 --> 00:06:49.770
how can I make sure that data was deleted? When we're dealing

00:06:49.770 --> 00:06:54.160
with secure destruction by a cloud service provider, it's much

00:06:54.160 --> 00:06:57.060
more difficult because I can't just do something like

00:06:57.070 --> 00:07:00.450
overwriting. When I go to overwrite,

00:07:00.840 --> 00:07:05.570
I don't even know if I'm overwriting to the same place my original data was.

00:07:05.570 --> 00:07:10.730
My data might have moved around many times within the various systems and

00:07:10.740 --> 00:07:14.650
hardware devices of the cloud service provider.

00:07:15.240 --> 00:07:20.360
And so therefore, I should ensure that my contract with the cloud

00:07:20.360 --> 00:07:25.000
service provider includes the requirement that all of their old hardware

00:07:25.000 --> 00:07:30.320
will be destroyed. When a piece of equipment is being replaced by

00:07:30.320 --> 00:07:31.860
the cloud service provider,

00:07:32.240 --> 00:07:36.940
it will actually be physically destroyed, and that, of course,

00:07:36.940 --> 00:07:42.560
can help ensure that any data that's on that will not be recoverable.

00:07:43.140 --> 00:07:47.360
Now, this becomes a little bit of an onerous task because

00:07:47.840 --> 00:07:51.250
there can be an awful lot of equipment, therefore, that needs

00:07:51.250 --> 00:07:53.370
to be destroyed on a regular basis.

00:07:53.820 --> 00:07:55.690
I know one cloud service provider,

00:07:55.690 --> 00:07:58.990
they replace a piece of hardware every two minutes around

00:07:58.990 --> 00:08:02.590
the world, and that's an awful lot of hardware then that has

00:08:02.590 --> 00:08:04.750
to be properly destroyed.

00:08:05.140 --> 00:08:09.960
And in a case like that, we've seen they've sometimes subcontracted that out.

00:08:10.640 --> 00:08:15.200
And we've seen several cases where the subcontracting company instead

00:08:15.200 --> 00:08:17.960
of destroying it was actually reselling it.

00:08:18.440 --> 00:08:25.450
So, defensible destruction means that I try to make sure that not only do I

00:08:25.450 --> 00:08:30.100
have a contract with a cloud service provider, but that that contract is

00:08:30.100 --> 00:08:36.250
being audited and followed to make sure that they are then living up to the

00:08:36.250 --> 00:08:42.200
terms of that contract as well. Hardware destruction is considered the most

00:08:42.200 --> 00:08:48.850
secure and, in fact, the only truly secure way to ensure data deletion.

00:08:48.850 --> 00:08:55.070
Overwriting, degaussing, purging, and clearing are not usually good enough because

00:08:55.080 --> 00:09:00.230
we could still have what we call data remnants, little bits of the data that

00:09:00.230 --> 00:09:05.020
remain on that piece of hardware, especially with solid state.

00:09:05.020 --> 00:09:09.920
With solid state, it is very difficult to truly clear the data that was on it,

00:09:09.920 --> 00:09:15.210
and that may be retrievable using advanced systems.

00:09:15.340 --> 00:09:19.830
We have seen this where we know that data has been recovered

00:09:19.830 --> 00:09:23.320
that's been overwritten more than 100 times using things like

00:09:23.330 --> 00:09:25.360
electron microscopes and so on.

00:09:26.440 --> 00:09:29.550
Obviously, it's good to overwrite when we can.

00:09:29.890 --> 00:09:34.670
It's good to degauss, but in the case of a cloud, it's very difficult to make

00:09:34.670 --> 00:09:38.260
sure I'm even overwriting to the same place my data was.

00:09:38.640 --> 00:09:42.890
And, of course, since I don't have access to the hardware, I can't

00:09:42.890 --> 00:09:47.500
actually ensure that the degaussing is being done or that the

00:09:47.510 --> 00:09:51.600
hardware that was degaussed is tested to make sure that the

00:09:51.600 --> 00:09:53.560
degaussing actually worked, either.

00:09:54.340 --> 00:09:59.840
So what are some of the options we have as a cloud consumer?

00:09:59.850 --> 00:10:04.810
How can I make sure my data cannot be recovered by the cloud provider?

00:10:04.810 --> 00:10:09.990
One of those is bit splitting. We protect data from compromise by

00:10:09.990 --> 00:10:16.070
encrypting the data, but then we split the encrypted data into pieces

00:10:16.070 --> 00:10:17.900
stored at different locations.

00:10:18.440 --> 00:10:23.650
So even if a person was able to get access to the data at one location,

00:10:23.730 --> 00:10:29.830
it would be incomplete data, and the unauthorized person will not

00:10:29.830 --> 00:10:32.660
be able to understand or reconstruct the data.

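NOTE
Editor's sketch, not part of the spoken lecture: the splitting half of bit splitting.
It assumes the input is already ciphertext produced by a vetted encryption library,
and it deals the bytes round-robin into one share per location. The function names
split_bytes and join_bytes are hypothetical, and real products split differently.
```python
def split_bytes(ciphertext: bytes, locations: int) -> list[bytes]:
    """Deal ciphertext bytes round-robin into one share per storage location."""
    return [ciphertext[i::locations] for i in range(locations)]
def join_bytes(shares: list[bytes]) -> bytes:
    """Interleave the shares back into the original ciphertext."""
    out = bytearray(sum(len(share) for share in shares))
    for i, share in enumerate(shares):
        # Share i holds every len(shares)-th byte, starting at offset i.
        out[i::len(shares)] = share
    return bytes(out)
```
Any single share holds only every n-th byte of the ciphertext, so one location alone is incomplete.
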
00:10:33.740 --> 00:10:38.130
We also have what we call erasure coding, data split into

00:10:38.130 --> 00:10:42.110
fragments and built out with redundant data.

00:10:42.640 --> 00:10:47.440
The idea of this is that if I need to reconstruct the data,

00:10:47.450 --> 00:10:52.660
I could do so by gathering a certain number of the fragments.

00:10:53.040 --> 00:10:59.370
We often say that, for example, if I broke it into 16 fragments, I would

00:10:59.370 --> 00:11:03.250
need 10 of those to be able to reconstruct the data.

00:11:03.640 --> 00:11:05.940
So that's what we call n of m.

00:11:06.180 --> 00:11:11.160
We need n number of fragments out of a total of m number of fragments.

00:11:11.840 --> 00:11:18.220
And so what I've done, I took the data. I broke it into 16 pieces, but

00:11:18.230 --> 00:11:23.040
all of those pieces kind of overlapped a bit. So as long as I had 10 of

00:11:23.040 --> 00:11:25.550
them, I could actually recover the data.

00:11:26.040 --> 00:11:32.810
This is also known as forward error correction. The all‑or‑nothing

00:11:32.810 --> 00:11:36.770
transform with Reed‑Solomon (AONT‑RS) is one example of this.

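NOTE
Editor's sketch, not part of the spoken lecture: a toy Reed-Solomon-style code that
shows the "10 of 16" idea on a single 10-byte block. It uses polynomial interpolation
over the prime field of size 257. The names P, encode, and decode are assumptions, and
this is neither AONT-RS nor a production erasure code.
```python
P = 257  # smallest prime above 255, so every byte value is a field element
def _interpolate(points: dict[int, int], x: int) -> int:
    """Evaluate at x the unique polynomial through the given points, modulo P."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total
def encode(block: bytes, m: int) -> list[int]:
    """Turn n data bytes into m fragment symbols; the first n equal the data."""
    points = dict(enumerate(block))
    return [_interpolate(points, x) for x in range(m)]
def decode(fragments: dict[int, int], n: int) -> bytes:
    """Rebuild the n data bytes from any n surviving {index: symbol} fragments."""
    if len(fragments) < n:
        raise ValueError("not enough fragments to reconstruct the block")
    points = dict(list(fragments.items())[:n])
    return bytes(_interpolate(points, x) for x in range(n))
```
With n = 10 and m = 16, any six fragments can be lost and the block is still recoverable.
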
00:11:38.260 --> 00:11:42.120
Cryptographic erasure, sometimes called

00:11:42.120 --> 00:11:47.660
crypto‑shredding, is the idea that what I will do as a cloud consumer

00:11:48.040 --> 00:11:51.660
is ensure that all of my data on the cloud is encrypted.

00:11:52.140 --> 00:11:55.350
I never store anything on the cloud without it being encrypted

00:11:55.350 --> 00:12:00.900
first. I keep the key. And let's say I leave this cloud provider

00:12:00.900 --> 00:12:02.650
to go to a different cloud provider.

00:12:03.440 --> 00:12:11.300
Then all I need to do is retrieve my data off of the cloud provider, decrypt it,

00:12:11.350 --> 00:12:14.930
and then when I load it on the new cloud provider's systems,

00:12:15.030 --> 00:12:19.550
I encrypt it with a different key, then I destroy the old key.

00:12:19.940 --> 00:12:21.650
And what that means, of course,

00:12:21.650 --> 00:12:26.440
is all of that data on the original cloud provider would not be recoverable.

00:12:26.620 --> 00:12:32.580
The actual key has been lost, and it would not be computationally feasible

00:12:32.830 --> 00:12:36.350
with today's computing capabilities to recover the data.

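NOTE
Editor's sketch, not part of the spoken lecture: the crypto-shredding flow in miniature.
The cipher here is a TOY stream cipher built from SHA-256 so the example needs only the
standard library. A real deployment would use a vetted cipher such as AES-GCM and would
destroy the key inside a key management system. KeyVault and keystream_xor are hypothetical names.
```python
import hashlib
import secrets
def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher (SHA-256 in counter mode); the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)
class KeyVault:
    """Hypothetical consumer-side key store; the cloud provider never sees the key."""
    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)
    def key(self) -> bytes:
        if self._key is None:
            raise RuntimeError("key has been destroyed")
        return self._key
    def shred(self) -> None:
        # Dropping the reference stands in for real key destruction.
        self._key = None
```
Once shred() has run, ciphertext left behind at the old provider can no longer be decrypted with this vault.
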
00:12:38.440 --> 00:12:41.550
Let's review the key points. We've looked here at your data

00:12:41.550 --> 00:12:44.380
protection. We know that this is a challenge,

00:12:44.380 --> 00:12:46.860
especially when we're dealing with a cloud service provider

00:12:47.340 --> 00:12:50.850
because we don't know what they actually do.

00:12:51.440 --> 00:12:54.130
We hopefully try to protect ourselves through

00:12:54.130 --> 00:12:56.400
contracts and service‑level agreements.

00:12:57.040 --> 00:12:58.200
But in the end,

00:12:58.210 --> 00:13:01.850
things like hardware destruction are

00:13:01.850 --> 00:13:04.860
really beyond our direct control.

00:13:05.640 --> 00:13:09.840
We know that contracts and SLAs can specify what should be done,

00:13:10.020 --> 00:13:13.650
but they don't guarantee that that was followed.

00:13:14.040 --> 00:13:18.720
So therefore, I should try to ensure secure data deletion

00:13:18.720 --> 00:13:22.480
maybe through audits or reviews of the cloud service

00:13:22.480 --> 00:13:26.220
provider's systems using some type of, for example,

00:13:26.220 --> 00:13:32.550
an SSAE 18 audit or other type of agreed evaluation scheme.
