WEBVTT

00:00:01.140 --> 00:00:05.760
Hashing is closely related to encryption, but not quite the same thing.

00:00:06.240 --> 00:00:10.200
Often, we'll use hashing together with encryption, but

00:00:10.200 --> 00:00:13.040
hashing for the most part is about integrity,

00:00:13.200 --> 00:00:15.170
not about confidentiality,

00:00:15.420 --> 00:00:19.010
even though we do hash things like a password in order to

00:00:19.010 --> 00:00:21.360
maintain the confidentiality of the password.

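NOTE
Illustrative sketch of hashing a password, assuming Python's standard hashlib with PBKDF2-HMAC-SHA256 and a random per-user salt. The iteration count and the names hash_password and verify_password are hypothetical choices, not taken from the course. Only the salt and digest are stored. Because the hash cannot be reversed, a login attempt is checked by hashing it again and comparing the results.
```python
import hashlib
import hmac
import secrets
from typing import Tuple
ITERATIONS = 600_000  # hypothetical work factor; higher values slow down brute-force guessing
def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Store both; never store the password itself."""
    salt = secrets.token_bytes(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest
def verify_password(attempt: str, salt: bytes, digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", attempt.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```
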
00:00:22.140 --> 00:00:26.060
The advantage, of course, of a hash is that it's a one‑way function.

00:00:26.740 --> 00:00:32.759
We can go from a message, create a hash or digest of that message that

00:00:33.140 --> 00:00:36.860
is a compact, fixed-length fingerprint of that message.

00:00:37.340 --> 00:00:42.090
Even if one bit in the original message were to change,

00:00:42.160 --> 00:00:45.560
the hash would change dramatically; on average, about half of its bits flip.

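NOTE
Illustrative sketch of that sensitivity, assuming SHA-256 from Python's hashlib. It flips a single bit of a made-up message and counts how many of the 256 output bits change. The helper bit_difference is hypothetical. The exact count depends on the input, but typically about half of the bits differ.
```python
import hashlib
def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many bits differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
original = b"Transfer $100 to account 12345"
tampered = bytearray(original)
tampered[0] ^= 0x01  # flip one bit in the first byte
h1 = hashlib.sha256(original).digest()
h2 = hashlib.sha256(bytes(tampered)).digest()
changed = bit_difference(h1, h2)
print(f"{changed} of {len(h1) * 8} output bits changed ({changed / (len(h1) * 8):.0%})")
```
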
00:00:46.340 --> 00:00:49.030
The advantage of hashing is that it's not reversible.

00:00:49.120 --> 00:00:50.260
It's only one way.

00:00:50.740 --> 00:00:54.000
So if a person has a hash, that's not going to tell them

00:00:54.000 --> 00:01:00.320
anything about the source message. In this way, hashing becomes

00:01:00.320 --> 00:01:02.760
a critical piece in the security chain,

00:01:03.140 --> 00:01:08.280
working primarily to ensure integrity and authenticity,

00:01:08.450 --> 00:01:13.950
and to protect secrets such as a password, but as we

00:01:13.950 --> 00:01:17.960
say, it's not really encryption. With encryption, we

00:01:17.960 --> 00:01:21.530
can encrypt and then decrypt, in what we call a two‑way

00:01:21.530 --> 00:01:25.560
function. In the case of hashing, it's a one‑way function.

00:01:27.440 --> 00:01:32.960
One of the main benefits of hashing is that it's very sensitive to any change

00:01:33.440 --> 00:01:38.020
in a document, file, database or other entity. It's fast,

00:01:38.440 --> 00:01:44.550
freely available, and, in many ways, essential in the cloud

00:01:45.040 --> 00:01:49.450
because we need to protect the data we both transmit and store.

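NOTE
Illustrative sketch of hashing for integrity, assuming SHA-256 from Python's hashlib: a digest recorded when data is stored or sent is recomputed later and compared. The helper names fingerprint and is_unchanged are hypothetical. A bare hash only helps if the recorded digest is itself protected, since anyone able to alter the data could also recompute the digest; a digital signature closes that gap.
```python
import hashlib
import hmac
def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()
def is_unchanged(data: bytes, expected_hex: str) -> bool:
    """Recompute the digest and compare it with the one recorded earlier."""
    return hmac.compare_digest(fingerprint(data), expected_hex)
record = b"customer_id=42;balance=1000"
stored_digest = fingerprint(record)         # kept alongside the data, or sent with it
print(is_unchanged(record, stored_digest))  # True
print(is_unchanged(b"customer_id=42;balance=9000", stored_digest))  # False
```
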
00:01:50.240 --> 00:01:55.230
We combine it with asymmetric, or public-key, cryptography to create

00:01:55.230 --> 00:02:00.480
digital signatures that we looked at before. A digital signature proves the

00:02:00.480 --> 00:02:07.820
message was not altered and who it came from. Closely related to this are

00:02:07.820 --> 00:02:10.960
the areas of masking and obfuscation.

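NOTE
Illustrative sketch of the hash-then-sign idea behind digital signatures: the message is hashed with SHA-256, and only the hash is signed with the private key. This is deliberately toy "textbook" RSA built from two known primes, with no padding and a tiny key, so it is NOT secure; real systems should use a vetted library and a scheme such as RSA-PSS, ECDSA or Ed25519. All names here are hypothetical.
```python
import hashlib
# Toy RSA key from two known Mersenne primes; far too small and unpadded for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (modular inverse needs Python 3.8+)
def digest_int(message: bytes) -> int:
    """Hash the message and reduce the SHA-256 digest into the RSA modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N
def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can produce this value."""
    return pow(digest_int(message), D, N)
def verify(message: bytes, signature: int) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest_int(message)
msg = b"Pay the invoice of 250 EUR"
sig = sign(msg)
print(verify(msg, sig))                            # True
print(verify(b"Pay the invoice of 950 EUR", sig))  # False
```
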
00:02:12.340 --> 00:02:16.260
This is the ability to hide sensitive data so it's not

00:02:16.260 --> 00:02:18.760
going to be disclosed. For example,

00:02:18.760 --> 00:02:23.010
we'll often replace a password or payment card number

00:02:23.020 --> 00:02:25.560
with meaningless characters,

00:02:25.570 --> 00:02:31.930
such as dots or hash marks. It's also used to protect data in both

00:02:31.930 --> 00:02:36.460
the use and sharing phases of the cloud data lifecycle.

00:02:37.640 --> 00:02:41.450
In this way, the real data may be stored, for

00:02:41.450 --> 00:02:45.850
example, in the database, but the user only sees the

00:02:45.850 --> 00:02:49.260
masked or obfuscated value.

00:02:51.040 --> 00:02:54.320
We often do this when we display sensitive data

00:02:54.330 --> 00:02:58.360
on a screen or, for example, in a report.

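NOTE
Illustrative sketch of display-time masking, assuming the real card number stays in the database and only the masked string reaches a screen or report. The function mask_pan and the choice to leave the last four digits visible are hypothetical conventions, not taken from the course. This is masking, not hashing: the hidden digits are simply replaced with a fixed character.
```python
def mask_pan(pan: str, visible: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `visible` digits with mask_char for display."""
    digits = "".join(ch for ch in pan if ch.isdigit())  # drop spaces and dashes
    if visible <= 0 or len(digits) <= visible:
        return mask_char * len(digits)  # no safe suffix to show, so hide every digit
    return mask_char * (len(digits) - visible) + digits[-visible:]
stored_value = "4111 1111 1111 1111"  # real value, kept in the database
print(mask_pan(stored_value))         # ************1111
```
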
00:03:00.140 --> 00:03:04.230
Anonymization removes the personal identifiers,

00:03:04.230 --> 00:03:07.150
the sensitive data within a record.

00:03:07.360 --> 00:03:12.750
In other words, we make the data no longer personally identifiable, so we

00:03:12.750 --> 00:03:16.780
remove the personally identifiable information, or PII.

00:03:16.780 --> 00:03:20.280
We can do this a few different ways.

00:03:20.280 --> 00:03:24.550
We can simply replace the personal information with something random,

00:03:24.550 --> 00:03:30.310
or, very often, we'll do an algorithmic substitution. We'll use

00:03:30.310 --> 00:03:35.090
some type of formula to substitute the values. That would

00:03:35.090 --> 00:03:39.370
allow us to go back, if needed, as long as the formula is reversible. If we had done random

00:03:39.370 --> 00:03:43.550
substitution, we may never be able to determine what the real value

00:03:43.550 --> 00:03:45.250
was at a later time.

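NOTE
Illustrative sketch contrasting the two substitution approaches, assuming digit-string identifiers. Random substitution cannot be undone unless a mapping table is kept. The algorithmic version uses a toy keyed affine formula, y = (A*x + B) mod 10**k, which whoever knows A and B can invert; it is trivially breakable and only shows the idea. A production system would more likely use format-preserving encryption (for example NIST FF1) or a protected lookup table. The function names and key values are hypothetical.
```python
import secrets
def random_substitute(value: str) -> str:
    """Random substitution: same length, but no way back unless a mapping is stored."""
    return "".join(secrets.choice("0123456789") for _ in value)
# Hypothetical secret key for the toy affine formula y = (A*x + B) mod 10**k.
# A shares no factor with 10 (it is odd and not a multiple of 5), so the formula is invertible.
A, B = 7919, 104729
def algorithmic_substitute(value: str) -> str:
    """Deterministic substitution: the same input always gives the same output."""
    modulus = 10 ** len(value)
    return str((A * int(value) + B) % modulus).zfill(len(value))
def reverse_substitute(substituted: str) -> str:
    """Undo the formula with the modular inverse of A; requires knowing the key."""
    modulus = 10 ** len(substituted)
    a_inverse = pow(A, -1, modulus)  # Python 3.8+
    return str((a_inverse * (int(substituted) - B)) % modulus).zfill(len(substituted))
original_id = "123456789"
print(random_substitute(original_id))  # a new random value on each call
pseudonym = algorithmic_substitute(original_id)
print(pseudonym, reverse_substitute(pseudonym) == original_id)  # 654416820 True
```
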
00:03:46.840 --> 00:03:50.700
One thing we always have to watch for, as we discussed before,

00:03:50.700 --> 00:03:56.350
is the risk of deanonymization. Will someone be able to

00:03:56.350 --> 00:04:01.770
figure out who that identifier relates to in,

00:04:01.770 --> 00:04:02.520
for example,

00:04:02.520 --> 00:04:10.370
a small sample set? Masking and anonymization can be done at different layers of the cloud. For example,

00:04:10.370 --> 00:04:14.730
in a Software as a Service application, it could be done in the services

00:04:14.730 --> 00:04:19.649
offered by the cloud service provider, or it can be built into the

00:04:19.649 --> 00:04:25.230
applications that we either custom build or we purchase to run in a

00:04:25.230 --> 00:04:27.660
Platform as a Service type of environment.

00:04:28.540 --> 00:04:33.850
In this case, it could be managed by the cloud consumer instead of the provider.

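NOTE
Illustrative sketch of checking for the deanonymization risk in a small sample set, using a common measure called k-anonymity: the size of the smallest group of records that share the same combination of quasi-identifiers (fields such as ZIP code and age band that are not names but can still single someone out). The records and field names are hypothetical. A group of size 1 means that person is unique on those fields and may be re-identified by linking with other data.
```python
from collections import Counter
# Hypothetical records with names removed; ZIP code and age band remain as quasi-identifiers.
records = [
    {"zip": "10001", "age_band": "30-39", "diagnosis": "flu"},
    {"zip": "10001", "age_band": "30-39", "diagnosis": "asthma"},
    {"zip": "10001", "age_band": "40-49", "diagnosis": "flu"},
    {"zip": "94105", "age_band": "30-39", "diagnosis": "diabetes"},
]
QUASI_IDENTIFIERS = ("zip", "age_band")
def smallest_group(rows, keys) -> int:
    """Return k: the size of the smallest group sharing identical quasi-identifier values."""
    groups = Counter(tuple(row[key] for key in keys) for row in rows)
    return min(groups.values())
k = smallest_group(records, QUASI_IDENTIFIERS)
print(f"k = {k}")  # k = 1
if k < 2:
    print("At least one record is unique on these fields and could be re-identified.")
```
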
00:04:34.740 --> 00:04:37.000
In the case of Infrastructure as a Service,

00:04:37.160 --> 00:04:42.820
this is the responsibility of the cloud consumer. Tokenization is

00:04:42.820 --> 00:04:46.430
something we've used for a long time, where we replace sensitive

00:04:46.430 --> 00:04:51.660
information with a value that is worthless on its own and gives no indication of

00:04:51.660 --> 00:04:53.760
what the sensitive information is.

00:04:54.440 --> 00:04:55.360
For example,

00:04:55.740 --> 00:05:01.650
we see this a lot with payment card information. If I go up

00:05:01.650 --> 00:05:06.420
to a petrol pump, or gas pump, and want to buy gasoline,

00:05:06.420 --> 00:05:14.420
then the pump application talks to the bank, transmits the credit card

00:05:14.420 --> 00:05:21.500
information to the bank, the bank processes that, and sends a record of this

00:05:21.500 --> 00:05:26.350
transaction back to the gas station merchant. But the record back to the

00:05:26.350 --> 00:05:31.890
merchant does not contain the credit card number. It only contains a token

00:05:31.890 --> 00:05:38.710
value, and only the bank knows which credit card is associated with that

00:05:38.860 --> 00:05:39.910
token value.

00:05:40.080 --> 00:05:44.000
So it does allow a cross-reference back to the original value,

00:05:44.010 --> 00:05:46.350
but only by authorized parties.

00:05:48.140 --> 00:05:50.890
We have often done this with payment card information,

00:05:50.890 --> 00:05:55.580
so that you can process a credit card,

00:05:55.580 --> 00:05:56.450
for example,

00:05:56.460 --> 00:05:59.930
without the merchant ever actually having or knowing

00:05:59.940 --> 00:06:02.060
what that credit card number was.

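NOTE
Illustrative sketch of tokenization with a token vault, mirroring the gas-pump example: the vault (the bank's role) issues a random token, the merchant keeps only the token, and only the vault can map it back to the card number. The class TokenVault and the tok_ prefix are hypothetical; real payment tokenization services add access control and auditing, and often use format-preserving tokens. Unlike a hash, the token has no mathematical link to the card number, so recovery depends entirely on the vault.
```python
import secrets
class TokenVault:
    """Toy token vault; in the gas-pump example this role belongs to the bank."""
    def __init__(self):
        self._token_to_pan = {}  # token string mapped to the real card number
    def tokenize(self, pan: str) -> str:
        """Issue a random token with no mathematical relationship to the card number."""
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_pan:  # regenerate on the (very unlikely) collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_pan[token] = pan
        return token
    def detokenize(self, token: str) -> str:
        """Only code with access to the vault can map a token back; unknown tokens raise KeyError."""
        return self._token_to_pan[token]
bank = TokenVault()
token = bank.tokenize("4111111111111111")
merchant_record = {"amount": "45.20", "card": token}  # the merchant stores only the token
print(merchant_record["card"].startswith("tok_"))     # True
print(bank.detokenize(token) == "4111111111111111")   # True
```
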
00:06:03.540 --> 00:06:04.760
Let's review the key points.

00:06:06.040 --> 00:06:10.660
The protection of data in the cloud is the responsibility of the cloud consumer,

00:06:11.140 --> 00:06:12.060
the data owner.

00:06:12.640 --> 00:06:16.750
And it is important that steps are taken to ensure

00:06:17.140 --> 00:06:23.160
that when I put my data in the cloud, it is properly protected.

00:06:23.940 --> 00:06:29.650
It needs to be protected at all times using tools like the ones we've looked at here:

00:06:30.040 --> 00:06:32.790
information rights management,

00:06:32.870 --> 00:06:38.350
data leakage or data loss prevention, and cryptographic functions.
