WEBVTT

00:00:00.940 --> 00:00:02.960
When we look at data classification,

00:00:02.960 --> 00:00:07.750
it's really no different in the cloud than it is within any organization.

00:00:08.240 --> 00:00:12.500
But we're dealing with a third party, so it's important that all of the

00:00:12.500 --> 00:00:18.780
parties to a contract know what the appropriate and proper protection of

00:00:18.780 --> 00:00:24.920
data is, and the classification is a way to indicate that. The purpose of

00:00:24.920 --> 00:00:30.730
data classification is to ensure that data is provided an appropriate

00:00:30.960 --> 00:00:32.320
level of protection.

00:00:32.409 --> 00:00:36.460
We could also say an adequate level of protection.

00:00:37.540 --> 00:00:41.460
Classification is sometimes also known as categorization.

00:00:41.840 --> 00:00:45.890
It is the responsibility of the data owner.

00:00:46.210 --> 00:00:50.570
Even though the data owner may never do the classification themselves,

00:00:50.810 --> 00:00:55.690
they are responsible to ensure that it is being done in the proper

00:00:55.690 --> 00:01:00.710
manner by those to whom that task has been delegated.

00:01:02.030 --> 00:01:05.830
Classification is based, first of all, on business needs.

00:01:06.070 --> 00:01:11.280
We have certain business elements that need data to be

00:01:11.280 --> 00:01:15.880
classified because it should not be released or disclosed to unauthorized

00:01:15.880 --> 00:01:19.790
people because it's important for business operations.

00:01:20.250 --> 00:01:20.870
For example,

00:01:20.870 --> 00:01:24.360
we can have ongoing projects and things that are happening

00:01:24.640 --> 00:01:27.370
that we don't want people to know about, such as part of our new

00:01:27.370 --> 00:01:29.860
marketing plan or that sort of thing.

00:01:30.240 --> 00:01:35.320
We also have different business functions, and some business requires

00:01:35.320 --> 00:01:39.490
the processing of personally identifiable information or even

00:01:39.610 --> 00:01:45.210
protected health information, and we need to identify where we are

00:01:45.210 --> 00:01:48.120
using data that does require protection.

00:01:49.040 --> 00:01:52.620
But classification is also based very

00:01:52.620 --> 00:01:55.160
much on laws and regulations.

00:01:55.280 --> 00:02:00.290
We have laws that specify what data, and how that data, must

00:02:00.300 --> 00:02:03.860
be protected. For the most part,

00:02:03.860 --> 00:02:10.250
data protection comes under two categories: sensitive and critical. Sensitive

00:02:10.250 --> 00:02:16.590
data is data that could harm an organization or an individual if that data

00:02:16.590 --> 00:02:21.020
was improperly modified or improperly disclosed.

00:02:21.310 --> 00:02:26.980
So really, sensitivity is a measure of the level of confidentiality and

00:02:26.980 --> 00:02:33.540
integrity protections we need to build into the data. Criticality deals

00:02:33.540 --> 00:02:39.800
with how much the business requires that information to be available in

00:02:39.800 --> 00:02:42.060
order for a business function to work.

00:02:42.440 --> 00:02:46.020
And so criticality is usually related to availability.

00:02:47.240 --> 00:02:52.360
Organizations are different, departments are different, systems are different.

00:02:52.740 --> 00:02:57.380
So it could well be that one piece of information is very

00:02:57.380 --> 00:03:01.170
critical in one system, but that same piece of information

00:03:01.170 --> 00:03:04.230
is just nice to have in another.

00:03:04.800 --> 00:03:10.260
But then we have to look at protecting that data appropriately so

00:03:10.260 --> 00:03:14.200
that when it does require a high level of protection on one

00:03:14.200 --> 00:03:18.680
system, that protection is

00:03:18.680 --> 00:03:21.660
provided consistently on other systems as well.

00:03:22.740 --> 00:03:26.540
There is always debate about how many levels of data

00:03:26.540 --> 00:03:31.000
classification an organization should have, and that is something

00:03:31.000 --> 00:03:33.450
that has to be determined by the data owner.

00:03:34.040 --> 00:03:37.850
It could well be that three levels are sufficient, whereas

00:03:37.850 --> 00:03:40.160
some organizations need more.

00:03:40.740 --> 00:03:44.460
The important thing is that each level should be distinct.

00:03:44.840 --> 00:03:50.260
The handling requirements should be clearly articulated

00:03:50.260 --> 00:03:52.950
or documented for each of the levels.

00:03:53.120 --> 00:03:58.260
It should be easy for one person to know what the classification

00:03:58.260 --> 00:04:02.660
requirements are on, say, another system.

00:04:03.140 --> 00:04:05.800
So it should not be a case of, well, I'm not sure if this should

00:04:05.800 --> 00:04:09.760
be considered to be secret or top secret or business

00:04:09.760 --> 00:04:11.900
private or business confidential.

00:04:12.220 --> 00:04:12.440
No,

00:04:12.440 --> 00:04:16.769
there should be clear guidelines that specify the

00:04:16.769 --> 00:04:20.149
criteria for each of the levels of classification.

00:04:20.740 --> 00:04:24.250
We could say those are documented parameters.

00:04:24.480 --> 00:04:28.760
And in some cases, the default should be to go to a higher

00:04:28.760 --> 00:04:32.640
level of classification rather than a lower, but that depends

00:04:32.640 --> 00:04:34.740
a lot on the type of organization,

00:04:35.030 --> 00:04:39.020
depends on laws, and depends, in many cases, on the

00:04:39.020 --> 00:04:41.850
attitude of the data owner as well.

00:04:43.340 --> 00:04:47.030
The process for data classification starts with something we looked at

00:04:47.030 --> 00:04:50.450
at the beginning of this course, data identification.

00:04:51.040 --> 00:04:55.790
This is a stumbling block for many organizations that don't really

00:04:55.790 --> 00:05:00.060
even know what data they have or where it is.

00:05:00.640 --> 00:05:05.640
So, it is important that we not only identify what data we have,

00:05:05.650 --> 00:05:06.660
but also where it is.

00:05:07.150 --> 00:05:08.670
And that is the data mapping.

00:05:08.810 --> 00:05:10.870
We do data flow diagrams,

00:05:10.870 --> 00:05:15.230
how does it go from this system to another system? We do flow charts to

00:05:15.230 --> 00:05:21.360
see the path data takes through the organization. And that information

00:05:21.360 --> 00:05:27.840
provides what's required for the data owner to be able to put in place the

00:05:27.840 --> 00:05:30.260
correct levels of protection for it.

00:05:31.040 --> 00:05:36.150
We need to know who owns the data and, of course, then appropriately label it.

00:05:36.250 --> 00:05:39.450
Even something like a report should always have a label

00:05:39.450 --> 00:05:42.330
that says this is business private or business

00:05:42.330 --> 00:05:45.210
confidential or for official use only.

00:05:45.470 --> 00:05:49.860
So anybody who picks it up knows right away that this is something

00:05:49.860 --> 00:05:54.100
that requires protection. When we look at labeling,

00:05:54.100 --> 00:05:59.450
of course, the labeling is an indication of what the classification is.

00:06:00.340 --> 00:06:07.560
It then mandates or indicates what type of handling provisions have to be taken.

00:06:07.940 --> 00:06:10.550
Can this information be shared with a coworker?

00:06:11.440 --> 00:06:12.960
Can it be shared with the customer?

00:06:13.540 --> 00:06:18.660
Does this need to be securely destroyed at the end of life?

00:06:19.040 --> 00:06:25.980
And so in many cases, the label is an important identifier to a user of

00:06:25.980 --> 00:06:29.660
what they can and, of course, cannot do with that data.

00:06:30.740 --> 00:06:35.510
It's important that the labeling is something that is done consistently.

00:06:35.640 --> 00:06:39.310
We don't want a piece of information to be considered business

00:06:39.310 --> 00:06:42.890
private on one system and not protected on another.

00:06:43.540 --> 00:06:47.350
So, consistency is important here, which is why the use of

00:06:47.350 --> 00:06:50.260
data flows is often very important as well.

00:06:51.440 --> 00:06:54.920
We know that some departments may take data

00:06:54.920 --> 00:06:57.550
protection more seriously than others.

00:06:57.740 --> 00:07:00.660
We quite often see that there's a difference in culture

00:07:00.660 --> 00:07:03.260
between sales and finance, for example.

00:07:03.940 --> 00:07:07.920
But it's important that when we have something like credit card or payment

00:07:07.920 --> 00:07:12.440
card information, even though that came in on a sales system and got

00:07:12.440 --> 00:07:15.710
passed to finance for monthly recurring billing,

00:07:16.120 --> 00:07:19.640
it is protected consistently on

00:07:19.640 --> 00:07:22.560
both sales and financial systems.

00:07:23.040 --> 00:07:26.930
It could well be that we have two different system

00:07:26.930 --> 00:07:30.360
owners who have very different cultures.

00:07:30.940 --> 00:07:36.930
But the role of the data owner is to make sure that even though this piece of

00:07:36.930 --> 00:07:41.410
data ends up on multiple systems with different system owners,

00:07:41.420 --> 00:07:45.390
it is still protected consistently on all systems.

00:07:46.940 --> 00:07:48.430
We've often heard people say,

00:07:48.430 --> 00:07:52.610
well, I don't want to put a big red label on this that says top secret

00:07:52.620 --> 00:07:55.360
because that'll just draw people's attention to it.

00:07:56.030 --> 00:07:59.700
There could be some, should we say, merit to that argument.

00:07:59.990 --> 00:08:04.960
It's the old story that, in some cases, we hide something in

00:08:04.960 --> 00:08:07.680
plain sight and maybe people don't notice it.

00:08:08.140 --> 00:08:11.750
But the general rule is it's better to label it so people

00:08:11.750 --> 00:08:15.130
know it needs to be protected, and anyone who sees it knows

00:08:15.130 --> 00:08:17.830
right away that it must be protected.

00:08:17.830 --> 00:08:20.260
It can't be left lying around, for example.

00:08:21.340 --> 00:08:25.510
But the challenge, of course, is that if we become too

00:08:25.510 --> 00:08:30.110
compartmentalized so that we don't share information,

00:08:30.190 --> 00:08:33.669
it could be we lose some of the synergies and some of the

00:08:33.669 --> 00:08:37.799
benefits we have of people being able to work collaboratively.

00:08:37.799 --> 00:08:41.760
We can end up with two people doing the same thing, and

00:08:41.760 --> 00:08:43.900
because they can't share information,

00:08:43.900 --> 00:08:46.560
we end up with duplication of effort as well.

00:08:47.040 --> 00:08:48.660
So labeling is important.

00:08:49.040 --> 00:08:53.540
The handling is important, and, of course, the most important thing of all is to

00:08:53.540 --> 00:08:57.560
train people on what it means when a report says, at the bottom,

00:08:57.560 --> 00:08:59.060
business private.

00:09:00.040 --> 00:09:03.910
If they don't know what it means, then there's no sense putting a label on it.

00:09:04.340 --> 00:09:06.950
And a very good example of this is emails.

00:09:07.540 --> 00:09:11.100
Many organizations have a footer on their emails that says

00:09:11.100 --> 00:09:13.660
this is a confidential communication.

00:09:14.140 --> 00:09:17.950
Yeah, but if they put that on every single email, what does it mean?

00:09:18.440 --> 00:09:21.040
Because many of the emails are not confidential.

00:09:21.040 --> 00:09:23.600
It's an invitation to go for lunch, for example.

00:09:24.140 --> 00:09:27.340
And so those are the sorts of things where we have to be careful

00:09:27.640 --> 00:09:30.650
that we label things appropriately as well.
