WEBVTT

00:00.000 --> 00:03.480
>> Data security and
the data life cycle.

00:03.480 --> 00:05.820
The learning objectives
for this lesson

00:05.820 --> 00:08.085
are to define the
data life cycle,

00:08.085 --> 00:11.025
describe data classification
and management,

00:11.025 --> 00:13.785
and describe data loss
prevention concepts.

00:13.785 --> 00:17.805
Let's get started. This
is the data life cycle.

00:17.805 --> 00:20.835
All data will flow
through this cycle.

00:20.835 --> 00:22.515
It begins with create.

00:22.515 --> 00:24.230
This might be
something as simple as

00:24.230 --> 00:26.615
a user creating an email
or a Word document,

00:26.615 --> 00:29.090
but it can also be
data that is created

00:29.090 --> 00:32.020
in a database or
generated by applications.

00:32.020 --> 00:33.845
From there we move to store.

00:33.845 --> 00:35.495
Once we create the data,

00:35.495 --> 00:37.060
we have to have a
place to put it.

00:37.060 --> 00:38.900
From store we move to use.

00:38.900 --> 00:41.330
This is where we make use of that data.

00:41.330 --> 00:45.140
Then once data is no longer
needed on a daily basis,

00:45.140 --> 00:47.465
but we still need to hold
onto it for a little while,

00:47.465 --> 00:49.285
we move it to archive.

00:49.285 --> 00:51.680
Then finally, when we no longer

00:51.680 --> 00:54.305
have any use for the
data, we destroy it.

00:54.305 --> 00:57.630
This completes
the data life cycle.

00:58.310 --> 01:00.885
Data classification.

01:00.885 --> 01:04.205
All data in an organization
should be classified.

01:04.205 --> 01:06.425
This allows us to put
different controls

01:06.425 --> 01:09.020
on the data based on its
classification level.

01:09.020 --> 01:10.805
The first level is public,

01:10.805 --> 01:12.845
also known as unclassified.

01:12.845 --> 01:14.720
This information, if it were to

01:14.720 --> 01:16.835
be released or become public,

01:16.835 --> 01:19.715
would cause no damage
to the organization.

01:19.715 --> 01:23.120
The second is confidential,
also known as secret.

01:23.120 --> 01:25.310
This data is highly
sensitive and should

01:25.310 --> 01:27.955
only be viewed by
authorized personnel.

01:27.955 --> 01:30.755
Finally, we have
critical or top secret.

01:30.755 --> 01:32.240
This information is so valuable

01:32.240 --> 01:34.760
that we can't allow any risk
of it being captured.

01:34.760 --> 01:37.160
The highest levels of
controls are placed

01:37.160 --> 01:40.830
upon data in this
classification level.

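NOTE
A minimal Python sketch of applying controls by classification level, assuming the three levels from this lesson are ordered by sensitivity. The clearance-based can_view rule is a hypothetical example of a control, not a rule stated in the lesson.
```python
from enum import IntEnum
class Level(IntEnum):
    # Ordered so that a higher value means more sensitive data
    PUBLIC = 0        # also known as unclassified
    CONFIDENTIAL = 1  # also known as secret
    CRITICAL = 2      # also known as top secret
def can_view(user_clearance: Level, data_level: Level) -> bool:
    # Hypothetical control: a user may view data at or below their own clearance
    return user_clearance >= data_level
```
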
01:41.780 --> 01:45.660
Data management.
Inventory and mapping.

01:45.660 --> 01:47.030
This produces a data map that

01:47.030 --> 01:49.445
identifies and tracks
the data created,

01:49.445 --> 01:53.285
controlled, or maintained
by an organization.

01:53.285 --> 01:56.930
Data integrity management
ensures that the data is in

01:56.930 --> 01:58.595
its proper state and that

01:58.595 --> 02:01.685
any changes that occur
can be identified.

02:01.685 --> 02:03.725
This ensures data reliability.

02:03.725 --> 02:07.590
We need to know how it
changed and who changed it.

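NOTE
A minimal Python sketch of data integrity management, assuming SHA-256 hashes are used to detect changes. IntegrityLog and its methods are hypothetical names; the log records who made each approved change and when.
```python
import hashlib
from datetime import datetime, timezone
def digest(data: bytes) -> str:
    # SHA-256 fingerprint of the data's current state
    return hashlib.sha256(data).hexdigest()
class IntegrityLog:
    # Hypothetical change log: latest known-good hash plus who changed the data and when
    def __init__(self, data: bytes, owner: str) -> None:
        self.expected = digest(data)
        self.history = [(owner, datetime.now(timezone.utc), self.expected)]
    def is_unchanged(self, data: bytes) -> bool:
        # Any modification to the bytes produces a different hash
        return digest(data) == self.expected
    def record_change(self, data: bytes, user: str) -> None:
        # Approved change: accept the new state and note who made it
        self.expected = digest(data)
        self.history.append((user, datetime.now(timezone.utc), self.expected))
```
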
02:08.480 --> 02:10.800
Data loss prevention.

02:10.800 --> 02:13.340
As a concept, data
loss prevention, or

02:13.340 --> 02:16.010
DLP, automates the discovery and

02:16.010 --> 02:18.620
classification of data
and then enforces

02:18.620 --> 02:20.300
rules to ensure that the data

02:20.300 --> 02:22.705
isn't viewed or
released improperly.

02:22.705 --> 02:26.180
Once all data has been
classified in an organization,

02:26.180 --> 02:28.400
each classification
level may

02:28.400 --> 02:30.950
have specific rules
placed upon it,

02:30.950 --> 02:33.215
such as which users are
allowed to access the data

02:33.215 --> 02:35.910
or make use of it,
that sort of thing.

02:37.220 --> 02:41.450
As a software product,
data loss prevention

02:41.450 --> 02:43.520
monitors endpoints and

02:43.520 --> 02:46.190
network traffic for signs

02:46.190 --> 02:48.020
of sensitive data
that's being copied,

02:48.020 --> 02:50.560
printed, or used in
inappropriate ways.

02:50.560 --> 02:52.970
It consists of
a policy server,

02:52.970 --> 02:55.280
endpoint agents,
and network agents.

02:55.280 --> 02:56.990
It works in a way that's similar

02:56.990 --> 02:58.595
to anti-malware software.

02:58.595 --> 03:00.635
Once it sees sensitive data,

03:00.635 --> 03:02.015
it will either alert,

03:02.015 --> 03:04.880
block, quarantine, or
tombstone the data.

03:04.880 --> 03:07.100
The way this would
work is once you've

03:07.100 --> 03:09.665
classified all of the
data on your network,

03:09.665 --> 03:11.630
you create those specific rules.

03:11.630 --> 03:14.735
You can be very granular
with DLP software to say,

03:14.735 --> 03:18.170
users in this category are
allowed to view the data,

03:18.170 --> 03:20.300
but they can't print
it, copy it

03:20.300 --> 03:22.445
to a USB device,

03:22.445 --> 03:23.990
or even email it.

03:23.990 --> 03:27.290
Other users may
be allowed to email it

03:27.290 --> 03:30.020
but not to
do anything else with it.

03:30.020 --> 03:32.120
As I said, you can be very

03:32.120 --> 03:33.740
granular with your rules,

03:33.740 --> 03:36.780
based on the needs
of the organization.

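NOTE
A minimal Python sketch of a granular DLP rule check. The user categories, operations, and the POLICY table are hypothetical; a real DLP product defines such rules on its policy server, and the choice to block (rather than alert, quarantine, or tombstone) is an assumption.
```python
from enum import Enum
class Action(Enum):
    # Responses a DLP product may take, as listed in the lesson
    ALLOW = "allow"
    ALERT = "alert"
    BLOCK = "block"
    QUARANTINE = "quarantine"
    TOMBSTONE = "tombstone"
# Hypothetical policy: (user category, classification) maps to the permitted operations
POLICY = {
    ("analyst", "confidential"): {"view"},
    ("manager", "confidential"): {"view", "email"},
}
def evaluate(category: str, classification: str, operation: str) -> Action:
    # Public data is unrestricted in this sketch
    if classification == "public":
        return Action.ALLOW
    allowed = POLICY.get((category, classification), set())
    if operation in allowed:
        return Action.ALLOW
    # Assumed response for anything not explicitly permitted
    return Action.BLOCK
```
Here an analyst can view confidential data but cannot print it, copy it to USB, or email it, while a manager may also email it, mirroring the example in the lesson.
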
03:38.420 --> 03:41.025
Data loss detection.

03:41.025 --> 03:44.180
The first example is a
responsible disclosure form.

03:44.180 --> 03:46.730
This makes it easy
to report

03:46.730 --> 03:49.700
incidents that occur
within an organization.

03:49.700 --> 03:51.890
The next is dark web scanning.

03:51.890 --> 03:56.110
Often, when a company
suffers a data breach,

03:56.110 --> 03:57.860
the first sign anyone has of

03:57.860 --> 03:59.915
it is when the data
appears on the dark web.

03:59.915 --> 04:01.295
This usually happens when

04:01.295 --> 04:04.250
a hacker group
posts a sample,

04:04.250 --> 04:05.810
either because
they're going to sell

04:05.810 --> 04:07.025
the data or because they're

04:07.025 --> 04:08.695
simply releasing all
of the data.

04:08.695 --> 04:11.255
Dark web scanning lets
you search the dark web

04:11.255 --> 04:14.240
to see if you can find
any signs of your data.

04:14.240 --> 04:16.995
The next is deep
packet inspection.

04:16.995 --> 04:19.930
This looks inside the
network packets as they

04:19.930 --> 04:22.530
cross the network
at the actual data.

04:22.530 --> 04:26.215
So we're not just looking at
the source and destination addresses,

04:26.215 --> 04:28.460
the ports, or even the protocols;

04:28.460 --> 04:30.040
we're looking inside
the actual packets

04:30.040 --> 04:32.165
to see the data that
they contain.

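NOTE
A minimal Python sketch of the payload-inspection idea, assuming we already have a packet's payload bytes. It looks for card-number-like digit runs and confirms them with the Luhn checksum; the regex and function names are illustrative assumptions, and a real DPI engine would also reassemble streams and decode protocols first.
```python
import re
# Hypothetical pattern: 13 to 16 digits, optionally separated by spaces or dashes
CARD_RE = re.compile(rb"\b(?:\d[ -]?){12,15}\d\b")
def luhn_ok(digits: str) -> bool:
    # Luhn checksum used by payment card numbers; filters out most random digit runs
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0
def find_card_numbers(payload: bytes) -> list[str]:
    # Inspect the payload itself, not just addresses, ports, or protocol
    hits = []
    for match in CARD_RE.finditer(payload):
        digits = re.sub(rb"[ -]", b"", match.group()).decode()
        if luhn_ok(digits):
            hits.append(digits)
    return hits
```
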
04:32.165 --> 04:34.045
Finally, we have third-party services.

04:34.045 --> 04:36.640
These services
may offer

04:36.640 --> 04:38.350
real-time visibility into how

04:38.350 --> 04:40.405
an organization is
using its data.

04:40.405 --> 04:44.090
Examples include
OneDrive and Google Drive.

04:45.600 --> 04:49.065
Digital rights management
and watermarking.

04:49.065 --> 04:52.510
Digital rights management,
or DRM, is about

04:52.510 --> 04:54.280
controlling digital content and

04:54.280 --> 04:56.600
how it's used after
being published.

04:56.600 --> 04:58.480
Most people have run
into this when they've

04:58.480 --> 05:00.835
legally downloaded music
from the Internet.

05:00.835 --> 05:03.340
Companies add DRM to
those files so that you're

05:03.340 --> 05:06.170
not able to copy them and
share them with others.

05:06.170 --> 05:08.410
DVDs also make use of

05:08.410 --> 05:10.300
stream encryption
and region locking.

05:10.300 --> 05:12.715
The overall goal is
to prevent copying.

05:12.715 --> 05:15.040
With region locking, if you

05:15.040 --> 05:17.230
buy a DVD in the United
States and then

05:17.230 --> 05:19.210
take it to Europe,
it will not play

05:19.210 --> 05:21.910
in European DVD players.

05:21.910 --> 05:23.800
The DVD is locked to

05:23.800 --> 05:25.910
a specific region of the world.

05:25.910 --> 05:27.954
Finally, we have watermarking.

05:27.954 --> 05:30.120
This is marking data
so that it clearly

05:30.120 --> 05:32.650
displays important
details about the data,

05:32.650 --> 05:34.495
such as ownership information,

05:34.495 --> 05:38.420
the data classification,
and how it may be used.

05:39.630 --> 05:42.085
Obfuscating and masking.

05:42.085 --> 05:43.960
This is a way of hiding

05:43.960 --> 05:46.510
data, and it doesn't always
involve encryption;

05:46.510 --> 05:48.850
sometimes it can be as
simple as encoding data

05:48.850 --> 05:51.325
in a different format,
such as Base64.

05:51.325 --> 05:53.440
The goal is to have
data in a format

05:53.440 --> 05:56.240
that isn't easily recognizable.

05:56.580 --> 05:58.825
In my instructor side note,

05:58.825 --> 06:00.320
I give an example of this.

06:00.320 --> 06:02.680
Base64 is commonly used

06:02.680 --> 06:05.050
to obfuscate payloads
in phishing emails.

06:05.050 --> 06:07.645
If you were to look
at the HTML code,

06:07.645 --> 06:11.705
you'd often find critical parts
encoded using Base64.

06:11.705 --> 06:13.410
This makes it much harder for

06:13.410 --> 06:15.040
the end user to

06:15.040 --> 06:17.050
tell what's going on in
that phishing email.

06:17.050 --> 06:19.315
It also takes time
to break it down,

06:19.315 --> 06:23.090
decode it, and find out
what the payload is doing.

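NOTE
A minimal Python sketch showing why Base64 is obfuscation rather than encryption: the standard library's base64 module reverses it with no key at all. The obfuscate and reveal names are hypothetical.
```python
import base64
def obfuscate(text: str) -> str:
    # Base64 is an encoding, not encryption: no key is involved
    return base64.b64encode(text.encode("utf-8")).decode("ascii")
def reveal(encoded: str) -> str:
    # Anyone can reverse it with a standard decoder
    return base64.b64decode(encoded).decode("utf-8")
```
For example, obfuscate("hi") returns "aGk=", and reveal("aGk=") returns "hi". The encoded form is harder to recognize at a glance, but decoding it is trivial once an analyst spots it.
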
06:25.490 --> 06:29.385
Tokenization, scrubbing,
and anonymization.

06:29.385 --> 06:32.670
Tokenization is used in
credit card processing.

06:32.670 --> 06:35.290
It replaces the card data
with a token, and the token alone

06:35.290 --> 06:36.490
cannot be reversed to get

06:36.490 --> 06:38.585
back to the credit
card information.

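NOTE
A minimal Python sketch of tokenization, assuming an in-memory TokenVault (a hypothetical class). The token is random, so it reveals nothing about the card number; only the vault that holds the mapping can look the original up.
```python
import secrets
class TokenVault:
    # Hypothetical in-memory vault; in practice this mapping would live in a secured, access-controlled store
    def __init__(self) -> None:
        self._token_to_pan: dict[str, str] = {}
    def tokenize(self, pan: str) -> str:
        # The token is random, so it has no mathematical relationship to the card number
        token = "tok_" + secrets.token_hex(8)
        self._token_to_pan[token] = pan
        return token
    def detokenize(self, token: str) -> str:
        # Only the vault that issued the token can map it back
        return self._token_to_pan[token]
```
Downstream systems can store and pass around the token; a stolen token is useless without access to the vault.
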
06:38.585 --> 06:43.665
Scrubbing is a data
integrity control

06:43.665 --> 06:46.105
designed to find
and remove invalid,

06:46.105 --> 06:48.325
redundant, or outdated data

06:48.325 --> 06:50.380
from a database or
data warehouse.

06:50.380 --> 06:52.480
If you don't need
it, get rid of it.

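NOTE
A minimal Python sketch of scrubbing a list of records. The rules for invalid (no "@" in the email), redundant (same email already kept), and outdated (last updated before a cutoff date) are hypothetical examples.
```python
from datetime import date
def scrub(records: list[dict], cutoff: date) -> list[dict]:
    kept_emails = set()
    kept = []
    for rec in records:
        # Normalize so that case and stray spaces do not hide duplicates
        email = rec.get("email", "").strip().lower()
        if "@" not in email:
            continue  # invalid
        if email in kept_emails:
            continue  # redundant: this email was already kept
        if rec["updated"] < cutoff:
            continue  # outdated
        kept_emails.add(email)
        kept.append(rec)
    return kept
```
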
06:52.480 --> 06:55.210
Anonymization removes
data that could be

06:55.210 --> 06:57.580
used to uniquely
identify a person.

06:57.580 --> 06:59.770
This comes up often in
compliance laws.

06:59.770 --> 07:02.185
When we get into the HIPAA
regulations later on,

07:02.185 --> 07:04.520
we'll go into this
a little bit more.

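NOTE
A minimal Python sketch of anonymization: drop fields that identify a person directly and generalize fields that could identify someone in combination. The field names and generalization rules are hypothetical and are not the HIPAA de-identification rules.
```python
# Hypothetical set of fields treated as direct identifiers in this sketch
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn"}
def anonymize(record: dict) -> dict:
    # Drop fields that identify a person outright
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Generalize quasi-identifiers that could single someone out when combined
    if "age" in out:
        low = (out["age"] // 10) * 10
        out["age"] = f"{low}-{low + 9}"
    if "zip" in out:
        out["zip"] = out["zip"][:3] + "**"
    return out
```
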
07:04.790 --> 07:08.925
Summary. We described
the data life cycle and

07:08.925 --> 07:10.880
explained data classification

07:10.880 --> 07:12.950
and management and
why they're important.

07:12.950 --> 07:15.679
We also explained
data loss concepts

07:15.679 --> 07:18.905
and then demonstrated data
obfuscation and masking.

07:18.905 --> 07:21.325
Let's do some example questions.

07:21.325 --> 07:24.440
Question 1, what
method is used to

07:24.440 --> 07:26.030
control how digital content

07:26.030 --> 07:28.620
is used after being published?

07:28.900 --> 07:31.980
Digital rights management.

07:32.480 --> 07:36.560
Question 2, which stage
of the data life cycle

07:36.560 --> 07:38.450
describes when data is no longer

07:38.450 --> 07:41.790
used on a regular basis,
but is still needed?

07:42.650 --> 07:47.590
Archive. Question 3,

07:47.590 --> 07:50.150
which type of data
classification would be

07:50.150 --> 07:52.685
used for information that
is highly sensitive,

07:52.685 --> 07:56.190
and should only be viewed
by approved persons?

07:57.200 --> 08:01.800
Confidential or secret. Finally,

08:01.800 --> 08:04.880
question 4, what
is used to ensure

08:04.880 --> 08:06.575
data is in its proper state

08:06.575 --> 08:09.540
and that any changes
can be identified?

08:10.340 --> 08:12.945
Data integrity management.

08:12.945 --> 08:14.660
I hope this lesson
was useful for

08:14.660 --> 08:17.010
you, and I'll see
you in the next one.

