WEBVTT

00:00.000 --> 00:02.475
>> Now we'll talk about hashing,

00:02.475 --> 00:05.035
checksums and message digests.

00:05.035 --> 00:06.810
We've just talked
about how we get

00:06.810 --> 00:09.060
privacy through
asymmetric cryptography.

00:09.060 --> 00:10.710
We said that if we encrypt data

00:10.710 --> 00:12.150
with the receiver's public key,

00:12.150 --> 00:14.460
we'll be able to send the
message across the network,

00:14.460 --> 00:15.750
and the receiver's private key is

00:15.750 --> 00:17.685
the only thing that
can decrypt it.

00:17.685 --> 00:19.860
But we also said that
we need things like

00:19.860 --> 00:23.220
authenticity and integrity
and non-repudiation.

00:23.220 --> 00:24.840
Here we'll talk a little bit

00:24.840 --> 00:27.045
about how we can get integrity.

00:27.045 --> 00:30.120
One way we can do that
is through hashing.

00:30.120 --> 00:32.100
Hashes can also
be referred to as

00:32.100 --> 00:34.259
checksums or message digests.

00:34.259 --> 00:37.750
Any of those terms can
be used interchangeably.

00:38.060 --> 00:40.550
Let's say I'm sending
you a message

00:40.550 --> 00:42.410
across the network,
or the Internet,

00:42.410 --> 00:43.990
or across a private line,

00:43.990 --> 00:46.955
and let's say that connection
is really unreliable.

00:46.955 --> 00:48.950
It has a lot of
interference and dropped

00:48.950 --> 00:51.095
packets and a lot of issues.

00:51.095 --> 00:53.015
I want to make sure
that you know

00:53.015 --> 00:54.740
what I sent you across
the network hasn't

00:54.740 --> 00:55.824
been corrupted.

00:55.824 --> 00:58.315
That's exactly the
purpose of hashing,

00:58.315 --> 01:02.435
to detect unintentional
modification like corruption.

01:02.435 --> 01:04.700
Before I send you the message,

01:04.700 --> 01:05.960
let's say that I determine

01:05.960 --> 01:08.870
the numeric value for every
character in the message.

01:08.870 --> 01:11.270
For example, in the word hello,

01:11.270 --> 01:13.475
the H is the eighth
letter of the alphabet,

01:13.475 --> 01:15.260
and the E is the fifth letter,

01:15.260 --> 01:17.780
and the L is the 12th
letter and so forth.

01:17.780 --> 01:19.250
I'm going to take

01:19.250 --> 01:21.700
all the numeric values and
add them all together,

01:21.700 --> 01:23.720
and then I'll put that
value at the bottom of

01:23.720 --> 01:26.435
the message, that's my hash.

01:26.435 --> 01:28.940
In this case, if the
message is hello,

01:28.940 --> 01:31.410
then the value is 52.

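NOTE
Editor's sketch, not part of the spoken lecture: the letter-sum toy hash
described above, written out in Python. The name toy_hash and the "|"
separator are assumptions made for illustration only.
```python
def toy_hash(message):
    # Sum each letter's position in the alphabet (a=1 ... z=26); skip anything else.
    return sum(ord(ch) - ord("a") + 1 for ch in message.lower() if "a" <= ch <= "z")
digest = toy_hash("hello")  # 8 + 5 + 12 + 12 + 15
print(f"hello|{digest}")    # the message with its toy hash appended
```
The receiver would run toy_hash on the received text and compare the
result with the appended value.
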
01:31.990 --> 01:34.420
I send the message to you,

01:34.420 --> 01:36.470
your software does
the same process,

01:36.470 --> 01:37.880
adds up the numeric value

01:37.880 --> 01:39.500
for every letter in the message,

01:39.500 --> 01:41.644
and when you come up
with the number 52,

01:41.644 --> 01:44.015
then your software makes
a reasonable assumption

01:44.015 --> 01:46.375
that the message has
not been modified.

01:46.375 --> 01:50.330
Now, I realize that this
is a weak idea of a hash,

01:50.330 --> 01:52.190
but I'm just giving you
this as an example so

01:52.190 --> 01:54.485
you can understand
how a hash works.

01:54.485 --> 01:57.845
In reality, the hash
would be more robust.

01:57.845 --> 02:00.830
One thing to note about
this hash, and any hash,

02:00.830 --> 02:02.810
is that it uses one-way math.

02:02.810 --> 02:04.430
Hashes don't use keys

02:04.430 --> 02:07.010
and they aren't
symmetric or asymmetric.

02:07.010 --> 02:09.425
The only reason I've
included hashing here with

02:09.425 --> 02:11.450
asymmetric cryptography
is that we're

02:11.450 --> 02:13.819
going to combine it with
asymmetric cryptography

02:13.819 --> 02:16.460
in a few minutes to get
an additional service.

02:16.460 --> 02:18.770
But I want you to know
that a hash's magic

02:18.770 --> 02:20.695
is the one-way nature of the math.

02:20.695 --> 02:22.160
It's easy to take all these

02:22.160 --> 02:23.660
numbers and add them together.

02:23.660 --> 02:25.235
But if you look at the result,

02:25.235 --> 02:27.200
you can't use that
to reverse it.

02:27.200 --> 02:28.910
If the message is encrypted

02:28.910 --> 02:30.935
and all you saw
was the number 52,

02:30.935 --> 02:34.000
you couldn't say for sure
that the message was Hello.

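NOTE
Editor's sketch, not part of the spoken lecture: why the number 52 alone
does not reveal the message. Several different inputs give the same
letter sum. toy_hash and the candidate list are illustrative assumptions.
```python
def toy_hash(message):
    # Sum each letter's position in the alphabet (a=1 ... z=26); skip anything else.
    return sum(ord(ch) - ord("a") + 1 for ch in message.lower() if "a" <= ch <= "z")
candidates = ["hello", "olleh", "zz", "help"]
# Keep every candidate whose toy hash equals 52.
matches = [m for m in candidates if toy_hash(m) == 52]
print(matches)
```
With this toy function, many inputs share one output, which is also why
it is far too weak for real use.
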
02:34.000 --> 02:36.440
Again, my hash is nowhere near

02:36.440 --> 02:38.570
the sophistication
of a real hash.

02:38.570 --> 02:40.430
Real hashes have a fixed length

02:40.430 --> 02:42.024
for the hash itself.

02:42.024 --> 02:47.325
For example, MD5 always
creates a 128-bit hash,

02:47.325 --> 02:49.695
regardless of how
long your message is.

02:49.695 --> 02:52.905
SHA-1 creates a 160-bit hash

02:52.905 --> 02:55.380
and SHA-2 can create 224-bit, 256-bit,

02:55.380 --> 03:00.850
384-bit, or 512-bit hashes.

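NOTE
Editor's sketch, not part of the spoken lecture: checking fixed digest
lengths with Python's standard hashlib module. The helper name
digest_bits is an assumption, and md5 may be disabled on some
restricted (for example FIPS-mode) builds.
```python
import hashlib
def digest_bits(name, data):
    # Length in bits of the digest that algorithm `name` produces for `data`.
    return len(hashlib.new(name, data).digest()) * 8
for name in ("md5", "sha1", "sha256", "sha384", "sha512"):
    # Same length for a 2-byte input and a 200,000-byte input.
    print(name, digest_bits(name, b"hi"), digest_bits(name, b"hi" * 100_000))
```
Each algorithm reports the same bit length for the short and the long
input.
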
03:00.850 --> 03:04.285
Make sure you know those
hash bit lengths for the test.

03:04.285 --> 03:07.730
SHA stands for Secure Hash Algorithm.

03:07.730 --> 03:09.560
There are also
other hashes called

03:09.560 --> 03:11.975
HAVAL, TIGER and RIPEMD.

03:11.975 --> 03:13.670
I don't think they
will be on the test,

03:13.670 --> 03:16.255
but at least know
that they are hashes.

03:16.255 --> 03:18.510
These days, SHA-2 with

03:18.510 --> 03:22.610
a 256-bit hash is the
one people use the most.

03:22.610 --> 03:25.190
Now with hashes,
I'm going to get

03:25.190 --> 03:26.450
a guarantee that the message has

03:26.450 --> 03:28.315
not changed since it was sent.

03:28.315 --> 03:30.455
But one problem can occur,

03:30.455 --> 03:31.670
which is called a collision.

03:31.670 --> 03:34.220
It's possible for two
different documents

03:34.220 --> 03:35.855
to produce the same hash.

03:35.855 --> 03:37.325
When we see a collision,

03:37.325 --> 03:39.935
we say that the hashing
algorithm is broken.

03:39.935 --> 03:43.170
But it's very rare for a
hash to cause a collision.

03:43.170 --> 03:46.905
We look for hashes to
be collision-resistant.

03:46.905 --> 03:49.205
Now, a birthday attack

03:49.205 --> 03:51.140
is an attempt to
cause collisions.

03:51.140 --> 03:52.850
It's based on the idea
that it's going to

03:52.850 --> 03:54.350
be incredibly
difficult for you to

03:54.350 --> 03:55.670
look at a hash and produce

03:55.670 --> 03:57.755
a document that
matches that hash.

03:57.755 --> 04:00.125
You shouldn't be able
to reverse a hash,

04:00.125 --> 04:02.270
but the birthday attack
says if you have

04:02.270 --> 04:04.310
enough documents that
you are comparing,

04:04.310 --> 04:06.050
you might just
find two documents

04:06.050 --> 04:07.700
whose hashes happen to match.

04:07.700 --> 04:10.990
It's like saying, I'd rather
be lucky than smart.

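NOTE
Editor's sketch, not part of the spoken lecture: a birthday-style search
for any two documents whose hashes match. To keep it fast, the hash is
deliberately weakened to the first 16 bits of SHA-256. The names
tiny_hash and find_collision and the "document-N" inputs are
illustrative assumptions.
```python
import hashlib
def tiny_hash(data):
    # First 2 bytes (16 bits) of SHA-256: weak on purpose so collisions are easy to find.
    return hashlib.sha256(data).digest()[:2]
def find_collision():
    seen = {}  # tiny hash value mapped to the first document that produced it
    i = 0
    while True:
        doc = f"document-{i}".encode()
        h = tiny_hash(doc)
        if h in seen:
            return seen[h], doc, i + 1  # two different documents, same tiny hash
        seen[h] = doc
        i += 1
first, second, tried = find_collision()
print(first, second, tried)
```
With only 65,536 possible values, a match usually turns up after a few
hundred documents, far sooner than trying to hit one chosen hash value.
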
04:10.990 --> 04:13.935
Just a quick wrap-up
on hashes:

04:13.935 --> 04:16.460
they're here to protect
integrity and confirm that

04:16.460 --> 04:19.085
a message hasn't been
modified after being sent.

04:19.085 --> 04:21.110
However, there is
the possibility of

04:21.110 --> 04:23.960
a collision where two documents
produce the same hash,

04:23.960 --> 04:26.850
but those are
incredibly unlikely.

