WEBVTT

00:07.220 --> 00:12.710
In this episode, we're going to talk about file analysis and how to go through the process of identifying

00:12.710 --> 00:14.870
malware and appropriately responding.

00:14.870 --> 00:19.520
We're going to take a look at both dynamic and static analysis and the differences between them.

00:19.520 --> 00:24.320
We're also going to talk about file integrity and how to use hash values to ensure the integrity of

00:24.320 --> 00:29.720
our files, both at the operating system level and at the application level.

00:29.720 --> 00:35.390
Finally, we're going to look at strings and how strings are used to go through our systems and

00:35.390 --> 00:42.290
search different aspects of our operating system or our file structure to identify

00:42.320 --> 00:46.610
different malicious activity that may be ongoing. Throughout this episode,

00:46.610 --> 00:52.730
what you really need to take away is that CySA+ expects you to have not only a theoretical concept

00:52.730 --> 00:55.220
of it, but a practical standpoint.

00:55.220 --> 00:58.490
This is divergent from what you may have seen in Security+.

00:58.490 --> 01:03.240
As we go through these, take into account that we're only covering the theory right now.

01:03.240 --> 01:07.410
In the next episode, we're going to get our hands dirty and actually look at it from a practical

01:07.410 --> 01:08.850
standpoint.

01:08.880 --> 01:14.430
File analysis is the examination of a file to understand its behavior, its purpose, and its potential

01:14.430 --> 01:15.960
impact on the system.

01:15.960 --> 01:20.880
More often than not, when we do file analysis, we perceive that there may be an issue within the

01:20.880 --> 01:21.870
file structure.

01:21.870 --> 01:23.520
This could be because of a bug.

01:23.520 --> 01:27.060
It could be because of another application interfering with the file structure.

01:27.060 --> 01:32.580
Or if all things go bad, it could be because a malicious actor got in there and is messing with that

01:32.580 --> 01:33.540
file structure.

01:33.540 --> 01:39.000
File analysis is really done in two ways: either a static analysis or a dynamic analysis.

01:39.030 --> 01:41.010
We're going to talk about both of those.

01:41.370 --> 01:45.690
One way to perform static analysis is through signature analysis.

01:45.690 --> 01:51.840
Every file on your system has a specific signature through the file header, or what's commonly known

01:51.840 --> 01:53.010
as the magic number.

01:53.010 --> 01:57.840
This signature provides information about the file type and can help determine if it matches the expected

01:57.840 --> 02:04.680
format, whether it's a PDF or an executable. This signature really provides us a clear view of whether

02:04.680 --> 02:06.360
the file has changed or not.

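NOTE
A minimal sketch of the signature check described above, assuming Python.
The names MAGIC_NUMBERS, sniff_type, and extension_matches are hypothetical and
not from the episode; the table is only a small subset of well-known signatures.
```python
# Hypothetical helpers for illustration; only a few well-known signatures are listed.
MAGIC_NUMBERS = {
    b"%PDF": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",
    b"MZ": "exe",
    b"\x7fELF": "elf",
}
def sniff_type(data: bytes) -> str:
    """Return the file type suggested by the leading bytes, or 'unknown'."""
    for magic, kind in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return kind
    return "unknown"
def extension_matches(filename: str, data: bytes) -> bool:
    """True when the file extension agrees with the type found in the header."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return sniff_type(data) == ext
```
A file named invoice.pdf whose first bytes are MZ would fail extension_matches,
which is the kind of mismatch signature analysis is meant to surface.
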
02:06.390 --> 02:10.860
The next analysis type is metadata analysis. In a metadata analysis,

02:10.890 --> 02:14.340
we're extracting and analyzing the information embedded in the file.

02:14.340 --> 02:20.040
When we look at pictures, we often identify the metadata in terms of if there's a GPS coordinate,

02:20.070 --> 02:23.670
what type of pixelation is occurring, so on and so forth.

02:23.670 --> 02:28.230
But in files we're looking at the actual data, as in: What's the author's name?

02:28.230 --> 02:29.640
When was the file created?

02:29.670 --> 02:32.190
Has it been revised or changed recently?

02:32.190 --> 02:38.010
All that metadata can lead us to whether or not the file has changed and if so, how it has changed.

02:38.040 --> 02:42.330
The third type is file structure analysis. In a file structure analysis,

02:42.360 --> 02:45.840
we're looking at the internal components such as headers or sections.

02:45.840 --> 02:48.000
We're looking at data segments to identify whether

02:48.000 --> 02:52.890
there have been changes to them and any abnormalities that may exist.

02:52.890 --> 02:57.840
When we're doing a file structure analysis, we're looking to see if the data has moved or

02:57.840 --> 02:59.100
changed in such a way

02:59.100 --> 03:01.920
that doesn't make sense for the actual file.

03:01.950 --> 03:06.000
The last type of static analysis is entropy analysis.

03:06.020 --> 03:09.080
In this type, we're looking at the randomness of the data.

03:09.080 --> 03:15.410
If I see data that has a lot of randomness to it, which is typical of packed or encrypted content, that's

03:15.410 --> 03:18.860
a possible indicator of malicious content within that file structure.

03:18.860 --> 03:24.380
If it has low to moderate randomness, the file is more likely plain, unobfuscated content, though that

03:24.380 --> 03:26.210
alone doesn't prove it's free of malicious content.

03:26.210 --> 03:31.850
Remember, when we see unusually high randomness across a file or one of its sections in an entropy analysis,

03:31.850 --> 03:37.610
that can be indicative of hidden malicious content associated with that specific file.

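NOTE
A minimal sketch of how the randomness of a file's bytes can be measured, assuming
Python and Shannon entropy in bits per byte. The function name shannon_entropy is
hypothetical. Values near 8 look random, as compressed or encrypted data does;
values near 0 are highly repetitive.
```python
import math
from collections import Counter
def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for one repeated byte, up to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```
An analyst would usually compare the score against what is normal for that file
type rather than treat any single number as proof either way.
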
03:39.260 --> 03:42.710
The second type of analysis is called dynamic analysis.

03:42.710 --> 03:47.090
When we sandbox something, we're really taking the file structure, and we're protecting our other

03:47.090 --> 03:52.550
systems or other network features from that environment. In a sandbox environment, we're closing it

03:52.550 --> 03:52.790
off.

03:52.790 --> 03:54.260
It doesn't have internet access.

03:54.260 --> 03:59.300
It's usually set up in a virtual capacity to where if something happens in that environment, we can

03:59.300 --> 04:03.320
just delete the entire thing, and it doesn't have any harm to our other systems.

04:03.320 --> 04:06.470
In a dynamic analysis, we're going to actually start the file.

04:06.470 --> 04:10.250
We're going to open it up and let the program run the way that it would normally run.

04:10.280 --> 04:12.380
We're going to identify what happened.

04:12.410 --> 04:14.450
Did it involve other applications?

04:14.480 --> 04:16.370
Is it affecting the operating system?

04:16.370 --> 04:21.470
Is it doing a lot of things that it shouldn't be doing, or is it working the way it should work?

04:21.500 --> 04:23.030
In a file structure,

04:23.030 --> 04:28.880
we're looking at the same things that we would normally look at in a static or a dynamic code analysis.

04:28.880 --> 04:34.700
They're very similar in that we're looking to see what happens when we open or run that file in that

04:34.700 --> 04:35.510
environment.

04:35.540 --> 04:38.000
The other aspect is behavioral analysis.

04:38.030 --> 04:43.040
Much like the sandboxing environment, where we're going through and securing the system, in a behavioral

04:43.040 --> 04:47.360
analysis we're looking at how it behaves in the environment that it's currently operating in.

04:47.390 --> 04:50.810
This means that I can run the file and I can identify:

04:50.840 --> 04:52.670
Did it touch other applications?

04:52.670 --> 04:55.010
These are very similar in their degree.

04:55.010 --> 04:59.360
But remember in a sandbox environment we're shutting it off from the rest of our network.

04:59.360 --> 05:04.550
We can still run the behavioral analysis in that sandboxing environment which we described.

05:04.550 --> 05:07.730
But we also don't necessarily have to sandbox it.

05:07.760 --> 05:13.280
Sometimes we don't perceive the threat to be that great, and we can run the behavioral analysis through

05:13.280 --> 05:17.570
the affected machine itself, meaning that the file is there and we're just going to run it and see what

05:17.570 --> 05:18.140
happens.

05:18.170 --> 05:20.000
I don't recommend this approach.

05:20.000 --> 05:23.750
Usually I move it to a sandbox environment after isolating that machine.

05:23.750 --> 05:28.820
But sometimes, based on the value of that machine and the information it handles, we may decide that

05:28.820 --> 05:29.780
doesn't make sense.

05:29.780 --> 05:33.050
We'll just isolate the machine and then run the file directly from it.

05:33.080 --> 05:36.980
File integrity is where we go into play and we're measuring different files.

05:36.980 --> 05:42.050
Remember, if I update a machine to the latest operating system or even update to the latest

05:42.050 --> 05:44.660
software, the file structure could change.

05:44.660 --> 05:47.090
We could be adding more information to that file.

05:47.090 --> 05:49.550
We could be deleting old versions of that file.

05:49.550 --> 05:51.320
In any case, it could change.

05:51.320 --> 05:53.240
But what about my operating system?

05:53.240 --> 05:57.140
Or what about a file structure that I didn't give permission for it to change?

05:57.140 --> 06:02.930
If we're not writing new data to it, and it's a core portion of the operating system or the application

06:02.930 --> 06:07.790
I'm running, the file structure should remain unchanged and the file should remain unchanged.

06:07.820 --> 06:10.100
This is where we talk about file integrity.

06:10.100 --> 06:15.630
The integrity of the file should not change unless I apply a patch, I'm updating the software,

06:15.630 --> 06:19.140
or something is happening within my system that makes it change.

06:19.140 --> 06:22.260
In this respect, we can actually go through and measure

06:22.260 --> 06:27.840
whether something changed. It should be reported to our system, because the integrity of that file changes, and

06:27.840 --> 06:30.390
we track that through hashing mechanisms.

06:30.390 --> 06:36.600
Through hashing, I take an individual file or multiple files, and I hash each individual file separately.

06:36.600 --> 06:38.340
That gives me a hash value.

06:38.340 --> 06:42.750
If that value changes, I know that something within that file has also changed.

06:42.750 --> 06:45.450
This is where file integrity checks take place.

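NOTE
A minimal sketch of the baseline-and-compare idea described above, assuming Python
and SHA-256. The function names sha256_of, build_baseline, and changed_files are
hypothetical; real file integrity monitoring tools add scheduling and alerting.
```python
import hashlib
from pathlib import Path
def sha256_of(path: Path) -> str:
    """Hash one file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
def build_baseline(paths):
    """Record one hash per file; each file is hashed separately."""
    return {str(p): sha256_of(Path(p)) for p in paths}
def changed_files(baseline):
    """Return the paths whose current hash no longer matches the baseline."""
    return [p for p, old in baseline.items() if sha256_of(Path(p)) != old]
```
Running changed_files on a schedule and reviewing any non-empty result is one
simple way to turn the baseline into an integrity check.
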
06:45.450 --> 06:49.800
Now, we haven't talked about hashing yet, so let's show what it really looks like on Kali.

06:49.800 --> 06:52.890
I'm just going to show you real quickly what a hash looks like.

06:52.920 --> 06:57.750
And what happens if we change a file within our system and run a new hash.

06:57.750 --> 06:59.310
Let's take a look at that now.

06:59.310 --> 07:01.140
So here I am on my Kali box.

07:01.140 --> 07:03.060
I'm going to go ahead and open up a terminal real quick.

07:03.060 --> 07:05.250
And let me blow this up so you can actually read it.

07:05.250 --> 07:09.960
The first thing that we need to do is actually create a file or a document in which to hash.

07:09.960 --> 07:10.830
I'm going to create one.

07:10.830 --> 07:13.340
Right now I'm just going to use a program called Leafpad.

07:13.370 --> 07:16.670
It's much like notepad that you might see on a Microsoft windows machine.

07:16.670 --> 07:17.960
It does the same thing.

07:17.960 --> 07:21.950
So I'm going to name my file, and I'm just going to name mine totalsem.

07:21.950 --> 07:24.410
Just like that I've created my file.

07:24.410 --> 07:25.910
It opens up right here.

07:25.910 --> 07:28.520
And here you can see that I can type whatever I want in.

07:28.520 --> 07:38.960
I'm going to put, "I'm going to pass my CySA+ exam with flying colors."

07:38.990 --> 07:42.200
Thanks doctor.

07:42.230 --> 07:42.980
Kay.

07:43.520 --> 07:44.990
Exclamation marks, four of them, just to be safe.

07:44.990 --> 07:48.380
We're going to save this right here.

07:48.380 --> 07:49.700
And I'm going to go ahead and close it.

07:49.700 --> 07:52.460
Now if I do an ls, I can see this right here:

07:52.460 --> 07:53.750
my totalsem document.

07:53.750 --> 07:55.340
And I'm going to run a hash on that.

07:55.340 --> 07:59.210
I'm going to run a basic MD5 hash by typing md5sum

07:59.210 --> 08:03.590
and then the file I want to hash. I press Enter.

08:03.590 --> 08:06.860
And you can see here that it gives me a hash value for MD5.

08:06.890 --> 08:09.170
Now I could do this with SHA as well.

08:09.170 --> 08:15.760
I could do a sha256sum, and again I could run it on the file.

08:15.790 --> 08:17.050
Total seminars.

08:17.080 --> 08:17.800
Oops.

08:18.280 --> 08:18.820
Total.

08:18.850 --> 08:20.650
Let's use that Tab key, just like that.

08:20.650 --> 08:27.430
And again you can see here that the SHA-256 hash is actually substantially longer than our MD5.

08:27.550 --> 08:29.890
Now let's go back into that same file.

08:29.890 --> 08:32.050
I'm going to go leafpad totalsem.

08:32.230 --> 08:33.460
It's going to open it back up.

08:33.490 --> 08:35.050
And I'm only going to change one thing.

08:35.050 --> 08:37.270
I'm just going to delete this one exclamation mark.

08:37.270 --> 08:38.650
That's all I'm going to do.

08:38.680 --> 08:40.180
I'm going to resave it.

08:40.780 --> 08:42.430
And again it's resaved.

08:42.430 --> 08:46.450
Let's rerun that sha256sum for that hash value.

08:46.480 --> 08:52.630
I want to point out that all I did was erase one exclamation mark, and look: the SHA value,

08:52.660 --> 08:54.430
the hash value, has changed.

08:54.430 --> 08:59.110
If we were identifying this through automated means, we would know that something changed right off

08:59.110 --> 09:04.090
the bat, because this was our baseline hash for that file, and it had changed

09:04.090 --> 09:05.950
when we reran the hash.

09:05.980 --> 09:11.470
Now we can set up our system to rerun hashes on an hourly basis or every ten minutes.

09:11.470 --> 09:16.760
It doesn't really matter how often we do it; we just want to do it at least once every 24 hours

09:16.760 --> 09:19.160
to prove the integrity of our file system.

09:19.190 --> 09:22.250
Let's go back and change the file to its original content.

09:22.250 --> 09:24.200
I'm going to open up that leafpad totalsem again.

09:24.350 --> 09:29.870
I'm going to go in there and again, I'm going to go back to the very end and re-add that one exclamation

09:29.870 --> 09:30.440
mark.

09:30.620 --> 09:32.600
Let me go back there and save it again.

09:33.860 --> 09:34.760
There we go.

09:34.760 --> 09:37.730
And I'm going to run that sha256sum on totalsem.

09:37.730 --> 09:38.810
And look at that.

09:38.840 --> 09:45.380
Our hash value is back to the original hash value that we had, because the content has changed back to

09:45.410 --> 09:46.610
its original state.

09:46.610 --> 09:52.880
This is where hashing comes into play, where it provides us a good signature of the files, and if

09:52.880 --> 09:55.670
the integrity changes, it will notify us.

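NOTE
A minimal sketch that mirrors the demo above in Python instead of md5sum and
sha256sum. The byte string is an assumed stand-in for the demo file's contents,
and sha256_hex is a hypothetical helper name.
```python
import hashlib
# Assumed text echoing the sentence typed in the demo; the real file's bytes may differ.
original = b"I'm going to pass my CySA+ exam with flying colors!!!!\n"
edited = original.replace(b"!!!!", b"!!!")  # delete one exclamation mark
restored = edited.replace(b"!!!", b"!!!!")  # put it back
def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()
md5_hex = hashlib.md5(original).hexdigest()
print(len(md5_hex), len(sha256_hex(original)))  # 32 64
print(sha256_hex(original) == sha256_hex(edited))  # False
print(sha256_hex(original) == sha256_hex(restored))  # True
```
The digest depends only on the bytes, which is why restoring the content also
restores the original hash value.
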
09:56.390 --> 10:00.620
I want to go into strings and talk a little bit about what they are and how they're utilized.

10:00.650 --> 10:07.490
Strings is a utility that we use to extract readable text from binary and executable files within our core structure.

10:07.490 --> 10:14.030
This provides us a good segue into understanding what's going on in terms of search capability.

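NOTE
A minimal sketch of what a strings-style search does, assuming Python. It is not
the strings utility itself; extract_strings is a hypothetical name, and the
four-character minimum matches the default of GNU strings.
```python
import re
def extract_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]
```
The returned list can then be searched for file names, paths, host names, IP
addresses, URLs, or registry keys of interest.
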
10:14.030 --> 10:20.790
This means that I can go through and actually look at registry keys and find out if malware has inadvertently

10:20.790 --> 10:24.000
changed our registry keys from our original operating system.

10:24.030 --> 10:29.910
Now, it doesn't sound like much, but this could be an invaluable tool if I'm trying to define or identify

10:29.940 --> 10:34.620
potential malware within my core operating system. We're going to go through registry strings from

10:34.620 --> 10:37.380
a hands-on perspective later within this course.

10:37.380 --> 10:42.690
If I have file names on my system, I need to be able to search and make sure that those file names

10:42.690 --> 10:45.540
correspond directly to the program that it's associated with.

10:45.540 --> 10:49.950
I want you to imagine that you're programming a big game, and you've got a lot of files on your computer

10:49.950 --> 10:55.140
system that are pulling into this program to allow your character to move from point A to point B.

10:55.170 --> 10:59.190
If that file name suddenly changes, you've ruined the entire program.

10:59.190 --> 11:01.440
That's a problem with file names changing.

11:01.440 --> 11:06.360
If we have a file name that we want to make sure that it doesn't change, then our pathway of strings

11:06.360 --> 11:09.870
will allow us to go through and search for those individual file names.

11:09.870 --> 11:12.120
It could also lead us to other malware.

11:12.120 --> 11:18.360
If I take a file name and I input it, or I change it very slightly, let's say that I have a file name

11:18.360 --> 11:19.890
called operating system.

11:19.890 --> 11:21.450
I have this original file.

11:21.450 --> 11:26.940
It's for the original operating system, and it has all kinds of configuration and required files injected

11:26.940 --> 11:27.570
within it.

11:27.600 --> 11:32.250
If I'm a malicious user, I could rename that file to "file names."

11:32.280 --> 11:34.740
It's just "file names" with a period at the end.

11:34.770 --> 11:38.280
I then create a new folder that's called "file names".

11:38.280 --> 11:41.400
This poses a problem within our security infrastructure.

11:41.400 --> 11:47.280
That malicious actor can then inject that file names folder with all kinds of malicious activity, and

11:47.280 --> 11:50.160
change the code that was originally perceived within it.

11:50.190 --> 11:54.990
They didn't delete the file names folder, they just changed it, and they substituted it with their

11:54.990 --> 11:56.970
version of that file names folder.

11:56.970 --> 11:58.710
This poses problems.

11:59.130 --> 12:00.660
The next portion is paths.

12:00.660 --> 12:05.820
If I have a path that's attributed to a specific file name and that path changes, it again could ruin

12:05.850 --> 12:06.630
the program.

12:06.630 --> 12:12.000
But what if I divert that same path over to a different file that has malicious code injected within

12:12.000 --> 12:12.210
it?

12:12.210 --> 12:13.500
You see the problem?

12:13.500 --> 12:18.660
These strings really provide us a gateway into understanding where things are and how to get to them.

12:18.690 --> 12:20.100
Host names are no different.

12:20.100 --> 12:25.670
If I go into a hostname and I change the hostname from what it was originally, i.e. Chet's computer,

12:25.670 --> 12:28.400
over to Dave's computer, I've changed the name.

12:28.400 --> 12:34.220
This could pose problems when we're communicating across our network. IP addresses, URLs, and

12:34.220 --> 12:35.390
encryption keys

12:35.390 --> 12:37.280
all have a similar function.

12:37.310 --> 12:40.670
Let's get to encryption keys because I want to kind of touch on this one a little bit.

12:40.700 --> 12:45.350
Encryption keys make up a fundamental portion of any system within our network

12:45.350 --> 12:48.440
communicating with another system in a secure environment.

12:48.440 --> 12:55.520
If malware or a malicious user gets into our encryption keys and we're using a PKI structure,

12:55.520 --> 12:58.370
they might now have access to our private key.

12:58.400 --> 13:03.830
That means that all that communication that was embedded within our systems that we perceive to be very

13:03.830 --> 13:06.140
secure is no longer secure.

13:06.140 --> 13:12.530
They can now intercept that information that we deem to be encrypted and decrypt it without even actually

13:12.530 --> 13:13.610
being on the machine.

13:13.640 --> 13:19.310
If you remember your days from Security+, your public private key infrastructure plays a major role

13:19.310 --> 13:25.040
in our secure transmission of encrypted information from one system to another. By having that private

13:25.040 --> 13:31.340
key exploited, taken advantage of, grabbed, or stolen, we have a major problem.

13:31.340 --> 13:34.850
They don't actually have to read those encrypted messages on that system.

13:34.850 --> 13:39.140
Once they have that private key, they can grab that information through a man-in-the-middle attack

13:39.140 --> 13:45.140
and then be able to read it, remove it, and even change it at will because they have that private

13:45.140 --> 13:45.740
key.

13:45.770 --> 13:51.080
In this episode, we talked about file analysis and the differences between static and dynamic

13:51.080 --> 13:51.770
analysis.

13:51.800 --> 13:56.450
We talked about the importance of static analysis and how it really provides us not only with metadata,

13:56.480 --> 14:01.400
but signature analysis and those different analysis types to ensure that malware or malicious users

14:01.400 --> 14:03.740
don't take advantage of our file structures.

14:03.770 --> 14:09.260
We talked about dynamic analysis and the use of sandboxing, and how we can utilize behavior analysis

14:09.260 --> 14:15.560
to identify if something is occurring from a malicious user or malware to intercept or change that file

14:15.560 --> 14:16.220
structure.

14:16.250 --> 14:22.100
We also touched on file integrity and how file integrity can be utilized through hashing values to ensure

14:22.100 --> 14:28.430
that files don't change from their regular or authorized viewpoint without patching or software upgrades.

14:28.460 --> 14:33.590
Finally, we talked about strings and how strings can be utilized to search functions across our systems,

14:33.590 --> 14:39.290
and how malicious actors can change file names and change paths to provide access to their own malware

14:39.290 --> 14:42.020
without actually deleting anything off our core system.

14:42.050 --> 14:45.980
This provides them a different pathway to avoid integrity checks.

14:46.010 --> 14:50.390
We've identified how encryption keys are utilized on our file structure, and how those encryption keys

14:50.390 --> 14:52.880
can be utilized by malicious actors.

14:52.880 --> 14:54.200
Within your CySA+

14:54.230 --> 14:59.210
exam, you need to remember that all the things that we covered throughout this episode are potentially

14:59.210 --> 15:04.280
part of your exam and the questions that you may face; however, they're going to be scenario-based.

15:04.280 --> 15:07.820
You should expect to see differences between static and dynamic analysis.

15:07.850 --> 15:12.800
You should also expect to see questions on file integrity and the usage of hash values to ensure that

15:12.800 --> 15:14.300
file integrity hasn't changed.

15:14.330 --> 15:19.610
Finally, you should probably expect to see some questions very roughly or very high level on strings

15:19.610 --> 15:21.050
and how to utilize those strings.

15:21.080 --> 15:25.730
Don't worry, we're going to go into strings at a much more in-depth level further along within this

15:25.730 --> 15:26.450
course.
