1
00:00:00,330 --> 00:00:01,170
In this lesson,

2
00:00:01,170 --> 00:00:03,030
we're going to talk about NetFlow data

3
00:00:03,030 --> 00:00:04,680
and how it's used to conduct traffic flow

4
00:00:04,680 --> 00:00:06,570
analysis within our networks.

5
00:00:06,570 --> 00:00:08,910
In order to best monitor traffic in our network,

6
00:00:08,910 --> 00:00:12,630
we can either use full packet capture or NetFlow data.

7
00:00:12,630 --> 00:00:13,920
Now, as you might have guessed,

8
00:00:13,920 --> 00:00:16,680
packet captures can take up a lot of storage space

9
00:00:16,680 --> 00:00:18,600
and they can grow quickly in size.

10
00:00:18,600 --> 00:00:20,970
For example, if I'm conducting a full packet capture

11
00:00:20,970 --> 00:00:23,730
on my home network each day, I would need several gigabytes

12
00:00:23,730 --> 00:00:25,710
of storage just for my small family,

13
00:00:25,710 --> 00:00:27,480
because every single packet that goes in

14
00:00:27,480 --> 00:00:30,210
or out of my house would be captured and logged.

15
00:00:30,210 --> 00:00:32,100
Every video game my son is playing online,

16
00:00:32,100 --> 00:00:33,930
every YouTube video he watches,

17
00:00:33,930 --> 00:00:36,090
every Netflix show my wife is binging.

18
00:00:36,090 --> 00:00:37,851
All of that will be captured bit by bit

19
00:00:37,851 --> 00:00:40,590
inside of that full packet capture.

20
00:00:40,590 --> 00:00:42,960
Now, a full packet capture, or FPC,

21
00:00:42,960 --> 00:00:45,120
is going to capture the entire packet.

22
00:00:45,120 --> 00:00:48,030
This includes the header and the payload for all the traffic

23
00:00:48,030 --> 00:00:49,920
that's entering or leaving your network.

24
00:00:49,920 --> 00:00:52,500
As I said, this would be a ton of information

25
00:00:52,500 --> 00:00:54,690
and quickly eat up all of our storage.

26
00:00:54,690 --> 00:00:57,472
Now because full packet capture takes up so much space,

27
00:00:57,472 --> 00:01:00,990
we often don't collect it in a lot of organizations.

28
00:01:00,990 --> 00:01:04,830
Most businesses and organizations instead will use NetFlow.

29
00:01:04,830 --> 00:01:07,890
Now, NetFlow and other similar protocols

30
00:01:07,890 --> 00:01:11,010
are used to conduct something known as flow analysis.

31
00:01:11,010 --> 00:01:13,590
Flow analysis will rely on a flow collector as a means

32
00:01:13,590 --> 00:01:16,651
of recording metadata and statistics about network traffic

33
00:01:16,651 --> 00:01:19,020
instead of recording each and every frame

34
00:01:19,020 --> 00:01:20,220
or every single packet

35
00:01:20,220 --> 00:01:22,170
that's going in or out of our network.

36
00:01:22,170 --> 00:01:24,071
This allows us to use flow analysis tools

37
00:01:24,071 --> 00:01:26,160
that provide network traffic statistics

38
00:01:26,160 --> 00:01:27,870
sampled by the collector.

39
00:01:27,870 --> 00:01:30,390
Now, by doing this, we can capture information about

40
00:01:30,390 --> 00:01:33,090
the traffic flow instead of the data contained within

41
00:01:33,090 --> 00:01:36,690
that data flow, and this saves us a lot of storage space.

42
00:01:36,690 --> 00:01:38,700
Now with NetFlow and flow analysis,

43
00:01:38,700 --> 00:01:39,960
we're not going to have the contents

44
00:01:39,960 --> 00:01:41,820
of what's going over the network like we would

45
00:01:41,820 --> 00:01:43,410
with a full packet capture.

46
00:01:43,410 --> 00:01:45,720
But we can still gather a lot of metadata

47
00:01:45,720 --> 00:01:47,640
and information about the network traffic

48
00:01:47,640 --> 00:01:49,980
that's helpful to us in our monitoring.

49
00:01:49,980 --> 00:01:51,941
This information is stored inside a database

50
00:01:51,941 --> 00:01:53,811
and can be queried later by different tools

51
00:01:53,811 --> 00:01:56,430
to produce different reports and graphs.

52
00:01:56,430 --> 00:01:58,170
Now, the great thing about flow analysis

53
00:01:58,170 --> 00:01:59,880
is it's going to allow us to highlight trends

54
00:01:59,880 --> 00:02:02,820
and patterns in the traffic being generated by our network,

55
00:02:02,820 --> 00:02:04,200
and this becomes really useful

56
00:02:04,200 --> 00:02:06,210
in our network performance monitoring.

57
00:02:06,210 --> 00:02:07,440
Flow analysis will allow us

58
00:02:07,440 --> 00:02:09,810
to get alerts based on different anomalies we might see

59
00:02:09,810 --> 00:02:11,619
and different patterns or triggers that are outside

60
00:02:11,619 --> 00:02:13,680
of our expected baselines.

61
00:02:13,680 --> 00:02:16,170
These tools also have a visualization component

62
00:02:16,170 --> 00:02:17,730
that allows us to quickly create a map

63
00:02:17,730 --> 00:02:19,020
of different network connections

64
00:02:19,020 --> 00:02:21,720
and the associated flow patterns over those connections.

65
00:02:21,720 --> 00:02:23,340
By identifying different traffic patterns

66
00:02:23,340 --> 00:02:26,100
that might reveal bad behavior, malware in transit,

67
00:02:26,100 --> 00:02:28,032
tunneling, or other bad things out there,

68
00:02:28,032 --> 00:02:29,880
we're going to be able to quickly respond

69
00:02:29,880 --> 00:02:32,070
to these potential problems or incidents.

70
00:02:32,070 --> 00:02:34,140
Now, there are a few different tools we can use

71
00:02:34,140 --> 00:02:36,210
when dealing with traffic flow analysis.

72
00:02:36,210 --> 00:02:38,400
This includes things like NetFlow, Zeek,

73
00:02:38,400 --> 00:02:40,530
and the Multi Router Traffic Grapher.

74
00:02:40,530 --> 00:02:42,660
Let's take a look at each of these for a moment.

75
00:02:42,660 --> 00:02:44,430
First, we have NetFlow.

76
00:02:44,430 --> 00:02:45,960
NetFlow is a Cisco-developed means

77
00:02:45,960 --> 00:02:47,580
of reporting network flow information

78
00:02:47,580 --> 00:02:49,230
to a structured database.

79
00:02:49,230 --> 00:02:51,660
NetFlow is actually one of the first data flow analyzers

80
00:02:51,660 --> 00:02:52,890
that was created out there,

81
00:02:52,890 --> 00:02:55,170
and eventually it became the basis for the standard

82
00:02:55,170 --> 00:02:58,950
that everyone started to use under the term IPFIX,

83
00:02:58,950 --> 00:03:01,560
or IP Flow Information Export.

84
00:03:01,560 --> 00:03:02,940
Now, NetFlow allows us

85
00:03:02,940 --> 00:03:05,550
to define a particular traffic flow based on different

86
00:03:05,550 --> 00:03:07,800
packets that share the same characteristics.

87
00:03:07,800 --> 00:03:09,930
For example, if we want to identify packets

88
00:03:09,930 --> 00:03:12,030
with the same source and destination IP,

89
00:03:12,030 --> 00:03:14,730
this could signify there's a session between those two hosts

90
00:03:14,730 --> 00:03:16,590
and it should be considered one data flow

91
00:03:16,590 --> 00:03:18,390
that we can collect information on.
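
To make that idea concrete, here is a minimal Python sketch of grouping packets into flows. It is not how NetFlow itself is implemented; it only shows packets that share the same source IP, destination IP, ports, and protocol being counted as one flow. The packet records, field names, and the `aggregate_flows` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical packet metadata (documentation-range IPs, invented sizes).
packets = [
    {"src": "192.0.2.10", "dst": "198.51.100.7", "sport": 51000, "dport": 443, "proto": "TCP", "size": 1500},
    {"src": "192.0.2.10", "dst": "198.51.100.7", "sport": 51000, "dport": 443, "proto": "TCP", "size": 900},
    {"src": "192.0.2.11", "dst": "198.51.100.9", "sport": 53000, "dport": 53, "proto": "UDP", "size": 80},
]

def aggregate_flows(packets):
    """Group packets that share the same key fields into one flow record."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        # The flow key: packets with identical values here belong to one flow.
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p["size"]
    return dict(flows)

flows = aggregate_flows(packets)
for key, stats in flows.items():
    print(key, stats)
```

Notice that only counts and sizes are kept per flow. The payload is never stored, which is where the storage savings come from.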

92
00:03:18,390 --> 00:03:20,100
Now, when you look at NetFlow data,

93
00:03:20,100 --> 00:03:22,050
you can capture information about the packets

94
00:03:22,050 --> 00:03:23,820
that are going over these devices,

95
00:03:23,820 --> 00:03:26,130
like the network protocol interface that's being used,

96
00:03:26,130 --> 00:03:28,140
the version and type of IP being used,

97
00:03:28,140 --> 00:03:30,060
the source and destination IP address,

98
00:03:30,060 --> 00:03:33,540
the source and destination port, or the IP type of service.

99
00:03:33,540 --> 00:03:36,060
All this information can be gathered using NetFlow

100
00:03:36,060 --> 00:03:37,890
and then analyzed and displayed visually

101
00:03:37,890 --> 00:03:39,480
using our different tools.

102
00:03:39,480 --> 00:03:42,240
For example, here you can see that I'm using SolarWinds

103
00:03:42,240 --> 00:03:45,420
as a tool to show the NetFlow data of a network,

104
00:03:45,420 --> 00:03:47,850
but you can also review this data in a text-based

105
00:03:47,850 --> 00:03:50,101
environment using the NetFlow exports themselves.

106
00:03:50,101 --> 00:03:53,100
In this graphical environment though, it becomes really easy

107
00:03:53,100 --> 00:03:55,740
to see that there are 15 different traffic flows.

108
00:03:55,740 --> 00:03:58,620
And if I expand the 15th data flow, we can see the source

109
00:03:58,620 --> 00:04:01,890
and destination IP, the source port, the destination port,

110
00:04:01,890 --> 00:04:04,170
some basic information about that data flow,

111
00:04:04,170 --> 00:04:07,200
but we're not seeing the content of any of those packets

112
00:04:07,200 --> 00:04:08,970
that were part of this data flow.

113
00:04:08,970 --> 00:04:10,080
For us to be able to do that,

114
00:04:10,080 --> 00:04:12,210
we would have to have a full packet capture,

115
00:04:12,210 --> 00:04:14,490
but here we only captured the metadata

116
00:04:14,490 --> 00:04:17,519
or the information about those traffic flows.

117
00:04:17,519 --> 00:04:20,339
Now, if you want to be able to have the best of both worlds,

118
00:04:20,339 --> 00:04:22,110
you can use something like Zeek.

119
00:04:22,110 --> 00:04:23,670
Now, Zeek is a hybrid tool

120
00:04:23,670 --> 00:04:25,992
that passively monitors your network like a sniffer,

121
00:04:25,992 --> 00:04:28,779
but it's only going to log full packet captures

122
00:04:28,779 --> 00:04:31,380
based on data of potential interest.

123
00:04:31,380 --> 00:04:34,140
Essentially, Zeek is going to sample the data going across

124
00:04:34,140 --> 00:04:36,090
the network just like NetFlow does.

125
00:04:36,090 --> 00:04:37,350
But when Zeek finds something

126
00:04:37,350 --> 00:04:39,450
that it deems interesting based on the parameters

127
00:04:39,450 --> 00:04:40,740
and rules you've configured,

128
00:04:40,740 --> 00:04:43,380
it's going to log the entire packet for that part

129
00:04:43,380 --> 00:04:45,480
and then send it over to our cybersecurity analyst

130
00:04:45,480 --> 00:04:47,100
for further investigation.

131
00:04:47,100 --> 00:04:48,930
This method helps us reduce our storage

132
00:04:48,930 --> 00:04:50,070
and processing requirements,

133
00:04:50,070 --> 00:04:51,660
and it gives us the ability

134
00:04:51,660 --> 00:04:54,240
to have all this data in a single database.

135
00:04:54,240 --> 00:04:55,980
Now, one of the great things about Zeek is

136
00:04:55,980 --> 00:04:58,290
that it performs normalization of this data as well,

137
00:04:58,290 --> 00:05:00,330
and then stores it as either a tab-delimited

138
00:05:00,330 --> 00:05:04,350
or JavaScript Object Notation (JSON) formatted text file.

139
00:05:04,350 --> 00:05:05,760
This allows you to use it with lots

140
00:05:05,760 --> 00:05:07,410
of other different cybersecurity tools

141
00:05:07,410 --> 00:05:09,750
and different network monitoring tools as well.
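
As a rough illustration of why that normalized format is handy, here is a small Python sketch that reads JSON-formatted connection records, one object per line, and totals the bytes sent to each destination. The sample records are invented, and the `bytes_sent_by_destination` helper is hypothetical. The field names (`id.orig_h`, `id.resp_h`, `orig_bytes`) follow Zeek's conn.log naming, but treat the exact fields in your own logs as something to verify.

```python
import json
from collections import Counter

# Invented sample records in the style of a JSON-formatted Zeek conn.log.
sample_log = """\
{"ts": 1700000000.0, "uid": "C1", "id.orig_h": "192.0.2.10", "id.resp_h": "198.51.100.7", "id.resp_p": 443, "proto": "tcp", "orig_bytes": 1200}
{"ts": 1700000005.0, "uid": "C2", "id.orig_h": "192.0.2.10", "id.resp_h": "198.51.100.7", "id.resp_p": 443, "proto": "tcp", "orig_bytes": 300}
{"ts": 1700000009.0, "uid": "C3", "id.orig_h": "192.0.2.11", "id.resp_h": "198.51.100.9", "id.resp_p": 53, "proto": "udp"}
"""

def bytes_sent_by_destination(log_text):
    """Sum the originator's bytes per destination host."""
    totals = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # orig_bytes can be missing from a record, so default to 0.
        totals[record["id.resp_h"]] += record.get("orig_bytes", 0)
    return totals

totals = bytes_sent_by_destination(sample_log)
for host, sent in totals.most_common():
    print(host, sent)
```

Because each line is plain JSON, the same records can be loaded by other tools without a custom parser.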

142
00:05:09,750 --> 00:05:12,450
For example, now that I have this normalized data,

143
00:05:12,450 --> 00:05:14,310
I can import that data into another tool

144
00:05:14,310 --> 00:05:17,040
for visualization, searching and analysis.

145
00:05:17,040 --> 00:05:19,291
Here, I've imported my Zeek logs into Splunk,

146
00:05:19,291 --> 00:05:22,050
and then I can have my cybersecurity analyst search

147
00:05:22,050 --> 00:05:25,320
for specific information during a potential incident.

148
00:05:25,320 --> 00:05:28,560
Now, the third tool we have to talk about is MRTG

149
00:05:28,560 --> 00:05:30,491
or the Multi Router Traffic Grapher.

150
00:05:30,491 --> 00:05:33,450
The Multi Router Traffic Grapher is a tool that's used

151
00:05:33,450 --> 00:05:35,291
to create graphs to show network traffic flows

152
00:05:35,291 --> 00:05:38,340
going through our network interfaces on different routers

153
00:05:38,340 --> 00:05:41,160
and switches, and it does this by polling these appliances,

154
00:05:41,160 --> 00:05:44,490
using SNMP, the Simple Network Management Protocol.
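
The math behind a traffic graph built from polling can be sketched in a few lines of Python. This assumes the poller reads an interface's octet counter at a fixed interval and that the counter is 32 bits wide and wraps back to zero. The `rate_bps` function and the sample readings are hypothetical, and no real SNMP request is made here.

```python
COUNTER32_MAX = 2**32  # assumed width of the interface octet counter

def rate_bps(prev_octets, curr_octets, interval_seconds):
    """Average bits per second between two counter readings."""
    # The modulo handles a single counter wrap between the two polls.
    delta = (curr_octets - prev_octets) % COUNTER32_MAX
    return delta * 8 / interval_seconds

# Two invented readings taken 300 seconds (5 minutes) apart.
normal = rate_bps(1_000_000, 38_500_000, 300)
wrapped = rate_bps(COUNTER32_MAX - 1000, 500, 300)
print(normal, wrapped)
```

Each plotted point on the graph is one of these averages, so short bursts inside a polling interval get smoothed out.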

155
00:05:44,490 --> 00:05:47,280
So what is useful about a visualization like this?

156
00:05:47,280 --> 00:05:49,261
Well, you're going to be able to start seeing patterns emerge

157
00:05:49,261 --> 00:05:51,630
that may be outside of your baseline.

158
00:05:51,630 --> 00:05:53,370
For example, here in the top graph,

159
00:05:53,370 --> 00:05:55,080
you could see a big spike in traffic

160
00:05:55,080 --> 00:05:57,120
between 2:00 AM and 4:00 AM.

161
00:05:57,120 --> 00:05:58,110
Is that normal?

162
00:05:58,110 --> 00:05:59,910
Well, maybe and maybe not,

163
00:05:59,910 --> 00:06:02,340
but it's something we should further investigate and analyze

164
00:06:02,340 --> 00:06:04,230
because we're seeing this big spike occur

165
00:06:04,230 --> 00:06:06,120
between 2:00 AM and 4:00 AM.

166
00:06:06,120 --> 00:06:07,590
And that might be something normal

167
00:06:07,590 --> 00:06:08,910
like doing offsite backups,

168
00:06:08,910 --> 00:06:11,040
or it could be something malicious.

169
00:06:11,040 --> 00:06:12,690
If it was the case of something that was normal,

170
00:06:12,690 --> 00:06:14,010
like an offsite backup,

171
00:06:14,010 --> 00:06:15,990
you're going to see this big spike in traffic

172
00:06:15,990 --> 00:06:17,760
because we're sending a backup copy of all

173
00:06:17,760 --> 00:06:20,910
of our data offsite to a cloud provider facility.

174
00:06:20,910 --> 00:06:22,680
That might be a reasonable explanation,

175
00:06:22,680 --> 00:06:24,420
and in that case, I wouldn't need to worry

176
00:06:24,420 --> 00:06:26,340
because I would see that every single night

177
00:06:26,340 --> 00:06:28,020
and I'd be used to seeing it.

178
00:06:28,020 --> 00:06:29,250
Now, on the other hand,

179
00:06:29,250 --> 00:06:31,590
maybe that server's been infected with malware.

180
00:06:31,590 --> 00:06:34,170
Every day from 2:00 to 4:00 AM, it's going to send

181
00:06:34,170 --> 00:06:36,750
all of the data back to the bad actors

182
00:06:36,750 --> 00:06:38,880
while all my administrators are at home sleeping.

183
00:06:38,880 --> 00:06:40,950
This is considered data exfiltration

184
00:06:40,950 --> 00:06:42,510
as part of an attack campaign.

185
00:06:42,510 --> 00:06:44,490
That's something you want to be on the lookout for.

186
00:06:44,490 --> 00:06:46,020
Now, just looking at this graphic,

187
00:06:46,020 --> 00:06:47,970
I don't know which of these two cases it is.

188
00:06:47,970 --> 00:06:49,620
Is this something normal like a backup

189
00:06:49,620 --> 00:06:51,240
or is this something malicious?

190
00:06:51,240 --> 00:06:53,160
But if you know your organization

191
00:06:53,160 --> 00:06:56,130
and you know your baselines, now you can look at this graph

192
00:06:56,130 --> 00:06:59,100
and identify what should be investigated based on seeing

193
00:06:59,100 --> 00:07:01,320
that spike between 2:00 AM and 4:00 AM

194
00:07:01,320 --> 00:07:02,960
and then figure out where that additional traffic flow

195
00:07:02,960 --> 00:07:04,890
is going and why.

196
00:07:04,890 --> 00:07:06,960
If we suspected something was malicious here,

197
00:07:06,960 --> 00:07:08,790
like somebody exfiltrating our data,

198
00:07:08,790 --> 00:07:10,320
then we might set up a network sniffer

199
00:07:10,320 --> 00:07:11,850
in front of our file server

200
00:07:11,850 --> 00:07:13,590
and see what traffic is leaving the network

201
00:07:13,590 --> 00:07:14,820
and where it's going.

202
00:07:14,820 --> 00:07:16,350
Then based on that,

203
00:07:16,350 --> 00:07:18,030
we can have an incident response on our hands

204
00:07:18,030 --> 00:07:19,470
and do our cleanup.

205
00:07:19,470 --> 00:07:22,200
Now at this point, we just don't know if this is malicious

206
00:07:22,200 --> 00:07:24,540
or not, but we do know it's something different

207
00:07:24,540 --> 00:07:26,970
and something that is outside of the normal baseline

208
00:07:26,970 --> 00:07:28,800
as indicated by that big spike.

209
00:07:28,800 --> 00:07:30,510
So it's important for us to investigate it

210
00:07:30,510 --> 00:07:32,010
for the health of our network.
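
The baseline comparison described in this lesson can be sketched in Python as follows. This is only one simple way to do it, assuming you have hourly traffic totals from previous days to act as the baseline. The numbers, the three-standard-deviation threshold, and the `find_anomalous_hours` helper are illustrative assumptions, not something a particular tool prescribes.

```python
from statistics import mean, stdev

# Invented hourly traffic totals in MB for hours 0 through 5 on prior days.
baseline_days = [
    [40, 38, 42, 41, 39, 45],
    [42, 40, 41, 43, 40, 44],
    [39, 41, 40, 42, 41, 46],
]
# Invented totals for today, with a spike in the 2:00 and 3:00 AM hours.
today = [41, 39, 310, 295, 40, 45]

def find_anomalous_hours(history, current, threshold=3.0):
    """Return the hours whose traffic is far above the historical baseline."""
    anomalies = []
    for hour, value in enumerate(current):
        samples = [day[hour] for day in history]
        mu = mean(samples)
        # Use a floor of 1.0 so a very flat baseline does not flag tiny changes.
        sigma = max(stdev(samples), 1.0)
        if value > mu + threshold * sigma:
            anomalies.append(hour)
    return anomalies

anomalous = find_anomalous_hours(baseline_days, today)
print(anomalous)
```

A flagged hour only says the traffic is unusual. Whether it is a backup or data exfiltration still takes an analyst who knows the organization.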

