WEBVTT

00:01.040 --> 00:02.600
Hello, my name is Typhoon.

00:02.600 --> 00:10.880
And in this lecture, you will learn how to gather usernames and email addresses and

00:10.880 --> 00:13.160
use scraping techniques.

00:26.790 --> 00:35.400
A technique that attackers use to extract large amounts of data from websites, whereby the extracted

00:35.400 --> 00:41.460
data is stored locally on the file system, is called scraping or web scraping.

00:41.700 --> 00:49.560
In this lecture, we will utilize some of the most commonly used tools in Linux to perform scraping.

00:50.190 --> 00:57.300
So let's gather usernames and email addresses by using theHarvester tool.

00:57.300 --> 01:04.320
So theHarvester is a Python script that searches through popular search engines and other sites for email

01:04.320 --> 01:06.970
addresses, hosts and subdomains.

01:06.990 --> 01:14.440
Using theHarvester is relatively simple, as there are only a few command switches to set here.

01:14.490 --> 01:19.550
First, I'm going to show these commands, and we will use all of them, of course.

01:19.560 --> 01:23.850
So let's actually use theHarvester.

01:24.360 --> 01:28.950
As you can see, we are just typing theHarvester and running it in the terminal.

01:29.310 --> 01:31.020
We'll get something like that.

01:31.950 --> 01:33.810
And here we're going to run this.

01:33.810 --> 01:37.290
theHarvester, all right.

01:38.910 --> 01:40.260
Let's call the help.

01:40.260 --> 01:41.850
And here are the

01:44.270 --> 01:50.990
options you can use when running theHarvester and using scraping techniques.

01:51.670 --> 01:52.300
Here.

01:52.810 --> 01:57.130
So now we're going to use the -d parameter, the domain.

01:57.490 --> 02:02.590
So here we will use our own domain.

02:03.070 --> 02:05.860
theHarvester.

02:06.620 --> 02:07.100
-d.

02:07.100 --> 02:08.170
This is, uh.

02:08.180 --> 02:12.620
This means we're going to enter the domain after this parameter here.

02:12.650 --> 02:13.670
ocsaly,

02:15.410 --> 02:16.790
ocsaly.com.

02:17.210 --> 02:22.700
And after that, here we're going to enter -l, the limit.

02:22.700 --> 02:25.670
We're going to limit the number of search results.

02:26.000 --> 02:29.360
As you can see here, the default is 500.

02:29.360 --> 02:33.380
So if you don't enter any parameters, the default is going to be 500.

02:33.380 --> 02:39.380
But I don't expect to find a lot of results, because it's my website and I know what's inside it.

02:39.980 --> 02:48.020
We're going to set the limit to 200, for example, and now we're going to use -b google.

02:48.050 --> 02:48.950
Google.

02:49.880 --> 02:51.800
This is the search engine that we're going to use here.

02:51.800 --> 02:56.030
That's the -b source option here.

02:56.270 --> 02:56.930
There are

02:58.610 --> 03:07.820
options you can use as the search engine, or other kinds of sources that search things, such as rapiddns,

03:07.940 --> 03:14.540
github-code, hackertarget, hunter, intelx, omnisint, securityTrails, sublist3r,

03:14.540 --> 03:21.770
and the rest. You know, we can also use dnsdumpster and other search engines, for example Bing,

03:21.980 --> 03:24.740
Baidu and Google.

03:24.740 --> 03:25.160
Yahoo and

03:25.160 --> 03:33.050
ZoomEye. Here we entered an invalid source, because there's no Google option here.

03:34.990 --> 03:35.590
Then.

03:35.590 --> 03:37.990
Then we're gonna use Bing or.

03:39.600 --> 03:42.360
Bing, Google, GitHub.

03:42.630 --> 03:47.460
Yeah, we will use Bing here.

03:48.060 --> 03:49.590
Target: ocsaly.com.

03:50.220 --> 03:51.420
This is our own website.

03:51.450 --> 03:59.520
No, nothing is found, because our website is not so well known on Bing and we don't actually give

03:59.520 --> 04:03.990
search and scraping access to Bing here.
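
NOTE
The run described above could be sketched as a small Python helper. This is
only an illustration, assuming theHarvester is installed under that name and
that the -d, -l and -b switches behave as shown in its help output. The helper
build_harvester_command and the domain example.com are hypothetical and do not
come from the lecture.
```python
import shlex
def build_harvester_command(domain, limit=500, source="bing"):
    """Build the argument list for a theHarvester run (hypothetical helper)."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    # -d: target domain, -l: maximum number of results, -b: data source
    return ["theHarvester", "-d", domain, "-l", str(limit), "-b", source]
if __name__ == "__main__":
    # example.com is a placeholder domain used only for illustration.
    print(shlex.join(build_harvester_command("example.com", limit=200)))
```
The returned list can be handed to subprocess.run to start the scan, and the
source argument can be swapped for any other name listed in the help output.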

04:05.070 --> 04:07.710
Let's actually use DuckDuckGo.

04:10.840 --> 04:11.380
Duck.

04:11.440 --> 04:12.370
Duck, go.

04:15.490 --> 04:16.750
Same result here.

04:30.940 --> 04:32.560
Let's actually use Yahoo.

04:37.360 --> 04:39.550
And let's actually use

04:40.410 --> 04:41.670
GitHub.com.

04:43.960 --> 04:45.550
No emails found.

04:48.260 --> 04:48.860
Bing.

04:51.520 --> 04:53.100
I'm going to search on Bing.

04:53.110 --> 04:59.650
And here we got the IP addresses and subdomains pertaining to GitHub.

05:00.370 --> 05:01.210
So.

05:02.320 --> 05:11.290
Here, attackers can utilize the LinkedIn API to extract a list of people within a given domain

05:11.290 --> 05:15.760
and easily form a list of possible valid email addresses and usernames.

05:15.760 --> 05:23.920
So an example would be when an organization uses first and last names in the format of, for example,

05:25.000 --> 05:31.240
example@ocsaly.com.

05:31.240 --> 05:38.860
theHarvester can be utilized to enumerate details of the users who are currently working at the organization

05:39.010 --> 05:39.670
also.
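
NOTE
The idea of forming possible email addresses from a list of names could be
sketched as follows. This is a minimal illustration that assumes a
first.last@domain convention, which the lecture does not confirm. The function
candidate_emails, the names and example.com are all made up for the example.
```python
def candidate_emails(names, domain):
    """Return possible addresses in an assumed first.last@domain format."""
    emails = []
    for full_name in names:
        parts = full_name.lower().split()
        if len(parts) < 2:
            continue  # a first and a last name are both required
        emails.append(f"{parts[0]}.{parts[-1]}@{domain}")
    return emails
if __name__ == "__main__":
    # Both the names and the domain are placeholders.
    print(candidate_emails(["Jane Doe", "John Smith"], "example.com"))
```
Other conventions, such as first initial plus last name, would only need a
different format string inside the loop.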

05:40.030 --> 05:47.440
So we can also obtain user information in the next lectures with tiny.

05:47.440 --> 05:48.910
I'll be waiting for you in the next lecture.
