1 00:00:01,040 --> 00:00:02,600 Hello, my name is Typhoon. 2 00:00:02,600 --> 00:00:10,880 And in this lecture, you will learn how to gather usernames and email addresses and 3 00:00:10,880 --> 00:00:13,160 use scraping techniques. 4 00:00:26,790 --> 00:00:35,400 A technique that attackers utilize to extract large data sets from websites, whereby the extracted 5 00:00:35,400 --> 00:00:41,460 data is stored locally in a file system, is called scraping or web scraping. 6 00:00:41,700 --> 00:00:49,560 In this lecture, we will utilize some of the most commonly used tools in Linux to perform scraping. 7 00:00:50,190 --> 00:00:57,300 So let's gather usernames and email addresses by using theHarvester. 8 00:00:57,300 --> 00:01:04,320 So theHarvester is a Python script that searches through popular search engines and other sites for email 9 00:01:04,320 --> 00:01:06,970 addresses, hosts and subdomains. 10 00:01:06,990 --> 00:01:14,440 Using theHarvester is relatively simple, as there are only five command switches to set here. 11 00:01:14,490 --> 00:01:19,550 First, I'm going to show these commands, and we will use all of them, of course. 12 00:01:19,560 --> 00:01:23,850 So let's actually use theHarvester. 13 00:01:24,360 --> 00:01:28,950 As you can see, just by typing theHarvester and running it in the terminal, 14 00:01:29,310 --> 00:01:31,020 we'll get something like that. 15 00:01:31,950 --> 00:01:33,810 And here we're going to run this. 16 00:01:33,810 --> 00:01:37,290 theHarvester, our rock. 17 00:01:38,910 --> 00:01:40,260 Let's call the help. 18 00:01:40,260 --> 00:01:41,850 And here there are the 19 00:01:44,270 --> 00:01:50,990 options you can use when running theHarvester and using scraping techniques. 20 00:01:51,670 --> 00:01:52,300 Here. 21 00:01:52,810 --> 00:01:57,130 So now we're going to use the -d parameter, domain. 22 00:01:57,490 --> 00:02:02,590 So here we will use our own domain.
23 00:02:03,070 --> 00:02:05,860 theHarvester. 24 00:02:06,620 --> 00:02:07,100 -d. 25 00:02:07,100 --> 00:02:08,170 This is, uh. 26 00:02:08,180 --> 00:02:12,620 This means we're going to enter the domain after this parameter here. 27 00:02:12,650 --> 00:02:13,670 Oxley. 28 00:02:15,410 --> 00:02:16,790 On Slate.com. 29 00:02:17,210 --> 00:02:22,700 And after that, here we're going to enter -l, the limit. 30 00:02:22,700 --> 00:02:25,670 We're going to limit the number of search results. 31 00:02:26,000 --> 00:02:29,360 As you can see here, the default is 500. 32 00:02:29,360 --> 00:02:33,380 So if you don't enter this parameter, the default is going to be 500. 33 00:02:33,380 --> 00:02:39,380 But I won't find a lot of results, because it's my website and I know what's inside it. 34 00:02:39,980 --> 00:02:48,020 We're going to set the limit to 200, for example, and now we're going to use -b google. 35 00:02:48,050 --> 00:02:48,950 Google. 36 00:02:49,880 --> 00:02:51,800 This is the search engine that we're going to use here. 37 00:02:51,800 --> 00:02:56,030 Let's translate it: -b is the source here. 38 00:02:56,270 --> 00:02:56,930 There are 39 00:02:58,610 --> 00:03:07,820 options you can use as a search engine, or other kinds of engines that search things, such as rapiddns, 40 00:03:07,940 --> 00:03:14,540 github-code, hackertarget, hunter, intelx, omnisint, securityTrails, spyse. 41 00:03:14,540 --> 00:03:21,770 The rest, you know; we all use this tool, DNSdumpster, and other search engines, for example, Bing, 42 00:03:21,980 --> 00:03:24,740 Baidu and Google. 43 00:03:24,740 --> 00:03:25,160 Yahoo, 44 00:03:25,160 --> 00:03:33,050 ZoomEye. Here we entered an invalid source, because there's no Google option here. 45 00:03:34,990 --> 00:03:35,590 Then. 46 00:03:35,590 --> 00:03:37,990 Then we're going to use Bing or. 47 00:03:39,600 --> 00:03:42,360 Bing, Google, GitHub. 48 00:03:42,630 --> 00:03:47,460 Yeah, we will use Bing here.
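The three switches described so far (-d for the domain, -l for the result limit, -b for the source) can be put together as a single command line. The following is a minimal Python sketch of that assembly, not the lecturer's own script. The helper name `build_harvester_command` and the placeholder domain `example.com` are assumptions made for illustration; the default of 500 is the value read out from the help output in the lecture.

```python
# Default number of search results, as stated in the tool's help output.
DEFAULT_LIMIT = 500


def build_harvester_command(domain, source, limit=DEFAULT_LIMIT):
    """Return the argument list for one theHarvester run.

    Hypothetical helper for illustration only. It uses the three
    switches named in the lecture:
      -d  target domain
      -l  maximum number of search results
      -b  data source (for example "bing" or "duckduckgo")
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return ["theHarvester", "-d", domain, "-l", str(limit), "-b", source]


if __name__ == "__main__":
    # example.com is a placeholder, not the domain used in the lecture.
    command = build_harvester_command("example.com", "bing", limit=200)
    print(" ".join(command))
```

The list returned here could be passed to `subprocess.run` on a machine where theHarvester is installed. Which source names are accepted depends on the installed version, as the rejected Google option in the lecture shows.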
49 00:03:48,060 --> 00:03:49,590 Target Telecom. 50 00:03:50,220 --> 00:03:51,420 This is our own website. 51 00:03:51,450 --> 00:03:59,520 No, nothing is found, because our website is not so well known on Bing and we don't actually give 52 00:03:59,520 --> 00:04:03,990 search and scraping access to Bing here. 53 00:04:05,070 --> 00:04:07,710 Let's actually use DuckDuckGo. 54 00:04:10,840 --> 00:04:11,380 Duck. 55 00:04:11,440 --> 00:04:12,370 Duck, go. 56 00:04:15,490 --> 00:04:16,750 Same result here. 57 00:04:30,940 --> 00:04:32,560 Let's actually use Yahoo. 58 00:04:37,360 --> 00:04:39,550 And let's actually use 59 00:04:40,410 --> 00:04:41,670 GitHub.com. 60 00:04:43,960 --> 00:04:45,550 No emails found. 61 00:04:48,260 --> 00:04:48,860 Bing. 62 00:04:51,520 --> 00:04:53,100 I'm going to search on Bing. 63 00:04:53,110 --> 00:04:59,650 And here we got the IP addresses and subdomains pertaining to GitHub. 64 00:05:00,370 --> 00:05:01,210 So. 65 00:05:02,320 --> 00:05:11,290 Here, attackers can also utilize the LinkedIn API to extract a list of people within a given domain 66 00:05:11,290 --> 00:05:15,760 and easily form a list of possible valid email addresses and usernames. 67 00:05:15,760 --> 00:05:23,920 So an example would be when an organization uses first and last names within the format of, for example, 68 00:05:25,000 --> 00:05:31,240 example@Oxley.com. 69 00:05:31,240 --> 00:05:38,860 theHarvester can be utilized to enumerate the details of users who are currently working in the organization 70 00:05:39,010 --> 00:05:39,670 as well. 71 00:05:40,030 --> 00:05:47,440 So we can also obtain user information in the next lectures with tiny. 72 00:05:47,440 --> 00:05:48,910 I'm waiting for you in the next lecture.
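The idea of forming possible email addresses from a list of people can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not part of theHarvester: the function name `candidate_addresses`, the three address patterns, and the placeholder domain `example.com` are all hypothetical. A real organization may use a different format.

```python
def candidate_addresses(full_names, domain):
    """Build possible email addresses from people's names.

    Hypothetical helper for illustration. It assumes the organization
    forms addresses from first and last names and tries three common
    patterns: first.last, firstlast and the first initial plus last name.
    """
    candidates = []
    for name in full_names:
        parts = name.lower().split()
        if len(parts) < 2:
            # A first and a last name are both needed for these patterns.
            continue
        first, last = parts[0], parts[-1]
        for local_part in (f"{first}.{last}", f"{first}{last}", f"{first[0]}{last}"):
            address = f"{local_part}@{domain}"
            if address not in candidates:
                candidates.append(address)
    return candidates


if __name__ == "__main__":
    # Placeholder name and domain, not data from the lecture.
    for address in candidate_addresses(["Jane Doe"], "example.com"):
        print(address)
```

The resulting list holds guesses only. Each address would still need to be checked against what the search sources actually return.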