1 00:00:00,270 --> 00:00:07,710 In this lesson, we will discuss the different phases of static malware analysis. As we already know, 2 00:00:08,040 --> 00:00:13,410 static analysis is a technique of analyzing a sample file without executing it. 3 00:00:14,760 --> 00:00:21,420 This method involves extracting useful information from the suspected binary, which can help us 4 00:00:22,080 --> 00:00:31,260 make informed decisions on how to classify the malware, how to analyze it, and where to focus our subsequent 5 00:00:31,560 --> 00:00:32,820 analysis efforts. 6 00:00:34,230 --> 00:00:38,040 As a first step, we do an antivirus scan. 7 00:00:39,280 --> 00:00:47,620 If the antivirus detects the sample as malware, then we don't need to do any further analysis, as all 8 00:00:47,620 --> 00:00:52,030 the information about the malware will be available in the vendor portal. 9 00:00:53,280 --> 00:00:59,100 However, it is worth noting that this scan has to be performed with all the features in the antivirus 10 00:00:59,100 --> 00:01:02,310 enabled, something called a deep scan. 11 00:01:04,180 --> 00:01:12,010 If the antivirus doesn't detect the sample as malware, as a second step, we will see if any other 12 00:01:12,010 --> 00:01:14,890 vendor detects it as malware. 13 00:01:15,550 --> 00:01:22,450 That is, we might be using Symantec in our company, and that vendor might not have the signature to 14 00:01:22,450 --> 00:01:24,870 detect this specific malware. 15 00:01:25,660 --> 00:01:33,820 So we will see if any other vendor like McAfee, Trend Micro, or Kaspersky can detect this sample as 16 00:01:33,820 --> 00:01:34,290 malware. 17 00:01:35,230 --> 00:01:39,810 Now, does this mean that we need to purchase all these antivirus products? 18 00:01:40,810 --> 00:01:41,260 No. 19 00:01:41,770 --> 00:01:50,560 There is a web-based tool called VirusTotal. This tool is run and managed by Google and is free to be 20 00:01:50,560 --> 00:01:51,760 used by anyone.
21 00:01:52,540 --> 00:01:56,260 It can be accessed at 22 00:01:56,380 --> 00:01:57,550 www.virustotal.com. 23 00:01:59,260 --> 00:02:04,000 How do we use this tool? Should we submit the file sample directly to VirusTotal? 24 00:02:04,960 --> 00:02:06,970 There is a risk in doing so. 25 00:02:08,000 --> 00:02:14,110 Remember, the file is a suspicious file, but it is not yet proven to be malware. 26 00:02:15,040 --> 00:02:20,980 What if this file is a clean file and holds some sensitive information about your company? 27 00:02:22,400 --> 00:02:27,570 If you upload it to VirusTotal, everyone in the VirusTotal community can see the file. 28 00:02:28,550 --> 00:02:32,690 So it is never a good idea to submit the file itself. 29 00:02:33,660 --> 00:02:34,720 Then how do we do it? 30 00:02:35,490 --> 00:02:37,770 We submit the file hash. 31 00:02:38,710 --> 00:02:44,260 Every file has a unique fingerprint called a file hash. 32 00:02:45,890 --> 00:02:55,250 So how do we get the file hash? We have a tool called HashCalc, short for hash calculator, to calculate 33 00:02:55,250 --> 00:03:01,460 the hash values using different algorithms. In this image, we have submitted a file sample called 34 00:03:01,460 --> 00:03:03,590 track 01 dot mp3, and 35 00:03:03,590 --> 00:03:09,260 HashCalc has calculated various hashes for us, like MD5, SHA-1, etc. 36 00:03:11,300 --> 00:03:17,720 It is important to follow best practices and evaluate whether using VirusTotal is right for the 37 00:03:17,720 --> 00:03:25,340 scenario. One has to realize that the details of all the submitted files are stored and accessible 38 00:03:25,340 --> 00:03:27,550 to all VirusTotal community users. 39 00:03:28,460 --> 00:03:30,220 So never submit the actual file. 40 00:03:30,380 --> 00:03:32,510 Instead, use the file hash.
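For readers who want to compute the hashes without a GUI tool, here is a minimal sketch in Python. It is not how HashCalc works internally; it only uses the standard `hashlib` module, and the function name `file_hashes`, the default algorithm list, and the chunk size are assumptions made for illustration.

```python
import hashlib


def file_hashes(path, algorithms=("md5", "sha1", "sha256"), chunk_size=65536):
    """Return a dict mapping each algorithm name to the file's hex digest.

    The file is read in chunks so that large samples do not have to fit
    in memory. The sample is only read as bytes, never executed.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling `file_hashes("sample.bin")` (a hypothetical file name) returns the digests as hex strings, and any one of them can be pasted into the VirusTotal search box instead of uploading the file.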
41 00:03:33,910 --> 00:03:40,480 There is a risk involved in submitting the hash too: the attacker might also subscribe 42 00:03:40,480 --> 00:03:48,490 to the VirusTotal community and keep checking whether anyone submits the hash of the malware he has written for a 43 00:03:48,490 --> 00:03:50,890 targeted advanced persistent threat. 44 00:03:51,850 --> 00:03:56,830 This might prompt him to trigger other methods or use other attack vectors. 45 00:03:59,030 --> 00:04:05,690 The next step is to understand the actual file type. Attackers use different techniques to hide 46 00:04:05,690 --> 00:04:12,170 their file by modifying the extension, changing the icon, etc. This is done in order to trick 47 00:04:12,170 --> 00:04:19,730 users into executing it, so it becomes necessary to determine the actual file type of the suspected malware. 48 00:04:21,060 --> 00:04:27,300 Knowing the file type will help in identifying the malware's target operating system, that is, whether 49 00:04:27,300 --> 00:04:33,720 it is targeting Windows, Linux, or Mac operating systems, and also the architecture, like a 32-bit or 50 00:04:33,960 --> 00:04:35,240 64-bit OS. 51 00:04:36,950 --> 00:04:39,440 File types can be identified by the magic number. 52 00:04:40,100 --> 00:04:42,490 Remember the DOS header in the PE file format? 53 00:04:43,370 --> 00:04:49,780 A magic number is a number embedded at the beginning of a file that indicates its file format. 54 00:04:50,570 --> 00:04:53,010 It is also referred to as a file signature. 55 00:04:53,960 --> 00:05:01,190 Though magic numbers are not visible to users, there are specialized tools called hex editors to see the 56 00:05:01,190 --> 00:05:05,810 magic number. One such tool is HxD. 57 00:05:07,390 --> 00:05:15,100 As shown in the image, when a file is loaded into HxD, the very first few bytes indicate the file type. 58 00:05:15,940 --> 00:05:17,230 In this case, it is
59 00:05:17,650 --> 00:05:28,010 FF D8 FF E0, which denotes a JPEG file. The list of all file signatures is recorded on 60 00:05:28,010 --> 00:05:29,080 Wikipedia. 61 00:05:29,530 --> 00:05:33,390 A link to the same is provided in the resource section of this lesson. 62 00:05:34,950 --> 00:05:41,530 There is an alternate tool to detect the file type of a sample, called Exeinfo PE. 63 00:05:42,360 --> 00:05:46,770 This is a graphical tool that tells if a file is executable or not. 64 00:05:48,730 --> 00:05:51,460 Next, we need to check for packers. 65 00:05:52,520 --> 00:05:56,640 To refresh the concept, packers are used to compress binary files. 66 00:05:57,980 --> 00:06:01,550 Malware authors use packers to obfuscate their malware. 67 00:06:02,380 --> 00:06:07,510 Examples of packers include UPX, Exe Stealth, and PESpin. 68 00:06:09,000 --> 00:06:16,470 When a binary is packed, the number of readable strings in the sample is reduced, thereby making the 69 00:06:16,470 --> 00:06:23,370 process of malware analysis complex, so it is essential to unpack a sample before analyzing it. 70 00:06:25,270 --> 00:06:33,310 Exeinfo PE, which was used to learn about the file format, will also help in checking if a file 71 00:06:33,310 --> 00:06:33,850 is packed. 72 00:06:34,900 --> 00:06:37,300 It also gives unpacking instructions. 73 00:06:38,710 --> 00:06:47,500 In the image, we have loaded Exeinfo PE dot exe, and the tool says it is packed with UPX. 74 00:06:49,030 --> 00:06:52,290 Down below, it also gives us hints on unpacking the file. 75 00:06:53,470 --> 00:07:00,100 In this case, we can use the tool upx.exe with the argument -d. 76 00:07:03,000 --> 00:07:11,940 The next important step in static malware analysis is string analysis. It is the process of extracting readable characters 77 00:07:11,940 --> 00:07:13,270 and words from the malware. 78 00:07:14,370 --> 00:07:17,780 Strings will help us in understanding the functionality of the malware.
79 00:07:18,510 --> 00:07:24,510 String analysis does so by giving us information like the list of libraries and functions used in the malware, 80 00:07:26,070 --> 00:07:31,740 any messages that the program is trying to print on the screen, like error messages, pop-ups, etc., 81 00:07:33,260 --> 00:07:37,190 the file names and file paths it is creating, accessing, or modifying, 82 00:07:38,470 --> 00:07:45,650 and the URLs and IPs present. Typically, malware connects to a domain or an IP address, which is called 83 00:07:45,670 --> 00:07:50,200 Command and Control, through which the malware author controls the malware. 84 00:07:51,790 --> 00:07:58,120 String analysis also gives us the registry keys being created, deleted, or modified. 85 00:07:59,380 --> 00:08:06,400 Care has to be exercised while doing string analysis, because an attacker might include fake strings to 86 00:08:06,400 --> 00:08:07,990 misguide the analysis. 87 00:08:09,790 --> 00:08:13,120 There are two widely used tools for string analysis. 88 00:08:13,660 --> 00:08:17,920 One is a Sysinternals tool from Microsoft called strings. 89 00:08:19,310 --> 00:08:23,810 It is a command-line tool. The other one is BinText. 90 00:08:24,920 --> 00:08:32,120 It works exactly like strings, but has a graphical user interface. The next phase of analysis is to look 91 00:08:32,120 --> 00:08:34,110 into the details of the PE file. 92 00:08:34,910 --> 00:08:37,990 We have already learned a few things about the PE file structure. 93 00:08:38,810 --> 00:08:43,490 Let's use that knowledge in understanding the malicious indicators in the PE file. 94 00:08:44,450 --> 00:08:47,300 First, we will look at the time date stamp. 95 00:08:48,470 --> 00:08:52,130 This is the date and time when the file was built or compiled. 96 00:08:53,380 --> 00:08:59,650 This is usually set during the compilation process, but it can easily be modified with specialized 97 00:08:59,650 --> 00:09:00,040 tools.
98 00:09:01,320 --> 00:09:07,770 If the value is older than 1992 or set to a future date like 2022, it 99 00:09:07,770 --> 00:09:09,780 could possibly be a malware file. 100 00:09:11,400 --> 00:09:14,800 Next, we will look into the number of sections field. 101 00:09:15,820 --> 00:09:23,590 Most non-malicious PE files use a small number of sections, usually between one and six. If there is a large 102 00:09:23,590 --> 00:09:28,390 number of sections, like more than 10, it could possibly be malware. 103 00:09:30,190 --> 00:09:38,560 We should also focus on the characteristics or permissions of the sections. Normally, the .text section, 104 00:09:38,740 --> 00:09:43,090 which contains the executable code, should not have write permission. 105 00:09:44,060 --> 00:09:52,280 However, malware authors keep changing the .text section to morph the malware and avoid 106 00:09:52,310 --> 00:09:52,850 detection. 107 00:09:53,890 --> 00:09:59,280 Write permission on the .text section is an indicator of a malicious file. 108 00:10:00,600 --> 00:10:08,020 The next important section we look into is the resources section, denoted by .rsrc. 109 00:10:09,760 --> 00:10:16,900 As discussed in earlier modules, this section holds the supporting images, fonts, icons, strings, 110 00:10:17,050 --> 00:10:17,610 etc. 111 00:10:18,740 --> 00:10:25,340 Malware could use the resource section to store configuration data and code that assists in malicious 112 00:10:25,400 --> 00:10:26,090 functionality. 113 00:10:28,250 --> 00:10:34,490 Another aspect of resources is the language attribute, which will help us in understanding if the malware 114 00:10:34,640 --> 00:10:43,160 is targeting a specific country based on language, like Russian, Chinese, Arabic, etc. In some rare 115 00:10:43,160 --> 00:10:46,660 cases, it could help to attribute the source of the malware, 116 00:10:47,180 --> 00:10:49,940 that is, to identify who is behind the malware.
117 00:10:50,810 --> 00:10:56,210 But malware authors may deliberately include a different language to misguide the analysis. 118 00:10:57,870 --> 00:11:01,500 Next, we will look at the size of the resource section. 119 00:11:02,570 --> 00:11:07,640 Usually, the size of the resource section will be small compared to the overall file size, 120 00:11:09,100 --> 00:11:13,130 like less than 25 percent of the overall file size. 121 00:11:13,900 --> 00:11:24,550 That is, if the full file size is 1 MB, the resources could be around 250 to 300 KB. Anything 122 00:11:24,550 --> 00:11:27,160 more than 30 percent could mean it is malicious. 123 00:11:28,170 --> 00:11:35,190 This is because malware authors have a practice of using the resources section as a storage place for additional 124 00:11:35,190 --> 00:11:44,190 executable code. In some cases, an entirely new PE32 file is stored within the resources section, only to 125 00:11:44,190 --> 00:11:47,040 be dropped onto a system post-infection. 126 00:11:49,590 --> 00:11:57,270 We should also consider the entropy of the whole file and the entropy of each section. Entropy is the measurement 127 00:11:57,270 --> 00:11:58,410 of randomness. 128 00:11:59,250 --> 00:12:02,510 More randomness means a higher entropy value. 129 00:12:03,270 --> 00:12:07,560 If the entropy is high, then it could mean that the binary is encrypted, 130 00:12:08,530 --> 00:12:12,400 which is an obfuscation technique used by malware authors. 131 00:12:14,150 --> 00:12:20,340 The tool that will help us dissect and understand the PE file structure is pestudio. 132 00:12:20,990 --> 00:12:27,620 In fact, this one tool can help us understand various aspects of the file sample, including strings, 133 00:12:27,830 --> 00:12:30,970 libraries and functions, magic number, etc. 134 00:12:32,390 --> 00:12:37,430 In the next lesson, we will take a look at demonstrations of all the tools discussed here.
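To make the entropy idea from this lesson concrete, here is a minimal sketch of the usual byte-level Shannon entropy calculation in Python. It is an illustration only and not necessarily the exact computation any particular PE tool performs; the function name `shannon_entropy` is an assumption. The result is in bits per byte, so it ranges from 0 (every byte identical) to 8 (all 256 byte values equally likely).

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how many times each byte value occurs
    # H = -sum(p * log2(p)) over the byte values that actually occur.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

The same function can be applied to the bytes of the whole file or to the raw bytes of a single section. Values close to 8 are often read as a hint that the data is compressed or encrypted, but any cutoff is a rule of thumb, since legitimate compressed resources also score high.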