1 00:00:00,690 --> 00:00:05,290 Hello and welcome back to a new section. In this new section, 2 00:00:05,310 --> 00:00:14,070 we are going to start looking at how to analyze Microsoft Office documents. 3 00:00:14,790 --> 00:00:24,420 We will start off by looking at the principles of analyzing Office documents. Office attacks are one 4 00:00:24,420 --> 00:00:34,200 of the many ways which malware authors are now using in order to infiltrate and infect computers. 5 00:00:34,920 --> 00:00:38,580 And they can do so using three methods. 6 00:00:39,210 --> 00:00:50,010 The first is by using macros. Macros are VBA scripts which are embedded inside the Office document and 7 00:00:50,390 --> 00:00:53,660 will execute when the Office document is opened. 8 00:00:54,360 --> 00:01:03,300 VBA is one of the most powerful scripting languages available and can do almost anything which the 9 00:01:03,300 --> 00:01:14,260 malware author wishes. The second is to use features of the Office programs, for example DDE, that 10 00:01:14,310 --> 00:01:19,300 can also automate certain tasks and execute certain commands. 11 00:01:20,070 --> 00:01:29,430 The third is to take advantage of vulnerabilities within the Office programs themselves by using exploits 12 00:01:29,430 --> 00:01:30,210 to do so. 13 00:01:32,680 --> 00:01:43,960 Document analysis consists of looking for certain keywords, certain signs of malicious elements, for 14 00:01:43,960 --> 00:01:52,940 example scripts, for example VBA macros. We will look for commands in the documents 15 00:01:53,290 --> 00:02:01,210 if we come across any kind of document that raises suspicion. And also you should look out for 16 00:02:01,240 --> 00:02:09,670 embedded files, and these embedded files could be binary files or even compressed, encoded or obfuscated 17 00:02:09,670 --> 00:02:14,020 files within the Microsoft document.
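
The keyword search described above can be illustrated with a minimal Python sketch. The keyword list and the function name `find_keywords` are assumptions made for illustration and are not taken from the video. The scan works on raw bytes only, so content that is compressed or encoded inside the document would have to be unpacked first, which is what the dedicated tools do.

```python
import re

# Illustrative keyword list (an assumption, not from the video): names that
# often appear in auto-executing macros, command execution, or DDE fields.
SUSPICIOUS_KEYWORDS = [
    b"AutoOpen",
    b"Document_Open",
    b"Shell",
    b"CreateObject",
    b"DDEAUTO",
    b"powershell",
]


def find_keywords(data: bytes, keywords=SUSPICIOUS_KEYWORDS) -> dict:
    """Return {keyword: count} for each keyword found, ignoring case."""
    hits = {}
    for kw in keywords:
        count = len(re.findall(re.escape(kw), data, flags=re.IGNORECASE))
        if count:
            hits[kw.decode("ascii")] = count
    return hits


if __name__ == "__main__":
    # Synthetic sample bytes standing in for extracted macro source.
    sample = b"Sub AutoOpen()\r\n  CreateObject(\"WScript.Network\")\r\nEnd Sub"
    print(find_keywords(sample))
```

A hit is only a reason to look closer, not proof of malicious intent, since legitimate macros use the same keywords.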
18 00:02:15,370 --> 00:02:23,230 The first thing we need to be aware of is that there are two versions of Microsoft Office file formats, also known 19 00:02:23,230 --> 00:02:25,180 as Office document formats. 20 00:02:25,750 --> 00:02:31,210 The first version is the older version, which is before 2007. 21 00:02:32,050 --> 00:02:42,390 And in this version, the Microsoft documents will have these extensions: .doc, .xls and 22 00:02:42,550 --> 00:02:44,170 .ppt for PowerPoint. 23 00:02:44,950 --> 00:02:52,570 This storage format is also known as Structured Storage format, or SS format for short. 24 00:02:53,740 --> 00:03:02,650 And from Office 2007 onward, Microsoft released a new format, also known as the Office Open XML format. 25 00:03:03,310 --> 00:03:12,950 And in this format, the Microsoft document itself is actually an archive, a ZIP file containing XML files. 26 00:03:13,720 --> 00:03:24,940 And if this is the format, then you will see the extensions .docx, .xlsx and so on. 27 00:03:26,550 --> 00:03:33,910 And in this format, if you see that it ends in .docm, then it means that it is macro-enabled and can have macros inside it. 28 00:03:34,980 --> 00:03:45,660 However, even a .docx can contain macros. They may not be permitted to run, but certain malware authors 29 00:03:45,660 --> 00:03:49,110 have ways in which this can be overcome. 30 00:03:50,190 --> 00:03:53,520 So these are the two main Office formats you should be aware of. 31 00:03:55,890 --> 00:04:04,070 Let's just take a look at the SS format, the old format, the Structured Storage format. 32 00:04:04,620 --> 00:04:17,450 The header of this structure will contain a string in hexadecimal, which is D0 CF 11 E0 A1 33 00:04:17,640 --> 00:04:19,020 B1 1A E1. 34 00:04:19,500 --> 00:04:30,030 And this is easy to remember because D0 CF 11 E0 is hexspeak, which is used to denote certain strings, 35 00:04:30,040 --> 00:04:31,830 for example, in this case, DOCFILE.
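
As a rough illustration of telling the two formats apart by their leading bytes, here is a minimal Python sketch. The function names and return labels are hypothetical. The ZIP signature `PK\x03\x04` is the standard local file header magic; a ZIP match only means the file may be an Office Open XML document, since any ZIP archive starts the same way.

```python
# Signature of the old Structured Storage format, as read out in the video.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Standard ZIP local file header signature (the new format is a ZIP archive).
ZIP_MAGIC = b"PK\x03\x04"


def detect_office_format(header: bytes) -> str:
    """Classify a file by its first bytes. Labels are illustrative."""
    if header.startswith(OLE_MAGIC):
        return "structured-storage"
    if header.startswith(ZIP_MAGIC):
        return "zip-container"
    return "unknown"


def detect_file(path: str) -> str:
    """Read the first 8 bytes of a file on disk and classify them."""
    with open(path, "rb") as fh:
        return detect_office_format(fh.read(8))


if __name__ == "__main__":
    print(detect_office_format(OLE_MAGIC + b"\x00" * 8))
    print(detect_office_format(ZIP_MAGIC + b"\x14\x00"))
```

Checking the bytes rather than the extension matters because a renamed file keeps its original header.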
36 00:04:32,940 --> 00:04:42,450 And then below that you will have a hierarchy like a file system. At the top is the root storage, and 37 00:04:42,450 --> 00:04:50,190 below the root storage you have children. You can have storages as well as streams, and it is in the streams 38 00:04:50,190 --> 00:04:54,510 in which you can find the data, including embedded files. 39 00:04:55,590 --> 00:05:07,290 And one of the important and useful tools for studying and analyzing the SS (Structured Storage) format 40 00:05:07,290 --> 00:05:16,860 of these documents is the toolset oletools, which consists of several scripts, for example 41 00:05:17,160 --> 00:05:25,210 olebrowse, whose function is to view and extract streams. oletimes is used to extract 42 00:05:25,260 --> 00:05:34,380 timestamps, and this is useful in order to gain some metadata information about when the malicious 43 00:05:34,380 --> 00:05:36,000 document was created. 44 00:05:36,840 --> 00:05:43,500 We also have oleid, which is useful for looking for malicious characteristics. 45 00:05:43,950 --> 00:05:47,850 And this might be the first tool to use when you are doing triaging, 46 00:05:48,150 --> 00:05:57,270 similar to what we did when we were studying and analyzing PDF documents, where we used pdfid for triaging. 47 00:05:58,230 --> 00:06:04,950 And then we also have the olevba tool, which is used for extracting VBA scripts. 48 00:06:05,640 --> 00:06:14,860 And olevba could also be used for the new Microsoft Office format, which is the archive, 49 00:06:14,880 --> 00:06:16,170 the ZIP file format. 50 00:06:17,790 --> 00:06:21,810 And this is a link for further information 51 00:06:21,810 --> 00:06:28,830 if you wish to look into more details about these tools. And these tools also come with REMnux.
52 00:06:30,060 --> 00:06:37,980 Next, we will look at the new Microsoft Office format, which is the Office Open XML format. As we have 53 00:06:37,980 --> 00:06:47,310 seen at the early stages of this course, this is an open XML format with the file extension 54 00:06:47,760 --> 00:06:48,580 .docx. 55 00:06:49,080 --> 00:06:57,360 And inside it, you have an archive consisting of several files, the XML files, located in 56 00:06:57,600 --> 00:07:05,610 various directories, and also a directory which contains the vbaProject.bin file. 57 00:07:06,450 --> 00:07:16,760 If the vbaProject.bin file exists, it means that this document contains VBA scripts, also known as 58 00:07:16,770 --> 00:07:17,550 macros. 59 00:07:18,600 --> 00:07:26,760 However, normally vbaProject.bin will not be allowed to execute if the extension is 60 00:07:26,760 --> 00:07:27,670 .docx. 61 00:07:28,030 --> 00:07:34,830 However, as I mentioned earlier, malware authors may have ways to overcome this limitation. 62 00:07:34,980 --> 00:07:40,200 And it is important to remember that vbaProject.bin is in a binary format, 63 00:07:40,200 --> 00:07:45,150 so we can't open it directly with a text editor to look at it. 64 00:07:45,630 --> 00:07:53,970 The other format is .docm, and when you see the m, you can expect that there is a vbaProject.bin 65 00:07:54,120 --> 00:07:55,230 file within it, 66 00:07:55,270 --> 00:08:05,040 and therefore it has got VBA macros. These are the tools that I use for analyzing Microsoft Office documents. 67 00:08:06,180 --> 00:08:15,180 For metadata, I use, for example, exiftool, which we also have used before when we were doing our PDF file 68 00:08:15,180 --> 00:08:27,000 analysis. Then we can also use, for signature detection, zipdump plus YARA. And for studying and extracting 69 00:08:27,000 --> 00:08:32,580 VBA scripts from the Office file, you can use olevba.
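
The check for a vbaProject.bin file inside the archive can be sketched in Python with the standard `zipfile` module. This is a minimal sketch, not a replacement for olevba: the function names are hypothetical, and the demo archive is synthetic (it only mimics the layout of a Word document, where the file sits at `word/vbaProject.bin`).

```python
import io
import zipfile


def find_vba_projects(data: bytes) -> list:
    """Return archive member names that end in vbaProject.bin (any case)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [
            name for name in zf.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]


def make_demo_archive(with_macros: bool) -> bytes:
    """Build a tiny synthetic archive that mimics a Word document layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
        if with_macros:
            # Placeholder bytes only; a real vbaProject.bin is an OLE file.
            zf.writestr("word/vbaProject.bin", b"\x00placeholder")
    return buf.getvalue()


if __name__ == "__main__":
    print(find_vba_projects(make_demo_archive(with_macros=True)))
    print(find_vba_projects(make_demo_archive(with_macros=False)))
```

Searching by member name rather than by file extension catches the case mentioned above, where a file named .docx still carries a macro project.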
70 00:08:35,160 --> 00:08:42,630 The workflow for analyzing Office documents: when you are faced with an Office document, the first 71 00:08:42,630 --> 00:08:50,310 step is to determine whether it is the old format, which is the SS (Structured Storage) format, or the new Office 72 00:08:50,310 --> 00:08:51,710 Open XML format. 73 00:08:52,350 --> 00:08:58,780 So we need to determine the document type, and we can use file identification tools for that. 74 00:08:59,730 --> 00:09:07,380 Second, we will search for malicious indicators using tools in order to look for keywords, file 75 00:09:07,560 --> 00:09:11,700 structures, file names and so on. And third, 76 00:09:12,180 --> 00:09:20,650 if we were to find any hidden or embedded files, we will extract those files and continue the analysis. 77 00:09:21,780 --> 00:09:26,850 So that is all for the principles of analyzing Office documents. 78 00:09:27,210 --> 00:09:28,320 Thank you for watching. 79 00:09:28,570 --> 00:09:31,260 I'll see you in the next video.
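
The third step of the workflow, pulling out embedded files for further analysis, might look like the following Python sketch for the ZIP-based format only. It rests on an assumption that is not from the video: any archive member that is not an `.xml` or `.rels` part is treated as a candidate. Ordinary media such as images will also be returned, which is acceptable for a first pass. The function names and the demo archive are hypothetical.

```python
import io
import zipfile

# Assumption: members with these suffixes are ordinary package parts.
TEXT_SUFFIXES = (".xml", ".rels")


def extract_embedded(data: bytes) -> dict:
    """Return {member name: raw bytes} for every non-XML archive member."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            if info.filename.lower().endswith(TEXT_SUFFIXES):
                continue
            found[info.filename] = zf.read(info)
    return found


def make_demo_archive() -> bytes:
    """Build a synthetic archive with one XML part and one embedded object."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<document/>")
        zf.writestr("word/_rels/document.xml.rels", "<Relationships/>")
        zf.writestr("word/embeddings/oleObject1.bin", b"\xd0\xcf\x11\xe0")
    return buf.getvalue()


if __name__ == "__main__":
    for name, blob in extract_embedded(make_demo_archive()).items():
        print(name, len(blob), "bytes")
```

The bytes are kept in memory rather than written to disk, so nothing extracted from a suspicious document lands on the file system by accident.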