0 1 00:00:11,160 --> 00:00:11,910 Hello everyone. 1 2 00:00:11,940 --> 00:00:16,230 Let us start analyzing a bunch of malicious PDF files. 2 3 00:00:16,310 --> 00:00:23,270 So again we'll be coming back to our FLARE suite of tools. 3 4 00:00:23,310 --> 00:00:29,280 You can see there is a folder for PDF. When you go inside you will find that there is a tool called 4 5 00:00:29,280 --> 00:00:38,070 pdf-parser, which is critical in parsing the complete PDF file and extracting malicious artifacts. 5 6 00:00:38,140 --> 00:00:43,460 Then there is a shortcut for pdf-parser and there is a shortcut for pdfid. 6 7 00:00:43,480 --> 00:00:47,450 So these are basically the compiled executables of the actual program. 7 8 00:00:47,490 --> 00:00:51,670 If you go inside pdf-parser, you'll see that it's simply a Python file. 8 9 00:00:51,700 --> 00:00:57,810 We can run it just like we ran the other OLE file analysis tools in the previous videos. 9 10 00:00:58,240 --> 00:01:05,020 The original pdfid tool is not here, so we can just click on the properties of the shortcut and we 10 11 00:01:05,020 --> 00:01:08,240 can see exactly where this shortcut is pointing. 11 12 00:01:08,450 --> 00:01:11,880 We can see that it's in Program Files/pdfid. 12 13 00:01:12,010 --> 00:01:18,550 So I don't really enjoy running the shortcuts. It's better to always run the Python program directly 13 14 00:01:18,640 --> 00:01:23,110 so that in case there is any error you can look at it and try to resolve it. 14 15 00:01:23,110 --> 00:01:25,480 So let's quickly go to Program Files, 15 16 00:01:27,000 --> 00:01:36,010 pdfid, and just copy it and move it to the FLARE folder. 16 17 00:01:36,080 --> 00:01:41,510 So you now have both pdf-parser and pdfid in the same FLARE directory. 17 18 00:01:41,510 --> 00:01:43,640 So pdfid is more of a 18 19 00:01:45,100 --> 00:01:51,650 meta-information tool which gives you a bunch of information about the PDF file.
19 20 00:01:51,770 --> 00:01:57,260 For example, how many pages there are, whether there is any JavaScript inside it, and things like that. 20 21 00:01:57,260 --> 00:02:02,870 Whereas pdf-parser is more for dynamic parsing of the PDF file. 21 22 00:02:02,900 --> 00:02:06,410 So let's begin with using pdfid 22 23 00:02:06,410 --> 00:02:07,350 for the first file. 23 24 00:02:12,680 --> 00:02:22,600 So I will come to my pdfid directory, and my files are stored in course files/PDF files/PDF examples. 24 25 00:02:22,610 --> 00:02:24,350 I have three examples here. 25 26 00:02:24,350 --> 00:02:28,700 So we'll be using them one by one. 26 27 00:02:28,700 --> 00:02:37,420 We will run python pdfid.py followed by the location of the file. 27 28 00:02:39,500 --> 00:02:43,360 So once you press Enter it will give us a bunch of information. 28 29 00:02:43,360 --> 00:02:47,660 For example, this PDF file has 26 objects inside it. 29 30 00:02:47,660 --> 00:02:53,240 If you remember from our previous discussion, we talked about how the body of a PDF file 30 31 00:02:53,240 --> 00:02:57,920 consists of different objects, and all those objects begin with 31 32 00:02:58,010 --> 00:03:01,650 'obj' and end with 'endobj'. 32 33 00:03:01,880 --> 00:03:08,780 So there are 26 'obj' and 26 'endobj' keywords, so all the objects are closed properly. 33 34 00:03:08,990 --> 00:03:15,500 There are nine streams. Again, the body of a PDF file contains streams, and these streams hold the 34 35 00:03:15,500 --> 00:03:17,060 data. 35 36 00:03:17,240 --> 00:03:18,990 Then there is one cross-reference. 36 37 00:03:19,010 --> 00:03:21,690 There is one trailer and one startxref. 37 38 00:03:21,770 --> 00:03:23,790 There are three pages. 38 39 00:03:23,960 --> 00:03:27,050 There is one JavaScript entry as well: 39 40 00:03:27,050 --> 00:03:33,800 the /JS tag, and it has been picked up by pdfid 40 41 00:03:33,800 --> 00:03:35,100 as well.
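To make the counts described above more concrete, here is a minimal Python sketch of the general idea: scan the raw bytes of a PDF and count structural keywords such as 'obj', 'endobj' and /JS. This is not the actual pdfid implementation. The function name count_keywords and the keyword list are assumptions chosen for illustration, and the sketch ignores things the real tool handles, such as obfuscated (hex-escaped) names.

```python
import re

# Illustrative keyword list (an assumption, not pdfid's full list).
KEYWORDS = ["obj", "endobj", "stream", "endstream", "xref", "trailer",
            "startxref", "/Page", "/JS", "/JavaScript", "/OpenAction"]


def _pattern(keyword):
    """Build a regex that matches the keyword as a whole token."""
    # Plain keywords must not be preceded by a letter, so that 'obj'
    # is not counted inside 'endobj' and 'xref' not inside 'startxref'.
    # Names starting with '/' may directly follow another name.
    prefix = b"" if keyword.startswith("/") else rb"(?<![A-Za-z])"
    # No keyword may be followed by a letter, so that '/Page' is not
    # counted inside '/Pages'.
    return re.compile(prefix + re.escape(keyword.encode()) + rb"(?![A-Za-z])")


def count_keywords(data):
    """Return a dict mapping each keyword to its number of occurrences."""
    return {kw: len(_pattern(kw).findall(data)) for kw in KEYWORDS}
```

A possible way to use it is count_keywords(open(path, "rb").read()), where path is the location of the sample. Equal 'obj' and 'endobj' counts correspond to the "all the objects are closed properly" check mentioned in the video.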
41 42 00:03:35,200 --> 00:03:37,190 There is an open action as well. 42 43 00:03:37,190 --> 00:03:42,640 So what I mean by open action here is that once you launch the PDF file, whatever is 43 44 00:03:42,650 --> 00:03:46,370 marked as the open action will be executed immediately. 44 45 00:03:46,670 --> 00:03:55,060 So it's very important to understand all these meta properties that we have got from pdfid. 45 46 00:03:55,160 --> 00:04:00,710 We already know a bunch of them, but some of them are new, and the important ones are things 46 47 00:04:00,710 --> 00:04:04,860 like /JS, /JavaScript, /AA, /OpenAction, 47 48 00:04:04,920 --> 00:04:05,720 /XFA, /URI. 48 49 00:04:05,720 --> 00:04:11,930 So /URI tells us whether there is any URI present inside the PDF. /EmbeddedFile 49 50 00:04:11,930 --> 00:04:12,470 tells us 50 51 00:04:12,470 --> 00:04:20,540 whether there is any embedded file, for example an executable or a Flash file, inside the PDF. So the interesting 51 52 00:04:20,540 --> 00:04:22,340 parts here are the JavaScript entries. 52 53 00:04:22,370 --> 00:04:28,550 We know that this file contains JavaScript and that an open action is performed as well, which 53 54 00:04:28,550 --> 00:04:33,980 means that as soon as we launch the PDF, the PDF tries to do something without asking 54 55 00:04:33,980 --> 00:04:37,410 you for any kind of permission. 55 56 00:04:37,410 --> 00:04:47,260 All you have to do is just open that PDF itself. Let us run it on our second file as well. 56 57 00:04:47,260 --> 00:04:49,410 For this file we get something similar. 57 58 00:04:49,510 --> 00:04:54,770 There are 12 objects and two streams, and it has two pages.
58 59 00:04:54,910 --> 00:05:03,580 And again it has JavaScript inside it, and it performs an open action as well, and there is no embedded file 59 60 00:05:03,820 --> 00:05:06,950 and there is no URI inside that PDF file. 60 61 00:05:08,580 --> 00:05:13,110 Let us try with our third example. 61 62 00:05:13,230 --> 00:05:18,060 We have eight objects, one stream, one page. 62 63 00:05:18,060 --> 00:05:19,470 There is no JavaScript 63 64 00:05:19,470 --> 00:05:25,900 in this case, and there is one /XFA and no /URI. That's it. 64 65 00:05:25,920 --> 00:05:31,950 So this is how we first collect some static information about the PDF file using pdfid, and 65 66 00:05:31,950 --> 00:05:37,230 this can help us make some heuristic analysis of the PDF file by looking at the number 66 67 00:05:37,230 --> 00:05:43,020 of pages, whether it has any JavaScript or not, whether it's performing some open action or not, and 67 68 00:05:43,020 --> 00:05:44,350 things like that. 68 69 00:05:44,370 --> 00:05:50,940 So once we have some kind of static heuristics about the PDF file, the next thing that we can do 69 70 00:05:50,940 --> 00:05:56,710 is start using pdf-parser to actually look into these elements.
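The heuristic reading of the pdfid counts described above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name triage, the wording of the findings and the choice of rules are hypothetical and not part of pdfid, and the input is assumed to be a plain dict of keyword counts typed in by hand from the tool's output.

```python
def triage(counts):
    """Return a list of findings for a dict of pdfid-style keyword counts.

    The rules are illustrative assumptions, not an official scoring scheme.
    """
    findings = []
    has_js = counts.get("/JS", 0) > 0 or counts.get("/JavaScript", 0) > 0
    has_auto = counts.get("/OpenAction", 0) > 0 or counts.get("/AA", 0) > 0

    # JavaScript that can start on open, with no user interaction,
    # is the combination highlighted in the video.
    if has_js and has_auto:
        findings.append("JavaScript plus automatic action")
    elif has_js:
        findings.append("JavaScript present")
    elif has_auto:
        findings.append("automatic action present")

    if counts.get("/EmbeddedFile", 0) > 0:
        findings.append("embedded file present")
    if counts.get("/XFA", 0) > 0:
        findings.append("XFA form present")
    if counts.get("/URI", 0) > 0:
        findings.append("URI present")

    # Unequal counts mean some object is not closed properly.
    if counts.get("obj", 0) != counts.get("endobj", 0):
        findings.append("obj/endobj count mismatch")
    return findings
```

For the first sample, calling triage({"obj": 26, "endobj": 26, "/JS": 1, "/OpenAction": 1}) would flag the JavaScript and open action combination, which is the element to look at next with pdf-parser.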