{
    "id": "46c8d819-3048-40bf-902f-777c1fe136dc",
    "name": "Basic PDF Analysis",
    "slug": "basic-pdf-analysis",
    "status": "published",
    "lab_type": "pta",
    "is_sample": false,
    "duration_in_seconds": 1800,
    "metadata": {
        "courses": [
            "225b7429-bd2e-433e-9168-318d861e97cf"
        ],
        "pta_sdn": "62",
        "pta_namespace": "my.ine",
        "learning_paths": [],
        "has_published_parent": true
    },
    "session": null,
    "company": "a491bc32-c056-4946-9169-cc053387bada",
    "created": "2022-03-30T02:50:40.672817Z",
    "modified": "2024-04-30T14:40:58.585299Z",
    "is_beta": false,
    "lab_objectives": [],
    "main_learning_area": "3e1aa06f-2e9f-4789-b50d-aa027ad8dcfa",
    "learning_areas": [
        {
            "id": "3e1aa06f-2e9f-4789-b50d-aa027ad8dcfa",
            "name": "Cyber Security",
            "slug": "cyber-security"
        }
    ],
    "categories": [],
    "tags": [],
    "difficulty": null,
    "is_web_access": false,
    "is_lab_experience": false,
    "is_featured": false,
    "cve": null,
    "severity": null,
    "year": null,
    "classification": null,
    "external_url": "",
    "solution_video": null,
    "explanation_video": null,
    "description": "# Scenario\n\nYou are a forensic examiner in a large firm. David, a colleague of yours from the HR department, received two resumes for an open position within the firm. David viewed the resumes and listed the senders as possible candidates.\n\n![3](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/3.png)\n\nA few days later, the firewall administrator noticed a strange connection from David's machine to a destination outside the network. David is sure that he did not open any suspicious executable and that the only files he opened were the two resumes he received. However, David remembers something interesting: one of the files popped up a \"save as\" window.\n\n![4](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/4.png)\n\nWhen he pressed \"Cancel,\" another dialog window, which he had never seen before, appeared and asked him to click \"Open\" in order to view encrypted content within the file.\n\n![5](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/5.png)\n\nUnfortunately, David doesn't remember which file it was, since the two file names are similar.\nYou have been called in to examine those files and provide feedback.\n\n# Goals\n\n-   Learn how to profile and examine a PDF file and determine whether it is malicious.\n\n# What you will learn\n\n-   Examine the PDF file format.\n-   Use various tools to statically analyze a PDF file.\n-   Determine the content of a PDF file.\n-   Extract and analyze suspicious objects.\n\n# Recommended tools\n\n-   **PDFiD**\n-   **Origami Framework**\n-   **pdf-parser**",
    "description_html": "<h1>Scenario</h1>\n<p>You are a forensic examiner in a large firm. David, a colleague of yours from the HR department, received two resumes for an open position within the firm. David viewed the resumes and listed the senders as possible candidates.</p>\n<p><img alt=\"3\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/3.png\" /></p>\n<p>A few days later, the firewall administrator noticed a strange connection from David's machine to a destination outside the network. David is sure that he did not open any suspicious executable and that the only files he opened were the two resumes he received. However, David remembers something interesting: one of the files popped up a \"save as\" window.</p>\n<p><img alt=\"4\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/4.png\" /></p>\n<p>When he pressed \"Cancel,\" another dialog window, which he had never seen before, appeared and asked him to click \"Open\" in order to view encrypted content within the file.</p>\n<p><img alt=\"5\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/5.png\" /></p>\n<p>Unfortunately, David doesn't remember which file it was, since the two file names are similar.\nYou have been called in to examine those files and provide feedback.</p>\n<h1>Goals</h1>\n<ul>\n<li>Learn how to profile and examine a PDF file and determine whether it is malicious.</li>\n</ul>\n<h1>What you will learn</h1>\n<ul>\n<li>Examine the PDF file format.</li>\n<li>Use various tools to statically analyze a PDF file.</li>\n<li>Determine the content of a PDF file.</li>\n<li>Extract and analyze suspicious objects.</li>\n</ul>\n<h1>Recommended tools</h1>\n<ul>\n<li><strong>PDFiD</strong></li>\n<li><strong>Origami Framework</strong></li>\n<li><strong>pdf-parser</strong></li>\n</ul>",
    "tasks": "# Tasks\n\n## TASK 1: OBTAIN GENERAL OVERVIEW OF THE SUSPICIOUS PDF FILES\n\nYou can find the PDF files under investigation at **C:\\\\DFP\\\\Labs\\\\Module3\\\\Lab5**.\n\nRun the PDFiD tool on both files and examine the results. Are there any clear differences between the two results? If so, what do they indicate? Is that indication relevant to the investigation?\n\n## TASK 2: EXTRACT THE FILES' METADATA\n\nUse the Origami framework to extract the metadata from both files. Compare the two outputs for any differences and try to determine whether they are relevant to the investigation.\n\n## TASK 3: LIST THE OBJECTS IN THE SUSPICIOUS FILE\n\nUse pdf-parser.py to perform an in-depth examination of the suspicious file and list its contents.\nThen try to extract anything that looks interesting. At this stage, is it clear which PDF file caused the problem? What evidence supports that claim?\n\n## TASK 4: PREPARING THE EXTRACTED OBJECT FOR ANALYSIS\n\nAlthough we have extracted the most interesting object in the document, it may still need some work before we can analyze it. Make sure the object you extracted is analysis-ready.\n\n## TASK 5: EXTRACTING THE EVIL CODE FROM THE OBJECT\n\nNow that the evil object and its code are in an analysis-ready state, extract the evil code from the object and write it to a separate file. Does the code reveal its purpose?\n\n## TASK 6: ANALYZING THE EVIL CODE FROM THE OBJECT\n\nTry to analyze the evil code that was extracted from the suspicious object. What type of malicious code is it? What does it do?",
    "tasks_html": "<h1>Tasks</h1>\n<h2>TASK 1: OBTAIN GENERAL OVERVIEW OF THE SUSPICIOUS PDF FILES</h2>\n<p>You can find the PDF files under investigation at <strong>C:\\DFP\\Labs\\Module3\\Lab5</strong>.</p>\n<p>Run the PDFiD tool on both files and examine the results. Are there any clear differences between the two results? If so, what do they indicate? Is that indication relevant to the investigation?</p>\n<h2>TASK 2: EXTRACT THE FILES' METADATA</h2>\n<p>Use the Origami framework to extract the metadata from both files. Compare the two outputs for any differences and try to determine whether they are relevant to the investigation.</p>\n<h2>TASK 3: LIST THE OBJECTS IN THE SUSPICIOUS FILE</h2>\n<p>Use pdf-parser.py to perform an in-depth examination of the suspicious file and list its contents.\nThen try to extract anything that looks interesting. At this stage, is it clear which PDF file caused the problem? What evidence supports that claim?</p>\n<h2>TASK 4: PREPARING THE EXTRACTED OBJECT FOR ANALYSIS</h2>\n<p>Although we have extracted the most interesting object in the document, it may still need some work before we can analyze it. Make sure the object you extracted is analysis-ready.</p>\n<h2>TASK 5: EXTRACTING THE EVIL CODE FROM THE OBJECT</h2>\n<p>Now that the evil object and its code are in an analysis-ready state, extract the evil code from the object and write it to a separate file. Does the code reveal its purpose?</p>\n<h2>TASK 6: ANALYZING THE EVIL CODE FROM THE OBJECT</h2>\n<p>Try to analyze the evil code that was extracted from the suspicious object. What type of malicious code is it? What does it do?</p>",
    "published_date": "2020-10-20T15:32:26Z",
    "solutions": "# SOLUTIONS\n\n## TASK 1: OBTAIN GENERAL OVERVIEW OF THE SUSPICIOUS PDF FILES\n\nYou can find the PDF files under investigation at **C:\\\\DFP\\\\Labs\\\\Module3\\\\Lab5**.\n\nWe start by running the **pdfid.py** script, located at **C:\\DFP\\Tools\\Metadata\\Docs\\pdfid_v0_2_2**, on each of the suspected PDFs, as follows.\n\n```\n# cd C:\\DFP\\Tools\\Metadata\\Docs\\pdfid_v0_2_2\n# pdfid.py filename.pdf\n```\n\nThe results will be similar to the following.\n\n![6](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/6.png)\n\n![7](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/7.png)\n\nRunning the tool on the two files returns different results. The first thing we notice is the large difference in the number of objects.\n\nOn its own, however, that means little, since the two files also differ in their number of pages.\nA more interesting difference can be found in the middle of the second result.\n\n![9](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/9.png)\n\nThe second file contains a JavaScript object!\nThis is both interesting and suspicious, since a resume has very little use for JavaScript. Notably, even though the two files are similar in structure and format, the other one doesn't contain any JavaScript objects.\n\n![8](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/8.png)\n\n## TASK 2: EXTRACT THE FILES' METADATA\n\nIt is worth extracting both files' metadata to see whether we can find anything useful in it. 
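Both exiftool and Origami's pdfmetadata largely report entries from the PDF document information dictionary (/Info), such as /Author, /Creator, /Producer and /CreationDate, plus any XMP metadata. To make that concrete, here is a minimal Python sketch. It assumes that the /Info dictionary is stored uncompressed and that it uses literal (parenthesized) strings. It ignores hex strings, escaped parentheses, object streams and XMP, so it is only a rough illustration and not how either tool works. The helper name read_info_fields is hypothetical.

```python
import re
import sys

# Keys commonly found in a PDF document information dictionary (/Info).
INFO_KEYS = (b'Title', b'Author', b'Subject', b'Keywords',
             b'Creator', b'Producer', b'CreationDate', b'ModDate')

# Matches /Key (value) pairs. Character classes such as [(] stand in for
# escaped parentheses. Assumption: values are literal strings that contain
# no parentheses of their own.
INFO_PATTERN = re.compile(rb'/(' + b'|'.join(INFO_KEYS) + rb')[ ]*[(]([^)]*)[)]')


def read_info_fields(data: bytes) -> dict:
    # Hypothetical helper: map each metadata key to its value.
    # A later match overwrites an earlier one, because incremental updates
    # append newer objects to the end of the file. Note that /Title can also
    # come from a bookmark (outline) entry rather than from /Info.
    fields = {}
    for match in INFO_PATTERN.finditer(data):
        key = match.group(1).decode('latin-1')
        fields[key] = match.group(2).decode('latin-1')
    return fields


if __name__ == '__main__':
    # Usage (hypothetical script name): python read_info.py filename.pdf
    for path in sys.argv[1:]:
        with open(path, 'rb') as handle:
            print(path, read_info_fields(handle.read()))
```

The sketch only sees uncompressed, literal-string entries. An empty result therefore does not prove that a file lacks metadata, and exiftool remains the more reliable way to compare the two resumes.
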
We can use exiftool once again, as follows:\n\n```\n# cd C:\\DFP\\Tools\\Metadata\n# \"exiftool(-k).exe\" C:\\DFP\\Labs\\Module3\\Lab5\\filename.pdf\n```\n\n![1](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/1.png)\n\n![2](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/2.png)\n\nOne thing we notice here is that Lucy's file contains less metadata, whereas Linda's file appears to be based on a template downloaded from a website. Along with the JavaScript object, this adds another question mark to Lucy's file.\n\nIf you prefer Linux, you can examine the metadata with the **pdfmetadata.rb** script from the **Origami framework** instead. To do so, run the following from inside the **bin** folder of the **Origami framework**:\n\n```\n# ./pdfmetadata filename.pdf\n```\n\n## TASK 3: LIST THE OBJECTS IN THE SUSPICIOUS FILE\n\nWe can use the **pdf-parser.py** script to perform a more in-depth analysis of the PDF file. By now, we have good reason to suspect Lucy's file, so we'll focus our in-depth analysis on it.\n\nWe'll start with a general examination using the **--stats** option.\n\n```\n# cd C:\\DFP\\Tools\\Metadata\\Docs\\pdf-parser_v0_6_8\n# pdf-parser.py --stats C:\\DFP\\Labs\\Module3\\Lab5\\Lucy2.pdf\n```\n\n![10](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/10.png)\n\nThere are two things worth mentioning. 
First, the total number of objects (150) and, more importantly, the number of objects related to **Actions**.\n\nRunning the command without any options makes the script display the whole file content, including the header, the trailer and the objects within.\n\n```\n# pdf-parser.py C:\\DFP\\Labs\\Module3\\Lab5\\Lucy2.pdf\n```\n\n![11](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/11.png)\n\nThe output may be too large for the console to display, so it is better to redirect it to a text file using the '>' operator.\n\nThe most notable difference between the two files is that Lucy's contains a JavaScript object, a feature attackers commonly abuse to deliver malicious payloads.\n\nIt is a good idea to locate that specific object and extract it for further analysis.\n\nWe can search for JavaScript references within the file using the **--search JavaScript** option.\n\n```\n# pdf-parser.py --search JavaScript C:\\DFP\\Labs\\Module3\\Lab5\\Lucy2.pdf\n```\n\n![12](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/12.png)\n\nInterestingly, the JavaScript code belongs to one of the three objects related to actions.\n\nThe other object, object 148, is also worth examining.\n\n```\n# pdf-parser.py --object=148 C:\\DFP\\Labs\\Module3\\Lab5\\Lucy2.pdf\n```\n\n![13](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/13.png)\n\n## TASK 4: PREPARING THE EXTRACTED OBJECT FOR ANALYSIS\n\nAttackers sometimes try to make your life harder by compressing or obfuscating the hidden payload. To read it and fully analyze the malicious code, we may need to decompress the JavaScript content first. 
We can do that using the **--filter** and **--raw** options.\n\n```\n# pdf-parser.py --object=148 --filter --raw C:\\DFP\\Labs\\Module3\\Lab5\\Lucy2.pdf\n```\n\n![14](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/14.png)\n\nEven without in-depth knowledge of JavaScript, and before starting the actual code analysis, we can see that something is not right.\n\nWhy would JavaScript code within a PDF file want to call **cmd.exe**?\n\n## TASK 5: EXTRACTING THE EVIL CODE FROM THE OBJECT\n\nNow that the code is displayed in plain text, it is better to save it to a separate file to make the analysis easier. As before, we can do that by appending the \">\" operator and an output file name to the previous command.\n\n![15](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/15.png)\n\n## TASK 6: ANALYZING THE EVIL CODE FROM THE OBJECT\n\n![16](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/16.png)\n\nThe first script appears to save a file called Lucy to the victim's disk. With **nLaunch: 0**, the saved file is not launched afterwards, so no program is started at this point. This step likely corresponds to the \"save as\" window David saw.\n\nThe second script is even more interesting: the PDF appears to launch **cmd.exe** on the victim's machine. This likely explains the unfamiliar dialog that asked David to click \"Open\".\n\n![17](https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/17.png)\n\nThis is definitely not something a normal resume would do.",
    "solutions_html": "<h1>SOLUTIONS</h1>\n<h2>TASK 1: OBTAIN GENERAL OVERVIEW OF THE SUSPICIOUS PDF FILES</h2>\n<p>You can find the PDF files under investigation at <strong>C:\\DFP\\Labs\\Module3\\Lab5</strong>.</p>\n<p>We start by running the <strong>pdfid.py</strong> script, located at <strong>C:\\DFP\\Tools\\Metadata\\Docs\\pdfid_v0_2_2</strong>, on each of the suspected PDFs, as follows.</p>\n<pre class=\"codehilite\"><code># cd C:\\DFP\\Tools\\Metadata\\Docs\\pdfid_v0_2_2\n# pdfid.py filename.pdf</code></pre>\n\n<p>The results will be similar to the following.</p>\n<p><img alt=\"6\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/6.png\" /></p>\n<p><img alt=\"7\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/7.png\" /></p>\n<p>Running the tool on the two files returns different results. The first thing we notice is the large difference in the number of objects.</p>\n<p>On its own, however, that means little, since the two files also differ in their number of pages.\nA more interesting difference can be found in the middle of the second result.</p>\n<p><img alt=\"9\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/9.png\" /></p>\n<p>The second file contains a JavaScript object!\nThis is both interesting and suspicious, since a resume has very little use for JavaScript. Notably, even though the two files are similar in structure and format, the other one doesn't contain any JavaScript objects.</p>\n<p><img alt=\"8\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/8.png\" /></p>\n<h2>TASK 2: EXTRACT THE FILES' METADATA</h2>\n<p>It is worth extracting both files' metadata to see whether we can find anything useful in it. 
We can use exiftool once again, as follows:</p>\n<pre class=\"codehilite\"><code># cd C:\\DFP\\Tools\\Metadata\n# \"exiftool(-k).exe\" C:\\DFP\\Labs\\Module3\\Lab5\\filename.pdf</code></pre>\n\n<p><img alt=\"1\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/1.png\" /></p>\n<p><img alt=\"2\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/2.png\" /></p>\n<p>One thing we notice here is that Lucy's file contains less metadata, whereas Linda's file appears to be based on a template downloaded from a website. Along with the JavaScript object, this adds another question mark to Lucy's file.</p>\n<p>If you prefer Linux, you can examine the metadata with the <strong>pdfmetadata.rb</strong> script from the <strong>Origami framework</strong> instead. To do so, run the following from inside the <strong>bin</strong> folder of the <strong>Origami framework</strong>:</p>\n<pre class=\"codehilite\"><code># ./pdfmetadata filename.pdf</code></pre>\n\n<h2>TASK 3: LIST THE OBJECTS IN THE SUSPICIOUS FILE</h2>\n<p>We can use the <strong>pdf-parser.py</strong> script to perform a more in-depth analysis of the PDF file. By now, we have good reason to suspect Lucy's file, so we'll focus our in-depth analysis on it.</p>\n<p>We'll start with a general examination using the <strong>--stats</strong> option.</p>\n<pre class=\"codehilite\"><code># cd C:\\DFP\\Tools\\Metadata\\Docs\\pdf-parser_v0_6_8\n# pdf-parser.py --stats C:\\DFP\\Labs\\Module3\\Lab5\\Lucy2.pdf</code></pre>\n\n<p><img alt=\"10\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/10.png\" /></p>\n<p>There are two things worth mentioning. 
First, the total number of objects (150) and, more importantly, the number of objects related to <strong>Actions</strong>.</p>\n<p>Running the command without any options makes the script display the whole file content, including the header, the trailer and the objects within.</p>\n<pre class=\"codehilite\"><code># pdf-parser.py C:\\DFP\\Labs\\Module3\\Lab5\\Lucy2.pdf</code></pre>\n\n<p><img alt=\"11\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/11.png\" /></p>\n<p>The output may be too large for the console to display, so it is better to redirect it to a text file using the '&gt;' operator.</p>\n<p>The most notable difference between the two files is that Lucy's contains a JavaScript object, a feature attackers commonly abuse to deliver malicious payloads.</p>\n<p>It is a good idea to locate that specific object and extract it for further analysis.</p>\n<p>We can search for JavaScript references within the file using the <strong>--search JavaScript</strong> option.</p>\n<pre class=\"codehilite\"><code># pdf-parser.py --search JavaScript C:\\DFP\\Labs\\Module3\\Lab5\\Lucy2.pdf</code></pre>\n\n<p><img alt=\"12\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/12.png\" /></p>\n<p>Interestingly, the JavaScript code belongs to one of the three objects related to actions.</p>\n<p>The other object, object 148, is also worth examining.</p>\n<pre class=\"codehilite\"><code># pdf-parser.py --object=148 C:\\DFP\\Labs\\Module3\\Lab5\\Lucy2.pdf</code></pre>\n\n<p><img alt=\"13\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/13.png\" /></p>\n<h2>TASK 4: PREPARING THE EXTRACTED OBJECT FOR ANALYSIS</h2>\n<p>Attackers sometimes try to make your life harder by compressing or obfuscating the hidden payload. 
To read it and fully analyze the malicious code, we may need to decompress the JavaScript content first. We can do that using the <strong>--filter</strong> and <strong>--raw</strong> options.</p>\n<pre class=\"codehilite\"><code># pdf-parser.py --object=148 --filter --raw C:\\DFP\\Labs\\Module3\\Lab5\\Lucy2.pdf</code></pre>\n\n<p><img alt=\"14\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/14.png\" /></p>\n<p>Even without in-depth knowledge of JavaScript, and before starting the actual code analysis, we can see that something is not right.</p>\n<p>Why would JavaScript code within a PDF file want to call <strong>cmd.exe</strong>?</p>\n<h2>TASK 5: EXTRACTING THE EVIL CODE FROM THE OBJECT</h2>\n<p>Now that the code is displayed in plain text, it is better to save it to a separate file to make the analysis easier. As before, we can do that by appending the \"&gt;\" operator and an output file name to the previous command.</p>\n<p><img alt=\"15\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/15.png\" /></p>\n<h2>TASK 6: ANALYZING THE EVIL CODE FROM THE OBJECT</h2>\n<p><img alt=\"16\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/16.png\" /></p>\n<p>The first script appears to save a file called Lucy to the victim's disk. With <strong>nLaunch: 0</strong>, the saved file is not launched afterwards, so no program is started at this point. This step likely corresponds to the \"save as\" window David saw.</p>\n<p>The second script is even more interesting: the PDF appears to launch <strong>cmd.exe</strong> on the victim's machine. This likely explains the unfamiliar dialog that asked David to click \"Open\".</p>\n<p><img alt=\"17\" src=\"https://assets.ine.com/content/ptp/lab_5_basic_PDF_and_word_document_analysis/17.png\" /></p>\n<p>This is definitely not something a normal resume would do.</p>",
    "flags": [],
    "min_points_to_pass": null,
    "access_type": "default",
    "user_status": "unstarted",
    "user_lab_status": null,
    "user_status_modified": null,
    "user_flags": [],
    "global_running_session": null
}