1 00:00:00,630 --> 00:00:07,260 Now that we have deeply explored the ELF format, let's turn our attention to another valuable 2 00:00:07,260 --> 00:00:10,860 binary format, the Portable Executable format. 3 00:00:11,490 --> 00:00:18,090 Understanding the PE format is particularly valuable for analyzing Windows binaries, which are prevalent 4 00:00:18,090 --> 00:00:22,950 in malware analysis and Windows-specific applications. 5 00:00:23,890 --> 00:00:30,700 PE, Portable Executable, can be seen as a modified version of the Common Object File Format, COFF, which 6 00:00:30,700 --> 00:00:36,460 was previously used on Unix-based systems before being replaced by ELF. 7 00:00:36,460 --> 00:00:45,280 Due to this historical connection, PE is sometimes referred to as PE/COFF. 8 00:00:45,280 --> 00:00:56,320 It's worth noting that the 64-bit variant of PE is called PE32+. Although there are minor 9 00:00:56,320 --> 00:01:00,430 differences between PE32+ and the original PE format, 10 00:01:00,430 --> 00:01:06,760 for simplicity we will refer to it as PE throughout this section. 11 00:01:06,760 --> 00:01:15,580 In this overview of the PE format, we will highlight its key distinctions from ELF, providing insights 12 00:01:15,580 --> 00:01:19,480 for those interested in working with Windows binaries. 13 00:01:19,480 --> 00:01:28,160 It's important to keep in mind that PE shares many similarities with other formats and, fortunately, 14 00:01:28,160 --> 00:01:35,360 with our newfound knowledge of ELF, which you learned in the previous lecture, exploring additional binary 15 00:01:35,360 --> 00:01:38,150 formats becomes a much smoother journey.
16 00:01:38,150 --> 00:01:46,070 To facilitate our discussion and aid in visualization, I've created a diagram specifically for 17 00:01:46,070 --> 00:01:53,270 this lecture. This diagram focuses on the key data structures defined in the winnt.h header 18 00:01:53,270 --> 00:02:00,020 file, an essential header file included in the Microsoft Windows Software Development Kit. 19 00:02:00,110 --> 00:02:04,430 Now let's delve into the specifics of the PE format. 20 00:02:05,180 --> 00:02:11,750 As you examine the diagram, you will notice both similarities and crucial differences compared 21 00:02:11,750 --> 00:02:12,890 to the ELF format. 22 00:02:12,890 --> 00:02:18,520 One significant distinction is the presence of the MS-DOS header within the PE format. 23 00:02:18,530 --> 00:02:26,150 So yes, we are referring to MS-DOS, the Microsoft operating system that made its debut in 1981. 24 00:02:26,180 --> 00:02:32,900 You might be wondering why such an archaic element exists in a modern binary format. 25 00:02:32,900 --> 00:02:40,910 The answer lies in backward compatibility. During the introduction of the PE, Portable Executable, 26 00:02:40,910 --> 00:02:48,380 format, there was a transitional period when users were using both traditional MS-DOS 27 00:02:48,380 --> 00:02:56,930 binaries and the newer PE binaries. To ease this transition and reduce confusion, every PE file starts 28 00:02:56,930 --> 00:03:05,970 with an MS-DOS header, enabling it to be interpreted as an MS-DOS binary to some extent. The primary 29 00:03:05,970 --> 00:03:13,920 purpose of the MS-DOS header is to describe how to load and execute an MS-DOS stub, which follows the 30 00:03:13,920 --> 00:03:15,120 MS-DOS header. 31 00:03:15,120 --> 00:03:22,170 This stub typically consists of a small MS-DOS program that runs instead of the main program when 32 00:03:22,170 --> 00:03:25,830 a user executes a PE binary in MS-DOS mode.
33 00:03:25,830 --> 00:03:33,180 The MS-DOS stub program often prints a message like "This program cannot be run in DOS mode" 34 00:03:33,180 --> 00:03:34,740 and then exits. 35 00:03:35,370 --> 00:03:40,020 It's important to note, however, that in theory it could be a fully functional 36 00:03:40,020 --> 00:03:46,500 MS-DOS version of the program itself. The MS-DOS header begins with a magic value represented by 37 00:03:46,500 --> 00:03:49,740 the ASCII characters MZ. 38 00:03:49,830 --> 00:03:53,580 Hence it is sometimes referred to as an MZ header. 39 00:03:53,580 --> 00:04:00,120 For our purposes, the most important field within the MS-DOS header is the last field, known as 40 00:04:00,260 --> 00:04:10,590 e_lfanew. This field denotes the file offset at which the actual PE, Portable Executable, 41 00:04:10,620 --> 00:04:11,880 binary begins. 42 00:04:11,880 --> 00:04:20,430 Consequently, when a PE-aware program loader opens the binary, it reads the MS-DOS 43 00:04:20,430 --> 00:04:21,210 header, 44 00:04:21,950 --> 00:04:29,480 skips past it along with the MS-DOS stub, and proceeds directly to the start of the PE headers. 45 00:04:29,510 --> 00:04:37,400 Now let's explore the PE headers, which can be seen as the equivalent of ELF's executable header. 46 00:04:37,400 --> 00:04:41,780 However, in the case of PE, the executable header here, 47 00:04:42,230 --> 00:04:44,130 the executable file 48 00:04:44,150 --> 00:04:47,630 header, PE signature, and PE optional header. 49 00:04:48,680 --> 00:04:56,810 So in the case of PE, the Portable Executable, the executable header is divided here 50 00:04:57,600 --> 00:05:04,920 into three distinct parts: a 32-bit signature, a PE file header, and a PE optional 51 00:05:04,950 --> 00:05:05,460 header. 52 00:05:05,460 --> 00:05:11,280 And if you refer to the definitions in the winnt.h header file, we encounter
53 00:05:11,310 --> 00:05:19,320 a structure named IMAGE_NT_HEADERS64, which encompasses all three components. Collectively, we 54 00:05:19,320 --> 00:05:26,610 can consider the entire IMAGE_NT_HEADERS64 structure as PE's equivalent of the 55 00:05:26,610 --> 00:05:27,600 executable header. 56 00:05:27,600 --> 00:05:35,100 In practice, however, the signature, file header, and optional header are treated as separate entities, 57 00:05:35,100 --> 00:05:42,060 each serving a unique purpose in the overall structure of the PE format. 58 00:05:42,940 --> 00:05:49,030 By gaining a comprehensive understanding of the PE format, we acquire the necessary insights to effectively 59 00:05:49,030 --> 00:05:50,980 analyze Windows binaries. 60 00:05:51,010 --> 00:05:58,300 Navigating the intricacies of these binaries becomes more feasible as we grasp the inner 61 00:05:58,300 --> 00:06:02,530 workings of the PE format. 62 00:06:03,630 --> 00:06:07,380 Now what we're going to do here is, first, 63 00:06:09,210 --> 00:06:14,130 go back to the Windows machine, because we will create a program here. 64 00:06:16,470 --> 00:06:20,760 And after that, let's actually open 65 00:06:21,950 --> 00:06:23,630 our Windows machine here. 66 00:06:25,800 --> 00:06:26,460 Here. 67 00:06:27,220 --> 00:06:28,120 And that's it. 68 00:06:30,050 --> 00:06:30,650 Hello. 69 00:06:31,100 --> 00:06:40,490 And now let's write a simple Hello World application in C and compile it on our Windows machine. 70 00:06:40,490 --> 00:06:42,760 Let me see here. 71 00:06:42,770 --> 00:06:43,360 Yes. 72 00:06:43,370 --> 00:06:44,720 Here we will create a new file. 73 00:06:44,720 --> 00:06:45,350 This is my 74 00:06:45,920 --> 00:06:47,540 and here we will 75 00:06:48,880 --> 00:06:51,130 name it helloworld.c. 76 00:06:52,760 --> 00:06:53,600 Let's save it 77 00:06:54,430 --> 00:06:56,370 on the desktop, and here.
78 00:06:56,380 --> 00:06:58,390 So what we're going to do is 79 00:06:59,640 --> 00:07:05,000 #include here, #include <stdio.h> here. 80 00:07:05,010 --> 00:07:07,260 Let's actually increase the font size a little bit. 81 00:07:09,160 --> 00:07:16,960 We don't want to install the recommended C/C++ extension pack, because we will not develop anything other 82 00:07:16,960 --> 00:07:22,990 than a simple Hello World application here. And int main here. 83 00:07:27,220 --> 00:07:28,360 And here. 84 00:07:31,010 --> 00:07:32,000 We will, again, 85 00:07:33,010 --> 00:07:34,270 do as we did on Linux. 86 00:07:34,270 --> 00:07:42,670 We created our own Hello World application, compiled it, and analyzed it in the previous lecture for 87 00:07:43,610 --> 00:07:46,640 ELF files, 88 00:07:46,850 --> 00:07:48,230 if you remember that. 89 00:07:48,860 --> 00:07:52,130 And here we will use printf, printf. 90 00:07:55,940 --> 00:07:57,530 printf here. 91 00:07:58,910 --> 00:07:59,630 We'll just print 92 00:07:59,660 --> 00:08:00,410 Hello 93 00:08:01,740 --> 00:08:02,340 World. 94 00:08:03,360 --> 00:08:04,050 And that's it. 95 00:08:04,050 --> 00:08:08,310 After that, we will return here, and that's it. 96 00:08:08,610 --> 00:08:12,480 Now we will open our cmd here. 97 00:08:12,510 --> 00:08:15,060 Let's also increase the font size a little bit here too. 98 00:08:17,930 --> 00:08:19,130 Let's give it a color. 99 00:08:21,930 --> 00:08:26,100 And here we will go to the desktop, into Desktop. 100 00:08:30,020 --> 00:08:30,590 In this directory, 101 00:08:30,590 --> 00:08:36,080 as you can see, we have helloworld.c, and now we will compile it with GCC. 102 00:08:40,020 --> 00:08:45,600 And as you can see, we have an error here because we forgot to 103 00:08:46,090 --> 00:08:48,060 oops, we forgot to 104 00:08:49,180 --> 00:08:50,920 add the semicolon here after return. 105 00:08:50,920 --> 00:08:51,430 Sorry. 106 00:08:51,430 --> 00:08:51,970 Here.
107 00:08:52,060 --> 00:08:53,200 And that's it. 108 00:08:53,290 --> 00:08:58,660 Now we will use helloworld.c again and, as you can see, our program is compiled. 109 00:08:58,810 --> 00:09:01,190 This is our .exe here. 110 00:09:01,210 --> 00:09:02,830 Now we will. 111 00:09:04,990 --> 00:09:10,450 And here, as you can see, we created our Hello World program, and we will further analyze this program 112 00:09:10,450 --> 00:09:13,090 on Linux using objdump. 113 00:09:13,420 --> 00:09:17,680 This file can also be analyzed on a Windows system. 114 00:09:17,770 --> 00:09:24,940 But since the objdump tool comes pre-installed on Linux, we will use it right now. 115 00:09:24,940 --> 00:09:25,840 So we will 116 00:09:27,040 --> 00:09:31,810 copy this and go back to our Linux machine here. 117 00:09:32,560 --> 00:09:33,610 Kali here. 118 00:09:38,680 --> 00:09:40,210 Enter the password. 119 00:09:41,420 --> 00:09:44,060 We'll copy it right onto the desktop. 120 00:09:46,370 --> 00:09:51,710 And so the .exe, if we open this, we have some of the data, 121 00:09:51,890 --> 00:09:54,470 data which you will learn all about here. 122 00:09:54,500 --> 00:09:56,120 We have also BC. 123 00:09:57,890 --> 00:10:01,610 And now what we're going to do is open the terminal. 124 00:10:11,770 --> 00:10:13,930 We'll write objdump, 125 00:10:15,470 --> 00:10:19,250 then -x here and the .exe here. 126 00:10:19,490 --> 00:10:23,120 And as you can see, it finds no such file because we are not in the right directory. 127 00:10:23,150 --> 00:10:28,220 We will cd, change the directory, to Desktop and run objdump 128 00:10:30,290 --> 00:10:31,790 with the -x parameter 129 00:10:31,790 --> 00:10:33,620 and, after that, hello.exe. 130 00:10:34,040 --> 00:10:38,240 And here, as you can see, we have a lot of information going on.
131 00:10:38,630 --> 00:10:49,250 The signature is simply a string containing the ASCII characters PE, followed by two NULL characters 132 00:10:49,250 --> 00:10:49,790 here. 133 00:10:50,750 --> 00:10:58,640 It is analogous to the magic bytes in the e_ident field of the ELF executable header, 134 00:10:59,140 --> 00:11:05,450 which, if you remember, we examined in the previous section. The file header here describes the 135 00:11:05,450 --> 00:11:11,690 general properties of the file, and the most important fields are here. 136 00:11:13,490 --> 00:11:14,060 Actually, 137 00:11:14,960 --> 00:11:15,950 down a little bit. 138 00:11:17,280 --> 00:11:18,900 Let's actually copy this, because 139 00:11:20,970 --> 00:11:23,760 we will need to analyze it further. 140 00:11:54,250 --> 00:11:56,080 We'll copy all of it here. 141 00:11:56,320 --> 00:11:56,950 Now. 142 00:12:12,540 --> 00:12:16,200 And now we will simply open Mousepad here. 143 00:12:19,280 --> 00:12:21,710 And let's save this as 144 00:12:26,170 --> 00:12:26,830 analyze.c, 145 00:12:26,830 --> 00:12:28,480 because we want 146 00:12:29,130 --> 00:12:31,560 the colors here. 147 00:12:35,260 --> 00:12:35,680 That's it. 148 00:12:38,040 --> 00:12:43,680 And after that, let's turn our terminal font size back to normal. 149 00:12:55,370 --> 00:13:00,650 And here, as you can see, the most important fields are Machine, 150 00:13:03,040 --> 00:13:04,810 the number of sections 151 00:13:07,940 --> 00:13:09,140 here, and 152 00:13:10,250 --> 00:13:11,360 and here 153 00:13:12,080 --> 00:13:13,490 we also have the 154 00:13:16,950 --> 00:13:18,090 Characteristics, 155 00:13:19,010 --> 00:13:24,230 for which, as you can see, we also have this hex number: relocations stripped, executable, line numbers 156 00:13:24,230 --> 00:13:27,070 stripped, and 32 bit words here.
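The path from the MZ magic to the PE signature described so far can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular loader is implemented: the function name `find_pe_signature` is made up for this example, and the code assumes the conventional 64-byte MS-DOS header in which `e_lfanew` sits at file offset 0x3C as a 32-bit little-endian value.

```python
import struct


def find_pe_signature(data: bytes) -> int:
    """Return the file offset of the PE signature, or raise ValueError.

    Hypothetical helper for illustration only.
    """
    # The MS-DOS header starts with the ASCII magic "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ magic")
    # e_lfanew is the last field of the 64-byte MS-DOS header (offset 0x3C).
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The signature is the ASCII characters "PE" followed by two NULL bytes.
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    return e_lfanew
```

To try it on the compiled program, one could read the file with `open("hello.exe", "rb").read()` and pass the bytes to the function; everything between the 64-byte header and the returned offset is the MS-DOS stub that the loader skips.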
157 00:13:27,080 --> 00:13:33,890 The two fields describing the symbol table are deprecated, and PE files should no longer make 158 00:13:33,890 --> 00:13:37,160 use of embedded symbols and debugging information. 159 00:13:37,160 --> 00:13:45,620 Instead, these symbols are optionally emitted as part of a separate debug file. And as with ELF's 160 00:13:45,890 --> 00:13:48,080 e_machine, the Machine field here. 161 00:13:50,090 --> 00:13:50,570 Yeah. 162 00:13:50,990 --> 00:13:52,220 Machine field here. 163 00:13:57,010 --> 00:13:59,470 But before that, we will go to 164 00:14:04,640 --> 00:14:05,200 it is 165 00:14:13,080 --> 00:14:16,800 the /usr/include/elf.h header here. 166 00:14:17,040 --> 00:14:18,480 We'll go to e_machine. 167 00:14:21,460 --> 00:14:25,510 And as you can see here, this is its architecture. 168 00:14:25,720 --> 00:14:30,070 And this is e_machine here in ELF. 169 00:14:30,100 --> 00:14:37,000 This Machine field describes the architecture of the machine for which the PE file is intended, and 170 00:14:37,000 --> 00:14:39,040 in this case, the 171 00:14:40,010 --> 00:14:42,140 x86 here, 172 00:14:43,800 --> 00:14:46,560 x86 and 173 00:14:48,970 --> 00:15:00,970 64. You will recognize x86-64 because it is defined as the constant 174 00:15:01,000 --> 00:15:05,110 0x8664, 175 00:15:06,100 --> 00:15:07,390 8664 here, 176 00:15:09,980 --> 00:15:13,250 which you will see at the beginning of the file. 177 00:15:14,770 --> 00:15:16,600 And as you can see, our architecture is different. 178 00:15:16,600 --> 00:15:17,320 That's why 179 00:15:18,880 --> 00:15:25,120 we got the file format pei-i386 here in this case. 180 00:15:25,120 --> 00:15:37,990 But if your architecture is x86-64, you will see 8664 as the last four hex digits. 181 00:15:39,280 --> 00:15:40,960 And here we also
182 00:15:41,940 --> 00:15:48,420 have the NumberOfSections field, the number of sections. 183 00:15:56,750 --> 00:15:58,820 Number of sections. 184 00:16:03,600 --> 00:16:11,670 This NumberOfSections field is simply the number of entries in the section header table, and 185 00:16:11,670 --> 00:16:18,630 SizeOfOptionalHeader is the size in bytes of the optional header that follows the file header, 186 00:16:18,630 --> 00:16:21,540 and the Characteristics field here. 187 00:16:23,160 --> 00:16:24,000 Characteristics 188 00:16:24,180 --> 00:16:24,720 field. 189 00:16:25,140 --> 00:16:28,950 You can see here 0x107. 190 00:16:30,960 --> 00:16:35,160 This field contains flags here 191 00:16:36,120 --> 00:16:43,380 describing things such as the endianness of the binary. 192 00:16:43,410 --> 00:16:52,680 We have relocations stripped, executable, line numbers stripped, 32 bit words, and whether it's a DLL 193 00:16:52,680 --> 00:16:58,500 and whether it has been stripped. As shown in our objdump output, 194 00:16:58,500 --> 00:17:05,160 the example binary contains Characteristics flags that mark it as a 195 00:17:05,190 --> 00:17:11,250 relocations-stripped executable file here. 196 00:17:13,590 --> 00:17:17,250 And it uses 32 bit words, right? 197 00:17:18,080 --> 00:17:23,510 The line numbers are also stripped, which you will learn more about 198 00:17:23,510 --> 00:17:25,610 in the next lecture. 199 00:17:25,610 --> 00:17:32,480 Despite what the name suggests, the PE optional header is not really optional for executables, 200 00:17:32,490 --> 00:17:38,930 though it may be missing in object files. In fact, you will likely find the PE optional 201 00:17:38,930 --> 00:17:42,580 header in any PE executable you will encounter. 202 00:17:42,590 --> 00:17:42,940 Right.
203 00:17:42,950 --> 00:17:51,380 It contains lots of fields, and I will go over the most important ones in the next lecture. 204 00:17:51,380 --> 00:17:52,090 I'm waiting for you in the 205 00:17:52,110 --> 00:17:52,850 next lecture.
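As a recap of the file header fields covered in this lecture, here is a rough Python sketch that reads Machine, NumberOfSections, SizeOfOptionalHeader, and Characteristics straight from the bytes of a PE file. The function names `parse_file_header` and `decode_characteristics` are hypothetical and exist only for this example; the numeric constants follow the winnt.h definitions, the flag names follow the labels seen in the objdump output, and only a handful of machine types and flags are listed.

```python
import struct

# A few machine types and characteristics flags (values as in winnt.h).
MACHINES = {0x014C: "i386", 0x8664: "x86-64"}
FLAGS = {
    0x0001: "relocations stripped",
    0x0002: "executable",
    0x0004: "line numbers stripped",
    0x0008: "symbols stripped",
    0x0020: "large address aware",
    0x0100: "32 bit words",
    0x2000: "DLL",
}


def decode_characteristics(value: int) -> list:
    """Return the names of the known flags set in a Characteristics value."""
    return [name for bit, name in FLAGS.items() if value & bit]


def parse_file_header(data: bytes) -> dict:
    """Read the 20-byte PE file header (illustrative helper, no validation)."""
    # e_lfanew at offset 0x3C points at the 4-byte "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # Layout: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    (machine, n_sections, _time, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    return {
        "machine": MACHINES.get(machine, hex(machine)),
        "number_of_sections": n_sections,
        "size_of_optional_header": opt_size,
        "characteristics": decode_characteristics(characteristics),
    }
```

For the value 0x107 shown in the objdump output, `decode_characteristics(0x107)` yields relocations stripped, executable, line numbers stripped, and 32 bit words, which matches the flags objdump printed for the example binary.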