1 00:00:00,540 --> 00:00:01,740 Hello, my name is Typhoon. 2 00:00:01,740 --> 00:00:06,660 Now let's delve into the process of disassembling a binary. 3 00:00:06,840 --> 00:00:12,810 Having seen how a binary is compiled, it's time to explore the contents of the object file generated 4 00:00:12,810 --> 00:00:15,430 during the assembly phase of compilation. 5 00:00:15,450 --> 00:00:21,810 After that, I will guide you through the disassembly of the main binary executable, highlighting 6 00:00:21,810 --> 00:00:26,610 the differences between its contents and those of the object file. 7 00:00:26,640 --> 00:00:33,540 This will give you a clearer understanding of what resides within an object file and what 8 00:00:33,540 --> 00:00:35,700 is added during the linking phase. 9 00:00:35,730 --> 00:00:41,220 To do the disassembly, we will use the objdump 10 00:00:41,250 --> 00:00:41,930 utility. 11 00:00:41,940 --> 00:00:48,040 It's a simple and user-friendly disassembler that comes bundled with most Linux distributions. 12 00:00:48,060 --> 00:00:57,210 For now, we will rely on objdump to gain quick insights into the code and data encapsulated 13 00:00:57,210 --> 00:00:59,040 within a binary. 14 00:00:59,810 --> 00:01:08,840 What we're going to do first is run, not readelf again, but objdump here. 15 00:01:09,170 --> 00:01:11,090 objdump, 16 00:01:12,520 --> 00:01:19,900 then -sj and .rodata, and after that we will enter our object file, 17 00:01:19,900 --> 00:01:21,790 in this case myapp.o. 18 00:01:24,630 --> 00:01:25,900 And here we will 19 00:01:26,670 --> 00:01:27,390 press Enter. 20 00:01:27,390 --> 00:01:28,440 And that's it. 21 00:01:28,440 --> 00:01:32,760 So here we have this output. 22 00:01:33,460 --> 00:01:43,720 Now let's try another command: objdump with an uppercase -M. Let's actually write 23 00:01:43,720 --> 00:01:44,230 it again. 
24 00:01:44,230 --> 00:01:46,930 So objdump, uppercase 25 00:01:46,930 --> 00:01:48,580 -M intel, then -d. 26 00:01:48,580 --> 00:01:54,340 After that, we again pass the object file, and that's it. 27 00:01:54,340 --> 00:01:59,860 So if you look carefully here, 28 00:02:01,120 --> 00:02:02,110 you will see 29 00:02:02,320 --> 00:02:06,550 I called objdump twice. 30 00:02:06,790 --> 00:02:08,080 In the first call, 31 00:02:09,110 --> 00:02:17,210 I tell objdump to show the contents of the .rodata section. 32 00:02:17,210 --> 00:02:23,990 This stands for read-only data, and it's the part of the binary where all constants are stored, 33 00:02:23,990 --> 00:02:26,050 including this Hello world string here. 34 00:02:26,060 --> 00:02:26,320 Right? 35 00:02:26,330 --> 00:02:29,150 We defined this 36 00:02:29,150 --> 00:02:29,820 Hello world. 37 00:02:29,840 --> 00:02:32,420 Let me actually find the C file here. 38 00:02:32,420 --> 00:02:34,040 So we define this 39 00:02:34,040 --> 00:02:36,350 Hello world as a constant, right? 40 00:02:36,350 --> 00:02:38,900 We define this macro here. 41 00:02:39,020 --> 00:02:46,490 So this is the read-only data section of our object file. 42 00:02:46,490 --> 00:02:48,320 And 43 00:02:49,310 --> 00:02:56,510 I will return to a more detailed discussion of .rodata and the other sections of ELF binaries 44 00:02:56,510 --> 00:03:01,630 in the next lectures, in which we will also learn about the ELF binary format. 45 00:03:01,640 --> 00:03:10,580 For now, you can see that the contents of .rodata consist of an ASCII encoding. 46 00:03:17,450 --> 00:03:20,150 It's the ASCII encoding of the string, 47 00:03:21,380 --> 00:03:24,500 shown as hex bytes on the left side of the output. 48 00:03:24,830 --> 00:03:31,400 On the right side, you can see the human-readable representation of those 49 00:03:31,400 --> 00:03:32,490 same bytes. 
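To make the two columns concrete, here is a minimal Python sketch (not objdump itself) that lays bytes out roughly the way the `objdump -s` output does: hex on the left, printable characters on the right. The function name `hexdump_line` and the sample string are assumptions for illustration, and the exact bytes in your own object file may differ.

```python
def hexdump_line(data: bytes, offset: int = 0) -> str:
    """Format up to 16 bytes as: offset, hex groups of 4 bytes, printable text."""
    chunk = data[:16]
    hex_digits = chunk.hex()
    # Group the hex digits 8 at a time (4 bytes per group).
    groups = [hex_digits[i:i + 8] for i in range(0, len(hex_digits), 8)]
    # Non-printable bytes (such as the terminating NUL) are shown as '.'.
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f"{offset:04x} {' '.join(groups)}  {text}"


# Hypothetical sample: a C string constant with its terminating NUL byte.
sample = b"Hello world\x00"
print(hexdump_line(sample))  # prints: 0000 48656c6c 6f20776f 726c6400  Hello world.
```

The trailing dot on the right is the NUL terminator, which has no printable form.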
50 00:03:32,510 --> 00:03:37,190 Now, looking again, we have this comma in the middle of Hello world here. 51 00:03:37,190 --> 00:03:42,380 I don't know why, but it's probably another typo. 52 00:03:42,960 --> 00:03:43,650 And 53 00:03:45,010 --> 00:03:52,510 so we are seeing this comma in Hello world because we defined Hello world like that. 54 00:03:53,030 --> 00:03:57,430 I typed a comma instead of a space here. 55 00:03:57,430 --> 00:03:58,540 Sorry for that. 56 00:03:59,860 --> 00:04:01,480 And here, 57 00:04:01,750 --> 00:04:02,590 that's it. 58 00:04:02,740 --> 00:04:03,550 So, 59 00:04:05,180 --> 00:04:12,980 you may wonder why the call that should reference puts seems to point into the middle of the main function. 60 00:04:13,130 --> 00:04:21,410 As I mentioned before, data and code references from object files are not yet fully resolved during 61 00:04:21,410 --> 00:04:27,380 compilation, because the compiler doesn't know the base address at which the file will eventually 62 00:04:27,380 --> 00:04:28,240 be loaded. 63 00:04:28,250 --> 00:04:33,260 Consequently, the call to puts in the object file remains unresolved. 64 00:04:33,290 --> 00:04:39,680 It waits for the linker to fill in the correct value for this reference. 65 00:04:39,680 --> 00:04:49,580 You can verify this using readelf, which shows all the relocation symbols present in the 66 00:04:49,580 --> 00:04:50,660 object file. 67 00:04:52,120 --> 00:04:52,840 And 68 00:04:54,070 --> 00:04:54,910 here, 69 00:04:56,260 --> 00:04:58,390 in the second call, 70 00:04:58,890 --> 00:04:59,260 um, 71 00:05:00,480 --> 00:05:03,720 objdump 72 00:05:03,720 --> 00:05:09,240 disassembles all the code in the object file in Intel syntax, as we specified. 73 00:05:09,720 --> 00:05:14,400 As you can see, it contains only the code of the main function. 74 00:05:15,710 --> 00:05:17,750 All right, so 
75 00:05:19,220 --> 00:05:23,150 Uh, this is because that's the only function defined in the source file. 76 00:05:23,150 --> 00:05:23,510 Right? 77 00:05:23,510 --> 00:05:25,600 So let's look at that again here. 78 00:05:25,610 --> 00:05:26,330 As you can see. 79 00:05:26,360 --> 00:05:26,960 Oops, not that. 80 00:05:41,110 --> 00:05:48,370 As you can see here, the only function in our program is int main, and that's why 81 00:05:48,370 --> 00:05:49,690 we are seeing only 82 00:05:51,290 --> 00:05:52,610 the main function here. 83 00:05:54,720 --> 00:06:01,020 So for the most part, the output conforms pretty closely to the assembly code previously produced by 84 00:06:01,020 --> 00:06:02,490 the compilation phase, 85 00:06:02,730 --> 00:06:05,210 give or take a few assembly-level macros. 86 00:06:06,300 --> 00:06:15,450 What's interesting to note is that the pointer to the Hello world string, here 87 00:06:16,700 --> 00:06:17,930 in edi, 88 00:06:21,120 --> 00:06:22,560 before this call here, 89 00:06:24,640 --> 00:06:27,000 And we also have this ad. 90 00:06:29,410 --> 00:06:29,940 R d. 91 00:06:29,950 --> 00:06:30,790 A r. 92 00:06:30,820 --> 00:06:31,990 A x. 93 00:06:33,010 --> 00:06:33,700 Here. 94 00:06:35,430 --> 00:06:38,490 This here is set to zero. 95 00:06:38,490 --> 00:06:46,050 And the call that should print the string to the screen using puts 96 00:06:56,120 --> 00:07:02,000 also points to a nonsensical location here, as you can see. 97 00:07:02,090 --> 00:07:03,770 So, but 98 00:07:05,360 --> 00:07:07,880 we have the offset of 99 00:07:08,870 --> 00:07:11,750 I think there's one here. 100 00:07:12,500 --> 00:07:12,980 So. 101 00:07:15,380 --> 00:07:16,640 This points to, 102 00:07:16,670 --> 00:07:20,810 this shows us, a nonsensical location. 
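A small Python sketch can show why an unresolved call appears to land inside main. It assumes the call is the common 5-byte x86 near call (opcode 0xe8 followed by a 32-bit relative operand), whose target is the address of the next instruction plus that operand. The byte sequence and the offset 0x10 are hypothetical, not taken from the object file in this lecture.

```python
import struct

CALL_REL32_OPCODE = 0xE8   # x86 near call with a 32-bit relative operand
CALL_REL32_LENGTH = 5      # 1 opcode byte + 4 operand bytes


def call_target(code: bytes, call_offset: int) -> int:
    """Return the offset that a `call rel32` at call_offset transfers control to."""
    if code[call_offset] != CALL_REL32_OPCODE:
        raise ValueError("not a call rel32 instruction")
    # The operand is a signed little-endian 32-bit displacement.
    (rel32,) = struct.unpack_from("<i", code, call_offset + 1)
    # The displacement is relative to the address of the NEXT instruction.
    return call_offset + CALL_REL32_LENGTH + rel32


# Hypothetical code bytes: 0x10 bytes of nop, then a call whose operand is
# still zero because the linker has not filled it in yet, then one more nop.
code = b"\x90" * 0x10 + b"\xe8\x00\x00\x00\x00" + b"\x90"
print(hex(call_target(code, 0x10)))  # prints: 0x15
```

With a zero operand, the computed target is simply the instruction right after the call, which is why a disassembler shows it pointing into the middle of the same function.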
103 00:07:21,540 --> 00:07:30,300 So why does the call that should reference puts point instead into the middle of main? Right, I 104 00:07:30,300 --> 00:07:36,120 previously mentioned that the data and code references from object files are not yet fully resolved 105 00:07:36,120 --> 00:07:42,690 because the compiler doesn't know at what base address the file will eventually be loaded. 106 00:07:42,690 --> 00:07:48,780 That's why the call to puts is waiting for the linker to fill in the correct value for this reference. 107 00:07:49,720 --> 00:07:54,460 And here we will use readelf. 108 00:07:55,880 --> 00:07:57,200 readelf 109 00:07:57,200 --> 00:07:58,640 --relocs here. 110 00:08:05,740 --> 00:08:10,540 --relocs, and again we will use our file, 111 00:08:11,450 --> 00:08:12,070 myapp.o. 112 00:08:14,840 --> 00:08:15,740 And that's it. 113 00:08:15,740 --> 00:08:16,780 This is our output. 114 00:08:16,790 --> 00:08:19,970 With readelf, we used the --relocs parameter. 115 00:08:19,970 --> 00:08:21,980 So, the relocation: 116 00:08:21,980 --> 00:08:24,680 this relocation symbol here 117 00:08:26,050 --> 00:08:32,320 tells the linker that it should resolve the reference to the string to point to whatever address it ends 118 00:08:32,320 --> 00:08:35,560 up at in the .rodata 119 00:08:36,820 --> 00:08:37,330 section. 120 00:08:37,330 --> 00:08:43,630 Similarly, the second line, after this offset here, 121 00:08:43,810 --> 00:08:45,670 1A1. 122 00:08:46,830 --> 00:08:47,640 A one. 123 00:08:47,670 --> 00:08:48,480 A1E. 124 00:08:48,510 --> 00:08:49,170 Here. 125 00:08:50,080 --> 00:08:50,830 So. 126 00:08:51,530 --> 00:08:52,340 Here. 127 00:08:52,340 --> 00:08:53,480 Similarly, 128 00:08:54,290 --> 00:08:59,450 this line tells the linker how to resolve the call to puts. 129 00:08:59,540 --> 00:08:59,990 Right. 130 00:08:59,990 --> 00:09:05,000 So you may notice the value here. 
131 00:09:06,010 --> 00:09:10,840 The first one is zero, and the second one has four being subtracted 132 00:09:12,070 --> 00:09:13,810 from the puts symbol. 133 00:09:14,630 --> 00:09:16,640 You can ignore that for now. 134 00:09:16,670 --> 00:09:23,590 The way the linker computes relocations is a bit involved, and the readelf output can be confusing 135 00:09:23,600 --> 00:09:24,470 in most cases. 136 00:09:24,470 --> 00:09:31,310 So I will just gloss over the details of relocation here and focus on the bigger picture of disassembling 137 00:09:32,360 --> 00:09:33,900 a binary instead. 138 00:09:33,920 --> 00:09:39,920 I will provide more information about relocation symbols in the next sections of our course. 139 00:09:41,450 --> 00:09:47,600 The leftmost column of each line in the output 140 00:09:48,770 --> 00:09:51,900 is the offset in the object file 141 00:09:52,940 --> 00:09:55,440 where the resolved reference must be filled in. 142 00:09:55,460 --> 00:10:02,150 If you are paying close attention, you may notice that in both cases it's equal to the offset of 143 00:10:02,150 --> 00:10:05,850 the instruction that needs to be fixed, plus one. 144 00:10:05,870 --> 00:10:11,420 For instance, the call to puts is at code offset, 145 00:10:11,420 --> 00:10:14,120 Here is one. 146 00:10:14,940 --> 00:10:17,790 A1A here, right? 147 00:10:20,160 --> 00:10:25,020 This is because you only want to overwrite the operand of the instruction, 148 00:10:25,020 --> 00:10:28,350 not the opcode of the instruction itself. 149 00:10:28,350 --> 00:10:34,530 It just so happens that for both instructions that need fixing up, the opcode is one byte long. 150 00:10:34,530 --> 00:10:37,620 So, to point to the instruction operand, 151 00:10:37,950 --> 00:10:42,150 the relocation symbol needs to skip past the opcode byte.
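The "instruction offset plus one" rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `relocation_offset` and `patch_operand` are made up, the byte sequence is a hypothetical `mov edi, imm32` followed by a `call rel32`, and the value written is a placeholder rather than what a real linker would compute.

```python
import struct


def relocation_offset(instruction_offset: int, opcode_length: int = 1) -> int:
    """Offset of the operand to patch: skip past the opcode byte(s)."""
    return instruction_offset + opcode_length


def patch_operand(code: bytearray, reloc_offset: int, value: int) -> None:
    """Overwrite the 4-byte little-endian operand, leaving the opcode untouched."""
    struct.pack_into("<i", code, reloc_offset, value)


# Hypothetical unresolved code: mov edi, 0x0 (opcode 0xbf) at offset 0,
# then call with a zero operand (opcode 0xe8) at offset 5.
code = bytearray(b"\xbf\x00\x00\x00\x00" + b"\xe8\x00\x00\x00\x00")

call_reloc = relocation_offset(5)        # 6: one byte past the call opcode
patch_operand(code, call_reloc, 0x1234)  # placeholder value, not a real address

print(call_reloc)         # prints: 6
print(code[5:10].hex())   # prints: e834120000
```

The opcode byte 0xe8 at offset 5 is unchanged; only the four operand bytes after it were rewritten, which is the reason the relocation offset is the instruction offset plus one.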