1 00:00:00,540 --> 00:00:02,220 Hello, my name is Dave Boone. 2 00:00:02,250 --> 00:00:09,090 Let's delve into the realm of symbols and stripped binaries. In high-level source code, such 3 00:00:09,090 --> 00:00:10,200 as C code, 4 00:00:10,230 --> 00:00:17,550 we encounter functions and variables with meaningful, human-readable names. During the compilation process, 5 00:00:17,550 --> 00:00:26,580 compilers generate symbols, which keep track of these symbolic names. 6 00:00:26,580 --> 00:00:33,960 So symbols record which binary code and data correspond to each symbolic name. 7 00:00:33,990 --> 00:00:41,580 For instance, function symbols map high-level function names to their 8 00:00:41,580 --> 00:00:43,880 respective addresses and sizes. 9 00:00:43,890 --> 00:00:51,960 This information is used by the linker when combining object files, to resolve function and 10 00:00:51,960 --> 00:00:57,210 variable references between modules, and it also aids the debugging process. 11 00:00:57,660 --> 00:00:59,250 And here, 12 00:01:00,440 --> 00:01:09,230 to give you an idea of what symbolic information looks like, we're going to run readelf 13 00:01:09,260 --> 00:01:10,060 --syms 14 00:01:10,100 --> 00:01:12,020 a.out here. 15 00:01:12,260 --> 00:01:17,320 And this is what symbolic information looks like. 16 00:01:17,330 --> 00:01:21,650 So here we used the readelf 17 00:01:21,910 --> 00:01:31,010 tool to display the symbols. You will return to the readelf utility in later lectures 18 00:01:31,010 --> 00:01:33,810 and interpret all of its output. 19 00:01:33,830 --> 00:01:42,080 For now, just keep in mind that, among many unfamiliar symbols, there is a symbol for the main function 20 00:01:42,080 --> 00:01:42,980 here. 21 00:01:43,340 --> 00:01:45,680 It should be somewhere. 22 00:01:46,310 --> 00:01:48,710 The main symbol, for the main function. 
23 00:01:56,120 --> 00:02:02,360 My FC abs and it should be 32 object. 24 00:02:02,360 --> 00:02:07,880 And here, the type has to be FUNC. 25 00:02:09,770 --> 00:02:11,660 It should be GLOBAL. 26 00:02:13,230 --> 00:02:14,580 FUNC, GLOBAL. 27 00:02:15,370 --> 00:02:16,540 FUNC, GLOBAL. 28 00:02:16,900 --> 00:02:17,650 Here. 29 00:02:18,510 --> 00:02:33,570 And here, yes: the size is 37, the type is FUNC, it is GLOBAL and DEFAULT, and the index is 15. 30 00:02:33,570 --> 00:02:36,780 Here the name is main. 31 00:02:37,200 --> 00:02:42,990 So here you can see that it specifies this address here, 32 00:02:45,530 --> 00:02:50,960 at which main will reside when the binary is loaded into memory. 33 00:02:50,960 --> 00:02:54,700 The output also shows the size of main, 34 00:02:54,710 --> 00:03:02,450 in this case 37 bytes, and indicates that you are dealing with a function symbol. 35 00:03:02,450 --> 00:03:10,070 Here we have this type, FUNC. Symbolic information can be emitted as part of the 36 00:03:10,070 --> 00:03:16,100 binary itself, as you saw earlier, or it can be generated separately in the form of a symbol 37 00:03:16,100 --> 00:03:16,640 file. 38 00:03:16,640 --> 00:03:23,360 Symbolic information comes in various flavors, ranging from basic symbols required by the linker 39 00:03:23,360 --> 00:03:26,000 to more extensive debugging symbols. 40 00:03:26,000 --> 00:03:32,390 Debugging symbols provide a full mapping between source code lines and the corresponding 41 00:03:32,390 --> 00:03:39,170 binary-level instructions. They go beyond simple address mappings and even describe function parameters, 42 00:03:39,170 --> 00:03:41,240 stack frame information and more. 
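To make the fields read out for main a bit more concrete, here is a minimal Python sketch that splits one row of readelf --syms output into named fields. The sample row is illustrative only: the size (37), type, binding, visibility and index (15) follow what was read out above, while the symbol number and the address are hypothetical placeholders, and the helper name parse_symbol_row is an assumption made for this example rather than part of any tool.

```python
from typing import NamedTuple


class Symbol(NamedTuple):
    num: int     # position in the symbol table
    value: int   # address of the symbol (hexadecimal in readelf output)
    size: int    # size in bytes
    type: str    # e.g. FUNC, OBJECT, FILE
    bind: str    # e.g. GLOBAL, LOCAL, WEAK
    vis: str     # e.g. DEFAULT
    ndx: str     # section index; kept as text because it may be UND or ABS
    name: str    # symbolic name, empty for unnamed entries


def parse_symbol_row(row: str) -> Symbol:
    """Split one data row of `readelf --syms` output into its columns."""
    fields = row.split(None, 7)  # at most 8 columns; the name keeps any suffix
    if len(fields) < 7:
        raise ValueError(f"not a symbol row: {row!r}")
    name = fields[7].strip() if len(fields) == 8 else ""
    return Symbol(
        num=int(fields[0].rstrip(":")),
        value=int(fields[1], 16),
        size=int(fields[2], 0),  # base 0 accepts decimal and 0x-prefixed sizes
        type=fields[3],
        bind=fields[4],
        vis=fields[5],
        ndx=fields[6],
        name=name,
    )


# Hypothetical row: symbol number and address are made up for illustration.
SAMPLE_ROW = "    62: 0000000000001139    37 FUNC    GLOBAL DEFAULT   15 main"

main_sym = parse_symbol_row(SAMPLE_ROW)
print(f"{main_sym.name}: {main_sym.type}, {main_sym.size} bytes at {main_sym.value:#x}")
# prints "main: FUNC, 37 bytes at 0x1139"
```

On a real binary, the same function could be applied to each data row of the readelf output, skipping the header lines; the exact address and symbol number will differ from build to build.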
43 00:03:41,990 --> 00:03:52,490 For ELF binaries, debugging symbols are typically generated in the DWARF format, while 44 00:03:52,520 --> 00:03:59,870 PE binaries, like Windows binaries, usually use the proprietary Microsoft Program Database (PDB) format. 45 00:03:59,870 --> 00:04:08,150 DWARF information is usually embedded within the binary, while PDB comes in the form of a separate 46 00:04:08,150 --> 00:04:11,000 symbol file. 47 00:04:12,240 --> 00:04:16,170 Symbolic information plays a crucial role in binary analysis. 48 00:04:16,170 --> 00:04:21,930 For example, having a well-defined set of function symbols greatly simplifies the disassembly 49 00:04:21,930 --> 00:04:22,470 process. 50 00:04:22,470 --> 00:04:28,590 Each function symbol serves as a starting point, making disassembly more accurate and reducing the 51 00:04:28,590 --> 00:04:36,060 risk of mistakenly disassembling data as code, which would result in incorrect instructions in 52 00:04:36,060 --> 00:04:37,980 the disassembly output. 53 00:04:38,010 --> 00:04:43,500 Understanding which parts of a binary belong to which function, and knowing the function's name, facilitates 54 00:04:43,500 --> 00:04:46,830 comprehension for human reverse engineers. 55 00:04:46,830 --> 00:04:53,450 It allows them to compartmentalize the code and understand its purpose. 56 00:04:53,460 --> 00:04:59,850 Even basic linker symbols, though less extensive than debugging information, provide significant assistance 57 00:04:59,850 --> 00:05:08,040 in numerous binary analysis applications. To parse symbols, you can use tools like readelf, 58 00:05:08,040 --> 00:05:13,530 as we did here in this lecture, 59 00:05:14,250 --> 00:05:18,840 or you can also use libraries such as 60 00:05:19,510 --> 00:05:21,970 libbfd, I think. 61 00:05:21,970 --> 00:05:22,450 Yeah. 
62 00:05:22,960 --> 00:05:32,410 libbfd, which is a separate library that we will use two sections later. 63 00:05:32,410 --> 00:05:41,260 Additionally, there are specialized libraries like libdwarf, specifically designed 64 00:05:41,260 --> 00:05:44,260 for parsing DWARF debug symbols. 65 00:05:44,260 --> 00:05:52,420 However, in this course we will not cover these libraries extensively; we will only do 66 00:05:52,420 --> 00:05:54,400 a tutorial for this library. 67 00:05:54,400 --> 00:06:01,600 Unfortunately, extensive debugging information is typically not included in production-ready binaries. 68 00:06:01,600 --> 00:06:07,810 In fact, even basic symbolic information is often stripped to reduce file size and hinder reverse 69 00:06:07,810 --> 00:06:11,950 engineering, particularly in the case of malware or proprietary software. 70 00:06:11,980 --> 00:06:18,460 This means that, as a binary analyst, you will frequently encounter the more challenging scenario of stripped 71 00:06:18,460 --> 00:06:22,940 binaries without any symbolic information. 72 00:06:22,940 --> 00:06:29,960 So throughout this course, I assume the absence of symbolic information and focus 73 00:06:29,960 --> 00:06:34,460 on analyzing stripped binaries, except when explicitly noted otherwise. 74 00:06:34,460 --> 00:06:40,730 This approach prepares you for real-world scenarios where you may have to tackle the complexities 75 00:06:40,730 --> 00:06:44,060 of reverse engineering without the aid of symbols. 76 00:06:44,060 --> 00:06:46,040 I'll see you in the next lecture.
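As a rough illustration of what stripping changes inside an ELF file, the Python sketch below reads the section header table and reports whether a full symbol table section (type SHT_SYMTAB) is present. It assumes a 64-bit little-endian ELF image and ignores corner cases such as extended section numbering. The function names and the synthetic test images are assumptions made for this example; the dynamic symbol table (SHT_DYNSYM) that dynamically linked binaries need at run time typically survives stripping, so its presence is not treated as "unstripped" here.

```python
import struct
from typing import List

ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, 64 bytes
SECTION_HEADER = struct.Struct("<IIQQQQIIQQ")    # ELF64 section header, 64 bytes
SHT_SYMTAB = 2    # full symbol table, normally removed by stripping
SHT_DYNSYM = 11   # dynamic symbol table, normally kept


def section_types(image: bytes) -> List[int]:
    """Return the sh_type value of every section in an ELF64 LE image."""
    if len(image) < ELF_HEADER.size or image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    header = ELF_HEADER.unpack_from(image, 0)
    shoff, shentsize, shnum = header[6], header[11], header[12]
    return [
        SECTION_HEADER.unpack_from(image, shoff + i * shentsize)[1]
        for i in range(shnum)
    ]


def looks_stripped(image: bytes) -> bool:
    """True when the image has no SHT_SYMTAB section."""
    return SHT_SYMTAB not in section_types(image)


def fake_elf(types: List[int]) -> bytes:
    """Build a synthetic image with empty sections of the given types (demo only)."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # 64-bit, little-endian
    header = ELF_HEADER.pack(
        ident, 2, 62, 1, 0, 0, ELF_HEADER.size, 0,
        ELF_HEADER.size, 0, 0, SECTION_HEADER.size, len(types), 0,
    )
    sections = b"".join(
        SECTION_HEADER.pack(0, t, 0, 0, 0, 0, 0, 0, 0, 0) for t in types
    )
    return header + sections


with_symbols = fake_elf([0, SHT_DYNSYM, SHT_SYMTAB])
stripped = fake_elf([0, SHT_DYNSYM])
print(looks_stripped(with_symbols), looks_stripped(stripped))  # prints "False True"
```

For a real file, the bytes could be obtained with open(path, "rb").read() and passed to looks_stripped; comparing a binary before and after running the strip utility should show the symbol table section disappearing.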