1 00:00:00,440 --> 00:00:01,760 Congratulations. 2 00:00:01,760 --> 00:00:08,450 By now, you have gained a solid understanding of how the binary compilation process works and have 3 00:00:08,450 --> 00:00:11,120 explored the inner workings of binaries. 4 00:00:11,330 --> 00:00:17,570 You have even delved into the realm of static disassembly using objdump, and if you've been following 5 00:00:17,570 --> 00:00:24,050 along, you may have your very own shiny new binaries sitting on your hard drive. 6 00:00:24,170 --> 00:00:31,520 Now it's time to dive into how a binary is loaded and executed, 7 00:00:31,550 --> 00:00:39,960 a crucial topic that will pave the way for exploring dynamic analysis concepts in upcoming sections. 8 00:00:39,980 --> 00:00:46,940 While the precise details vary depending on the platform and binary format, the process of loading 9 00:00:46,940 --> 00:00:51,260 and executing a binary typically involves a series of fundamental steps. 10 00:00:51,260 --> 00:00:53,240 And here I have drawn a diagram. 11 00:00:53,240 --> 00:00:53,630 Here. 12 00:00:53,630 --> 00:00:58,550 This diagram provides a glimpse 13 00:00:59,760 --> 00:01:08,580 into how an ELF binary, such as the one we just compiled in the previous lecture, is represented 14 00:01:08,580 --> 00:01:11,310 in memory on a Linux-based platform. 15 00:01:11,310 --> 00:01:18,030 At a high level, the process of loading a binary on Windows follows a similar pattern. 16 00:01:18,030 --> 00:01:24,930 The process of loading a binary is intricate and requires substantial effort from the operating 17 00:01:24,930 --> 00:01:25,380 system. 18 00:01:25,380 --> 00:01:32,580 It's important to note that the binary's representation in memory does not necessarily mirror its 19 00:01:32,580 --> 00:01:33,960 on-disk representation. 20 00:01:33,960 --> 00:01:37,710 For example, sections of zero-initialized
21 00:01:39,290 --> 00:01:45,050 data within the on-disk binary may be collapsed to conserve disk space. 22 00:01:45,050 --> 00:01:52,340 However, when loaded into memory, these sections expand to contain the actual zero values. 23 00:01:53,460 --> 00:02:01,710 Furthermore, certain portions of the on-disk binary may be ordered differently in memory or not loaded into 24 00:02:01,710 --> 00:02:02,390 memory at all. 25 00:02:02,400 --> 00:02:09,960 Because the intricacies of on-disk versus in-memory binary representations are closely tied to the specific 26 00:02:09,960 --> 00:02:10,770 binary format, 27 00:02:10,770 --> 00:02:16,950 we will explore this topic in greater detail in the next sections. 28 00:02:16,950 --> 00:02:20,670 For now, let's focus on the high-level overview of the loading process. 29 00:02:20,670 --> 00:02:28,200 When you decide to run a binary, the operating system initiates the setup of a new process dedicated 30 00:02:28,200 --> 00:02:32,400 to running the program, complete with its own virtual address space. 31 00:02:32,610 --> 00:02:38,460 Subsequently, the operating system maps an interpreter into the virtual memory of the process. 32 00:02:39,770 --> 00:02:47,600 This interpreter, a user-space program, possesses the knowledge and capability to load the binary and 33 00:02:47,600 --> 00:02:51,380 perform the necessary relocations. On Linux systems, 34 00:02:51,380 --> 00:03:00,340 the interpreter is typically a shared library known as ld-linux.so, for example ld-linux-x86-64.so.2 on x86-64. 35 00:03:00,560 --> 00:03:10,520 On Windows, by contrast, the interpreter functionality is implemented as part of ntdll.dll. Once the 36 00:03:10,520 --> 00:03:17,000 interpreter is loaded, the kernel hands over control to it and the interpreter begins its work in 37 00:03:17,000 --> 00:03:18,740 user space. 38 00:03:18,740 --> 00:03:23,750 The role of the interpreter is crucial in preparing the binary for execution.
39 00:03:23,750 --> 00:03:29,900 It carries out a series of tasks such as resolving symbol references, setting up the program's initial 40 00:03:29,900 --> 00:03:32,990 memory layout, and performing necessary relocations. 41 00:03:33,380 --> 00:03:39,110 By delegating these responsibilities to the interpreter, the operating system can ensure a consistent 42 00:03:39,110 --> 00:03:44,910 and reliable execution environment for binaries across various platforms. 43 00:03:44,970 --> 00:03:51,990 Understanding the loading process is essential, as it forms the foundation for the dynamic analysis techniques 44 00:03:51,990 --> 00:03:54,660 that we will explore in the next sections. 45 00:03:54,660 --> 00:04:01,140 Stay tuned for further exciting insights into the dynamic analysis of binaries. 46 00:04:01,650 --> 00:04:02,610 So. 47 00:04:04,750 --> 00:04:05,440 Here. 48 00:04:08,880 --> 00:04:13,590 What we are going to do is first open our Linux machine here. 49 00:04:13,590 --> 00:04:16,500 Let me change the screen here. 50 00:04:17,840 --> 00:04:19,090 Kali here. 51 00:04:23,230 --> 00:04:26,620 Let's clear the console, and now. 52 00:04:28,070 --> 00:04:39,680 Linux ELF binaries come with a special section called .interp that specifies the path to the 53 00:04:39,680 --> 00:04:43,100 interpreter that is to be used to load the binary. 54 00:04:43,740 --> 00:04:54,090 Now we will see that with readelf, with -p here, .interp, and a.out. 55 00:04:54,980 --> 00:04:55,820 Here. 56 00:04:58,430 --> 00:04:59,720 That's a.out. 57 00:05:00,170 --> 00:05:01,370 And that's it. 58 00:05:02,620 --> 00:05:03,850 And here. 59 00:05:04,630 --> 00:05:12,520 As you can see, with readelf we are seeing the path of the interpreter. 60 00:05:12,820 --> 00:05:18,440 And as mentioned, the interpreter loads the binary into its virtual address space.
61 00:05:18,460 --> 00:05:25,420 This is the same space in which the interpreter is loaded. It then parses the binary to find out, 62 00:05:26,100 --> 00:05:29,640 among other things, which dynamic libraries the binary uses. 63 00:05:29,640 --> 00:05:38,340 The interpreter maps these into the virtual address space using mmap or an equivalent function, and then 64 00:05:38,340 --> 00:05:45,090 performs any necessary last-minute relocations in the binary's code sections. 65 00:05:47,310 --> 00:05:48,000 So. 66 00:05:48,390 --> 00:05:50,490 So, as I said, it 67 00:05:51,930 --> 00:05:59,580 performs any last-minute relocations in the binary's code sections to fill in the correct addresses for references 68 00:05:59,580 --> 00:06:00,990 to the dynamic libraries. 69 00:06:01,020 --> 00:06:09,270 In reality, the process of resolving references to functions in dynamic libraries is often 70 00:06:09,360 --> 00:06:11,580 deferred until later. 71 00:06:11,580 --> 00:06:16,950 In other words, instead of resolving these references immediately at load time, the interpreter 72 00:06:16,950 --> 00:06:21,030 resolves references only when they are invoked for the first time. 73 00:06:21,030 --> 00:06:26,940 This is known as lazy binding, which I will explain in more detail in the next sections. 74 00:06:26,940 --> 00:06:33,180 After relocation is complete, the interpreter looks up the entry point of the binary and transfers 75 00:06:33,180 --> 00:06:37,980 control to it, beginning normal execution of the binary. 76 00:06:38,070 --> 00:06:43,680 Now that you are familiar with the general anatomy and life cycle of a binary, it's time to dive into 77 00:06:43,680 --> 00:06:47,280 the details of a specific binary format. 78 00:06:47,580 --> 00:06:54,610 We will start with the widespread ELF format, which is the subject of the next sections of our 79 00:06:54,610 --> 00:06:55,180 course. 80 00:06:55,180 --> 00:06:57,430 I'll see you in the next lecture.
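
Supplementary sketch for this lecture: the interpreter path that readelf printed from the .interp section can also be read programmatically. The Python below is only a minimal illustration, not the way readelf or the kernel does it. It assumes a well-formed ELF file and looks up the PT_INTERP program header, which on a typical dynamically linked binary points at the same string that the .interp section holds. The function name find_interp is hypothetical and was chosen just for this example.

```python
import struct
import sys

PT_INTERP = 3  # program header type that holds the interpreter path


def find_interp(data: bytes):
    """Return the interpreter path in an ELF image, or None if there is none.

    Hypothetical helper for illustration; it does no validation beyond
    checking the ELF magic bytes.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian

    # Locate the program header table from the ELF header fields.
    if is_64:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x2A)

    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from(endian + "I", data, base)
        if p_type != PT_INTERP:
            continue
        # p_offset and p_filesz sit at different offsets in 32- and 64-bit headers.
        if is_64:
            (p_offset,) = struct.unpack_from(endian + "Q", data, base + 8)
            (p_filesz,) = struct.unpack_from(endian + "Q", data, base + 32)
        else:
            (p_offset,) = struct.unpack_from(endian + "I", data, base + 4)
            (p_filesz,) = struct.unpack_from(endian + "I", data, base + 16)
        raw = data[p_offset:p_offset + p_filesz]
        return raw.split(b"\x00", 1)[0].decode()  # path is NUL-terminated
    return None  # e.g. a statically linked binary


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(find_interp(f.read()))
```

Saved under a name of your choosing and run with the path to a binary as its only argument, it prints the interpreter path, or None when the binary has no PT_INTERP entry, as is usually the case for statically linked binaries.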