1 00:00:00,490 --> 00:00:06,310 Binary analysis revolves around the examination and evaluation of binaries. 2 00:00:06,340 --> 00:00:12,100 To understand this field better, let's delve into the fundamental concepts of binary formats and the life 3 00:00:12,100 --> 00:00:13,720 cycle of binaries. 4 00:00:13,750 --> 00:00:20,410 Once you grasp these concepts, you will be well prepared to tackle the subsequent sections, in which we 5 00:00:20,410 --> 00:00:25,450 will delve into ELF and PE binaries. 6 00:00:25,450 --> 00:00:32,230 These two formats are used extensively on Linux and Windows systems, making them crucial for 7 00:00:32,230 --> 00:00:36,390 comprehensive binary analysis in modern computing. 8 00:00:36,400 --> 00:00:44,320 Computations are carried out using the binary number system, which represents numbers as sequences 9 00:00:44,320 --> 00:00:46,390 of ones and zeros. 10 00:00:46,420 --> 00:00:53,530 Binary code, also known as machine code, is the language executed by these computer systems. 11 00:00:53,530 --> 00:01:02,690 A program comprises a combination of machine instructions, also called binary code, and 12 00:01:02,690 --> 00:01:10,790 data such as variables and constants. To manage the multitude of programs on a given system effectively, 13 00:01:10,790 --> 00:01:20,360 it becomes essential to store the code and data belonging to each program within a single, self-contained 14 00:01:20,360 --> 00:01:20,960 file. 15 00:01:20,960 --> 00:01:28,250 These files, which contain executable binary programs, are referred to as binary executable files or simply 16 00:01:28,250 --> 00:01:28,670 binaries. 17 00:01:28,670 --> 00:01:36,080 The primary objective of this section is to explore and analyze these binaries 18 00:01:36,080 --> 00:01:40,630 comprehensively for malware analysis and reverse engineering.
19 00:01:40,640 --> 00:01:48,410 Before we dive into the intricacies of binary formats like ELF and PE, let's begin with a broad 20 00:01:48,410 --> 00:01:53,240 overview of how executable binaries are generated from source code. 21 00:01:53,270 --> 00:02:00,320 Following that, we will dissect a sample binary, allowing you to gain a solid understanding of the 22 00:02:00,320 --> 00:02:04,850 code and data encapsulated within binary files. 23 00:02:04,880 --> 00:02:13,040 Armed with this knowledge, you will proceed to explore ELF and PE binaries in the next sections. 24 00:02:13,040 --> 00:02:13,670 In those sections, 25 00:02:13,670 --> 00:02:20,240 you will also have the opportunity to construct your own binary loader, enabling you to parse and analyze 26 00:02:20,240 --> 00:02:20,930 binaries. 27 00:02:20,930 --> 00:02:26,330 First, we will look at the compilation phase. 28 00:02:26,330 --> 00:02:32,450 The production of binaries involves a process called compilation, 29 00:02:32,450 --> 00:02:41,600 which translates human-readable source code, such as C or C++, into machine code that the processor can execute. 30 00:02:41,600 --> 00:02:43,790 Here, 31 00:02:43,790 --> 00:02:51,200 I have drawn a diagram that gives an overview of the steps typically involved in the compilation 32 00:02:51,200 --> 00:02:53,690 process for C or C++ code. 33 00:02:53,690 --> 00:03:04,700 Similar steps apply to some other languages as well. Compiling C code encompasses four phases: 34 00:03:04,700 --> 00:03:08,270 preprocessing, compilation, assembly, 35 00:03:08,880 --> 00:03:09,690 and linking. 36 00:03:09,690 --> 00:03:15,630 Interestingly, one of these phases is itself called compilation, which creates an overlap 37 00:03:15,630 --> 00:03:17,310 in terminology. 38 00:03:17,360 --> 00:03:21,330 Modern compilers often merge some or all of these phases.
39 00:03:21,330 --> 00:03:26,490 We will examine them individually for the sake of clarity and demonstration. 40 00:03:26,490 --> 00:03:32,760 First, we have preprocessing, which you will also learn about in more depth in the next lectures. 41 00:03:32,760 --> 00:03:38,100 This initial phase involves the preprocessing of the source code. 42 00:03:38,130 --> 00:03:45,810 It includes operations such as macro expansion, file inclusion, and conditional compilation. 43 00:03:45,810 --> 00:03:52,920 Preprocessing ensures that the code is ready for the subsequent stages of compilation. 44 00:03:52,920 --> 00:03:55,770 Second, we have compilation. In this phase, 45 00:03:55,770 --> 00:04:02,310 the preprocessed source code is translated into assembly language specific to the target architecture. 46 00:04:02,310 --> 00:04:10,420 The compiler analyzes the code, checks for syntax errors, and generates assembly instructions accordingly. 47 00:04:11,370 --> 00:04:13,500 Third, we have assembly. 48 00:04:13,680 --> 00:04:20,100 During assembly, the assembler translates the assembly code into object code consisting 49 00:04:20,100 --> 00:04:25,230 of binary instructions and data representations specific to the target processor. 50 00:04:25,260 --> 00:04:32,700 This step involves resolving local addresses and generating relocation information for the linker. Lastly, we have linking. 51 00:04:33,180 --> 00:04:39,270 The linking phase combines the generated object code with the necessary libraries and other object 52 00:04:39,270 --> 00:04:43,530 files to create a fully functional binary executable. 53 00:04:43,560 --> 00:04:50,640 The linker resolves symbol references, performs address relocation, and generates the final binary 54 00:04:50,670 --> 00:04:53,910 file, ready for execution.
55 00:04:54,210 --> 00:05:02,190 It's important to note that although the four phases mentioned here are conceptually distinct, modern 56 00:05:02,190 --> 00:05:08,910 compilers often optimize and streamline the process by merging some or all of these steps. 57 00:05:08,910 --> 00:05:16,120 Nonetheless, understanding each phase individually provides a solid foundation for comprehending 58 00:05:16,120 --> 00:05:19,360 the intricacies of the binary analysis process. 59 00:05:19,360 --> 00:05:26,110 By familiarizing yourself with the compilation process and the anatomy of binaries, you will be 60 00:05:26,110 --> 00:05:30,820 equipped to delve deeper into the specific formats of ELF and PE binaries. 61 00:05:30,820 --> 00:05:37,900 These binary formats hold a wealth of information waiting to be explored and analyzed, offering 62 00:05:37,900 --> 00:05:43,960 valuable insights into the inner workings of executable programs. 63 00:05:44,990 --> 00:05:47,810 And here we have this diagram. 64 00:05:47,810 --> 00:05:54,030 Now, let's delve deeper into the preprocessing phase 65 00:05:54,050 --> 00:05:56,240 in the next lecture.