Once the preprocessing phase concludes, the source code is ready for compilation. During the compilation phase, the preprocessed code is translated into assembly language. It's worth noting that compilers often perform substantial optimization in this phase, and you can control how much with GCC's -O0 through -O3 options (an uppercase letter O followed by a level from 0 to 3). The optimization level you choose can greatly affect the resulting disassembly, as you will explore in detail in later sections.

Now you might wonder why the compilation phase produces assembly language instead of generating machine code directly. This design decision becomes clearer when you consider how many programming languages exist. Aside from C, there are numerous popular compiled languages, such as C++, Objective-C, Common Lisp, Delphi, Go, and Haskell, to name just a few. Writing a compiler that emits machine code directly for each of these languages would be an incredibly demanding and time-consuming task. Instead, it is more practical to emit assembly code (which is already challenging enough) and rely on a single dedicated assembler to handle the final translation from assembly to machine code for every supported language.
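If you want to see the effect of the optimization levels for yourself, here is a minimal sketch. The file name opt_demo.c and the functions in it are hypothetical, not taken from the lecture. The idea is to compile the same function at -O0 and at -O3, using -S so that GCC stops after compilation and writes a .s file, and then diff the two listings. The exact instructions depend on your GCC version and target, so the comments describe what you might typically see, not guaranteed output.

```c
/* opt_demo.c: hypothetical file for comparing optimization levels.
 *
 *   gcc -S -masm=intel -O0 opt_demo.c -o opt_O0.s
 *   gcc -S -masm=intel -O3 opt_demo.c -o opt_O3.s
 *   diff opt_O0.s opt_O3.s
 *
 * At -O0, GCC typically keeps locals such as `total` and `i` in stack
 * memory; at -O3 the loop usually lives in registers and may be
 * vectorized or otherwise restructured.
 */
#include <stdint.h>

/* Sums the integers 1..n with a plain loop. */
uint64_t sum_to(uint64_t n) {
    uint64_t total = 0;
    for (uint64_t i = 1; i <= n; i++)
        total += i;
    return total;
}

/* GCC predefines __OPTIMIZE__ whenever optimization is enabled
 * (for example -O1 through -O3), so this reports how the file was built. */
int built_with_optimization(void) {
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```

The function's result is the same at every level; only the generated assembly changes, which is exactly why optimized binaries can look so different in a disassembler.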
Therefore, the output of the compilation phase is assembly code in a reasonably human-readable form, with symbolic information preserved. GCC runs all compilation phases automatically by default, so to examine the emitted assembly, as we did for the preprocessing phase in the previous lecture, you need to tell it to stop after the compilation stage and save the assembly file to disk. To do this, use the -S flag (.s is the conventional extension for assembly files). Additionally, pass the -masm=intel option so that GCC emits assembly in Intel syntax instead of the default AT&T syntax.

To make this concrete, let's create an example program and compile it. As we said, we run GCC with the uppercase -S flag and -masm=intel, followed by our C file, my_app.c:

    gcc -S -masm=intel my_app.c

And that's it; it's compiled. Now let's open the result in a text editor; here I'm using Mousepad. As you can see, the directory now contains both the .c file and the .s file. Opening the .s file we just generated shows the assembly produced by the compilation phase for the Hello World program.
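To compare the two assembly syntaxes side by side, here is a small sketch. The file name add.c and the function add are hypothetical, chosen only because a tiny function produces a short listing. Note that -S stops GCC before assembling and linking, so a file like this produces a .s file even though it has no main().

```c
/* add.c: hypothetical file used only to compare assembly syntaxes.
 *
 *   gcc -S -masm=intel add.c -o add_intel.s   (Intel syntax)
 *   gcc -S add.c -o add_att.s                 (default AT&T syntax)
 *
 * In the Intel listing, operands are written destination first and
 * registers have no prefix (e.g. "mov eax, ...").
 * In the AT&T listing, operands are written source first, registers
 * carry a % prefix, and mnemonics often carry a size suffix
 * (e.g. "movl ..., %eax").
 */

/* Returns the sum of its two arguments. */
int add(int a, int b) {
    return a + b;
}
```

Opening add_intel.s and add_att.s next to each other is a quick way to get used to reading both notations, since you will meet both in different tools.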
For now, I won't go into the details of the assembly code. What's interesting here is that the assembly is relatively easy to read because the symbols and function names have been preserved. For instance, constants and variables have symbolic names rather than just addresses, even if it's only an automatically generated name such as .LC0 for the nameless "Hello, world" string. There's also an explicit label for the main function, the only function in this case; the main label sits right here, just above .LFB0. Any references to code and data are symbolic too, such as this reference to the Hello World string here. In this listing we have all the names, but you will have no such luxury when dealing with stripped binaries later in this course.

In the next lecture, we will move on to the assembly phase. I'll see you there.
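To see which names survive into the .s file, here is a minimal sketch (a hypothetical file with hypothetical identifiers, not the lecture's my_app.c). Compiling it with gcc -S -masm=intel should produce labels for the named global banner and for the functions greet and banner_len, while the anonymous string literal gets a compiler-generated local label, typically .LC0 on x86-64 Linux. Exact label names can vary with GCC version and target.

```c
#include <stdio.h>
#include <string.h>

/* A named global: its symbol name "banner" appears as a label in the .s file. */
const char banner[] = "Compiled with GCC";

/* The literal below has no name in C, so GCC invents a local label
 * for it (typically .LC0) and references that label symbolically. */
int greet(void) {
    return puts("Hello, world!");
}

/* Returns the length of the named global string. */
size_t banner_len(void) {
    return strlen(banner);
}
```

After stripping, the compiled binary loses names such as banner and greet from its symbol table, which is what makes the stripped binaries mentioned above harder to analyze.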