1 00:00:00,410 --> 00:00:06,050 All right, now let's dive into the linking phase, which is the final stage 2 00:00:06,050 --> 00:00:08,450 of the compilation process. 3 00:00:08,480 --> 00:00:15,500 As the name suggests, this phase brings together all the object files and merges them into 4 00:00:15,500 --> 00:00:17,600 a single binary executable. 5 00:00:17,630 --> 00:00:24,200 In modern systems, this phase may also include an additional optimization pass known as link-time 6 00:00:24,200 --> 00:00:30,500 optimization, or LTO, which can improve the performance of the resulting executable. 7 00:00:30,710 --> 00:00:37,970 Unsurprisingly, the program that performs the linking phase is called a linker, or link editor. It is 8 00:00:37,970 --> 00:00:43,370 typically separate from the compiler, which usually implements all the preceding phases. 9 00:00:43,520 --> 00:00:51,230 As I've already mentioned, object files are relocatable because they are compiled independently of 10 00:00:51,230 --> 00:00:52,760 each other, 11 00:00:52,790 --> 00:00:55,410 as you can see here. 12 00:00:56,000 --> 00:01:03,200 That prevents the compiler from assuming that an object will end up at any 13 00:01:03,200 --> 00:01:04,890 particular base address. 14 00:01:04,910 --> 00:01:13,460 Moreover, object files may reference functions or variables in other object 15 00:01:13,460 --> 00:01:17,270 files or in libraries that are external to the program. 16 00:01:17,270 --> 00:01:23,750 Before the linking phase, the addresses at which the referenced code and data will be placed are 17 00:01:23,750 --> 00:01:24,740 not yet known, 18 00:01:24,740 --> 00:01:33,140 so the object files only contain relocation symbols that specify how function and variable references 19 00:01:33,140 --> 00:01:35,190 should eventually be resolved.
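To make the idea of unresolved references a little more concrete, here is a minimal C sketch. The names `square`, `square_ptr`, and `report` are hypothetical and chosen only for illustration. It shows two kinds of reference that the compiler cannot fill in by itself: the stored address of a function defined in the same file, and a call to `puts`, which lives in the C library.

```c
#include <stdio.h>

/* Defined in this file. The compiler knows where the function sits inside
   the object file, but not the final address it will have once linked. */
int square(int x)
{
    return x * x;
}

/* A global that stores the address of square. That address is not known
   at compile time, so the object file has to record a relocation for it. */
int (*const square_ptr)(int) = square;

/* Calls puts, which is defined in the C library and not in this file.
   The object file only carries a reference to the symbol "puts". */
int report(int x)
{
    char line[32];
    int result = square_ptr(x);

    snprintf(line, sizeof line, "square = %d", result);
    puts(line);
    return result;
}
```

If you save this under a hypothetical name such as `relocs.c`, compile it without linking using `gcc -c relocs.c`, and run `readelf -r relocs.o`, the listing should include relocation entries that mention `square` and `puts`. The exact relocation types depend on the architecture and compiler settings.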
20 00:01:35,210 --> 00:01:43,470 In the context of linking, references that rely on a relocation symbol are called symbolic references. 21 00:01:43,490 --> 00:01:50,690 When an object file references one of its own functions or variables by absolute address, the reference 22 00:01:50,690 --> 00:01:53,090 will also be symbolic. 23 00:01:53,090 --> 00:01:59,840 The linker's job is to take all the object files belonging to a program and merge them into a single 24 00:01:59,840 --> 00:02:06,390 coherent executable, typically intended to be loaded at a particular memory address. 25 00:02:06,420 --> 00:02:11,670 Now that the arrangement of all modules in the executable is known, 26 00:02:11,700 --> 00:02:15,720 the linker can also resolve most symbolic references. 27 00:02:15,840 --> 00:02:23,160 References to libraries may or may not be completely resolved, depending on the type of library. 28 00:02:23,990 --> 00:02:34,310 Static libraries, which on Linux typically have the extension .a, as you can see here, 29 00:02:36,140 --> 00:02:38,240 are merged into the binary executable, 30 00:02:38,250 --> 00:02:40,500 allowing any references to them 31 00:02:40,860 --> 00:02:42,600 to be resolved entirely. 32 00:02:42,600 --> 00:02:46,260 There are also dynamic libraries, also called 33 00:02:46,350 --> 00:02:53,250 shared libraries, which are shared in memory among all programs that run on a system. 34 00:02:53,250 --> 00:02:58,890 In other words, rather than copying the library into every binary that uses it, dynamic libraries 35 00:02:58,890 --> 00:03:07,770 are loaded into memory only once, and any binary that wants to use the library needs to use this shared 36 00:03:07,770 --> 00:03:08,220 copy. 37 00:03:08,220 --> 00:03:16,260 During the linking phase, the addresses at which dynamic libraries will reside are not yet known, 38 00:03:16,260 --> 00:03:18,610 so references to them cannot be resolved.
39 00:03:18,630 --> 00:03:24,780 Instead, the linker leaves symbolic references to these libraries even in the final executable, and 40 00:03:24,780 --> 00:03:31,110 these references are not resolved until the binary is actually loaded into memory to be executed. 41 00:03:31,110 --> 00:03:38,230 Most compilers, including GCC, automatically call the linker at the end of the compilation process 42 00:03:38,350 --> 00:03:47,590 to produce a complete binary executable, so you can simply call gcc without any special switches 43 00:03:47,860 --> 00:03:50,050 and compile your application, 44 00:03:50,050 --> 00:03:53,380 myapp.c, and that's it. 45 00:03:53,380 --> 00:03:58,570 And here we will now run the file utility on 46 00:03:59,520 --> 00:04:01,770 a.out. 47 00:04:02,100 --> 00:04:02,730 It's a.out 48 00:04:02,730 --> 00:04:04,740 because this is our output, 49 00:04:05,850 --> 00:04:09,510 the final file, and that's it. 50 00:04:09,510 --> 00:04:11,160 And here, 51 00:04:12,330 --> 00:04:12,990 first, 52 00:04:12,990 --> 00:04:16,830 let's actually understand this. 53 00:04:17,010 --> 00:04:21,470 By default, the executable is called a.out, 54 00:04:21,480 --> 00:04:30,060 but you can override this by passing the -o switch to gcc, followed by a name 55 00:04:30,060 --> 00:04:31,080 for the output file. 56 00:04:31,080 --> 00:04:38,460 The file utility now tells us that we are dealing with an ELF 64-bit 57 00:04:39,620 --> 00:04:40,310 LSB 58 00:04:41,880 --> 00:04:45,930 executable rather than a relocatable file. 59 00:04:45,960 --> 00:04:50,070 As you can see here, we had a relocatable file, as you saw 60 00:04:50,850 --> 00:04:57,480 in the previous lecture. Another important piece of information is that the file is dynamically 61 00:04:57,480 --> 00:04:59,280 linked, here: 62 00:05:01,220 --> 00:05:04,040 dynamically linked.
63 00:05:07,120 --> 00:05:15,690 "Dynamically linked" means that the binary uses some libraries that are not merged into 64 00:05:15,700 --> 00:05:21,480 the executable but are instead shared among all programs running on the same system. 65 00:05:21,520 --> 00:05:25,630 And finally, we have this interpreter, 66 00:05:25,660 --> 00:05:26,740 this one here. 67 00:05:28,530 --> 00:05:29,280 Here. 68 00:05:31,020 --> 00:05:31,770 So, 69 00:05:33,570 --> 00:05:47,670 here the interpreter /lib64/ld-linux-x86-64.so.2 in the file output tells you which dynamic 70 00:05:47,670 --> 00:05:55,050 linker will be used to resolve the final dependencies on dynamic libraries when the executable is loaded 71 00:05:55,050 --> 00:05:57,060 into memory to be executed. 72 00:05:57,060 --> 00:06:06,660 When you run the binary using the ./a.out command, as we are doing here, 73 00:06:06,660 --> 00:06:15,510 you can see that it produces the expected output, printing "Hello world" to us, which confirms that 74 00:06:15,510 --> 00:06:18,360 you have produced a working binary. 75 00:06:18,570 --> 00:06:24,870 But here we also have "not stripped" 76 00:06:26,370 --> 00:06:32,950 in the output. What is this bit about the binary being "not stripped"? 77 00:06:32,950 --> 00:06:36,910 You will learn that in the next lecture.
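The file type and the interpreter path that the file utility reports come from the ELF headers of the binary. As a rough sketch of where that information lives, the C code below reads both fields directly. It is not how the file utility itself is implemented. The function names `elf_type` and `elf_interp` are hypothetical, and the code assumes a Linux system that provides `<elf.h>`, a 64-bit ELF file, and a file whose byte order matches the machine running the code.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Reads the ELF header of path into *eh. Returns 0 on success, or -1 if
   the file cannot be read or is not a 64-bit ELF file. */
static int read_ehdr(FILE *f, Elf64_Ehdr *eh)
{
    if (fread(eh, 1, sizeof *eh, f) != sizeof *eh)
        return -1;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (eh->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}

/* Returns the e_type field (ET_REL for a relocatable object file, ET_EXEC
   or ET_DYN for a linked executable), or -1 on error. */
int elf_type(const char *path)
{
    Elf64_Ehdr eh;
    FILE *f = fopen(path, "rb");
    int type = -1;

    if (!f)
        return -1;
    if (read_ehdr(f, &eh) == 0)
        type = eh.e_type;
    fclose(f);
    return type;
}

/* Copies the interpreter path stored in the PT_INTERP segment into buf.
   Returns 0 on success, or -1 on error or if there is no such segment
   (for example, in a statically linked binary). */
int elf_interp(const char *path, char *buf, size_t buflen)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    int found = -1;
    FILE *f;

    if (buflen == 0)
        return -1;
    f = fopen(path, "rb");
    if (!f)
        return -1;
    if (read_ehdr(f, &eh) != 0) {
        fclose(f);
        return -1;
    }
    /* Walk the program header table looking for PT_INTERP. */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        long pos = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, pos, SEEK_SET) != 0)
            break;
        if (fread(&ph, 1, sizeof ph, f) != sizeof ph)
            break;
        if (ph.p_type != PT_INTERP)
            continue;
        size_t len = ph.p_filesz < buflen - 1 ? (size_t)ph.p_filesz
                                              : buflen - 1;
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0)
            break;
        if (fread(buf, 1, len, f) != len)
            break;
        buf[len] = '\0';
        found = 0;
        break;
    }
    fclose(f);
    return found;
}
```

Calling `elf_type("a.out")` on a linked binary should return `ET_EXEC` or `ET_DYN`, while an object file from the previous lecture should give `ET_REL`. Many current toolchains build position-independent executables by default, and those carry the type `ET_DYN`. For a dynamically linked binary, `elf_interp` should fill the buffer with the dynamic linker path shown in the file output.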