Static analysis of ELF executable code – tutorial

The following document is a practical example of the procedures described in the article Reverse engineering of ELF executable code in forensic analysis (Hakin9 01/2005).
The object of analysis is the netc file. It should be copied to the home directory before the examination starts.

Preparations

All tools required for the analysis are, of course, available on the Hakin9 Live CD. But if you aren't using it, a Linux system with the following programs installed is required:

applications from the binutils package, scripts and applications from the fenris package, and other tools.

Initial analysis of the object

[01] Use of the file tool to gather some basic information about the analysed file

# file netc
netc: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, statically linked, stripped

As seen above, the analysed file was statically linked and had its symbol table stripped.
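The same facts can be cross-checked against the raw ELF header; a minimal sketch using od (the first 16 bytes form the e_ident field):

```shell
# dump the e_ident field (first 16 bytes) of the ELF header:
# 7f 45 4c 46 = "\x7fELF" magic, then 01 = ELFCLASS32, 01 = little-endian data
od -An -t x1 -N 16 netc
```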

[02] Search for interesting text strings inside the file:

# strings -a netc > strings.out

Analysis of the file's contents provides some interesting information that may be useful in the next steps:
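A few greps tend to surface the most useful parts of such a dump (the patterns below are generic examples, not specific to netc):

```shell
# compiler tag usually left in the .comment section
grep -i 'gcc' strings.out
# absolute paths the program may reference at runtime
grep '^/' strings.out | sort -u
```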

Searching through chosen parts of the analysed file may be another way of finding some interesting text strings. For example, information about the version of the compiler and the operating system used for compiling the file is usually stored in the .comment section.

The contents of the selected part of the file can be viewed with the following command:

# objdump -j .comment -s file_name > comment.out

or this one:

# objdump -h file_name > sections.out

After that, one can open the file at the selected offset in any file viewer or editor.
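Alternatively, the section's bytes can be carved out directly with dd; in this sketch the file offset and size are hypothetical values taken from the objdump -h listing:

```shell
# carve the .comment section out of the binary by hand; the offset (0x9c40)
# and size (0x2a) are hypothetical and come from the objdump -h output
dd if=netc bs=1 skip=$((0x9c40)) count=$((0x2a)) 2>/dev/null | tr -d '\000'
```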

Table of symbols recovery attempts

[03] We find and download the libraries which could have been used for the compilation

The strings command issued in step [02] showed us that the file had been compiled on a Mandrake Linux 10.0 system using the GCC 3.3.2 compiler. In our case we will limit ourselves to attempting to recover the symbols associated with the libc library from Mandrake 10.0. Because of the purpose of the libc library, it is very probable that it was used in the analysed file. Another library which could be used for recovering the table of symbols is libgcc.a.
At this point we will use only libc, because the elements of the libgcc.a library will be found and recognised later on, by comparison with a compiled example file.

We save the libc.a library in the ~/analysis/libc_components/ directory.

What can we do if we have no hints about which library versions were used during compilation? In such situations, we can download several different versions of one library, repeat the symbol-recovery steps for each of them, and judge the results by the number of hits and their accuracy.

[04] Unpacking the library's objects

# ar x libc.a

[05] Checking the analysed program for the presence of the library's code objects

# search_static netc ~/analysis/libc_components > obj_file

In this step we should pay attention to conflicts which may appear during the verification. Detected conflicts are listed in the final section of the obj_file file. What is the practical meaning of these conflicts and what impact do they have on the analysis? Without a detailed analysis of the code of the functions where the conflicts appeared, it cannot be predicted which function was in fact used. Example of a detected conflict:

# Possible conflict below requiring manual resolution:
# ----------------------------------------------------
# /analysis/libc_components/getsrvbynm.o - match at 0x08057580 (0x000000ea bytes)
# /analysis/libc_components/getsrvbypt.o - match at 0x08057580 (0x000000ea bytes)
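All conflict reports can be pulled out of obj_file in one go, assuming they keep the commented format shown above:

```shell
# list every conflict block reported by search_static:
# the header line, the separator, and the two conflicting matches
grep -A 3 '^# Possible conflict' obj_file
```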

[06] Generating the list of the found symbolic references from particular library objects.

# gensymbols obj_file > symbols_db

The result of running the script above is a list of symbols along with the addresses of their code.

[07] Disassembling of the analysed program

# gendump netc > out1

[08] Removing the library functions' code from the out1 file

# decomp_strip obj_file out2

[09] Making the analysis easier by adding names of the recognized functions in place of their calls

# decomp_insert_symbols symbols_db out3

[10] Improving the legibility of the code by putting the content of the text strings in place of references to them.

# decomp_xref_data netc out4

For recovering the table of symbols, tools from the fenris package may be used as well.

Next steps:
[a] We start to edit the getfprints script
[b] In the TRYLIBS variable we insert the paths and names of the libraries that will be used to create the base of signatures

# getfprints

[c] We rename the newly created signatures file to the default signatures file name.

# mv NEW-fnprints.dat fnprints.dat

[d] We recover deleted symbols using the dress program

# dress -F ./fnprints.dat program_name > list_of_recovered_symbols

or

# dress -F ./fnprints.dat program_name program_name_with_added_list_of_symbols

Determining functions added by the compiler

If we know which version of the compiler was used, or if we can make an educated guess, an attempt can be made to locate the functions added by the compiler (a similar result would be achieved by using the libgcc.a library during the recovery of the table of symbols). To perform this task, we will compare the analysed file to a sample file built with the same compiler.

[11] Creating the sample.c file

int main(int argc, char **argv)
{
return 0;
}

[12] Compiling the sample file

# gcc -static -o sample sample.c

[13] Comparing elements of the compiled sample file with the code of the analysed file - out4

After comparing the files, the following functions were detected:

Function _start() - Address = 08048100
Function call_gmon_start() - Address = 08048124
Function __do_global_dtors_aux() - Address = 08048150
Function frame_dummy() - Address = 080481b0

The location of the _start() function can be found by reading the entry point value from the ELF header.
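In a 32-bit ELF file the entry point (e_entry) is stored at offset 24 of the header as a 4-byte little-endian value, so it can be read without any special tooling; a sketch where the byte swap is done in awk:

```shell
# extract e_entry (offset 24, 4 bytes, little-endian) from a 32-bit ELF file
od -An -t x1 -j 24 -N 4 netc | awk '{printf "0x%s%s%s%s\n", $4, $3, $2, $1}'
```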

Determining a location of the main() function

[14] Comparing the structure of the sample file's _start() function with the code of the same function from the analysed file can let us determine the location of the main() function.

08048100: xor %ebp,%ebp
08048102: pop %esi
08048103: mov %esp,%ecx
08048105: and $0xfffffff0,%esp
08048108: push %eax
08048109: push %esp
0804810a: push %edx
0804810b: push $0x804aa90
08048110: push $0x804aa30
08048115: push %ecx
08048116: push %esi
08048117: push $0x804994f
0804811c: call 0x0804a3b0 <__libc_start_main>
08048121: hlt
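The argument of the last push before the call to __libc_start_main is the address of main(), so it can be grepped straight out of the listing (a sketch, assuming out4 keeps the AT&T-syntax format shown above):

```shell
# the first argument of __libc_start_main is the address of main();
# it is pushed immediately before the call instruction
grep -B 1 'call.*__libc_start_main' out4 | grep -o '0x[0-9a-f]*' | head -n 1
```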

Determining user functions

[15] Determining user functions (that is, functions not recognised as library objects)

# grep 'call 0x' out4 | grep -v '<' > user_f.out

[16] We will try to extract only the unique target addresses, because many of the functions' addresses repeat in the analysed code:

# grep 'call 0x' out4 | grep -v '<' | awk '{print $3}' | sort -u

0x0804812d
0x080481bd
0x08048204
0x080482a5
0x08048303
0x0804834b
0x080483b5
0x080483fd
0x080486b8
0x080488ab
0x08048951
0x080489b3
0x08048a3b
0x08048d9d
0x08049235
0x08049311
0x0804950b
0x0804aed0
0x0804bf70
0x0804bf90
0x08057370
0x08057580


It appears that some of the acquired addresses may not exist in our out4 code. This is an effect of step [08], in which we deleted the conflicting functions' code, while the references to them still exist.
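Which of the collected addresses still have a listing in out4 can be checked mechanically; a sketch assuming each function's listing starts with its address at the beginning of a line, without the 0x prefix:

```shell
# report, for each candidate address, whether its code is present in out4
# (the two addresses below are examples from the list in step [16])
for a in 0x0804812d 0x08057580; do
    grep -q "^${a#0x}" out4 && echo "$a: present" || echo "$a: missing"
done
```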

The analysis of actions performed by the program

[17] What we should analyse next, starting from the main() function, is the flow of control between the user functions and the actions they perform. This will allow us to answer questions such as what role the analysed object plays in the system and what mechanisms it uses. At the very least, basic assembly knowledge is required for a proper interpretation of the operations performed by the selected functions.
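For closer reading, a single user function can be cut out of out4; a sketch assuming the listing begins with the function's address at the start of a line and that the function ends at its first ret instruction:

```shell
# print the listing of the user function at 0x0804812d,
# from its first line up to the first ret instruction
sed -n '/^0804812d/,/ret/p' out4
```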

Function main()
0x0804812d
0x080481bd
0x08048204
0x080482a5
0x08048303
0x0804834b
0x080483b5
0x080483fd
0x080486b8
0x080488ab
0x08048951
0x080489b3
0x08048a3b
0x08048d9d
0x08049235
0x08049311
0x0804950b