[01] Use of the file tool to gather some basic information about the analysed file
# file netc
netc: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, statically linked, stripped
As seen above, the analysed file is statically linked and has been stripped of its symbol table.
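The header fields that the file tool reports can also be read directly. The following is a minimal Python sketch, assuming only the standard ELF header layout; the function name describe_elf and the sample header bytes are hypothetical, and the sketch does not check for static linking or stripping, which would require reading the program and section headers.

```python
import struct

def describe_elf(data):
    """Decode the few ELF header fields that the file tool reports."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine follow the 16-byte e_ident array.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "bits": {1: 32, 2: 64}.get(ei_class),
        "byte_order": {1: "LSB", 2: "MSB"}.get(ei_data),
        "type": {1: "relocatable", 2: "executable",
                 3: "shared object"}.get(e_type, "other"),
        "machine": {3: "Intel 80386", 62: "x86-64"}.get(e_machine,
                                                        hex(e_machine)),
    }

# Hypothetical header prefix built to mirror the output shown above.
header = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 3)
info = describe_elf(header)
print(info)
```

For a real file, the first 20 bytes read with open("netc", "rb") would be passed in place of the hand-built header.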
[02] Search for interesting text strings inside the file:
# strings -a netc > strings.out
Analysis of the file's contents yields some interesting information that may be useful in the next steps:
Searching selected parts of the analysed file is another way of finding interesting text strings. For example, information about the version of the compiler and the operating system used to compile the file is usually stored in the .comment section. The contents of a selected section can be viewed with the following command:
# objdump -j .comment -s file_name > comment.out
or this one:
# objdump -h file_name > sections.out
After that, one should load the file, at the selected offset, into any file viewer or editor.
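To illustrate what a string search over raw file contents amounts to, here is a minimal Python sketch that extracts runs of printable ASCII characters, roughly as strings -a does. The function name printable_strings and the sample bytes are hypothetical; the real tool also handles other encodings and options not modelled here.

```python
import re

def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical bytes resembling a fragment of a .comment section.
blob = b"\x00\x01GCC: (GNU) 3.3.2\x00\x7f\x02ab\x00"
found = printable_strings(blob)
print(found)
```

The same function can be applied to the bytes of a single section once its offset and size are known from the objdump -h output.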
[03] We find and download the libraries which could've been used for the compilation
The strings command issued in step [02] showed us that the file had been compiled on a Mandrake Linux 10.0 system using the GCC 3.3.2 compiler. In our case we will limit ourselves to recovering the symbols associated with the libc library from the Mandrake 10.0 operating system. Given the purpose of libc, it is very probable that it was used in the analysed file. Another library that could also be used for recovering the table of symbols is libgcc.a.
At this point we will use only libc, because the elements of the libgcc.a library will be found and recognised later on, by comparing to a compiled example file.
We save the libc.a library in the ~/analysis/libc_components/ directory.
What can we do if we don't have any hints about which versions of the libraries were used during compilation? In such a situation, we can download several different versions of a library, follow these steps to recover the table of symbols for each, and compare the results by the number of hits and their usefulness.
[04] Unpacking the library's objects
# ar x libc.a
[05] Checking the analysed program for the presence of the library's code objects
# search_static netc ~/analysis/libc_components > obj_file
In this step we should pay attention to conflicts that may appear during the matching. Detected conflicts are listed in the final section of the obj_file file. What is the practical meaning of these conflicts, and what impact do they have on the analysis? A conflict means that more than one library object matches the same code, so without a detailed analysis of the code at the conflicting address it cannot be determined which function was in fact used.
Example of a detected conflict:
# Possible conflict below requiring manual resolution:
# ----------------------------------------------------
# /analysis/libc_components/getsrvbynm.o - match at 0x08057580 (0x000000ea bytes)
# /analysis/libc_components/getsrvbypt.o - match at 0x08057580 (0x000000ea bytes)
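Conflicts of this kind can be collected automatically. The following Python sketch groups match lines by address and reports every address claimed by more than one object. It assumes that match lines in obj_file have exactly the form shown in the example above; the regular expression and the name find_conflicts are hypothetical and are not part of the tool set.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the example conflict above.
MATCH_RE = re.compile(
    r"(\S+\.o) - match at (0x[0-9a-fA-F]+) \((0x[0-9a-fA-F]+) bytes\)")

def find_conflicts(lines):
    """Map each address matched by several objects to those objects."""
    by_addr = defaultdict(list)
    for line in lines:
        m = MATCH_RE.search(line)
        if m:
            obj, addr, _size = m.groups()
            by_addr[int(addr, 16)].append(obj)
    return {a: objs for a, objs in by_addr.items() if len(objs) > 1}

sample = [
    "# /analysis/libc_components/getsrvbynm.o - match at 0x08057580 (0x000000ea bytes)",
    "# /analysis/libc_components/getsrvbypt.o - match at 0x08057580 (0x000000ea bytes)",
]
conflicts = find_conflicts(sample)
for addr, objs in conflicts.items():
    print(hex(addr), objs)
```

Each reported address is a candidate for manual inspection of the disassembled code.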
[06] Generating the list of the found symbolic references from particular library objects.
# gensymbols obj_file > symbols_db
The output of the script above is a list of symbols along with the addresses of their code.
[07] Disassembling of the analysed program
# gendump netc > out1
[08] Removing the library functions' code from the file out1
# decomp_strip obj_file < out1 > out2
[09] Making the analysis easier by adding names of the recognized functions in place of their calls
# decomp_insert_symbols symbols_db < out2 > out3
[10] Improving the legibility of the code by putting the content of the text strings in place of references to them.
# decomp_xref_data netc < out3 > out4
For recovering the table of symbols, tools from the fenris package may be used as well.
Next steps:
[a] We start by editing the getfprints script.
[b] In the TRYLIBS variable we insert the paths and names of the libraries that will be used to create the base of signatures, then run the script:
# getfprints
[c] We rename the generated signatures file to the default signatures file name.
# mv NEW-fnprints.dat fnprints.dat
[d] We recover the deleted symbols using the dress program:
# dress -F ./fnprints.dat program_name > list_of_recovered_symbols
or
# dress -F ./fnprints.dat program_name program_name_with_added_list_of_symbols
If we know what version of the compiler was used or if we can make an educated guess, an attempt can be made to find the location of the functions added by the compiler (a similar result would be achieved by using the libgcc.a library during the recovery of the table of symbols). To perform this task, we will compare our analysed file to the sample one, compiled with the same compiler as our analysed file.
[11] Creating the sample.c file
int main(int argc, char **argv)
{
return 0;
}
[12] Compiling the sample file
# gcc -static -o sample sample.c
[13] Comparing elements of the compiled sample file with the code of the analysed file - out4
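[14] Comparing the structure of the sample file's _start() function with the corresponding function in the analysed file lets us determine the location of the main() function. In the listing below, the last value pushed before the call to __libc_start_main (0x804994f) is the address of main().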
08048100: xor %ebp,%ebp
08048102: pop %esi
08048103: mov %esp,%ecx
08048105: and $0xfffffff0,%esp
08048108: push %eax
08048109: push %esp
0804810a: push %edx
0804810b: push $0x804aa90
08048110: push $0x804aa30
08048115: push %ecx
08048116: push %esi
08048117: push $0x804994f
0804811c: call 0x0804a3b0 <__libc_start_main>
08048121: hlt
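Reading the address of main() from such a listing can be automated. The following Python sketch assumes the "address: mnemonic operands" line format of the listing above and the glibc convention that the address of main() is the last argument pushed before the call to __libc_start_main. The function name find_main_address is hypothetical.

```python
def find_main_address(listing):
    """Return the immediate pushed last before call __libc_start_main."""
    last_operand = None
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[1] == "push":
            last_operand = parts[2]
        elif (len(parts) >= 2 and parts[1] == "call"
              and "__libc_start_main" in line):
            # Only an immediate operand ($0x...) can be a code address.
            if last_operand and last_operand.startswith("$"):
                return int(last_operand[1:], 16)
            return None
    return None

# Tail of the entry-point listing shown above.
LISTING = """\
0804810b: push $0x804aa90
08048110: push $0x804aa30
08048115: push %ecx
08048116: push %esi
08048117: push $0x804994f
0804811c: call 0x0804a3b0 <__libc_start_main>
08048121: hlt
"""
main_addr = find_main_address(LISTING)
print(hex(main_addr))
```

If the last push before the call is not an immediate value, the sketch returns None rather than guessing.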
[15] Determining user functions (which means functions not recognised as library objects)
# grep 'call 0x' out4 | grep -v '<' > user_f.out
[16] Because many of the function addresses repeat in the analysed code, we will extract only the unique ones:
# grep 'call 0x' out4 | grep -v '<' | awk '{print $3}' | sort -u
0x0804812d
0x080481bd
0x08048204
0x080482a5
0x08048303
0x0804834b
0x080483b5
0x080483fd
0x080486b8
0x080488ab
0x08048951
0x080489b3
0x08048a3b
0x08048d9d
0x08049235
0x08049311
0x0804950b
0x0804aed0
0x0804bf70
0x0804bf90
0x08057370
0x08057580
It appears that some of the addresses obtained may not exist in our out4 code.
This is a side effect of step [08], in which the code of the conflicting functions was removed while the calls referencing them remain.
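Both the extraction of unique call targets and the check for addresses missing from the dump can be sketched in Python. The sketch assumes that every instruction line in out4 has the form "address: mnemonic operands" and that resolved calls carry a symbol name in angle brackets, as in the listing from step [14]. The function name and the sample lines are hypothetical.

```python
def unresolved_call_targets(dump_lines):
    """Return (unique unnamed call targets, those with no code in the dump)."""
    present, targets = set(), set()
    for line in dump_lines:
        parts = line.split()
        if not parts or not parts[0].endswith(":"):
            continue
        try:
            present.add(int(parts[0].rstrip(":"), 16))
        except ValueError:
            continue  # not an instruction line
        if (len(parts) >= 3 and parts[1] == "call"
                and parts[2].startswith("0x") and "<" not in line):
            targets.add(int(parts[2], 16))
    return sorted(targets), sorted(targets - present)

# Hypothetical dump fragment, not taken from the real out4 file.
dump = [
    "0804812d: push %ebp",
    "08048130: call 0x080481bd",
    "08048135: call 0x08057580",
    "0804813a: call 0x0804a3b0 <__libc_start_main>",
    "080481bd: ret",
]
unique, missing = unresolved_call_targets(dump)
print([hex(a) for a in unique], [hex(a) for a in missing])
```

Addresses reported as missing are the ones whose code was stripped and which therefore need to be checked against the conflict list in obj_file.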
[17] The next thing to analyse, starting from the main() function, is the flow of control between the user functions and the actions they perform. This allows us to answer questions such as what role the analysed object plays in the system and what mechanisms it uses. At least a basic knowledge of assembly language is required to interpret correctly the operations performed by the selected functions.
Function main()
0x0804812d
0x080481bd
0x08048204
0x080482a5
0x08048303
0x0804834b
0x080483b5
0x080483fd
0x080486b8
0x080488ab
0x08048951
0x080489b3
0x08048a3b
0x08048d9d
0x08049235
0x08049311
0x0804950b
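The flow of control between the user functions can be approximated by building a call graph from the dump and walking it from main(). The following Python sketch assumes the "address: mnemonic operands" line format and treats each function as extending up to the start of the next known function, which is a simplification. All names and the sample dump lines are hypothetical.

```python
from bisect import bisect_right
from collections import deque

def build_call_graph(dump_lines, starts):
    """Map each known function start to the known functions it calls."""
    starts = sorted(starts)
    graph = {s: set() for s in starts}
    for line in dump_lines:
        parts = line.split()
        if (len(parts) < 3 or not parts[0].endswith(":")
                or parts[1] != "call" or not parts[2].startswith("0x")):
            continue
        addr = int(parts[0].rstrip(":"), 16)
        target = int(parts[2], 16)
        # The caller is the nearest function start at or below addr.
        idx = bisect_right(starts, addr) - 1
        if idx >= 0 and target in graph:
            graph[starts[idx]].add(target)
    return graph

def reachable_from(graph, root):
    """Breadth-first order of functions reachable from root."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in sorted(graph[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

MAIN = 0x0804994f
starts = [0x0804812d, 0x080481bd, MAIN]
dump = [  # hypothetical call sites
    "08048140: call 0x080481bd",
    "08049960: call 0x0804812d",
    "08049970: call 0x0804a3b0 <__libc_start_main>",
]
graph = build_call_graph(dump, starts)
order = reachable_from(graph, MAIN)
print([hex(a) for a in order])
```

The resulting order gives a reasonable sequence in which to read the user functions, starting with those called directly from main().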