Friday 17 January 2014

Observing Compiler Operations with C Source Code

A very simple hello-world-style program was written in C, compiled with GCC, and then disassembled with objdump in order to view and summarize how the compiler translated the main function, as an exercise in the basic understanding of assembly logic.

Here is the original GCC command used to compile the helloworld.c file:

gcc -g -O0 -fno-builtin helloworld.c -o hello

The following are six cases of program and compiler option manipulations that were tested and compared for differences in output and behavior. My group covered the 4th case, that being the addition of multiple arguments to the printf() function:

When more than one argument is passed to printf() in the C source code, the compiler essentially adds more MOVL instructions (MOVL is "move long", a 32-bit move in AT&T syntax) that store the values at aligned stack addresses descending by 4 bytes, which matches the 4 bytes needed to store an int. When we inserted a much larger "long integer", however, the compiler needed an additional MOVL line in order to fit the value into 8 bytes of memory. Simply changing the declared type between int and long int did not seem to make a difference in what we observed.

The address of the string was MOV'd into a register (eax) rather than into a slot relative to esp (the stack pointer), and it appeared as a direct address rather than being labeled as a pointer. The last major instruction is a call to printf through the procedure linkage table (PLT). Interestingly enough, it was pointed out that instead of simply PUSHing the variables onto the stack, the compiler used MOVL, for a reason we have yet to determine. One likely explanation is that GCC reserves the stack space for outgoing arguments once, up front, and then stores into fixed offsets from the stack pointer, which avoids adjusting the stack pointer for every argument.

Below is the first screenshot of the disassembled code with 11 argument parameters, the first being a string and the next ten being simple 4-byte integers in ascending order:


Here is the difference once one of the integers was changed to a long 18-digit integer:


The rest of my classmates' cases involved changes to the compiler options, listed below along with descriptions of the results observed:

1) Adding the compiler option -static

The general consensus was that the file size became noticeably larger. The option disables dynamic linking, so the library code the program needs is copied into the executable instead of being shared. The only apparent advantage is that the user doesn't need to have the shared libraries installed on the system where the program runs.

2) Removing the -fno-builtin option from the compilation command

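The file size decreased slightly and the compiler replaced the printf() call with a call to the simpler puts() function, thereby optimizing it. With builtins enabled, GCC knows what printf() does and can make this swap when the string has no format specifiers and ends in a newline.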

3) Removing the -g option from the compilation command

Removing this option leaves the debugging information out of the output file. That lowers its file size, at the cost of omitting the source-level data that a debugger uses for tracking down errors in the code.

5) Moving the printf() call to a separate function, and calling that function from main()

The snippet of code I ended up using looked like this:


In the objdump output, the callq in main targeted <newfunc>; execution then jumped to the start of newfunc, which called <printf@plt> from there. Simply put, the assembly ended up being the direct equivalent of what happened in the C code.

6) Removing -O0 and adding -O3 to the gcc options

This changes the level of optimization that the compiler is allowed to perform. In this particular case, it has changed from "do as little optimization as possible" to "get into overdrive". As a result, I assume that it comes down to a question of compile time vs. execution time. Here is the main observed difference in the compiler output:

Using the -O0 option:


Using the -O3 option:


The main difference between the two is in how the assembly for the main function was optimized.
