Monday 27 January 2014

Initial Experimentation with Cross-Platform Assembler

This exercise had two parts, each to be built and run on two platforms: x86_64 and ARM64 (AArch64). It sounded trivial at first, but turned out to be much more intricate and procedure-intensive in practice. The first task was to write a loop of 10 iterations that prints a character string containing the current value of the iterator on each pass. The second was to extend the loop to 30 iterations and print the value in a "double digit" format that combines the quotient and the remainder of the number when divided by 10.
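Before getting into the assembler, the "double digit" step can be sketched in C. This is only an illustration of the arithmetic, not the lab code itself: the helper name format_iteration and the buffer name loop_msg are made up for this example, while the "Loop: ##\n" string and the len - 3 / len - 2 offsets mirror the listings further down.

```c
#include <stddef.h>

/* Hypothetical buffer mirroring the msg string used in the assembler listings. */
static char loop_msg[] = "Loop: ##\n";

/* Hypothetical helper: overwrites the "##" placeholder with the two-digit
   form of i (assumed to be in the range 0..99) and returns the buffer. */
const char *format_iteration(unsigned i)
{
    size_t len = sizeof loop_msg - 1;           /* string length without the NUL */
    loop_msg[len - 3] = (char)('0' + i / 10);   /* quotient  -> tens digit */
    loop_msg[len - 2] = (char)('0' + i % 10);   /* remainder -> ones digit */
    return loop_msg;
}
```

Adding '0' (0x30) to a value between 0 and 9 is the same ASCII conversion that the `add $0x30` instructions perform in the assembler versions.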

Our group ran into a handful of obstacles throughout the process of completing both parts of the lab, and I have taken the liberty of jotting them down as we encountered them:

- Throughout the first part, we constantly fiddled around with the .data variable lengths and calculations in order to insert the result byte at the correct index of the prescribed string.

- A recurring error we ran into was a segmentation fault involving the index variable. My first guess was that the declaration syntax was to blame; it was later fixed by adding parentheses.

- Stepping through the execution with the GNU debugger (gdb) gave us a weird value in register 12 at first. This was corrected by switching to the right register (rdx instead of rcx).

- The format came out wonky at first: the newline character was not printed. The fix came from Nicholas's idea of doing the calculation of where to store the byte in the string inside the loop.

- In the AArch64 port, the first main problem was loading the "index" variable into a register, since we couldn't find an equivalent of the x86 byte-register suffix -b, e.g. "r13b".

- A segmentation fault followed yet again after finding "strb" which might or might not have worked for the problem above.

- We observed some interesting output when changing line 11 to "adr" from "mov": "qemu: Unsupported syscall: 0" in the 10 iterations of the loop.

- In part 2 for the x86 concerns (which were greatly reduced):

1) Hex values were for some odd reason confused with the ASCII conversion on the divisor.

- As for Aarch64:

1) I was especially not a fan of how the "msub" instruction was explained on the wiki and therefore had some trouble implementing it. Later deduced that, when it is used to get a remainder, the structure of the instruction is as follows:

msub "register to store calculated remainder in", "register holding the quotient (whole number result of division)", "register holding the divisor", "register holding the number being divided"

In other words, the result is the last operand minus the product of the two middle operands.
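As a rough model of what the udiv/msub pair computes, here is a short C sketch. The function names are hypothetical and exist only for this illustration; msub_model follows the multiply-subtract definition (destination = last operand minus the product of the two middle operands), and the divisor is assumed to be nonzero.

```c
#include <stdint.h>

/* Models "udiv Xd, Xn, Xm": unsigned quotient (divisor assumed nonzero here). */
uint64_t udiv_model(uint64_t n, uint64_t m)
{
    return n / m;
}

/* Models "msub Xd, Xn, Xm, Xa": Xd = Xa - (Xn * Xm). */
uint64_t msub_model(uint64_t n, uint64_t m, uint64_t a)
{
    return a - n * m;
}

/* Hypothetical helper: remainder obtained the same way as in the AArch64 listing. */
uint64_t remainder_via_msub(uint64_t dividend, uint64_t divisor)
{
    uint64_t quotient = udiv_model(dividend, divisor);
    return msub_model(quotient, divisor, dividend);
}
```

For example, with a dividend of 27 and a divisor of 10 the quotient is 2, so the msub step computes 27 - (2 * 10) = 7.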

Below is our completed code for Part 2 for x86/64, followed by ARM64:

/* x86 */

.text
.globl _start

_start:
    mov $0, %r15
    mov $10, %r14               /* used to store the divider */

loop:
    mov %r15, %rax
    mov $0, %rdx
    div %r14

    add $0x30, %al              /* convert to ascii */
    add $0x30, %dl

    mov %al, (msg + len - 3)    /* store the byte */
    mov %dl, (msg + len - 2)

    movq $len, %rdx             /* message length */
    movq $msg, %rsi             /* message location */
    movq $0x1, %rdi             /* file descriptor stdout */
    movq $0x1, %rax             /* syscall sys_write */
    syscall

    inc %r15
    cmp $0x1e, %r15
    jne loop

    movq $0, %rdi               /* exit status */
    movq $60, %rax              /* syscall sys_exit */
    syscall

.data

msg: .ascii      "Loop: ##\n"
len = . - msg

=================================================================


/* aarch64 */


.text
.global _start

_start:
    mov x19, 0
    mov x23, 10                 /* used for dividing */
    adr x24, msg

loop:
    mov x20, x19                /* calculate the byte */
    udiv x21, x20, x23          /* x21 = i / 10 */
    msub x22, x21, x23, x20     /* x22 = i - (x21 * 10) ie gets the remainder */

    add x21, x21, 0x30          /* convert to ascii */
    add x22, x22, 0x30

    strb w21, [x24, len - 3]    /* set the byte */
    strb w22, [x24, len - 2]

    mov x0, 1                   /* print the loop */
    adr x1, msg
    mov x2, len
    mov x8, 64
    svc 0

    add x19, x19, 1
    cmp x19, 30                 /* if not 30 revert to the beginning of the loop */
    bne loop

    mov x0, 0
    mov x8, 93
    svc 0

.data

msg: .ascii "Loop: ##\n"
len = . - msg

Not included here, but completed later, is the suppression of the zero in the quotient digit during the first 10 iterations of the loop. This ended up being a simple case of comparing the quotient to zero and, when the condition is true, skipping the quotient digit before output. This was done like so:

x86:
    cmp $0x00, %rax
    je skip_10s

aarch64:
    cmp x21, 0x00
    beq skip
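The same compare-and-skip idea can be sketched in C. This is a guess at the overall shape rather than our actual code: the helper name format_suppressed is hypothetical, and the template with blank digit slots is an assumption about what gets printed when the tens digit is skipped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes "Loop: NN\n" for i in 0..99 into out (at least
   10 bytes), leaving a blank where the tens digit would be when it is zero.
   Returns the message length without the terminating NUL. */
size_t format_suppressed(unsigned i, char *out)
{
    static const char template_msg[] = "Loop:   \n";   /* assumed blank digit slots */
    size_t len = sizeof template_msg - 1;
    unsigned quotient = i / 10;
    unsigned remainder = i % 10;

    memcpy(out, template_msg, len + 1);                 /* copy template including NUL */
    if (quotient != 0)                                  /* same test as the cmp / je pair */
        out[len - 3] = (char)('0' + quotient);
    out[len - 2] = (char)('0' + remainder);
    return len;
}
```

The if statement plays the role of the conditional branch: when the quotient is zero, the store of the tens digit is skipped and the blank from the template is left in place.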

Friday 17 January 2014

Observing Compiler Operations with C Source Code

A very simple hello-world-style program was written in C, compiled with GCC, and then disassembled with objdump in order to view and summarize how the compiler translated the main function, as an exercise in basic understanding of assembler logic.

Here is the original GCC command used to compile the helloworld.c file:

gcc -g -O0 -fno-builtin helloworld.c -o hello

The following are 6 cases of program and compiler option manipulations that were tested and compared for potentially varied output and behavior. My particular group covered the 4th case in this scenario, that being the addition of multiple arguments to the printf() function:

When more than one argument is passed to the printf() function in the C source code, the compiler adds more MOVL ("move long", a 32-bit move) instructions, each writing one argument to an aligned stack address at an offset from the stack pointer. These offsets descend in steps of 4 bytes, which matches the number of bytes needed to store each integer. When a much larger "long integer" was inserted, however, the compiler needed to add another MOVL line in order to fit the numeral into 8 bytes of memory. Apart from that extra instruction, the type of the variable (int versus long int) doesn't seem to make a difference. The string was MOV'd into a different register (eax, as opposed to a location addressed through esp, which is the stack pointer) and was not labeled as a pointer but referenced directly. The last major step is a function call to printf through the procedure linkage table. Interestingly enough, it was pointed out that instead of simply PUSHing the variables onto the stack, the compiler used the more involved MOVL approach, for a reason we have yet to determine.
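For reference, a call of the kind we tested might look like the sketch below. The original source file is not reproduced in this post, so the function name print_eleven_args, the format string, and the exact values are assumptions; only the shape (one string followed by ten ascending integers) comes from our test case.

```c
#include <stdio.h>

/* Hypothetical stand-in for the modified hello world: one format string
   followed by ten ascending int arguments, eleven arguments in total.
   Returns the number of characters printf reports having written. */
int print_eleven_args(void)
{
    return printf("%d %d %d %d %d %d %d %d %d %d\n",
                  1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
}
```

Compiling a file containing a call like this with the same -g -O0 -fno-builtin options and disassembling it with objdump should show one store per argument ahead of the call to printf, although the exact instructions depend on the target and compiler version.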

Below is the first screenshot of the compiler output with 11 argument parameters, the first being a string and the next ten being simple 4-byte integers in ascending order - 


Here is the discrepancy with the edit in the long 18-digit integer:


The rest of my fellow classmates' cases involved manipulations with the compiler options, listed below along with descriptions of the results observed thereafter:

1) Adding the compiler option -static

The general consensus was that the file size became noticeably larger, due to the nature of the option disabling the dynamic linking in the imported libraries and thereby adding them in their entirety, as opposed to them being shared. The only seeming advantage to this is the fact that the user doesn't need to pre-install any of the libraries being called.

2) Removing the -fno-builtin option from the compilation command

The file size decreases slightly and the compiler replaced the printf() function with the less taxing puts() function, thereby optimizing it.

3) Removing the -g option from the compilation command

Removing this option strips the debugging information from the output file, which lowers its file size at the cost of omitting any and all useful data for tracking errors in the code.

5) Moving the printf() call to a separate function, and calling that function from main()

The snippet of code I ended up using looked like this:


In the objdump output, what essentially happened is that main ended with a callq to <newfunc>, execution then jumped to the start of newfunc, and <printf@plt> was called from there. Simply put, it was the assembly logic equivalent of what happened in the C code.
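The snippet itself did not survive in this post, so here is a minimal reconstruction of the idea. Only the name newfunc is taken from the objdump output; the message text, the int return value, and the call_site function (a stand-in for main) are assumptions made for this sketch.

```c
#include <stdio.h>

/* Name taken from the objdump output; the body and return type are assumed.
   Returns printf's character count so the call can be checked. */
int newfunc(void)
{
    return printf("Hello World!\n");
}

/* Hypothetical stand-in for main(): its only job is to call newfunc(),
   which is what shows up as the callq to <newfunc> in the disassembly. */
int call_site(void)
{
    return newfunc();
}
```

In the real program the body of call_site would simply be the body of main.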

6) Removing -O0 and adding -O3 to the gcc options

This changes how much optimization the compiler is allowed to perform. In this particular case, it went from "do as little optimization as possible" to "get into overdrive". As a result, I assume it comes down to a question of compile time vs. execution time. Here is the main observed difference in the compiler output:

Using the O0 option - 


Using the O3 option -


The main difference in this code is the main function's assembler logic optimizations.

Tuesday 14 January 2014

A Comparison of Code Review Processes

Over the past few days, I have looked over a pair of online software communities that release their work under two different open source licenses, in order to compare the code review techniques they use and pick out any distinct differences.

My first example is the community at the OpenMW project, which I've been religiously following for a few years now. The project is released under the GNU GPL version 3 license. Code review and bug tracking are maintained using a project management app known as Redmine. Unfortunately, there is no readily available documentation that explicitly outlines their coding standards, despite the presence of the corresponding tab on the site. The features that clearly shine in ease of access and readability are the issue tracker and the roadmap. The issue tracker clearly indicates the issue type, priority, subject, assignee, and the dates created and updated. Once an issue is clicked on, many further details are displayed, as well as any related progress and discussion surrounding it. A great example of this can be seen here.

The next software project is SilverStripe CMS, an open source web management tool. Their software is licensed under the BSD License, which they argue is more flexible and less restrictive than the GPL in cases such as companies wanting to integrate SilverStripe code into their product without revealing any custom code of their own. SilverStripe now uses GitHub, the familiar hosting service for the git revision control system, for issue tracking. Much like the Redmine app shown previously, it provides easily accessible tickets for features and bugs that can be described, discussed, and updated on the same webpage. It also works in conjunction with its git environment, where changes are "pushed" and "pulled" by the appropriate users into the repository as necessary to achieve a specified milestone. Documentation on using some of the introductory features of git and GitHub can be seen here.

Both of the projects found exhibit great use of tools to communicate and attain their personal goals, with the major difference being the more organized and hierarchical nature of SilverStripe which I would surmise is due to the breadth and age of the project itself, with its community being leagues larger than OpenMW's.