This guide is part three of the series, X86–64 Assembly Language Program.

Now that we’ve set up an efficient development environment, it’s time to write some actual Assembly Language code. In this guide, I’ll share the resources and processes I followed to build a simple Assembly Language program.

Printing Command Line Arguments

In order to demonstrate the concepts let’s write an assembly program that counts the number of unique words in a text file whose name is provided as an argument.

As a first step towards that goal, the program would need to access command-line arguments. This first program is all about understanding how to access command-line arguments using x86_64 Linux assembly.

If you are new to assembly programming, I suggest this guide as a starting point to learn some of the fundamentals. I reference some of those concepts here without further explanation.

What is a Command Line Argument

Command-line arguments are textual inputs the program users include on the command line while executing a program. For example, if there is an executable program named a.out, it can be run on a Linux system from the command line with the command

$ ./a.out

Further to that, “command-line arguments” can be included after the executable name. For example,

$ ./a.out 1 2 3

In this case, there are three arguments: 1, 2, and 3.

The Call Stack

It will help to have an understanding of the call stack as this is where command-line arguments are stored when a program begins executing.

If you are not familiar, the stack is a Last In First Out (LIFO) data structure that resides in the computer’s RAM. The stack pointer keeps track of the “top” of the stack. In x86_64, the stack pointer is the register $rsp, which holds the memory address of the “Last In” value. On x86_64 the stack grows toward lower addresses, so adding a value moves $rsp down and removing one moves it back up. The stack can be accessed directly through the stack pointer, or with two assembly language instructions: push and pop. push adds a value to the stack and updates $rsp to point to the newly added value. pop reads the “Last In” value from the stack and updates $rsp to point to the next value on the stack. The format of these instructions is shown below (note that a semicolon denotes a comment in Assembly language, so anything after ; is ignored by the assembler):

; add the value of $rcx to the stack
push rcx
; pop the "Last In" stack value into $rbx. 
; In this case, the value of $rcx
pop rbx
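To make the effect of push and pop on the stack pointer visible, here is a minimal sketch of my own (NASM syntax, x86_64 Linux); it is not part of the program built later in this guide. It relies on each 64-bit push moving $rsp down by 8 bytes. The value 42 and the scratch registers r12 and r13 are arbitrary choices for illustration.

```nasm
; Sketch: observe what push and pop do to rsp.
section .text
    global _start
_start:
    mov rcx, 42
    mov r12, rsp          ; remember where the top of the stack was
    push rcx              ; rsp = rsp - 8, then [rsp] = rcx
    mov r13, r12
    sub r13, rsp          ; r13 = 8: the stack grew downward by one 8-byte slot
    pop rbx               ; rbx = [rsp] = 42, then rsp = rsp + 8
    lea rdi, [rbx + r13]  ; exit status = 42 + 8 = 50
    mov rax, 60           ; sys_exit
    syscall
```

If you assemble this with nasm -f elf64, link it with ld, and run it, echo $? should report 50: the popped value (42) plus the 8 bytes by which the push moved the stack pointer.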

Accessing Command Line Arguments in X86–64 Assembly Language

When a program begins execution, its command-line arguments are available on the stack. The top of the stack holds the number of arguments. If you’ve programmed in C or C++, this is the value referred to as argc in main(), meaning argument count.

The second value on the stack is the program name (more precisely, a pointer to it). This is considered the first argument and is included in the argument count total. Therefore, if you provide no arguments when executing a program, argc will be equal to one.

Any arguments provided will be the next values on the stack. As an example, if you executed a program with the command

./a.out arg1 arg2 arg3

the stack would look as follows

The Stack

+-------+
| STACK |
+-------+
| 4     |
| a.out |
| arg1  |
| arg2  |
| arg3  |
+-------+

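The Stack Pointer ($rsp) would be pointing to the top of the stack, i.e. the memory address of the argc value, 4. Of course, these values would all be encoded in binary: argc is stored as an integer, and each entry below it is actually a pointer to the text of the argument rather than the text itself.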
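The layout above can also be read without popping anything, by addressing memory relative to the stack pointer. The following is a small sketch under that assumption (NASM syntax, x86_64 Linux); it is a separate illustration, not the program developed in this guide. It returns argc as the process exit status so the value can be checked from the shell.

```nasm
; Sketch: read argc and the program-name pointer straight from the stack.
section .text
    global _start
_start:
    mov rdi, [rsp]        ; top of stack: argc (becomes the exit status)
    mov rsi, [rsp + 8]    ; next 8-byte slot: pointer to the program name
                          ; (loaded only to show the offset; unused here)
    mov rax, 60           ; sys_exit
    syscall
```

Run with three arguments, as in ./a.out arg1 arg2 arg3, the exit status reported by echo $? should be 4, matching the diagram.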

This program prints out argc and the program name with a newline in between:

section .data

section .text
    global _start
_start:
  call .printNumberOfArgs
  call .printNewline
  call .printArg
  call .printNewline
  call .exit

.printNumberOfArgs:
  pop rbx         ; this is the return address of the calling fxn. Remove it from
                  ; the stack for a moment so I can get to the argc
  pop rcx         ; get argc from stack
  add rcx, 48     ; convert number of args to ascii (only works if < 10)
  push rcx        ; push the ascii converted argc to stack
  mov rsi, rsp    ; store value of rsp, i.e. pointer to ascii argc to param2 of
                  ; sys_write fxn
  mov rdx, 1      ; param3 of sys_write fxn is the number of bytes to print
  push rbx        ; return the address of the calling fxn to top of stack.
  call .print
  ; clean up the ascii argc pushed onto the stack. Retaining the return
  ; address currently on top of stack
  pop rbx
  pop rcx
  push rbx
  ret

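.printArg:
  pop rcx         ; this is the return address of the calling fxn. Remove it from
                  ; the stack for a moment so I can get to the program name
  mov rsi, [rsp]  ; argc was popped earlier, so the top of the stack now holds
                  ; the pointer to the program name
  mov rdx, 7      ; how long is the message? Hard-coded: assumes the program
                  ; is run as ./a.out, which is 7 bytes
  push rcx        ; push return address back onto stack where it is expected
  jmp .print      ; the ret in .print returns directly to the caller of .printArg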

.printNewline:
  pop rbx         ; this is the return address of the calling fxn. Remove it from
                  ; the stack for a moment so the newline can sit beneath it
  push 10         ; ascii newline character
  mov rsi, rsp    ; rsp points to top of stack. Newline has been pushed to top 
                  ; of stack. rsi is where 2nd param of sys_write is stored
  mov rdx, 1      ; param3 of sys_write: print exactly one byte
  push rbx        ; return the address of the calling fxn to top of stack.
  call .print
  ; clean up the newline character pushed onto the stack. 
  ; Retaining the return address currently on top of stack
  pop rbx
  pop rcx
  push rbx
  ret
  
.print:           ; print expects the calling location to be at top of stack
  mov rax, 1
  mov rdi, 1
  syscall
  ret             ; return to location pointed to at top of stack

.exit:
  mov rax, 60
  mov rdi, 0
  syscall

Note the use of pop to pull argc from the stack, and of [rsp] to read the pointer to the program name that sits just beneath it.
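The program above prints a fixed 7 bytes for the program name, which only matches ./a.out. One way to lift that restriction, sketched here as a standalone example (NASM syntax, x86_64 Linux), is to count bytes until the terminating zero byte that ends each argument string. The label names and the newline variable are my own choices for illustration.

```nasm
; Sketch: print the program name using a computed length.
section .data
    newline db 10

section .text
    global _start
_start:
    mov rsi, [rsp + 8]        ; rsi = pointer to the program name
    xor rdx, rdx              ; rdx = number of bytes counted so far
.count:
    cmp byte [rsi + rdx], 0   ; reached the terminating zero byte?
    je .write
    inc rdx
    jmp .count
.write:
    mov rax, 1                ; sys_write
    mov rdi, 1                ; file descriptor 1 = stdout
    syscall                   ; write(1, program name, length)

    mov rax, 1                ; sys_write again, for the newline
    mov rdi, 1
    mov rsi, newline
    mov rdx, 1
    syscall

    mov rax, 60               ; sys_exit
    xor rdi, rdi              ; exit status 0
    syscall
```

Because the length is counted at run time, the same binary prints its name correctly whether it is invoked as ./a.out or under a longer path.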

Also important is the use of the call instruction. This is how one can call a function in assembly while saving the address of the instruction that follows the call. The ret instruction can then be used to return to that address. The return address is stored on the stack. Notice that this value is removed from the stack in order to enable access to the “buried” values, then pushed back onto the stack because that is where the ret instruction expects it to be.
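As an alternative to popping the return address and pushing it back, a function can leave the stack untouched and reach past the return address with an offset. The sketch below (NASM syntax, x86_64 Linux) illustrates this; get_argc is a hypothetical helper name, and the example assumes it is called directly from _start with nothing else pushed in between.

```nasm
; Sketch: call/ret with the return address left in place.
section .text
    global _start
_start:
    call get_argc         ; pushes the address of the next instruction, then jumps
    mov rdi, rax          ; execution resumes here after ret; exit status = argc
    mov rax, 60           ; sys_exit
    syscall

get_argc:
    mov rax, [rsp + 8]    ; [rsp] is the return address, [rsp + 8] is argc
    ret                   ; pops the return address and jumps to it
```

Run with no arguments, the exit status shown by echo $? should be 1; with two arguments it should be 3.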
