Introduction to assembly language
Learn the basic concepts necessary to analyze code in assembly language.
Assembly is the programming language closest to the machine code executed by computer processors. When reverse engineering software, especially without access to the original source code, you often end up working with assembly-level code. Understanding assembly lets you see exactly what a program is doing at the hardware level. In this blog I summarize what I learned about assembly language from reading the book Practical Binary Analysis, published by No Starch Press.
Design of a program in assembly
Below is a comparison between C language code and assembly language code.
code in C language
#include <stdio.h>
int main(int argc, char *argv[]){
printf("Hello world\n");
return 0;
}
assembly language code
.file "hello.c"
.intel_syntax noprefix
.section .rodata
.LC0:
.string "Hello world"
.text
.global main
.type main, @function
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], edi
mov QWORD PTR [rbp-16], rsi
mov edi, OFFSET FLAT:.LC0
call puts
mov eax, 0
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1..)"
.section .note.GNU-stack,""
The C source code consists of a main function that calls printf to display the message “Hello world” on the screen. Note that in the assembly output the compiler replaced the printf call with puts, an optimization it can apply because the string contains no format specifiers. At a high level, the corresponding assembly program consists of four types of components: instructions, directives, labels and comments.
In the assembly source code, the .section .rodata directive tells the assembler to place the following content in the .rodata section, which is dedicated to storing read-only constant data. In general, the .section directive tells the assembler in which section to place the content that follows it, while .string is a directive that defines an ASCII string. There are also directives to define other data types, such as .byte, .word, .long and .quad.
The main function is located in the .text section, which is dedicated to storing code. The .text directive is shorthand for .section .text, and main: introduces a symbolic label for the main function.
After the main label come the instructions contained in main. These instructions can refer symbolically to previously declared data such as .LC0 (the symbolic name that the gcc compiler chose for the string “Hello world”).
Assembly language instructions, directives, labels, and comments
Type | Example | Description |
---|---|---|
Instruction | mov eax, 0 | eax = 0 |
Directive | .section .text | Place the following content in the .text section |
Directive | .string "foobar" | Define an ASCII string containing "foobar" |
Directive | .long 0x12345678 | Define a doubleword with the value 0x12345678 |
Label | foo: .string "foobar" | Define the string "foobar" with the symbolic name foo |
Comment | # This is a comment | A human-readable comment |
Instructions are the operations that the CPU executes. Directives are commands that tell the assembler to produce a particular piece of data, place instructions or data in a particular section, and so on. Labels are symbolic names that can be used to refer to instructions or data in the assembly program, and comments are simply text written to document the code for other people.
Separation between code and data
In the assembly source code seen above, you can distinguish the code from the data because they are separated into different sections. This makes the program easier to inspect, since you can tell which bytes correspond to code and which to data. However, this is not always the case: nothing in the x86 architecture prevents code and data from being mixed in the same section, and in practice some compilers and hand-written assembly programs do exactly this.
AT&T Syntax vs Intel Syntax
When analyzing code, it is important to determine which syntax we are looking at, since the two differ in how operands are written, among other things.
For example, AT&T syntax explicitly prefixes registers with % and constants with $.
The order of the operands also changes: AT&T syntax puts the source first and the destination last, while Intel syntax puts the destination first:
; at&t syntax
mov $0x6, %edi
; intel syntax
mov edi, 0x6
; operation performed
edi = 0x6
Structure of an x86 instruction
At the assembly level, instructions generally have the following form:
mnemonic destination, source
The mnemonic is the representation of a machine instruction, and the source and destination are the operands of the instruction.
Not all instructions have two operands; some have only one, and some have none at all.
Machine-level structure of x86 instructions
The x86 ISA (Instruction Set Architecture) uses variable-length instructions. That is, an instruction can be as short as 1 byte or as long as 15 bytes.
In addition to this, instructions can start at any memory address. This means that the CPU does not force any alignment on the code, although some compilers sometimes align code to optimize performance.
Prefixes | Opcode | Addressing mode (ModR/M) | SIB byte | Displacement | Immediate |
---|---|---|---|---|---|
0-4 bytes | 1-3 bytes | 0-1 bytes | 0-1 bytes | 0/1/2/4 bytes | 0/1/2/4 bytes |
An instruction consists of optional prefixes, an opcode, and zero or more operands. All parts are optional except the opcode.
The opcode is the main part of the instruction, while prefixes can modify its behavior.
Some instructions have implicit operands. That is, they are not written explicitly in the instruction, since they are inherent to the opcode. For example, the destination operand of opcode 0x05 (an add instruction) is always rax, and only the source operand is variable and needs to be specified explicitly.
Another example of implicit operands is the push instruction, which implicitly updates rsp (the stack pointer register).
Instructions can have different types of operands:
- register operands
- memory operands
- immediate operands
Register Operands
Registers are small but very fast storage locations inside the CPU. Some registers have special purposes, such as the instruction pointer, which tracks the current point of execution, or the stack pointer, which stores the address of the top of the stack.
General-purpose registers:
In the 8086 instruction set, the registers were 16 bits. The x86 ISA extended registers to 32 bits, and x86-64 extended them further to 64 bits. To maintain compatibility, the registers used in new instruction sets are supersets of the old registers.
To specify a register operand in assembly, the register name is used. For example, mov rax, 64 moves the value 64 into the rax register. This example uses the full 64-bit rax register. To use only its lower 32 bits we would write eax, and for the lower 16 bits, ax. Finally, ah refers to the upper 8 bits of ax and al to its lower 8 bits.
Other registers such as rbx, rcx and rdx follow the same naming scheme for accessing their lower parts.
Registers r8-r15 were added in x86-64 and are not available in earlier variants of x86.
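As a rough illustration of how these names relate, the following C sketch extracts from a 64-bit value the parts that eax, ax, ah and al refer to. The helper names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Hypothetical helpers: each returns the portion of a 64-bit rax value
   that the corresponding sub-register name refers to. */
uint32_t part_eax(uint64_t rax) { return (uint32_t)rax; }        /* lower 32 bits */
uint16_t part_ax(uint64_t rax)  { return (uint16_t)rax; }        /* lower 16 bits */
uint8_t  part_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* upper 8 bits of ax */
uint8_t  part_al(uint64_t rax)  { return (uint8_t)rax; }         /* lower 8 bits of ax */
```

For instance, with rax = 0x1122334455667788 these helpers return 0x55667788, 0x7788, 0x77 and 0x88 respectively.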
Other registers
In addition to the registers mentioned above, there are others such as rip (eip on 32-bit x86 and ip on the 8086) and rflags (eflags or flags in the older variants). The rip register always points to the address of the next instruction to be executed and is updated automatically by the CPU. The status flags register is used for comparisons and conditional branches; it tracks things like whether the last operation produced zero or resulted in an overflow.
Memory operands
Memory operands specify a memory address where the CPU should read or write one or more bytes. The x86 ISA supports only one explicit memory operand per instruction. This means that you cannot move bytes directly from one memory location to another in a single mov instruction. To do this, a register must be used as intermediate storage.
On x86, memory operands are specified as follows:
[base + index*scale + displacement]
Base and index are 64-bit registers, scale is an integer with the value 1, 2, 4, or 8, and displacement is a 32-bit constant or a symbol. All of these components are optional.
For example, an instruction like mov eax, DWORD PTR [rax*4 + arr] can be used to access an array, where arr is the displacement holding the starting address of the array, rax holds the index of the element to access, and each element is 4 bytes, which is why rax is multiplied by 4. DWORD PTR tells the assembler that we want to read 4 bytes (a doubleword, or DWORD) from memory.
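The address arithmetic behind a memory operand can be sketched in C as follows. The function name effective_address is hypothetical; it only reproduces the formula above and does not access memory.

```c
#include <stdint.h>

/* Computes base + index*scale + displacement, the address a memory
   operand refers to. The 32-bit displacement is sign-extended to
   64 bits before being added. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           uint8_t scale, int32_t displacement)
{
    return base + index * scale + (uint64_t)(int64_t)displacement;
}
```

With base = 0, index = 3, scale = 4 and displacement = 0x1000 (standing in for arr), the result is 0x100c, which is where the fourth 4-byte element of the array would start.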
Immediate operands
These operands are simply constants encoded in the instruction itself. For example, in the instruction add rax, 42, the value 42 is the immediate.
On x86, immediates are encoded in little-endian format, that is, with the least significant byte first.
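To make the byte order concrete, here is a small C sketch that lays out a 32-bit immediate the way it would appear in the instruction bytes. The function name encode_imm32_le is hypothetical, and the shifts make the result independent of the byte order of the machine running the code.

```c
#include <stdint.h>

/* Writes a 32-bit immediate into out[0..3] in little-endian order:
   the least significant byte goes first. */
void encode_imm32_le(uint32_t imm, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(imm >> (8 * i));
}
```

For the value 0x12345678, the bytes come out as 78 56 34 12.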
Common x86 instructions
Online instruction references:
- http://ref.x86asm.net/
- https://software.intel.com/en-us/articles/intel-sdm/
+--------------------------------------------------------------------+
| Data transfer                                                      |
+----------------------+---------------------------------------------+
| Instruction          | Description                                 |
+----------------------+---------------------------------------------+
| mov dst, src         | dst = src                                   |
| xchg dst1, dst2      | swap dst1 and dst2                          |
| push src             | push src onto stack and decrement rsp       |
| pop dst              | pop value from stack into dst and inc rsp   |
+--------------------------------------------------------------------+
| Arithmetic                                                         |
+----------------------+---------------------------------------------+
| Instruction          | Description                                 |
+----------------------+---------------------------------------------+
| add dst, src         | dst += src                                  |
| sub dst, src         | dst -= src                                  |
| inc dst              | dst += 1                                    |
| dec dst              | dst -= 1                                    |
| neg dst              | dst = -dst                                  |
| cmp src1, src2       | set status flags based on src1 - src2       |
+--------------------------------------------------------------------+
| Logical/bitwise                                                    |
+----------------------+---------------------------------------------+
| Instruction          | Description                                 |
+----------------------+---------------------------------------------+
| and dst, src         | dst &= src                                  |
| or dst, src          | dst |= src                                  |
| xor dst, src         | dst ^= src                                  |
| not dst              | dst = ~dst                                  |
| test src1, src2      | set status flags based on src1 & src2       |
+--------------------------------------------------------------------+
| Unconditional branches                                             |
+----------------------+---------------------------------------------+
| Instruction          | Description                                 |
+----------------------+---------------------------------------------+
| jmp addr             | jump to addr                                |
| call addr            | push return address on stack, then call     |
|                      | function at addr                            |
| ret                  | pop return address from stack and return    |
|                      | to that address                             |
| syscall              | enter the kernel to perform a system call   |
+--------------------------------------------------------------------+
| Conditional branches                                               |
+----------------------+---------------------------------------------+
| Instruction          | Description                                 |
+----------------------+---------------------------------------------+
| je addr/jz addr      | jump to addr if zero flag is set,           |
|                      | for example, if the operands were equal     |
|                      | in the last cmp                             |
| ja addr              | jump if dst is above src in the last cmp    |
|                      | (unsigned comparison)                       |
| jb addr              | jump if dst is below src in the last cmp    |
|                      | (unsigned comparison)                       |
| jg addr              | jump if dst is greater than src in the cmp  |
|                      | (signed comparison)                         |
| jl addr              | jump if dst is less than src in the cmp     |
|                      | (signed comparison)                         |
+----------------------+---------------------------------------------+
Comparing operands (status flags)
The cmp instruction subtracts the second operand from the first and, based on the result, sets several status flags in the rflags register that later instructions can test. The result of the subtraction itself is discarded. The most important flags are the following:
- zero flag (ZF): set if the result of the subtraction is zero, which means that the operands have equal values.
- sign flag (SF): set if the result was negative. Barring overflow, this means that in the operation cmp src1, src2 the operand src2 is greater than src1.
- overflow flag (OF): set if the subtraction resulted in a signed overflow.
The test instruction sets flags in rflags in a similar way, but instead of performing a subtraction it performs a bitwise AND of its operands.
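The flag rules above can be modelled in C for 64-bit operands. This is only a sketch: the struct and function names are hypothetical, and the carry flag (CF), which is not in the list above, is included because the unsigned jumps ja and jb depend on it.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool zf, sf, cf, of; };

/* Flags as "cmp a, b" would set them: computed from a - b,
   with the result itself thrown away. */
struct flags flags_after_cmp(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;
    struct flags f;
    f.zf = (r == 0);                           /* operands are equal */
    f.sf = (r >> 63) != 0;                     /* top bit of the result */
    f.cf = a < b;                              /* unsigned borrow */
    f.of = (((a ^ b) & (a ^ r)) >> 63) != 0;   /* signed overflow */
    return f;
}

/* Flags as "test a, b" would set them: computed from a & b.
   CF and OF are always cleared by test. */
struct flags flags_after_test(uint64_t a, uint64_t b)
{
    uint64_t r = a & b;
    struct flags f = { r == 0, (r >> 63) != 0, false, false };
    return f;
}
```

Comparing 3 with 5, for example, leaves ZF clear and sets both SF and CF, which is why either jl or jb would be taken afterwards.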
Implementing system calls
To make a system call, the syscall instruction is used. Before executing it, the system call must be prepared by selecting its number and setting its arguments as specified by the operating system. For example, to make a read system call on Linux, the value 0 is loaded into rax, the file descriptor into rdi, the buffer address into rsi and the number of bytes to read into rdx.
section .data
buffer: times 256 db 0 ; reserve a 256-byte buffer
len equ 256 ; buffer length
section .text
global _start
_start:
; Read system call to read from stdin
mov rax, 0 ; syscall number for read (0)
mov rdi, 0 ; file descriptor 0 (stdin)
mov rsi, buffer ; buffer address
mov rdx, len ; number of bytes to read (buffer size)
syscall ; makes the read system call
mov rbx, rax ; saves the number of bytes read in rbx
; write system call to write to stdout
mov rax, 1 ; syscall number for write (1)
mov rdi, 1 ; file descriptor 1 (stdout)
mov rsi, buffer ; buffer address
mov rdx, rbx ; uses the number of bytes read as a limit
syscall ; makes the write system call
; Exit the program
mov rax, 60 ; syscall number for exit (60)
xor rdi, rdi ; exit code 0
syscall ; program ends
The previous example is an assembly program that reads a message from the keyboard and displays it on the screen. To build and run it we use the following commands:
# assemble the program with nasm
$ nasm -f elf64 -o syscall_example.o syscall_example.asm
# link the program
$ ld -o syscall_example syscall_example.o
# run it
$ ./syscall_example
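For comparison, the same read and write system calls can be issued from C through the syscall() wrapper that glibc provides on Linux. This is a minimal sketch that assumes Linux with glibc; echo_once is a hypothetical helper name, and error handling is reduced to returning the raw result.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Reads up to len bytes from in_fd and writes what was read to out_fd,
   using raw system calls. Returns the number of bytes written, 0 on
   end of input, or -1 on error. */
long echo_once(int in_fd, int out_fd, char *buf, unsigned long len)
{
    long n = syscall(SYS_read, in_fd, buf, len);   /* rax = 0 in the assembly version */
    if (n <= 0)
        return n;
    return syscall(SYS_write, out_fd, buf, (unsigned long)n);  /* rax = 1 */
}
```

Calling echo_once(0, 1, buf, sizeof buf) with a 256-byte buf mirrors the assembly program: it reads from stdin and echoes the same bytes to stdout.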
Implementing conditional jumps
As mentioned above, conditional jumps work thanks to status flags, which are modified by instructions like cmp or test. The jump is taken to the given address or label if the condition is met. If it is not met, the jump is simply ignored and execution continues with the instruction that follows it.
cmp rax, rbx
jb label
In the previous example, a cmp is performed and subsequently a jb (jump if below). This means that if rax < rbx (unsigned comparison) then the jump is performed.
In the following example, the jump jnz (jump if not zero) is performed if rax is not equal to 0
test rax, rax
jnz label
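The difference between the unsigned jumps (ja, jb) and the signed ones (jg, jl) is that the same bit patterns are interpreted differently. A minimal C sketch, with hypothetical function names, and assuming the usual two's complement conversion from uint64_t to int64_t:

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition tested by jb after "cmp a, b": both values are unsigned. */
bool below(uint64_t a, uint64_t b) { return a < b; }

/* Condition tested by jl after "cmp a, b": the same bits read as signed. */
bool less(uint64_t a, uint64_t b) { return (int64_t)a < (int64_t)b; }
```

With a = 0xffffffffffffffff and b = 1, below(a, b) is false because a is the largest unsigned value, while less(a, b) is true because the same bits represent -1 when signed.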
Loading memory addresses
The lea (load effective address) instruction computes the address described by a memory operand and stores it in a register, without accessing memory. It is similar to the & (address-of) operator in C/C++.
lea r12, [rip+0x2000]
In the previous example, the lea instruction loads the address resulting from rip+0x2000 into register r12.
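The contrast between computing an address and loading from it can be sketched in C. Both function names are hypothetical; the first behaves like lea, the second like a mov with a memory operand.

```c
#include <stdint.h>

/* lea-like: only computes the address of arr[i]; nothing is read. */
const int32_t *address_of_element(const int32_t *arr, uint64_t i)
{
    return &arr[i];
}

/* mov-like: computes the same address, then loads the 4 bytes stored there. */
int32_t load_element(const int32_t *arr, uint64_t i)
{
    return arr[i];
}
```

For an array of 4-byte elements, address_of_element(arr, 2) is 8 bytes past the start of arr, whatever the array contains.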
The stack
The stack is a region of memory reserved for storing data related to function calls, such as the return address, function arguments, and local variables. The stack got its name from the way it is accessed. Instead of writing data to arbitrary places on the stack, data is written and removed in last-in-first-out (LIFO) order: values are written by pushing them onto the top of the stack and removed by popping them from the top.
As data is pushed onto the stack, the rsp register (which points to the top of the stack) decreases, because the stack grows toward lower memory addresses.
Note that when we perform a push, the value is stored at the lowest address of the stack (the new rsp), and when we perform a pop, rsp is incremented back to the address it had before the push. However, the value that was popped is still in memory. If there is sensitive information on the stack and we want to erase it completely, we must overwrite it explicitly, since a pop will not do it.
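The behavior described above can be simulated with a small array in C. This is a toy model with hypothetical names: sp plays the role of rsp as an index into the array, and there is no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 16

struct toy_stack {
    uint64_t mem[STACK_SLOTS];
    size_t sp;                /* index of the current top; STACK_SLOTS means empty */
};

void toy_init(struct toy_stack *s) { s->sp = STACK_SLOTS; }

/* push: move the top toward lower addresses, then store the value. */
void toy_push(struct toy_stack *s, uint64_t v)
{
    s->sp--;
    s->mem[s->sp] = v;
}

/* pop: read the value at the top, then move the top back up.
   The slot that was read is not cleared. */
uint64_t toy_pop(struct toy_stack *s)
{
    uint64_t v = s->mem[s->sp];
    s->sp++;
    return v;
}
```

After pushing 0x41 and 0x42 and popping once, sp is back at slot 15, but mem[14] still holds 0x42 until something overwrites it.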
Function calls and function frames
source code in C language
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]){
printf("%s=%s\n", argv[1], getenv(argv[1]));
return 0;
}
source code in assembly language
Contents of section .rodata:
.LC0:
.string "%s=%s\n"
Contents of section .text:
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], edi
mov QWORD PTR [rbp-16], rsi
mov rax, QWORD PTR [rbp-16]
add rax, 8
mov rax, QWORD PTR [rax]
mov rdi, rax
call getenv
mov rdx, rax
mov rax, QWORD PTR [rbp-16]
add rax, 8
mov rax, QWORD PTR [rax]
mov rsi, rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
The C code contains two function calls. The first is getenv, which is used to get the value of the environment variable named by argv[1]. The second is printf.
Comparing the C code with the assembly code, we can observe several things. First, the string “%s=%s\n” is stored as a constant in the .rodata (read-only data) section, and the symbol .LC0 is used to refer to it.
Each function has its own frame on the stack, delimited by rbp, which points to the base of the function frame, and rsp, which points to its top.
In the assembly code above, the first thing the main function does is execute the prologue, which sets up the function frame.
push rbp ; save the contents of rbp on the stack
mov rbp, rsp ; now rbp has the same value as rsp
This prologue is so common that x86 has a dedicated instruction, enter, that does the same thing.
On Linux x86-64, the rbx and r12-r15 registers must not be clobbered by a function. That is, if a function uses these registers, it must restore them to their original values before returning (ret). This is achieved by pushing the values of those registers onto the stack at the beginning of the function and popping them to restore the registers before returning.
After executing the prologue, rsp is decremented by 0x10 (16) bytes (sub rsp, 16) to reserve space for local variables on the stack: 4 bytes for argc, 8 bytes for the argv pointer, and the remaining bytes for padding.
On x86-64 Linux systems, the first 6 arguments to a function are passed in the rdi, rsi, rdx, rcx, r8, and r9 registers. If the function receives more than 6 arguments, or some arguments do not fit in 64-bit registers, the remaining arguments are pushed onto the stack in reverse order.
; loading arguments into registers and onto the stack before calling a function
mov rdi, param1
mov rsi, param2
mov rdx, param3
mov rcx, param4
mov r8, param5
mov r9, param6
push param9
push param8
push param7
...
call function
This can vary depending on the calling convention in use. For example, with the cdecl convention, all arguments are passed on the stack in reverse order without using any registers. Another convention is fastcall, which passes some arguments in registers.
The red zone
The red zone is a 128-byte area (in the x86-64 ABI) below the stack pointer. Programmers and compilers can use this area to store temporary data without needing to modify the stack pointer. This can be useful to optimize certain operations and avoid additional instructions to adjust the stack.
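Because rsp is not adjusted to protect it, data in the red zone is overwritten as soon as the function calls another function, so it is mainly useful in leaf functions (functions that call no others). In user space, the x86-64 System V ABI guarantees that signal and interrupt handlers will not modify the red zone. Kernel code has no such guarantee, since an interrupt can push data onto the same stack and overwrite it, which is why kernels are typically compiled with the red zone disabled.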
Preparing arguments and calling functions
In the assembly code at the beginning of the section Function calls and function frames, the following is done after the function prologue runs:
; argv (the address of argv[0]) is loaded into the rax register
mov rax, QWORD PTR [rbp-16] ; now rax points to argv[0]
; since the C code uses argv[1], we add 8 bytes to rax
; (8 bytes is the size of a pointer) so that it points to
; argv[1]
add rax, 8 ; now rax points to argv[1]
; rax points to argv[1], but what we want is the value stored there,
; so rax is dereferenced
mov rax, QWORD PTR [rax] ; now rax holds the value of argv[1]
; that value is copied to rdi, since rdi is the register used to
; pass the first argument to a function
mov rdi, rax ; now rdi contains the same value as rax
; getenv is called with rdi (argv[1]) as its argument
call getenv
Reading values returned by functions
If a function returns a value, the value is stored in rax. In the assembly code at the beginning of the section Function calls and function frames, the following happens after the call to getenv:
call getenv
; the value returned by getenv is copied to the rdx register, since rdx
; will carry the third argument of the upcoming printf call
mov rdx, rax
; the process seen above is repeated to load the value of argv[1] into rsi
mov rax, QWORD PTR [rbp-16] ; rax = &argv[0]
add rax, 8 ; rax = &argv[1]
mov rax, QWORD PTR [rax] ; rax = argv[1]
mov rsi, rax ; rsi = rax
; finally, the address of the constant referenced by .LC0 is loaded into edi
mov edi, OFFSET FLAT:.LC0
; when calling a variadic function, rax (more precisely al) specifies
; the number of floating-point arguments passed in vector registers. In this
; case there are none, so eax is set to 0
mov eax, 0
; at this point, the argument registers look like this
; PARAMETER 1: rdi = "%s=%s\n"
; PARAMETER 2: rsi = argv[1]
; PARAMETER 3: rdx = getenv(argv[1])
; now the printf function is called
call printf ; printf(rdi, rsi, rdx); -> printf("%s=%s\n", argv[1], getenv(argv[1]));
Returning from a function
After the printf call completes, rax is set to 0, since this is the register used to return values, as mentioned above, and main returns 0.
After this, the leave instruction is executed, which is shorthand for the following:
mov rsp, rbp
pop rbp
This is known as the function epilogue, and it restores the stack to the state it had before the function frame was set up.
Finally, the ret instruction is executed. It pops the return address off the stack and continues execution at that address.
This is how the execution of the main function ends.
Conditional branches
source code in C language
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]){
if (argc > 5){
printf("argc > 5\n");
}else{
printf("argc <= 5\n");
}
return 0;
}
assembly source code
.LC0:
.string "argc > 5"
.LC1:
.string "argc <= 5"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], edi
mov QWORD PTR [rbp-16], rsi
cmp DWORD PTR [rbp-4], 5
jle .L2
mov edi, OFFSET FLAT:.LC0
call puts
jmp .L3
.L2:
mov edi, OFFSET FLAT:.LC1
call puts
.L3:
mov eax, 0
leave
ret
In the above code, after the prologue, the values of argc and argv are stored on the stack. Then argc is compared with the immediate value 5.
; prologue
push rbp
mov rbp, rsp
sub rsp, 16 ; reserve space for local variables
; the values of edi and rsi are stored on the stack
mov DWORD PTR [rbp-4], edi ; argc is saved here
mov QWORD PTR [rbp-16], rsi ; argv is saved here
; the comparison is made
cmp DWORD PTR [rbp-4], 5 ; compare argc with 5
jle .L2 ; if argc <= 5 then jump to .L2
; if the jump is not taken (argc > 5), execution continues here
; the address of .LC0 is loaded into edi to pass it as the argument
; to puts (which the compiler used in place of printf)
mov edi, OFFSET FLAT:.LC0
call puts ; puts is called with edi as its argument
jmp .L3 ; jump to .L3, skipping the else branch
.L2:
; if argc is less than or equal to 5, this is the code that is executed
mov edi, OFFSET FLAT:.LC1 ; the address of "argc <= 5" is loaded into edi
call puts
.L3:
mov eax, 0 ; the return value (eax) is set to 0
leave ; the stack frame is torn down
ret ; return to the function that called main
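One way to see the control flow of this listing is to write it back in C using goto, so that each label and jump lines up with the assembly. This is only an illustrative sketch: classify_argc is a hypothetical function that returns the message instead of printing it.

```c
/* Mirrors the branch layout of the listing: the condition is inverted
   (jle) and guards a jump to the else branch. */
const char *classify_argc(int argc)
{
    const char *msg;
    if (argc <= 5)            /* cmp DWORD PTR [rbp-4], 5 ; jle .L2 */
        goto else_branch;
    msg = "argc > 5";         /* mov edi, OFFSET FLAT:.LC0 */
    goto done;                /* jmp .L3 */
else_branch:                  /* .L2 */
    msg = "argc <= 5";        /* mov edi, OFFSET FLAT:.LC1 */
done:                         /* .L3 */
    return msg;
}
```

Note how the compiler tests the opposite of the condition written in the C source: the if (argc > 5) becomes a jump that is taken when argc <= 5.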
Loops
source code in C language
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]){
while (argc > 0){
printf("%s\n", argv[(unsigned)--argc]);
}
return 0;
}
source code in assembly language
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], edi
mov QWORD PTR [rbp-16], rsi
jmp .L2
.L3:
sub DWORD PTR [rbp-4], 1
mov eax, DWORD PTR [rbp-4]
mov eax, eax
lea rdx, [0+rax*8]
mov rax, QWORD PTR [rbp-16]
add rax, rdx
mov rax, QWORD PTR [rax]
mov rdi, rax
call puts
.L2:
cmp DWORD PTR [rbp-4], 0
jg .L3
mov eax, 0
leave
ret
What the previous C language code does, in summary, is display all the arguments passed via CLI on the screen, from the last to the first in a loop.
$ exec-cpp hello world test 1 2 3
3
2
1
test
world
hello
./prog.out
The assembly code can be explained as follows:
main:
; function prologue
push rbp
mov rbp, rsp
; reserving memory on the stack
sub rsp, 16
; storing the argc and argv values on the stack
mov DWORD PTR [rbp-4], edi
mov QWORD PTR [rbp-16], rsi
; unconditional jump to .L2
jmp .L2
.L3:
; The immediate value 1 is subtracted from argc
sub DWORD PTR [rbp-4], 1
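; the value of argc is loaded into eax
mov eax, DWORD PTR [rbp-4]
; writing to eax zeroes the upper 32 bits of rax; this zero-extension
; corresponds to the (unsigned) cast in the C code
mov eax, eax
; rdx = rax * 8, the byte offset of argv[argc] (each pointer is 8 bytes),
; computed as the effective address of the memory operand [0 + rax * 8]
lea rdx, [0+rax*8]
; rax = &argv[0]
mov rax, QWORD PTR [rbp-16]
; rax += rdx, so rax = &argv[argc]
add rax, rdx
; rax = argv[argc]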
mov rax, QWORD PTR [rax]
; rdi = rax
mov rdi, rax
; puts(rdi);
call puts
.L2:
; comparing argc with the immediate value 0
cmp DWORD PTR [rbp-4], 0
; if argc > 0, then jump to .L3
jg .L3
; this code is executed only once argc is no longer greater than 0
mov eax, 0
leave
ret
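The shape of the compiled loop, with the condition placed at the bottom and an initial jump to it, can be reproduced in C with goto. This is a sketch under stated assumptions: reverse_args is a hypothetical function that stores each argument in an output array instead of calling puts, so the order can be checked.

```c
/* Mirrors the loop layout of the listing. Stores argv[argc-1] ... argv[0]
   into out and returns how many entries were stored. */
int reverse_args(int argc, char *argv[], char *out[])
{
    int n = 0;
    goto check;                           /* jmp .L2 */
body:                                     /* .L3 */
    argc -= 1;                            /* sub DWORD PTR [rbp-4], 1 */
    out[n++] = argv[(unsigned)argc];      /* stands in for call puts */
check:                                    /* .L2 */
    if (argc > 0)                         /* cmp DWORD PTR [rbp-4], 0 ; jg .L3 */
        goto body;
    return n;
}
```

Placing the test at the bottom means each iteration needs only one conditional jump, while the initial goto check preserves the while semantics of testing before the first iteration.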
This concludes the post. As a final recommendation, it is good practice to compare the C code we write with its assembly counterpart to better understand what happens in our programs.