Basic Binary Analysis


Learn the procedure to perform basic binary analysis on Linux.

Before you start reading, I recommend downloading the following file so that you can follow along on your own Linux machine.


Challenge file


Identifying files with file

When dealing with an unknown file, the first step is to determine what type of file it is, because that tells us what we can do with it. Otherwise, we would be testing blindly.

To determine the file type, we use the file tool, which relies mainly on signatures such as magic bytes (0x7f followed by the ASCII characters ELF, in the case of ELF files).
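To illustrate the idea, the following Python sketch checks a buffer against a few well-known signatures. The SIGNATURES table is a small, hand-picked assumption covering only the formats that appear in this challenge; the real file command uses a much larger magic database and more elaborate tests.

```python
# Minimal magic-byte sniffer: a tiny subset of what file(1) does.
# The signature table is illustrative, not file's real database.
SIGNATURES = [
    (0, b"\x7fELF", "ELF binary"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"BM", "PC bitmap (BMP)"),
    (257, b"ustar", "POSIX tar archive"),  # tar magic lives at offset 257
]

def sniff(data: bytes) -> str:
    """Return a coarse file type based on well-known magic bytes."""
    for offset, magic, name in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return name
    # Crude fallback, roughly like file's "ASCII text" verdict
    if data and all(32 <= b < 127 or b in b"\t\r\n" for b in data):
        return "ASCII text"
    return "data"

def sniff_file(path: str) -> str:
    # 512 bytes is enough to cover every offset in SIGNATURES
    with open(path, "rb") as f:
        return sniff(f.read(512))
```

For example, sniff_file("67b8601") should report a bitmap because the file starts with the characters BM, even though, as we will see, that is not the whole story.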

$ file payload
payload: ASCII text

$ head payload
H4sIABzY61gAA+xaD3RTVZq/Sf+lFJIof1r+2aenKKh0klJKi4MmJaUvWrTSFlgR0jRN20iadpKX
UljXgROKjbUOKuOfWWfFnTlzZs/ZXTln9nTRcTHYERhnZ5c/R2RGV1lFTAFH/DNYoZD9vvvubd57
bcBl1ln3bL6e9Hvf9+733e/+v+/en0dqId80WYAWLVqI3LpooUXJgUpKFy6yEOsCy6KSRQtLLQsW
EExdWkIEyzceGVA4JLmDgkCaA92XTXel9/9H6ftVNcv0Ot2orCe3E5RiJhuVbUw/fH3SxkbKSS78
v47MJtkgZynS2YhNxYeZa84NLF0G/DLhV66X5XK9TcVnsXSc6xQ8S1UCm4o/M5moOCHCqB3Geny2
rD0+u1HFD7I4junVdnpmN8zshll6zglPr1eXL5P96pm+npWLcwdL51CkR6r9UGrGZ8O1zN+1NhUv
ZelKNXb3gl02+fpkZnwFyy9VvQgsfs55O3zH72sqK/2Ov3m+3xcId8/vLi+bX1ZaHOooLqExmVna
6rsbaHpejwKLeQqR+wC+n/ePA3n/duKu2kNvL175+MxD7z75W8GC76aSZLv1xgSdkGnLRV0+/KbD
7+UPnnhwadWbZ459b/Wsl/o/NZ468olxo3P9wOXK3Qe/a8fRmwhvcTVdl0J/UDe+nzMp9M4U+n9J
oX8jhT5HP77+ZIr0JWT8+NvI+OnvTpG+NoV/Qwr9Vyn0b6bQkxTl+ixF+p+m0N+qx743k+wWGlX6

The first command shows that file identifies payload as ASCII text, so we use head to print its first lines. From that content (letters, digits, + and / arranged in fixed-length lines), you can easily deduce that it is base64-encoded.

Base64 is a method for encoding binary data as ASCII text. It is commonly used in web services to ensure that binary data sent over the network is not accidentally corrupted by text-only systems. To decode base64 we use the following command:
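A quick way to see why this payload is recognizable: base64 maps every 3 input bytes to 4 output characters, and gzip streams begin with the magic bytes 1f 8b followed by the compression method byte, which is in practice always 08 (deflate). This Python sketch, using only the standard base64 module, shows that those three bytes encode to H4sI, exactly how payload starts.

```python
import base64

# gzip magic (1f 8b) followed by compression method 08 (deflate)
gzip_prefix = bytes([0x1F, 0x8B, 0x08])

encoded = base64.b64encode(gzip_prefix)
print(encoded)                    # b'H4sI'  (3 bytes -> 4 characters)

# Decoding reverses the mapping, like `base64 -d`
decoded = base64.b64decode(encoded)
print(decoded == gzip_prefix)     # True
```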

$ base64 -d payload > payload_decoded

$ file payload_decoded 
payload_decoded: gzip compressed data, last modified: Mon Apr 10 19:08:12 2017, from Unix, original size modulo 2^32 808960

$ file -z payload_decoded
payload_decoded: POSIX tar archive (GNU) (gzip compressed data, last modified: Mon Apr 10 19:08:12 2017, from Unix)

In the last command we used file's -z option, which makes it look inside compressed files. It shows that the data is wrapped in several layers: the outer layer is gzip compression, and inside it there is a tar archive (tar bundles files together; it does not compress them by itself).

To decompress and extract the archive, we use the following command:

# -x : Extract files from an archive
# -v : Verbose
# -z : Filter the archive through gzip
# -f : Specify the archive file
$ tar xvzf payload_decoded
ctf
67b8601
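The three layers can also be peeled back in Python using only the standard library. This is a sketch: the function name unpack_payload is hypothetical, and it applies tarfile's "data" extraction filter when the running Python provides it, so that a hostile archive cannot write outside the destination directory (a sensible precaution with unknown files).

```python
import base64
import gzip
import io
import tarfile

def unpack_payload(path: str, dest: str = ".") -> list[str]:
    """Undo the base64 -> gzip -> tar layering and extract the members."""
    with open(path, "rb") as f:
        # Layer 1: base64 text (newlines are ignored by b64decode by default)
        raw = base64.b64decode(f.read())
    # Layer 2: gzip compression
    tar_bytes = gzip.decompress(raw)
    # Layer 3: tar archive
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        names = tar.getnames()
        # Use the safe "data" filter where available (newer Python releases)
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return names
```

Called on payload, it should return the two member names, ctf and 67b8601, and extract them into dest.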

$ file ctf 67b8601       
ctf:     ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=29aeb60bcee44b50d1db3a56911bd1de93cd2030, stripped
67b8601: PC bitmap, Windows 3.x format, 512 x 512 x 24, image size 786434, resolution 7872 x 7872 px/m, 1165950976 important colors, cbSize 786488, bits offset 54

The archive contains two files: ctf and 67b8601. ctf is a stripped, dynamically linked 64-bit ELF executable, and 67b8601 is a 512 x 512 pixel bitmap (BMP).

[Image: the 67b8601 bitmap]


Inspecting the ELF file

Now we are going to run the ELF binary. (Running unknown binaries on your own machine is not advisable; in my case, I am doing it inside a virtual machine.)

$ ./ctf                                 
./ctf: error while loading shared libraries: lib5ae9b7f.so: cannot open shared object file: No such file or directory

Before the binary even starts, the dynamic linker reports that a library called lib5ae9b7f.so was not found. To find out which other libraries the binary needs, we will use the ldd tool. ldd lists the shared objects a binary depends on and where those dependencies are located on our system. Be careful with ldd: it may execute the binary to determine its dependencies, so it is not safe to use on unknown or potentially malicious binaries.

$ ldd ctf 
	linux-vdso.so.1 (0x00007fff7efbd000)
	lib5ae9b7f.so => not found
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f840dc00000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f840df94000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f840da1e000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f840deb5000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f840dfd2000)	

Reviewing the previous output, only one dependency is not found: lib5ae9b7f.so.
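Because ldd may execute the program, a safer habit with untrusted binaries is to read the dependency list statically, for example with readelf -d, which prints the NEEDED entries of the dynamic section. The same information can be pulled out with a short Python sketch. It assumes a 64-bit little-endian ELF that still has its section header table (stripping removes .symtab but normally keeps the section headers); the name needed_libraries is hypothetical.

```python
import struct

SHT_DYNAMIC = 6          # section type of .dynamic
DT_NULL, DT_NEEDED = 0, 1

def needed_libraries(path: str) -> list[str]:
    """List DT_NEEDED entries of a 64-bit little-endian ELF without running it."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    # Section header table location and geometry, read from the ELF header
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x3A)

    def section(index):
        # Elf64_Shdr: name, type, flags, addr, offset, size, link, info, align, entsize
        return struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + index * e_shentsize)

    libs = []
    for i in range(e_shnum):
        _, sh_type, _, _, sh_offset, sh_size, sh_link, _, _, _ = section(i)
        if sh_type != SHT_DYNAMIC:
            continue
        # sh_link of .dynamic is the index of its string table (.dynstr)
        str_offset = section(sh_link)[4]
        # Elf64_Dyn entries: signed 8-byte tag, 8-byte value
        for pos in range(sh_offset, sh_offset + sh_size, 16):
            d_tag, d_val = struct.unpack_from("<qQ", data, pos)
            if d_tag == DT_NULL:
                break
            if d_tag == DT_NEEDED:
                start = str_offset + d_val
                end = data.index(b"\x00", start)
                libs.append(data[start:end].decode())
    return libs
```

Run on ctf, the list should include lib5ae9b7f.so without the binary ever being executed.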

The missing library has an unusual name, which suggests that we will not find it in any repository; it is more likely hidden somewhere among the files we already have. So we use grep to look for it as follows:

$ grep "ELF" *                    
grep: 67b8601: match in binary file
grep: ctf: match in binary file

Here grep searches for the pattern (“ELF”) in every file in the current directory. We search for this pattern because ELF binaries begin with magic bytes: a fixed sequence of bytes indicating that we are indeed dealing with an ELF file. For ELF files, the magic bytes are the 4 bytes 0x7f 0x45 0x4c 0x46, that is, 0x7f followed by the ASCII characters ELF. These magic bytes are part of the ELF file header. Note that grep "ELF" matches those three letters anywhere in a file, so a hit is only a hint, not proof.

As the previous output shows, two matches were found. The first is in ctf, which is expected since file identified it as an ELF binary. The second is in 67b8601, which is strange since, according to file, it is not an ELF file but a bitmap. This suggests that an ELF binary is probably hidden inside the BMP file.
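Because grep only tells us that a file matches, it helps to know where the full 4-byte signature appears. The Python sketch below (find_elf_headers is a hypothetical helper) searches for 0x7f 'E' 'L' 'F' and returns every offset; each hit is only a candidate that still has to be validated by parsing the header.

```python
def find_elf_headers(path: str) -> list[int]:
    """Return every file offset where the 4-byte ELF magic appears."""
    magic = b"\x7fELF"
    with open(path, "rb") as f:
        data = f.read()
    offsets = []
    pos = data.find(magic)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(magic, pos + 1)  # keep scanning after this hit
    return offsets
```

Judging by the hex dump in the next step, running it on 67b8601 should report offset 0x34 (52).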


Inspecting the bitmap with xxd

To inspect 67b8601 we have to work at the byte level, and for this we use the xxd tool.

$ xxd 67b8601 | head
00000000: 424d 3800 0c00 0000 0000 3600 0000 2800  BM8.......6...(.
00000010: 0000 0002 0000 0002 0000 0100 1800 0000  ................
00000020: 0000 0200 0c00 c01e 0000 c01e 0000 0000  ................
00000030: 0000 0000 [7f45 4c46] 0201 0100 0000 0000  .....ELF........
00000040: 0000 0000 0300 3e00 0100 0000 7009 0000  ......>.....p...
00000050: 0000 0000 4000 0000 0000 0000 7821 0000  ....@.......x!..
00000060: 0000 0000 0000 0000 4000 3800 0700 4000  ........@.8...@.
00000070: 1b00 1a00 0100 0000 0500 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 f40e 0000 0000 0000 f40e 0000  ................

The output above shows the ELF magic bytes mentioned earlier, at offset 0x34. This gives us the beginning of the ELF file; finding its end is harder, because ELF binaries have no magic bytes marking the end of the file.

To determine the total size of the file, we must first inspect the header of the ELF binary. Since we already know where it starts (offset 0x34, or 52 in decimal) and the size of a 64-bit ELF header (64 bytes), we can use the dd tool to extract the header.

$ dd skip=52 count=64 if=67b8601 of=elf_header bs=1                                                   
64+0 records in
64+0 records out
64 bytes copied, 0.000793126 s, 80.7 kB/s
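As an alternative to carving the header with dd and parsing it with readelf, the 64-byte ELF64 header (Elf64_Ehdr) can be decoded directly from the bitmap with Python's struct module. The field order follows the ELF64 header layout; the helper name parse_elf64_header is hypothetical, and the sketch only accepts 64-bit little-endian headers.

```python
import struct
from collections import namedtuple

# Elf64_Ehdr, little-endian: 16-byte e_ident followed by the fixed-size fields
EHDR_FORMAT = "<16sHHIQQQIHHHHHH"   # struct.calcsize(EHDR_FORMAT) == 64
Elf64Ehdr = namedtuple(
    "Elf64Ehdr",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)

def parse_elf64_header(data: bytes, offset: int = 0) -> Elf64Ehdr:
    """Decode a 64-byte ELF64 little-endian header starting at `offset`."""
    hdr = Elf64Ehdr._make(struct.unpack_from(EHDR_FORMAT, data, offset))
    if hdr.e_ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    if hdr.e_ident[4] != 2 or hdr.e_ident[5] != 1:  # ELFCLASS64, ELFDATA2LSB
        raise ValueError("not a 64-bit little-endian ELF header")
    return hdr
```

Applied to the bytes of 67b8601 at offset 0x34, it should yield e_shoff = 8568, e_shentsize = 64 and e_shnum = 27, the same values readelf reports below.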

Parsing the extracted ELF header with readelf

To parse the extracted ELF header, we will use the readelf tool.

$ readelf -h elf_header           
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
readelf: Error: Too many program headers - 0x7 - the file is not that big
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x970
  Start of program headers:          64 (bytes into file)
  Start of section headers:          8568 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         7
  Size of section headers:           64 (bytes)
  Number of section headers:         27
  Section header string table index: 26
readelf: Error: Reading 1728 bytes extends past end of file for section headers
readelf: Error: Too many program headers - 0x7 - the file is not that big

With the information above, we can determine the total size of the ELF file as follows. First, recall that in binaries produced by standard toolchains the section header table is usually the last part of the file, and the offset of that table appears in the previous output:

   Start of section headers: 8568 (bytes in file)

Second, the header also tells us the size of each section header:

   Size of section headers: 64 (bytes)

And finally, it also tells us the number of section headers in the table.

   Number of section headers: 27

With this information you can calculate the full size of the ELF file as follows:

\begin{align*}
\text{size} &= e\_shoff + (e\_shnum \times e\_shentsize) \\
&= 8568 + (27 \times 64) \\
&= 10296
\end{align*}
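The same computation, followed by the carving step that dd performs next, can be sketched in Python. Like the formula, it assumes the section header table sits at the end of the embedded file; elf64_size_from_header and carve_elf are hypothetical names. The field offsets 0x28 (e_shoff), 0x3A (e_shentsize) and 0x3C (e_shnum) come from the ELF64 header layout.

```python
import struct

def elf64_size_from_header(data: bytes, start: int) -> int:
    """Estimate an embedded ELF64's size as e_shoff + e_shnum * e_shentsize.

    Assumes the section header table is the last thing in the file,
    which is typical for toolchain output but not guaranteed.
    """
    (e_shoff,) = struct.unpack_from("<Q", data, start + 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, start + 0x3A)
    return e_shoff + e_shnum * e_shentsize

def carve_elf(carrier_path: str, start: int, out_path: str) -> int:
    """Copy the embedded ELF out of the carrier file, like dd skip=... count=..."""
    with open(carrier_path, "rb") as f:
        data = f.read()
    size = elf64_size_from_header(data, start)
    with open(out_path, "wb") as f:
        f.write(data[start:start + size])
    return size
```

With the header values above, carve_elf("67b8601", 0x34, "lib5ae9b7f.so") should write 10296 bytes, matching the dd command that follows.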

Now that we know the total size of the ELF file embedded in the bitmap, we just need to extract it using dd again.

$ dd skip=52 count=10296 if=67b8601 of=lib5ae9b7f.so bs=1
10296+0 records in
10296+0 records out
10296 bytes (10 kB, 10 KiB) copied, 0.050333 s, 205 kB/s

$ file lib5ae9b7f.so 
lib5ae9b7f.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=5279389c64af3477ccfdf6d3293e404fd9933817, stripped

Now let's inspect the extracted library with readelf to check that it was extracted correctly.

$ readelf -hs lib5ae9b7f.so 
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x970
  Start of program headers:          64 (bytes into file)
  Start of section headers:          8568 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         7
  Size of section headers:           64 (bytes)
  Number of section headers:         27
  Section header string table index: 26

Symbol table '.dynsym' contains 22 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000000008c0     0 SECTION LOCAL  DEFAULT    9 .init
     2: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     3: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
     4: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND [...]@GLIBCXX_3.4.21 (2)
     5: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND [...]@GLIBC_2.2.5 (3)
     6: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterT[...]
     7: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMC[...]
     8: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND [...]@GLIBC_2.2.5 (3)
     9: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __[...]@GLIBC_2.4 (4)
    10: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND [...]@GLIBCXX_3.4 (5)
    11: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND memcpy@GLIBC_2.14 (6)
    12: 0000000000000bc0   149 FUNC    GLOBAL DEFAULT   12 _Z11rc4_encryptP[...]
    13: 0000000000000cb0   112 FUNC    GLOBAL DEFAULT   12 _Z8rc4_initP11rc[...]
    14: 0000000000202060     0 NOTYPE  GLOBAL DEFAULT   24 _end
    15: 0000000000202058     0 NOTYPE  GLOBAL DEFAULT   23 _edata
    16: 0000000000000b40   119 FUNC    GLOBAL DEFAULT   12 _Z11rc4_encryptP[...]
    17: 0000000000000c60     5 FUNC    GLOBAL DEFAULT   12 _Z11rc4_decryptP[...]
    18: 0000000000202058     0 NOTYPE  GLOBAL DEFAULT   24 __bss_start
    19: 00000000000008c0     0 FUNC    GLOBAL DEFAULT    9 _init
    20: 0000000000000c70    59 FUNC    GLOBAL DEFAULT   12 _Z11rc4_decryptP[...]
    21: 0000000000000d20     0 FUNC    GLOBAL DEFAULT   13 _fini

The symbol table contains function names that are hard to read, such as _Z11rc4_encryptP[...] (readelf truncates long names unless it is run with -W/--wide). Even so, we can deduce that this library provides functionality to encrypt and decrypt data.


Parsing symbols with nm

In C++, function overloading allows you to define multiple functions with the same name in a class or namespace, as long as they have different signatures. This is useful when you want several functions to perform similar tasks on different data types or with different numbers of arguments.

A function's signature uniquely identifies it within the program. Two functions can share a name, but if their signatures differ (that is, they take parameters of different types or a different number of parameters), they are considered different functions and calls to them are not ambiguous.

int sum(int a, int b) {
     return a + b;
}

double sum(double a, double b) {
     return a + b;
}

These two functions have the same name: sum, but different signatures (one takes integers and the other takes floating point numbers). Thanks to function overloading, the compiler will be able to determine which one to use based on the arguments you pass to it when calling the function.

However, the linker knows nothing about overloading. If there are multiple functions named foo, the linker cannot resolve references to foo, because it would not know which version to use.

To prevent this, the C++ compiler emits mangled function names. A mangled name is essentially a combination of the original function name and an encoding of the function's parameter types. This way, each version of an overloaded function gets a unique name, and the linker has no trouble telling them apart.
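To make the encoding concrete, here is a deliberately tiny Python decoder for a small subset of the Itanium C++ ABI mangling scheme used by GCC and Clang on Linux: the prefix _Z, a length-prefixed function name, then the parameter types, supporting only a few builtin type codes plus P (pointer), R (reference) and K (const). It is a teaching sketch; real demanglers such as c++filt and nm --demangle also handle namespaces, templates, substitutions and much more. Under this scheme, the two sum overloads above are emitted as _Z3sumii and _Z3sumdd (the return type is not encoded for ordinary functions).

```python
# Itanium C++ ABI builtin type codes (only a subset)
BUILTINS = {
    "v": "void", "b": "bool", "c": "char", "a": "signed char",
    "h": "unsigned char", "s": "short", "t": "unsigned short",
    "i": "int", "j": "unsigned int", "l": "long", "m": "unsigned long",
    "x": "long long", "y": "unsigned long long", "f": "float", "d": "double",
}

def _source_name(s: str, pos: int) -> tuple[str, int]:
    """Parse <length><identifier>, e.g. '11rc4_state_t'."""
    start = pos
    while pos < len(s) and s[pos].isdigit():
        pos += 1
    if pos == start:
        raise ValueError(f"expected a length at offset {start} (unsupported construct)")
    length = int(s[start:pos])
    if pos + length > len(s):
        raise ValueError("truncated name")
    return s[pos:pos + length], pos + length

def _type(s: str, pos: int) -> tuple[str, int]:
    """Parse one parameter type from the supported subset."""
    if pos >= len(s):
        raise ValueError("unexpected end of symbol")
    code = s[pos]
    if code == "P":                      # pointer to the following type
        inner, pos = _type(s, pos + 1)
        return inner + "*", pos
    if code == "R":                      # lvalue reference
        inner, pos = _type(s, pos + 1)
        return inner + "&", pos
    if code == "K":                      # const qualifier, printed like c++filt
        inner, pos = _type(s, pos + 1)
        return inner + " const", pos
    if code in BUILTINS:
        return BUILTINS[code], pos + 1
    if code.isdigit():                   # user-defined type such as a struct
        return _source_name(s, pos)
    raise ValueError(f"unsupported type code {code!r}")

def demangle(symbol: str) -> str:
    """Demangle a simple, non-nested Itanium C++ function symbol."""
    if not symbol.startswith("_Z"):
        return symbol                    # plain C names are not mangled
    name, pos = _source_name(symbol, 2)
    params = []
    while pos < len(symbol):
        param, pos = _type(symbol, pos)
        params.append(param)
    if params == ["void"]:               # an empty parameter list is encoded as 'v'
        params = []
    return f"{name}({', '.join(params)})"
```

For instance, demangle("_Z8rc4_initP11rc4_state_tPhi") returns rc4_init(rc4_state_t*, unsigned char*, int), the same text c++filt produces for that symbol later in this section.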

For the analyst this is partly beneficial, since mangled names reveal the parameter types a function expects. A useful tool for demangling function names is nm.

$ nm lib5ae9b7f.so
nm: lib5ae9b7f.so: no symbols

nm reports no symbols because lib5ae9b7f.so is stripped: its regular symbol table (.symtab) has been removed. The dynamic symbol table (.dynsym) is still present, since the dynamic linker needs it, so we tell nm to read it with -D.

$ nm -D lib5ae9b7f.so 
0000000000202058 B __bss_start
                 w __cxa_finalize@GLIBC_2.2.5
0000000000202058 D _edata
0000000000202060 B _end
0000000000000d20 T _fini
                 w __gmon_start__
00000000000008c0 T _init
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 w _Jv_RegisterClasses
                 U malloc@GLIBC_2.2.5
                 U memcpy@GLIBC_2.14
                 U __stack_chk_fail@GLIBC_2.4
0000000000000c60 T _Z11rc4_decryptP11rc4_state_tPhi
0000000000000c70 T _Z11rc4_decryptP11rc4_state_tRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
0000000000000b40 T _Z11rc4_encryptP11rc4_state_tPhi
0000000000000bc0 T _Z11rc4_encryptP11rc4_state_tRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
0000000000000cb0 T _Z8rc4_initP11rc4_state_tPhi
                 U _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_createERmm@GLIBCXX_3.4.21
                 U _ZSt19__throw_logic_errorPKc@GLIBCXX_3.4

The previous output shows more information, but the function names are still mangled, so we demangle them with the --demangle option.

$ nm -D --demangle lib5ae9b7f.so 
0000000000202058 B __bss_start
                 w __cxa_finalize@GLIBC_2.2.5
0000000000202058 D _edata
0000000000202060 B _end
0000000000000d20 T _fini
                 w __gmon_start__
00000000000008c0 T _init
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 w _Jv_RegisterClasses
                 U malloc@GLIBC_2.2.5
                 U memcpy@GLIBC_2.14
                 U __stack_chk_fail@GLIBC_2.4
0000000000000c60 T rc4_decrypt(rc4_state_t*, unsigned char*, int)
0000000000000c70 T rc4_decrypt(rc4_state_t*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
0000000000000b40 T rc4_encrypt(rc4_state_t*, unsigned char*, int)
0000000000000bc0 T rc4_encrypt(rc4_state_t*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
0000000000000cb0 T rc4_init(rc4_state_t*, unsigned char*, int)
                 U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)@GLIBCXX_3.4.21
                 U std::__throw_logic_error(char const*)@GLIBCXX_3.4

Now the function names are readable, and they appear to be cryptographic functions implementing the RC4 encryption algorithm. Besides nm, we can also use c++filt to demangle function names.

$ c++filt _Z8rc4_initP11rc4_state_tPhi                           
rc4_init(rc4_state_t*, unsigned char*, int)
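Since the exported names point to RC4, it helps to recall how simple the cipher is. Below is a textbook RC4 in Python: the key-scheduling algorithm (KSA) followed by the pseudo-random generation algorithm (PRGA). It is a generic sketch, not the library's code: we have not seen how lib5ae9b7f.so lays out rc4_state_t or whether it deviates from standard RC4. Because RC4 is a stream cipher, encryption and decryption are the same XOR with the keystream.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes: key scheduling (KSA), then generation (PRGA)."""
    if not key:
        raise ValueError("RC4 key must not be empty")
    # KSA: start from the identity permutation and shuffle it under the key
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # PRGA: keep swapping and emit one keystream byte per step
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        yield S[(S[i] + S[j]) % 256]

def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt `data`; both are an XOR with the same keystream."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))
```

Decrypting is just calling rc4 again with the same key, so rc4(key, rc4(key, message)) == message for any non-empty key.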

At this point we have found the missing library, but before we can run the binary we must tell the dynamic linker where to look for it.

# specifying where to look for the library
$ export LD_LIBRARY_PATH=`pwd`

# running the binary
$ ./ctf

# since nothing happened we show the return code of the executed process
$ echo $?
1

The binary now runs without errors, but nothing visible happens during execution, and the exit status is 1, which conventionally indicates an error.
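The same run can be scripted in Python, which becomes handy once we start trying many inputs. This sketch (run_with_libdir is a hypothetical helper) prepends the directory to LD_LIBRARY_PATH only for the child process, instead of exporting it for the whole shell, and returns the exit status together with the captured standard output. Prepending rather than overwriting keeps any existing library paths working.

```python
import os
import subprocess

def run_with_libdir(argv, libdir, extra_env=None, timeout=10):
    """Run argv with `libdir` searched first for shared libraries.

    Returns (exit_status, stdout). The environment change applies only to
    this child process, unlike `export LD_LIBRARY_PATH=...` in the shell.
    """
    env = dict(os.environ)
    old = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = libdir + (os.pathsep + old if old else "")
    if extra_env:
        env.update(extra_env)            # any additional variables to set
    proc = subprocess.run(argv, env=env, capture_output=True,
                          text=True, timeout=timeout)
    return proc.returncode, proc.stdout
```

For example, run_with_libdir(["./ctf"], os.getcwd()) should return an exit status of 1, matching echo $? above.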


Parsing the ELF file strings

Executable files often contain text strings, typically messages printed when certain conditions are met or help messages. These strings can give us a rough idea of how the program works. For example, if a binary's strings include HTTP requests or URLs, the program probably does something web-related.

Sometimes, when analyzing malware, you can find debugging strings that the developers forgot to remove, and these messages can reveal important information about how the malware operates. This has happened with real-world malware.
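At its core, extracting strings means looking for runs of printable characters. A minimal Python approximation (extract_strings is a hypothetical name) keeps runs of at least four printable ASCII bytes, similar to the strings tool's default minimum length; the real tool supports other encodings and options that this sketch ignores.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least `min_len` printable ASCII bytes (tab included)."""
    # \x20-\x7e covers space through '~'; {n,} enforces the minimum run length
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Feeding it the bytes of a binary and filtering the result (for example, keeping strings that contain "flag" or "DEBUG") is a quick way to triage long outputs.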

To extract the strings, we use a tool called strings.

$ strings ctf             
/lib64/ld-linux-x86-64.so.2
lib5ae9b7f.so
__gmon_start__
_Jv_RegisterClasses
_ITM_deregisterTMCloneTable
_ITM_registerTMCloneTable
_Z8rc4_initP11rc4_state_tPhi
...
DEBUG: argv[1] = %s
checking '%s'
show_me_the_flag
>CMb
-v@P^:
flag = %s
guess again!
It's kinda like Louisiana. Or Dagobah. Dagobah - Where Yoda lives!
;*3$"
zPLR
GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
.shstrtab
.interp
.note.ABI-tag
.note.gnu.build-id
.gnu.hash
.dynsym
.dynstr
.gnu.version
.gnu.version_r
.rela.dyn
.rela.plt
.init
.plt.got
.text
.fini
.rodata
.eh_frame_hdr
.eh_frame
.gcc_except_table
.init_array
.fini_array
.jcr
.dynamic
.got.plt
.data
.bss
.comment

In the output above we can see the string DEBUG: argv[1] = %s, which suggests that the program expects a command-line argument. The purpose of the other strings, apart from the section names, is not yet clear.

Recall from the earlier analysis that the binary uses RC4 encryption, so perhaps the message “It's kinda like Louisiana. Or Dagobah. Dagobah - Where Yoda lives!” is used as an encryption key. Now let's run a few tests on the binary to see what happens.

# using a random string
$ ./ctf "foo"
checking 'foo'

# using a string found in the output of the strings command
$ ./ctf "show_me_the_flag"
checking 'show_me_the_flag'
ok

# the previous execution shows the message ok, however
# we still have a status code equal to 1 (error)
$ echo $?
1

Analyzing system calls (strace)

We are going to analyze the system calls the binary makes during execution using the strace tool. This gives us a high-level view of what the binary does.

$ strace ./ctf "show_me_the_flag"
execve("./ctf", ["./ctf", "show_me_the_flag"], 0x7ffea5dae308 /* 59 vars */) = 0
brk(NULL)                               = 0xde4000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f36f1196000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/glibc-hwcaps/x86-64-v3/lib5ae9b7f.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/glibc-hwcaps/x86-64-v3", 0x7ffc3be97090, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/glibc-hwcaps/x86-64-v2/lib5ae9b7f.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/glibc-hwcaps/x86-64-v2", 0x7ffc3be97090, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/lib5ae9b7f.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\t\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=10296, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 4202592, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f36f0d93000
mmap(0x7f36f0e00000, 2105440, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0x7f36f0e00000
munmap(0x7f36f0d93000, 446464)          = 0
munmap(0x7f36f1003000, 1646688)         = 0
mprotect(0x7f36f0e01000, 2097152, PROT_NONE) = 0
mmap(0x7f36f1001000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7f36f1001000
close(3)                                = 0
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=95594, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 95594, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f36f117e000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=2432256, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 2445696, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f36f0a00000
mmap(0x7f36f0a9c000, 1159168, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x9c000) = 0x7f36f0a9c000
mmap(0x7f36f0bb7000, 577536, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b7000) = 0x7f36f0bb7000
mmap(0x7f36f0c44000, 57344, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x244000) = 0x7f36f0c44000
mmap(0x7f36f0c52000, 12672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f36f0c52000
close(3)                                = 0
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=141720, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 144232, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f36f115a000
mmap(0x7f36f115d000, 110592, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f36f115d000
mmap(0x7f36f1178000, 16384, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e000) = 0x7f36f1178000
mmap(0x7f36f117c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x21000) = 0x7f36f117c000
close(3)                                = 0
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220x\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=1926256, ...}, AT_EMPTY_PATH) = 0
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 1974096, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f36f081e000
mmap(0x7f36f0844000, 1396736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x7f36f0844000
mmap(0x7f36f0999000, 344064, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17b000) = 0x7f36f0999000
mmap(0x7f36f09ed000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1cf000) = 0x7f36f09ed000
mmap(0x7f36f09f3000, 53072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f36f09f3000
close(3)                                = 0
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=907784, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 909560, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f36f107b000
mmap(0x7f36f108b000, 471040, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x10000) = 0x7f36f108b000
mmap(0x7f36f10fe000, 368640, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x83000) = 0x7f36f10fe000
mmap(0x7f36f1158000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xdc000) = 0x7f36f1158000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f36f1079000
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f36f1076000
arch_prctl(ARCH_SET_FS, 0x7f36f1076740) = 0
set_tid_address(0x7f36f1076a10)         = 80274
set_robust_list(0x7f36f1076a20, 24)     = 0
rseq(0x7f36f1077060, 0x20, 0, 0x53053053) = 0
mprotect(0x7f36f09ed000, 16384, PROT_READ) = 0
mprotect(0x7f36f1158000, 4096, PROT_READ) = 0
mprotect(0x7f36f117c000, 4096, PROT_READ) = 0
mprotect(0x7f36f0c44000, 45056, PROT_READ) = 0
mprotect(0x7f36f1001000, 4096, PROT_READ) = 0
mprotect(0x601000, 4096, PROT_READ)     = 0
mprotect(0x7f36f11c8000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x7f36f117e000, 95594)           = 0
futex(0x7f36f0c527bc, FUTEX_WAKE_PRIVATE, 2147483647) = 0
getrandom("\xe5\x59\x51\x9f\xcb\x6d\xfb\x91", 8, GRND_NONBLOCK) = 8
brk(NULL)                               = 0xde4000
brk(0xe05000)                           = 0xe05000
newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}, AT_EMPTY_PATH) = 0
write(1, "checking 'show_me_the_flag'\n", 28checking 'show_me_the_flag'
) = 28
write(1, "ok\n", 3ok
)                     = 3
exit_group(1)                           = ?
+++ exited with 1 +++

In the output above, strace includes all the calls made by the program interpreter (the dynamic loader) to initialize the process, so the output contains more information than we need. We can also see how the loader searches for the libraries and maps them into memory. Only at the call write(1, "checking 'show_me_the_flag'\n", 28) = 28 do we begin to see behavior specific to the program.

From that call onward, we see write calls that print the messages to the screen, and finally the process exits with status 1 via exit_group(1). In this case strace did not give us much useful information, so we must use other methods to analyze the binary.


Analyzing library calls (ltrace)

To see the library calls made by ctf, we will use ltrace.

# -i : show the instruction pointer at the time of the library call
# -C : demangle names
$ ltrace -i -C ./ctf "show_me_the_flag"
[0x400fe9] __libc_start_main(0x400bc0, 2, 0x7ffe1aab28b8, 0x4010c0 <unfinished ...>
[0x400c44] __printf_chk(1, 0x401158, 0x7ffe1aab311b, 384checking 'show_me_the_flag'
)                                                                                          = 28
[0x400c51] strcmp("show_me_the_flag", "show_me_the_flag")                                                                                          = 0
[0x400cf0] puts("ok"ok
)                                                                                                                              = 3
[0x400d07] rc4_init(rc4_state_t*, unsigned char*, int)(0x7ffe1aab2650, 0x4011c0, 66, 0x7f9191915ad0)                                               = 0
[0x400d14] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(char const*)(0x7ffe1aab2590, 0x40117b, 58, 3)   = 0x7ffe1aab2590
[0x400d29] rc4_decrypt(rc4_state_t*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)(0x7ffe1aab25f0, 0x7ffe1aab2650, 0x7ffe1aab2590, 0x7e889f91) = 0x7ffe1aab25f0
[0x400d36] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)(0x7ffe1aab2590, 0x7ffe1aab25f0, 0x7ffe1aab2600, 0) = 0x7ffe1aab25a0
[0x400d53] getenv("GUESSME")                                                                                                                       = nil
[0xffffffffffffffff] +++ exited (status 1) +++

The first library call we see is __libc_start_main, which the _start function calls to perform initialization and transfer control to the program's main function.

The second call is __printf_chk which displays the message seen above: checking 'show_me_the_flag'.

The third call is to strcmp, which compares the string we passed as an argument with a string embedded in the code, which is also show_me_the_flag.

The fourth call is simply to puts and displays the message "ok".

So far this is behavior we already knew about; from the fifth call onward, however, we see new information.

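For example, we see RC4 being initialized through the call to rc4_init. Its int argument (the third parameter in the demangled signature rc4_init(rc4_state_t*, unsigned char*, int)) is 66, which is exactly the length of the string “It's kinda like Louisiana. Or Dagobah. Dagobah - Where Yoda lives!”; this supports the hypothesis that this string is the key. Next, data is assigned to a C++ string, probably encrypted data, since the following call passes that string to rc4_decrypt, and the decrypted message is then assigned to another C++ string.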

Finally, a call is made to getenv, a standard C library function that returns the value of an environment variable (or NULL if it is not set, which ltrace displays as nil). The binary looks for an environment variable called “GUESSME”, so we are going to define this variable and run ltrace again.

$ GUESSME="foo" ltrace -i -C ./ctf "show_me_the_flag"
...
[0x400d53] getenv("GUESSME")                                                                                                                       = "foo"
[0x400d6e] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(char const*)(0x7ffcd5851ff0, 0x401183, 5, 5)    = 0x7ffcd5851ff0
[0x400d88] rc4_decrypt(rc4_state_t*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)(0x7ffcd5852050, 0x7ffcd5852090, 0x7ffcd5851ff0, 0xa0a6e0) = 0x7ffcd5852050
[0x400d9a] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)(0x7ffcd5851ff0, 0x7ffcd5852050, 0xa0a730, 0) = 0xa0a6e0
[0x400db4] operator delete(void*)(0xa0a730, 0xa0a730, 21, 0)                                                                                       = 0
[0x400dd7] puts("guess again!"guess again!
)                                                                                                                    = 13
[0x400c8d] operator delete(void*)(0xa0a6e0, 0xa0a2b0, 0, 0x7f1734b15ad0)                                                                           = 0
[0xffffffffffffffff] +++ exited (status 1) +++

After the call to getenv, another C++ string is assigned and decrypted. Unfortunately, between the decryption and the moment the "guess again!" message is printed, the trace gives no clue about the expected value of GUESSME. This suggests that the comparison is done directly in the program's own code, without calling a library function such as strcmp, so ltrace cannot see it.


Analyzing the behavior of the ELF at the instruction level using objdump

In the previous trace, ltrace -i reported the call to puts that prints "guess again!" at address 0x400dd7. That is the address of the instruction right after the call (the return address), so the call itself sits just before it. We are going to use objdump to examine the instructions around that address.

$ objdump -M intel -d ctf | grep "400dd7" -A 10 -B 10
  400db4:	48 8b 4c 24 20       	mov    rcx,QWORD PTR [rsp+0x20]
  400db9:	31 c0                	xor    eax,eax
  400dbb:	0f 1f 44 00 00       	nop    DWORD PTR [rax+rax*1+0x0]
  400dc0:	0f b6 14 03          	movzx  edx,BYTE PTR [rbx+rax*1]
  400dc4:	84 d2                	test   dl,dl
  400dc6:	74 05                	je     400dcd <__gmon_start__@plt+0x21d>
  400dc8:	3a 14 01             	cmp    dl,BYTE PTR [rcx+rax*1]
  400dcb:	74 13                	je     400de0 <__gmon_start__@plt+0x230>
  400dcd:	bf af 11 40 00       	mov    edi,0x4011af
  400dd2:	e8 d9 fc ff ff       	call   400ab0 <puts@plt>
  400dd7:	e9 84 fe ff ff       	jmp    400c60 <__gmon_start__@plt+0xb0>
  400ddc:	0f 1f 40 00          	nop    DWORD PTR [rax+0x0]
  400de0:	48 83 c0 01          	add    rax,0x1
  400de4:	48 83 f8 15          	cmp    rax,0x15
  400de8:	75 d6                	jne    400dc0 <__gmon_start__@plt+0x210>
  400dea:	48 8d 7c 24 40       	lea    rdi,[rsp+0x40]
  400def:	be 99 11 40 00       	mov    esi,0x401199
  400df4:	e8 47 fd ff ff       	call   400b40 <_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6assignEPKc@plt>
  400df9:	48 8d 54 24 40       	lea    rdx,[rsp+0x40]
  400dfe:	48 8d b4 24 c0 00 00 	lea    rsi,[rsp+0xc0]

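By analyzing the assembly code above, we can see that it is a loop that compares two strings byte by byte: rax is the index, it is incremented at 400de0 and compared against 0x15 (21) at 400de4, and the jne at 400de8 jumps back to the top of the loop at 400dc0 until all 21 bytes have been checked.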

The instruction at 400dc0 loads the byte at rbx + rax (the element of the array pointed to by rbx, indexed by rax) into edx, zero-extending it. The array that rbx points to is probably the value we supplied through GUESSME.

400dc0: 0f b6 14 03 movzx edx,BYTE PTR [rbx+rax*1] ; edx = rbx[rax]

The instructions at 400dc4 and 400dc6 check whether the byte just loaded into dl is a null byte, meaning our string has ended. If it is, execution jumps to 400dcd, the failure path that prints "guess again!".

400dc4: 84 d2 test dl,dl
400dc6: 74 05 je 400dcd <__gmon_start__@plt+0x21d>

The instruction at 400dc8 is the key one: it compares the current byte of our string with the byte at the same position in the string stored in ctf. If they match, execution jumps to 400de0 to move on to the next byte; otherwise it falls through to the failure path. This means that rcx points to the hidden string.

400dc8: 3a 14 01 cmp dl,BYTE PTR [rcx+rax*1] ; rbx[rax] == rcx[rax]
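The loop can be paraphrased in Python to make its logic explicit. This is our own reconstruction from the disassembly, not decompiler output: guess stands for the buffer at rbx (assumed to be the GUESSME value), secret for the buffer at rcx, and the function name guess_matches is hypothetical. Because Python bytes objects carry no NUL terminator, the sketch treats any position past the end of guess as 0, which is what the C string's terminator would produce.

```python
def guess_matches(guess: bytes, secret: bytes, length: int = 0x15) -> bool:
    """Model of the comparison loop at 0x400dc0-0x400de8 (hypothetical reconstruction)."""
    for i in range(length):  # rax runs from 0 until it reaches 0x15 (cmp rax,0x15 / jne 400dc0)
        # movzx edx, BYTE PTR [rbx+rax*1]; past the end we model the C NUL terminator as 0
        g = guess[i] if i < len(guess) else 0
        if g == 0:  # test dl,dl / je 400dcd: our string ended early -> "guess again!"
            return False
        if g != secret[i]:  # cmp dl,BYTE PTR [rcx+rax*1]: a mismatch falls through to 400dcd
            return False
    return True  # all 0x15 bytes matched: execution falls through to 400dea
```

With secret = b"Crackers Don't Matter" (the value recovered with gdb later in this post), guess_matches(b"foo", secret) returns False, just like the ltrace run with GUESSME="foo". The model also highlights a property of the loop: since it checks exactly 0x15 bytes and never looks for a terminator in our input, a longer guess whose first 21 bytes are correct would also pass. We have not verified this against the real binary.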

We now know where to find the string that GUESSME must contain. However, that string is decrypted at runtime and is not available statically in the file, so we need to perform dynamic analysis.


Dumping a dynamic string buffer using gdb

One of the most widely used tools for dynamic analysis on GNU/Linux is gdb (the GNU Debugger).

$ gdb ./ctf   
GNU gdb (Debian 13.2-1) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./ctf...
(No debugging symbols found in ./ctf)
(gdb) b *0x400dc8
Breakpoint 1 at 0x400dc8
(gdb) set env GUESSME=foobar
(gdb) run show_me_the_flag
Starting program: /home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/ctf show_me_the_flag
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
checking 'show_me_the_flag'
ok

Breakpoint 1, 0x0000000000400dc8 in ?? ()
(gdb) display/i $pc
1: x/i $pc
=> 0x400dc8:	cmp    (%rcx,%rax,1),%dl
(gdb) info registers rcx
rcx            0x6156e0            6379232
(gdb) x/s 0x6156e0
0x6156e0:	"Crackers Don't Matter"
(gdb) quit
A debugging session is active.

	Inferior 1 [process 150208] will be killed.

Quit anyway? (y or n) y
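The interactive session above can also be scripted. The sketch below, assuming gdb is on your PATH and ctf is in the current directory, builds a command line that uses gdb's -batch mode with one -ex option per command (both standard gdb options) and collapses the info registers and x/s steps into a single x/s $rcx. The helper names gdb_dump_argv and run_gdb_dump are hypothetical.

```python
import subprocess


def gdb_dump_argv(binary: str, bp_addr: int, env: dict[str, str],
                  prog_args: list[str], expr: str = "$rcx") -> list[str]:
    """Build a non-interactive gdb command line that stops at bp_addr and prints expr as a C string."""
    argv = ["gdb", "-q", "-batch", "-ex", f"break *{bp_addr:#x}"]
    for name, value in env.items():
        argv += ["-ex", f"set env {name}={value}"]  # same as typing it at the (gdb) prompt
    # Arguments are joined with spaces; this simple sketch does not quote them.
    argv += ["-ex", "run " + " ".join(prog_args),  # runs until the breakpoint is hit
             "-ex", f"x/s {expr}",                 # examine memory as a NUL-terminated string
             binary]
    return argv


def run_gdb_dump(argv: list[str]) -> str:
    """Run the gdb command line and return whatever it printed on standard output."""
    return subprocess.run(argv, capture_output=True, text=True, check=False).stdout
```

Calling run_gdb_dump(gdb_dump_argv("./ctf", 0x400dc8, {"GUESSME": "foobar"}, ["show_me_the_flag"])) should end with the same kind of x/s line as the manual session, although the address will likely differ on your machine.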

The first command we run in gdb sets a breakpoint that stops execution at address 0x400dc8, where the important comparison is made.

(gdb) b *0x400dc8

With the second command we set the GUESSME environment variable, and with the third we run the program, passing "show_me_the_flag" as an argument.

(gdb) set env GUESSME=foobar
(gdb) run show_me_the_flag

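Once the program stops at the breakpoint, display/i $pc tells gdb to print the instruction at the current program counter ($pc) every time execution stops. gdb uses AT&T syntax by default, so the operands appear in the opposite order from objdump's Intel output; set disassembly-flavor intel switches gdb to Intel syntax.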

(gdb) display/i $pc
1: x/i $pc
=> 0x400dc8: cmp (%rcx,%rax,1),%dl

Since we already know that the string we are after is pointed to by rcx, we ask gdb for the value that register holds at this precise moment.

(gdb) info registers rcx
rcx 0x6156e0 6379232

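Next, x/s 0x6156e0 tells gdb to examine memory (x = examine) at address 0x6156e0 and print it as a NUL-terminated C string (s = string). Writing x/s $rcx works just as well and saves copying the address, which will likely differ on your machine.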

(gdb) x/s 0x6156e0
0x6156e0: "Crackers Don't Matter"

Now we can see the missing string: "Crackers Don't Matter". It is exactly 21 characters long, which matches the 0x15 loop bound we saw in the disassembly.
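What x/s does is simple: it reads bytes starting at an address until it reaches a NUL byte. As an illustration only (it reads memory inside the Python process itself, not inside ctf), Python's ctypes.string_at performs the same operation; read_c_string is a hypothetical helper name.

```python
import ctypes


def read_c_string(address: int) -> bytes:
    """Read bytes from address up to (not including) the first NUL, like gdb's x/s.

    Only valid for addresses inside this Python process; a bad address will crash it.
    """
    return ctypes.string_at(address)  # size defaults to -1: stop at the first NUL byte


# Stand-in for the heap buffer rcx pointed to; create_string_buffer appends a NUL terminator.
buf = ctypes.create_string_buffer(b"Crackers Don't Matter")
addr = ctypes.addressof(buf)
print(read_c_string(addr))

# Reading stops at the first NUL, even if more bytes follow it.
short = ctypes.create_string_buffer(b"abc\x00def")
print(read_c_string(ctypes.addressof(short)))
```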

All that remains is to run the binary again with GUESSME set to this value.

$ GUESSME="Crackers Don't Matter" ./ctf "show_me_the_flag"       
checking 'show_me_the_flag'
ok
flag = 84b34c124b2ba5ca224af8e33b077e9e
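The GUESSME="..." prefix in the shell sets the variable only for that one command. If you want to script this last step, for instance to try several candidate values, a minimal Python sketch could copy the current environment, add the variable, and pass it to the child process; run_with_env is a hypothetical helper name.

```python
import os
import subprocess


def run_with_env(cmd: list[str], name: str, value: str) -> subprocess.CompletedProcess:
    """Run cmd with one extra environment variable, like the shell's NAME=value prefix."""
    env = dict(os.environ)  # copy, so the parent's environment is left untouched
    env[name] = value
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)
```

For example, run_with_env(["./ctf", "show_me_the_flag"], "GUESSME", "Crackers Don't Matter").stdout should contain the flag line shown above. Passing the command as a list avoids having to quote the apostrophe in the value.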

© 2023. All rights reserved.