Basic Binary Analysis
Learn the procedure to perform basic binary analysis on Linux.
Before you start reading this blog, I recommend that you download the following file so that you can replicate the procedure from your Linux machine.
Identifying files with file
When we are dealing with an unknown file, the first thing we must do is determine what type of file it is to know what we can do with it. Otherwise we would be doing blind tests.
To determine the type of file we are dealing with, we use the file
tool which is based mainly on file patterns such as magic bytes (0x7f ELF in the case of ELF files).
$ file payload
payload: ASCII text
$ head payload
H4sIABzY61gAA+xaD3RTVZq/Sf+lFJIof1r+2aenKKh0klJKi4MmJaUvWrTSFlgR0jRN20iadpKX
UljXgROKjbUOKuOfWWfFnTlzZs/ZXTln9nTRcTHYERhnZ5c/R2RGV1lFTAFH/DNYoZD9vvvubd57
bcBl1ln3bL6e9Hvf9+733e/+v+/en0dqId80WYAWLVqI3LpooUXJgUpKFy6yEOsCy6KSRQtLLQsW
EExdWkIEyzceGVA4JLmDgkCaA92XTXel9/9H6ftVNcv0Ot2orCe3E5RiJhuVbUw/fH3SxkbKSS78
v47MJtkgZynS2YhNxYeZa84NLF0G/DLhV66X5XK9TcVnsXSc6xQ8S1UCm4o/M5moOCHCqB3Geny2
rD0+u1HFD7I4junVdnpmN8zshll6zglPr1eXL5P96pm+npWLcwdL51CkR6r9UGrGZ8O1zN+1NhUv
ZelKNXb3gl02+fpkZnwFyy9VvQgsfs55O3zH72sqK/2Ov3m+3xcId8/vLi+bX1ZaHOooLqExmVna
6rsbaHpejwKLeQqR+wC+n/ePA3n/duKu2kNvL175+MxD7z75W8GC76aSZLv1xgSdkGnLRV0+/KbD
7+UPnnhwadWbZ459b/Wsl/o/NZ468olxo3P9wOXK3Qe/a8fRmwhvcTVdl0J/UDe+nzMp9M4U+n9J
oX8jhT5HP77+ZIr0JWT8+NvI+OnvTpG+NoV/Qwr9Vyn0b6bQkxTl+ixF+p+m0N+qx743k+wWGlX6
The first command shows the functionality of file
which tells us that we are dealing with a text file (ASCII) so we proceed to use head
to print the first lines of the file. The content consists only of letters, digits, and the characters + and /, which strongly suggests that it is Base64 encoded.
Base64 is a method used to encode binary data as ASCII text. It is commonly used in web services to ensure that binary data sent over the network is not accidentally corrupted by text-only services. To decode Base64 we use the following command:
$ base64 -d payload > payload_decoded
$ file payload_decoded
payload_decoded: gzip compressed data, last modified: Mon Apr 10 19:08:12 2017, from Unix, original size modulo 2^32 808960
$ file -z payload_decoded
payload_decoded: POSIX tar archive (GNU) (gzip compressed data, last modified: Mon Apr 10 19:08:12 2017, from Unix)
In the last command we used the -z option of file
to look inside compressed files. This shows us that the payload is wrapped in several layers. The outer layer is gzip
compression and the inner layer is a tar
archive.
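The same layering can be reproduced programmatically. The following is a minimal Python sketch, not part of the original procedure: it builds a small stand-in payload in memory and then peels the Base64, gzip, and tar layers by checking well-known magic bytes. The helper names identify, make_sample, and peel are made up for illustration, and identify covers only a tiny subset of what file recognizes.

```python
import base64
import gzip
import io
import tarfile


def identify(data):
    """Guess a file type from well-known magic bytes."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[257:262] == b"ustar":  # tar magic lives at offset 257
        return "tar"
    if data[:4] == b"\x7fELF":
        return "elf"
    if data[:2] == b"BM":
        return "bmp"
    return "unknown"


def make_sample():
    """Build a stand-in payload: Base64(gzip(tar(one small file)))."""
    content = b"not the real ctf binary"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="ctf")
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
    return base64.b64encode(gzip.compress(buf.getvalue()))


def peel(encoded):
    """Return the layer types found and the member names of the inner archive."""
    layers = []
    data = base64.b64decode(encoded)
    layers.append(identify(data))
    data = gzip.decompress(data)
    layers.append(identify(data))
    with tarfile.open(fileobj=io.BytesIO(data)) as tar:
        names = tar.getnames()
    return layers, names


if __name__ == "__main__":
    sample = make_sample()
    print(sample[:4])
    print(peel(sample))
```

The encoded sample starts with H4sI, the same prefix as the payload shown earlier: that is what the gzip magic bytes 1f 8b 08 look like once they are Base64 encoded.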
To extract the archive we use the following command:
# -x : Extract files from an archive
# -v : Verbose
# -z : Filter the archive through gzip
# -f : Use the given archive file
$ tar xvzf payload_decoded
ctf
67b8601
$ file ctf 67b8601
ctf: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=29aeb60bcee44b50d1db3a56911bd1de93cd2030, stripped
67b8601: PC bitmap, Windows 3.x format, 512 x 512 x 24, image size 786434, resolution 7872 x 7872 px/m, 1165950976 important colors, cbSize 786488, bits offset 54
The archive contains two files, the ctf
file and the 67b8601
file. The ctf
file is a stripped, dynamically linked 64-bit ELF executable and the 67b8601
file is a 512 x 512 pixel bitmap (BMP).
Inspecting the ELF file
Now we are going to run the ELF binary (It is not advisable to run unknown binaries on our machines, however in my case I am doing it from a virtual machine).
$ ./ctf
./ctf: error while loading shared libraries: lib5ae9b7f.so: cannot open shared object file: No such file or directory
Before we can run the binary, the linker tells us that a library called lib5ae9b7f.so
was not found. To determine what other libraries are needed to run the binary we will use the ldd
tool. This tool tells us which shared objects a binary depends on, as well as telling us where these dependencies are located in our system. Be careful with ldd
as it can run the binary to determine what dependencies are needed, so it is not safe to use ldd
with unknown or dangerous binaries.
$ ldd ctf
linux-vdso.so.1 (0x00007fff7efbd000)
lib5ae9b7f.so => not found
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f840dc00000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f840df94000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f840da1e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f840deb5000)
/lib64/ld-linux-x86-64.so.2 (0x00007f840dfd2000)
In the previous output, only one dependency is not found: lib5ae9b7f.so
. Since the name of the missing library is unusual, we are unlikely to find it in any repository, so it must be hidden somewhere else. We are going to use grep
to look for it as follows.
$ grep "ELF" *
grep: 67b8601: match in binary file
grep: ctf: match in binary file
What grep is doing is searching for matches with the specified pattern (“ELF”) in all files in the current directory. This is done because there are magic bytes at the beginning of the ELF binaries. Magic bytes are a series of bytes or characters that serve to indicate that we are indeed dealing with an ELF file. For ELF files, magic bytes consist of the following 4 bytes (0x7f 0x45 0x4c 0x46 == 0x7fELF). These magic bytes are part of the ELF file header.
As we could see in the result of the previous command, two matches were found. The first match is in the ctf
file since it is an ELF binary as indicated by the file
tool and the second match is found in the 67b8601
file which is strange since according to the result of the file
command, this is not an ELF file but a bitmap. This suggests that there is probably an ELF binary hidden inside the BMP file.
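Note that grep "ELF" matches the three letters anywhere in a file, so it can also fire on ordinary text. As a minimal sketch (the helper name find_signature and the sample data are made up for illustration), searching for the full four-byte signature, including the leading 0x7f, is stricter and also reports the offset of every hit:

```python
ELF_MAGIC = b"\x7fELF"


def find_signature(data, magic=ELF_MAGIC):
    """Return every offset at which magic occurs in data."""
    offsets = []
    pos = data.find(magic)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(magic, pos + 1)
    return offsets


if __name__ == "__main__":
    # Hypothetical stand-ins for real files: a blob with an embedded ELF
    # signature, and plain text that merely contains the letters "ELF".
    embedded = b"BM" + bytes(50) + ELF_MAGIC + b"\x02\x01\x01"
    plain_text = b"SHELF life of ELF binaries"
    print(find_signature(embedded))
    print(find_signature(plain_text))
```

On a real file, the data would come from open(path, "rb").read() instead of the in-memory samples used here.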
Inspecting the bitmap with xxd
To inspect the file 67b8601
we will have to do it at the byte level. For this the xxd
tool is used.
$ xxd 67b8601 | head
00000000: 424d 3800 0c00 0000 0000 3600 0000 2800 BM8.......6...(.
00000010: 0000 0002 0000 0002 0000 0100 1800 0000 ................
00000020: 0000 0200 0c00 c01e 0000 c01e 0000 0000 ................
00000030: 0000 0000 [7f45 4c46] 0201 0100 0000 0000 .....ELF........
00000040: 0000 0000 0300 3e00 0100 0000 7009 0000 ......>.....p...
00000050: 0000 0000 4000 0000 0000 0000 7821 0000 ....@.......x!..
00000060: 0000 0000 0000 0000 4000 3800 0700 4000 ........@.8...@.
00000070: 1b00 1a00 0100 0000 0500 0000 0000 0000 ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000090: 0000 0000 f40e 0000 0000 0000 f40e 0000 ................
In the previous result, the magic bytes of the ELF file are observed as already mentioned. With this we already have the beginning of the ELF file, however, finding the end of the file is not so easy since ELF binaries do not have magic bytes to indicate the end of the file.
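The first bytes of the dump can also be decoded by hand to cross-check what file reported earlier. The sketch below is an illustration rather than part of the original procedure: it copies the first 64 bytes from the xxd output (without the brackets) and unpacks them according to the standard 14-byte BMP file header followed by the 40-byte BITMAPINFOHEADER. The helper name parse_bmp_header is made up.

```python
import struct

# First 64 bytes of 67b8601, copied from the xxd output above.
DUMP = bytes.fromhex(
    "424d38000c0000000000360000002800"
    "00000002000000020000010018000000"
    "000002000c00c01e0000c01e00000000"
    "000000007f454c460201010000000000"
)


def parse_bmp_header(data):
    """Decode the BMP file header and the BITMAPINFOHEADER that follows it."""
    magic, file_size, _res1, _res2, pixel_offset = struct.unpack_from("<2sIHHI", data, 0)
    fields = struct.unpack_from("<IiiHHIIiiII", data, 14)
    return {
        "magic": magic,
        "file_size": file_size,
        "pixel_offset": pixel_offset,
        "info_header_size": fields[0],
        "width": fields[1],
        "height": fields[2],
        "bits_per_pixel": fields[4],
        "image_size": fields[6],
        "colors_important": fields[10],
    }


if __name__ == "__main__":
    for key, value in parse_bmp_header(DUMP).items():
        print(key, value)
    print("ELF signature at offset", DUMP.find(b"\x7fELF"))
```

The decoded values agree with the file output (512 x 512, 24 bits, bits offset 54). The ELF signature sits at offset 52, so it overlaps the last header field, which would explain the implausible "1165950976 important colors" that file printed.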
To determine the total size of the file we must first inspect the header of the ELF binary. Since we already know where it starts (offset 0x34 or 52 in decimal) and what size the header of the ELF binaries is (64 bytes) we can use the dd
tool to extract the header.
$ dd skip=52 count=64 if=67b8601 of=elf_header bs=1
64+0 records in
64+0 records out
64 bytes copied, 0.000793126 s, 80.7 kB/s
Parsing the ELF extracted with readelf
To parse the extracted ELF header, we will use the readelf
tool.
$ readelf -h elf_header
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
readelf: Error: Too many program headers - 0x7 - the file is not that big
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x970
Start of program headers: 64 (bytes into file)
Start of section headers: 8568 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 7
Size of section headers: 64 (bytes)
Number of section headers: 27
Section header string table index: 26
readelf: Error: Reading 1728 bytes extends past end of file for section headers
readelf: Error: Too many program headers - 0x7 - the file is not that big
With the above information, we can determine the total size of the ELF file as follows. First, we must remember that the last part of an ELF file is typically the section header table, and the offset to that table is given in the header above.
Start of section headers: 8568 (bytes into file)
Second, the previous header also tells us the size of each section header
Size of section headers: 64 (bytes)
And finally, it also tells us the number of section headers in the table.
Number of section headers: 27
With this information we can calculate the full size of the ELF file as follows.
size = e_shoff + (e_shnum * e_shentsize) = 8568 + (27 * 64) = 10296 bytes
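As a minimal sketch of this calculation (not the author's tooling), the following Python code unpacks the 64-byte ELF64 header and computes the size. The header bytes are copied from the xxd output above, starting at offset 0x34. The function name elf_size is made up, and the result only holds under the assumption already stated: that the section header table is the last thing in the file.

```python
import struct

# The 64-byte ELF header embedded in 67b8601 (offsets 0x34 to 0x73 of the dump).
HEADER = bytes.fromhex(
    "7f454c46020101000000000000000000"
    "03003e00010000007009000000000000"
    "40000000000000007821000000000000"
    "0000000040003800070040001b001a00"
)

# Layout of the ELF64 header for little-endian files.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def elf_size(header):
    """Return e_shoff + e_shnum * e_shentsize for a little-endian ELF64 header."""
    fields = ELF64_HEADER.unpack(header[:ELF64_HEADER.size])
    e_ident = fields[0]
    if e_ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    if e_ident[4] != 2 or e_ident[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    e_shoff = fields[6]       # offset of the section header table
    e_shentsize = fields[11]  # size of one section header
    e_shnum = fields[12]      # number of section headers
    return e_shoff + e_shnum * e_shentsize


if __name__ == "__main__":
    print(elf_size(HEADER))
```

For this header the fields are 8568, 64, and 27, which gives the 10296 bytes used as the count for the second dd invocation.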
Now that we know the total size of the ELF file inserted into the bitmap, we just need to extract it using dd
again.
$ dd skip=52 count=10296 if=67b8601 of=lib5ae9b7f.so bs=1
10296+0 records in
10296+0 records out
10296 bytes (10 kB, 10 KiB) copied, 0.050333 s, 205 kB/s
$ file lib5ae9b7f.so
lib5ae9b7f.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=5279389c64af3477ccfdf6d3293e404fd9933817, stripped
Now let’s inspect the library extracted with readelf
to determine if the binary has been extracted correctly.
$ readelf -hs lib5ae9b7f.so
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x970
Start of program headers: 64 (bytes into file)
Start of section headers: 8568 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 7
Size of section headers: 64 (bytes)
Number of section headers: 27
Section header string table index: 26
Symbol table '.dynsym' contains 22 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000000008c0 0 SECTION LOCAL DEFAULT 9 .init
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBCXX_3.4.21 (2)
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBC_2.2.5 (3)
6: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterT[...]
7: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMC[...]
8: 0000000000000000 0 FUNC WEAK DEFAULT UND [...]@GLIBC_2.2.5 (3)
9: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __[...]@GLIBC_2.4 (4)
10: 0000000000000000 0 FUNC GLOBAL DEFAULT UND [...]@GLIBCXX_3.4 (5)
11: 0000000000000000 0 FUNC GLOBAL DEFAULT UND memcpy@GLIBC_2.14 (6)
12: 0000000000000bc0 149 FUNC GLOBAL DEFAULT 12 _Z11rc4_encryptP[...]
13: 0000000000000cb0 112 FUNC GLOBAL DEFAULT 12 _Z8rc4_initP11rc[...]
14: 0000000000202060 0 NOTYPE GLOBAL DEFAULT 24 _end
15: 0000000000202058 0 NOTYPE GLOBAL DEFAULT 23 _edata
16: 0000000000000b40 119 FUNC GLOBAL DEFAULT 12 _Z11rc4_encryptP[...]
17: 0000000000000c60 5 FUNC GLOBAL DEFAULT 12 _Z11rc4_decryptP[...]
18: 0000000000202058 0 NOTYPE GLOBAL DEFAULT 24 __bss_start
19: 00000000000008c0 0 FUNC GLOBAL DEFAULT 9 _init
20: 0000000000000c70 59 FUNC GLOBAL DEFAULT 12 _Z11rc4_decryptP[...]
21: 0000000000000d20 0 FUNC GLOBAL DEFAULT 13 _fini
When inspecting the symbol table, we see functions with names that are difficult to read, such as _Z11rc4_encryptP[...]
. Even so, the names suggest that this library provides functionality to encrypt and decrypt data.
Parsing symbols with nm
In C++ language, function overloading allows you to define multiple functions with the same name in a class or namespace, as long as they have different signatures. This is useful in situations where you want multiple functions to perform similar tasks but with different data types or different numbers of arguments.
A function signature is used to uniquely identify a function in the program. This means that two functions can have the same name, but if they have different signatures (that is, they take different types of parameters or a different number of parameters), they will be considered different functions and there will be no ambiguity in their call.
int sum(int a, int b) {
return a + b;
}
double sum(double a, double b) {
return a + b;
}
These two functions have the same name: sum
, but different signatures (one takes integers and the other takes floating point numbers). Thanks to function overloading, the compiler will be able to determine which one to use based on the arguments you pass to it when calling the function.
However, the linker has no knowledge of overloading. For example, if there are multiple functions with the name foo
, the linker does not know how to resolve references to foo
; it simply cannot tell which version to use.
To prevent this from happening, the C++ compiler emits mangled function names
. A mangled name
is basically a combination of the original function name and an encoding of the function's parameters. This way, each version of the overloaded function gets a unique name and the linker has no problem identifying each of them.
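To make the encoding concrete, here is a toy mangler written in Python. It is a sketch of a very small subset of the Itanium C++ ABI scheme that GCC uses on Linux: free functions in the global namespace whose parameters are builtin types, plain named types, or pointers to those. Real manglers also handle namespaces, references, const qualifiers, templates, and substitutions for repeated types, none of which are covered here. The function names encode_type and mangle are made up.

```python
# Codes for a few builtin types in the Itanium C++ ABI mangling scheme.
BUILTIN_CODES = {
    "void": "v",
    "bool": "b",
    "char": "c",
    "unsigned char": "h",
    "int": "i",
    "unsigned int": "j",
    "long": "l",
    "unsigned long": "m",
    "float": "f",
    "double": "d",
}


def encode_type(type_name):
    """Encode a builtin type, a plain named type, or a pointer to either."""
    type_name = type_name.strip()
    if type_name.endswith("*"):
        return "P" + encode_type(type_name[:-1])
    if type_name in BUILTIN_CODES:
        return BUILTIN_CODES[type_name]
    return f"{len(type_name)}{type_name}"  # <length><name> for named types


def mangle(function_name, parameter_types):
    """Mangle a free function in the global namespace (toy subset only)."""
    encoded = "".join(encode_type(t) for t in parameter_types) or "v"
    return f"_Z{len(function_name)}{function_name}{encoded}"


if __name__ == "__main__":
    print(mangle("sum", ["int", "int"]))
    print(mangle("sum", ["double", "double"]))
    print(mangle("rc4_init", ["rc4_state_t*", "unsigned char*", "int"]))
```

The two sum overloads come out as _Z3sumii and _Z3sumdd, so the linker sees two distinct symbols. The third call reproduces _Z8rc4_initP11rc4_state_tPhi, the full symbol that nm -D reports for rc4_init later in this post.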
For the analyst, this is partly beneficial, since the mangled function names
provide information about the parameters that the function expects when called. A useful tool for demangling function names is nm
.
$ nm lib5ae9b7f.so
nm: lib5ae9b7f.so: no symbols
In the previous example, nothing interesting happened and this is because the file lib5ae9b7f.so
is stripped so we must tell nm
to analyze the dynamic symbol table.
$ nm -D lib5ae9b7f.so
0000000000202058 B __bss_start
w __cxa_finalize@GLIBC_2.2.5
0000000000202058 D _edata
0000000000202060 B _end
0000000000000d20 T _fini
w __gmon_start__
00000000000008c0 T _init
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
w _Jv_RegisterClasses
U malloc@GLIBC_2.2.5
U memcpy@GLIBC_2.14
U __stack_chk_fail@GLIBC_2.4
0000000000000c60 T _Z11rc4_decryptP11rc4_state_tPhi
0000000000000c70 T _Z11rc4_decryptP11rc4_state_tRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
0000000000000b40 T _Z11rc4_encryptP11rc4_state_tPhi
0000000000000bc0 T _Z11rc4_encryptP11rc4_state_tRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
0000000000000cb0 T _Z8rc4_initP11rc4_state_tPhi
U _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_createERmm@GLIBCXX_3.4.21
U _ZSt19__throw_logic_errorPKc@GLIBCXX_3.4
The previous result shows us more information, but we still have to do the demangling
of the function names. So we do it with the --demangle
parameter.
$ nm -D --demangle lib5ae9b7f.so
0000000000202058 B __bss_start
w __cxa_finalize@GLIBC_2.2.5
0000000000202058 D _edata
0000000000202060 B _end
0000000000000d20 T _fini
w __gmon_start__
00000000000008c0 T _init
w _ITM_deregisterTMCloneTable
w _ITM_registerTMCloneTable
w _Jv_RegisterClasses
U malloc@GLIBC_2.2.5
U memcpy@GLIBC_2.14
U __stack_chk_fail@GLIBC_2.4
0000000000000c60 T rc4_decrypt(rc4_state_t*, unsigned char*, int)
0000000000000c70 T rc4_decrypt(rc4_state_t*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
0000000000000b40 T rc4_encrypt(rc4_state_t*, unsigned char*, int)
0000000000000bc0 T rc4_encrypt(rc4_state_t*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)
0000000000000cb0 T rc4_init(rc4_state_t*, unsigned char*, int)
U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)@GLIBCXX_3.4.21
U std::__throw_logic_error(char const*)@GLIBCXX_3.4
Now the function names are easy to read, and they appear to be cryptographic functions that implement the RC4 encryption algorithm. Apart from nm
, we can also use c++filt
to demangle function names.
$ c++filt _Z8rc4_initP11rc4_state_tPhi
rc4_init(rc4_state_t*, unsigned char*, int)
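For context, RC4 is compact enough to sketch in a few lines. The code below is the textbook algorithm (key scheduling followed by keystream generation), written in Python for illustration. It is not the code inside lib5ae9b7f.so, and the names rc4_keystream and rc4_crypt are made up. Because RC4 simply XORs the data with a keystream, the same routine both encrypts and decrypts.

```python
def rc4_keystream(key):
    """Yield the RC4 keystream for key, one byte at a time."""
    # Key-scheduling algorithm: permute the state using the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    # Pseudo-random generation algorithm: keep shuffling and emit bytes.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        yield state[(state[i] + state[j]) % 256]


def rc4_crypt(key, data):
    """Encrypt or decrypt data with key (the operation is symmetric)."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))


if __name__ == "__main__":
    ciphertext = rc4_crypt(b"Key", b"Plaintext")
    print(ciphertext.hex())
    print(rc4_crypt(b"Key", ciphertext))
```

The key and plaintext used here are a widely published RC4 test vector, chosen only so the sketch can be checked. Nothing at this point in the analysis tells us which key the ctf binary uses.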
At this point we have already found the missing library, but before we can run the binary, we must tell the linker where to look for the library.
# specifying where to look for the library
$ export LD_LIBRARY_PATH=`pwd`
# running the binary
$ ./ctf
# since nothing happened we show the return code of the executed process
$ echo $?
1
The binary now runs without any problems, but nothing interesting happened during execution and the exit status returns the value 1, indicating an error.
Parsing the ELF file strings
Executable files often contain text strings (strings) that are typically displayed on the screen if certain conditions are met or to display help messages. These strings can help us determine more or less how the program works. For example, if we review the strings of a binary and find HTTP requests or URLs, this is an indication that the program is doing something related to the web.
Sometimes, when analyzing malware, you can find debugging strings that the programmers forgot to delete and these messages can provide important information about the operation of said malware. This is something that has happened in real life.
To analyze the strings a tool called strings
is used.
$ strings ctf
[*] /lib64/ld-linux-x86-64.so.2
lib5ae9b7f.so
[*] __gmon_start__
_Jv_RegisterClasses
_ITM_deregisterTMCloneTable
_ITM_registerTMCloneTable
_Z8rc4_initP11rc4_state_tPhi
...
[*] DEBUG: argv[1] = %s
[*] checking '%s'
[*] show_me_the_flag
>CMb
-v@P^:
flag = %s
guess again!
[*] It's kinda like Louisiana. Or Dagobah. Dagobah - Where Yoda lives!
;*3$"
zPLR
GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
[*] .shstrtab
.interp
.note.ABI-tag
.note.gnu.build-id
.gnu.hash
.dynsym
.dynstr
.gnu.version
.gnu.version_r
.rela.dyn
.rela.plt
.init
.plt.got
.text
.fini
.rodata
.eh_frame_hdr
.eh_frame
.gcc_except_table
.init_array
.fini_array
.jcr
.dynamic
.got.plt
.data
.bss
.comment
In the previous result we can see the string DEBUG: argv[1] = %s
which lets us know that the program expects a command line argument. The other strings, not counting the section names, still do not have a clear use.
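For reference, the core of what strings does can be approximated in a few lines of Python: scan the raw bytes for runs of printable characters of at least four bytes. This is a simplified sketch (ASCII only, made-up function name extract_strings, made-up sample data) and not a reimplementation of the real tool.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII (or tab) at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    # Hypothetical blob: printable strings separated by non-printable bytes.
    blob = b"\x00\x01flag = %s\x00ab\x00guess again!\x00\x7fELF\x02"
    print(extract_strings(blob))
```

Short runs such as "ab" and "ELF" are dropped because they are below the minimum length, which is how the real tool avoids reporting most random byte sequences as text.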
Let’s remember that our analysis showed that the binary uses RC4 encryption, so perhaps the message "It's kinda like Louisiana. Or Dagobah. Dagobah - Where Yoda lives!" is used as the encryption key. Now we are going to perform several tests on the binary to see what happens.
# using a random string
$ ./ctf "foo"
checking 'foo'
# using a string found in the output of the strings command
$ ./ctf "show_me_the_flag"
checking 'show_me_the_flag'
ok
# the previous execution shows the message ok, however
# we still have an exit status of 1 (error)
$ echo $?
1
Analyzing system calls (strace)
We are going to analyze the calls that the binary makes to the system during its execution using the strace
tool. This will give us a high-level idea of what the binary does.
$ strace ./ctf "show_me_the_flag"
execve("./ctf", ["./ctf", "show_me_the_flag"], 0x7ffea5dae308 /* 59 vars */) = 0
brk(NULL) = 0xde4000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f36f1196000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (The file or directory does not exist)
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/glibc-hwcaps/x86-64-v3/lib5ae9b7f.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (The file or directory does not exist)
newfstatat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/glibc-hwcaps/x86-64-v3", 0x7ffc3be97090, 0) = -1 ENOENT (The file or directory does not exist)
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/glibc-hwcaps/x86-64-v2/lib5ae9b7f.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (The file or directory does not exist)
newfstatat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/glibc-hwcaps/x86-64-v2", 0x7ffc3be97090, 0) = -1 ENOENT (The file or directory does not exist)
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/lib5ae9b7f.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\t\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=10296, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 4202592, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f36f0d93000
mmap(0x7f36f0e00000, 2105440, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0x7f36f0e00000
munmap(0x7f36f0d93000, 446464) = 0
munmap(0x7f36f1003000, 1646688) = 0
mprotect(0x7f36f0e01000, 2097152, PROT_NONE) = 0
mmap(0x7f36f1001000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x7f36f1001000
close(3) = 0
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (The file or directory does not exist)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=95594, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 95594, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f36f117e000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=2432256, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 2445696, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f36f0a00000
mmap(0x7f36f0a9c000, 1159168, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x9c000) = 0x7f36f0a9c000
mmap(0x7f36f0bb7000, 577536, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b7000) = 0x7f36f0bb7000
mmap(0x7f36f0c44000, 57344, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x244000) = 0x7f36f0c44000
mmap(0x7f36f0c52000, 12672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f36f0c52000
close(3) = 0
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (The file or directory does not exist)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=141720, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 144232, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f36f115a000
mmap(0x7f36f115d000, 110592, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f36f115d000
mmap(0x7f36f1178000, 16384, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e000) = 0x7f36f1178000
mmap(0x7f36f117c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x21000) = 0x7f36f117c000
close(3) = 0
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (The file or directory does not exist)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220x\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=1926256, ...}, AT_EMPTY_PATH) = 0
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
mmap(NULL, 1974096, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f36f081e000
mmap(0x7f36f0844000, 1396736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x7f36f0844000
mmap(0x7f36f0999000, 344064, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17b000) = 0x7f36f0999000
mmap(0x7f36f09ed000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1cf000) = 0x7f36f09ed000
mmap(0x7f36f09f3000, 53072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f36f09f3000
close(3) = 0
openat(AT_FDCWD, "/home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (The file or directory does not exist)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=907784, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 909560, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f36f107b000
mmap(0x7f36f108b000, 471040, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x10000) = 0x7f36f108b000
mmap(0x7f36f10fe000, 368640, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x83000) = 0x7f36f10fe000
mmap(0x7f36f1158000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xdc000) = 0x7f36f1158000
close(3) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f36f1079000
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f36f1076000
arch_prctl(ARCH_SET_FS, 0x7f36f1076740) = 0
set_tid_address(0x7f36f1076a10) = 80274
set_robust_list(0x7f36f1076a20, 24) = 0
rseq(0x7f36f1077060, 0x20, 0, 0x53053053) = 0
mprotect(0x7f36f09ed000, 16384, PROT_READ) = 0
mprotect(0x7f36f1158000, 4096, PROT_READ) = 0
mprotect(0x7f36f117c000, 4096, PROT_READ) = 0
mprotect(0x7f36f0c44000, 45056, PROT_READ) = 0
mprotect(0x7f36f1001000, 4096, PROT_READ) = 0
mprotect(0x601000, 4096, PROT_READ) = 0
mprotect(0x7f36f11c8000, 8192, PROT_READ) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
munmap(0x7f36f117e000, 95594) = 0
futex(0x7f36f0c527bc, FUTEX_WAKE_PRIVATE, 2147483647) = 0
getrandom("\xe5\x59\x51\x9f\xcb\x6d\xfb\x91", 8, GRND_NONBLOCK) = 8
brk(NULL) = 0xde4000
brk(0xe05000) = 0xe05000
newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}, AT_EMPTY_PATH) = 0
write(1, "checking 'show_me_the_flag'\n", 28checking 'show_me_the_flag'
) = 28
write(1, "ok\n", 3ok
) = 3
exit_group(1) = ?
+++ exited with 1 +++
In the previous output, strace
includes all the calls made by the dynamic loader to initialize the process, so the output contains more information than we need. We can also see how the loader searches for the libraries and maps them into memory. It is not until the call write(1, "checking 'show_me_the_flag'\n", 28) = 28
that we begin to see behavior specific to the program.
From that call onward we see how write calls are made to display various messages on the screen, and at the end status code 1 is returned with exit_group(1)
. In this case strace
did not provide us with interesting information so we must use other methods to analyze the binary.
Analyzing library calls (ltrace)
To see the library calls executed by ctf
we will use ltrace
.
# -i : show the instruction pointer at the time of the library call
# -C : demangle names
$ ltrace -i -C ./ctf "show_me_the_flag"
[0x400fe9] __libc_start_main(0x400bc0, 2, 0x7ffe1aab28b8, 0x4010c0 <unfinished ...>
[0x400c44] __printf_chk(1, 0x401158, 0x7ffe1aab311b, 384checking 'show_me_the_flag'
) = 28
[0x400c51] strcmp("show_me_the_flag", "show_me_the_flag") = 0
[0x400cf0] puts("ok"ok
) = 3
[0x400d07] rc4_init(rc4_state_t*, unsigned char*, int)(0x7ffe1aab2650, 0x4011c0, 66, 0x7f9191915ad0) = 0
[0x400d14] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(char const*)(0x7ffe1aab2590, 0x40117b, 58, 3) = 0x7ffe1aab2590
[0x400d29] rc4_decrypt(rc4_state_t*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)(0x7ffe1aab25f0, 0x7ffe1aab2650, 0x7ffe1aab2590, 0x7e889f91) = 0x7ffe1aab25f0
[0x400d36] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)(0x7ffe1aab2590, 0x7ffe1aab25f0, 0x7ffe1aab2600, 0) = 0x7ffe1aab25a0
[0x400d53] getenv("GUESSME") = nil
[0xffffffffffffffff] +++ exited (status 1) +++
The first library call we see is __libc_start_main
, which is called from the _start function to transfer control to the program's main function.
The second call is __printf_chk
which displays the message seen above: checking 'show_me_the_flag'
.
The third call is to strcmp
which performs a string comparison between the string we passed as an argument and another one found in the code which is also show_me_the_flag
.
The fourth call is simply to puts
and displays the message "ok"
.
So far the behavior we have seen is what we already knew beforehand, however from the fourth call onwards we see new information.
For example, we see how RC4 is initialized through the call to rc4_init
. Then an assignment is made to a C++ string, probably with encrypted data, since the next call is to rc4_decrypt
, which receives that string as an argument, and the decrypted message is assigned to a new C++ string.
Finally, a call is made to getenv
, a standard library function used to look up environment variables. The binary expects there to be an environment variable called “GUESSME” so we are going to set this variable and run ltrace
again.
$ GUESSME="foo" ltrace -i -C ./ctf "show_me_the_flag"
...
[0x400d53] getenv("GUESSME") = "foo"
[0x400d6e] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(char const*)(0x7ffcd5851ff0, 0x401183, 5, 5) = 0x7ffcd5851ff0
[0x400d88] rc4_decrypt(rc4_state_t*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)(0x7ffcd5852050, 0x7ffcd5852090, 0x7ffcd5851ff0, 0xa0a6e0) = 0x7ffcd5852050
[0x400d9a] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)(0x7ffcd5851ff0, 0x7ffcd5852050, 0xa0a730, 0) = 0xa0a6e0
[0x400db4] operator delete(void*)(0xa0a730, 0xa0a730, 21, 0) = 0
[0x400dd7] puts("guess again!"guess again!
) = 13
[0x400c8d] operator delete(void*)(0xa0a6e0, 0xa0a2b0, 0, 0x7f1734b15ad0) = 0
[0xffffffffffffffff] +++ exited (status 1) +++
After the call to getenv
, another C++ string is assigned and decrypted. Unfortunately, between the decryption and the moment the “guess again!” message is displayed, nothing reveals the expected value of GUESSME
. This means that the comparison is carried out without calling a library function.
Analyzing the behavior of the ELF at the instruction level using objdump
In the previous example we saw that the call to the puts
function with the message "guess again"
occurs more or less at address 0x400dd7, so we are going to use objdump
to examine the instructions near that address.
$ objdump -M intel -d ctf | grep "400dd7" -A 10 -B 10
400db4: 48 8b 4c 24 20 mov rcx,QWORD PTR [rsp+0x20]
400db9: 31 c0 xor eax,eax
400dbb: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0]
400dc0: 0f b6 14 03 movzx edx,BYTE PTR [rbx+rax*1]
400dc4: 84 d2 test dl,dl
400dc6: 74 05 je 400dcd <__gmon_start__@plt+0x21d>
400dc8: 3a 14 01 cmp dl,BYTE PTR [rcx+rax*1]
400dcb: 74 13 je 400de0 <__gmon_start__@plt+0x230>
400dcd: bf af 11 40 00 mov edi,0x4011af
400dd2: e8 d9 fc ff ff call 400ab0 <puts@plt>
400dd7: e9 84 fe ff ff jmp 400c60 <__gmon_start__@plt+0xb0>
400ddc: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
400de0: 48 83 c0 01 add rax,0x1
400de4: 48 83 f8 15 cmp rax,0x15
400de8: 75 d6 jne 400dc0 <__gmon_start__@plt+0x210>
400dea: 48 8d 7c 24 40 lea rdi,[rsp+0x40]
400def: be 99 11 40 00 mov esi,0x401199
400df4: e8 47 fd ff ff call 400b40 <_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6assignEPKc@plt>
400df9: 48 8d 54 24 40 lea rdx,[rsp+0x40]
400dfe: 48 8d b4 24 c0 00 00 lea rsi,[rsp+0xc0]
By analyzing the assembly code above, we can determine that it is a loop and the comparison is done on a byte-by-byte basis.
The instruction at memory location 400dc0
in the above code saves the byte value pointed to by register rbx
and indexed by register rax
into register edx
. The array that the rbx
register points to is probably the string we supplied in GUESSME.
400dc0: 0f b6 14 03 movzx edx,BYTE PTR [rbx+rax*1] ; edx = rbx[rax]
The instructions from 400dc4
to 400dc6
check whether the value just loaded into the edx
register is null (the end of the string), and if it is, jump to the code that runs when the comparison with the correct string fails.
400dc4: 84 d2 test dl,dl
400dc6: 74 05 je 400dcd <__gmon_start__@plt+0x21d>
The instruction at address 400dc8
is the most important one, since it is where our string is compared with the one stored in the ctf binary. This means that the rcx
register is pointing to the hidden string at this moment.
400dc8: 3a 14 01 cmp dl,BYTE PTR [rcx+rax*1] ; rbx[rax] == rcx[rax]
We already know where to find the missing string to assign it to GUESSME, however, said string is decrypted at runtime and is not statically available, so a dynamic analysis must be performed.
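The loop can be modelled in a few lines of Python. This is a sketch of the logic read from the disassembly above, not recovered source code: guess plays the role of the buffer at rbx, expected the buffer at rcx, and 0x15 is the bound from cmp rax,0x15. The function name guess_matches and the sample values are made up.

```python
def guess_matches(guess, expected, length=0x15):
    """Mimic the byte-by-byte comparison loop at 0x400dc0 to 0x400de8."""
    guess = guess + b"\x00"  # C strings end with a NUL byte
    for rax in range(length):
        dl = guess[rax]          # movzx edx,BYTE PTR [rbx+rax*1]
        if dl == 0:              # test dl,dl / je 400dcd (failure path)
            return False
        if dl != expected[rax]:  # cmp dl,BYTE PTR [rcx+rax*1]
            return False
    return True                  # all 0x15 bytes matched


if __name__ == "__main__":
    # Hypothetical 21-byte secret, used only to exercise the loop.
    secret = bytes(range(65, 65 + 0x15))
    print(guess_matches(b"foo", secret))
    print(guess_matches(secret, secret))
    print(guess_matches(secret + b"extra", secret))
```

As far as this excerpt of the disassembly shows, only the first 0x15 (21) bytes are compared, so in this model a guess with extra trailing characters still passes.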
Dumping a dynamic string buffer using gdb
One of the most used tools in dynamic analysis in GNU/Linux is gdb
(GNU Debugger).
$ gdb ./ctf
GNU gdb (Debian 13.2-1) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./ctf...
(No debugging symbols found in ./ctf)
(gdb) b *0x400dc8
Breakpoint 1 at 0x400dc8
(gdb) set env GUESSME=foobar
(gdb) run show_me_the_flag
Starting program: /home/th3g3ntl3man/Documents/dev/cpp/code/chapter5/testing/ctf show_me_the_flag
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
checking 'show_me_the_flag'
ok
Breakpoint 1, 0x0000000000400dc8 in ?? ()
(gdb) display/i $pc
1: x/i $pc
=> 0x400dc8: cmp (%rcx,%rax,1),%dl
(gdb) info registers rcx
rcx 0x6156e0 6379232
(gdb) x/s 0x6156e0
0x6156e0: "Crackers Don't Matter"
(gdb) quit
A debugging session is active.
Inferior 1 [process 150208] will be killed.
Quit anyway? (y or n) y
The first command we run in gdb sets a breakpoint to stop the execution of the program at address 0x400dc8, which is where the important comparison is made.
(gdb) b *0x400dc8
With the second instruction that we execute in gdb we specify the environment variable GUESSME
and in the next instruction we execute the program passing “show_me_the_flag” as an argument.
(gdb) set env GUESSME=foobar
(gdb) run show_me_the_flag
Next, after the program stopped at the breakpoint, we told gdb to show us the instruction at the current $pc (program counter).
(gdb) display/i $pc
1: x/i $pc
=> 0x400dc8: cmp (%rcx,%rax,1),%dl
In the following instruction, as we already know that what interests us is being pointed to by the rcx register, we tell gdb to show us the address it has stored at that precise moment.
(gdb) info registers rcx
rcx 0x6156e0 6379232
In the following instruction, we tell gdb to show us the dump (dump = x
) at memory location 0x6156e0 in c-string format (c-string = s
).
(gdb) x/s 0x6156e0
0x6156e0: "Crackers Don't Matter"
Now we can see the string that we were missing. Crackers Don't Matter
With this done, we just need to run the binary again with the corresponding values.
$ GUESSME="Crackers Don't Matter" ./ctf "show_me_the_flag"
checking 'show_me_the_flag'
ok
flag = 84b34c124b2ba5ca224af8e33b077e9e