Python scripting in GDB: a CTF example

By Thomas Berlioz

What to know before reading

This article assumes you have a little x86-64 Assembly knowledge (registers, basic instructions like mov, addresses) and some GDB experience (breakpoints, stepping, continuing).

It introduces Python scripting in GDB, assuming you know Python basics such as loops, conditions, libraries and types.

Introduction

Back when I used to play CTF every weekend, I was a huge fan of Reverse Engineering (RE) and Exploitation challenges. GDB held no secrets for me, and I loved its simplicity and readability when paired with plugins like GEF or pwndbg.

Solving RE challenges often means running and debugging the same executable again and again with different inputs, to check whether the executed instructions differ from one run to the next. That is quickly done for easy challenges, but it becomes tedious to do by hand on more complex ones.

gdbpython is a Python wrapper for GDB scripting that uses the GDB Python library to run GDB and execute commands directly from a Python script. I only added a few functions to automate tasks that come up a lot in Reverse Engineering challenges, like breaking at the entry point or setting breakpoints when PIE (Position Independent Executable) is enabled. Let’s look at a simple application by solving a challenge, and do not hesitate to check the wrapper’s code directly! It’s pretty well documented.
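To make the PIE remark concrete: in a PIE binary, the addresses shown by a disassembler are offsets, and the loader picks the base address at run time, so a breakpoint address has to be computed as base plus offset. The sketch below shows only that arithmetic. The function names and the example base and offset are illustrative assumptions, not gdbpython’s actual API.

```python
def rebase(load_base: int, offset: int) -> int:
    """Turn a file-relative offset into a runtime address for a PIE binary."""
    return load_base + offset


def breakpoint_command(load_base: int, offset: int) -> str:
    """Build the GDB command that breaks on the rebased address."""
    return f"break *{rebase(load_base, offset):#x}"


# Hypothetical values: a base as it could appear in `info proc mappings`
# and an offset as it could appear in a disassembler.
print(breakpoint_command(0x555555554000, 0x11A9))
```

The challenge used in this article is not PIE, which is why the scripts further down can use absolute addresses directly.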

Challenge

You can download the challenge here if you want to try it yourself! As in every CTF challenge, we are looking for a flag to validate it. The flag format is HDCTF{} and the only thing we have is a binary:

$ file challenge
challenge: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically
linked, interpreter /lib/ld-linux.so.2, not stripped
$ ./challenge
Wrong!
$ ./challenge password
Wrong!

Looks like a classic executable waiting for the correct argument to give us the flag. Let’s see what we have in GDB:

gef➤  disas main
Dump of assembler code for function main:
   0x0804879a <+0>:	mov    eax,ds:0x83f5988
   0x0804879f <+5>:	mov    edx,0x8804879a
   0x080487a4 <+10>:	mov    ds:0x81f5810,eax
   0x080487a9 <+15>:	mov    DWORD PTR ds:0x81f5814,edx
   0x080487af <+21>:	mov    eax,0x0
   0x080487b4 <+26>:	mov    ecx,0x0
   0x080487b9 <+31>:	mov    edx,0x0
   0x080487be <+36>:	mov    al,ds:0x81f5810
[...]
   0x0804b71e <+12164>:	mov    edx,DWORD PTR ds:0x804c88c
   0x0804b724 <+12170>:	mov    DWORD PTR [eax+0xc],edx
   0x0804b727 <+12173>:	mov    edx,DWORD PTR ds:0x804c890
   0x0804b72d <+12179>:	mov    DWORD PTR [eax+0x10],edx
   0x0804b730 <+12182>:	mov    edx,DWORD PTR ds:0x804c894
   0x0804b736 <+12188>:	mov    DWORD PTR [eax+0x14],edx
   0x0804b739 <+12191>:	mov    eax,ds:0x83f5978
   0x0804b73e <+12196>:	mov    eax,DWORD PTR [eax*4+0x83f5970]
   0x0804b745 <+12203>:	mov    DWORD PTR [eax],0x0
End of assembler dump.

The binary has been obfuscated with movfuscator, which compiles a program into Assembly made only of mov instructions, making it very hard to read and understand. It is also practically impossible to get readable C code back from the obfuscated code. Instead of deobfuscating it with dedicated tools, let’s try to deal with it as is.

Setting up the environment

#!/usr/bin/env python3

from gdbpython import *

init('challenge') # break on entry point

flag = "mysuperflag"

run(args = [flag]) # run with argument

print(context()) # print stack, register values, current instructions, ...

GDB context at entry point
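gdbpython’s internals are not shown in this article. As a rough idea of what a wrapper like this might send to GDB, the sketch below builds the command strings and only executes them when the `gdb` module can be imported, which is the case when the script is loaded with `gdb -x`. The helper names are hypothetical; `gdb.execute` is the standard GDB Python call for running a CLI command. The sketch leaves out the entry point breakpoint that `init` sets.

```python
import shlex

try:
    import gdb  # only available inside a GDB process
except ImportError:
    gdb = None


def run_command(args):
    """Build a `run` command, shell-quoting each argument."""
    return "run " + shlex.join(args) if args else "run"


def commands_for(binary, args):
    """Commands to load a binary and start it with arguments."""
    return [f"file {binary}", run_command(args)]


if gdb is not None:
    for command in commands_for("challenge", ["mysuperflag"]):
        gdb.execute(command)
```

Quoting matters here because GDB starts the program through a shell by default, and flag candidates contain characters such as `{` that are safer when quoted.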

Now that we are on the entry point, we can start stepping through the instructions until we notice something interesting: our argument being processed! A few steps later, here comes our flag:

GDB context at first comparison

Two register values stand out:

$eax   : 0x6d
$edx   : 0x48

>>> chr(0x48)
'H'                 # first letter of the expected flag
>>> chr(0x6d)
'm'                 # first letter of the input flag
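The same decoding can be scripted. The sketch below parses register lines in the layout shown above (`$eax   : 0x6d`) and converts the low byte of each value to a character. The function names are illustrative, and other GDB front ends may print registers differently.

```python
import re

# Matches lines such as "$eax   : 0x6d", ignoring anything after the value.
REGISTER_LINE = re.compile(r"^\$(\w+)\s*:\s*(0x[0-9a-fA-F]+)")


def parse_registers(text):
    """Map register names to integer values from `$reg : 0x..` lines."""
    registers = {}
    for line in text.splitlines():
        match = REGISTER_LINE.match(line.strip())
        if match:
            registers[match.group(1)] = int(match.group(2), 16)
    return registers


def as_char(value):
    """Interpret the low byte of a register value as a character."""
    return chr(value & 0xFF)


regs = parse_registers("$eax   : 0x6d\n$edx   : 0x48")
print(as_char(regs["eax"]), as_char(regs["edx"]))
```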

This could be a pure coincidence, but what if it is not? Let’s modify our input so that the first letter matches but the second one does not, and see whether we get past the first check.

We also notice that there is a signal handler for the exception raised by main on its last instruction. We do not need to understand its role in the code since we are going to bruteforce the characters of the flag, but we do need to take it into account because GDB stops on signals by default.
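The scripts in this article keep GDB’s default behavior and resume with an extra `cont()` after each SIGILL. An alternative, not used here, is GDB’s `handle` command, which can tell GDB to pass a signal to the program without stopping. The sketch below builds that command and applies it only when running inside GDB; the helper name is hypothetical.

```python
try:
    import gdb  # only available inside a GDB process
except ImportError:
    gdb = None


def signal_passthrough_command(signal_name):
    """GDB command that delivers a signal to the program without stopping."""
    return f"handle {signal_name} nostop noprint pass"


if gdb is not None:
    gdb.execute(signal_passthrough_command("SIGILL"))
```

With this setting the number of `cont()` calls needed per comparison would change, so the scripts in this article would have to be adjusted accordingly.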

init('challenge') # break on entry point

flag = "HX"

ba(0x804a2f8) # add a breakpoint where comparison is done

run(args = [flag]) # run with argument
cont() # continue after entry point breakpoint
cont() # continue after comparison breakpoint
cont() # continue after SIGILL interrupt

print(context()) # print stack, register values, current instructions, ...

$ gdb -x crack.py
$eax   : 0x48           | `H`
$edx   : 0x48           | `H`
gef➤ c                  | continue after breakpoint
gef➤ c                  | continue after interrupt
$eax   : 0x58           | `X`
$edx   : 0x44           | `D`
gef➤ c                  | continue after breakpoint
gef➤ c                  | continue after interrupt
gef➤ c                  | continue after interrupt
gef➤ c                  | continue after interrupt
[Inferior 1 (process 14617) exited with code 02]

Yay, we do pass the first check! Now we can repeat the process for the whole flag!

Exploit

#!/usr/bin/env python3

from gdbpython import *

init('challenge') # break on entry point

flag = "H"

ba(0x804a2f8) # add a breakpoint where comparison is done

while flag[-1] != '}': # continue until last char is `}`

    run(args = [flag + 'X']) # run with argument
    cont() # continue after entry point breakpoint

    for i in range(len(flag) + 1):
        print(f"{flag=}")
        cont() # continue after comparison breakpoint
        cont() # continue after SIGILL interrupt

    edx = int(get('edx').split(' ')[-1], 16) # parsing to get correct char
    letter = chr(edx)

    flag += letter # update beginning of the flag
    print(f"{flag=}")

print(context())
print(f"{flag=}")

Run the exploit, wait a little bit and here we are!

flag='HDCTF{M0V3_IS_TUR1NG_C0MPLETE}'
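The exploit works because the comparison leaks the expected character in edx at the first mismatch, so each run reveals one more character of the flag. The following sketch simulates that oracle in plain Python, with no GDB involved. The secret, the function names and the tuple returned by the checker are all made up for illustration.

```python
def leaky_check(secret, candidate):
    """Mimic the binary: compare character by character and expose, at the
    first mismatch, the position and the expected character."""
    for index, expected in enumerate(secret):
        if index >= len(candidate) or candidate[index] != expected:
            return index, expected
    return None  # every character of the secret was matched


def recover(check, known="", filler="X", terminator="}", max_length=64):
    """Extend the known prefix one leaked character at a time."""
    while not known.endswith(terminator) and len(known) < max_length:
        candidate = known + filler
        leak = check(candidate)
        if leak is None:
            # The filler happened to be the last missing character.
            return candidate
        index, expected = leak
        # Keep everything that matched, then append the leaked character.
        known = candidate[:index] + expected
    return known


# Placeholder secret, not the flag of the challenge.
print(recover(lambda candidate: leaky_check("HDCTF{EXAMPLE}", candidate)))
```

Each call to `check` stands for one full run of the binary under GDB, so recovering a flag of n characters costs about n runs instead of a bruteforce over every possible character at every position.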

Conclusion

gdbpython provides a few functions that make interacting with GDB easier than typing all the commands by hand. It really is just a wrapper: it does not add any debugging feature that GDB (with GEF in particular) does not already provide.

GDB scripting is a powerful skill that saves time on repetitive debugging tasks such as bruteforcing, while keeping the ability to control the process and use debugging information along the way. This example illustrates how easy such things can be with the right environment.