ACSC 2023

Posted on Mar 1, 2023

Preface
pyso
snake

Preface

Last week I participated in Asian Cyber Security Challenge (ACSC), I managed to get 12th place overall and 1st place in my country as an eligible participant for ICC.

In all Reverse challenges I solved, pyso and snake are the most interested among them and I think it would be nice to share what I’ve done through this writeup.

pyso (7 solves)

Typical crackme challenge in a CTF. But written in py so it should be harder, right?

Time to download pyinstaller extractor again.

Given file: pyso.tar.gz

First look

We’re being given main.py and libvalidator. Here’s the source code of main.py:

import libvalidator
import numpy as np

KEY = np.array(list(b'\xf7\x9b\xc3\xcc\x81\x8d\xb9\xd1\x1a\xddK\xc2/\x85\xb8U'), dtype='uint8')
IV = np.array(list(b'\x84\x9a\xa6\xf6\x84s\x86\xf4\xb9\xda\xe9\xc6\xca@:2'), dtype='uint8')

flag = list(input("Enter flag: ").encode())
flag = np.array(flag, dtype=np.uint8)

if libvalidator.fast_len(flag) != 24:
    print("Wrong length!")

X = np.array(list(b'\xef\x1e\xe6\xd9jh\xaa\xea\xa6\xfcq\x0ec\x94\xd7\xc4\xcc\x7f\xe7\xa1\xf3\x9f\xa6z'), dtype='uint8')

flag = libvalidator.fast_xor(flag, X)
flag = libvalidator.fast_rev(flag)
flag = libvalidator.aes_encrypt(flag, KEY, IV)

E = np.array(list(b'\x81\xdd\xfc\xfd\xa9\xc1LE\xdfy\xfd0\x06f\xa1}\xd2\x8c%<U\xbaA\x97'), dtype='uint8')
if libvalidator.fast_cmp(flag, E):
    print("Correct!")
else:
    print("Try harder!")

Without libvalidator, this is just a simple reverse challenge with simple operations like xor, rev, aes,… So I suspect there must be changes in these operations in the library file to make reversers confusing. With this in mind, I started using my favorite tool - IDA Pro, to do initial analysis.

Skimming through functions that are being used in main.py, these five are the most suspicious one in library file:

_pycc_method_aes_encrypt
_pycc_method_fast_cmp
_pycc_method_fast_len
_pycc_method_fast_rev
_pycc_method_fast_xor

Just by glancing at one of them, I know that I need to set up debugger environment because there’s no way I’m able to reverse these codes by looking and reading them. In order to do this, I use Ubuntu 22.04 (because it already has Python 3.10 for us) to run main.py. While the program is waiting for our input, I use a debugger to attach to the python process that’s running our main.py:

In IDA, to attach to a process, go to Debugger -> Attach to Process... and choose a desired process to attach:

After going through all five functions, I quickly realize that only _pycc_method_aes_encrypt is being used “properly”, other four will quickly return after being called and almost doing nothing. So I shift my focus to only this function.

Analyzing encryption routine

void *__fastcall _pycc_method_aes_encrypt(__int64 a1, __int64 a2)
{
  // ....

  v2 = PyArg_UnpackTuple(a2, "aes_encrypt", 3LL, 3LL, &v22, &v21, &v20);

  // ....

  if ( (unsigned int)NRT_adapt_ndarray_from_python(v22, &v16) || *((_QWORD *)&v17 + 1) != 1LL )
  {
    PyErr_SetString(
      &PyExc_TypeError,
      "can't unbox array from PyObject into native value.  The object maybe of a different type");
    return 0LL;
  }

  // ....

  v8 = __main__::aes_encrypt[abi:v753][abi:c8tJTC_2fWQESiLSjagd4uKgGzNAE_3d](
         &v13,
         &v15,
         v3,
         *((__int64 *)&v3 + 1),
         v4,
         1LL,
         v5,
         *((__int64 *)&v5 + 1),
         v6);

  // ....
}

PyArg_UnpackTuple will try to receive all arguments that’s being passed in main.py, then NRT_adapt_ndarray_from_python will get actual values from one of the arguments and store in memories, finally __main__::aes_encrypt[abi:v753][abi:c8tJTC_2fWQESiLSjagd4uKgGzNAE_3d], the main implementation of our exported aes function, will be executed. While debugging this, I notice that all arguments that’s being passed to __main__::aes_encrypt[abi:v753][abi:c8tJTC_2fWQESiLSjagd4uKgGzNAE_3d] are only our input and its metadata, which means other arguments KEY and IV that are being used in the python file are garbish and used as a decoy.

Digging deeper in our aes_encrypt, I started to see some confused naming like wmemchr3, atexit3, swprintf2,… And if you tried to check these functions, there will be more and more confused functions just like those. Despite being weirdly named, these functions are actually being used to do encryption on our input. Also because there’re so many functions, I have to debug and take notes so I can progress faster.

I spent around 1-2 hours debugging and observing memories to grasp the basic idea of the encryption routine, and to save some time for readers, I will summarize the flow in following bullet points:

It creates a 256 bytes array with static values
Combining above array to start encrypting our input in 16 rounds, each round will encrypt our input 16 times. While this is happening, it also modifies the above array.
Finally it applies the encryption one more time to each characters of our input and compare the result to a fixed byte array of size 64 bytes.
Note: Instead of using iteration to make the encryption process looks pretty, author decided to create unique functions for each iteration. So instead of using a loop and calling just one function, the program will call nth unique functions if it loops nth time. As a result, the program doesn’t look nice at all and debug this is pure pain.

I’m a total noob at crypto so the above idea may not match the intention of author’s algorithm, but at least I managed to use those ideas to solve this challenge.

Recreate program’s flow in Python

While I was solving this challenge and coming to this step, I had already passed half of the contest time and I was at the bottom of the scoreboard, I knew I had to quickly finish this step because this is the most important and frustrating step in this whole challenge. Then I came up with an approach: Instead of trying to recreate everything exactly like the binary, I’ll try to extract every values that only interact with our input and recreate the program by using these values.

For example if an encryption flow (of course I make this up completely) works like this:

def encryption(input, table):
    # ....
    table[i] = table[table[j]]
    # ....
    input[i] = (input[i] + table[i]) ^ (input[i] - table[i])
    # ....
    table[table[i]] = table[j]
    # ....

You can notice that it’ll swap values in table and apply them to our input, if we somehow knew the exact value of table[i] right before applying them to input, we just simply don’t care about what’s happening to table anymore. This will help me to reduce the amount of mistakes I make while reversing and reimplementing this code and save a lot of time if the binary I’m working is big.

So I need to identify which values I need to extract. Looking at aes_encrypt function again, I notice there’s a piece of code that interact directly with my input:

__int64 __fastcall func(...) {
  // ....
  v25 = a9 * indexing;
  v26 = table[v25];
  v27 = v26 + a25 + zero_arrays_idk[indexing * a16];
  if ( v27 != (unsigned __int64)indexing )
  {
    v28 = v27 * a9;
    v29 = table[v28] ^ v26;
    table[v25] = v29;
    v30 = table[v28] ^ v29;
    table[v28] = v30;
    table[v25] ^= v30;
  }
  first_index = a23 * ((unsigned __int64)indexing >> 2);
  second_index = a23 * ((unsigned __int64)v27 >> 2);
  if ( (indexing ^ (unsigned int)v27) > 3 )
  {
    v35 = input[second_index];
    v33 = &input[second_index];
    v34 = input[first_index] ^ v35;
    input[first_index] = v34;
  }
  else
  {
    v33 = &input[second_index];
    v34 = input[first_index];
  }
  input[first_index] = indexing ^ v34;
  *v33 ^= v27;
  *a1 = v27;
  return 0LL;
}

You can see that there’re 2 parts in this code, this first part is table swapping and the second part is applying those changes to our input. So all I really care are first_index, second_index and v27 because only these variables access my input and change values in it.

To extract these values, we will need a debugger to do this dynamically since we need to access the memory to get those values. Again, using my favorite debugger IDA, I manage to extract those 3 arrays with the following script using IDAPython plugin:

# I turned off ASLR in my VM so I can work on this script
# constantly without having to change the address everytime
# the debugger starts again.
add_bpt(0x7FFFF6863D13, 0, BPT_SOFT)    # <-- Base address: 0x29D17
enable_bpt(0x7FFFF6863D13, True)    # <-- Base address: 0x29D17

start = 0x7FFFF6886C5E  # <-- Base address: 0x4CC5E

v27_array = []
first_index = []
second_index = []

for _ in range(16):
    for i in range(16):
        address = start + (i * 192)
        addr_first_idx = (start + 53) + (i * 192)
        addr_seccond_idx = (start + 61) + (i * 192)

        print(hex(address), hex(addr_first_idx), hex(addr_seccond_idx))

        add_bpt(address, 0, BPT_SOFT)
        enable_bpt(address, True)
        add_bpt(addr_first_idx, 0, BPT_SOFT)
        enable_bpt(addr_first_idx, True)
        add_bpt(addr_seccond_idx, 0, BPT_SOFT)
        enable_bpt(addr_seccond_idx, True)
        
        idaapi.continue_process()
        idaapi.wait_for_next_event(WFNE_SUSP, -1)
        v27_array.append(get_reg_value('r11'))

        idaapi.continue_process()
        idaapi.wait_for_next_event(WFNE_SUSP, -1)
        first_index.append(get_reg_value('rcx'))

        idaapi.continue_process()
        idaapi.wait_for_next_event(WFNE_SUSP, -1)
        second_index.append(get_reg_value('rsi'))
    start -= 5952

print(v27_array)
print(first_index)
print(second_index)

Another piece of code where it applies changes to my input:

__int64 __fastcall func(
        _QWORD *a1,
        __int64 a2,
        char a3,
        char a4)
{
  *a1 = 5889 * ((unsigned __int8)(a3 ^ a4) + 2LL * (unsigned __int8)(a3 & a4)) + 3584;
  return 0LL;
}

Later a1 will assign to input, a3 and a4 are come from this function:

__int64 __fastcall func(...)
{
  // ....
  constant_index = (unsigned __int8)(a10 + 5);
  v14 = table[one_value * constant_index];
  table_related_index = a12 + table[one_value * (unsigned __int8)(a11 + v14)];
  v16 = one_value * table_related_index;
  v17 = table[v16];
  v18 = v17 + constant_index + a12;
  table[one_value * constant_index] = v17;
  table[v16] = v14;
  v19 = table[one_value
            * (unsigned __int8)(table[one_value
                                    * (unsigned __int8)(constant_index + table[one_value * (unsigned __int8)(v18 + a13)])]
                              + table_related_index)];
  a1[3] = v18;
  a1[2] = table_related_index;
  a1[1] = constant_index;
  *(_BYTE *)a1 = v19;
  return 0LL;
}

Again, this function only swaps values in table, so we just need to extract those 2 values and apply them to our input. But somehow my brain suddenly shut down at that time and I decided to implement the whole table swapping function…

Anyways, I finished my script on time, and it worked flawlessly. Here’s my encryption script that I reimplemented in python.

Decrypting and getting flag

Decrypting the flag is relatively easy, as we just need to run everything backward and it should yields the flag. There’s a note is that in this equation:

inp[i] = (5889 * (((inp[i] ^ v19) & 0xff) + 2 * ((v19 & inp[i]) & 0xff)) + 3584) & 0xff

We can’t simply reverse all operations to find old inp[i], but we can bruteforce all printable characters, apply to the above equation and see if the result is equal to encrypted buffer. This way we can recover original inp[i]. Here’s my decrypting script.

ACSC{pyth0n_numb4_0n3___s0_m4ny_functi0ns_s0_l1ttl3_us3ful_c0d3}

snake (5 solves)

Classic snake game. Score more than 31337 points and you will get the flag.

Given file: snake.tar.gz

Initial analysis

__int64 __fastcall main(int a1, char **a2, char **a3)
{
  __pid_t v4; // [rsp+Ch] [rbp-4h]

  v4 = fork();
  if ( v4 < 0 )
    return 1LL;
  if ( v4 )                                     // parrent proc
  {
    pid = v4;
    signal(2, handler);
    parrent_proc();
  }
  else                                          // child proc
  {
    if ( ptrace(PTRACE_TRACEME, 0LL, 0LL, 0LL) < 0 )
      return 1LL;
    ((void (*)(void))loc_401A5E)();
  }
  return 0LL;
}

Starting at main, you can see that it’ll call fork and ptrace, and there’re many more ptrace if we dig deep down in parrent_proc function. At this point there’s no doubt that this is a nanomite challenge. Nanomite technique is not a new trick in any CTFs, but for sure it’s one of the most interesting and obnoxious technique to deal with. Basically in this kind of challenge, a process will fork itself into two processes: parent and child, the parent process will then use ptrace to attach to the child one, which makes it impossible to attach our debugger to observe the child one.

In this binary, the child process will go through garbish code that’s going to raise exception:

mov     eax, cs:dword_44A164    ; Address at 0x401B1F
test    eax, eax
jz      short loc_401B55
mov     eax, 0x1010
ud2     ; <-- Raise SIGILL
; ---------------------------
db    2
db 0ADh
db 0F5h

When this happen, the flow will transfer to the process whom attach to our child to handle, in this case it’s our parent process. Looking at parent’s code:

__int64 parrent_proc()
{
  __int64 result; // rax
  unsigned __int64 v1; // [rsp+4h] [rbp-10Ch]
  int stat_loc; // [rsp+Ch] [rbp-104h] OVERLAPPED BYREF
  user_regs_struct register_1; // [rsp+10h] [rbp-100h] BYREF
  unsigned int rdx; // [rsp+F4h] [rbp-1Ch]
  unsigned int rcx; // [rsp+F8h] [rbp-18h]
  unsigned int rbx; // [rsp+FCh] [rbp-14h]
  unsigned __int64 rax; // [rsp+100h] [rbp-10h]
  int v8; // [rsp+10Ch] [rbp-4h]

  wait((__WAIT_STATUS)&stat_loc);
  v8 = 0;
  while ( 1 )
  {
    result = (unsigned __int8)stat_loc;
    if ( (unsigned __int8)stat_loc != 127 )
      return result;
    if ( (stat_loc & 0x7F) == 0 )
      exit(0);
    ptrace(PTRACE_GETREGS, (unsigned int)pid, 0LL, &register_1);
    rax = register_1.rax;
    if ( register_1.rax == 4 )
    {
      // ....
    }
    if ( rax > 4 )
      goto LABEL_19;
    if ( rax != 3 )
    {
      if ( rax <= 3 )
      {
        if ( rax == 1 )
        {
          // ....
        }
        if ( rax == 2 )
        {
          // ....
        }
      }
LABEL_19:
      if ( register_1.rax > 0xFFF && register_1.rax <= 0x2FFF )
        sub_442109((__int64)&register_1);
      goto LABEL_22;
    }
    // ....
LABEL_22:
    end_loop(&register_1);
    wait((__WAIT_STATUS)&stat_loc);
  }
}

First it’ll wait for our child process to change its state:

A state change is considered to be: 
    + The child terminated; 
    + The child was stopped by a signal; 
    + Or the child was resumed by a signal

In this case it was stopped by signal SIGILL, notice that before child steps to ud2 instruction, it moves 0x1010 to eax, after that the process will then handle this by getting child’s register states first:

ptrace(PTRACE_GETREGS, (unsigned int)pid, 0LL, &register_1);

Looking at the document:

PTRACE_GETREGS:
    Copy the tracee's general-purpose or floating-point registers, respectively, 
    to the address data in the tracer

In this case register_1 will hold every values of child’s register. The parent’s then going to look at the value of child’s rax and decide which path it’s going to take from there. Looking at the child’s code where it first raises SIGILL again, there’s no legit code after mov eax, 0x1010 instruction, so how does it continue executing from there? The answer is that the parent will change those rubbish bytes to a nice looking code and modify child’s flow by changing its rip:

_BYTE *__fastcall sub_440514(user_regs_struct *a1)
{
  __int64 v2; // rax
  __int64 v3; // [rsp+18h] [rbp-8h]

  if ( a1->rax != 0x1010 )
    return sub_44067C(a1);
  ptrace(PTRACE_POKETEXT, (unsigned int)pid, a1->rip, 0x9090909090909090LL);
  ptrace(PTRACE_PEEKTEXT, (unsigned int)pid, a1->rip, 0LL);
  v2 = ptrace(PTRACE_PEEKTEXT, (unsigned int)pid, a1->rip + 8, 0LL);
  ptrace(PTRACE_POKETEXT, (unsigned int)pid, a1->rip + 8, v2 - 0x1709CE54C4D832ADLL);
  v3 = ptrace(PTRACE_PEEKTEXT, (unsigned int)pid, a1->rip + 16, 0LL) ^ 0x3BED09CED801B4A8LL;
  return (_BYTE *)ptrace(PTRACE_POKETEXT, (unsigned int)pid, a1->rip + 16, v3);
}

__int64 __fastcall sub_4026DF(user_regs_struct *a1)
{
  a1->rip += 2LL;
  ptrace(PTRACE_SETREGS, (unsigned int)pid, 0LL, a1);
  return ptrace(PTRACE_CONT, (unsigned int)pid, 0LL, 0LL);
}

One of the most common first step to handle nanomite challenges is to fix child’s code so that we can statically analyze it, and there’s a neat trick for this.

LD_PRELOAD trick and further analysis

In order for parent process to modify child’s code, it must call ptrace with argument PTRACE_POKETEXT or PTRACE_POKEDATA:

PTRACE_POKETEXT, PTRACE_POKEDATA:
Copy the word data to the address addr in the tracee's memory

And what LD_PRELOAD trick can do is it will load a custom shared library first by specifying a path to the shared library you want to load in an environment variable called LD_PRELOAD. When you defined an exported symbol like malloc for example, instead of calling malloc in libc.so file, it’ll instead call malloc from our library, which we can do things like inspecting arguments, changing how malloc works…

Combining two above information, the general idea is that we want to hook ptrace using LD_PRELOAD and inspect its arguments. If there’s PTRACE_POKETEXT specified in the first argument of ptrace, we can retreive datas and address where it’s going to patch in the child code. More information about this technique can be found in this excellent blog.

I also used the hook code in mentioned blog and modified a little bit:

#define _GNU_SOURCE

#include <stdio.h>
#include <unistd.h>
#include <dlfcn.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <stdarg.h>
#include <sys/utsname.h>
#include <sys/stat.h>
#include <stdlib.h>

FILE* pFile2;

long int ptrace(enum __ptrace_request __request, ...){
    pid_t caller = getpid();
    va_list list;
    va_start(list, __request);
    pid_t pid = va_arg(list, pid_t);
    void* addr = va_arg(list, void*);
    void* data = va_arg(list, void*);
    long int (*orig_ptrace)(enum __ptrace_request __request, pid_t pid, void *addr, void *data);
    orig_ptrace = dlsym(RTLD_NEXT, "ptrace");
    long int result = orig_ptrace(__request, pid, addr, data);

    if (__request == PTRACE_POKETEXT){
        fprintf(pFile2, "(0x%lx , 0x%lx),\n", (unsigned long)addr, (unsigned long)data);
    }
    return result;
}

__attribute__((constructor)) static void setup(void) {
    printf("hello\n");
    pFile2 = fopen("./log.txt", "a");
}

Compile with command:

gcc -shared -fPIC ptrace_hook.c -ldl -o ptrace_hook.so

Start the game with this command:

LD_PRELOAD=./ptrace_hook.so ./snake

And after playing the game for a while, the result should look like this:

(0x401471 , 0x4900f058b9090),
(0x40159d , 0x9090909090909090),
(0x4015a5 , 0x25b9ba0000201ab8),
...
(0x401830 , 0x9090909090909090),
(0x401838 , 0xe9a9ba00002071b8),
(0x401849 , 0x4c0589e4458b9090),

I copy this array and apply patches to snake using python script:

from pwn import *

li = [(0x401471 , 0x4900f058b9090),
(0x40159d , 0x9090909090909090),
(0x4015a5 , 0x25b9ba0000201ab8),
# ....
(0x401830 , 0x9090909090909090),
(0x401838 , 0xe9a9ba00002071b8),
(0x401849 , 0x4c0589e4458b9090)]
bin = open('./snake', 'rb').read()

for cmd in li:
    addr, data = cmd
    addr -= 0x400000
    bin = bin[:addr] + p64(data) + bin[addr+8:]

open('./new_snake', 'wb').write(bin)

This will create a new binary name new_snake with a completely fixed child’s code (although you can see there will be some of the code that can’t be disassembled, this is because the parent fixes rip to jump over those malformed bytes).

Now looking at the parent’s code again:

__int64 parrent_proc()
{
  // ....
  while ( 1 )
  {
    // ....
    rax = register_1.rax;
    if ( register_1.rax == 4 )  // [4]
    {
        rbx = register_1.rbx;   // score
        rcx = register_1.rcx;   // height
        rdx = register_1.rdx;   // width
        if ( LODWORD(register_1.rbx) )
        {
            sub_402248();
            sub_4024A9(rbx, rcx, rdx);
        }
        else
        {
            puts("At least try to get 1 point?...");
        }
        goto endloop;
    }
    if ( rax > 4 )
      goto LABEL_19;
    if ( rax != 3 )
    {
      if ( rax <= 3 )
      {
        if ( rax == 1 )     // [1]
        {
          time_stamp = time(0LL);
          register_1.rax = (unsigned int)time_stamp;
          goto endloop;
        }
        if ( rax == 2 )     // [2]
        {
          usleep(0x4E20u);
          goto endloop;
        }
      }
LABEL_19:
      if ( register_1.rax > 0xFFF && register_1.rax <= 0x2FFF )
        // [5]
    }
    // [3]
    v1 = __PAIR64__(register_1.rcx, register_1.rbx);    // Food's coordinate
    if ( !v8 )
    {
      sub_4020DA();
      sub_4020FC();
      v8 = 1;
    }
    sub_40212D(v1)
  }
}

As already mentioned, the parent code will decide which path it’s going to take based on the value of rax. You’ll notice that there’re only 5 branches which I comment from [1] to [5], the 5th one will modify child’s code, the other four are special branches that’s going to control special variables in the game:

[1] will pass the time stamp to child process.
[2] will just sleep.
[3] will inital MD5 hash and update with our previous time stamp if this is the first time it hits, then it will update the hash using coordinates of the food that’s being eaten by the snake.
[4] will get our score, screen’s width and screen’s height, final updates for MD5 hash using .text section of child’s process and then it will create a packet that’s going to be sent to server with the following address snake.chal.ctf.acsc.asia:4444

The packet has the following syntax:

p32(time_stamp) + p32(width) + p32(height) + p32(score) + MD5 hash

And the MD5 will be created as:

hash = MD5(time_stamp + all food's coordinate + child's .text section)

Analyzing packet’s components

We already had time_stamp in [1], .text section of child process can be achieved by getting all the bytes from new_snake with following IDA script:

start = 0x401280
length = 0x442FCE - start

print(idaapi.get_bytes(start, length))

And how about food’s coordinate? Because we need to get a score of 31338 to beat the game, we’ll need at least 31338 food’s positions to create a valid packet. So we need to find the code where it generates these foods.

This is where the appearance of time_stamp takes in place. I highly suspect that all the food will be generated with an algorithm where time_stamp is a random seed, so I look back at child’s code to trace every references of time_stamp and stumble upon this code:

__int64 __fastcall sub_442AD1(__int64 a1)
{
  __int64 result; // rax

  qword_44ACC0[0] = a1;
  for ( dword_44A3C0 = 1; ; ++dword_44A3C0 )
  {
    result = (unsigned int)dword_44A3C0;
    if ( dword_44A3C0 > 311 )
      break;
    qword_44ACC0[dword_44A3C0] = dword_44A3C0
                               - 8612607789318272311LL
                               * (qword_44ACC0[dword_44A3C0 - 1] ^ ((unsigned __int64)qword_44ACC0[dword_44A3C0 - 1] >> 62));
  }
  return result;
}

Next I try to find all codes that has qword_44ACC0 or dword_44A3C0 apperances and again stumble upon this following code:

unsigned __int64 sub_442BF8()
{
  int v0; // eax
  unsigned __int64 v2; // [rsp+0h] [rbp-18h]
  unsigned __int64 v3; // [rsp+0h] [rbp-18h]
  unsigned __int64 v4; // [rsp+0h] [rbp-18h]
  unsigned __int64 v5; // [rsp+0h] [rbp-18h]
  int i; // [rsp+Ch] [rbp-Ch]

  if ( dword_44A3C0 > 311 )
  {
    if ( dword_44A3C0 == 313 )
      sub_442AD1(5489LL);
    for ( i = 0; i <= 155; ++i )
    {
      v2 = qword_44ACC0[i] & 0xFFFFFFFF80000000LL | qword_44ACC0[i + 1] & 0x7FFFFFFF;
      qword_44ACC0[i] = qword_44A3D0[v2 & 1] ^ (v2 >> 1) ^ qword_44ACC0[i + 156];
    }
    while ( i <= 310 )
    {
      v3 = qword_44ACC0[i] & 0xFFFFFFFF80000000LL | qword_44ACC0[i + 1] & 0x7FFFFFFF;
      qword_44ACC0[i] = qword_44A3D0[v3 & 1] ^ (v3 >> 1) ^ qword_44ACC0[i - 156];
      ++i;
    }
    qword_44B678 = ((qword_44B678 & 0xFFFFFFFF80000000LL | qword_44ACC0[0] & 0x7FFFFFFF) >> 1) ^ qword_44B198 ^ qword_44A3D0[qword_44ACC0[0] & 1];
    dword_44A3C0 = 0;
  }
  v0 = dword_44A3C0++;
  v4 = ((unsigned __int64)qword_44ACC0[v0] >> 26) & 0xBBBBBBBBBBBBBBBBLL ^ qword_44ACC0[v0];
  v5 = (((v4 << 19) & 0xA82C9FFFEDA60000LL ^ v4) << 37) & 0xFFFABCD000000000LL ^ (v4 << 19) & 0xA82C9FFFEDA60000LL ^ v4;
  return (v5 >> 45) ^ v5;
}

I then looked up these constants on the internet and found out that this is Mersenne Twister algorithm. I quickly grabbed a C code from this page, modified constants and operations to match everything in the binary and successfully recreate every single food’s coordinate:

#include <stdio.h>

#define NN 312
#define MM 156
#define MATRIX_A 0xB5026F5AA96619E9ULL
#define UM 0xFFFFFFFF80000000ULL /* Most significant 33 bits */
#define LM 0x7FFFFFFFULL /* Least significant 31 bits */


/* The array for the state vector */
static unsigned long long mt[NN]; 
/* mti==NN+1 means mt[NN] is not initialized */
static int mti=NN+1; 

/* initializes mt[NN] with a seed */
void init_genrand64(unsigned long long seed)
{
    mt[0] = seed;
    for (mti=1; mti<NN; mti++) 
        mt[mti] = mti - (8612607789318272311LL * (mt[mti-1] ^ (mt[mti-1] >> 62)));
}

/* generates a random number on [0, 2^64-1]-interval */
unsigned long long genrand64_int64(void)
{
    int i;
    unsigned long long x;
    static unsigned long long mag01[2]={0ULL, MATRIX_A};

    if (mti >= NN) { /* generate NN words at one time */

        /* if init_genrand64() has not been called, */
        /* a default initial seed is used     */
        if (mti == NN+1) 
            init_genrand64(5489ULL); 

        for (i=0;i<NN-MM;i++) {
            x = (mt[i]&UM)|(mt[i+1]&LM);
            mt[i] = mt[i+MM] ^ (x>>1) ^ mag01[(int)(x&1ULL)];
        }
        for (;i<NN-1;i++) {
            x = (mt[i]&UM)|(mt[i+1]&LM);
            mt[i] = mt[i+(MM-NN)] ^ (x>>1) ^ mag01[(int)(x&1ULL)];
        }
        x = (mt[NN-1]&UM)|(mt[0]&LM);
        mt[NN-1] = mt[MM-1] ^ (x>>1) ^ mag01[(int)(x&1ULL)];

        mti = 0;
    }
  
    x = mt[mti++];

    x ^= (x >> 26) & 0xBBBBBBBBBBBBBBBBLL;
    x ^= (x << 19) & 0xA82C9FFFEDA60000LL;
    x ^= (x << 37) & 0xFFFABCD000000000LL;
    x ^= (x >> 45);

    return x;
}

int main(void)
{
    int i;
    init_genrand64(0x63FA7A57); // time_stamp
    for (i=0; i<31338; i++) {
      printf("%02llx ", genrand64_int64() % (0xcb  / 2));
      printf("%02llx ", genrand64_int64() % (0x2c));
    }
    return 0;
}

Create a valid packet

Now I’m having everything I need to create a valid packet, I randomly chose 0x63FA7A57 as a seed, 0xcb and 0x2c as values of width and height respectively and create 31338 food’s positions.

This is my script for recreating a packet and sending to server:

import hashlib
from pwn import *

md5 = hashlib.md5()

li = '0a 03 08 02 3b 04 53 19 15 1b 34 ...'

x = [int(i, 16) for i in li.split(' ')]
text_section = open('hehe', 'rb').read()

coordinates = b''
num = 0
for i in x:
    coordinates += p32(i)
    num += 1

md5.update(p32(0x63FA7A57))
md5.update(coordinates)
md5.update(text_section)

r = remote('snake.chal.ctf.acsc.asia', 4444)

packet = p32(0x63FA7A57)	# time stamp
packet += p32(0xcb)		# width
packet += p32(0x2c)		# height
packet += p32(0x7a6a)	# score
packet += md5.digest()	# md5

r.send(packet)
r.interactive()

After running this, we get:

nguyenguyen753@nguyenguyen753:~/Desktop/CTF/ACSC/snake/mtwister$ python3 script.py 
[+] Opening connection to snake.chal.ctf.acsc.asia on port 4444: Done
[*] Switching to interactive mode
=== LEADERBOARD ===
1. johnwick   31337
2. ishowsewey 12
3. vinh       4
4. cakash     3
5. weekek     2

Congrats! You beat the highscore!
ACSC{y0u_4R3_pr0_G4m3r_Or_ch34t3R}

ACSC{y0u_4R3_pr0_G4m3r_Or_ch34t3R}