Decoding C2 Traffic in Python, or HOWTO eat 🍿 during an IR engagement?
The scene 🎥
When you have the chance to catch an attacker live, it is always a delight to monitor and dissect their moves in real time, even a posteriori. To make it happen, you must first have some kind of Full Packet Capture; then, of course, you need a thorough reverse-engineering of the malware that documents its encoding, fields, and structures.
OK, you have all the material you need; now, how do you turn the pcap into human-readable transcripts? As usual, while it may seem easy in theory (“You just have to parse the application layer”), it is a bit more complicated in real life, especially during an Incident Response engagement, so it is better to be prepared!
Theory 🆚 Life
First of all, let’s clarify some assumptions.
- Packet loss is a real problem. Because TCP has been solving it 👌 since the 80s, nobody actually cares about TCP retransmissions, overlapping segments, or urgent flags. But when you read a PCAP, you have to support a full TCP engine yourself to take care of them.
- IP fragmentation happens, especially when you monitor old operating systems like Windows XP/2003… Deal with it!
- HTTP/1.1 made persistent connections the default: several HTTP requests can be issued inside the same TCP connection.
- Multiplexing: two C2 commands can be “in-flight” in parallel (for example, one uploading a file, the other executing a command).
- When traffic is captured between two Blue Coat proxies, every connection carries the proxies' IP addresses, so you cannot filter on the real endpoints' IP addresses.
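To make the packet loss point concrete, here is a toy Python sketch of what a TCP engine has to do for one direction of a stream: place each segment at its sequence-number offset, ignore retransmitted bytes, tolerate out-of-order arrival and sequence-number wraparound, and report the holes left by lost packets. It assumes you already extracted `(seq, payload)` pairs and know the initial sequence number from the SYN. The function name `reassemble` is hypothetical, and the "first copy wins" overlap policy is a simplification: real operating systems resolve overlapping segments differently.

```python
from typing import Iterable, List, Tuple

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def reassemble(isn: int, segments: Iterable[Tuple[int, bytes]]) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Rebuild one direction of a TCP stream from (seq, payload) pairs.

    `isn` is the sequence number of the SYN, so payload starts at isn + 1.
    Returns the contiguous data from offset 0 and the (start, end) offset
    ranges that no segment covered (lost packets).
    """
    first_seen = {}  # relative offset -> byte value
    for seq, payload in segments:
        start = (seq - isn - 1) % SEQ_MOD
        if start >= SEQ_MOD // 2:
            continue  # sequence number "before" the stream start: ignore
        for i, byte in enumerate(payload):
            # Retransmissions and overlaps: keep the first copy we saw.
            first_seen.setdefault(start + i, byte)

    # Contiguous prefix: what an application could actually read.
    data = bytearray()
    offset = 0
    while offset in first_seen:
        data.append(first_seen[offset])
        offset += 1

    # Holes between the readable prefix and the last byte received.
    holes = []
    expected = offset  # first missing offset
    for off in sorted(k for k in first_seen if k > offset):
        if off > expected:
            holes.append((expected, off))
        expected = off + 1
    return bytes(data), holes


# Out of order, duplicated, and overlapping segments (ISN = 1000):
segments = [(1007, b"HTTP/1.1"), (1001, b"GET "), (1001, b"GET "), (1004, b" / H")]
print(reassemble(1000, segments))  # (b'GET / HTTP/1.1', [])
# A lost segment leaves a hole at offsets 2-3:
print(reassemble(1000, [(1001, b"ab"), (1005, b"ef")]))  # (b'ab', [(2, 4)])
```

Everything a real engine also has to track (SYN/FIN/RST state, window, urgent data, per-OS overlap policies) is deliberately left out.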
Learning by failing 🗑️
The first time I approached the problem, I implemented it using scapy with a very basic TCP engine (full disclosure: it routinely got stuck in infinite loops; writing a bug-free TCP/IP stack is hard), and then I struggled to handle the HTTP request/response paradigm without convoluted code (scapy's AnsweringMachine was too lightweight, its Automaton too heavy).
Round 1 🟠 Packet loss — 🔴 HTTP/1.1 — 🔴 Proxy
The second time, I kept it simple: delegate the TCP/IP parsing to a robust tool such as Wireshark/tshark and abuse its “Follow TCP Stream” feature. Obviously, these tools are not designed for that purpose, and automating the session tracking was hackish. Now that I think about it, I wonder whether libshark existed at the time; it would have made things much easier… 🤷♂️
Round 2 🟢 Packet loss — 🔴 HTTP/1.1 — 🔴 Proxy
The third time, I decided to stop using tools not designed for the job, and I met tcpflow. This simple tool performs TCP/IP reassembly, extracts the Layer 7 payload, and writes each one-way flow to disk: you end up with gazillions of files. It is like a folder full of C2 streams, but without metadata, without ordering, and without any understanding of the HTTP request/response paradigm: a cluster mess.
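Some of the conversation structure can be recovered from tcpflow's file names alone. The sketch below assumes tcpflow's default naming (`SRCIP.SRCPORT-DSTIP.DSTPORT`, IPv4 octets zero-padded to three digits and ports to five) and a hypothetical set of server ports; `parse_flow_name` and `pair_flows` are illustrative names. It only pairs the client-to-server and server-to-client files of each connection; it does nothing for ordering or HTTP message boundaries, which is exactly what was missing.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Assumed tcpflow default file name: SRCIP.SRCPORT-DSTIP.DSTPORT (IPv4 only),
# e.g. 192.168.001.010.49152-010.000.000.001.00080
FLOW_NAME = re.compile(r"^(\d{3}(?:\.\d{3}){3})\.(\d{5})-(\d{3}(?:\.\d{3}){3})\.(\d{5})$")

Endpoint = Tuple[str, int]


def parse_flow_name(name: str) -> Optional[Tuple[Endpoint, Endpoint]]:
    """Return ((src_ip, src_port), (dst_ip, dst_port)), or None for other files."""
    match = FLOW_NAME.match(name)
    if match is None:
        return None

    def unpad(ip: str) -> str:
        return ".".join(str(int(octet)) for octet in ip.split("."))

    return (unpad(match[1]), int(match[2])), (unpad(match[3]), int(match[4]))


def pair_flows(names: Iterable[str], server_ports=frozenset({80})) -> Dict:
    """Group one-way flow files into {(client, server): {"request": ..., "response": ...}}."""
    conversations = defaultdict(lambda: {"request": None, "response": None})
    for name in names:
        endpoints = parse_flow_name(name)
        if endpoints is None:
            continue  # not a flow file
        src, dst = endpoints
        if dst[1] in server_ports:
            conversations[(src, dst)]["request"] = name
        elif src[1] in server_ports:
            conversations[(dst, src)]["response"] = name
    return dict(conversations)


files = [
    "192.168.001.010.49152-010.000.000.001.00080",
    "010.000.000.001.00080-192.168.001.010.49152",
    "report.xml",
]
for (client, server), flows in pair_flows(files).items():
    print(client, "->", server, flows)
```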
Round 3 🟢 Packet loss — 🔴 HTTP/1.1 — 🔴 Proxy
Now the gloves are off: no more half-baked solutions, it is time to do some real programming!
Final round: Doing it right
Partners in crime
First, I need to introduce two old buddies:
- libnids: released in 2003, it emulates the IP stack of Linux 2.0.x (state of the art… in 2003) and relies on asynchronous event notification (quite novel at the time). libnids offers IP defragmentation and TCP stream assembly.
- libhtp: the HTTP parsing library developed for Suricata and used by other security projects as well.
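To illustrate what libhtp takes off your plate, here is a deliberately naive sketch that splits one direction of a persistent HTTP/1.1 connection into messages and pairs requests with responses. It assumes bodies are delimited by `Content-Length` only (no chunked transfer encoding, no close-delimited bodies, no HEAD/304 special cases) and relies on HTTP/1.1 returning responses in request order on a given connection. `split_messages` and `pair_transactions` are hypothetical names, not libhtp functions.

```python
from typing import Dict, List, Tuple

# (start line, lower-cased headers, body)
Message = Tuple[str, Dict[str, str], bytes]


def split_messages(stream: bytes) -> List[Message]:
    """Split one direction of a persistent HTTP/1.1 connection into messages.

    Naive on purpose: bodies are delimited by Content-Length only.
    """
    messages: List[Message] = []
    pos = 0
    while pos < len(stream):
        head_end = stream.find(b"\r\n\r\n", pos)
        if head_end == -1:
            break  # truncated headers (e.g. capture ended mid-message)
        lines = stream[pos:head_end].decode("iso-8859-1").split("\r\n")
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        body_start = head_end + 4
        body_end = body_start + int(headers.get("content-length", "0"))
        if body_end > len(stream):
            break  # truncated body
        messages.append((lines[0], headers, stream[body_start:body_end]))
        pos = body_end
    return messages


def pair_transactions(requests: List[Message], responses: List[Message]) -> List[Tuple[Message, Message]]:
    """HTTP/1.1 sends responses in request order, so pair them FIFO.

    zip() silently drops requests whose response was never captured.
    """
    return list(zip(requests, responses))


client = (b"POST /gate.php HTTP/1.1\r\nHost: c2.example\r\nContent-Length: 6\r\n\r\nbeacon"
          b"GET /task HTTP/1.1\r\nHost: c2.example\r\n\r\n")
server = (b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
          b"HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nwhoami")
for request, response in pair_transactions(split_messages(client), split_messages(server)):
    print(request[0], "->", response[2])
# POST /gate.php HTTP/1.1 -> b'ok'
# GET /task HTTP/1.1 -> b'whoami'
```

Every simplification listed above is a case a real HTTP parser must handle, which is why delegating it to a battle-tested library pays off.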
(For the record, MITRE also developed ChopShop, which seems to be exactly what I was looking for. Unfortunately, I never managed to get over its learning curve: its documentation assumes you are already an expert. I am sure they have a lot of awesome documentation, but as far as I know it is not public. And to be honest, I often prefer copy/pasting functions to committing to a framework and then spending more time working around its limits or its way of doing things.)
Integrating these dinosaurs in 2020 🦖 🦕 🐊
I won’t bore you with all my failed attempts at compiling these relics on a recent Linux distribution, but I spent waaaaay too much time on it (oh, by the way, thank you very much, Debian 🤬).
Instead, I will directly share with you “what works”: a simple Docker container. Because the libraries' versions are hardcoded, building the image should still be reproducible in 10 years (if GitHub still exists…).
Just do it
Here is the skeleton I usually copy/paste when starting a new decoder:
Both libraries use asynchronous event notifications, so keeping both state machines in mind at the same time quickly becomes demanding whenever you change the code. Take your time, and it will eventually work 🙂
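A small sketch of why the event-driven model is demanding: the TCP side hands you data in arbitrary chunks, so the HTTP side must remember where it stopped between two callbacks. `IncrementalRequestParser` is a hypothetical, simplified stand-in for what libhtp does internally (request side only, `Content-Length` bodies only), and the loop at the end merely simulates a TCP reassembly callback delivering a persistent connection 7 bytes at a time.

```python
from typing import Callable, Dict

RequestHandler = Callable[[str, Dict[str, str], bytes], None]


class IncrementalRequestParser:
    """Toy HTTP/1.1 request parser fed chunk by chunk (Content-Length bodies only)."""

    def __init__(self, on_request: RequestHandler) -> None:
        self.on_request = on_request
        self.buffer = b""
        self.state = "HEADERS"  # or "BODY"
        self.request_line = ""
        self.headers: Dict[str, str] = {}
        self.body_length = 0

    def feed(self, chunk: bytes) -> None:
        """Called for every chunk delivered by the (simulated) TCP callback."""
        self.buffer += chunk
        while True:
            if self.state == "HEADERS":
                end = self.buffer.find(b"\r\n\r\n")
                if end == -1:
                    return  # headers incomplete: wait for the next chunk
                lines = self.buffer[:end].decode("iso-8859-1").split("\r\n")
                self.buffer = self.buffer[end + 4:]
                self.request_line = lines[0]
                self.headers = {}
                for line in lines[1:]:
                    name, _, value = line.partition(":")
                    self.headers[name.strip().lower()] = value.strip()
                self.body_length = int(self.headers.get("content-length", "0"))
                self.state = "BODY"
            # From here on, the state is "BODY".
            if len(self.buffer) < self.body_length:
                return  # body incomplete: wait for the next chunk
            body = self.buffer[: self.body_length]
            self.buffer = self.buffer[self.body_length:]
            self.state = "HEADERS"
            self.on_request(self.request_line, self.headers, body)


# Simulate a TCP callback delivering a persistent connection in 7-byte chunks.
received = []
parser = IncrementalRequestParser(lambda line, headers, body: received.append((line, body)))
stream = (b"POST /a HTTP/1.1\r\nContent-Length: 3\r\n\r\nxyz"
          b"GET /b HTTP/1.1\r\n\r\n")
for i in range(0, len(stream), 7):
    parser.feed(stream[i:i + 7])
print(received)  # [('POST /a HTTP/1.1', b'xyz'), ('GET /b HTTP/1.1', b'')]
```

Any change to the TCP side (for example, how gaps are reported) ripples into this state machine, which is the bookkeeping the paragraph above warns about.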
If your C2 uses HTTP connections, you just have to care about three functions:
- Filter the HTTP sessions you want to track in request_headers_ready based on IP addresses, URL, hostname, etc.
- Inspect the HTTP request in request_complete_callback
- Inspect the HTTP response in response_complete_callback
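Here is a pure-Python sketch of the three hooks, assuming the HTTP parser has already produced a hypothetical `Transaction` object per request/response pair. The hook names reuse the ones listed above, with `response_complete_callback` as an assumed name for the response side; `C2_HOSTS`, `C2_URI_PREFIX`, and `replay` are illustrative placeholders, not part of libhtp. Because proxies hide the real endpoints, the filter keys on the absolute-form URI that clients send to a forward proxy, or else on the Host header, rather than on IP addresses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import urlsplit

C2_HOSTS = {"c2.example"}  # hypothetical indicators from the reverse-engineering
C2_URI_PREFIX = "/gate"     # hypothetical


@dataclass
class Transaction:
    """Hypothetical container for what the HTTP parser hands to the hooks."""
    method: str
    uri: str
    request_headers: Dict[str, str]  # lower-cased header names
    request_body: bytes = b""
    status: Optional[int] = None
    response_body: bytes = b""
    tracked: bool = False


transcript: List[str] = []


def request_headers_ready(tx: Transaction) -> None:
    """Decide early whether this transaction belongs to the C2."""
    # Behind proxies, IP addresses are useless: prefer the host from an
    # absolute-form URI, fall back to the Host header (port stripped).
    parts = urlsplit(tx.uri)
    host = parts.hostname or tx.request_headers.get("host", "").split(":")[0].lower()
    tx.tracked = host in C2_HOSTS and parts.path.startswith(C2_URI_PREFIX)


def request_complete_callback(tx: Transaction) -> None:
    """Decode the C2 request here (decryption, field parsing, ...)."""
    if tx.tracked:
        transcript.append(f"> {tx.method} {tx.uri} ({len(tx.request_body)} bytes)")


def response_complete_callback(tx: Transaction) -> None:
    """Decode the C2 response (e.g. the tasking) here."""
    if tx.tracked:
        transcript.append(f"< {tx.status} ({len(tx.response_body)} bytes)")


def replay(tx: Transaction) -> None:
    """Hypothetical driver firing the hooks in the order a parser would."""
    request_headers_ready(tx)
    request_complete_callback(tx)
    response_complete_callback(tx)


c2_tx = Transaction("POST", "http://c2.example/gate.php", {"host": "c2.example"}, b"beacon", 200, b"sleep 60")
web_tx = Transaction("GET", "/index.html", {"host": "www.example.org"}, b"", 200, b"<html>")
for tx in (c2_tx, web_tx):
    replay(tx)
print("\n".join(transcript))
```

In a real decoder the response fields are filled in by the parser between the request and response hooks; here they are pre-filled to keep the example self-contained.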
And you are dialed in: you have a rock-solid foundation to build your C2 dissector!
Round 4: 🟢 Packet loss — 🟢 HTTP/1.1 — 🟢 Proxy