Linux malware usually hides in Berkeley Packet Filter (BPF) socket packages, that are small bits of executable logic that may be embedded within the Linux kernel to customise the way it processes community site visitors. Among the most persistent threats on the Web use these filters to stay dormant till they obtain a particular “magic” packet. As a result of these filters may be a whole lot of directions lengthy and contain advanced logical jumps, reverse-engineering them by hand is a gradual course of that creates a bottleneck for safety researchers.
To discover a higher means, we checked out symbolic execution: a technique of treating code as a sequence of constraints, moderately than simply directions. By utilizing the Z3 theorem prover, we will work backward from a malicious filter to routinely generate the packet required to set off it. On this put up, we clarify how we constructed a software to automate this, turning hours of guide meeting evaluation right into a activity that takes just some seconds.
Earlier than we take a look at the way to deconstruct malicious filters, we have to perceive the engine working them. The Berkeley Packet Filter (BPF) is a extremely environment friendly know-how that enables the kernel to tug particular packets from the community stack primarily based on a set of bytecode directions.
Whereas many trendy builders are aware of eBPF (Prolonged BPF), the highly effective evolution used for observability and safety, this put up focuses on “classic” BPF. Initially designed for instruments like tcpdump, basic BPF makes use of a easy digital machine with simply two registers to judge community site visitors at excessive speeds. As a result of it runs deep inside the kernel and might “hide” site visitors from user-space instruments, it has develop into a favourite software for malware authors trying to construct stealthy backdoors.
Making a contextual illustration of BPF directions utilizing LLMs is already lowering the guide overhead for analysts, crafting the community packets that correspond to the validating situation can nonetheless be a number of work, even with the added context supplied by LLM’s.
More often than not this isn’t an issue in case your BPF program has solely ~20 directions, however this could get exponentially extra advanced and time-consuming when a BPF program consists of over 100 directions as we’ve noticed in among the samples.
If we deconstruct the issue we will see that it boils right down to studying a buffer and checking a constraint, relying on the end result we both proceed our execution path or cease and examine the tip end result.
This sort of downside that has a deterministic final result may be solved by Z3, a theorem prover that has the means to resolve issues with a set of given constraints.
BPFDoor is a complicated, passive Linux backdoor, primarily used for cyberespionage by China-based menace actors, together with Crimson Menshen (often known as Earth Bluecrow). Energetic since at the very least 2021, the malware is designed to keep up a stealthy foothold in compromised networks, focusing on telecommunications, training, and authorities sectors, with a robust emphasis on operations in Asia and the Center East.
BPFDoor makes use of BPF to observe all incoming site visitors with out requiring a particular community port to be open.
BPFDoor instance directions
Let’s concentrate on the pattern of which was shared for the analysis carried out by Fortinet (82ed617816453eba2d755642e3efebfcbd19705ac626f6bc8ed238f4fc111bb0). If we dissect the BPF directions and add some annotations, we will write the next:
(000) ldh [0xc] ; Learn halfword at offset 12 (EtherType)
(001) jeq #0x86dd, jt 2, jf 6 ; 0x86DD (IPv6) -> ins 002 else ins 006
(002) ldb [0x14] ; Learn byte at offset 20 (Protocol)
(003) jeq #0x11, jt 4, jf 15 ; 0x11 (UDP) -> ins 004 else DROP
(004) ldh [0x38] ; Learn halfword at offset 56 (Dst Port)
(005) jeq #0x35, jt 14, jf 15 ; 0x35 (DNS) -> ACCEPT else DROP
(006) jeq #0x800, jt 7, jf 15 ; 0x800 (IPv4) -> ins 007 else DROP
(007) ldb [23] ; Learn byte at offset 23 (Protocol)
(008) jeq #0x11, jt 9, jf 15 ; 0x11 (UDP) -> ins 009 else DROP
(009) ldh [20] ; Learn halfword at offset 20 (fragment)
(010) jset #0x1fff, jt 15, jf 11 ; fragmented -> DROP else ins 011
(011) ldxb 4*([14]&0xf) ; Load index (x) register ihl & 0xf
(012) ldh [x + 16] ; Learn halfword at offset x+16 (Dst Port)
(013) jeq #0x35, jt 14, jf 15 ; 0x35 (DNS) -> ACCEPT else DROP
(014) ret #0x40000 (ACCEPT)
(015) ret #0 (DROP)Within the above instance we will set up there are two paths that result in an ACCEPT final result (step 5 and step 13). We are able to additionally clearly observe sure bytes being checked, together with their offsets and values.
Taking these validations, and preserving monitor of something that will match the ACCEPT path, we must always be capable to routinely craft the packets for us.
Calculating the shortest path
To seek out the shortest path to a packet that validates the situations introduced within the BPF directions, we have to preserve monitor of paths that aren’t ending within the unfavorable situation.
We begin off by making a small queue. This queue holds a number of necessary information factors:
At any time when we encounter an instruction that’s checking a situation, we preserve monitor of the end result utilizing a boolean and retailer this in our queue, so we will examine paths on the quantity of situations earlier than the ACCEPT situation is reached and calculate our shortest path. In pseudocode we will specific this finest as:
paths = []
queue = dequeue([(0, [0])])
whereas queue:
computer, path = queue.popleft()
if computer >= len(directions):
proceed
instruction = directions[pc]
if instruction.class == return_instruction:
if instruction_constant != 0: # settle for
paths.append(path)
proceed # drop or settle for, cease parsing this instruction
if instruction.class == jump_instruction:
if instruction.operation == unconditional_jump:
next_pc = computer + 1 + instruction_constant
queue.append((next_pc, path + [next_pc]))
proceed
# Conditional soar, discover each
pc_true = computer + 1 + instruction.jump_true
pc_false = computer + 1 + instruction.jump_false
if instruction.jump_true <= instruction.jump_false:
queue.append((pc_true, path + [pc_true]))
queue.append((pc_false, path + [pc_false]))
# else: identical as above however reverse order of appending
# else: sequential instruction, append to the queueIf we execute this logic in opposition to our earlier BPFDoor instance, we might be introduced with the shortest path to an accepted packet:
(000) code=0x28 jt=0 jf=0 ok=0xc ; Learn halfword at offset 12 (EtherType)
(001) code=0x15 jt=0 jf=4 ok=0x86dd ; IPv6 packet
(002) code=0x30 jt=0 jf=0 ok=0x14 ; Learn byte at offset 20 (Protocol)
(003) code=0x15 jt=0 jf=11 ok=0x11 ; Protocol quantity 17 (UDP)
(004) code=0x28 jt=0 jf=0 ok=0x38 ; Learn phrase at offset 56 (Vacation spot Port)
(005) code=0x15 jt=8 jf=9 ok=0x35 ; Vacation spot port 53
(014) code=0x06 jt=0 jf=0 ok=0x40000 ; Settle forThat is already a useful automation in routinely fixing our BPF constraints relating to analyzing BPF directions and determining how the accepted packet for the backdoor would look. However what if we will take it a step additional?
What if we might create a small software that can give us the anticipated packet again in an automatic method?
One such software that’s good to resolve issues given a set of constraints is Z3. Developed by Microsoft the software is labeled as a theorem prover and exposes straightforward to make use of features performing very advanced mathematical operations beneath the hood.
The opposite software we’ll use for crafting our legitimate magic packets is scapy, a preferred Python library for interactive packet manipulation.
Provided that we have already got a means to determine the trail to an accepted packet, we’re left with fixing the issue by itself, after which translating this answer to the bytes at their respective offsets in a community packet.
A standard method for exploring paths taken in a given program is named symbolic execution. For this method we’re giving enter that can be utilized as variables, together with the constraints. By figuring out the end result of a profitable path we will orchestrate our software to search out all of those profitable paths and show the tip end result to us in a contextualized format.
For this to work we might want to implement a small machine able to preserving monitor of the state of issues like constants, registers, and completely different boolean operators as an final result of a situation that’s being checked.
class BPFPacketCrafter:
MIN_PKT_SIZE = 64 # Minimal packet dimension (Ethernet + IP + UDP headers)
LINK_ETHERNET = "ethernet" # DLT_EN10MB - begins with Ethernet header
LINK_RAW = "raw" # DLT_RAW - begins with IP header straight
MEM_SLOTS = 16 # Variety of scratch reminiscence slots (M[0] to M[15])
def __init__(self, ins: listing[BPFInsn], pkt_size: int = 128, ltype: str = "ethernet"):
self.directions = ins
self.pkt_size = max(self.MIN_PKT_SIZE, pkt_size)
self.ltype = ltype
# Symbolic packet bytes
self.packet = [BitVec(f"pkt_{i}", 8) for i in range(self.pkt_size)]
# Symbolic registers (32-bit)
self.A = BitVecVal(0, 32) # Accumulator
self.X = BitVecVal(0, 32) # Index register
# Scratch reminiscence M[0-15] (32-bit phrases)
self.M = [BitVecVal(0, 32) for _ in range(self.MEM_SLOTS)]With the above code we’ve coated a lot of the machine for preserving a state in the course of the symbolic execution. There are after all extra situations we have to preserve monitor of, however these are dealt with in the course of the fixing course of. To deal with an ADD instruction, the machine maps the BPF operation to a Z3 addition:
def _execute_ins(self, insn: BPFInsn):
cls = insn.cls
if cls == BPFClass.ALU:
op = insn.op
src_val = BitVecVal(insn.ok, 32) if insn.src == BPFSrc.Ok else self.X
if op == BPFOp.ADD:
self.A = self.A + src_valFortunately the BPF instruction set is barely a small set of directions that’s comparatively straightforward to implement — solely having two registers to maintain monitor of is unquestionably a welcome constraint!
The general working of this symbolic execution may be specified by the next abstracted overview:
Initialize the “x” (index) and “a” (accumulator) registers to their base state.
Loop over the directions from the trail that was recognized as a profitable path;
Execute non-jump directions as-is, preserving monitor of register states.
Decide if a soar instruction is encountered, and examine if the department must be taken.
Use the Z3 examine() operate to examine if our situation has been happy with the given constraint (ACCEPT).
Convert the Z3 bitvector arrays into bytes.
Use scapy to assemble packets of the transformed bytes.
If we take a look at the constraints construct by the Z3 solver we will hint the execution steps taken by Z3 to construct the packet bytes:
[If(Concat(pkt_12, pkt_13) == 0x800,
pkt_14 & 0xF0 == 0x40,
True),
If(Concat(pkt_12, pkt_13) == 0x800, pkt_14 & 0x0F >= 5, True),
If(Concat(pkt_12, pkt_13) == 0x800, pkt_14 & 0x0F == 5, True),
If(Concat(pkt_12, pkt_13) == 0x86DD,
pkt_14 & 0xF0 == 0x60,
True),
0x86DD == ZeroExt(16, Concat(pkt_12, pkt_13)),
0x11 == ZeroExt(24, pkt_20),
0x35 == ZeroExt(16, Concat(pkt_56, pkt_57))]The primary a part of the Z3 displayed constraints are the constraints added to make sure we’re increase a legitimate ethernet IP when coping with link-layer BPF directions. The “If” statements apply particular constraints primarily based on which protocol is detected:
IPv4 Logic (0x0800):
pkt_14 & 240 == 64: Byte 14 is the beginning of the IP header. The 0xF0 masks isolates the excessive nibble (the Model discipline) to examine if the model is 4 (0x40).
pkt_14 & 15 == 5: 15 (0x0F), isolating the low nibble (IHL – Web Header Size). This mandates a header size of 5 (20 bytes), which is the usual dimension with out choices.
IPv6 Logic (0x86dd):
We are able to observe the community packet values once we take a look at the second half the place completely different values are being checked:
0x86DD: Packet situation for IPv6 header.
0x11: UDP protocol quantity.
0x35: The vacation spot port (53).
Subsequent to the anticipated values we will see the byte offset of the place it ought to exist in a given packet (e.g. pkt_12, pkt_13).
Now that we’ve established which bytes ought to exist at particular offsets we will convert it into an precise community packet utilizing scapy. If we generate a brand new packet from the bytes of our earlier Z3 constraints we will clearly see what our packet would appear to be, and retailer this for additional processing:
###[ Ethernet ]###
dst = 00:00:00:00:00:00
src = 00:00:00:00:00:00
kind = IPv6 <-- IPv6 Packet
###[ IPv6 ]###
model = 6
tc = 0
fl = 0
plen = 0
nh = UDP <-- UDP Protocol
hlim = 0
src = ::
dst = ::
###[ UDP ]###
sport = 0
dport = area <-- Port 53
len = 0
chksum = 0x0These newly crafted packets can in flip be used for additional analysis or figuring out the presence of those implants by scanning for these over the community.
Understanding what a particular BPF set of directions is doing may be cumbersome and time-consuming work. The instance used is barely a complete of sixteen directions, however we’ve encountered samples that had been over 200 directions that will’ve taken at the very least a day to know. By utilizing the Z3 solver, we will now scale back this time to only seconds, and never solely show the trail to an accepted packet, but additionally the packet skeleton for this as effectively.
We’ve open-sourced the filterforge software to assist the neighborhood automate the deconstruction of BPF-based implants. Yow will discover the supply code, together with utilization examples, on our GitHub repository.
By publishing this analysis and sharing our software for lowering analysts’ time spent determining the BPF directions, we hope to spark additional analysis by others to increase on this type of automation.



