Most malware out there probably uses some sort of packer to evade detection and classification, or to make post-analysis more difficult. In this blog post, I will talk about one of the most widely used packing techniques and how to defeat it with the power of binary emulation. I'll also drop a PoC of the new project that I'm working on. Note that this is a generic approach for packers that unpack code into heap memory and then execute it.

Background

For packers that encrypt or compress a payload, a stub (a piece of code that contains the decompression or decryption routine) acts as a loader that executes before the malware. So, we have a loader that decrypts or decompresses the real malware and then executes it. The stub can use many techniques to achieve its goal.

One such technique relies on self-modifying code. Here, the stub acquires a block of writable, executable memory, unpacks (decrypts/decompresses and writes) code into the newly allocated memory, and finally transfers execution to the unpacked code. So, if we manage to read that allocated memory and dump the executable payload right before it executes, we have probably unpacked the real malware.
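To make the idea concrete, here is a toy, pure-Python model of that flow. Nothing here touches real memory or runs real code: the bytearray stands in for the allocated block, the single-byte XOR stands in for whatever decryption a real stub uses, and the names (KEY, xor_bytes, run_stub) are made up for illustration.

```python
# Toy model of a packer stub; no real memory or execution is involved.
KEY = 0x5A  # hypothetical single-byte XOR key


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with the key (applying it twice restores the input)."""
    return bytes(b ^ key for b in data)


def run_stub(packed: bytes, key: int, on_before_execute) -> None:
    """Mimic the stub: allocate, unpack, then hand over control."""
    region = bytearray(len(packed))      # stands in for the allocated block
    region[:] = xor_bytes(packed, key)   # decrypt and write the payload
    on_before_execute(bytes(region))     # the moment we want to intercept
    # a real stub would now jump into the region


payload = b"\x90\x90\xc3"                # pretend code: nop, nop, ret
packed = xor_bytes(payload, KEY)         # what the packed file would contain

dumped = []
run_stub(packed, KEY, dumped.append)     # our "dump" just collects the bytes
print(dumped[0].hex())
```

The callback plays the role of the analyst: it sees the payload in its unpacked form, which never appears in the packed input.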

Digging down

Let’s dig deeper and talk about some Windows APIs. To acquire a new block of memory, malware will often call VirtualAlloc(). VirtualAlloc() is a Windows API exported by kernel32.dll. According to the Microsoft docs, VirtualAlloc():

Reserves, commits, or changes the state of a region of pages in the virtual address space of the calling process. Memory allocated by this function is automatically initialized to zero.

LPVOID VirtualAlloc(
  LPVOID lpAddress,
  SIZE_T dwSize,
  DWORD  flAllocationType,
  DWORD  flProtect
);

The important parts here are dwSize, flProtect, and the return value. dwSize defines the size of the allocated memory block, flProtect defines the memory protection, and the return value is the address of the newly allocated memory.

First of all, we need to catch flProtect, since in order to execute code inside a memory region, the protection value must be one of the following:

  • PAGE_EXECUTE (0x10)
  • PAGE_EXECUTE_READ (0x20)
  • PAGE_EXECUTE_READWRITE (0x40)
  • PAGE_EXECUTE_WRITECOPY (0x80)

So, we need to monitor dynamically allocated memory with execute protection, because it suggests that the malware is about to unpack (or otherwise obtain) code and store it there.
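As a small sketch of that check: the four executable constants occupy bits 4 to 7 of the protection value, and Windows allows modifier flags such as PAGE_GUARD (0x100), PAGE_NOCACHE (0x200) and PAGE_WRITECOMBINE (0x400) to be OR-ed on top of them. Testing the bits instead of comparing for equality covers both cases. The helper name is_executable is my own and not part of any API.

```python
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80

# All four executable protections combined: 0xF0
EXECUTE_MASK = (PAGE_EXECUTE | PAGE_EXECUTE_READ |
                PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)


def is_executable(protect: int) -> bool:
    """True if the protection value grants execute access,
    even when modifier flags such as PAGE_GUARD are OR-ed in."""
    return bool(protect & EXECUTE_MASK)


PAGE_READWRITE = 0x04
PAGE_GUARD = 0x100

print(is_executable(PAGE_EXECUTE_READWRITE))          # plain executable value
print(is_executable(PAGE_READWRITE))                  # not executable
print(is_executable(PAGE_EXECUTE_READ | PAGE_GUARD))  # executable plus modifier
```

A plain membership test against the four values is enough when no modifiers are used; the mask only matters for samples that combine flags.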

There is another interesting API called VirtualProtect(). VirtualProtect() changes the protection of a memory region. Malware can use this API call to change the protection of the allocated memory region if it is not already executable.

BOOL VirtualProtect(
  LPVOID lpAddress,
  SIZE_T dwSize,
  DWORD  flNewProtect,
  PDWORD lpflOldProtect
);

The important parameter is flNewProtect. As with VirtualAlloc(), if flNewProtect contains one of the execute values (0x10, 0x20, 0x40 or 0x80), we need to monitor that specific region for unpacked code.

Once we have a list of memory regions with execute protection, we need to set a memory execution breakpoint on each of them. Then, if any code executes inside such a region (probably unpacked code), we can catch and dump it.
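The bookkeeping behind that idea can be sketched without any emulator. The class below is a hypothetical helper (RegionWatcher and its methods are my own names, and the addresses are invented): it records watched regions and reports a hit only the first time execution lands in each one, so every region gets dumped once.

```python
class RegionWatcher:
    """Track executable regions and fire once per region."""

    def __init__(self):
        self._regions = []  # list of (start, size) tuples

    def watch(self, start: int, size: int) -> None:
        self._regions.append((start, size))

    def hit(self, address: int):
        """Return (start, size) on the first execution inside a region, else None."""
        for region in self._regions:
            start, size = region
            if start <= address < start + size:  # the end is exclusive
                self._regions.remove(region)     # one-shot: forget the region
                return region
        return None


watcher = RegionWatcher()
watcher.watch(0x5000000, 0x71)

# Pretend instruction trace: one address outside the region, two inside it.
trace = [0x401000, 0x5000000, 0x5000002]
hits = [watcher.hit(addr) for addr in trace]
print(hits)
```

Only the second address in the trace produces a hit; the third one falls in the same region, which has already been handled.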

Choosing tools and solutions

I needed a way to write a cross-platform tool for automatically unpacking malware with minimal requirements. As a Python developer, I prefer a framework written in Python or with Python bindings. So I chose a binary emulator: Qiling!

Qiling Framework is aimed to change IoT security research, malware analysis and reverse engineering landscape. The main objective is to build a cross-platform and multi-architecture framework and not just another reverse engineering tool. Qiling Framework is designed as a binary instrumentation and binary emulation framework that supports cross-platform and multi-architecture. It is packed with powerful features such as code interception and arbitrary code injection before or during a binary execution. It is also able to patch a packed binary during execution. Qiling Framework is open source and it is written in Python, a simple and commonly used programming language. This will encourage continuous contributions from the security and open-source community making it a sustainable project.

Also, there is a comparison between Qiling and other tools. Take a look at it!

Writing a PoC

First of all, we need to re-implement VirtualAlloc() and VirtualProtect() APIs in a format that Qiling recognizes. To make this happen, we need to write something like this:

@winapi(cc=STDCALL, params={
    "lpAddress": POINTER,
    "dwSize": SIZE_T,
    "flAllocationType": DWORD,
    "flProtect": DWORD
})
def hook_VirtualAlloc(ql, address, params):
    dw_size = params["dwSize"]
    addr = ql.os.heap.mem_alloc(dw_size) # allocate memory in heap
    return addr

winapi is a decorator that describes the structure of the API call. Here we have an API with the STDCALL calling convention and 4 parameters of types POINTER, SIZE_T, DWORD, and DWORD.

Qiling passes the ql (sandbox) object and the address of the location that called the API as arguments. params is a dictionary that contains all parameters passed to that API call.

For VirtualProtect() we can write something like this:

@winapi(cc=STDCALL, params={
    "lpAddress": POINTER,
    "dwSize": SIZE_T,
    "flNewProtect": DWORD,
    "lpflOldProtect": POINTER
})
def hook_VirtualProtect(ql, address, params):
    return 1

To override default API hooks, Qiling provides set_api:

ql.set_api("VirtualAlloc", hook_VirtualAlloc)
ql.set_api("VirtualProtect", hook_VirtualProtect)

Inside hooked APIs we need to catch allocated memory regions with executable protection:

addr = params["lpAddress"]
dw_size = params["dwSize"]
fl_new_protect = params["flNewProtect"]

# PAGE_EXECUTE (0x10), 
# PAGE_EXECUTE_READ (0x20), 
# PAGE_EXECUTE_READWRITE (0x40), 
# PAGE_EXECUTE_WRITECOPY (0x80),
if fl_new_protect in [0x10, 0x20, 0x40, 0x80]:
    # add newly allocated memory to list of memory regions
    mem_regions.append({"start": addr, "size": dw_size})

The next step is to set up the memory execution breakpoint. Qiling offers a function called hook_code(), which takes 3 parameters: a callback function, and the beginning and end of the memory region.

ql.hook_code(dump_memory_region, begin=addr, end=addr + dw_size)

If any code gets executed inside that memory region, Qiling will call dump_memory_region(). To dump that region, Qiling offers ql.mem.read(), which takes 2 parameters: the start address and the size.

executed_mem = ql.mem.read(address, size)

The full code for the PoC is:
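One detail to keep in mind: the callback receives the address of the instruction being executed, which is not necessarily the first byte of the region. Here is a hedged sketch of how the read length could be computed so that the read never runs past the end of the region. The function name dump_span is hypothetical, and the addresses are invented.

```python
def dump_span(region_start: int, region_size: int, address: int):
    """Return (address, length) covering address up to the end of the region."""
    region_end = region_start + region_size  # exclusive end
    if not region_start <= address < region_end:
        raise ValueError("address is outside the region")
    return address, region_end - address


# Execution starts 0x10 bytes into a 0x100-byte region.
start, length = dump_span(0x1000, 0x100, 0x1010)
print(hex(start), hex(length))
```

The resulting pair could then be passed on as ql.mem.read(start, length).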

import sys

from qiling import *
from qiling.os.windows.const import *
from qiling.os.const import *
from qiling.os.windows.fncc import *
from qiling.os.windows.utils import *
from qiling.os.windows.thread import *
from qiling.os.windows.handle import *
from qiling.exception import *

mem_regions = []

def get_mem(addr):
    # return (start, size) of the tracked region containing addr, or None
    for mem_region in mem_regions:
        start = mem_region['start']
        size = mem_region['size']
        if start <= addr < start + size: # end of the region is exclusive
            return (start, size)

    return None

def dump_memory_region(ql, address, size):
    mem_region = get_mem(address) # check if memory region exists
    # a region that was already dumped has been removed (avoids duplicates)
    if mem_region is not None:
        start, region_size = mem_region
        # read from the first executed address to the end of the region
        executed_mem = ql.mem.read(address, start + region_size - address)
        with open(f"{hex(address)}.bin", "wb") as f:
            f.write(executed_mem) # write extracted code to a binary file

        # delete that region so it is not dumped again
        mem_regions.remove({"start": start, "size": region_size})

@winapi(cc=STDCALL, params={
    "lpAddress": POINTER,
    "dwSize": SIZE_T,
    "flNewProtect": DWORD,
    "lpflOldProtect": POINTER
})
def hook_VirtualProtect(ql, address, params):
    addr = params["lpAddress"]
    dw_size = params["dwSize"]
    fl_new_protect = params["flNewProtect"]

    # PAGE_EXECUTE (0x10), 
    # PAGE_EXECUTE_READ (0x20), 
    # PAGE_EXECUTE_READWRITE (0x40), 
    # PAGE_EXECUTE_WRITECOPY (0x80),
    if fl_new_protect in [0x10, 0x20, 0x40, 0x80]:
        # add the now-executable memory to the list of memory regions
        mem_regions.append({"start": addr, "size": dw_size})
        # set an on-execute hook on that memory region
        ql.hook_code(dump_memory_region, begin=addr, end=addr + dw_size)
    return 1

@winapi(cc=STDCALL, params={
    "lpAddress": POINTER,
    "dwSize": SIZE_T,
    "flAllocationType": DWORD,
    "flProtect": DWORD
})
def hook_VirtualAlloc(ql, address, params):
    dw_size = params["dwSize"]
    addr = ql.os.heap.mem_alloc(dw_size) # allocate memory in heap

    # PAGE_EXECUTE (0x10), 
    # PAGE_EXECUTE_READ (0x20), 
    # PAGE_EXECUTE_READWRITE (0x40), 
    # PAGE_EXECUTE_WRITECOPY (0x80),
    fl_protect = params["flProtect"]
    if fl_protect in [0x10, 0x20, 0x40, 0x80]:
        # add newly allocated memory to list of memory regions
        mem_regions.append({"start": addr, "size": dw_size})
        # set an on-execute hook on the newly allocated memory
        ql.hook_code(dump_memory_region, begin=addr, end=addr + dw_size)

    return addr

def sandbox(path, rootfs):
    # create a sandbox for Windows x86
    ql = Qiling([path], rootfs, output = "debug")

    # set API breakpoints
    ql.set_api("VirtualAlloc", hook_VirtualAlloc)
    ql.set_api("VirtualProtect", hook_VirtualProtect)

    ql.run()

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} <path-to-binary>")
        sys.exit(-1)

    path = sys.argv[1]
    sandbox(path, "rootfs/x86_windows")

Demonstration

For demonstration, I wrote a small program that runs a piece of shellcode.

#include <windows.h>
#include <cstring>
#include <iostream>

char shellcode[113] =   "\x31\xdb\x64\x8b\x7b\x30\x8b\x7f"
                        "\x0c\x8b\x7f\x1c\x8b\x47\x08\x8b"
                        "\x77\x20\x8b\x3f\x80\x7e\x0c\x33"
                        "\x75\xf2\x89\xc7\x03\x78\x3c\x8b"
                        "\x57\x78\x01\xc2\x8b\x7a\x20\x01"
                        "\xc7\x89\xdd\x8b\x34\xaf\x01\xc6"
                        "\x45\x81\x3e\x43\x72\x65\x61\x75"
                        "\xf2\x81\x7e\x08\x6f\x63\x65\x73"
                        "\x75\xe9\x8b\x7a\x24\x01\xc7\x66"
                        "\x8b\x2c\x6f\x8b\x7a\x1c\x01\xc7"
                        "\x8b\x7c\xaf\xfc\x01\xc7\x89\xd9"
                        "\xb1\xff\x53\xe2\xfd\x68\x63\x61"
                        "\x6c\x63\x89\xe2\x52\x52\x53\x53"
                        "\x53\x53\x53\x53\x52\x53\xff\xd7";

int main() {
    DWORD old_protect;
    LPVOID executable_area = VirtualAlloc(NULL, 113, MEM_RESERVE, PAGE_READWRITE);
    executable_area = VirtualAlloc(executable_area, 113, MEM_COMMIT, PAGE_READWRITE);

    if (executable_area == nullptr) {
        std::cout << "error in allocating memory" << std::endl;
        return 1; // do not continue with a null pointer
    }

    memcpy(executable_area, shellcode, 113);
    VirtualProtect(executable_area, 113, PAGE_EXECUTE, &old_protect);

    int(*f)() = (int(*)()) executable_area;
    f();
    VirtualFree(executable_area, 0, MEM_RELEASE); // dwSize must be 0 with MEM_RELEASE
}

We can run the PoC against that program and see the output:

[Figure: PoC output]

We managed to dump the shellcode successfully. I hope this was helpful. :)

Please do not hesitate to ping me if there is something wrong!
