Battlefield Malware Analysis (Part 1)
Enough talk. Now it’s time to fasten our seatbelts and dig through some malware!
Introduction
In the first part of “Battlefield Malware Analysis” we will take a look at script based obfuscation and how it can be defeated in a fast and efficient way by using process injection and API hooking. Before we jump straight into the practical part let us define what we mean by script based obfuscation and why it is still so prevalent these days.
Script-based obfuscation is a technique that malware authors use to hide the malicious intent of their scripts from malware analysts and anti-virus software. This is achieved by abusing scripting language features like eval functions [1], anonymous functions [2] and string-based encryption, encoding or transformation.
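To make these techniques more tangible, here is a small toy sketch that combines string transformation, an anonymous function and eval. It is not taken from any real sample; the function names, the XOR key and the hidden string are all made up for illustration.

```javascript
// Toy obfuscation sketch: the string "notepad.exe" never appears in the
// "delivered" part of the script. All names and the key are hypothetical.

// Encode a string as reversed, XOR-ed character codes.
function encode(plain, key) {
  return plain.split('').reverse().map(function (ch) {
    return ch.charCodeAt(0) ^ key;
  });
}

// Decode at run time, right before the value is needed.
function decode(codes, key) {
  return codes.map(function (code) {
    return String.fromCharCode(code ^ key);
  }).reverse().join('');
}

var key = 0x2a;
// An author would paste only the resulting number array into the script.
var hidden = encode('notepad.exe', key);

// An anonymous function combined with eval hides the call site as well.
var recovered = (function () {
  return eval('decode(hidden, key)');
})();

console.log(hidden.join(','));
console.log(recovered);
```

However many layers like this are stacked, the clear-text value has to exist at the moment it is handed to the operating system. That is exactly the point the rest of this post exploits.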
When we think of recent malware campaigns we often see that the initial attack vector used by today's attackers is still phishing. With the help of legitimate-looking phishing e-mails, attackers are able to get a first foothold in their target organization.
Back in the golden days of malware (80’s and 90’s) it was very common to see malicious attachments such as “file.pdf.exe” as part of phishing e-mails. Nowadays people are aware that there is hardly any valid reason to deliver executable files as e-mail attachments, and therefore we often see them being blocked by default.
Malware authors adapted to these restrictions by abusing legitimate file formats that are commonly used as e-mail attachments and that provide some sort of scripting capability, which allows them to download and execute malicious code on their victim's host.
The prevalence of script-based obfuscation techniques during the delivery stage of the Cyber Kill-Chain [3] is closely related to the fact that executable files are no longer a reliable way to achieve initial infection.
Scenario
Now it’s time to introduce the scenario that we will be dealing with in the first part of our blog post series:
You are a malware analyst tasked to analyze a bunch of malicious JScript attachments that were
delivered as part of a phishing campaign to a group of c-level executives from your company.
Some of the c(lick) level executives from your company already opened the attachments and now you are
in a hurry because they expect quick answers to clarify what happened.
It's time to boot up your analyst workstation and provide the information they requested from you.
Here you can download the malicious JScript attachment.
Lab Setup
In order to complete the exercise and become the hero of your company we recommend the following tools:
- Windows Server 2012 R2 64-Bit
- Python 3.7.5
- Frida 12.7.20
- wscript.exe
- (x64dbg [Apr 29 2019])
Other OS/Software versions will probably work too.
Which APIs to look for?
When dealing with malicious scripts like VBScript or JScript you will notice that the vast majority of them depend on ActiveX controls [4] / COM objects [5] to overcome the limitations imposed by the scripting language interpreter. By using ActiveX controls from within JScript, for example, it is possible to accomplish tasks such as writing to the registry, creating files or executing other applications, which would otherwise not be possible without direct access to the Windows API. Malware authors will very likely try to hide these kinds of actions by using obfuscation techniques in order to avoid detection by human analysts and anti-virus software. The following JScript named “example.js” is intended as a toy example to demonstrate how ActiveX controls can be used to start another application:
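// example.js
// Toy example: start notepad.exe through the Shell.Application COM object.
function fnShellExecuteJ()
{
    var objShell = new ActiveXObject("shell.application");
    // Arguments: file, arguments, directory, operation (verb), show command (1 = normal window).
    objShell.ShellExecute("notepad.exe", "", "", "open", 1);
}

// Call the function so that the script actually does something when run by wscript.exe.
fnShellExecuteJ();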
In Microsoft Windows, JScript files are associated with the Windows Script Host [6] (wscript.exe). This means that when a user double-clicks a JScript file, it is immediately executed by wscript.exe. This is why obfuscated JScript files are so popular among attackers: they provide a simple and effective way of executing malicious code on a target system. In the case of “example.js”, a double click results in a call to the Windows API function ShellExecuteExW from Shell32.dll, which in turn creates the process “notepad.exe”.
But how does wscript.exe know that ShellExecuteExW [7] is implemented in Shell32.dll? This information can be obtained from the Windows registry in two simple steps:
- wscript.exe needs to look up HKEY_CLASSES_ROOT\Shell.Application\CLSID.
- The CLSID = {?} value is used by wscript.exe to look up the InProcServer32 key under HKEY_CLASSES_ROOT\CLSID\{?}, which points to the DLL that holds the implementation of ShellExecuteExW.
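The two lookup steps above can be sketched as a small function that walks a mock registry. This is only an illustration of the lookup order: the registry is a plain object, and the CLSID and DLL path in it are placeholders, not the real values from a Windows system.

```javascript
// Mock registry for illustration only. The CLSID and the DLL path are
// placeholders and do NOT reflect real registry contents.
var mockRegistry = {
  'HKEY_CLASSES_ROOT\\Shell.Application\\CLSID': '{PLACEHOLDER-CLSID}',
  'HKEY_CLASSES_ROOT\\CLSID\\{PLACEHOLDER-CLSID}\\InProcServer32': 'C:\\placeholder\\shell32.dll'
};

// Resolve a ProgID such as "Shell.Application" to the DLL that implements it.
function resolveProgId(registry, progId) {
  // Step 1: ProgID -> CLSID
  var clsid = registry['HKEY_CLASSES_ROOT\\' + progId + '\\CLSID'];
  if (clsid === undefined) {
    return null;
  }
  // Step 2: CLSID -> InProcServer32 (path of the implementing DLL)
  var server = registry['HKEY_CLASSES_ROOT\\CLSID\\' + clsid + '\\InProcServer32'];
  if (server === undefined) {
    return null;
  }
  return { clsid: clsid, server: server };
}

var resolved = resolveProgId(mockRegistry, 'Shell.Application');
console.log(resolved.clsid + ' -> ' + resolved.server);
```

On a real system the same two keys can be inspected by hand with regedit. Note that real registry key names are case-insensitive, which this simple object lookup does not model.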
Now it’s time to take a look at the malicious JScript attachments that were sent to the c-level execs of our company. The first thing we notice when we search for common keywords like ShellExecute is that there is exactly one match. But unlike in “example.js”, the parameters passed to ShellExecute are obfuscated, as presented in figure 1.
In the next step we need to get rid of the obfuscation without wasting too much precious time and brain resources on deobfuscating things in our head. By debugging wscript.exe and passing “obfuscated.js” as an argument, it is possible to verify that the Windows API function ShellExecuteExW is indeed executed. This can be seen in the debugger window (x64dbg) depicted in figure 2:
The first argument (EBP+8) passed to ShellExecuteExW is a pointer to the struct SHELLEXECUTEINFOW [8], which contains information such as the application to be executed, its command-line arguments and other settings. From a malware analyst's standpoint the contents of the struct are very valuable because they have to be in deobfuscated form for ShellExecuteExW to be able to execute the intended application. This means that the deobfuscation routine has to run before the arguments are passed to ShellExecuteExW. By setting a breakpoint at ShellExecuteExW we can capture the deobfuscated arguments passed as a pointer to the struct SHELLEXECUTEINFOW.
In the following step we will take a look at the memory address 004AD94C in the dump section of our debugger. This is the address where the struct SHELLEXECUTEINFOW resides in memory:
The struct contains several fields that are of interest for further analysis. To get a complete understanding of the struct SHELLEXECUTEINFOW we recommend looking up its definition on MSDN. In figure 3 we can already see some of the fields, for example lpVerb, lpFile, lpParameters, lpDirectory and nShow.
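To find these fields in a raw memory dump it helps to know their offsets from the start of the struct. The following sketch derives them from the documented field order of SHELLEXECUTEINFOW up to nShow, assuming natural alignment. The helper name fieldOffsets is our own; only the field names come from the struct definition.

```javascript
// Leading fields of SHELLEXECUTEINFOW in declaration order.
// 'ptr' marks pointer-sized fields (HWND and the LPCWSTR members).
var FIELDS = [
  ['cbSize', 4],
  ['fMask', 4],
  ['hwnd', 'ptr'],
  ['lpVerb', 'ptr'],
  ['lpFile', 'ptr'],
  ['lpParameters', 'ptr'],
  ['lpDirectory', 'ptr'],
  ['nShow', 4]
];

// Compute the byte offset of each field for a given pointer size
// (4 for a 32-bit process, 8 for a 64-bit process).
function fieldOffsets(pointerSize) {
  var offsets = {};
  var offset = 0;
  FIELDS.forEach(function (field) {
    var size = field[1] === 'ptr' ? pointerSize : field[1];
    // Natural alignment: each field starts at a multiple of its own size.
    offset = Math.ceil(offset / size) * size;
    offsets[field[0]] = offset;
    offset += size;
  });
  return offsets;
}

console.log(fieldOffsets(4)); // lpFile at 16, lpParameters at 20
console.log(fieldOffsets(8)); // lpFile at 24, lpParameters at 32
```

For the 32-bit case this means the pointer to the file name sits 16 bytes after the start of the struct and the pointer to the parameters 20 bytes after it. Each of those pointers then has to be followed to reach the actual UTF-16 string.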
If our goal is to understand which application gets executed by ShellExecuteExW and what arguments are passed to it during execution, we need to take a look at the fields lpFile and lpParameters. When we inspect these fields we will see that “obfuscated.js” executes the following command:
powershell.exe -exec bypass -command "whoami ; sleep 5"
Success! Now we are able to give our boss the information he needs to calm down the c-level execs of our company. But wait! There are still some malicious JScript files left for analysis. How can we analyze them without redoing all the steps presented so far? The answer to this question is covered in the next section.
Another important question is which Windows APIs to look for when dealing with obfuscated scripts. The answer is simple: we don’t know beforehand, because each malicious script is different. The best way to understand what the malware does and to defeat obfuscation is to intercept all interesting APIs. You might ask yourself which APIs are interesting. Well, it depends, but the blog post “WinDBG and JavaScript Analysis” from Cisco Talos is a good starting point.
Analyzing malicious scripts at scale
In this section of the blog post we will explain how we can analyze a bunch of malicious JScript attachments without repeating the tedious steps introduced in the last section. The answer is simple! With the help of process injection and API hooking, we are able to inspect the function calls we are interested in. This allows us to bypass obfuscation and get an understanding of what the malicious script tries to achieve on the victim's machine. But what if we are really lazy people and we don’t want to implement all of this process injection [9] and hooking [10] stuff on our own? Then Frida is our answer!
But what is Frida? According to the project's webpage Frida is “[…] Greasemonkey for native apps, or, put in more technical terms, it’s a dynamic code instrumentation toolkit. It lets you inject snippets of JavaScript or your own library into native apps on Windows, macOS, GNU/Linux, iOS, Android, and QNX. Frida also provides you with some simple tools built on top of the Frida API. These can be used as-is, tweaked to your needs, or serve as examples of how to use the API.”
As you might guess from the description above there are plenty of things that you can do with Frida, but in our case we will focus solely on how it can be used to automate the deobfuscation of “obfuscated.js”. Wait! So you are telling me…
Indeed, we will use Frida’s core, Gum (Instrumentation Library) and Gum’s JavaScript binding GumJS to hook ShellExecuteExW and grab the passed arguments in a deobfuscated state. In theory this is achieved as follows:
- Frida core suspends the target process wscript.exe.
- Frida core creates a remote thread in the target process which then loads Frida agent (Gum + Google’s V8 Engine) distributed as a shared library.
- Gum is used to hook the function ShellExecuteExW from Shell32.dll in the target process wscript.exe.
- Every time ShellExecuteExW is called inside the target process our JavaScript gets executed with the help of Google’s V8 Engine, which then gives us full access to the arguments passed to ShellExecuteExW.
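The last two steps can be sketched as a minimal Frida agent script. Module.findExportByName and Interceptor.attach are part of Frida's JavaScript API; the wrapper function installHook and its parameters are our own invention, used here only so that the logic can also be exercised outside of Frida.

```javascript
// Minimal agent sketch. "frida" is an object carrying Frida's Module and
// Interceptor globals; passing it in as a parameter is our own convention
// (an assumption), so the function can be tried with stand-ins.
function installHook(frida, onCall) {
  var address = frida.Module.findExportByName('shell32.dll', 'ShellExecuteExW');
  if (address === null) {
    // shell32.dll is not loaded (yet) or the export was not found.
    return false;
  }
  frida.Interceptor.attach(address, {
    onEnter: function (args) {
      // args[0] is the pointer to the SHELLEXECUTEINFOW struct.
      onCall(args[0]);
    }
  });
  return true;
}

// Only runs when loaded inside a Frida agent, where these globals exist.
if (typeof Interceptor !== 'undefined' && typeof Module !== 'undefined') {
  installHook({ Module: Module, Interceptor: Interceptor }, function (info) {
    console.log('ShellExecuteExW called, SHELLEXECUTEINFOW at ' + info);
  });
}
```

One caveat of this sketch: if the script is loaded right after wscript.exe has been spawned, shell32.dll may not be mapped yet, in which case findExportByName returns null and the hook is not installed.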
Before we start writing the JavaScript code that will run whenever ShellExecuteExW is called, we can use frida-trace to create a template for us. Please be aware that when we run frida-trace as presented in figure 5, the obfuscated JScript will be executed. When analyzing an unknown script this step is not recommended.
After running frida-trace the directory “__handlers__\SHELL32.dll\” is created in the current path. This directory contains the JavaScript file “ShellExecuteExW.js”, which is the template mentioned earlier. “ShellExecuteExW.js” is used to define the behaviour of the hooked function ShellExecuteExW. Every time the function ShellExecuteExW is called within wscript.exe, the hooking function onEnter is executed as well.
/**
 * Called synchronously when about to call ShellExecuteExW.
 */
onEnter: function (log, args, state) {
  // Place your hook functionality here.
},
The arguments passed to ShellExecuteExW are also available in our hooking function onEnter via the array args. Recall from the previous section that the first and only argument passed to ShellExecuteExW is a pointer to the struct SHELLEXECUTEINFOW. The address of the struct can now be used to access all the interesting fields, which will reveal the purpose of the malicious JScript “obfuscated.js”. Figure 6 depicts the final implementation of the hooking function.
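For readers who cannot see the figure, here is a sketch of what such a hooking function might look like. It is our own reconstruction under stated assumptions, not the code from figure 6: the helper readShellExecuteInfo and the variable handlers are hypothetical names, and the offsets assume the documented field order of SHELLEXECUTEINFOW with natural alignment. The NativePointer methods add, readPointer, isNull, readUtf16String and readS32 as well as Process.pointerSize belong to Frida's JavaScript API.

```javascript
// Read the interesting SHELLEXECUTEINFOW fields from a NativePointer.
// lpVerb starts at offset 12 in a 32-bit process and at 16 in a 64-bit one;
// lpFile, lpParameters and lpDirectory follow as consecutive pointers,
// and nShow comes directly after them.
function readShellExecuteInfo(info, pointerSize) {
  var base = pointerSize === 8 ? 16 : 12;

  function readString(index) {
    var p = info.add(base + index * pointerSize).readPointer();
    return p.isNull() ? null : p.readUtf16String();
  }

  return {
    lpVerb: readString(0),
    lpFile: readString(1),
    lpParameters: readString(2),
    lpDirectory: readString(3),
    nShow: info.add(base + 4 * pointerSize).readS32()
  };
}

// Shape of a frida-trace handler; "handlers" is only a name for this sketch.
var handlers = {
  onEnter: function (log, args, state) {
    // Process is a global provided by Frida inside the target process.
    var fields = readShellExecuteInfo(args[0], Process.pointerSize);
    log('ShellExecuteExW: ' + JSON.stringify(fields));
  }
};
```

To try this with frida-trace, the body of onEnter and the helper would have to be merged into the generated “ShellExecuteExW.js”, keeping the structure of the template that frida-trace created.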
Finally, if we rerun frida-trace with the same command-line arguments as in figure 5, the previously defined hooking function will be executed whenever ShellExecuteExW is called within wscript.exe. The hooking function will then print out all of the interesting fields from the struct SHELLEXECUTEINFOW passed to ShellExecuteExW:
With the help of frida-trace and the hooking function onEnter from “ShellExecuteExW.js”, we can now automate the analysis of the malicious JScript attachments that were received by the c-level executives of our company.
Final Thoughts
The approach presented in this blog post is based on the hypothesis that malware authors will at some point need to extend the functionality of their malicious scripts by using ActiveX controls / COM objects in order to get access to more powerful APIs. When doing so, it is highly likely that they will try to hide the specifics (function names, arguments) of the APIs used by employing some form of obfuscation. With the help of Frida we can easily intercept all the relevant API calls used by malicious scripts, which allows us to bypass the implemented obfuscation techniques. Another great benefit of Frida is the high degree of automation that can be achieved with it.