Monthly Archives: April 2019

Churning Out Machine Learning Models: Handling Changes in Model Predictions


Machine learning (ML) is playing an increasingly important role in cyber security. Here at FireEye, we employ ML for a variety of tasks such as antivirus, malicious PowerShell detection, and correlating threat actor behavior. While many people think that a data scientist’s job is finished when a model is built, the truth is that cyber threats constantly change and so must our models. The initial training is only the start of the process, and ML model maintenance creates a large amount of technical debt. Google provides a helpful introduction to this topic in their paper “Machine Learning: The High-Interest Credit Card of Technical Debt.” A key concept from the paper is the principle of CACE: change anything, change everything. Because ML models deliberately find nonlinear dependencies in the input data, small changes in our data can have cascading effects on model accuracy and on the downstream systems that consume those model predictions. This creates an inherent conflict in cyber security modeling: (1) we need to update models over time to adjust to current threats, and (2) changing models can lead to unpredictable outcomes that we need to mitigate.

Ideally, when we update a model, the only changes in model outputs are improvements, e.g. fixes to previous errors. Both false negatives (missing malicious activity) and false positives (alerting on benign activity) have significant impact and should be minimized. Since no ML model is perfect, we mitigate mistakes with orthogonal approaches: whitelists and blacklists, external intelligence feeds, rule-based systems, etc. Combining model output with other information also provides context for alerts that may not otherwise be present. However, CACE! These integrated systems can suffer unintended side effects from a model update. Even when overall model accuracy has increased, individual changes in model output are not guaranteed to be improvements. The introduction of new false negatives or false positives by an updated model, called churn, creates the potential for new vulnerabilities and for negative interactions with the cyber security infrastructure that consumes model output. In this article, we discuss churn, how it creates technical debt when considering the larger cyber security product, and methods to reduce it.

Prediction Churn

Whenever we retrain our cyber security-focused ML models, we need to be able to calculate and control for churn. Formally, prediction churn is defined as the expected percentage of samples on which the predictions of two different models differ (note that prediction churn is not the same as customer churn, the loss of customers over time, which is the more common usage of the term in business analytics). It was originally defined by Cormier et al. for a variety of applications. For cyber security applications, we are often concerned only with those differences where the newer model performs worse than the older model. We define bad churn, when retraining a classifier, as the percentage of samples in the test set that the new model misclassifies but the original model classified correctly.
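Both quantities reduce to simple counts over a test set. The following is a minimal sketch, assuming hard 0/1 labels and predictions (1 = malicious) supplied as equal-length sequences; the function names `churn` and `bad_churn` are illustrative rather than taken from any library. Multiply the results by 100 to express them as percentages.

```python
from typing import Sequence


def churn(pred_old: Sequence[int], pred_new: Sequence[int]) -> float:
    """Fraction of samples on which the two models' predictions differ."""
    if len(pred_old) != len(pred_new):
        raise ValueError("prediction sequences must have the same length")
    differing = sum(1 for a, b in zip(pred_old, pred_new) if a != b)
    return differing / len(pred_old)


def bad_churn(y_true: Sequence[int], pred_old: Sequence[int],
              pred_new: Sequence[int]) -> float:
    """Fraction of samples the old model got right but the new model gets wrong."""
    if not (len(y_true) == len(pred_old) == len(pred_new)):
        raise ValueError("all sequences must have the same length")
    newly_wrong = sum(
        1 for y, a, b in zip(y_true, pred_old, pred_new) if a == y and b != y
    )
    return newly_wrong / len(y_true)


if __name__ == "__main__":
    # Toy labels: 1 = malicious, 0 = benign.
    y_true = [1, 1, 0, 0, 1, 0, 1, 0]
    pred_old = [1, 0, 0, 1, 1, 0, 0, 0]  # three errors
    pred_new = [1, 1, 0, 0, 0, 0, 1, 0]  # one error, but it is a new one
    print(churn(pred_old, pred_new))              # 0.5
    print(bad_churn(y_true, pred_old, pred_new))  # 0.125
```

In the toy example the new model makes fewer errors overall (one instead of three) yet still introduces an error the old model did not make, the same situation Figure 1 depicts.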

Churn is often a surprising and non-intuitive concept. After all, if the accuracy of our new model is better than the accuracy of our old model, what’s the problem? Consider the simple linear classification problem of malicious red squares and benign blue circles in Figure 1. The original model, A, makes three misclassifications while the newer model, B, makes only two errors. B is the more accurate model. Note, however, that B introduces a new mistake in the lower right corner, misclassifying a red square as benign. That square was correctly classified by model A and represents an instance of bad churn. Clearly, it’s possible to reduce the overall error rate while introducing a small number of new errors which did not exist in the older model.

Figure 1: Two linear classifiers with errors highlighted in orange. The original classifier A has lower accuracy than B. However, B introduces a new error in the bottom right corner.

Practically, churn introduces two problems in our models. First, bad churn may require changes to the whitelists and blacklists used in conjunction with ML models. As we previously discussed, these are used to handle the small but inevitable number of incorrect classifications. Testing on large repositories of data is necessary to catch such changes and update the associated whitelists and blacklists. Second, churn may create issues for other ML models or rule-based systems which rely on the output of the ML model. For example, consider a hypothetical system which evaluates URLs using both an ML model and a noisy blacklist. The system generates an alert if

  • P(URL = ‘malicious’) > 0.9 or
  • P(URL = ‘malicious’) > 0.5 and the URL is on the blacklist

Suppose that after retraining, the distribution of P(URL = ‘malicious’) shifts so that all .com domains receive higher scores. The alert rules may then need to be readjusted to maintain the required overall accuracy of the combined system. Ultimately, finding ways of reducing churn minimizes this kind of technical debt.
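The hypothetical rule is easy to express directly, which makes its sensitivity to score shifts visible. In this sketch the function names and all scores are invented for illustration: three .com URLs each receive a modestly higher score from the retrained model, and the number of alerts changes even though neither the rule nor the blacklist did.

```python
from typing import Sequence


def should_alert(p_malicious: float, on_blacklist: bool) -> bool:
    """Hypothetical alert rule combining a model score with a noisy blacklist."""
    return p_malicious > 0.9 or (p_malicious > 0.5 and on_blacklist)


def count_alerts(scores: Sequence[float], blacklisted: Sequence[bool]) -> int:
    """Number of URLs for which the rule fires."""
    return sum(should_alert(p, b) for p, b in zip(scores, blacklisted))


# Invented .com URLs: (old model score, retrained model score, on blacklist?)
URLS = [
    (0.45, 0.55, True),   # now crosses the 0.5 threshold for blacklisted URLs
    (0.85, 0.92, False),  # now crosses the standalone 0.9 threshold
    (0.30, 0.40, False),  # still below both thresholds
]

old_scores = [old for old, _, _ in URLS]
new_scores = [new for _, new, _ in URLS]
on_list = [bl for _, _, bl in URLS]

if __name__ == "__main__":
    print(count_alerts(old_scores, on_list))  # 0
    print(count_alerts(new_scores, on_list))  # 2
```

Two alerts appear purely because the score distribution moved, which is why thresholds in downstream rules may need retuning after every retrain.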

Experimental Setup

We’re going to explore churn and churn reduction techniques using EMBER, an open source malware classification data set. It consists of 1.1 million PE files first seen in 2017, along with their labels and features. The objective is to classify the files as either goodware or malware. For our purposes we need to construct not one model, but two, in order to calculate the churn between models. We have split the data set into three pieces:

  1. January through August is used as training data
  2. September and October are used to simulate running the model in production and retraining (test 1 in Figure 2).
  3. November and December are used to evaluate the models from step 1 and 2 (test 2 in Figure 2).
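The three-way split can be expressed as a filter on each sample's first-seen month. This sketch assumes every sample carries a zero-padded "YYYY-MM" string (EMBER's metadata records a first-seen month in a field named `appeared`, but treat the exact field name and format as an assumption to verify against your copy of the data); `split_by_month` is a hypothetical helper.

```python
from typing import Dict, List, Sequence

# Inclusive month ranges for the three splits described above.
SPLITS = {
    "train": ("2017-01", "2017-08"),
    "test1": ("2017-09", "2017-10"),
    "test2": ("2017-11", "2017-12"),
}


def split_by_month(appeared: Sequence[str]) -> Dict[str, List[int]]:
    """Map each split name to the indices of samples first seen in its range.

    Zero-padded "YYYY-MM" strings sort chronologically, so plain string
    comparison is enough. Samples outside every range are left out.
    """
    indices: Dict[str, List[int]] = {name: [] for name in SPLITS}
    for i, month in enumerate(appeared):
        for name, (start, end) in SPLITS.items():
            if start <= month <= end:
                indices[name].append(i)
                break
    return indices


if __name__ == "__main__":
    months = ["2017-01", "2017-09", "2017-12", "2017-08", "2016-12"]
    print(split_by_month(months))
    # {'train': [0, 3], 'test1': [1], 'test2': [2]}
```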

Figure 2: A comparison of our experimental setup versus the original EMBER data split. EMBER has a ten-month training set and a two-month test set. Our setup splits the data into three sets to simulate model training, then retraining while keeping an independent data set for final evaluation.

Figure 2 shows our data split and how it compares to the original EMBER data split. We built a LightGBM classifier on the training data, which we’ll refer to as the baseline model. To simulate production testing, we run the baseline model on test 1 and record the false positives (FPs) and false negatives (FNs). Then, we retrain our model using both the training data and the FPs/FNs from test 1. We’ll refer to this model as the standard retrain. This is a reasonably realistic simulation of actual production data collection and model retraining. Finally, both the baseline model and the standard retrain are evaluated on test 2. The standard retrain has higher accuracy than the baseline on test 2, 99.33% vs. 99.10% respectively. However, the retrain makes 246 misclassifications that the baseline did not, a bad churn of 0.12%.
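The data-collection step of the standard retrain amounts to picking out the baseline's mistakes on test 1 and appending them to the training set. A minimal sketch, assuming 0/1 labels, already-thresholded baseline predictions (1 = malware), and plain Python lists standing in for feature matrices; the helper names are hypothetical. The augmented set would then be used to train a new model from scratch.

```python
from typing import List, Sequence, Tuple


def baseline_errors(y_true: Sequence[int],
                    y_pred: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return indices of false positives and false negatives (1 = malware)."""
    fps = [i for i, (y, p) in enumerate(zip(y_true, y_pred)) if y == 0 and p == 1]
    fns = [i for i, (y, p) in enumerate(zip(y_true, y_pred)) if y == 1 and p == 0]
    return fps, fns


def augment_with_errors(X_train: Sequence, y_train: Sequence[int],
                        X_test1: Sequence, y_test1: Sequence[int],
                        pred_test1: Sequence[int]) -> Tuple[list, List[int]]:
    """Append the baseline's test 1 FPs and FNs to the original training data."""
    fps, fns = baseline_errors(y_test1, pred_test1)
    picked = sorted(fps + fns)
    X = list(X_train) + [X_test1[i] for i in picked]
    y = list(y_train) + [y_test1[i] for i in picked]
    return X, y


if __name__ == "__main__":
    X, y = augment_with_errors(
        X_train=["a", "b"], y_train=[0, 1],
        X_test1=["c", "d", "e", "f"], y_test1=[0, 0, 1, 1],
        pred_test1=[0, 1, 1, 0],  # "d" is an FP, "f" is an FN
    )
    print(X, y)  # ['a', 'b', 'd', 'f'] [0, 1, 0, 1]
```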

Incremental Learning

Since our rationale for retraining is that cyber security threats change over time, i.e., concept drift, it’s natural to consider techniques like incremental learning to handle retraining. In incremental learning, we use new data to learn new concepts without forgetting (all) previously learned concepts. That also suggests that an incrementally trained model may not have as much churn, since the concepts learned in the baseline model still exist in the new model. Not all ML models support incremental learning, but linear and logistic regression, neural networks, and some decision trees do. Other ML models can be modified to implement incremental learning. For our experiment, we incrementally trained the baseline LightGBM model by augmenting the training data with FPs and FNs from test 1 and then training an additional 100 trees on top of the baseline model (for a total of 1,100 trees). Unlike the baseline model, this model uses regularization (an L2 parameter of 1.0); using no regularization resulted in overfitting to the new points. The incremental model has a bad churn of 0.05% (113 samples total) and 99.34% accuracy on test 2. Another interesting metric is the model’s performance on the new training data: how many of the baseline FPs and FNs from test 1 does the new model fix? The incrementally trained model correctly classifies 84% of the previously incorrect classifications. In a very broad sense, incrementally training on a previous model’s mistakes provides a “patch” for the “bugs” of the old model.
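The fix rate quoted above, the share of the baseline's test 1 errors that the updated model now classifies correctly, can be computed as follows. This is a sketch assuming 0/1 labels and predictions; `fix_rate` is a hypothetical name. For the incremental model itself, LightGBM's Python API supports continued training by passing an existing model as the `init_model` argument of `lightgbm.train`, with `num_boost_round=100` adding 100 trees and `lambda_l2=1.0` setting the L2 penalty; any other training parameters would be assumptions.

```python
import math
from typing import Sequence


def fix_rate(y_true: Sequence[int], pred_old: Sequence[int],
             pred_new: Sequence[int]) -> float:
    """Fraction of the old model's errors that the new model classifies correctly.

    Returns NaN when the old model made no errors, since the rate is undefined.
    """
    old_errors = [i for i, (y, p) in enumerate(zip(y_true, pred_old)) if p != y]
    if not old_errors:
        return math.nan
    fixed = sum(1 for i in old_errors if pred_new[i] == y_true[i])
    return fixed / len(old_errors)


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    pred_old = [0, 0, 1, 0]  # three errors: indices 0, 1, 2
    pred_new = [1, 0, 0, 0]  # fixes indices 0 and 2
    print(fix_rate(y_true, pred_old, pred_new))  # 0.6666666666666666
```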

Churn-Aware Learning

Incremental approaches only work if the features of the original and new model are identical. If new features are added, say to improve model accuracy, then alternative methods are required. If what we desire is both accuracy and low churn, then the most straightforward solution is to include both of these requirements when training. That’s the approach taken by Cormier et al., where samples receive different weights during training in such a way as to minimize churn. We deviate from their approach in two ways: (1) we are interested in reducing bad churn (churn involving new misclassifications) as opposed to all churn, and (2) we would like to avoid the extreme memory requirements of the original method. Similarly to Cormier et al., we want to reduce the weight, i.e., the importance, of previously misclassified samples during training of a new model. In effect, the model treats repeating the previous model’s mistakes as cheaper than making new ones. Our weighting scheme gives all samples correctly classified by the original model a weight of one, and all other samples a weight of $w_i = \alpha - \beta\,\lvert 0.5 - P_{\text{old}}(x_i) \rvert$, where $P_{\text{old}}(x_i)$ is the output of the old model on sample $x_i$, and $\alpha$ and $\beta$ are adjustable hyperparameters. We train this reduced churn operator model (RCOP) using $\alpha = 0.9$, $\beta = 0.6$, and the same training data as the incremental model. RCOP produces 0.09% bad churn and 99.38% accuracy on test 2.
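The weighting scheme translates directly into code. A minimal sketch, assuming the old model outputs a probability of maliciousness and that a sample counts as correctly classified when thresholding that probability at 0.5 (the threshold is an assumption) recovers its 0/1 label; `rcop_weights` is a hypothetical name. In LightGBM, per-sample weights like these can be supplied through the `weight` argument of `lightgbm.Dataset`.

```python
from typing import List, Sequence


def rcop_weights(y_true: Sequence[int], p_old: Sequence[float],
                 alpha: float = 0.9, beta: float = 0.6,
                 threshold: float = 0.5) -> List[float]:
    """Per-sample training weights that down-weight the old model's mistakes.

    Samples the old model classified correctly get weight 1.0; all others get
    alpha - beta * |0.5 - p_old|, so confident old mistakes count least.
    """
    weights = []
    for y, p in zip(y_true, p_old):
        old_label = 1 if p > threshold else 0
        if old_label == y:
            weights.append(1.0)
        else:
            weights.append(alpha - beta * abs(0.5 - p))
    return weights


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    p_old = [0.95, 0.05, 0.60, 0.20]
    print([round(w, 2) for w in rcop_weights(y_true, p_old)])
    # [1.0, 0.63, 0.84, 1.0]
```

With α = 0.9 and β = 0.6, a misclassified sample's weight falls between 0.6, when the old model was confidently wrong, and 0.9, when its output was close to 0.5.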


Figure 3 shows both accuracy and bad churn of each model on test set 2. We compare the baseline model, the standard model retrain, the incrementally learned model and the RCOP model.

Figure 3: Bad churn versus accuracy on test set 2.

Table 1 summarizes each of these approaches, discussed in detail above.


Model             | Trained on                              | Total # of trees | Notes
Baseline          | train                                   | 1,000            |
Standard retrain  | train + FPs/FNs from baseline on test 1 | 1,100            |
Incremental model | train + FPs/FNs from baseline on test 1 | 1,100            | Trained 100 new trees, starting from the baseline model
RCOP              | train + FPs/FNs from baseline on test 1 | 1,100            | LightGBM with altered sample weights

Table 1: A description of the models tested

The baseline model has 100 fewer trees than the other models, which could explain its comparatively lower accuracy. However, when we tried increasing the number of trees, accuracy rose by less than 0.001%. The accuracy gains of the non-baseline methods are due to the differences in data set and training method. Both incremental training and RCOP work as expected, producing less churn than the standard retrain while showing accuracy improvements over the baseline. In general, increasing accuracy tends to be correlated with increasing bad churn: there is no free lunch. Accuracy improves because the decision boundary changes, and the larger the improvement, the more the boundary changes. It seems reasonable that larger decision boundary changes correlate with more bad churn, although we see no theoretical justification for why that must always be the case.

Unexpectedly, both the incremental model and RCOP produce more accurate models with less churn than the standard retrain. We had expected that, given their additional constraints, both models would achieve less churn at the cost of lower accuracy. The most direct comparison is RCOP versus the standard retrain. Both models use identical data sets and model parameters, differing only in the weights associated with each sample. RCOP reduces the weight of samples that the baseline model misclassified, so that reduction must be responsible for the improvement in accuracy. One possible explanation of this behavior is mislabeled training data. Multiple authors have suggested identifying and removing points with label noise, often using the misclassifications of a previously trained model to identify those noisy points. Our scheme, which reduces the weight of those points instead of removing them, is similar in spirit to those noise reduction approaches, which could explain the accuracy improvement.

Conclusion
Maintaining ML models involves an inherent tension: not retraining means being vulnerable to new classes of threats, while retraining causes churn and potentially reintroduces old vulnerabilities. In this blog post, we have discussed two different approaches to modifying ML model training in order to reduce churn: incremental model training and churn-aware learning. Both proved effective on the EMBER malware classification data set, reducing bad churn while simultaneously improving accuracy. Finally, we also observed that reducing churn can result in a more accurate model, plausibly because the data set contains label noise. Overall, these approaches provide low-technical-debt solutions for updating models that allow data scientists and machine learning engineers to keep their models up to date against the latest cyber threats at minimal cost. At FireEye, our data scientists work closely with the FireEye Labs detection analysts to quickly identify misclassifications and use these techniques to reduce the impact of churn on our customers.

Cyber Security: Three Parts Art, One Part Science

As I reflect upon my almost 40 years as a cyber security professional, I think of the many instances where the basic tenets of cyber security—those we think have common understanding—require a lot of additional explanation. For example, what is a vulnerability assessment? If five cyber professionals are sitting around a table discussing this question, you will end up with seven or eight answers. One will say that a vulnerability assessment is vulnerability scanning only. Another will say an assessment is much bigger than scanning, and addresses ethical hacking and internal security testing. Another will say that it is a passive review of policies and controls. All are correct in some form, but the answer really depends on the requirements or criteria you are trying to achieve. And it also depends on the skills and experience of the risk owner, auditor, or assessor. Is your head spinning yet? I know mine is! Hence the “three parts art.”

There is quite a bit of subjectivity in the cyber security business. One auditor will look at evidence and agree you are in compliance; another will say you are not. If you are going to protect sensitive information, do you encrypt it, obfuscate it, or segment it off and place it behind very tight identification and access controls before allowing users to access the data? Yes. As we advise our client base, it is essential that we have all the context necessary to make good risk-based decisions and recommendations.

Let’s talk about Connection’s artistic methodology. We start with a canvas that has the core components of cyber security: protection, detection, and reaction. By addressing each of these three pillars thoroughly, we make sure the conversation fully covers how people, process, and technology work together in a comprehensive risk strategy.


Protection
Users understand threat and risk, and know what role they play in the protection strategy. For example, if you see something, say something. Don’t let someone tailgate behind you through a badge-controlled entry. And don’t even think about trying to shut off your endpoint antivirus or firewall.

Policies are established, documented, and socialized. For example, personal laptops should never be connected to the corporate network. Also, don’t send sensitive information to your personal email account so you can work from home.

Some examples of the barriers used to deter attackers and breaches are edge security with firewalls, intrusion detection and prevention, sandboxing, and advanced threat detection.

Detection
The mean time to identify an active incident in a network is 197 days. The mean time to contain an incident is 69 days.

Incident response teams need to be identified and trained, and all employees need to be trained on the concept of “if you see something, say something.” Detection is a proactive process.

What happens when an alert occurs? Who sees it? What is the documented process for taking action?

What is in place to ensure you are detecting malicious activity? Is it configured to ignore noise and alert you only to real events? Will it help you bring that 197-day mean time to detection way down?

Reaction
What happens when an event occurs? Who responds? How do you recover? Does everyone understand their role? Do you war-game scenarios to ensure you are prepared WHEN an incident occurs?

What is the documented process to reduce the Kill Chain—the mean time to detect and contain—from 69 days to 69 minutes? Do you have a Business Continuity and Disaster Recovery Plan to ensure the ability to react to a natural disaster, significant cyber breach such as ransomware, DDoS, or—dare I say it—a pandemic?

What cyber security consoles have been deployed that allow quick access to patch a system, change a firewall rule, switch an ACL, update a policy setting at an endpoint, or track a security incident through the triage process?

All of these things are important to create a comprehensive InfoSec Program. The science is the technology that will help you build a layered, in-depth defense approach. The art is how to assess the threat, define and document the risk, and create a strategy that allows you to manage your cyber risk as it applies to your environment, users, systems, applications, data, customers, supply chain, third party support partners, and business process.

More Art – Are You a Risk Avoider or Risk Transference Expert?

A better way to state that is, “Do you avoid all risk responsibility or do you give your risk responsibility to someone else?” Hint: I don’t believe in risk avoidance or risk transference.

Yes, there is an art to risk management. There is also science if you use, for example, the Carnegie Mellon risk tools. But a good risk owner and manager documents risk, prioritizes it by criticality, turns it into a risk register or roadmap plan, remediates what is necessary, and accepts what is reasonable from a business and cyber security perspective. Oh, by the way, those same five cyber security professionals we talked about earlier? They have 17 definitions of risk.

As we wrap up this conversation, let’s talk about the importance of selecting a risk framework. It’s a bit like going to a baseball game: the program helps you know the players and the stats. What framework will you pick? Do you paint in watercolors or oils? Are you a National Institute of Standards and Technology (NIST) artist, an International Organization for Standardization (ISO) artist, or have you developed your own framework like the Nardone puzzle chart? I developed this several years ago when I was the CTO/CSO of the Commonwealth of Massachusetts. It has been artistically enhanced over the years to incorporate more security components, but it is loosely based on the NIST 800-53 and ISO 27001 standards.

When it comes to selecting a security framework as a CISO, I lean towards the NIST Cybersecurity Framework (CSF) pictured below. This framework is comprehensive and provides a scoring model that allows risk owners to measure and target the risk level they believe they need to achieve based on their business model, threat profile, and risk tolerance. It has five functional focus areas: Identify, Protect, Detect, Respond, and Recover. The ISO 27001 framework is also a very solid and frequently used model. Both of these frameworks can result in a Certificate of Attestation demonstrating adherence to the standard. Many commercial corporations do an annual ISO 27001 assessment for that very reason. More and more are leaning towards the NIST CSF, especially commercial corporations doing work with the government.

The art in cyber security is in the interpretation of the rules, standards, and requirements that are primarily based on a foundation in science in some form. The more experience one has in the cyber security industry, the more effective the art becomes. As a last thought, keep in mind that Connection’s Technology Solutions Group Security Practice has over 150 years of cyber security expertise on tap to apply to that art.

The post Cyber Security: Three Parts Art, One Part Science appeared first on Connected.

Troubleshooting NSM Virtualization Problems with Linux and VirtualBox

I spent a chunk of the day troubleshooting a network security monitoring (NSM) problem. I thought I would share the problem and my investigation in the hopes that it might help others. The specifics are probably less important than the general approach.

It began with ja3. You may know ja3 as a set of Zeek scripts developed by the Salesforce engineering team to profile client and server TLS parameters.

I was reviewing Zeek logs captured by my Corelight appliance and by one of my lab sensors running Security Onion. I had coverage of the same endpoint in both sensors.

I noticed that the SO Zeek logs did not have ja3 hashes in the ssl.log entries. Both sensors did have ja3s hashes. My first thought was that SO was somehow misconfigured to not record ja3 hashes. I quickly dismissed that, because it made no sense. Besides, verifying that intuition would have required me to start troubleshooting near the top of the software stack.

I decided to start at the bottom, or close to the bottom. I had a sinking suspicion that, for some reason, Zeek was only seeing traffic sent from remote systems, and not traffic originating from my network. That would account for the creation of ja3s hashes, for traffic sent by remote systems, but not ja3 hashes, as Zeek was not seeing traffic sent by local clients.

I was running SO in VirtualBox 6.0.4 on Ubuntu 18.04. I started sniffing TCP network traffic on the SO monitoring interface using Tcpdump. As I feared, it didn't look right. I ran a new capture with filters for ICMP and a remote IP address. On another system I tried pinging the remote IP address. Sure enough, I only saw ICMP echo replies, and no ICMP echoes. Oddly, I also saw doubles and triples of some of the ICMP echo replies. That worried me, because unpredictable behavior like that could indicate some sort of software problem.

My next step was to "get under" the VM guest and determine if the VM host could see traffic properly. I ran Tcpdump on the Ubuntu 18.04 host on the monitoring interface and repeated my ICMP tests. It saw everything properly. That meant I did not need to bother checking the switch span port that was feeding traffic to the VirtualBox system.

It seemed I had a problem somewhere between the VM host and guest. On the same VM host I was also running an instance of RockNSM. I ran my ICMP tests on the RockNSM VM and, sadly, I got the same one-sided traffic as seen on SO.

Now I was worried. If the problem had only been present in SO, then I could fix SO. If the problem was present in both SO and RockNSM, then the problem had to be with VirtualBox -- and I might not be able to fix it.

I reviewed my configurations in VirtualBox, ensuring that the "Promiscuous Mode" under the Advanced options was set to "Allow All". At this point I worried that there was a bug in VirtualBox. I did some Google searches and reviewed some forum posts, but I did not see anyone reporting issues with sniffing traffic inside VMs. Still, my use case might have been weird enough to not have been reported.

I decided to try a different approach. I wondered if running VirtualBox with elevated privileges might make a difference. I did not want to take ownership of my user VMs, so I decided to install a new VM and run it with elevated privileges.

Let me stop here to note that I am breaking one of the rules of troubleshooting. I'm introducing two new variables, when I should have introduced only one. I should have built a new VM but run it with the same user privileges with which I was running the existing VMs.

I decided to install a minimal edition of Debian 9, with VirtualBox running via sudo. When I started the VM and sniffed traffic on the monitoring port, lo and behold, my ICMP tests revealed both sides of the traffic as I had hoped. Unfortunately, from this I erroneously concluded that running VirtualBox with elevated privileges was the answer to my problems.

I took ownership of the SO VM in my elevated VirtualBox session, started it, and performed my ICMP tests. Womp womp. Still broken.

I realized I needed to separate the two variables that I had entangled, so I stopped VirtualBox, and changed ownership of the Debian 9 VM to my user account. I then ran VirtualBox with user privileges, started the Debian 9 VM, and ran my ICMP tests. Success again! Apparently elevated privileges had nothing to do with my problem.

By now I was glad I had not posted anything to any user forums describing my problem and asking for help. There was something about the monitoring interface configurations in both SO and RockNSM that resulted in the inability to see both sides of traffic (and avoid weird doubles and triples).

I started my SO VM again and looked at the script that configured the interfaces. I commented out all the entries below the management interface as shown below.

$ cat /etc/network/interfaces

# This configuration was created by the Security Onion setup script.
# The original network interface configuration file was backed up to:
# /etc/network/interfaces.bak.
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# loopback network interface
auto lo
iface lo inet loopback

# Management network interface
auto enp0s3
iface enp0s3 inet static
  dns-domain localdomain

#auto enp0s8
#iface enp0s8 inet manual
#  up ip link set $IFACE promisc on arp off up
#  down ip link set $IFACE promisc off down
#  post-up ethtool -G $IFACE rx 4096; for i in rx tx sg tso ufo gso gro lro; do ethtool -K $IFACE $i off; done
#  post-up echo 1 > /proc/sys/net/ipv6/conf/$IFACE/disable_ipv6

#auto enp0s9
#iface enp0s9 inet manual
#  up ip link set $IFACE promisc on arp off up
#  down ip link set $IFACE promisc off down
#  post-up ethtool -G $IFACE rx 4096; for i in rx tx sg tso ufo gso gro lro; do ethtool -K $IFACE $i off; done
#  post-up echo 1 > /proc/sys/net/ipv6/conf/$IFACE/disable_ipv6

I rebooted the system and brought the enp0s8 interface up manually using this command:

$ sudo ip link set enp0s8 promisc on arp off up

Fingers crossed, I ran my ICMP sniffing tests, and voila, I saw what I needed -- traffic in both directions, without doubles or triples no less.

So, there appears to be some sort of problem with the way SO and RockNSM set parameters for their monitoring interfaces, at least as far as they interact with VirtualBox 6.0.4 on Ubuntu 18.04. You can see in the network script that SO disables a bunch of NIC options. I imagine one or more of them is the culprit, but I didn't have time to work through them individually.

I tried taking a look at the network script in RockNSM, but it runs CentOS, and I'll be darned if I can figure out where to look. I'm sure it's there somewhere, but I didn't have the time to find it.

The moral of the story is that I should have immediately checked after installation that both SO and RockNSM were seeing both sides of the traffic I expected them to see. I had taken that for granted for many previous deployments, but something broke recently and I don't know exactly what. My workaround will hopefully hold for now, but I need to take a closer look at the NIC options because I may have introduced another fault.

A second moral is to be careful of changing two or more variables when troubleshooting. When you do that you might fix a problem, but not know what change fixed the issue.

Finding Weaknesses Before the Attackers Do

This blog post originally appeared as an article in M-Trends 2019.

FireEye Mandiant red team consultants perform objectives-based assessments that emulate real cyber attacks by advanced and nation-state attackers across the entire attack lifecycle, blending into environments and observing how employees interact with their workstations and applications. Assessments like this help organizations identify weaknesses in their current detection and response procedures so they can update their existing security programs to better deal with modern threats.

A financial services firm engaged a Mandiant red team to evaluate the effectiveness of its information security team’s detection, prevention and response capabilities. The key objectives of this engagement were to accomplish the following actions without detection:

  • Compromise Active Directory (AD): Gain domain administrator privileges within the client’s Microsoft Windows AD environment.
  • Access financial applications: Gain access to applications and servers containing financial transfer data and account management functionality.
  • Bypass RSA Multi-Factor Authentication (MFA): Bypass MFA to access sensitive applications, such as the client’s payment management system.
  • Access ATM environment: Identify and access ATMs in a segmented portion of the internal network.

Initial Compromise

Based on Mandiant’s investigative experience, social engineering has become the most common and efficient initial attack vector used by advanced attackers. For this engagement, the red team used a phone-based social engineering scenario to circumvent email detection capabilities and avoid the residual evidence that is often left behind by a phishing email.

While performing open-source intelligence (OSINT) reconnaissance of the client’s Internet-facing infrastructure, the red team discovered an Outlook Web App login portal hosted at https://owa.customer.example. The red team registered a look-alike domain (https://owacustomer.example) and cloned the client’s login portal (Figure 1).

Figure 1: Cloned Outlook Web Portal

After the OWA portal was cloned, the red team identified IT helpdesk and employee phone numbers through further OSINT. Once these phone numbers were gathered, the red team used a publicly available online service to call the employees while spoofing the phone number of the IT helpdesk.

Mandiant consultants posed as helpdesk technicians and informed employees that their email inboxes had been migrated to a new company server. To complete the “migration,” the employee would have to log into the cloned OWA portal. To avoid suspicion, employees were immediately redirected to the legitimate OWA portal once they authenticated. Using this campaign, the red team captured credentials from eight employees which could be used to establish a foothold in the client’s internal network.

Establishing a Foothold

Although the client’s virtual private network (VPN) and Citrix web portals implemented MFA that required users to provide a password and RSA token code, the red team found a single-factor bring-your-own-device (BYOD) portal (Figure 2).

Figure 2: Single factor mobile device management portal

Using stolen domain credentials, the red team logged into the BYOD web portal to attempt enrollment of an Android phone for CUSTOMER\user0. While the red team could view user settings, they were unable to add a new device. To bypass this restriction, the consultants downloaded the IBM MaaS360 Android app and logged in via their phone. The device configuration process installed the client’s VPN certificate (Figure 3), which was automatically imported into the Cisco AnyConnect app, also installed on the phone.

Figure 3: Setting up mobile device management

After launching the AnyConnect app, the red team confirmed the phone received an IP address on the client’s VPN. Using a generic tethering app from the Google Play store, the red team then tethered a laptop to the phone to access the client’s internal network.

Escalating Privileges

Once connected to the internal network, the red team used the Windows “runas” command to launch PowerShell as CUSTOMER\user0 and perform a “Kerberoast” attack. Kerberoasting abuses legitimate features of Active Directory to retrieve ticket-granting service (TGS) tickets for service accounts and then brute-force those tickets offline to recover weak service account passwords.

To perform the attack, the red team queried an Active Directory domain controller for all accounts with a service principal name (SPN). A typical Kerberoast attack would then request a TGS ticket for each of these SPNs. While Kerberos ticket requests are common, the default Kerberoast attack tool generates an increased volume of requests, which is anomalous and could be identified as suspicious. Using a keyword search for terms such as “Admin”, “SVC” and “SQL,” the consultants identified 18 potentially high-value accounts. To avoid detection, the red team retrieved tickets for only this targeted subset of accounts and inserted random delays between each request. The Kerberos tickets for these accounts were then uploaded to a Mandiant password-cracking server, which successfully brute-forced the passwords of 4 out of 18 accounts within 2.5 hours.

The red team then compiled a list of Active Directory group memberships for the cracked accounts, uncovering several groups that followed the naming scheme of {ComputerName}_Administrators. The red team confirmed the accounts possessed local administrator privileges to the specified computers by performing a remote directory listing of \\{ComputerName}\C$. The red team also executed commands on the system using PowerShell Remoting to gain information about logged on users and running software. After reviewing this data, the red team identified an endpoint detection and response (EDR) agent which had the capability to perform in-memory detections that were likely to identify and alert on the execution of suspicious command line arguments and parent/child process heuristics associated with credential theft.

To avoid detection, the red team created LSASS process memory dumps by using a custom utility executed via WMI. The red team retrieved the LSASS dump files over SMB and extracted cleartext passwords and NTLM hashes using Mimikatz. The red team performed this process on 10 unique systems identified as likely to have active privileged user sessions. From one of these 10 systems, the red team successfully obtained credentials for a member of the Domain Administrators group.

With access to this Domain Administrator account, the red team gained full administrative rights for all systems and users in the customer’s domain. This privileged account was then used to focus on accessing several high-priority applications and network segments to demonstrate the risk of such an attack on critical customer assets.

Accessing High-Value Objectives

For this phase, the client identified their RSA MFA systems, ATM network and high-value financial applications as three critical objectives for the Mandiant red team to target.

Targeting Financial Applications

The red team began this phase by querying Active Directory data for hostnames related to the objectives and found multiple servers and databases that included references to their key financial application. The red team reviewed the files and documentation on the financial application’s web servers and found an authentication log indicating the following users accessed the financial application:

  • CUSTOMER\user1
  • CUSTOMER\user2
  • CUSTOMER\user3
  • CUSTOMER\user4

The red team navigated to the financial application’s web interface (Figure 4) and found that authentication required an “RSA passcode,” clearly indicating access required an MFA token.

Figure 4: Financial application login portal

Bypassing Multi-Factor Authentication

The red team targeted the client’s RSA MFA implementation by searching network file shares for configuration files and IT documentation. In one file share (Figure 5), the red team discovered software migration log files that revealed the hostnames of three RSA servers.

Figure 5: RSA migration logs from \\CUSTOMER-FS01\Software

Next, the red team focused on identifying the user who installed the RSA authentication module. The red team performed a directory listing of the C:\Users and C:\data folders of the RSA servers (Figure 6), finding that CUSTOMER\CUSTOMER_ADMIN10 had logged in on the same day the RSA agent installer was downloaded. Using these indicators, the red team targeted CUSTOMER\CUSTOMER_ADMIN10 as a potential RSA administrator.

Figure 6: Directory listing output

By reviewing user details, the red team identified that the CUSTOMER\CUSTOMER_ADMIN10 account was actually the privileged account for the corresponding standard user account CUSTOMER\user103. The red team then used PowerView, an open source PowerShell tool, to identify systems in the environment where CUSTOMER\user103 was logged in or had recently logged in (Figure 7).

Figure 7: Running the PowerView Invoke-UserHunter command

The red team harvested credentials from the LSASS memory of one of these systems and successfully obtained the cleartext password for CUSTOMER\user103 (Figure 8).

Figure 8: Mimikatz output

The red team used the credential for CUSTOMER\user103 to log in, without MFA, to the web front-end of the RSA security console with administrative rights (Figure 9).

Figure 9: RSA console

Many organizations have audit procedures to monitor for the creation of new RSA tokens, so the red team decided the stealthiest approach would be to provision an emergency tokencode. However, since the client was using software tokens, the emergency tokens still required a user’s RSA SecurID PIN. The red team decided to target individual users of the financial application and attempt to discover an RSA PIN stored on their workstation.

While the red team knew which users could access the financial application, they did not know the system assigned to each user. To identify these systems, the red team targeted the users through their inboxes. The red team set a malicious Outlook homepage for the financial application user CUSTOMER\user1 through MAPI over HTTP using the Ruler utility. This ensured that whenever the user reopened Outlook on their system, a backdoor would launch.

Once CUSTOMER\user1 had re-launched Outlook and their workstation was compromised, the red team began enumerating installed programs on the system and identified that the target user used KeePass, a common password vaulting solution.

The red team performed an attack against KeePass to retrieve the contents of the user’s password database without the master password by adding a malicious event trigger to the KeePass configuration file (Figure 10). With this trigger, the next time the user opened KeePass, a comma-separated values (CSV) file containing all passwords in the KeePass database was created, and the red team was able to retrieve the export from the user’s roaming profile.

Figure 10: Malicious configuration file

One of the entries in the resulting CSV file contained the login credentials for the financial application, which included not only the application password but also the user’s RSA SecurID PIN. With this information, the red team possessed all the credentials needed to access the financial application.

The red team logged into the RSA Security Console as CUSTOMER\user103 and navigated to the user record for CUSTOMER\user1. The red team then generated an online emergency access token (Figure 11). The token was configured so that the next time CUSTOMER\user1 authenticated with their legitimate RSA SecurID PIN + tokencode, the emergency access code would be disabled. This was done to remain covert and mitigate any impact to the user’s ability to conduct business.

Figure 11: Emergency access token

The red team then successfully authenticated to the financial application with the emergency access token (Figure 12).

Figure 12: Financial application accessed with emergency access token

Accessing ATMs

The red team’s final objective was to access the ATM environment, located on a separate network segment from the primary corporate domain. First, the red team prepared a list of high-value users by querying the member list of potentially relevant groups such as ATM_Administrators. The red team then searched all accessible systems for recent logins by these targeted accounts and dumped their passwords from memory.

After obtaining a password for ATM administrator CUSTOMER\ADMIN02, the red team logged into the client’s internal Citrix portal to access the employee’s desktop. The red team reviewed the administrator’s documentation and determined the client’s ATMs could be accessed through a server named JUMPHOST01, which connected the corporate and ATM network segments. The red team also found a bookmark saved in Internet Explorer for “ATM Management.” While this link could not be accessed directly from the Citrix desktop, the red team determined it would likely be accessible from JUMPHOST01.

The jump server enforced MFA for users attempting to RDP into the system, so the red team used a previously compromised domain administrator account, CUSTOMER\ADMIN01, to execute a payload on JUMPHOST01 through WMI. WMI does not support MFA, so the red team was able to establish a connection between JUMPHOST01 and the red team’s CnC server, create a SOCKS proxy, and access the ATM Management application without an RSA PIN. The red team successfully authenticated to the ATM Management application and could then dispense money, add local administrators, install new software and execute commands with SYSTEM privileges on all ATM machines (Figure 13).

Figure 13: Executing commands on ATMs as SYSTEM

Takeaways: Multi-Factor Authentication, Password Policy and Account Segmentation

Multi-Factor Authentication

Mandiant experts have seen a significant uptick in the number of clients securing their VPN or remote access infrastructure with MFA. However, there is frequently a lack of MFA for applications being accessed from within the internal corporate network. Therefore, FireEye recommends that customers enforce MFA for all externally accessible login portals and for any sensitive internal applications.
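
One way to turn this recommendation into a repeatable audit is to keep an inventory of login portals and applications and flag any that are in scope but do not enforce MFA. The sketch below is a minimal illustration under stated assumptions, not a FireEye tool: the Application record, its fields and the sample inventory are hypothetical, although the sample loosely mirrors two gaps from this engagement (the single-factor BYOD portal and the RSA console reachable with a password alone).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    """Hypothetical inventory record for a login portal or application."""
    name: str
    externally_accessible: bool
    sensitive: bool
    mfa_enforced: bool


def requires_mfa(app: Application) -> bool:
    # Rule from the recommendation: MFA for every externally accessible
    # login portal and for any sensitive internal application.
    return app.externally_accessible or app.sensitive


def mfa_gaps(inventory: list[Application]) -> list[str]:
    """Return the names of applications that should enforce MFA but do not."""
    return [app.name for app in inventory if requires_mfa(app) and not app.mfa_enforced]


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = [
        Application("VPN", externally_accessible=True, sensitive=True, mfa_enforced=True),
        Application("BYOD portal", externally_accessible=True, sensitive=False, mfa_enforced=False),
        Application("RSA security console", externally_accessible=False, sensitive=True, mfa_enforced=False),
        Application("Intranet wiki", externally_accessible=False, sensitive=False, mfa_enforced=False),
    ]
    print(mfa_gaps(inventory))
```

Deciding what counts as “sensitive” is the hard part in practice; encoding it as an explicit field at least forces that decision to be made and reviewed for each application.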

Password Policy

During this engagement, the red team compromised four privileged service accounts due to the use of weak passwords which could be quickly brute forced. FireEye recommends that customers enforce strong password practices for all accounts. Customers should enforce a minimum password length of 20 characters for service accounts. When possible, customers should also use Microsoft Managed Service Accounts (MSAs) or enterprise password vaulting solutions to manage privileged users.
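
The value of length is easiest to see as arithmetic: for a randomly generated password, each additional character multiplies the brute-force search space by the size of the character set. The sketch below assumes passwords drawn uniformly from the 95 printable ASCII characters; human-chosen passwords have far less entropy than this upper bound, which is one reason managed accounts or vaulting are preferable. The function names keyspace_bits and meets_service_account_policy are hypothetical.

```python
import math
import string

# Assumption: characters drawn from the 95 printable ASCII characters
# (52 letters, 10 digits, 32 punctuation marks, plus the space character).
CHARSET_SIZE = len(string.ascii_letters + string.digits + string.punctuation) + 1

# Minimum length for service accounts recommended in the text.
MIN_SERVICE_ACCOUNT_LENGTH = 20


def keyspace_bits(length: int, charset_size: int = CHARSET_SIZE) -> float:
    """Upper bound, in bits, on the entropy of a uniformly random password."""
    return length * math.log2(charset_size)


def meets_service_account_policy(password: str) -> bool:
    """Hypothetical length check for service-account passwords."""
    return len(password) >= MIN_SERVICE_ACCOUNT_LENGTH


if __name__ == "__main__":
    for length in (8, MIN_SERVICE_ACCOUNT_LENGTH):
        print(f"{length} characters: {keyspace_bits(length):.1f} bits")
    # An 11-character password fails the 20-character minimum.
    print(meets_service_account_policy("Summer2019!"))
```

Under these assumptions, an 8-character password tops out around 53 bits while a 20-character password reaches roughly 131 bits, a difference of about 78 bits of brute-force work.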

Account Segmentation

Once the red team obtained initial access to the environment, they were able to escalate privileges in the domain quickly due to a lack of account segmentation. FireEye recommends customers follow the “principle of least-privilege” when provisioning accounts. Accounts should be separated by role so that standard user, administrative and domain administrator privileges are held by distinct accounts, even if a single employee needs one of each.

Normal user accounts should not be given local administrator access without a documented business requirement. Workstation administrators should not be allowed to log in to servers and vice versa. Finally, domain administrators should only be permitted to log in to domain controllers, and server administrators should not have access to those systems. By segmenting accounts in this way, customers can greatly increase the difficulty of an attacker escalating privileges or moving laterally from a single compromised account.
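
The logon rules above can be expressed as a small policy table and checked against observed sessions, for example sessions reconstructed from logon events. This is a minimal sketch under stated assumptions: the role and tier names are hypothetical, standard users are assumed to log on only to workstations, and the sample accounts and hosts are invented for illustration. In Active Directory, rules like these are typically enforced with Group Policy user rights assignments such as “Deny log on locally” and “Deny log on through Remote Desktop Services.”

```python
from enum import Enum


class Role(Enum):
    STANDARD_USER = "standard user"
    WORKSTATION_ADMIN = "workstation administrator"
    SERVER_ADMIN = "server administrator"
    DOMAIN_ADMIN = "domain administrator"


class Tier(Enum):
    WORKSTATION = "workstation"
    SERVER = "server"
    DOMAIN_CONTROLLER = "domain controller"


# System tiers each role may log on to. Workstation and server administrators
# stay in their own tier, and domain administrators are limited to domain
# controllers. Assumption: standard users log on to workstations only.
ALLOWED_LOGONS: dict[Role, frozenset[Tier]] = {
    Role.STANDARD_USER: frozenset({Tier.WORKSTATION}),
    Role.WORKSTATION_ADMIN: frozenset({Tier.WORKSTATION}),
    Role.SERVER_ADMIN: frozenset({Tier.SERVER}),
    Role.DOMAIN_ADMIN: frozenset({Tier.DOMAIN_CONTROLLER}),
}


def logon_allowed(role: Role, tier: Tier) -> bool:
    return tier in ALLOWED_LOGONS[role]


def violations(sessions: list[tuple[str, Role, str, Tier]]) -> list[tuple[str, str]]:
    """Return (account, host) for every (account, role, host, tier) session that breaks the policy."""
    return [
        (account, host)
        for account, role, host, tier in sessions
        if not logon_allowed(role, tier)
    ]


if __name__ == "__main__":
    # Hypothetical accounts and hosts, for illustration only.
    sessions = [
        ("CUSTOMER\\alice", Role.STANDARD_USER, "WKS-0001", Tier.WORKSTATION),
        ("CUSTOMER\\alice_da", Role.DOMAIN_ADMIN, "WKS-0001", Tier.WORKSTATION),
        ("CUSTOMER\\bob_srv", Role.SERVER_ADMIN, "DC-01", Tier.DOMAIN_CONTROLLER),
    ]
    for account, host in violations(sessions):
        print(f"policy violation: {account} logged on to {host}")
```

A domain administrator session on a workstation is exactly the kind of exposure the red team exploited when it dumped LSASS memory from systems with active privileged user sessions.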


As demonstrated in this case study, the Mandiant red team was able to gain a foothold in the client’s environment, obtain full administrative control of the company domain and compromise all critical business applications without any software or operating system exploits. Instead, the red team focused on identifying system misconfigurations, conducting social engineering attacks and using the client’s internal tools and documentation. The red team was able to achieve their objectives because of weaknesses in the client’s MFA configuration, service account password policy and account segmentation.

Exploitation of Critical Cisco ASA Vulnerability

The ACSC has become aware of a change in the threat situation surrounding the recently announced Cisco ASA critical remote code execution vulnerability. Proof-of-concept code is now available which results in a denial of service condition on targeted vulnerable devices. Cisco first released a security advisory on 29 January detailing the vulnerability and affected devices but has since identified additional attack vectors and released additional, more comprehensive patches.