
From SmarterChild to Siri: Why AI is the competitive advantage securing businesses

The dream of an AI-influenced world is finally here. After decades of being written about, AI has reached a point where it’s ingrained in our daily lives. From the days of SmarterChild (for many, the AIM messenger bot was their first foray into AI) to the now-ubiquitous AI-enabled digital assistants such as Siri, the vision of artificial intelligence moving from sci-fi to reality has come to fruition. But instead …

The post From SmarterChild to Siri: Why AI is the competitive advantage securing businesses appeared first on Help Net Security.

From unstructured data to actionable intelligence: Using machine learning for threat intelligence

The security community has become proficient in using indicators of compromise (IoC) feeds for threat intelligence. Automated feeds have simplified the task of extracting and sharing IoCs. However, IoCs like IP addresses, domain names, and file hashes are in the lowest levels of the threat intelligence pyramid; they are relatively easy to access and consume, but they’re also easy for attackers to change to evade detection. IoCs are not enough.

Tactics, techniques, and procedures (TTPs) can enable organizations to extract valuable insights like patterns of attack on an enterprise or industry vertical, or trends of attacker techniques in the overall ecosystem. However, TTPs are at the highest level of the threat intelligence pyramid; this information often comes in the form of unstructured texts like blogs, research papers, and incident response (IR) reports, and the process of gathering and sharing these high-level indicators has remained largely manual.

Automating the processing of unstructured text for threat intelligence can benefit threat analysts and customers alike. At my Black Hat session “Death to the IOC: What’s Next in Threat Intelligence”, I presented a system that automates this process using machine learning and natural language processing (NLP) to identify and extract high-level patterns of attack from unstructured text.

Figure 1. Basic structure of system

Trained on documentation of known threats, this system takes unstructured text as input and extracts threat actors, attack techniques, malware families, and relationships to create attacker graphs and timelines.

Data extraction and machine learning

In natural language processing, named entity extraction is a task that aims to classify phrases into pre-defined categories. This is usually a preprocessing step for other more complex tasks like identifying aliases, relationship extraction between actors and TTPs, etc. In our use case, the categories we want to identify are threat actors, malware families, attack techniques, and relationships between entities.

To train our model, we used a corpus of about 2,700 publicly available documents that describe the actions, behaviors, and tools of various threat actors. On average, each document in this corpus contained about two thousand tokens.

Figure 2. Training data distributions

We also see that the proportion of tokens that fall into one of our predefined categories is very low. On average, only 1% of the tokens are relevant entities. This tells us that we have class imbalance in our data.

Therefore, in addition to using traditional features that are common to natural language processing tasks (for example, lemma, part of speech, orthographic features), we experimented with using custom word embeddings, which allow the identification of relationships between two words that mean the same thing or are used in similar contexts.
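As a rough illustration of the traditional features mentioned above, the following sketch builds a per-token feature dictionary of the kind commonly fed to a sequence labeller. The function names `word_shape` and `token_features`, the feature keys, and the example sentence are assumptions made for illustration and are not necessarily the features used in our system. Lemma and part-of-speech features would normally come from an NLP library and are left out here.

```python
def word_shape(token):
    """Map characters to a coarse shape, e.g. 'APT28' -> 'XXXdd'."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # keep punctuation such as '-' unchanged
    return "".join(shape)


def token_features(tokens, i):
    """Return orthographic and context features for tokens[i] (hypothetical feature set)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_upper": tok.isupper(),
        "is_title": tok.istitle(),
        "has_digit": any(ch.isdigit() for ch in tok),
        "has_hyphen": "-" in tok,
        "suffix3": tok[-3:].lower(),
        "shape": word_shape(tok),
        # Neighbouring tokens give the model local context.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


tokens = "APT28 used spear-phishing".split()
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Features like the word shape help a model generalize to entity names it has never seen, because many actor and malware names share capitalization and digit patterns.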

Word embeddings are vector representations of words such that the semantic context in which a word appears is captured in the numeric vector. If two words mean the same thing, or are used in the same context frequently, then we would expect the cosine similarity of their word embedding vectors to be high. In other words, in a graphical representation, datapoints for words that mean the same thing or are used in the same context frequently would be relatively close together.
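To make the cosine similarity idea concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and the names `actor_a`, `actor_a_alias`, and `unrelated_word` are invented placeholders; real embeddings have many more dimensions and are learned from the corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, embeddings, k=4):
    """Return the k words whose vectors are most similar to the given word."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors, invented for illustration only.
toy_embeddings = {
    "actor_a": [0.9, 0.1, 0.0],
    "actor_a_alias": [0.8, 0.2, 0.1],
    "unrelated_word": [0.0, 0.1, 0.9],
}
closest = nearest_neighbors("actor_a", toy_embeddings, k=1)
```

In this toy data the alias is the nearest neighbor because its vector points in almost the same direction as the query vector.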

For example, we looked at some clusters of points formed around APT28 and found that the four closest points to it were either aliases (Sofacy, TG-4127) of the threat or were related by attribution (APT29, Dymalloy).

Figure 3. Tensorboard visualization of custom trained embeddings

We experimented with several models that are suited to a sequence labelling problem and measured performance in two ways: on the full test dataset and on only the unseen tokens in the test dataset. We found that conditional random fields (CRFs) trained on both traditional and word embedding features had the best performance in both scenarios.
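The two ways of measuring performance can be sketched as follows. This is an illustration under stated assumptions rather than our evaluation code: it scores at the token level, treats the label `O` as "not an entity", and defines "unseen" as any test token whose lowercase form is absent from the training vocabulary. All names and the example tokens are hypothetical.

```python
def token_prf(gold, pred, outside="O"):
    """Token-level precision, recall, and F1 over entity (non-outside) labels."""
    true_pos = sum(1 for g, p in zip(gold, pred) if g == p and g != outside)
    pred_pos = sum(1 for p in pred if p != outside)
    gold_pos = sum(1 for g in gold if g != outside)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def evaluate(test_tokens, gold, pred, train_vocab):
    """Score on all test tokens and on the subset never seen during training."""
    unseen = [i for i, tok in enumerate(test_tokens) if tok.lower() not in train_vocab]
    return {
        "all": token_prf(gold, pred),
        "unseen": token_prf([gold[i] for i in unseen], [pred[i] for i in unseen]),
    }


# Placeholder tokens and labels, invented for illustration.
result = evaluate(
    test_tokens=["known_actor", "used", "new_malware"],
    gold=["ACTOR", "O", "MALWARE"],
    pred=["ACTOR", "O", "O"],
    train_vocab={"known_actor", "used"},
)
```

Scoring the unseen subset separately shows whether a model has learned contextual cues or is merely memorizing entity names from the training data.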

Figure 4. Architecture of training pipeline for extractor system

Machine learning for insightful, actionable intelligence

Using the system we developed, we automatically extracted the techniques known to be used by Emotet, a prominent commodity malware family, as well as by a spread of APT actors that public documents refer to as Saffron Rose, Snake, and Muddy Water. We then generated the following graph, which shows that there is significant overlap between some techniques used by commodity malware and those used by APTs.

Figure 5. Overlaps in techniques used by commodity malware and APTs

In this graph, we can see that techniques like obfuscated PowerShell, spear-phishing, and process hollowing are not restricted to APTs, but are prevalent in commodity malware. Insights like this can be used by organizations to guide security investments. Organizations can place defensive choke points to detect or prevent these attacker techniques so that they can stop not only annoying commodity malware, but also the high-profile targeted attacks.
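Once techniques have been extracted per actor, finding the overlap is a matter of set intersection. The sketch below assumes the extractor output has been collected into a dictionary of technique sets. The actor keys, `technique_x`, `technique_y`, and the assignment of techniques to actors are placeholders invented for illustration, not results from our system.

```python
def shared_techniques(techniques_by_actor, commodity, apts):
    """Map each technique of the commodity family to the APT actors that also use it."""
    overlap = {}
    for technique in sorted(techniques_by_actor[commodity]):
        users = sorted(a for a in apts if technique in techniques_by_actor[a])
        if users:
            overlap[technique] = users
    return overlap


# Placeholder extraction output; the assignments are invented.
extracted = {
    "commodity_family": {"obfuscated PowerShell", "spear-phishing", "process hollowing"},
    "apt_actor_1": {"obfuscated PowerShell", "spear-phishing", "technique_x"},
    "apt_actor_2": {"process hollowing", "technique_y"},
}
overlap = shared_techniques(extracted, "commodity_family", ["apt_actor_1", "apt_actor_2"])
```

Each key in `overlap` is a candidate choke point: a single detection for that technique covers both the commodity family and the listed APT actors.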

At Microsoft, we are continuing to push the boundaries on how machine learning can improve the security posture of our customers. The output of machine learning-backed threat intelligence will show up in the effectiveness of the protection we deliver through Microsoft Defender Advanced Threat Protection (Microsoft Defender ATP) and the broader Microsoft Threat Protection.

In recent months, we have extensively discussed how we’re using machine learning to continuously innovate protections in Microsoft Defender ATP, particularly in hardening against evasion and adversarial attacks. In this blog we showed another application of machine learning: processing the vast amounts of threat intelligence that organizations receive and identifying high-level patterns. More importantly, we’re sharing our approaches so organizations can be inspired to explore more applications of machine learning to improve overall security.


Bhavna Soman (@bsoman3)
Microsoft Defender ATP Research



Talk to us

Questions, concerns, or insights on this story? Join discussions at the Microsoft Defender ATP community.

Read all Microsoft security intelligence blog posts.

Follow us on Twitter @MsftSecIntel.

The post From unstructured data to actionable intelligence: Using machine learning for threat intelligence appeared first on Microsoft Security.

New machine learning model sifts through the good to unearth the bad in evasive malware

We continuously harden machine learning protections against evasion and adversarial attacks. One of the latest innovations in our protection technology is the addition of a class of hardened malware detection machine learning models called monotonic models to Microsoft Defender ATP’s Antivirus.

Historically, detection evasion has followed a common pattern: attackers build new versions of their malware and test them offline against antivirus solutions. They keep making adjustments until the malware can evade antivirus products. Attackers then carry out their campaign knowing that the malware won’t initially be blocked by AV solutions, which are then forced to catch up by adding detections for the malware. In the cybercriminal underground, antivirus evasion services are available to make this process easier for attackers.

Microsoft Defender ATP’s Antivirus has become significantly more resistant to attacker tactics like this. A sizeable portion of the protection we deliver is powered by machine learning models hosted in the cloud. The cloud protection service breaks attackers’ ability to test and adapt to our defenses in an offline environment, because attackers must either forgo testing, or test against our defenses in the cloud, where we can observe them and react even before they begin.

Hardening our defenses against adversarial attacks doesn’t end there. In this blog we’ll discuss a new class of cloud-based ML models that further harden our protections against detection evasion.

Most machine learning models are trained on a mix of malicious and clean features. Attackers routinely try to throw these models off balance by stuffing clean features into malware.

Monotonic models are resistant to adversarial attacks because they are trained differently: they only look for malicious features. The magic is this: Attackers can’t evade a monotonic model by adding clean features. To evade a monotonic model, an attacker would have to remove malicious features.

Monotonic models explained

Last summer, researchers from UC Berkeley (Incer, Inigo, et al., “Adversarially robust malware detection using monotonic classification”, Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics, ACM, 2018) proposed adding monotonic constraints to malware detection machine learning models to make them robust against adversaries. Simply put, the technique only allows the machine learning model to leverage malicious features when considering a file; it’s not allowed to use any clean features.

Figure 1. Features used by a baseline versus a monotonic constrained logistic regression classifier. The monotonic classifier does not use cleanly-weighted features so that it’s more robust to adversaries.
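One simple way to impose such a constraint on a logistic regression, and not necessarily the method used in the paper or in our production models, is to project the weights back onto non-negative values after every gradient step. The sketch below assumes binary features where 1 means "present". The training data, hyperparameters, and function names are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def malicious_score(w, b, x):
    """Predicted probability that feature vector x is malicious."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_monotonic_logreg(X, y, epochs=500, lr=0.5):
    """Batch gradient descent on log loss with weights projected onto w >= 0."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = malicious_score(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Gradient step, then clip: no feature may carry a negative (clean) weight.
        w = [max(0.0, wj - lr * gj / n) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented toy data: column 0 is a malicious indicator, column 1 a clean indicator.
X = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_monotonic_logreg(X, y)
```

Because every weight is non-negative, setting an additional feature to 1 can never lower `malicious_score`, which is the monotonic property described above.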

Inspired by the academic research, we deployed our first monotonic logistic regression models to Microsoft Defender ATP cloud protection service in late 2018. Since then, they’ve played an important part in protecting against attacks.

Figure 2 below illustrates the production performance of the monotonic classifiers versus the baseline unconstrained model. As expected, monotonic-constrained models detect less malware overall than classic models. However, they can detect malware attacks that would otherwise have been missed because of clean features.

Figure 2. Malware detection machine learning classifiers comparing the unconstrained baseline classifier versus the monotonic constrained classifier in customer protection.

The monotonic classifiers don’t replace baseline classifiers; they run in addition to the baseline and provide extra protection. We combine all our classifiers using stacked classifier ensembles, and monotonic classifiers add significant value because of the unique classification they provide.

How Microsoft Defender ATP uses monotonic models to stop adversarial attacks

One common way for attackers to add clean features to malware is to digitally code-sign malware with trusted certificates. Malware families like ShadowHammer, Kovter, and Balamid are known to abuse certificates to evade detection. In many of these cases, the attackers impersonate legitimate registered businesses to defraud certificate authorities into issuing them trusted code-signing certificates.

LockerGoga, a strain of ransomware that’s known for being used in targeted attacks, is another example of malware that uses digital certificates. LockerGoga emerged in early 2019 and has been used by attackers in high-profile campaigns that targeted organizations in the industrial sector. Once attackers are able to breach a target network, they use LockerGoga to encrypt enterprise data en masse and demand ransom.

Figure 3. LockerGoga variant digitally code-signed with a trusted CA

When Microsoft Defender ATP encounters a new threat like LockerGoga, the client sends a featurized description of the file to the cloud protection service for real-time classification. An array of machine learning classifiers processes the features describing the content, including whether attackers had digitally code-signed the malware with a trusted code-signing certificate that chains to a trusted CA. By ignoring certificates and other clean features, monotonic models in Microsoft Defender ATP can correctly identify attacks that otherwise would have slipped through defenses.

Very recently, researchers demonstrated an adversarial attack that appends a large volume of clean strings from a computer game executable to several well-known malware and credential dumping tools – essentially adding clean features to the malicious files – to evade detection. The researchers showed how this technique can successfully impact machine learning prediction scores so that the malware files are not classified as malware. The monotonic model hardening that we’ve deployed in Microsoft Defender ATP is key to preventing this type of attack, because, for a monotonic classifier, adding features to a file can only increase the malicious score.
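The effect of padding a file with clean features can be illustrated with two hand-written linear scorers. Everything here is an assumption made for illustration: the feature names, the weights, and the bias are invented, and real classifiers use far more features. The baseline scorer assigns negative weights to clean features, while the monotonic scorer gives them no weight at all.

```python
import math


def score(weights, bias, present_features):
    """Logistic score from the summed weights of the features present in a file."""
    z = bias + sum(weights.get(f, 0.0) for f in present_features)
    return 1.0 / (1.0 + math.exp(-z))


# Invented weights: positive means malicious evidence, negative means clean evidence.
baseline_weights = {
    "injects_code": 2.0,
    "encrypts_files": 1.5,
    "game_string": -1.0,
    "trusted_cert": -2.5,
}
# The monotonic scorer keeps only non-negative weights.
monotonic_weights = {"injects_code": 2.0, "encrypts_files": 1.5}
bias = -2.0

malware = {"injects_code", "encrypts_files"}
padded_malware = malware | {"game_string", "trusted_cert"}

baseline_before = score(baseline_weights, bias, malware)
baseline_after = score(baseline_weights, bias, padded_malware)
monotonic_before = score(monotonic_weights, bias, malware)
monotonic_after = score(monotonic_weights, bias, padded_malware)
```

With these made-up numbers, the baseline score drops below 0.5 once the clean features are appended, while the monotonic score is unchanged, so the padded file is still flagged.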

Given how they significantly harden defenses, monotonic models are now standard components of machine learning protections in Microsoft Defender ATP’s Antivirus. One of our monotonic models uniquely blocks malware on an average of 200,000 distinct devices every month. We now have three different monotonic classifiers deployed, protecting against different attack scenarios.

Monotonic models are just the latest enhancements to Microsoft Defender ATP’s Antivirus. We continue to evolve machine learning-based protections to be more resilient to adversarial attacks. More effective protections against malware and other threats on endpoints increase defense across all of Microsoft Threat Protection. By unifying and enabling signal-sharing across Microsoft’s security services, Microsoft Threat Protection secures identities, endpoints, email and data, apps, and infrastructure.


Geoff McDonald (@glmcdona), Microsoft Defender ATP Research team
with Taylor Spangler, Windows Data Science team



The post New machine learning model sifts through the good to unearth the bad in evasive malware appeared first on Microsoft Security.

Lessons learned from the Microsoft SOC—Part 2: Organizing people

In the second post in our series, we focus on the most valuable resource in the security operations center (SOC)—our people. This series is designed to share our approach and experience with operations, so you can use what we learned to improve your SOC. In Part 1: Organization, we covered the SOC’s organizational role and mission, culture, and metrics.

The lessons in the series come primarily from Microsoft’s corporate IT security operation team, one of several specialized teams in the Microsoft Cyber Defense Operations Center (CDOC). We also include lessons our Detection and Response Team (DART) has learned while helping our customers respond to major incidents.

People are the most valuable asset in the SOC. Their experience, skill, insight, creativity, and resourcefulness are what make our SOC effective. Our SOC management team spends a lot of time thinking about how to ensure our people are set up with what they need to succeed and stay engaged. As we’ve improved our processes, we’ve been able to decrease the time it takes to ramp people up and increase employee enjoyment of their jobs.

Today, we cover the first two aspects of how to set up people in the SOC for success:

  • Empower humans with automation.
  • Microsoft SOC teams and tiers model.

Empower humans with automation

Rapidly sorting out the signal (real detections) from the noise (false positives) in the SOC requires investing in both humans and automation. We strongly believe in the power of automation and technology to reduce human toil, but ultimately, we’re dealing with human attack operators and human judgement is critical to the process.

In our SOC, automation is not about using efficiency to remove humans from the process—it is about empowering humans. We continuously think about how we can automate repetitive tasks from the analyst’s job, so they can focus on the complex problems that people are uniquely able to solve.

Automation empowers humans to do more in the SOC by increasing response speed and capturing human expertise. The toil our staff experiences comes mostly from repetitive tasks, and repetitive tasks come from either attackers or defenders doing the same things over and over. Repetitive tasks are ideal candidates for automation.

We also found that we need to constantly refine the automation because attackers are creative and persistent, constantly innovating to avoid detections and preventive controls. When an effective attack method is identified (like phishing), they exploit it until it stops working. But they also continually innovate new tactics to evade defenses introduced by the cybersecurity community. Given the profit potential of attacks, we expect the challenges of evolving attacks to continue for the foreseeable future.

When repetitive and boring work is automated, analysts can apply more of their creative minds and energy to solving the new problems that attackers present to them and proactively hunting for attackers that got past the first lines of defense. We’ll discuss areas where we use automation and machine learning in “Part 3: Technology.”

Microsoft SOC teams and tiers model

At Microsoft, we organized our SOC into specialized teams, allowing them to better develop and apply deep expertise, which supports the overall goals of reducing time to acknowledge and remediate.

This diagram represents the key SOC functions: threat intelligence, incident management, and SOC analyst tiers:

Image showing key SOC functions: threat intelligence, incident management, and SOC analysts (tiers 1, 2, and 3).

Threat intelligence—We have several threat intelligence teams at Microsoft that support the SOC and other business functions. Their role is to both inform business stakeholders of risk and provide technical support for incident investigations, hunting operations, and defensive measures for known threats. These strategic (business) and tactical (technical) intelligence goals are related but distinctly different from each other. We task different teams for each goal and ensure processes are in place (such as daily standup meetings) to keep them in close contact.

Incident management—Enterprise-wide coordination of incidents, impact assessment, and related tasks are handled by dedicated personnel separate from technical analyst teams. At Microsoft, these incident response teams work with the SOC and business stakeholders to coordinate actions that may impact services or business units. Additionally, this team brings in legal, compliance, and privacy experts as needed to consult and advise on actions regarding regulatory aspects of incidents. This is particularly important at Microsoft because we’re compliant with a large number of international standards and regulations.

SOC analyst tiers—This three-tier model for SOC analysts will probably look familiar to seasoned SOC professionals, though there are some subtleties in our model we don’t see widely in the industry.

Image showing Microsoft's Corporate IT SOC tiers and tools: alert queue (hot path), proactive hunt (cold path), tiers 3, 2, and 1, and finally, automation.

Our organization uses the terms hot path and cold path to describe how we discover adversaries and optimize processes to handle them.

  • Hot path—Reflects detection of confirmed active attack activity that must be investigated and remediated as soon as possible. These incidents are primarily managed and remediated by Tier 1 and Tier 2, though a small percentage (about 4 percent) are escalated to Tier 3. Automation of investigations and remediations is also beginning to help reduce hot path workloads.
  • Cold path—Refers to all other activities including proactively hunting for adversary campaigns that haven’t triggered a hot path alert.

Roles and functions of the SOC analyst tiers

Tier 1—This team is the primary front line and focuses on high-speed remediation over a large volume of incidents. Tier 1 analysts respond to a very specific set of alert sources and follow prescriptive instructions to investigate, remediate, and document the incidents. The rule of thumb for alerts that Tier 1 handles is that they can typically be remediated within seconds to minutes. An incident is escalated to Tier 2 if it isn’t covered by a documented Tier 1 procedure or if it requires involved or advanced remediation (for example, device isolation and cleanup).

In addition:

  • The Tier 1 function is currently performed by full-time employees in our corporate IT SOC. In the past and in other teams at Microsoft, we have used contract staff or managed service agreements for Tier 1 functions.
  • A current initiative for the full-time employee Tier 1 team is to increase the use of automated investigation and remediation for these incidents. One goal of this initiative is to grow the skills of our current Tier 1 employees, so they can shift to proactive work in other security assignments in SOC or across the company.
  • Tier 1 (and Tier 2) SOC analysts may stay involved with an escalated incident until it is remediated. This helps preserve context during and after transferring ownership of an incident and also accelerates their learning and skills growth.
  • The typical ratio of alert volumes is noted in the Tiers and Tools diagram above. (We’ll share more details in “Part 3: Technology.”)
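The Tier 1 escalation rule described above can be expressed as a small routing function. This is a hypothetical sketch, not our tooling: the `Incident` fields, the alert source names, and the set of documented procedures are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    alert_source: str
    needs_advanced_remediation: bool = False  # e.g. device isolation and cleanup


def route(incident, tier1_procedures):
    """Return the tier that should handle the incident under the rule above."""
    if incident.alert_source not in tier1_procedures:
        return "Tier 2"  # no documented Tier 1 procedure covers it
    if incident.needs_advanced_remediation:
        return "Tier 2"  # remediation is too involved for the Tier 1 playbook
    return "Tier 1"


# Hypothetical alert sources with a documented Tier 1 procedure.
documented = {"phishing_report", "commodity_malware_alert"}

routine = route(Incident("phishing_report"), documented)
undocumented = route(Incident("unusual_admin_logon"), documented)
involved = route(Incident("commodity_malware_alert", needs_advanced_remediation=True), documented)
```

Keeping the rule this explicit makes it easier to see which alert sources could next be handed to automated investigation and remediation.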

Tier 2—This team is focused on incidents that require deeper analysis and remediation. Many Tier 2 incidents have been escalated from Tier 1 analysts, but Tier 2 also directly monitors alerts for sensitive assets and known attacker campaigns. These incidents are usually more complex and require an approach that is still structured, but much more flexible than Tier 1 procedures. Additionally, some Tier 2 analysts also proactively hunt for adversaries (typically using lower priority alerts from the same Microsoft Threat Protection tools they use to manage reactive incidents).

Tier 3—This team is focused primarily on advanced hunting and sophisticated analysis to identify anomalies that may indicate advanced adversaries. Most incidents are remediated at Tiers 1 and 2 (96 percent) and only unprecedented findings or deviations from norms are escalated to Tier 3 teams. Tier 3 team members have a high degree of freedom to bring their different skills, backgrounds, and approaches to the goal of ferreting out red team/hidden adversaries. Tier 3 team members have backgrounds as security professionals, data scientists, intelligence analysts, and more. These teams use different tools (Microsoft, custom, and third-party) to sift through a number of different datasets to uncover hidden adversary activity. A favorite of many analysts is the use of Kusto Query Language (KQL) queries across Microsoft Threat Protection tool datasets.

The structure of Tier 3 has changed over time, but has recently gravitated to four different functions:

  • Major incident engineering—Handles escalation of incidents from Tier 2. These virtual teams are created as needed to support the duration of the incident and often include both reactive investigations, as well as proactive hunting for additional adversary presence.
  • External adversary research and threat analysis—Focuses on hunting for adversaries using existing tools and data sources, as well as signals from various external intelligence sources. The team is focused on both hunting for undiscovered adversaries as well as creating and refining alerts and automation.
  • Purple team operations—A temporary duty assignment where Tier 3 analysts (blue team) are paired with our in-house attack team members (red team) as they perform authorized attacks. We found this purple (red+blue) activity results in high-value learning by both teams, strengthening our overall security posture and resilience. This team is also responsible for the critical task of coordinating with the red team to determine whether a detection is authorized red team activity or a live attacker. At customer organizations, we’ve seen failure to deconflict red team activity result in our DART teams flying onsite to respond to a false alarm (an avoidable, expensive, and embarrassing mistake).
  • Future operations team—Focuses on future-proofing our technology and processes by building and testing new capabilities.

Learn more

For more insights into Microsoft’s approach to using technology to empower people, watch Ann Johnson’s keynote at RSA 2019 and download our poster. For information on organizational culture and goals, read Lessons learned from the Microsoft SOC—Part 1: Organization. In addition, see our CISO series to learn more.

Stay tuned for the second segment in “Lessons learned from the Microsoft SOC—Part 2,” where we’ll cover career paths and readiness programs for people in our SOC. And finally, we’ll wrap up this series with “Part 3: Technology,” where we’ll discuss the technology that enables our people to accomplish their mission.

For more discussion on some of these topics, see John and Kristina’s session (starting at 1:05:48) at Microsoft’s recent Virtual Security Summit.

The post Lessons learned from the Microsoft SOC—Part 2: Organizing people appeared first on Microsoft Security.