Category Archives: machine learning

Thoughtful Design in the Age of Cybersecurity AI

Reading Time: ~ 3 min.

AI and machine learning offer tremendous promise for humanity in terms of helping us make sense of Big Data. But, while the processing power of these tools is integral for understanding trends and predicting threats, it’s not sufficient on its own.

Thoughtful design of threat intelligence—design that accounts for the ultimate needs of its consumers—is essential too. There are three areas where thoughtful design of AI for cybersecurity increases overall utility for its end users.

Designing where your data comes from

To set the process of machine learning in motion, data scientists rely on robust data sets they can use to train models that deduce patterns. If your data is siloed, it relies on a single community of endpoints or is made up only of data gathered from sensors like honeypots and crawlers. There are bound to be gaps in the resultant threat intelligence.

A diverse set of real-world endpoints is essential to achieve actionable threat intelligence. For one thing, machine learning models can be prone to picking up biases if exposed to either too much of a particular threat or too narrow of a user base. That may make the model adept at discovering one type of threat, but not so great at noticing others. Well-rounded, globally-sourced data provides the most accurate picture of threat trends.

Another significant reason real-world endpoints are essential is that some malware excels at evading traditional crawling mechanisms. This is especially common for phishing sites targeting specific geos or user environments, as well as for malware executables. Phishing sites can hide their malicious content from crawlers, and malware can appear benign or sit on a user’s endpoint for extended periods of time without taking an action.

Designing how to illustrate data’s context

Historical trends help to gauge future measurements, so designing threat intelligence that accounts for context is essential. Take a major website like for example. Historical threat intelligence signals it’s been benign for years, leading to the conclusion that its owners have put solid security practices in place and are committed to not letting it become a vector for bad actors. On the other hand, if we look at a domain that was only very recently registered or has a long history of presenting a threat, there’s a greater chance it will behave negatively in the future. 

Illustrating this type of information in a useful way can take the form of a reputation score. Since predictions about a data object’s future actions—whether it be a URL, file, or mobile app—are based on probability, reputation scores can help determine the probability that an object may become a future threat, helping organizations determine the level of risk they are comfortable with and set their policies accordingly.

For more information on why context is critical to actionable threat intelligence, click here.

Designing how you classify and apply the data

Finally, how a threat intelligence provider classifies data and the options they offer partners and users in terms of how to apply it can greatly increase its utility. Protecting networks, homes, and devices from internet threats is one thing, and certainly desirable for any threat intelligence feed, but that’s far from all it can do.

Technology vendors designing a parental control product, for instance, need threat intelligence capable of classifying content based on its appropriateness for children. And any parent knows malware isn’t the only thing children should be shielded from. Categories like adult content, gambling sites, or hubs for pirating legitimate media may also be worthy of avoiding. This flexibility extends to the workplace, too, where peer-to-peer streaming and social media sites can affect worker productivity and slow network speeds, not to mention introduce regulatory compliance concerns. Being able to classify internet object with such scalpel-like precision makes thoughtfully designed threat intelligence that is much more useful for the partners leveraging it.

Finally, the speed at which new threat intelligence findings are applied to all endpoints on a device is critical. It’s well-known that static threat lists can’t keep up with the pace of today’s malware, but updating those lists on a daily basis isn’t cutting it anymore either. The time from initial detection to global protection must be a matter of minutes.

This brings us back to where we started: the need for a robust, geographically diverse data set from which to draw our threat intelligence. For more information on how the Webroot Platform draws its data to protect customers and vendor partners around the globe, visit our threat intelligence page.

The post Thoughtful Design in the Age of Cybersecurity AI appeared first on Webroot Blog.

Deep learning rises: New methods for detecting malicious PowerShell

Scientific and technological advancements in deep learning, a category of algorithms within the larger framework of machine learning, provide new opportunities for development of state-of-the art protection technologies. Deep learning methods are impressively outperforming traditional methods on such tasks as image and text classification. With these developments, there’s great potential for building novel threat detection methods using deep learning.

Machine learning algorithms work with numbers, so objects like images, documents, or emails are converted into numerical form through a step called feature engineering, which, in traditional machine learning methods, requires a significant amount of human effort. With deep learning, algorithms can operate on relatively raw data and extract features without human intervention.

At Microsoft, we make significant investments in pioneering machine learning that inform our security solutions with actionable knowledge through data, helping deliver intelligent, accurate, and real-time protection against a wide range of threats. In this blog, we present an example of a deep learning technique that was initially developed for natural language processing (NLP) and now adopted and applied to expand our coverage of detecting malicious PowerShell scripts, which continue to be a critical attack vector. These deep learning-based detections add to the industry-leading endpoint detection and response capabilities in Microsoft Defender Advanced Threat Protection (Microsoft Defender ATP).

Word embedding in natural language processing

Keeping in mind that our goal is to classify PowerShell scripts, we briefly look at how text classification is approached in the domain of natural language processing. An important step is to convert words to vectors (tuples of numbers) that can be consumed by machine learning algorithms. A basic approach, known as one-hot encoding, first assigns a unique integer to each word in the vocabulary, then represents each word as a vector of 0s, with 1 at the integer index corresponding to that word. Although useful in many cases, the one-hot encoding has significant flaws. A major issue is that all words are equidistant from each other, and semantic relations between words are not reflected in geometric relations between the corresponding vectors.

Contextual embedding is a more recent approach that overcomes these limitations by learning compact representations of words from data under the assumption that words that frequently appear in similar context tend to bear similar meaning. The embedding is trained on large textual datasets like Wikipedia. The Word2vec algorithm, an implementation of this technique, is famous not only for translating semantic similarity of words to geometric similarity of vectors, but also for preserving polarity relations between words. For example, in Word2vec representation:

Madrid – Spain + Italy ≈ Rome

Embedding of PowerShell scripts

Since training a good embedding requires a significant amount of data, we used a large and diverse corpus of 386K distinct unlabeled PowerShell scripts. The Word2vec algorithm, which is typically used with human languages, provides similarly meaningful results when applied to PowerShell language. To accomplish this, we split the PowerShell scripts into tokens, which then allowed us to use the Word2vec algorithm to assign a vectorial representation to each token .

Figure 1 shows a 2-dimensional visualization of the vector representations of 5,000 randomly selected tokens, with some tokens of interest highlighted. Note how semantically similar tokens are placed near each other. For example, the vectors representing -eq, -ne and -gt, which in PowerShell are aliases for “equal”, “not-equal” and “greater-than”, respectively, are clustered together. Similarly, the vectors representing the allSigned, remoteSigned, bypass, and unrestricted tokens, all of which are valid values for the execution policy setting in PowerShell, are clustered together.

Figure 1. 2D visualization of 5,000 tokens using Word2vec

Examining the vector representations of the tokens, we found a few additional interesting relationships.

Token similarity: Using the Word2vec representation of tokens, we can identify commands in PowerShell that have an alias. In many cases, the token closest to a given command is its alias. For example, the representations of the token Invoke-Expression and its alias IEX are closest to each other. Two additional examples of this phenomenon are the Invoke-WebRequest and its alias IWR, and the Get-ChildItem command and its alias GCI.

We also measured distances within sets of several tokens. Consider, for example, the four tokens $i, $j, $k and $true (see the right side of Figure 2). The first three are usually used to represent a numeric variable and the last naturally represents a Boolean constant. As expected, the $true token mismatched the others – it was the farthest (using the Euclidean distance) from the center of mass of the group.

More specific to the semantics of PowerShell in cybersecurity, we checked the representations of the tokens: bypass, normal, minimized, maximized, and hidden (see the left side of Figure 2). While the first token is a legal value for the ExecutionPolicy flag in PowerShell, the rest are legal values for the WindowStyle flag. As expected, the vector representation of bypass was the farthest from the center of mass of the vectors representing all other four tokens.

Figure 2. 3D visualization of selected tokens

Linear Relationships: Since Word2vec preserves linear relationships, computing linear combinations of the vectorial representations results in semantically meaningful results. Below are a few interesting relationships we found:

high – $false + $true ≈’ low
‘-eq’ – $false + $true ‘≈ ‘-neq’
DownloadFile – $destfile + $str ≈’ DownloadString ‘
Export-CSV’ – $csv + $html ‘≈ ‘ConvertTo-html’
‘Get-Process’-$processes+$services ‘≈ ‘Get-Service’

In each of the above expressions, the sign ≈ signifies that the vector on the right side is the closest (among all the vectors representing tokens in the vocabulary) to the vector that is the result of the computation on the left side.

Detection of malicious PowerShell scripts with deep learning

We used the Word2vec embedding of the PowerShell language presented in the previous section to train deep learning models capable of detecting malicious PowerShell scripts. The classification model is trained and validated using a large dataset of PowerShell scripts that are labeled “clean” or “malicious,” while the embeddings are trained on unlabeled data. The flow is presented in Figure 3.

Figure 3 High-level overview of our model generation process

Using GPU computing in Microsoft Azure, we experimented with a variety of deep learning and traditional ML models. The best performing deep learning model increases the coverage (for a fixed low FP rate of 0.1%) by 22 percentage points compared to traditional ML models. This model, presented in Figure 4, combines several deep learning building blocks such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN). Neural networks are ML algorithms inspired by biological neural systems like the human brain. In addition to the pretrained embedding described here, the model is provided with character-level embedding of the script.

Figure 4 Network architecture of the best performing model

Real-world application of deep learning to detecting malicious PowerShell

The best performing deep learning model is applied at scale using Microsoft ML.Net technology and ONNX format for deep neural networks to the PowerShell scripts observed by Microsoft Defender ATP through the AMSI interface. This model augments the suite of ML models and heuristics used by Microsoft Defender ATP to protect against malicious usage of scripting languages.

Since its first deployment, this deep learning model detected with high precision many cases of malicious and red team PowerShell activities, some undiscovered by other methods. The signal obtained through PowerShell is combined with a wide range of ML models and signals of Microsoft Defender ATP to detect cyberattacks.

The following are examples of malicious PowerShell scripts that deep learning can confidently detect but can be challenging for other detection methods:

Figure 5. Heavily obfuscated malicious script

Figure 6. Obfuscated script that downloads and runs payload

Figure 7. Script that decrypts and executes malicious code

Enhancing Microsoft Defender ATP with deep learning

Deep learning methods significantly improve detection of threats. In this blog, we discussed a concrete application of deep learning to a particularly evasive class of threats: malicious PowerShell scripts. We have and will continue to develop deep learning-based protections across multiple capabilities in Microsoft Defender ATP.

Development and productization of deep learning systems for cyber defense require large volumes of data, computations, resources, and engineering effort. Microsoft Defender ATP combines data collected from millions of endpoints with Microsoft computational resources and algorithms to provide industry-leading protection against attacks.

Stronger detection of malicious PowerShell scripts and other threats on endpoints using deep learning mean richer and better-informed security through Microsoft Threat Protection, which provides comprehensive security for identities, endpoints, email and data, apps, and infrastructure.


Shay Kels and Amir Rubin
Microsoft Defender ATP team


Additional references:

The post Deep learning rises: New methods for detecting malicious PowerShell appeared first on Microsoft Security.

Analytics 101

From today’s smart home applications to autonomous vehicles of the future, the efficiency of automated decision-making is becoming widely embraced. Sci-fi concepts such as “machine learning” and “artificial intelligence” have been realized; however, it is important to understand that these terms are not interchangeable but evolve in complexity and knowledge to drive better decisions.

Distinguishing Between Machine Learning, Deep Learning and Artificial Intelligence

Put simply, analytics is the scientific process of transforming data into insight for making better decisions. Within the world of cybersecurity, this definition can be expanded to mean the collection and interpretation of security event data from multiple sources, and in different formats for identifying threat characteristics.

Simple explanations for each are as follows:

  • Machine Learning: Automated analytics that learn over time, recognizing patterns in data.  Key for cybersecurity because of the volume and velocity of Big Data.
  • Deep Learning: Uses many layers of input and output nodes (similar to brain neurons), with the ability to learn.  Typically makes use of the automation of Machine Learning.
  • Artificial Intelligence: The most complex and intelligent analytical technology, as a self-learning system applying complex algorithms which mimic human-brain processes such as anticipation, decision making, reasoning, and problem solving.

Benefits of Analytics within Cybersecurity

Big Data, the term coined in October 1997, is ubiquitous in cybersecurity as the volume, velocity and veracity of threats continue to explode. Security teams are overwhelmed by the immense volume of intelligence they must sift through to protect their environments from cyber threats. Analytics expand the capabilities of humans by sifting through enormous quantities of data and presenting it as actionable intelligence.

While the technologies must be used strategically and can be applied differently depending upon the problem at hand, here are some scenarios where human-machine teaming of analysts and analytic technologies can make all the difference:

  • Identify hidden malware with Machine Learning: Machine Learning algorithms recognize patterns far more quickly than your average human. This pattern recognition can detect behaviors that cause security breaches, whether known or unknown, periodically “learning” to become smarter. Machine Learning can be descriptive, diagnostic, predictive, or prescriptive in its analytic assessments, but typically is diagnostic and/or predictive in nature.
  • Defend against new threats with Deep Learning: Complex and multi-dimensional, Deep Learning reflects similar multi-faceted security behaviors in its actual algorithms; if the situation is complex, the algorithm is likely to be complex. It can detect, protect, and correct old or new threats by learning what is reasonable within any environment and identifying outliers and unique relationships.  Deep Learning can be descriptive, diagnostic, predictive, and prescriptive as well.
  • Anticipate threats with Artificial Intelligence: Artificial Intelligence uses reason and logic to understand its ecosystem. Like a human brain, AI considers value judgements and outcomes in determining good or bad, right or wrong.  It utilizes a number of complex analytics, including Deep Learning and Natural Language Processing (NLP). While Machine Learning and Deep Learning can span descriptive to prescriptive analytics, AI is extremely good at the more mature analytics of predictive and prescriptive.

With any security solution, therefore, it is important to identify the use case and ask “what problem are you trying to solve” to select Machine Learning, Deep Learning, or Artificial Intelligence analytics.  In fact, sometimes a combination of these approaches is required, like many McAfee products including McAfee Investigator.  Human-machine teaming as well as a layered approach to security can further help to detect, protect, and correct the most simple or complex of breaches, providing a complete solution for customers’ needs.

The post Analytics 101 appeared first on McAfee Blogs.