Category Archives: remote code execution

Hackers Exploiting ‘Bitmessage’ Zero-Day to Steal Bitcoin Wallet Keys

Bitmessage developers have warned of a critical 'remotely executable' zero-day vulnerability in the PyBitmessage application that was being exploited in the wild. Bitmessage is a Peer-to-Peer (P2P) communications protocol used to send encrypted messages to users. Since it is decentralized and trustless communications, one need-not inherently trust any entities like root certificate

Patch released to fix Firefox arbitrary code execution vulnerability

Mozilla Firefox released an update to patch its open-source web browser after developer Johann Hofmann detected a critical HTML flaw that could allow hackers to exploit the browser remotely. The vulnerability only affected the desktop version of Firefox, and not iOS, Android and Amazon Fire TV versions.

The vulnerability was the result of “insufficient sanitization of HTML fragments in chrome-privileged documents by the affected software,” according to a detailed advisory released by Cisco on Tuesday.

To infiltrate the system, the hacker would use either misleading language or instructions to convince the user to click on a link or open a file that seems legitimate. After the user follows instructions, the attacker gets admin privileges and can remotely corrupt the vulnerable software.

The critical HTML hijack vulnerability exploited Firefox’s Chrome User Interface design elements (no relation to Google Chrome) such as “menu bars, progress bars, window title bars, toolbars, or UI elements created by add-ons,” explains BleepingComputer.

Firefox 58.0.1 is the first update to the new Firefox Quantum browser, just after a week the browser was officially launched. Firefox users are advised to immediately update their browser and not open any emails or click on links that appear suspicious or are sent by unknown contacts. If there any doubts regarding the source of a link, file or email, it’s safer not to click, download or open.

When asked about its plans for 2018, Mozilla wants to expand into the mobile ecosystem by launching an improvement similar to Quantum and heavily focus on Focus, the iOS and Android Firefox version.

“Mobile will be huge for Mozilla in 2018 and we will see how much of that we want to include in Firefox, Focus or even other apps,” Barbara Bermes, product manager for Firefox Mobile told Neowin in an interview. “As it relates in particular to Focus, we want to be the trusted browser providing the most privacy by design and by default. The idea is to include smart defaults that address privacy concerns while not sacrificing performance or convenience.”

(Unpatched) Adobe Flash Player Zero-Day Exploit Spotted in the Wild

Another reason to uninstall Adobe Flash Player—a new zero-day Flash Player exploit has reportedly been spotted in the wild by North Korean hackers. South Korea's Computer Emergency Response Team (KR-CERT) issued an alert Wednesday for a new Flash Player zero-day vulnerability that's being actively exploited in the wild by North Korean hackers to target Windows users in South Korea. <!--

Update Your Firefox Browser to Fix a Critical Remotely Exploitable Flaw

Mozilla has released an important update for its Firefox web browser to patch a critical vulnerability that could allow remote attackers to execute malicious code on computers running an affected version of the browser. The update comes just a week after the company rolled out its new Firefox Quantum browser, a.k.a Firefox 58, with some new features like improved graphics engine and

Cisco Fixes 10.0 CVSS-Scored RCE Bug Affecting Its ASA Software

Cisco has patched a remote code execution (RCE) vulnerability bearing a “perfect” CVSS score of 10.0 that affects its Adaptive Security Appliance (ASA) software. On 29 January, the American multinational technology conglomerate publicly recognized the security issue (CVE-2018-0101) and revealed that it affects the ASA software found in the following 10 Cisco products: 3000 Series […]… Read More

The post Cisco Fixes 10.0 CVSS-Scored RCE Bug Affecting Its ASA Software appeared first on The State of Security.

Critical Flaw Hits Popular Windows Apps Built With Electron JS Framework

A critical remote code execution vulnerability has been reported in Electron—a popular web application framework that powers thousands of widely-used desktop applications including Skype, Signal, Wordpress and Slack—that allows for remote code execution. Electron is an open-source framework that is based on Node.js and Chromium Engine and allows app developers to build cross-platform native

Browser security beyond sandboxing

Security is now a strong differentiator in picking the right browser. We all use browsers for day-to-day activities like staying in touch with loved ones, but also for editing sensitive private and corporate documents, and even managing our financial assets. A single compromise through a web browser can have catastrophic results. It doesn’t help that browsers are also on their way to becoming some of the most complex pieces of consumer software in existence, increasing potential attack surface.

Our job in the Microsoft Offensive Security Research (OSR) team is to make computing safer. We do this by identifying ways to exploit software, and working with other teams across the company on solutions to mitigate attacks. This workflow typically involves identifying software vulnerabilities to exploit. However, we believe that there will always be more vulnerabilities to find, so that isn’t our primary focus. Instead, our job is really all about asking: assuming a vulnerability exists, what can we do with it?

We’ve so far had success with this approach. We have helped improve the security of several Microsoft products, including Microsoft Edge. We continue to make strides in preventing both Remote Code Execution (RCE) with mitigations like Control Flow Guard (CFG), export suppression, and Arbitrary Code Guard (ACG), and isolation, notably with Less Privileged AppContainer (LPAC) and Windows Defender Application Guard (WDAG). Still, we believe it’s important for us to validate our security strategy. One way we do this is to look at what other companies are doing and study the results of their efforts.

For this project, we set out to examine Google’s Chrome web browser, whose security strategy shows a strong focus on sandboxing. We wanted to see how Chrome held up against a single RCE vulnerability, and try to answer: is having a strong sandboxing model sufficient to make a browser secure?

Some of our key findings include the following:

  • Our discovery of CVE-2017-5121 indicates that it is possible to find remotely exploitable vulnerabilities in modern browsers
  • Chrome’s relative lack of RCE mitigations means the path from memory corruption bug to exploit can be a short one
  • Several security checks being done within the sandbox result in RCE exploits being able to, among other things, bypass Same Origin Policy (SOP), giving RCE-capable attackers access to victims’ online services (such as email, documents, and banking sessions) and saved credentials
  • Chrome’s process for servicing vulnerabilities can result in the public disclosure of details for security flaws before fixes are pushed to customers

Finding and exploiting a remote vulnerability

To do this evaluation, we first needed to find some kind of entry point vulnerability. Typically, we do this by finding a memory corruption bug, such as buffer overflow or use-after-free vulnerability. As with any web browser, the attack surface is extensive, including the V8 JavaScript interpreter, the Blink DOM engine, and the pdfium PDF renderer, among others. For this project, we focused our attention on V8.

The bug we ended up using for our exploit was discovered through fuzzing. We leveraged the Windows Security Assurance team's Azure-based fuzzing infrastructure to run ExprGen, an internal JavaScript fuzzer written by the team behind Chakra, our own JavaScript engine. People were likely already throwing all publicly available fuzzers at V8; ExprGen, on the other hand, had only ever been run against Chakra, giving it greater chances of leading to the discovery of new bugs.

Identifying the bug

One of the disadvantages of fuzzing, compared to manual code review, is that it's not always immediately clear what causes a given test case to trigger a vulnerability, or if the unexpected behavior even constitutes a vulnerability at all. This is especially true for us at OSR; we have no prior experience working with V8 and therefore know fairly little about its internal workings. In this instance, the test case produced by ExprGen reliably crashed V8, but not always in the same way, and not in a way that could be easily influenced by an attacker.

As fuzzers often generate very large and convoluted pieces of code (in this case, nearly 1,500 lines of unreadable JavaScript), the first step is typically to minimize the test case -- trimming the fat until we're left with a small, understandable piece of code. This is what we ended up with:

The code above looks strange and doesn't really achieve anything, but it is valid JavaScript. All it does is to create an oddly structured object, then set some of its fields. This shouldn't trigger any strange behavior, but it does. When this code is run using D8, V8's standalone executable version, built from git tag 6.1.534.32, we get a crash:

Looking at the address the crash occurs at (0x000002d168004f14), we can tell it does not happen within a static module. It must therefore be in code dynamically generated by V8's Just-In-Time (JIT) compiler. We also see that the crash happens because the rax register is zero.

At first glance, this looks like a classic null dereference bug, which would be a let-down: those are typically not exploitable because modern operating systems prevent the zero virtual address from being mapped. We can look at surrounding code in order to get a better idea of what might be going on:

We can extract a few things from this code. First, we notice that our crash occurs right before a function call to what looks like a JavaScript function dispatcher stub, mostly due to the address of v8::internal::Builtin_FunctionPrototypeToString being loaded into a register right before that call. Looking at the code of the function located at 0x000002d167e84500, we find that address 0x000002d167e8455f does contain a call rbx instruction, which appears to confirm our suspicion.

The fact that it calls Builtin_FunctionPrototypeToString is interesting, because that's the implementation for the Object.toString method, which our minimized test case calls into. This appears to indicate that the crash is happening within the JIT-compiled version of our func0 Javascript function.

The second piece of information we can glean from the disassembly above is that the zero value contained in register rax at the time of the crash is loaded from memory. It also looks like the value that should have been loaded is being passed to the toString function call as a parameter. We can tell that it's being loaded from [rdi + 0x18]. Based on that, we can take a look at that piece of memory:

This doesn't yield very useful information. We can see that most of these values are pointers, but that's about it. However, it’s useful to know where the value (which is meant to be a pointer) is loaded from, because it can help us figure out why this value is zero in the first place. Using the WinDbg's newly public Time Travel Debugging (TTD) feature, we can place a memory write breakpoint at that location (baw 8 0000025e`a6845dd0), then place an execution breakpoint at the start of the function, and finally re-run the trace backwards (g-).

Interestingly, our memory write breakpoint doesn't trigger, meaning that this memory slot does not get initialized in this function, or at least not before it's used. This might be normal, but if we play around with the test case, for example by replacing the o.b.bc.bca.bcab = 0; line with o.b.bc.bca.bcab = 0xbadc0de;, then we start noticing changes to the memory region where our crash value originates:

We see that our 0xbadc0de constant value ends up in that memory region. Although this doesn't prove anything, it makes it seem likely that this memory region is used by the JIT-compiled function to store local variables. That idea is reinforced by how the code from earlier looked like the value we crash trying to load was being passed to Object.toString as a parameter.

Combined with the fact that TTD confirmed that this memory slot is not initialized by the function, a possible explanation is that the JIT compiler is failing to emit code that would initialize the pointers representing the object members used to access the o.b.ba.bab field.

To confirm this, we can run the test case in D8 with the --trace-turbo and --trace-turbo-graph parameters. Doing so will cause D8 to output information about how TurboFan, V8's JIT compiler, goes about building and optimizing the relevant code. We can use this in conjunction with turbolizer to visualize the graphs that TurboFan uses to represent and optimize code.

TurboFan works by applying various optimization phases to the graph one after the other. About half-way through the optimization pipeline, after the Load elimination optimization phase, this is what our code's flow looks like:

It's fairly straightforward: the optimizer apparently inlined func0 into the infinite loop, and then pulled the first loop iteration out. This information is useful to see how the blocks relate to each other. However, this representation omits nodes that correspond to loading function call parameters, as well as the initialization of local variables, which is the information we're interested in.

Thankfully, we can use turbolizer's interface to display those. Focusing on the second Object.toString call, we can see where the parameter comes from, as well as where it's allocated and initialized:

(NOTE: node labels were manually edited for readability)

At this stage in the optimization pipeline, the code looks perfectly reasonable:

  • A memory block is allocated to store local object o.b.ba (node 235), and its fields baa and bab are initialized
  • A memory block is allocated to store local object o.b (node 259), and its fields are all initialized, with ba specifically being initialized with a reference to the previous o.b.ba allocation
  • A memory block is allocated to store local object o (node 303), and its fields are all initialized
  • Local object o's field b is overwritten with a reference to object o.b (node 185)
  • Local object field o.b.ba.bab is loaded (nodes 199, 209, and 212)
  • The Object.toString method is called, passing o.b.ba.bab as first argument

Code compiled at this stage in the optimization pipeline looks like it shouldn't exhibit the uninitialized local variable behavior that we're hypothesizing is the root cause of the bug. That being said, certain aspects of this representation do lend credence to our hypothesis. Looking at nodes 209 and 212, which load o.b.ba and o.b.ba.bab, respectively, for use as a function call parameter, we can see that the offsets +24 and +32 correspond to the disassembly of the crashing code:

0x17 and 0x1f are values 23 and 31, respectively. This fits, when taking into account how V8 tags values in order to distinguish actual objects from inlined integers (SMIs): if a value meant to represent a JavaScript variable has its least significant bit set, it is considered a pointer to an object, and otherwise an SMI. Because of this, V8 code is optimized to subtract one from JavaScript object offsets before they are used for dereferencing.

As we still don't have an explanation for the bug, we keep looking through optimization passes until we find something strange. This happens after the Escape analysis pass. At that point, the graph looks like the following:

There are two notable differences:

  • The code no longer goes through the trouble of loading o and then o.b—it was optimized to reference o.b directly, probably because that field's value is never changed
  • The code no longer initializes o.b.ba; as can be seen in the graph, turbolizer grays out node 264, which means it is no longer live, and therefore won't be built into the final code

Looking through all the live nodes at this stage seems to confirm that this field is no longer being initialized. As another sanity check, we run d8 on this test case with the flag --no-turbo-escape in order to omit this optimization phase: d8 no longer crashes, confirming that this is where the issue stems from. In fact, that turned out to be Google's fix for the bug: completely disable the escape analysis phase in v8 6.1 until the new escape analysis module was ready for production in v8 6.2.

With all this information about the bug's root cause in hand, we need to find ways to exploit it. It looks like it could be a very powerful bug, but it depends entirely on our ability to control the uninitialized memory slot, as well as how it ends up being used.

Getting an info leak

At this point, the easiest way to get an idea of what we can or can't do with the bug is simply to play around with the test case. For example, we can look at the effect of changing the type of the field we're loading from an uninitialized pointer:

The result is that the field is now loaded directly as a float, rather than an object pointer or SMI:

Similarly, we can try adding more fields to the object:

Running this, we get the following crash:

This is interesting, because it looks like adding fields to the object modifies the offsets from where the object fields are loaded. In fact, if we do the math, we see that (0x67 - 0x1f) / 8 = 9, which is exactly the number of fields we added from o.b.ba.bab. The same applies to the new offset that rbx is loaded from.

Playing around with the test case a bit more, we are able to confirm that we have extensive control over the offset where the uninitialized pointer is loaded from, even though none of these fields are being initialized. At this point, it would be useful to see if we can place arbitrary data into this memory region. The earlier test with 0xbadc0de seemed to indicate that we could, but the offset appeared to change with each run of the test case. Often, exploits get around that by spraying values. The rationale is that if we can't accurately trap our target to a given location, we can instead just make our target bigger. In practice, we can try spraying values by using inline arrays:

Looking at the crash dump, we see:

The crash is essentially the same as previously, but if we look at the memory where our uninitialized data is coming from, we see:

We now have a big block of arbitrary script-controlled values at an offset from r11. Combining this observation with the previous one about offsets, we can come up with something even better:

The result is that we are now dereferencing a float value arbitrary from an arbitrary address:

This, of course, is extremely powerful: it immediately results in an arbitrary read primitive, impractical as it may be. Unfortunately, an arbitrary read primitive without an initial info leak is not that useful: we need to know which addresses to read from in order to make use of it.

As the variable v can be anything we want it to be, we can replace it with an object, and then read its internal fields. For example, we can replace the call to Object.toString() with a custom callback, and replace v with a DataView object, reading back the address of that object's backing store. This produces a way for us to locate fully script-controlled data:

The above code returns (modulo ASLR):

Using WinDbg we can validate that this is indeed the backing store for our buffer:

Once again, this is an incredibly powerful primitive, and we could use it to leak just about any field from an object, as well as the address of any JavaScript object, as those are sometimes stored as fields in other objects.

Building an arbitrary read/write primitive

Being able to place arbitrary data at a known address means we can unlock an even more powerful primitive: the ability to create arbitrary JavaScript objects. Just changing the type of the field being read from a float to an object makes it possible for us to read an object pointer from anywhere in memory, including a buffer whose address is known to us. We can test this by using WinDbg to place controlled data at a known address (the same primitive we just developed above) using the following commands:

This places an SMI representing the integer 0xbadc0de at the location where our arbitrary object pointer would be loaded from. Since we didn't set the least significant bit, it will be interpreted by V8 as an inline integer:

As expected, V8 prints the following output:

Given this, we have the ability to create arbitrary objects. From there, we can put together a convenient arbitrary read/write primitive by creating fake DataView and ArrayBuffer objects. We again place our fake object data at a known location using WinDbg:

We then test it with the following JavaScript:

As expected, the call to DataView.prototype.setUint32 triggers a crash, attempting to write value 0xdeadcafe to address 0x00000badbeefc0de:

Controlling the address where the data will be written to or read from is just a matter of modifying the obj.arraybuffer.backing_store slot populated through WinDbg. Since in the case of a real exploit that memory would be part of the backing store of a real ArrayBuffer object, doing so wouldn’t be difficult. For example, a write primitive might look like this:

With this, we can reliably read and write arbitrary memory locations in the Chrome renderer process from JavaScript.

Achieving Arbitrary Code Execution

Achieving code execution in the renderer process, given an arbitrary read/write primitive, is relatively easy. At the time of writing, V8 allocates its JIT code pages with read-write-execute (RWX) permissions, meaning that getting code execution can be done by locating a JIT code page, overwriting it, and then calling into it. In practice, this is achieved by using our info leak to locate the address of a JavaScript function object and reading its function entrypoint field. Once we've placed our code at that entrypoint, we can call the JavaScript function for the code to execute. In JavaScript, this might look like:

It is worth noting that even if V8 did not make use of RWX pages, it would still be easy to trigger the execution of a Return Oriented Programming (ROP) chain due to the lack of control flow integrity checks. In that scenario, we could, for example, overwrite a JavaScript function object's entrypoint field to point to the desired gadget (likely a stack pivot of some kind) and then make the function call.

Neither of those techniques would be directly applicable to Microsoft Edge, which features both CFG and ACG. ACG, which was introduced in Windows 10 Creators Update, enforces strict Data Execution Prevention (DEP) and moves the JIT compiler to an external process. This creates a strong guarantee that attackers cannot overwrite executable code without first somehow compromising the JIT process, which would require the discovery and exploitation of additional vulnerabilities.

CFG, on the other hand, guarantees that indirect call sites can only jump to a certain set of functions, meaning they can’t be used to directly start ROP execution. Creators Update also introduced CFG export suppression, which significantly reduced the set of valid CFG indirect call targets by removing most exported functions from the valid target set. All these mitigations and others make the exploitation of RCE vulnerabilities in Microsoft Edge that much more complex.

The dangers of RCE

Being a modern web browser, Chrome adopts a multi-process model. There are several process types involved: the browser process, the GPU process, and renderer processes. As its name indicates, the GPU process brokers interactions between the GPU and all the processes that need to use it, while the browser is the global manager process that brokers access to everything from file system to networking.

Each renderer is meant to be the brains behind one or more tabs—it takes care of parsing and interpreting HTML, JavaScript, and the like. The sandboxing model makes it so that these processes only have access to as little as they need to function. As such, a full persistent compromise of the victim's system is not possible from the renderer without finding a secondary bug to escape that sandbox.

With that in mind, we thought it would be interesting to examine what might be possible for an attacker to achieve without a secondary bug. Although most tabs are isolated within individual processes, that's not always the case. For example, if you’re on bing.com and use the JavaScript developer console (which can be opened by pressing F12) to run window.open('https://microsoft.com'), a new tab will open, but it will typically fall into the same process as the original tab. This can be seen by using Chrome's internal task manager, which can be opened by pressing Shift + Escape:

This is an interesting observation, because it indicates that renderer processes are not locked down to any single origin. This means that achieving arbitrary code execution within a renderer process can give attackers the ability to access other origins. While attackers gaining the ability to bypass the Single Origin Policy (SOP) in such a way may not seem like a big deal, the ramifications can be significant:

  • Attackers can steal saved passwords from any website by hijacking the PasswordAutofillAgent interface.
  • Attackers can inject arbitrary JavaScript into any page (a capability known as universal cross-site scripting, or UXSS), for example, by hijacking the blink::ClassicScript::RunScript method.
  • Attackers can navigate to any website in the background without the user noticing, for example, by creating stealthy pop-unders. This is possible because many user-interaction checks happen in the renderer process, with no ability for the browser process to validate. The result is that something like ChromeContentRendererClient::AllowPopup can be hijacked such that no user interaction is required, and attackers can then hide the new windows. They can also keep opening new pop-unders whenever one is closed, for example, by hooking into the onbeforeunload window event.

A better implementation of this kind of attack would be to look into how the renderer and browser processes communicate with each other and to directly simulate the relevant messages, but this shows that this kind of attack can be implemented with limited effort. While the democratization of two-factor authentication mitigates the dangers of password theft, the ability to stealthily navigate anywhere as that user is much more troubling, because it can allow an attacker to spoof the user’s identity on websites they’re already logged into.

Conclusion

This kind of attack drives our commitment to keep on making our products secure on all fronts. With Microsoft Edge, we continue to both improve the isolation technology and to make arbitrary code execution difficult to achieve in the first place. For their part, Google is working on a site isolation feature which, once complete, should make Chrome more resilient to this kind of RCE attack by guaranteeing that any given renderer process can only ever interact with a single origin. A highly experimental version of this site isolation feature can be enabled by users through the chrome://flags interface.

We responsibly disclosed the vulnerability that we discovered along with a reliable RCE exploit to Google on September 14, 2017. The vulnerability was assigned CVE-2017-5121, and the report was awarded a $7,500 bug bounty by Google. Along with other bugs our team reported but didn’t exploit, the total bounty amount we were awarded was $15,837. Google matched this amount and donated $30,000 to Denise Louie Education Center, our chosen organization in Seattle. The bug tracker item for the vulnerability described in this article is still private at time of writing.

Servicing security fixes is an important part of the process and, to Google’s credit, their turnaround was impressive: the bug fix was committed just four days after the initial report, and the fixed build was released three days after that. However, it’s important to note that the source code for the fix was made available publicly on Github before being pushed to customers. Although the fix for this issue does not immediately give away the underlying vulnerability, other cases can be less subtle.

For example, this security bug tracker item was also kept private at the time, but the public fix made the vulnerability obvious, especially as it came with a regression test. This can be expected of an open source project, but it is problematic when the vulnerabilities are made known to attackers ahead of the patches being made available. In this specific case, the stable channel of Chrome remained vulnerable for nearly a month after that commit was pushed to git. That is more than enough time for an attacker to exploit it. Some Microsoft Edge components, such as Chakra, are also open source. Because we believe that it’s important to ship fixes to customers before making them public knowledge, we only update the Chakra git repository after the patch has shipped.

Our strategies may differ, but we believe in collaborating across the security industry in order to help protect customers. This includes disclosing vulnerabilities to vendors through Coordinated Vulnerability Disclosure (CVD), and partnering throughout the process of delivering security fixes.

 

Jordan Rabet

Microsoft Offensive Security Research team

 

 


Talk to us

Questions, concerns, or insights on this story? Join discussions at the Microsoft community.

Follow us on Twitter @MMPC and Facebook Microsoft Malware Protection Center

 

 

Exploring the crypt: Analysis of the WannaCrypt ransomware SMB exploit propagation

(Note: Read our latest comprehensive report on ransomware: Ransomware 1H 2017 review: Global outbreaks reinforce the value of security hygiene.)

 

On May 12, there was a major outbreak of WannaCrypt ransomware. WannaCrypt directly borrowed exploit code from the ETERNALBLUE exploit and the DoublePulsar backdoor module leaked in April by a group calling itself Shadow Brokers.

Using ETERNALBLUE, WannaCrypt propagated as a worm on older platforms, particularly Windows 7 and Windows Server 2008 systems that haven't patched against the SMB1 vulnerability CVE-2017-0145. The resulting ransomware outbreak reached a large number of computers, even though Microsoft released security bulletin MS17-010 to address the vulnerability on March 14, almost two months before the outbreak.

This post—complementary to our earlier post about the ETERNALBLUE and ETERNALROMANCE exploits released by Shadow Brokers—takes us through the WannaCrypt infection routine, providing even more detail about post-exploitation phases. It also describes other existing mitigations as well as new and upcoming mitigation and detection techniques provided by Microsoft to address similar threats.

Infection cycle

The following diagram summarizes the WannaCrypt infection cycle: initial shellcode execution, backdoor implantation and package upload, kernel and userland shellcode execution, and payload launch.

Figure 1. Infection cycle overview

Figure 1. WannaCrypt infection cycle overview

The file mssecsvc.exe contains the main exploit code, which launches a network-level exploit and spawns the ransomware package. The exploit code targets a kernel-space vulnerability and involves multi-stage shellcode in both kernel and userland processes. Once the exploit succeeds, communication between the DoublePulsar backdoor module and mssecsvc.exe is encoded using a pre-shared XOR key, allowing transmission of the main payload package and eventual execution of ransomware code.

Exploit and initial shellcodes

In an earlier blog post, Viktor Brange provided a detailed analysis of the vulnerability trigger and the instruction pointer control mechanism used by ETERNALBLUE. After the code achieves instruction pointer control, it focuses on acquiring persistence in kernel space using kernel shellcode and the DoublePulsar implant. It then executes the ransomware payload in user space.

Heap spray

The exploit code sprays memory on a target computer to lay out space for the first-stage shellcode. It uses non-standard SMB packet segments to make the allocated memory persistent on hardware abstraction layer (HAL) memory space. It sends 18 instances of heap-spraying packets, which have direct binary representations of the first-stage shellcode.

Figure 2. Shellcode heap-spraying packet

Figure 2. Shellcode heap-spraying packet

Initial shellcode execution: first and second stages

The exploit uses a function-pointer overwrite technique to direct control flow to the first-stage shellcode. This shellcode installs a second-stage shellcode as a SYSENTER or SYSCALL routine hook by overwriting model-specific registers (MSRs). If the target system is x86-based, it hooks the SYSENTER routine by overwriting IA32_SYSENTER_EIP. On x64-based systems, it overwrites IA32_LSTAR MSR to hook the SYSCALL routine. More information about these MSRs can be found in Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3C.

Figure 3. First-stage shellcode for x86 systems

Figure 3. First-stage shellcode for x86 systems

Originally, the IA32_SYSENTER_EIP contains the address to nt!KiFastCallEntry as its SYSENTER routine.

Figure 4. Original IA32_SYSENTER_EIP value pointing to KiFastCallEntry

Figure 4. Original IA32_SYSENTER_EIP value pointing to KiFastCallEntry

After modification by the first-stage shellcode, IA32_SYSENTER_EIP now points to the second-stage shellcode.

Figure 5. Modified IA32_SYSENTER_EIP value points to the main shellcode

Figure 5. Modified IA32_SYSENTER_EIP value points to the main shellcode

The first-stage shellcode itself runs in DISPATCH_LEVEL. By running the second-stage shellcode as the SYSENTER routine, the first-stage code guarantees that the second-stage shellcode runs in PASSIVE_LEVEL, giving it access to a broader range of kernel APIs and paged-out memory. And although the second-stage shellcode delivered with this malware actually doesn't access any paged pools or call APIs that require running in PASSIVE_LEVEL, this approach allows attackers to reuse the same module for more complicated shellcode.

Backdoor implantation

The second-stage shellcode, now running on the targeted computer, generates a master XOR key for uploading the payload and other communications. It uses system-specific references, like addresses of certain APIs and structures, to randomize the key.

Figure 6. Master XOR key generation

Figure 6. Master XOR key generation

The second-stage shellcode implants DoublePulsar by patching the SMB1 Transaction2 dispatch table. It overwrites one of the reserved command handlers for the SESSION_SETUP (0xe) subcommand of the Transaction2 request. This subcommand is reserved and not commonly used in regular code.

Figure 7. Copying packet-handler shellcode and overwriting the dispatch table

Figure 7. Copying packet-handler shellcode and overwriting the dispatch table

The following code shows the dispatch table after the subcommand backdoor is installed.

Figure 8. Substitution of 0xe command handler

Figure 8. Substitution of 0xe command handler

Main package upload

To start uploading its main package, WannaCrypt sends multiple ping packets to the target, testing if its server hook has been installed. Remember that the second-stage shellcode runs as a SYSENTER hook—there is a slight delay before it runs and installs the dispatch-table backdoor. The response to the ping packet contains the randomly generated XOR master key to be used for communication between the client and the targeted server.

Figure 9. Code that returns original XOR key

Figure 9. Code that returns original XOR key

This XOR key value is used only after some bit shuffling. The shuffling algorithm basically looks like the following Python code.

Figure 10. XOR bit-shuffling code

Figure 10. XOR bit-shuffling code

The upload of the encoded payload consists of multiple packets as shown below.

Figure 11. SMB Transaction2 packet showing payload upload operation

Figure 11. SMB Transaction2 packet showing payload upload operation

The hooked handler code for the unimplemented subcommand processes the packet bytes, decoding them using the pre-shared XOR key. The picture above shows that the SESSION_SETUP parameter fields are used to indicate the offset and total lengths of payload bytes. The data is 12 bytes long—the first four bytes indicate total length, the next four bytes is reserved, and the last 4 bytes are the current offsets of the payload bytes in little-endian. These fields are encoded with master XOR key.

Because the reserved field is supposed to be 0, the reserved field is actually the same as the master XOR key. Going back to the packet capture above, the reserved field value is 0x38a9dbb6, which is the master XOR key. The total length is encoded as 0x38f9b8be. When this length is XORed with the master XOR key, it is 0x506308, which is the actual length of the payload bytes being uploaded. The last field is 0x38b09bb6. When XORed with the master key, this last field becomes 0, meaning this packet is the first packet of the payload upload.

When all the packets are received, the packet handler in the second-stage shellcode jumps to the start of the decoded bytes.

Figure 12. Decoding and executing shellcode

Figure 12. Decoding and executing shellcode

The transferred and decoded bytes are of size 0x50730c. As a whole, these packet bytes include kernel shellcode, userland shellcode, and the main WannaCrypt PE packages.

Executing the kernel shellcode

The kernel shellcode looks for a kernel image base and resolves essential functions by parsing PE structures. The following figure shows the APIs resolved by the shellcode:

Figure 13. Resolved kernel functions

Figure 13. Resolved kernel functions

It uses ZwAllocateVirtualMemory to allocate a large chunk of RWX memory (0x506d70 in this case). This memory holds the userland shellcode and the main PE packages.

Figure 14. RWX memory allocation through ZwAllocateVirtualMemory

Figure 14. RWX memory allocation through ZwAllocateVirtualMemory

The kernel shellcode goes through processes on the system and injects userland shellcode to the lsass.exe process using an asynchronous procedure call (APC).

Figure 15. APC routines for injecting shellcode to a thread in a userland process

Figure 15. APC routines for injecting shellcode to a thread in a userland process

Userland shellcode—the start of a new infection cycle

After multiple calls to VirtualProtect and PE layout operations, the shellcode loads a bootstrap DLL using a reflective DLL loading method. The WannaCrypt user-mode component contains this bootstrap DLL for both 64- and 32-bit Windows.

Figure 16. Bootstrap DLL functions

Figure 16. Bootstrap DLL functions

This bootstrap DLL reads the main WannaCrypt payload from the resource section and writes it to a file C:\WINDOWS\mssecsvc.exe. It then launches the file using the CreateProcess API. At this stage, a new infection cycle is started on the newly infected computer.

 

Figure 17. Dropping main payload to file system

Figure 17. Dropping main payload to file system

Figure 18. Creating the main payload process

Figure 18. Creating the main payload process

Mitigating and detecting WannaCrypt

WannaCrypt borrowed most of its attack code from those leaked by Shadow Brokers, specifically the ETERNALBLUE kernel exploit code and the DoublePulsar kernel-level backdoor. It leverages DoublePulsar's code execution mechanisms and asynchronous procedure calls (APCs) at the kernel to deliver its main infection package and ransomware payload. It also uses the system file lsass.exe as its injection target.

Mitigation on newer platforms and upcoming SMB updates

The ETERNALBLUE exploit code worked only on older OSes like Windows 7 and Windows Server 2008, particularly those that have not applied security updates released with security bulletin MS17-010. The exploit was limited to these platforms because it depended on executable memory allocated in kernel HAL space. Since Windows 8 and Windows Server 2012, HAL memory has stopped being executable. Also, for additional protection, predictable addresses in HAL memory space have been randomized since Windows 10 Creators Update.

With the upcoming Windows 10 Fall Creators Update (also known as RS3), many dispatch tables in legacy SMB1 drivers, including the Transaction2 dispatch table (SrvTransaction2DispatchTable) memory area, will be set to read-only as a defense-in-depth measure. The backdoor mechanism described here will be much less attractive to attackers because the mechanism will require additional exploit techniques for unlocking the memory area and overwriting function pointers. Furthermore, SMB1 has already been deprecated for years. With the RS3 releases for Windows 10 and Windows Server 2016, SMB1 will be disabled.

Hyper Guard virtualization-based security

WannaCrypt employs multiple techniques to achieve full code execution on target systems. The IA32_SYSENTER_EIP modification technique used by WannaCrypt to run the main shellcode is actually commonly observed when kernel rootkits try to hook system calls. Kernel Patch Protection (or PatchGuard) typically detects this technique by periodically checking for modifications of MSR values. WannaCrypt hooking, however, is too brief for PatchGuard to fire. Windows 10, armed with virtualization-based security (VBS) technologies such as Hyper Guard, can detect and mitigate this technique because it fires as soon as the malicious wrmsr instruction to modify the MSR is executed.

To enable Hyper Guard on systems with supported processors, use Secure Boot and enable Device Guard. Use the hardware readiness tool to check if your hardware system supports Device Guard. Device Guard runs on the Enterprise and Education editions of Windows 10.

Post-breach detection with Windows Defender ATP

In addition to VBS mitigation provided with Hyper Guard, Windows Defender Advanced Threat Protection (Windows Defender ATP) can detect injection of code to userland processes, including the method used by WannaCrypt. Our researchers have also added new detection logic so that Windows Defender ATP flags highly unusual events that involve spawning of processes from lsass.exe.

Figure 19. Windows Defender ATP detection of an anomalous process spawned from a system process

Figure 19. Windows Defender ATP detection of an anomalous process spawned from a system process

While the detection mechanism for process spawning was pushed out in response to WannaCrypt, this mechanism and detection of code injection activities also enable Windows Defender ATP customers to uncover sophisticated breaches that leverage similar attack methods.

Matt Oh
Windows Defender ATP Research Team