Category Archives: Research & Reports

Imperva Cloud WAF and Graylog, Part II: How to Collect and Ingest SIEM Logs

This guide explains, step by step, how to collect and parse Imperva Cloud Web Application Firewall (WAF, formerly Incapsula) logs in the Graylog SIEM tool. Read Part I to learn how to set up a Graylog server in AWS and integrate with Imperva Cloud WAF.

This guide assumes:

  • You have a clean Graylog server up and running, as described in my earlier blog article
  • You are pushing (or pulling) the Cloud WAF SIEM logs into a folder within the Graylog server
  • You are not collecting the logs yet

Important! The steps below apply for the following scenario:

  • Deployment as a stand-alone EC2 in AWS
  • Single-server setup, with the logs located on the same server as Graylog
  • The logs are pushed to the server uncompressed and unencrypted

Although this guide was written for a deployment on AWS, most of the steps below apply to other environments as well. Other setups (other clouds, on-premises) will require a few networking changes from the guide below.

This article details how to configure the log collector and parser in three major steps:

  • Step 1: Install the sidecar collector package for Graylog
  • Step 2: Configure a new log collector in Graylog
  • Step 3: Create a log input and extractor with the Incapsula content pack for Graylog (the JSON with the parsing rules)

Step 1: Install the Sidecar Collector Package

  1. Install the Graylog sidecar collector

Let’s first download the appropriate package. Identify the right sidecar collector package for our deployment on GitHub:

Since we are deploying Graylog 2.5.x, the corresponding sidecar collector version is 0.1.x.

Go ahead and install the relevant package.

We are running a 64-bit server, and .deb packages work best on Debian/Ubuntu machines.

Run the following commands:

cd /tmp   # or any directory you would like to use

Download the right package:

curl -L -O

Install the package:

sudo dpkg -i collector-sidecar_0.1.7-1_amd64.deb

2. Configure sidecar collector

cd /etc/graylog/collector-sidecar

sudo nano collector_sidecar.yml

Now let’s change the server URL to the local server IP (local IP, not AWS public IP). And let’s add incapsula-logs to the tags:

And let’s install and start graylog-collector-sidecar service with the following commands:

sudo graylog-collector-sidecar -service install
sudo systemctl start collector-sidecar

Step 2: Configure a New Log Collector in Graylog

3. Add a collector in Graylog

Follow the steps below to add a collector:

  • System > Collectors
  • Click Manage Configurations
  • Then click Create configuration

Let’s name it and then click on the newly-created collector:

4. Add Incapsula-logs as a tag

5. Configure the Output and then the Input of the collectors

Click on Create Output and configure the collector as below:

Now click on Create Input and configure the collector as below:

Step 3: Creating Log Inputs and Extractors with Incapsula (now named Imperva Cloud Web Application Firewall) Content Pack for Graylog

6. Let’s now launch a new Input as below:

And configure the Beats collector inputs as required:

The TLS details are not mandatory at this stage as we will work with unencrypted SIEM logs for this blog.

7. Download the Incapsula SIEM package for Graylog in Github

Reach the following link.

Retrieve the JSON configuration of the package. It includes all the Imperva Cloud Web Application Firewall (formerly Incapsula) parsing rules for event fields, which allow an easy import into Graylog along with clear naming.

Extract the content_pack.json file and import it as an extractor in Graylog.

Go to System / Content packs and import the JSON file you just downloaded:

The content pack will use our legacy name and we can now apply the new content pack as below:

The new content pack will be displayed in Graylog System / Input menu from which you can extract its content by clicking “Manage extractors”:

You can now import the predefined extractors into the input we previously configured.

Paste the content of Incapsula content pack extractor:

If all works as expected, you should get the confirmation as below:

You should now see that the headers that will be parsed by Graylog have been successfully imported and will have appropriate naming as can be seen in a screenshot below:

Field extractors for Incapsula (or Imperva Cloud WAF) events in Graylog SIEM:

8. Restart the sidecar collector service

Once all is configured you can restart the sidecar service with the following command in the server command line:

sudo systemctl restart collector-sidecar

We can also set the sidecar collector to start at server boot:

sudo systemctl enable collector-sidecar 

Let’s check that the collector is active and the service is properly running:

The collector service should now appear as active in Graylog:

9. Check that you see messages and logs

Click on the Search bar. After a few minutes you should start to see the logs displayed.

Give it 10-15 minutes before troubleshooting if you don’t see messages displayed immediately.

You can see on the left panel that the filters and retrieved headers are in line with Imperva’s naming. Client_IP is retrieved from the Incap-Client-IP header and holds the real client IP, Client App is the client classification detected by Imperva Cloud WAF, and so on.
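To make the extractor step more concrete, here is a minimal Python sketch of what a field extractor does conceptually with a CEF line: it splits the pipe-delimited header and then the key=value extension. This is not the content pack’s actual logic. The sample line, vendor name and values are hypothetical, and the sketch ignores CEF escape sequences.

```python
import re

# Header fields defined by the CEF format, in order.
CEF_HEADER = ["version", "vendor", "product", "device_version",
              "signature_id", "name", "severity"]


def parse_cef(line: str) -> dict:
    """Split a CEF line into header fields and key=value extension pairs.

    Simplified: escaped pipes and escaped equals signs are not handled.
    """
    if not line.startswith("CEF:"):
        raise ValueError("not a CEF line")
    parts = line[len("CEF:"):].split("|", 7)
    if len(parts) < 8:
        raise ValueError("incomplete CEF header")
    record = dict(zip(CEF_HEADER, parts[:7]))
    # A value runs until the next " key=" or the end of the line.
    pairs = re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", parts[7])
    record["extension"] = dict(pairs)
    return record


# Hypothetical sample line, for illustration only.
sample = ("CEF:0|ExampleVendor|ExampleWAF|1|100|Blocked request|5|"
          "src=203.0.113.7 requestMethod=GET request=/login page")
parsed = parse_cef(sample)
```

In Graylog itself this work is done by the imported extractors; the sketch only shows why each event ends up as a set of named fields.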

The various headers are explained in the following headers that you can see:

10. Congratulations!

Congratulations, we are now successfully exporting, collecting and parsing Imperva Cloud WAF/Incapsula SIEM logs!

In the next article, we will review the imported Imperva Cloud WAF dashboard template.

If you have suggestions for improvements or updates in any of the steps, please share with the community in the comments below.

The post Imperva Cloud WAF and Graylog, Part II: How to Collect and Ingest SIEM Logs appeared first on Blog.

Now-Patched Google Photos Vulnerability Let Hackers Track Your Friends and Location History

A now-patched vulnerability in the web version of Google Photos allowed malicious websites to expose where, when, and with whom your photos were taken.


One trillion photos were taken in 2018. With image quality and file size increasing, it’s obvious why more and more people choose to host their photos on services like iCloud, Dropbox and Google Photos.

One of the best features of Google Photos is its search engine. Google Photos automatically tags all your photos using each picture’s metadata (geographic coordinates, date, etc.) and a state-of-the-art AI engine, capable of describing photos with text, and detecting objects and events such as weddings, waterfalls, sunsets and many others. If that’s not enough, facial recognition is also used to automatically tag people in photos. You could then use all this information in your search query just by writing “Photos of me and Tanya from Paris 2018”.

The Threat

I’ve used Google Photos for a few years now, but only recently learned about its search capabilities, which prompted me to check for side-channel attacks. After some trial and error, I found that the Google Photos search endpoint is vulnerable to a browser-based timing attack called Cross-Site Search (XS-Search).

In my proof of concept, I used the HTML link tag to create multiple cross-origin requests to the Google Photos search endpoint. Using JavaScript, I then measured the amount of time it took for the onload event to trigger. I used this information to calculate the baseline time — in this case, timing a search query that I know will return zero results.

Next, I timed the following query “photos of me from Iceland” and compared the result to the baseline. If the search time took longer than the baseline, I could assume the query returned results and thus infer that the current user visited Iceland.

As I mentioned above, the Google Photos search engine takes into account the photo metadata. So by adding a date to the search query, I could check if the photo was taken in a specific time range. By repeating this process with different time ranges, I could quickly approximate the time of the visit to a specific place or country.
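The inference step can be sketched in a few lines of Python. This is not the proof-of-concept code, which ran as JavaScript in the browser; it only illustrates the decision logic under stated assumptions. The `time_query` callback (which would return load-time samples for a query), the 20 ms margin and the query wording are all hypothetical.

```python
from statistics import median
from typing import Callable, Sequence


def returned_results(baseline_ms: Sequence[float],
                     query_ms: Sequence[float],
                     margin_ms: float = 20.0) -> bool:
    """Guess whether a query returned results from its load times.

    Medians dampen network jitter; margin_ms is an arbitrary, assumed threshold.
    """
    return median(query_ms) > median(baseline_ms) + margin_ms


def years_with_photos(place: str,
                      years: Sequence[int],
                      time_query: Callable[[str], Sequence[float]],
                      baseline_ms: Sequence[float]) -> list:
    """Return the years for which '<place> <year>' looks like a non-empty search."""
    return [year for year in years
            if returned_results(baseline_ms,
                                time_query(f"photos of me from {place} {year}"))]
```

Each call answers one yes/no question, which is why narrowing the date range step by step is enough to approximate when a visit took place.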

Attack Flow

The video below demonstrates how a 3rd-party site can use time measurements to extract the names of the countries you took photos in. The first bar in the video named “controlled” represents the baseline of an empty results page timing. Any time measurement above the  baseline indicates a non-empty result timing, i.e., the current user has visited the queried country.

For this attack to work, we need to trick a user into opening a malicious website while logged into Google Photos. This can be done by sending a victim a direct message on a popular messaging service or email, or by embedding malicious JavaScript inside a web ad. The JavaScript code will silently generate requests to the Google Photos search endpoint, extracting Boolean answers to any query the attacker wants.

This process can be incremental, as the attacker can keep track of what has already been asked and continue from there the next time you visit one of his malicious websites.

You can see below the timing function I implemented for my proof of concept:

Below is the code I used to demonstrate how users’ location history can be extracted.

Closing Thoughts

As I said in my previous blog post, it is my opinion that browser-based side-channel attacks are still overlooked. While big players like Google and Facebook are catching up, most of the industry is still unaware.

I recently joined an effort to document those attacks and vulnerable DOM APIs. You can find more information on the xsleaks repository (currently still under construction).

As a researcher, it was a privilege to contribute to protecting the privacy of the Google Photos user community, as we continuously do for our own Imperva customers.


Imperva is hosting a live webinar with Forrester Research on Wednesday March 27 1 PM PT on the topic, “Five Best Practices for Application Defense in Depth.” Join Terry Ray, Imperva SVP and Imperva Fellow, Kunal Anand, Imperva CTO, and Forrester principal analyst Amy DeMartine as they discuss how the right multi-layered defense strategy bolstered by real-time visibility to help security analysts distinguish real threats from noise can provide true protection for enterprises. Sign up to watch and ask questions live or see the recording!

The post Now-Patched Google Photos Vulnerability Let Hackers Track Your Friends and Location History appeared first on Blog.

How Our Threat Analytics Multi-Region Data Lake on AWS Stores More, Slashes Costs

Data is the lifeblood of digital businesses, and a key competitive advantage. The question is: how can you store your data cost-efficiently, access it quickly, while abiding by privacy laws?

At Imperva, we wanted to store our data for long-term access. Databases would’ve cost too much in disk and memory, especially since we didn’t know how much it would grow, how long we would keep it, and which data we would actually access in the future. The only thing we did know? That new business cases for our data would emerge.

That’s why we deployed a data lake. It turned out to be the right decision, allowing us to store 1,000 times more data than before, even while slashing costs.

What is a data lake?

A data lake is a repository of files stored in a distributed system. Information is stored in its native format, such as JSON, XML, CSV, or text, with little or no processing.

Analytics queries can be run against both data lakes and databases. In a database you create a schema, plan your queries, and add indices to improve performance. In a data lake, it’s different — you simply store the data and it’s query-ready.

Some file formats are better than others, of course. Apache Parquet allows you to store records in a compressed columnar file. The compression saves disk space and IO, while the columnar format allows the query engine to scan only the relevant columns. This reduces query time and costs.

Using a distributed file system lets you store more data at a lower cost. Whether you use Hadoop HDFS, AWS S3, or Azure Storage, the benefits include:

  • Data replication and availability
  • Options to save more money – for example, AWS S3 has different storage options with different costs
  • Retention policy – decide how long you want to keep your data before it’s automatically deleted

No wonder experts such as Adrian Cockcroft, VP of cloud architecture strategy at Amazon Web Services, said this week that “cloud data lakes are the future.”

Analytic queries: data lake versus database

Let’s examine the capabilities, advantages and disadvantages of a data lake versus a database.

The data

A data lake supports structured and unstructured data and everything in-between. All data is collected and immediately ready for analysis. Data can be transformed to improve user experience and performance. For example, fields can be extracted from a data lake and data can be aggregated.

A database contains only structured and transformed data. It is impossible to add data without declaring tables, relations and indices. You have to plan ahead and transform the data according to your schema.

Figure 1: Data Lake versus Database

The Users

Most users in a typical organization are operational, using applications and data in predefined and repetitive ways. A database is usually ideal for these users. Data is structured and optimized for these predefined use-cases. Reports can be generated, and filters can be applied according to the application’s design.

Advanced users, by contrast, may go beyond an application to the data source and use custom tools to process the data. They may also bring in data from outside the organization.

The last group are the data experts, who do deep analysis on the data. They need the raw data, and their requirements change all the time.

Data lakes support all of these users, but especially advanced and expert users, due to the agility and flexibility of a data lake.

Figure 2: Typical user distribution inside an organization

Query engine(s)

In a database, the query engine is internal and is impossible to change. In a data lake, the query engine is external, letting users choose based on their needs. For example, you can choose Presto for SQL-based analytics and Spark for machine learning.

Figure 3: A data lake may have multiple external query engines. A database has a single internal query engine.

Support of new business use-case

Database changes may be complex. Data must be analyzed and formatted, and a schema has to be created before data can be inserted. If you have a busy development team, users can wait months or a year to see the new data in their application.

Few businesses can wait this long. Data lakes solve this by letting users go beyond the structure to explore data. If this proves fruitful, then a formal schema can be applied. You get to results quickly, and fail fast. This agility lets organizations quickly improve their use cases, better know their data, and react fast to changes.

Figure 4: Support of new business use-case

Data lake structure

Here’s how data may flow inside a data lake.

Figure 5: Data lake structure and flow

In this example, CSV files are added to a “current day” folder in the data lake. This folder is the daily partition, which allows querying a day’s data using a filter like day = ‘2018-1-1’. Partitions are the most efficient way to filter data.

The data under tables/events is an aggregated, sorted and formatted version of the CSV data. It uses the Parquet format for compression and to improve query performance. It also has an additional “type” partition, because most queries work on only a single event type. Each file holds millions of records, with metadata for efficiency. For example, you can know the count, min and max values for all of the columns without scanning the file.
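As a rough illustration of why partition filters are cheap, the Python sketch below selects files purely from their paths, without opening any of them. The object keys, the name=value folder naming and the event type names are assumptions made for the example, not the actual layout of our data lake.

```python
def partition_values(key: str) -> dict:
    """Extract name=value partition segments from an object key."""
    return dict(seg.split("=", 1) for seg in key.split("/")[:-1] if "=" in seg)


def prune(keys, **wanted):
    """Keep only the keys whose partition values match every requested filter."""
    return [k for k in keys
            if all(partition_values(k).get(name) == value
                   for name, value in wanted.items())]


# Hypothetical object keys; the real folder layout may differ.
keys = [
    "tables/events/day=2018-01-01/type=login/part-0.parquet",
    "tables/events/day=2018-01-01/type=attack/part-0.parquet",
    "tables/events/day=2018-01-02/type=attack/part-0.parquet",
]
selected = prune(keys, day="2018-01-01", type="attack")
```

Only one of the three files survives the filter, so a query engine working this way would read a third of the data before looking at a single record.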

This events table data has been added to the data lake after the raw data has been validated and analyzed.

Here is a simplified example of CSV to Parquet conversion:

Figure 6: Example for conversion of CSV to Parquet

Parquet files normally hold a large number of records, and can be divided internally into “row groups”, each with its own metadata. Repeated values improve compression, and the columnar structure allows scanning only the relevant columns. The CSV data can be queried at any time, but doing so is less efficient than querying the data under tables/events.
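The effect of per-row-group metadata can be sketched with a toy columnar structure in Python. This is not the Parquet format itself, only an assumed simplification: each row group stores one list per column plus min/max statistics, and the column names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class RowGroup:
    """A toy row group: one list of values per column, plus min/max statistics."""
    columns: dict
    stats: dict = field(init=False)

    def __post_init__(self):
        self.stats = {name: (min(vals), max(vals))
                      for name, vals in self.columns.items()}


def count_greater_than(row_groups, column, threshold):
    """Count values above threshold, skipping row groups whose max rules them out."""
    scanned, total = 0, 0
    for group in row_groups:
        _, col_max = group.stats[column]
        if col_max <= threshold:
            continue  # statistics prove no value can match, so the data is never read
        scanned += 1
        # Only the requested column is touched; other columns stay unread.
        total += sum(1 for v in group.columns[column] if v > threshold)
    return total, scanned


groups = [
    RowGroup({"severity": [1, 2, 2], "bytes": [10, 20, 30]}),
    RowGroup({"severity": [7, 9, 3], "bytes": [40, 50, 60]}),
]
matches, groups_scanned = count_greater_than(groups, "severity", 5)
```

Here the first row group is skipped entirely on the strength of its statistics, and the “bytes” column is never read at all.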

Flow and Architecture


Imperva’s data lake uses Amazon Web Services (AWS). The diagram below shows the flow and the services we used to build it.

Figure 7: Architecture and flow

Adding data (ETL – Extract -> Transform -> Load)

  • We use Kafka, a producer-consumer distributed streaming platform. Data is added to Kafka and later read by a microservice, which creates raw Parquet files in S3.
  • Another microservice uses AWS Athena to process the data hourly or daily: it filters, partitions, sorts and aggregates the data into new Parquet files
  • This flow is done on each of the AWS regions we support

Figure 8: SQL to Parquet flow example

Technical details:

  • Each partition is created by one or more Athena queries
  • Each query results in one or more Parquet files
  • ETL microservices run on a Kubernetes cluster per region. They are developed and deployed using our development pipeline.


  • Different microservices consume the aggregated data using the Athena API through the boto3 Python library
  • Day-to-day queries are done using a SQL client like DBeaver with the Athena JDBC driver. The Athena AWS management console is also used for SQL queries
  • The Apache Spark engine is used to run Spark queries, including machine learning using spark-ml. Apache Zeppelin is used as a client to run scripts and display visualizations. Both Spark and Zeppelin are installed as part of the AWS EMR service.
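A minimal sketch of how a microservice might run an Athena query through boto3 is shown below. It assumes `client` behaves like the object returned by boto3.client("athena"); the function name, polling interval, database name and S3 output location are illustrative and not taken from the services described above.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query and wait until it finishes; return its execution id.

    `client` is expected to behave like boto3.client("athena").
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
```

With real credentials this could be called as run_athena_query(boto3.client("athena"), "SELECT ...", "my_database", "s3://my-bucket/athena-results/"), where the database and bucket names are placeholders. Passing the client in as a parameter also makes the function easy to exercise with a stub.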

Multi-region queries

Data privacy regulations such as GDPR add a twist, especially since we store data in multiple regions. There are two ways to perform multi-region queries:

  • Single query engine based in one of the regions
  • Query engine per region – get results per region and perform an aggregation

With a single query engine you can run SQL on data from multiple regions, BUT data is transferred between regions, which means you pay both in performance and cost.

With a query engine per region you have to aggregate the results, which may not be a simple task.

With AWS Athena – both options are available, since you don’t need to manage your own query engine.
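To show why aggregating per-region results is not always trivial, here is a small Python sketch for one common case, a global average. It assumes each region’s query returns a row count and an average; the region names and numbers are made up.

```python
def merge_regional_averages(regional_results):
    """Combine per-region (row_count, average) pairs into one global average.

    Averaging the averages would be wrong when regions hold different row
    counts, so each region's average is weighted by its row count.
    """
    total_rows = sum(count for count, _ in regional_results.values())
    if total_rows == 0:
        return None
    weighted_sum = sum(count * avg for count, avg in regional_results.values())
    return weighted_sum / total_rows


# Hypothetical per-region query results: region -> (row_count, average_value)
results = {"us-east-1": (900, 2.0), "eu-west-1": (100, 12.0)}
global_average = merge_regional_averages(results)
```

Simply averaging the two regional averages would give 7.0, while the weighted result is 3.0. Each region therefore has to return enough information (here, the row count) for the final aggregation to be correct.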

Threat Analytics Data Lake – before and after

Before the data lake, we had several database solutions – relational and big data. The relational database couldn’t scale, forcing us to delete data or drop old tables. Eventually, we did analytics on a much smaller part of the data than we wanted.

With the big data solutions, the cost was high. We needed dedicated servers, and disks for storage and queries. That’s overkill: we don’t need server access 24/7, as daily batch queries work fine. We also did not have strong SQL capabilities, and found ourselves deleting data because we did not want to pay for more servers.

With our data lake, we get better analytics by:

  • Storing more data (billions of records processed daily!), which is used by our queries
  • Using SQL capabilities on a large amount of data using Athena
  • Using multiple query engines with different capabilities, like Spark for machine learning
  • Allowing queries on multiple regions with an acceptable average response time of just 3 seconds

In addition we also got the following improvements:

  • Huge cost reductions in storage and compute
  • Reduced server maintenance

In conclusion – a data lake worked for us. AWS services made it easier for us to get the results we wanted at an incredibly low cost. It could work for you, depending on factors such as the amount of data, its format, use cases, platform and more. We suggest learning your requirements and doing a proof-of-concept with real data to find out!

The post How Our Threat Analytics Multi-Region Data Lake on AWS Stores More, Slashes Costs appeared first on Blog.

How to Deploy a Graylog SIEM Server in AWS and Integrate with Imperva Cloud WAF

Security Information and Event Management (SIEM) products provide real-time analysis of security alerts generated by security solutions such as Imperva Cloud Web Application Firewall (WAF). Many organizations implement a SIEM solution to bring visibility of all security events from various solutions and to have the ability to search them or create their own dashboard.

Note that a simpler alternative to SIEM is Imperva Attack Analytics, which reduces the burden of integrating a SIEM logs solution and provides a condensed view of all security events into comprehensive narratives of events rated by severity. A demo of Imperva Attack Analytics is available here.

This article will take you step-by-step through the process of deploying a Graylog server that can ingest Imperva SIEM logs and let you review your data. The steps are:

  • Step 1: Deploy a new Ubuntu server on AWS
  • Step 2: Install Java, MongoDB and Elasticsearch
  • Step 3: Install Graylog
  • Step 4: Configure the SFTP server on the AWS server
  • Step 5: Start pushing SIEM logs from Imperva Incapsula

The steps apply to the following scenario:

  • Deployment as a stand-alone EC2 on AWS
  • Installation from scratch, from a clean Ubuntu machine (not a graylog AMI in AWS)
  • Single server setup, where the logs are located in the same server as Graylog
  • Push of the logs from Imperva using SFTP

Most of the steps below also apply to any setup or cloud platforms besides AWS. Note that in AWS, a Graylog AMI image does exist, but only with Ubuntu 14 at the time of writing. Also, I will publish future blogs on how to parse your Imperva SIEM logs and how to create a dashboard to read the logs.

Step 1: Deploy an Ubuntu Server on AWS

As a first step, let’s deploy an Ubuntu machine in AWS with the 4GB RAM required to deploy Graylog.

  1. Sign in to the AWS console and click on EC2
  2. Launch an instance and select.
  3. Select Ubuntu server 16.04, with no other software pre-installed.

It is recommended to use Ubuntu 16.04 or above, as some packages such as MongoDB and Java openjdk-8-jre are already included in its repositories, which simplifies the installation process. The command lines below apply to Ubuntu 16.04 (the systemctl command, for instance, is not available on Ubuntu 14).

4. Select the Ubuntu server with 4GB RAM.

4GB is the minimum for Graylog, but you might consider more RAM depending on the volume of the data that you plan to gather.

5. Optional: increase the disk storage.

 Since we will be collecting logs, we will need more storage than the default space. The storage volume will depend a lot on the site traffic and the type of logs you will retrieve (all traffic logs or only security events logs).

Note that you will likely require much more than 40GB. If you are deploying on AWS, you can easily increase the capacity of your EC2 server anytime.

6. Select an existing key pair so you can connect to your AWS server via SSH later.

If you do not have an existing SSH key pair in your AWS account, you can create it using the ssh-keygen tool, which is part of the standard openSSH installation or using puttygen on Windows. Here’s a guide to creating and uploading your SSH key pairs.

7. Give your EC2 server a clear name and identify its public DNS and IPv4 addresses.

8. Configure the server security group in AWS.

Let’s make sure that port 9000 in particular is open. You might need to open other ports if logs are forwarded from another log collector, such as port 514 or 5044.

It is best practice to open port 22 only to the Cloud WAF IPs (this link) or to your own IP. Avoid opening port 22 to the world.

You can also consider locking the UI access to your public IP only.

9. SSH to your AWS server as the ubuntu user, after loading your key in PuTTY and entering the AWS public DNS name as the host.

10. Update your Ubuntu system to the latest versions and updates.

sudo apt-get update

sudo apt-get upgrade

Select “y” when prompted or the default options offered.

Step 2: Install Java, MongoDB and Elasticsearch

11. Install additional packages including Java JDK.

sudo apt-get install apt-transport-https openjdk-8-jre-headless uuid-runtime pwgen

Check that Java is properly installed by running:

java -version

And check the version installed. If all is working properly, you should see a response like:

12. Install MongoDB. Graylog uses MongoDB to store the Graylog configuration data

MongoDB is included in the repos of Ubuntu 16.04 and works with Graylog 2.3 and above.

sudo apt-get install mongodb-server

Start MongoDB and make sure it starts with the server (the service installed by the Ubuntu mongodb-server package is named mongodb):

sudo systemctl start mongodb

sudo systemctl enable mongodb

And we can check that it is properly running by:

sudo systemctl status mongodb

13. Install and configure Elasticsearch

Graylog 2.5.x can be used with Elasticsearch 5.x. You can find more instructions in the Elasticsearch installation guide:

wget -qO - | sudo apt-key add -

echo "deb stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-5.x.list

sudo apt-get update && sudo apt-get install elasticsearch

Now modify the Elasticsearch configuration file located at /etc/elasticsearch/elasticsearch.yml and set the cluster name to graylog.

sudo nano /etc/elasticsearch/elasticsearch.yml

Additionally you need to uncomment (remove the # as first character) the line: cluster.name: graylog

Now, you can start Elasticsearch with the following commands:

sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl restart elasticsearch.service

By running sudo systemctl status elasticsearch.service you should see Elasticsearch up and running as below:

Step 3: Install Graylog

14. We can now install Graylog repository and Graylog itself with the following commands:

sudo dpkg -i graylog-2.5-repository_latest.deb

sudo apt-get update && sudo apt-get install graylog-server

15. Configure Graylog

First, create a password of at least 64 characters by running the following command:

pwgen -N 1 -s 96

And copy the result referenced below as password

Let’s create its sha256 checksum as required in the Graylog configuration file:

echo -n password | sha256sum
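If you prefer to compute the checksum without the shell, the same digest can be produced with Python’s standard hashlib module. This is only an equivalent sketch of the command above; the helper name is arbitrary and “password” stands for your own generated value.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Return the SHA-256 hex digest of the password, without a trailing newline."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


digest = sha256_hex("password")  # replace "password" with your own generated value
```

Like echo -n, this hashes the exact characters with no trailing newline, which matters because a stray newline produces a completely different checksum.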

Now you can open the Graylog configuration file:

sudo nano /etc/graylog/server/server.conf

And replace password_secret and root_password_sha2 with the values you created above.

The configuration file should look as below (replace with your own generated password):

Now replace the following entries with your AWS CNAME that was given when creating your EC2 instance. Note that also, depending on your setup, you can replace the alias below with your internal IP.


web:

16. Optional: Configure HTTPS for the Graylog web interface

Although not mandatory, it is recommended that you configure HTTPS for your Graylog server.

Please find the steps to setup https in the following link:

17. Start the Graylog service and enable it on system startup

Run the following commands to start Graylog and enable it at server startup:

sudo systemctl daemon-reload
sudo systemctl enable graylog-server.service
sudo systemctl start graylog-server.service

Now we can check that Graylog has properly started:

sudo systemctl status graylog-server.service

18. Login to the Graylog console

You should now be able to login to the console.

If the page is not loading at all, check if you have properly configured the security group of your instance and that port 9000 is open.

You can log in with the username ‘admin’ and the password whose SHA-256 checksum you set as root_password_sha2.

Step 4: Configure SFTP on your server and Imperva Cloud WAF SFTP push

19. Create a new user and its group

Let’s create the user that Incapsula will use to send logs.

sudo adduser incapsula

incapsula is the user name created in this example. You can replace it with the name of your choice. You will be prompted to choose a password.

Let’s create a new group:

sudo groupadd incapsulagroup

And associate the incapsula user to this group

sudo usermod -a -G incapsulagroup incapsula  

20. Let’s create a directory where the logs will be sent to

In this example, we will send all logs to /home/incapsula/logs

cd /home

sudo mkdir incapsula

cd incapsula

sudo mkdir logs

21. Now let’s set strict permissions restrictions to that folder

For security purposes, we want to restrict access of this user strictly to the folder where the logs will be sent. The home and incapsula folders can be owned by root while logs will be owned by our newly created user.

sudo chmod 755 /home/incapsula

sudo chown root:root /home/incapsula

Now let’s assign our new user (incapsula in our example) as the owner of the logs directory:

sudo chown -R incapsula:incapsulagroup /home/incapsula/logs

The folder is now owned by incapsula and belongs to incapsulagroup.

And you can see that the incapsula folder is restricted to root, so the newly created incapsula user can only access the /home/incapsula/logs folder, to send its logs.

22. Now let’s configure the OpenSSH SFTP server and set the appropriate security restrictions.

sudo nano /etc/ssh/sshd_config

Comment out this section:

#Subsystem sftp /usr/lib/openssh/sftp-server

And add this line right below:

subsystem sftp internal-sftp

Change the authentication to allow password authentication so Incapsula can send logs using username / password authentication:

PasswordAuthentication yes

And add the following lines at the bottom of the document:

match group incapsulagroup

chrootDirectory /home/incapsula

X11Forwarding no

AllowTcpForwarding no
ForceCommand internal-sftp
PasswordAuthentication yes

Save the file and exit.

Let’s now restart the SSH server:

sudo service sshd restart

23. Now let’s check that we can send files using SFTP

For that, let’s use FileZilla and try to upload a file. If everything worked properly, you should be able to:

  • Connect successfully
  • See the logs folder, and be unable to navigate outside of it
  • Copy a file to the remote server

Step 5: Push the logs from Imperva Incapsula to the Graylog SFTP folder

24. Configure the Logs in Imperva Cloud WAF SIEM logs tab

  • Log into your account.
  • On the sidebar, click Logs > Log Setup
    • Make sure you have SIEM logs license enabled.
  • Select the SFTP option
  • In the host section enter your public facing AWS hostname. Note that your security group should be open to Incapsula IPs as described in the Security Group section earlier.
  • Update the path to log
  • For the first testing, let’s disable encryption and compression.
  • Select CEF as log format
  • Click Save

See below an example of the settings. Click Test Connection and ensure it is successful. Click Save.

25. Make sure the logs are enabled for the relevant sites as below

You can select either security logs or all access logs on a site-per-site basis.

Selecting All Logs will retrieve all access logs, while Security Logs will push only logs where security events were raised.

  • Note that selecting All Logs will have a significant impact on the volume of logs.

You can find more details on the various settings of the SIEM logs integration in Imperva documentation in this link.

26. Verify that logs are getting pushed from Incapsula servers to your SFTP folder


The first logs might take some time to reach your server, depending on the volume of traffic on the site, in particular for a site with little traffic. Generate some traffic and events.

27. Enhance performance and security

To improve the security and performance of your SIEM integration project, consider enforcing HTTPS in Graylog. You can find a guide to configuring HTTPS on Graylog here.

That’s it! In my next blog posts, I will describe how to start collecting and parsing Imperva and Incapsula logs using Graylog and how to create your first dashboard.

If you have suggestions for improvements or updates in any of the steps, please share with the community in the comments below.

The post How to Deploy a Graylog SIEM Server in AWS and Integrate with Imperva Cloud WAF appeared first on Blog.

Hundreds of Vulnerable Docker Hosts Exploited by Cryptocurrency Miners

Docker is a technology that allows you to perform operating system level virtualization. An incredible number of companies and production hosts are running Docker to develop, deploy and run applications inside containers.

You can interact with Docker via the terminal and also via a remote API. The Docker remote API is a great way to control your remote Docker host, including automating the deployment process, controlling and getting the state of your containers, and more. With this great power comes a great risk: if the control gets into the wrong hands, your entire network can be in danger.

In February, a new vulnerability (CVE-2019-5736) was discovered that allows you to gain host root access from a Docker container. The combination of this new vulnerability and an exposed remote Docker API can lead to a fully compromised host.

According to Imperva research, exposed Docker remote APIs have already been taken advantage of by hundreds of attackers, including many using the compromised hosts to mine a lesser-known, albeit rising, cryptocurrency for their financial benefit.

In this post you will learn about:

  • Publicly exposed Docker hosts we found
  • The risk they can put organizations in
  • Protection methods

Publicly Accessible Docker Hosts

The Docker remote API listens on ports 2375 / 2376. By default, the remote API is only accessible from the loopback interface (“localhost”), and should not be available from external sources. However, as with other cases (for example, publicly accessible Redis servers such as those targeted by RedisWannaMine), organizations sometimes misconfigure their services, allowing easy access to their sensitive data.

We used the Shodan search engine to find open ports running Docker.

We found 3,822 Docker hosts with the remote API exposed publicly.

We wanted to see how many of these IPs are really exposed. In our research, we tried to connect to the IPs on port 2375 and list the Docker images. Out of 3,822 IPs, we found that approximately 400 were accessible.
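To run the same kind of check against hosts you own, a plain TCP connection attempt is enough to tell whether a port answers from where you are standing. The following is a minimal sketch in Python; the `is_port_open` helper and the example address are hypothetical, and the port is left as a parameter so you can pass whichever port your Docker daemon is configured to listen on. Run it only against systems you are authorized to test.

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

if __name__ == "__main__":
    # Hypothetical example: replace the address with a host of your own and
    # run the check from outside your network.
    print(is_port_open("127.0.0.1", 2375))
```

A result of True from an external vantage point means the port is reachable from there and deserves a closer look at your security group or firewall rules.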

Red indicates Docker images of crypto miners, while green shows production environments and legitimate services  

We found that most of the exposed Docker remote API IPs are running a cryptocurrency miner for a currency called Monero. Monero transactions are obfuscated, meaning it is nearly impossible to track the source, amount, or destination of a transaction.  

Other hosts were running what seemed to be production environments of MySQL database servers, Apache Tomcat, and others.

Hacking with the Docker Remote API

The possibilities for attackers after spawning a container on hacked Docker hosts are endless. Mining cryptocurrency is just one example. They can also be used to:

  • Launch more attacks with masked IPs
  • Create a botnet
  • Host services for phishing campaigns
  • Steal credentials and data
  • Pivot attacks to the internal network

Here are some script examples for the above attacks.

1. Access files on the Docker host and mounted volumes

By starting a new container and mounting a folder from the host into it, we got access to other files on the Docker host:

It is also possible to access data outside of the host by looking at container mounts. Using the docker inspect command, you can find mounts to external storage such as NFS, S3 and more. If the mount has write access, you can also change the data.

2. Scan the internal network

When a container is created in one of the predefined Docker networks, “bridge” or “host,” attackers can use it to reach hosts that the Docker host can access within the internal network. We used nmap to scan the host network to find services. We did not need to install it; we simply used a ready image from the Docker Hub:

It is possible to find other open Docker ports and navigate inside the internal network by looking for more Docker hosts as described in our Redis WannaMine post.

3. Credentials leakage

It is very common to pass arguments to a container as environment variables, including credentials such as passwords. You can find examples of passwords sent as environment variables in the documentation of many Docker repositories.

We found three simple ways to detect credentials using the Docker remote API:

Docker inspect command

“env” command on a container

Docker inspect doesn’t return all environment variables. For example, it doesn’t return ones which were passed to docker run using the --env-file argument. Running the “env” command on a container will return the entire list:

Credentials files on the host

Another option is mounting known credentials directories inside the host. For example, AWS credentials have a default location for CLI and other libraries and you can simply start a container with a mount to the known directory and access a credentials file like “~/.aws/credentials”.
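The inspect output described earlier in this section can also be used defensively, to audit your own containers for secrets that were passed as environment variables. The sketch below assumes you have captured the JSON printed by docker inspect for a container as a string; in that output, variables appear as NAME=value strings under Config.Env. The name patterns and the sample document are made up for illustration, and the inspect output may not list every variable, so treat an empty result as a hint and not as proof.

```python
import json
import re

# Hypothetical name patterns; tune them to your own naming conventions.
SECRET_NAME = re.compile(r"(PASSWORD|PASSWD|SECRET|TOKEN|API_?KEY)", re.IGNORECASE)

def find_secret_env_names(inspect_json):
    """Return (container name, variable name) pairs that look like credentials."""
    findings = []
    for container in json.loads(inspect_json):
        config = container.get("Config") or {}
        for entry in config.get("Env") or []:
            name, _, _value = entry.partition("=")
            if SECRET_NAME.search(name):
                # Report only the variable name, never its value.
                findings.append((container.get("Name", ""), name))
    return findings

# Trimmed, invented example of what docker inspect prints (a JSON array).
sample = json.dumps([{
    "Name": "/db",
    "Config": {"Env": ["MYSQL_ROOT_PASSWORD=example", "PATH=/usr/bin", "TZ=UTC"]},
}])
print(find_secret_env_names(sample))
```

Reporting names without values keeps the audit output itself from becoming another place where credentials leak.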

4. Data Leakage

Here is an example of how a database and credentials can be detected, in order to run queries on a MySQL container:

Wrapping Up

In this post, we saw how dangerous exposing the Docker API publicly can be.

Exposing Docker ports can be useful, and may be required by third-party apps like ‘portainer’, a management UI for Docker.

However, you have to make sure to create security controls that allow only trusted sources to interact with the Docker API. See the Docker documentation on Securing Docker remote daemon.

Imperva is going to release a cloud discovery tool to better help IT, network and security administrators answer two important questions:

  • What do I have?
  • Is it secure?

The tool will be able to discover and detect publicly-accessible ports inside the AWS account(s). It will also scan both instances and containers. To try it, please contact Imperva sales.

We also saw how credentials stored as environment variables can be retrieved. Storing them this way is very common and convenient, but far from secure. Instead of using environment variables, it is possible to read the credentials at runtime (how depends on your environment). In AWS you can use roles and KMS. In other environments, you can use third-party tools like Vault or credstash.
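One simple form of reading credentials at runtime is to have the application load them from a file that the platform mounts into the container, instead of from the environment. The sketch below is an illustration only: the `read_secret` helper, the `db_password` name and the `/run/secrets` directory are assumptions (that path is the convention used by Docker secrets in Swarm services). With KMS, Vault, or credstash you would call the respective client instead.

```python
from pathlib import Path

def read_secret(name, secrets_dir="/run/secrets"):
    """Read a secret from a mounted file and strip the trailing newline."""
    path = Path(secrets_dir) / name
    try:
        return path.read_text(encoding="utf-8").rstrip("\n")
    except FileNotFoundError:
        raise RuntimeError(f"secret {name!r} is not mounted at {path}") from None

# Usage (hypothetical secret name):
# db_password = read_secret("db_password")
```

Because the value never enters the process environment, it does not appear in the container's configured environment variables.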

The post Hundreds of Vulnerable Docker Hosts Exploited by Cryptocurrency Miners appeared first on Blog.

Latest Drupal RCE Flaw Used by Cryptocurrency Miners and Other Attackers


Another remote code execution vulnerability has been revealed in Drupal, the popular open-source Web content management system. One exploit, still working at the time of this writing, has been used in dozens of unsuccessful attacks against our customers, with an unknown number of attacks, some likely successful, against other websites.

Published on February 20th, the new vulnerability (known as CVE-2019-6340 and SA-CORE-2019-003) involves field types that don’t sanitize data from non-form sources when the Drupal 8 core REST module and another web services module such as JSON:API are both enabled. This allows arbitrary PHP remote code execution that could lead to compromise of the web server.

An exploit was published a day after the vulnerability was published, and continues to work even after following the Drupal team’s proposed remediation of disabling all web services modules and banning PUT/PATCH/POST requests to web services resources. Despite the fix, it is still possible to issue a GET request and therefore perform remote code execution as was the case with the other HTTP methods. Fortunately, users of Imperva’s Web Application Firewall (WAF) were protected.

Attack Data

Imperva research teams constantly analyze attack traffic from the wild that passes between clients and websites protected by our services. We’ve found dozens of attack attempts aimed at dozens of websites that belong to our customers using this exploit, including sites in government and the financial services industry.

The attacks originated from several attackers and countries, and all were blocked thanks to generic Imperva policies that had been in place long before the vulnerability was published.

Figure 1 below shows the daily number of CVE-2019-6340 exploits we’ve seen in the last couple of days.

Figure 1: Attacks by date

As always, attacks followed soon after the exploit was published. So being up to date with security updates is a must.

According to Imperva research, 2018 saw a year-over-year increase in Drupal vulnerabilities, with names such as DirtyCOW and Drupalgeddon 1, 2 and 3. These were used in mass attacks that targeted hundreds of thousands of websites.

There were a few interesting payloads in the most recent attacks. One payload tries to inject a JavaScript cryptocurrency (Monero and Webchain) miner named CoinIMP into an attacked site’s index.php file so that site visitors will run the mining script when they browse the site’s main page, for the attacker’s financial benefit.

The following is CoinIMP’s client-side embedded script. The script uses a 64-character key generated by the CoinIMP panel to designate the site key of the attacker on CoinIMP.

The attacker’s payload also tries to install a shell uploader to upload arbitrary files on demand.

Here is the upload shell content:

Imperva Customers Protected

Customers of Imperva Web Application Firewall (WAF, formerly Incapsula) were protected from this attack due to our RCE detection rules. So although the attack vector is new, its payload is old and has been dealt with in the past.

We also added new dedicated and generic rules to our WAF (both the services formerly known as Incapsula and SecureSphere) to strengthen our security and provide wider coverage to attacks of this sort.

The post Latest Drupal RCE Flaw Used by Cryptocurrency Miners and Other Attackers appeared first on Blog.

No One is Safe: the Five Most Popular Social Engineering Attacks Against Your Company’s Wi-Fi Network

Your Wi-Fi routers and access points all have strong WPA2 passwords, unique SSIDs, the latest firmware updates, and even MAC address filtering. Good job, networking and cybersecurity teams! However, is your network truly protected? TL;DR: NO!

In this post, I’ll cover the most common social engineering Wi-Fi association techniques that target your employees and other network users. Some of them are very easy to launch, and if your users aren’t aware of and know how to avoid them, it’s only a matter of time until your network is breached.

Attackers only need a Unix computer (which can be as inexpensive and low-powered as a $30 Raspberry Pi), a Wi-Fi adapter with monitor mode enabled, and a 3G modem for remote control. They can also buy ready-made stations with all of the necessary tools and user interface, but where’s the fun in that? 

Figure 1: Wi-Fi hacking tools

1. Evil Twin AP

An effortless and easy technique. All attackers need to do is set up an open AP (Access Point) with the same or similar SSID (name) as the target and wait for someone to connect. Place it far away from the target AP where the signal is low and it’s only a matter of time until some employee connects, especially in big organizations. Alternatively, impatient attackers may follow the next technique.

Figure 2: Evil Twin Demonstration

2. Deauthentication / Disassociation Attack

In the current IEEE 802.11 (Wi-Fi protocol) standards, whenever a wireless station wants to leave the network, it sends a deauthentication or disassociation frame to the AP. These two frames are sent unencrypted and are not authenticated by the AP, which means anyone can spoof those packets.

This technique makes it very easy to sniff the WPA 4-way handshake needed for a brute-force attack, since a single deauthentication packet is enough to force a client to reconnect.

Even more importantly, attackers can spoof these messages repeatedly and thus disable the communication between Wi-Fi clients and the target AP, which increases the chance your users will connect to the attacker’s twin AP. Combining these two techniques works very well, but still depends on the user connecting to the fake AP. The following technique does not, however.

3. Karma Attack

Whenever a user device’s Wi-Fi is turned on but not connected to a network, it openly broadcasts the SSIDs of previously-associated networks in an attempt to connect to one of them. These small packets, called probe requests, are publicly viewable by anyone in the area.

The information gathered from probe requests can be combined with geo-tagged wireless network databases to map the physical location of these networks.

If one of the probe requests contains an open Wi-Fi network SSID, then generating the same AP for which the user device is sending probes will cause the user’s laptop, phone or other device to connect to an attacker’s fake AP automatically.

Forcing any connected device to send probe requests is very easy, thanks to the previous technique.

Figure 3: Sniffing Probe Requests

4. Known Beacons

The final attack I’ll discuss that can lead your user to connect to an attacker’s fake AP is “Known Beacons.” This is a somewhat random technique in which the attacker broadcasts dozens of beacon frames of common SSIDs that nearby wireless users have likely connected to in the past (like AndroidAP, Linksys, iPhone, etc.). Again, your users’ devices will automatically authenticate and connect due to the “Auto-Connect” feature.

An attacker has connected with your user, now what?

Once attackers have access to your user, there’s a variety of things they can do: sniff the victim’s traffic, steal login credentials, inject packets, scan ports, exploit the user’s device, etc. But most importantly, the attacker can also get the target AP password through a victim-customized web phishing attack.

Since the victim is using the attacker’s machine as a router, there are many ways to make the phishing page look convincing. One of them is a captive portal. For example, through DNS hijacking, the attacker can forward all web requests to a local web server, so that the phishing page appears no matter which site the victim tries to reach. Even worse, most operating systems will identify the page as a legitimate captive portal and open it automatically!

Figure 4: Captive Portal Attack

5. Bypassing MAC Address Filtering

As mentioned, your networks may use MAC Filtering, which means only predefined devices can connect to your network and having the password is not enough. How much does that help?

All MAC addresses are hard-coded into a network card and can never be changed. However, attackers can change the MAC address in their operating system and pretend to be one of the allowed devices.

Attackers can easily get the MAC address of one of your network’s allowed devices, since every packet sent to and from your employee’s device includes its MAC address unencrypted. Of course, attackers have to force your employee’s device to disconnect (using deauthentication packets) before connecting to your network using the spoofed MAC address.

How Can You Mitigate?

Detecting an Evil AP in your area can be done easily by scanning and comparing configurations of nearby access points. However, as with any social engineering attack, the best way to mitigate is by training your users, which is a critical element of security.
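As a rough illustration of what comparing configurations can mean in practice, the sketch below checks scan results against an inventory of your legitimate access points. It assumes a scanner has already produced a list of records with `ssid`, `bssid`, and `security` fields; that record layout, the inventory, and all addresses are hypothetical. It only catches exact SSID matches, not look-alike names, and since BSSIDs can be spoofed just like client MAC addresses, a clean result is a hint and not proof.

```python
# Hypothetical inventory: SSID -> set of legitimate (BSSID, security) pairs.
KNOWN_APS = {
    "CorpWiFi": {("aa:bb:cc:00:00:01", "WPA2"), ("aa:bb:cc:00:00:02", "WPA2")},
}

def find_suspicious_aps(scan_results, known_aps=KNOWN_APS):
    """Return scanned APs that use one of our SSIDs but are not in the inventory."""
    suspicious = []
    for ap in scan_results:
        expected = known_aps.get(ap["ssid"])
        if expected is None:
            continue  # Not one of our SSIDs.
        if (ap["bssid"].lower(), ap["security"]) not in expected:
            suspicious.append(ap)
    return suspicious

# Invented scan results, for illustration only.
scan = [
    {"ssid": "CorpWiFi", "bssid": "AA:BB:CC:00:00:01", "security": "WPA2"},
    {"ssid": "CorpWiFi", "bssid": "de:ad:be:ef:00:01", "security": "Open"},
    {"ssid": "CoffeeShop", "bssid": "11:22:33:44:55:66", "security": "Open"},
]
for ap in find_suspicious_aps(scan):
    print("possible evil twin:", ap["bssid"], ap["security"])
```

Here the second record is flagged because it advertises a company SSID from an unknown BSSID with no encryption, which matches the evil twin setup described earlier.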

Make sure your network users understand the risk of connecting to open access points and are well aware of the techniques mentioned. Running simulations of the above attacks is also recommended.

Finally, while specific techniques will come and go, social engineering will always remain a popular strategy for attackers. So make sure you and your users remain aware!

The post No One is Safe: the Five Most Popular Social Engineering Attacks Against Your Company’s Wi-Fi Network appeared first on Blog.