Wrangling your cybersecurity data with Monad and Databricks

July 20, 2023
Jacolon Walker

Today, the average Fortune 500 enterprise manages 40+ cybersecurity vendors. Each tool has its own format for logs, outputs, and databases.

Wiz, Tenable and AWS all show IP, domain, and vulnerability type in different ways.

The sheer volume and complexity of security data make cybersecurity a data management problem. Without the ability to query security data sources in a unified schema, enterprises can’t meaningfully understand their security posture.
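To make the idea of a unified schema concrete, here is a minimal sketch in plain Python. The tool names (`tool_a`, `tool_b`), their field names, and the `Finding` record are all hypothetical stand-ins chosen for illustration; they are not the actual export formats of any vendor, and this is not Monad's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One finding in a single, shared shape (hypothetical schema)."""
    source: str
    ip: str
    domain: str
    vuln_type: str


# Hypothetical per-tool field names; real exports will differ.
FIELD_MAPS = {
    "tool_a": {"ip": "ipAddress", "domain": "hostName", "vuln_type": "category"},
    "tool_b": {"ip": "ipv4", "domain": "fqdn", "vuln_type": "plugin_family"},
}


def normalize(source, record):
    """Map one raw record from a given tool onto the shared Finding shape."""
    fields = FIELD_MAPS[source]
    return Finding(
        source=source,
        ip=record[fields["ip"]].strip(),
        domain=record[fields["domain"]].strip().lower(),
        vuln_type=record[fields["vuln_type"]].strip().lower(),
    )


# Two made-up records describing the same host in two different formats.
raw = [
    ("tool_a", {"ipAddress": "10.0.0.5", "hostName": "API.example.com", "category": "RCE"}),
    ("tool_b", {"ipv4": "10.0.0.5 ", "fqdn": "api.example.com", "plugin_family": "rce"}),
]
findings = [normalize(source, record) for source, record in raw]
```

Once both records share the same field names and casing, they can be grouped or joined on `ip`, `domain`, or `vuln_type` without tool-specific logic.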

When security teams can easily join tables from a few important tools, they can build combined analytics that reveal highly valuable insights. To show how powerful and easy this can be, we built an analytics workflow in a Databricks notebook that surfaces exposed, high-priority vulnerabilities and checks whether they match currently known exploits. The example and all the code we’re using can be found in the Databricks Solution Accelerator here.

The workflow follows a straightforward path. We start with some exploratory data analysis to get a handle on the shape of the data we’re working with. We group vulnerabilities by severity to get a rough prioritization, count the number of vulnerabilities by connector type, and look at the number of vulnerabilities per location. Comparing these groups gives us a first look at the highest-leverage categories of vulnerabilities in our environment. We also chart the number, type, and severity of vulnerabilities over time to get a sense of how we’re trending.
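The grouping steps above can be sketched in a few lines. This is a plain-Python stand-in using made-up rows and hypothetical column names (`severity`, `connector`, `location`, `found`); in a Databricks notebook the same counts would more likely come from a Spark DataFrame or SQL `GROUP BY`, and the accelerator's actual code may look quite different.

```python
from collections import Counter

# Made-up sample rows; column names are assumptions for illustration.
vulns = [
    {"id": "v1", "severity": "critical", "connector": "cloud", "location": "us-east-1", "found": "2023-05"},
    {"id": "v2", "severity": "high", "connector": "cloud", "location": "us-east-1", "found": "2023-05"},
    {"id": "v3", "severity": "high", "connector": "scanner", "location": "eu-west-1", "found": "2023-06"},
    {"id": "v4", "severity": "low", "connector": "scanner", "location": "us-east-1", "found": "2023-06"},
    {"id": "v5", "severity": "critical", "connector": "cloud", "location": "eu-west-1", "found": "2023-06"},
]


def count_by(rows, key):
    """Count rows per distinct value of one column."""
    return Counter(row[key] for row in rows)


by_severity = count_by(vulns, "severity")    # rough prioritization
by_connector = count_by(vulns, "connector")  # which tool reported what
by_location = count_by(vulns, "location")    # where the findings live

# Trend over time: counts per (month, severity) pair, ready to chart.
trend = Counter((row["found"], row["severity"]) for row in vulns)
```

Comparing `by_severity`, `by_connector`, and `by_location` side by side is what gives the first look at where the highest-leverage categories sit.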

With a good understanding of our vulnerability landscape, we can use data from NIST and CISA to see which of our vulnerabilities match up categorically with exploits taking place in the wild. That additional layer gives us the most concise and well-prioritized list of vulnerabilities to focus on: those that are on important assets, exposed to the outside world, and most likely to be exploited according to current field data.
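Here is a minimal sketch of that filtering step, again in plain Python. It assumes the match is made on a CVE identifier, which is one plausible join key but not necessarily the one the accelerator uses. The CVE IDs below are placeholders, and the `important` and `exposed` flags are hypothetical columns.

```python
from collections import Counter

SEVERITY_RANK = {"critical": 3, "high": 2, "medium": 1, "low": 0}

# Placeholder IDs standing in for a known-exploited list (not real CVEs).
known_exploited = {"CVE-0000-0001", "CVE-0000-0003"}

# Made-up findings; 'important' and 'exposed' are assumed columns.
vulns = [
    {"asset": "web-1", "cve": "CVE-0000-0001", "severity": "critical", "exposed": True, "important": True},
    {"asset": "db-1", "cve": "CVE-0000-0002", "severity": "critical", "exposed": True, "important": True},
    {"asset": "web-2", "cve": "CVE-0000-0003", "severity": "high", "exposed": True, "important": True},
    {"asset": "dev-1", "cve": "CVE-0000-0003", "severity": "high", "exposed": False, "important": False},
    {"asset": "web-1", "cve": "CVE-0000-0003", "severity": "high", "exposed": True, "important": True},
]


def prioritize(rows, exploited):
    """Keep findings on important, exposed assets with a known exploit, worst first."""
    shortlist = [
        row for row in rows
        if row["important"] and row["exposed"] and row["cve"] in exploited
    ]
    return sorted(shortlist, key=lambda row: SEVERITY_RANK[row["severity"]], reverse=True)


shortlist = prioritize(vulns, known_exploited)

# Findings per asset on the shortlist: the numbers behind a "most urgent assets" chart.
per_asset = Counter(row["asset"] for row in shortlist)
```

Each filter removes findings that would otherwise compete for attention, so the remaining rows are the ones that satisfy all three conditions at once.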

We end up with a graph showing which assets need the most urgent attention.

All of the code is available on GitHub through the Databricks Solution Accelerator here. If you need help getting this set up, shoot us a note: hello@monad.com