
May 7, 2026

Monad Partners with Databricks to Bring 285+ Security Sources to the Lakehouse

Valerie Zargarpur

Head of Marketing

Databricks is in the security market now. In March 2026, they launched Lakewatch, an open agentic SIEM built on the Lakehouse architecture, with Adobe, Dropbox, and National Australia Bank as early design partners. The pitch: decouple storage from compute, land 100% of your telemetry in open formats, and use AI agents for detection and triage at machine speed. Pricing is based on compute consumed, not data ingested.

If you've been running security workloads on Databricks already, Lakewatch gives that investment a dedicated security layer. If you've been watching from a legacy SIEM and doing the math on ingestion costs, it gives you a named destination to evaluate.

For teams planning that migration, the hardest part isn't choosing the destination. It's replicating the data feeds your legacy SIEM had out of the box, from hundreds of security tools, in a format that's actually useful on the other side. That's the problem Monad solves, and it's why we're partnering with Databricks.

Where Monad fits in

Lakewatch handles analytics, detection-as-code, AI-driven hunting, and petabyte-scale search. It's built to do powerful things with security data once it's in the Lakehouse. The harder problem, and the one that stalls most migrations, is getting data out of your 50, 100, or 200 security tools and into Delta Lake in a format that's useful for any of that.

Any team that's migrated off a legacy SIEM knows this firsthand. The SIEM had pre-built integrations for every source. Moving to a new platform means rebuilding all of those data feeds, and security telemetry is a different animal than general-purpose data. The APIs are unstable. Vendors ship breaking changes without notice. Schemas vary wildly across tools, even across versions of the same tool. And for detection and correlation to work well, data should arrive normalized to a common schema like OCSF before it hits the lake. Raw JSON dumped into Delta tables creates a second problem downstream: analysts writing ad hoc parsers in notebooks instead of running detections.

Monad eliminates the ingestion bottleneck so teams can get to full Databricks coverage faster, with less engineering risk and no gap in visibility during the transition.

What Monad's Databricks output does

Monad now streams security telemetry directly into Databricks Delta Lake tables via Unity Catalog. The output handles the plumbing that would otherwise take weeks or months of custom development.

[Screenshot: Monad Databricks output configuration screen]

Automatic table creation and schema inference. When Monad sends data to a table that doesn't exist yet, it creates it. Schema is inferred from the data itself, and mergeSchema support means new fields get added automatically as sources evolve. No one has to maintain DDL by hand.

Compressed staging via Unity Catalog Volumes. Monad writes gzip-compressed JSONL files to a Unity Catalog Volume, then bulk-loads them into Delta tables using COPY INTO. This pattern is tuned for throughput: fewer, larger loads instead of record-by-record inserts.
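
For teams that want to sanity-check the pattern by hand, the sketch below shows roughly what the placeholder-table-plus-COPY INTO flow looks like when run from a Databricks notebook (where `spark` is predefined). The catalog, schema, table, and volume names are placeholders; this illustrates the loading pattern the output uses, not Monad's internal code.

```python
# Illustrative only: the staging-volume + COPY INTO pattern described above.
# Run from a Databricks notebook where `spark` is predefined. All names are
# placeholders for your own catalog, schema, table, and staging volume.
TABLE = "security.telemetry.okta_system_log"
STAGING_VOLUME = "/Volumes/security/telemetry/monad_staging"

# A schemaless placeholder table lets COPY INTO infer the schema on first load.
spark.sql(f"CREATE TABLE IF NOT EXISTS {TABLE}")

# Bulk-load the gzip-compressed JSONL files staged in the Unity Catalog Volume.
# mergeSchema adds new columns automatically as upstream sources evolve.
spark.sql(f"""
    COPY INTO {TABLE}
    FROM '{STAGING_VOLUME}/'
    FILEFORMAT = JSON
    COPY_OPTIONS ('mergeSchema' = 'true')
""")
```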

OAuth M2M authentication. The output authenticates with Databricks using service principal credentials, following the same security model Databricks recommends for production workloads. No long-lived tokens sitting in config files.
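
If you want to verify the service principal's credentials independently of Monad, the sketch below uses the databricks-sql-connector and databricks-sdk packages to exchange the client ID and secret for a short-lived OAuth token and run a trivial query. The hostname, HTTP path, and credentials are placeholders; this is a connectivity check under those assumptions, not Monad's implementation.

```python
# Minimal OAuth M2M connectivity check against a Databricks SQL warehouse.
# Requires: pip install databricks-sql-connector databricks-sdk
from databricks import sql
from databricks.sdk.core import Config, oauth_service_principal

SERVER_HOSTNAME = "dbc-xxxxxxxx-xxxx.cloud.databricks.com"   # placeholder
HTTP_PATH = "/sql/1.0/warehouses/xxxxxxxxxxxxxxxx"           # placeholder

def credential_provider():
    # Exchanges the service principal's client ID/secret for a short-lived
    # OAuth token; no long-lived personal access token is stored anywhere.
    cfg = Config(
        host=f"https://{SERVER_HOSTNAME}",
        client_id="<service-principal-client-id>",
        client_secret="<service-principal-client-secret>",
    )
    return oauth_service_principal(cfg)

with sql.connect(
    server_hostname=SERVER_HOSTNAME,
    http_path=HTTP_PATH,
    credentials_provider=credential_provider,
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT current_user()")
        print(cur.fetchone())
```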

Permission validation on connect. Before any data flows, Monad's Test Connection verifies that the service principal has every required grant: USE CATALOG, USE SCHEMA, CREATE TABLE, SELECT, MODIFY, and volume access. If anything is missing, it tells you exactly which permission to fix, before you find out the hard way in production.
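
You can also spot-check the same grants yourself from a notebook. One way, with placeholder principal and schema names:

```python
# List the grants the service principal actually holds on the target schema.
# Principal (application ID) and schema names are placeholders.
spark.sql(
    "SHOW GRANTS `00000000-0000-0000-0000-000000000000` "
    "ON SCHEMA security.telemetry"
).show(truncate=False)
```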

[Screenshot: Pipeline view showing data flowing from sources to Databricks Delta Lake]

Batch configuration defaults are optimized for COPY INTO throughput: 50,000 records or 10MB per batch, whichever comes first, with a 30-second maximum publish interval. Teams can tune these based on their own latency and warehouse compute trade-offs.
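
As a rough illustration of what "whichever comes first" means in practice (this is the implied flush logic, not Monad's actual configuration format):

```python
# Illustrative only: a batch is published when any one of the thresholds trips.
MAX_RECORDS = 50_000            # records per batch
MAX_BYTES = 10 * 1024**2        # 10 MB per batch
MAX_INTERVAL_SECONDS = 30       # maximum time before a partial batch publishes

def should_flush(record_count: int, byte_size: int, seconds_since_last_publish: float) -> bool:
    return (
        record_count >= MAX_RECORDS
        or byte_size >= MAX_BYTES
        or seconds_since_last_publish >= MAX_INTERVAL_SECONDS
    )
```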

What Monad adds to a Databricks security deployment

Breadth that keeps up with real environments. Monad connects to 300+ security sources and destinations, including on-prem infrastructure that cloud-only tools can't reach. Every connector is tested daily against live APIs, not validated once at ship time. When an upstream vendor changes their authentication flow or response schema, Monad catches it before your pipeline breaks at 2am. Enterprise security teams at companies like Robinhood and CoreWeave run Monad in production. For teams evaluating Lakewatch or already running security analytics on Databricks, this is the difference between a proof of concept with five sources and a production deployment with full coverage across cloud and on-prem.

Normalization at ingest, not after the fact. Monad can map data to common schemas, including OCSF, before it lands in Delta Lake. Teams that enable normalization get clean, structured data that Lakewatch's detection-as-code rules and AI agents can operate on from day one, instead of writing regex parsers in notebooks to make Okta logs line up with AWS CloudTrail. Teams running their own Spark-based detections on Databricks, outside of Lakewatch, get the same benefit: query-ready tables with consistent field names across sources.
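
As a quick illustration of what normalization buys downstream: one query over authentication events from two sources that share OCSF field names. The table names and the specific OCSF columns below are assumptions for illustration, not the exact schema Monad emits.

```python
# Cross-source query over OCSF-normalized authentication tables (hypothetical
# table names and columns). Run from a Databricks notebook.
spark.sql("""
    SELECT time, metadata.product.name AS source, actor.user.name AS user,
           src_endpoint.ip AS source_ip, status
    FROM security.telemetry.okta_authentication
    UNION ALL
    SELECT time, metadata.product.name, actor.user.name,
           src_endpoint.ip, status
    FROM security.telemetry.cloudtrail_authentication
""").show(truncate=False)
```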

Filtering and routing across destinations. SIEM migrations don't happen in a single cutover. Most teams run their legacy SIEM in parallel while they build confidence in the new platform. Monad handles this natively: filter, transform, and route the same data to multiple destinations simultaneously. Keep your existing SIEM fed while you bring Databricks online, then shift traffic as you're ready. No duplicate pipelines, no gap in coverage during the transition.

[Screenshot: Monad pipeline with split routing to Databricks and SIEM destinations]

Getting started

Setting up the Databricks output requires a workspace with Unity Catalog enabled, a running SQL warehouse, and an existing catalog and schema. Monad creates the table and staging volume automatically.
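
If the target catalog and schema don't exist yet, creating them is a one-time step. A minimal sketch with placeholder names, run by an admin from a notebook:

```python
# One-time setup of a target catalog and schema (placeholder names).
spark.sql("CREATE CATALOG IF NOT EXISTS security")
spark.sql("CREATE SCHEMA IF NOT EXISTS security.telemetry")
```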

Create a service principal in your Databricks Account Console, generate OAuth credentials, and grant it permissions on your target schema. Point Monad at your workspace hostname, SQL warehouse HTTP path, catalog, schema, and table name. Hit Test Connection. Monad validates every permission and reports any gaps. Once connected, data flows.
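
The grants map directly to the list Test Connection checks. Here's a sketch of what they look like in SQL, with placeholder catalog, schema, and principal names; the volume privileges shown are an assumption about how "volume access" is granted, so confirm the exact set against the Monad docs.

```python
# Grants for the Monad service principal, run from a Databricks notebook by a
# user who administers the catalog. All names below are placeholders; the
# principal is referenced by its application ID.
CATALOG = "security"
SCHEMA = "security.telemetry"
PRINCIPAL = "`00000000-0000-0000-0000-000000000000`"

grants = [
    f"GRANT USE CATALOG ON CATALOG {CATALOG} TO {PRINCIPAL}",
    f"GRANT USE SCHEMA ON SCHEMA {SCHEMA} TO {PRINCIPAL}",
    f"GRANT CREATE TABLE ON SCHEMA {SCHEMA} TO {PRINCIPAL}",
    f"GRANT SELECT, MODIFY ON SCHEMA {SCHEMA} TO {PRINCIPAL}",
    # Assumed mapping of "volume access" for the gzip JSONL staging files:
    f"GRANT READ VOLUME, WRITE VOLUME ON SCHEMA {SCHEMA} TO {PRINCIPAL}",
    # If Monad also creates the staging volume automatically, CREATE VOLUME may
    # be required as well -- check the Monad setup docs.
]

for stmt in grants:
    spark.sql(stmt)
```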

[Screenshot: Test Connection results showing validated permissions]

The full setup documentation, including batch tuning and troubleshooting, is in the Monad docs.

Try it

If you're running security workloads on Databricks, evaluating Lakewatch, or planning a migration off a legacy SIEM, Monad gets your data there: enriched and query-ready from 300+ sources, including on-prem, without building custom pipelines.

Schedule a demo to see it running, or reach out at product@monad.com.
