AppSec, Turbocharged: Harnessing Semgrep, Monad, and Snowflake for Peak Efficiency, Coverage and Visibility

October 17, 2023
Darwin Salazar

In recent years, software supply chain security has emerged as a paramount challenge for organizations of all sizes and types. Incidents like the Kaseya, Log4j, and Codecov attacks underscore the gravity of this risk. Vulnerabilities can infiltrate your codebase through various channels (open-source libraries, compromised CI/CD pipelines, or even minor oversights in developer practices) and can escalate into significant security breaches. Traditional Application Security (AppSec) methods often fall short of stopping the threat because they struggle to synthesize disparate data points and consequently paint an unclear picture of an application’s security landscape.

Recognizing this, we've forged partnerships with industry frontrunners to craft a modern, data-driven strategy, harmonizing the capabilities of Semgrep, Monad, and Snowflake. This synergy enhances visibility into codebase activities, identifies vulnerabilities, and presents insights with unparalleled clarity. Monad seamlessly manages the data infrastructure and transformation, allowing teams to harness their data's full potential and drive transformative security solutions.

The Recipe

Semgrep: Your App’s Security Lifeline

Semgrep’s application security platform helps engineers fix the issues that matter before production. Semgrep scans for vulnerabilities in your code (SAST), known vulnerabilities in open source libraries (SCA), and accidentally committed secrets (secret scanning). Security teams use Semgrep’s confidence ratings and reachability analysis to prioritize which issues are surfaced in the developer workflow. With 3400+ out-of-the-box rules across 30+ languages, and the ability to easily create custom rules, Semgrep shortens the time it takes to implement and scale a best-in-class AppSec program, all while adding value from Day 1.

Semgrep is available on AWS Marketplace.

Monad: The Data Transformation Maestro

Monad, a security data ELT platform, orchestrates API integrations, data ingestion, deduplication, and normalization, using its versatile Monad Object Model (MoM), the Open Cybersecurity Schema Framework (OCSF), or even your own custom model. The MoM, a unified security data layer, generates a SAST findings table that captures vital security findings data and ensures it is de-duplicated, normalized, and readily accessible in any data warehouse, including Snowflake. Monad minimizes noise and duplicate findings across security solutions, enhancing risk identification and remediation processes. By enabling all teams, such as AppSec, DevOps, and GRC, to directly interact with data via SQL and third-party business intelligence tools, Monad simplifies the querying experience across multiple data sources and strengthens cross-departmental security collaboration. Learn more about Monad’s capabilities and get your first month free following this link.

Snowflake: The Unified Data Cloud

Snowflake serves as the unified data cloud platform, making it easier to view your entire application's security posture at a glance. The power of Snowflake lies in its ability to collate information quickly and effectively, allowing for swift and smart decision-making. With advanced data visualization features, Snowflake turns raw data into actionable insights, providing a robust foundation for a data-driven approach to application security. Beyond Snowflake, Monad also supports other data warehouses and storage solutions, including Amazon S3, Amazon Security Lake, Databricks, and Google Cloud BigQuery.

Securing the Software Supply Chain: A Unified, Data-Driven Strategy

360-Degree Visibility

Traditional ways of keeping apps secure can be a bit of a juggle, making you hop between different tools to see the full picture. We’ve changed the game! By bringing together data from Semgrep and any other AppSec tools Monad has integrations for, and then organizing it with the MoM, we open up a world of possibilities. Now, you can see everything you need in one place, save time, avoid missing important details, and even create handy dashboards to keep an eye on key measures. We’re all about simplifying security and enabling you to make data-driven decisions.

Finding the Proverbial Needle in the Haystack

Scenarios like the Log4j incident have shown how rapidly vulnerabilities can proliferate through an organization's codebase. Scanning and assessing large code repositories manually can be a slow, painstaking process. Our data-driven strategy accelerates this, enabling you to quickly identify known vulnerabilities in your most crucial applications. Faster detection leads to faster fixes, shrinking the chance for attackers to strike.

Tackle the Most Urgent Risks First through Enriched Context 

Once vulnerabilities are identified, the next challenge is prioritizing them. Snowflake consolidates all your security data into a visual format, making it easier to pinpoint which vulnerabilities require immediate attention. This optimizes your remediation process, ensuring that high-priority threats are addressed first, thereby reducing your application's overall risk profile.
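
As a rough illustration of this kind of prioritization, the sketch below orders open findings by severity and then by age. It is only a sketch under stated assumptions: the field names mirror columns used in the queries later in this post (vuln_severity, finding_status, first_seen_at, finding_location_filepath), the sample rows and the 'fixed' status value are hypothetical, and the ranking rule itself is an illustrative choice rather than anything Monad or Snowflake prescribes.

```python
# Lower rank = more urgent. Severity levels follow Semgrep's low/medium/high.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}

# Hypothetical findings; in practice these rows would come from the warehouse.
SAMPLE = [
    {"finding_location_filepath": "app/utils.py", "vuln_severity": "low",
     "finding_status": "open", "first_seen_at": "2023-09-01"},
    {"finding_location_filepath": "app/auth.py", "vuln_severity": "high",
     "finding_status": "open", "first_seen_at": "2023-09-20"},
    {"finding_location_filepath": "app/db.py", "vuln_severity": "high",
     "finding_status": "open", "first_seen_at": "2023-09-05"},
    {"finding_location_filepath": "app/old.py", "vuln_severity": "high",
     "finding_status": "fixed", "first_seen_at": "2023-08-01"},
    {"finding_location_filepath": "app/api.py", "vuln_severity": "medium",
     "finding_status": "open", "first_seen_at": "2023-09-10"},
]


def prioritize(findings):
    """Return open findings, highest severity first, oldest first within a severity."""
    open_findings = [f for f in findings if f["finding_status"] == "open"]
    return sorted(
        open_findings,
        key=lambda f: (
            # Unknown severities sort after all known ones.
            SEVERITY_RANK.get(f["vuln_severity"], len(SEVERITY_RANK)),
            # ISO-8601 date strings sort chronologically as plain strings.
            f["first_seen_at"],
        ),
    )


for finding in prioritize(SAMPLE):
    print(finding["vuln_severity"], finding["first_seen_at"],
          finding["finding_location_filepath"])
```

Sorting oldest-first within a severity level is one possible tie-breaker; a team could just as reasonably break ties by asset criticality or finding count.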

Diverse Use-Cases

Our combined approach, featuring Semgrep, Monad, and Snowflake, unlocks an unmatched level of application security. Below are some use cases it enables:

  1. Comprehensive Vulnerability Scanning: Semgrep's scanning capabilities can pinpoint not just known but emerging vulnerabilities, offering a breadth of coverage that standalone solutions can't match.
  2. In-Depth Codebase Monitoring: Your app’s commit history, combined with Semgrep’s visibility and Snowflake’s analytics capabilities, lets you preemptively flag potential risk factors that could otherwise go unnoticed. For example, if a certain repository experiences a spike in vulnerabilities compared to others in the same timeframe, our approach would immediately identify it.
  3. Real-Time Anomaly Alerts: Monad's robust data synthesis, displayed in Snowflake, enables immediate alerting on unusual developer activities, ensuring you're not just reacting to threats, but proactively countering them.
  4. Unified Data Insights: Snowflake’s data aggregation and visualization features (i.e., Snowsight) convert multifaceted security data into clear, actionable insights, a task that would be fragmented and time-consuming without this integrated setup.
  5. Efficient Remediation: With all your data housed and analyzed in one place, you can execute queries like the ones below to focus your remediation efforts more precisely and promptly.

Unlocking Security Insights with SQL Queries

Identifying Unaddressed High-Severity SQL Injection Risks by Filepath

This query pinpoints unaddressed findings for the Semgrep rule 'python.sqlalchemy.security.audit.avoid-sqlalchemy-text.avoid-sqlalchemy-text', which indicates a risk of SQL Injection. Note that the query selects these findings by rule title and open status; it does not filter on severity directly. SQL Injection is a critical flaw that can result in data breaches, unauthorized access, and data corruption. The output lists each affected FINDING_LOCATION_FILEPATH with the total count of open findings per file path, which helps prioritize remediation efforts for this substantial security risk.

SELECT FINDING_LOCATION_FILEPATH, VULN_TITLE, COUNT(*) AS Open_Findings_Count
FROM mart_sast_finding
WHERE CONNECTOR_TYPE = 'semgrep'
AND FINDING_STATUS = 'open'
AND VULN_TITLE = 'python.sqlalchemy.security.audit.avoid-sqlalchemy-text.avoid-sqlalchemy-text'
GROUP BY FINDING_LOCATION_FILEPATH, VULN_TITLE
ORDER BY Open_Findings_Count DESC;

Query results display the scanned file paths with the highest count of open findings.
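
To make the shape of that output concrete, here is a minimal sketch that runs the same query against an in-memory SQLite table using Python's standard sqlite3 module. SQLite stands in for Snowflake purely for illustration. The sample rows, the file paths, and the 'fixed' status value are hypothetical; only the table name, column names, and rule title come from the query above.

```python
import sqlite3

RULE = "python.sqlalchemy.security.audit.avoid-sqlalchemy-text.avoid-sqlalchemy-text"

# Hypothetical rows: (filepath, rule title, connector, status).
SAMPLE_FINDINGS = [
    ("app/db/users.py", RULE, "semgrep", "open"),
    ("app/db/users.py", RULE, "semgrep", "open"),
    ("app/db/orders.py", RULE, "semgrep", "open"),
    ("app/db/orders.py", RULE, "semgrep", "fixed"),              # not open: excluded
    ("app/api/views.py", "some.other.rule", "semgrep", "open"),  # other rule: excluded
]

# Same query as above, with the rule title passed as a bound parameter.
QUERY = """
SELECT FINDING_LOCATION_FILEPATH, VULN_TITLE, COUNT(*) AS Open_Findings_Count
FROM mart_sast_finding
WHERE CONNECTOR_TYPE = 'semgrep'
AND FINDING_STATUS = 'open'
AND VULN_TITLE = ?
GROUP BY FINDING_LOCATION_FILEPATH, VULN_TITLE
ORDER BY Open_Findings_Count DESC;
"""


def open_findings_by_filepath(rows, rule):
    """Load rows into a throwaway in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mart_sast_finding ("
            "FINDING_LOCATION_FILEPATH TEXT, VULN_TITLE TEXT, "
            "CONNECTOR_TYPE TEXT, FINDING_STATUS TEXT)"
        )
        conn.executemany(
            "INSERT INTO mart_sast_finding VALUES (?, ?, ?, ?)", rows
        )
        return conn.execute(QUERY, (rule,)).fetchall()
    finally:
        conn.close()


results = open_findings_by_filepath(SAMPLE_FINDINGS, RULE)
for filepath, _title, count in results:
    print(f"{filepath}: {count}")
```

With this sample data, the file with two open findings is listed first, followed by the file with one.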

Secret-related open issues by asset and date 

This query counts open secret-related issues per asset and date, which is essential for tracking and managing exposure of sensitive information across different assets over time. The asset_id pinpoints the full file path and the exact line where the vulnerability is. (Note: Issues are identified as secret-related based on the 'VULN_TITLE' starting with 'generic.secrets'.)

-- ASSET_ID is selected as well as grouped, so each row shows where the secret was found.
SELECT DATE(FIRST_SEEN_AT) AS Date, ASSET_ID, VULN_DESCRIPTION, COUNT(*) AS Open_Secret_Issues_Count
FROM mart_sast_finding
WHERE VULN_TITLE LIKE 'generic.secrets%'
AND FINDING_STATUS = 'open'
AND CONNECTOR_TYPE = 'semgrep'
GROUP BY Date, ASSET_ID, VULN_DESCRIPTION
ORDER BY Date, Open_Secret_Issues_Count DESC;

Query results display the type of secret exposure and where it occurred.

[Chart: Number of open secret exposure issues over time]

Identifying Anomalous Spikes in Vulnerabilities

This query counts Semgrep 'high' severity findings by date, serving as a key tool for pinpointing days with unusual surges in vulnerabilities within scanned repositories. (Note: Semgrep categorizes severity as 'low', 'medium', or 'high'.)


-- Daily count of findings that Semgrep rates as 'high' severity.
SELECT DATE(FIRST_SEEN_AT) AS Date, COUNT(*) AS High_Severity_Findings_Count
FROM mart_sast_finding
WHERE vuln_severity = 'high'
AND CONNECTOR_TYPE = 'semgrep'
GROUP BY Date
ORDER BY Date;

[Chart: High severity findings over time]
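
The query returns daily counts; deciding which days count as "unusual" is a separate step. One simple way to do that, sketched below, is to flag days whose count sits more than a chosen number of standard deviations above the mean. This is an illustrative heuristic and not a built-in feature of Monad, Semgrep, or Snowflake. The function name flag_spikes, the default threshold of 2.0, and the sample (date, count) rows are all assumptions made for this example.

```python
from statistics import mean, stdev


def flag_spikes(daily_counts, threshold=2.0):
    """Return the dates whose count is more than `threshold` sample
    standard deviations above the mean of all daily counts."""
    counts = [count for _, count in daily_counts]
    if len(counts) < 2:
        return []  # stdev needs at least two data points
    mu = mean(counts)
    sigma = stdev(counts)
    if sigma == 0:
        return []  # every day has the same count, so nothing stands out
    return [day for day, count in daily_counts if (count - mu) / sigma > threshold]


# Hypothetical (date, high-severity finding count) rows, as the query might return them.
DAILY = [
    ("2023-09-01", 3), ("2023-09-02", 4), ("2023-09-03", 2), ("2023-09-04", 3),
    ("2023-09-05", 4), ("2023-09-06", 3), ("2023-09-07", 25), ("2023-09-08", 3),
]

print(flag_spikes(DAILY))
```

A single large spike also inflates the standard deviation, so on short time windows a median-based threshold may be a more robust choice.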

Onboarding Guide

To get started with setting up your integration, ensure you're logged into your Monad account. If you haven’t joined us yet, it’s the perfect time to dive in with Monad Basic! Once you're in, simply follow these steps to proceed:

  1. Begin by integrating Semgrep as your frontline tool for identifying code vulnerabilities. The complete setup guide can be found in the Semgrep Input documentation.
  2. Configure your Security Data Lake Output. Depending on your chosen platform (Snowflake, Databricks, or Amazon Security Lake), you'll find tailored instructions in the respective documentation: Snowflake Turbo Output, Databricks Output, and Amazon Security Lake Output.

A Paradigm Shift in Application Security

In today's security landscape, where software supply chain security is paramount, a data-driven approach is a key line of defense. Monad plays a pivotal role in this strategy, breaking down silos, extracting vital security findings, normalizing them according to your preferred data model, and delivering crucial insights precisely where needed. Our collaboration with Semgrep and Snowflake, or your preferred data output, shifts the approach from reactive to proactive, providing deep insight into your application's codebase health. The use cases and the three queries outlined above are your gateway to immediate value, and the depth of insights only expands as more solutions integrate with Monad and feed into your data lake. The possibilities span insider threat detection, vulnerability management, risk reporting, and much more.

Ready to elevate your security approach? Start with Monad Basic, free for up to a million rows ingested, and step into a future of enhanced, data-driven security.
