A guide for developers and technical leads on how to add automated security checks directly into Claude Code workflows – from local IDE to CI/CD pipeline – using Semgrep through MCP as the connection tool
01 Introduction
Agentic engineering has changed how we build software – and, in the process, exposed a critical security weak point. As teams adopt agentic tools like Claude Code, the security review infrastructure, including shift-left frameworks that are built around human-paced development, becomes less reliable. The unit of production is no longer a function but a complete, deployable module generated in seconds. This acceleration has introduced the Verification Tax: a phenomenon where the labor of verification replaces the labor of creation.
BaxBench, a 2025 benchmark study from ETH Zurich published at ICML, evaluated 11 leading LLMs on 392 real-world backend generation tasks across 14 frameworks and six programming languages. No model exceeded 37% correct and secure generation. In every tested model, end-to-end attacks successfully exploited roughly half of the programs that passed all functional tests, demonstrating how important integrated security checks are.
More recently, DryRun Security’s Agentic Coding Security Report (published in March 2026) moved beyond benchmarks to production-representative development. Claude Code (Sonnet 4.6), OpenAI Codex (GPT-5.2), and Google Gemini (2.5 Pro) were each tasked with building two full applications from identical specifications, introducing features via pull requests as a real engineering team would. The results: 143 security issues across 38 scans, with 26 of 30 pull requests (87%) containing at least one vulnerability. Not a single agent produced a fully secure application.
For more than a decade, shift-left security functioned as the industry’s answer to catching vulnerabilities early, moving validation from post-deployment audits to the point of code creation. For human-paced development, it worked. The problem is that the tooling suite built to implement it has not kept up with the evolution of software development.
This article will guide you through a modernized shift-left architecture using Claude Code and Semgrep via MCP.
02 What Is Shift-Left Security & Why It Is Essential
Shift-left security is the principle of moving vulnerability detection from post-deployment audits directly to the point of creation.
In an AI-native workflow, however, “left” has moved further than ever: it now refers to the moment of generation, before code is shipped.
The economics behind it are well-established: a vulnerability caught at the coding stage costs minutes to remediate. The same vulnerability detected in a production incident is far more expensive in engineering time, compliance exposure, and reputational cost.
The Verification Tax necessitates this shift. While AI tools like Claude Code can generate an entire codebase, 64% of development teams report that manually verifying this output takes as long as, or longer than, writing it from scratch. Senior developers are the hardest hit, spending an average of 4.3 minutes reviewing each AI suggestion compared to 1.2 minutes for juniors.
When this verification tax is not addressed, organizations accumulate technical debt at an unsustainable rate. This is measured by the Technical Debt Ratio (TDR):
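A commonly used definition, assumed here, expresses TDR as the estimated cost of remediating known issues relative to the cost of developing the codebase:

```latex
\mathrm{TDR} = \frac{\text{remediation cost}}{\text{development cost}} \times 100\%
```

As a hypothetical illustration, 60 hours of estimated remediation work against 1,000 hours of development effort gives a TDR of 6%.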
A TDR above 5% is a high-risk indicator for systemic collapse. Traditional SAST tools fail at agentic development due to the speed of code generation. They flag issues after the developer has already moved to the next prompt, increasing the cognitive cost of refactoring.
A true shift-left architecture for Claude Code must run within the agent loop itself – in the same terminal session, against the code just produced – before the agent moves on. Using the Semgrep plugin, this setup applies consistent checks for pattern-based vulnerabilities and secrets at generation time. It does not catch logic flaws or zero-days, which remain challenging in AI-native workflows.
The benefits of an “In-Loop” Shift-Left approach include:
This architecture transforms Claude Code from a speed-optimized code generator into a security-aware agent, closing the gap between functional correctness and production readiness.
03 The Integration Architecture
These three components provide the base layer of the scan-on-generate security architecture:
Semgrep is an open-source SAST tool that analyzes code structure to detect vulnerabilities such as injection flaws, secrets, and insecure patterns, with high precision and low latency – fast enough to run inline on every generation event without disrupting the development loop.
The three capabilities that matter for this architecture:
Claude Code handles intent analysis and contextual risk assessment. When Semgrep flags a pattern, Claude evaluates its exploitability and generates a logic-preserving patch.
For example, if Semgrep finds a potential SQL injection path in a newly generated repository layer, Claude can:
The BaxBench study found that when models were given security-specific prompting – explicit instructions to reason about vulnerabilities – the rates of correct and secure generation improved significantly, particularly for reasoning models.
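To make the SQL injection case concrete, here is a minimal sketch of the kind of pattern a rule would flag and the parameterized form a logic-preserving patch would typically take. The table, function names, and data are hypothetical, and Python's standard sqlite3 module is used purely for illustration; none of this is taken from the Semgrep rule set or from actual Claude output.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is interpolated directly into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Patched: same query and return shape, but the value is bound as a parameter.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print(len(find_user_unsafe(conn, payload)))  # the injected condition matches every row
    print(len(find_user_safe(conn, payload)))    # the payload is treated as a literal name
```

The patched function keeps the signature and return shape of the original, which is what "logic-preserving" means in practice: callers do not change, only the way the value reaches the database does.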
The Model Context Protocol provides the standardized interface that enables the loop to be automatic.
With it, Semgrep is registered as a tool in Claude Code’s environment, a post-write hook triggers a scan on every generated file, and findings return as structured context within the same session, before the developer has moved on.
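The loop described above can be sketched as a small post-write hook script. This is an illustration under stated assumptions, not the plugin's actual hook: it assumes the hook receives a JSON payload on stdin with the written file's path under `tool_input.file_path`, that a non-zero exit code surfaces stderr back to the agent, and that Semgrep's `--json` output lists findings under `results` with `check_id`, `path`, `start.line`, and `extra` fields. The function names are hypothetical.

```python
import json
import subprocess
import sys
from typing import Optional


def extract_file_path(hook_payload: dict) -> Optional[str]:
    """Pull the written file's path out of the (assumed) hook payload shape."""
    return hook_payload.get("tool_input", {}).get("file_path")


def summarize_findings(semgrep_output: dict) -> list:
    """Turn Semgrep's JSON results into one short line per finding."""
    lines = []
    for result in semgrep_output.get("results", []):
        extra = result.get("extra", {})
        lines.append(
            f"{result.get('path')}:{result.get('start', {}).get('line')} "
            f"[{extra.get('severity')}] {result.get('check_id')}: {extra.get('message')}"
        )
    return lines


def scan_file(path: str) -> dict:
    """Run Semgrep on a single file and return its parsed JSON output."""
    completed = subprocess.run(
        ["semgrep", "scan", "--config=p/security-audit", "--config=p/secrets",
         "--json", "--quiet", path],
        capture_output=True, text=True, check=False,
    )
    return json.loads(completed.stdout or "{}")


def main() -> int:
    path = extract_file_path(json.load(sys.stdin))
    if not path:
        return 0
    findings = summarize_findings(scan_file(path))
    if findings:
        print("\n".join(findings), file=sys.stderr)
        return 2  # assumed convention: non-zero exit feeds stderr back to the agent
    return 0

# When installed as a hook script, the entry point would be: sys.exit(main())
```

Keeping the payload parsing and the summary formatting separate from the subprocess call makes both easy to test without Semgrep installed.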
The official Semgrep MCP server exposes the following tools to Claude Code:
04 Implementation
The following steps build the full architecture from scratch in Claude Code. Learn more about Semgrep implementation here.
Prerequisites
# Via Homebrew (macOS/Linux)
brew install semgrep
# Via pip
python3 -m pip install semgrep
# Verify installation — must be 1.146.0 or higher
semgrep --version
# Log in to Semgrep to enable Pro rules and Supply Chain analysis
semgrep login
Tip: The authenticated Pro engine enables cross-file and cross-function dataflow reachability, reducing false positives by an additional 25% and increasing true positive detection by 250% compared to the Community Edition running locally without authentication.
# Start a Claude Code instance
claude
# Open the plugin browser
/plugin
# Navigate to Discover, search for Semgrep, and install
# Then run the setup skill — this configures the MCP server and hooks
/setup-semgrep-plugin
This single command installs three components simultaneously:
Alternatively, for enterprise deployments where you want explicit control over the MCP configuration, add the following to your Claude Code settings directly:
{
  "mcpServers": {
    "semgrep": {
      "type": "stdio",
      "command": "uvx",
      "args": ["semgrep-mcp"],
      "env": {
        "SEMGREP_APP_TOKEN": "<your-token>"
      }
    }
  }
}
Enterprise deployment note: Use stdio transport (local, communicates via standard input/output) rather than the remote streamable-http endpoint (mcp.semgrep.ai).
The default Semgrep rule set covers everything, which means it includes too much for inline development. Alert fatigue is the primary reason shift-left implementations fail in practice. The goal is not comprehensive coverage but high-confidence, low-noise coverage of the vulnerability classes that AI-generated code introduces consistently.
A recommended starting configuration for Claude Code workflows:
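# .semgrep.yml: place in project root
# Note: the bare `$X` pattern below is a placeholder that would match far too broadly.
# Replace it with a concrete pattern before enabling this rule.
rules:
  - id: use-security-audit
    patterns:
      - pattern: $X
    message: Security audit
    languages: [python, javascript, typescript, java, go]
    severity: ERROR
# Run with focused rule sets:
# semgrep scan --config=p/security-audit --config=p/secrets --severity=ERROR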
For CI/CD enforcement, use the full suite. For the Claude Code inline loop, restrict to:
To maximize this shift-left approach, use prompts that force Claude to reason over the Semgrep output, for example:
“Scan this module with Semgrep. For any findings, differentiate between true positives and test placeholders. For real vulnerabilities, implement a secure fix that preserves our current architecture.”
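A coarse pre-sort can support a prompt like the one above. The sketch below splits findings by whether the file path looks like test code. The path markers are a hypothetical heuristic, not a Semgrep feature, and a finding in a test file can still be a real leak, so this only orders the review rather than replacing it.

```python
from pathlib import PurePosixPath

# Hypothetical markers for test or fixture directories; adjust to the repository layout.
TEST_DIR_MARKERS = {"tests", "test", "fixtures", "__mocks__"}


def looks_like_test_path(path: str) -> bool:
    """Return True when a parent directory is a test marker or the file is named test_*."""
    parts = PurePosixPath(path).parts
    in_test_dir = any(part in TEST_DIR_MARKERS for part in parts[:-1])
    file_name = parts[-1] if parts else ""
    return in_test_dir or file_name.startswith("test_")


def partition_findings(findings: list) -> tuple:
    """Split findings into (likely_real, likely_placeholder) by file path."""
    likely_real, likely_placeholder = [], []
    for finding in findings:
        if looks_like_test_path(finding["path"]):
            likely_placeholder.append(finding)
        else:
            likely_real.append(finding)
    return likely_real, likely_placeholder
```

Each finding is assumed to be a dictionary with a `path` key, as in Semgrep's JSON output. Matching whole path components avoids false hits on names such as `contest/`.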
The scan-on-generate loop is the developer assist layer. It catches the majority of issues at generation, before a PR is ever opened. It is not the enforcement gate, and it should not be treated as one.
Configure the CI pipeline to run Semgrep independently on every PR and block merges on critical-severity findings:
# .github/workflows/semgrep.yml
name: Semgrep Security Gate
on:
  pull_request:
    branches: [main, develop]
jobs:
  semgrep:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep
        run: |
          pip install semgrep
          # --error makes the step exit non-zero when findings exist, so the check can block the merge
          semgrep scan \
            --config=p/security-audit \
            --config=p/secrets \
            --sarif \
            --output=semgrep.sarif \
            --severity=ERROR \
            --error
      - name: Upload SARIF findings
        # Upload results even when the scan step fails
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif
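Teams that want their own blocking threshold can also evaluate the SARIF file directly. The following is a minimal sketch, assuming the SARIF 2.1.0 layout (`runs[].results[].level`) and assuming that Semgrep's ERROR severity is emitted as the SARIF level `error`. The function names and the default threshold are hypothetical.

```python
import json
from collections import Counter


def count_levels(sarif: dict) -> Counter:
    """Count SARIF results by level across all runs."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Assumption: a result without an explicit level is treated as a warning.
            counts[result.get("level", "warning")] += 1
    return counts


def should_block(sarif: dict, blocking_levels=("error",)) -> bool:
    """Return True when any result has a level the team treats as blocking."""
    counts = count_levels(sarif)
    return any(counts[level] > 0 for level in blocking_levels)


def load_sarif(path: str) -> dict:
    """Read a SARIF file such as the semgrep.sarif produced by the workflow."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A CI step could call `should_block(load_sarif("semgrep.sarif"))` and exit non-zero when it returns True.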
The two layers serve two different functions.
05 What Semgrep Catches
Every security tool has a defined surface area. Here is where this architecture thrives and where it struggles.
06 Measuring the Security Impact
If the integration is not measured, it cannot be optimized.
Engineering leaders must move beyond vanity metrics: raw vulnerability counts without severity weighting tell you nothing about actual risk reduction. The signals that matter are those that demonstrate a measurable reduction in developer toil and security debt.
The traditional security lifecycle suffers from a triage delay where CI/CD findings sit in a backlog for days or weeks. In a properly configured Claude Code + Semgrep loop, detection and remediation happen within the same session, so Mean Time to Remediation (MTTR) should drop to minutes. A rising MTTR is a diagnostic signal: either prompts are ineffective, Claude is generating fixes that require significant manual rework, or the rule set is producing too many false positives for developers to engage with.
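MTTR can be computed from two timestamps per finding. Here is a minimal sketch, assuming each remediated finding is recorded as a (detected_at, fixed_at) pair; how those timestamps are captured is left open, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_remediation(events: list) -> timedelta:
    """Average (fixed_at - detected_at) over a list of (detected_at, fixed_at) pairs."""
    if not events:
        raise ValueError("no remediated findings to average")
    total = sum((fixed - detected for detected, fixed in events), timedelta())
    return total / len(events)


if __name__ == "__main__":
    # Hypothetical data: two findings fixed 4 and 8 minutes after detection.
    t0 = datetime(2026, 1, 5, 10, 0)
    sample = [(t0, t0 + timedelta(minutes=4)), (t0, t0 + timedelta(minutes=8))]
    print(mean_time_to_remediation(sample))
```

Tracking this value per week, rather than as a single lifetime average, makes a rising trend visible early.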
Roughly 40% of the time saved by AI code generation is immediately consumed by reviewing, correcting, and verifying the output. This burden falls disproportionately on senior engineers, who spend high-value time auditing AI-generated code rather than focusing on architecture. The scan-on-generate loop is designed to absorb baseline verification automatically, shifting it from human review to deterministic tooling.
Security debt is a particularly high-interest form of technical debt. Defects caught at generation cost minutes to fix; the same defects discovered post-deployment carry weeks of engineering time, compliance exposure, and incident cost. The scan-on-generate architecture does not add a new security budget line. It reallocates existing remediation costs to the phase where it is much cheaper to absorb.
Establish a baseline over two weeks of instrumentation before activating the local MCP loop, then measure the reduction after activation. This is the cleanest before/after signal for scan-on-generate impact, because it isolates the generation-time intervention from other security program changes. A successful integration shows a corresponding decline in CI gate findings as fewer issues survive the local loop.
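The before/after comparison reduces to two rates and their relative difference. The sketch below normalizes CI gate findings per pull request so that periods with different PR volumes stay comparable. The function names and all numbers are hypothetical.

```python
def findings_per_pr(total_findings: int, pr_count: int) -> float:
    """CI gate findings normalized by the number of pull requests in the period."""
    if pr_count <= 0:
        raise ValueError("pr_count must be positive")
    return total_findings / pr_count


def relative_reduction(baseline_rate: float, current_rate: float) -> float:
    """Fractional drop from the baseline; a negative value means findings went up."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - current_rate) / baseline_rate


if __name__ == "__main__":
    # Hypothetical periods: 40 findings over 20 PRs before, 15 findings over 25 PRs after.
    baseline = findings_per_pr(40, 20)
    current = findings_per_pr(15, 25)
    print(f"{relative_reduction(baseline, current):.0%}")
```

Normalizing per PR matters because agentic workflows tend to change PR volume at the same time as they change finding counts.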
07 Conclusion
The transition to AI-driven development has transformed the economics of software security. The main bottleneck is no longer code generation; it is verification. As AI agents generate sophisticated, production-ready modules at speeds no human review process can match, verification debt accumulates silently – a problem that compounds with every PR merged and every deployment shipped.
The deployment of Claude Code with Semgrep via MCP represents a strategic move towards a verification-first SDLC. Fast, deterministic pattern-matching handles the known vulnerability classes that AI introduces structurally. Claude’s reasoning layer handles the contextual judgment that rules cannot replicate. MCP removes the manual step that breaks down under velocity pressure. Together, they form a defence that addresses both the surface-level risks that static analysis was designed to detect and the logic-level risks that require semantic understanding.
The goal is not to eliminate all risks; this is nearly impossible, as zero-day threats and architectural flaws will always require human judgment. Instead, it is to minimize the cost of security remediation by identifying vulnerabilities early. This approach helps reduce the Mean Time to Remediation (MTTR) from days to minutes, ensuring that the efficiency gains from AI translate into genuine quality improvements rather than an increasing burden of technical debt.