Automated STRIDE Threat Modeling for CI/CD [Deep Dive]
Bottom Line
Treat AI as a classifier and prioritizer, not the primary source of findings. Let code-defined threat models generate deterministic evidence, then use structured AI output to map those findings into STRIDE and enforce release gates.
Key Takeaways
- Use code-defined models so threat analysis runs on every pull request, not once per quarter.
- Generate deterministic findings first, then let AI classify them into STRIDE with a strict schema.
- Gate merges on explicit blocker rules like internet-facing High or Critical findings.
- Keep sensitive payloads out of prompts by masking architecture notes before the AI step.
Most teams treat threat modeling as a workshop artifact, which means it drifts the moment the architecture changes. A better pattern is to commit the model, generate findings on every pull request, and let AI do the last-mile work of mapping deterministic findings into the six STRIDE buckets your team actually triages. This tutorial walks through a practical pipeline using OWASP pytm 1.3.1, GitHub Actions, and a structured AI review step.
- Run threat modeling from code so every pull request gets the same baseline analysis.
- Use pytm to generate findings, then use AI only to classify, summarize, and gate.
- Return structured JSON from the AI layer so CI can make deterministic pass/fail decisions.
- Keep sensitive examples out of prompts by masking them before they leave your repo context.
Prerequisites
The reliable pattern is generator first, AI second: let pytm produce the evidence, then let a structured model map those findings to STRIDE and decide whether the change should block the pipeline.
What you need
- A GitHub repository with GitHub Actions enabled.
- Python 3 locally and on CI runners.
- An OPENAI_API_KEY stored as a GitHub Actions secret.
- A small folder in the repo for threat model code, templates, and generated output.
- If your architecture notes contain real identifiers, scrub them with the Data Masking Tool before they are included in the AI step.
The design choice that matters most is scope. Do not ask an LLM to free-form threat model your system from prose alone. Use the model to classify and prioritize findings that were already generated from a code-defined architecture. That keeps the pipeline explainable and much easier to audit.
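If your notes do need scrubbing, a small masking pass can run before the report ever reaches the classifier. The sketch below is illustrative only, not the Data Masking Tool itself: the patterns and the mask_report helper are hypothetical, and an allowlist-driven masking tool is safer than ad-hoc regexes.
import re
from pathlib import Path

# Illustrative patterns only; extend them to match your own identifier formats.
PATTERNS = {
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"): "<EMAIL>",
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"): "<IP_ADDR>",
    re.compile(r"\b\d{12,19}\b"): "<ACCOUNT_OR_CARD>",
}

def mask_report(text: str) -> str:
    # Replace each matched identifier with a stable placeholder token.
    for pattern, token in PATTERNS.items():
        text = pattern.sub(token, text)
    return text

raw = Path("build/threat-report.md").read_text()
Path("build/threat-report.masked.md").write_text(mask_report(raw))
If you adopt something like this, point the classification step at the masked file instead of the raw report.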
1. Model the System in Code
Step 1: Create the working layout
- Create a dedicated threatmodel/ directory in your repo.
- Install the minimum dependencies for deterministic generation and structured parsing.
- Check both the model file and the report template into source control.
mkdir -p threatmodel build
python -m pip install --upgrade pip
pip install pytm openai pydantic
Step 2: Define the architecture as Python
pytm models assets, data flows, and trust boundaries directly in code. Start with one service that matters, not your whole platform. The example below intentionally keeps a public HTTP path and a not-yet-hardened datastore so the first run produces findings you can test against.
#!/usr/bin/env python3
from pytm.pytm import TM, Boundary, Actor, Server, Datastore, Dataflow, Data, Classification
tm = TM("payments-api")
tm.description = "Checkout API with a public client path, admin path, and SQL datastore."
tm.isOrdered = True
internet = Boundary("Internet")
production = Boundary("Production")
customer = Actor("Customer")
customer.inBoundary = internet
ops_admin = Actor("Ops Admin")
ops_admin.inBoundary = internet
api = Server("Payments API")
api.inBoundary = production
api.OS = "Linux"
api.isHardened = False
api.hasAccessControl = True
db = Datastore("Orders DB")
db.inBoundary = production
db.OS = "PostgreSQL"
db.isSQL = True
db.isHardened = False
order_data = Data(
    name="Order payload",
    description="Order details, addresses, and payment metadata",
    classification=Classification.SENSITIVE,
    isPII=True,
    isStored=True,
    isSourceEncryptedAtRest=False,
    isDestEncryptedAtRest=True,
)
customer_to_api = Dataflow(customer, api, "Checkout request")
customer_to_api.protocol = "HTTP"
customer_to_api.dstPort = 80
customer_to_api.data = order_data
admin_to_api = Dataflow(ops_admin, api, "Admin operations")
admin_to_api.protocol = "HTTPS"
admin_to_api.dstPort = 443
api_to_db = Dataflow(api, db, "Write order record")
api_to_db.protocol = "PostgreSQL"
api_to_db.dstPort = 5432
api_to_db.data = order_data
tm.process()
This is the critical shift-left move: the architecture is now diffable. Security review stops being a slide deck and becomes another artifact your CI can evaluate.
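A nice side effect of the code-defined model: the same file can render a data-flow diagram for reviewers. pytm's --dfd flag emits Graphviz dot, so assuming Graphviz is installed locally you can attach a picture to the review:
python threatmodel/payments_api.py --dfd | dot -Tpng -o build/dfd.png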
Step 3: Add a minimal report template
Use --report for a human-readable artifact and --json for machine-oriented output. A tiny markdown template is enough for the AI layer because it gives the model a stable, compact summary.
# Threat Model Report
## System description
{tm.description}
## Findings
{findings:repeat:* {{item.id}} | {{item.severity}} | {{item.target}} | {{item.description}}
}
2. Classify Findings with AI
Step 4: Generate deterministic findings first
Run pytm before you involve AI. This is the step that creates the evidence set your pipeline can trust.
python threatmodel/payments_api.py --report threatmodel/template.md > build/threat-report.md
python threatmodel/payments_api.py --json build/tm.json
The report gives reviewers readable findings. The JSON artifact is useful later if you want trend reporting, dashboards, or custom policy checks beyond STRIDE.
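For example, a few lines of Python can turn build/tm.json into severity counts for a dashboard. This is a sketch only: the exact JSON layout varies across pytm versions, so it assumes a top-level findings list whose entries carry a severity field.
import json
from collections import Counter
from pathlib import Path

# Assumes pytm's JSON dump exposes a top-level "findings" list;
# adjust the keys to match the layout your pytm version produces.
model = json.loads(Path("build/tm.json").read_text())
counts = Counter(f.get("severity", "Unknown") for f in model.get("findings", []))
for severity, count in sorted(counts.items()):
    print(f"{severity}: {count}")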
Step 5: Map findings into STRIDE with structured output
The AI layer should do three jobs only:
- Map each finding to one STRIDE category.
- Add short remediation guidance tied to the existing evidence.
- Set a boolean field your CI can use for a release gate.
The example below uses the Responses API and Structured Outputs through the Python SDK. The model is gpt-4.1-mini, which supports the v1/responses endpoint and structured output parsing.
import json
from pathlib import Path
from typing import Literal
from openai import OpenAI
from pydantic import BaseModel
client = OpenAI()
class Finding(BaseModel):
    title: str
    threat_id: str | None = None
    stride: Literal[
        "Spoofing",
        "Tampering",
        "Repudiation",
        "Information Disclosure",
        "Denial of Service",
        "Elevation of Privilege",
    ]
    severity: Literal["Critical", "High", "Medium", "Low"]
    evidence: str
    remediation: str
    block_release: bool

class Review(BaseModel):
    summary: str
    findings: list[Finding]

report = Path("build/threat-report.md").read_text()
response = client.responses.parse(
    model="gpt-4.1-mini",
    input=[
        {
            "role": "system",
            "content": (
                "You classify existing threat-model findings into STRIDE. "
                "Do not invent new findings. Map each finding to exactly one STRIDE category. "
                "Set block_release to true only when the finding is High or Critical and "
                "the evidence points to an internet-facing, credential-handling, or sensitive-data path."
            ),
        },
        {"role": "user", "content": report},
    ],
    text_format=Review,
)
review = response.output_parsed
Path("build/stride-review.json").write_text(
    json.dumps(review.model_dump(), indent=2)
)
That final JSON file is the handoff boundary between AI and CI. Once you have it, the rest of the pipeline should be plain code.
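One cheap way to defend that boundary is to re-validate the artifact with the same Pydantic schema before gating, so CI fails loudly if the contract ever drifts. The sketch below assumes you factor the Review model out of classify_stride.py into a shared module, hypothetically named stride_schema.py, so importing it does not trigger an API call.
from pathlib import Path
from pydantic import ValidationError
from stride_schema import Review  # hypothetical shared module holding the Review schema

# Fail fast if the handoff artifact no longer matches the schema.
try:
    Review.model_validate_json(Path("build/stride-review.json").read_text())
except ValidationError as exc:
    raise SystemExit(f"stride-review.json failed schema validation: {exc}")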
3. Wire It into GitHub Actions
Step 6: Add a pull-request workflow
GitHub documents path-based triggers with on.pull_request.paths, workflow-level permissions, and job summaries through GITHUB_STEP_SUMMARY. This workflow runs only when the threat-model files change, installs dependencies, generates findings, classifies them, and fails the job if any blocker is present.
name: stride-threat-model
on:
  pull_request:
    paths:
      - 'threatmodel/**'
      - '.github/workflows/stride-threat-model.yml'
  workflow_dispatch:
permissions:
  contents: read
jobs:
  threat-model:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-python@v6
        with:
          python-version: '3.13'
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytm openai pydantic
      - name: Generate threat-model artifacts
        run: |
          mkdir -p build
          python threatmodel/payments_api.py --report threatmodel/template.md > build/threat-report.md
          python threatmodel/payments_api.py --json build/tm.json
      - name: Classify findings into STRIDE
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: python threatmodel/classify_stride.py
      - name: Gate the pull request
        run: |
          python - <<'PY'
          import json
          import sys
          from pathlib import Path
          review = json.loads(Path('build/stride-review.json').read_text())
          blockers = [item for item in review['findings'] if item['block_release']]
          print(f"Blockers: {len(blockers)}")
          for item in blockers:
              print(f"- {item['stride']}: {item['title']}")
          if blockers:
              sys.exit(1)
          PY
      - name: Publish job summary
        if: ${{ always() }}
        run: |
          python - <<'PY' >> "$GITHUB_STEP_SUMMARY"
          import json
          from collections import Counter
          from pathlib import Path
          review = json.loads(Path('build/stride-review.json').read_text())
          counts = Counter(item['stride'] for item in review['findings'])
          print('### STRIDE review')
          print()
          print(review['summary'])
          print()
          for key in [
              'Spoofing',
              'Tampering',
              'Repudiation',
              'Information Disclosure',
              'Denial of Service',
              'Elevation of Privilege',
          ]:
              print(f'- {key}: {counts.get(key, 0)}')
          PY
That workflow is intentionally small. It keeps the CI contract obvious: generate, classify, gate, summarize.
Verify and Troubleshoot
Expected output
A healthy first run should leave three concrete artifacts behind:
- build/threat-report.md with deterministic findings from pytm.
- build/tm.json for machine-oriented downstream analysis.
- build/stride-review.json with AI-normalized STRIDE categories and blocker flags.
Your workflow summary should look roughly like this:
### STRIDE review
3 findings classified from the current threat-model report.
- Spoofing: 0
- Tampering: 1
- Repudiation: 0
- Information Disclosure: 1
- Denial of Service: 0
- Elevation of Privilege: 1
If one or more findings return "block_release": true, the pull request should fail in the gate step. That is the expected enforcement path, not an error in the workflow.
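For reference, a blocking entry inside build/stride-review.json looks roughly like this (illustrative values, not real output):
{
  "title": "Checkout request travels over plain HTTP",
  "threat_id": null,
  "stride": "Information Disclosure",
  "severity": "High",
  "evidence": "Dataflow 'Checkout request' uses HTTP on port 80 and carries PII.",
  "remediation": "Terminate TLS at the edge and redirect port 80 traffic to 443.",
  "block_release": true
}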
Top 3 troubleshooting cases
- No findings are generated. Check that your model ends with tm.process(), that all important elements are in scope, and that the architecture actually exposes risky traits such as plain HTTP, weak hardening, or sensitive data movement.
- The AI step returns schema or parsing errors. Structured Outputs require a supported modern model. If you switched away from gpt-4.1-mini or another current structured-output-capable model, move back to a supported option and keep the schema simple.
- The workflow fails before classification. Verify the OPENAI_API_KEY secret exists, the report file path is correct, and your actions/setup-python@v6 step installs dependencies before the classification script runs.
What's Next
Once the basic pipeline works, the next improvements are straightforward:
- Add --stale_days checks so you can detect when threat-model code has drifted from the implementation it describes.
- Define custom blocker policy by service tier, such as failing only on internet-facing High findings for internal tools but failing on all sensitive-data Critical findings for payment flows; see the sketch after this list.
- Extend the workflow to comment on pull requests, publish weekly trend counts, or feed STRIDE category metrics into your engineering scorecards.
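A minimal sketch of that tiered policy, assuming you maintain your own mapping from findings to service tiers (the tier lookup and policy table here are hypothetical stand-ins, not output from pytm or the AI step):
import json
from pathlib import Path

# Hypothetical policy: which severities block a release per service tier.
BLOCKING_SEVERITIES = {
    "payments": {"Critical", "High"},
    "internal": {"Critical"},
}

def blocks(finding: dict, tier: str) -> bool:
    # Unknown tiers fall back to blocking on Critical only.
    return finding["severity"] in BLOCKING_SEVERITIES.get(tier, {"Critical"})

review = json.loads(Path("build/stride-review.json").read_text())
blockers = [f for f in review["findings"] if blocks(f, tier="payments")]
if blockers:
    raise SystemExit(f"{len(blockers)} findings block this service tier")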
The main idea does not change: deterministic generation gives you consistency, and structured AI output gives you human-friendly prioritization without giving up CI/CD control.
Frequently Asked Questions
Can AI replace manual STRIDE threat modeling in CI/CD?
No. Use AI as a classifier and prioritizer on top of deterministic findings. The code-defined model stays the source of truth, and human reviewers still own the architecture decisions.
Why use OWASP pytm before calling an LLM?
pytm produces a repeatable, auditable evidence set straight from code. Classifying that evidence keeps the AI step explainable and stops the model from inventing findings from prose alone.
How do I fail a pull request only on serious STRIDE findings?
Gate on block_release from the AI step and compute it from explicit policy. For example, fail only when a finding is High or Critical and affects an internet-facing or sensitive-data path.
What if my repository cannot send architecture details to an external model?
Mask real identifiers and sensitive examples before the classification step so only scrubbed report text leaves your repo context, or apply the same structured-output pattern to a model you host yourself.