Automated STRIDE Threat Modeling for CI/CD [Deep Dive]
Bottom Line
Treat AI as a classifier and prioritizer, not the primary source of findings. Let code-defined threat models generate deterministic evidence, then use structured AI output to map those findings into STRIDE and enforce release gates.
Key Takeaways
- Use code-defined models so threat analysis runs on every pull request, not once per quarter.
- Generate deterministic findings first, then let AI classify them into STRIDE with a strict schema.
- Gate merges on explicit blocker rules like internet-facing High or Critical findings.
- Keep sensitive payloads out of prompts by masking architecture notes before the AI step.
Most teams treat threat modeling as a workshop artifact, which means it drifts the moment the architecture changes. A better pattern is to commit the model, generate findings on every pull request, and let AI do the last-mile work of mapping deterministic findings into the six STRIDE buckets your team actually triages. This tutorial walks through a practical pipeline using OWASP pytm 1.3.1, GitHub Actions, and a structured AI review step.
- Run threat modeling from code so every pull request gets the same baseline analysis.
- Use pytm to generate findings, then use AI only to classify, summarize, and gate.
- Return structured JSON from the AI layer so CI can make deterministic pass/fail decisions.
- Keep sensitive examples out of prompts by masking them before they leave your repo context.
Prerequisites
The reliable pattern is generator first, AI second: let pytm produce the evidence, then let a structured model map those findings to STRIDE and decide whether the change should block the pipeline.
What you need
- A GitHub repository with GitHub Actions enabled.
- Python 3 locally and on CI runners.
- An OPENAI_API_KEY stored as a GitHub Actions secret.
- A small folder in the repo for threat model code, templates, and generated output.
- If your architecture notes contain real identifiers, scrub them with the Data Masking Tool before they are included in the AI step.
The design choice that matters most is scope. Do not ask an LLM to free-form threat model your system from prose alone. Use the model to classify and prioritize findings that were already generated from a code-defined architecture. That keeps the pipeline explainable and much easier to audit.
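If your notes do need scrubbing, a small masking pass can run before the report ever reaches the classifier. The sketch below is illustrative only, not the Data Masking Tool itself: the patterns and the mask_report helper are hypothetical, and an allowlist-driven masking tool is safer than ad-hoc regexes.
import re
from pathlib import Path

# Illustrative patterns only; extend them to match your own identifier formats.
PATTERNS = {
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"): "<EMAIL>",
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"): "<IP_ADDR>",
    re.compile(r"\b\d{12,19}\b"): "<ACCOUNT_OR_CARD>",
}

def mask_report(text: str) -> str:
    # Replace each matched identifier with a stable placeholder token.
    for pattern, token in PATTERNS.items():
        text = pattern.sub(token, text)
    return text

raw = Path("build/threat-report.md").read_text()
Path("build/threat-report.masked.md").write_text(mask_report(raw))
If you adopt something like this, point the classification step at the masked file instead of the raw report.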
1. Model the System in Code
Step 1: Create the working layout
- Create a dedicated threatmodel/ directory in your repo.
- Install the minimum dependencies for deterministic generation and structured parsing.
- Check both the model file and the report template into source control.
mkdir -p threatmodel build
python -m pip install --upgrade pip
pip install pytm openai pydantic
Step 2: Define the architecture as Python
pytm models assets, data flows, and trust boundaries directly in code. Start with one service that matters, not your whole platform. The example below intentionally keeps a public HTTP path and a not-yet-hardened datastore so the first run produces findings you can test against.
#!/usr/bin/env python3
from pytm.pytm import TM, Boundary, Actor, Server, Datastore, Dataflow, Data, Classification
tm = TM("payments-api")
tm.description = "Checkout API with a public client path, admin path, and SQL datastore."
tm.isOrdered = True
internet = Boundary("Internet")
production = Boundary("Production")
customer = Actor("Customer")
customer.inBoundary = internet
ops_admin = Actor("Ops Admin")
ops_admin.inBoundary = internet
api = Server("Payments API")
api.inBoundary = production
api.OS = "Linux"
api.isHardened = False
api.hasAccessControl = True
db = Datastore("Orders DB")
db.inBoundary = production
db.OS = "PostgreSQL"
db.isSQL = True
db.isHardened = False
order_data = Data(
    name="Order payload",
    description="Order details, addresses, and payment metadata",
    classification=Classification.SENSITIVE,
    isPII=True,
    isStored=True,
    isSourceEncryptedAtRest=False,
    isDestEncryptedAtRest=True,
)
customer_to_api = Dataflow(customer, api, "Checkout request")
customer_to_api.protocol = "HTTP"
customer_to_api.dstPort = 80
customer_to_api.data = order_data
admin_to_api = Dataflow(ops_admin, api, "Admin operations")
admin_to_api.protocol = "HTTPS"
admin_to_api.dstPort = 443
api_to_db = Dataflow(api, db, "Write order record")
api_to_db.protocol = "PostgreSQL"
api_to_db.dstPort = 5432
api_to_db.data = order_data
tm.process()
This is the critical shift-left move: the architecture is now diffable. Security review stops being a slide deck and becomes another artifact your CI can evaluate.
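A nice side effect of the code-defined model: the same file can render a data-flow diagram for reviewers. pytm's --dfd flag emits Graphviz dot, so assuming Graphviz is installed locally you can attach a picture to the review:
python threatmodel/payments_api.py --dfd | dot -Tpng -o build/dfd.png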
Step 3: Add a minimal report template
Use --report for a human-readable artifact and --json for machine-oriented output. A tiny markdown template is enough for the AI layer because it gives the model a stable, compact summary.
# Threat Model Report
## System description
{tm.description}
## Findings
{findings:repeat:* {{item.id}} | {{item.severity}} | {{item.target}} | {{item.description}}
}
2. Classify Findings with AI
Step 4: Generate deterministic findings first
Run pytm before you involve AI. This is the step that creates the evidence set your pipeline can trust.
python threatmodel/payments_api.py --report threatmodel/template.md > build/threat-report.md
python threatmodel/payments_api.py --json build/tm.json
The report gives reviewers readable findings. The JSON artifact is useful later if you want trend reporting, dashboards, or custom policy checks beyond STRIDE.
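For example, a few lines of Python can turn build/tm.json into severity counts for a dashboard. This is a sketch only: the exact JSON layout varies across pytm versions, so it assumes a top-level findings list whose entries carry a severity field.
import json
from collections import Counter
from pathlib import Path

# Assumes pytm's JSON dump exposes a top-level "findings" list;
# adjust the keys to match the layout your pytm version produces.
model = json.loads(Path("build/tm.json").read_text())
counts = Counter(f.get("severity", "Unknown") for f in model.get("findings", []))
for severity, count in sorted(counts.items()):
    print(f"{severity}: {count}")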
Step 5: Map findings into STRIDE with structured output
The AI layer should do three jobs only:
- Map each finding to one STRIDE category.
- Add short remediation guidance tied to the existing evidence.
- Set a boolean field your CI can use for a release gate.
The example below uses the Responses API and Structured Outputs through the Python SDK. The model is gpt-4.1-mini, which supports the v1/responses endpoint and structured output parsing.
import json
from pathlib import Path
from typing import Literal
from openai import OpenAI
from pydantic import BaseModel
client = OpenAI()
class Finding(BaseModel):
    title: str
    threat_id: str | None = None
    stride: Literal[
        "Spoofing",
        "Tampering",
        "Repudiation",
        "Information Disclosure",
        "Denial of Service",
        "Elevation of Privilege",
    ]
    severity: Literal["Critical", "High", "Medium", "Low"]
    evidence: str
    remediation: str
    block_release: bool

class Review(BaseModel):
    summary: str
    findings: list[Finding]

report = Path("build/threat-report.md").read_text()
response = client.responses.parse(
    model="gpt-4.1-mini",
    input=[
        {
            "role": "system",
            "content": (
                "You classify existing threat-model findings into STRIDE. "
                "Do not invent new findings. Map each finding to exactly one STRIDE category. "
                "Set block_release to true only when the finding is High or Critical and "
                "the evidence points to an internet-facing, credential-handling, or sensitive-data path."
            ),
        },
        {"role": "user", "content": report},
    ],
    text_format=Review,
)
review = response.output_parsed
Path("build/stride-review.json").write_text(
    json.dumps(review.model_dump(), indent=2)
)
That final JSON file is the handoff boundary between AI and CI. Once you have it, the rest of the pipeline should be plain code.
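One cheap way to defend that boundary is to re-validate the artifact with the same Pydantic schema before gating, so CI fails loudly if the contract ever drifts. The sketch below assumes you factor the Review model out of classify_stride.py into a shared module, hypothetically named stride_schema.py, so importing it does not trigger an API call.
from pathlib import Path
from pydantic import ValidationError
from stride_schema import Review  # hypothetical shared module holding the Review schema

# Fail fast if the handoff artifact no longer matches the schema.
try:
    Review.model_validate_json(Path("build/stride-review.json").read_text())
except ValidationError as exc:
    raise SystemExit(f"stride-review.json failed schema validation: {exc}")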
3. Wire It into GitHub Actions
Step 6: Add a pull-request workflow
GitHub documents path-based triggers with on.pull_request.paths, workflow-level permissions, and job summaries through GITHUB_STEP_SUMMARY. This workflow runs only when the threat-model files change, installs dependencies, generates findings, classifies them, and fails the job if any blocker is present.
name: stride-threat-model
on:
  pull_request:
    paths:
      - 'threatmodel/**'
      - '.github/workflows/stride-threat-model.yml'
  workflow_dispatch:
permissions:
  contents: read
jobs:
  threat-model:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-python@v6
        with:
          python-version: '3.13'
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install pytm openai pydantic
      - name: Generate threat-model artifacts
        run: |
          mkdir -p build
          python threatmodel/payments_api.py --report threatmodel/template.md > build/threat-report.md
          python threatmodel/payments_api.py --json build/tm.json
      - name: Classify findings into STRIDE
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: python threatmodel/classify_stride.py
      - name: Gate the pull request
        run: |
          python - <<'PY'
          import json
          import sys
          from pathlib import Path
          review = json.loads(Path('build/stride-review.json').read_text())
          blockers = [item for item in review['findings'] if item['block_release']]
          print(f"Blockers: {len(blockers)}")
          for item in blockers:
              print(f"- {item['stride']}: {item['title']}")
          if blockers:
              sys.exit(1)
          PY
      - name: Publish job summary
        if: ${{ always() }}
        run: |
          python - <<'PY' >> "$GITHUB_STEP_SUMMARY"
          import json
          from collections import Counter
          from pathlib import Path
          review = json.loads(Path('build/stride-review.json').read_text())
          counts = Counter(item['stride'] for item in review['findings'])
          print('### STRIDE review')
          print()
          print(review['summary'])
          print()
          for key in [
              'Spoofing',
              'Tampering',
              'Repudiation',
              'Information Disclosure',
              'Denial of Service',
              'Elevation of Privilege',
          ]:
              print(f'- {key}: {counts.get(key, 0)}')
          PY
That workflow is intentionally small. It keeps the CI contract obvious: generate, classify, gate, summarize.
Verify and Troubleshoot
Expected output
A healthy first run should leave three concrete artifacts behind:
- build/threat-report.md with deterministic findings from pytm.
- build/tm.json for machine-oriented downstream analysis.
- build/stride-review.json with AI-normalized STRIDE categories and blocker flags.
Your workflow summary should look roughly like this:
### STRIDE review
3 findings classified from the current threat-model report.
- Spoofing: 0
- Tampering: 1
- Repudiation: 0
- Information Disclosure: 1
- Denial of Service: 0
- Elevation of Privilege: 1
If one or more findings return "block_release": true, the pull request should fail in the gate step. That is the expected enforcement path, not an error in the workflow.
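For reference, a blocking entry inside build/stride-review.json looks roughly like this (illustrative values, not real output):
{
  "title": "Checkout request travels over plain HTTP",
  "threat_id": null,
  "stride": "Information Disclosure",
  "severity": "High",
  "evidence": "Dataflow 'Checkout request' uses HTTP on port 80 and carries PII.",
  "remediation": "Terminate TLS at the edge and redirect port 80 traffic to 443.",
  "block_release": true
}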
Top 3 troubleshooting cases
- No findings are generated. Check that your model ends with tm.process(), that all important elements are in scope, and that the architecture actually exposes risky traits such as plain HTTP, weak hardening, or sensitive data movement.
- The AI step returns schema or parsing errors. Structured Outputs require a supported modern model. If you switched away from gpt-4.1-mini or another current structured-output-capable model, move back to a supported option and keep the schema simple.
- The workflow fails before classification. Verify the OPENAI_API_KEY secret exists, the report file path is correct, and your actions/setup-python@v6 step installs dependencies before the classification script runs.
What's Next
Once the basic pipeline works, the next improvements are straightforward:
- Add --stale_days checks so you can detect when threat-model code has drifted from the implementation it describes.
- Define custom blocker policy by service tier, such as failing only on internet-facing High findings for internal tools but failing on all sensitive-data Critical findings for payment flows; see the sketch after this list.
- Extend the workflow to comment on pull requests, publish weekly trend counts, or feed STRIDE category metrics into your engineering scorecards.
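A minimal sketch of that tiered policy, assuming you maintain your own mapping from findings to service tiers (the tier lookup and policy table here are hypothetical stand-ins, not output from pytm or the AI step):
import json
from pathlib import Path

# Hypothetical policy: which severities block a release per service tier.
BLOCKING_SEVERITIES = {
    "payments": {"Critical", "High"},
    "internal": {"Critical"},
}

def blocks(finding: dict, tier: str) -> bool:
    # Unknown tiers fall back to blocking on Critical only.
    return finding["severity"] in BLOCKING_SEVERITIES.get(tier, {"Critical"})

review = json.loads(Path("build/stride-review.json").read_text())
blockers = [f for f in review["findings"] if blocks(f, tier="payments")]
if blockers:
    raise SystemExit(f"{len(blockers)} findings block this service tier")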
The main idea does not change: deterministic generation gives you consistency, and structured AI output gives you human-friendly prioritization without giving up CI/CD control.
Frequently Asked Questions
Can AI replace manual STRIDE threat modeling in CI/CD?
No. Use AI as a classifier and prioritizer on top of deterministic findings. The code-defined model stays the source of truth, and human reviewers still own the architecture decisions.
Why use OWASP pytm before calling an LLM?
pytm produces a repeatable, auditable evidence set straight from code. Classifying that evidence keeps the AI step explainable and stops the model from inventing findings from prose alone.
How do I fail a pull request only on serious STRIDE findings?
Gate on block_release from the AI step and compute it from explicit policy. For example, fail only when a finding is High or Critical and affects an internet-facing or sensitive-data path.
What if my repository cannot send architecture details to an external model?
Mask real identifiers and sensitive examples before the classification step so only scrubbed report text leaves your repo context, or apply the same structured-output pattern to a model you host yourself.