Blast-Radius Analysis for Autonomous CI/CD [2026]
Bottom Line
Treat blast radius as a policy input, not a human judgment call. If your pipeline can score changed services and dependencies before deploy, it can auto-promote low-risk releases and stop high-risk ones without slowing the rest of engineering.
Key Takeaways
- Score deploy risk from the commit diff plus service dependencies, not raw file counts
- Keep thresholds in policy data so platform teams can tune prod gates safely
- Use OPA to decide auto-deploy vs manual approval from one JSON payload
- Run kubectl diff and a server-side dry run before the real apply step
Autonomous delivery only works when the pipeline can explain how far a change can spread before promoting it. A blast-radius gate does that by turning a commit range, a service dependency map, and deployment policy into a single deploy-or-no-deploy decision. In this tutorial, you will build a lightweight gate that scores changed services, blocks risky production rollouts, and keeps low-risk releases fully automatic.
- Compute impact from changed paths plus downstream dependencies.
- Emit one JSON payload that both humans and policy engines can review.
- Let opa eval decide whether production can stay autonomous.
- Validate the final rollout with kubectl diff and kubectl apply --dry-run=server.
Prerequisites
- A monorepo or multi-service repo where path ownership is stable enough to map files to services.
- A CI system that can expose commit SHAs and pass artifacts between jobs. The example below uses GitHub Actions because jobs and needs are well-defined in the official workflow syntax.
- git, python3, opa, and kubectl available on the runner.
- Kubernetes manifests in a predictable directory such as k8s/.
- A policy owner who can define acceptable thresholds for dev, staging, and prod.
Step 1: Model the blast radius
Start with a repo-owned map that answers two questions: which paths belong to which service, and which services pull risk downstream when they change. The first half is direct ownership. The second half is the real blast radius, because a change in shared ingress, platform code, or a payment dependency is usually more dangerous than the number of edited files suggests.
Check in a small ownership graph
{
  "path_rules": [
    {"prefix": "services/payments/", "service": "payments", "tier": "critical", "weight": 5},
    {"prefix": "services/checkout/", "service": "checkout", "tier": "high", "weight": 4},
    {"prefix": "platform/ingress/", "service": "edge", "tier": "critical", "weight": 6},
    {"prefix": "infra/base/", "service": "shared-infra", "tier": "critical", "weight": 7}
  ],
  "dependency_edges": {
    "payments": [],
    "checkout": ["payments"],
    "edge": ["checkout", "payments"],
    "shared-infra": ["edge", "checkout", "payments"]
  },
  "thresholds": {
    "dev": 12,
    "staging": 10,
    "prod": 8
  }
}
Keep this file boring and reviewable. Resist the urge to derive it from half a dozen APIs on day one. If you need to share sample payloads in tickets or postmortems and they include schema names or customer-shaped fields, run them through the Data Masking Tool first so your analysis workflow does not leak sensitive context.
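One way to keep the map honest is a tiny CI check that fails when the file drifts out of shape. The following is a minimal sketch, assuming the schema shown above (path_rules, dependency_edges, thresholds); the validate_service_map helper name is our own, and it returns a list of problems so a CI step can print them all at once instead of stopping at the first one.

```python
import json


def validate_service_map(raw):
    """Return a list of consistency problems found in a service-map document."""
    doc = json.loads(raw)
    problems = []
    owned = {rule["service"] for rule in doc["path_rules"]}
    edges = doc["dependency_edges"]
    # Every owned service should appear in the dependency graph,
    # and every edge should point at a known service.
    for service in sorted(owned):
        if service not in edges:
            problems.append(f"{service} missing from dependency_edges")
    for service, deps in edges.items():
        for dep in deps:
            if dep not in edges:
                problems.append(f"{service} depends on unknown service {dep}")
    # Gates should tighten toward production.
    t = doc["thresholds"]
    if not (t["prod"] <= t["staging"] <= t["dev"]):
        problems.append("prod threshold should be the strictest")
    return problems
```

Run it against ci/service-map.json in a lint job and fail the build when the returned list is non-empty; that makes map updates a reviewed part of every structural change.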
Step 2: Score and annotate risk
The scoring script should do exactly three things: compute a stable merge base with git merge-base, list changed files with git diff --name-only, and translate those files into direct and transitive impact. Emit JSON, because the same payload can feed policy checks, job summaries, and audit logs.
#!/usr/bin/env python3
"""Score the blast radius of a commit range against a service ownership map."""
import argparse
import json
import subprocess
from collections import deque


def git(*args):
    return subprocess.check_output(["git", *args], text=True).strip()


def build_reverse_edges(edges):
    # Invert "service depends on dep" into "a change to dep impacts service".
    reverse = {}
    for service, deps in edges.items():
        reverse.setdefault(service, set())
        for dep in deps:
            reverse.setdefault(dep, set()).add(service)
    return reverse


def expand_impact(direct, reverse_edges):
    # Breadth-first walk from the directly changed services to everything downstream.
    impacted = set(direct)
    queue = deque(direct)
    while queue:
        current = queue.popleft()
        for downstream in reverse_edges.get(current, set()):
            if downstream not in impacted:
                impacted.add(downstream)
                queue.append(downstream)
    return sorted(impacted)


parser = argparse.ArgumentParser()
parser.add_argument("--base-ref", required=True)
parser.add_argument("--head-ref", required=True)
parser.add_argument("--target-env", required=True)
parser.add_argument("--map-file", default="ci/service-map.json")
args = parser.parse_args()

with open(args.map_file, encoding="utf-8") as fh:
    service_map = json.load(fh)

base_sha = git("merge-base", args.base_ref, args.head_ref)
# Resolve the head ref to a SHA so the payload is stable even when CI passes
# a symbolic ref like HEAD or a branch name.
head_sha = git("rev-parse", args.head_ref)
changed = git("diff", "--name-only", base_sha, head_sha).splitlines()

matches = []
direct = set()
raw_score = 0
for rule in service_map["path_rules"]:
    hit_files = [path for path in changed if path.startswith(rule["prefix"])]
    if hit_files:
        direct.add(rule["service"])
        raw_score += rule["weight"]
        matches.append({
            "service": rule["service"],
            "tier": rule["tier"],
            "weight": rule["weight"],
            "files": hit_files,
        })

reverse_edges = build_reverse_edges(service_map["dependency_edges"])
expanded = expand_impact(sorted(direct), reverse_edges)
# Each transitively impacted service adds one point on top of the direct weights.
blast_radius_score = raw_score + max(0, len(expanded) - len(direct))

document = {
    "base_sha": base_sha,
    "head_sha": head_sha,
    "target_env": args.target_env,
    "changed_files": changed,
    "direct_services": sorted(direct),
    "impacted_services": expanded,
    "matches": matches,
    "thresholds": service_map["thresholds"],
    "blast_radius_score": blast_radius_score,
}
print(json.dumps(document, indent=2))
This script is intentionally simple. You can review the generated JSON in a job summary, and if you want cleaner diffs during policy reviews, paste the payload into TechBytes' Code Formatter before checking examples into docs or runbooks.
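To make the scoring concrete, here is a standalone walk-through of the expansion step against the sample dependency_edges from Step 1. The helpers are copies of the ones in the script, and the arithmetic at the end follows the same raw_score plus transitive-count formula.

```python
from collections import deque

# Sample dependency_edges from Step 1: checkout depends on payments, and so on.
edges = {
    "payments": [],
    "checkout": ["payments"],
    "edge": ["checkout", "payments"],
    "shared-infra": ["edge", "checkout", "payments"],
}


def build_reverse_edges(edges):
    # Invert "service depends on dep" into "a change to dep impacts service".
    reverse = {}
    for service, deps in edges.items():
        reverse.setdefault(service, set())
        for dep in deps:
            reverse.setdefault(dep, set()).add(service)
    return reverse


def expand_impact(direct, reverse_edges):
    # Breadth-first walk: everything downstream of a changed service is impacted.
    impacted = set(direct)
    queue = deque(direct)
    while queue:
        current = queue.popleft()
        for downstream in reverse_edges.get(current, set()):
            if downstream not in impacted:
                impacted.add(downstream)
                queue.append(downstream)
    return sorted(impacted)


reverse = build_reverse_edges(edges)
impacted = expand_impact(["payments"], reverse)
print(impacted)  # -> ['checkout', 'edge', 'payments', 'shared-infra']
# With payments' direct weight of 5 and three transitively impacted services,
# the score is raw_score + max(0, len(impacted) - len(direct)) = 5 + 3 = 8.
```

Note that 8 is not below the sample prod threshold of 8, so a payments-only change would already require manual approval under this map; that is the kind of outcome worth reviewing when you tune weights.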
Step 3: Enforce policy in CI and before deploy
Now separate policy from pipeline logic. The pipeline should gather facts; the policy should decide whether a production deploy stays autonomous. That is exactly the kind of boundary opa eval is good at, especially when you keep thresholds and sensitive-service rules in versioned policy files.
Codify approval rules in Rego
package cicd

import rego.v1

default allow := false

allow if {
    input.target_env != "prod"
}

allow if {
    input.target_env == "prod"
    input.blast_radius_score < input.thresholds.prod
    not touches_shared_infra
}

touches_shared_infra if {
    some match in input.matches
    match.service == "shared-infra"
}

deny_reason contains "production score exceeds threshold" if {
    input.target_env == "prod"
    input.blast_radius_score >= input.thresholds.prod
}

deny_reason contains "shared infrastructure changes require review" if {
    touches_shared_infra
}
Wire the gate into GitHub Actions
name: deploy

on:
  pull_request:
  push:
    branches: [main]

jobs:
  analyze:
    runs-on: ubuntu-latest
    outputs:
      decision: ${{ steps.policy.outputs.decision }}
      score: ${{ steps.policy.outputs.score }}
    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0
      - name: Build blast-radius payload
        run: |
          python3 scripts/blast_radius.py \
            --base-ref origin/main \
            --head-ref "${GITHUB_SHA}" \
            --target-env prod \
            > blast-radius.json
      - name: Evaluate policy
        run: |
          opa eval \
            --data policy/blast_radius.rego \
            --input blast-radius.json \
            "data.cicd" > opa-result.json
      - id: policy
        name: Export decision
        run: |
          python3 - <<'PY'
          import json
          import os

          blast = json.load(open("blast-radius.json", encoding="utf-8"))
          result = json.load(open("opa-result.json", encoding="utf-8"))
          value = result["result"][0]["expressions"][0]["value"]
          decision = "auto-deploy" if value["allow"] else "manual-approval"
          with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as fh:
              fh.write(f"score={blast['blast_radius_score']}\n")
              fh.write(f"decision={decision}\n")
          with open(os.environ["GITHUB_STEP_SUMMARY"], "a", encoding="utf-8") as fh:
              fh.write("### Blast radius\n\n")
              fh.write(f"- Score: {blast['blast_radius_score']}\n")
              fh.write(f"- Decision: {decision}\n\n")
              fh.write(json.dumps(blast, indent=2))
              fh.write("\n\n")
          PY
  deploy:
    runs-on: ubuntu-latest
    needs: analyze
    if: ${{ needs.analyze.outputs.decision == 'auto-deploy' }}
    steps:
      - uses: actions/checkout@v5
      - name: Preview cluster change
        run: |
          # kubectl diff exits 1 when differences exist; only treat >1 as failure.
          set +e
          kubectl diff -f k8s/
          status=$?
          if [ "$status" -gt 1 ]; then
            exit "$status"
          fi
      - name: Validate against the API server
        run: kubectl apply --dry-run=server -f k8s/
      - name: Deploy
        run: kubectl apply -f k8s/
The important pattern is the handoff: job outputs carry only the deploy decision and score, while the full artifact stays in JSON. That keeps the gate deterministic and avoids rebuilding risk logic in shell conditionals later.
Verification and expected output
Run the same flow locally before you trust it in production CI. That lets you catch bad path mappings, missing dependencies, and policy mistakes without burning deploy cycles.
- Generate a payload for a known change range.
- Evaluate the payload with opa eval --data and --input.
- Preview the cluster delta with kubectl diff.
- Validate admission and schema checks with kubectl apply --dry-run=server.
python3 scripts/blast_radius.py \
--base-ref origin/main \
--head-ref HEAD \
--target-env prod \
> blast-radius.json
opa eval \
--data policy/blast_radius.rego \
--input blast-radius.json \
--format pretty \
"data.cicd"
kubectl diff -f k8s/
kubectl apply --dry-run=server -f k8s/
Expected signals:
- blast-radius.json lists direct_services, impacted_services, and a single blast_radius_score.
- opa eval returns allow: true for low-risk changes, or allow: false with one or more deny_reason values for risky production changes.
- kubectl diff returns exit code 0 when there are no differences and 1 when differences exist. Treat values greater than 1 as real errors.
- kubectl apply --dry-run=server should succeed before the real apply step ever runs.
Troubleshooting top 3
- Wrong merge base, wrong score. If your checkout is shallow, git merge-base can resolve against incomplete history. Use fetch-depth: 0 in the checkout step so the analyzer sees the real commit graph.
- kubectl diff fails healthy builds. Exit code 1 means differences were found, not that the command is broken. Normalize the step so only exit codes above 1 fail the pipeline.
- Policy drift from architecture drift. If teams move code without updating service-map.json, your score becomes noise. Make map updates part of service creation, decomposition, and ownership transfer reviews.
What's next
- Add runtime context such as service criticality, pager ownership, or recent incident history to the JSON payload.
- Split thresholds by deployment strategy so canaries can auto-promote at a higher score than full rollouts.
- Store approved exceptions as policy data instead of hard-coding branch or team carve-outs in YAML.
- Publish the payload to your engineering portal so incident responders can see why the pipeline allowed or blocked a rollout.
For current command behavior, keep the official references close: GitHub Actions workflow syntax, workflow commands and GITHUB_OUTPUT, OPA CLI docs, kubectl diff, and git merge-base.
Frequently Asked Questions
What is blast-radius analysis in a CI/CD pipeline?
Blast-radius analysis scores how far a change can spread, counting the directly changed services plus everything downstream of them, before the pipeline promotes it. The score feeds a policy engine that turns it into an automatic allow or manual review decision.
How do I calculate deployment blast radius from Git changes?
Use git merge-base to anchor the comparison, then git diff --name-only to collect changed paths. Map those paths to direct service ownership, expand through downstream dependencies, and convert the result into a weighted score that your policy engine can evaluate.
Should kubectl diff block my deployment job?
kubectl diff returns 0 when there are no differences and 1 when differences exist. Treat values above 1 as errors, but do not fail the pipeline just because a change is present.
Why use OPA instead of hard-coded GitHub Actions conditions?
Policy kept as versioned Rego and data is easier to review, tune, and audit than risk logic scattered across workflow conditionals. Platform teams can change production thresholds without editing every pipeline, and deny reasons give responders a recorded explanation for each blocked rollout.