Rubber-Duck Critic Agent for Code Review Workflows
Bottom Line
A rubber-duck critic agent works best as a planning gate, not an autonomous coder. Make it return structured objections, missing tests, and next actions before implementation starts.
Key Takeaways
- Use JSON schema output so the review result can feed CI or PR comments.
- Keep the agent read-only: it critiques plans, risks, and tests.
- Block only high-risk gaps; send ordinary ambiguity back as revise.
- Validate against realistic plans before adding the critic to CI.
A rubber-duck critic agent is a deliberately skeptical reviewer that reads your implementation plan before you write code. Instead of approving broad intentions like "add validation" or "refactor the service," it asks for concrete files, risks, tests, rollback paths, and assumptions. In this tutorial, you will build a small Node.js agent that turns a patch plan into structured review feedback your team can run locally or in CI.
- Use structured JSON output so the critic is machine-checkable, not just conversational.
- Score plans on scope, risk, tests, and reversibility before implementation begins.
- Keep the agent read-only: it critiques plans and patches, but does not edit code.
- Start with a local CLI, then promote it to CI once the expected output is stable.
Bottom Line
The agent is useful only if it produces a repeatable artifact. Treat the critic as a planning gate: it should return structured objections, severity, and next actions before a developer starts editing.
Prerequisites
- Node.js installed locally.
- An OpenAI API key available as OPENAI_API_KEY.
- A model name supplied through OPENAI_MODEL, so the script does not hard-code a moving target.
- A repository with a written implementation plan or pull request description.
The design has one strict boundary: the rubber-duck critic does not implement anything. It reviews intent, evidence, and test strategy. That separation matters because a read-only agent is much safer to run against untrusted pull requests, planning documents, or early design notes.
1. Shape the review contract
Start by deciding what the critic is allowed to judge. A useful contract is narrow enough to be consistent and broad enough to catch planning failures. The agent should evaluate:
- Whether the plan names the files, APIs, migrations, or interfaces likely to change.
- Whether risk is tied to concrete behavior rather than vague concern.
- Whether tests cover the changed behavior and at least one failure path.
- Whether the plan explains how to back out or reduce blast radius.
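One way to keep this contract consistent is to store the four criteria as data and render them into text for the system message. The sketch below is illustrative only: `REVIEW_CRITERIA`, its `id` values, and `renderCriteria` are hypothetical names, not part of any library.

```javascript
// Hypothetical rubric data: the ids and wording are illustrative assumptions.
const REVIEW_CRITERIA = [
  { id: "scope", check: "Names the files, APIs, migrations, or interfaces likely to change." },
  { id: "risk", check: "Ties each risk to concrete behavior rather than vague concern." },
  { id: "tests", check: "Covers the changed behavior and at least one failure path with tests." },
  { id: "reversibility", check: "Explains how to back out or reduce blast radius." },
];

// Render the criteria as numbered lines that can be appended to a system message.
function renderCriteria(criteria) {
  return criteria
    .map((criterion, index) => `${index + 1}. [${criterion.id}] ${criterion.check}`)
    .join("\n");
}

console.log(renderCriteria(REVIEW_CRITERIA));
```

Keeping the criteria in one array means the prompt and any later scoring logic can share the same source of truth.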
Create a new folder and install the minimal dependencies:
mkdir duck-critic
cd duck-critic
npm init -y
npm install openai
If you paste code snippets into a shared doc or issue, run them through TechBytes' Code Formatter so indentation survives review comments and ticket systems.
2. Build the CLI agent
Create critic.mjs. The script reads a plan file, sends it to the Responses API with responses.create, and asks for a strict JSON object. The schema keeps the result usable by humans and automation.
import fs from "node:fs/promises";
import process from "node:process";
import OpenAI from "openai";

// The plan file path is the only CLI argument.
const planPath = process.argv[2];
if (!planPath) {
  console.error("Usage: node critic.mjs <plan-file>");
  process.exit(1);
}

// The model name comes from the environment so it is never hard-coded.
const model = process.env.OPENAI_MODEL;
if (!model) {
  console.error("Set OPENAI_MODEL to the model your team has approved.");
  process.exit(1);
}

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const plan = await fs.readFile(planPath, "utf8");

const response = await client.responses.create({
  model,
  input: [
    {
      role: "system",
      content: "You are a senior code review planning critic. Be specific, skeptical, and concise. Do not write code."
    },
    {
      role: "user",
      content: `Review this implementation plan before coding starts:\n\n${plan}`
    }
  ],
  // Ask for a strict JSON object instead of free-form prose.
  text: {
    format: {
      type: "json_schema",
      name: "critic_report",
      strict: true,
      schema: {
        type: "object",
        additionalProperties: false,
        required: ["verdict", "risk_score", "blocking_questions", "missing_tests", "next_actions"],
        properties: {
          verdict: { type: "string", enum: ["approve", "revise", "block"] },
          risk_score: { type: "integer", minimum: 1, maximum: 5 },
          blocking_questions: { type: "array", items: { type: "string" } },
          missing_tests: { type: "array", items: { type: "string" } },
          next_actions: { type: "array", items: { type: "string" } }
        }
      }
    }
  }
});

// output_text holds the JSON string; parse it and pretty-print the report.
const report = JSON.parse(response.output_text);
console.log(JSON.stringify(report, null, 2));
3. Feed it a real plan
Create a deliberately incomplete planning note so you can see the critic push back.
cat > plan.md <<'EOF'
We will add retry handling to the billing webhook worker.
The worker should retry failed API calls and log better errors.
Implementation should be small and tested.
EOF
Run the critic:
OPENAI_MODEL="your-approved-model" node critic.mjs plan.md
The agent should object because the plan omits concrete retry limits, idempotency rules, duplicate charge protection, and test cases. That is the point. A helpful critic does not reward confident vagueness.
4. Verification and expected output
The exact wording will vary, but the shape should match the schema. A strong first result looks like this:
{
  "verdict": "revise",
  "risk_score": 4,
  "blocking_questions": [
    "Which failures are retryable, and what is the maximum retry window?",
    "How will the worker avoid duplicate billing side effects?"
  ],
  "missing_tests": [
    "Test retry exhaustion without duplicate webhook processing.",
    "Test non-retryable billing API errors."
  ],
  "next_actions": [
    "Define retry policy and idempotency behavior before implementation.",
    "Name the worker file, queue configuration, and observability changes."
  ]
}
Verify three things before trusting the agent in a team workflow:
- The command exits successfully for valid input and fails clearly when OPENAI_MODEL is missing.
- The output parses as JSON every time, with no markdown wrapper or commentary.
- The critic gives different feedback for low-risk UI copy changes than for billing, auth, migrations, or data deletion.
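The second check can be automated with a small defensive validator that runs on the raw model output. This is a minimal sketch, assuming the five-field report shape used in this tutorial; `validateReport` and `parseReport` are hypothetical helper names, and the checks mirror the schema rather than replace it.

```javascript
// Hypothetical helpers: they re-check the report shape locally after parsing.
const VERDICTS = ["approve", "revise", "block"];
const LIST_FIELDS = ["blocking_questions", "missing_tests", "next_actions"];

// Returns a list of problems; an empty list means the report matches the contract.
function validateReport(report) {
  if (report === null || typeof report !== "object" || Array.isArray(report)) {
    return ["report is not a JSON object"];
  }
  const problems = [];
  if (!VERDICTS.includes(report.verdict)) {
    problems.push("verdict must be approve, revise, or block");
  }
  if (!Number.isInteger(report.risk_score) || report.risk_score < 1 || report.risk_score > 5) {
    problems.push("risk_score must be an integer from 1 to 5");
  }
  for (const field of LIST_FIELDS) {
    const value = report[field];
    if (!Array.isArray(value) || !value.every((item) => typeof item === "string")) {
      problems.push(`${field} must be an array of strings`);
    }
  }
  return problems;
}

// Parses raw text and throws if it is not clean JSON in the expected shape.
function parseReport(rawText) {
  const report = JSON.parse(rawText); // throws on markdown wrappers or commentary
  const problems = validateReport(report);
  if (problems.length > 0) {
    throw new Error(`Invalid critic report: ${problems.join("; ")}`);
  }
  return report;
}
```

Running a check like this over a handful of saved outputs gives a quick signal on whether the output contract holds before the critic is trusted in a team workflow.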
5. Troubleshooting: top 3 failures
1. The output is not valid JSON
Use text.format with json_schema and parse response.output_text. If you ask for JSON only in prose, you are depending on prompt discipline instead of an API-level output contract.
2. The critic is too generic
Add your team's review rubric to the system message. Include local risk categories such as schema migrations, billing, authentication, background jobs, data retention, and public API compatibility.
3. The agent blocks everything
Tune the verdict rubric, not just the tone. Reserve block for missing safety constraints, irreversible operations, unclear ownership, or no test path. Use revise for normal planning gaps.
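The second and third fixes both come down to putting explicit rules in the system message. The sketch below shows one possible shape: `RISK_CATEGORIES`, `buildVerdictRubric`, and `BASE_PROMPT` are hypothetical names, and the wording of the approve rule is an assumption, since the text above only defines block and revise.

```javascript
// Hypothetical defaults copied from the risk categories listed above.
const RISK_CATEGORIES = [
  "schema migrations",
  "billing",
  "authentication",
  "background jobs",
  "data retention",
  "public API compatibility",
];

// Builds a rubric addendum that spells out when each verdict applies.
function buildVerdictRubric(riskCategories) {
  return [
    `Treat these areas as high risk: ${riskCategories.join(", ")}.`,
    "Verdict rules:",
    "- block: missing safety constraints, irreversible operations, unclear ownership, or no test path.",
    "- revise: normal planning gaps.",
    // Assumption: the approve wording is illustrative, not from the original rubric.
    "- approve: the plan is concrete enough to start implementation.",
  ].join("\n");
}

const BASE_PROMPT =
  "You are a senior code review planning critic. Be specific, skeptical, and concise. Do not write code.";
const systemPrompt = `${BASE_PROMPT}\n\n${buildVerdictRubric(RISK_CATEGORIES)}`;
```

The resulting `systemPrompt` string would replace the `content` of the system message in `critic.mjs`.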
What's next
Once the CLI is stable, wire it into the earliest review stage where plans exist. Good integration points include:
- A pull request template check that runs when PLAN.md changes.
- A pre-implementation design review bot for risky areas like billing, security, and migrations.
- A CI job that posts the JSON report as a pull request comment.
- A dashboard that tracks repeated planning gaps across teams.
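For the CI integration points, the report needs two small translations: an exit code for the job and readable text for the comment. The following is a rough sketch under stated assumptions: `exitCodeFor` and `formatComment` are hypothetical helpers, the policy that only a block verdict fails the job is one possible choice, and the comment assumes the review platform renders Markdown.

```javascript
// Hypothetical policy: only a "block" verdict fails the CI job.
function exitCodeFor(report) {
  return report.verdict === "block" ? 1 : 0;
}

// Turns a critic report into comment text, skipping empty sections.
function formatComment(report) {
  const section = (title, items) =>
    items.length === 0 ? [] : [`**${title}**`, ...items.map((item) => `- ${item}`), ""];
  return [
    `Critic verdict: ${report.verdict} (risk ${report.risk_score}/5)`,
    "",
    ...section("Blocking questions", report.blocking_questions),
    ...section("Missing tests", report.missing_tests),
    ...section("Next actions", report.next_actions),
  ]
    .join("\n")
    .trimEnd();
}
```

In `critic.mjs`, setting `process.exitCode = exitCodeFor(report)` after printing the report would let the CI job fail on a block verdict while still emitting the full JSON.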
Keep the rubber-duck critic humble. It should not decide architecture alone, but it should make weak assumptions visible early enough that humans can correct them cheaply.