Rubber-Duck Critic Agent for Code Review Workflows

Dillip Chowdary
Tech Entrepreneur & Innovator · June 05, 2026 · 7 min read

Bottom Line

A rubber-duck critic agent works best as a planning gate, not an autonomous coder. Make it return structured objections, missing tests, and next actions before implementation starts.

Key Takeaways

  • Use JSON schema output so the review result can feed CI or PR comments.
  • Keep the agent read-only: it critiques plans, risks, and tests.
  • Block only high-risk gaps; send ordinary ambiguity back as revise.
  • Validate against realistic plans before adding the critic to CI.

A rubber-duck critic agent is a deliberately skeptical reviewer that reads your implementation plan before you write code. Instead of approving broad intentions like "add validation" or "refactor the service," it asks for concrete files, risks, tests, rollback paths, and assumptions. In this tutorial, you will build a small Node.js agent that turns a patch plan into structured review feedback your team can run locally or in CI.

  • Use structured JSON output so the critic is machine-checkable, not just conversational.
  • Score plans on scope, risk, tests, and reversibility before implementation begins.
  • Keep the agent read-only: it critiques plans and patches, but does not edit code.
  • Start with a local CLI, then promote it to CI once the expected output is stable.

Bottom Line

The agent is useful only if it produces a repeatable artifact. Treat the critic as a planning gate: it should return structured objections, severity, and next actions before a developer starts editing.

Prerequisites

  • Node.js installed locally.
  • An OpenAI API key available as OPENAI_API_KEY.
  • A model name supplied through OPENAI_MODEL, so the script does not hard-code a moving target.
  • A repository with a written implementation plan or pull request description.

The design has one strict boundary: the rubber-duck critic does not implement anything. It reviews intent, evidence, and test strategy. That separation matters because a read-only agent is much safer to run against untrusted pull requests, planning documents, or early design notes: whatever the input says, the agent cannot change the repository.

1. Shape the review contract

Start by deciding what the critic is allowed to judge. A useful contract is narrow enough to be consistent and broad enough to catch planning failures. The agent should evaluate:

  • Whether the plan names the files, APIs, migrations, or interfaces likely to change.
  • Whether risk is tied to concrete behavior rather than vague concern.
  • Whether tests cover the changed behavior and at least one failure path.
  • Whether the plan explains how to back out or reduce blast radius.

Create a new folder and install the minimal dependencies:

mkdir duck-critic
cd duck-critic
npm init -y
npm install openai

If you paste code snippets into a shared doc or issue, run them through TechBytes' Code Formatter so indentation survives review comments and ticket systems.

2. Build the CLI agent

Create critic.mjs. The script reads a plan file, sends it to the Responses API with responses.create, and asks for a strict JSON object. The schema keeps the result usable by humans and automation.

import fs from "node:fs/promises";
import process from "node:process";
import OpenAI from "openai";

const planPath = process.argv[2];
if (!planPath) {
  console.error("Usage: node critic.mjs <plan-file>");
  process.exit(1);
}

const model = process.env.OPENAI_MODEL;
if (!model) {
  console.error("Set OPENAI_MODEL to the model your team has approved.");
  process.exit(1);
}

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const plan = await fs.readFile(planPath, "utf8");

const response = await client.responses.create({
  model,
  input: [
    {
      role: "system",
      content: "You are a senior code review planning critic. Be specific, skeptical, and concise. Do not write code."
    },
    {
      role: "user",
      content: `Review this implementation plan before coding starts:\n\n${plan}`
    }
  ],
  text: {
    format: {
      type: "json_schema",
      name: "critic_report",
      strict: true,
      schema: {
        type: "object",
        additionalProperties: false,
        required: ["verdict", "risk_score", "blocking_questions", "missing_tests", "next_actions"],
        properties: {
          verdict: { type: "string", enum: ["approve", "revise", "block"] },
          risk_score: { type: "integer", minimum: 1, maximum: 5 },
          blocking_questions: { type: "array", items: { type: "string" } },
          missing_tests: { type: "array", items: { type: "string" } },
          next_actions: { type: "array", items: { type: "string" } }
        }
      }
    }
  }
});

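// output_text joins the text parts of the response. It can be empty if the
// model refused or the response was cut short, so fail clearly before parsing.
if (!response.output_text) {
  console.error("The model returned no text output; nothing to parse.");
  process.exit(1);
}

const report = JSON.parse(response.output_text);
console.log(JSON.stringify(report, null, 2));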

Watch out: Do not send secrets, customer records, or proprietary dumps as part of a plan. Mask sensitive values first with the Data Masking Tool or your internal redaction workflow.

3. Feed it a real plan

Create a deliberately incomplete planning note so you can see the critic push back.

cat > plan.md <<'EOF'
We will add retry handling to the billing webhook worker.
The worker should retry failed API calls and log better errors.
Implementation should be small and tested.
EOF

Run the critic:

OPENAI_MODEL="your-approved-model" node critic.mjs plan.md

The agent should object because the plan omits concrete retry limits, idempotency rules, duplicate charge protection, and test cases. That is the point. A helpful critic does not reward confident vagueness.

4. Verification and expected output

The exact wording will vary, but the shape should match the schema. A strong first result looks like this:

{
  "verdict": "revise",
  "risk_score": 4,
  "blocking_questions": [
    "Which failures are retryable, and what is the maximum retry window?",
    "How will the worker avoid duplicate billing side effects?"
  ],
  "missing_tests": [
    "Test retry exhaustion without duplicate webhook processing.",
    "Test non-retryable billing API errors."
  ],
  "next_actions": [
    "Define retry policy and idempotency behavior before implementation.",
    "Name the worker file, queue configuration, and observability changes."
  ]
}

Verify three things before trusting the agent in a team workflow:

  • The command exits successfully for valid input and fails clearly when OPENAI_MODEL is missing.
  • The output parses as JSON every time, with no markdown wrapper or commentary.
  • The critic gives different feedback for low-risk UI copy changes than for billing, auth, migrations, or data deletion.
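
The second check can be automated without extra dependencies. The sketch below is one possible validator that mirrors the schema from step 2; `validateReport` and `checkRawOutput` are hypothetical helper names, not part of the OpenAI SDK.

```javascript
// Hypothetical helpers: check that raw critic output matches the step 2 schema.
const VERDICTS = ["approve", "revise", "block"];
const LIST_FIELDS = ["blocking_questions", "missing_tests", "next_actions"];

// Returns a list of problems; an empty list means the report is well-formed.
function validateReport(report) {
  if (report === null || typeof report !== "object" || Array.isArray(report)) {
    return ["report must be a JSON object"];
  }
  const errors = [];
  if (!VERDICTS.includes(report.verdict)) {
    errors.push("verdict must be approve, revise, or block");
  }
  const score = report.risk_score;
  if (!Number.isInteger(score) || score < 1 || score > 5) {
    errors.push("risk_score must be an integer from 1 to 5");
  }
  for (const field of LIST_FIELDS) {
    const value = report[field];
    if (!Array.isArray(value) || !value.every((item) => typeof item === "string")) {
      errors.push(`${field} must be an array of strings`);
    }
  }
  // Mirror additionalProperties: false by rejecting unknown keys.
  const allowed = new Set(["verdict", "risk_score", ...LIST_FIELDS]);
  for (const key of Object.keys(report)) {
    if (!allowed.has(key)) errors.push(`unexpected field: ${key}`);
  }
  return errors;
}

// Parses raw model text first, so prose or a markdown wrapper fails at JSON.parse.
function checkRawOutput(rawText) {
  try {
    return validateReport(JSON.parse(rawText));
  } catch (err) {
    return [`output is not valid JSON: ${err.message}`];
  }
}

const good =
  '{"verdict":"revise","risk_score":4,"blocking_questions":[],"missing_tests":[],"next_actions":[]}';
console.log(checkRawOutput(good)); // prints []
console.log(checkRawOutput("Here is the report: {}")); // prints one JSON parse error
```

Running this over saved outputs from several plans gives a quick signal on whether the shape is stable enough for CI. It does not judge the quality of the feedback, which still needs a human read.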

5. Troubleshooting: top 3 failures

1. The output is not valid JSON

Use text.format with json_schema and parse response.output_text. If you ask for JSON only in prose, you are depending on prompt discipline instead of an API-level output contract.

2. The critic is too generic

Add your team's review rubric to the system message. Include local risk categories such as schema migrations, billing, authentication, background jobs, data retention, and public API compatibility.
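
One way to keep the rubric maintainable is to build the system message from a list instead of editing a long string by hand. This is a minimal sketch: `buildSystemMessage` and `RISK_CATEGORIES` are hypothetical names, and the wording of the appended instruction is an assumption you should adapt.

```javascript
// Hypothetical rubric: replace these categories with your team's own.
const RISK_CATEGORIES = [
  "schema migrations",
  "billing",
  "authentication",
  "background jobs",
  "data retention",
  "public API compatibility"
];

// Appends a numbered rubric to the base instruction used in critic.mjs.
function buildSystemMessage(categories) {
  const base =
    "You are a senior code review planning critic. Be specific, skeptical, and concise. Do not write code.";
  if (categories.length === 0) return base;
  const rubric = categories.map((name, i) => `${i + 1}. ${name}`).join("\n");
  return (
    `${base}\n\n` +
    "Treat plans that touch these areas as higher risk and ask for concrete safeguards:\n" +
    rubric
  );
}

console.log(buildSystemMessage(RISK_CATEGORIES));
```

The returned string can replace the hard-coded system `content` in critic.mjs, which keeps rubric changes reviewable as ordinary diffs.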

3. The agent blocks everything

Tune the verdict rubric, not just the tone. Reserve block for missing safety constraints, irreversible operations, unclear ownership, or no test path. Use revise for normal planning gaps.
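
If block is reserved for the serious gaps listed above, a pipeline can fail on that verdict alone and let revise pass with a notice. The mapping below is a sketch of one such policy, not a required convention; `exitCodeFor` and `summarize` are hypothetical helpers.

```javascript
// Hypothetical CI policy: only "block" fails the job; "revise" and "approve" pass.
function exitCodeFor(report) {
  switch (report.verdict) {
    case "block":
      return 1;
    case "revise":
    case "approve":
      return 0;
    default:
      // Unknown verdicts fail closed with a distinct code so they are easy to spot.
      return 2;
  }
}

// One-line summary suitable for a CI log.
function summarize(report) {
  const count = report.blocking_questions.length;
  return `verdict=${report.verdict} risk=${report.risk_score} blocking_questions=${count}`;
}

const example = {
  verdict: "revise",
  risk_score: 4,
  blocking_questions: ["Which failures are retryable?"]
};
console.log(summarize(example)); // prints verdict=revise risk=4 blocking_questions=1
console.log(exitCodeFor(example)); // prints 0
```

In critic.mjs, setting `process.exitCode = exitCodeFor(report)` after printing the report would apply this policy without cutting off the output.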

What's next

Once the CLI is stable, wire it into the earliest review stage where plans exist. Good integration points include:

  • A pull request template check that runs when PLAN.md changes.
  • A pre-implementation design review bot for risky areas like billing, security, and migrations.
  • A CI job that posts the JSON report as a pull request comment.
  • A dashboard that tracks repeated planning gaps across teams.
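
For the pull request comment idea, the JSON report usually reads better once it is rendered as text. The formatter below is a small sketch under that assumption; `formatComment` and `bulletList` are hypothetical helpers, and actually posting the comment is left to whatever bot or CI integration your team already uses.

```javascript
// Hypothetical formatter: turns a critic report into Markdown for a PR comment.
function bulletList(title, items) {
  if (items.length === 0) return `${title}: none`;
  return `${title}:\n${items.map((item) => `- ${item}`).join("\n")}`;
}

function formatComment(report) {
  return [
    `Rubber-duck critic: ${report.verdict} (risk ${report.risk_score}/5)`,
    bulletList("Blocking questions", report.blocking_questions),
    bulletList("Missing tests", report.missing_tests),
    bulletList("Next actions", report.next_actions)
  ].join("\n\n");
}

const sample = {
  verdict: "revise",
  risk_score: 4,
  blocking_questions: ["How will the worker avoid duplicate billing side effects?"],
  missing_tests: ["Test non-retryable billing API errors."],
  next_actions: []
};
console.log(formatComment(sample));
```

Keeping the formatter separate from the API call means the same report can feed a comment, a dashboard, or a log line without another model request.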

Keep the rubber-duck critic humble. It should not decide architecture alone, but it should make weak assumptions visible early enough that humans can correct them cheaply.

Frequently Asked Questions

What is a rubber-duck critic agent for code review?
It is an AI-assisted reviewer that challenges an implementation plan before code is written. Unlike a coding agent, it should ask targeted questions about risk, files, tests, and rollback instead of making edits.

Should a critic agent run before or after implementation?
Run it before implementation when the plan is still cheap to change. You can also run it after a pull request is opened, but its highest leverage is catching missing constraints before code exists.

Why use structured JSON for an AI review agent?
Structured JSON makes the output parseable by CI, pull request bots, and dashboards. It also forces the agent to separate verdict, risk_score, missing_tests, and next_actions instead of producing a vague paragraph.

Can this replace human code review?
No. It is a planning pressure test, not an ownership or architecture authority. Humans still decide tradeoffs, approve risk, and verify implementation quality.
