Advisory Board · Multi-model review

Payments API idempotency keys

Adversarial plan review by a three-model board before implementation.

Date 2026-06-24 Rounds 2 Board Claude · Codex · Gemini

Final verdict

Do not ship yet — unanimous

Round 1 was two “do not ship” plus one “ship with changes” (Codex). In Round 2, after reading the others, Codex moved to “do not ship yet.” No seat dissents from the verdict.

Plan under review

Add idempotency keys to the payments API. Clients may send an Idempotency-Key header on POST /charges. The server stores the key-to-response mapping in Redis with a 24h TTL; on a duplicate key it returns the cached response.

Keys are optional, and the team intends to enable the feature for all clients at once next sprint.

Board reviews — round by round

Claude

Architecture lens claude-opus-4-8
Round 1 · independent Do not ship yet

Close — fix two things first.

Top risks

  • Concurrent in-flight duplicates. Redis store-on-completion leaves a race: two requests with the same key arriving near-simultaneously both miss the cache and double-charge. You need an atomic reserve (SET NX) on receipt, with an in-progress sentinel, not store-after-response.
  • Key collision / weak semantics. A cached response keyed on the header alone ignores the request body. A client reusing a key with a different payload silently gets the old charge — or worse, masks a real second charge. Must bind key → request-fingerprint and return 422 on mismatch.
  • Big-bang rollout. “All clients at once next sprint” gives no blast-radius control. Redis is now in the critical payment path — define behavior when Redis is down (fail-open = lost idempotency; fail-closed = outage).

The one change I’d insist on

Atomic SET NX reservation at request entry + persist the response under that key, so the dedup window covers in-flight requests, not just completed ones.

Ask another seat to check

24h TTL vs. client retry/settlement windows — does a legitimate retry after expiry double-charge? Verify against payment-processor reconciliation semantics.

Round 2 · rebuttal Do not ship
  1. Under-weighted: Codex’s point that transient 5xx must not be cached as authoritative is sharper than my generic “cache the response.” A 502 stored as the canonical reply would permanently poison the key. And Gemini’s insistence on cache-hit/latency alerting — I named Redis-down behavior but not the observability to detect it.
  2. Changed my view: I’m dropping any tolerance for big-bang. Gemini + Codex converge on phased rollout, and on a core payments path the blast radius argument is decisive. Rollout sequencing is now a blocker, not a footnote.
  3. Still dispute: Codex’s overall “ship with changes.” The atomic-claim race and fingerprint binding aren’t tweaks — without them the feature double-charges under the exact concurrency it exists to prevent. That’s not-ready, not ship-with-changes.
  4. Recommendation: Do not ship — block on atomic claim + sentinel, key ↔ fingerprint binding (422), defined Redis-down behavior, non-caching of transient 5xx, and a phased rollout with latency/hit alerting.

Codex

Implementation / testing lens gpt-5.5
Round 1 · independent Ship with changes

Top risks

  • Caching “response” is underspecified. You must replay status code, body, headers, and error shape consistently, but avoid caching transient 5xx/timeouts as authoritative charge outcomes.
  • No request fingerprinting. Same Idempotency-Key with different amount/customer/card should return a conflict, not a cached unrelated charge.
  • Race handling is missing. Two concurrent POSTs with the same key can both create charges unless Redis uses an atomic claim / in-progress / completed state.

Single change I’d insist on

Implement atomic idempotency state: SET NX claim with request hash, mark completed only after a durable charge result, replay only matching hashes, return conflict on hash mismatch, and define behavior for in-flight retries.

Ask other seats to check

Product / legal / ops should confirm whether 24h retention and optional keys match client retry behavior, dispute/audit needs, and payment provider idempotency guarantees.

Round 2 · rebuttal Do not ship yet
  • Under-weighted: Claude/Gemini are right that Redis-down behavior and rollout mechanics are not secondary details. On a payments write path, “idempotency available unless Redis has issues” is a product/ops contract, not just an implementation edge case.
  • Changed view: I’d move from ship with changes to do not ship yet. The core algorithm changes are clear, but the plan lacks enough failure-mode and rollout specificity to safely schedule as-is.
  • Still dispute: I don’t think “all clients at once next sprint” is inherently impossible if keys are optional and server behavior is backwards compatible. But for payments, it still needs a kill switch, metrics, and staged enablement by route/client cohort before broad reliance.
  • Final recommendation: Do not ship yet — require atomic reservation/in-progress handling, request fingerprint binding, explicit Redis failure policy, replay semantics, observability, and phased rollout before approval.
Verdict change: Codex was the lone “ship with changes” in Round 1. After reading Claude and Gemini, it moved to “do not ship yet,” making the board unanimous.

Gemini

Product / ops lens Gemini frontier (thinking: HIGH)
Round 1 · independent Do not ship yet

Strongest objections & risks

  • Operational blast radius: an “all clients at once” rollout is incredibly risky for a core payments path. A Redis latency spike or configuration error will instantly degrade all transactions.
  • Concurrency / double-charge risk: the plan lacks a locking strategy. If a client sends two identical requests simultaneously, both may bypass the cache before the first response is saved, resulting in catastrophic double-charges for users.
  • Undefined failure modes: there is no operational plan for Redis downtime. If Redis is unreachable, failing closed blocks all payments; failing open silently breaks the idempotency promise.

Single most insisted change

Mandate a phased, measurable rollout (e.g. internal testing → opt-in beta → progressive percentage) backed by strict alerting on Redis latency and cache-hit rates, replacing the proposed big-bang release.

Question for other seats

Ask the architecture/security seat to strictly review the exact Redis locking mechanism (e.g. SETNX) to guarantee that concurrent identical requests are mathematically prevented from creating race conditions.

Round 2 · rebuttal Do not ship
  • Under-weighted: Codex’s point about not caching 5xx errors. From an ops perspective, caching a transient 500 would permanently block a user’s legitimate retry for 24 hours, causing a massive spike in unresolvable support tickets.
  • Changed my view: Claude and Codex’s alignment on request fingerprinting. I focused heavily on concurrency and rollout, but they correctly identified that a client reusing a key for a different payload without triggering a 422 mismatch is a catastrophic product flaw.
  • Still dispute: Codex’s “ship with changes” rating. This plan lacks fundamental architectural safety (atomic locks, fingerprinting) and operational safety (phased rollout). This isn’t a minor tweak; it’s a complete structural redesign of a core payments path. We cannot greenlight this conditionally.
  • Recommendation: Do not ship — mandate atomic SET NX locks, payload fingerprinting, 5xx cache exclusion, and a dark-launched, phased rollout.

Consensus blockers — must fix before ship

  1. Atomic dedup
    SET NX claim with an in-progress sentinel that completes only after a durable charge result. Without it, two concurrent same-key requests both miss the cache and double-charge — the exact failure the feature exists to prevent.
  2. Request-fingerprint binding
    Bind key → hash(request); a reused key with a different payload returns 409/422 conflict, never a replayed or unrelated charge.
  3. Redis failure policy + kill switch
    Redis is now in the critical payment path. Decide fail-open (lost idempotency) vs fail-closed (outage) explicitly, with a kill switch.
  4. Never cache transient failures
    Do not store 5xx/timeouts as the authoritative response; a cached 500 poisons the key and blocks legitimate retries for 24h.
  5. Replay fidelity
    Replay status code, body, and headers consistently.
  6. Phased rollout + observability
    Dark-launch then cohort/percentage ramp; alert on Redis latency and cache-hit rate. Replace the big-bang enablement.

Preserved dissent

Dissent on the record

Codex: “All clients at once” is not inherently impossible given optional keys + backwards-compatible server behavior — but only with a kill switch, metrics, and staged enablement by route/client cohort.

The rollout aggressiveness is debatable; the correctness blockers above are not.

Open questions

Next actions

  1. Spec the atomic state machine + fingerprint.
  2. Decide Redis failure policy + kill switch.
  3. Enumerate non-cacheable status codes.
  4. Design the phased rollout + dashboards.
  5. Then re-review.