Claude
Architecture lens claude-opus-4-8Close — fix two things first.
Top risks
- Concurrent in-flight duplicates. Redis store-on-completion leaves a race: two requests with the same key arriving near-simultaneously both miss the cache and double-charge. You need an atomic reserve (
SET NX) on receipt, with an in-progress sentinel, not store-after-response. - Key collision / weak semantics. A cached response keyed on the header alone ignores the request body. A client reusing a key with a different payload silently gets the old charge — or worse, masks a real second charge. Must bind key → request-fingerprint and return 422 on mismatch.
- Big-bang rollout. “All clients at once next sprint” gives no blast-radius control. Redis is now in the critical payment path — define behavior when Redis is down (fail-open = lost idempotency; fail-closed = outage).
The one change I’d insist on
Atomic SET NX reservation at request entry + persist the response under that key, so the dedup window covers in-flight requests, not just completed ones.
Ask another seat to check
24h TTL vs. client retry/settlement windows — does a legitimate retry after expiry double-charge? Verify against payment-processor reconciliation semantics.
- Under-weighted: Codex’s point that transient 5xx must not be cached as authoritative is sharper than my generic “cache the response.” A 502 stored as the canonical reply would permanently poison the key. And Gemini’s insistence on cache-hit/latency alerting — I named Redis-down behavior but not the observability to detect it.
- Changed my view: I’m dropping any tolerance for big-bang. Gemini + Codex converge on phased rollout, and on a core payments path the blast radius argument is decisive. Rollout sequencing is now a blocker, not a footnote.
- Still dispute: Codex’s overall “ship with changes.” The atomic-claim race and fingerprint binding aren’t tweaks — without them the feature double-charges under the exact concurrency it exists to prevent. That’s not-ready, not ship-with-changes.
- Recommendation: Do not ship — block on atomic claim + sentinel, key ↔ fingerprint binding (422), defined Redis-down behavior, non-caching of transient 5xx, and a phased rollout with latency/hit alerting.