TL;DR — Anthropic Postmortem of Three Recent Issues
In Aug–Sep 2025, some Claude users saw intermittently degraded output quality. The cause was three infrastructure bugs, not intentional changes to the models.
The Three Issues
1. *Context window routing error*
- Some short-context requests were misrouted to servers configured for long-context requests.
- Initially affected a small share of requests, then worsened after a load-balancing change.
2. *Output corruption*
- A misconfiguration on TPU servers led to unexpected outputs (text in the wrong language, obvious syntax errors in code).
- A runtime optimization occasionally assigned high probability to tokens that should rarely be produced.
3. *Approximate top-k miscompilation*
- A compiler bug in the XLA:TPU stack corrupted token selection during sampling.
- The approximate top-k operation occasionally dropped the highest-probability token.
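To make the third issue concrete, here is a minimal sketch of the invariant that was violated: a top-k selection should always contain the single most probable token. The function `faulty_approx_top_k` is a hypothetical stand-in written for illustration only. It is not the actual XLA:TPU behavior, which the summary does not describe in detail.

```python
import heapq

def exact_top_k(probs, k):
    """Return indices of the k highest-probability tokens, best first."""
    return heapq.nlargest(k, range(len(probs)), key=lambda i: probs[i])

def faulty_approx_top_k(probs, k):
    """Hypothetical buggy top-k (illustration only, not the real bug).

    It never considers index 0, so it silently loses the best token
    whenever that token happens to sit at index 0.
    """
    candidates = range(1, len(probs))
    return heapq.nlargest(k, candidates, key=lambda i: probs[i])

def keeps_top_token(probs, selected):
    """True if the most probable token is among the selected indices."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best in selected

# Toy distribution over five tokens; token 0 is clearly the best choice.
probs = [0.50, 0.20, 0.15, 0.10, 0.05]
exact = exact_top_k(probs, 3)
approx = faulty_approx_top_k(probs, 3)

print("exact keeps top token: ", keeps_top_token(probs, exact))
print("faulty keeps top token:", keeps_top_token(probs, approx))
```

A check like `keeps_top_token` is cheap to run against any approximate selection routine, since the exact argmax is easy to compute independently.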
Why It Was Hard to Detect
- Bugs were subtle, intermittent, and platform-dependent.
- Existing benchmarks and evals did not capture these degradations.
- Privacy and security controls limited engineers' access to real user interactions for debugging.
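One way to see why an intermittent bug can slip past a benchmark is a back-of-the-envelope calculation, sketched below. It assumes each request independently hits the fault with a fixed probability; the fault rate and sample count are hypothetical values chosen for illustration, not figures from the postmortem.

```python
def chance_of_missing(fault_rate, num_samples):
    """Probability that none of num_samples independent requests hits the fault."""
    return (1.0 - fault_rate) ** num_samples

def expected_affected(fault_rate, num_samples):
    """Expected number of requests in the sample that hit the fault."""
    return fault_rate * num_samples

# Hypothetical numbers: a fault touching 1% of requests, a 100-prompt eval run.
fault_rate = 0.01
num_samples = 100

print("chance the run sees no faulty request:", chance_of_missing(fault_rate, num_samples))
print("expected faulty requests in the run:  ", expected_affected(fault_rate, num_samples))
```

Even when a few faulty requests do land in the sample, they move an aggregate score only slightly, which can be hard to tell apart from ordinary run-to-run noise.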
Fixes and Next Steps
- More sensitive evaluations, run continuously on production systems.
- Better tooling to debug user feedback without compromising user privacy.
- Stronger validation of routing, output correctness, and token selection.
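As an illustration of what routing validation could look like, the sketch below audits a log of routed requests and flags any whose server pool does not match their prompt length. Everything here is an assumption made for the example: the `RoutedRequest` record, the pool names, and the `LONG_CONTEXT_THRESHOLD` cutoff are hypothetical and do not describe Anthropic's actual systems.

```python
from dataclasses import dataclass

# Hypothetical cutoff (in tokens) above which a request needs the long-context pool.
LONG_CONTEXT_THRESHOLD = 200_000

@dataclass
class RoutedRequest:
    request_id: str
    prompt_tokens: int
    pool: str  # "standard" or "long_context" (hypothetical pool names)

def expected_pool(prompt_tokens):
    """Pool a request should land in, given its prompt length."""
    return "long_context" if prompt_tokens > LONG_CONTEXT_THRESHOLD else "standard"

def misrouted(requests):
    """Requests whose actual pool differs from the expected one."""
    return [r for r in requests if r.pool != expected_pool(r.prompt_tokens)]

def misroute_rate(requests):
    """Fraction of requests that were misrouted (0.0 for an empty log)."""
    if not requests:
        return 0.0
    return len(misrouted(requests)) / len(requests)

# Toy routing log: request "b" is short but was sent to the long-context pool.
log = [
    RoutedRequest("a", 1_200, "standard"),
    RoutedRequest("b", 3_000, "long_context"),
    RoutedRequest("c", 450_000, "long_context"),
    RoutedRequest("d", 800, "standard"),
]

print("misrouted ids:", [r.request_id for r in misrouted(log)])
print("misroute rate:", misroute_rate(log))
```

Tracking a rate like this over time would surface both the initial small fraction of misrouted requests and any jump after a load-balancing change.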