As someone building a SaaS product that launches soon, I find outages like this incredibly instructive, though also terrifying.
The "cold start" analogy resonates. As an indie builder, I'm constantly trading off "ship fast" against "what happens when this breaks." The reality is that you can't architect for every failure mode when you're a team of 1-2 people.
But here's what's fascinating about this analysis: the recommendation to apply control theory and circuit breakers isn't just for AWS-scale systems. Even small products benefit from graceful degradation patterns. The difference is that at AWS scale, a metastable failure can cascade for 14 hours. At indie scale, you just... lose customers and never know why.
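To make the circuit breaker idea concrete at indie scale, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: the `CircuitBreaker` class, its `max_failures` and `reset_after` parameters, and the fallback convention are hypothetical names of mine, not something from the analysis or from any particular library.

```python
import time


class CircuitBreaker:
    """Hypothetical minimal circuit breaker; names and thresholds are illustrative."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds while open
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, fallback):
        # While open, skip the dependency entirely until the cool-down elapses.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down over: allow one trial call (half-open state).
            # A single failure from here reopens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()  # degrade gracefully instead of raising
        self.failures = 0
        return result
```

Usage would look something like `breaker.call(fetch_recommendations, lambda: [])`, where `fetch_recommendations` is a stand-in for whatever flaky dependency you have. The point is that a failing dependency gets a breather instead of being hammered with retries, and the user sees a degraded page rather than an error.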
The talent exodus point is also concerning. If even AWS—with their resources and institutional knowledge—can struggle when senior folks leave, what chance do startups have? This is why I'm increasingly convinced that boring, well-documented tech stacks matter more than cutting-edge ones. When you're solo, you need to be able to debug at 2am without digging through undocumented microservice chains.
Question: For those running prod systems, how do you balance "dogfooding" (running on your own infra) vs "use boring, reliable stuff" (like AWS for your control plane)? Is there a middle ground?