Ask HN: How do you handle oncall at 15-30 engineers?

15 hours ago by chw9e

I'm doing research on how oncall breaks down as engineering teams scale, specifically at Series A/B stage companies (roughly 15-30 engineers).

The hypothesis I'm exploring: oncall workflows were designed for small teams where everyone knows the whole codebase. As teams grow, they break because of specialization and context loss, well before incident volume becomes the problem.

Some patterns I've heard so far:

- Oncall engineers spending 10-20% of their time just routing bugs to the right owner

- Bug reports from CS/sales teams missing critical context (no logs, vague repro steps)

- An average of a "couple hours" per bug, mostly spent on investigation rather than fixing

- Session replay tools too expensive to run at meaningful coverage

I'm trying to figure out:

- Is this universal, or just specific to certain tech stacks/org structures?

- What's the actual breaking point - team size, user scale, something else?

- Has anyone solved this well? What worked?

If you're currently running oncall at a growing company, I'd love to hear:

- What percentage of oncall time is triage/routing vs. actual fixing?

- What broke first as you scaled?

- What have you tried?
