There's a ton of jargon here. Summarized...
Why EBS didn't work:
- EBS charges for provisioned capacity, not for what you actually use
- EBS snapshot restores are slow (it's faster to spin up a database from a Postgres backup stored in S3 than from an EBS snapshot in S3)
- EBS only lets you attach 24 volumes per instance
- EBS only lets you resize a volume once every 6–24 hours, and you can't shrink or adjust continuously
- Detaching and reattaching EBS volumes takes anywhere from ~10 seconds for healthy volumes to ~20 minutes for failed ones, so failover takes longer
Why all this matters:
- their AI agents' environments are all ephemeral and snapshot-based, so they'd constantly be destroying and rebuilding EBS volumes
What didn't work:
- local NVMe/bare metal: needs 2–3x the nodes for durability (too expensive), and snapshot restores are too slow
- custom page-server Postgres storage architecture: too complex/expensive to maintain
Their solution:
- copy-on-write (COW) at the block level
- volume operations (create/snapshot/delete) are just metadata changes (sketched after this list)
- storage space is logical (effectively infinite), not bound to physical disk primitives
- multi-tenant by default
- built on versioned, replicated k/v transactions; horizontally scalable
- an independent service layer abstracts blocks into volumes, acts as the security/tenant boundary, and enforces limits
- a user-space block device that pins I/O queues to CPUs and supports zero-copy and resizing; its performance limits come from the underlying Linux primitives (affinity sketch below)
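
To make "volume ops are metadata changes" concrete, here's a minimal sketch of block-level copy-on-write (my illustration, not their code; all names are made up): a volume is just a map from logical block number to an immutable block ID, a snapshot copies only that map, and a write repoints a single entry to a freshly allocated block.

```go
package main

import "fmt"

// blockID names an immutable block in shared storage; block data is never
// mutated in place, so any number of volumes can reference the same block.
type blockID uint64

// volume is pure metadata: logical block number -> blockID.
type volume struct {
	blocks map[uint64]blockID
}

// store hands out fresh block IDs; in a real system this role would fall to
// the replicated, transactional k/v layer mentioned above (assumption).
type store struct{ next blockID }

func (s *store) alloc() blockID { s.next++; return s.next }

// snapshot copies the mapping and shares every block: no data is copied.
func (v *volume) snapshot() *volume {
	clone := &volume{blocks: make(map[uint64]blockID, len(v.blocks))}
	for lba, id := range v.blocks {
		clone.blocks[lba] = id
	}
	return clone
}

// write is copy-on-write: allocate a new block and repoint one entry;
// volumes that still reference the old block are unaffected.
func (v *volume) write(s *store, lba uint64) {
	v.blocks[lba] = s.alloc()
}

func main() {
	s := &store{}
	base := &volume{blocks: map[uint64]blockID{}}
	base.write(s, 0)
	base.write(s, 1)

	snap := base.snapshot() // metadata-only "snapshot", no data copied
	base.write(s, 1)        // base diverges from snap only at block 1

	fmt.Println("base:", base.blocks) // base: map[0:1 1:3]
	fmt.Println("snap:", snap.blocks) // snap: map[0:1 1:2]
}
```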
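And a tiny sketch of the "pin I/O queues to CPUs" idea, assuming Linux and golang.org/x/sys/unix (again my illustration, not their implementation): one queue per CPU, each drained by an OS thread locked to that CPU, so submissions stay cache-local.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"

	"golang.org/x/sys/unix"
)

// pinWorker locks a goroutine to an OS thread and that thread to one CPU,
// then drains its own queue; one queue per CPU avoids cross-core contention.
func pinWorker(cpu int, queue <-chan func(), wg *sync.WaitGroup) {
	defer wg.Done()
	runtime.LockOSThread() // needed so the affinity applies to a stable thread
	var set unix.CPUSet
	set.Set(cpu)
	if err := unix.SchedSetaffinity(0, &set); err != nil { // 0 = current thread
		fmt.Println("affinity failed (non-Linux?):", err)
	}
	for op := range queue {
		op() // a real device would submit/complete block I/O here
	}
}

func main() {
	ncpu := runtime.NumCPU()
	var wg sync.WaitGroup
	queues := make([]chan func(), ncpu)
	for cpu := range queues {
		queues[cpu] = make(chan func(), 128)
		wg.Add(1)
		go pinWorker(cpu, queues[cpu], &wg)
	}
	for i, q := range queues { // fake "I/O" routed to each per-CPU queue
		i := i
		q <- func() { fmt.Printf("op handled on queue %d\n", i) }
		close(q)
	}
	wg.Wait()
}
```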
Performance stats (single volume):
- (latency/IOPS benchmarks: 4 KB blocks; throughput benchmarks: 512 KB blocks)
- read: 110,000 IOPS and 1.375 GB/s (bottlenecked by network bandwidth)
- write: 40,000–67,000 IOPS and 500–700 MB/s, synchronously replicated
- single-block read latency ~1 ms, write latency ~5 ms