Great write-up. We use vLLM's KV cache and continuous batching as the foundation for request handling in ScalarLM, and we add further batching optimizations via a centralized queue and explicit batching support in our client.
https://www.scalarlm.com
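For anyone curious what explicit client-side batching can look like, here's a minimal sketch (not the ScalarLM client; the endpoint and model name are placeholders). It coalesces several prompts into a single request to a vLLM OpenAI-compatible server, so the server's continuous batcher sees them in the same scheduling window instead of as N separate HTTP calls:

```python
# Minimal sketch of client-side explicit batching against a vLLM
# OpenAI-compatible server. Illustrative only: endpoint URL and model
# name are placeholders, and this is not the ScalarLM client API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

prompts = [
    "Summarize continuous batching in one sentence.",
    "Summarize paged KV cache in one sentence.",
    "Summarize speculative decoding in one sentence.",
]

# One batched request instead of N independent ones: fewer HTTP round
# trips, and all sequences arrive at the scheduler together.
resp = client.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    prompt=prompts,  # the completions endpoint accepts a list of prompts
    max_tokens=64,
)

for choice in sorted(resp.choices, key=lambda c: c.index):
    print(choice.index, choice.text.strip())
```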
There is more perf you can squeeze out of vLLM.