How batching, caching, quantization, and speculative decoding changed serving economics.
llm
inference
performance
optimization