How batching, caching, quantization, and speculative decoding changed serving economics.
llm
inference
performance
optimization
Why smaller LLMs became useful for routing, extraction, classification, and edge workflows.
slm
fine-tuning