- Model2Vec Fast Static Text Embeddings
Model2Vec distills a sentence transformer into a fast static lookup table. How it works, where it fits, and where it falls short.
6 min - LLM Inference Became A Systems Problem
How batching, caching, quantization, and speculative decoding changed serving economics.
8 min