Large Language Model Refinement and Infe... vLLM and High-Performance Inference: Memory Optimization, Parallel Execution, Token Streaming, and Scalable Model Servin... Book 2 (Paperback)
Once a language model has been refined, its effectiveness depends on how well it can be delivered in real-world environments. This book examines the systems and techniques that enable efficient inference, with a particular focus on vLLM and the...
$18.50