AI Inference Optimization Engineering

Name: AI Inference Optimization Engineering
Brand: Independently published
SKU: 52770465
Price: 247 CZK
Availability: InStock
Author: ChatVariety Team
ISBN: 9798199720021

Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

ChatVariety Team

Language: English
Format: Paperback
Libristo code: 52770465
Publisher: Independently published, June 2026


Expected in stock: 7 June 2026

Slash LLM Deployment Costs and Latency

Deploying Large Language Models (LLMs) in production is a massive economic and engineering hurdle. AI Inference Optimization Engineering is your comprehensive, hands-on guide to mastering the full stack of modern LLM optimization techniques. From memory-bandwidth solutions to hardware-specific compilation, this book bridges the gap between research-level models and enterprise-grade execution.

What you will master inside this book:

Hardware-Aware Optimization: Dive deep into KV cache mechanics, autoregressive decoding, and GPU memory hierarchies to eliminate latency bottlenecks.
State-of-the-Art Quantization: Apply GPTQ and AWQ quantization, and work with quantized models in the GGUF file format, to shrink large neural networks with minimal loss of model accuracy.
Advanced Acceleration Methods: Implement speculative decoding (with draft models or draft-head approaches such as Medusa and EAGLE), PagedAttention, and FlashAttention to boost throughput by up to 2-3x.
Production-Grade Serving: Build ultra-low-latency deployment infrastructures using vLLM, Triton Inference Server, and continuous batching.
Cross-Platform Deployment: Optimize models for specific target hardware, including NVIDIA H100 (TensorRT-LLM), Apple Silicon (llama.cpp/Metal), and Qualcomm mobile/edge accelerators.
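
To give a feel for why the KV cache and quantization topics listed above matter, here is a minimal back-of-the-envelope sketch of KV cache memory use. It is not taken from the book: the function name `kv_cache_bytes` and the 7B-class model dimensions are illustrative assumptions. The only thing it relies on is that standard attention stores one key vector and one value vector per layer, KV head, and token.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    Hypothetical helper for illustration: the leading 2 accounts for
    storing both keys and values.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch_size


# Assumed 7B-class configuration (illustrative numbers only).
config = dict(n_layers=32, head_dim=128, seq_len=4096)

fp16 = kv_cache_bytes(n_kv_heads=32, bytes_per_elem=2, **config)  # 16-bit cache
int8 = kv_cache_bytes(n_kv_heads=32, bytes_per_elem=1, **config)  # 8-bit cache
gqa = kv_cache_bytes(n_kv_heads=8, bytes_per_elem=2, **config)    # fewer KV heads

for name, size in [("fp16", fp16), ("int8", int8), ("fp16, 8 KV heads", gqa)]:
    print(f"{name}: {size / 2**30:.2f} GiB per 4096-token sequence")
```

Under these assumptions a single 4096-token sequence needs about 2 GiB of cache at 16-bit precision, which is why halving the element width or sharing KV heads translates directly into more concurrent requests per GPU.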

Whether you are an ML infrastructure engineer, an AI platform architect, or a technical leader looking to scale LLMs cost-effectively, this book provides the production-ready code, equations, and architectural patterns you need to build hyper-efficient AI pipelines.

Book information

Full title: AI Inference Optimization Engineering
Author: ChatVariety Team
Language: English
Binding: Paperback
Publication date: 2026
Pages: 96
EAN: 9798199720021
Libristo code: 52770465
Publisher: Independently published
Weight: 142 g
Dimensions: 152 x 229 x 5 mm
Category: Computing and information technology > Computer science > Artificial intelligence > Natural language and machine translation
