AI Inference Optimization Engineering: Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment - Softcover

Book 6 of 20: Production AI Engineering Series

Team, ChatVariety

9798199720021: AI Inference Optimization Engineering: Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

Softcover

ISBN 13: 9798199720021

Publisher: Independently published, 2026

0 Used

5 New

From EUR 13.59

Slash LLM Deployment Costs and Latency

Deploying Large Language Models (LLMs) in production is a massive economic and engineering hurdle. AI Inference Optimization Engineering is your comprehensive, hands-on guide to mastering the full stack of modern LLM optimization techniques. From memory-bandwidth solutions to hardware-specific compilation, this book bridges the gap between research-level models and enterprise-grade execution.

What you will master inside this book:

Hardware-Aware Optimization: Dive deep into KV cache mechanics, autoregressive decoding, and GPU memory hierarchies to eliminate latency bottlenecks.
State-of-the-Art Quantization: Apply GPTQ and AWQ quantization, along with quantized models in the GGUF file format, to scale down massive neural networks without sacrificing model accuracy.
Advanced Acceleration Methods: Implement speculative decoding (with draft models or extra draft heads, as in Medusa and EAGLE), PagedAttention, and FlashAttention to boost throughput by 2-3x.
Production-Grade Serving: Build ultra-low-latency deployment infrastructures using vLLM, Triton Inference Server, and continuous batching.
Cross-Platform Deployment: Optimize models for specific target hardware, including NVIDIA H100 (TensorRT-LLM), Apple Silicon (llama.cpp/Metal), and Qualcomm mobile/edge accelerators.

Whether you are an ML infrastructure engineer, an AI platform architect, or a technical leader looking to scale LLMs cost-effectively, this book provides the production-ready code, equations, and architectural patterns you need to build hyper-efficient AI pipelines.

The information provided in the "Synopsis" section may refer to another edition of this title.

Publisher: Independently published
Publication date: 2026
Language: English
ISBN 13: 9798199720021
Binding: Paperback
Number of pages: 95
Manufacturer contact details: Manufactured by Amazon on behalf of the author
https://www.amazon.fr/hz/contact-us

c/o Amazon Media EU S.à.r.l., 38 Avenue John F. Kennedy
Luxembourg
L-1855
Luxembourg

Search results for AI Inference Optimization Engineering: Quantization,...

AI Inference Optimization Engineering: Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment (Production AI Engineering Series)

Team, ChatVariety

Published by Independently published, 2026

ISBN 13: 9798199720021

New Softcover

Print on demand

Seller: California Books, Miami, FL, United States

Seller rating: 4 out of 5 stars

Condition: New. Print on Demand. Seller ref. no. I-9798199720021

Buy new

EUR 13.59

Free shipping
Domestic shipping: United States

Quantity available: More than 20 available

AI Inference Optimization Engineering

Team, ChatVariety

Published by Independently published, 2026

ISBN 13: 9798199720021

New PAP

Seller: PBShop.store US, Wood Dale, IL, United States

Seller rating: 5 out of 5 stars

PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Seller ref. no. L2-9798199720021

Buy new

EUR 14.14

Free shipping
Domestic shipping: United States

Quantity available: More than 20 available

AI Inference Optimization Engineering

Team, ChatVariety

Published by Independently published, 2026

ISBN 13: 9798199720021

New PAP

Seller: PBShop.store UK, Fairford, GLOS, United Kingdom

Seller rating: 5 out of 5 stars

PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Seller ref. no. L2-9798199720021

Buy new

EUR 13.42

Shipping: EUR 3.85
Shipping from United Kingdom to United States

Quantity available: More than 20 available

AI Inference Optimization Engineering (Paperback)

ChatVariety Team

Published by Independently Published, 2026

ISBN 13: 9798199720021

New Paperback

Print on demand

Seller: CitiRetail, Stevenage, United Kingdom

Seller rating: 5 out of 5 stars

Paperback. Condition: New. This item is printed on demand. Shipping may be from our UK warehouse or from our Australian or US warehouses, depending on stock availability. Seller ref. no. 9798199720021

Buy new

EUR 16.84

Shipping: EUR 43.25
Shipping from United Kingdom to United States

Quantity available: 1 available

AI Inference Optimization Engineering: Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

ChatVariety Team

Published by Independently Published, Jun 2026

ISBN 13: 9798199720021

New Paperback

Seller: AHA-BUCH GmbH, Einbeck, Germany

Seller rating: 5 out of 5 stars

Paperback. Condition: New. New stock. Seller ref. no. 9798199720021

Buy new

EUR 13

Shipping: EUR 60.71
Shipping from Germany to United States

Quantity available: 2 available