Optimizing LLM Performance: Framework-Agnostic Techniques for Speed, Scalability, and Cost-Efficient Inference Across PyTorch, ONNX, vLLM, and More - Softcover

Poisson, Peter E.

ISBN 13: 9798294338459

Publisher: Independently published, 2025

2 used from EUR 18.59

6 new from EUR 19

Are you struggling to scale your large language models (LLMs) without breaking the bank or sacrificing latency? This book offers a clear roadmap to optimize inference, reduce costs, and scale seamlessly across platforms like PyTorch, ONNX, vLLM, and more.

Optimizing LLM Performance is your hands-on guide to boosting the efficiency of large language models in production environments. Whether you’re building chatbots, document summarizers, or enterprise AI tools, this book teaches proven methods to accelerate inference while maintaining accuracy. It dives deep into hardware-aware optimizations, quantization, model pruning, compiler acceleration, and memory-efficient runtime strategies without locking you into any single framework.

Written with clarity and real-world use in mind, the book features practical case studies, side-by-side performance comparisons, and up-to-date techniques from the cutting edge of AI deployment. If you're building, serving, or scaling LLMs in 2025, this is the performance engineering guide you've been waiting for.

Key Features:
• Framework-agnostic optimization techniques using PyTorch, ONNX Runtime, vLLM, llama.cpp, and more
• Deep dive into quantization (INT8/4-bit), distillation, pruning, and KV caching
• Hands-on examples with FastAPI, Hugging Face Transformers, and serverless deployment
• Covers performance profiling, streaming, batching, and cost-efficient scaling
• Future-proof insights on compiler-aware models, LoRA 2.0, and edge inference

Ready to build LLM systems that are faster, cheaper, and more scalable?
Grab your copy of Optimizing LLM Performance today and deploy smarter.

The information provided in the "Synopsis" section may refer to another edition of this title.

Publisher: Independently published
Publication date: 2025
Language: English
ISBN 13: 9798294338459
Binding: Paperback
Number of pages: 163
Manufacturer contact details: not available
Responsible person: not available

Buy used

Condition: Like new

Unread book in perfect condition...

EUR 18.59

Shipping: EUR 2.29
Ships within the United States

Buy new

EUR 19

Shipping: EUR 2.29
Ships within the United States

Search results for Optimizing LLM Performance: Framework-Agnostic Techniques...

Optimizing LLM Performance: Framework-Agnostic Techniques for Speed, Scalability, and Cost-Efficient Inference Across PyTorch, ONNX, vLLM, and More

Poisson, Peter E.

Published by Independently published, 2025

ISBN 13: 9798294338459

Used, softcover

Seller: GreatBookPrices, Columbia, MD, United States

Seller rating: 5 out of 5 stars

Condition: As New. Unread book in perfect condition. Seller reference no. 50955172

Buy used

EUR 18.59

Shipping: EUR 2.29
Ships within the United States

Quantity available: More than 20

Optimizing LLM Performance: Framework-Agnostic Techniques for Speed, Scalability, and Cost-Efficient Inference Across PyTorch, ONNX, vLLM, and More

Poisson, Peter E.

Published by Independently published, 2025

ISBN 13: 9798294338459

New, softcover

Seller: GreatBookPrices, Columbia, MD, United States

Seller rating: 5 out of 5 stars

Condition: New. Seller reference no. 50955172-n

Buy new

EUR 19

Shipping: EUR 2.29
Ships within the United States

Quantity available: More than 20

Optimizing LLM Performance

Poisson, Peter E.

Published by Independently Published, 2025

ISBN 13: 9798294338459

New, PAP

Seller: PBShop.store US, Wood Dale, IL, United States

Seller rating: 5 out of 5 stars

PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Seller reference no. L2-9798294338459

Buy new

EUR 21.37

Free shipping
Ships within the United States

Quantity available: More than 20

Optimizing LLM Performance (Paperback)

Peter E. Poisson

Published by Independently Published, 2025

ISBN 13: 9798294338459

New, paperback

Print on demand

Seller: Grand Eagle Retail, Bensenville, IL, United States

Seller rating: 5 out of 5 stars

Paperback. Condition: New. This item is printed on demand. Shipping may be from multiple locations in the US or from the UK, depending on stock availability. Seller reference no. 9798294338459

Buy new

EUR 21.94

Free shipping
Ships within the United States

Quantity available: 1

Optimizing LLM Performance

Poisson, Peter E.

Published by Independently Published, 2025

ISBN 13: 9798294338459

New, PAP

Seller: PBShop.store UK, Fairford, GLOS, United Kingdom

Seller rating: 4 out of 5 stars

PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Seller reference no. L2-9798294338459

Buy new

EUR 19.20

Shipping: EUR 4.82
Ships from the United Kingdom to the United States

Quantity available: More than 20

Optimizing LLM Performance: Framework-Agnostic Techniques for Speed, Scalability, and Cost-Efficient Inference Across PyTorch, ONNX, vLLM, and More

Poisson, Peter E.

Published by Independently published, 2025

ISBN 13: 9798294338459

New, softcover

Seller: GreatBookPricesUK, Woodford Green, United Kingdom

Seller rating: 5 out of 5 stars

Condition: New. Seller reference no. 50955172-n

Buy new

EUR 19.19

Shipping: EUR 17.38
Ships from the United Kingdom to the United States

Quantity available: More than 20

Optimizing LLM Performance: Framework-Agnostic Techniques for Speed, Scalability, and Cost-Efficient Inference Across PyTorch, ONNX, vLLM, and More

Poisson, Peter E.

Published by Independently published, 2025

ISBN 13: 9798294338459

Used, softcover

Seller: GreatBookPricesUK, Woodford Green, United Kingdom

Seller rating: 5 out of 5 stars

Condition: As New. Unread book in perfect condition. Seller reference no. 50955172

Buy used

EUR 20.83

Shipping: EUR 17.38
Ships from the United Kingdom to the United States

Quantity available: More than 20

Optimizing LLM Performance (Paperback)

Peter E. Poisson

Published by Independently Published, 2025

ISBN 13: 9798294338459

New, paperback

Print on demand

Seller: CitiRetail, Stevenage, United Kingdom

Seller rating: 5 out of 5 stars

Paperback. Condition: New. This item is printed on demand. Shipping may be from our UK warehouse or from our Australian or US warehouses, depending on stock availability. Seller reference no. 9798294338459

Buy new

EUR 23.26

Shipping: EUR 42.88
Ships from the United Kingdom to the United States

Quantity available: 1