Practical LLM Evaluation for Production Systems: Measure, monitor, and improve reliable LLM systems across training and inference - Softcover

Ammar Mohanna; Indrajit Kar; Feli Ralte

 
9781807423896: Practical LLM Evaluation for Production Systems: Measure, monitor, and improve reliable LLM systems across training and inference

Synopsis

Build reliable LLM-powered systems using practical evaluation frameworks, production metrics, and deployment-ready monitoring strategies.

Key Features

  • Design evaluation frameworks for production-grade LLM systems
  • Measure reliability, safety, latency, and cost across LLM workflows
  • Apply unified evaluation methods to text, multimodal, and agentic AI systems
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Move beyond benchmarks and learn how to evaluate whether LLM-powered systems actually work in production. This book gives you practical frameworks, metrics, and operational strategies to measure reliability, safety, quality, latency, and cost across modern AI systems. Guided by experienced AI leaders and researchers, you’ll build evaluation pipelines that support real business decisions instead of isolated leaderboard scores.

The book takes a product-first approach to evaluation, treating it as a continuous operational capability rather than a one-time testing exercise. You’ll explore how evaluation changes across training, inference, and end-to-end system operation while learning how to connect metrics directly to deployment gates, rollback criteria, monitoring systems, and production reliability goals.

Using practical examples and real-world workflows, the book covers evaluation strategies for text LLMs, vision-language models, multimodal conversational systems, Mixture-of-Experts architectures, agentic systems, reasoning models, Text2SQL and Text2Cypher systems, retrieval pipelines, embedding models, OCR workflows, and guardrail SLMs.

By the end of this book, you’ll be able to design and operate reliable, safe, and cost-effective LLM-powered applications with confidence.

What you will learn

  • Design repeatable evaluation pipelines for LLM systems
  • Measure inference quality, latency, and operational cost
  • Evaluate multimodal, agentic, and reasoning AI systems
  • Build regression gates and deployment evaluation workflows
  • Detect hallucinations and grounding failures in VLMs
  • Assess routing stability in Mixture-of-Experts models
  • Evaluate Text2SQL, OCR, and retrieval-based systems
  • Translate evaluation signals into production decisions

Who this book is for

ML engineers, GenAI engineers, AI architects, data scientists, platform engineers, and engineering managers responsible for deploying LLM-powered systems in production will benefit from this book. Applied AI researchers and technical decision-makers looking to measure reliability, safety, and operational readiness across modern AI systems will also find it valuable. Readers should have a working understanding of machine learning, Python, and modern LLM concepts.

Table of Contents

  1. Foundations of LLM Evaluation: Core Concepts and Primitives
  2. Building Reliable Text Only LLMs Through Training Evaluation
  3. Controlling Text-Only LLM Behavior at Inference Time
  4. Grounding and Reliability in Vision Language Models during Training
  5. Evaluating Visual Grounding and Reliability at Inference Time
  6. Evaluating Multimodal Conversational LLMs Across Training and Inference
  7. Evaluating Routing and Reliability in Mixture of Expert LLM
  8. Evaluating Reliability and Control in Computer Using Agent LLM
  9. Evaluating Information Extraction and Document Understanding LLMs
  10. Evaluating Reasoning LLMs in Depth
  11. Evaluating Specialized LLM Systems

The information provided in the "Synopsis" section may refer to another edition of this title.

About the authors

Ammar Mohanna, PhD, is an AI and machine learning specialist based in Beirut, Lebanon. His work focuses on practical LLM systems, evaluation, MLOps/LLMOps, and applied generative AI. He teaches and consults on production AI, AI agents, and graph-based machine learning, with an emphasis on turning research ideas into reliable, usable systems for real-world teams.



Indrajit Kar has 18 years of experience across industries, leading three divisions: AI consulting, R&D, and solution engineering. He and his team build cutting-edge AI and deep learning solutions to address some of the toughest problems for his customers.

He has 14 research papers and 12 patents in NLP, time series, computer vision, and deep learning.

In his spare time, Indrajit enjoys advising small and medium-sized entrepreneurs on how to enter the AI and data science markets, attract customers, develop their products, and monetize their existing data. He has won many accolades in his career, including ace innovator and services excellence awards and an award naming him among the top 40 data scientists under the age of 40.

He has enabled AI and data science programs for sectors such as smart cities, retail, supply chain, automotive factories, healthcare, pharma, and infrastructure and utilities. He also heads research and development in deep learning, predictive maintenance using IIoT/sensor data, edge AI, lidar technology, NLP, and GPU-powered computer vision.

In the past, he spearheaded complex analytics projects that helped industries such as BFSI, retail, CPG, FMCG, and petroleum/oil and gas make data-driven decisions, predict business outcomes, allocate budgets, predict customer behaviour, retain customers, acquire new customers, maximize revenue, and forecast in key areas such as pricing, marketing, sales, advertising, and promotion.



Zonunfeli Ralte is an Artificial Intelligence entrepreneur, researcher, and technology leader. She founded RastrAI Private Limited, the first AI startup from India's North East region, advancing innovation in emerging technologies. Recognized as Mizoram's first woman specializing in Artificial Intelligence and Machine Learning, she has authored three books on Artificial Intelligence, Generative AI, and Computer Vision.

She is also an accomplished researcher with 16 published research papers and six Best Research Awards, reflecting her significant contributions to Artificial Intelligence, Deep Learning, and applied AI innovation.

The information provided in the "About the book" section may refer to another edition of this title.