The Art of Data Alchemy: Unlocking raw data into pure gold for AI and Analytics - Softcover

CAYLA, Benoît

 
9798294846527: The Art of Data Alchemy: Unlocking raw data into pure gold for AI and Analytics

Synopsis

The journey through this book begins with a simple question: what does it mean to prepare data? Like a modern-day alchemist, a data practitioner transforms raw, messy, and often incomplete data into something structured, refined, and valuable, ready to power AI models, analytics dashboards, and strategic decisions.

To begin, we lay the foundation by defining what data preparation really is, why it matters, and how it fits into the broader data lifecycle. From there, we zoom in on the data itself, learning how to explore and understand it before attempting any transformation. This understanding is crucial: it guides everything that follows.

As we move forward, we confront familiar challenges: missing values, inconsistencies, outliers, and errors. These are the impurities in our raw material, and we’ll address them with practical techniques that clean and stabilize the data. This naturally leads us to data transformation, where we reshape, normalize, aggregate, and reformat the data so it’s fit for purpose.

With the basics in place, we venture into data enrichment. Sometimes raw data isn’t enough. We’ll bring in external context, extract insights from text and images, and leverage machine learning techniques to uncover hidden patterns—all to enhance the value and depth of the dataset.

From there, we prepare our datasets for real-world applications. You’ll learn how well-prepared data fuels AI and machine learning workflows, how it supports accurate reporting and compelling dashboards, and how its quality directly affects the success of any downstream use.

In later chapters, we explore cutting-edge tools and trends—from generative AI that automates parts of the preparation process, to visual, no-code tools like Alteryx that make data preparation more accessible. We’ll also scale our efforts to handle large, distributed datasets with technologies like PySpark, and explore the transition from DataPrep to DataOps.

We close by looking ahead: at emerging trends, evolving roles like the citizen developer, and the challenges of real-time and privacy-aware data preparation.

This book is your practical companion. You can follow it from start to finish, or dip into the chapters that speak most directly to your needs. Whatever your path, the goal is the same: to help you turn raw data into gold—data that’s ready for whatever comes next.

Table of contents:
- Introduction to data preparation
- Unveiling the secrets of data
- Data quality challenges
- Techniques for data transformation
- Enriching the dataset
- Preparing data for Machine Learning and AI
- Preparing data for analytics
- Generative AI for Data Preparation
- Visual data preparation
- Data preparation at scale
- Trends and future challenges

The information provided in the "Synopsis" section may refer to another edition of this title.