A comprehensive guide for data scientists to master effective data cleaning tools and techniques
In data science, data analysis, or machine learning, most of the effort needed to achieve your actual purpose lies in cleaning your data. Using Python, R, and command-line tools, you will learn the essential cleaning steps performed in every production data science or data analysis pipeline. This book not only teaches you data preparation but also what questions you should ask of your data.
The book dives into the practical application of tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It also offers long-form exercises at the end of each chapter to practice the skills acquired.
You will begin by ingesting data from a range of formats. Moving on, you will impute missing values, detect unreliable data and statistical anomalies, and generate the synthetic features needed for successful data analysis and visualization.
By the end of this book, you will have acquired a firm understanding of the data cleaning process necessary to perform real-world data science and machine learning tasks.
This book is designed to benefit software developers, data scientists, aspiring data scientists, and students who are interested in data analysis or scientific computing.
Basic familiarity with statistics, general concepts in machine learning, knowledge of a programming language (Python or R), and some exposure to data science are helpful.
The text will also be helpful to intermediate and advanced data scientists who want to improve their rigor in data hygiene and wish for a refresher on data preparation issues.
The information provided in the "Synopsis" section may refer to another edition of this title.
David Mertz, Ph.D. is the founder of KDM Training, a partnership dedicated to educating developers and data scientists in machine learning and scientific computing. He created a data science training program for Anaconda Inc. and was a senior trainer for them. With the advent of deep neural networks, he has turned to training our robot overlords as well.
He previously worked for 8 years with D. E. Shaw Research and was also a Director of the Python Software Foundation for 6 years. David remains co-chair of its Trademarks Committee and Scientific Python Working Group. His columns, Charming Python and XML Matters, were once the most widely read articles in the Python world.
The information provided in the "About this book" section may refer to another edition of this title.
Seller: Textbooks_Source, Columbia, MO, United States
Paperback. Condition: Good. Ships in a BOX from Central Missouri! May not include working access code. Will not include dust jacket. Has used sticker(s) and some writing or highlighting. UPS shipping for most packages (Priority Mail for AK/HI/APO/PO Boxes). Seller reference no. 009781373U
Quantity available: 1
Seller: GreatBookPrices, Columbia, MD, United States
Condition: Good. May show signs of wear, highlighting, writing, and previous use. This item may be a former library book with typical markings. No guarantee on products that contain supplements. Your satisfaction is 100% guaranteed. Twenty-five year bookseller with shipments to over fifty million happy customers. Seller reference no. 42642714-5
Quantity available: 1
Seller: GreatBookPrices, Columbia, MD, United States
Condition: As New. Unread book in perfect condition. Seller reference no. 42642714
Quantity available: 1
Seller: GreatBookPrices, Columbia, MD, United States
Condition: New. Seller reference no. 42642714-n
Quantity available: 1
Seller: BargainBookStores, Grand Rapids, MI, United States
Paperback or Softback. Condition: New. Cleaning Data for Effective Data Science: Doing the other 80% of the work with Python, R, and command-line tools. Book. Seller reference no. BBS-9781801071291
Quantity available: 5
Seller: California Books, Miami, FL, United States
Condition: New. Seller reference no. I-9781801071291
Quantity available: More than 20
Seller: PBShop.store UK, Fairford, GLOS, United Kingdom
PAP. Condition: New. New Book. Delivered from our UK warehouse in 4 to 14 business days. THIS BOOK IS PRINTED ON DEMAND. Established seller since 2000. Seller reference no. L0-9781801071291
Quantity available: More than 20
Seller: PBShop.store US, Wood Dale, IL, United States
PAP. Condition: New. New Book. Shipped from UK. THIS BOOK IS PRINTED ON DEMAND. Established seller since 2000. Seller reference no. L0-9781801071291
Quantity available: More than 20
Seller: Ria Christie Collections, Uxbridge, United Kingdom
Condition: New. In. Seller reference no. ria9781801071291_new
Quantity available: More than 20
Seller: Rarewaves.com USA, London, LONDO, United Kingdom
Paperback. Condition: New. Data in its raw state is rarely ready for productive analysis. This book not only teaches you data preparation, but also what questions you should ask of your data. It focuses on the thought processes necessary for successful data cleaning as much as on concise and precise code examples that express these thoughts. Seller reference no. LU-9781801071291
Quantity available: More than 20