Big Data With Pyspark: Processing Large Datasets: A Hands-On Guide To Distributed Data Engineering, Machine Learning And Big Data Pipelines With Apache Spark And Python - Softcover

 
9798290030715: Big Data With Pyspark: Processing Large Datasets: A Hands-On Guide To Distributed Data Engineering, Machine Learning And Big Data Pipelines With Apache Spark And Python

Synopsis

You'll Learn

  • Understand the Foundations of Big Data and Distributed Computing: Gain a solid grasp of Big Data concepts, including the 5 Vs, the challenges of traditional systems, and the fundamental principles of distributed computing like parallelism, fault tolerance, and scalability.

  • Master the PySpark Ecosystem: Learn the architecture of Apache Spark, its core components (Spark SQL, Structured Streaming, MLlib), the separately packaged GraphFrames library, and how the PySpark API seamlessly integrates with Python.

  • Set Up Your PySpark Environment: Get hands-on experience setting up a complete development environment on your local machine and learn how to run applications on cloud platforms such as Databricks, AWS EMR, and Google Cloud Dataproc.

  • Process Data with RDDs and DataFrames: Master Spark's core data structures, from the low-level RDDs to the powerful and optimized DataFrames. Learn to apply a wide range of transformations and actions for data manipulation.

  • Perform Advanced Data Wrangling and Feature Engineering: Acquire skills in data cleaning, handling missing values and duplicates, and performing complex transformations using Spark SQL, Window Functions, and User-Defined Functions (UDFs), including high-performance Pandas UDFs.

  • Connect to Diverse Data Sources: Read and write data from various formats (CSV, JSON, Parquet) and connect to external systems like relational databases (JDBC), NoSQL stores (Cassandra, MongoDB), and cloud storage (S3, ADLS).

  • Build Real-Time Data Pipelines: Implement modern, fault-tolerant data ingestion with Structured Streaming, including handling event time, watermarking, and performing stateful transformations for real-time analytics.

  • Apply Machine Learning at Scale with MLlib: Learn to build and evaluate distributed machine learning pipelines for classification, regression, and clustering tasks using Spark's MLlib library.

  • Analyze Graph-Structured Data: Explore the power of GraphFrames to model and analyze complex relationships, run graph algorithms like PageRank, and find patterns in network data.

  • Optimize PySpark Applications for Performance: Dive deep into performance tuning, including understanding DAGs and shuffles, managing partitioning, optimizing joins, and configuring memory settings to make your code run faster and more efficiently.

  • Monitor, Debug, and Deploy Applications: Utilize the Spark UI to monitor your jobs, troubleshoot common errors, and learn to package and deploy your PySpark applications to different cluster managers like YARN and Kubernetes.

  • Solve Real-World Big Data Problems: Apply your knowledge through practical case studies, including building a recommendation engine, a real-time fraud detection system, and an ETL pipeline, to solidify your skills and build a portfolio.

The information provided in the "Synopsis" section may refer to another edition of this title.

Buy new

EUR 20.11

EUR 6.79 shipping from the United States to France

Search results for Big Data With Pyspark: Processing Large Datasets: A...

Publishing, PythQuill
Published by Independently published, 2025
ISBN 13: 9798290030715
New, Softcover
Print on demand

Seller: California Books, Miami, FL, United States

Seller rating: 5 out of 5 stars

Condition: New. Print on Demand. Seller reference no. I-9798290030715

Quantity available: More than 20 available
