CUDA and GPU Parallel Computing Engineering: Accelerating Scientific and High-Performance Workloads Through CUDA Kernels, Memory Optimization, and Multi-GPU Scaling - Softcover

Virek, Eamon

ISBN: 9798196510748

Synopsis

A practical guide to high-performance CUDA development for engineers, researchers, and developers who need more than introductory examples. This book focuses on the full workflow of GPU computing, from understanding how streaming multiprocessors execute warps to building maintainable, testable, and scalable applications for real scientific workloads.

The chapters move from core architecture and programming fundamentals into profiling, memory tuning, numerical accuracy, and multi-GPU scaling. You will see how to turn a correct kernel into an efficient one, how to measure bottlenecks with Nsight tools, and how to make informed tradeoffs between occupancy, bandwidth, latency, and precision.

What this book covers

  1. GPU architecture and execution behavior, including warps, scheduling, memory hierarchy, and data movement costs.
  2. CUDA kernel design, with launch configuration, indexing, synchronization, debugging, and reusable interfaces.
  3. Performance engineering, using profiling metrics and iterative optimization based on measured results.
  4. Memory optimization, including coalescing, shared memory tiling, register pressure, cache behavior, and data layout.
  5. Common scientific patterns, such as stencils, reductions, scans, sparse formats, and batched linear algebra.
  6. Numerical correctness, with floating point behavior, stable summation, boundary handling, and CPU validation.
  7. Advanced coordination techniques, such as warp and block level operations, streams, events, and asynchronous overlap.
  8. Host and multi-GPU engineering, covering pinned memory, unified memory, partitioning strategies, NCCL, halo exchange, and scaling studies.

Why it stands out

  • Engineering-first approach, centered on real optimization decisions rather than isolated syntax.
  • Workflow oriented, with profiling, testing, benchmarking, and regression tracking built into the discussion.
  • Useful for scientific computing, especially stencil solvers, sparse methods, reductions, and iterative pipelines.
  • Built for maintainability, with guidance on project structure, code reuse, and repeatable validation.

Ideal for anyone who wants to write CUDA code that is not only correct, but also fast, traceable, and ready for production-scale workloads.

The information provided in the "Synopsis" section may refer to another edition of this title.