CUDA and GPU Parallel Computing Engineering: Accelerating Scientific and High-Performance Workloads Through CUDA Kernels, Memory Optimization, and Multi-GPU Scaling - Softcover

Virek, Eamon

ISBN: 9798196510748

Synopsis

A practical guide to high-performance CUDA development for engineers, researchers, and developers who need more than introductory examples. This book focuses on the full workflow of GPU computing, from understanding how streaming multiprocessors execute warps to building maintainable, testable, and scalable applications for real scientific workloads.

The chapters move from core architecture and programming fundamentals into profiling, memory tuning, numerical accuracy, and multi-GPU scaling. You will see how to turn a correct kernel into an efficient one, how to measure bottlenecks with Nsight tools, and how to make informed tradeoffs between occupancy, bandwidth, latency, and precision.

What this book covers

  1. GPU architecture and execution behavior, including warps, scheduling, memory hierarchy, and data movement costs.
  2. CUDA kernel design, with launch configuration, indexing, synchronization, debugging, and reusable interfaces.
  3. Performance engineering, using profiling metrics and iterative optimization based on measured results.
  4. Memory optimization, including coalescing, shared memory tiling, register pressure, cache behavior, and data layout.
  5. Common scientific patterns, such as stencils, reductions, scans, sparse formats, and batched linear algebra.
  6. Numerical correctness, with floating point behavior, stable summation, boundary handling, and CPU validation.
  7. Advanced coordination techniques, such as warp and block level operations, streams, events, and asynchronous overlap.
  8. Host and multi-GPU engineering, covering pinned memory, unified memory, partitioning strategies, NCCL, halo exchange, and scaling studies.

Why it stands out

  • Engineering-first approach, centered on real optimization decisions rather than isolated syntax.
  • Workflow oriented, with profiling, testing, benchmarking, and regression tracking built into the discussion.
  • Useful for scientific computing, especially stencil solvers, sparse methods, reductions, and iterative pipelines.
  • Built for maintainability, with guidance on project structure, code reuse, and repeatable validation.

Ideal for anyone who wants to write CUDA code that is not only correct, but also fast, traceable, and ready for production-scale workloads.

The information provided in the "Synopsis" section may refer to another edition of this title.