  • CUDA Application Design and Development

     

    Author: Rob Farber

    Published by Elsevier Inc

    Foreword

    Arguably, for any language to be successful, it must be surrounded by an ecosystem of powerful compilers, performance and correctness tools, and optimized libraries. --Jeffrey S. Vetter

    Preface 

    CUDA (Compute Unified Device Architecture) lets developers harness those tens of thousands of threads of execution. (I like this verb.)

    Book organization

    Chapter 1. Introduces basic CUDA concepts and the tools needed to build and debug CUDA applications. Simple examples are provided that demonstrate both the Thrust C++ and C runtime APIs. Three simple rules for high-performance GPU programming are introduced.
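
    For readers who want a concrete feel for the two interfaces Chapter 1 compares, here is a minimal sketch (mine, not the book's example code) that fills and sums a vector with the Thrust C++ API and then repeats the fill with a hand-written kernel through the C runtime API. The kernel name fillKernel and the sizes are illustrative assumptions.

        #include <thrust/device_vector.h>
        #include <thrust/sequence.h>
        #include <thrust/reduce.h>
        #include <cstdio>

        // C runtime style: a hand-written kernel where each thread writes its global index.
        __global__ void fillKernel(int *a, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) a[i] = i;
        }

        int main() {
            const int n = 1024;

            // Thrust C++ API: containers and algorithms, no explicit kernel.
            thrust::device_vector<int> d(n);
            thrust::sequence(d.begin(), d.end());               // d[i] = i
            int sum = thrust::reduce(d.begin(), d.end());       // reduction runs on the GPU

            // C runtime API: explicit allocation, launch configuration, and cleanup.
            int *d_a;
            cudaMalloc(&d_a, n * sizeof(int));
            fillKernel<<<(n + 255) / 256, 256>>>(d_a, n);
            cudaDeviceSynchronize();
            cudaFree(d_a);

            printf("Thrust sum = %d\n", sum);
            return 0;
        }
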
    Chapter 2. Using only techniques introduced in Chapter 1, this chapter provides a complete, general-purpose machine-learning and
    optimization framework that can run 341 times faster than a single core of a conventional processor. Core concepts in machine learning and numerical optimization are also covered, which will be of interest to those who desire the domain knowledge as well as the ability to
    program GPUs. 

    Chapter 3. Profiling is the focus of this chapter, as it is an essential skill in high-performance programming. The CUDA profiling tools are introduced and applied to the real-world example from Chapter 2. Some surprising bottlenecks in the Thrust API are uncovered. Introductory data-mining techniques are discussed and data-mining functors for both Principal Components Analysis and Nonlinear Principal Components Analysis are provided, so this chapter should be of interest to users as well as programmers.
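
    As a lightweight companion to the profiler tools the chapter covers, the following sketch (my own, not from the book) times a kernel with CUDA events; the scale kernel and the array size are assumptions made for the example.

        #include <cstdio>

        __global__ void scale(float *x, float a, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) x[i] *= a;
        }

        int main() {
            const int n = 1 << 20;
            float *d_x;
            cudaMalloc(&d_x, n * sizeof(float));

            cudaEvent_t start, stop;
            cudaEventCreate(&start);
            cudaEventCreate(&stop);

            cudaEventRecord(start);
            scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);          // wait until the kernel has finished

            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);   // elapsed time in milliseconds
            printf("kernel time: %.3f ms\n", ms);

            cudaEventDestroy(start);
            cudaEventDestroy(stop);
            cudaFree(d_x);
            return 0;
        }
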

    Chapter 4. The CUDA execution model is the topic of this chapter. Anyone who wishes to get peak performance from a GPU must understand the concepts covered in this chapter. Examples and profiling output are provided to help you understand both what the GPU is doing and how to use the existing tools to see what is happening.
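
    A minimal sketch (mine, not the book's) of the launch-configuration arithmetic that the execution model is built on; the block size of 256, a multiple of the 32-thread warp size, is an illustrative choice.

        __global__ void addOne(float *x, int n) {
            // Global thread index: block offset plus thread offset within the block.
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) x[i] += 1.0f;            // guard: the grid may be larger than n
        }

        void launchAddOne(float *d_x, int n) {
            const int threadsPerBlock = 256;    // a multiple of the 32-thread warp size
            const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;   // round up
            addOne<<<blocks, threadsPerBlock>>>(d_x, n);
        }
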

    Chapter 5. CUDA provides several types of memory on the GPU. Each type of memory is discussed, along with its advantages and disadvantages.
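
    As a hedged illustration (not taken from the book), the snippet below shows how the main memory spaces appear in kernel code: global memory for the arrays, constant memory for small read-only coefficients, shared memory for data staged within a block, and registers for per-thread values. The polynomial kernel and the 256-thread block size are assumptions.

        __constant__ float c_coef[4];    // constant memory: cached, read-only in kernels,
                                         // filled from the host with cudaMemcpyToSymbol()

        __global__ void poly(const float *x, float *y, int n) {  // x, y live in global memory
            __shared__ float tile[256];  // shared memory: fast, visible to one block
                                         // (launch with 256 threads per block)
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;

            tile[threadIdx.x] = x[i];    // stage the global-memory value in shared memory
            float v = tile[threadIdx.x]; // v is held in a register: fastest, thread-private
            y[i] = c_coef[0] + v * (c_coef[1] + v * (c_coef[2] + v * c_coef[3]));
        }
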
    Chapter 6. With more than three orders of magnitude in performance difference between the fastest and slowest GPU memory, using memory on the GPU efficiently is the only path to high performance. This chapter discusses techniques and provides profiler output to help you understand and monitor how efficiently your applications use memory. A general functor-based example is provided to show how to write your own generic methods in the style of the Thrust API.
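
    To give a flavor of that functor-based style, here is a minimal sketch (mine, not the book's example) of a generic method that accepts any user-supplied functor, in the spirit of the Thrust API:

        #include <thrust/device_vector.h>
        #include <thrust/transform.h>

        // A user-defined functor: callable on the device and carrying its own state.
        struct saxpy_functor {
            float a;
            explicit saxpy_functor(float a_) : a(a_) {}
            __host__ __device__ float operator()(float x, float y) const {
                return a * x + y;
            }
        };

        // A generic method in the Thrust style: works with any binary functor F.
        template <typename F>
        void apply(thrust::device_vector<float> &x,
                   thrust::device_vector<float> &y, F f) {
            thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), f);
        }

        // Usage: apply(x, y, saxpy_functor(2.0f)); computes y = 2*x + y on the GPU.
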
    Chapter 7. GPUs provide multiple forms of parallelism, including multiple GPUs, asynchronous kernel execution, and a Unified Virtual Address (UVA) space. This chapter provides examples and profiler output to help you understand and utilize all of these forms of GPU parallelism.
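
    A hedged, minimal sketch (not the book's code) of two of those mechanisms: one stream per device, with asynchronous copies and kernel launches overlapping across GPUs. The work kernel, the per-device pinned host buffers, and the eight-device cap are illustrative assumptions.

        __global__ void work(float *x, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) x[i] = sqrtf(x[i]);
        }

        void runOnAllGpus(float **h_buf, int n) {       // h_buf[d]: pinned host buffer per device
            int nDev = 0;
            cudaGetDeviceCount(&nDev);
            float *d_x[8];                              // assumes at most 8 GPUs, for brevity
            cudaStream_t stream[8];

            for (int d = 0; d < nDev; ++d) {
                cudaSetDevice(d);                       // subsequent calls target device d
                cudaMalloc(&d_x[d], n * sizeof(float));
                cudaStreamCreate(&stream[d]);
                // Asynchronous copies and launches return immediately, so all GPUs run concurrently.
                cudaMemcpyAsync(d_x[d], h_buf[d], n * sizeof(float), cudaMemcpyHostToDevice, stream[d]);
                work<<<(n + 255) / 256, 256, 0, stream[d]>>>(d_x[d], n);
                cudaMemcpyAsync(h_buf[d], d_x[d], n * sizeof(float), cudaMemcpyDeviceToHost, stream[d]);
            }
            for (int d = 0; d < nDev; ++d) {            // wait for every device, then clean up
                cudaSetDevice(d);
                cudaStreamSynchronize(stream[d]);
                cudaStreamDestroy(stream[d]);
                cudaFree(d_x[d]);
            }
        }
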
    Chapter 8. CUDA has matured to become a viable platform for all application development for both GPU and multicore processors. Pathways
    to multiple CUDA backends are discussed, and examples and profiler output to effectively run in heterogeneous multi-GPU environments are
    provided. CUDA libraries and how to interface CUDA and GPU computing with other high-level languages like Python, Java, R, and FORTRAN are
    covered. 
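    One common pattern behind such language bindings (a minimal sketch under my own assumptions, not the book's interface) is to expose the CUDA work through a plain C entry point that Python's ctypes, Java's JNI, R, or Fortran's ISO_C_BINDING can call. The function name gpu_scale is a hypothetical example.

        // Compile as a shared library, e.g.: nvcc -Xcompiler -fPIC -shared scale.cu -o libscale.so
        __global__ void scaleKernel(float *x, float a, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) x[i] *= a;
        }

        extern "C" int gpu_scale(float *h_x, float a, int n) {  // C linkage: callable from other languages
            float *d_x;
            if (cudaMalloc(&d_x, n * sizeof(float)) != cudaSuccess) return -1;
            cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
            scaleKernel<<<(n + 255) / 256, 256>>>(d_x, a, n);
            cudaMemcpy(h_x, d_x, n * sizeof(float), cudaMemcpyDeviceToHost);
            cudaFree(d_x);
            return cudaGetLastError() == cudaSuccess ? 0 : -1;
        }
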

    Chapter 9. With the focus on the use of CUDA to accelerate computational tasks, it is easy to forget that GPU technology is also a splendid platform for visualization. This chapter discusses primitive restart and how it can dramatically accelerate visualization and gaming applications. A complete working example is provided that allows the reader to create and fly around in a 3D world. Profiler output is used to demonstrate why primitive restart is so fast. The teaching framework from this chapter is extended to work with live video streams in Chapter 12.
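
    For readers unfamiliar with the term, here is a minimal, hedged OpenGL fragment (not the book's renderer) showing the idea of primitive restart: a sentinel index in the index buffer ends one triangle strip and starts the next within a single draw call. The sentinel value 0xFFFF, the loader header, and the buffer names are assumptions.

        #include <GL/glew.h>    // assumes an OpenGL 3.1+ context is already current

        void drawStrips(GLuint vao, GLuint indexBuffer, GLsizei indexCount) {
            const GLuint RESTART = 0xFFFF;          // sentinel that splits strips

            glEnable(GL_PRIMITIVE_RESTART);
            glPrimitiveRestartIndex(RESTART);       // tell GL which index value means "restart"

            glBindVertexArray(vao);
            glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBuffer);

            // One draw call renders many strips: wherever RESTART appears in the
            // index buffer, the current strip ends and a new one begins.
            glDrawElements(GL_TRIANGLE_STRIP, indexCount, GL_UNSIGNED_SHORT, 0);

            glDisable(GL_PRIMITIVE_RESTART);
        }
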
    Chapter 10. To teach scalability, as well as performance, the example from Chapter 3 is extended to use MPI (Message Passing Interface). A variant of this example code has demonstrated near-linear scalability to 500 GPGPUs (with a peak of over 500,000 single-precision gigaflops) and delivered over one-third petaflop (10^15 floating-point operations per second) using 60,000 x86 processing cores.
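
    A minimal, hedged skeleton (built on my own assumptions, not the book's framework) of how MPI ranks typically pair up with GPUs on a cluster: each rank selects a device before doing any CUDA work, and partial results are combined across ranks.

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);

            int rank = 0, size = 0;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            // Map this rank to one of the GPUs visible on its node (round-robin assumption).
            int nDev = 0;
            cudaGetDeviceCount(&nDev);
            cudaSetDevice(rank % nDev);

            // ... each rank now runs its CUDA kernels on its own GPU ...
            float localSum = 0.0f, globalSum = 0.0f;   // localSum would come from a kernel
            MPI_Allreduce(&localSum, &globalSum, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

            if (rank == 0) printf("ranks: %d  global sum: %f\n", size, globalSum);
            MPI_Finalize();
            return 0;
        }
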
    Chapter 11. No book can cover all aspects of the CUDA tidal wave. This is a survey chapter that points the way to other projects that provide free
    working source code for a variety of techniques, including Support Vector Machines (SVM), Multi-Dimensional Scaling (MDS), mutual
    information, force-directed graph layout, molecular modeling, and others. Knowledge of these projects—and how to interface with other
    high-level languages, as discussed in Chapter 8—will help you mature as a CUDA developer.
    Chapter 12. A working real-time video streaming example for vision recognition based on the visualization framework in Chapter 9 is
    provided. All that is needed is an inexpensive webcam or a video file so that you too can work with real-time vision recognition. This example
    was designed for teaching, so it is easy to modify. Robotics, augmented reality games, and data fusion for heads-up displays are obvious
    extensions to the working example and technology discussion in this chapter.
  • Original post: https://www.cnblogs.com/JohnShao/p/2745575.html