https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/
1. Compile:
objects = main.o particle.o v3.o

all: $(objects)
	nvcc -arch=sm_20 $(objects) -o app

%.o: %.cpp
	nvcc -x cu -arch=sm_20 -I. -dc $< -o $@

clean:
	rm -f *.o app
2. Link:
nvcc -arch=sm_20 -dlink v3.o particle.o main.o -o gpuCode.o
g++ gpuCode.o main.o particle.o v3.o -lcudart -o app
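
To try the compile and link steps on something smaller than the particle example, here is a minimal sketch of a device function defined in one file and called from a kernel in another. The file names scale.cpp and main.cpp and the function scale_by are hypothetical, not taken from the linked article. The two parts are shown in one listing; split them at the marker comments and compile each with -dc as above. Toolkits from CUDA 9 onward no longer accept sm_20, so substitute the architecture of your own GPU, and the g++ line may also need a -L option pointing at the toolkit's library directory.

```cuda
// ---- scale.cpp (hypothetical): defines a device function ----
__device__ float scale_by(float x, float k)
{
    return x * k;
}

// ---- main.cpp (hypothetical): calls it from another translation unit ----
#include <cstdio>
#include <cuda_runtime.h>

// Declaration only; the definition lives in scale.cpp. Resolving this
// cross-file reference is what -dc plus the device link step allow.
__device__ float scale_by(float x, float k);

__global__ void scale_kernel(float *data, int n, float k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = scale_by(data[i], k);
}

int main()
{
    const int n = 4;
    float host[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    float *dev = nullptr;

    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    scale_kernel<<<1, n>>>(dev, n, 2.0f);   // one block, one thread per element
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i) printf("%g\n", host[i]);
    return 0;
}
```

Without -dc, the second file would fail to build because the compiler needs the body of scale_by in the same translation unit.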
Some NVCC compile options that control floating-point precision
--use_fast_math (-use_fast_math)
Make use of fast math library. '--use_fast_math' implies '--ftz=true --prec-div=false
--prec-sqrt=false --fmad=true'.
--ftz {true|false} (-ftz)
This option controls single-precision denormals support. '--ftz=true' flushes
denormal values to zero and '--ftz=false' preserves denormal values. '--use_fast_math'
implies '--ftz=true'.
Default value: false.
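
A small sketch for observing the effect of --ftz, assuming a single-file program (the kernel name halve is made up for illustration). It halves FLT_MIN, the smallest normal float, on the device, so the exact result is a denormal value.

```cuda
#include <cfloat>
#include <cstdio>
#include <cuda_runtime.h>

// Halve a value on the device. Passing x as a kernel argument keeps the
// compiler from folding the multiplication at compile time.
__global__ void halve(float x, float *out)
{
    *out = x * 0.5f;
}

int main()
{
    float *dev = nullptr;
    float result = -1.0f;

    cudaMalloc(&dev, sizeof(float));
    halve<<<1, 1>>>(FLT_MIN, dev);   // FLT_MIN / 2 is a denormal value
    cudaMemcpy(&result, dev, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // %g shows 0 if the denormal was flushed, a tiny nonzero value otherwise.
    printf("FLT_MIN * 0.5f = %g\n", result);
    return 0;
}
```

Built with the default --ftz=false, the program would be expected to print a tiny nonzero value; built with --ftz=true, it would be expected to print 0.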
--prec-div {true|false} (-prec-div)
This option controls single-precision floating-point division and reciprocals.
'--prec-div=true' enables the IEEE round-to-nearest mode and '--prec-div=false'
enables the fast approximation mode. '--use_fast_math' implies '--prec-div=false'.
Default value: true.
--prec-sqrt {true|false} (-prec-sqrt)
This option controls single-precision floating-point square root. '--prec-sqrt=true'
enables the IEEE round-to-nearest mode and '--prec-sqrt=false' enables the
fast approximation mode. '--use_fast_math' implies '--prec-sqrt=false'.
Default value: true.
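
One way to see what --prec-div and --prec-sqrt change is to compare the plain / operator and sqrtf against the intrinsics __fdiv_rn and __fsqrt_rn, which always request round-to-nearest results. The sketch below is illustrative only: the kernel name and the input values are arbitrary choices. It counts how many inputs give a different result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Count inputs where the plain operations disagree with the explicitly
// round-to-nearest intrinsics.
__global__ void count_mismatches(int n, int *div_diff, int *sqrt_diff)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float x = 1.0f + i;          // 1, 2, 3, ...
    float y = 3.0f + 2.0f * i;   // 3, 5, 7, ...

    if (x / y != __fdiv_rn(x, y))  atomicAdd(div_diff, 1);
    if (sqrtf(y) != __fsqrt_rn(y)) atomicAdd(sqrt_diff, 1);
}

int main()
{
    const int n = 1 << 16;
    int *counts = nullptr;           // counts[0]: division, counts[1]: square root
    int host_counts[2] = {0, 0};

    cudaMalloc(&counts, 2 * sizeof(int));
    cudaMemset(counts, 0, 2 * sizeof(int));
    count_mismatches<<<(n + 255) / 256, 256>>>(n, counts, counts + 1);
    cudaMemcpy(host_counts, counts, 2 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(counts);

    printf("division mismatches:    %d of %d\n", host_counts[0], n);
    printf("square-root mismatches: %d of %d\n", host_counts[1], n);
    return 0;
}
```

With the default settings both counts should be zero, since the operators already use the IEEE round-to-nearest mode. Building with --prec-div=false or --prec-sqrt=false may produce nonzero counts. The intrinsics are also a way to keep full precision at individual call sites when the rest of the file is built with --use_fast_math.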
--fmad {true|false} (-fmad)
This option enables (disables) the contraction of floating-point multiplies
and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA,
or DFMA). '--use_fast_math' implies '--fmad=true'.
Default value: true.
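
The effect of contraction can be seen with inputs chosen so that the rounded product loses a bit that a fused multiply-add keeps. In this sketch (the kernel name and values are illustrative), a = 1 + 2^-12, so a*a = 1 + 2^-11 + 2^-24 exactly, which rounds to 1 + 2^-11 as a float. Subtracting c = 1 + 2^-11 then gives 0 when the product is rounded first and 2^-24 when the operation is fused.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fma_demo(float a, float c, float *out)
{
    out[0] = a * a - c;                       // may be contracted, depending on --fmad
    out[1] = __fadd_rn(__fmul_rn(a, a), -c);  // separate multiply and add, never contracted
    out[2] = fmaf(a, a, -c);                  // always one fused operation
}

int main()
{
    const float a = 1.0f + ldexpf(1.0f, -12);   // 1 + 2^-12
    const float c = 1.0f + ldexpf(1.0f, -11);   // 1 + 2^-11
    float host[3] = {0.0f, 0.0f, 0.0f};
    float *dev = nullptr;

    cudaMalloc(&dev, 3 * sizeof(float));
    fma_demo<<<1, 1>>>(a, c, dev);
    cudaMemcpy(host, dev, 3 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("a * a - c         = %g\n", host[0]);
    printf("separate mul, add = %g\n", host[1]);
    printf("fmaf              = %g\n", host[2]);
    return 0;
}
```

The second line prints 0 and the third prints 5.96046e-08 (2^-24) regardless of the flag. The first line is likely to match the fused result under the default --fmad=true and the separate result under --fmad=false, although the compiler is not obliged to contract any particular expression.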