A recent problem with CUDA atomic operations

Avoid atomic operations whenever you possibly can, because their impact on performance is dramatic. For example, throughput plunged from 800MBps to 110MBps.

While browsing a forum I came across a passage that someone had relayed; I am recording it below:

honestly, if you're trying to do this you're probably going down the wrong path, but general rules of thumb are

- don't have multiple threads within a warp contending for a lock, that leads to all sorts of confusing issues for most people because inter-warp branches are not the same as intra-warp branches
- avoid global memory contention as much as possible (e.g., if you need to have a critical section among all warps in all CTAs, do per-CTA shared memory locks then a global lock)
- traditional threading primitives implemented with atomics are a pretty terrible idea, if you can avoid atomics as much as possible (or entirely) you can get a big perf win (and there are very interesting ways you can do this, and when I say big perf win, I mean on the order of 5-10x)

("well," you think, "it sounds like tim is speaking from experience!" oh yes, I am)
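
As a minimal sketch of the second and third rules of thumb (not code from the quoted post), the example below sums an array in two ways. `sum_naive` has every thread hit one global address with `atomicAdd`. `sum_hierarchical` first reduces within each warp using register shuffles (no atomics), then issues one shared-memory atomic per warp, and finally one global atomic per CTA. The kernel names, the `sum_on_gpu` host helper, and the block size of 256 are assumptions made for illustration. The sketch also assumes CUDA 9 or later for `__shfl_down_sync`, a block size that is a multiple of the warp size, and a non-empty input; error checking is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Naive version: every thread issues an atomicAdd on the same global address,
// so all threads in the grid contend for one location.
__global__ void sum_naive(const int* in, int n, int* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(out, in[i]);
}

// Hierarchical version: warp shuffle reduction, then one shared-memory
// atomic per warp, then one global atomic per block (CTA).
__global__ void sum_hierarchical(const int* in, int n, int* out) {
    __shared__ int block_sum;
    if (threadIdx.x == 0) block_sum = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int v = (i < n) ? in[i] : 0;  // out-of-range threads contribute 0

    // Reduce within the warp through registers; no atomics involved.
    // All 32 lanes participate, which the full mask requires.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffffu, v, offset);

    // Lane 0 of each warp now holds that warp's partial sum.
    if (threadIdx.x % warpSize == 0) atomicAdd(&block_sum, v);
    __syncthreads();

    // One thread per block touches global memory.
    if (threadIdx.x == 0) atomicAdd(out, block_sum);
}

// Hypothetical host helper: copies the input to the device, runs one of the
// two kernels, and returns the sum. Assumes h_in is non-empty.
int sum_on_gpu(const std::vector<int>& h_in, bool hierarchical) {
    const int n = static_cast<int>(h_in.size());
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(int));

    const int threads = 256;  // must be a multiple of warpSize
    const int blocks = (n + threads - 1) / threads;
    if (hierarchical)
        sum_hierarchical<<<blocks, threads>>>(d_in, n, d_out);
    else
        sum_naive<<<blocks, threads>>>(d_in, n, d_out);

    int h_out = 0;
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Both variants should return the same sum for the same input; what changes is the number of atomics on the single global address, which drops from one per thread to one per block. How much that helps depends on the GPU and the access pattern, so it is worth timing both on the target hardware rather than assuming a particular speedup.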

Original article: https://www.cnblogs.com/superniaoren/p/2121837.html
