  • Cache ping-pong

    A simple description: Cache ping-pong is a subtler form of thrashing that can occur when more than one CPU uses data that is cached on the same line. For example, say there are two CPUs using an array of structures:
       struct flags {
          bool pushed;
          bool popped;
       } states[N];
    There is no conflict: one CPU touches only the pushed flags, and the other only the popped flags. But if the two members of an element are cached together, then whenever the first CPU modifies a pushed flag, the cache-coherence machinery must move that cache line over to the second CPU's cache. Ping. And when the second CPU modifies a popped flag, its cache line must be moved back to the first CPU's cache. Pong. The answer is to lay out your data in memory so that the elements needed by each CPU land on separate cache lines. This may mean breaking up logical structures and spreading their elements across multiple arrays, in a pattern reminiscent of old-school FORTRAN:
       struct flags {
          bool pushed[N];
          char pad[LINE_SIZE];
          bool popped[N];
       } states;
    A more general description:
    One way of maintaining cache coherence in multiprocessing designs with CPUs that have local caches is to ensure that a single cache line is never held by more than one CPU at a time. With write-through caches, this is easily implemented by having the CPUs invalidate cache lines on snoop hits. However, if multiple CPUs are working on the same set of data from main memory, this can lead to the following scenario:

    1. CPU #1 reads a cache line from memory.
    2. CPU #2 reads the same line; CPU #1 snoops the access and invalidates its local copy.
    3. CPU #1 needs the data again and has to re-read the entire cache line, invalidating the copy in CPU #2 in the process.
    4. CPU #2 now also re-reads the entire line, invalidating the copy in CPU #1. Lather, rinse, repeat.

    The result is a dramatic performance loss, because the CPUs keep fetching the same data over and over again from slow main memory. Possible solutions include:

    - Use a smarter cache-coherence protocol, such as MESI.
    - Mark the address space in question as cache-inhibited. Most CPUs will then resort to single-word accesses, which should be faster than reloading entire cache lines (usually 32 or 64 bytes).
    - If the data set is small, make one copy in memory for each CPU.
    - If the data set is large and processed sequentially, have each CPU work on a different part of it (one starting at the beginning, one at the middle, etc.).
  • Original source: https://www.cnblogs.com/ohscar/p/3109622.html