writel __raw_writel mb()/rmb()/wmb()

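I discussed on a mailing list how writel is implemented. This function lives at the operating-system level: with memory protection in place, it writes a value to a register or memory address.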

arch/alpha/kernel/io.c contains:

void writel(u32 b, volatile void __iomem *addr)
{
    __raw_writel(b, addr);
    mb();
}

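A writel function like this should write a value to an address. I wanted to know the details of the underlying implementation, so I kept tracing the code down into __raw_writel(b, addr):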

void __raw_writel(u32 b, volatile void __iomem *addr)
{
    IO_CONCAT(__IO_PREFIX,writel)(b, addr);
}

Tracing further into IO_CONCAT, the corresponding io.h defines it as follows:

#define IO_CONCAT(a,b) _IO_CONCAT(a,b)
#define _IO_CONCAT(a,b) a ## _ ## b

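I asked about this code a few days ago: ## is the preprocessor's token-pasting operator, so the macro joins the tokens on either side of it into a single identifier. The extra level of indirection (IO_CONCAT calling _IO_CONCAT) makes sure that an argument which is itself a macro gets expanded before it is pasted.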
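
To see the two-level pasting at work outside the kernel, here is a minimal user-space sketch. The two macros are copied from the excerpt above; uint32_t stands in for u32, a plain pointer stands in for the __iomem pointer, and demo_raw_writel is a hypothetical name of mine, not a kernel function.

```c
#include <stdint.h>

/* Same two-level concatenation as in io.h: the outer macro lets its
 * arguments expand first, so __IO_PREFIX has already become "apecs"
 * by the time the inner macro pastes the tokens together. */
#define IO_CONCAT(a, b)  _IO_CONCAT(a, b)
#define _IO_CONCAT(a, b) a ## _ ## b

#define __IO_PREFIX apecs

/* The preprocessor turns this into a definition of apecs_writel(). */
void IO_CONCAT(__IO_PREFIX, writel)(uint32_t b, uint32_t *addr)
{
    *addr = b;
}

/* Hypothetical wrapper: its body expands to apecs_writel(b, addr). */
void demo_raw_writel(uint32_t b, uint32_t *addr)
{
    IO_CONCAT(__IO_PREFIX, writel)(b, addr);
}
```

Running this file through gcc -E shows the name apecs_writel in both places. The function is never spelled out in the source, which is why a plain text search for it comes up empty.
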

Tracing __IO_PREFIX, it is defined as follows:

#undef __IO_PREFIX
#define __IO_PREFIX     apecs

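That is where the trail ended. Further down I got lost, and was left with this question:

1. How exactly is the data written to the address? I extracted these pieces on their own and ran them through the preprocessor. After macro expansion, the result was: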

void __raw_writel(u32 b, volatile void __iomem *addr)
{
    apecs_writel(b, addr);
}

But I could not find an apecs_writel function anywhere in the kernel. Please help explain.

For the first question,
you should refer to the files "arch\alpha\kernel\Machvec_impl.h",
"~\Machvec.h", "~\io.c", "~\io.h" and "~\core_**.h".

As you found in your earlier analysis, the three macros DO_CIA_IO, IO
and IO_LITE in Machvec_impl.h and Machvec.h implement the symbol
concatenation that ties a given ** architecture to its writel function,
and they also initialize the function pointers.
So the implementation of writel comes down to initializing the
alpha_machine_vector structure and defining the function that the
relevant function pointer invokes to complete the low-level write.

.mv_writel = CAT(low,_writel), <--- IO(CIA,cia) <--> cia_writel(b, addr); <---
                                                                             |
writel(b, addr) --> __raw_writel(b, addr); ---> cia_writel(b,addr) ----------
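
The dispatch described in the reply can be sketched in plain C. This is a heavily reduced model, not the kernel's code: struct machine_vector, its two fields, and demo_writel are hypothetical stand-ins, and the IO macro here fills in far fewer fields than the real one.

```c
#include <stdint.h>

/* Hypothetical, much-reduced stand-in for alpha_machine_vector. */
struct machine_vector {
    const char *name;
    void (*mv_writel)(uint32_t b, volatile void *addr);
};

/* Chipset-specific low-level write, standing in for cia_writel(). */
static void cia_writel(uint32_t b, volatile void *addr)
{
    *(volatile uint32_t *)addr = b;
}

/* Simplified versions of the CAT and IO macros named in the reply. */
#define CAT1(x, y) x##y
#define CAT(x, y)  CAT1(x, y)
#define IO(UP, low) .name = #UP, .mv_writel = CAT(low, _writel)

/* IO(CIA, cia) expands to: .name = "CIA", .mv_writel = cia_writel */
static struct machine_vector alpha_mv = { IO(CIA, cia) };

/* Hypothetical generic entry point: the call goes through the pointer,
 * so the chipset function is never named at the call site. */
void demo_writel(uint32_t b, volatile void *addr)
{
    alpha_mv.mv_writel(b, addr);
}
```

The point of the indirection is that one kernel image can serve several chipsets: only the initializer of the vector changes, while every caller keeps using the same name.
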

For the second question,
mb()--->__asm__ __volatile__("mb": : :"memory");
So it is a memory barrier for the alpha architecture: it ensures that
memory operations issued before the barrier complete before any that
follow it.
It plays the same role as mb() on the x86 and ARM platforms; barrier(),
by contrast, only constrains the compiler.

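Keep reading the code and see which header file is included right after __IO_PREFIX is defined. The answer is in that header. For your apecs case, see the following code (linux-2.6.28-rc4):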

arch/alpha/include/asm/core_apecs.h
------------------------------------------
#undef __IO_PREFIX
#define __IO_PREFIX apecs
#define apecs_trivial_io_bw 0
#define apecs_trivial_io_lq 0
#define apecs_trivial_rw_bw 2
#define apecs_trivial_rw_lq 1
#define apecs_trivial_iounmap 1
#include <asm/io_trivial.h>
------------------------------------------

arch/alpha/include/asm/io_trivial.h
------------------------------------------
__EXTERN_INLINE void
IO_CONCAT(__IO_PREFIX,writel)(u32 b, volatile void __iomem *a)
{
    *(volatile u32 __force *)a = b;
}
------------------------------------------

So in the end the data is written by the single statement *(volatile u32 __force *)a = b;

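Without an OS and without an MMU, when running bare-metal on a development board, a single statement does the whole job (volatile keeps the compiler from optimizing the access away):

*(volatile unsigned long *)addr = value;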
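
As a small illustration of that one-liner, here is a sketch of a pair of bare-metal style accessors. reg_write32 and reg_read32 are hypothetical helper names; on a real board the address would come from the chip's datasheet, while on a hosted system the address of an ordinary variable can be passed in to try the code out.

```c
#include <stdint.h>

/* Write a 32-bit value to the given address. The volatile qualifier
 * forces the compiler to emit the store exactly once, where written. */
static inline void reg_write32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* Read a 32-bit value back; volatile forces a real load on each call
 * instead of reusing a value cached in a register. */
static inline uint32_t reg_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}
```

Note that volatile only constrains the compiler. It does nothing about CPU or bus reordering, which is why the kernel's writel follows the raw store with mb().
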

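While reading the linux 2.6.23 kernel source I came across the mb()/rmb()/wmb() macros and did not understand how to use them.
After analyzing the assembly behind them, I understood roughly that they relate to memory barriers. The code is as follows: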

#define X86_FEATURE_XMM2 (0*32+26) /* Streaming SIMD Extensions-2 */

......

#define mb() alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)

#ifdef CONFIG_X86_OOSTORE
/* Actually there are no OOO store capable CPUs for now that do SSE,
but make it already an possibility. */
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
#else
#define wmb() __asm__ __volatile__ ("": : :"memory")
#endif

.......

/*
* Alternative instructions for different CPU types or capabilities.
*
* This allows to use optimized instructions even on generic binary
* kernels.
*
* length of oldinstr must be longer or equal the length of newinstr
* It can be padded with nops as needed.
*
* For non barrier like inlines please define new variants
* without volatile and memory clobber.
*/
#define alternative(oldinstr, newinstr, feature) \
asm volatile ("661:\n\t" oldinstr "\n662:\n" \
      ".section .altinstructions,\"a\"\n" \
      "   .align 4\n" \
      "   .long 661b\n"          /* label */ \
      "   .long 663f\n"    /* new instruction */ \
      "   .byte %c0\n"          /* feature bit */ \
      "   .byte 662b-661b\n"    /* sourcelen */ \
      "   .byte 664f-663f\n"    /* replacementlen */ \
      ".previous\n" \
      ".section .altinstr_replacement,\"ax\"\n" \
      "663:\n\t" newinstr "\n664:\n" /* replacement */\
      ".previous" :: "i" (feature) : "memory")

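The alternative() macro only records a patch site in the .altinstructions section; something still has to act on those records at boot. A rough user-space model of that step, working on a byte buffer rather than live kernel text, might look like the following. struct alt_entry, apply_alt and NOP_BYTE are hypothetical names of mine, and the real kernel logic is more involved.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record mirroring what the macro emits: where the old
 * instruction is, where the replacement is, which feature bit enables
 * the replacement, and the two lengths. */
struct alt_entry {
    uint8_t       *instr;          /* label 661: original instruction */
    const uint8_t *replacement;    /* label 663: new instruction      */
    uint8_t        feature;        /* feature bit number              */
    uint8_t        instrlen;       /* 662b-661b                       */
    uint8_t        replacementlen; /* 664f-663f                       */
};

#define NOP_BYTE 0x90 /* single-byte x86 NOP */

/* Overwrite the old instruction with the replacement when the CPU has
 * the feature, padding the leftover bytes with NOPs. Returns 1 if the
 * site was patched and 0 if it was left alone. */
int apply_alt(const struct alt_entry *e, int cpu_has_feature)
{
    if (!cpu_has_feature || e->replacementlen > e->instrlen)
        return 0;
    memcpy(e->instr, e->replacement, e->replacementlen);
    memset(e->instr + e->replacementlen, NOP_BYTE,
           (size_t)(e->instrlen - e->replacementlen));
    return 1;
}
```

The length check reflects the rule in the comment above the macro: the old instruction must be at least as long as the new one, because the replacement is written over it in place.
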
Memory barriers mainly address two problems: compiler optimization and out-of-order execution by the CPU.
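When optimizing, the compiler may emit assembly whose order differs from the order of the C source. When the program must execute strictly in C source order, the compiler has to be told explicitly not to reorder. On linux this is done with the barrier() macro, which relies on the volatile keyword and the "memory" clobber. The former tells the compiler not to delete or move the asm statement itself; the latter tells it that the assembly may modify memory, so it must use the new values in memory rather than stale copies held in registers.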
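
A minimal sketch of such a compiler barrier, assuming GCC or Clang extended asm. The variable and function names are hypothetical, and the example is single-threaded on purpose, because barrier() emits no CPU instruction at all.

```c
/* Compiler-only barrier: an empty asm statement that claims to clobber
 * memory. It generates no machine code. */
#define barrier() __asm__ __volatile__("" : : : "memory")

static int shared_data;
static int data_ready;

/* The compiler may not move the store to shared_data below the barrier,
 * nor the store to data_ready above it. */
void publish_value(int value)
{
    shared_data = value;
    barrier();
    data_ready = 1;
}

/* The barrier forces shared_data to be reloaded from memory after the
 * flag has been checked. Returns 1 and fills *out when data is ready. */
int try_read_value(int *out)
{
    if (!data_ready)
        return 0;
    barrier();
    *out = shared_data;
    return 1;
}
```

This fixes the order in the generated assembly only. Whether another CPU observes the two stores in that order is a separate question, which is what the mb() family is for.
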
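Likewise, the CPU executes out of order to improve performance, so the instructions in the assembly do not necessarily run in the order we see. linux uses the mb() family of macros to guarantee ordering. It does so with the mfence/lfence instructions (introduced with SSE2 on the Pentium 4; earlier x86 CPUs lack them) and with x86 instructions that have serializing properties (there are many, for example the lock prefix used in the linux implementation, I/O instructions, instructions that operate on control registers, system registers, or debug registers, iret, and so on). Put simply, if mb()/rmb()/wmb() is inserted at some point in a program, the memory accesses before the macro are guaranteed to take effect before those after it, which serializes them. wmb is implemented like barrier() because on the x86 platform writes to memory are not reordered with respect to other writes.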
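
For experimenting outside the kernel, the same idea can be approximated with C11 fences. This is a sketch under assumptions: demo_mb, demo_wmb and demo_rmb are hypothetical names, and the release and acquire fences used here are at least as strong as the kernel's wmb() and rmb() rather than exact equivalents.

```c
#include <stdatomic.h>

/* Hypothetical user-space approximations of the kernel macros. */
#define demo_mb()  atomic_thread_fence(memory_order_seq_cst)
#define demo_wmb() atomic_thread_fence(memory_order_release)
#define demo_rmb() atomic_thread_fence(memory_order_acquire)

static int payload;      /* ordinary data                  */
static atomic_int flag;  /* set once payload is published  */

/* Writer: store the data, fence, then raise the flag. The fence keeps
 * the payload store from becoming visible after the flag store. */
void producer(int value)
{
    payload = value;
    demo_wmb();
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* Reader: check the flag, fence, then read the data. Returns 1 and
 * fills *out when the payload is available, 0 otherwise. */
int consumer(int *out)
{
    if (!atomic_load_explicit(&flag, memory_order_relaxed))
        return 0;
    demo_rmb();
    *out = payload;
    return 1;
}
```

The write barrier in the producer and the read barrier in the consumer work as a pair; dropping either one removes the ordering guarantee between the two threads.
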
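In fact, on RISC platforms this serialization is done explicitly by the programmer with dedicated instructions, for example by issuing a barrier instruction where it is needed, whereas x86 has many instructions that serialize implicitly (such as lock-prefixed instructions). That is why people working on RISC platforms usually find serialization easier to understand.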

Original article: http://blog.chinaunix.net/u/6071/showart_2049460.html


Source: https://www.cnblogs.com/leaven/p/1902760.html