  • Analysis of the kernel's string functions written in assembly

    *************************************************************** 
    Working through the string functions written in assembly:
    *************************************************************** 
    --------------------------------------------------------------- 
    The _I386_STRING_H_ macro
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #ifndef _I386_STRING_H_ 
    #define _I386_STRING_H_ 
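    This macro is defined once the header containing these assembly string functions has been included (the usual include guard), so a second inclusion has no effect.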
    --------------------------------------------------------------- 
    The __KERNEL__ macro
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #ifdef __KERNEL__ 
    #include <linux/config.h> 
    Note:
    config.h is included only when the __KERNEL__ macro is defined.
    /* 
    * On a 486 or Pentium, we are better off not using the 
    * byte string operations. But on a 386 or a PPro the 
    * byte string ops are faster than doing it by hand 
    * (MUCH faster on a Pentium). 
    */ 
    The following comment is important and worth reading:
    /* 
    * This string-include defines all string functions as inline 
    * functions. Use gcc. It also assumes ds=es=data space, this should be
    * normal. Most of the string-functions are rather heavily hand-optimized,
    * see especially strsep,strstr,str[c]spn. They should work, but are not 
    * very easy to understand. Everything is done entirely within the register 
    * set, making the functions fast and clean. String instructions have been 
    * used through-out, making for "slightly" unclear code :-) 

    * NO Copyright (C) 1991, 1992 Linus Torvalds, 
    * consider these trivial functions to be PD. 
    */ 

    /* AK: in fact I bet it would be better to move this stuff all out of line. */ 
    --------------------------------------------------------------- 
    __HAVE_ARCH_STRCPY strcpy() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRCPY 
    static inline char * strcpy(char * dest,const char *src)
    {
        int d0, d1, d2;
        __asm__ __volatile__(
            "1:\tlodsb\n\t"
            "stosb\n\t"
            "testb %%al,%%al\n\t"
            "jne 1b"
            : "=&S" (d0), "=&D" (d1), "=&a" (d2)
            :"0" (src),"1" (dest)
            : "memory");
        return dest;
    }

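    Analysis:
    1. The instructions written out more explicitly:
    1:           ---> 1:
    lodsb        ---> mov al,ds:[si]
                      inc si
    stosb        ---> mov es:[di],al
                      inc di
    testb al,al  ---> test al,al
    jne 1        ---> jne 1
    The loop is terminated by the 0 byte: the test comes after the store, so the final '\0' is read, copied to dest, and only then does the loop end.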

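    2. Operand constraints:
    S: si/esi
    &: an "earlyclobber" output. Normally gcc may put an input operand and an output operand in the same register, because it assumes all inputs are consumed before any output is produced. Putting "&" in front of an output constraint guarantees that the output will not overwrite an input: gcc gives it a register that no input uses, unless the two are explicitly tied together (with a digit 0-9, see below).

    0-9: a matching constraint. It marks an operand that is both input and output, with the input and the output occupying the same location (register). A digit may only appear in an input constraint; digit I means "same location as output operand I".

    int d0, d1, d2;
    "=&S" (d0), "=&D" (d1), "=&a" (d2)
    "0" (src),"1" (dest)
    Code analysis:
    src and dest are passed in through ESI and EDI: "0" ties src to output operand 0 (d0, constraint "S") and "1" ties dest to output operand 1 (d1, constraint "D"). The loop advances ESI and EDI and overwrites AL, but gcc does not allow a register that is used for an operand to appear in the clobber list. Declaring the dummy outputs d0, d1 and d2 is how the code tells gcc that ESI, EDI and EAX hold different values after the asm; their final values are never used. The C variable dest itself is not changed, which is why the function can still return it.

    D: di/edi
    a: ax/eax
    "memory": this is the clobber list. It tells gcc that the asm reads and writes memory in ways gcc cannot see, so gcc must not keep memory values cached in registers across the asm statement.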

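    3. Instruction notes:
    lodsb: == mov al,[si]
              inc si / dec si   (increment when DF = 0, decrement when DF = 1)
    stosb: == mov es:[di],al
              inc di / dec di
    testb: == test oprd1,oprd2
    Computes oprd1 & oprd2, discards the result and sets ZF, PF and SF from it (CF and OF are cleared).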
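    As a cross-check of the walkthrough above, here is a portable C model of the same loop. It is an illustration, not the kernel code: ordinary C variables stand in for ESI, EDI and AL, and the direction flag is assumed clear. It shows why the terminating '\0' is copied: the test happens only after the byte has been stored.

```c
#include <assert.h>
#include <string.h>

/* Model of the strcpy() loop: esi/edi/al are ordinary C variables,
 * and the direction flag is assumed clear (addresses increase). */
static char *model_strcpy(char *dest, const char *src)
{
    const char *esi = src;   /* "0" (src): ESI */
    char *edi = dest;        /* "1" (dest): EDI */
    char al;

    do {
        al = *esi++;         /* lodsb: al = [esi], esi++ */
        *edi++ = al;         /* stosb: [edi] = al, edi++ */
    } while (al != 0);       /* testb %al,%al ; jne 1b */
    return dest;             /* the original dest, not the advanced edi */
}
```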

    --------------------------------------------------------------- 
    __HAVE_ARCH_STRNCPY strncpy() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRNCPY 
    static inline char * strncpy(char * dest,const char *src,size_t count)
    {
        int d0, d1, d2, d3;
        __asm__ __volatile__(
            "1:\tdecl %2\n\t"
            "js 2f\n\t"
            "lodsb\n\t"
            "stosb\n\t"
            "testb %%al,%%al\n\t"
            "jne 1b\n\t"
            "rep\n\t"
            "stosb\n"
            "2:"
            : "=&S" (d0), "=&D" (d1), "=&c" (d2), "=&a" (d3)
            :"0" (src),"1" (dest),"2" (count)
            : "memory");
        return dest;
    }

    Instructions rewritten:
    1: decl ecx   ===> 1: dec ecx
    js 2          ===> js 2
    lodsb         ===> mov al,ds:[esi]
                       inc esi / dec esi
    stosb         ===> mov es:[edi],al
                       inc edi / dec edi
    testb al,al   ===> test al,al
    jne 1         ===> jne 1

    rep           ===> rep
    stosb         ===> mov es:[edi],al
                       inc edi / dec edi
    2:            ===> 2:

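    Analysis:
    The behaviour is easiest to see case by case. Suppose memory holds: abcde\0
    1) Copy 3 characters:
    (1) Initial ECX == 3
    ECX is decremented, one character is copied, and the copied character is tested for 0:
    3-->2: copy a
    2-->1: copy b
    1-->0: copy c
    0-->-1 js 2
    (2) So 3 characters are copied and no '\0' is written: as with the standard strncpy(), dest is not terminated when count <= strlen(src).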

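    2) Copy 5 characters:
    (1) Initial ECX == 5
    ECX is decremented, one character is copied, and the copied character is tested for 0:
    5-->4: copy a
    4-->3: copy b
    3-->2: copy c
    2-->1: copy d
    1-->0: copy e
    0-->-1 js 2
    (2) So 5 bytes are copied: the 5 characters, again without a '\0'.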

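    3) Copy 6 characters:
    (1) Initial ECX == 6
    ECX is decremented, one character is copied, and the copied character is tested for 0:
    6-->5: copy a
    5-->4: copy b
    4-->3: copy c
    3-->2: copy d
    2-->1: copy e
    1-->0: copy \0
    test al,al ===> al == \0, so ZF == 1.
    jne 1      ===> not taken

    Execution falls through with ECX == 0 and AL == \0.
    rep: ECX is already 0, so stosb is executed zero times.
    (2) So 6 bytes are copied: 5 characters + one '\0'.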

    4) Copy 10 characters:
    Initial ECX == 10
    ECX is decremented, one character is copied, and the copied character is tested for 0:
    10-->9: copy a
    9-->8: copy b
    8-->7: copy c
    7-->6: copy d
    6-->5: copy e
    5-->4: copy \0
    test al,al ===> al == \0, so ZF == 1.
    jne 1      ===> not taken

    Execution falls through with ECX == 4 and AL == \0.
    rep: ECX == 4 != 0: store AL (\0), ECX becomes 3
    rep: ECX == 3 != 0: store AL (\0), ECX becomes 2
    rep: ECX == 2 != 0: store AL (\0), ECX becomes 1
    rep: ECX == 1 != 0: store AL (\0), ECX becomes 0
    rep: ECX == 0, the repetition ends.
    (2) So 10 bytes are written: first 6 bytes (5 characters + one '\0'), then 4 more '\0' bytes of padding.

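    5) Copy 0 characters:
    (1) Initial ECX == 0
    0-->-1 js 2
    (2) So 0 characters are copied.

    6) Copy -1 characters:
    (1) count is a size_t, so -1 arrives as ECX == 0xffffffff
    0xffffffff-->0xfffffffe: the sign bit is set, so js 2 is taken
    (2) So 0 characters are copied.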
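    Note:
    In static inline char * strncpy(char * dest,const char *src,size_t count), count is loaded into ECX. size_t is unsigned, but the decl/js test looks only at the sign bit, so count effectively behaves as a signed 32-bit value: count == 0 and every count above 0x80000000 copy nothing, and the largest count that works is 0x80000000 (2 GiB).
    At first I wondered why the code gives up whenever ECX goes negative instead of treating the count as unsigned, which would allow up to 4G. My guess was that strcat() also counts in ECX and the 4G range is split between its two strings, 2G each. That does not quite hold, since strcat() uses ECX only to scan dest; a more likely explanation is that decl followed by js is simply a compact way to detect that the count is exhausted.
    The rep prefix:
    Repeats the string instruction that follows it. Before each repetition it checks ECX: if ECX is 0 the repetition ends, otherwise the instruction executes and ECX is decremented.
    This resembles loop, except that loop first decrements ECX and then tests it for 0.
    The decrements performed by rep do not affect any flags.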
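    The case analysis can be checked with a small C model of the decl/js loop followed by rep stosb. This is an illustrative sketch, not kernel code: ECX is modelled as a uint32_t, and js is modelled by testing bit 31 after the decrement.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of strncpy(): decl/js loop, then rep stosb padding.
 * ECX is a uint32_t; js is modelled by testing bit 31 after decl. */
static char *model_strncpy(char *dest, const char *src, uint32_t count)
{
    const char *esi = src;
    char *edi = dest;
    uint32_t ecx = count;                /* "2" (count) */
    char al = 0;

    for (;;) {
        ecx--;                           /* 1: decl %ecx (wraps mod 2^32) */
        if (ecx & 0x80000000u)           /* js 2f */
            return dest;
        al = *esi++;                     /* lodsb */
        *edi++ = al;                     /* stosb */
        if (al == 0)                     /* testb %al,%al ; jne 1b */
            break;
    }
    while (ecx != 0) {                   /* rep stosb: pad with al == 0 */
        *edi++ = al;
        ecx--;
    }
    return dest;                         /* 2: */
}
```

    With count == 10 the model writes "abcde" plus five '\0' bytes, and with count == 0xffffffff (that is, -1) it writes nothing, matching cases 4) and 6).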
    --------------------------------------------------------------- 
    strcat() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRCAT 
    static inline char * strcat(char * dest,const char * src)
    {
        int d0, d1, d2, d3;
        __asm__ __volatile__(
            "repne\n\t"
            "scasb\n\t"
            "decl %1\n"
            "1:\tlodsb\n\t"
            "stosb\n\t"
            "testb %%al,%%al\n\t"
            "jne 1b"
            : "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
            : "0" (src), "1" (dest), "2" (0), "3" (0xffffffffu)
            :"memory");
        return dest;
    }
    Instructions rewritten:
    repne ===> while (ECX != 0) {
    scasb ===>     ZF = (al == es:[edi]);   /* flags from al - es:[edi] */
                   edi++;
                   ECX--;
                   if (ZF == 1) break;      /* repne stops on a match */
               }

    decl %1 ===> dec edi
    1:      ===> 1:
    lodsb   ===> mov al, ds:[esi]
                 inc esi
    stosb   ===> mov es:[edi], al
                 inc edi
    testb %%al,%%al ===> test al, al
    jne 1   ===> jne 1

    Initial values of the operands:
    : "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
    : "0" (src), "1" (dest), "2" (0), "3" (0xffffffffu)
    src         ==> si/esi, here: esi
    dest        ==> di/edi, here: edi
    0           ==> ax/eax, here: eax
    0xffffffffu ==> ecx,    here: ecx
    So esi and edi point to the start of the two strings, eax == 0 and ecx == 0xffffffffu.

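    General case:
    Initial values:
    esi--->'abc\0' (src)
    edi--->'123\0' (dest)
    al == 0
    ecx == 0xffffffffu
    while (ECX != 0) {
        ZF = (al == es:[edi]);
        edi++;
        ECX--;
        if (ZF == 1) break;
    }

    This scans the string at edi for its terminating '\0'. Every step advances edi and decrements ECX, including the step that hits the '\0', so when the loop ends edi points one byte past the '\0'. For '123\0' four bytes are scanned: edi = edi + 4 and ECX = ECX - 4.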

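    Note: the loop ends either because a '\0' is found in the string at es:[edi], or because that string is at least 0xffffffff (4G-1) bytes long, not counting a terminating '\0', so that ECX counts down to 0.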

    dec edi
    edi = edi - 1; edi now points at the terminating '\0' of the string at es:[edi].

    Register values at this point:
    esi--->'abc\0' (src)
    edi--->the terminating '\0' of '123\0' (dest)
    al == 0
    ecx == 0xfffffffbu

    1: 
    mov al, ds:[esi] 
    inc esi 
    mov es:[edi], al 
    inc edi 
    test al, al 
    jne 1 
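    This copies the string at ds:[esi] onto the end of the string at es:[edi], starting at that string's '\0', which is overwritten.

    esi--->just past 'abc\0', at the '?' in 'abc\0?' (src)
    edi--->just past the final '\0', at the '?' in '123abc\0?' (dest)
    al == 0; note that this 0 is the terminator read from the string at esi, not the initial 0.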

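    Purpose: strcat(char * dest,const char * src) copies the string at src onto the end of the string at dest, overwriting dest's '\0'; once dest and src have been joined into one string, src's own '\0' is copied as well, terminating the result.

    Algorithm:
    1. Scan the string at dest to find its '\0'.
    2. Copy the bytes of src one by one to dest, starting at that '\0', up to and including src's terminating '\0', then return.
    So the function requires both src and dest to be '\0'-terminated.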

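    Special case 1:
    Initial values:
    esi--->'abc\0' (src)
    edi--->'123456789... ...YX' a string of at least 0xffffffff bytes (dest)
    Assume edi points to the very start of the es segment, i.e. base 0.
    That is, edi[0]=='1' and edi[0xffffffff]=='X'. edi is only 32 bits wide and covers 0x0--->0xffffffff, 4G bytes in total, so even if the string had more than 4G characters edi could not address them: the string effectively ends at edi[0xffffffff]=='X'. Past that point edi would simply be incremented beyond 0xffffffff and wrap around to 0.
    The same applies to esi.
    al == 0
    ecx == 0xffffffffu
    while (ECX != 0) {
        ZF = (al == es:[edi]);
        edi++;
        ECX--;
        if (ZF == 1) break;
    }

    The loop body runs 0xffffffff times.
    Because the string at edi is at least 0xffffffff bytes long, the search for its '\0' runs ECX down to 0 and the loop ends there; edi then points at the byte at 0xffffffff (ignoring segment limits).
    On exit ECX == 0 and edi == 0xffffffff.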

    dec edi
    edi = edi - 1; edi == 0xffffffff-1, i.e. it points at edi[0xffffffff-1]=='Y'.

    Register values at this point:
    esi--->'abc\0' (src)
    edi--->'123456......YX', edi==0xffffffff-1, pointing at the byte edi[0xffffffff-1]=='Y' (dest)
    al == 0
    ecx == 0x00000000u

    1: 
    mov al, ds:[esi] 
    inc esi 
    mov es:[edi], al 
    inc edi 
    test al, al 
    jne 1 
    The first character of 'abc\0', esi[0]=='a', is copied to es:[edi], i.e. es:edi[0xffffffff-1], overwriting the 'Y' there: esi[0]=='a' ---> edi[0xffffffff-1] (was 'Y').
    edi--->'123456......aX'.
    Then esi++, so esi[1]=='b'; edi++, so edi points at edi[0xffffffff]=='X'.

    Next esi[1]=='b' is copied over edi[0xffffffff]=='X'.
    edi--->'123.....ab'; edi++ wraps edi to 0x00000000, pointing at edi[0]=='1'.
    esi++, esi[2]=='c': esi points at the 'c' in 'abc\0?' (src).

    Next esi[2]=='c' is copied over edi[0x00000000]=='1'.
    esi++, esi[3]=='\0': esi points at the '\0' in 'abc\0?' (src).
    edi--->'c23.....ab'; edi++, edi==0x00000001, pointing at edi[0x00000001]=='2'.

    Next esi[3]=='\0' is copied over edi[0x00000001]=='2'.
    esi++, esi[4]=='?': esi points at the '?' in 'abc\0?' (src).
    edi--->'c\03.....ab'; edi++, edi==0x00000002, pointing at edi[0x00000002]=='3'.

    So the concatenated string, read from address 0, is "c\0".

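    The same thing happens when src is 4G long, or when both src and dest are 4G long.
    As long as the lengths of src and dest add up to no more than 4G-1, leaving one byte for the '\0', everything works.

    When src, dest or both are empty, the behaviour is simple:
    dest empty, src not empty: src is copied, with its '\0', into dest.
    src empty, dest not empty: dest is unchanged apart from src's '\0' being copied over dest's '\0'.
    Both empty: src's '\0' is copied over dest's '\0'.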
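    The wrap-around in special case 1 cannot be reproduced with real 32-bit registers in a test, so here is a scaled-down model: a hypothetical 256-byte es segment (es_mem) addressed by an 8-bit edi, with an 8-bit ecx whose initial value 0xff plays the role of 0xffffffff. src is an ordinary C string, standing in for the separate ds segment. This illustrates the mechanism only; it is not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy machine: a 256-byte "es segment" addressed by an 8-bit
 * edi.  An 8-bit ecx starting at 0xff stands in for 0xffffffff. */
static uint8_t es_mem[256];

static void toy_strcat(uint8_t dest, const char *src)
{
    uint8_t edi = dest;
    uint8_t ecx = 0xff;                  /* "3" (0xffffffffu), scaled down */
    uint8_t al = 0;                      /* "2" (0) */
    const char *esi = src;               /* src lives in a separate "ds" */

    while (ecx != 0) {                   /* repne scasb */
        int zf = (es_mem[edi] == al);
        edi++;                           /* 8-bit: wraps from 0xff to 0x00 */
        ecx--;
        if (zf)
            break;
    }
    edi--;                               /* decl %edi */
    do {
        al = (uint8_t)*esi++;            /* lodsb */
        es_mem[edi++] = al;              /* stosb, with 8-bit wrap-around */
    } while (al != 0);                   /* testb %al,%al ; jne 1b */
}
```

    Filling es_mem with non-zero bytes ending in 'Y', 'X' and calling toy_strcat(0, "abc") leaves 'a', 'b' at the top of the segment and "c\0" at address 0, the same result as the 32-bit trace.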

    Reference:
    S: si/esi
    D: di/edi
    a: ax/eax
    c: cx/ecx
    &, 0-9: as explained in the strcpy() section above.
    --------------------------------------------------------------- 
    strncat() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRNCAT 
    static inline char * strncat(char * dest,const char * src,size_t count)
    {
        int d0, d1, d2, d3;
        __asm__ __volatile__(
            "repne\n\t"
            "scasb\n\t"
            "decl %1\n\t"
            "movl %8,%3\n"
            "1:\tdecl %3\n\t"
            "js 2f\n\t"
            "lodsb\n\t"
            "stosb\n\t"
            "testb %%al,%%al\n\t"
            "jne 1b\n"
            "2:\txorl %2,%2\n\t"
            "stosb"
            : "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
            : "0" (src),"1" (dest),"2" (0),"3" (0xffffffffu), "g" (count)
            : "memory");
        return dest;
    }
    Instructions rewritten:
    repne ===> while (ecx != 0) {
    scasb ===>     ZF = (al == es:[edi]);
                   edi++;
                   ecx--;
                   if (ZF == 1) break;
               }
    decl %1    ===> decl edi
    movl %8,%3 ===> movl count,ecx   (operand 8 is "g" (count))
    1:         ===> 1:
    decl %3    ===> decl ecx
    js 2       ===> js 2
    lodsb      ===> mov al,ds:[esi]
                    inc esi
    stosb      ===> mov es:[edi],al
                    inc edi
    testb %%al,%%al ===> test al,al
    jne 1      ===> jne 1
    2:         ===> 2:
    xorl %2,%2 ===> xor eax,eax
    stosb      ===> mov es:[edi],al
                    inc edi
    Initial values of the operands:
    : "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
    : "0" (src),"1" (dest),"2" (0),"3" (0xffffffffu), "g" (count)
    esi: esi = src
    edi: edi = dest
    eax: eax = 0
    ecx: ecx = 0xffffffff
    "g": lets the compiler decide how to supply count (a general register, memory or an immediate).

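    Code analysis:
    while (ecx != 0) {
        ZF = (al == es:[edi]);
        edi++;
        ecx--;
        if (ZF == 1) break;
    }
    decl edi
    This finds the '\0' of the string at es:[edi] and then steps edi back onto that '\0'.
    If the string fits in 4G-1 bytes, the loop ends normally at its '\0'. If no '\0' appears within 0xffffffff bytes, the loop ends with ecx == 0 and, after the step back, edi points at edi[0xffffffff-1]. A longer string cannot be addressed at all.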

    movl count,ecx 
    1:
    decl ecx
    js 2
    mov al,ds:[esi]
    inc esi 
    mov es:[edi],al 
    inc edi 
    test al,al
    jne 1
    2:
    xor eax,eax
    mov es:[edi],al 
    inc edi

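    1: copies the string at esi to edi.
    2: after copying, appends a '\0' at the end.
    Case by case:
    1) count is greater than the number of characters in the string at ds:[esi]. The string is copied together with its '\0', the testb check ends loop 1:, and 2: then writes a second '\0' after that '\0' and increments edi before returning. This extra '\0' lands one byte beyond the terminated result.

    2) count is less than the number of characters in the string at ds:[esi]. Only count characters are copied; ecx then drops to -1, js 2 leaves loop 1:, and 2: appends a '\0' and increments edi before returning.

    3) count equals the number of characters in the string at ds:[esi]. After count characters are copied ecx is 0; the next decl at the top of the loop makes it -1, js 2 leaves loop 1:, and 2: appends a '\0' and increments edi before returning.

    4) count is 0 or "negative" (above 0x80000000 as an unsigned value). The first decl makes ecx negative, js 2 leaves loop 1: immediately, and 2: writes a '\0' over the '\0' the string at edi already had, increments edi and returns.

    Although ecx could count up to 4G, the decl/js test treats count as signed, so at most 0x80000000 bytes (2 GiB) are taken from src, plus the final '\0'. My guess is that this allows for dest itself being up to 2G long: the code cannot know how long the string at es:[edi] is, and although in practice it is small, the most general case is handled.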
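    Case 1) is easy to miss, so here is a C model of the whole sequence (repne scasb, decl, the decl/js copy loop and the final xorl/stosb) that makes the extra '\0' visible. It is a sketch under the same assumptions as a plain C reading of the asm: uint32_t for ECX, bit 31 for js, direction flag clear; it is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of strncat(): find dest's '\0', copy at most count bytes with the
 * decl/js loop, then always store one more '\0' (2: xorl ; stosb). */
static char *model_strncat(char *dest, const char *src, uint32_t count)
{
    char *edi = dest;
    const char *esi = src;
    uint32_t ecx = 0xffffffffu;
    char al = 0;

    while (ecx != 0) {                   /* repne scasb */
        int zf = (*edi == al);
        edi++;
        ecx--;
        if (zf)
            break;
    }
    edi--;                               /* decl %edi: back onto dest's '\0' */
    ecx = count;                         /* movl %8,%ecx */
    for (;;) {
        ecx--;                           /* 1: decl %ecx */
        if (ecx & 0x80000000u)           /* js 2f */
            break;
        al = *esi++;                     /* lodsb */
        *edi++ = al;                     /* stosb */
        if (al == 0)                     /* testb %al,%al ; jne 1b */
            break;
    }
    *edi = 0;                            /* 2: xorl %eax,%eax ; stosb */
    return dest;
}
```

    Appending "ab" to "12" with count 10 leaves "12ab" followed by two '\0' bytes, so the destination buffer needs one byte more than the terminated result.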
    --------------------------------------------------------------- 
    __HAVE_ARCH_STRCMP strcmp() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRCMP 
    static inline int strcmp(const char * cs,const char * ct)
    {
        int d0, d1;
        register int __res;
        __asm__ __volatile__(
            "1:\tlodsb\n\t"
            "scasb\n\t"
            "jne 2f\n\t"
            "testb %%al,%%al\n\t"
            "jne 1b\n\t"
            "xorl %%eax,%%eax\n\t"
            "jmp 3f\n"
            "2:\tsbbl %%eax,%%eax\n\t"
            "orb $1,%%al\n"
            "3:"
            :"=a" (__res), "=&S" (d0), "=&D" (d1)
            :"1" (cs),"2" (ct)
            :"memory");
        return __res;
    }

    Initial values:
    ax/eax: register int __res;
    si/esi: const char* cs;
    di/edi: const char* ct;
    (the initial flags do not matter: scasb sets them before jne tests them)

    Instructions rewritten:
    1: lodsb ===> 1: mov al,ds:[esi]
                     inc esi
    scasb    ===> compute al - es:[edi] and set the flags:
                  ZF = (al == es:[edi]);
                  CF = (al < es:[edi]);   /* unsigned borrow */
                  edi++;
    jne 2    ===> jne 2
    testb %%al,%%al ===> testb al,al
    jne 1    ===> jne 1
    xorl %%eax,%%eax ===> xorl eax,eax
    jmp 3    ===> jmp 3
    2: sbbl %%eax,%%eax ===> 2: sbbl eax,eax
    orb $1,%%al ===> orb al,1
    3:       ===> 3:

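    1) How the code works:
    The function compares the two strings at ds:[esi] and es:[edi], both of which should be '\0'-terminated; the return value is left in eax. Each character of ds:[esi] is loaded into al and compared with the corresponding character of es:[edi]. If they are equal (ZF=1), al is tested for '\0': if it is not '\0', the next pair is compared; if it is '\0', the strings are equal, eax is cleared to 0 and that 0 is returned.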

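    2) Cases:
    1. The strings at ds:[esi] and es:[edi] are equal: as above, eax returns 0.
    2. The strings at ds:[esi] and es:[edi] differ:
    (1) The character at ds:[esi] is smaller (as an unsigned byte) than the one at es:[edi]
    ds:[esi]=="abc\0"
    es:[edi]=="xyz\0"
    scasb computes 'a'-'x': not 0, so ZF == 0; a borrow occurs, so CF == 1
    edi++                  edi now points at 'y'
    jne 2                  taken
    2: sbbl eax,eax        eax = eax-eax-CF = -1 = 0xffffffff
    orb al,1               al = 0xff, so eax stays 0xffffffff

    Conclusion:
    If the first differing character of cs is smaller than the corresponding character of ct, the return value is eax==0xffffffff==-1.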

    (2) The character at ds:[esi] is greater than the one at es:[edi]
    ds:[esi]=="xyz\0"
    es:[edi]=="abc\0"
    scasb computes 'x'-'a': not 0, so ZF == 0; no borrow, so CF == 0
    edi++                  edi now points at 'b'
    jne 2                  taken
    2: sbbl eax,eax        eax = eax-eax-CF = 0
    orb al,1               al = 0|1 = 1, eax = 0x00000001
    Output: eax==0x00000001
    Conclusion:
    If the first differing character of cs is greater than the corresponding character of ct, the return value is eax==0x00000001==1.

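    (3) One string is a prefix of the other:
    ds:[esi]=="abc\0"
    es:[edi]=="abc123\0"
    The loop stops when it compares '\0' with '1' and returns -1.
    In the opposite case:
    ds:[esi]=="abc123\0"
    es:[edi]=="abc\0"
    the loop stops when it compares '1' with '\0' and returns 1.

    (4) One string is unterminated ("infinitely long") and the other is terminated:
    Either the strings differ at some position, which is handled as above, or the terminated one is a prefix of the other, also handled as above.
    So as long as one of the strings is '\0'-terminated, the function returns normally even if the other has no '\0'.

    (5) Both strings are unterminated:
    If they differ somewhere, the loop stops there as above.
    If they are identical and unterminated, the comparison just keeps going: esi and edi increase to 0xffffffff and wrap around to 0x00000000. If both strings started at 0x00000000, the comparison repeats forever, an infinite loop. If they started somewhere in the middle, the function ends as soon as a differing byte is found from 0x00000000 onwards.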
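    The sbbl/orb epilogue is a branch-free way of turning the carry flag into -1 or 1. A C model of the whole loop, written only to illustrate the technique (not the kernel's code), makes two details explicit: the bytes are compared as unsigned values because CF is an unsigned borrow, and orb $1,%al gives the same result as eax | 1 here because eax is either 0 or -1 at that point.

```c
#include <assert.h>

/* Model of strcmp(): scasb compares unsigned bytes; on a mismatch
 * CF (the borrow of al - [edi]) becomes -1 or 0 via sbbl, then orb $1. */
static int model_strcmp(const char *cs, const char *ct)
{
    const unsigned char *esi = (const unsigned char *)cs;
    const unsigned char *edi = (const unsigned char *)ct;

    for (;;) {
        unsigned char al = *esi++;       /* lodsb */
        unsigned char m = *edi++;        /* scasb: compare al with [edi] */
        if (al != m) {                   /* jne 2f */
            int cf = (al < m);           /* borrow of al - [edi] */
            int eax = -cf;               /* 2: sbbl %eax,%eax -> 0 or -1 */
            return eax | 1;              /* orb $1,%al -> 1 or -1 */
        }
        if (al == 0)                     /* testb %al,%al ; jne 1b */
            return 0;                    /* xorl %eax,%eax */
    }
}
```

    Because the comparison is unsigned, a byte such as 0x80 compares greater than 'a'.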
    --------------------------------------------------------------- 
    __HAVE_ARCH_STRNCMP strncmp() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRNCMP 
    static inline int strncmp(const char * cs,const char * ct,size_t count)
    {
        register int __res;
        int d0, d1, d2;
        __asm__ __volatile__(
            "1:\tdecl %3\n\t"
            "js 2f\n\t"
            "lodsb\n\t"
            "scasb\n\t"
            "jne 3f\n\t"
            "testb %%al,%%al\n\t"
            "jne 1b\n"
            "2:\txorl %%eax,%%eax\n\t"
            "jmp 4f\n"
            "3:\tsbbl %%eax,%%eax\n\t"
            "orb $1,%%al\n"
            "4:"
            :"=a" (__res), "=&S" (d0), "=&D" (d1), "=&c" (d2)
            :"1" (cs),"2" (ct),"3" (count)
            :"memory");
        return __res;
    }
    Initial values:
    ax/eax: __res
    si/esi: const char * cs
    di/edi: const char * ct
    cx/ecx: count

    Instructions rewritten:
    1: decl %3 ===> 1: decl ecx
    js 2       ===> js 2
    lodsb      ===> mov al,ds:[esi]
                    inc esi
    scasb      ===> ZF = (al == es:[edi]); CF = (al < es:[edi]);
                    edi++;
    jne 3      ===> jne 3
    testb %%al,%%al ===> testb al,al
    jne 1      ===> jne 1
    2: xorl %%eax,%%eax ===> 2: xorl eax,eax
    jmp 4      ===> jmp 4
    3: sbbl %%eax,%%eax ===> 3: sbbl eax,eax
    orb $1,%%al ===> orb al,1
    4:         ===> 4:

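    The analysis is the same as for strcmp():
    1) count is smaller than the lengths of both strings:
    a: the first count characters are equal: ecx becomes -1, js 2 leaves the loop, and xorl eax,eax clears eax, which is returned.
    b: the strings differ within the first count characters: jne 3 leaves the loop:
    b-1: if the first differing character of cs is greater than that of ct, the return value is eax==0x00000001==1;
    b-2: if the first differing character of cs is smaller than that of ct, the return value is eax==0xffffffff==-1.

    2) count equals the length of the strings:
    a: the strings are equal:
    after count characters ecx is 0; the next decl makes it -1 and js 2 leaves the loop before the '\0' is ever compared. xorl eax,eax clears eax and 0 is returned.
    b: the strings differ:
    as above.

    3) count is greater than the length of the strings:
    a: the strings are equal:
    after the '\0' is compared, testb al,al ends the loop, xorl eax,eax clears eax, and 0 is returned.
    b: the strings differ:
    as above.

    4) count is 0 or "negative" (above 0x80000000 as an unsigned value):
    Nothing is compared at all; the function returns 0 straight away. The path taken is:
    1: decl %3 ===> 1: decl ecx
    js 2       ===> js 2
    ... ...
    2: xorl %%eax,%%eax ===> 2: xorl eax,eax
    jmp 4      ===> jmp 4
    ... ...
    4:         ===> 4: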
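    Here is a C model of strncmp() that combines the counter handling of strncpy() with the flag trick of strcmp(). Again it is a sketch, not the kernel implementation: ECX is a uint32_t and js is a test of bit 31. The tests cover the cases above, including count == 0 and a "negative" count.

```c
#include <assert.h>
#include <stdint.h>

static int model_strncmp(const char *cs, const char *ct, uint32_t count)
{
    const unsigned char *esi = (const unsigned char *)cs;
    const unsigned char *edi = (const unsigned char *)ct;
    uint32_t ecx = count;                /* "3" (count) */

    for (;;) {
        ecx--;                           /* 1: decl %ecx */
        if (ecx & 0x80000000u)           /* js 2f */
            return 0;                    /* 2: xorl %eax,%eax */
        unsigned char al = *esi++;       /* lodsb */
        unsigned char m = *edi++;        /* scasb */
        if (al != m)                     /* jne 3f */
            return -(int)(al < m) | 1;   /* 3: sbbl ; orb $1 -> -1 or 1 */
        if (al == 0)                     /* testb %al,%al ; jne 1b */
            return 0;                    /* falls into 2: xorl %eax,%eax */
    }
}
```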

    --------------------------------------------------------------- 
    __HAVE_ARCH_STRCHR strchr() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRCHR 
    static inline char * strchr(const char * s, int c)
    {
        int d0;
        register char * __res;
        __asm__ __volatile__(
            "movb %%al,%%ah\n"
            "1:\tlodsb\n\t"
            "cmpb %%ah,%%al\n\t"
            "je 2f\n\t"
            "testb %%al,%%al\n\t"
            "jne 1b\n\t"
            "movl $1,%1\n"
            "2:\tmovl %1,%0\n\t"
            "decl %0"
            :"=a" (__res), "=&S" (d0)
            :"1" (s),"0" (c)
            :"memory");
        return __res;
    }

    Initial values:
    ax/eax: int c
    si/esi: const char *s

    Instructions rewritten:
    movb %%al,%%ah ===> movb al,ah
    1: lodsb       ===> 1: mov al,ds:[esi]
                          inc esi
    cmpb %%ah,%%al ===> cmpb ah,al
    je 2           ===> je 2
    testb %%al,%%al ===> testb al,al
    jne 1          ===> jne 1
    movl $1,%1     ===> movl 1,esi
    2: movl %1,%0  ===> 2: movl esi,eax
    decl %0        ===> decl eax

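    Purpose:
    Searches the '\0'-terminated string at ds:[esi], front to back, for the character c. If c is found, a pointer to that character is returned (esi has already moved one past it, hence the decl). If it is not found, esi is set to 1 so that the decl yields 0, i.e. NULL. Because the comparison with c comes before the test for '\0', searching for c == '\0' returns a pointer to the terminator.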

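    Rewritten in C (a model: an integer stands in for ESI, and in the flat 32-bit address space there is no ds*16 segment arithmetic):
    char *strchr_model(const char *s, int c)
    {
        unsigned long esi = (unsigned long)s;  /* esi: start of the string */
        char ah = (char)c;                     /* movb %al,%ah */
        char al;
        for (;;) {
            al = *(const char *)esi++;         /* lodsb */
            if (al == ah)                      /* cmpb %ah,%al ; je 2f */
                break;
            if (al == 0) {                     /* testb %al,%al ; jne 1b not taken */
                esi = 1;                       /* movl $1,%1 */
                break;
            }
        }
        return (char *)(esi - 1);              /* 2: movl %1,%0 ; decl %0 */
    }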

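    Extreme case:
    If the string at ds:[esi] is not '\0'-terminated, esi keeps increasing up to 0xffffffff, wraps around to 0x00000000 and the search continues from the bottom of memory. If neither c nor a '\0' turns up anywhere, the function loops forever.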
    --------------------------------------------------------------- 
    __HAVE_ARCH_STRRCHR strrchr() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRRCHR 
    static inline char * strrchr(const char * s, int c)
    {
        int d0, d1;
        register char * __res;
        __asm__ __volatile__(
            "movb %%al,%%ah\n"
            "1:\tlodsb\n\t"
            "cmpb %%ah,%%al\n\t"
            "jne 2f\n\t"
            "leal -1(%%esi),%0\n"
            "2:\ttestb %%al,%%al\n\t"
            "jne 1b"
            :"=g" (__res), "=&S" (d0), "=&a" (d1)
            :"0" (0),"1" (s),"2" (c)
            :"memory");
        return __res;
    }


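    Initial values:
    __res  : 0
    si/esi : const char * s
    ax/eax : c

    Instructions rewritten:
    movb %%al,%%ah ===> movb al,ah
    1: lodsb       ===> 1: mov al,ds:[esi]
                          inc esi
    cmpb %%ah,%%al ===> cmpb ah,al
    jne 2          ===> jne 2
    leal -1(%%esi),%0 ===> leal [esi-1],__res(g)
    2: testb %%al,%%al ===> 2: testb al,al
    jne 1          ===> jne 1
    The analysis is similar to strchr() above, except that this function looks for the last occurrence of c in the string at const char *s: the scan always runs to the '\0', and every match overwrites __res with its address. If c occurs, the address of its last occurrence is returned; otherwise __res keeps its initial value 0. The remaining details are as for strchr() and are not repeated here.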
    strrchr - Find the last occurrence of a character in a string. 

    If s is a null pointer, the result is unpredictable: the very first lodsb reads through it.
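    The "last match wins" behaviour can be shown with a small C model (an illustration, not kernel code). As with strchr(), the comparison precedes the '\0' test, so a search for '\0' returns the address of the terminator.

```c
#include <assert.h>
#include <stddef.h>

static char *model_strrchr(const char *s, int c)
{
    const char *esi = s;
    const char *res = NULL;              /* "0" (0): __res starts as 0 */
    char ah = (char)c;                   /* movb %al,%ah */
    char al;

    do {
        al = *esi++;                     /* lodsb */
        if (al == ah)                    /* cmpb %ah,%al ; jne 2f */
            res = esi - 1;               /* leal -1(%esi),%0 */
    } while (al != 0);                   /* 2: testb %al,%al ; jne 1b */
    return (char *)res;
}
```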
    --------------------------------------------------------------- 
    __HAVE_ARCH_STRLEN strlen() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRLEN 
    static inline size_t strlen(const char * s)
    {
        int d0;
        register int __res;
        __asm__ __volatile__(
            "repne\n\t"
            "scasb\n\t"
            "notl %0\n\t"
            "decl %0"
            :"=c" (__res), "=&D" (d0)
            :"1" (s),"a" (0), "0" (0xffffffffu)
            :"memory");
        return __res;
    }

    Initial values of the operands:
    di/edi: const char * s
    ax/eax: 0
    cx/ecx: 0xffffffff
    size_t ecx = 0xffffffff;
    char * edi = s;
    eax = 0;
    Instructions rewritten:
    repne ===> while (ecx != 0) {
    scasb ===>     ZF = (al == es:[edi]);
                   edi++;
                   ecx--;
                   if (ZF == 1) break;
               }
    notl %0 ===> ecx = ~ecx;   /* bitwise NOT, not logical ! */
    decl %0 ===> ecx--;

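    The key step is ecx = ~ecx. ECX counts down from 0xffffffff, and for a 32-bit value ~x == 0xffffffff - x, so taking the bitwise complement turns a down-count into an up-count (and vice versa). The scan also counts the '\0' itself, one step too many, so after the complement the result is decremented by one: for a string of length len, ECX ends at 0xffffffff - (len + 1), ~ECX == len + 1, and the final decl gives len.

    The ordinary cases are simple and work as before.
    In the extreme case edi++ and ecx-- run all the way (ecx 0xffffffff ---> 0x00000000), as discussed earlier.

    Reference:
    typedef unsigned int __kernel_size_t;
    typedef __kernel_size_t size_t;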
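    The complement trick is easy to check in C. The model below (illustrative, not the kernel code) keeps ECX in a uint32_t so that its arithmetic matches the 32-bit register.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t model_strlen(const char *s)
{
    const char *edi = s;
    uint32_t ecx = 0xffffffffu;          /* "0" (0xffffffffu) */

    while (ecx != 0) {                   /* repne scasb with al == 0 */
        int zf = (*edi == 0);
        edi++;
        ecx--;
        if (zf)
            break;
    }
    /* now ecx == 0xffffffff - (len + 1), so ~ecx == len + 1 */
    ecx = ~ecx;                          /* notl %ecx */
    ecx--;                               /* decl %ecx */
    return ecx;
}
```

    For "hello", six bytes are scanned, ECX ends at 0xfffffff9, ~ECX is 6, and the decl yields 5.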
    --------------------------------------------------------------- 
    __memcpy() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    static inline void * __memcpy(void * to, const void * from, size_t n)
    {
        int d0, d1, d2;
        __asm__ __volatile__(
            "rep ; movsl\n\t"
            "movl %4,%%ecx\n\t"
            "andl $3,%%ecx\n\t"
    #if 1 /* want to pay 2 byte penalty for a chance to skip microcoded rep? */
            "jz 1f\n\t"
    #endif
            "rep ; movsb\n\t"
            "1:"
            : "=&c" (d0), "=&D" (d1), "=&S" (d2)
            : "0" (n/4), "g" (n), "1" ((long) to), "2" ((long) from)
            : "memory");
        return (to);
    }

    Initial values of the operands:
    cx/ecx: n/4
    di/edi: to
    si/esi: from

    Instructions rewritten (ecx = n/4 on entry):
    rep           ===> while (ecx != 0) {
    movsl         ===>     (long)es:[edi] = (long)ds:[esi];
                           esi += 4; edi += 4; ecx--;
                       }
    movl %4,%%ecx ===> ecx = n;
    andl $3,%%ecx ===> ecx = ecx & 3;  ZF = (ecx == 0);
    #if 1
    jz 1          ===> if (ZF == 1) goto 1;   /* no leftover bytes */
    #endif
    rep           ===> while (ecx != 0) {
    movsb         ===>     (char)es:[edi] = (char)ds:[esi];
                           esi++; edi++; ecx--;
                       }
    1:            ===> 1:

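    Analysis:
    1. First copy in 4-byte units:
    ecx = n/4, then the rep movsl copy runs.
    2. Then compute ecx = n % 4 (done as n & 3) and copy the remaining 0-3 bytes one at a time.
    andl sets ZF when n & 3 == 0, and jz then skips the second rep entirely; the #if 1 comment says this costs 2 bytes of code in exchange for a chance to skip a microcoded rep.
    That is the general case.

    3. If 0 < n < 4:
    then ecx = n/4 == 0;
    the rep movsl does nothing, so no 4-byte copy happens and the bytes are copied one at a time.

    4. If n = 0:
    both copies are skipped (the first rep sees ecx == 0, and jz skips the second), so nothing is copied.

    5. If n < 0:
    n is a size_t, i.e. unsigned, so a "negative" n is really a huge count (for example -1 becomes 0xffffffff). The function would try to copy close to 4G bytes and run far past both buffers, so the outcome is unpredictable.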
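    The split into n/4 dword moves plus an n & 3 byte tail can be modelled in portable C. This is only a sketch of the technique, not the kernel code: each movsl is written as four byte stores so that the model makes no alignment assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of __memcpy(): n/4 four-byte moves, then the n & 3 leftover bytes. */
static void *model_memcpy(void *to, const void *from, size_t n)
{
    unsigned char *edi = to;
    const unsigned char *esi = from;
    size_t ecx = n / 4;                  /* "0" (n/4) */

    while (ecx != 0) {                   /* rep ; movsl */
        edi[0] = esi[0];                 /* one 4-byte move, done bytewise */
        edi[1] = esi[1];
        edi[2] = esi[2];
        edi[3] = esi[3];
        edi += 4;
        esi += 4;
        ecx--;
    }
    ecx = n & 3;                         /* movl %4,%ecx ; andl $3,%ecx */
    if (ecx == 0)                        /* jz 1f: ZF set when the result is 0 */
        return to;
    while (ecx != 0) {                   /* rep ; movsb */
        *edi++ = *esi++;
        ecx--;
    }
    return to;
}
```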


    --------------------------------------------------------------- 
    __constant_memcpy() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    /* 
    * This looks ugly, but the compiler can optimize it totally, 
    * as the count is constant. 
    */ 
    static inline void * __constant_memcpy(void * to, const void * from, size_t n)
    {
        long esi, edi;
        if (!n) return to;
    #if 1 /* want to do small copies with non-string ops? */
        switch (n) {
        case 1: *(char*)to = *(char*)from; return to;
        case 2: *(short*)to = *(short*)from; return to;
        case 4: *(int*)to = *(int*)from; return to;
    #if 1 /* including those doable with two moves? */
        case 3: *(short*)to = *(short*)from;
                *((char*)to+2) = *((char*)from+2); return to;
        case 5: *(int*)to = *(int*)from;
                *((char*)to+4) = *((char*)from+4); return to;
        case 6: *(int*)to = *(int*)from;
                *((short*)to+2) = *((short*)from+2); return to;
        case 8: *(int*)to = *(int*)from;
                *((int*)to+1) = *((int*)from+1); return to;
    #endif/* 1 */
        }/* switch */
    #endif/* 1 */
        esi = (long) from;
        edi = (long) to;
        if (n >= 5*4) {
            /* large block: use rep prefix */
            int ecx;
            __asm__ __volatile__(
                "rep ; movsl"
                : "=&c" (ecx), "=&D" (edi), "=&S" (esi)
                : "0" (n/4), "1" (edi),"2" (esi)
                : "memory"
            );
        }/* if */
        else {
            /* small block: don't clobber ecx + smaller code */
            if (n >= 4*4) __asm__ __volatile__(
                "movsl"
                :"=&D"(edi),"=&S"(esi)
                :"0"(edi),"1"(esi)
                :"memory");

            if (n >= 3*4) __asm__ __volatile__(
                "movsl"
                :"=&D"(edi),"=&S"(esi)
                :"0"(edi),"1"(esi)
                :"memory");

            if (n >= 2*4) __asm__ __volatile__(
                "movsl"
                :"=&D"(edi),"=&S"(esi)
                :"0"(edi),"1"(esi)
                :"memory");

            if (n >= 1*4) __asm__ __volatile__(
                "movsl"
                :"=&D"(edi),"=&S"(esi)
                :"0"(edi),"1"(esi)
                :"memory");
        }/* else */

        switch (n % 4) {
        /* tail */
        case 0: return to;

        case 1: __asm__ __volatile__(
                "movsb"
                :"=&D"(edi),"=&S"(esi)
                :"0"(edi),"1"(esi)
                :"memory");
            return to;

        case 2: __asm__ __volatile__(
                "movsw"
                :"=&D"(edi),"=&S"(esi)
                :"0"(edi),"1"(esi)
                :"memory");
            return to;

        default: __asm__ __volatile__(
                "movsw\n\tmovsb"
                :"=&D"(edi),"=&S"(esi)
                :"0"(edi),"1"(esi)
                :"memory");
            return to;
        }/* switch */
    }

    Code analysis:
    1. Copies of 1 to 8 bytes (7 excepted) are done with plain assignments through pointers of different widths:
    #if 1 /* want to do small copies with non-string ops? */
    switch (n) {
    case 1: *(char*)to = *(char*)from; return to;
    case 2: *(short*)to = *(short*)from; return to;
    case 4: *(int*)to = *(int*)from; return to;
    #if 1 /* including those doable with two moves? */
    case 3: *(short*)to = *(short*)from;
            *((char*)to+2) = *((char*)from+2); return to;
    case 5: *(int*)to = *(int*)from;
            *((char*)to+4) = *((char*)from+4); return to;
    case 6: *(int*)to = *(int*)from;
            *((short*)to+2) = *((short*)from+2); return to;
    case 8: *(int*)to = *(int*)from;
            *((int*)to+1) = *((int*)from+1); return to;
    #endif/* 1 */
    }/* switch */
    #endif/* 1 */
    This code handles copies of 1 to 8 bytes, except 7. For
    1 byte: one char * assignment
    2 bytes: one short * assignment
    4 bytes: one int * assignment
    3, 5, 6 and 8 bytes: two assignments (short + char, int + char, int + short, int + int).

    2. Copies of n >= 20 bytes, and of [16,19], [12,15], [8,11] and [4,7] bytes:
    if (n >= 5*4) // n >= 20:
    {
        /* large block: use rep prefix */
        int ecx;
        __asm__ __volatile__(
            "rep ; movsl"
            : "=&c" (ecx), "=&D" (edi), "=&S" (esi)
            : "0" (n/4), "1" (edi),"2" (esi)
            : "memory"
        );
    }/* if */

    Analysis: esi = (long) from;
    edi = (long) to;
    ecx = n/4;
    rep   ===> while (ecx-- != 0)
    movsl ===> {
                   (unsigned long)es:[edi] = ds:[esi];
                   esi += 4; edi += 4;
               }
    Execution then continues in the switch that follows:
    switch (n % 4)
    {
        /* tail */
        case 0: return to;

        case 1: __asm__ __volatile__(
            "movsb"
            :"=&D"(edi),"=&S"(esi)
            :"0"(edi),"1"(esi)
            :"memory");
            return to;

        case 2: __asm__ __volatile__(
            "movsw"
            :"=&D"(edi),"=&S"(esi)
            :"0"(edi),"1"(esi)
            :"memory");
            return to;

        default: __asm__ __volatile__(
            "movsw\n\tmovsb"
            :"=&D"(edi),"=&S"(esi)
            :"0"(edi),"1"(esi)
            :"memory");
            return to;
    }/* switch */
    The code is simple: it copies the remaining bytes that do not fill a whole 4-byte word.
    default covers n%4 == 3: one 2-byte word (movsw) followed by one byte (movsb), 3 bytes in total.
    -------------------------------------------------------------- 
    else // 4 <= n <= 19 (only n == 7 and 9 <= n <= 19 actually reach this point)
    {
        /* small block: don't clobber ecx + smaller code */
        // executed whenever n >= 16:
        if (n >= 4*4) __asm__ __volatile__(
            "movsl"
            :"=&D"(edi),"=&S"(esi)
            :"0"(edi),"1"(esi)
            :"memory");

        // executed whenever n >= 12:
        if (n >= 3*4) __asm__ __volatile__(
            "movsl"
            :"=&D"(edi),"=&S"(esi)
            :"0"(edi),"1"(esi)
            :"memory");

        // executed whenever n >= 8:
        if (n >= 2*4) __asm__ __volatile__(
            "movsl"
            :"=&D"(edi),"=&S"(esi)
            :"0"(edi),"1"(esi)
            :"memory");

        // executed whenever n >= 4:
        if (n >= 1*4) __asm__ __volatile__(
            "movsl"
            :"=&D"(edi),"=&S"(esi)
            :"0"(edi),"1"(esi)
            :"memory");
    }/* else */

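    Analysis:
    At first glance ECX seems never to be initialized here (shouldn't it be ecx = n/4?). It is not needed: none of these asm statements uses rep, so each movsl copies exactly one 4-byte word and ECX is never read. The if statements are not mutually exclusive; every condition that holds executes one more movsl, so for 4 <= n < 20 exactly n/4 words are copied (n = 19, for example, passes all four tests). Because n is a compile-time constant, the compiler drops the false conditions and only the needed movsl instructions remain.
    The chain could be collapsed into a single rep-based statement such as:
    int ecx;
    if (n >= 1*4)
    __asm__ __volatile__(
    "rep ; movsl"
    :"=&D"(edi),"=&S"(esi),"=&c"(ecx)
    :"0"(edi),"1"(esi),"2"(n/4)
    :"memory");
    but that is exactly what the large-block branch does, and the comment on the else branch gives the reason it is avoided for small blocks: it would clobber ECX, and the unrolled movsl sequence is smaller code.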
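    To check that the cascade really copies n bytes, the model below counts the movsl instructions the if chain would execute and adds the bytes moved by the tail switch. Both functions are hypothetical helpers written for this illustration; they model the control flow only and move no data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of movsl executed by the small-block branch. */
static unsigned small_block_movsl(size_t n)
{
    unsigned moves = 0;
    if (n >= 4*4) moves++;               /* the four if statements are */
    if (n >= 3*4) moves++;               /* independent: every test that */
    if (n >= 2*4) moves++;               /* holds adds one 4-byte move */
    if (n >= 1*4) moves++;
    return moves;
}

/* Hypothetical helper: bytes moved by the tail switch on n % 4. */
static unsigned tail_bytes(size_t n)
{
    switch (n % 4) {
    case 0:  return 0;                   /* nothing */
    case 1:  return 1;                   /* movsb */
    case 2:  return 2;                   /* movsw */
    default: return 3;                   /* movsw ; movsb */
    }
}
```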

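    Note:
    __constant_memcpy() and __memcpy() take the same parameters of the same types and do the same job; __constant_memcpy() is the variant chosen when n is a compile-time constant, so that the compiler can discard every branch that does not apply to that n.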
    --------------------------------------------------------------- 
    __constant_memcpy3d() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_MEMCPY 
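    #ifdef CONFIG_X86_USE_3DNOW /* applies to __constant_memcpy3d(),
    __memcpy3d() and memcpy() below */
    #include <asm/mmx.h>
    /*
    * This CPU favours 3DNow strongly (eg AMD Athlon)
    */
    static inline void * __constant_memcpy3d(void * to, const void * from, size_t len)
    {
        if (len < 512)
            return __constant_memcpy(to, from, len);
        return _mmx_memcpy(to, from, len);
    }

    Blocks shorter than 512 bytes use __constant_memcpy(); larger ones go to _mmx_memcpy(). I could not find the _mmx_memcpy() function itself (its declaration comes from the <asm/mmx.h> header included above), so I stopped here.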
    --------------------------------------------------------------- 
    __memcpy3d() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    static __inline__ void *__memcpy3d(void *to, const void *from, size_t len)
    {
        if (len < 512)
            return __memcpy(to, from, len);
        return _mmx_memcpy(to, from, len);
    }

    As above, I could not find _mmx_memcpy(), so I stopped here.
    --------------------------------------------------------------- 
    memcpy() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define memcpy(t, f, n) \ 
    (__builtin_constant_p(n) ? \ 
    __constant_memcpy3d((t),(f),(n)) : \ 
    __memcpy3d((t),(f),(n))) 
    #else/* CONFIG_X86_USE_3DNOW */ 
    /* 
    * No 3D Now! 
    */ 
    #define memcpy(t, f, n) \ 
    (__builtin_constant_p(n) ? \ 
    __constant_memcpy((t),(f),(n)) : \ 
    __memcpy((t),(f),(n))) 
    #endif/* CONFIG_X86_USE_3DNOW */ 

    Notes on int __builtin_constant_p(exp):
    You can use the built-in function __builtin_constant_p to determine if a value is known to be constant at compile-time and hence that GCC can perform constant-folding on expressions involving that value. The argument of the function is the value to test. The function returns the integer 1 if the argument is known to be a compile-time constant and 0 if it is not known to be a compile-time constant. A return of 0 does not indicate that the value is not a constant, but merely that GCC cannot prove it is a constant with the specified value of the ‘-O’ option.
    You would typically use this function in an embedded application where memory was a critical resource. If you have some complex calculation, you may want it to be folded if it involves constants, but need to call a function if it does not. For example: 

    #define Scale_Value(X) \ 
    (__builtin_constant_p (X) \ 
    ? ((X) * SCALE + OFFSET) : Scale (X)) 

    You may use this built-in function in either a macro or an inline function. However, if you use it in an inlined function and pass an argument of the function as the argument to the built-in, GCC will never return 1 when you call the inline function with a string constant or compound literal and will not return 1 when you pass a constant numeric value to the inline function unless you specify the ‘-O’ option. 

    In short, __builtin_constant_p() is meant to be used together with GCC's -O option.

    You may also use __builtin_constant_p in initializers for static data. For instance, you can write
    static const int table[] = { 
    __builtin_constant_p (EXPRESSION) ? (EXPRESSION) : -1, 
    /* . . . */ 
    }; 
    This is an acceptable initializer even if EXPRESSION is not a constant expression. 
    GCC must be more conservative about evaluating the built-in in this case, because it has no opportunity to perform optimization. Previous versions of GCC did not accept this built-in in data initializers. The earliest version where it is completely safe is 3.0.1.
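To see the dispatch that the memcpy() macro performs, here is a minimal sketch of the same pattern. It is not kernel code: demo_memcpy(), demo_constant_memcpy() and demo_generic_memcpy() are hypothetical names, both helpers simply call the C library memcpy(), and the counters only record which branch __builtin_constant_p() selected. A literal length picks the constant branch even without -O; a length read through a volatile variable can never be proven constant, so it takes the generic branch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counters recording which path each call took. */
static int const_path_calls;
static int generic_path_calls;

/* Stand-in for __constant_memcpy(): same copy, different bookkeeping. */
static void *demo_constant_memcpy(void *to, const void *from, size_t n)
{
    const_path_calls++;
    return memcpy(to, from, n);
}

/* Stand-in for __memcpy(). */
static void *demo_generic_memcpy(void *to, const void *from, size_t n)
{
    generic_path_calls++;
    return memcpy(to, from, n);
}

/* Same shape as the kernel's memcpy() macro: use the specialised helper
 * only when GCC can prove that n is a compile-time constant. */
#define demo_memcpy(t, f, n)                    \
    (__builtin_constant_p(n) ?                  \
     demo_constant_memcpy((t), (f), (n)) :      \
     demo_generic_memcpy((t), (f), (n)))
```

With optimization enabled, GCC may also prove that an ordinary (non-volatile) local variable is constant and choose the first branch. That is harmless in this pattern because both branches copy the same bytes; only the speed differs.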

    --------------------------------------------------------------- 
    __HAVE_ARCH_MEMMOVE 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_MEMMOVE 
    void *memmove(void * dest,const void * src, size_t n); 
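    memmove() is only declared here, not inlined. Because __HAVE_ARCH_MEMMOVE is defined, the generic memmove() in lib/string.c is not built, so the definition comes from separate, architecture-specific out-of-line code.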

    #define memcmp __builtin_memcmp 
    --------------------------------------------------------------- 
    __HAVE_ARCH_MEMCHR memchr() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_MEMCHR 
    static inline void * memchr(const void * cs,int c,size_t count)
    {
        int d0;
        register void * __res;
        if (!count) return NULL;
        __asm__ __volatile__(
            "repne\n\t"
            "scasb\n\t"
            "je 1f\n\t"
            "movl $1,%0\n"
            "1:\tdecl %0"
            :"=D" (__res), "=&c" (d0)
            :"a" (c),"0" (cs),"1" (count)
            :"memory");
        return __res;
    }

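    Purpose: cs is the start of the memory block, count is the number of bytes to search, and c is the byte to look for. The bytes from cs up to (but not including) cs+count are searched for c. If it is found, its address is returned; otherwise 0 (NULL) is returned.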

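    Initial register values:
    ax/eax: c
    di/edi: const void * cs
    cx/ecx: count
    Because the C code returns early when count == 0, at least one scasb always runs, so ZF is always set by the scan itself.
    eax = c;
    edi = cs;
    ecx = count;
    Instruction-by-instruction rewrite:
    repne ===> while (ecx != 0) {
    scasb ===>     ZF = (al == es:[edi]); edi++;
                   ecx--;
                   if (ZF == 1) break;
               }
    je 1 ===> if (ZF == 1) goto 1;
    movl $1,%0 ===> edi = 1;
    1: ===> 1:
    decl %0 ===> edi--;
    return edi;
    Return value: if c is found, the address where it was found (edi has moved one past the match, and decl steps back onto it); otherwise 0, because edi is set to 1 and then decremented.
    The general case is simple, so I will leave it at that.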

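    Special cases:
    1. count == 0: the C guard if (!count) return NULL; returns before the asm runs. The guard matters: with ECX == 0, repne scasb executes no iterations and leaves ZF at whatever the preceding code set, so the je would be unpredictable.
    2. A huge count such as 0xffffffff: either c is found and its address is returned, or the scan continues until ECX reaches 0 and 0 is returned. REP tests ECX before each iteration, so ECX stops at 0 and does not wrap around to 0xffffffff. (In practice a scan that long would run off mapped memory and fault first.)
    3. count is a size_t, so a negative count cannot occur. Because this is a memory function rather than a string function, '\0' bytes are compared like any other byte.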
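The rewrite above can be turned into a small C model for checking these cases. This is a sketch, not the kernel implementation: memchr_model() is a hypothetical name, the variables edi, ecx and zf stand in for the registers and the flag, and the "not found" path mimics the asm literally by setting the result to 1 and then decrementing it to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of the inline-asm memchr(). */
static void *memchr_model(const void *cs, int c, size_t count)
{
    const unsigned char *edi = cs;
    size_t ecx = count;
    int zf = 0;
    uintptr_t res;

    if (!count)                     /* C guard: never run repne with ecx == 0 */
        return NULL;

    /* repne scasb: compare al with *edi, advance edi, count ecx down,
     * stop early when the bytes are equal (ZF == 1). */
    while (ecx != 0) {
        zf = (*edi == (unsigned char)c);
        edi++;
        ecx--;
        if (zf)
            break;
    }

    res = (uintptr_t)edi;           /* edi is one past the last byte compared */
    if (!zf)                        /* je 1f not taken: movl $1,%0 */
        res = 1;
    res--;                          /* 1: decl %0 */
    return (void *)res;
}
```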
    --------------------------------------------------------------- 
    __memset_generic() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    static inline void * __memset_generic(void * s, char c,size_t count)
    {
        int d0, d1;
        __asm__ __volatile__(
            "rep\n\t"
            "stosb"
            : "=&c" (d0), "=&D" (d1)
            :"a" (c),"1" (s),"0" (count)
            :"memory");
        return s;
    }
    ax = c; 
    edi = s; 
    ecx = count; 
    rep ====> while (ecx != 0) {
    stosb ====>     es:[edi] = al; edi++;
                    ecx--;
              }
    return s; 
    --------------------------------------------------------------- 
    __constant_count_memset() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    /* we might want to write optimized versions of these later */ 
    #define __constant_count_memset(s,c,count) __memset_generic((s),(c),(count)) 
    --------------------------------------------------------------- 
    __constant_c_memset() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    /* 
    * memset(x,0,y) is a reasonably common thing to do, so we want to fill 
    * things 32 bits at a time even when we don't know the size of the 
    * area at compile-time.. 
    */ 
    static inline void * __constant_c_memset(void * s, unsigned long c, size_t count)
    {
        int d0, d1;
        __asm__ __volatile__(
            "rep ; stosl\n\t"
            "testb $2,%b3\n\t"
            "je 1f\n\t"
            "stosw\n"
            "1:\ttestb $1,%b3\n\t"
            "je 2f\n\t"
            "stosb\n"
            "2:"
            :"=&c" (d0), "=&D" (d1)
            :"a" (c), "q" (count), "0" (count/4), "1" ((long) s)
            :"memory");
        return (s);
    }
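    Initial register values:
    ax/eax: c
    cx/ecx: count/4
    di/edi: void *s
    count itself is operand %3: the "q" constraint places it in a register whose low byte is addressable, and %b3 names that low byte.

    Instruction-by-instruction rewrite:
    rep ====> while (ecx != 0) {
    stosl ====>     (long)es:[edi] = eax;
                    edi += 4; ecx--;
              }
    testb $2,%b3 ====> if( (0x02 & (char)count) == 0 )
    je 1 ====> goto 1;
    stosw ====> (short)es:[edi] = ax;
    edi += 2;
    1: testb $1,%b3 ====> 1: if( (0x01 & (char)count) == 0)
    je 2 ====> goto 2;
    stosb ====> (char)es:[edi] = al;
    2: ====> 2: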
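    Analysis:
    The bulk of the area is filled 4 bytes at a time. Afterwards bit 1 and bit 0 of count are tested to find out whether 3, 2, 1 or 0 bytes remain: with 3 left, stosw writes 2 bytes and stosb the last one; with 2 left, only stosw runs; with 1 left, only stosb runs; with 0 left, neither runs.

    Special case:
    If count == 0, the loop body and both tail stores are skipped: nothing is written and s is returned.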
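A C model of __constant_c_memset() makes the 4/2/1-byte split easy to test. It is a sketch under stated assumptions: constant_c_memset_model() is a hypothetical name, c is assumed to hold a fill byte already replicated into all four bytes (as memset() arranges), and memcpy() replaces the possibly unaligned 32- and 16-bit stores that stosl and stosw perform, since a plain C pointer cast could violate alignment and aliasing rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* C model of __constant_c_memset(); c must already be a replicated byte. */
static void *constant_c_memset_model(void *s, uint32_t c, size_t count)
{
    unsigned char *edi = s;
    size_t ecx = count / 4;
    uint16_t ax = (uint16_t)c;
    unsigned char al = (unsigned char)c;

    while (ecx != 0) {              /* rep ; stosl */
        memcpy(edi, &c, 4);
        edi += 4;
        ecx--;
    }
    if (count & 2) {                /* testb $2,%b3 ; je 1f ; stosw */
        memcpy(edi, &ax, 2);
        edi += 2;
    }
    if (count & 1)                  /* 1: testb $1,%b3 ; je 2f ; stosb */
        *edi = al;
    return s;
}
```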
    --------------------------------------------------------------- 
    __HAVE_ARCH_STRNLEN strnlen() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    /* Added by Gertjan van Wingerde to make minix and sysv module work */ 
    #define __HAVE_ARCH_STRNLEN 
    static inline size_t strnlen(const char * s, size_t count)
    {
        int d0;
        register int __res;
        __asm__ __volatile__(
            "movl %2,%0\n\t"
            "jmp 2f\n"
            "1:\tcmpb $0,(%0)\n\t"
            "je 3f\n\t"
            "incl %0\n"
            "2:\tdecl %1\n\t"
            "cmpl $-1,%1\n\t"
            "jne 1b\n"
            "3:\tsubl %2,%0"
            :"=a" (__res), "=&d" (d0)
            :"c" (s),"1" (count)
            :"memory");
        return __res;
    }
    /* end of additional stuff */ 

    Initial register values:
    cx/ecx: const char * s 
    dx/edx: count 
    ax/eax: __res 

    Instruction-by-instruction rewrite:
    size_t edx;
    edx = count;
    const char *eax, *ecx;
    ecx = s; 

    movl %2,%0 ====> eax = s; //ecx = eax = s; 
    jmp 2 ====> goto 2; 

    1: cmpb $0,(%0) ====> 1: if( ((char)(ds:[eax]))==0 )
    je 3 ====> goto 3; 
    incl %0 ====> eax++; 

    2: decl %1 ====> 2: edx--; 
    cmpl $-1,%1 ====> if (edx != 0xffffffff) /* i.e. edx != -1 */
    jne 1 ====> goto 1;

    3: subl %2,%0 ====> 3: eax -= ecx; 
    return eax; 
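    Case analysis. Note that cmpl $-1,%1 compares edx with -1 (0xffffffff), not with 0: edx starts at count and is decremented before each character test, so at most count characters are examined before it wraps to -1.
    1. String length (excluding '\0') < count:
    s==>"abcd\0?"
    count == 5: 'a' to 'd' are tested while edx is 4, 3, 2, 1; with edx == 0 the '\0' is tested, je 3 is taken, and eax - ecx == 4, the string length (excluding '\0'), is returned.

    count == 6: the '\0' is reached while edx == 1; je 3 leaves the loop and the return value is again 4.

    2. String length (excluding '\0') == count:
    s==>"abcd\0?"
    count == 4: 'a' to 'd' are tested while edx is 3, 2, 1, 0, and eax is incremented four times; the next decl turns edx into -1, the loop ends, and 4 (== count) is returned. The '\0' is never read.

    3. String length (excluding '\0') > count:
    s==>"abcd\0?"
    count == 3: 'a', 'b' and 'c' are tested (edx == 2, 1, 0); then edx becomes -1 and 3 (== count) is returned. eax points at 'd', which is never read.

    4. String length (excluding '\0') == 0:
    s==>"\0?"
    count == 4: the first test finds '\0', so 0 is returned.

    5. count == 1:
    s==>"abcd\0?"
    count == 1: only 'a' is tested (edx == 0), eax is incremented once, the next decl makes edx == -1, and 1 is returned.

    6. count == 0:
    s==>"abcd\0?": the first decl turns edx into 0xffffffff, cmpl finds it equal to -1, and jne falls through to label 3 without reading any character, so 0 is returned. No long scan takes place.

    Summary:
    s points to a string and count gives a maximum length. The function examines at most count characters and returns min(strlen(s), count): the string length (excluding '\0') if that is less than count, otherwise count. In particular it returns 0 when count == 0 or when the string is empty.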
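Modelling the asm in C confirms this analysis. In this sketch (strnlen_model() is a hypothetical name, not a kernel function), edx is a uint32_t so that decl followed by cmpl $-1 wraps exactly as the 32-bit register does, and each loop iteration starts with the decrement at label 2, just as the initial jmp 2f does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of the inline-asm strnlen(). */
static size_t strnlen_model(const char *s, uint32_t count)
{
    const char *eax = s;            /* movl %2,%0 */
    uint32_t edx = count;

    for (;;) {                      /* jmp 2f enters here */
        edx--;                      /* 2: decl %1 */
        if (edx == 0xffffffffu)     /* cmpl $-1,%1 ; jne 1b not taken */
            break;
        if (*eax == '\0')           /* 1: cmpb $0,(%0) ; je 3f */
            break;
        eax++;                      /* incl %0 */
    }
    return (size_t)(eax - s);       /* 3: subl %2,%0 */
}
```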

    --------------------------------------------------------------- 
    __HAVE_ARCH_STRSTR strstr() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_STRSTR 
    extern char *strstr(const char *cs, const char *ct); 
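    Only the prototype is here. Because __HAVE_ARCH_STRSTR is defined, lib/string.c skips its generic strstr(), so the definition presumably comes from an out-of-line, architecture-specific implementation.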
    --------------------------------------------------------------- 
    __constant_c_and_count_memset() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 
    /* 
    * This looks horribly ugly, but the compiler can optimize it totally, 
    * as we by now know that both pattern and count is constant.. 
    */ 
    static inline void * __constant_c_and_count_memset(void * s, unsigned long pattern, size_t count)
    {
        switch (count) {
        case 0:
            return s;
        case 1:
            *(unsigned char *)s = pattern;
            return s;
        case 2:
            *(unsigned short *)s = pattern;
            return s;
        case 3:
            *(unsigned short *)s = pattern;
            *(2+(unsigned char *)s) = pattern;
            return s;
        case 4:
            *(unsigned long *)s = pattern;
            return s;
        }
    #define COMMON(x) \
    __asm__ __volatile__( \
        "rep ; stosl" \
        x \
        : "=&c" (d0), "=&D" (d1) \
        : "a" (pattern),"0" (count/4),"1" ((long) s) \
        : "memory")
        {
            int d0, d1;
            switch (count % 4) {
            case 0: COMMON(""); return s;
            case 1: COMMON("\n\tstosb"); return s;
            case 2: COMMON("\n\tstosw"); return s;
            default: COMMON("\n\tstosw\n\tstosb"); return s;
            }
        }
    #undef COMMON
    }
    Analysis:
    1. count in [0,4]: the first switch (count) covers every such size with one or two plain C stores (char, short or long) and returns s immediately; no assembly is involved.

    2. count > 4: none of those cases match, so control falls through to the block that uses the COMMON(x) macro and switch (count % 4).

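    a) Note the technique used here for a macro that is needed only inside one function:
    1) define the macro with #define inside the function;
    2) wrap the code that uses it in its own pair of braces { } (this nested block is also what lets int d0, d1; be declared after the first switch statement under C89 rules);
    3) remove the macro with #undef right after that block.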

    b):#define COMMON(x) \ 
    __asm__ __volatile__( \ 
    "rep ; stosl" \ 
    x \ 
    : "=&c" (d0), "=&D" (d1) \ 
    : "a" (pattern),"0" (count/4),"1" ((long) s) \ 
    : "memory") 

    Initial register values:
    ax/eax: pattern
    cx/ecx: count/4
    di/edi: s

    Instruction-by-instruction rewrite:
    COMMON("") expands to:
    eax = pattern;
    edi = s;
    ecx = count/4;
    rep ===> while (ecx != 0) {
    stosl ===>     es:[edi] = eax;
                   edi += 4; ecx--;
             }
    return s;

    COMMON("\n\tstosb") expands to:
    eax = pattern;
    edi = s;
    ecx = count/4;
    rep ===> while (ecx != 0) {
    stosl ===>     es:[edi] = eax;
                   edi += 4; ecx--;
             }
    x ===> stosb ===> es:[edi] = al;
    edi += 1;
    return s;

    COMMON("\n\tstosw") expands to:
    eax = pattern;
    edi = s;
    ecx = count/4;
    rep ===> while (ecx != 0) {
    stosl ===>     es:[edi] = eax;
                   edi += 4; ecx--;
             }
    x ===> stosw ===> es:[edi] = ax;
    edi += 2;
    return s;

    COMMON("\n\tstosw\n\tstosb") expands to:
    eax = pattern;
    edi = s;
    ecx = count/4;
    rep ===> while (ecx != 0) {
    stosl ===>     es:[edi] = eax;
                   edi += 4; ecx--;
             }
    x ===> stosw; stosb ===> es:[edi] = ax;
    edi += 2;
    es:[edi] = al;
    edi += 1;

    return s;

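    c) A closer look:

        {
            int d0, d1;
            switch (count % 4) {
            case 0: COMMON(""); return s;
            case 1: COMMON("\n\tstosb"); return s;
            case 2: COMMON("\n\tstosw"); return s;
            default: COMMON("\n\tstosw\n\tstosb"); return s;
            }
        }

    This switch fills the bytes left over after the dword loop: count % 4 == 0 needs nothing more, 1 adds one stosb, 2 adds one stosw, and the default case (3) adds a stosw followed by a stosb.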
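The two-level structure (plain stores for 0 to 4 bytes, otherwise a dword loop plus a tail chosen by count % 4) can be modelled in C. This is a sketch: constant_c_and_count_memset_model() and stosl_rep() are hypothetical names, pattern is assumed to be already replicated into all four bytes, and memcpy() stands in for the 16- and 32-bit stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper modelling "rep ; stosl": n dword stores, returns new edi. */
static unsigned char *stosl_rep(unsigned char *edi, uint32_t pattern, size_t n)
{
    while (n--) {
        memcpy(edi, &pattern, 4);
        edi += 4;
    }
    return edi;
}

static void *constant_c_and_count_memset_model(void *s, uint32_t pattern, size_t count)
{
    unsigned char *p = s;
    uint16_t w = (uint16_t)pattern;

    switch (count) {                          /* small sizes: plain stores */
    case 0: return s;
    case 1: p[0] = (unsigned char)pattern; return s;
    case 2: memcpy(p, &w, 2); return s;
    case 3: memcpy(p, &w, 2); p[2] = (unsigned char)pattern; return s;
    case 4: memcpy(p, &pattern, 4); return s;
    }

    p = stosl_rep(p, pattern, count / 4);     /* the "rep ; stosl" part of COMMON */
    switch (count % 4) {                      /* the tail string x */
    case 0: break;
    case 1: *p = (unsigned char)pattern; break;                           /* stosb */
    case 2: memcpy(p, &w, 2); break;                                      /* stosw */
    default: memcpy(p, &w, 2); p[2] = (unsigned char)pattern; break;      /* stosw; stosb */
    }
    return s;
}
```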

    --------------------------------------------------------------- 
    __constant_c_x_memset()
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __constant_c_x_memset(s, c, count) \ 
    (__builtin_constant_p(count) ? \ 
    __constant_c_and_count_memset((s),(c),(count)) : \ 
    __constant_c_memset((s),(c),(count))) 

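    Purpose: fill count bytes of memory starting at s with the pattern c (memset() has already replicated the fill byte into all four bytes of c). Which helper runs depends on whether count is a compile-time constant.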

    References:
    1. __constant_c_and_count_memset(): see its section above.

    2. __constant_c_memset(): see its section above.

    --------------------------------------------------------------- 
    __memset() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __memset(s, c, count) \ 
    (__builtin_constant_p(count) ? \ 
    __constant_count_memset((s),(c),(count)) : \ 
    __memset_generic((s),(c),(count))) 

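    Purpose: fill count bytes of the memory area at s with the byte c. Both branches currently lead to the same code, because __constant_count_memset() is simply defined as __memset_generic() (see the comment "we might want to write optimized versions of these later").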

    References:
    1. __constant_count_memset(): see its section above.

    2. __memset_generic(): see its section above.

    --------------------------------------------------------------- 
    __HAVE_ARCH_MEMSET memset() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    #define __HAVE_ARCH_MEMSET 
    #define memset(s, c, count) \ 
    (__builtin_constant_p(c) ? \ 
    __constant_c_x_memset((s),(0x01010101UL*(unsigned char)(c)),(count)) : \ 
    __memset((s),(c),(count))) 

    Purpose: same as above.

    References:
    1. __constant_c_x_memset(): see its section above.

    2. __memset(): same as above.

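    What does (0x01010101UL*(unsigned char)(c)) mean? It replicates the byte c into all four bytes of a 32-bit word. (unsigned char)(c) keeps only the low 8 bits, and multiplying by 0x01010101 is the same as adding that byte shifted left by 0, 8, 16 and 24 bits, so 0xAB becomes 0xABABABAB. Since a byte is at most 0xFF, the four copies never overlap and no carries occur. This is what allows __constant_c_memset() and __constant_c_and_count_memset() to store the pattern with stosl, stosw and stosb and still write the same byte value everywhere. When c is not a compile-time constant, memset() uses __memset() instead, whose rep stosb stores single bytes and needs no replication.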
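A short check of the replication trick. replicate_byte() is a hypothetical helper, and uint32_t is used instead of unsigned long so that the result has the same 32-bit width as on i386 regardless of the host.

```c
#include <assert.h>
#include <stdint.h>

/* Spread one byte across a 32-bit word: b * 0x01010101 equals
 * b + (b << 8) + (b << 16) + (b << 24). Because b <= 0xFF the
 * four partial products occupy separate bytes, so nothing carries. */
static uint32_t replicate_byte(int c)
{
    return 0x01010101u * (unsigned char)c;
}
```

Note that the cast to unsigned char is what discards any bits above the low byte, so values such as -1 or 0x1234 still produce a clean repeated byte.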
    --------------------------------------------------------------- 
    __HAVE_ARCH_MEMSCAN memscan() 
    --------------------------------------------------------------- 
    include/asm-i386/string.h 

    /* 
    * find the first occurrence of byte 'c', or 1 past the area if none 
    */ 
    #define __HAVE_ARCH_MEMSCAN 
    static inline void * memscan(void * addr, int c, size_t size)
    {
        if (!size) return addr;
        __asm__("repnz; scasb\n\t"
            "jnz 1f\n\t"
            "dec %%edi\n"
            "1:"
            : "=D" (addr), "=c" (size)
            : "0" (addr), "1" (size), "a" (c)
            : "memory");
        return addr;
    }

    Instruction-by-instruction rewrite:
    edi = addr;
    ecx = size;
    eax = c;
    (Because size == 0 returns early, at least one scasb runs and sets ZF.)
    repnz ====> while (ecx != 0) {
    scasb ====>     ZF = (al == es:[edi]); edi++;
                    ecx--;
                    if (ZF == 1) break;
                }
    jnz 1 ====> if (ZF == 0) goto 1;
    dec %%edi ====> edi--;
    1: ====> 1:

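    The assembly in this function is very simple, so I won't belabor it.
    It scans memory linearly. If it finds the first c, it returns that address (jnz is not taken, and dec steps edi back from one past the match onto it); if not, it returns addr + size, one past the end of the area, as the header comment says. With size == 0 it returns addr.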
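A C model of memscan() shows how it differs from memchr(): a miss yields addr + size rather than NULL. memscan_model() is a hypothetical name; as in the other sketches, edi, ecx and zf stand in for the registers and the flag.

```c
#include <assert.h>
#include <stddef.h>

/* C model of the inline-asm memscan(). */
static void *memscan_model(void *addr, int c, size_t size)
{
    unsigned char *edi = addr;
    size_t ecx = size;
    int zf = 0;

    if (!size)
        return addr;

    while (ecx != 0) {              /* repnz; scasb */
        zf = (*edi == (unsigned char)c);
        edi++;
        ecx--;
        if (zf)
            break;
    }
    if (zf)                         /* jnz 1f not taken: dec %%edi */
        edi--;
    return edi;                     /* match, or one past the area */
}
```

Presumably this convention lets a caller compare the result with the end of the area instead of testing for NULL.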
    #endif /* __KERNEL__ */ 

    #endif /* !_I386_STRING_H_ */ 
    *************************************************************** 
    Finally done working through the string functions written in assembly!

  • Original article: https://www.cnblogs.com/taek/p/2338939.html