The strlen source code is far less simple than you might think

    /***
    *strlen - return the length of a null-terminated string
    *
    *Purpose:
    *       Finds the length in bytes of the given string, not including
    *       the final null character.
    *
    *Entry:
    *       const char * str - string whose length is to be computed
    *
    *Exit:
    *       length of the string "str", exclusive of the final null byte
    *
    *Exceptions:
    *
    *******************************************************************************/

    size_t __cdecl strlen (
            const char * str
            )
    {
            const char *eos = str;

            while( *eos++ ) ;

            return( eos - str - 1 );
    }
　　
    size_t strlen(const char *str)
    {
        const char *char_ptr;
        const unsigned long int *longword_ptr;
        unsigned long int longword, himagic, lomagic;

        /* Handle the first few characters by reading one character at a time.
           Do this until CHAR_PTR is aligned on a longword boundary. */
        for (char_ptr = str;
             ((unsigned long int)char_ptr & (sizeof(longword) - 1)) != 0;
             ++char_ptr)
            if (*char_ptr == '\0')
                return char_ptr - str;

        /* All these elucidatory comments refer to 4-byte longwords, but the
           theory applies equally well to 8-byte longwords. */
        longword_ptr = (unsigned long int *)char_ptr;

        /* Bits 31, 24, 16, and 8 of this number are zero. Call these bits
           the "holes." Note that there is a hole just to the left of each
           byte, with an extra at the end:

           bits:  01111110 11111110 11111110 11111111
           bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD

           The 1-bits make sure that carries propagate to the next 0-bit.
           The 0-bits provide holes for carries to fall into. */
        himagic = 0x80808080L;
        lomagic = 0x01010101L;
        if (sizeof(longword) > 4)
        {
            /* 64-bit version of the magic. */
            /* Do the shift in two steps to avoid a warning if long has 32 bits. */
            himagic = ((himagic << 16) << 16) | himagic;
            lomagic = ((lomagic << 16) << 16) | lomagic;
        }
        if (sizeof(longword) > 8)
            abort();

        /* Instead of the traditional loop which tests each character, we will
           test a longword at a time. The tricky part is testing if *any of the
           four* bytes in the longword in question are zero. */
        for (;;)
        {
            longword = *longword_ptr++;
            if (((longword - lomagic) & ~longword & himagic) != 0)
            {
                /* Which of the bytes was the zero? If none of them were, it
                   was a misfire; continue the search. */
                const char *cp = (const char *)(longword_ptr - 1);

                if (cp[0] == 0)
                    return cp - str;
                if (cp[1] == 0)
                    return cp - str + 1;
                if (cp[2] == 0)
                    return cp - str + 2;
                if (cp[3] == 0)
                    return cp - str + 3;
                if (sizeof(longword) > 4)
                {
                    if (cp[4] == 0)
                        return cp - str + 4;
                    if (cp[5] == 0)
                        return cp - str + 5;
                    if (cp[6] == 0)
                        return cp - str + 6;
                    if (cp[7] == 0)
                        return cp - str + 7;
                }
            }
        }
    }
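The first, short version checks the characters one at a time and stops at the first
zero. The code is concise, and its performance is not bad. It has one weakness: on a
machine with a 4-byte or 8-byte word, reading a single byte per iteration wastes some
of the machine's capability. Reading 4 or 8 bytes at a time greatly reduces the total
number of reads. However, if a 4-byte or 8-byte read does not start on a word boundary,
the machine may need two accesses to complete it, which lowers performance and weakens
the optimization. So the first few characters must be handled individually, and the
4-byte or 8-byte reads start from a word-aligned address.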

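The new approach:

* Handle the first few bytes individually
* Process the middle part 4 or 8 bytes at a time
* Handle the last few bytes individually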
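As a rough illustration of the first step, the number of leading bytes to handle one at a time can be derived from the low bits of the pointer. The helper name `bytes_to_alignment` is made up for this sketch and does not appear in either listing. It assumes the word type is `unsigned long`, as in the longer listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many bytes lie between p and the next address
   that is a multiple of sizeof(unsigned long). Returns 0 when p is
   already aligned. */
size_t bytes_to_alignment(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    size_t word = sizeof(unsigned long);
    size_t mask = word - 1;            /* word size is a power of two */

    return (word - (size_t)(addr & mask)) & mask;
}
```

The longer listing does not compute this count up front. It simply advances `char_ptr` one byte at a time until the same low bits become zero, returning early if it meets the terminator first.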

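This looks good, but one problem remains: when reading 4 or 8 bytes at a time, how do
we tell whether one of them is an all-zero byte, given that 0 marks the end of the
string? Detecting whether a group of consecutive bytes contains an all-zero byte
becomes the key to the optimization. We cannot test the bytes one by one, because the
whole idea is to read several bytes at once and reduce the total number of reads;
testing each byte individually would throw that gain away.

What to do? The first thing to consider is, of course, bit operations.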

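* What is special about an all-zero byte? Obviously every bit is 0, so after a bitwise NOT every bit is 1.
* What else is special about it? Subtracting 1 from it has to borrow from the next higher byte, and after the borrow its highest bit is always 1.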

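Now things are taking shape. Take a 4-byte integer n as an example and subtract 1 from
each of its bytes. If n contains an all-zero byte, a borrow must occur, and the borrow
leaves a 1 in that byte's highest bit. So it seems enough to check whether the highest
bit of any byte is 1. There is still one problem, though: some bytes of n may have had
their highest bit set to begin with, so those bytes must be excluded.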
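A small sketch of the issue just described, using a made-up function name (`top_bits_after_decrement` is not part of either listing). It subtracts 1 from every byte and keeps only the top bit of each byte, which flags a zero byte but also flags any byte that was `0x81` or larger.

```c
#include <stdint.h>

/* Subtract 1 from every byte of n and keep only each byte's highest bit.
   On its own this is not a reliable zero-byte test: it also fires for
   bytes whose highest bit was already set. */
uint32_t top_bits_after_decrement(uint32_t n)
{
    return (n - 0x01010101u) & 0x80808080u;
}
```

For example, `0x41420044` (which contains a zero byte) gives `0x00008000`, but `0x41C24344` (which contains none) gives `0x00800000`, a false positive caused by the byte `0xC2`.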

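The solution:

* Subtract 1 from every byte of n and keep only the highest bit of each byte; call the result x. If a byte had to borrow, its highest bit in x is 1.
* Bitwise-NOT n and keep only the highest bit of each byte; call the result y. A byte whose highest bit is 1 in y had its highest bit 0 in n.
* AND x with y. If the result is nonzero, at least one byte of n had its highest bit clear originally and set after the subtraction, which means a borrow took place.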

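If n contains an all-zero byte, x&y is guaranteed to be nonzero, because the highest bit of the byte that borrowed is set to 1.
If n contains no all-zero byte, no borrow occurs and x&y equals 0.
x&y == (n-0x01010101) & ~n & 0x80808080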
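The formula can be tried out directly. The following is a minimal sketch, with function names chosen for illustration (`has_zero_byte32` and `has_zero_byte64` are not taken from the listings); the 64-bit variant assumes the same idea with the constants widened, as the longer listing does for 8-byte longwords.

```c
#include <stdint.h>

/* Nonzero if and only if at least one byte of n is 0x00. */
uint32_t has_zero_byte32(uint32_t n)
{
    uint32_t x = (n - 0x01010101u) & 0x80808080u; /* highest bit set after the decrement */
    uint32_t y = ~n & 0x80808080u;                /* highest bit was clear in n */

    return x & y;
}

/* Same test on an 8-byte word. */
uint64_t has_zero_byte64(uint64_t n)
{
    return (n - 0x0101010101010101ull) & ~n & 0x8080808080808080ull;
}
```

The result only says that a zero byte exists somewhere in the word. A borrow can spill into the bytes above the lowest zero byte (`0x41420100` yields `0x00008080`, although only the lowest byte is zero), so the position still has to be found by checking the bytes in order, which is what the `cp[0]`, `cp[1]`, ... comparisons in the longer listing do.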
Reference: http://www.cppblog.com/ant/archive/2007/10/12/32886.aspx


Original article: https://www.cnblogs.com/rollenholt/p/2443719.html