[C++] An In-Depth Investigation into Why cout and wcout Fail to Output Chinese Characters (2): Analysis of the VC2005 CRT Source

Author: zyl910

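The previous article tested how various compilers behave, but why do they behave that way? Answering that takes careful analysis. The VC2005 results are fairly typical, and VC2005 is convenient to debug and trace, so this article analyzes the VC2005 CRT source code.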

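1. Prerequisites

The development tool is VC2005, the platform is 32-bit x86, the build configuration is Debug, and the project uses the MBCS character set.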

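2. Outputting a narrow string with cout

2.1 Locale initialized

"Locale initialized" means that the locale was initialized before any output, i.e. the following statements were executed:
// init.
locale::global(locale(""));
wcout.imbue(locale(""));
Now for the analysis.
"cout << psa" outputs a narrow string through cout. Single-stepping with F11 enters the following functions in order:
operator<<: [C++ library] Stream insertion operator.
basic_streambuf<char>::sputn: [C++ library] Outputs a string (public method).
basic_streambuf<char>::xsputn: [C++ library] Outputs a string (internal implementation). Loops over the source string and calls overflow for each char. [Note #1] A GBK-encoded Chinese character occupies 2 bytes, so overflow is called twice for it.
basic_filebuf<char>::overflow: [C++ library] Handles buffer overflow, i.e. writes one character to the file. [Note #2] Because this is the char version, no encoding conversion is needed and _Fputc is called directly.
_Fputc<char>: [C++ library] Writes one char to the file.
fputc: [C library] Writes one char to the file.
_flsbuf: [C library] Flushes the buffer and outputs a char.
_write: [C library] Writes data to a file.
_write_nolock: [C library] Writes data to a file (non-locking version). [Note #3] A flaw in its condition checks prevents the lead byte of a Chinese character from being output; it returns -1.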

The call stack at this point (the debugger UI is localized: 行 means "line" and 字节 means "bytes"):
> msvcr80d.dll!_write_nolock(int fh=0x00000001, const void * buf=0x0012fb50, unsigned int cnt=0x00000001) 行170 C
  msvcr80d.dll!_write(int fh=0x00000001, const void * buf=0x0012fb50, unsigned int cnt=0x00000001) 行74 + 0x11 字节 C
  msvcr80d.dll!_flsbuf(int ch=0xffffffba, _iobuf * str=0x10311d20) 行189 + 0x11 字节 C
  msvcr80d.dll!fputc(int ch=0xffffffba, _iobuf * str=0x10311d20) 行52 + 0x4b 字节 C
  msvcp80d.dll!std::_Fputc<char>(char _Byte=0xba, _iobuf * _File=0x10311d20) 行81 + 0xf 字节 C++
  msvcp80d.dll!std::basic_filebuf<char,std::char_traits<char> >::overflow(int _Meta=0x000000ba) 行261 + 0x1c 字节 C++
  msvcp80d.dll!std::basic_streambuf<char,std::char_traits<char> >::xsputn(const char * _Ptr=0x0041774d, int _Count=0x00000007) 行379 + 0x1a 字节 C++
  msvcp80d.dll!std::basic_streambuf<char,std::char_traits<char> >::sputn(const char * _Ptr=0x0041774c, int _Count=0x00000008) 行170 C++
  wchar_crtbug_2005.exe!std::operator<<<std::char_traits<char> >(std::basic_ostream<char,std::char_traits<char> > & _Ostr={...}, const char * _Val=0x0041774c) 行768 + 0x3e 字节 C++
  wchar_crtbug_2005.exe!main(int argc=0x00000001, char * * argv=0x003b6a58) 行45 + 0x12 字节 C++

The _write_nolock function turns out to contain a bug. Code excerpt:
// C:\VS2005\VC\crt\src\write.c, line 160:
        /* don't need double conversion if it's ANSI mode C locale */
        if (toConsole && !(isCLocale && (tmode == __IOINFO_TM_ANSI))) {
            UINT consoleCP = GetConsoleCP();
            char mboutbuf[MB_LEN_MAX];
            wchar_t tmpchar;
            int size = 0;
            int written = 0;
            char *pch;

            for (pch = (char *)buf; (unsigned)(pch - (char *)buf) < cnt; ) {
                BOOL bCR;

                if (tmode == __IOINFO_TM_ANSI) {
                    bCR = *pch == LF;
                    /*
                     * Here we need to do double convert. i.e. convert from
                     * multibyte to unicode and then from unicode to multibyte in
                     * Console codepage.
                     */
                    if (!isleadbyte(*pch)) {
                        if (mbtowc(&tmpchar, pch, 1) == -1) {
                            break;
                        }
                    } else if ((cnt - (pch - (char*)buf)) > 1) {
                        if (mbtowc(&tmpchar, pch, 2) == -1) {
                            break;
                        }
                        /*
                         * Increment pch to accomodate DBCS character.
                         */
                        ++pch;
                    } else {
                        break;
                    }
                    ++pch;
                } else if (tmode == __IOINFO_TM_UTF8 || tmode == __IOINFO_TM_UTF16LE) {
                    /*
                     * Note that bCR set above is not valid in case of UNICODE
                     * stream. We need to set it using unicode character.
                     */
                    tmpchar = *(wchar_t *)pch;
                    bCR = tmpchar == LF;
                    pch += 2;
                }

                if (tmode == __IOINFO_TM_ANSI) {
                    if( (size = WideCharToMultiByte(consoleCP, 0, &tmpchar, 1, mboutbuf, sizeof(mboutbuf), NULL, NULL)) == 0) {
                        break;
                    } else {
                        if ( WriteFile( (HANDLE)_osfhnd(fh), mboutbuf, size, (LPDWORD)&written, NULL) ) {
                            charcount += written;
                            if (written < size)
                                break;
                        } else {
                            dosretval = GetLastError();
                            break;
                        }
                    }

                    if (bCR) {
                        size = 1;
                        mboutbuf[0] = CR;
                        if (WriteFile((HANDLE)_osfhnd(fh), mboutbuf, size, (LPDWORD)&written, NULL) ) {
                            if (written < size)
                                break;
                            lfcount ++;
                            charcount++;
                        } else {
                            dosretval = GetLastError();
                            break;
                        }
                    }
                } else if ( tmode == __IOINFO_TM_UTF8 || tmode == __IOINFO_TM_UTF16LE)
...
// C:\VS2005\VC\crt\src\write.c, line 443:
        if (charcount == 0) {
            /* If nothing was written, first check if an o.s. error,
               otherwise we return -1 and set errno to ENOSPC,
               unless a device and first char was CTRL-Z */
            if (dosretval != 0) {
                /* o.s. error happened, map error */
                if (dosretval == ERROR_ACCESS_DENIED) {
                    /* wrong read/write mode should return EBADF, not EACCES */
                    errno = EBADF;
                    _doserrno = dosretval;
                }
                else
                    _dosmaperr(dosretval);
                return -1;
            }
...
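The main flow of _write_nolock is:
for each char in the source string
{
    Call mbtowc to convert the current char to a wide character. isleadbyte decides whether the current char is the lead byte of a multibyte character, and a second check decides whether 2 bytes are available for the conversion.
    Call WideCharToMultiByte to convert the wide character to a narrow string.
    Call WriteFile to write the narrow string to the file.
}

The problem lies in the step "call mbtowc to convert the current char to a wide character":
Earlier, basic_streambuf<char>::xsputn already split the source string into individual chars. A GBK-encoded Chinese character occupies 2 bytes, so its lead byte is passed to _write_nolock first, on its own.
Because it is a lead byte, "if (!isleadbyte(*pch))" evaluates to false. Because only one byte is available, "else if ((cnt - (pch - (char*)buf)) > 1)" is also false. Execution ends up in the else branch, where break exits the loop.
After the loop exits, nothing has been written, so the "if (charcount == 0)" branch is entered. The dosretval variable has not been initialized on this path, so it is very likely to be nonzero, and the "if (dosretval != 0)" branch is entered. Finally "return -1" is executed.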
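The failing branch can be reproduced in isolation. What follows is a minimal portable sketch, not the CRT code itself: consumed_bytes is a hypothetical model of the ANSI-mode loop quoted above that only tracks how many bytes the loop would accept, and is_gbk_lead_byte assumes that GBK lead bytes fall in 0x81..0xFE as a stand-in for the locale-dependent isleadbyte.

```cpp
#include <cstddef>

// Assumption: GBK (code page 936) lead bytes lie in 0x81..0xFE.
// This stands in for the CRT's locale-dependent isleadbyte().
bool is_gbk_lead_byte(unsigned char b) {
    return b >= 0x81 && b <= 0xFE;
}

// Hypothetical model of the ANSI-mode loop in _write_nolock.
// Returns how many bytes of buf[0..cnt) the loop would accept
// before it finishes or hits the bare "break".
std::size_t consumed_bytes(const unsigned char* buf, std::size_t cnt) {
    std::size_t pos = 0;
    while (pos < cnt) {
        if (!is_gbk_lead_byte(buf[pos])) {
            pos += 1;   // single-byte character
        } else if (cnt - pos > 1) {
            pos += 2;   // lead byte plus trail byte
        } else {
            break;      // lone lead byte: nothing can be converted
        }
    }
    return pos;
}
```

With the two bytes 0xBA 0xBA (the GBK encoding of 汉, matching _Byte=0xba in the call stack), consumed_bytes(buf, 2) accepts both bytes, while consumed_bytes(buf, 1) accepts none. The second case corresponds to the cnt=0x00000001 call seen in the call stack.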

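As the functions return:
_write_nolock: [Note #3] A flaw in its condition checks prevents the lead byte of a Chinese character from being output. Returns -1.
_write: Returns the return value of _write_nolock, i.e. -1.
_flsbuf: Because the return value of _write (-1) differs from the character count (sizeof(TCHAR)), returns EOF (-1).
fputc: Returns the return value of _flsbuf, i.e. EOF (-1).
_Fputc<char>: fputc returned EOF, so the test "(fputc(_Byte, _File) != EOF)" is false and the function returns false.
basic_filebuf<char>::overflow: Because _Fputc returned false, returns _Traits::eof(), i.e. EOF (-1).
basic_streambuf<char>::xsputn: [Note #4] Because overflow returned EOF (-1), exits the loop and returns the number of characters actually output.
basic_streambuf<char>::sputn: Returns the return value of xsputn, i.e. the number of characters actually output.
operator<<: [Note #5] Because the number of characters actually output differs from the length of the source string, sets the stream state to bad.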
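The effect of a failing overflow on the stream state can be observed without the CRT. The sketch below uses a hypothetical unbuffered std::streambuf, RejectHighBytes, whose overflow rejects every byte >= 0x80. That rule is an assumption made for illustration; it stands in for the failing low-level write and is not how basic_filebuf is implemented.

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical unbuffered sink: accepts ASCII bytes and rejects anything >= 0x80,
// mimicking a low-level write that fails on the first byte of a GBK character.
class RejectHighBytes : public std::streambuf {
public:
    const std::string& accepted() const { return accepted_; }

protected:
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);   // nothing to write
        const char c = traits_type::to_char_type(ch);
        if (static_cast<unsigned char>(c) >= 0x80)
            return traits_type::eof();         // report failure to the caller
        accepted_.push_back(c);
        return ch;
    }

private:
    std::string accepted_;
};

// Writes text through the rejecting sink, stores what got through in `accepted`,
// and returns whether the stream ended up with badbit set.
bool write_sets_badbit(const char* text, std::string& accepted) {
    RejectHighBytes sink;
    std::ostream os(&sink);
    os << text;
    accepted = sink.accepted();
    return os.bad();
}
```

For the input "ab\xba\xba" the ASCII bytes get through, the first high byte is rejected, and the stream is left in the bad state. Once that has happened, later insertions are ignored until the caller invokes clear() on the stream.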

This is why cout cannot output a Chinese narrow string when the locale has been initialized.

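2.2 Locale not initialized

"Locale not initialized" means that the locale was not initialized before output, i.e. the relevant statements were commented out:
// init.
//locale::global(locale(""));
//wcout.imbue(locale(""));
"cout << psa" still reaches _write_nolock. The call stack at this point: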
> msvcr80d.dll!_write_nolock(int fh=0x00000001, const void * buf=0x0012fb98, unsigned int cnt=0x00000001) 行268 + 0x5 字节 C
  msvcr80d.dll!_write(int fh=0x00000001, const void * buf=0x0012fb98, unsigned int cnt=0x00000001) 行74 + 0x11 字节 C
  msvcr80d.dll!_flsbuf(int ch=0xffffffba, _iobuf * str=0x10311d20) 行189 + 0x11 字节 C
  msvcr80d.dll!fputc(int ch=0xffffffba, _iobuf * str=0x10311d20) 行52 + 0x4b 字节 C
  msvcp80d.dll!std::_Fputc<char>(char _Byte=0xba, _iobuf * _File=0x10311d20) 行81 + 0xf 字节 C++
  msvcp80d.dll!std::basic_filebuf<char,std::char_traits<char> >::overflow(int _Meta=0x000000ba) 行261 + 0x1c 字节 C++
  msvcp80d.dll!std::basic_streambuf<char,std::char_traits<char> >::xsputn(const char * _Ptr=0x0041774d, int _Count=0x00000007) 行379 + 0x1a 字节 C++
  msvcp80d.dll!std::basic_streambuf<char,std::char_traits<char> >::sputn(const char * _Ptr=0x0041774c, int _Count=0x00000008) 行170 C++
  wchar_crtbug_2005.exe!std::operator<<<std::char_traits<char> >(std::basic_ostream<char,std::char_traits<char> > & _Ostr={...}, const char * _Val=0x0041774c) 行768 + 0x3e 字节 C++
  wchar_crtbug_2005.exe!main(int argc=0x00000001, char * * argv=0x003b6a00) 行45 + 0x12 字节 C++

In _write_nolock, different statements execute because the default C locale is now in use (locale not initialized). Code excerpt:
// C:\VS2005\VC\crt\src\write.c, line 160:
        /* don't need double conversion if it's ANSI mode C locale */
        if (toConsole && !(isCLocale && (tmode == __IOINFO_TM_ANSI))) {
...
// C:\VS2005\VC\crt\src\write.c, line 268:
        } else if ( _osfile(fh) & FTEXT ) {
            /* text mode, translate LF's to CR/LF's on output */
            dosretval = 0;          /* no OS error yet */

            if(tmode == __IOINFO_TM_ANSI) {
                char ch;                    /* current character */
                char *p = NULL, *q = NULL;  /* pointers into buf and lfbuf resp. */
                char lfbuf[BUF_SIZE];
                p = (char *)buf;        /* start at beginning of buffer */
                while ( (unsigned)(p - (char *)buf) < cnt ) {
                    q = lfbuf;          /* start at beginning of lfbuf */

                    /* fill the lf buf, except maybe last char */
                    while ( q - lfbuf < sizeof(lfbuf) - 1 &&
                            (unsigned)(p - (char *)buf) < cnt ) {
                        ch = *p++;
                        if ( ch == LF ) {
                            ++lfcount;
                            *q++ = CR;
                        }
                        *q++ = ch;
                    }

                    /* write the lf buf and update total */
                    if ( WriteFile( (HANDLE)_osfhnd(fh), lfbuf, (int)(q - lfbuf), (LPDWORD)&written, NULL) ) {
                        charcount += written;
                        if (written < q - lfbuf)
                            break;
                    } else {
                        dosretval = GetLastError();
                        break;
                    }
                }
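Because isCLocale is now true, the double-conversion branch is skipped and execution goes to the "else if ( _osfile(fh) & FTEXT )" branch. After some simple newline handling, WriteFile is called to write the data, and the operation succeeds.

This is why cout can output a Chinese narrow string correctly when the locale has not been initialized.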
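The text-mode branch never inspects lead bytes, which is why a single GBK byte passes through untouched. A minimal sketch of that behaviour, assuming only what the excerpt shows (each LF is preceded by a CR, everything else is copied verbatim); translate_lf_to_crlf is a hypothetical name and the chunking through lfbuf is left out:

```cpp
#include <string>

// Hypothetical model of the text-mode branch: each LF becomes CR LF,
// all other bytes (including GBK lead and trail bytes) are copied unchanged.
std::string translate_lf_to_crlf(const std::string& input) {
    std::string out;
    out.reserve(input.size());
    for (char ch : input) {
        if (ch == '\n')
            out.push_back('\r');   // insert CR before every LF
        out.push_back(ch);
    }
    return out;
}
```

Feeding it one byte at a time, as xsputn does, gives the same result as feeding it the whole string, because no byte depends on its neighbours.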

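2.3 Other tests

I changed the project configuration to the Unicode character set and debugged again; the program behaved exactly the same. This is because _write_nolock is already-compiled code inside msvcr80d.dll, and this project's compiler settings do not affect how msvcr80d.dll executes.
I then changed the project configuration to static linking and debugged again; the program again behaved exactly the same, for the same reason: the statically linked CRT is precompiled as well.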

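3. Outputting a wide string with wcout

3.1 Locale initialized

"wcout << psw" outputs a wide string through wcout. Single-stepping with F11 enters the following functions in order:
operator<<: [C++ library] Stream insertion operator.
basic_streambuf<wchar_t>::sputn: [C++ library] Outputs a string (public method).
basic_streambuf<wchar_t>::xsputn: [C++ library] Outputs a string (internal implementation). Loops over the source string and calls overflow for each wchar_t. [Note #1] A Chinese character is normally a single wchar_t, so overflow is called once for it.
basic_filebuf<wchar_t>::overflow: [C++ library] Handles buffer overflow, i.e. writes one character to the file. [Note #2] Because this is the wchar_t version, an encoding conversion is required.
codecvt<wchar_t,char,int>::out: [C++ library] Converts a wchar_t string to a char string (public method).
codecvt<wchar_t,char,int>::do_out: [C++ library] Converts a wchar_t string to a char string (internal implementation).
_Wcrtomb: [C library] Calls WideCharToMultiByte to convert a wchar_t character to a multibyte string.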

The call stack at this point:
> msvcp80d.dll!_Wcrtomb(char * s=0x0018fc18, wchar_t wchar=L'汉', int * pst=0x6ad750ec, const _Cvtvec * ploc=0x00264cf0) 行111 C
  msvcp80d.dll!std::codecvt<wchar_t,char,int>::do_out(int & _State=0, const wchar_t * _First1=0x0018fc38, const wchar_t * _Last1=0x0018fc3a, const wchar_t * & _Mid1=0x0018fc38, char * _First2=0x0018fc18, char * _Last2=0x0018fc20, char * & _Mid2=0x0018fc18) 行1000 + 0x1f 字节 C++
  msvcp80d.dll!std::codecvt<wchar_t,char,int>::out(int & _State=0, const wchar_t * _First1=0x0018fc38, const wchar_t * _Last1=0x0018fc3a, const wchar_t * & _Mid1=0x0018fc38, char * _First2=0x0018fc18, char * _Last2=0x0018fc20, char * & _Mid2=0x0018fc18) 行897 C++
  msvcp80d.dll!std::basic_filebuf<wchar_t,std::char_traits<wchar_t> >::overflow(unsigned short _Meta=27721) 行273 + 0x90 字节 C++
  msvcp80d.dll!std::basic_streambuf<wchar_t,std::char_traits<wchar_t> >::xsputn(const wchar_t * _Ptr=0x004187e2, int _Count=5) 行379 + 0x1a 字节 C++
  msvcp80d.dll!std::basic_streambuf<wchar_t,std::char_traits<wchar_t> >::sputn(const wchar_t * _Ptr=0x004187e0, int _Count=6) 行170 C++
  tcharall_cpp_2005.exe!std::operator<<<wchar_t,std::char_traits<wchar_t> >(std::basic_ostream<wchar_t,std::char_traits<wchar_t> > & _Ostr={...}, const wchar_t * _Val=0x004187e0) 行853 + 0x3e 字节 C++
  wchar_crtbug_2005.exe!main(int argc=0x00000001, char * * argv=0x003b6a00) 行46 + 0x12 字节 C++

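Inside _Wcrtomb, the Windows API WideCharToMultiByte is called to perform the encoding conversion.
After the conversion succeeds, control returns to overflow, which calls fwrite to output the converted char string. The following functions are entered in order:
fwrite: [C library] Writes data to a file.
_fwrite_nolock: [C library] Writes data to a file (non-locking version). [Note #3] Loops over the data and calls _flsbuf for each char.
_flsbuf(int ch, _iobuf* str): [C library] Flushes the buffer and outputs a char.
_write(int fh, const void* buf, unsigned int cnt): [C library] Writes data to a file.
_write_nolock(int fh, const void* buf, unsigned int cnt): [C library] Writes data to a file (non-locking version).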

The call stack at this point:
> msvcr80d.dll!_write_nolock(int fh=0x00000001, const void * buf=0x0018fae0, unsigned int cnt=0x00000001) 行470 C
  msvcr80d.dll!_write(int fh=0x00000001, const void * buf=0x0018fae0, unsigned int cnt=0x00000001) 行74 + 0x11 字节 C
  msvcr80d.dll!_flsbuf(int ch=0xffffffba, _iobuf * str=0x67cc1d20) 行189 + 0x11 字节 C
  msvcr80d.dll!_fwrite_nolock(const void * buffer=0x0018fc18, unsigned int size=0x00000001, unsigned int num=0x00000002, _iobuf * stream=0x67cc1d20) 行194 + 0xd 字节 C
  msvcr80d.dll!fwrite(const void * buffer=0x0018fc18, unsigned int size=0x00000001, unsigned int count=0x00000002, _iobuf * stream=0x67cc1d20) 行83 + 0x15 字节 C
  msvcp80d.dll!std::basic_filebuf<wchar_t,std::char_traits<wchar_t> >::overflow(unsigned short _Meta=0x6c49) 行280 + 0x59 字节 C++
  msvcp80d.dll!std::basic_streambuf<wchar_t,std::char_traits<wchar_t> >::xsputn(const wchar_t * _Ptr=0x004187e2, int _Count=0x00000005) 行379 + 0x1a 字节 C++
  msvcp80d.dll!std::basic_streambuf<wchar_t,std::char_traits<wchar_t> >::sputn(const wchar_t * _Ptr=0x004187e0, int _Count=0x00000006) 行170 C++
  tcharall_cpp_2005.exe!std::operator<<<wchar_t,std::char_traits<wchar_t> >(std::basic_ostream<wchar_t,std::char_traits<wchar_t> > & _Ostr={...}, const wchar_t * _Val=0x004187e0) 行853 + 0x3e 字节 C++
  wchar_crtbug_2005.exe!main(int argc=0x00000001, char * * argv=0x003b6a00) 行46 + 0x12 字节 C++

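In _write_nolock, the same problem occurs again:
Earlier, _fwrite_nolock already split the data into individual chars. A GBK-encoded Chinese character occupies 2 bytes, so its lead byte is passed to _write_nolock first, on its own.
Because it is a lead byte, "if (!isleadbyte(*pch))" evaluates to false. Because only one byte is available, "else if ((cnt - (pch - (char*)buf)) > 1)" is also false. Execution ends up in the else branch, where break exits the loop.
After the loop exits, nothing has been written, so the "if (charcount == 0)" branch is entered. The dosretval variable has not been initialized on this path, so it is very likely to be nonzero, and the "if (dosretval != 0)" branch is entered. Finally "return -1" is executed.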

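As the functions return:
_write_nolock: [Note #4] A flaw in its condition checks prevents the lead byte of a Chinese character from being output. Returns -1.
_write: Returns the return value of _write_nolock, i.e. -1.
_flsbuf: Because the return value of _write (-1) differs from the character count (sizeof(TCHAR)), returns EOF (-1).
_fwrite_nolock: Because _flsbuf returned EOF (-1), exits the loop and returns the number of characters actually output (0).
fwrite: Returns the return value of _fwrite_nolock, i.e. 0.
basic_filebuf<wchar_t>::overflow: Because the return value of fwrite (0) differs from the number of characters produced by the encoding conversion, returns _Traits::eof(), i.e. WEOF (-1).
basic_streambuf<wchar_t>::xsputn: [Note #5] Because overflow returned WEOF (-1), exits the loop and returns the number of characters actually output.
basic_streambuf<wchar_t>::sputn: Returns the return value of xsputn, i.e. the number of characters actually output.
operator<<: [Note #6] Because the number of characters actually output differs from the length of the source string, sets the stream state to bad.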

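This is why wcout cannot output a Chinese wide string when the locale has been initialized. Although basic_filebuf<wchar_t>::overflow correctly converts the wide character to a narrow string, the bug in _write_nolock prevents that string from being output.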
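The interaction between a byte-at-a-time writer and the lead-byte check can be modelled in a few lines. This is a hedged sketch under stated assumptions: low_level_write, bytewise_fwrite and overflow_succeeds are hypothetical names, GBK lead bytes are assumed to be 0x81..0xFE, and the model only counts accepted bytes rather than performing any I/O.

```cpp
#include <cstddef>

// Assumption: GBK lead bytes lie in 0x81..0xFE (stand-in for isleadbyte()).
bool is_lead(unsigned char b) { return b >= 0x81 && b <= 0xFE; }

// Hypothetical model of the low-level write: returns how many bytes it accepts.
// A lead byte with no trail byte left in the buffer stops the loop.
std::size_t low_level_write(const unsigned char* buf, std::size_t cnt) {
    std::size_t pos = 0;
    while (pos < cnt) {
        if (!is_lead(buf[pos])) pos += 1;
        else if (cnt - pos > 1) pos += 2;
        else break;
    }
    return pos;
}

// Hypothetical model of an unbuffered fwrite that hands the data down
// one byte per call and stops at the first call that fails.
std::size_t bytewise_fwrite(const unsigned char* data, std::size_t count) {
    std::size_t written = 0;
    while (written < count && low_level_write(data + written, 1) == 1)
        ++written;
    return written;
}

// Model of the check made after conversion: success only if every
// converted byte was reported as written.
bool overflow_succeeds(const unsigned char* converted, std::size_t n) {
    return bytewise_fwrite(converted, n) == n;
}
```

For the converted bytes 0xBA 0xBA, low_level_write accepts both when they arrive in one call, but bytewise_fwrite reports 0 because the first single-byte call is rejected, so overflow_succeeds is false. The conversion itself is not at fault; the splitting is.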

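3.2 Locale not initialized

When the locale has not been initialized, "wcout << psw" takes a different execution path and enters the following functions in order:
operator<<: [C++ library] Stream insertion operator.
basic_streambuf<wchar_t>::sputn: [C++ library] Outputs a string (public method).
basic_streambuf<wchar_t>::xsputn: [C++ library] Outputs a string (internal implementation). Loops over the source string and calls overflow for each wchar_t. [Note #1] A Chinese character is normally a single wchar_t, so overflow is called once for it.
basic_filebuf<wchar_t>::overflow: [C++ library] Handles buffer overflow, i.e. writes one character to the file. [Note #2] Because the locale is not initialized, no encoding conversion is done here and _Fputc<wchar_t> is called directly.
_Fputc<wchar_t>: [C++ library] Outputs a wchar_t.
fputwc: [C library] Outputs a wchar_t (public method).
_fputwc_nolock: [C library] Outputs a wchar_t (internal implementation). [Note #3] Because this is the wchar_t version, an encoding conversion is required.
wctomb_s: [C library] (Buffer-safe version) Converts a wide character to a multibyte character (public method).
_wctomb_s_l: [C library] (Buffer-safe version) Converts a wide character to a multibyte character (internal implementation).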

The call stack at this point:
> msvcr80d.dll!_wctomb_s_l(int * pRetValue=0x0012fbac, char * dst=0x0012fba0, unsigned int sizeInBytes=0x00000005, wchar_t wchar=L'汉', localeinfo_struct * plocinfo=0x00000000) 行81 C++
  msvcr80d.dll!wctomb_s(int * pRetValue=0x0012fbac, char * dst=0x0012fba0, unsigned int sizeInBytes=0x00000005, wchar_t wchar=L'汉') 行145 + 0x18 字节 C++
  msvcr80d.dll!_fputwc_nolock(wchar_t ch=L'汉', _iobuf * str=0x10311d20) 行133 + 0x14 字节 C
  msvcr80d.dll!fputwc(wchar_t ch=L'汉', _iobuf * str=0x10311d20) 行60 + 0xe 字节 C
  msvcp80d.dll!std::_Fputc<wchar_t>(wchar_t _Wchar=L'汉', _iobuf * _File=0x10311d20) 行86 + 0xf 字节 C++
  msvcp80d.dll!std::basic_filebuf<wchar_t,std::char_traits<wchar_t> >::overflow(unsigned short _Meta=0x6c49) 行261 + 0x1c 字节 C++
  msvcp80d.dll!std::basic_streambuf<wchar_t,std::char_traits<wchar_t> >::xsputn(const wchar_t * _Ptr=0x0041773e, int _Count=0x00000005) 行379 + 0x1a 字节 C++
  msvcp80d.dll!std::basic_streambuf<wchar_t,std::char_traits<wchar_t> >::sputn(const wchar_t * _Ptr=0x0041773c, int _Count=0x00000006) 行170 C++
  wchar_crtbug_2005.exe!std::operator<<<wchar_t,std::char_traits<wchar_t> >(std::basic_ostream<wchar_t,std::char_traits<wchar_t> > & _Ostr={...}, const wchar_t * _Val=0x0041773c) 行853 + 0x3e 字节 C++
  wchar_crtbug_2005.exe!main(int argc=0x00000001, char * * argv=0x003b6a00) 行46 + 0x12 字节 C++

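In _wctomb_s_l, because the default C locale is in use (locale not initialized), any character whose code value is greater than 255 is rejected with an error. Code excerpt: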
// C:\VS8_2005\VC\crt\src\wctomb.c, line 79:
    if ( _loc_update.GetLocaleT()->locinfo->lc_handle[LC_CTYPE] == _CLOCALEHANDLE )
    {
        if ( wchar > 255 )  /* validate high byte */
        {
            if (dst != NULL && sizeInBytes > 0)
            {
                memset(dst, 0, sizeInBytes);
            }
            errno = EILSEQ;
            return errno;
        }
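As the functions return:
_wctomb_s_l: [Note #4] Because the C locale is in effect and the Unicode code point of the Chinese character is greater than 255, returns EILSEQ.
wctomb_s: Same as _wctomb_s_l; returns EILSEQ.
_fputwc_nolock: Because wctomb_s returned a nonzero value, returns WEOF (-1).
fputwc: Returns WEOF (-1).
_Fputc<wchar_t>: The statement is "return (::fputwc(_Wchar, _File) != WEOF);", so it returns false.
basic_filebuf<wchar_t>::overflow: Because _Fputc returned false, returns WEOF (-1).
basic_streambuf<wchar_t>::xsputn: [Note #5] Because overflow returned WEOF (-1), exits the loop and returns the number of characters actually output.
basic_streambuf<wchar_t>::sputn: Returns the return value of xsputn, i.e. the number of characters actually output.
operator<<: [Note #6] Because the number of characters actually output differs from the length of the source string, sets the stream state to bad.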

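This is why wcout cannot output a Chinese wide string when the locale has not been initialized: the default C locale does not support characters whose code value is greater than 255.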
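The C-locale check quoted above is easy to model on its own. In the sketch below, c_locale_wctomb is a hypothetical name; the failure branch follows the excerpt, while the success branch (store the value as a single byte and report length 1) and the -1 length on failure are assumptions added only so that the function is complete.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>

// Hypothetical model of the "C" locale branch of _wctomb_s_l.
// Returns 0 on success and EILSEQ when the wide character is above 255.
int c_locale_wctomb(int* ret_len, char* dst, std::size_t size_in_bytes, wchar_t wc) {
    if (static_cast<unsigned long>(wc) > 255) {        // validate high byte
        if (dst != nullptr && size_in_bytes > 0)
            std::memset(dst, 0, size_in_bytes);        // clear the output buffer
        if (ret_len != nullptr)
            *ret_len = -1;                             // assumption: length is -1 on failure
        errno = EILSEQ;
        return EILSEQ;
    }
    // Assumed success path: a value <= 255 is stored as one byte.
    if (dst != nullptr && size_in_bytes > 0)
        dst[0] = static_cast<char>(wc);
    if (ret_len != nullptr)
        *ret_len = 1;
    return 0;
}
```

Passing L'\x6C49' (the value 0x6c49 shown as _Meta in the call stack, i.e. 汉) yields EILSEQ and a zeroed buffer, whereas L'W' succeeds with length 1.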

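4. Outputting a narrow string with printf

4.1 Locale initialized

"printf("\t%s\n", psa)" outputs a narrow string through printf. Single-stepping with F11 enters the following functions in order:
printf: [C library] Formatted output.
_output_l: [C library] Formatted output according to locale information. Parses the format string, extracts the narrow string for "%s", then calls write_string to output it.
write_string: [C library] Writes a narrow string. Loops over the source string and calls write_char for each character.
write_char: [C library] Writes a narrow character.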

The call stack at this point:
> msvcr80d.dll!write_char(char ch=0xd7, _iobuf * f=0x10311d20, int * pnumwritten=0x0012fba8) 行2442 C++
  msvcr80d.dll!write_string(char * string=0x0041774f, int len=0x00000004, _iobuf * f=0x10311d20, int * pnumwritten=0x0012fba8) 行2570 + 0x19 字节 C++
  msvcr80d.dll!_output_l(_iobuf * stream=0x10311d20, const char * format=0x00417823, localeinfo_struct * plocinfo=0x00000000, char * argptr=0x0012fe54) 行2260 + 0x18 字节 C++
  msvcr80d.dll!printf(const char * format=0x00417820, ...) 行63 + 0x18 字节 C
  wchar_crtbug_2005.exe!main(int argc=0x00000001, char * * argv=0x003b6a00) 行50 + 0x13 字节 C++

The source of write_char is as follows:
// C:\VS8_2005\VC\crt\src\output.c, line 2428:
LOCAL(void) write_char (
    _TCHAR ch,
    FILE *f,
    int *pnumwritten
    )
{
    if ( (f->_flag & _IOSTRG) && f->_base == NULL)
    {
        ++(*pnumwritten);
        return;
    }
#ifdef _UNICODE
    if (_putwc_nolock(ch, f) == WEOF)
#else  /* _UNICODE */
    if (_putc_nolock(ch, f) == EOF)
#endif  /* _UNICODE */
        *pnumwritten = -1;
    else
        ++(*pnumwritten);
}
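As shown, since _UNICODE is not defined here (the MBCS character set is in use), the function calls _putc_nolock to output the character.
In VS2005, F11 cannot step into _putc_nolock, and its source cannot be found under "C:\VS8_2005\VC\crt\src" either.
Although the source of _putc_nolock is not visible, the test results show that it handles narrow strings correctly.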
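The counting convention of write_char (increment on success, set to -1 on failure) can be sketched independently of the CRT. In the model below, PutFn is a hypothetical stand-in for _putc_nolock, write_char_model mirrors the non-_UNICODE branch quoted above, and write_string_model is an assumed caller that stops at the first failure; the real write_string is not quoted in this article and may differ.

```cpp
#include <cstdio>      // EOF
#include <functional>
#include <string>

// Hypothetical stand-in for _putc_nolock: returns EOF on failure.
using PutFn = std::function<int(char)>;

// Mirrors the narrow branch of write_char: count up on success, -1 on failure.
void write_char_model(char ch, const PutFn& put, int* pnumwritten) {
    if (put(ch) == EOF)
        *pnumwritten = -1;
    else
        ++(*pnumwritten);
}

// Assumed caller: passes the string on one character at a time and
// stops as soon as the counter has been set to -1.
void write_string_model(const char* s, int len, const PutFn& put, int* pnumwritten) {
    while (len-- > 0) {
        write_char_model(*s++, put, pnumwritten);
        if (*pnumwritten == -1)
            break;
    }
}
```

With a PutFn that appends to a std::string, the three bytes of "\xba\xba!" all arrive and the counter ends at 3; with a PutFn that always returns EOF, the counter ends at -1 after the first character.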

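4.2 Locale not initialized

When the locale has not been initialized, "printf("\t%s\n", psa)" follows the same execution path as before and ultimately calls _putc_nolock to output the narrow characters one at a time.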

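5. Outputting a wide string with printf

5.1 Locale initialized

"printf("\t%ls\n", psw)" outputs a wide string through printf. Single-stepping with F11 enters the following functions in order:
printf: [C library] Formatted output.
_output_l: [C library] Formatted output according to locale information. Parses the format string, extracts the wide string for "%ls", then calls wctomb_s to convert the encoding.
wctomb_s: [C library] (Buffer-safe version) Converts a wide character to a multibyte character (public method).
_wctomb_s_l: [C library] (Buffer-safe version) Converts a wide character to a multibyte character (internal implementation).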

The call stack at this point:
> msvcr80d.dll!_wctomb_s_l(int * pRetValue=0x0012fb2c, char * dst=0x0012fb24, unsigned int sizeInBytes=0x00000006, wchar_t wchar=L'W', localeinfo_struct * plocinfo=0x00000000) 行115 C++
  msvcr80d.dll!wctomb_s(int * pRetValue=0x0012fb2c, char * dst=0x0012fb24, unsigned int sizeInBytes=0x00000006, wchar_t wchar=L'W') 行145 + 0x18 字节 C++
  msvcr80d.dll!_output_l(_iobuf * stream=0x10311d20, const char * format=0x0041781c, localeinfo_struct * plocinfo=0x00000000, char * argptr=0x0012fe54) 行2252 + 0x2d 字节 C++
  msvcr80d.dll!printf(const char * format=0x00417818, ...) 行63 + 0x18 字节 C
  wchar_crtbug_2005.exe!main(int argc=0x00000001, char * * argv=0x003b6a00) 行51 + 0x13 字节 C++

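In _wctomb_s_l, because the locale has been initialized, the wide string is converted to a narrow string correctly.
After the conversion succeeds, control returns to _output_l, which calls write_string to output the converted narrow string. The following functions are entered in order:
write_string: [C library] Writes a narrow string. Loops over the source string and calls write_char for each character.
write_char: [C library] Writes a narrow character. Calls _putc_nolock, which outputs the narrow string correctly.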

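5.2 Locale not initialized

When the locale has not been initialized, "printf("\t%ls\n", psw)" follows roughly the same execution path as before and also calls _wctomb_s_l for the encoding conversion.
In _wctomb_s_l, because the default C locale is in use (locale not initialized), any character whose code value is greater than 255 is rejected with an error, so the wide string cannot be output.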

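6. Summary

To summarize the causes of the output failures:
Locale initialized, cout cannot output a Chinese narrow string: a flaw in the condition checks in _write_nolock prevents the lead byte of a Chinese character from being output.
Locale initialized, wcout cannot output a Chinese wide string: a flaw in the condition checks in _write_nolock prevents the lead byte of a Chinese character from being output.
Locale not initialized, wcout cannot output a Chinese wide string: under the default C locale, _wctomb_s_l does not support characters whose code value is greater than 255.
Locale not initialized, printf cannot output a Chinese wide string: under the default C locale, _wctomb_s_l does not support characters whose code value is greater than 255.

The first two are bugs, while the last two are behavior specified by the C standard.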

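References:
"ISO/IEC 9899:1999" (C99). ISO/IEC, 1999. www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
"C++ International Standard - ISO IEC 14882 Second edition 2003" (C++03). ISO/IEC, 2003-10-15.
"The C++ Standard Library: A Tutorial and Reference" (Chinese edition). Nicolai M. Josuttis; translated by Hou Jie (侯捷) and Meng Yan (孟岩). Huazhong University of Science and Technology Press (华中科技大学出版社), 2002-09.
"[C] Using TCHAR across platforms: making tchar.h available on Linux and other platforms, solving the format-specifier problem in cross-platform code, and displaying multiple languages at once". http://www.cnblogs.com/zyl910/archive/2013/01/17/tcharall.html
"[C++] An In-Depth Investigation into Why cout and wcout Fail to Output Chinese Characters (1): Testing Various Compilers". http://www.cnblogs.com/zyl910/archive/2013/01/20/wchar_crtbug_01.html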

Source: http://www.cnblogs.com/zyl910/

License: free to redistribute, non-commercial, no derivatives, attribution required | Creative Commons BY-NC-ND 3.0.

Original URL: https://www.cnblogs.com/zyl910/p/wchar_cppbug_02.html
