
Multi-byte and wide characters: converting between string and wstring

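Multi-byte character set (MBCS, Multi-Byte Character Set): a character encoding that uses a variable number of bytes per character. English letters typically take 1 byte, while Chinese and similar characters take 2 bytes. It is compatible with 7-bit ASCII (code values 0 to 127).

In the beginning the Internet had only one character set, ANSI's ASCII. It uses 7 bits per character, giving 128 characters in total, which cover English letters, digits, punctuation and other common symbols.

To extend ASCII so that local languages could be displayed, different countries and regions defined their own standards, which produced GB2312, BIG5, JIS and other encodings. These extensions, which represent one CJK character with 2 bytes, are called ANSI encodings on Windows, also known as "MBCS (Multi-Byte Character Set)".

Different ANSI encodings are incompatible with each other, so when information is exchanged internationally, text in two languages cannot be stored in the same ANSI-encoded document. A major drawback is that the same code value stands for different characters in different encodings, which easily causes confusion. This led to the birth of Unicode.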

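Wide character set: usually refers to a character set encoded as Unicode.

Unicode, also called the universal character set, unifies the character encodings of different countries.

On Windows, Unicode text usually means UTF-16, which uses two bytes for most characters (characters outside the Basic Multilingual Plane take four). The original English characters go from one byte to two; the high byte is simply filled with 0.

Unicode was created to unify the encoding of all scripts. Because it puts every language into one code space, text in several languages can share one document without the code value clashes described above.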

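Unicode unifies the code space, but a fixed-width representation is inefficient. For example, UCS-4 (one of the Unicode encoding forms) stores every character in 4 bytes, so each English letter carries three zero bytes, which wastes storage and bandwidth. UTF-8 was introduced to make the encoding more compact. It picks the encoded length according to the character: an English letter needs only 1 byte.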
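To make the variable-length idea concrete, here is a minimal sketch of a UTF-8 encoder for a single code point. The name encode_utf8 is made up for this illustration and does not come from any library mentioned in this article; the bit layout follows the UTF-8 definition.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Returns an empty string for surrogates and for values above U+10FFFF.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp <= 0x7F) {                        // ASCII: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {                // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {               // 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not characters
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {             // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Usage: an English letter takes 1 byte, the Chinese character U+6C49 takes 3.
inline void encode_utf8_demo() {
    assert(encode_utf8(0x41).size() == 1);
    assert(encode_utf8(0x6C49) == "\xE6\xB1\x89");
}
```

The same code point costs 4 bytes in UCS-4 regardless of its value, which is the overhead UTF-8 avoids for ASCII-heavy text.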

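UTF stands for "Unicode Transformation Format", that is, how the numbers defined by Unicode are turned into program data. UTF-8, UTF-16 and UTF-32 use 8-bit, 16-bit and 32-bit code units, which C++ holds in char, char16_t and char32_t respectively. (Note: char16_t and char32_t are keywords added in C++11. If your compiler does not support C++11, use unsigned short and unsigned long instead; unsigned long is 32 bits on Windows.) The UTF-8 encoding of "汉字" takes 6 bytes, 3 per character. Its UTF-16 encoding takes two char16_t, 4 bytes in total. Its UTF-32 encoding takes two char32_t, 8 bytes in total.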
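The sizes quoted for "汉字" can be checked with a short sketch. The constant names and the byte_size helper are invented for this example; the characters are written as escapes (U+6C49, U+5B57) so the result does not depend on the encoding of the source file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "汉字" in the three encoding forms.
const std::string    kUtf8  = "\xE6\xB1\x89\xE5\xAD\x97";  // UTF-8 bytes written out by hand
const std::u16string kUtf16 = u"\u6C49\u5B57";             // char16_t code units
const std::u32string kUtf32 = U"\u6C49\u5B57";             // char32_t code units

// Hypothetical helper: storage size in bytes, excluding the terminator.
template <class Str>
std::size_t byte_size(const Str& s) {
    return s.size() * sizeof(typename Str::value_type);
}

inline void code_unit_demo() {
    assert(byte_size(kUtf8)  == 6);  // 3 bytes per character
    assert(byte_size(kUtf16) == 4);  // 2 code units of 2 bytes
    assert(byte_size(kUtf32) == 8);  // 2 code units of 4 bytes
}
```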

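Prefixing a character or string literal with L turns it into a wide literal stored as wchar_t (2 bytes per code unit on Windows), for example L'看' or L"abc啊". _T("sf飞") expands to the same wide form when _UNICODE is defined.

Converting between MFC's CString and std::string:

1. With the Unicode character set, CString is equivalent to CStringW; with the multi-byte character set, CString is equivalent to CStringA.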

2. CString --> std::string

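// 1. In a Unicode build: CString --> std::string
// Method 1
CString str = L"sdf";
std::string s1(CT2A(str.GetString()));
    // GetString() exists in newer Visual Studio versions; older ones can use GetBuffer()
    std::string s1b(CT2A(str.GetBuffer()));
    str.ReleaseBuffer();
// Method 2
CString str2 = L"dshf";
CStringA stra(str2);
std::string s2(stra);
// or in one step (calling GetString() also keeps this from parsing as a function declaration)
std::string s2b(CStringA(str2).GetString());

// Method 3
USES_CONVERSION;
CString str3 = L"djg";
std::string s3 = W2A(str3);
// str3 first converts to const wchar_t*, W2A then turns that into a char*,
// and finally s3 is initialized from that pointer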

3. std::string --> CStringW / std::wstring

std::string s("dhhh");
CStringW strw(CStringA(s.c_str()));
std::wstring sw(strw);

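1) TCHAR to const wchar_t*: in a Unicode build TCHAR already is wchar_t, so a const TCHAR* can be cast directly by writing (const wchar_t*) in front of it. In an ANSI build the cast compiles but does not convert the data.

2) BSTR: a Unicode string of type OLECHAR*. It is the COM string type, carries a length prefix, and is tied to VB. I have rarely used it.

LPSTR: char*, a pointer to an array of 8-bit (single-byte) ANSI characters terminated by '\0'.

LPWSTR: wchar_t*, a pointer to an array of 16-bit (double-byte) Unicode characters terminated by '\0'.

LPCSTR: const char*

LPCWSTR: const wchar_t*

LPTSTR: either LPSTR or LPWSTR, depending on whether the UNICODE macro is defined.

LPCTSTR: either LPCSTR or LPCWSTR, depending on whether the UNICODE macro is defined.

The following is copied from the MFC library: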

#ifdef UNICODE 
typedef LPWSTR LPTSTR; 
typedef LPCWSTR LPCTSTR;
#else 
typedef LPSTR LPTSTR; 
typedef LPCSTR LPCTSTR; 
#endif
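The mechanism behind these typedefs can be imitated in portable C++. Everything named demo_* or DEMO_* below is a stand-in invented for this sketch (DEMO_UNICODE plays the role of UNICODE, DEMO_T the role of _T); it is not the real Windows header.

```cpp
#include <cassert>
#include <type_traits>

#define DEMO_UNICODE 1            // hypothetical switch; remove it to get the narrow variant

#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;       // stand-in for TCHAR in a Unicode build
#define DEMO_T(x) L##x            // pastes the L prefix onto the literal
#else
typedef char demo_tchar;          // stand-in for TCHAR in an ANSI build
#define DEMO_T(x) x
#endif

typedef demo_tchar*       demo_lptstr;   // stand-in for LPTSTR
typedef const demo_tchar* demo_lpctstr;  // stand-in for LPCTSTR

// With DEMO_UNICODE defined, the "T" pointer types are the wide ones.
static_assert(std::is_same<demo_lpctstr, const wchar_t*>::value,
              "demo_lpctstr should resolve to const wchar_t*");

inline void tchar_demo() {
    demo_lpctstr text = DEMO_T("abc");   // becomes L"abc" in this configuration
    assert(text[0] == L'a');
}
```

The choice is made once at compile time, which is why code written against the T types builds in either configuration but never converts data at run time.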

Conversions between them:

LPWSTR->LPTSTR:    W2T();
LPTSTR->LPWSTR:    T2W();
LPCWSTR->LPCTSTR:  W2CT();
LPCTSTR->LPCWSTR:  T2CW();
ANSI->UNICODE:     A2W();
UNICODE->ANSI:     W2A();

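3)

LPWSTR to LPCSTR

CW2A ansi(pwStr);    // pwStr is an LPWSTR; keep the CW2A object alive while the pointer is in use
LPCSTR pStr = ansi;

Converting between CString and LPCWSTR (http://www.cnblogs.com/foolboy/archive/2005/07/25/199869.html)

How the problem came up:
While using WritePrivateProfileString to write an .ini configuration file, I read in MSDN that for the written settings to take effect immediately, WritePrivateProfileStringW must first be called to make the system re-read the target .ini file. The original text is: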

// force the system to re-read the mapping into shared memory  
// so that future invocations of the application will see it  
//  without the user having to reboot the system  
WritePrivateProfileStringW( NULL, NULL, NULL, L"appname.ini" );

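Looking it up in MSDN, the prototype of WritePrivateProfileStringW is:

WINBASEAPI BOOL WINAPI WritePrivateProfileStringW ( 
 LPCWSTR lpAppName,  // section name, the string inside []
 LPCWSTR lpKeyName,  // key, the string to the left of "="
 LPCWSTR lpString,   // value to write
 LPCWSTR lpFileName ); // path of the configuration file
For example:
[section]
key=string

Every parameter has type LPCWSTR, while the file name obtained in practice is a CString, hence the problem.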

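Problem analysis:

LPCWSTR is only a pointer to a Unicode string. The buffer it points to stays as large as it was when it was initialized, and storing more than that later leads to unpredictable results. That is how it differs from CString, a string class that manages its memory automatically. An LPCWSTR is initialized like this: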

LPCWSTR Name=L"TestlpCwstr";

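Because an LPCWSTR must point to a Unicode string, the key question becomes converting between ANSI and Unicode characters, that is, between different encodings. From the material I found, the ATL conversion macros can do this as follows: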

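// Method 1
CString str=_T("TestStr");
USES_CONVERSION;
LPWSTR pwStr=new wchar_t[str.GetLength()+1];
wcscpy(pwStr,T2CW((LPCTSTR)str));   // T2CW accepts the const pointer in both ANSI and Unicode builds
// ... use pwStr, then release it with delete[] pwStr;

// Method 2 (ANSI/MBCS build only; T2CW((LPCTSTR)str) works in either build)
CString str=_T("TestStr");
USES_CONVERSION;
LPCWSTR pwcStr = A2CW((LPCSTR)str);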

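In MFC (ANSI build) a CString can be used wherever an LPCSTR is expected. A2CW converts (LPCSTR) -> (LPCWSTR). USES_CONVERSION defines some intermediate variables and must appear before any ATL conversion macro is used.
By the way, converting an LPCWSTR to a CString is even easier: the MSDN description of the CString class says a CString can be constructed directly from an LPCWSTR, so the following works: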

LPCWSTR pcwStr = L"TestpwcStr";
CString str(pcwStr);

Summary:
The header <atlconv.h> defines all the conversion macros ATL provides, for example:

  A2CW  (LPCSTR)  -> (LPCWSTR)
  A2W   (LPCSTR)  -> (LPWSTR)
  W2CA  (LPCWSTR) -> (LPCSTR)
  W2A   (LPCWSTR) -> (LPSTR)

The full set of macros is listed in the table below:

A2BSTR	OLE2A	T2A	W2A
A2COLE	OLE2BSTR	T2BSTR	W2BSTR
A2CT	OLE2CA	T2CA	W2CA
A2CW	OLE2CT	T2COLE	W2COLE
A2OLE	OLE2CW	T2CW	W2CT
A2T	OLE2T	T2OLE	W2OLE
A2W	OLE2W	T2W	W2T

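The macro names in the table are very regular. Each letter has a precise meaning:

2	Pronounced like "to", so it stands for "convert to".
A	ANSI string, that is, MBCS.
W, OLE	Wide string, that is, UNICODE.
T	The intermediate type T. If _UNICODE is defined, T means W; if _MBCS is defined, T means A.
C	Short for const.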

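    With these macros, conversions between the various string types are quick to write. Before using them, include the header and declare USES_CONVERSION. The ATL conversion macros are convenient because there is no temporary buffer to free: the converted string is allocated on the stack and released when the function returns. Given the size of the stack (1 MB by default in VC), keep the following in mind:
    1. They are only suitable for converting short strings.
    2. Do not convert inside a loop that runs many times.
    3. Do not convert the contents of text files, because files are usually large.
    4. For cases 2 and 3, use MultiByteToWideChar() and WideCharToMultiByte().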

Usage of MultiByteToWideChar() and WideCharToMultiByte():
www.cnblogs.com/ranjiewen/p/5770639.html

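int MultiByteToWideChar(
    UINT CodePage, // code page of the multi-byte input: CP_ACP for the system ANSI code page, CP_UTF8 for UTF-8
    DWORD dwFlags, // usually 0
    LPCSTR lpMultiByteStr, // [in] pointer to the string to convert
    int cchMultiByte,  // length in bytes of the string at lpMultiByteStr; pass -1 if it is null-terminated
    LPWSTR lpWideCharStr, // [out] pointer to the output wide string
    int cchWideChar  // size of the buffer at lpWideCharStr, in wide characters. If 0, no conversion is done and the function returns the number of wide characters the buffer needs.
    );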

int WideCharToMultiByte(
UINT CodePage,  // code page to convert to
DWORD dwFlags,  // 0
LPCWSTR lpWideCharStr, // string to convert
int cchWideChar, // length of the string to convert; pass -1 if it is null-terminated
LPSTR lpMultiByteStr, // buffer that receives the converted string
int cbMultiByte,  // size of that buffer in bytes; if 0, the function returns the required buffer size
LPCSTR lpDefaultChar, // NULL
LPBOOL lpUsedDefaultChar // NULL
);

Example:

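#include <Windows.h>
#include <fstream>
#include <string>

/// std::string ==> std::wstring
// CP_ACP: the system ANSI code page.
// With cchWideChar == 0 the call only returns the required length in wide characters.
// If the source length is -1, that length includes the terminating null: allocate new wchar_t[nLen].
// If the source length is s.size(), it excludes the terminating null: allocate new wchar_t[nLen+1].
// Variant 1: pass -1
std::wstring s2ws(const std::string& s)
{
    int nLen = ::MultiByteToWideChar(CP_ACP, 0, s.c_str(), -1, NULL, 0);
    if (nLen <= 0) return std::wstring(); // conversion failed
    wchar_t *buf = new wchar_t[nLen];
    // the terminating L'\0' is converted too, so the buffer needs no zero fill
    ::MultiByteToWideChar(CP_ACP, 0, s.c_str(), -1, buf, nLen);
    std::wstring ws(buf);
    delete[] buf;
    return ws;
}
// Variant 2: pass s.size()
std::wstring s2ws_sized(const std::string& s)
{
    if (s.empty()) return std::wstring(); // a source length of 0 makes the API fail
    int nLen = ::MultiByteToWideChar(CP_ACP, 0, s.c_str(), (int)s.size(), NULL, 0);
    wchar_t *buf = new wchar_t[nLen+1];
    wmemset(buf, 0, nLen+1); // no terminator is written in this variant, so zero the buffer first
    ::MultiByteToWideChar(CP_ACP, 0, s.c_str(), (int)s.size(), buf, nLen);
    std::wstring ws(buf);
    delete[] buf;
    return ws;
}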
/// std::wstring ==> std::string
std::string ws2s(const std::wstring& ws)
{
    int nLen = ::WideCharToMultiByte(CP_ACP, 0, ws.c_str(), -1, NULL, 0, NULL, NULL);
    if (nLen <= 0) return std::string(); // conversion failed
    char* buf = new char[nLen];
    ::WideCharToMultiByte(CP_ACP, 0, ws.c_str(), -1, buf, nLen, NULL, NULL);
    std::string s(buf);
    delete[] buf;
    return s;
}
 
///// To convert between different code pages (ANSI: CP_ACP, UTF-8: CP_UTF8),
///// WideCharToMultiByte and MultiByteToWideChar are needed (I have not found another way so far; suggestions are welcome)
// ANSI ==> UTF8
std::string ANSI_to_UTF8(const std::string& sAnsi)
{
    std::wstring wsAnsi = s2ws(sAnsi);
    int nLen = ::WideCharToMultiByte(CP_UTF8, 0, wsAnsi.c_str(), -1, NULL, 0, NULL, NULL);
    if (nLen <= 0) return std::string(); // conversion failed
    char* buf = new char[nLen];
    ::WideCharToMultiByte(CP_UTF8, 0, wsAnsi.c_str(), -1, buf, nLen, NULL, NULL);
    std::string sUtf8(buf);
    delete[] buf;
    return sUtf8;
}
// UTF8 ==> ANSI
std::string UTF8_to_ANSI(const std::string& sUtf8)
{
    //std::wstring wsUtf8 = s2ws(sUtf8); // not usable here: s2ws decodes with the ANSI code page
    int nLen = ::MultiByteToWideChar(CP_UTF8, 0, sUtf8.c_str(), -1, NULL, 0);
    if (nLen <= 0) return std::string(); // conversion failed
    wchar_t *wbuf = new wchar_t[nLen];
    ::MultiByteToWideChar(CP_UTF8, 0, sUtf8.c_str(), -1, wbuf, nLen);
    std::wstring wsUtf8(wbuf);
    delete[] wbuf;
 
    //int nLen2 = ::WideCharToMultiByte(CP_ACP, 0, wsUtf8.c_str(), -1, NULL, 0, NULL, NULL);
    //char* buf = new char[nLen2];
    //::WideCharToMultiByte(CP_ACP, 0, wsUtf8.c_str(), -1, buf, nLen2, NULL, NULL);
    //std::string sAnsi(buf);
    //delete[] buf;
    // or simply:
    std::string sAnsi = ws2s(wsUtf8);
    return sAnsi;
}
 
int main(int argc, char* argv[])
{
    std::string s( "Hello world.你好，中国。");
    std::wstring ws = s2ws(s);
    std::string s1 = ws2s(ws);
    std::string sAnsi(s);
    std::string sUtf8 = ANSI_to_UTF8(sAnsi);
    std::string sAnsi2 = UTF8_to_ANSI(sUtf8);
 
    std::ofstream file("1.txt");
    file << sUtf8.c_str();
    return 0;
}
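The "ask for the size, then convert" pattern used by the functions above also exists in the C++ standard library. The sketch below is a portable analogue built on std::mbsrtowcs, not a replacement for the Windows calls: it converts from whatever multi-byte encoding the current C locale uses and cannot select a code page such as CP_UTF8. The function name widen_current_locale is made up for this example.

```cpp
#include <cassert>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: multi-byte -> wide using the current C locale.
std::wstring widen_current_locale(const std::string& s) {
    std::mbstate_t state = std::mbstate_t();
    const char* src = s.c_str();

    // First call: a null destination means "only tell me the length"
    // (in wide characters, not counting the terminator).
    std::size_t n = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multi-byte sequence for this locale");

    // Second call: convert into a buffer of exactly that size.
    std::wstring out(n, L'\0');
    state = std::mbstate_t();
    src = s.c_str();
    std::mbsrtowcs(&out[0], &src, n, &state);
    return out;
}

inline void widen_current_locale_demo() {
    // Plain ASCII converts the same way in every locale.
    assert(widen_current_locale("Hello world.") == L"Hello world.");
}
```

Letting std::wstring own the buffer removes the new[]/delete[] pair, and the same idea applies to the MultiByteToWideChar versions.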

The functions above, tidied up:

#include <Windows.h>
#include <string>
// std::string ==> std::wstring
bool s2ws(const std::string &s,std::wstring &ws)
{
    ws.clear();
    if (s.empty())
        return true;
    int nLen = ::MultiByteToWideChar(CP_ACP, 0, s.c_str(), -1, NULL, 0);// with -1, nLen also counts the terminating L'\0'
    if (nLen <= 0)
        return false;
    wchar_t *buf = new wchar_t[nLen];
    int nWrited = ::MultiByteToWideChar(CP_ACP, 0, s.c_str(), -1, buf, nLen);// with -1, the L'\0' is converted too
    if (nWrited == nLen)
        ws = buf;
    delete[] buf;
    return nWrited == nLen;
}
// std::wstring ==> std::string
bool ws2s(const std::wstring &ws, std::string &s)
{
    s.clear();
    if (ws.empty())
        return true;
    int nLen = ::WideCharToMultiByte(CP_ACP, 0, ws.c_str(), -1, NULL, 0, NULL, NULL);
    if (nLen <= 0)
        return false;
    char* buf = new char[nLen];
    int nWrited = ::WideCharToMultiByte(CP_ACP, 0, ws.c_str(), -1, buf, nLen, NULL, NULL);
    if (nWrited == nLen)
        s = buf;
    delete[] buf;
    return nWrited == nLen;
}

///// Converting between different code pages (ANSI: CP_ACP, UTF-8: CP_UTF8)
// ANSI ==> UTF8
bool ANSI_to_UTF8(const std::string &sAnsi, std::string &sUtf8)
{
    sUtf8.clear();
    if (sAnsi.empty())
        return true;
    std::wstring wsAnsi;
    int nLen = ::MultiByteToWideChar(CP_ACP, 0, sAnsi.c_str(), -1, NULL, 0);
    if (nLen <= 0)
        return false;
    wchar_t *buf1 = new wchar_t[nLen];
    int nWrited = ::MultiByteToWideChar(CP_ACP, 0, sAnsi.c_str(), -1, buf1, nLen);
    if (nWrited == nLen)
        wsAnsi = buf1;
    delete[] buf1;
    if (nWrited != nLen)
        return false;
    nLen = ::WideCharToMultiByte(CP_UTF8, 0, wsAnsi.c_str(), -1, NULL, 0, NULL, NULL);
    if (nLen <= 0)
        return false;
    char* buf2 = new char[nLen];
    nWrited = ::WideCharToMultiByte(CP_UTF8, 0, wsAnsi.c_str(), -1, buf2, nLen, NULL, NULL);
    if (nWrited == nLen)
        sUtf8 = buf2;
    delete[] buf2;
    return nWrited == nLen;
}
// UTF8 ==> ANSI
bool UTF8_to_ANSI(const std::string &sUtf8, std::string &sAnsi)
{
    sAnsi.clear();
    if (sUtf8.empty())
        return true;
    int nLen = ::MultiByteToWideChar(CP_UTF8, 0, sUtf8.c_str(), -1, NULL, 0);
    if (nLen <= 0)
        return false;
    wchar_t *wbuf = new wchar_t[nLen];
    int nWrited = ::MultiByteToWideChar(CP_UTF8, 0, sUtf8.c_str(), -1, wbuf, nLen);
    std::wstring wsUtf8;
    if (nWrited == nLen)
        wsUtf8 = wbuf;
    delete[] wbuf;
    if (nWrited != nLen)
        return false;
    nLen = ::WideCharToMultiByte(CP_ACP, 0, wsUtf8.c_str(), -1, NULL, 0, NULL, NULL);
    if (nLen <= 0)
        return false;
    char* buf = new char[nLen];
    nWrited = ::WideCharToMultiByte(CP_ACP, 0, wsUtf8.c_str(), -1, buf, nLen, NULL, NULL);
    if (nWrited == nLen)
        sAnsi = buf;
    delete[] buf;
    return nWrited == nLen;
}


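Going through _bstr_t, the COM support wrapper class declared in <comutil.h>:

#include <comutil.h>
#include <string>
#pragma comment(lib, "comsuppw.lib")
 
std::string ws2s(const std::wstring& ws)
{
    _bstr_t t = ws.c_str();        // wrap the wide string in a _bstr_t
    char* pchar = (char*)t;        // _bstr_t converts to the ANSI form on request
    std::string result = pchar;    // copy before t goes out of scope
    return result;
}
 
std::wstring s2ws(const std::string& s)
{
    _bstr_t t = s.c_str();             // wrap the ANSI string in a _bstr_t
    wchar_t* pwchar = (wchar_t*)t;     // view it as a wide string
    std::wstring result = pwchar;      // copy before t goes out of scope
    return result;
}
--------------------- 
Source: https://blog.csdn.net/liminwang0311/article/details/79975174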

Using MFC's CString:

#include <atlstr.h>
std::string ws2s(std::wstring ws)
{
	return std::string(CStringA(CStringW(ws.c_str())));
}
std::wstring s2ws(std::string s)
{
	return std::wstring(CStringW(CStringA(s.c_str())));
}

// In effect: ws => const wchar_t* => CStringW => LPCWSTR => CStringA => LPCSTR => string
std::string s(CStringA(CStringW(ws.c_str())));
std::wstring ws(CStringW(CStringA(s.c_str())));
// CStringW => LPCWSTR and CStringA => LPCSTR are implicit conversions; CStringA defines operator LPCSTR() const
	LPCSTR pStr = "kkk"; //LPCSTR == const char*
	LPCWSTR pwStr = L"hhh"; //LPCWSTR == const wchar_t*
	CStringA a(pwStr);//"hhh"
	CStringW w(pStr);//L"kkk"

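std::string <--> std::wstring: the simplest way is basic_string's iterator-range constructor
(Note: it only works for ASCII text. Each element is copied individually with no decoding, so Chinese and other non-ASCII text is corrupted.)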

	std::string s("hello world.");
	std::wstring ws(s.begin(), s.end());

	std::wstring ws2(L"hello China.");
	std::string s2(ws2.begin(), ws2.end());
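The ASCII-only limitation is easy to demonstrate. The sketch below wraps the iterator-range constructor in a function (widen_per_element is a name invented here) and feeds it the three UTF-8 bytes of the single character U+6C49.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper: every char becomes one wchar_t, with no decoding at all.
std::wstring widen_per_element(const std::string& s) {
    return std::wstring(s.begin(), s.end());
}

inline void widen_per_element_demo() {
    // ASCII survives because each byte already is one character.
    assert(widen_per_element("hello") == L"hello");

    // One Chinese character encoded as 3 UTF-8 bytes turns into 3 unrelated
    // wide elements instead of the single wide character it should be.
    const std::string han = "\xE6\xB1\x89";
    assert(widen_per_element(han).size() == 3);
}
```

For anything beyond ASCII, one of the real conversions shown earlier is needed.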


Original article: https://www.cnblogs.com/htj10/p/11027323.html