  • Multibyte VS WideChar Conversion

    1. A multibyte string appears to us as a char*, but it can actually hold any code page encoding, such as GBK or UTF-8. If a char* holds UTF-8 text, it has to be converted explicitly, as follows:

    // Convert a Unicode string (UTF-16, the Windows default) to a UTF-8 char* string.
    UTxString8 MakeUTF8String(const wchar_t* pWide)
    {
        MCAD_ASSERT(pWide);

        // Compute the required buffer size by passing cbMultiByte as 0.
        // Because cchWideChar is -1, the result includes the terminating null.
        int nBytes = ::WideCharToMultiByte(CP_UTF8, 0, pWide, -1, NULL, 0, NULL, NULL);
        MCAD_ASSERT(nBytes > 0);

        char* pNarrow = new char[nBytes]();  // zero-initialized temporary buffer
        nBytes = ::WideCharToMultiByte(CP_UTF8, 0, pWide, -1, pNarrow, nBytes, NULL, NULL);
        MCAD_ASSERT(nBytes > 0);

        UTxString8 utf8Str(pNarrow);  // copy the result before releasing the buffer
        delete [] pNarrow;
        return utf8Str;
    }

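    // Convert a UTF-8 char* string to a UTF-16 Unicode string (wchar_t*).
    // The original does not show a return type; this version returns a
    // new[]-allocated buffer that the caller must release with delete [].
    wchar_t* MakeWideString(const char* pNarrow)
    {
        // Get the required size, in wide characters, of the buffer that receives
        // the Unicode string. Because cbMultiByte is -1, it includes the terminating null.
        int nChars = ::MultiByteToWideChar(CP_UTF8, 0, pNarrow, -1, NULL, 0);
        if (nChars <= 0) return NULL;

        // Convert from UTF-8 to UTF-16.
        wchar_t* pWide = new wchar_t[nChars];
        ::MultiByteToWideChar(CP_UTF8, 0, pNarrow, -1, pWide, nChars);
        return pWide;

        // The CRT route below can replace the code above, provided the CRT
        // accepts a UTF-8 locale name (older Microsoft CRTs reject it):
        // setlocale(LC_ALL, ".65001");  // or ".UTF8"
        // mbstowcs
    }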
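The commented-out setlocale/mbstowcs alternative above can be sketched roughly as follows. This is only an illustration under assumptions: `UseUtf8Locale` and `NarrowToWide` are hypothetical names that do not appear in the original, the locale names tried are common spellings that may not exist on a given system, and querying the size with a null destination is behavior documented by the Microsoft CRT and POSIX rather than guaranteed by ISO C.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: try a few common spellings of a UTF-8 locale name.
// ".UTF8" is the Microsoft UCRT spelling; the others are typical on Linux/macOS.
bool UseUtf8Locale()
{
    const char* names[] = { ".UTF8", "C.UTF-8", "en_US.UTF-8", "en_US.utf8" };
    for (const char* name : names)
        if (std::setlocale(LC_CTYPE, name))
            return true;
    return false;  // no UTF-8 locale available; the caller must fall back
}

// Hypothetical helper: multibyte -> wide using the current C locale.
// Returns false if the input is not valid in that locale.
bool NarrowToWide(const char* narrow, std::wstring& out)
{
    out.clear();

    // Pass 1: a null destination asks for the required number of
    // wide characters, not counting the terminating null.
    std::size_t n = std::mbstowcs(NULL, narrow, 0);
    if (n == static_cast<std::size_t>(-1))
        return false;

    // Pass 2: convert into a buffer of exactly that size.
    out.resize(n);
    if (n > 0)
        std::mbstowcs(&out[0], narrow, n);
    return true;
}
```

Call `UseUtf8Locale()` once before `NarrowToWide()`. On Windows the result is UTF-16 because wchar_t is 16 bits there; on most other platforms wchar_t is 32 bits, so the result is not UTF-16.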

     

    2. My quick summary of how WideCharToMultiByte() and wcstombs() compare:

    1) In the Microsoft CRT, wcstombs() calls WideCharToMultiByte() internally. WideCharToMultiByte() is the underlying function that does the conversion, but it takes many detailed arguments, so the CRT provides wcstombs() as a simpler wrapper. A thoughtful design!

    2) WideCharToMultiByte() takes an argument that selects the code page used for the conversion. wcstombs() has no such argument: it uses the code page of the current C locale (the LC_CTYPE category), which stays at the minimal "C" locale until setlocale() is called. So for the two functions to give the same result, setlocale() must be called with the matching code page before wcstombs().
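Point 2) can be illustrated with a small sketch. `WideToNarrow` is a hypothetical helper, not something from the original, and it assumes the CRT reports the required size when wcstombs() is given a null destination (documented for the Microsoft CRT and POSIX).

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: wide -> multibyte in whatever C locale is active
// when it is called. There is no code page parameter; the locale decides.
bool WideToNarrow(const wchar_t* wide, std::string& out)
{
    out.clear();

    // Pass 1: required size in bytes, not counting the terminating null.
    std::size_t n = std::wcstombs(NULL, wide, 0);
    if (n == static_cast<std::size_t>(-1))
        return false;  // a character cannot be represented in this locale

    // Pass 2: convert into a buffer of exactly that size.
    out.resize(n);
    if (n > 0)
        std::wcstombs(&out[0], wide, n);
    return true;
}
```

The same input can therefore behave differently depending on earlier setlocale() calls: with a UTF-8 locale selected, L"\x4e2d" converts to the three bytes E4 B8 AD, while in the default "C" locale the conversion is expected to fail because that character has no representation there.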

    Related resources:

    wcstombs and mbstowcs

    MultiByteToWideChar

    http://msdn.microsoft.com/en-us/library/ms776413(VS.85).aspx

    Note: For UTF-8, dwFlags must be set to either 0 or MB_ERR_INVALID_CHARS. Otherwise, the function fails with ERROR_INVALID_FLAGS.

    mbstowcs

    http://msdn.microsoft.com/en-us/library/k1f9b8cy(VS.71).aspx
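The note above about dwFlags for UTF-8 suggests a strict conversion that rejects malformed input. A minimal sketch, assuming a Win32 build: `Utf8ToWideStrict` is a hypothetical name, and the std::string/std::wstring interface is a choice made here, not something from the original.

```cpp
#ifdef _WIN32  // Win32-only sketch; compiles to nothing on other platforms
#include <windows.h>
#include <string>

// Hypothetical helper: strict UTF-8 -> UTF-16 conversion.
// Returns false when the input is not well-formed UTF-8.
bool Utf8ToWideStrict(const std::string& utf8, std::wstring& out)
{
    out.clear();
    if (utf8.empty())
        return true;  // MultiByteToWideChar fails on a zero-length input

    const int srcLen = static_cast<int>(utf8.size());

    // Pass 1: required size in wide characters. With MB_ERR_INVALID_CHARS,
    // malformed input makes the call return 0 and GetLastError() report
    // ERROR_NO_UNICODE_TRANSLATION.
    int n = ::MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                  utf8.data(), srcLen, NULL, 0);
    if (n == 0)
        return false;

    // Pass 2: convert into a buffer of exactly that size.
    out.resize(n);
    n = ::MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                              utf8.data(), srcLen, &out[0], n);
    return n != 0;
}
#endif
```

Passing an explicit length instead of -1 keeps the terminating null out of the result, so the std::wstring holds only the converted characters.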

  • Original article: https://www.cnblogs.com/taoxu0903/p/1282433.html