  • (Repost) What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR (etc.)?

    Original article: http://www.codeproject.com/Articles/76252/What-are-TCHAR-WCHAR-LPSTR-LPWSTR-LPCTSTR-etc

    Many C++ Windows programmers get confused by bizarre-looking identifiers such as TCHAR and LPCTSTR. In this article, I will do my best to clear the fog.

    In general, a character can be represented in 1 byte or 2 bytes. Let's say a 1-byte character is an ANSI character; all English characters can be represented in this encoding. And let's say a 2-byte character is Unicode, which can represent characters from all the languages of the world.

    The Visual C++ compiler supports char and wchar_t as native data types for ANSI and Unicode characters respectively. There is a more precise definition of Unicode, but for the purposes of this article, treat it as the two-byte character (a UTF-16 code unit) that Windows uses for multi-language support.

    What if you want your C/C++ code to be independent of the character encoding/mode used?
    Suggestion: Use generic data types and names to represent characters and strings.

    For example, instead of replacing:

    char cResponse; // 'Y' or 'N'
    char sUsername[64];
    // str* functions

    with

    wchar_t cResponse; // 'Y' or 'N'
    wchar_t sUsername[64];
    // wcs* functions

    in order to support multiple languages (i.e. Unicode) in your application, you can simply write it in a more generic manner:

    #include<TCHAR.H> // Implicit or explicit include
    TCHAR cResponse; // 'Y' or 'N'
    TCHAR sUsername[64];
    // _tcs* functions

    The following project setting on the General page determines which character set is used for compilation:
    (General -> Character Set)

    This way, when your project is compiled as Unicode, TCHAR translates to wchar_t. If it is compiled as ANSI/MBCS, TCHAR translates to char. You are free to use char and wchar_t directly; project settings do not affect any direct use of these keywords.

    TCHAR is defined as:

    #ifdef _UNICODE
    typedef wchar_t TCHAR;
    #else
    typedef char TCHAR;
    #endif

    The macro _UNICODE is defined when you set Character Set to "Use Unicode Character Set", and therefore TCHAR would mean wchar_t. When Character Set is set to "Use Multi-Byte Character Set", TCHAR would mean char.
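
    The same switch can be sketched in portable C++ that builds without TCHAR.H. This is only an illustration: DEMO_UNICODE and demo_tchar are hypothetical stand-ins for _UNICODE and TCHAR, and the helper functions are not part of any Windows header.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the _UNICODE project setting.
// Comment this line out to simulate an ANSI/MBCS build.
#define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;   // wide build: generic character is wchar_t
#else
typedef char demo_tchar;      // narrow build: generic character is char
#endif

// Reports whether the generic character type resolved to wchar_t.
constexpr bool demo_is_wide()
{
    return std::is_same<demo_tchar, wchar_t>::value;
}

// Size in bytes of a buffer holding `count` generic characters.
constexpr std::size_t demo_buffer_bytes(std::size_t count)
{
    return count * sizeof(demo_tchar);
}
```

    With DEMO_UNICODE defined, demo_is_wide() is true and a 64-character buffer occupies 64 * sizeof(wchar_t) bytes; code that uses char or wchar_t directly is unaffected by the switch.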

    Likewise, to support multiple character sets from a single code base, and possibly multiple languages, use the generic function names (macros). Instead of using strcpy, strlen, strcat (including the secure versions suffixed with _s) or wcscpy, wcslen, wcscat (including the secure versions), you should use the _tcscpy, _tcslen, _tcscat functions.

    As you know strlen is prototyped as:

    size_t strlen(const char*);

    And, wcslen is prototyped as:

    size_t wcslen(const wchar_t* );

    It is better to use _tcslen, which is logically prototyped as:

    size_t _tcslen(const TCHAR* );

    WC stands for Wide Character, so wcs means wide-character string. In the same way, _tcs means _T character string, where _T may logically be char or wchar_t.

    But, in reality, _tcslen (and other _tcs functions) are actually not functions, but macros. They are defined simply as:

    #ifdef _UNICODE
    #define _tcslen wcslen 
    #else
    #define _tcslen strlen
    #endif
    

    Refer to TCHAR.H to look up more macro definitions like this.
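
    A minimal portable sketch of this name mapping follows. It assumes hypothetical stand-ins (DEMO_UNICODE, demo_tchar, demo_tcslen) instead of the real TCHAR.H names, so that it compiles with any standard C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>   // std::strlen
#include <cwchar>    // std::wcslen

// Hypothetical stand-in for the _UNICODE project setting.
#define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define demo_tcslen std::wcslen   // the generic name is only a macro alias
#else
typedef char demo_tchar;
#define demo_tcslen std::strlen
#endif

// Returns the length of a generic string in characters, not in bytes.
inline std::size_t demo_name_length(const demo_tchar* name)
{
    assert(name != nullptr);
    return demo_tcslen(name);     // expands to std::wcslen or std::strlen
}
```

    Because demo_tcslen is replaced by the preprocessor, the compiler only ever sees a call to one of the two real functions; no third function exists.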

    You might ask why they are defined as macros and not implemented as functions. The reason is simple: a library or DLL can export only one function with a given name and prototype (ignore the overloading concept of C++). For instance, when you export a function as:

    void _TPrintChar(char);

    How is the client supposed to call it as the following?

    void _TPrintChar(wchar_t);

    _TPrintChar cannot be magically converted into a function taking a 2-byte character. There have to be two separate functions:

    void PrintCharA(char); // A = ANSI 
    void PrintCharW(wchar_t); // W = Wide character
    

    And a simple macro, as defined below, would hide the difference:

    #ifdef _UNICODE
    #define _TPrintChar PrintCharW  // Unicode build: call the wide version
    #else
    #define _TPrintChar PrintCharA  // ANSI build: call the narrow version
    #endif

    The client would simply call it as:

    TCHAR cChar;
    _TPrintChar(cChar);
    

    Note that both TCHAR and _TPrintChar would map to either Unicode or ANSI, and therefore cChar and the argument to function would be either char or wchar_t.
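
    The A/W pair plus a selecting macro can be sketched as below. All names here (DEMO_UNICODE, demo_tchar, TagCharA, TagCharW, TagChar) are hypothetical and chosen for illustration; each function simply returns the letter of its own suffix so that the selected version can be observed.

```cpp
// Hypothetical stand-in for the _UNICODE project setting.
#define DEMO_UNICODE

// Two real functions, one per character width.
constexpr char TagCharA(char)    { return 'A'; }   // A = ANSI
constexpr char TagCharW(wchar_t) { return 'W'; }   // W = Wide character

#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define TagChar TagCharW   // generic name resolves to the wide version
#else
typedef char demo_tchar;
#define TagChar TagCharA   // generic name resolves to the narrow version
#endif

// Client code uses only the generic type and the generic name.
constexpr char demo_selected_version()
{
    return TagChar(demo_tchar('Y'));
}
```

    With DEMO_UNICODE defined, demo_selected_version() yields 'W'; removing the define switches both the type and the call to the A version without touching the client code.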

    Macros avoid these complications and allow us to use either the ANSI or the Unicode function for characters and strings. Most Windows functions that take a string or a character are implemented this way, and for the programmer's convenience only one name (a macro!) needs to be used. SetWindowText is one example:

    // WinUser.H
    #ifdef UNICODE
    #define SetWindowText  SetWindowTextW
    #else
    #define SetWindowText  SetWindowTextA
    #endif // !UNICODE

    Very few functions have no such macro and are available only with the W or A suffix. One example is ReadDirectoryChangesW, which has no ANSI equivalent.


    You all know that we use double quotation marks to represent strings. A string written this way is an ANSI string, with each character taking 1 byte. Example:
    "This is ANSI String. Each letter takes 1 byte."

    The string given above is not Unicode, and so it does not qualify for multi-language support. To represent a Unicode string, you need to use the prefix L. An example:

    L"This is Unicode string. Each letter would take 2 bytes, including spaces."

    Note the L at the beginning of the string, which makes it a Unicode string. All characters (I repeat, all characters) take two bytes, including all English letters, spaces, digits, and the null character. Therefore, the size of a Unicode string is always a multiple of 2 bytes. A Unicode string of 7 characters needs 14 bytes, and so on. A Unicode string taking 15 bytes, for example, would not be valid in any context.

    In general, the size of a string is a multiple of sizeof(TCHAR) bytes!
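
    The size rule can be checked with a small sketch. The helper names literal_chars and literal_bytes are hypothetical. The results are expressed in terms of sizeof(wchar_t), because that is 2 on Windows but may differ on other platforms.

```cpp
#include <cstddef>

// Number of characters in a string literal, excluding the terminating null.
template <typename Char, std::size_t N>
constexpr std::size_t literal_chars(const Char (&)[N])
{
    return N - 1;
}

// Number of bytes the literal occupies, including the terminating null.
template <typename Char, std::size_t N>
constexpr std::size_t literal_bytes(const Char (&)[N])
{
    return N * sizeof(Char);
}
```

    For example, literal_chars(L"Unicode") is 7 and literal_bytes(L"Unicode") is 8 * sizeof(wchar_t), since the null character is stored too; literal_bytes("ANSI") is 5.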

    When you need to express a hard-coded string, you can use:

    "ANSI String"; // ANSI
    L"Unicode String"; // Unicode
    
    _T("Either string, depending on compilation"); // ANSI or Unicode
    // or use TEXT macro, if you need more readability

    The non-prefixed string is an ANSI string, the L-prefixed string is Unicode, and a string wrapped in _T or TEXT is either, depending on compilation.
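
    How such a literal macro can work is sketched below with hypothetical names (DEMO_UNICODE, demo_tchar, DEMO_T, kDemoGreeting). It pastes the L prefix onto the literal in the wide build and leaves the literal alone otherwise; the real definitions in TCHAR.H may differ in detail.

```cpp
// Hypothetical stand-in for the _UNICODE project setting.
#define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T_IMPL(x) L##x   // token-paste the L prefix: "Hi" becomes L"Hi"
#else
typedef char demo_tchar;
#define DEMO_T_IMPL(x) x      // leave the literal as it is
#endif

// Extra level of indirection so that a macro argument is expanded first.
#define DEMO_T(x) DEMO_T_IMPL(x)

// The same source line yields a char or a wchar_t array, depending on the build.
const demo_tchar kDemoGreeting[] = DEMO_T("Hello");
```

    Here kDemoGreeting holds 6 generic characters (5 letters plus the null character), so its size in bytes is 6 * sizeof(demo_tchar).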

    String classes such as MFC/ATL's CString are implemented in two versions in the same way. There are two classes: CStringA for ANSI and CStringW for Unicode. When you use CString (which is a macro/typedef), it translates to one of the two.

    Okay. The TCHAR type definition was for a single character. You can certainly declare an array of TCHAR. But what if you want to express a character pointer, or a const character pointer? Which one of the following would you use?

    // ANSI characters
    foo_ansi(char*);
    foo_ansi(const char*);
    /*const*/ char* pString;
     
    // Unicode/wide-string
    foo_uni(WCHAR*); // or wchar_t*
    foo_uni(const WCHAR*);
    /*const*/ WCHAR* pString;
     
    // Independent 
    foo_char(TCHAR*);
    foo_char(const TCHAR*);
    /*const*/ TCHAR* pString;

    After reading about TCHAR, you would definitely select the last one. But there is a better alternative. Before that, note that the TCHAR.H header file declares only the TCHAR data type; for the following types, you need to include Windows.h (they are defined in WinNT.h).

    NOTE: Windows.h (through WinNT.h) defines TCHAR, TEXT and the pointer types listed below itself, so a project that implicitly or explicitly includes Windows.h needs TCHAR.H only for _T and the _tcs* macros.

    • char* replacement: LPSTR
    • const char* replacement: LPCSTR
    • WCHAR* replacement: LPWSTR
    • const WCHAR* replacement: LPCWSTR (C before W, since const is before WCHAR)
    • TCHAR* replacement: LPTSTR
    • const TCHAR* replacement: LPCTSTR
    Now, I hope you understand the following signatures:
    BOOL SetCurrentDirectory( LPCTSTR lpPathName );
    DWORD GetCurrentDirectory(DWORD nBufferLength,LPTSTR lpBuffer);
    Continuing. You must have seen some functions/methods that ask you to pass a number of characters, or that return a number of characters. With GetCurrentDirectory, for instance, you need to pass the number of characters, not the number of bytes. For example:
    TCHAR sCurrentDir[255];
     
    // Pass 255 and not 255*2 
    GetCurrentDirectory(255, sCurrentDir); // buffer length first, then the buffer
    On the other hand, if you need to allocate a number of characters, you must allocate the proper number of bytes. In C++, you can simply use new:
    LPTSTR pBuffer; // TCHAR* 
    
    pBuffer = new TCHAR[128]; // Allocates 128 or 256 BYTES, depending on compilation.
    But if you use memory allocation functions like malloc, LocalAlloc, GlobalAlloc, etc., you must specify the number of bytes!
    pBuffer = (TCHAR*) malloc (128 * sizeof(TCHAR) );
    Typecasting the return value is required in C++, as you know. The expression in malloc's argument ensures that it allocates the desired number of bytes, making room for the desired number of characters.
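
    The distinction between character counts and byte counts can be captured in two small helpers. This is a sketch with hypothetical names (demo_char_count, demo_byte_count, demo_alloc_wide); it uses wchar_t directly so that it builds without Windows headers.

```cpp
#include <cstddef>
#include <cstdlib>   // std::malloc, std::free

// Number of elements in a fixed-size array: the value to pass
// when an API asks for a count of characters.
template <typename Char, std::size_t N>
constexpr std::size_t demo_char_count(const Char (&)[N])
{
    return N;
}

// Number of bytes needed for `count` characters: the value to pass
// to byte-based allocators such as malloc.
template <typename Char>
constexpr std::size_t demo_byte_count(std::size_t count)
{
    return count * sizeof(Char);
}

// Allocates room for `count` wide characters with malloc.
// The caller releases the memory with std::free.
inline wchar_t* demo_alloc_wide(std::size_t count)
{
    return static_cast<wchar_t*>(std::malloc(demo_byte_count<wchar_t>(count)));
}
```

    For a buffer declared as wchar_t sDir[255], demo_char_count(sDir) is 255 while sizeof(sDir) is 255 * sizeof(wchar_t); the first is what a character-count parameter expects, the second is what malloc-style functions expect.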

    License

    This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

    About the Author

     
    Ajay Vijayvargiya

    Software Developer (Senior)

    India

    Started programming with GwBasic back in 1996 (Those lovely days!). Found the hidden talent!
     
    Touched COBOL and Quick Basic for a while. 
     
    Finally learned C and C++ entirely on my own, and fell in love with C++, still in love! Began with Turbo C 2.0/3.0, then to VC6 for 4 years! Finally on VC2008/2010.
     
    I enjoy programming, mostly the system programming, but the UI is always on top of MFC! Quite experienced on other environments and platforms, but I prefer Visual C++. Zeal to learn, and to share!
  • Reposted from: https://www.cnblogs.com/lebronjames/p/2393915.html