Original article: http://www.codeproject.com/Articles/76252/What-are-TCHAR-WCHAR-LPSTR-LPWSTR-LPCTSTR-etc
Many C++ Windows programmers get confused by odd-looking identifiers such as TCHAR and LPCTSTR. In this article, I will do my best to clear the fog.
In general, a character can be represented in 1 byte or 2 bytes. Let's say a 1-byte character is an ANSI character; all English characters can be represented in this encoding. And let's say a 2-byte character is Unicode, which can represent all the languages of the world.
The Visual C++ compiler supports char and wchar_t as native data types for ANSI and Unicode characters respectively. Unicode has a more precise definition than this, but for the purposes of this article, think of it as the two-byte character that Windows uses for multi-language support.
What if you want your C/C++ code to be independent of character encoding/mode used?
Suggestion: Use generic data types and names to represent characters and strings.
For example, instead of replacing:
char cResponse; // 'Y' or 'N'
char sUsername[64]; // str* functions
with
wchar_t cResponse; // 'Y' or 'N'
wchar_t sUsername[64]; // wcs* functions
in order to support multiple languages (i.e. Unicode) in your application, you can simply write it in a more generic manner:
#include <TCHAR.H> // Implicit or explicit include
TCHAR cResponse; // 'Y' or 'N'
TCHAR sUsername[64]; // _tcs* functions
The Character Set option on the General page of the project properties (General -> Character Set) determines which character set is used for compilation.
This way, when your project is compiled as Unicode, TCHAR translates to wchar_t. If it is compiled as ANSI/MBCS, it translates to char. You are still free to use char and wchar_t directly, and the project settings will not affect any direct use of these keywords.
TCHAR is defined as:
#ifdef _UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif
The macro _UNICODE is defined when you set Character Set to "Use Unicode Character Set", and therefore TCHAR means wchar_t. When Character Set is set to "Use Multi-Byte Character Set", TCHAR means char.
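As a rough illustration of the mechanism, here is a portable sketch that mimics this typedef without needing TCHAR.H. The names MY_UNICODE, MyTChar and GenericCharSize are made up for this example; the real header keys on _UNICODE and defines TCHAR.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical switch standing in for _UNICODE.
// Comment it out to get the char (ANSI/MBCS) flavour instead.
#define MY_UNICODE

#ifdef MY_UNICODE
typedef wchar_t MyTChar;   // stand-in for TCHAR in a Unicode build
#else
typedef char MyTChar;      // stand-in for TCHAR in an ANSI/MBCS build
#endif

// Size in bytes of one generic character for the current build flavour.
inline std::size_t GenericCharSize()
{
    return sizeof(MyTChar);
}

// Compile-time check: with MY_UNICODE defined, MyTChar is wchar_t.
static_assert(std::is_same<MyTChar, wchar_t>::value,
              "MyTChar should be wchar_t when MY_UNICODE is defined");
```

The rest of the code only ever mentions MyTChar, so flipping the single macro changes the character type everywhere at once.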
Likewise, to support multiple character sets from a single code base, and possibly multiple languages, use the generic functions (macros). Instead of using strcpy, strlen, strcat (including the secure versions suffixed with _s), or wcscpy, wcslen, wcscat (including the secure versions), you should use the _tcscpy, _tcslen, _tcscat functions.
As you know, strlen is prototyped as:
size_t strlen(const char*);
And wcslen is prototyped as:
size_t wcslen(const wchar_t*);
You should rather use _tcslen, which is logically prototyped as:
size_t _tcslen(const TCHAR*);
WC stands for Wide Character, so wcs means wide-character string. By the same logic, _tcs means _T character string, where _T may logically be char or wchar_t.
But, in reality, _tcslen (and the other _tcs functions) are actually not functions, but macros. They are defined simply as:
#ifdef _UNICODE
#define _tcslen wcslen
#else
#define _tcslen strlen
#endif
You should refer to TCHAR.H to look up more macro definitions like this.
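The same mapping can be tried outside Windows with a small stand-in. This is only a sketch: MY_UNICODE, MyTChar, my_tcslen and UsernameLength are hypothetical names chosen for illustration, not anything TCHAR.H provides.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical switch standing in for _UNICODE.
#define MY_UNICODE

#ifdef MY_UNICODE
typedef wchar_t MyTChar;
#define my_tcslen std::wcslen   // wide build: forward to wcslen
#else
typedef char MyTChar;
#define my_tcslen std::strlen   // narrow build: forward to strlen
#endif

// Written once against the generic names; the preprocessor picks
// the matching C runtime function at compile time.
inline std::size_t UsernameLength(const MyTChar* name)
{
    return my_tcslen(name);
}
```

With MY_UNICODE defined, a call such as UsernameLength(L"alice") goes straight to wcslen; no run-time dispatch is involved.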
You might ask why they are defined as macros and not implemented as functions. The reason is simple: a library or DLL can export only a single function with a given name and prototype (ignoring the overloading concept of C++). For instance, when you export a function as:
void _TPrintChar(char);
how is the client supposed to call it as:
void _TPrintChar(wchar_t);
_TPrintChar cannot be magically converted into a function taking a 2-byte character. There have to be two separate functions:
void PrintCharA(char); // A = ANSI
void PrintCharW(wchar_t); // W = Wide character
And a simple macro, as defined below, would hide the difference:
#ifdef _UNICODE
#define _TPrintChar PrintCharW
#else
#define _TPrintChar PrintCharA
#endif
The client would simply call it as:
TCHAR cChar;
_TPrintChar(cChar);
Note that both TCHAR and _TPrintChar map to either Unicode or ANSI, and therefore cChar and the argument to the function are both either char or wchar_t.
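Here is a self-contained sketch of that A/W pattern. To keep it easy to check, the two variants return a label instead of printing; DescribeCharA, DescribeCharW, DescribeChar, WhichVariant, MyTChar and MY_UNICODE are all hypothetical names used only for this illustration.

```cpp
#include <string>

// Hypothetical switch standing in for _UNICODE.
#define MY_UNICODE

// Two separately named functions, as a library would have to export them.
inline std::string DescribeCharA(char)    { return "ANSI"; }   // A = ANSI
inline std::string DescribeCharW(wchar_t) { return "Wide"; }   // W = Wide character

// The generic name is only a macro that resolves to one of the two.
#ifdef MY_UNICODE
typedef wchar_t MyTChar;
#define DescribeChar DescribeCharW
#else
typedef char MyTChar;
#define DescribeChar DescribeCharA
#endif

// Client code uses only the generic type and the generic name.
inline std::string WhichVariant()
{
    MyTChar cChar = 0;
    return DescribeChar(cChar);
}
```

Because the character type and the function name are switched by the same macro, the argument type always matches the variant that ends up being called.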
Macros avoid these complications, and allow us to use either the ANSI or the Unicode function for characters and strings. Most Windows functions that take a string or a character are implemented this way, so that programmers only have to deal with one function name (a macro!). SetWindowText is one example:
// WinUser.H
#ifdef UNICODE
#define SetWindowText SetWindowTextW
#else
#define SetWindowText SetWindowTextA
#endif // !UNICODE
There are very few functions that do not have macros and are available only with the W or A suffix. One example is ReadDirectoryChangesW, which doesn't have an ANSI equivalent.
You all know that we use double quotation marks to represent strings. A string written this way is an ANSI string, with each character taking 1 byte. Example:
"This is ANSI String. Each letter takes 1 byte."
The string given above is not Unicode, and therefore is not suitable for multi-language support. To represent a Unicode string, you need to use the prefix L. An example:
L"This is Unicode string. Each letter would take 2 bytes, including spaces."
Note the L at the beginning of the string, which makes it a Unicode string. All characters (I repeat, all characters) take two bytes, including English letters, spaces, digits, and the null character. (Strictly speaking, it is each UTF-16 code unit that takes two bytes; characters outside the Basic Multilingual Plane occupy two code units.) Therefore, the size of a Unicode string is always a multiple of 2 bytes. A Unicode string of 7 characters needs 14 bytes, and so on. A Unicode string taking 15 bytes, for example, would not be valid in any context.
In general, the size of a string is a multiple of sizeof(TCHAR) bytes!
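The byte arithmetic can be checked with a small helper. This is a sketch in standard C++; BytesForString is a hypothetical name. Note that wchar_t is 2 bytes with Visual C++ on Windows but 4 bytes on most Linux and macOS compilers, so the code uses sizeof rather than a hard-coded 2.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to store a null-terminated string, terminator included.
// Works for char, wchar_t, or any other character type.
template <typename CharT>
std::size_t BytesForString(const CharT* text)
{
    // Number of characters before the terminating null.
    const std::size_t characters = std::char_traits<CharT>::length(text);
    return (characters + 1) * sizeof(CharT);
}
```

For the 7-character string "Unicode" this gives 8 bytes as a narrow string and 8 * sizeof(wchar_t) bytes as a wide string, which is 16 on Windows: 14 for the letters plus 2 for the null character.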
When you need to express hard-coded string, you can use:
"ANSI String"; // ANSI
L"Unicode String"; // Unicode
_T("Either string, depending on compilation"); // ANSI or Unicode
// or use TEXT macro, if you need more readability
The non-prefixed string is an ANSI string, the L-prefixed string is Unicode, and a string wrapped in _T or TEXT is either, depending on compilation.
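A macro like _T can be built with preprocessor token pasting, and the real header does something along these lines. The sketch below uses hypothetical names (MY_UNICODE, MY_T, MY_T_IMPL, MyTChar, Greeting) so that it compiles without TCHAR.H.

```cpp
// Hypothetical switch standing in for _UNICODE.
#define MY_UNICODE

#ifdef MY_UNICODE
typedef wchar_t MyTChar;
#define MY_T_IMPL(x) L##x   // paste an L in front of the literal
#else
typedef char MyTChar;
#define MY_T_IMPL(x) x      // leave the literal untouched
#endif

// Extra level of indirection so that macro arguments are expanded first.
#define MY_T(x) MY_T_IMPL(x)

// The literal is written once and takes the width of the current build.
inline const MyTChar* Greeting()
{
    return MY_T("Hello");
}
```

In the wide flavour MY_T("Hello") becomes L"Hello"; in the narrow flavour it stays "Hello", so the same source line matches the MyTChar type either way.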
String classes, like MFC/ATL's CString, provide two versions in the same way. There are two classes: CStringA for ANSI and CStringW for Unicode. When you use CString (which is a macro/typedef), it translates to one of the two classes.
Okay. The TCHAR type definition was for a single character. You can certainly declare an array of TCHAR. But what if you want to express a character pointer, or a const character pointer? Which one of the following would you use?
// ANSI characters
foo_ansi(char*);
foo_ansi(const char*);
/*const*/ char* pString;
// Unicode/wide-string
foo_uni(WCHAR*); // or wchar_t*
foo_uni(const WCHAR*);
/*const*/ WCHAR* pString;
// Independent
foo_char(TCHAR*);
foo_char(const TCHAR*);
/*const*/ TCHAR* pString;
After reading about the TCHAR stuff, you would definitely select the last one. But there is a better alternative. Before that, note that the TCHAR.H header file declares only the TCHAR data type; for the types below, you need to include Windows.h (they are defined in WinNT.h).
NOTE: If your project implicitly or explicitly includes Windows.h, you need not include TCHAR.H for these types.
- char* replacement: LPSTR
- const char* replacement: LPCSTR
- WCHAR* replacement: LPWSTR
- const WCHAR* replacement: LPCWSTR (C before W, since const is before WCHAR)
- TCHAR* replacement: LPTSTR
- const TCHAR* replacement: LPCTSTR
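To make the naming scheme in the list above concrete, here is a sketch of how such pointer typedefs can be layered. The My-prefixed names, MY_UNICODE and IsNonEmpty are hypothetical stand-ins so the code compiles without Windows.h; the real definitions live in WinNT.h and key on UNICODE.

```cpp
#include <type_traits>

// Hypothetical switch standing in for UNICODE.
#define MY_UNICODE

// Fixed-width flavours: the S means "string", the C means "const".
typedef char*          MyLPSTR;    // like LPSTR
typedef const char*    MyLPCSTR;   // like LPCSTR
typedef wchar_t*       MyLPWSTR;   // like LPWSTR
typedef const wchar_t* MyLPCWSTR;  // like LPCWSTR

// Generic flavours: the T follows the build setting.
#ifdef MY_UNICODE
typedef MyLPWSTR  MyLPTSTR;        // like LPTSTR
typedef MyLPCWSTR MyLPCTSTR;       // like LPCTSTR
#else
typedef MyLPSTR   MyLPTSTR;
typedef MyLPCSTR  MyLPCTSTR;
#endif

// Compile-time check of what the generic const pointer resolves to here.
static_assert(std::is_same<MyLPCTSTR, const wchar_t*>::value,
              "MyLPCTSTR should be const wchar_t* when MY_UNICODE is defined");

// Example of a read-only string parameter written with the generic typedef.
inline bool IsNonEmpty(MyLPCTSTR text)
{
    return text != nullptr && text[0] != 0;
}
```

A parameter that is only read takes the C (const) form, while a buffer that the function fills in takes the non-const form.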
With these typedefs in mind, signatures such as the following should now be easy to read:
BOOL SetCurrentDirectory(LPCTSTR lpPathName);
DWORD GetCurrentDirectory(DWORD nBufferLength, LPTSTR lpBuffer);
Continuing. You must have seen some functions/methods that ask you to pass a number of characters, or that return a number of characters. With functions like GetCurrentDirectory, you need to pass the number of characters, and not the number of bytes. For example:
TCHAR sCurrentDir[255];
// Pass 255 and not 255*2
GetCurrentDirectory(255, sCurrentDir);
On the other hand, if you need to allocate a number of characters, you must allocate the proper number of bytes. In C++, you can simply use new:
LPTSTR pBuffer; // TCHAR*
pBuffer = new TCHAR[128]; // Allocates 128 or 256 BYTES, depending on compilation.
But if you use memory allocation functions like
malloc, LocalAlloc, GlobalAlloc, etc., you must specify the number of bytes!
pBuffer = (TCHAR*) malloc(128 * sizeof(TCHAR));
Typecasting the return value is required in C++, as you know. The expression in malloc's argument ensures that it allocates the desired number of bytes, and so makes room for the desired number of characters.
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
About the Author
Ajay Vijayvargiya, Software Developer (Senior), India
Started programming with GwBasic back in 1996 (Those lovely days!). Found the hidden talent! Touched COBOL and Quick Basic for a while. Finally learned C and C++ entirely on my own, and fell in love with C++, still in love! Began with Turbo C 2.0/3.0, then to VC6 for 4 years! Finally on VC2008/2010. I enjoy programming, mostly the system programming, but the UI is always on top of MFC! Quite experienced on other environments and platforms, but I prefer Visual C++. Zeal to learn, and to share!