Character sequences
The string class has been briefly introduced in an earlier chapter. It is a very powerful class to handle and manipulate strings of characters. However, because strings are, in fact, sequences of characters, we can also represent them as plain arrays of elements of a character type.
For example, the following array:
char foo [20];
is an array that can store up to 20 elements of type char, laid out in memory as 20 consecutive char elements.
Therefore, this array has a capacity to store sequences of up to 20 characters. But this capacity does not need to be fully exhausted: the array can also accommodate shorter sequences. For example, at some point in a program, either the sequence "Hello" or the sequence "Merry Christmas" can be stored in foo, since both would fit in a sequence with a capacity for 20 characters.
By convention, the end of strings represented in character sequences is signaled by a special character: the null character, whose literal value can be written as '\0' (backslash, zero).
In this case, the array of 20 elements of type char called foo can store the character sequences "Hello" and "Merry Christmas" as follows ([ ] is a space character):
foo holding "Hello":           [H][e][l][l][o][\0] followed by 14 elements with undetermined values
foo holding "Merry Christmas": [M][e][r][r][y][ ][C][h][r][i][s][t][m][a][s][\0] followed by 4 elements with undetermined values
Notice how after the content of the string itself, a null character ('\0') has been added in order to indicate the end of the sequence.
Initialization of null-terminated character sequences
Because arrays of characters are ordinary arrays, they follow the same rules as any other array. For example, to initialize an array of characters with some predetermined sequence of characters, we can do it just like any other array:
char myword[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
The above declares an array of 6 elements of type char initialized with the characters that form the word "Hello" plus a null character '\0' at the end.
But arrays of character elements have another way to be initialized: using string literals directly.
In the expressions used in some examples in previous chapters, string literals have already shown up several times. These are specified by enclosing the text between double quotes ("). For example:
"the result is: "
This is a string literal, probably used in some earlier example.
Sequences of characters enclosed in double quotes (") are literal constants. Their type is, in fact, a null-terminated array of characters (more precisely, an array of const char). This means that string literals always have a null character ('\0') automatically appended at the end.
Therefore, the array of char elements called myword can be initialized with a null-terminated sequence of characters by either one of these two statements:
char myword[] = { 'H', 'e', 'l', 'l', 'o', '\0' };
char myword[] = "Hello";
In both cases, the array of characters myword is declared with a size of 6 elements of type char: the 5 characters that compose the word "Hello", plus a final null character ('\0') that marks the end of the sequence. In the second case, when using double quotes ("), this null character is appended automatically.
Please notice that here we are talking about initializing an array of characters at the moment it is being declared, and not about assigning values to it later (once it has already been declared). In fact, because arrays of characters are regular arrays, they have the same restrictions as any other array, and cannot be assigned values as a whole.
Expressions (once myword has already been declared as above), such as:
myword = "Bye";
myword[] = "Bye";
would not be valid, and neither would:
myword = { 'B', 'y', 'e', '\0' };
This is because arrays cannot be assigned values. Note, though, that each of its elements can be assigned a value individually. For example, this would be correct:
myword[0] = 'B';
myword[1] = 'y';
myword[2] = 'e';
myword[3] = '\0';
Strings and null-terminated character sequences
Plain arrays with null-terminated sequences of characters are the typical types used in the C language to represent strings (that is why they are also known as C-strings). In C++, even though the standard library defines a specific type for strings (class string), still, plain arrays with null-terminated sequences of characters (C-strings) are a natural way of representing strings in the language; in fact, string literals still always produce null-terminated character sequences, and not string objects.
In the standard library, both representations for strings (C-strings and library strings) coexist, and most functions requiring strings are overloaded to support both.
For example, cin and cout support null-terminated sequences directly, allowing them to be extracted from cin or inserted into cout just like strings:
// strings and NTCS:
#include <iomanip>   // setw
#include <iostream>
#include <string>
using namespace std;

int main ()
{
  char question1[] = "What is your name? ";
  string question2 = "Where do you live? ";
  char answer1 [80];
  string answer2;
  cout << question1;
  cin >> setw(80) >> answer1;   // read at most 79 chars, leaving room for '\0'
  cout << question2;
  cin >> answer2;
  cout << "Hello, " << answer1;
  cout << " from " << answer2 << "!\n";
  return 0;
}
//What is your name? Homer
//Where do you live? Greece
//Hello, Homer from Greece!
In this example, both arrays of characters holding null-terminated sequences and strings are used. They are quite interchangeable when used with cin and cout, but there is a notable difference in their declarations: arrays have a fixed size that must be specified, either implicitly or explicitly, when they are declared (question1 has a size of exactly 20 characters, including the terminating null character, and answer1 has a size of 80 characters), whereas strings are simply strings and no size is specified. This is because strings have a dynamic size determined at runtime, while the size of an array is determined at compile time, before the program runs.
In any case, null-terminated character sequences and strings are easily transformed from one another:
Null-terminated character sequences can be transformed into strings implicitly, and strings can be transformed into null-terminated character sequences by using either of string's member functions c_str or data:
char myntcs[] = "some text";
string mystring = myntcs; // convert c-string to string
cout << mystring; // printed as a library string
cout << mystring.c_str(); // printed as a c-string
(Note: since C++11, the c_str and data members of string are equivalent; both return a pointer to a null-terminated sequence.)
Other data types
Type aliases (typedef / using)
A type alias is a different name by which a type can be identified. In C++, any valid type can be aliased so that it can be referred to with a different identifier.
In C++, there are two syntaxes for creating such type aliases: The first, inherited from the C language, uses the typedef keyword:
typedef existing_type new_type_name ;
where existing_type is any type, either fundamental or compound, and new_type_name is an identifier with the new name given to the type.
For example:
typedef char C;
typedef unsigned int WORD;
typedef char * pChar;
typedef char field [50];
This defines four type aliases: C, WORD, pChar, and field as char, unsigned int, char* and char[50], respectively. Once these aliases are defined, they can be used in any declaration just like any other valid type:
C mychar, anotherchar, *ptc1;
WORD myword;
pChar ptc2;
field name;
Since C++11, a second syntax to define type aliases has been available in the C++ language:
using new_type_name = existing_type ;
For example, the same type aliases as above could be defined as:
using C = char;
using WORD = unsigned int;
using pChar = char *;
using field = char [50];
Aliases defined with typedef and aliases defined with using are semantically equivalent. The only difference is that typedef has certain limitations in the realm of templates that using does not have: only using can declare an alias template, that is, an alias that itself takes template parameters. Therefore, using is more generic, although typedef has a longer history and is probably more common in existing code.
Note that neither typedef nor using create new distinct data types. They only create synonyms of existing types. That means that the type of myword above, declared with type WORD, can as well be considered of type unsigned int; it does not really matter, since both are actually referring to the same type.
Type aliases can be used to reduce the length of long or confusing type names, but they are most useful as tools to abstract programs from the underlying types they use. For example, by using an alias of int to refer to a particular kind of parameter instead of using int directly, it allows for the type to be easily replaced by long (or some other type) in a later version, without having to change every instance where it is used.
Unions
Unions allow one portion of memory to be accessed as different data types. Their declaration and use are similar to those of structures, but their functionality is totally different:
union type_name {
member_type1 member_name1;
member_type2 member_name2;
member_type3 member_name3;
.
.
} object_names;
This creates a new union type, identified by type_name, in which all its member elements occupy the same physical space in memory. The size of this type is that of its largest member element (possibly rounded up for alignment). For example:
union mytypes_t {
char c;
int i;
float f;
} mytypes;
declares an object (mytypes) with three members:
mytypes.c
mytypes.i
mytypes.f
Each of these members is of a different data type. But since all of them are referring to the same location in memory, the modification of one of the members will affect the value of all of them. It is not possible to store different values in them in a way that each is independent of the others.
One of the uses of a union is to be able to access a value either in its entirety or as an array or structure of smaller elements. For example:
union mix_t {
int l;
struct {
short hi;
short lo;
} s;
char c[4];
} mix;
If we assume that the system where this program runs has an int type with a size of 4 bytes and a short type of 2 bytes, the union defined above gives access to the same group of 4 bytes through mix.l, mix.s and mix.c, which we can use according to how we want to access these bytes: as a single value of type int, as two values of type short, or as an array of char elements, respectively. The example mixes types, arrays, and structures in the union to demonstrate different ways to access the data. On a little-endian system, the least significant byte of mix.l is stored first, so mix.c[0] covers that byte and mix.s.hi covers the two least significant bytes of mix.l.
The exact alignment and order of the members of a union in memory depends on the system, with the possibility of creating portability issues.
Anonymous unions
When unions are members of a class (or structure), they can be declared with no name. In this case, they become anonymous unions, and their members are directly accessible from objects by their member names. For example, see the differences between these two structure declarations:
[Table: structure with regular union | structure with anonymous union]
The only difference between the two types is that in the first one the member union has a name (price), while in the second it does not. This affects the way to access the members dollars and yen of an object of these types. For an object of the first type (with a regular union), it would be:
book1.price.dollars
book1.price.yen
whereas for an object of the second type (which has an anonymous union), it would be:
book2.dollars
book2.yen
Again, remember that because it is a member union (not a member structure), the members dollars and yen actually share the same memory location, so they cannot be used to store two different values simultaneously. The price can be set in dollars or in yen, but not in both simultaneously.
Enumerated types (enum)
Enumerated types are types that are defined with a set of custom identifiers, known as enumerators, as possible values. Objects of these enumerated types can take any of these enumerators as value.
Their syntax is:
enum type_name {
value1,
value2,
value3,
.
.
} object_names;
This creates the type type_name, which can take any of value1, value2, value3, ... as value. Objects (variables) of this type can directly be instantiated as object_names.
For example, a new type of variable called colors_t could be defined to store colors with the following declaration:
enum colors_t {black, blue, green, cyan, red, purple, yellow, white};
Notice that this declaration includes no other type, neither fundamental nor compound, in its definition. In other words, this creates a whole new data type from scratch without basing it on any other existing type. The possible values that variables of this new type colors_t may take are the enumerators listed within braces. For example, once the colors_t enumerated type is declared, the following expressions will be valid:
colors_t mycolor;
mycolor = blue;
if (mycolor == green) mycolor = red;
Values of enumerated types declared with enum are implicitly convertible to an integer type. In fact, the enumerators of such an enum are always assigned an integer numerical equivalent internally, to which they are implicitly converted when needed; the reverse conversion, from an integer to the enumerated type, is not implicit in C++ and requires an explicit cast such as static_cast. If it is not specified otherwise, the integer value equivalent to the first possible value is 0, the equivalent to the second is 1, to the third is 2, and so on. Therefore, in the data type colors_t defined above, black would be equivalent to 0, blue to 1, green to 2, and so on.
A specific integer value can be specified for any of the possible values in the enumerated type. If the enumerator that follows it is not given its own value, it is automatically assigned the previous value plus one. For example:
enum months_t { january=1, february, march, april,
may, june, july, august,
september, october, november, december} y2k;
In this case, the variable y2k of the enumerated type months_t can contain any of the 12 possible values that go from january to december and that are equivalent to the values between 1 and 12 (not between 0 and 11, since january has been made equal to 1).
The conversion to an integer is implicit, but the reverse is not: an integer variable can be assigned a value of january (equivalent to 1), whereas assigning 1 to a variable of type months_t requires an explicit cast, as in static_cast<months_t>(1).
Enumerated types with enum class
Since C++11, it is also possible to create enum types whose values are not implicitly convertible to int, thus preserving type safety. They are declared with enum class (or enum struct) instead of just enum:
enum class Colors {black, blue, green, cyan, red, purple, yellow, white};
Each of the enumerator values of an enum class type needs to be scoped into its type (since C++11 this qualification is also allowed for plain enum types, but there it is optional). For example:
Colors mycolor;
mycolor = Colors::blue;
if (mycolor == Colors::green) mycolor = Colors::red;
Enumerated types declared with enum class can also specify their underlying type; it may be any integral data type, such as char, short or unsigned int, which essentially serves to determine the size of the type. It is specified by a colon and the underlying type following the name of the enumerated type. For example:
enum class EyeColor : char {blue, green, brown};
Here, EyeColor is a distinct type with the same size as a char (1 byte).
- Change log
Date | Location | Modified by | Notes |
---|---|---|---|
2020-09-18 AM | Foshan | PatrickLee | First published |