NOTE:
These are a little sketchy...I'll fill in details 'soon'.
C-style strings | C++ string class | |
---|---|---|
Built-In vs. Library |
(C)strings are a special use of arrays with a character base type. Therefore, you don't need to #include a special library to use them. |
There is a class data type that can store strings of characters (like the
literal "Hello World!" and others we've seen). This data type is
called, oddly enough, string. Because it lives in the standard library, you must #include <string> to use it. |
Declaration and Storage |
(C)strings are stored as character arrays with a slight difference. A normal char array declared such as: const short MAX_ARR = 10; char arr[MAX_ARR]; could hold MAX_ARR (or 10) characters of data. If, on the other hand,
this array were used to store a (C)string, it could only hold MAX_ARR-1
(or 9) characters of data. The reason being that all (C)strings are
terminated by a special character that cannot be typed by the user and
therefore cannot be part of the user's data! This character is ASCII 0
('\0'), the 'null' terminator. Initialization is allowed in a few styles: const short MAX_S = 20; char var[MAX_S] = "value", var2[MAX_S] = { 'v', 'a', 'l', 'u', 'e' }; Both of these create a (C)string variable whose initial content is the
string "value". Even though the
former
is simpler, the
latter
is just as valid. Neither is clearly adding the 'null' terminator
on the value, however (pardon the pun). Both store the '\0' right after the 'e': the literal carries it implicitly, and the brace list leaves every unlisted element zero. But those aren't all: const char title[] = "String Program"; const short MAX_T = sizeof(title)/sizeof(char); Here we initialize a seemingly un-sized (C)string and then afterward calculate the size by dividing the number of bytes the entire array takes up by the size of an individual element. (MAX_T counts the terminator, too.) This is recommended only for constant (C)strings -- NOT (C)string variables. Lastly, watch out for the following pitfall: const short MAX_S = 5; char s[MAX_S] = "apple", t[MAX_S] = { 'a', 'p', 'p', 'l', 'e' }; Here we've said the maximum is to be 5 characters. Then we initialize
with a 6-element string literal! Most compilers will at least issue
a warning for this situation (standard C++ actually rejects the literal form), but never assume anything... The
variable t, meanwhile, compiles quietly but has no room left for a terminator. In both cases, the variables are not actual (C)strings. They are simply
arrays with base type char. You'll run into trouble later if you try
to use a (C)string based function with these variables since there is
no guarantee that the next memory location will contain a '\0'. |
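To make the terminator pitfall concrete, here is a minimal sketch. The helper names has_terminator() and data_length() are made up for illustration (they are not library functions), and the code assumes a C++11 compiler for nullptr. It simply checks whether a '\0' sits anywhere inside an array of known capacity.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if a '\0' appears within the first
// 'capacity' elements of arr, i.e. arr really holds a (C)string.
bool has_terminator(const char arr[], std::size_t capacity)
{
    return std::memchr(arr, '\0', capacity) != nullptr;
}

// Hypothetical helper: number of data characters before the '\0'.
// Returns capacity when no terminator exists (not a (C)string at all).
std::size_t data_length(const char arr[], std::size_t capacity)
{
    const void * end = std::memchr(arr, '\0', capacity);
    return end == nullptr
         ? capacity
         : static_cast<std::size_t>(static_cast<const char *>(end) - arr);
}
```

Unlike strlen(), these never look past the capacity they are given, so they are safe to call even on an array like t above that has no terminator.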
To declare a variable of the string class: string var_name; You can even initialize it, declare multiple variables at once, or even make a constant like you do with the built-in types: string var = "value", var2, var3; const string title = "String Program"; The advantage that the |
Literals |
All string literals are really (C)string literals. They have a type
of 'const char []'. As we've known for some time, when the compiler finds two string literals separated by naught but whitespace, it will automatically concatenate them. Note how this aids in breaking the following long message to avoid it being cut off at the edge of a printout: cout << "\n\aMay you live long...so I might" " get a right answer!!!\n\n"; |
See over there <---... |
Use as Function Arguments |
Passing (C)string arguments to functions is just like passing
normal array arguments to functions. The brackets ([]) are left empty, as for any array argument. By design, (C)string arguments require no data length argument to accompany them to functions. That's because even if the function needs to know where the data elements end, it can simply look for the 'null' terminator. (See below about writing your own (C)string processing functions.)
One often overlooked feature of (C)string arguments is their use in generalizing a function without losing centrality (lessening cohesion). By passing things such as prompts and error messages to your functions via (C)string arguments, your caller can make all user interaction just as they please. Note how this is done in the get_nonneg() function:
    short get_nonneg(const char prompt[], const char errmsg[])
    {
        short value;
        cout << prompt;
        cin >> value;
        while (value < 0)
        {
            cout << errmsg << '\n' << prompt;
            cin >> value;
        }
        return value;
    }
And here's a call to this function:
    short num;
    num = get_nonneg("How many C++ books do you own? ", "");
Even better, note how by having the (C)string arguments be const, the caller can provide their prompt and error message as literal strings. Thus they can avoid having to come up with silly names for variables/const'ants to hold these (C)strings.
We can even take advantage of the compiler's handling of 'constant reference' arguments with respect to default argument values:
    short get_nonneg(const char prompt[] = "", const char errmsg[] = "");
Now the caller doesn't have to provide the error message if they don't want:
    short num;
    num = get_nonneg("How many C++ books do you own? ");
However, we may want to remove the explicit '\n' so the caller controls all of the spacing:
    short get_nonneg(const char prompt[], const char errmsg[])
    {
        short value;
        cout << prompt;
        cin >> value;
        while (value < 0)
        {
            cout << errmsg << prompt;
            cin >> value;
        }
        return value;
    }
    // ...elsewhere at the call...
    short num;
    num = get_nonneg("\nHow many C++ books do you own? ");
Or, the caller may decide they want just an initial prompt and no more:
    short num;
    cout << "Enter a non-negative value or die of boredom: ";
    num = get_nonneg(); |
To pass a string class object as a function argument, you'll typically want to use either a reference (to change the contents) or constant reference (to not change). Passing a string class object by value would cause the compiler to make a copy of all that object's internal information -- the string itself, length counter, etc. That can be a lot of data and can take a long time. The only thing that would have to change about the get_nonneg() function, for instance, would be the head: short get_nonneg(const string & prompt = "", const string & errmsg = ""); Note how we can even use the default argument values since the 'reference' is treated as a constant! |
Provided Processing Facilities: Aggregate Processing |
Many of the things you'd want to do to (C)strings are already built into the standard libraries. cout knows how to print them (we've been doing that since hello.C in 121). cin can sort-of read (C)strings. To do this well requires a bit of extra help, though. And there is a whole library dedicated to (C)string functions.
Output
Just as with string literals, cout has been taught how to print
a (C)string on the screen. Just use the insertion
(<<) operator:
    const short MAX_S = 20;
    char s[MAX_S];
    // fill s somehow...
    cout << s;
Input
(C)strings cannot protect themselves from over-run AT ALL! With cin alone:
    const short MAX_S = 20;
    char s[MAX_S];
    cin >> s;    // works, but leaves program open to most
                 // common security attack -- buffer overrun
Instead, combine setw() with ignore():
    const short MAX_S = 20;
    char s[MAX_S];
    cin >> setw(MAX_S) >> s;    // now no buffer overrun,
                                // but may leave stray
                                // stuff in buffer:
    cin.ignore(numeric_limits<streamsize>::max(), '\n');    // throw out stray stuff
For a space containing (C)string:
    const short MAX_S = 20;
    char s[MAX_S];
    cin.getline(s, MAX_S);    // reads to '\n' or MAX_S-1 chars --
                              // whichever happens first
If preceded by an extraction (>>), however, getline() gets fooled:
    double x;    // can be any data type, really...
    const short MAX_S = 20;
    char s[MAX_S];
    cin >> x;                 // read with extraction (stops at any spaces,
                              // notably '\n' here)
    cin.getline(s, MAX_S);    // reads '\n' immediately and is done!
                              // s is an empty string and program
                              // doesn't pause for user to type info...
Instead:
    const short MAX_S = 20;
    char s[MAX_S];
    if (cin.peek() == '\n')    // if '\n' is waiting (from prior extraction)
    {
        cin.ignore();          // throw it out!
    }
    cin.getline(s, MAX_S);    // reads '\n' immediately only if user simply
                              // hits <Enter> with no data...
But, for portability:
    const short MAX_S = 20;
    char s[MAX_S];
    cout.flush();    // make sure any waiting prompt is displayed...
    if (cin.peek() == '\n')
    {
        cin.ignore();
    }
    cin.getline(s, MAX_S);    // reads '\n' immediately only if user simply
                              // hits <Enter> with no data...
And leftovers are worse than with setw()!
    const short MAX_S = 20;
    char s[MAX_S];
    cout.flush();
    if (cin.peek() == '\n')
    {
        cin.ignore();
    }
    cin.getline(s, MAX_S);    // sets failure if doesn't reach '\n' before
                              // the (MAX_S-1)th char
    if (cin.fail())    // ran out of room...
    {
        cin.clear();    // clear failure
        cin.ignore(numeric_limits<streamsize>::max(), '\n');    // throw out rest of line
    }
Perhaps a re-usable function?
    void get_line(char s[], const long max)
    {
        cout.flush();
        if (cin.peek() == '\n')
        {
            cin.ignore();
        }
        cin.getline(s, max);
        if (cin.fail())
        {
            cin.clear();
            cin.ignore(numeric_limits<streamsize>::max(), '\n');
        }
        return;
    }
Could also make the clear/ignore optional with a defaulted argument.
The <cstring> Library
Simplistic
Assignment:
    const short MAX_S = 20;
    char s[MAX_S], t[MAX_S];
    strcpy(t, s);    // like t = s -- only it'll work
Concatenation:
    const short MAX_S = 20;
    char s[MAX_S], t[MAX_S];
    strcat(t, s);    // like t += s -- only it'll work
Comparison:
    const short MAX_S = 20;
    char s[MAX_S], t[MAX_S];

    want to compare   |   instead compare
    ------------------+---------------------
    s <  t            |   strcmp(s, t) <  0
    s <= t            |   strcmp(s, t) <= 0
    s >  t            |   strcmp(s, t) >  0
    s >= t            |   strcmp(s, t) >= 0
    s == t            |   strcmp(s, t) == 0
    s != t            |   strcmp(s, t) != 0
It is case sensitive!!! Watch out!!!
Overrun Protected
strncpy:
    const short MAX_S = 20, MAX_T = 50;
    char s[MAX_S], t[MAX_T];
    strncpy(s, t, MAX_S-1);    // copy all data we can hold
    s[MAX_S-1] = '\0';         // attach '\0' -- just in case!
strncat:
    const short MAX_S = 20, MAX_T = 50;
    char s[MAX_S], t[MAX_T];
    strncat(s, t, MAX_S-1-strlen(s));    // append all data we can hold
    s[MAX_S-1] = '\0';                   // attach '\0' -- just in case!
Other
    strncmp -- compare up to 'n' chars
    strlen -- how many data chars?
    etc. |
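The overrun-protected calls are easy to get slightly wrong, so one option is to wrap them once. This is only a sketch: bounded_copy() and bounded_append() are hypothetical names, and both assume that capacity is the full size of dest, including the slot reserved for '\0'.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical wrapper: copy as much of src as dest can hold,
// always leaving dest terminated.
void bounded_copy(char dest[], std::size_t capacity, const char src[])
{
    if (capacity == 0)
    {
        return;                            // nowhere to put even a '\0'
    }
    std::strncpy(dest, src, capacity - 1); // copy all data we can hold
    dest[capacity - 1] = '\0';             // attach '\0' -- just in case!
}

// Hypothetical wrapper: append as much of src as still fits.
// Assumes dest already holds a valid (C)string within capacity.
void bounded_append(char dest[], std::size_t capacity, const char src[])
{
    const std::size_t used = std::strlen(dest);
    if (used + 1 >= capacity)
    {
        return;                            // already full
    }
    std::strncat(dest, src, capacity - 1 - used); // strncat adds the '\0'
}
```

With char t[12], calling bounded_copy(t, sizeof t, "dark") and then bounded_append(t, sizeof t, " and stormy") leaves t holding "dark and st": the data is cut short, but nothing is written past the array.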
Many of the things you'd want to do to strings are already built into the string class. cout knows how to print string objects. cin can read string objects. To do this sometimes requires a bit of extra help, though.
Output
Just as with string literals, cout has been taught how to print
a string class object on the screen. Just use the insertion
(<<) operator:
    string s;
    // fill s somehow...
    cout << s;
Input
string class objects can protect themselves from over-run -- they automatically grow to be as long as needed to store their data. The only way to break in using the string class would be to take up all the computer's memory, but that would actually crash the machine -- not leave it open to attack. With cin and extraction:
    string s;
    cin >> s;    // read in a single 'word' from user
Or, if we [potentially] want embedded spaces:
    string s;
    getline(cin, s);    // read until '\n' entered (extracted, but not stored)
But, if getline() is preceded by an extraction operation
(>>), it gets fooled:
    double x;    // any type, really...
    string s;
    cin >> x;           // read with extraction (stops at any spaces
                        // notably '\n' here)
    getline(cin, s);    // reads '\n' immediately and is done!
                        // s is an empty string and program
                        // doesn't pause for user to type info...
Instead:
    string s;
    if (cin.peek() == '\n')    // if '\n' is waiting (from prior extraction)
    {
        cin.ignore();          // throw it out!
    }
    getline(cin, s);    // reads '\n' immediately only if user simply
                        // hits <Enter> with no data...
And, for portability:
    string s;
    cout.flush();    // make sure any waiting prompt is displayed...
    if (cin.peek() == '\n')
    {
        cin.ignore();
    }
    getline(cin, s);    // reads '\n' immediately only if user simply
                        // hits <Enter> with no data...
Perhaps a re-usable function?
    void get_line(string & s)
    {
        cout.flush();
        if (cin.peek() == '\n')
        {
            cin.ignore();
        }
        getline(cin, s);
        return;
    }
Program Friendliness
One can use the string class to personalize a program:
    string name;
    cout << "\nHello, what's your [first] name? ";
    cin >> name;
    cout << "\nWelcome to the show, " << name << "...\n";
Further prompts can also be embellished with the user's name:
    cout << "So, " << name << ", tell me about yourself...\n"
         << "What do you make each year? ";
    cin >> sign >> salary;
Although this could be done with (C)strings, it would be more difficult because we'd not know how long their name was and that would hinder both declaration of the name variable and initial input of their name.
string class Functions
Objects of the string class can use standard assignment:
    string var, var2;
    var = "value";
    var2 = var;
And even something called concatenation:
    string var = "Hello", var2;
    var2 = var + ' ' + "World" + '!';    // var2 is now "Hello World!"
The addition operator is used to attach two strings end to beginning to create a new string with both contents. (Note that the two strings being combined are left unchanged but a new string is created. Just like when you add 3 and 4 to get 7, 3 and 4 are unchanged but 7 is created.
Or, less esoterically, when you build a wall from bricks, the bricks remain but you've created a wall.)
But that's not all! A string can also report its own length:
    string var;
    cin >> var;
    cout << '\'' << var << "' is " << var.length() << " characters long.\n";
Note the '.' (dot) syntax as we had with both versions of cin.ignore().
This is in general how you call a function that is inside a
class.
You can also replace part of a string with another string:
    string first, middle, last, whole;
    cout << "\nWhat is your name (First Middle Last)? ";
    cin >> first >> middle >> last;
    whole = first + ' ' + middle + ' ' + last;
    cout << "\nWell, '" << whole << "' is a fine name,\nbut "
         << "wouldn't '";
    whole.replace(first.length()+1, middle.length(), "Allowicious");
    cout << whole << "' sound so much cooler?\n";
This fragment demonstrates declaration, input, concatenation and assignment, output, and length finding as well as replacement. Note the three arguments given to the replace() function: a position from which to replace, a number of characters to replace, and the replacement string. Here we want to replace from the first character of the middle name (which is right after the entire first name and the space that separates the first and middle names) for the entirety of the middle name with 'Allowicious' (yes, we are being facetious).
You can also locate a sub-string of a string:
    string s = "...loan of the company car...";
    s.replace(s.find("the"), 3, "a");
Here, the find() function returns the location within s where the first 't' followed by 'h' and then by 'e' occurs. Then, replace() swaps those 3 characters for a single 'a'. Be careful, though, as it searches without understanding and so the above might find "them" or "other" or "bathe" just as easily as it did "the". Also, it would NOT find "The" since we asked for the sub-string to start with a lower-case 't' -- not a capital! And if the sub-string isn't there at all, find() returns string::npos, which replace() rejects with an exception -- so check the result first.
Finally, you can assign one string to be a sub-string of another string. This can be done in two ways, explicitly:
    string s = "Happy Birthday George", t;
    t = s.substr(6, 8);
Here t would become equal to "Birthday". (Note that the 'B' in s is at position 6. You might think it should be position 7 with 'Happy' taking slots 1-5 and the space position 6. But the computer likes to think of string positions as distances from the beginning.
Thus, 'Happy' begins at position 0 and goes to position 4, the space is at position 5, and therefore the 'B' of 'Birthday' is at position 6.)
Or we could do the assignment less explicitly:
    string s = "Happy Birthday George", t;
    t.assign(s, 6, 8);
Here we don't extract the sub-string and then assign it as a second step. Instead, we give assign() the original string, a starting position, and a number of characters to take and it overwrites the 'calling string' (the one to the left of the dot) with these characters. This actually proves to be more efficient than the first version using substr(). By providing both facilities to accomplish the same basic task, however,
the creators of the string class let the programmer choose whichever reads best. We could use assign() and concatenation:
    string s = "hello", t;
    t.assign(s, 1, 4);
    t = 'y' + t + 'w';
Or we could use substr() and concatenation:
    string s = "hello", t;
    t = 'y' + s.substr(1, 4) + 'w';
Some might find the latter a more natural solution than the former. It could also be considered more clear and even more elegant.
Also available: the relational operators <, <=, >, >=, ==, != and string::compare(). |
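Since find() can come back empty-handed, a cautious version of the find-then-replace idiom might look like the sketch below. The name replace_first() is hypothetical (it is not a member of the string class); it leaves the string alone when the target is missing instead of handing string::npos to replace().

```cpp
#include <string>

// Hypothetical helper: replace the first occurrence of target in s
// with replacement; s comes back unchanged if target is not found.
std::string replace_first(std::string s, const std::string & target,
                          const std::string & replacement)
{
    const std::string::size_type pos = s.find(target);
    if (pos != std::string::npos)      // only replace if we found it
    {
        s.replace(pos, target.length(), replacement);
    }
    return s;
}
```

Note that the search is still 'without understanding': replace_first("bathe the dog", "the", "a") happily mangles "bathe" because that is where the first "the" appears.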
Provided Processing Facilities:
char-wise Processing |
By using the subscript operator ([]), we can look at or change the individual characters of a (C)string:
    const short MAX_S = 20;
    char s[MAX_S];
    s[0] = 'H';
    s[1] = 'i';
    s[2] = '\0';     // without this it isn't a real (C)string!
    cout << s[1];    // just look at the second character |
Normal View
size_type vs. iterator (size_type-->iterator translation):
    string::size_type p;
    // set p to something in [0..s.length() )
    string::iterator i = s.begin()+p;
    // use *i to access/change char at offset p in s
Or more specifically:
    p = s.find( /* ... */ );
    if (p != string::npos)
    {
        i = s.begin()+p;
        // use *i to access/change char at offset p in s
    }
    // ...or...
    p = rand()%s.length();
    i = s.begin()+p;
    // use *i to access/change char at offset p in s
Another Approach
    string s = "Hello";
    cout << s[5];       // no range check: position 5 is one past the 'o';
                        // reading any further may crash or return garbage
vs.
    string s = "Hello";
    cout << s.at(5);    // range checked: program dies with an
                        // out_of_range exception |
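If dying with an exception is not what you want, the exception thrown by at() can be caught. Here is a small sketch; the helper name char_or() and its fallback argument are made up for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: return the character at pos, or fallback
// when pos is outside [0..s.length() ).
char char_or(const std::string & s, std::string::size_type pos,
             char fallback)
{
    try
    {
        return s.at(pos);               // range-checked access
    }
    catch (const std::out_of_range &)   // thrown when pos >= s.length()
    {
        return fallback;
    }
}
```

For "Hello", positions 0 through 4 give back the stored characters, while position 5 and beyond give back the fallback instead of ending the program.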
Programmer 'Hand' Processing |
short i = 0; while (s[i] != '\0') // no telling when we'll reach it, so indef loop { // do something with s[i] ++i; } |
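As one possible filling-in of the 'do something' in that loop, the sketch below counts how often a character occurs. The function name count_char() is hypothetical, and like any hand-written (C)string function it trusts the caller to pass an array that really does end in '\0'.

```cpp
#include <cstddef>

// Hypothetical example of 'hand' processing: walk the array until the
// terminator and count the elements equal to target.
std::size_t count_char(const char s[], char target)
{
    std::size_t count = 0;
    std::size_t i = 0;
    while (s[i] != '\0')    // no telling when we'll reach it
    {
        if (s[i] == target)
        {
            ++count;
        }
        ++i;
    }
    return count;
}
```

No length argument is needed: the terminator tells the loop where the data ends.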
string::iterator i; // iterator to alter, const_iterator to view for (i = s.begin(); i != s.end(); ++i) // s is [begin, end), so def loop { // do something with *i } |
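The iterator loop can be sketched in both flavors: iterator when the characters are to be changed and const_iterator when they are only viewed. Both function names below are hypothetical.

```cpp
#include <cctype>
#include <string>

// Alters characters, so it uses iterator (on its own copy of s).
std::string to_upper_copy(std::string s)
{
    for (std::string::iterator i = s.begin(); i != s.end(); ++i)
    {
        // cast through unsigned char so toupper() gets a valid value
        *i = static_cast<char>(
                 std::toupper(static_cast<unsigned char>(*i)));
    }
    return s;
}

// Only views characters, so const_iterator is enough.
std::string::size_type count_spaces(const std::string & s)
{
    std::string::size_type count = 0;
    for (std::string::const_iterator i = s.begin(); i != s.end(); ++i)
    {
        if (*i == ' ')
        {
            ++count;
        }
    }
    return count;
}
```

Because the string knows its own [begin, end) range, both loops are definite loops; neither has to hunt for a terminator.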
Use as 'Array' Base Types |
An array of (C)strings is by its nature 2D. Note that the base-type of the outer array is itself an array:
    const short MAX_LINES = 66, MAX_LINE_LEN = 81;
    char page[MAX_LINES][MAX_LINE_LEN];
Here we make a 2D array of characters. The first dimension is the
number of lines. The second dimension is the number of characters
in a line. We've made the line length 1 longer than normal to leave
room for the terminating '\0'.
Note that one can 'subscript' the page array either once or twice. Twice would give a single character since offsets into both dimensions would have been specified:
    page[4][3]    // specifies the 5th line's 4th character
Specifying only a single 'index' would pick a particular line, but not a particular character within that line. Therefore, the result would be an entire line -- which we are treating as a (C)string:
    page[4]    // specifies the entire 5th line
This subscripting result could be passed to any function/operation that could handle a (C)string:
    cout << page[4];                          // output
    // ...or...
    cin >> setw(MAX_LINE_LEN) >> page[4];     // input
    // ...or...
    strcpy(page[4], "dark and stormy");       // 'assignment'
    // ...or...
    strcat(page[4], " night");                // concatenation/append
We can hide this 2D aspect by using a typedef (type definition). Such typedefs can be placed in a particular function, global to a program file, or in a library for anyone to #include and use. Please note that the typedef for an array (a (C)string here) also needs a constant to go along with it. This should be placed immediately before the typedef wherever that may end up (see above).
The Message typedef in this example is hidden inside the function as it isn't used anywhere else. Note how the messages array appears 1D, but is in reality 2D because its base type is a 1D array type:
    short get_nonneg(const char prompt[], const char errmsg[])
    {
        const short MSG_LEN = 70;
        typedef char Message[MSG_LEN];
        const short MAX_MESS = 5;
        const Message messages[MAX_MESS] = { "Dumkoff! Enter larger numbers!",
                                             "I've seen rocks with larger IQs!",
                                             "What were your parents thinking?!",
                                             "Vous etes un bete chien!",
                                             "May you live long...so I might"
                                             " get a right answer!!!" };
        // stuff...
        cout << messages[rand()%MAX_MESS];    // choose a random msg
        // other stuff...
    }
Here is this example as a whole program you can try out. |
An array with a string class base type is not actually 2D, but can be treated in a 2D fashion (see char-wise access above). string page[MAX_LINES]; const string messages[MAX_MESS] = { /* ... */ }; One can also make a vector with a string class base type. (This is quite tricky to do with a (C)string.) Access patterns can be similar, but can also be done via iterator(s). vector<string> page(MAX_LINES); |
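A short sketch of the vector-of-strings idea follows. The helper longest_line() is a made-up example; it visits the lines through a const_iterator, while the usage note shows the 2D-style subscripting.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: length of the longest line on a 'page'.
std::string::size_type longest_line(const std::vector<std::string> & page)
{
    std::string::size_type longest = 0;
    for (std::vector<std::string>::const_iterator line = page.begin();
         line != page.end(); ++line)
    {
        if (line->length() > longest)   // each element is a whole string
        {
            longest = line->length();
        }
    }
    return longest;
}
```

After vector<string> page(3); and page[1] = "dark and stormy night";, page[1] is the entire second line and page[1][5] is its sixth character, just as with the 2D (C)string array. No MAX_LINE_LEN is needed because each line grows on its own.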
Member Variables of a class |
Works like an array member, but with help from the cstring library: const short MAX_S = xx; class CStrMemb { char str[MAX_S]; public: void get_str(char s[], const short len = 0) const { if (len > 0) { strncpy(s, str, len-1); s[len-1] = '\0'; } else { strcpy(s, str); } return; } bool set_str(const char s[]) { strncpy(str, s, MAX_S-1); str[MAX_S-1] = '\0'; return true; } }; |
Works just like a built-in type thanks to all its classy support: class StrClsMemb { string str; public: string get_str(void) const { return str; } bool set_str(const string & s) { str = s; return !str.empty(); } }; |