NOTE:

These are a little sketchy...I'll fill in details 'soon'.

C-style strings vs. C++ string class
Built-In vs. Library

(C)strings are a special use of arrays with a character base type. Therefore, you don't need to #include a special library to use them.

There is a class data type that can store strings of characters (like the literal "Hello World!" and others we've seen). This data type is called, oddly enough, string. It is not a built-in type (like short, char, or double), however. It is defined as a class in the string standard library, so you must #include <string> to use it. (Recall that cout and cin are objects/variables of class types from over in the iostream library. We'll learn to create our own classes later this semester.)

Declaration and Storage

(C)strings are stored as character arrays with a slight difference. A normal char array declared such as:

    const short MAX_ARR = 10;

    char arr[MAX_ARR];

could hold MAX_ARR (or 10) characters of data. If, on the other hand, this array were used to store a (C)string, it could only hold MAX_ARR-1 (or 9) characters of data. The reason is that all (C)strings are terminated by a special character that cannot be typed by the user and therefore cannot be part of the user's data! This character is ASCII 0 ('\0') and is also called the 'null' character.

Initialization is allowed in a few styles:

    const short MAX_S = 20;

    char var[MAX_S] = "value", var2[MAX_S] = { 'v', 'a', 'l', 'u', 'e' };

Both of these create a (C)string variable whose initial content is the string "value". Even though the former is simpler, the latter is just as valid. Neither visibly adds the 'null' terminator to the value, however (pardon the pun). Both store the '\0' character at the end of the data, just in different ways. The one using a string literal gets its ASCII 0 because the string literal contains one and all of the string literal is copied into the array. The one using an initialization list gets its ASCII 0 because of the rule that all elements not initialized by a too-short list are filled with the 0 pattern for the data type -- hence all (including the position immediately following the 'e') are filled with '\0'.

But those aren't all:

    const char title[] = "String Program";
    const short MAX_T = sizeof(title)/sizeof(char);

Here we initialize a seemingly un-sized (C)string and then afterward calculate the size by dividing the number of bytes the entire array takes up by the size of an individual element. This is recommended to only be done with constant (C)strings -- NOT (C)string variables.

Lastly, be careful for the following pitfall:

    const short MAX_S = 5;

    char s[MAX_S] = "apple", t[MAX_S] = { 'a', 'p', 'p', 'l', 'e' };

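Here we've said the maximum is to be 5 characters. Then we initialize with a 6-element string literal (5 data characters plus the 'null' terminator)! Standard C++ treats this as an error (C permits it and silently drops the terminator), but never assume anything about what an older or more lenient compiler will accept... The variable t, on the other hand, is initialized with a list of values that exactly fills the array and so gets no diagnostic -- EVER.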

In both cases, the variables are not actual (C)strings. They are simply arrays with base type char. You'll run into trouble later if you try to use a (C)string based function with these variables since there is no guarantee that the next memory location will contain a '\0'!

To declare a variable of the string type, simply use it like you would any of the built-in types:

    string var_name;

You can even initialize it, declare multiple variables at once, or even make a constant like you do with the built-in types:

    string var = "value", var2, var3;
    const string title = "String Program";

The advantage that the string class has over the built-in types here is that when you declare a string variable and don't initialize it, it will contain an empty string -- every time, guaranteed! (Recall that if you don't initialize a double or other built-in type variable, it simply retains the garbage bits that were in that memory location before. Although this often ends up being a 0, it is not guaranteed. Here, we are guaranteed that the uninitialized string variable will be an empty string.)

Literals

All string literals are really (C)string literals. They have a type of 'const char[n]' where 'n' is the number of characters in the string array. So, the string literal "Hello World!" has a type of 'const char[13]' -- 12 data characters and the 'null' terminator.

As we've known for some time, when the compiler finds two string literals separated by naught but whitespace, it will automatically concatenate them. Note how this aids in breaking the following long message to avoid it being cut off at the edge of a printout:

    cout << "\n\aMay you live long...so I might"
            " get a right answer!!!\n\n";

See the (C)string discussion above...

Use as Function Arguments

Passing (C)string arguments to functions is just like passing normal array arguments to functions. The brackets ([]) don't need to be filled in. If they are, the compiler just ignores the value, anyway. If the base type is plain, then the argument is passed by 'reference to elements'. If the base type is made const, then it is passed by 'constant reference to elements'.

By design, (C)string arguments require no data-length argument to accompany them to functions. That's because even if the function needs to know where the data elements end, it can simply look for the 'null' terminator. (See below about writing your own (C)string processing functions.)

One often overlooked feature of (C)string arguments is their use in generalizing a function without weakening its cohesion (its focus on a single task).

By passing things such as prompts and error messages to your functions via (C)string arguments, your caller can make all user interaction just as they please. Note how this is done in the get_nonneg() function:

short get_nonneg(const char prompt[], const char errmsg[])
{
    short value;
    cout << prompt;
    cin >> value;
    while (value < 0)
    {
        cout << errmsg << '\n' << prompt;
        cin >> value;
    }
    return value;
}

And here's a call to this function:

    short num;
    num = get_nonneg("How many C++ books do you own?  ", "");

Even better, note how by having the (C)string arguments be const, the caller can provide their prompt and error message as literal strings. Thus they can avoid having to come up with silly names for variables/const'ants to hold these (C)strings.

We can even take advantage of the compiler's handling of 'constant reference' arguments with respect to default argument values:

short get_nonneg(const char prompt[] = "", const char errmsg[] = "");

Now the caller doesn't have to provide the error message if they don't want:

    short num;
    num = get_nonneg("How many C++ books do you own?  ");

However, we may want to remove the explicit '\n' from the re-prompt and force the caller to insert their own if they want it:

short get_nonneg(const char prompt[], const char errmsg[])
{
    short value;
    cout << prompt;
    cin >> value;
    while (value < 0)
    {
        cout << errmsg << prompt;
        cin >> value;
    }
    return value;
}

// ...elsewhere at the call...
    short num;
    num = get_nonneg("\nHow many C++ books do you own?  ");

Or, the caller may decide they want just an initial prompt and no more:

    short num;
    cout << "Enter a non-negative value or die of boredom:  ";
    num = get_nonneg();

To pass a string class object as a function argument, you'll typically want to use either a reference (to change the contents) or constant reference (to not change). Passing a string class object by value would cause the compiler to make a copy of all that object's internal information -- the string itself, length counter, etc. That can be a lot of data and can take a long time.

The only thing that would have to change about the get_nonneg() function, for instance, would be the head:

short get_nonneg(const string & prompt = "", const string & errmsg = "");

Note how we can even use the default argument values since the 'reference' is treated as a constant!

Provided Processing Facilities:
Aggregate Processing

Many of the things you'd want to do to (C)strings are already built into the standard libraries. cout knows how to print them (we've been doing that since hello.C in 121). cin can sort-of read (C)strings. To do this well requires a bit of extra help, though. And there is a whole library dedicated to (C)string functions.

Output

Just as with string literals, cout has been taught how to print a (C)string on the screen. Just use the insertion (<<) operator with it:

    const short MAX_S = 20;
    char s[MAX_S];

    // fill s somehow...

    cout << s;

Input

(C)strings cannot protect themselves from over-run AT ALL!

cin, setw+>>, getline

    const short MAX_S = 20;
    char s[MAX_S];

    cin >> s;    // works, but leaves program open to most
                 // common security attack -- buffer overrun

Instead:

    const short MAX_S = 20;
    char s[MAX_S];

    cin >> setw(MAX_S) >> s;   // now no buffer overrun,
                               // but may leave stray
                               // stuff in buffer:
    cin.ignore(numeric_limits<streamsize>::max(), '\n');   // throw out stray stuff

For a space-containing (C)string:

    const short MAX_S = 20;
    char s[MAX_S];

    cin.getline(s, MAX_S);   // reads to '\n' or MAX_S-1 chars --
                             // whichever happens first

If preceded by an extraction (>>):

    double x;      // can be any data type, really...

    const short MAX_S = 20;
    char s[MAX_S];

    cin >> x;          // read with extraction (stops at any spaces,
                       // notably '\n' here)

    cin.getline(s, MAX_S);   // reads '\n' immediately and is done!
                             // s is an empty string and program
                             // doesn't pause for user to type info...

Instead:

    const short MAX_S = 20;
    char s[MAX_S];

    if (cin.peek() == '\n')  // if '\n' is waiting (from prior extraction)
    {
        cin.ignore();              // throw it out!
    }
    cin.getline(s, MAX_S);   // reads '\n' immediately only if user simply
                             // hits <Enter> with no data...

But, for portability:

    const short MAX_S = 20;
    char s[MAX_S];

    cout.flush();            // make sure any waiting prompt is displayed...
    if (cin.peek() == '\n')
    {
        cin.ignore();
    }
    cin.getline(s, MAX_S);   // reads '\n' immediately only if user simply
                             // hits <Enter> with no data...

And leftovers are worse than with setw()!

    const short MAX_S = 20;
    char s[MAX_S];

    cout.flush();
    if (cin.peek() == '\n')
    {
        cin.ignore();
    }
    cin.getline(s, MAX_S);   // sets failure if doesn't reach '\n' before
                             // the (MAX_S-1)th char

    if (cin.fail())          // ran out of room...
    {
        cin.clear();                 // clear failure
        cin.ignore(numeric_limits<streamsize>::max(), '\n');   // throw out rest of line
    }

Perhaps a re-usable function?

void get_line(char s[], const long max)
{
    cout.flush();
    if (cin.peek() == '\n')
    {
        cin.ignore();
    }
    cin.getline(s, max);
    if (cin.fail())
    {
        cin.clear();
        cin.ignore(numeric_limits<streamsize>::max(), '\n');
    }
    return;
}

Could also make the clear/ignore optional with a defaulted argument.

The <cstring> Library

Simplistic

Assignment:

    const short MAX_S = 20;
    char s[MAX_S], t[MAX_S];

    strcpy(t, s);       // like t = s -- only it'll work

Concatenation:

    const short MAX_S = 20;
    char s[MAX_S], t[MAX_S];

    strcat(t, s);       // like t += s -- only it'll work

Comparison:

    const short MAX_S = 20;
    char s[MAX_S], t[MAX_S];

    want to compare  |  instead compare
   ------------------+-------------------------------------
      s < t          |  strcmp(s, t) < 0
      s <= t         |  strcmp(s, t) <= 0
      s > t          |  strcmp(s, t) > 0
      s >= t         |  strcmp(s, t) >= 0
      s == t         |  strcmp(s, t) == 0
      s != t         |  strcmp(s, t) != 0

It is case sensitive!!! Watch out!!!

Overrun Protected

strncpy

    const short MAX_S = 20, MAX_T = 50;
    char s[MAX_S], t[MAX_T];

    strncpy(s, t, MAX_S-1);    // copy all data we can hold
    s[MAX_S-1] = '\0';         // attach '\0' -- just in case!

strncat

    const short MAX_S = 20, MAX_T = 50;
    char s[MAX_S], t[MAX_T];

    strncat(s, t, MAX_S-1-strlen(s));    // append all data we can hold
    s[MAX_S-1] = '\0';                   // attach '\0' -- just in case!

Other

strncmp -- compare up to 'n' chars

strlen -- how many data chars?

etc.

Many of the things you'd want to do to strings are already built into the string class. cout knows how to print string objects. cin can read string objects. To do this sometimes requires a bit of extra help, though.

Output

Just as with string literals, cout has been taught how to print a string class object on the screen. Just use the insertion (<<) operator with it:

    string s;

    // fill s somehow...

    cout << s;

Input

string objects can protect themselves from over-run -- they automatically grow to be as long as needed to store their data. The only way to break in using the string class would be to take up all the computer's memory, but that would typically end the program with an allocation failure -- not leave it open to attack.

cin, >>, getline

    string s;

    cin >> s;         // read in a single 'word' from user

Or, if we [potentially] want embedded spaces:

    string s;

    getline(cin, s);  // read until '\n' entered (extracted, but not stored)

But, if getline() is preceded by an extraction operation (>>), s will be empty:

    double x;    // any type, really...
    string s;

    cin >> x;     // read with extraction (stops at any spaces
                  // notably '\n' here)

    getline(cin, s);     // reads '\n' immediately and is done!
                         // s is an empty string and program
                         // doesn't pause for user to type info...

Instead:

    string s;

    if (cin.peek() == '\n')   // if '\n' is waiting (from prior extraction)
    {
        cin.ignore();              // throw it out!
    }
    getline(cin, s);         // reads '\n' immediately only if user simply
                             // hits <Enter> with no data...

And, for portability:

    string s;

    cout.flush();            // make sure any waiting prompt is displayed...
    if (cin.peek() == '\n')
    {
        cin.ignore();
    }
    getline(cin, s);         // reads '\n' immediately only if user simply
                             // hits <Enter> with no data...

Perhaps a re-usable function?

void get_line(string & s)
{
    cout.flush();
    if (cin.peek() == '\n')
    {
        cin.ignore();
    }
    getline(cin, s);
    return;
}

Program Friendliness

One can use strings not only to accomplish simple text manipulation, but also to make the program more user friendly:

    string name;

    cout << "\nHello, what's your [first] name?  ";
    cin >> name;
    cout << "\nWelcome to the show, " << name << "...\n";

Further prompts can also be embellished with the user's name:

    cout << "So, " << name << ", tell me about yourself...\n"
         << "What do you make each year?  ";
    cin >> sign >> salary;

Although this could be done with (C)strings, it would be more difficult because we'd not know how long their name was and that would hinder both declaration of the name variable and initial input of their name.

string class Functions

Objects of the string class can use standard assignment:

    string var, var2;
    var = "value";
    var2 = var;

And even something called concatenation:

    string var = "Hello", var2;
    var2 = var + ' ' + "World" + '!';
    // var2 is now "Hello World!"

The addition operator is used to attach two strings end to beginning to create a new string with both contents. (Note that the two strings being combined are left unchanged but a new string is created. Just like when you add 3 and 4 to get 7, 3 and 4 are unchanged but 7 is created. Or, less esoterically, when you build a wall from bricks, the bricks remain but you've created a wall.)

But that's not all! A string variable can tell you the length of its contents (number of characters it contains):

    string var;
    cin >> var;
    cout << '\'' << var << "' is " << var.length()
         << " characters long.\n";

Note the '.' (dot) syntax as we had with both versions of cin.ignore(). This is in general how you call a function that is inside a class: object/variable, dot, function (with () and any arguments necessary).

You can also replace part of a string with another string:

    string first, middle, last, whole;

    cout << "\nWhat is your name (First Middle Last)?  ";
    cin >> first >> middle >> last;

    whole = first + ' ' + middle + ' ' + last;

    cout << "\nWell, '" << whole <<"' is a fine name,\nbut "
         << "wouldn't '";

    whole.replace(first.length()+1, middle.length(), "Allowicious");

    cout << whole << "' sound so much cooler?\n";

This fragment demonstrates declaration, input, concatenation and assignment, output, and length finding as well as replacement. Note the three arguments given to the replace() function: a position from which to replace, a number of characters to replace, and the replacement string. Here we want to replace from the first character of the middle name (which is right after the entire first name and the space that separates the first and middle names) for the entirety of the middle name with 'Allowicious' (yes, we are being facetious).

You can also locate a sub-string of a string:

    string s = "...loan of the company car...";
    s.replace(s.find("the"), 3, "a");

Here, the find() function returns the location within s where the first 't' followed by 'h' and then by 'e' occurs. Then, it replaces those 3 characters with a single 'a'.

Be careful, though, as it searches without understanding and so the above might find "them" or "other" or "bathe" just as easily as it did "the". Also, it would NOT find "The" since we asked for the sub-string to start with a lower-case 't' -- not a capital!

Finally, you can assign one string to be a sub-string of another string. This can be done in two ways, explicitly:

    string s = "Happy Birthday George", t;

    t = s.substr(6, 8);

Here t would become equal to "Birthday". (Note that the 'B' in s is at position 6. You might think it should be position 7 with 'Happy' taking slots 1-5 and the space position 6. But the computer likes to think of string positions as distances from the beginning. Thus, 'Happy' begins at position 0 and goes to position 4, the space is at position 5, and therefore the 'B' of 'Birthday' is at position 6.)

Or we could do the assignment less explicitly:

    string s = "Happy Birthday George", t;

    t.assign(s, 6, 8);

Here we don't extract the sub-string and then assign it as a second step. Instead, we give assign() the original string, a starting position, and a number of characters to take and it overwrites the 'calling string' (the one to the left of the dot) with these characters. This can prove to be more efficient than the first version using substr(), since no temporary string has to be created.

By providing both facilities to accomplish the same basic task, however, the creators of the string class have given programmers using it more flexibility. They are now free to code in a fashion more natural to their way of thinking and not be forced to do the job a single particular way. Consider, for example, building the word 'yellow' from the word 'hello'. We could use assign() and concatenation:

    string s = "hello", t;

    t.assign(s, 1, 4);
    t = 'y' + t + 'w';

Or we could use substr() and concatenation:

    string s = "hello", t;

    t = 'y' + s.substr(1, 4) + 'w';

Some might find the latter a more natural solution than the former. It could also be considered more clear and even more elegant.

Also available: the relational operators <, <=, >, >=, ==, and != (which compare string contents directly), and string::compare().

Provided Processing Facilities:
char-wise Processing

By using the subscript operator ([]), one can access a single char of a (C)string for either viewing or changing it:

    const short MAX_S = 20;
    char s[MAX_S];

    s[0] = 'H';
    s[1] = 'i';
    s[2] = '\0';    // without this it isn't a real (C)string!

    cout << s[1];   // just look at the second character

Normal View

size_type vs. iterator (size_type-->iterator translation)

    string::size_type p;
    // set p to something in [0..s.length() )
    string::iterator i = s.begin()+p;
    // use *i to access/change char at offset p in s

Or more specifically:

    p = s.find( /* ... */ );
    if (p != string::npos)
    {
        i = s.begin()+p;
        // use *i to access/change char at offset p in s
    }

    // ...or...

    p = rand()%s.length();
    i = s.begin()+p;
    // use *i to access/change char at offset p in s

Another Approach

[] vs. at

    string s = "Hello";
    cout << s[5];        // no range check: in C++11 and later this is the '\0'
                         // just past the data; beyond that, garbage or a crash

vs.

    string s = "Hello";
    cout << s.at(5);        // program dies with an exception

Programmer 'Hand' Processing

   short i = 0;
   while (s[i] != '\0')    // no telling when we'll reach it, so indef loop
   {
      // do something with s[i]
      ++i;
   }

   string::iterator i;   // iterator to alter, const_iterator to view

   for (i = s.begin(); i != s.end(); ++i)  // s is [begin, end), so def loop
   {
      // do something with *i
   }

Use as 'Array' Base Types

An array of (C)strings is by its nature 2D. Note that the base-type of the outer array is itself an array:

    const short MAX_LINES = 66, MAX_LINE_LEN = 81;

    char page[MAX_LINES][MAX_LINE_LEN];

Here we make a 2D array of characters. The first dimension is the number of lines. The second dimension is the number of characters in a line. We've made the line length 1 longer than normal to leave room for the terminating '\0' character.

Note that one can 'subscript' the page array either once or twice. Twice would give a single character since offsets into both dimensions would have been specified:

    page[4][3]      // specifies the 5th line's 4th character

Specifying only a single 'index' would pick a particular line, but not a particular character within that line. Therefore, the result would be an entire line -- which we are treating as a (C)string:

    page[4]        // specifies the entire 5th line

This subscripting result could be passed to any function/operation that could handle a (C)string:

    cout << page[4];                           // output
// ...or...
    cin >> setw(MAX_LINE_LEN) >> page[4];      // input
// ...or...
    strcpy(page[4], "dark and stormy");        // 'assignment'
// ...or...
    strcat(page[4], " night");                 // concatenation/append

We can hide this 2D aspect by using a typedef (type definition). Such typedefs can be placed in a particular function, global to a program file, or in a library for anyone to #include and use.

Please note that the typedef for an array (a (C)string here) also needs a constant to go along with it. This should be placed immediately before the typedef wherever that may end up (see above).

The Message typedef in this example is hidden inside the function as it isn't used anywhere else. Note how the messages array appears 1D, but is in reality 2D because its base type is a 1D array type:

short get_nonneg(const char prompt[], const char errmsg[])
{
    const short MSG_LEN = 70;
    typedef char Message[MSG_LEN];

    const short MAX_MESS = 5;
    const Message messages[MAX_MESS] = { "Dumkoff!  Enter larger numbers!",
                                         "I've seen rocks with larger IQs!",
                                         "What were your parents thinking?!",
                                         "Vous etes un bete chien!",
                                         "May you live long...so I might"
                                           " get a right answer!!!"
                                       };

    // stuff...

            cout << messages[rand()%MAX_MESS];  // choose a random msg

    // other stuff...
}

Here is this example as a whole program you can try out.

An array with a string class base type is not actually 2D, but can be treated in a 2D fashion (see char-wise access above).

    string page[MAX_LINES];

    const string messages[MAX_MESS] = { /* ... */ };

One can also make a vector with a string class base type. (This is quite tricky to do with a (C)string.) Access patterns can be similar, but can also be done via iterator(s).

    vector<string> page(MAX_LINES);

Member Variables of a class

Works like an array member, but with help from the cstring library:

    const short MAX_S = xx;
    class CStrMemb
    {
        char str[MAX_S];

    public:
        void get_str(char s[], const short len = 0) const
        {
            if (len > 0)
            {
                strncpy(s, str, len-1);
                s[len-1] = '\0';
            }
            else
            {
                strcpy(s, str);
            }
            return;
        }

        bool set_str(const char s[])
        {
            strncpy(str, s, MAX_S-1);
            str[MAX_S-1] = '\0';
            return true;
        }
    };

Works just like a built-in type thanks to all its classy support:

    class StrClsMemb
    {
        string str;

    public:
        string get_str(void) const { return str; }

        bool set_str(const string & s)
        {
            str = s;
            return !str.empty();
        }
    };