Iterators for vectors and strings

Basic Concepts

iterators are to vectors/strings what pointers are to arrays/C-strings...sorta:

classes	Built-ins
vector/string	array/C-string
at/[]/size_type	[]/size_t
iterator	pointer

Essential Usage

Declaration, Initialization, and Access

An iterator holds a reference to the data at a certain position within a string or vector. To declare one, you use this syntax:

    vector<double>::iterator itr;

Here we have an iterator that can hold a reference to a double within a vector of doubles. To make it actually hold such a reference, we must assign it:

    vector<double> vec;

    itr = vec.begin();

Here we've made it 'iterate' the first (0th) element of the vector vec. Now, how do we access that double?! Use the unary * operator (dereference...remember?):

    cout << *itr;   // print element itr iterates

    *itr = 4.2;   // store 4.2 in element itr iterates

The question remains, does itr iterate anything? Hunh? We assigned it — and even printed it, but does that position exist? Oh... Let's check:

    if (itr != vec.end())
    {
        cout << *itr;   // print element itr iterates
        *itr = 4.2;   // store 4.2 in element itr iterates
    }

Much better! Now we know it exists before we store into it — a definite no-no.

Creating iterators

All iterator positions which are valid are between begin() and end(): [begin..end). Of course, the exclusivity overrides the inclusivity when begin==end (which will happen when the vector/string is empty).

    for (itr = vec.begin(); itr != vec.end(); ++itr)
    {
        // use *itr -- it's safe!  (I promise...)
    }

Note the similarity to pointers/arrays:

    for (p = arr; p-arr != cur_len; ++p)
    {
        // use *p -- it's safe!  (I promise...)
    }

Intermediate Usage

iterator Math

All 'pointer math' operations work in iterator versions:

    iterator + size_type
    size_type + iterator
    iter2 - iter1 (--> size_type)
    iterator += size_type
    iterator - size_type
    iterator -= size_type
    iterator++, ++iterator
    iterator--, --iterator

Just make sure that the size_type you are adding/subtracting is legitimate for the current vector/string's size()!

But Of Course...

And since we can now subtract two iterators to get a size_type, we can also adjust our for loop pattern to look even more like the array version:

    for (itr = vec.begin(); itr-vec.begin() != vec.size(); ++itr)
    {
        // use *itr -- it's safe!  (I promise...)
    }

Note how we are now subtracting the initial iterator from the current iterator to get a size_type to compare to the vector's size. *bounce*

Passing iterators to Functions

If you pass an iterator to a function, you do NOT need to also pass the vector into which it iterates in order to affect that vector's elements...!!!

That's right. Remember that an iterator holds a REFERENCE to the vector's element and so you can alter a vector element without even having the vector at hand!

    void swap(vector<double>::iterator a,
              vector<double>::iterator b)
    {
        double c = *a;
        *a = *b;
        *b = c;
        return;
    }

Now we can call this function with:

    swap(itr, itr+1);

To swap two adjacent elements in a vector.

The Need to NOT Change

If a vector is constant, you cannot use an iterator into it. Instead you need to use a const_iterator — which holds a constant reference to the vector's element. (You can also use a const_iterator on a regular vector when you just don't want to change the contents accidentally.)

The usage of const_iterator is identical to iterator except that you cannot store into the dereferenced value.

This all works because the const keyword is useful to overload class functions in addition to their argument lists. What? Yeah. A vector or string which is a constant will always call the const version of the begin function and we'll receive a const_iterator as a result. That way you cannot get a plain iterator into a constant vector or string — no accidental changes!

But what about if I just want to look at the elements in a non-const vector or string? Well, through the magic of inheritance (which we'll discuss later in the semester), the iterator type is upwardly compatible with the const_iterator class. (This is somewhat like the way you can store a double value into a long double but not the other way around.)

Odds and Ends

With vectors, you get the additional benefits of access to:

    .insert(ip, val)
    .insert(ip, ipsb, ipse)
    .erase(ipb, ipe)
    .erase(ip)
    .assign(ipsb, ipse)

Where ip is an iterator position, an 's' indicates the 'source' vector and 'b' and 'e' represent 'begin' and 'end', respectively. These pairs are as above begin and end iterators and so the end iterator is not placed into the sequence — ends are ALWAYS exclusive.

(There are iterator versions of these functions for the string class as well, but we already had such functionality with size_types, so it isn't that big of a deal. *smile* Oh, but string also has iterator versions of its replace functions!)

There are also constructors for both string and vector classes which accept pairs of iterators from which to initialize the new sequence.

In fact, the mechanism by which these iterators are passed to the constructors is so powerful that it can accept anything that acts like an iterator — even an old pointer! This finally gives us a way to initialize a vector without a horrid sequence of push_backs:

    const ____ arr[SIZE] = { a, b, c, ..., d };
                         // arr             arr+SIZE

    vector<____> vec(arr, arr+SIZE);

When Is an iterator Not Really an iterator?

Note that iterators into a vector/string which are at or after an erased position are considered 'invalidated' — they should not be dereferenced ...upon pain of death!

itr2 ----------------------------+
                                 |
                                \ /
                                 `
+----+----+----+----+----+----+----+----+
| aa | bb | cc | dd | ee | ff | gg | hh |   vec [8]
+----+----+----+----+----+----+----+----+
                 ^
                / \
                 |
itr -------------+

Where the [8] is signifying that vec contains 8 elements.

Now let's erase two elements starting with itr:

    vec.erase(itr, itr + 2);

Even though the item gg is still in the vector, I am not allowed to dereference itr2 because it and itr and any other iterators which iterated at/beyond itr are considered invalidated!

Furthermore, if I were to accidentally dereference itr2, I would be erroneously using memory I no longer have legitimate access to. (It may still contain a gg, but it won't be the one from inside the vector!)

Look at the 'vector' after the erase:

itr2 ----------------------------+
                                 |
                                \ /
                                 `
+----+----+----+----+----+----+----+----+
| aa | bb | cc | ff | gg | hh | gg | hh |   vec [6]
+----+----+----+----+----+----+----+----+
                 ^
                / \
                 |
itr -------------+

Notice that dd and ee are gone and vec is now only 6 elements long — but seems to be physically 8 positions in length! That's because there was no need to physically remove the memory locations after shifting the data down to cover up the erased elements. The vector merely remembers that those last two positions are not part of the data anymore.

Unfortunately, no one told the iterators that certain positions were no longer valid! Here we've only created two iterators into the vector, but a typical application may have tens or more iterators into a vector when an erase-ure occurs! It is somewhere between impractical to impossible for the vector to inform all iterators of position invalidation.

Still, why are the iterators invalid? They both still iterate positions in the physical vector, right? Well, yes, but they are no longer iterating the data the programmer expected them to iterate. itr no longer iterates dd because it is no longer part of the vector. And itr2 no longer iterates gg because the vector's actual data gg is two positions left of itr2's position. Notice what happens to itr2 when I perform the following assignment:

    vec[4] = "qq";

See? Oh, sorry. Here:

itr2 ----------------------------+
                                 |
                                \ /
                                 `
+----+----+----+----+----+----+----+----+
| aa | bb | cc | ff | qq | hh | gg | hh |   vec [6]
+----+----+----+----+----+----+----+----+
                 ^
                / \
                 |
itr -------------+

What? You still don't see it? Look where itr2 iterates: gg. But there is no longer any gg in the vector! Now it is more obvious why itr2 is considered invalidated, right?

Still not sure about itr, eh? Well, all I can say is that it was expected to have been iterating dd and we no longer have such a beast! It currently iterates ff — whomever that is!

If I'd done something like this:

    sz = itr2 - itr;             // record distance between iterators

    vec.erase(itr, itr + 2);     // erase 2 elements starting with itr

    itr--;                       // the previous position is still valid

    itr2 = itr + sz - 2 + 1;     // re-calculate the position of itr2
                                 // adjusting for removed elements and
                                 // the shift of itr

I can retrieve the originally intended position for itr2. (Note that I did 'itr--' immediately after the erase'ure.) This particular case is actually a little short-sighted, however. See this sample code for a more general procedure.

Any invalidated iterator could be adjusted in this way, but many programmers will simply give up and re-establish the position the way they did in the first place. This might be by some search technique or relative positioning or ...

For the Adventurous

But I had a nice algorithm that used size_types to process my string (or vector) in reverse. How can I do that with iterators?

Well, there are also reverse iterators:

    vector<double>::reverse_iterator ritr;
    vector<double>::const_reverse_iterator ritr2;

These are similar to forward iterators in behavior — only in reverse! For instance, you can initialize them with the rbegin function and check them for validity with the rend function.

    ritr = vec.rbegin();
    if (ritr != vec.rend())
    {
        cout << *ritr;
        *ritr = 4.2;
    }

The most confounding thing about them to most beginners is that when you increment them — ++ritr — they move to a previous element. Remember, though, they are iterating in reverse!

    for (ritr = vec.rbegin(); ritr-vec.rbegin() != vec.size(); ++ritr)
    {
        // use *ritr -- it's safe!  (I promise...)
    }

This loop will go through all the elements of the vector vec from the last to the first.

Likewise, decrementing them will move to the following element.