Buckets, Buckets -- Who's Got the Buckets?

Topical Information

This project will help you show your mastery of Hash Tables, Binary Search Trees, and (self-organizing) linked lists.

Program Information

Someone has claimed that simple linear linked list chaining of hash table collisions gives truly poor performance. This individual would like to try replacing the simple linked list chain with a self-organizing linked list (I'd recommend the this strategy). Someone else claims that a BST would be the best solution.

To avoid WWIII, you decide to code all three versions and collect timing information for each. (Recall that to get good timings you may need to use thousands of data values and even do the operations several times to take an average time. This is separate from the average timing you'll gather for your graph. That average is an average of at least 5 separate runs of your program.)

To ease your coding trauma, you decide to make a single hash table class which takes its chaining DS as a template parameter. All three of the proposed variants have insert, delete, and search functions. All three can be simply (i.e. default) constructed. You can have this parameter default to a simple linked list. This is the behavior most programmers will expect, after all.

Remember you'll need to use the same set of data to test each version. You'll also want to gear your hashing function to produce as many duplicate values as possible. In fact, just to make this hash table class even more general, you should probably have the hashing function be another template parameter. Give the programmers using your class a default for this parameter, though, so they won't have to think too hard if they don't want to.

And what kind of data are we hashing? I'd template it, but make your tests with something relatively simple — like a built-in type. More complicated data will just complicate the timings as data are hashed and compared.

Once you have average timings (average 5 runs each at several different values of n — the size of the data set), graph all three different chaining variants together for comparison. If there is relatively little difference (low mean-square error), the choice is moot. If one is much faster than the others, we've found a winner. If two are faster than the third, we'll have to keep searching for a way to distinguish the faster two.

At any rate, write a short discussion of your results (including the graphs for support).

This assignment is (Level 5).

Options

Add (Level 3) to run a fourth set of tests with a secondary hash table as your chain. This complicates matters somewhat since a hash table doesn't default construct as well and will need two template parameters filled in for its chaining DS and hash function. Still, it won't be that terrible. Add it to your timing runs and your graphs and discussion.
Add (Level 4) to perform a (complete) Big-O analysis of each chaining alorithm as it affects the insertion, deletion, and searching of the hash table overall.