Data = Value

Topical Information

This lab is designed to give you practice with labeled data in files. Again, classes can help ease your design woes.

Program Information

Write a program that transfers the contents of one file to a second file. The first file will contain an arbitrary (unknown) number of data groups. A data group will normally consist of a person's name (a string containing spaces), the person's student ID (a large integer), the person's GPA (a floating point number), and the person's gender (a character).

The data in each group should be in a labeled data format. Be sure to use some kind of string library to help you with the label/value processing (as we discussed in class). (The library provided here is good, but it is not everything you'll need: you still need functions to convert strings into integers and into floating-point numbers. Conversions from string to bool, string to string, and string to character should be fairly easy.)
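As a rough illustration of the conversions mentioned above, here is a minimal sketch built on the standard library's std::stol and std::stod rather than on the course's string library (whose interface is not shown here). The names to_integer, to_floating, and to_character are hypothetical, and the sketch assumes the value has already had surrounding spaces trimmed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert text to a long integer.
// Returns false (leaving 'out' untouched) unless the WHOLE text is a number.
bool to_integer(const std::string& text, long& out)
{
    try
    {
        std::size_t used = 0;
        long value = std::stol(text, &used);
        if (used != text.size())
        {
            return false;               // trailing junk such as "12x"
        }
        out = value;
        return true;
    }
    catch (const std::exception&)       // invalid_argument or out_of_range
    {
        return false;
    }
}

// Hypothetical helper: same idea for floating-point values.
bool to_floating(const std::string& text, double& out)
{
    try
    {
        std::size_t used = 0;
        double value = std::stod(text, &used);
        if (used != text.size())
        {
            return false;
        }
        out = value;
        return true;
    }
    catch (const std::exception&)
    {
        return false;
    }
}

// Hypothetical helper: first non-blank character, or 'fallback' if there is none.
char to_character(const std::string& text, char fallback)
{
    std::string::size_type pos = text.find_first_not_of(" \t");
    return pos == std::string::npos ? fallback : text[pos];
}
```

Reporting success through the return value (instead of returning the number directly) lets the caller keep a field's default when the text is not a valid number.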

You can choose the actual values to use for the people's data, but you should be able to handle:

invalid labels
comments (at least whole-line)
unknown labels
missing labels
unordered (but properly blocked) labels

Remember that your program cannot know how many data groups are in the file ahead of time! In this context, this means that you cannot store all the names in your program's memory — not even using dynamic memory or a vector.

You'll have to read the files' names from the user. Protect your program against any errors that may occur during the opening of the files. (Try to use a class/functions to break up the program into more manageable pieces.)

As an example, you might have the input data file contain:

# format is 'label = value' -- one per line
# known labels are: name, ID, GPA, and gender
# spacing around '=' is okay
name = Jason James
ID= 123456
GPA =9.2
gender=M
# mixed items
name = Tammy James
GPA = 11.2
gender = f
ID = 123457
# mixed, missing, and extra fields
name = Henry Ramirez
GPA = 12.3
ID = 111888
major = ChE
class = soph
ID = 788531
# missing fields
name=Suzie Shah
geNDEr=t

(The highlighting is provided as a visual aid -- it is not really gonna be part of your input files. *grin*)
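To make the sample file's line formats concrete, here is a minimal sketch that classifies one line and, for labeled data, splits it at the first '='. The names LineKind, trim, lower_copy, and split_line are hypothetical, and lowering the label's case is just one way to make comparisons case insensitive.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

enum class LineKind { Comment, Labeled, Other };

// Remove leading and trailing blanks.
std::string trim(const std::string& text)
{
    const char* blanks = " \t\r\n";
    std::string::size_type first = text.find_first_not_of(blanks);
    if (first == std::string::npos)
    {
        return "";
    }
    std::string::size_type last = text.find_last_not_of(blanks);
    return text.substr(first, last - first + 1);
}

// Lower-cased copy, so "geNDEr" and "gender" compare equal.
std::string lower_copy(std::string text)
{
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Classify a line; for labeled data, fill in 'label' and 'value'.
// Splitting at the FIRST '=' lets a value contain '=' itself.
LineKind split_line(const std::string& line, std::string& label, std::string& value)
{
    std::string trimmed = trim(line);
    if (!trimmed.empty() && trimmed[0] == '#')
    {
        return LineKind::Comment;
    }
    std::string::size_type eq = trimmed.find('=');
    if (eq == std::string::npos)
    {
        return LineKind::Other;         // blank line or unlabeled text
    }
    std::string candidate = trim(trimmed.substr(0, eq));
    if (candidate.empty())
    {
        return LineKind::Other;         // nothing before the '='
    }
    label = lower_copy(candidate);
    value = trim(trimmed.substr(eq + 1));
    return LineKind::Labeled;
}
```

With this sketch, "ID= 123456" yields the label "id" and the value "123456", while "name = =)" yields the label "name" and the value "=)". Deciding whether a label is known or unknown is left to the caller.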

The program should produce from this a 'clean copy' such as:

# format is 'label = value' -- one per line
# known labels are: name, ID, GPA, and gender
# spacing around '=' is okay
name = Jason James
ID = 123456
GPA = 9.2
gender = M
name = Tammy James
ID = 123457
GPA = 11.2
gender = F
name = Henry Ramirez
ID = 111888
GPA = 12.3
name = Suzie Shah
ID = 788531
gender = T

Note how the user's commentary is gone and only the program's reminder commentary is replicated. Also, all labels (and gender values) are now in standard capitalization/format, and both the order of data in each group and the spacing of each data line are uniform. Labels left at their default values (and therefore never assigned) are not stored. (Although you could output a comment noting that the value was missing: # no GPA specified.)

And the program interaction might look something like (the parts in this color are typed by the user):

$ ./copypeople.out

                 Welcome to the People Data Copying Program!!!

Please enter the name of your data file:  bob.dat

I'm sorry, I could not open 'bob.dat'.  Please enter another name:
students

File 'students' opened successfully!

Please enter the name of the copy file:  /can't write here

I'm sorry, I could not open '/can't write here'.  Please enter another name:
students.bak

File 'students.bak' opened successfully!

Copying data from 'students' to 'students.bak'...

Done copying data!

Thank you for using the PCP!!

Endeavor to have a tremendous day!

$

Thought Provoking Questions

What does your main look like? Aside from user interfacing code, is it over 10 lines long? (It shouldn't be...)
How much data does an object read from the file?
How does the object know what is and is not part of its data?
How do you recognize comments in the data file?
How do you recognize labeled data lines in the file?
How can you tell a line is neither (comment nor labeled data)?
How can you detect that you are done with a block of data?
What happens if you hit the end of the file while in the object's input method? (Hint: There are two possibilities!)
How do you fill in default values when you run out of data in your block?
How do you split a line into the label part and the value part?
What happens if the label separator is part of a data item? (Maybe there's a student named =) at the school..?)
Does spacing around the label or around the separator matter to your program?
How do you recognize that a label is valid or not?
(In your design) Can a single data item take up multiple lines? Can multiple data items be on a single line? Why/Why not?
Can you use a function which translates a string into an integer to help you translate floating point values?
How would you translate a string into bool data? char data? string data?
Why don't the handy and input libraries associated with the string library and its test program have .C files?

This assignment is (Level 4).

Options

Add (Level 3) more if your 'labeled' format is XML (or at least XML-like). XML uses a system of tags similar to HTML. The example file above might look like this as XML:
```
<student>
    <name>Jason James</name>
    <id>123456</id>
    <gpa>9.2</gpa>
    <gender>M</gender>
</student>
<student>
    <name>Tammy James</name>
    <gpa>11.2</gpa>
    <gender>f</gender>
    <id>123457</id>
</student>
<student>
    <name>Henry Ramirez</name>
    <gpa>12.3</gpa>
    <id>111888</id>
    <major>ChE</major>
    <class>soph</class>
</student>
<student>
    <id>788531</id>
    <name>Suzie Shah</name>
    <gender>t</gender>
</student>
```
(The indentation is not required, but has been added here for clarity.)

Note how some problems mentioned in the TPQs above are removed by the rigid structure but others are created.

Here's a brief introduction to the syntax of XML.

You can take some liberties. You can allow the tags to be case insensitive for your user's convenience. You can allow a missing root tag for your convenience. You can allow embedding '--' in comments for everyone's convenience. But otherwise, basic syntax rules should be followed.
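For the XML-like option, a single-line element such as <name>Jason James</name> could be picked apart as sketched below. This is only an illustration under assumptions: the names lower_tag and parse_element are hypothetical, each data element is assumed to sit entirely on one line, tag names are compared case insensitively (one of the liberties mentioned above), and the group tags <student> and </student> are left for the caller to handle.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-cased copy, so <GPA> and </gpa> are treated as the same tag.
std::string lower_tag(std::string text)
{
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Hypothetical helper: parse one line of the form <tag>text</tag>.
// Returns false for anything else, including a lone <student> or </student>.
bool parse_element(const std::string& line, std::string& tag, std::string& text)
{
    const std::string::size_type npos = std::string::npos;

    std::string::size_type open_start = line.find('<');
    if (open_start == npos) { return false; }
    std::string::size_type open_end = line.find('>', open_start);
    if (open_end == npos) { return false; }
    std::string::size_type close_start = line.find("</", open_end + 1);
    if (close_start == npos) { return false; }
    std::string::size_type close_end = line.find('>', close_start);
    if (close_end == npos) { return false; }

    std::string open_name =
        lower_tag(line.substr(open_start + 1, open_end - open_start - 1));
    std::string close_name =
        lower_tag(line.substr(close_start + 2, close_end - close_start - 2));
    if (open_name.empty() || open_name != close_name)
    {
        return false;               // mismatched tags such as <name>x</id>
    }
    tag = open_name;
    text = line.substr(open_end + 1, close_start - open_end - 1);
    return true;
}
```

Here the closing </student> tag marks the end of a group explicitly, so the reader no longer has to infer group boundaries from the labels themselves.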