What is good programming style? It consists of several key elements, but chief among them — an underlying rule — is to always be consistent.
Or, if you aren't so new to programming, have a look at this comic and then continue below.
Since you are new to programming, you may wish to experiment a bit to find your true style. That's fine. But please maintain consistency within a single program. And once you find your style, you can use it consistently thereafter. *smile*
Beyond that, please follow the guidelines below to the best of your ability. (Don't worry, I'll be there with my magically-colorful marking pens to guide you when you stray. *smile*) (But if you persist in the infractional behavior, I'll be forced to take points off your portfolio score, eventually. *sigh*)
The different elements that make up good style (in any language, really) are:
Spacing | Identifiers |
Structure/Organization | Clarity |
Brevity | Special |
Programmers prefer to keep things brief so that there is less clutter and redundancy to deal with.
Less is more.
This is part of the KIS principle: Keep It Simple.
And, always remember that, "The less I type, the fewer mistakes I can make."
#include only as necessary
Only #include a library's header if you are going to use items from that library (functions, constants, type definitions, etc.) in the current file. Don't say you are using a library if you don't use it there.
One of the most common places I'll see this particular offense (after the beginning of the semester when you just want to #include every library we've talked about) is when you are creating your own libraries. If your library's implementation uses items from another library, it should #include that other library's header itself. In particular, don't #include a library header in your library's interface file so that the implementation file can use it! Only #include a library in your interface file if you need to use that library's facilities in the interface file...
#include'ing many headers (often all the headers you know seem to appear up there!) shows that you don't really understand how or why the libraries' features work. It can also slow down your compilation a great deal. It might do you well to place a comment with each #include that lists exactly which features you are using from that library. This will keep you better in mind of why/how you are using libraries.
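Here is a minimal sketch of that commenting habit. The two helper functions (hypotenuse and longer_leg) are made up purely for illustration; the point is that each #include names exactly what the file takes from that library.

```cpp
#include <algorithm>  // max
#include <cmath>      // sqrt

// Length of the hypotenuse of a right triangle with the given legs.
// (hypothetical helper, for illustration only)
double hypotenuse(double leg_a, double leg_b)
{
    return std::sqrt(leg_a * leg_a + leg_b * leg_b);
}

// Length of the longer of the two legs.
// (hypothetical helper, for illustration only)
double longer_leg(double leg_a, double leg_b)
{
    return std::max(leg_a, leg_b);
}
```

If longer_leg were ever deleted, the comment beside the first #include would tell you at a glance that the header can go, too.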
When we can't make something brief, we try to make it as clear as possible. Sometimes we're even willing to forgo a little brevity if we can make something more clear.
place declarations at the top of each function
Even though C++ allows you to create constants/variables at any point in the program, conventional wisdom holds that it is much easier to read a program when variables don't continually pop up throughout the program. It also helps keep your thoughts clear and organized if you try to place all of your variables at the top of the function to begin with.
It's fine if you forget to declare some of your variables/constants (you're a newbie programmer, sa'right), just put them in later (when the compiler tells you to, most likely). But be sure to place them with all of the others at the top of the function.
This practice of grouping all your constant/variable declarations at the top of their function also avoids lots of 'undeclared' errors when you've placed a declaration inside a while loop, for instance, and then used the memory location in the loop condition or after the loop. (I've also seen students come up with such errors when declaring a memory location at line 10 of their code and then later deciding they need to use that variable in a calculation at line 8.)
But perhaps worst of all is when you get lots of 'redeclaration' errors because you keep declaring helper variables/constants each time you use them (maybe an input helper or a calculation helper). Placing all of your declarations at the top of each function can help you avoid such duplication before the compiler has to catch it for you. Or at least help you decide which duplicate you should remove.
Exception: The Loop Control Variable (LCV) of a for loop is becoming more popularly declared within the head of the loop — unless, of course, its ending value will be used after the loop is over. (An LCV that is declared in a loop's head exists only within that loop — just like formal arguments declared within a function's head exist only inside that function.)
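As a rough sketch of the layout described above (the function name and its parameters are hypothetical), the working variables sit together at the top and only the LCV is declared in the loop head:

```cpp
// Average of the first `count` values in `scores`, or 0.0 when count is 0.
// (average_score is a hypothetical example function.)
double average_score(const double scores[], long count)
{
    // every variable the function needs, gathered in one place
    double total = 0.0;
    double average = 0.0;

    // the exception: this LCV is not needed once the loop is over
    for (long i = 0; i < count; ++i)
    {
        total += scores[i];
    }

    if (count > 0)
    {
        average = total / count;
    }

    return average;
}
```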
avoid magic numbers — use constants instead
Whenever you have some physical (acceleration due to gravity, Avogadro's number, etc.), mathematical (π, e, etc.), agreed/defined (inches per foot, ounces per pound), or economic (tax rates, interest rates, etc.) constant in a program, it is wise to make it an actual constant. Don't just use the literal value in calculations and don't make it a variable unless it is meant to change while the program is running (like a program that must update tax rates when legislation does, too).
You will often also have symbols used in a program (the letter R is a common abbreviation for Thursday, for instance) that would do better as constants (const char Thursday = 'R';).
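A small sketch of the idea follows. The tax rate is an invented value and both function names are hypothetical; only the inches-per-foot figure is a real-world fact.

```cpp
const double INCHES_PER_FOOT = 12.0;    // defined conversion factor
const double SALES_TAX_RATE = 0.0625;   // invented rate, for illustration

// Converts a length in feet to inches.
double feet_to_inches(double feet)
{
    return feet * INCHES_PER_FOOT;
}

// Price of an item once sales tax has been added.
double price_with_tax(double price)
{
    return price + price * SALES_TAX_RATE;
}
```

Should the rate change, only the one constant needs editing, and nobody has to wonder what a bare 0.0625 in the middle of a formula was supposed to mean.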
For a group of related integral constants, you can also use the enumeration language feature. For instance:
enum WeekDays { THURSDAY, FRIDAY, SATURDAY, SUNDAY, MONDAY, TUESDAY, WEDNESDAY };
would define that WeekDays was a data type with the seven specified constants as its only allowed values. (They are automatically initialized with THURSDAY being 0 and the rest being one greater than its predecessor such that WEDNESDAY would be 6.)
If you wanted the values to start other than at 0, just initialize the first one. All others will be automatically incremented by one from their predecessor:
enum MonthNums { January = 1, February, March, April, May, June, July, August, September, October, November, December };
So February here will have the value 2 and March will be 3 and so on...
You can even assign each one its own value, but that takes away some of the programmer-efficiency benefits:
enum MonthDays { Jan_days = 31, Feb_days = 28, Mar_days = 31, Apr_days = 30, May_days = 31, Jun_days = 30, Jul_days = 31, Aug_days = 31, Sep_days = 30, Oct_days = 31, Nov_days = 30, Dec_days = 31 };
At least you still have the data type (MonthDays) to encapsulate all the constants of your enumeration. *shrug*
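To see the automatic numbering at work, here is a brief sketch that reuses the MonthNums enumeration from above. The in_second_half function is a hypothetical addition, there only to show the enumeration being used as a data type.

```cpp
enum MonthNums
{
    January = 1, February, March, April, May, June,
    July, August, September, October, November, December
};

// True when `month` falls in the second half of the year.
// (in_second_half is a hypothetical example function.)
bool in_second_half(MonthNums month)
{
    return month >= July;
}
```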
comments
Comments are there to clarify unclear code. Well-written code will have a feature known as self-documentation. This means that just reading the code (without comments) should give you a pretty good idea of what is going on. However, even self-documented code needs a little push now and again. Traditional comments help to ensure the clarity of your code.
For example, this:
double x, y; // declare x and y variables
would be considered bad commenting (as well as poor identifiers).
This would be an improvement on both the comment's content and placement:
double x, y; // upper left coord of rectangle
And here we finally reach excellent content and placement (as well as fixing the identifiers such that we may not even need the comment so much any longer):
// upper left coordinate of rectangle
double up_left_x, up_left_y;
Comments should be placed to the right of or (even better) before the line(s) they comment. Comments that come after what they comment do less good at clarifying because the reader is already confused! Comments to the left are cluttery and obscure (not to mention tricky to put in). The more confusing your code, the more you should lean to a clear comment before the code. If the code is only slightly confusing, a note to the right might be appropriate.
Comments should relate the code to the real-world process/problem. They should NOT explain features of the C++ language. If someone is reading code to learn to program, they can either figure it out themselves or buy a book or take a course. Most people reading your code are already programmers and don't need to know that: double x, y; declares two large floating point variables. They'd be more interested to know that x and y represented the 2D coordinates of the upper left corner of a rectangle. (Although you should have probably then called them up_left_x and up_left_y, eh?)
Comments are most common in places such as the following:
Clarity really culminates in the use of clear identifiers. This is a paramount goal! If you name things well, in fact, you can avoid placing comments in some places, making your code/coding more brief.
use descriptive identifiers
An identifier is the name/word used to identify a variable, constant, data type, or function. This identifier should start with a letter (it can start with an underscore (_), but this practice is frowned upon as we try to reserve that for standard library names). The identifier can thereafter contain letters, underscores, or digits.
The case of your identifiers (in fact everything in C++) is significant! So make sure that you are consistent with not just your spelling of identifiers, but also their capITaliZaTioN.
The length limit on identifiers is something you'd not likely reach any day soon. (It is actually "an arbitrarily long sequence of [allowed characters]" in which "all characters are significant" according to section 2.10 of the C++98 standard.)
Make each identifier you choose representative of the real-world (or problem) meaning of the thing it identifies. Don't use names such as t when time is more clear (and time_of_departure is even clearer than that, but some might say it's a little much to type; *shrug*). Don't use names such as bob or Dmitri. What would these possibly mean, anyway?!
Often people name constants in all upper case, data types in mixed case, and variables and function names in lower case. Other people don't. It would be a good thing for you to pick a capitalization style and stick with it. (I will say, though, that the all-caps constants style is really handy for quickly spotting one in code.)
Eye strain is a major concern in the computing industry. *sigh* Making the code spaced out helps alleviate this eye strain considerably. (The mind likes a little gap in the view to ease its processing, I suppose... *shrug*) In fact, spacing helps within a program's interface as well — those who use lots of console interfaces have similar problems with eye strain.
In addition to its medical benefits, good spacing can help to enforce structure and organization (detailed below). Likewise, a logical interface is also easier for a user to follow.
blank lines
Blank lines within a program help to draw attention to logical steps (groups of statements) within the code. It is common to put a blank line before each grouped segment of statements (similarly to where one puts comments) — often in front of the comment for that group.
Blank lines within a program's output help to focus the user's attention on shifts in content: I was inputting data, now I'm printing output.
line wrap
Long lines work fine for the editor and compiler, but the printer really abhors them. It will most commonly chop them off around 75-80 characters. Try to keep your lines at that length (or less). Remember, a single statement can wrap around several lines. (See also indention for continued statements.)
This rule applies to lines your program outputs, too. Don't let them wrap at the user's screen boundary. Put in \n's and/or endl's to break long output lines appropriately.
indention (or is it "indentation"?)
It is important to indent to clarify the flow of the program. Whenever you open a curly brace ({), you should indent within it by one level. When you close a curly brace (}), you should unindent it (the close curly) by one level.
What should a 'level' be? Commonly a level is between 3 and 5 spaces. One or two spaces don't provide much of a clue (if the font is small) that the indention has shifted. (Although it is, oddly, enough to drive the OCD, or similarly affected, among us quite mad when regular indention is off by just one or two columns.)
The common Tab's 8 spaces is quite deep (considering that you may have multiple indents 'stacked up' and we only have 75-80 characters per line). (This also implies that you should avoid use of the Tab key for indention. I know it is convenient, but it will really screw up your program's style. If you are lucky, your editor may support automatic conversion of Tab key presses to a specified number of spaces. At least two of the ones in our lab do support this feature.)
The one exception to this curly brace rule is in a switch structure. The switch requires syntactic braces to enclose all of its cases. Indenting inside these braces is optional. However, whether you use bracing on your case blocks or not, you must indent inside of them.
So any of the following are valid indention schemes for a switch.
 | Optional Left Indention | No Extra Left Indention |
---|---|---|
break aligns with case | switch (common_expression) { case val1: case val2: // code to execute break; case val3: // code to execute break; default: // code to execute break; } | switch (common_expression) { case val1: case val2: // code to execute break; case val3: // code to execute break; default: // code to execute break; } |
break indented with block | switch (common_expression) { case val1: case val2: // code to execute break; case val3: // code to execute break; default: // code to execute break; } | switch (common_expression) { case val1: case val2: // code to execute break; case val3: // code to execute break; default: // code to execute break; } |
braces on case blocks; break inside | switch (common_expression) { case val1: case val2: { // code to execute break; } case val3: { // code to execute break; } default: { // code to execute break; } } | switch (common_expression) { case val1: case val2: { // code to execute break; } case val3: { // code to execute break; } default: { // code to execute break; } } |
braces on case blocks; break outside | switch (common_expression) { case val1: case val2: { // code to execute } break; case val3: { // code to execute } break; default: { // code to execute } break; } | switch (common_expression) { case val1: case val2: { // code to execute } break; case val3: { // code to execute } break; default: { // code to execute } break; } |
Note also how the break statement that is a necessary part of the switch structure's flow control may be indented with the case block or aligned with the case (like a close curly brace to its matching open). Some folks also like to place braces on case blocks. Some of them put the break inside the braces, others put it outside. I leave it up to your personal taste. (Although the last two in the right column have that curly-curly alignment that gives me a little shiver like someone walked over my grave...)
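For a concrete picture, here is a sketch of just one of those schemes: cases indented inside the switch's braces, with each break indented along with its case block. It assumes a 4-space level, and the is_vowel function is a hypothetical example.

```cpp
// True when `letter` is an English vowel (upper or lower case).
// (is_vowel is a hypothetical example function.)
bool is_vowel(char letter)
{
    bool vowel;

    switch (letter)
    {
        case 'a': case 'e': case 'i': case 'o': case 'u':
        case 'A': case 'E': case 'I': case 'O': case 'U':
            vowel = true;
            break;
        default:
            vowel = false;
            break;
    }

    return vowel;
}
```

Pulling each case back to line up with the word switch would give the "no extra left indention" flavor of the same scheme.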
Another place to indent is when you wrap a long statement across to a second (or further) line(s). Try to break the long line before an operator and if possible a whole expression. Particularly, try to keep parenthesized expressions together if at all possible. (If this is not possible, perhaps you'd like to indent the wrapped line(s) over to the opening parenthesis to visibly show its relationship?) For instance we could break the following line at the minus before second_y:
distance = sqrt( pow( first_x - second_x, 2.0 ) + pow( first_y - second_y, 2.0 ) );
To become:
distance = sqrt( pow( first_x - second_x, 2.0 ) + pow( first_y
                                                     - second_y, 2.0 ) );
Or, if we wanted it more appealing, we could break the entire pow expression to the second line:
distance = sqrt( pow( first_x - second_x, 2.0 )
                 + pow( first_y - second_y, 2.0 ) );
If a statement needs to break inside a literal string, you must terminate the string on the first line and then re-open it on the next line. This doesn't require an operator as the compiler will automatically concatenate adjacent literal strings that are separated only by whitespace in your source code. For example, the following:
cout << "Let's say this terribly long string needed to break to wrap the statement\nbecause just dropping down in front of the earlier operator wouldn't help as\nthe line would still be too long, wouldn't it?\n";
could become:
cout << "Let's say this terribly long string needed to break "
        "to wrap the statement\nbecause just dropping down in "
        "front of the earlier operator wouldn't help as\nthe "
        "line would still be too long, wouldn't it?\n";
Notice how the spacing is still included inside the double-quotes or else words might run into one another on-screen.
The second line should be indented at least to the first operator in the prior line. (Note how we lined up with parentheses on the second distance example and with the quotation mark in the string example. As you can see, then, other indention targets than the first operator can make sense, too — use your best judgement or ask your instructor.) There is a similar example of the wrapped enumeration definition elsewhere. It simply involves syntactic braces ({}) instead of parentheses ( () ) or quotes ("").
horizontal space
Even spacing horizontally along a line can be important. Try to place space around operators. Also place space after prompts before the user types their response.
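A tiny sketch of both habits follows; the prompt text and the midpoint function are hypothetical, chosen only to show the spacing.

```cpp
#include <string>     // string

// the trailing space keeps the user's typing from touching the colon
const std::string WIDTH_PROMPT = "Enter the width in inches: ";

// Midpoint of two values, with space around every binary operator.
// (midpoint is a hypothetical example function.)
double midpoint(double low, double high)
{
    return (low + high) / 2.0;
}
```

Printing WIDTH_PROMPT just before reading the user's answer leaves the cursor one space past the colon. Compare the return line with the cramped (low+high)/2.0 to see what the operator spacing buys you.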
Organizing the logical structure of a program can greatly enhance its readability, clarity, and even its ability to grow with future needs — its enhance-ability, if you will.
use braces on all structures
Even though braces ({}) are optional on the flow control structures when only one statement's flow is being controlled, it is a good idea to always use them. It makes all of your structures more consistent, neater, and easier to read. A half-braced structure simply looks lame.
This practice will also save you hassles when you need to add a statement to a branch or loop that wasn't previously braced. A common mistake in this situation is to forget to add the braces on the structure — even though they are now needed as you've added a second (or further) statement(s). If this extra statement is before an else, the compiler will catch it. In all other situations, however, you'd have to notice it as a (fairly) subtle logic error. Having had braces on all structures from the beginning would have avoided this common mistake.
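To make that mistake concrete, here is a sketch with a hypothetical withdraw function. The unbraced, broken form is shown only inside a comment; the live code is the braced form, where adding the second statement could not go wrong.

```cpp
// The mistake described above, as it would look after a second statement
// was added to an unbraced if:
//
//     if (amount <= balance)
//         balance -= amount;
//         ++withdrawal_count;   // NOT controlled by the if; always runs
//
// Braced from the start, the added statement lands inside the branch.
// (withdraw and its parameters are hypothetical.)
void withdraw(double amount, double& balance, long& withdrawal_count)
{
    if (amount <= balance)
    {
        balance -= amount;
        ++withdrawal_count;
    }

    return;
}
```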
The one exception to this rule is in the case of cascaded if structures. In this situation we consciously relinquish the else's braces to avoid excessive indention levels and produce a cleaner structure overall.
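A sketch of such a cascade is below. The grade cutoffs and the letter_grade function are invented for illustration. Each else is followed directly by the next if, without braces of its own, so the whole structure stays at one indention level.

```cpp
// invented cutoffs, for illustration only
const double A_CUTOFF = 90.0;
const double B_CUTOFF = 80.0;
const double C_CUTOFF = 70.0;

// Letter grade for a percentage score.
// (letter_grade is a hypothetical example function.)
char letter_grade(double percent)
{
    char grade;

    if (percent >= A_CUTOFF)
    {
        grade = 'A';
    }
    else if (percent >= B_CUTOFF)
    {
        grade = 'B';
    }
    else if (percent >= C_CUTOFF)
    {
        grade = 'C';
    }
    else
    {
        grade = 'F';
    }

    return grade;
}
```

Bracing every else would push each later test one level further to the right, which is exactly the clutter being avoided.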
branches should connect
Never place exit calls, returns, or other such code inside a branch. Branches were meant to connect together — they entered at the top and should exit all together at the bottom to continue on with whatever comes afterward. Look at the flow-chart form of the branching structures and note how the arrow enters at the first diamond and all of the flowing arrows reconnect at the end before the program can continue on.
(The single exception here is the use of break in a switch. This is necessary as the switch cannot work correctly without it. That is, it would still have a single 'exit' point, but the 'branches' wouldn't really branch...remember that 'fall-thru' feature?)
a loop is a circle with a (single) tail
Loops should not have multiple exit points (just as branches shouldn't). A loop should continue to flow in a circle until the (continuation) condition fails. (At which point the program will continue executing with the next statement after the loop.)
functions should exit through a single point
A function has a single entry point: the call (which logically connects to its head/open curly brace). It should likewise have a single exit point: the return (which both logically and in practice connects back to the caller — just 'after' their call to the function).
This implies that each function should have a return statement (even functions with a void return type) — and only one return statement.
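As a sketch of what this looks like in practice (both functions are hypothetical), the branch stores its answer in a variable and the lone return sits at the bottom, even in the void function:

```cpp
// Larger of two values, leaving through a single return.
// (larger is a hypothetical example function.)
double larger(double first, double second)
{
    double result = first;

    if (second > first)
    {
        result = second;
    }

    return result;
}

// Sets `value` back to zero; even a void function ends with its one return.
// (reset is a hypothetical example function.)
void reset(double& value)
{
    value = 0.0;

    return;
}
```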
avoid break, continue, and goto
The break statement should only be used in a switch structure. Nowhere else is it required. If you want a loop to end 'early' in some situation(s), design it as an indefinite loop (a while or do) and && in extra condition(s), the logical opposite(s) of the if's around your break, to represent these early withdrawals. (If you have trouble compounding these new conditions to the original, step back to "when do I want to stop?" and apply De Morgan's Laws.)
If you used the break to skip over part of the loop's body, then things get a bit trickier, but it is still do-able. Proceed as above, but append an if after your loop ends re-checking your original loop termination condition. Inside this if, place the statement(s) which formerly preceded the break's if. Perhaps an example:
for (I; C; U)
{
    A
    if (C2)
    {
        break;
    }
    B
}
Would be transformed into:
I
t = false;   // t is a temporary bool variable
while (C && !t)
{
    A
    t = C2;
    if (!t)
    {
        B
        U
    }
}
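Here is one concrete, runnable instance of that transformed shape. It is a sketch only: the function and its names are hypothetical, and the trailing comments tie each line back to the letters in the pattern. (A happens to be empty in this instance.)

```cpp
// Sum of the values in `values` that come before the first negative
// entry (or of all `count` values when none is negative).
// (sum_before_negative is a hypothetical example function.)
double sum_before_negative(const double values[], long count)
{
    double total = 0.0;
    long index = 0;                        // I
    bool stop = false;                     // t

    while (index < count && !stop)         // C && !t
    {
        stop = values[index] < 0.0;        // t = C2
        if (!stop)
        {
            total += values[index];        // B
            ++index;                       // U
        }
    }

    return total;
}
```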
The continue statement can similarly be avoided by proper use of structured programming. In its most common form, programmers try to apply a continue in an if in the midst of a surrounding looping structure. By simply negating the if condition and extending the if's braces — you did remember to brace your if, didn't you? — around the remainder of the loop body, the continue is avoided. For example:
while (C)
{
    A
    if (C2)
    {
        continue;
    }
    B
}
Would be transformed into:
while (C)
{
    A
    if (!C2)
    {
        B
    }
}
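A concrete sketch of the continue-free form follows, using a hypothetical function that adds up only the non-negative values in an array. With a continue, the test would have been value < 0.0; here it is negated and wrapped around the rest of the body.

```cpp
// Sum of the non-negative entries among the first `count` values.
// (sum_non_negative is a hypothetical example function.)
double sum_non_negative(const double values[], long count)
{
    double total = 0.0;
    double value = 0.0;
    long index = 0;

    while (index < count)                  // C
    {
        value = values[index];             // A
        ++index;                           // A (the loop's update)
        if (value >= 0.0)                  // !C2
        {
            total += value;                // B
        }
    }

    return total;
}
```

Note that the update belongs to A here: it has to happen on every pass, whether or not the rest of the body is skipped.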
Finally, the goto keyword is simply there because our ancestor — poor, old C — had it. (Although it can be used to make code more efficient, the result is often MUCH less readable.)
In fact, the goto statement was the primary target of the original structured programming movement. (Its only true place of sense is in Assembly language programming ...and I'm sure even that has been disputed from time to time.)
Every language may also have its own special considerations. (In fact, you'll find that each programming community can develop its own special style guidelines to a point. *shrug*) Here is the principal one I've found for C++.
avoid the int type
The data type int should only be used when connecting to Operating System (OS) or standard library features. The problem is that the type int can change size when migrating a program amongst various OS's. (The hardware is a strong force in the decision of how large an int should be — the system WORD size. But it is the OS that makes the final call — as the OS is our liaison to the hardware, we must follow its decision.)
Therefore, it should be avoided for any application bound data. Only use int when interfacing with the OS (for instance the return type on main) or legacy/poorly designed/implemented library interfaces (toupper(), rand(), or .ignore(n,c), for instance). When designing your own program's variables and/or constants, always choose short for small integers or long for large integers.
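If you are curious what your own system decided, a sketch like the following records the width of each type in bits. The constant names are hypothetical. Bear in mind that the C++ standard only promises minimums (at least 16 bits for short and int, at least 32 for long), so the actual numbers depend on the compiler and system doing the building.

```cpp
#include <climits>    // CHAR_BIT

// Widths, in bits, of three integer types on whatever system compiles
// this. (The constant names are hypothetical.)
const long SHORT_BITS = sizeof(short) * CHAR_BIT;
const long INT_BITS = sizeof(int) * CHAR_BIT;
const long LONG_BITS = sizeof(long) * CHAR_BIT;
```

Printing these three on a couple of different lab machines is a quick way to see the migration issue for yourself.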