When discussing portions of a number, we (especially computer scientists) use the terms 'least significant' digits and 'most significant' digits. This says nothing about the importance of those digits. It only describes how much the entire number would change if you changed that digit. For instance, in the number 482, changing the 2 into a 3 has very little effect on the number itself. However, changing the 4 to a 5 increases it by 100 -- a much more significant effect.
When you need to extract certain digits, you can use integer division and modulo by powers of 10 (because we do our numbers in base 10). To keep the least significant 4 digits of a 7-digit number, for instance, you'd do:
least_sig_4 = big_num % 10000;
The reason this works is thus:
            123 R 4567
        _______
10000 / 1234567
        10000
        ------
         23456
         20000
         -----
          34567
          30000
          -----
           4567
Or, you might see it like this:
1234567 / 10000 = 123.4567
but with integer division this is:
1234567 / 10000 = 123 R 4567 (over 10000)
From this same process, we see that to throw away those 4 least significant digits, we'd use plain integer division:
most_sig_3 = big_num / 10000;
Recall that integer division keeps the quotient and discards the remainder, whereas modulo keeps the remainder and discards the quotient.
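As a quick check of the two operations above, here is a minimal C++ sketch. The function names keep_low_4 and drop_low_4 are made up for illustration; they only wrap the % 10000 and / 10000 expressions from the text.

```cpp
// Hypothetical helpers wrapping the two expressions from the text.

// Modulo keeps the 4 least significant digits.
constexpr long keep_low_4(long big_num) { return big_num % 10000; }

// Integer division throws those 4 digits away.
constexpr long drop_low_4(long big_num) { return big_num / 10000; }

// Checked at compile time with the 7-digit example value.
static_assert(keep_low_4(1234567) == 4567, "least significant 4 digits");
static_assert(drop_low_4(1234567) == 123, "most significant 3 digits");
```

Because the checks are static_assert, a compiler that accepts this file has already confirmed both results for 1234567.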
You can also generalize this to other digit groups. Let's say that a company's product code is an 8-digit number with 3 segments, like so: 12-345-678. We could store this value as a single long integer (which can store up to 10 digits -- although only about 20% of complete 10-digit numbers). This would require only 4 bytes (32 bits) of memory or disk space. Storing the 3 parts separately would require anywhere from 6 bytes to 10 bytes (storing as 3 short integers or as 10 characters -- with the dashes).
To break it apart for printing, we could do this:
// get 8-digit long integer from memory or disk storage
first_pair = whole_code / 1000000;
last_triple = whole_code % 1000;
middle_group = whole_code % 1000000 / 1000;
Note how the first and last groups were simple integer division or modulo. The middle group is a combination of the two. It first keeps the least significant 6 digits (1e6 as divisor) and then throws away 3 least significant digits (1e3 as divisor). That leaves it with just the 3 most significant of the 6 least significant -- or the middle 3 digits of the overall number!
    12345678
    12345678 % 1000000    keep only least sig 6 digits (1e6 divisor)
      345678 / 1000       throw away least sig 3 digits (1e3 divisor)
         345
This isn't the only possible way, though. When you have more than two digit groups to extract, there is generally more than one way to perform the extraction. This one could also have been done as:
middle_group = whole_code / 1000 % 1000;
Here we first throw away the 3 least significant digits before keeping only the 3 least significant digits of what remains.
    12345678
    12345678 / 1000    throw away least sig 3 digits (1e3 divisor)
       12345 % 1000    keep only least sig 3 digits (1e3 divisor)
         345
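To see that both orderings give the same middle group, here is a small C++ sketch. The struct CodeParts and the function names are hypothetical, chosen only for this illustration; the arithmetic is the same as in the text.

```cpp
// Hypothetical container for the three groups of a code like 12-345-678.
struct CodeParts {
    long first_pair;
    long middle_group;
    long last_triple;
};

// Middle group, first way: keep 6 digits, then throw away 3.
long middle_keep_then_drop(long whole_code) {
    return whole_code % 1000000 / 1000;
}

// Middle group, second way: throw away 3 digits, then keep 3.
long middle_drop_then_keep(long whole_code) {
    return whole_code / 1000 % 1000;
}

// Break an 8-digit code into its three groups.
CodeParts split_code(long whole_code) {
    CodeParts parts;
    parts.first_pair = whole_code / 1000000;
    parts.middle_group = middle_drop_then_keep(whole_code);
    parts.last_triple = whole_code % 1000;
    return parts;
}
```

Both forms rely on % and / having equal precedence and grouping left to right, so no parentheses are needed.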
So, in order to keep the n least significant digits of a base 10 number, use modulo with a divisor of 10^n (as an integer!).
To throw away the n least significant digits of a base 10 number, use integer division with a divisor of 10^n (as an integer!).
(If you are dealing with a different base, use the nth power of your base as the divisor. 'keep' is still modulo and 'throw away' is still integer division.)
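The general rule could be sketched in C++ roughly as follows. The helpers int_pow, keep_digits and drop_digits are assumptions made up for this example, not part of any library. The power is built by repeated integer multiplication so the divisor stays an integer, and there is no overflow checking.

```cpp
// Hypothetical helper: base raised to the n-th power using only integers.
// Assumes n >= 0 and that the result fits in a long.
long int_pow(long base, int n) {
    long result = 1;
    for (int i = 0; i < n; ++i) {
        result *= base;
    }
    return result;
}

// 'keep' the n least significant digits: modulo by base^n.
long keep_digits(long value, int n, long base = 10) {
    return value % int_pow(base, n);
}

// 'throw away' the n least significant digits: integer division by base^n.
long drop_digits(long value, int n, long base = 10) {
    return value / int_pow(base, n);
}
```

For example, keep_digits(1234567, 4) yields 4567 and drop_digits(1234567, 4) yields 123, while keep_digits(0xABCD, 2, 16) yields 0xCD in base 16.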
Recalling that the integer types have these properties:
Type | Max (signed) | Max (unsigned) |
---|---|---|
short | 32767 | 65535 |
long | 2147483647 | 4294967295 |
We see that a short integer can store any 4-digit number completely. It can also store about 30% of the 5-digit numbers when signed or a little over 60% of them when unsigned. A long integer can store any 9-digit number. When signed, it can hold about 20% of the 10-digit numbers as compared to 40% when unsigned.
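Rather than memorizing the limits, a program can ask the compiler for them. The sketch below uses std::numeric_limits from the standard <limits> header; the function names fits_in_short and fits_in_unsigned_short are hypothetical, and the comments about specific values assume a 16-bit short as in the table above.

```cpp
#include <limits>

// True when value can be stored in a signed short without loss.
bool fits_in_short(long value) {
    return value >= std::numeric_limits<short>::min()
        && value <= std::numeric_limits<short>::max();
}

// True when value can be stored in an unsigned short without loss.
bool fits_in_unsigned_short(long value) {
    return value >= 0
        && value <= std::numeric_limits<unsigned short>::max();
}
```

With a 16-bit short, any 4-digit number passes fits_in_short, while 40000 fails it but still passes fits_in_unsigned_short.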
Recalling that the discrete types have these properties:
Type | Bits | Bytes |
---|---|---|
bool | 32 | 4 |
char | 8 | 1 |
short | 16 | 2 |
long | 32 | 4 |
We see that storing a multi-grouped number takes quite a bit of storage if done as individual character digits -- one byte for each digit plus one for each group separator.
Storing as one short integer per group is slightly better: 2 bytes for each group. But storing as a single long integer is best: 4 bytes and that's it.
Let's look at a couple of examples:
Digit Sequence | Stored as chars (bytes) | Stored as shorts (bytes) | Stored as a long (bytes) |
---|---|---|---|
123-4567 | 8 | 4 | 4 |
12-345-678 | 10 | 6 | 4 |
123-456-789 | 11 | 6 | 4 |
123-456789 | 10 | N/A | 4 |
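The chars and shorts columns follow directly from the counting rules in the text, which can be written down as a short C++ sketch. The function names are invented for this example, and the 2-bytes-per-short figure is the assumption taken from the size table, not something guaranteed on every compiler.

```cpp
// Hypothetical helpers reproducing the byte counts in the table.

// One byte per digit plus one per separator (groups - 1 dashes).
int bytes_as_chars(int digits, int groups) {
    return digits + (groups - 1);
}

// One short per group, assuming a short occupies 2 bytes.
int bytes_as_shorts(int groups) {
    return groups * 2;
}
```

For 12-345-678 (8 digits in 3 groups) these give 10 and 6 bytes, matching the second row of the table.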
All of this assumes that your numbers (or even groups) can fit as a single whole number, of course. What if it can't? (Note how the last example can't be stored as a pair of short integers.) We can still minimize it a bit:
Digit Sequence | Stored as | Bytes |
---|---|---|
1234-567890 | short+long | 2+4=6 |
123-456-7890 | 3*short | 3*2=6 |
123-456-7890 | or long+short | 4+2=6 |
The first two are fairly self-explanatory, but what about that last one?! Two groups are combined into a single long integer. Hey...come to think of it, we were doing even three groups combined into a single long integer above. We've seen how to break a long integer into smaller groups, but how do we combine these groups for storage?
To combine the 123-456 into a single long integer for storage, simply multiply the first group by 1e3 (3 being the number of digits in the second group) and add the second group:
123 * 1000 + 456 // gives 123456..?
Be careful, though. When the compiler sees exactly this, it may not give the correct result. Instead you could get -7616. (Well, it depends on the compiler, really: this happens when plain integer literals are only 16 bits wide, as they traditionally were.) Why? When you multiply the 123 (short-sized) by 1000, you'd like to get 123000. Instead you get -8072 (123000 crammed into 16 bits). To avoid this possibility and make sure it will work, simply make your 1000 a long integer literal:
123 * 1000L + 456 // gives 123456
or constant:
const long dig_shift_3 = 1000;
123 * dig_shift_3 + 456 // gives 123456
(Note: the L after the literal 1000 makes it a Long integer. You are allowed to use lower case l, but this is MUCH harder to read and is severely frowned upon!)
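Putting the pieces together, combining groups for storage might look like the following C++ sketch. The function names combine_pair and combine_code are hypothetical; dig_shift_3 is the constant from the text. In both functions the multiplication is done in long arithmetic, which is what avoids the overflow described above.

```cpp
const long dig_shift_3 = 1000;  // shifts a value left by 3 decimal digits

// Combine two groups such as 123-456 into one long integer.
// The long literal 1000L forces the multiplication to be done as long.
long combine_pair(short high_group, short low_group) {
    return high_group * 1000L + low_group;
}

// Combine three groups such as 12-345-678 into one long integer.
// Each step shifts what has been built so far and adds the next group.
long combine_code(long first_pair, long middle_group, long last_triple) {
    return (first_pair * dig_shift_3 + middle_group) * dig_shift_3
           + last_triple;
}
```

Here combine_pair(123, 456) gives 123456 and combine_code(12, 345, 678) gives 12345678, which the division and modulo extractions can then take apart again.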