Variables and types
Definition
In programming, a variable is a kind of labeled container that holds information. We use variables to store data values and to refer to them later by name.
Technically a variable is a named location in memory. It is the basic unit of storage in a program.
Advantages
Using variables (instead of raw numbers or text) has several advantages:
- the value stored can be changed during program execution
- they provide a way of labeling data with a descriptive name, giving it a meaning
- all the operations done on the variable affect that memory location
- the value can change but the name does not, so we know that we can always refer to it without worrying about its value.
Imagine that you want to write a program that sums two numbers.
Those numbers can be 3 and 2, so you write:
cout << 3+2; // Output: 5
or 13 and 57:
cout << 13+57; // Output: 70
But if we want to cover all possible cases of numbers, we can't do it that way. We use variables instead:
// ... linking section and main()
int a;
int b;
cin >> a;
cin >> b;
cout << a+b;
Then we can enter any numbers we want and the program will change the result according to the input.
The power of variables is that they are generic and don't represent just one magic number, so they can be used in various cases with different values/data each time.
So when used in a program, that program acquires the possibility to cover multiple cases instead of just one.
Variables are usually declared at the top of the body of the main() function. You can't work with a variable if its declaration comes after the point where you use it.
Variable declaration
To declare (or create) a variable we need to specify its type and its name. We can also assign it a value at the same time.
Data types
C++ has keywords that give data one of the primitive (built-in) data types, and most of them are shared with C.
They are:
TYPE | DESCRIPTION |
---|---|
int | stores integers (whole numbers), without decimals, such as 5 or -5 |
bool | true/false or 1/0. It can assume only these two values |
float | stores a single-precision floating point value, or decimal value |
double | stores a double-precision floating point value, or decimal value. It's more precise than float |
char | stores single characters, such as 'a' or 'B'. Char values are surrounded by single quotes |
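string | stores text, such as "text". String values are surrounded by double quotes. It is not a primitive type: it is provided by the C++ Standard Library as std::string in the <string> header |
wchar_t | stores a wide character. It's like the char type but larger in size, so it can represent characters from bigger sets such as Unicode |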
void | represents a valueless entity. It means "nothing" or "no type" |
Don't pay too much attention to the last two types (wchar_t and void) because for now they are not important. Also keep in mind that numeric types have a certain range/size and don't go to infinity. For example, the int type typically goes from -2147483648 to 2147483647. These limits are available through std::numeric_limits in the <limits> header and through the INT_MIN and INT_MAX macros in the <climits> header of the C++ Standard Library. Neither header is included by default, so you have to #include the one you use. We will learn more about INT_MIN and INT_MAX in a future explanation.
Why does an int typically go from -2147483648 to 2147483647? The reason lies in the size of an integer value, which on most current platforms is 4 bytes, or 32 bits. The first bit is used for the sign, so it can't contribute to the magnitude of the number. The largest magnitude that fits in the remaining bits can be calculated by raising the number of states that a bit can assume (0 or 1, so 2 states in total) to the number of bits available (32 - 1). Therefore, using the formula range = possibleStates ^ numberOfBits:
range = 2 ^ (32 - 1) = 2 ^ 31 = 2147483648
Since 0 takes up one of the non-negative combinations, numbers can go from -2147483648 to 2147483648 - 1. These are all the possible combinations with 31 bits available plus one for the sign, where a sign bit of 0 marks non-negative numbers and 1 marks negative ones. This means that you can't store values above or below that range in a regular (signed) integer type.
If we need greater numbers and we don't use negative values we can also use the keyword unsigned before int to extend the range up to 2^32 - 1 (4,294,967,295) without using the first bit for the sign, although using unsigned numbers is generally not recommended. A better way is to use the long keyword (or long long, which is guaranteed to be at least 64 bits wide).1
Attempting to calculate a number that is beyond the range of a variable's type is known as an overflow. The C++ standard leaves the result of a signed integer overflow undefined, while unsigned integers wrap around (modulo 2^n, where n is the number of bits).
On platforms that follow IEEE 754 (most current ones), a floating-point overflow usually produces a special infinity value (inf) instead of crashing the program.
A signed integer overflow is arguably worse, because the program usually keeps running with an incorrect result and no complaint.
Auto and Decltype
With the C++11 standard or later, you can also use the auto keyword in place of a type. It leaves the type deduction to the compiler itself. All variables declared as auto must be initialized with some value, because the type is deduced from that value.
Together with auto, we can use the decltype specifier, a keyword that is also used to specify a type, but works differently: you give it an expression (for example a variable name) inside the round brackets, and it yields the type of that expression. You can read more here.
See this program:
#include <iostream>
using namespace std;
int main() {
    int x = 100;
    float y = 199.00;
    decltype(x) z = y/x; // x is an integer, so z is an integer too
    cout << z << endl;
    return 0;
}
What decltype does is recognize the type of the variable passed to it and stand in for that type. In the above case, x is the name of the variable passed and its type is int, so the whole expression decltype(x) becomes the equivalent of int, the result of y/x is truncated, and the output is:
1
But if we write decltype(y), the output we get is:
1.99
because now the type that decltype yields is the same as the type of y, which is float.
First, decltype isn't used a lot, and we'll never use it in our programs. Second, this functionality was added in C++11, so it won't work out of the box in DevC++ or other old compilers. However, there's a way to fix this in DevC++, since it supports the C++11 standard, just not by default.
Syntax
To create a variable this syntax is used:
type variableName = value;
so for example:
int num = 5;
The name that we give to a variable is called the "identifier". In the examples above, the identifiers are variableName and num respectively.
A valid identifier is a sequence of one or more letters, digits and underscore characters _. Neither spaces nor punctuation marks or other symbols can be part of an identifier. In addition, an identifier can never begin with a digit: it has to begin with a letter or an underscore. Names that begin with an underscore are best avoided, because many of them are reserved for the compiler and the standard library.2
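Also, they cannot match any reserved keyword of the C++ language. The original keywords are listed below; C++11 and later standards add more, such as decltype, nullptr and constexpr: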
asm, auto, bool, break, case, catch, char, class, const, const_cast,
continue, default, delete, do, double, dynamic_cast, else, enum,
explicit, export, extern, false, float, for, friend, goto, if,
inline, int, long, mutable, namespace, new, operator, private,
protected, public, register, reinterpret_cast, return, short, signed,
sizeof, static, static_cast, struct, switch, template, this, throw,
true, try, typedef, typeid, typename, union, unsigned, using,
virtual, void, volatile, wchar_t, while
C++ and C languages are "case sensitive". That means that an identifier written in capital letters is completely different from another one with the same name but written in small letters.
So, for example, the num variable is not the same as the NUM variable or the Num variable. These are three different identifiers identifying three different variables.
To give a variable (or an identifier in general) a name that is made of multiple words, programmers usually use the lowerCamelCase naming technique, where every new word after the first one starts with a capital letter. It's just a convention, so you don't have to follow it, but it helps separate the words when it's not possible to use spaces.
As an alternative to that convention, there is also UpperCamelCase, also known as PascalCase. There is also snake_case, which uses the underscore _ to separate words, so that the name looks a bit like a snake.
But why camels? The name derives from the "jumps" within a word, which bring to mind the camel's humps.
Assignment
The equal sign = is used to assign values to a variable FROM RIGHT TO LEFT.
Remember that it is not the variable num that gives its value to 5, but 5 that is being stored inside num!
Multiple variables of the same type can also be declared in a single line, separating each one with a comma:
int x = 5, y = 0, z = -31;
float a = 0.5, b = 47.999;
You can't declare variables of different types in a single statement like that:
int x = 5, float a = 0.5; // error: this does not compile
Initialization
Declare vs Define vs Initialize
When we declare a variable, we introduce its name and type before its first use. When the variable is also assigned a memory location, we have defined it. Giving the variable its first value at the moment it is created is called initialization.
Unexpected behaviors
Usually it's better to initialize a variable to a known value, to avoid unexpected behaviors.
int x; // you don't know what value x has
int x = 0; // preferred way
Unlike some programming languages, C/C++ does not initialize most variables to a given value (such as zero) automatically. Thus when a variable is assigned a memory location by the compiler, the default value of that variable is whatever (garbage) value happens to already be in that memory location. A variable that has not been given a known value (usually through initialization or assignment) is called an uninitialized variable.3
Let's say for example that you want to create a counter called x that starts at 0. If you declare that variable without specifying its value, it will hold an indeterminate (garbage) value, which is not necessarily 0. To avoid that error we can write int x = 0; to "zero-initialize" that variable.
Since C++11 there is also another method to avoid that problem and create a variable that starts at zero:
int x{};
Writing int x{}; is called value initialization (you can read more about this here). However, int x{}; is only supported from C++11, while int x = 0; (copy initialization) has no such restriction. That said, the latter solution is probably clearer and it has the benefit that it works with older C++ standards.