Data
One of the most important concepts we will cover in this course is the idea of data types. When we provide instructions to a computer, those instructions typically operate on some data. That data is stored in memory, and everything in memory is stored in binary: essentially 0's and 1's. Data types help us interpret those 0's and 1's and define what we can and cannot do with a value.
Bits and Bytes
The most basic unit of storage in a computer's memory is a bit. A single bit can store only a 0 or a 1, that is, two distinct states. On its own a bit holds very little information, but if you put multiple bits together you can store a lot more.
With two bits you can form the following unique sequences of bits:
00, 01, 10, 11
So, with a single bit, you can store 2 different values. However, with 2 bits you can store 4 different values.
If you have 3 bits you can store the following sequences:
000, 001, 010, 011, 100, 101, 110, 111
This would be 8 different values. With each additional bit, you double the number of possible values, so n bits can store 2^n different values.
In 8 bits, you can store 256 different values. In 32 bits, you can store over 4 billion different values.
A byte is 8 bits of storage. Thus, 8 bits is 1 byte. 32 bits is 4 bytes. While the smallest unit of storage is a single bit, the smallest piece of memory that we can directly refer to is a byte.
Data Abstraction
A computer stores every piece of information it operates on as binary values (0's and 1's). Everything you see on your computer is actually just a number. The colour of a pixel, for example, can be represented as something called an RGB value. This is made of 3 numbers, each of which ranges from 0 to 255. These numbers correspond to the amount of red, green and blue light in that pixel. Black is 0,0,0 (all lights off) and white is 255,255,255 (all lights on). The words that you are reading are sequences of encoded values representing the characters and symbols that you see. The character b, for example, is the number 98 in the ASCII encoding. The computer stores 98 in memory, but when we display the information, we print the letter b. This idea of data abstraction is really important when we look at how programs work.
Since all data is simply a sequence of bits, we need to know how to interpret the bits. In C, we do this interpretation by using data types.
By defining our data as a certain type, we are defining two things:
- the amount of memory we use for storing that data (how many bytes)
- how to interpret the bits at that memory location
Variables
When we want to work with data in our programs, we need to define a place in memory to store that data. We do that using something called variables.
In C, to declare a variable we use the following syntax:
typename variablename;
Example:
int x;
The typename is the name of the datatype. The following are the basic data types that we will use this semester. There are other data types but we will be sticking with these ones:
- char - 1 byte of storage typically used to hold the encoding of a character
- int - 4 bytes of storage typically used to hold a whole number.
- float - 4 bytes of storage meant to hold a floating point number. This is a number that has decimal places.
- double - 8 bytes of storage meant to hold a floating point number. The difference between a double and a float is that doubles can store floating point values more accurately.
Initialization
When you declare a variable, you will often want to initialize the variable with an expression. This ensures that the variable starts with a known value. The syntax to do this uses the assignment operator:
typename variable1 = expression;
Here are some sample usages:
int a = 5;
double b = 1.2 + 3.5;
Styling Note
It is possible to declare 2 or more variables of the same type on one line by separating the variable names with commas. However, it is not good styling practice.
It makes your code far less readable, and there are some quirky things that happen in some situations that can lead to errors. It is best to avoid this when you write your code.
| declaring variables | Good/Bad Styling |
|---|---|
| `int age;`<br>`int height;` | ✅ Good! |
| `int age, height;` | ❌ Bad! |
Although C's syntax allows declaring multiple variables in one statement, declaring each variable on a separate line makes your programs easier to read.
chars and ints
Both the char and int data types essentially store whole numbers (numbers without decimal places). The difference between them has to do with the amount of storage reserved. A char is only 1 byte, so it has only 8 bits of storage. The number of different values that you can store in 8 bits is 256. An int, on the other hand, typically uses 4 bytes of memory and so stores its number in 32 bits. A 32 bit value can represent 4,294,967,296 different numbers.
The exact amount of memory for each type (4 bytes for int, for example) is not fixed by the language. It depends on the system and compiler and has changed over time. Some computers from the 1990's had integers that used only 2 bytes of memory (16 bits). On such a system an int could hold only 65,536 different values.
Floating Points
The bits of a floating point variable are split into two main pieces, the exponent and the mantissa. The mantissa forms the significant digits of the value while the exponent determines the number of 0's before/after the mantissa. Essentially we are storing numbers in the computer equivalent of scientific notation.
For example, take the number 123.45. In scientific notation, this number is $1.2345 \times 10^{2}$. The 1.2345 is the mantissa, and the 2 is the exponent we are raising the 10 to. Thus, a float will use some of its bits for the mantissa and some for the exponent. Doing this allows us to store very large and very small numbers, because how big or small a number is can be represented by just changing the exponent component of the floating point. However, as we are using some of our 4 bytes to store the exponent, the mantissa can only be represented using the remaining bits. Thus, while it's possible to store a number larger than 2147483647 within a 32 bit float, we would not be able to store it to the same accuracy. A float keeps only about 7 significant decimal digits, so if we try to store 2147483647 only the leading digits (roughly 2147480000) can be relied on. Essentially we give up accuracy to store larger or smaller values.
Modifiers
Modifiers can be added to a variable declaration to alter the amount of storage or to modify interpretation of that storage.
const
The const modifier makes the variable unmodifiable, thus creating a constant. When using this modifier it is essential that the variable is initialized, as no further alterations can be made after the fact.
Constants are useful because they provide a single source for a value that may be used throughout a program. For example, it might be useful to store something like the rate for HST (harmonized sales tax) in a constant. This tax constant can be used whenever we need to do a calculation involving the HST. If the tax rate ever changes, we only need to adjust the value in one place as opposed to everywhere in our programs.
unsigned
Adding the word unsigned to an integer type (int, char, long etc.) will cause all bits to be used for non-negative numbers only. If the data type is declared without the unsigned modifier, we assume that the variable is signed. That is, 1 bit of the variable is used to represent the sign (positive or negative) and the other bits represent the number.
For example, below are two variables. a is signed and b is unsigned. Both can store over 4 billion different numbers. However, a ranges from -2147483648 to 2147483647 while b ranges from 0 to 4294967295. Since some information can never be negative, using unsigned allows all bits to represent the number, which allows for storage of larger numbers.
In the source code below, at the end of each line there is a statement that starts with //. These are called comments. A comment is a statement we put into a program to describe what is happening. In practice we do not comment every line of code but it is an effective way to deliver an explanation in the code itself.
int a; // range from -2147483648 to 2147483647
unsigned int b; // range from 0 to 4294967295
short
The short modifier reduces the amount of storage used. For example:
short int x;
short x; //we can also drop the int here and simply call the type short
This is useful if we are trying to reduce the amount of memory we use. If you are writing an application for hardware that has very limited memory, this technique may be useful.
long
The long modifier increases the amount of storage used for a variable.
long int x;
long x; //we can drop the int here also and simply call the type long
long long x;
The exact sizes of these types depend on the system and compiler. On some systems, int and long are the same size. If you want more storage on such a system, you have to declare the variable as a long long.
Type Conversions
As stated above, there are two general ways to store data: as whole numbers (int, long, char, etc.) or as floating point values (float and double). Each of these types stores and interprets data differently. The amount of space you have to store the data is also different. It is important to know what happens as you convert from one type to another.
Narrowing
When you try to convert a larger or more general type to a smaller or more specific type, the conversion is called narrowing. Narrowing can cause data loss and have unintended consequences. Sometimes the compiler will provide a warning when it sees this happening, but not always.
For example, whenever you convert a floating point value to an integer value you have narrowing. During this conversion the digits after the decimal point are truncated (chopped off). This may have unintended results. If the compiler spots that you are trying to do this, you will get a warning. Try compiling the following program. Depending on your compiler and its warning settings, you will likely get a warning:
#include <stdio.h>
int main(void)
{
    int myNumber = 9.87;
    printf("%d\n",myNumber);
    return 0;
}
However, that doesn't always happen and you could end up with unintended errors. What do you think the output of the following piece of code is?
#include <stdio.h>
int main(void)
{
    int myNumber = 10.12*100;
    printf("%d\n",myNumber);
    return 0;
}
Try compiling the above code and running it.
Firstly, you may notice that unlike the first program there was no warning. Secondly, you will notice that there is an error: the output is not as expected. The reason is that 10.12 is actually stored as something like 10.11999999. This is part of the imprecise nature of floating point values. When you multiply this by 100 you get 1011.999999. However, because the type conversion is done through truncation, the .999999 part is just cut off.
Promoting
Promoting is the opposite of narrowing. It happens when you go from a smaller or less general type to a larger or more general type. These conversions generally happen without fear of data loss. This is why any operation that involves a floating point value and an integer results in a floating point value. The integer is actually promoted to a floating point.
Casting
Casting refers to the conversion from one data type to another. Implicit casting is casting that the compiler does automatically. Explicit casting is when the programmer clearly states the conversion in the code. This shows that the programmer knows the conversion is potentially error prone. To perform an explicit cast, put the type you are converting to in brackets before the expression being converted.
For example, recall that this program will generate a warning:
#include <stdio.h>
int main(void)
{
    int myNumber = 9.87;
    printf("%d\n",myNumber);
    return 0;
}
You can tell the compiler that you understand that this is potentially error prone but you still want to do it by performing an explicit cast:
#include <stdio.h>
int main(void)
{
    //the (int) is an explicit cast.
    //it is a note to the compiler to essentially state that you know there is
    //potential for information loss
    int myNumber = (int)9.87;
    printf("%d\n",myNumber);
    return 0;
}
Constants
As stated earlier, we can turn any variable into a constant by putting the const modifier in front of the variable declaration. For example:
const double PI = 3.14159;
This creates a double variable named PI, assigns it 3.14159, and makes the compiler flag every attempt to modify PI as an error. PI takes up memory, and the data is stored in an unchangeable variable.
Another method of creating constants is to use the pre-processor #define statement:
#define PI 3.14159
This method of creating constants does NOT create a space in memory that holds 3.14159. Instead, before the program is even compiled, every part of the code that says PI gets modified to say 3.14159 instead.
Thus, if we had this block of code:
#define PI 3.14159
int main(void)
{
    double radius = 5;
    double area = PI * radius * radius;
    ...
}
The compiler does not actually see this code. Instead, it sees this code:
int main(void)
{
    double radius = 5;
    double area = 3.14159 * radius * radius;
    ...
}
The PI symbol is replaced by 3.14159 throughout the program, and there is no memory associated with PI. If you declare a constant using the const modifier instead, an actual piece of memory is reserved and initialized to the assigned value, and the code itself does not change.
Styling Conventions - Naming and Casing
constants
Traditionally constants are spelled in SCREAMING_SNAKE_CASE.
This means that if there are multiple words, we separate the words with underscore characters. We also capitalize every character (thus the screaming) of the constant. This is especially important if you are using #define to create constants, because we don't want other names in our code to be accidentally replaced. If we never create other names in SCREAMING_SNAKE_CASE, there is no chance of accidentally replacing something with our constant.
variables
When you declare a variable, you must provide a name for that variable. That name must be descriptive. It needs to convey what it is that the variable will store. In C, upper and lower case matter. When declaring variables, it is also best to use lowerCamelCase. To create a name that is lowerCamelCase, write all words of the descriptive name in all lower case. Starting with the second word, capitalize the first character of the word.
Examples of lowerCamelCase:
widthOfCar
startingSalary
hoursWorked
Unless it is a commonly known convention, single letter variable names are to be avoided.
- Using x,y and z to indicate coordinates in 3D space is generally accepted so this is fine.
- Using a for age, or w for width would not be common convention and thus needs to be avoided.
Declare each variable on its own line. This makes your program far easier to read. It is better to do this:
//this is good
int age;
int height;
and not this:
//this is bad
int age,height;
Variables should generally be declared one per line unless two variables are very closely related.
For details on styling, check out the style guide.
Some examples in the notes use single letter variable names because they only illustrate a concept. This is not how you should declare variables in a real program where variables store real information. Variable names should always reflect what they store.