Data Types

One of the most important concepts we will cover in this course is the idea of data types. When we provide instructions to a computer, those instructions typically operate on some data. That data is stored in memory. Everything in memory is stored in binary, essentially as 0's and 1's. Data types help us interpret those 0's and 1's and define what we can and cannot do with a value.

Bits and Bytes

The most basic unit of storage in a computer's memory is a bit. A single bit can only store a 0 or a 1, essentially two distinct states, so on its own it holds very little information. However, if you put multiple bits together, you can store a lot more information.

With two bits, you can form the following unique sequences of bits:

00, 01, 10, 11

So, with a single bit, you can store 2 different values. However, with 2 bits you can store 4 different values.

If you have 3 bits you can store the following sequences:

000, 001, 010, 011, 100, 101, 110, 111

This would be 8 different values. With each additional bit, you double the number of possible values.

In 8 bits, you can store 256 different values. In 32 bits, you can store over 4 billion different values.

info

To find the exact number of values you can store using n bits, use the formula:

$2^n$

Thus,

Number of values in 8 bits is $2^8 = 256$

Number of values in 32 bits is $2^{32} = 4\,294\,967\,296$

A byte is 8 bits of storage. Thus, 8 bits is 1 byte. 32 bits is 4 bytes. While the smallest unit of storage is a single bit, the smallest piece of memory that we can directly refer to is a byte.

Data Abstraction

A computer stores every piece of information it operates on as numbers (stored as 0's and 1's). Everything you see on your computer is actually just a number. The colour of a pixel, for example, can be represented as something called an RGB value. This is made of 3 numbers, each of which ranges from 0 to 255. These numbers correspond to the amount of red, green and blue light in that pixel. Black is 0,0,0 (all lights off) and white is 255,255,255 (all lights on). The words that you are reading are sequences of encoded values representing the characters and symbols that you see. The character b, for example, is the number 98 in the ASCII encoding. The computer stores 98 in memory, but when we display the information, we print the letter b. This idea of data abstraction is really important when we look at how programs work.

Since all data is simply a sequence of bits, we need to know how to interpret the bits. In C, we do this interpretation by using data types.

By defining our data as a certain type, we are defining two things:

the amount of memory we use for storing that data (how many bytes)
how to interpret the bits at that memory location

Variables

When we want to work with data in our programs, we need to define a place in memory to store that data. We do that using something called variables.

In C, to declare a variable we use the following syntax:

	typename variablename;

Example:

int x;

The typename is the name of the data type. The following are the basic data types that we will use this semester. There are other data types, but we will be sticking with these ones:

char - 1 byte of storage meant to hold the encoding of a character
int - 4 bytes of storage meant to hold a whole number.
float - 4 bytes of storage meant to hold a floating point number. This is a number that has decimal places.
double - 8 bytes of storage meant to hold a floating point number. The difference between a double and a float is that a double can store floating point numbers more precisely (with more significant digits).

It is also possible to declare multiple variables per line by separating them with a comma. Each variable is the same type.

typename variable1, variable2, variable3, ...;

Example:

int x, y;

caution

It is better to declare each variable on a separate line as it makes your code easier to read. Syntactically you can declare all your variables on the same line but it is not a good idea!

Initialization

When you declare a variable, you will often want to initialize it with an expression. This ensures that the variable starts with a known value. The syntax for this uses the assignment operator:

typename variable1 = expression;
typename variable1 = expression1, variable2 = expression2, ...;

Here is a sample usage:

int a = 5;
double b = 0.6, c = 1.2 + 6;

warning

While it is possible to declare 2 or more variables on the same line, it is not good styling practice because it makes programs harder to understand. Therefore, while it is syntactically correct to declare more than one variable per line, it is not ideal. It is better to declare one variable per line.

chars and ints

Both the char and int data types essentially store whole numbers (numbers without decimal places). The difference between them has to do with the amount of storage reserved. A char is only 1 byte, so it has only 8 bits of storage. The number of different values that you can store in 8 bits is 256. An int, on the other hand, uses 4 bytes of memory, so it stores its number in 32 bits. A 32-bit value can take on 4,294,967,296 different values.

info

The exact amount of memory for each type (4 bytes for int, for example) can actually change from system to system. Many computers from the 1990's had integers that used only 2 bytes of memory (16 bits). On such a system, an int can only hold 65,536 different values.

Floating Points

The bits of a floating point variable are split into two main pieces, the exponent and the mantissa. The mantissa holds the significant digits of the value, while the exponent determines the number of 0's before or after the mantissa. Essentially, we are storing numbers in the computer's equivalent of scientific notation.

For example, take the number 123.45. In scientific notation, this number is $1.2345 \times 10^2$. The 1.2345 is the mantissa, and the 2 is the exponent we are raising the 10 to. Thus, a float uses some of its bits for the mantissa and some for the exponent. Doing this allows us to store very large and very small numbers, because how big or small a number is can be represented just by changing the exponent. However, as we are using some of our 4 bytes to store the exponent, the mantissa can only be represented using the remaining bits. Thus, while it's possible to store a number larger than 2147483647 in a 32-bit float, we cannot store it to the same accuracy as a 32-bit int. A float holds only about 7 significant decimal digits, so 2147483647 itself cannot be stored exactly and is rounded to a nearby value that the float can represent. Essentially, we give up accuracy in order to store significantly larger or smaller values.

Modifiers

Modifiers can be added to a variable declaration to alter the amount of storage or to modify interpretation of that storage.

unsigned

Adding the word unsigned to an integer type (int, char, long, etc.) causes all bits to be used for non-negative numbers only (zero and positive numbers). If the word unsigned is not present, we assume that the variable is signed: 1 bit of the variable is used to represent the sign (positive or negative) and the other bits represent the number.

For example, below are two variables: a is signed and b is unsigned. Both can store over 4 billion different values. However, a ranges from -2147483648 to 2147483647, while b ranges from 0 to 4,294,967,295. Some values are never negative. Using unsigned for them allows all bits to represent the number, which allows for larger numbers.

int a;
unsigned int b;

short

short reduces the amount of storage used. For example:

short int x;
short x;  //we can also drop the int here and simply call the type short

This is useful if we are trying to optimize the amount of memory we use. If you are writing an application for hardware that has very limited memory, this technique may be useful.

long

long increases the amount of storage used for a variable.

long int x;
long x;  //we can drop the int here also and simply call the type long
long long x;

Exactly how these modifiers are interpreted, and the exact sizes of the resulting types, depend on the compiler and the system. For a while, int and long were the same size on many systems. If you wanted more storage, you had to declare the variable as a long long.

Type Conversions

As stated above, there are two general ways to store data: as whole numbers (int, long, char, etc.) or as floating point values (float and double). Each of these types stores and interprets data differently. The amount of space you have to store the data is also different. It is important to know what happens as you convert from one type to another.

Narrowing

When you try to convert a larger or more general type to a smaller or more specific type, the conversion is called narrowing. Narrowing can cause data loss and have unintended consequences. Sometimes the compiler will provide a warning when it sees this happening.

For example, whenever you convert a floating point value to an integer value, you have narrowing. During this conversion, the digits after the decimal point are truncated (chopped off). This may have unintended results. If the compiler spots that you are trying to do this, you will get a warning. Try compiling the following program and you will likely get a warning (depending on your compiler and its settings):

#include <stdio.h>
int main(void)
{
	int myNumber = 9.87;
	printf("%d\n",myNumber);
	return 0;
}

However, that doesn't always happen, and you could end up with unintended errors. What do you think the output of the following snippet of code is?

#include <stdio.h>
int main(void)
{
	int myNumber = 10.12*100;
	printf("%d\n",myNumber);
	return 0;
}

Try compiling the above code and running it. Firstly, you will notice that, unlike the first program, there was no warning. Secondly, you will notice that there is an error: the output is not as expected. The reason is that 10.12 is actually stored as something like 10.11999999. This is part of the imprecise nature of floating point values. When you multiply this by 100, you get 1011.999999. However, because the type conversion is done through truncation, the .999999 part is just cut off and the program prints 1011 instead of 1012.

Promoting

Promoting is the opposite of narrowing. It happens when you go from a smaller or less general type to a larger or more general type. These types of conversions generally happen without fear of data loss. This is why any operation that involves a floating point value and an integer results in a floating point value: the integer is promoted to a floating point value first.

Casting

Casting refers to the conversion from one data type to another. Implicit casting is casting that the compiler does automatically. Explicit casting is when the programmer clearly states the conversion in the code. This is done to show that the programmer knows the conversion they are doing is potentially error prone. To perform an explicit cast, put the type you are converting to in round brackets before the expression being converted.

For example, recall that this program will generate a warning:

#include <stdio.h>
int main(void)
{
	int myNumber = 9.87;
	printf("%d\n",myNumber);
	return 0;
}

You can tell the compiler that you understand that this is potentially error prone but you still want to do it by performing an explicit cast:

#include <stdio.h>
int main(void)
{
	//the (int) is an explicit cast.
	//it is a note to the compiler to essentially state that you know there is
	//potential for information loss
	int myNumber = (int)9.87;
	printf("%d\n",myNumber);
	return 0;
}

Styling Conventions

When you declare a variable, you must provide a name for that variable. That name must be descriptive: it needs to convey what the variable will store. In C, upper and lower case matter, so age and Age are two different names. When declaring variables, it is best to use lower camel case. To create a name in lower camel case, write all words of the descriptive name in lower case, then, starting with the second word, capitalize the first character of each word.

Examples of lower camel case:

widthOfCar
startingSalary
hoursWorked

Unless it is a commonly known convention, single letter variable names are to be avoided.

Using x,y and z to indicate coordinates in 3D space is generally accepted so this is fine.
Using a for age, or w for width would not be common convention and thus needs to be avoided.

Declare each variable on its own line. This makes your program far easier to read. It is better to do this:

//this is good
int age;
int height;

and not this:

//this is bad
int age,height;

Variables should generally be declared one per line unless two variables are very closely related.

For details on styling, check out the style guide.

warning

Some examples in this chapter use single-letter variable names because they only illustrate a concept. This is not how you should name variables in a real program.
