Formatted Data
Data files are often organized in a specific, predictable format. Knowing the format, we can read the data into a program using the fscanf() function and its format specifiers. These include scansets such as %[^;], which look a little like regular expressions but are far more limited.
For example, suppose we had a file that stored all the books in a home library. Each row of the file contains information about a single book. The following describes the format of the data. The angle brackets < > mark the start and end of each field and do not appear in the file itself.
<author last name>, <author first name>;<book title>;<number of pages>
Sample file:
Pratchett, Terry;Guards! Guards!;413
Watterson, Bill;Calvin and Hobbes;128
Asimov, Isaac;Foundation;320
To work with this information, we can create a struct called Book that holds the information about a single book.
struct Book
{
    char lastName[50];
    char firstName[50];
    char title[150];
    int numPages;
};
To store information on multiple books, we can create an array of Book structs. This array, for example, can hold up to 50 books.
struct Book myLibrary[50];
Next, we can read the information from the file and store it in the elements of the array. To do this, we can write the following function. The function opens the file named by filename. If the file opens successfully, the function reads its contents into the lib[] array, which has a maximum capacity of max. The function returns the total number of records successfully read.
int readBookFile(const char filename[], struct Book lib[], int max);
Recall that the fscanf() function returns the number of fields it successfully read and stored. In this case, each record (line) of the file has 4 fields. Thus, we can loop through the file record by record, checking each time that all 4 fields were read. When that check fails, we are done and return the number of records read.
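To make the return value concrete, here is a minimal sketch that applies the same kind of format string to a single line held in memory, using sscanf() instead of fscanf(). The helper name parseBookLine is hypothetical and is not part of the notes; the width limits (49, 49, 149) are an added precaution so that each string fits in its array.

```c
#include <stdio.h>

// Same layout as the Book struct described in the text.
struct Book
{
    char lastName[50];
    char firstName[50];
    char title[150];
    int numPages;
};

// Hypothetical helper (an assumption for illustration): parses one record
// held in a string and returns the number of fields sscanf() stored.
int parseBookLine(const char line[], struct Book* b)
{
    // %49[^,]  reads up to 49 characters that are not a comma
    // %49[^;]  reads up to 49 characters that are not a semicolon
    // %149[^;] does the same for the title, %d reads the page count
    return sscanf(line, "%49[^,], %49[^;];%149[^;];%d",
                  b->lastName, b->firstName, b->title, &b->numPages);
}
```

Called with the line "Pratchett, Terry;Guards! Guards!;413", this returns 4. If the page count is missing, as in "Pratchett, Terry;Guards! Guards!", it returns 3, which is the kind of result the reading loop treats as "stop".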
int readBookFile(const char filename[], struct Book lib[], int max)
{
    int numBooks = 0;
    //open the file in read mode. Note that we pass the variable filename,
    //not the string literal "filename", because we want to open whatever
    //file the function was called with. Writing "filename" in quotes would
    //open a file whose name is literally "filename".
    FILE* fp = fopen(filename, "r");
    //check to ensure the file opened
    if(fp != NULL){
        //the widths (49, 49, 149) keep each string within its array,
        //leaving room for the terminating '\0'
        while(numBooks < max && fscanf(fp, "%49[^,], %49[^;];%149[^;];%d\n",
                lib[numBooks].lastName, lib[numBooks].firstName,
                lib[numBooks].title, &(lib[numBooks].numPages)) == 4){
            numBooks++;
        }
        //close the file once we are done reading
        fclose(fp);
    }
    return numBooks;
}
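The following is a self-contained sketch of how the function might be exercised. It repeats the struct and a version of readBookFile() that passes the file pointer to fscanf() and closes the file when done. The helpers writeSampleFile() and printLibrary() are hypothetical additions for illustration, and any file name passed to them (for example "books.txt") is an assumption.

```c
#include <stdio.h>

struct Book
{
    char lastName[50];
    char firstName[50];
    char title[150];
    int numPages;
};

// Reads up to max records from the named file into lib[].
// Returns the number of records read (0 if the file could not be opened).
int readBookFile(const char filename[], struct Book lib[], int max)
{
    int numBooks = 0;
    FILE* fp = fopen(filename, "r");
    if (fp != NULL) {
        while (numBooks < max && fscanf(fp, "%49[^,], %49[^;];%149[^;];%d\n",
                lib[numBooks].lastName, lib[numBooks].firstName,
                lib[numBooks].title, &(lib[numBooks].numPages)) == 4) {
            numBooks++;
        }
        fclose(fp);
    }
    return numBooks;
}

// Hypothetical helper: writes the three sample records to the named file.
// Returns 1 on success, 0 if the file could not be created.
int writeSampleFile(const char filename[])
{
    FILE* fp = fopen(filename, "w");
    if (fp == NULL) {
        return 0;
    }
    fputs("Pratchett, Terry;Guards! Guards!;413\n", fp);
    fputs("Watterson, Bill;Calvin and Hobbes;128\n", fp);
    fputs("Asimov, Isaac;Foundation;320\n", fp);
    fclose(fp);
    return 1;
}

// Hypothetical helper: prints the first n books, one per line.
void printLibrary(const struct Book lib[], int n)
{
    for (int i = 0; i < n; i++) {
        printf("%s by %s %s (%d pages)\n", lib[i].title,
               lib[i].firstName, lib[i].lastName, lib[i].numPages);
    }
}
```

A main() function could declare struct Book myLibrary[50], call writeSampleFile("books.txt"), then call readBookFile("books.txt", myLibrary, 50) and pass the returned count to printLibrary(). With the three sample records, readBookFile() returns 3. Calling it with max set to 2 returns 2, because the loop stops once the array is full.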
C performs something called lazy evaluation (more commonly known as short-circuit evaluation) of expressions involving && (and) and || (or). In our loop condition, we check two expressions joined by an && operator.
The first part, numBooks < max, ensures we do not store anything past the end of the array.
With lazy evaluation, whenever we test A && B, the evaluation stops if A is false. A && B is only true if both A and B are true, so if A is false there is no need to test B. We can be lazy. (Similarly, when testing A || B, B is not evaluated if A is true.)
This does more than make our code faster by skipping B. It is also very useful in our situation. If numBooks < max is false, we have filled our array. We do not want to call fscanf() at this point, because lib[numBooks] would be outside the bounds of our array. Writing there can lead to weird, hard-to-debug errors and crashes.
If we were to swap the two parts of our && expression, we could potentially read something into memory that is outside our array before checking that we are out of range. Keep this in mind when you write your code.
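The effect of the ordering can be observed with a small sketch. Here fakeRead() is a hypothetical stand-in for fscanf() that always succeeds and counts how many times it is called; none of these names come from the notes.

```c
#include <stdio.h>

// Counts how many times fakeRead() has been called.
static int readsAttempted = 0;

// Hypothetical stand-in for fscanf(): always "reads" one value
// successfully and stores it through dest.
int fakeRead(int* dest)
{
    readsAttempted++;
    *dest = readsAttempted;
    return 1;
}

// Fills arr[] with at most max values. The bounds check comes first,
// so fakeRead() is never called once the array is full.
int fillArray(int arr[], int max)
{
    int n = 0;
    readsAttempted = 0;
    while (n < max && fakeRead(&arr[n]) == 1) {
        n++;
    }
    return n;
}
```

For an array of 3 elements, fillArray() returns 3 and readsAttempted is also 3: on the final check, n < max is false, so fakeRead(&arr[3]) is never evaluated. If the two operands were swapped, a fourth call would write to arr[3], one element past the end of the array, before the bounds check had a chance to stop it.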