Introduction to C using lcc-win
jacob navia
Contents
1 Introduction to C
  1.1 Why learn C?
  1.2 Program organization
  1.3 Hello
    Program input
    What are “function parameters”?
    Console mode programs and windows programs
  1.4 An overview of the compilation process
    1.4.1 The run time environment
      We wrote the program first
      We compiled our design
      Run time
  1.5 An overview of the standard libraries
    The “stdheaders.h” include file
    1.5.1 Passing arguments to a program
      Implementation details
  1.6 Iteration constructs
    1.6.1 for
    1.6.2 while
    1.6.3 do
    1.6.4 break and continue
  1.7 Types
    1.7.1 What is a type?
    1.7.2 Types classification
    1.7.3 Integer types
    1.7.4 Floating types
    1.7.5 Compatible types
    1.7.6 Incomplete types
    1.7.7 Qualified types
    1.7.8 Casting
    1.7.9 The basic types
  1.8 Declarations and definitions
    1.8.1 Variable declaration
    1.8.2 Function declarations
    1.8.3 Function definitions
    1.8.4 Scope of identifiers
    1.8.5 Linkage and duration of objects
    1.8.6 Variable definition
    1.8.7 Statement syntax
  1.9 Errors and warnings
  1.10 Input and output
    1.10.1 Predefined devices
    1.10.2 The typical sequence of operations
    1.10.3 Examples
    1.10.4 Other input/output functions
      The current position
    1.10.5 File buffering
      Error conditions
  1.11 Commenting the source code
    1.11.1 Describing a function
    1.11.2 Describing a file
  1.12 An overview of the whole language
    1.12.1 Statements
    1.12.2 Declarations
    1.12.3 Pre-processor
    1.12.4 Control-flow
    1.12.5 Extensions of lcc-win
2 A closer view
  2.1 Identifiers
    2.1.1 Identifier scope and linkage
  2.2 Constants
    2.2.1 Evaluation of constants
      Constant expressions
    2.2.2 Integer constants
    2.2.3 Floating constants
    2.2.4 Character string constants
    2.2.5 Character abbreviations
  2.3 Arrays
    2.3.1 Variable length arrays
    2.3.2 Array initialization
    2.3.3 Compound literals
  2.4 Function calls
    2.4.1 Prototypes
    2.4.2 Functions with variable number of arguments
      Implementation details
    2.4.3 stdcall
    2.4.4 Inline
  2.5 Assignment
  2.6 The four operations
    2.6.1 Integer division
    2.6.2 Overflow
    2.6.3 Postfix
  2.7 Conditional operator
  2.8 Register
    2.8.1 Should we use the register keyword?
  2.9 Sizeof
  2.10 Enum
    2.10.1 Const
      Implementation details
  2.11 Goto
  2.12 Break and continue statements
  2.13 Return
    2.13.1 Two types of return statements
    2.13.2 Returning a structure
    2.13.3 Never return a pointer to a local variable
    2.13.4 Unsigned
  2.14 Null statements
  2.15 Switch statement
  2.16 Logical operators
  2.17 Bitwise operators
  2.18 Shift operators
  2.19 Address-of operator
  2.20 Indirection
  2.21 Sequential expressions
  2.22 Casts
    2.22.1 When to use casts
    2.22.2 When not to use casts
  2.23 Selection
  2.24 Predefined identifiers
  2.25 Precedence of the different operators
  2.26 The printf family
    2.26.1 Conversions
    2.26.2 The minimum field width
    2.26.3 The precision
    2.26.4 The conversions
    2.26.5 Scanning values
  2.27 Pointers
    2.27.1 Operations with pointers
    2.27.2 Addition or subtraction of a displacement: pointer arithmetic
    2.27.3 Subtraction
    2.27.4 Relational operators
    2.27.5 Null pointers
    2.27.6 Pointers and arrays
    2.27.7 Assigning a value to a pointer
    2.27.8 References
    2.27.9 Why pointers?
  2.28 setjmp and longjmp
    2.28.1 General usage
    2.28.2 Register variables and longjmp
  2.29 Time and date functions
3 Simple programs
  3.1 strchr
    3.1.1 How can strchr fail?
  3.2 strlen
    3.2.1 A straightforward implementation
    3.2.2 An implementation by D. E. Knuth
    3.2.3 How can strlen fail?
  3.3 ispowerOfTwo
    3.3.1 How can this program fail?
    3.3.2 Write ispowerOfTwo without any loops
  3.4 signum
  3.5 strlwr
    3.5.1 How can this program fail?
  3.6 paste
    3.6.1 How can this program fail?
  3.7 Using arrays and sorting
    3.7.1 How to sort arrays
    3.7.2 Other qsort applications
    3.7.3 Quicksort problems
  3.8 Counting words
    3.8.1 The organization of the table
    3.8.2 Memory organization
    3.8.3 Displaying the results
    3.8.4 Code review
  3.9 Hexdump
    3.9.1 Analysis
    3.9.2 Exercises
  3.10 Text processing
    3.10.1 Detailed view
      main
      ProcessChar
      ReadLongComment and ReadLineComment
      ReadCharConstant
      OutputStrings
    3.10.2 Analysis
    3.10.3 Exercises
  3.11 Using containers
4 Structures and unions
  4.1 Structures
    4.1.1 Structure size
    4.1.2 Using the pragma pack feature
    4.1.3 Structure packing in other environments
      Gcc
      Hewlett Packard
      IBM
      Comeau computing C
      Microsoft
    4.1.4 Bit fields
  4.2 Unions
  4.3 Using structures
  4.4 Basic data structures
    4.4.1 Lists
    4.4.2 Hash tables
    4.4.3 The container library of lcc-win
  4.5 Fine points of structure use
5 Simple programs using structures
  5.1 Reversing a linked list
    5.1.1 Discussion
      An improvement
      Preconditions
6 A closer look at the pre-processor
  6.1 Preprocessor commands
    6.1.1 Preprocessor macros
  6.2 Conditional compilation
  6.3 The pragma directive
  6.4 Token concatenation
  6.5 The # operator
  6.6 The include directive
  6.7 Things to watch when using the preprocessor
7 More advanced stuff
  7.1 Using function pointers
  7.2 Using the "signal" function
    7.2.1 Discussion
      longjmp usage
      Guard pages
8 Advanced C programming with lcc-win
  8.1 Operator overloading
    8.1.1 What is operator overloading?
    8.1.2 Rules for the arguments
    8.1.3 Name resolution
    8.1.4 Differences to C++
  8.2 Generic functions
    8.2.1 Usage rules
  8.3 Default arguments
  8.4 References
9 Numerical programming
  9.1 Floating point formats
    9.1.1 Float (32 bit) format
    9.1.2 Long double (80 bit) format
    9.1.3 The qfloat format
    9.1.4 Special numbers
  9.2 Range
  9.3 Precision
  9.4 Understanding exactly the floating point format
  9.5 Rounding modes
  9.6 The machine epsilon
  9.7 Rounding
  9.8 Using the floating point environment
    9.8.1 The status flags
    9.8.2 Reinitializing the floating point environment
  9.9 Numerical stability
    9.9.1 Algebra doesn’t work
    9.9.2 Underflow
  9.10 The math library
10 Memory management and memory layout
  10.1 Functions for memory management
  10.2 Memory management strategies
    10.2.1 Static buffers
      Advantages
      Drawbacks
  10.3 Stack based allocation
    Advantages
    Drawbacks
    10.3.1 “Arena” based allocation
      Advantages
      Drawbacks
  10.4 The malloc / free strategy
    Advantages
    Drawbacks
  10.5 The malloc with no free strategy
    Advantages
    Drawbacks
  10.6 Automatic freeing (garbage collection)
    Advantages
    Drawbacks
  10.7 Mixed strategies
  10.8 A debugging implementation of malloc
    10.8.1 Improving allocate/release
11 The libraries of lcc-win
  11.1 The regular expressions library: a “grep” clone
  11.2 Using qfloats: Some examples
  11.3 Using bignums: some examples
12 Pitfalls of the C language
  12.1 Defining a variable in a header file
  12.2 Confusing = and ==
  12.3 Forgetting to close a comment
  12.4 Easily changed block scope
  12.5 Using increment or decrement more than once in an expression
  12.6 Unexpected Operator Precedence
  12.7 Extra Semi-colon in Macros
  12.8 Watch those semicolons!
  12.9 Assuming pointer size is equal to integer size
  12.10 Careful with unsigned numbers
  12.11 Changing constant strings
  12.12 Indefinite order of evaluation
  12.13 A local variable shadows a global one
  12.14 Careful with integer wraparound
  12.15 Problems with integer casting
  12.16 Octal numbers
  12.17 Wrong assumptions with realloc
  12.18 Be careful with integer overflow
    12.18.1 Overflow in calloc
  12.19 The abs macro can yield a negative number
  12.20 Adding two positive numbers might make the result smaller
  12.21 Assigning a value avoiding truncation
  12.22 The C standard
    12.22.1 Standard word salads
    12.22.2 A buffer overflow in the C standard document
      Getting rid of buffer overflows
      Buffer overflows are not inevitable
      The attitude of the committee
    12.22.3 A better implementation of asctime
13 Bibliography
Appendices
  .1 Using the command line compiler
1 Introduction to C
This book assumes that you have the lcc-win compiler system installed. You will need a compiler anyway, and lcc-win is free for you to use, so please (if you haven’t done so yet) download and install it before continuing: http://www.q-software-solutions.de
As far as the C language is concerned, this is not a full-fledged introduction to all of C. There are other, better books that do that (see the bibliography at the end of this book). Even though I try to explain things from the ground up, this book does not describe every feature of the language. Note too that this is not just documentation or a reference manual. Functions in the standard library are explained, of course, but this tutorial does not provide exhaustive documentation of any of them.
But before we start, just a quick answer to the question:
1.1 Why learn C?
C has been widely criticized, and many people are quick to point out its problems and drawbacks. But as languages come and go, C stands untouched. lcc-win contains code that was written many years ago by many people, among others by Dennis Ritchie, the creator of the language itself. The answer to this question is very simple: if you write software that is going to stay for some time, do not learn “the language of the day”: learn C.
C doesn’t impose any particular point of view on you. It is not object oriented, but you can do object oriented programming in C if you wish. The first Objective C compilers generated C, as do the compilers of Eiffel and several other object-oriented languages. Precisely because it lacks a built-in programming model, C is well suited to expressing all of them. Even C++ started out as a translator that generated C for the C compiler.
C is not a functional language, but you can do functional programming with it if you like. See the “Illinois FP” language implementations in C, and the many other functional programming languages that are implemented in C.
Most LISP interpreters and Scheme interpreters/compilers are written in C. You can do list processing in C; certainly not as easily as in Lisp, but you can do it. C has all the essential features of a general purpose programming language, like recursion, pointers to procedures that can be stored and passed around like any other data, and many others that this tutorial will show you.
Many people feel that C lacks the simplicity of Java, or the sophistication of C++ with its templates and other goodies. True. C is a simple language, without any frills. But it is precisely this lack of features that makes C well suited as a first introduction to a high-level language that gives you fine control over what your program is doing, without any hidden features. The compiler will do only what you told it to do.
The language remains transparent, even if some features from Java, like garbage collection, are incorporated into the implementation of C you are going to use.1
As languages come and go, C remains. It was at the heart of the development of the UNIX operating system in the seventies, it was at the heart of the microcomputer revolution in the eighties, and as C++, Delphi, Java, and many others came and faded, C remained, true to its own nature. Today the Linux kernel is written almost entirely in C, as are many other operating systems, window systems, and applications.
1.2 Program organization
A program in C is written in one or several text files called source modules. Each of those modules is composed of functions, i.e. smaller pieces of code that accomplish some task, and data, i.e. variables or tables that are initialized before the program starts. A special function called main is where the execution of the program begins.
In C, the organization of code in files has semantic meaning. The main source file given as an argument to the compiler, together with all the files it includes, defines a compilation unit (the C standard calls it a translation unit). Each compilation unit defines a name space, i.e. a scope. Within this name space each name is unique and refers to only one object or function.
A unit can import common definitions using the #include preprocessor directive, or simply by declaring some identifier as extern.
C supports the separate compilation model, i.e. you can split the program into several independent units that are compiled separately and then linked with the link editor to build the final program.
Normally each module is written in a separate text file that contains functions or data declarations. Interfaces between modules are written in “header files” that describe types or functions visible to several modules of the program. Those files have a “.h” extension, and they come in two flavours: system-wide ones, supplied with lcc-win, and private ones, specific to the application you are building.
Each module generally has one or several functions, i.e. pieces of code that accomplish some task: reading some data, performing some calculations, or organizing several other functions into some bigger aggregate. There is no distinction between functions and procedures in C: what other languages call a procedure is simply a function whose return type is void.
A function has a parameter list, a body, and possibly a return value. The body can contain declarations for local variables, i.e. variables that come into existence when execution enters the function body. The body is a series of statements, most of them terminated by a semicolon. Each statement can be an arithmetic operation, an assignment, a function call, or a compound statement, i.e. a statement that contains another set of statements enclosed in curly braces.
1 Lisp and Scheme, two list-oriented languages, have featured automatic garbage collection for decades. APL and other interpreters offered this feature too. Lcc-win offers you the garbage collector developed by Hans Boehm.
1.3 Hello
To give you an idea of the flavor of C, we use the famous example given by the authors of the language: a program that, when run, displays the message “hello” on the screen.
This example is a classic, and already appears in the tutorial on the C language published by B. W. Kernighan in 1974, four years before the book “The C Programming Language” was published. That example would still compile today, albeit with some warnings:
main() { printf("Hello world\n"); }
In today’s C the above program would be:
#include <stdio.h>          (1)
int main(void)              (2)
{                           (3)
    printf("Hello\n");      (4)
    return 0;               (5)
}                           (6)
The numbers in parentheses are obviously not part of the program text; they are there so that I can refer to each line of the program.
1) Using a feature of the compiler called the ‘pre-processor’, you can textually include a whole file of C source with the #include directive. In this example we include the “stdio.h” header file from the standard includes of the compiler. You will notice that the name of the include file is enclosed within a < > pair. This tells the compiler to look for this include file in the standard include directory, and not in the current directory. If you want to include a header file from another directory or from the compilation directory, enclose the name of the file in double quotes, for instance #include "myfile.h".
2) We define a function called “main” that returns an integer as its result, and
receives no arguments (void). All programs in C have a function called main, and it is
here that all programs start. The “main” function is the entry-point of the program.
3) The body of the function is a list of statements enclosed by curly braces.
4) We call the standard function “printf”, which formats its arguments and displays them on the screen. A function call in C is written like this: function-name ‘(’ argument-list ‘)’. In this case the function name is “printf”, and its argument list is the character string "Hello\n". Character strings in C are enclosed in double quotes, and can contain escape sequences that denote special characters like new line (\n), tab (\t), backspace (\b), and others. In this example, the character string is terminated by the new line character \n. See section 2.2.4 for more on character string constants, and section 2.4 for function call syntax.
5) The return statement indicates that control should be returned (hence its
name) to the calling function. Optionally, it is possible to specify a return result, in
this case the integer zero.
6) The closing brace finishes the function scope.
Programs in C are defined in text files that normally have the .c file extension. You can create those text files with any editor you want, but lcc-win provides a specialized editor for this task called “Wedit”. This program makes it easy to enter the program text, since it is adapted to the task of displaying C source text. To make this program, then, we start Wedit and enter the text of the program above.
Program input
If you know how an integrated development environment (IDE) works, you can skip
this section.
When you click on the icon of lcc-win, you start a program designed to make it easy for you to type your program and to find information about it. When you start it for the first time, it displays a blank window, waiting for you to tell it what to do.
The first thing to do is to go to the “File” menu, and select the New-> File item.
This will indicate to the IDE that you want to start a new program module. You get
prompted with a small window that asks for the file name. Just enter “hello.c”.
You will then see a blank sheet of paper where you can enter the text of the program. You should type the program as shown and take care to avoid any typing mistakes. Remember: the machine doesn’t understand anything. If you forget a quote, or any special sign, it will not work and the compiler will emit error messages that can be confusing. Check that you type exactly what you see above.
Once this is done, you can compile and link-edit your program by just clicking in the compile menu or pressing F9.2
2 If this doesn’t work or you receive warnings, you have an installation problem (unless you made a typing mistake). Or maybe I have a bug. When writing mail to me, do not send messages like: “It doesn’t work”. Those messages are a nuisance, since I can’t possibly know what is wrong if you do not tell me exactly what is happening. Wedit doesn’t start? Wedit crashes? The computer freezes? The sky has a black color?
Keep in mind that in order to help you I have to reproduce the problem in my setup. This is impossible without a detailed report that allows me to see what goes wrong.
Wedit will make a default project for you when you click the “compile” button. This can go wrong if there is not enough space on the disk to compile, if the installation of lcc-win went wrong and Wedit can’t find the compiler executable, or for many other reasons. If you see an error message, please do not panic, and try to correct the error the message is pointing you to.
A common failure happens when you install an older version of Wedit in a directory that has spaces in its name. Even if there is an explicit warning that you should NOT install it there, most people are used to just pressing return at those warnings without reading them. Then lcc-win doesn’t work and they complain to me. I have improved this in later versions, but problems can still arise.
To run the program, you use the “execute” option in the “Compiler” menu (or you type Ctrl+F5), or you open a command shell and type the program’s name. Let’s do it the hard way first.
The first thing we need to know is the name of the program we want to start. This is easy: we ask the IDE (Wedit) about it using the “Executable stats” option in the “Utils” menu, which displays information about the executable.
We see at the first line of the bottom panel that the program executable is called: h:\lcc\projects\hello.exe.
We open a command shell window, and type the command:
C:\>h:\lcc\projects\lcc1\hello.exe
Hello
C:\>
Our program displays the character string “Hello” and then a new line, as we wanted. If we erase the \n of the character string and press F9 again to recompile and link, the display will be:
C:\>h:\lcc\projects\lcc1\hello.exe
Hello
C:\>
But how did we know that we have to call “printf” to display a string? Because the documentation of the library told us so… The first thing a beginner to C must do is to get an overview of the libraries already provided with the system, so that he/she doesn’t waste time rewriting programs that can be used without any extra effort. Printf is one of those, but there are several thousand pre-built functions of all types and for all tastes. We present an overview of them in section 1.5.
What are “function parameters”?
When you have a function like:
int fn(int a) { ... }
the argument (named a) is copied into a storage area reserved by the compiler for the function’s arguments. Note that the function fn will use only a copy, not the original value. For instance:
int fn1(int a)
{
    a = a+7;
    return a;
}
int fn2(void)
{
    int b = 7;
    fn1(b);
    return b;
}
The fn2 function will always return 7, because function fn1 works with a copy of b, not with b itself. This is known as passing arguments by value. In standard C, arrays are an exception to this rule. When you see a statement like:
printf("Hello\n");
it means that the address of the first element of the character array is passed to “printf”, not a copy of the whole array. This is of course more efficient than making a copy, but there is no free lunch: the cost is that the function you are calling can modify the array. More about this later.
Console mode programs and windows programs
Windows distinguishes between text mode programs and windows programs. In the first part of this book we will use console programs, i.e. programs that run in a text mode window, receiving only textual input and producing text output. Those are simpler to build than the more complicated GUI (Graphical User Interface) programs.
Windows knows how to differentiate between console and windows programs by looking at certain fields in the executable file itself. If the program has been marked by the linker as a console mode program, Windows opens a window with a black background by default, and initializes the standard input and standard output of the program before it starts. If the program is marked as a windows program, nothing is done, and you can’t use the text input or output library functions.
For historical reasons this window is sometimes called a “DOS” window, even though there has been no MS-DOS for more than a decade. The programs that run in this console window are 32 bit programs and they can open a window if they wish. They can use all of the graphical features of Windows. The only problem is that an ugly black window will always be visible, even if you open a new window.
You can change the type of program lcc-win will generate by checking the corre-
sponding boxes in the “Linker” tab of the configuration wizard, accessible from the
main menu with “Project” then “Configuration”.
Under other operating systems the situation is pretty much the same. Linux offers a console, and even the Macintosh has one. In many situations typing a simple command sequence is much faster than clicking through dozens of menus and options until you get where you want to go. Besides, console programs have the additional advantage of being easier to automate and to integrate into bigger applications as independent components that receive command-line arguments and produce their output without any human intervention.
1.4 An overview of the compilation process
When you press F9 in the editor, a complex sequence of events, all of them invisible to you, produces an executable file. Here is a short description of this, so that at least you know what’s happening behind the scenes.
Wedit calls the C compiler proper. This program is called lcc.exe and is found in the bin directory of the lcc installation directory. For instance, if you installed lcc in c:\lcc, the compiler will be in c:\lcc\bin.
This program reads your source file and produces another file, called an object file, that has the same name as the source file but a .obj extension under Windows, or a .o extension under Linux. C supports the separate compilation model, i.e. you can compile several source modules producing several object files, and rely on the link-editor lcclnk.exe to build the executable.
Lcclnk.exe is the link-editor, or linker for short. This program reads different
object files, library files and maybe other files, and produces either an executable file
or a dynamically loaded library, a DLL.
When compiling your hello.c file then, the compiler produced a “hello.obj” file,
and from that, the linker produced a hello.exe executable file. The linker uses several
files that are stored in the \lcc\lib directory to bind the executable to the system
DLLs, used by all programs: kernel32.dll, crtdll.dll, and many others.
The workings of the lcc compiler are described in more detail in the technical
documentation. Here we just tell you the main steps.
• The source file is first pre-processed. The #include directives are resolved, and
the text of the included files is inserted into the source file.
The result of this process can be seen if you call the compiler with the -E flag. For instance, to see the result of pre-processing the hello.c file, you call the compiler in a command shell window with the command line:
lcc -E hello.c
The resulting file is called hello.i. The i means intermediate file.
• The front end of the compiler proper processes the resulting text. Its task
is to generate a series of intermediate code statements. Again, you can see
the intermediate code of lcc by calling the compiler with lcc -z hello.c.
This will produce an intermediate language file called hello.lil that contains the
intermediate language statements.
• The code generator takes those intermediate instructions and emits assembler
instructions from them. Assembly code can be generated with the lcc -S
hello.c command. The generated assembly file will be called hello.asm. The
generated file contains a listing of the C source and the corresponding transla-
tion into assembly language.
• The assembler takes those assembly instructions and emits the encodings that the integrated circuit can understand, and packages those encodings in a file called an object file, which has an .obj extension under Windows and an .o extension under Unix. This file is then passed (possibly with other object files) to the linker lcclnk, which builds the executable.
Organizing all those steps and typing all those command lines can be boring. To
make this easier, the IDE will do all of this with the F9 function key.
1.4.1 The run time environment
The program starts in your machine. A specific operating system is running, a certain file and hard disk configuration is present, a certain amount of RAM is installed, etc. This is the run-time environment.
The file built by the linker lcclnk is started through a user action (you double-click its icon), by giving its name at a command shell prompt, or by the action of another program that asks the operating system to start it.
The operating system accesses the hard disk at the specified location, and reads
all the data in the file into RAM. Then, it determines where the program starts, and
sets the program counter of the printed circuit in your computer to that memory
location.
The piece of code that starts is the “startup” stub, a small program that does
some initialization and calls the “main” procedure. It pushes the arguments to main
in the same way as for any other procedure.
The main function starts by calling another function in the C library called “printf”. This function writes characters using a “console” emulation, where the window is just text. This environment is conceptually simpler, and it is better suited to many tasks for people who do not like to click around a lot.
The printf function deposits characters in the input buffer of the terminal emulation program, which makes the necessary bits change color using the current font, at the exact position needed to display each glyph. Windows calls the graphic drivers of your graphics card, which control the video output of the machine, with those bits to change. The bits change before your hand has had the time to move a millimeter. Graphic drivers are fast today, and in no time they return to Windows, which returns control to the printf function.
The printf function exits, then control returns to main, which exits to the startup code, which calls ExitProcess, and the program is finished by the operating system.
Your hand is still near the return key.
We have the following phases in this process:
• Design-time. We wrote the program first.
• Compile-time. We compiled our design.
• Run-time. The compiled instructions are started and the machine executes
what we told it to do.
We wrote the program first
The central point in communicating with a printed circuit is the programming lan-
guage you use to define the sequence of operations to be performed. The sequence is
prepared using that language, first in your own circuit, your brain, then written down
with another (the keyboard controller), then stored and processed by yet another, a
personal computer (PC).
We compiled our design
Compiled languages rely on a piece of software that first reads a textual representation and translates it directly into a sequence of numbers that the printed circuit understands. Optionally, several separately compiled pieces of the program are then assembled together into a single unit.
Run time
The operating system loads the prepared sequence of instructions from the disk into
main memory, and passes control to the entry point. This is done in several steps.
First the main executable file is loaded and then all the libraries the program needs.
When everything has been mapped in memory, and all the references in each part
have been resolved, the OS calls the initialization procedures of each loaded library.
If everything goes well, the OS gives control to the program entry point.
1.5 An overview of the standard libraries
Remember that we stressed that our hello.c program must include the stdio.h system header file. OK, but how do you know which header file you need? You have to know which header declares which functions. The following headers and the associated library functions are found in all C99 compliant compilers.
Header       Purpose
assert.h     Diagnostics for debugging help.
complex.h    Complex number definitions.
ctype.h      Character classification (isalpha, islower, isdigit).
errno.h      Error codes set by the library functions.
fenv.h       Floating point environment: functions concerning the precision of the calculations, exception handling, and related items. See page 228.
float.h      Characteristics of floating types (float, double, long double, qfloat). See page 216.
inttypes.h   Format conversion of integer types: printf/scanf macros such as PRId64 for the types of stdint.h.
iso646.h     Alternative spellings for some operators. If you prefer writing the operator && as and, use this header.
limits.h     Sizes and ranges of the integer types.
locale.h     Formatting of numbers and currency values using local conventions.
math.h       Mathematical functions.
setjmp.h     Non local jumps, i.e. jumps that can go past function boundaries. See page 87.
signal.h     Signal handling. See page 196.
stdarg.h     Macros for accessing the arguments of functions with a variable number of arguments. See page 55.
stdbool.h    Boolean type and values.
stddef.h     Macros and types of general use in a program: NULL, offsetof, ptrdiff_t, size_t, and several others.
stdint.h     Portable integer types of specific widths.
stdio.h      Standard input and output.
stdlib.h     Standard library functions.
string.h     String handling: all functions that deal with the standard representation of strings in C. See “Traditional string representation in C” on page 138.
tgmath.h     Type-generic math functions.
time.h       Time related functions. See page 165.
wchar.h      Extended multibyte/wide character utilities.
wctype.h     Wide character classification and mapping utilities.
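As an illustration of looking up headers in the table, here is a minimal sketch of a function that needs two of them: isdigit comes from ctype.h and strlen from string.h. The name count_digits is hypothetical, chosen for this example, and the declaration inside the for statement assumes a C99 compiler.

```c
#include <ctype.h>   /* isdigit */
#include <stddef.h>  /* size_t */
#include <string.h>  /* strlen */

/* Hypothetical helper: counts the decimal digits in a string. */
size_t count_digits(const char *s)
{
    size_t n = 0;
    size_t len = strlen(s);                 /* declared in string.h */
    for (size_t i = 0; i < len; i++) {
        if (isdigit((unsigned char)s[i]))   /* declared in ctype.h */
            n++;
    }
    return n;
}
```

If you forget one of the two include lines, the compiler no longer has a declaration for the corresponding function.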
The “stdheaders.h” include file
Normally, it is up to you to remember which header contains the declaration of which function. This can be a pain, and it is easy to confuse one header with another. To spare your memory, lcc-win provides a “stdheaders.h” file that consists of:
#include <assert.h>
#include <complex.h>
etc
Instead of writing several include statements for the standard headers, you just include the “stdheaders.h” file and you are done with it. True, compilation becomes very slightly slower, but the difference is not significant.
1.5.1 Passing arguments to a program
We can’t modify the behavior of our hello program with arguments. We have no way to pass it, for instance, another character string that it should use instead of the hard-wired "hello\n". We can’t even tell it to stop writing the trailing newline character.
Programs normally receive arguments from their environment. A very old but still quite effective method is to pass a command line to the program, i.e. a series of character strings that the program can read as its arguments.
Let’s see how arguments are passed to a program.
#include <stdio.h>                                // (1)
int main(int argc, char *argv[])                  // (2)
{
    int count;                                    // (3)
    for (count = 0; count < argc; count++) {      // (4)
        printf("Argument %d = %s\n",              // (5)
               count,
               argv[count]);
    }                                             // (6)
    return 0;
}
1. We include stdio.h again.
2. We use a longer definition of the “main” function than before. This one is as standard as the previous one, but allows us to pass parameters to the program. There are two arguments:
int argc The argument count, an integer. It contains the number of arguments passed to the program plus one, because the program name is counted too.
char *argv[] An array of pointers to characters containing the actual arguments given. For example, if we call our program from the command line with the arguments “foo” and “bar”, the argv[ ] array will contain:
argv[0] The name of the program that is running.
argv[1] The first argument, i.e. “foo”.
argv[2] The second argument, i.e. “bar”.
3. We declare an integer variable, count, that will hold the index of the argument being printed. This is a local variable, i.e. a variable that can only be used within the enclosing scope, in this case, the scope of the function “main”.
Local variables are declared (as any other variables) with: <type> identifier;
For instance int a; double b; char c; Arrays are declared in the same fashion, but followed by their size in square brackets:
int a[23]; double b[45]; char c[890];
4. We use the “for” construct, i.e. an iteration. See the explanation on page 23.
5. We use printf again to print something on the screen. This time, we pass printf the following arguments:
"Argument %d = %s\n"
count
argv[count]
printf scans its first argument, the format string. It distinguishes directives (introduced with a percent sign %) from normal text, which is output without any modification.
The format string contains two directives, %d and %s. The first one means that printf will insert at this position the character representation of an integer that is passed as the next argument. Since the next argument after the string is the integer “count”, its value will be displayed at this point. The second one, %s, means that a character string should be inserted at this point. Since the following argument is argv[count], the character string at position “count” in the argv[ ] array will be passed to printf, which will display it at this point.
6. The closing brace ends the scope of the for statement: the iteration body ends here.
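The two directives can be tried without a console. The sketch below formats the same kind of line into a character buffer with snprintf, the C99 function from stdio.h that works like printf but writes into memory. The helper name format_argument is hypothetical, and the newline is left out so that the result is easy to compare.

```c
#include <stdio.h>   /* snprintf, size_t */

/* Formats one line the way the args program does, but into buf instead
   of the screen. Returns the number of characters the complete line
   needs, not counting the terminating zero. */
int format_argument(char *buf, size_t size, int count, const char *arg)
{
    /* %d is replaced by count, %s by the string arg */
    return snprintf(buf, size, "Argument %d = %s", count, arg);
}
```

Called with 1 and "foo", it stores "Argument 1 = foo" in the buffer and returns 16.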
Now we are ready to run this program. Suppose that we have saved the text of the
program in the file “args.c”. We do the following:3
h:\lcc\projects\args> lcc args.c
h:\lcc\projects\args> lcclnk args.obj
We first compile the text file to an object file using the lcc compiler. Then, we link
the resulting object file to obtain an executable using the linker lcclnk. Now, we can
invoke the program just by typing its name:
h:\lcc\projects\args> args
Argument 0 = args
We have given no arguments, so only argv[0] is displayed, the name of the program,
in this case “args”. Note that if we write:
h:\lcc\projects\args> args.exe
Argument 0 = args.exe
The name of the program changed from "args" to "args.exe", its full name. We can
even write:
h:\lcc\projects\args> h:\lcc\projects\args.exe
Argument 0 = h:\lcc\projects\args.exe
Now the full path is displayed. But that wasn’t the objective of the program. More
interesting is to write:
h:\lcc\projects\args> args foo bar zzz
Argument 0 = args
Argument 1 = foo
Argument 2 = bar
Argument 3 = zzz
The program receives 3 arguments, so argc will have a value of 4. Since our variable count runs from 0 to argc-1, we display 4 strings: argv[0] (the program name) through argv[3].
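Going back to the motivation at the start of this section, the hello program could use its first argument, when there is one, in place of the hard-wired text. A minimal sketch, with choose_greeting as a hypothetical helper name:

```c
/* Hypothetical helper: returns the text to print. If the user gave at
   least one argument, argv[1] replaces the hard-wired greeting. */
const char *choose_greeting(int argc, char *argv[])
{
    if (argc > 1)
        return argv[1];   /* first real argument */
    return "hello";       /* default when no argument is given */
}
```

From main you would print the result with printf("%s\n", choose_greeting(argc, argv)); so that, for a program called hello, typing hello bye prints bye, while typing hello alone still prints hello.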
3We use the 32 bit tools lcc and lcclnk here. If you want to build 64 bit programs, use lcc64 and lcclnk64 instead.
Implementation details
The arguments are retrieved from the operating system by the startup code that calls ‘main’. Some operating systems provide a specific interface for doing this; others pass the arguments directly to the startup code. Since C can also run on circuit boards with no operating system at all, on those systems the ‘main’ function will usually be defined simply as int main(void).
1.6 Iteration constructs
We introduced the “for” construct informally above, but a more general introduction to loops is needed to understand the code that follows. There are three iteration constructs in C: “for”, “do”, and “while”.
1.6.1 for
The “for” construct has
1. An initialization part, i.e. code that is always executed once, before the loop begins,
2. A test part, i.e. code that will be executed at the start of each iteration to
determine if the loop has reached the end or not, and
3. An increment part, i.e. code that will be executed at the end of each iteration.
Normally, the loop counters are incremented (or decremented) here.
The general form is then:
for(init ; test ; increment) {
statement block
}
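Filling in the general form with concrete code, here is a sketch that adds the numbers from 1 to n. The name sum_to is hypothetical, and declaring i inside the for statement assumes C99.

```c
/* Adds the integers 1..n with a for loop. */
long sum_to(int n)
{
    long total = 0;
    for (int i = 1; i <= n; i++) {   /* init ; test ; increment */
        total += i;                  /* statement block */
    }
    return total;
}
```

sum_to(10) returns 55, and sum_to(0) returns 0 because the test fails before the first iteration.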
Within a for statement, you can declare variables local to the “for” loop. The scope of these variables ends when the for statement ends.
#include <stdio.h>
int main(void)
{
    for (int i = 0; i < 2; i++) {
        printf("outer i is %d\n", i);
        for (int i = 0; i < 2; i++) {
            printf("i=%d\n", i);
        }
    }
    return 0;
}
The output of this program is:
outer i is 0
i=0
i=1
outer i is 1
i=0
i=1
Note that the scope of the identifiers declared within a ‘for’ scope ends just when the
for statement ends, and that the ‘for’ statement scope is a new scope. Modify the
above example as follows to demonstrate this:
#include <stdio.h>
int main(void)
{
    for (int i = 0; i < 2; i++) {        // 1
        printf("outer i is %d\n", i);    // 2
        int i = 87;                      // 3
        for (int i = 0; i < 2; i++) {    // 4
            printf("i=%d\n", i);         // 5
        }                                // 6
    }                                    // 7
    return 0;
}
At the innermost loop, there are three identifiers called ‘i’.
• The first i is the outer i. Its scope goes from line 1 to line 7: the scope of the outer for statement.
• The second i (87) is a local identifier of the compound statement (the loop body) that begins in line 1 and ends in line 7; its scope starts at its declaration in line 3. Compound statements can always declare local variables.
• The third i is declared at the innermost for statement. Its scope starts in line 4 and goes up to line 6. It belongs to the scope created by the second for statement.
Note that in each new scope, identifiers with the same name as outer ones shadow them, as you would normally expect in C. When you declare variables in the first part of the for statement, keep in mind that standard C allows that part to hold either an expression or a declaration, but not a mixture of both. For instance, if you have:
struct f {int a,b;};
struct f StructF;
the form
for (StructF.a = 6, int i=0; i<10; i++)
which starts with an expression and then declares i, may be accepted by lcc-win as an extension, but it is not standard C, while
for (int i=0, StructF.a = 67; i<10; i++) // Syntax error
is always rejected, because after int only declarators may follow.
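Whatever a particular compiler accepts there, a portable way to combine such an assignment with a loop variable is to do the assignment as an ordinary statement before the loop and keep only the declaration in the first clause. A sketch, where loop_with_setup is a hypothetical name:

```c
struct f { int a, b; };

/* The assignment that the examples above try to put in the first clause
   of the for statement is simply done before the loop. */
int loop_with_setup(void)
{
    struct f StructF;
    int total = 0;

    StructF.a = 6;                    /* expression statement, before the loop */
    for (int i = 0; i < 10; i++) {    /* first clause: one declaration only */
        total += StructF.a;
    }
    return total;                     /* 10 iterations adding 6 */
}
```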
1.6.2 while
The “while” construct is much simpler. It consists of a single test that determines
if the loop body should be executed or not. There is no initialization part, nor
increment part.
The general form is:
while (test) {
statement block
}
Any “for” loop can be transformed into a “while” loop by just doing:
init
while (test) {
statement block
increment
}
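Applying that transformation to a loop that adds the numbers from 1 to n gives the following sketch (sum_to_while is a hypothetical name). It computes the same result as the equivalent for loop.

```c
/* Adds the integers 1..n, written as a while loop. */
long sum_to_while(int n)
{
    long total = 0;
    int i = 1;              /* init */
    while (i <= n) {        /* test */
        total += i;         /* statement block */
        i++;                /* increment */
    }
    return total;
}
```

One caveat: if the statement block contains continue, the while version skips the increment, so the two forms are equivalent only for loop bodies that do not use continue.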
1.6.3 do
The “do” construct is a kind of inverted while. The body of the loop will always
be executed at least once. At the end of each iteration the test is performed. The
general form is:
do {
statement block
} while (test);
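A typical place for do is a loop that must run at least once. As an assumed example, consider counting the decimal digits of a number: even 0 has one digit, so the body has to execute before the first test. The name count_decimal_digits is hypothetical.

```c
/* Counts the decimal digits of n. The body runs at least once,
   so 0 is correctly reported as having one digit. */
int count_decimal_digits(unsigned long n)
{
    int digits = 0;
    do {
        digits++;
        n /= 10;          /* drop the last digit */
    } while (n != 0);
    return digits;
}
```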
1.6.4 break and continue
The “break” keyword stops any loop: it exits the innermost enclosing loop (or switch statement), and execution continues right after it.
The “continue” keyword can be used within any loop construct to skip the rest of the current iteration. The loop then continues normally with its next step: in a “for” loop the increment part is executed next, followed by the test; in “while” and “do” loops the test is performed next. Only the statements between the continue keyword and the end of the loop body are skipped.
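A short sketch showing both keywords in one loop (sum_until_zero is a hypothetical name): negative values are skipped with continue, and the first zero ends the loop with break.

```c
/* Sums the positive values of v, skipping negative ones and stopping
   at the first zero. */
int sum_until_zero(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] == 0)
            break;        /* leave the loop entirely */
        if (v[i] < 0)
            continue;     /* skip the rest of this iteration, go to i++ */
        total += v[i];
    }
    return total;
}
```

With the values {3, -2, 4, 0, 100} the function returns 7: the -2 is skipped and the 100 is never reached.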
1.7 Types
A machine has no concept of type: everything is just a sequence of bits, and any operation on those sequences of bits can be done, even if it is not meaningful at all, for example adding two addresses, or multiplying the contents of two character strings.
A high level programming language, however, enforces the concept of data types. Operations are allowed between compatible types and not between arbitrary data. It is possible to add two integers, an integer and a floating point number, and even an integer and a complex number. It is not possible to add an integer to a function or to a character string: the operation has no meaning for those types.
An operation always requires operands of compatible types, or a conversion of incompatible operands into compatible ones. It is not possible to multiply a number by a character string, but it is possible to transform the contents of a character string into a number and then do a multiplication. These conversions can be done automatically by the compiler (for instance the conversion between integers and floating point data) or explicitly specified by the programmer through a cast or a function call.
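The string-to-number case can be sketched with strtol, declared in stdlib.h, which converts the text of a string into a long. The function multiply_string is hypothetical and, to keep it short, ignores conversion errors.

```c
#include <stdlib.h>  /* strtol */

/* "21" * 2 is meaningless in C, but converting the string first works. */
long multiply_string(const char *text, long factor)
{
    long value = strtol(text, NULL, 10);  /* string -> long, base 10 */
    return value * factor;                /* now an ordinary multiplication */
}
```

multiply_string("21", 2) returns 42.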
In C, all data must be associated with a specific type before it can be used. All
variables must be declared to be of a known type before any operation with them is
attempted since to be able to generate code the compiler must know the type of each
operand. C is statically typed.
C allows the programmer to define new types based on the previously defined
ones. This means that the type system in C is static, i.e. known at compile time,
but extensible since you can add new types.
This is in contrast to dynamic typing, where no declarations are needed since the
language associates types and data during the run time. Dynamic typing is much
more flexible, but this flexibility has a price: the run time system must constantly
check the types of the operands for each operation to see if they are compatible. This
run-time checking slows down the program considerably.
In C most operations involve no run time type checking at all, since the compiler checks the types of the operands during compilation. This speeds up the execution of the program, and it allows the compiler to discover many errors during compilation instead of crashing at run time when an operation with incompatible types is attempted.
1.7.1 What is a type?
A first, tentative definition of a type could be “a type is a definition of the format of a sequence of storage bits”. It gives the meaning of the data stored in memory. If we say that the object a is an int, it means that the bits stored at that location are to be understood as an integer written in binary, i.e. as a sum of powers of two. If we say that a is a double, it means that the bits are to be understood as an IEEE 754 sequence of bits representing a double precision floating point value.
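To make the idea of “bits with a meaning” concrete, the sketch below copies the bits of a float into an unsigned 32 bit integer without converting the value, so that the same bits can be looked at under another type. It assumes that float is a 32 bit IEEE 754 single precision number; the function name float_bits is hypothetical.

```c
#include <stdint.h>  /* uint32_t */
#include <string.h>  /* memcpy */

/* Returns the bit pattern of f, read as an unsigned 32 bit integer.
   Assumes sizeof(float) == 4 and IEEE 754 single precision. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy the bits, no value conversion */
    return bits;
}
```

float_bits(1.0f) gives 0x3F800000: sign bit 0, biased exponent 127, and an all-zero mantissa. The integer 1 stored in a uint32_t, by contrast, is simply 0x00000001.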
A second, more refined definition would encompass the first but add the notion of a "concept" behind a type. For instance, on some machines the type size_t has exactly the same bits as an unsigned long, yet it conveys something different. (size_t is a typedef name for some unsigned integer type; which one depends on the implementation.) The difference is that we store sizes in size_t objects, and not some arbitrary integer. The type is associated with the concept of size. We use types to convey a concept to the reader of the program.
A wider definition is that a type is also a set of operations available on it. Numeric
types define the four operations, boolean data defines logical operations, etc. Some
people would say that it is the set of operations that defines a type4.
4What does the standard say about types? In §6.2.5 it writes:
The meaning of a value stored in an object or returned by a function is determined by
the type of the expression used to access it. (An identifier declared to be an object is
the simplest such expression; the type is specified in the declaration of the identifier.)
The base of C’s type hierarchy is formed by the machine types, i.e. the types that the integrated circuit understands. From the myriad of machine types, C has abstracted some, like ‘int’ or ‘double’, that are present in almost all processors, selecting only some of them and ignoring others. There are many machine types that C doesn’t natively support; for instance, some processors support BCD coded data, but that data is accessible only through special libraries.
One may ask why one type makes its way into the language while another doesn’t. For instance, the most universal type, present in all binary machines, is the boolean type (one or zero). Still, the language ignored it until the C99 standard incorporated it as a native type5.
Functions have a type too. The type of a function is determined by the type of
its return value, and all its arguments. The type of a function is its interface with
the outside world: its inputs (arguments) and its outputs (return value).
Each type can have an associated pointer type: for int we have int pointer, for
double we have double pointer, etc. We can have also pointers that point to an
unspecified object. They are written as void *, i.e. pointers to void6.
Types in C can be incomplete, i.e. they can exist as types while nothing is known about them, neither their size nor their bit layout. They are useful for encapsulating data into entities that are known only to certain parts of the program, or for partially defining types that can be fully defined later in the program. Example:
struct MyData;
Nothing is known about the internal structure of MyData. In most cases the module hands out pointers to those incomplete structures.
This construct lets you build a strong barrier between the modules that use those types: users of the hidden types can neither allocate them (their size is unknown) nor access their internal structure.
Information hiding is a design principle that stresses separation and modularity in software construction by preventing different modules from depending too much on each other. In this specific case, an incomplete type allows the structure of a hidden type to evolve freely without affecting the other parts of the program, which remain tied only to the specified functional interface.
1.7.2 Types classification
This type classification is based on the classification published by Plauger and Brody,
slightly modified.
Like many other definitions in the standard I am unable to figure out anything from this sentence.
Sorry.
Does this definition imply a format specification (meaning of a value)? Or is it simply a recursive definition with an infinite loop? A type is: ... the type of the expression used...
5Using the operator overloading feature of lcc-win you can define user defined types that behave almost as if they were native types. This is not a feature of the C language as such, but it is present in many programming languages.
6There is obviously no void object. A pointer to void is a special pointer that can point to any
object.
The schema can be understood as follows:
A C type can be either a function type, an incomplete type or an object type.
Function types can be either fully specified, i.e. we have a prototype available, or
partially specified with unknown arguments but a known return value.
Incomplete types are unspecified and it is assumed that they will be specified
elsewhere, except for the void type that is an incomplete type that can’t be further
specified. They have several uses that are explained in depth in 1.7.6.
Object types can be either scalar or aggregate types. Aggregate types are built
from the scalar types: structures, unions and arrays. Scalar types are of two kinds:
arithmetic or pointer types. Pointer types can point to scalar or composite types, to
functions or to incomplete types.
Arithmetic types have two kinds: integer types and floating types. The integer types are bit fields, enumerations, and the types bool, char, short, int, long and long long, all in signed and unsigned versions, except the boolean type, which has no signed form and belongs to the unsigned types. The char type does not only come in signed and unsigned flavors: plain char is a third type, distinct from both signed char and unsigned char, even though it has the same representation as one of them. We do not need to go into that hair splitting here.
Floating types are either real or complex, with both of them appearing in three
flavors: float, double and long double.
1.7.3 Integer types
The language doesn’t specify exactly how big each integer type must be, but it sets minimum ranges of values that each type must be able to hold, and hence minimum sizes. The char type must be at least 8 bits, the int type at least 16 bits, and the long type at least 32 bits. How big each integer type actually is in a given implementation is defined in the standard header limits.h.
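A minimal sketch that prints some of the limits an implementation has chosen. print_int_limits is a hypothetical name; the macros CHAR_BIT, INT_MIN, INT_MAX, LONG_MIN and LONG_MAX are the standard ones from limits.h. The numbers printed vary from one implementation to another.

```c
#include <limits.h>
#include <stdio.h>

/* Prints the ranges this implementation chose for some integer types. */
void print_int_limits(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("INT_MIN  = %d, INT_MAX  = %d\n", INT_MIN, INT_MAX);
    printf("LONG_MIN = %ld, LONG_MAX = %ld\n", LONG_MIN, LONG_MAX);
}
```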
1.7.4 Floating types
Floating types are discussed in more detail later. Here we will just retain that they can represent integer and non integer quantities, and that in general their dynamic range is bigger than that of integers of the same size. They have two parts: a mantissa and an exponent.
Because the mantissa is stored in binary with a limited number of bits, some values can’t be represented exactly in floating point, for instance 1/3 or 1/10. This comes as a surprise to many people, so it is better to underscore this fact here. More explanations for this later on.
Floating point arithmetic is approximate, and many mathematical laws that we
take for granted like a + b - a is equal to b do not apply in many cases to floating
point math.
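Both surprises can be observed with a short sketch (the function names are hypothetical). Printing 0.1 with 17 significant digits shows that the stored double is only the nearest binary value to one tenth, and computing (a + b) - a with a = 1e16 and b = 1 shows the law a + b - a = b failing, because doubles near 1e16 are spaced 2 apart. The second result assumes IEEE 754 doubles evaluated without extra intermediate precision.

```c
#include <stdio.h>

/* Writes 0.1 with 17 significant digits into buf, revealing that the
   stored double is only the nearest binary value to 1/10. */
void show_tenth(char *buf, size_t size)
{
    snprintf(buf, size, "%.17g", 0.1);
}

/* Computes (a + b) - a. With a = 1e16 and b = 1.0 the sum cannot be
   stored exactly in a double and rounds back to a. */
double add_then_subtract(double a, double b)
{
    double sum = a + b;   /* rounded to double precision here */
    return sum - a;
}
```

On such a system show_tenth writes 0.10000000000000001, and add_then_subtract(1e16, 1.0) returns 0.0 instead of 1.0.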
1.7.5 Compatible types
There are types that share the same underlying representation. For instance, in lcc-win for the Intel platform in 32 bits, long and int have the same size and representation, although the standard still treats them as two different, incompatible types. In the version of lcc-linux for 64 bits, however, long is 64 bits and int is 32 bits, so they no longer even share a representation.
In that version long has the same representation as the long long type.
Plauger and Brody give the following definition for when two types are compatible
types:
• Both types are the same.
• Both are pointer types, with the same type qualifiers, that point to compatible
types.
• Both are array types whose elements have compatible types. If both specify
repetition counts, the repetition counts are equal.
• Both are function types whose return types are compatible. If both specify types for their parameters, both declare the same number of parameters (including ellipses) and the types of corresponding parameters are compatible. Otherwise, at least one does not specify types for its parameters. If the other specifies types for its parameters, it specifies only a fixed number of parameters and does not specify parameters of type float or of any integer types that change when promoted.
• Both are structure, union, or enumeration types that are declared in different
translation units with the same member names. Structure members are de-
clared in the same order. Structure and union members whose names match are
declared with compatible types. Enumeration constants whose names match
have the same values.
1.7.6 Incomplete types
An incomplete type is missing some part of its declaration. For instance
struct SomeType;
We now know that SomeType is a struct, but since its contents aren’t specified, we can’t use that type directly: we can’t define objects of that type or apply sizeof to it, although we can declare and pass around pointers to it. The purpose of this is precisely to prevent uses of the type: encapsulation. Many times you want to publish some interface, but you do not want people using the structure, allocating a structure, or doing anything else but passing pointers to those structures to your functions. In those situations, an opaque type is a good thing to have.
Note that opaque (incomplete) types are much more protected than "protected" members in C++. If you do not hand out the header file containing the type definitions, nobody will ever be able to see how that type is built, unless (of course) they disassemble the generated code.
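A sketch of the opaque type technique, with hypothetical names. In a real program the first part would live in a header that users include, and the second part in the one source file that is allowed to see the structure; here both parts are placed in one file so that it compiles on its own.

```c
#include <stdlib.h>  /* malloc, free */

/* --- What a header would publish --- */
struct Counter;                          /* incomplete type */
struct Counter *counter_new(void);
void counter_increment(struct Counter *c);
int  counter_value(const struct Counter *c);
void counter_free(struct Counter *c);

/* --- What only the implementation file would contain --- */
struct Counter {                         /* the type is completed here */
    int value;
};

struct Counter *counter_new(void)
{
    struct Counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;
    return c;
}

void counter_increment(struct Counter *c) { c->value++; }
int  counter_value(const struct Counter *c) { return c->value; }
void counter_free(struct Counter *c) { free(c); }
```

Code that sees only the first part can call the four functions, but it cannot write c->value or sizeof(struct Counter); the compiler rejects both.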
1.7.7 Qualified types
All types we have seen up to now are unqualified. Each unqualified type has corresponding qualified versions, formed by adding one or more of the keywords const, restrict, and volatile.
The const keyword means that the programmer specifies that the value of an object of this type is read only. Assigning to a const object is an error. The restrict keyword applies to pointer types and means that there is no alias for this object, i.e. that within this function or local scope the object is accessed only through this pointer. For instance:
void f(int * restrict p, int * restrict q)
{
    // Copies the values pointed to by q into p, stopping at
    // (and not copying) the first zero.
    while (*q) {
        *p = *q;   // Copy
        p++;       // Increment
        q++;
    }
}
During the execution of this function, the restrict keyword promises the compiler that the objects accessed through p are never also accessed through q, i.e. that the two arrays do not overlap. The compiler does not check this promise; if the arrays do overlap, the behavior is undefined.
This keyword enables the compiler to perform optimizations based on the fact that p and q point to different objects.
The volatile keyword means that the object qualified in this way can change by means not known to the program and that it must be treated specially by the compiler. The compiler must follow strictly the rules of the language and may not optimize away accesses to this object: every read and write in the source must actually take place in memory, instead of, for instance, keeping the value in a register. This keyword is useful for describing a variable that is modified outside the normal flow of the program, for instance by a signal handler or by another process through shared memory. Note that volatile alone does not make a variable safe to share between threads, since it guarantees neither atomicity nor ordering. A static volatile object is a good model for a memory mapped I/O register, i.e. a memory location that is used to read data coming from an external source.
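One situation where an object really does change “by means not known to the program” is a signal handler. A minimal sketch in standard C: the flag is declared volatile sig_atomic_t, the type the standard provides for this purpose, and signal_was_seen is a hypothetical function that raises the signal itself so that the effect can be observed.

```c
#include <signal.h>

/* Written by the signal handler, outside the normal flow of the program,
   so every access must really go to memory. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig)
{
    (void)sig;            /* unused */
    got_signal = 1;
}

/* Installs the handler, raises SIGINT and reports whether the handler ran. */
int signal_was_seen(void)
{
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return 0;
    raise(SIGINT);        /* raise returns only after the handler has run */
    return got_signal;
}
```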
1.7.8 Casting
The programmer can at any time convert a piece of data to another type with a “cast” operation. For instance if you have:
float f = 67.8f;
you can write
double d = (double)f;
The “(double)” means that the data in f should be converted into equivalent data using the double precision representation. We will come back to types when we speak again about casts later (page 69).
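A cast matters most when it changes which operation is performed. In this sketch (hypothetical function names), dividing two ints discards the fractional part, while casting one operand to double first gives a floating point division.

```c
/* The cast converts n to double, so the division is done in floating point. */
double half_of(int n)
{
    return (double)n / 2;
}

/* Without a cast, both operands are int: integer division. */
int int_half_of(int n)
{
    return n / 2;      /* fractional part discarded */
}
```

half_of(7) returns 3.5 and int_half_of(7) returns 3.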
1.7.9 The basic types
The basic types of the language are built into the compiler. They are the different representations of numbers that the underlying processor understands, i.e. for which it has hardware dedicated to performing arithmetic operations. Sometimes missing hardware support can be simulated in software (for floating point numbers, for instance), although most CPUs now support floating point.
Table 1.2 shows the sizes of the basic ANSI C types in the 32 bit implementation of lcc-win.
Lcc-win offers you other types of numbers, shown in Table 1.3. To use them you should include the corresponding header file; they are not “built in” into the compiler. They are built using a property of this compiler that allows you to define your own kind of numbers and their operations. This is called operator overloading and will be explained further down.
Under lcc-win 64 bits, the sizes of the standard types change a bit. See Table 1.4.
The C standard defines the minimum sizes that all types must have in all implementations. See Table 1.5.
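To check the tables against the compiler you are actually using, a small sketch (print_type_sizes is a hypothetical name) can print the sizeof of each basic type. It uses the C99 %zu directive, which prints a size_t value; the output depends on the implementation.

```c
#include <stdio.h>

/* Prints the sizes, in bytes, that this compiler uses for the basic types. */
void print_type_sizes(void)
{
    printf("char        %zu\n", sizeof(char));
    printf("short       %zu\n", sizeof(short));
    printf("int         %zu\n", sizeof(int));
    printf("long        %zu\n", sizeof(long));
    printf("long long   %zu\n", sizeof(long long));
    printf("float       %zu\n", sizeof(float));
    printf("double      %zu\n", sizeof(double));
    printf("long double %zu\n", sizeof(long double));
    printf("pointer     %zu\n", sizeof(void *));
}
```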
1.8 Declarations and definitions
It is very important to understand exactly the difference between a declaration and
a definition in C.
Table 1.2: Standard type sizes in lcc-win
Type                   Size (bytes)  Description
_Bool                  1             Logical type, can be either zero or one. Include <stdbool.h> to use the names bool, true and false.
char                   1             Character or small integer type. Comes in two flavours: signed or unsigned.
short                  2             Integer or unicode character stored in 16 bits. Signed or unsigned.
int                    4             Integer stored in 32 bits. Signed or unsigned.
long                   4 or 8        Identical to int under Windows 32 bit and Windows 64 bit. In Unix 64 bit versions it is 64 bits.
pointer                4 or 8        All pointers are the same size in lcc-win. Under Unix 64 bit they are 64 bits.
long long              8             Integer stored in 64 bits. Signed or unsigned.
float                  4             Floating-point single precision (around 7 digits).
double                 8             Floating-point double precision (approx. 15 digits).
long double            12            Floating point extended precision (approx. 19 digits).
float _Complex         24            Complex number.
double _Complex        24
long double _Complex7  24
Table 1.3: Increased precision numbers in lcc-win
Type     Header      Size (bytes)  Description
qfloat   qfloat.h    56            352 bits floating point
bignum   bignum.h    variable      Extended precision number
int128   int128.h    16            128 bit signed integer type
A declaration introduces an identifier to the compiler. It says in essence: this
identifier is a xxx and its definition will come later. An example of a declaration is
extern double sqrt(double);
With this declaration, we introduce to the compiler the identifier sqrt, telling
it that it is a function that takes a double precision argument and returns a double
precision result. Nothing more. No storage is allocated for this declaration, besides
the storage allocated within the compiler internal tables. If the function so declared
is never used, absolutely no storage will be used. A declaration doesn’t use any space
in the compiled program, unless what is declared is effectively used. If that is the
case, the compiler emits a record for the linker telling it that this object is defined
elsewhere.
A definition tells the compiler to allocate storage for the identifier. For instance, when we defined the function main above, storage for the code generated by the compiler was created, and an entry in the program’s symbol table was made. In the same way, when we wrote int count; above, the compiler made space in the local variables area of the function to hold an integer.
Table 1.4: Type sizes for lcc-win 64 bits
Type                              Size (bytes)                Comment
bool, char, int, long, long long  Same as in the 32 bit version  In the linux and AIX 64 bit versions, long is 64 bits.
pointer                           8                           All 64 bit versions have this type size.
long double                       16                          This type could not be kept at 12 bytes, since that would misalign the stack, which must be aligned at 8 byte boundaries.
float, double                     Same as in the 32 bit version  The double type uses the SSE registers in the Intel architecture version.
Table 1.5: Minimum size of standard types
Type        Minimum size
char        8 bits
short       16 bits
int         16 bits
long        32 bits
long long   64 bits
And now the central point: You can declare a variable many times in your pro-
gram, but there must be only one place where you define it. Note that a definition
is also a declaration, because when you define some variable, automatically the com-
piler knows what it is, of course. For instance if you write: double balance; even
if the compiler has never seen the identifier balance before, after this definition it
knows it is a double precision number.
Note that if you provide no separate declaration and rely on the fact that a definition is also a declaration, you can only use the defined object after its definition. A declaration placed at the beginning of the program module or in a header file frees you from this constraint: you can start using the identifier immediately, even if its definition comes much later, or even in another module.
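A sketch of the idea within a single file, with hypothetical names: the extern declaration and the prototype at the top let calls_after use call_count and record_call before their definitions appear further down.

```c
extern int call_count;        /* declaration: no storage allocated here */
void record_call(void);       /* function declaration (prototype) */

/* Both identifiers can be used here, before their definitions below. */
int calls_after(int n)
{
    for (int i = 0; i < n; i++)
        record_call();
    return call_count;
}

int call_count = 0;           /* the one definition: storage is allocated */

void record_call(void)        /* the function definition */
{
    call_count++;
}
```

Removing the two declarations at the top makes calls_after refer to identifiers the compiler has not yet seen, which a C99 compiler diagnoses.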
1.8.1 Variable declaration
A variable is declared with <type> <identifier> ; like
int a; double d; long long h;
All those are definitions of variables. If you just want to declare a variable without allocating any storage, because that variable is defined elsewhere, you add the keyword extern:
extern int a;
extern double d;
extern long long h;
Optionally, you can define an identifier and assign it a value that is the result of some calculation:
double fn(double f) {
    double d = sqrt(f);   // sqrt is declared in math.h
    // more statements
    return d;
}
Note that initializing a variable with a value unknown at compile time is only possible for automatic variables, i.e. local variables declared without static inside a function. Outside a function you can still write:
int a = 7;
or
int a = (1024*1024)/16;
but the values you assign must be compile time constants, i.e. values that the compiler can figure out when doing its job. See "constant expressions", page 68.
Pointers are declared using an asterisk:
int *pInt;
This means that pInt will contain the machine address of some unspecified integer. Remember: this pointer will contain garbage until it is initialized:
int sum;
int *pInt = &sum;
Now the pInt variable contains the machine address of sum.
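Once pInt holds the address of sum, writing through the pointer changes sum itself. A minimal sketch, with set_through_pointer as a hypothetical name:

```c
/* Stores value into the int that p points to; the caller's variable changes. */
void set_through_pointer(int *p, int value)
{
    *p = value;      /* *p is the object whose address p holds */
}
```

After int sum = 0; int *pInt = &sum; set_through_pointer(pInt, 42); both sum and *pInt are 42.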
You can save some typing by declaring several identifiers of the same type in the same declaration like this:
int a,b=7,*c,h;
Note that c is a pointer to an integer, since it has an asterisk at its left side. This notation is confusing, especially for beginners. Use multiple declarations only when all declared identifiers have the same type, and put pointers on separate lines.
The syntax of C declarations has been criticized for being quite obscure. This is true; there is no point in denying an evident weakness. In his book “Deep C secrets” Peter van der Linden gives a simple algorithm for reading them. He proposes (chapter 3) the following:8
8Deep C secrets. Peter van der Linden ISBN 0-13-177429-8
The Precedence Rule for Understanding C Declarations.
Rule 1: Declarations are read by starting with the name and then reading in
precedence order.
Rule 2: The precedence, from high to low, is:
2.A: Parentheses grouping together parts of a declaration.
2.B: The postfix operators:
2.B.1: Parentheses ( ) indicating a function prototype, and
2.B.2: Square brackets [ ] indicating an array.
2.C: The prefix operator: the asterisk denoting "pointer to".
Rule 3: If a const and/or volatile keyword is next to a type specifier (e.g. int, long, etc.), it applies to the type specifier. Otherwise the const and/or volatile keyword applies to the pointer asterisk on its immediate left.
Using those rules, we can even understand a thing like:
char * const *(*next)(int a, int b);
We start with the variable name, in this case “next”. This is the name of the
thing being declared. We see it is in a parenthesized expression with an asterisk,
so we conclude that “next is a pointer to. . . ” well, something. We go outside the
parentheses and we see an asterisk at the left, and a function prototype at the right.
Using rule 2.B.1 we continue with the prototype.
“next is a pointer to a function
with two arguments”. We then process the asterisk: “next is a pointer to a function
with two arguments returning a pointer to. . . ” Finally we add the char * const, to
get “next” is a pointer to a function with two arguments returning a pointer to a
constant pointer to char.
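A declaration this intricate is easier to trust once something is actually assigned to it. The following sketch is only an illustration under our own assumptions: the arrays alpha, beta and names and the function first_entry are hypothetical, invented just to give next a target of the right type.

```c
/* Hypothetical data: two strings and an array of CONSTANT pointers to them. */
static char alpha[] = "alpha";
static char beta[]  = "beta";
static char *const names[] = { alpha, beta };
/* names[0] = beta;  would not compile: the pointers are const */

/* A function with two int arguments returning a pointer to a
   constant pointer to char: exactly what "next" can point to. */
static char *const *first_entry(int a, int b)
{
    (void)a;               /* arguments unused in this sketch */
    (void)b;
    return &names[0];
}

/* The declaration from the text, initialized with first_entry. */
char *const *(*next)(int a, int b) = first_entry;
```

Calling next(1, 2) goes through the function pointer to first_entry; indexing the result gives the constant pointers stored in names.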
Now let’s see this:
char (*j)[20];
Again, we start with “j is a pointer to”. At the right is an expression in brackets, so we apply 2.B.2 to get “j is a pointer to an array of 20”. Of what? We continue at the left and see “char”. Done. “j” is a pointer to an array of 20 chars. Note that when making a cast we use the same form as the declaration, without the identifier:
j = (char (*)[20]) malloc(sizeof(*j));
The cast encloses in parentheses the same declaration as before, but without the identifier j.
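A short sketch putting j to work (array_pointer_demo is a hypothetical name): it allocates the array with the cast shown above, uses *j as an ordinary array of 20 chars, and releases it.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: j is a pointer to an array of 20 chars. */
size_t array_pointer_demo(void)
{
    char (*j)[20];
    size_t total;

    /* The cast repeats the declaration without the identifier.
       In C the cast is optional, since malloc returns void *. */
    j = (char (*)[20]) malloc(sizeof(*j));
    if (j == NULL)
        return 0;
    strcpy(*j, "hello");               /* *j is the array of 20 chars */
    total = sizeof(*j) + strlen(*j);   /* 20 + 5 */
    free(j);
    return total;
}
```

Note that sizeof(*j) is 20: the size of the whole array, not of a pointer.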
1.8.2 Function declarations
A declaration of a function specifies:
• The return type of the function, i.e. the kind of result value it produces, if any.
• Its name.
• The types of each argument, if any.
The general form is:
36
Chapter 1. Introduction to C
<type> <Name>(<type of arg 1>, ... <type of arg N> ) ;
double sqrt(double) ;
Note that a parameter name can be added to the declaration, but it is optional. We can write:
double sqrt(double x);
if we want to, but the “x” is not required and is ignored by the compiler.
Functions can have a variable number of arguments. The function “printf” is an example of a function that takes a variable number of arguments. We declare such functions like this:
int printf(const char *, ...);
The ellipsis means “some more arguments”.
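The sketch below (half and show_half are hypothetical names) shows that a prototype may or may not name its parameter, and that a call to printf, declared with an ellipsis, accepts extra arguments after the format string:

```c
#include <stdio.h>

/* Two declarations of the same function: the parameter name in the
   second one is optional and has no effect. */
double half(double);
double half(double x);

/* The definition: here the parameter name is required. */
double half(double value)
{
    return value / 2.0;
}

/* printf takes the format string plus "some more arguments". */
int show_half(double v)
{
    return printf("half of %g is %g\n", v, half(v));
}
```

show_half(3.0) prints "half of 3 is 1.5" and returns the number of characters written.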
Why are function declarations important?
When I started programming in C, prototypes for functions didn’t exist. So you
could define a function like this:
int fn(int a)
{
return a+8;
}
and in another module write:
fn(7,9);
without any problems.
Well, without any problems at compile time, of course. The program crashed or returned nonsense results. When you had a big system of many modules written by several people, the probability that an error like this existed in the program was almost 100%. Mistakes like this cannot be avoided completely: you can avoid them most of the time, but not always.
Function prototypes introduced compile time checking of all function calls. This put an end to a dreaded problem that had cost us so many debugging hours with the primitive debuggers of that time. In the C++ language, the compiler will abort compilation if a function is used without a prototype. I have often thought of introducing that into lcc-win, because ignoring the function prototype is always an error. But, for compatibility reasons, I haven't done it yet.
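With a prototype in scope, the mistake described above is caught at compile time. A minimal sketch reusing fn from the text (call_fn is a hypothetical helper); the bad call is left in a comment because it would no longer compile:

```c
/* Prototype: fn takes exactly one int argument. */
int fn(int a);

int call_fn(void)
{
    /* fn(7, 9);   rejected at compile time: too many arguments */
    return fn(7);          /* checked call: 7 + 8 */
}

/* The definition from the text. */
int fn(int a)
{
    return a + 8;
}
```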
1.8.3 Function definitions
Function definitions look very similar to function declarations, with the difference that instead of just a semicolon, we have a block of statements enclosed in curly braces, as we saw in the function “main” above. Another difference is that here we have to specify the name of each argument: these identifiers aren't optional any more, because they are needed to refer to the arguments within the body of the function.
Here is a rather trivial example:
int addOne(int input)
{
return input+1;
}
1.8.4 Scope of identifiers
The scope of an identifier is the extent of the program where the identifier is active, i.e. where in the program you can use it. There are three kinds of identifier scopes:
1) File scope. An identifier with file scope can be used anywhere from its declaration to the end of the file where it is declared.
2) Block scope. The identifier is visible from its declaration to the end of the block of code, enclosed in curly braces ‘{’ and ‘}’, where it is declared.
3) Function prototype scope. This scope is concerned with the identifiers that are used within function prototypes. For instance, in
void myFunction(int arg1);
the identifier ‘arg1’ has prototype scope and disappears immediately after the prototype is parsed. We could add a fourth scope, function scope, for labels: a label is visible anywhere within its function, regardless of the blocks it is nested in.
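The following sketch marks the scope of each identifier in a comment. All names (counter, bump, scope_demo, done) are hypothetical, chosen only for illustration:

```c
static int counter;          /* file scope: from here to the end of the file */

void bump(int times);        /* "times" has function prototype scope only */

void bump(int n)             /* "n" has block scope: the function body */
{
    int i;                   /* block scope: the function body */

    for (i = 0; i < n; i++) {
        int step = 1;        /* block scope: only this loop body */
        counter += step;
    }
}

int scope_demo(void)
{
    bump(3);
    if (counter < 5)
        goto done;           /* labels have function scope */
    counter = -1;
done:
    return counter;
}
```

Since counter starts at zero, the first call to scope_demo() returns 3.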
1.8.5 Linkage and duration of objects
The linkage of an object determines whether it is visible outside the current compilation unit or not. Objects declared with the keyword ‘extern’, or defined at global scope without the keyword ‘static’, are visible outside the current compilation unit. Note that an identifier that doesn't refer to an object can have global scope but not be visible outside the current compilation unit. Enumerations, for instance, even if they have global scope, are not “exported” to other modules.
In general, we have the public objects, visible in all modules of the program, and the private ones, marked with the keyword ‘static’ and visible only in the current compilation unit. The duration of an object is the period during which the object exists. It can be:
1) Permanent. The object is always there: it lives from the start to the end of the program. Such objects are declared at global scope, or inside a function with the keyword ‘static’. The initial value is either explicitly given by the program, as in int p = 78;, or implicitly zero, as in int p;.
2) Transitory. The object starts its life when the program enters the scope where it lives, and disappears after the program leaves the scope. Automatic variables are transitory objects. Their initial value is indeterminate: in most compilers it consists of whatever values were stored previously at that memory location.
3) Allocated. The object starts its life as the result of a call to one of the allocation functions like malloc or GC_malloc, and it ends its life when its storage is reclaimed, either explicitly because the program calls the ‘free’ function or because the garbage collector determines that its storage is no longer used. The initial value depends on the allocation function: malloc returns uninitialized memory, while calloc and GC_malloc zero the memory before returning it.
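A minimal sketch of the three durations, with hypothetical names. It uses only the standard calloc and free; GC_malloc, lcc-win's garbage collected allocator, is left out so that the example compiles with any C compiler:

```c
#include <stdlib.h>

int public_total;               /* permanent, external linkage, implicitly 0 */
static int private_total = 78;  /* permanent, internal linkage (this file only) */

int next_id(void)
{
    static int id = 0;   /* permanent: keeps its value between calls */
    int step = 1;        /* transitory: created anew at each call */

    id += step;
    return id;
}

int allocated_demo(void)
{
    int *p = calloc(4, sizeof *p);   /* allocated: calloc zeroes the memory */
    int sum;

    if (p == NULL)
        return -1;
    sum = p[0] + p[3];               /* 0 + 0 */
    free(p);                         /* end of the allocated object's life */
    return sum;
}
```

Successive calls to next_id() return 1, 2, 3, ..., because id survives between calls while step is recreated each time.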
1.8.6 Variable definition
A variable is defined only when the compiler allocates space for it. For instance, at
the global level, space will be allocated by the compiler when it sees a line like this:
int a;
or
int a = 67;
In the first case the compiler allocates sizeof(int) bytes in the non-initialized variables
section of the program. In the second case, it allocates the same amount of space
but writes 67 into it, and adds it to the initialized variables section.
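To contrast a declaration with a definition, here is a short sketch with hypothetical names: the extern line only declares the variable, while the next two lines define it and a second variable, which is therefore zero.

```c
extern int defined_value;   /* declaration only: no space allocated here */
int defined_value = 67;     /* definition: space allocated, 67 written into it */
int zeroed_value;           /* definition without initializer: starts as 0 */

int sum_values(void)
{
    return defined_value + zeroed_value;
}
```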
1.8.7 Statement syntax
In C, the controlling expression of statements like if or while must be enclosed in parentheses. In many languages that is not necessary and people write:
if a < b run(); // Not in C...
In C, the if statement requires parentheses:
if (a<b) run();
The assignment in C is an expression, i.e. it can appear within a more complicated
expression:
if ( (x = z) > 13) z = 0;
This means that the compiler generates code for assigning the value of z to x, then
it compares this value with 13, and if the relationship holds, the program will set z
to zero. This construct is considered harmful however, because it is very easy to mix
the assignment (=) and the equality (==) operations.
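A short sketch (both function names are hypothetical) of the construct above and of the classic = versus == mistake:

```c
/* The construct from the text: assign z to x, then compare x with 13. */
int reset_if_large(int z)
{
    int x;

    if ((x = z) > 13)    /* the extra parentheses make the intent clear */
        z = 0;
    return z;
}

/* The classic mistake: "if (x = 13)" assigns 13 and is always true.
   Comparing needs the equality operator. */
int is_thirteen(int x)
{
    if (x == 13)         /* comparison, not assignment */
        return 1;
    return 0;
}
```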
1.9 Errors and warnings
It is very rare that we type in a program and it works on the first try. What happens, for instance, if we forget to close the main function with the corresponding curly brace? We erase the closing curly brace of the program above and try:
h:\lcc\examples>lcc args.c
Error args.c: 15
syntax error; found ‘end of input’ expecting ‘}’
1 errors, 0 warnings
Well, this is at least a clear error message. More difficult is the case of forgetting to put the semicolon after the declaration of count, in line 3 of the program above:
1.9. Errors and warnings
39
D:\lcc\examples>lcc args.c
Error args.c: 6
syntax error; found ‘for’ expecting ‘;’
Error args.c: 6
skipping ‘for’
Error args.c: 6
syntax error; found ‘;’ expecting ‘)’
Warning args.c: 6
Statement has no effect
Error args.c: 6
syntax error; found ‘)’ expecting ‘;’
Error args.c: 6
illegal statement termination
Error args.c: 6
skipping ‘)’
6 errors, 1 warnings
D:\lcc\examples>
We see here a chain of errors, caused by the first one. The compiler tries to recover by skipping text, but this produces more errors since the whole “for” construct is not understood. Error recovery is quite a difficult undertaking, and lcc-win isn't very good at it. So the best thing is to look at the first error: in many cases, the rest of the error messages are just consequences of it.9
Another type of error can appear when we forget to include the corresponding header file. If we erase the #include <stdio.h> line in the args program, the display looks like this:
D:\lcc\examples>lcc args.c
Warning args.c: 7
missing prototype for printf
0 errors, 1 warnings
This is a warning. The printf function will be assumed to return an integer, which, in this case, is a good assumption. We can link the program and the program works. It is certainly NOT a good practice to do this, however, since no argument checking is done for undeclared functions; an error in argument passing will go undetected and will provoke a much harder type of error: a run time error.
In general, it is better to get the error as soon as possible. The later it is discovered, the more difficult it is to find it, and to track its consequences. Do as much as you can to put the C compiler on your side, by always including the corresponding header files, to allow it to check every function call for correctness.
The compiler gives two types of errors, classified according to their severity: warnings, when the error isn't serious enough to prevent the compiler from finishing its task, and hard errors, where the compiler doesn't generate an executable file and returns an error code to the calling environment.
We should keep in mind, however, that warnings are errors too, and try to get rid of them.
The compiler uses a two level “warning level” variable. In the default state, many warnings aren't displayed to avoid cluttering the output. They will be displayed, however, if you explicitly ask to raise the warning level with the option -A. This compiler option will make the compiler emit all the warnings it would normally suppress. You call the compiler with lcc -A <filename>, or set the corresponding button in the IDE, in the compiler configuration tab.
9You will probably see another display in your computer if you are using a recent version of
lcc-win. I improved error handling when I was writing this tutorial.
Errors can appear in later stages, of course. The linker can discover that you have used a procedure without giving any definition for it in the program, and will stop with an error. Or it can discover that you have given two different, maybe contradictory, definitions to the same identifier. This will provoke a link time error too.
But the most dreaded form of errors are the errors that happen at execution time, i.e. when the program is running. Most of these errors are difficult to detect (they pass through the compilation and link phases without any warnings. . . ) and cause the total failure of the software.
The C language is not very “forgiving” where programmer errors are concerned. Most of them will either stop the program immediately with an exception, or make it return completely nonsensical results. In this case you need a special tool, a debugger, to find them. Lcc-win offers you such a tool, and you can debug your program by just pressing F5 in the IDE.
Summary:
• Syntax errors (missing semi-colons, or similar) are the easiest to correct.
• The compiler emits two kinds of diagnostic messages: warnings and errors.
• You can raise the compiler's warning level with the -A option.
• The linker can report errors when an identifier is defined twice or when an
identifier is missing a definition.
• The most difficult errors to catch are run time errors, in the form of traps or
incorrect results.
1.10 Input and output
In the Unix operating system, where the C language was designed, one of its central
concepts is the “FILE” generalization. Devices as varied as serial devices, disks, and
what the user types in his/her keyboard are abstracted under the same concept: a
FILE as a sequence of bytes to handle.
The FILE structure is a special opaque structure defined in <stdio.h>. Unlike other opaque structures, its definition is exposed in stdio.h, but its fields are never used directly.10
We have two kinds of input/output: direct operations and formatted operations. In direct operations, we just transfer data to or from the device without any further processing. In formatted operations, the data is assumed to be of a certain type, and it is formatted before being sent to the device.
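The difference between the two kinds can be seen by writing the same number both ways. In this sketch (direct_vs_formatted is a hypothetical name), the temporary file created by the standard tmpfile function receives the raw bytes of an int from fwrite, then its decimal digits from fprintf:

```c
#include <stdio.h>

/* Writes the same int once with a direct operation and once with a
   formatted operation. Returns the resulting file size, or -1 on error. */
long direct_vs_formatted(void)
{
    FILE *f = tmpfile();       /* binary temporary file, removed at close */
    int value = 1234;
    long size;

    if (f == NULL)
        return -1;
    fwrite(&value, sizeof value, 1, f);   /* direct: sizeof(int) raw bytes */
    fprintf(f, "%d", value);              /* formatted: '1','2','3','4' */
    size = ftell(f);                      /* current position = bytes written */
    fclose(f);
    return size;
}
```

The returned size should be sizeof(int) + 4: the raw bytes of the int, followed by the four characters of its decimal representation.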
As in many other languages, to perform some operation with a file you have to set up a connection to it by first “opening” it in some way; then you can do input/output to it, and eventually you close the connection with the device by “closing” it.
Table 1.6 shows a short overview of the functions that use files.
10 An “opaque” structure is a structure whose definition is hidden. Normally, we have just a pointer to it, but not the actual definition.
1.10. Input and output
41
Table 1.6: File operations
Name      Purpose
fopen     Opens a file
fclose    Closes a file
fprintf   Formatted output to a file
fputc     Puts a character in a file
putchar   Puts a character to stdout
getchar   Reads a character from standard input
feof      True after a read has hit the end of the file
ferror    True after an error occurred while reading from or writing to the file
fputs     Puts a string in a file
fread     Reads a specified amount of data from a file into a buffer
freopen   Reassigns a file pointer to another file or mode
fgetc     Reads one character from a stream
fscanf    Reads data from a file using a given data format
fsetpos   Sets the current position (to a value saved with fgetpos)
fseek     Moves the current position relative to the start of the file,
          to the end of the file, or relative to the current position
ftell     Returns the current position
fwrite    Writes a buffer into a file
remove    Erases a file
rename    Renames a file
rewind    Repositions the file pointer to the beginning of a file
setbuf    Controls file buffering
tmpnam    Returns a temporary file name
ungetc    Pushes a character back onto a stream
unlink    Erases a file (the Unix name for remove)
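As a small illustration of a few entries of the table, here is a sketch (table_tour is a hypothetical name) that writes to a temporary file, goes back to its beginning, and pushes a character back with ungetc:

```c
#include <stdio.h>

/* Exercises fputs, rewind, fgetc and ungetc on a temporary file.
   Returns the character read after the push-back, or EOF if the
   temporary file can't be created. */
int table_tour(void)
{
    FILE *f = tmpfile();     /* opened in "wb+" mode, removed when closed */
    int c;

    if (f == NULL)
        return EOF;
    fputs("abc", f);         /* write a string */
    rewind(f);               /* back to the beginning before reading */
    c = fgetc(f);            /* reads 'a' */
    ungetc(c, f);            /* push 'a' back onto the stream */
    c = fgetc(f);            /* reads the pushed-back 'a' again */
    fclose(f);
    return c;
}
```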
1.10.1 Predefined devices
To establish a connection with a device we open the file that represents it. There are
three devices always open that represent the basic devices that the language assumes:
1. The standard input device, or “stdin”. This device is normally associated with
the keyboard at the start of the program.
2. The standard output device, or “stdout”. This device is normally associated
with the computer screen in text mode.11
3. The standard error device or “stderr” that in most cases is also associated with
the computer screen.
11The text mode window is often called “Dos window” even if it has nothing to do with the old
MSDOS operating system. It is a window with black background by default, where you can see
only text, no graphics. Most of the examples following use this window. To start it you just go to
“Start”, then “Run” and type the name of the command shell: “cmd.exe”
Other devices can be added to or removed from the set of devices the program can communicate with. The number of files that the implementation guarantees can be open at the same time is given by the macro FOPEN_MAX, defined in stdio.h.
Under some systems you do not have these devices available or they are available
but not visible. Some of those systems allow you to open a “command line” window
that acts like a primitive system console with a text mode interface. When you use
those command line windows your program has these standard devices available.
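The three predefined devices need no fopen call: they can be used directly through the names stdin, stdout and stderr. A minimal sketch, with the hypothetical function name greet, that sends normal output to stdout and diagnostics to stderr:

```c
#include <stdio.h>

/* Normal output goes to stdout, diagnostics to stderr.
   Neither stream has to be opened by the program. */
int greet(const char *name)
{
    int written = fprintf(stdout, "Hello, %s\n", name);

    if (written < 0)       /* fprintf returns a negative value on error */
        fprintf(stderr, "greet: could not write to stdout\n");
    return written;        /* number of characters written */
}
```

greet("lcc") writes "Hello, lcc" followed by a new line, so it returns 11. The C standard requires FOPEN_MAX to be at least 8, a count that includes the three predefined streams.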
1.10.2 The typical sequence of operations
To establish a connection with a device we use the “fopen” function, which returns a pointer to a newly allocated FILE structure, or NULL if the connection with the device fails for whatever reason. Once the connection is established, we use fread/fwrite to receive data from or send data to the device. When we no longer need the connection, we break it by “closing” the file.
#include <stdio.h>
int main(int argc,char *argv[])
{
    unsigned char buffer[2048];
    size_t byteswritten, bytesread;
    // Error checking suppressed for clarity
    FILE *f = fopen(argv[1],"rb");
    FILE *out = fopen(argv[2],"wb");
    bytesread = fread(buffer,1,sizeof(buffer),f);
    // write only the bytes actually read
    byteswritten = fwrite(buffer,1,bytesread,out);
    fclose(f);
    fclose(out);
    return 0;
}
In this hypothetical program we establish a connection for reading from a device
named in the first argument (argv[1]). We connect to another device named in the
second argument, we read from the first device, we write into the second device and
then we close both.
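The hypothetical program above copies at most 2048 bytes and ignores every error. A sketch of a more complete version, written as a function (copy_file is a hypothetical name), loops until fread returns zero and checks each step:

```c
#include <stdio.h>

/* Copies the file named "from" into the file named "to", 2048 bytes at
   a time. Returns the number of bytes copied, or -1 on any error. */
long copy_file(const char *from, const char *to)
{
    unsigned char buffer[2048];
    size_t n;
    long total = 0;
    FILE *in = fopen(from, "rb");
    FILE *out;

    if (in == NULL)
        return -1;
    out = fopen(to, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    /* fread returns 0 at end of file (or on a read error) */
    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (fwrite(buffer, 1, n, out) != n) {   /* short write: disk full... */
            total = -1;
            break;
        }
        total += (long)n;
    }
    if (ferror(in))              /* the loop stopped because of an error */
        total = -1;
    fclose(in);
    if (fclose(out) != 0)        /* fclose flushes: a failure loses data */
        total = -1;
    return total;
}
```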
1.10.3 Examples
For a beginner, it is very important to know well the basic library functions for reading from and writing to a stream, as well as the mathematical functions. To make the general descriptions of input/output from the preceding sections more concrete, we present here a complete example that compiles.
The example is a program that reads a file, counting the number of characters that appear in it.
A program is defined by its specifications. In this case, we have a general goal that can be expressed quickly in one sentence: “Count the number of characters in a file”. Many times, the specifications aren't in a written form, and can even be completely ambiguous. What is important is that before you embark on a software construction project, the specifications are clear, at least to you.
#include <stdio.h>                  // (1)
int main(int argc,char *argv[])     // (2)
{
    int count=0;                    // (3) chars read
    FILE *infile;                   // (4)
    int c;                          // (5)
    infile = fopen(argv[1],"r");    // (6)
    c = fgetc(infile);              // (7)
    while (c != EOF) {              // (8)
        count++;                    // (9)
        c = fgetc(infile);          // (10)
    }
    printf("%d\n",count);           // (11)
    return 0;
}
1. We include the standard header “stdio.h” again. It contains the definition of the FILE structure.
2. The same convention as for the “args” program is used here.
3. We set the count of the characters read to zero at the start. Note that we do this in the declaration of the variable. C allows you to define an expression that will be used to initialize a variable.
4. We use the variable “infile” to hold a FILE pointer. Note the declaration for a pointer: <type> * identifier; the type in this case is a complex structure (composite type) called FILE and defined in stdio.h. We do not use any fields of this structure: we just assign to the pointer and pass it to the functions of the standard library, so we are not concerned about the specific layout of the structure. Note that a pointer is just the machine address of the start of that structure, not the structure itself. We will discuss pointers extensively later.
5. We use an integer to hold the currently read character. It must be an int and not a char, because fgetc returns either a character or the special value EOF.
6. We start the process of reading characters from a file by first opening it. This operation establishes a link between the data area of your hard disk and the FILE variable. We pass to the function fopen an argument list, separated by commas, containing two things: the name of the file we wish to open, and the mode in which we want to open it, in our example read mode. Note that the mode is passed as a character string, i.e. enclosed in double quotes.
7. Once the file is opened, we can use the fgetc function to get a character from it. This function receives as argument the file we want to read from, in this case the variable “infile”, and returns an integer containing the character read, or EOF when there are no more characters.
8. We use the while statement to loop reading characters from a file. This statement has the general form: while (condition) { statements }. The loop body will be executed as long as the condition holds. At each iteration of the loop we test whether our character is the special constant EOF (End Of File), defined in stdio.h.
9. We increment the counter of the characters. If we arrive here, it means that fgetc did not return EOF, i.e. we read a real character, so we increase the counter.
10. After counting the character we are done with it, and we read a new character into the same variable, using the fgetc function again.
11. If we arrive here, it means that we have hit EOF, the end of the file. We print our count on the screen and exit the program returning zero, i.e. all is OK. By convention, a program returns zero when no errors happened, and an error code when something happens that needs to be reported to the calling environment.
Now we are ready to start our program. We compile it, link it, and we call it with:
h:\lcc\examples> countchars countchars.c
288
We have achieved the first step in the development of a program. We have a version of it that in some circumstances can fulfill the specifications that we received. But what happens if we just write
h:\lcc\examples> countchars
We get the system's error message box, which many of you have already seen several times. Why?
Well, let's look at the logic of our program. We assumed (without any test) that argv[1] will contain the name of the file whose characters we should count. But if the user doesn't supply this parameter, our program will pass a nonsense argument to fopen, with the obvious result that the program will fail miserably, causing a trap, or exception, that the system reports. We return to the editor and correct the faulty logic. The added code is marked with (1), (2) and (3).
#include <stdio.h>
#include <stdlib.h>                 // (1)
int main(int argc,char *argv[])
{
    size_t count=0; // chars read
    FILE *infile;
    int c;
    if (argc < 2) {                 // (2)
        printf("Usage: countchars <file name>\n");
        exit(EXIT_FAILURE);         // (3)
    }
    infile = fopen(argv[1],"r");
    c = fgetc(infile);
    while (c != EOF) {
        count++;
        c = fgetc(infile);
    }
    printf("%lu\n",(unsigned long)count); // size_t printed portably
    return 0;
}
1. We need to include <stdlib.h> to get the prototype declaration of the exit() function, which ends the program immediately.
2. We use the conditional statement “if” to test for a given condition. The general form of it is:
if (condition) { statements } else { statements }
3. We use the exit function to stop the program immediately. This function receives an integer argument that will be the result of the program. In our case we return the error code 1, so the result of our program will be the integer 1. Note that we do not use the integer constant 1 directly, but rather the predefined constants EXIT_SUCCESS (defined as 0) or EXIT_FAILURE (defined as 1) in stdlib.h. In other operating systems or environments, the numeric value of EXIT_FAILURE could be different. By using those predefined constants we keep our code portable from one implementation to the other.
Now, when we call countchars without passing it an argument, we obtain a nice
message:
h:\lcc\examples> countchars
Usage: countchars <file name>
This is MUCH clearer than the incomprehensible message box from the system, isn't it? Now let's try the following:
h:\lcc\examples> countchars zzzssqqqqq
And we obtain the dreaded message box again. Why? Well, it is very unlikely that a file called “zzzssqqqqq” exists in the current directory. We have used the function fopen, but we didn't bother to check whether fopen reported that the operation failed, because, for instance, the file doesn't exist at all!
A quick look at the documentation of fopen (which you can obtain by pressing F1 with the cursor over the “fopen” word in Wedit) will tell us that when fopen returns a NULL pointer (a zero), it means the open operation failed. We modify our program again to take this possibility into account:
#include <stdio.h>
#include <stdlib.h>
int main(int argc,char *argv[])
{
    size_t count=0; // chars read
    FILE *infile;
    int c;
    if (argc < 2) {
        printf("Usage: countchars <file name>\n");
        exit(EXIT_FAILURE);
    }
    infile = fopen(argv[1],"r");
    if (infile == NULL) {
        printf("File %s doesn't exist\n",argv[1]);
        exit(EXIT_FAILURE);
    }
    c = fgetc(infile);
    while (c != EOF) {
        count++;
        c = fgetc(infile);
    }
    printf("%lu\n",(unsigned long)count);
    return 0;
}
We try again:
H:\lcc\examples> lcc countchars.c
H:\lcc\examples> lcclnk countchars.obj
H:\lcc\examples> countchars sfsfsfsfs
File sfsfsfsfs doesn't exist
H:\lcc\examples>
Well, this error checking works. But let's look again at the logic of this program. Suppose we have an empty file. Will our program work?
If we have an empty file, the first fgetc will return EOF. This means the body of the while loop will never be executed and control will pass to our printf statement. Since we took care of initializing our counter to zero at the start of the program, the program will correctly report the number of characters in an empty file: zero.
Still, it would be interesting to verify that we are getting the right count for a given file. Well, that's easy. We count the characters with our program, and then we use the DIR command of Windows to verify that we get the right count.
H:\lcc\examples>countchars countchars.c
466
H:\lcc\examples>dir countchars.c
07/01/00  11:31p    492 countchars.c
          1 File(s)    492 bytes
Wow, we are missing 492-466 = 26 chars!
Why?
We read again the specifications of the fopen function. It says that we should use it in text mode with “r” or in binary mode with “rb”. This means that when we open a file in text mode, each sequence of the characters \r (return) and \n (new line) is translated into ONE character, \n. If every line of our file ends with such a sequence, we lose one character per line, so the 26 missing characters correspond to 26 lines. When we open a file to count all characters in it, we should count the return characters too.
This has historical reasons. The C language originated in a system called UNIX; actually, the whole language was developed to make it convenient to write the UNIX system. In that system, lines are separated by only ONE character, the new line character.
When the MSDOS system was developed, more than ten years after UNIX, its designers decided to separate text lines with two characters, the carriage return and the new line character. This caused many problems with software that expected only ONE character as line separator. To avoid this problem, the C libraries under MSDOS provided a compatibility option: fopen would by default open files in text mode, i.e. it would translate sequences of \r\n into a single \n, skipping the \r.
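The effect of text mode can be measured directly. The sketch below (count_chars is a hypothetical name) is the counting loop of our program, with the fopen mode as a parameter so that "r" and "rb" can be compared on the same file:

```c
#include <stdio.h>

/* Counts the characters of the file "name" opened with "mode"
   ("r" = text mode, "rb" = binary mode). Returns -1 if the file
   can't be opened. */
long count_chars(const char *name, const char *mode)
{
    FILE *f = fopen(name, mode);
    long count = 0;

    if (f == NULL)
        return -1;
    while (fgetc(f) != EOF)
        count++;
    fclose(f);
    return count;
}
```

For a file containing "line1\r\nline2\r\n", the "rb" count is 14. Under Windows the "r" count should be 12, because each of the two \r\n pairs is read as a single \n; under Unix both counts are 14.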
Conclusion:
Instead of opening the file with:
fopen(argv[1], "r");
we use fopen(argv[1], "rb"); i.e. we force NO translation. We recompile, relink
and we obtain:
H:\lcc\examples> countchars countchars.c
493
H:\lcc\examples> dir countchars.c
07/01/00  11:50p    493 countchars.c
          1 File(s)    493 bytes
Yes, 493 bytes instead of 492 before, since we have added a “b” to the arguments of fopen! Still, we read the docs about file handling, and we try to see whether there are any hidden bugs in our program. After a while, an obvious fact appears: we have opened a file, but we never closed it, i.e. we never break the connection between the program and the file it is reading. We correct this, and at the same time add some comments to make the purpose of the program clear.
/*---------------------------------------------------------
Module:        H:\LCC\EXAMPLES\countchars.c
Author:        Jacob
Project:       Tutorial examples
State:         Finished
Creation Date: July 2000
Description:   This program opens the given file, and
               prints the number of characters in it.
----------------------------------------------------------*/
#include <stdio.h>
#include <stdlib.h>
int
main(int argc,char *argv[])
{
    size_t count=0;
    FILE *infile;
    int c;
    if (argc < 2) {
        printf("Usage: countchars <file name>\n");
        exit(EXIT_FAILURE);
    }
    infile = fopen(argv[1],"rb");
    if (infile == NULL) {
        printf("File %s doesn't exist\n",argv[1]);
        exit(EXIT_FAILURE);
    }
    c = fgetc(infile);
    while (c != EOF) {
        count++;
        c = fgetc(infile);
    }
    fclose(infile);
    printf("%lu\n",(unsigned long)count);
    return 0;
}
The skeleton of the comment above is generated automatically by the IDE. Just right-click somewhere in your file, and choose “edit description”.
Summary:
• A program is defined by its specifications. In this example, counting the number
of characters in a file.
• A first working version of the specification is developed. Essential parts like
error checking are missing, but the program “works” for its essential function.
• Error checking is added, and test cases are built.
• The program is examined for correctness, and the possibility of memory leaks, unclosed files, etc., is reviewed. Comments are added to make the purpose of the program clear, and to allow other people to know what it does without being forced to read the program text.
1.10.4 Other input/output functions
The current position
Each open file has a current position, i.e. the position in the data stream where the next read or write operation will be done. To know where the current position is, use the ftell function. To set the current position to some specific index, use the fseek function. Here is an example of the usage of those functions. We write a function that will return the length of a given file. The algorithm is very simple: 1) set the current position at the end of the file; 2) read the current position. This is the size of the file. Easy, isn't it?
#include <stdio.h>
size_t FileLength(char *FileName)          // (1)
{
    FILE *f = fopen(FileName,"rb");        // (2)
    size_t result;
    if (f == NULL)
        return (size_t)-3;
    if (fseek(f,0,SEEK_END)) {             // (3)
        fclose(f);
        return (size_t)-2;
    }
    result = ftell(f);                     // (4)
    fclose(f);
    return result;                         // (5)
}
1. We use the type dedicated to sizes within the language, size_t, as the return value of our function. The implementation translates this type into the unsigned integer type that can hold the biggest size supported. In lcc-win's case this is an unsigned int; in 64 bit systems it is normally an unsigned long long. Since size_t is unsigned, our error codes are returned as (size_t)-3, (size_t)-2 and (size_t)-1, i.e. as very large values that the caller must compare against.
2. We open the file. Note that we open it in binary mode, because for files opened in text mode, where the sequence \r\n is translated into only one character, the value returned by ftell is not a byte count.
3. The fseek function positions the file pointer. The first argument is the FILE pointer; the second is a distance, and the third says whether that distance is measured from the beginning of the file (SEEK_SET), from the current position (SEEK_CUR) or from the end of the file (SEEK_END). In our example we use a distance of zero from the end of the file. If the function fails for some reason, we return (size_t)-2.
4. We call the ftell function to retrieve the current position. Note that this function returns -1 if there was an error. We do not test for this result since, if ftell failed, we return its error code (converted to size_t) without any further processing.
5. Since ftell returns a long, which is a 32 bit integer under Windows, files of 2GB or more can't be measured using these functions. lcc-win provides some 64 bit file primitives, and the Windows operating system provides a full set of 64 bit file primitives.
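fseek accepts three origins. The following sketch (seek_demo is a hypothetical name) writes the digits 0 to 9 to a binary temporary file and reads one character after each kind of positioning:

```c
#include <stdio.h>

/* Stores in out[0..2] the characters found after positioning with
   SEEK_SET, SEEK_CUR and SEEK_END. Returns 0 on success, -1 on error. */
int seek_demo(char out[4])
{
    FILE *f = tmpfile();             /* binary mode: positions are bytes */

    if (f == NULL)
        return -1;
    fputs("0123456789", f);

    fseek(f, 2, SEEK_SET);           /* 2 bytes from the start */
    out[0] = (char)fgetc(f);         /* '2'; the position is now 3 */
    fseek(f, 3, SEEK_CUR);           /* 3 bytes forward from position 3 */
    out[1] = (char)fgetc(f);         /* '6' */
    fseek(f, -1, SEEK_END);          /* the last byte of the file */
    out[2] = (char)fgetc(f);         /* '9' */
    out[3] = '\0';
    fclose(f);
    return 0;
}
```

After a successful call, out contains the string "269".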
1.10.5 File buffering
A file can be either unbuffered (all input and output operations happen immediately) or buffered, meaning that characters are accumulated in a buffer and transmitted from or to the device as a block. Obviously, buffered input/output has the advantage of efficiency, since fewer I/O operations are performed.
A buffered file can be either fully buffered, when I/O is done only when the buffer is full, or line buffered, when I/O is done as soon as a new line character is found in the data (or when the buffer fills up). Normally, the files associated with the keyboard and the screen are line buffered, and disk files are fully buffered.
You can force the system to empty the buffer with the function fflush. The amount of buffering can be changed with the setbuf and setvbuf functions.
The setbuf function allows you to set up your own buffer instead of the default one. This can sometimes be better when you know you need to read or write large chunks of the file, for instance:
#include <stdio.h>
char mybuf[BUFSIZ]; // BUFSIZ defined in stdio.h; setbuf expects a char buffer
int main(void)
{
    // setbuf must be called before any other operation on the stream
    setbuf(stdout,mybuf);
    return 0;
}
Note that the buffer is declared global. Be careful about the scope of the buffer you pass to setbuf. You can use a buffer with local scope, but you must make sure that the file is closed before leaving the scope where the buffer is declared. If not, the program may crash when the fclose function tries to flush the contents of a buffer that is no longer available. Keep in mind that fclose is called automatically for all files still open at program exit.
The setvbuf function allows you to change the mode of the buffering (line, full or none) and to pass a buffer of a different size than BUFSIZ, a constant defined in stdio.h.
Error conditions
What is the proper way of detecting an end of file condition while reading from a file? Try to read the file, and if the read fails, check whether it failed because the end of the file was reached. The feof function tells you this. There are two reasons why a read from a file can fail. The first one is that there is nothing more to read: the end of the data has been reached, and the current position is at the end of the file.
The second reason is that a hardware error makes any reading impossible. The disk drive may have a bad spot and refuse to give any data back. The device underlying the file may be a network connection, and your internet service provider may have problems with the router connected to your machine. There are endless reasons why hardware can fail.
The feof and ferror functions tell you which of the two reasons is responsible.
1.11
Commenting the source code
Writing comments looks simple, but doing it right is quite a difficult task. Let's start with the basics. Comments come in two forms.
Two slashes // introduce a comment that lasts until the end of the line. No space may appear between the first slash and the second one.
A slash and an asterisk /* introduce a comment that can span several lines. It is terminated only by an asterisk and a slash, */. The same rule applies here: to be valid comment delimiters, there must be no space between the slash and the asterisk, nor between the asterisk and the slash. Examples:
// This is a one-line commentary. Here /* are ignored anyway.
/* This is a commentary that can span several lines.
Note that here the two slashes // are ignored too */
This is very simple. The difficulty, of course, is not in the syntax of comments but in their content. There are several rules to keep in mind.
Always keep the comments current with the code they are supposed to comment. Nothing is more frustrating than discovering that a comment was actually misleading you because it wasn't updated when the code below it changed. Instead of helping you understand the code, such a comment makes it even more obscure.
Do not comment what you are doing but why. For instance:
record++; // increment record by one
This comment tells us nothing that the C code doesn't tell us anyway.
record++; // Pass to next record.
          // The boundary tests are done at
          // the beginning of the loop above
This comment brings useful information to the reader.
At the beginning of each procedure, try to add a standard comment describing
the purpose of the procedure, inputs/outputs, error handling etc.12
At the beginning of each module try to put a general comment describing what
this module does, the main functions etc.
Note that you yourself will be the first person to debug the code you write. Comments will help you understand again that hairy code you wrote several months ago, in a hurry.
The editor of lcc-win provides a "Standard comments" feature. There are two
types of comments supported: comments that describe a function, and comments that
apply to a whole file. These comments are maintained by the editor that displays a
simple interface for editing them.
1.11.1
Describing a function
Place the mouse anywhere within the body of a function and click the right mouse button. A context menu appears that offers to edit the description of the current function. Choosing this option opens the following dialog:
[Screen shot: the function description dialog of the lcc-win editor]
12The IDE of lcc-win helps you by automating the construction of those comments. Just choose "edit description" in the right mouse button menu.
There are several fields that you should fill in:
1. Purpose: what this function does, and how it does it.
2. Inputs: how the interface of this function is designed, that is, the arguments of the function and the global variables it uses, if any.
3. Outputs: the return value of the function, and any globals that it leaves modified.
4. Error handling: the error return, and the behavior of the function when an error occurs within its body.
For the description provided in the screen shot above, the editor produces the following output:
/*---------------------------------------------------------------
Procedure:
multiple ID:1
Purpose:
Compiles a multiple regular expression
Input:
Reads input from standard input
Output:
Generates a regexp structure
Errors:
Several errors are displayed using the "complain"
function
-----------------------------------------------------------------*/
void multiple(void)
{
The editor reads this comment back into the dialog the next time you ask for the description of the function.
1.11.2
Describing a file
The same context menu that appears with a right click has another menu item that says "description of file.c", where "file.c" is the name of the current file. This item lets you describe what the file does. The editor automatically adds the name of the currently logged-on user, most of the time the famous administrator. The output of the interface looks like this:
/*-----------------------------------------------------------------
Module:
d:\lcc\examples\regexp\try.c
Author:
ADMINISTRATOR
Project:
State:
Creation Date:
Description: This module tests the regular expressions
package. It is self-contained and has a main()
function that will open a file given in the
command line that is supposed to contain
several regular expressions to test. If any
errors are discovered, the results are printed
to stdout.
-----------------------------------------------------------------*/
As with the other standard comment, the editor will read this comment back into the interface.
These features are just an aid to make writing comments easier, and to make the comments uniform and structured. In another environment you could use another format. For instance, you could keep a simple text file, with fields tailored to the application you are developing, and insert it wherever necessary. Such a solution would work on most systems too, since most editors allow you to insert a file at the insertion point.
1.12
An overview of the whole language
Let's formalize a bit what we are discussing. Here are some tables that you can use for reference. First come the words of the language, the statements. Then comes a dictionary of some sentences you can write with those statements: the different declarations, a summary of the pre-processor instructions, and the control-flow constructs. The last table lists the extensions of lcc-win. I have tried to include everything, and I hope I didn't forget anything.
For each construct you will find a more or less formal description, a short explanation and an example. In the descriptions, these words have a special meaning: "id" means an identifier, "type" means some arbitrary type, and "expr" means some arbitrary C expression.
I have forced a page break here so that you can print these pages separately when you are using the system.
1.12.1
Statements
Table 1.7
Expression
    Meaning. Example: example

identifier
    The value associated with that identifier (see page 65). Example: id
constant
    The value defined with this constant (see page 67). The forms are:
    integer or unsigned integer constant. Example: 45 45U
    long integer or unsigned long integer constant. Example: 45L 45UL
    long long integer or unsigned long long integer constant. Example: 45LL 45ULL
    floating constant. Example: 45.9
    float constant. Example: 45.9f
    long double constant. Example: 45.9L
    qfloat constant. Example: 45.9Q
    character constant or wide character constant, enclosed in single quotes. Example: 'A' L'A'
    string literal or wide character string literal, enclosed in double quotes. Example: "Hello" L"Hello"
{constants}
    Define tables or structure data. Each comma-separated item is an item in the table or structure. Example: int tab[]={1,67}
prefixed integer constants
    Use different numerical bases:
    octal constant (base 8), introduced with a leading zero. Example: 055 (45 written in base 8)
    hexadecimal constant (base 16), introduced with 0x. Example: 0x2d (45 written in base 16)
    binary constant, introduced with 0b. This is an lcc-win extension. Example: 0b101101 (45 written in binary)
Array[index]
    Access the position "index" of the given array. Indexes start at zero (see page 72). Example: Table[45]
Array[i1][i2]
    Access an n-dimensional array using the indexes i1, i2, ... in. See "Arrays" on page 72. Example: Table[34][23] accesses the 35th row, 24th position of Table
fn(args)
    Call the function "fn" and pass it the comma-separated argument list "args". See "Function calls" on page 75. Example: sqrt(4.9);
fn(arg, ...)
    Function with a variable number of arguments.
(*fn)(args)
    Call the function whose machine address is in the pointer fn.
struct.field
    Access the member of the structure. Example: Customer.Name
struct->field
    Access the member of the structure through a pointer. Example: Customer->Name
var = value
    Assign to the variable the value of the right-hand side of the equals sign. See "Assignment" on page 79. Example: a = 45
expression++
    Equivalent to expression = expression + 1. Increment expression after using its value. See "Postfix" on page 80. Example: a = i++
expression--
    Equivalent to expression = expression - 1. Decrement expression after using its value. See "Postfix" on page 80. Example: a = i--
++expression
    Equivalent to expression = expression + 1. Increment expression before using its value. Example: a = ++i
--expression
    Equivalent to expression = expression - 1. Decrement expression before using its value. Example: a = --i
&object
    Return the machine address of object. The result is a pointer to the type of the object. Example: &i
*pointer
    Access the contents at the machine address stored in the pointer. See "Indirection" on page 72. Example: *pData
-expression
    Subtract expression from zero, that is, change its sign. Example: -a
~expression
    Bitwise complement of expression: change all 1 bits to 0 and all 0 bits to 1. Example: ~a
!expression
    Logical negation: if expression is zero, !expression is one; if expression is different from zero, !expression is zero. Example: !a
sizeof(expr)
    Return the size in bytes of expr. See "Sizeof" on page 60. Example: sizeof(a)
(type) expr
    Convert the expression to the given type. This is called a "cast". The expression can also be a list of initializers enclosed in braces, as in a structure initialization. See page 69. Example: (int *)a
expr * expr
    Multiply. Example: a*b
expr / expr
    Divide. Example: a/b
expr % expr
    Divide the first by the second and return the remainder. Example: a%b
expr + expr
    Add. Example: a+b
expr1 - expr2
    Subtract expr2 from expr1. Example: a-b
expr1 << expr2
    Shift expr1 left by expr2 bits. Example: a << b
expr1 >> expr2
    Shift expr1 right by expr2 bits. Example: a >> b
expr1 < expr2
    1 if expr1 is smaller than expr2, zero otherwise. Example: a < b
expr1 <= expr2
    1 if expr1 is smaller than or equal to expr2, zero otherwise. Example: a <= b
expr1 >= expr2
    1 if expr1 is greater than or equal to expr2, zero otherwise. Example: a >= b
expr1 > expr2
    1 if expr1 is greater than expr2, zero otherwise. Example: a > b
expr1 == expr2
    1 if expr1 is equal to expr2, zero otherwise. Example: a == b
expr1 != expr2
    1 if expr1 is different from expr2, zero otherwise. Example: a != b
expr1 & expr2
    Bitwise AND of expr1 with expr2. See "Bitwise operators" on page 67. Example: a&8
expr1 ^ expr2
    Bitwise XOR of expr1 with expr2. See "Bitwise operators" on page 67. Example: a^b
expr1 | expr2
    Bitwise OR of expr1 with expr2. See "Bitwise operators" on page 67. Example: a|16
expr1 && expr2
    Evaluate expr1. If its result is zero, stop evaluating the whole expression and set its result to zero. If not, continue evaluating expr2. The result is the logical AND of the results of evaluating each expression. See "Logical operators" on page 66. Example: a < 5 && a > 0 is 1 if "a" is between 1 and 4. If a >= 5, the second test is not performed.
expr1 || expr2
    Evaluate expr1. If its result is different from zero, stop evaluating the whole expression and set its result to 1. If not, continue evaluating expr2. The result is the logical OR of the results of each expression. See "Logical operators" on page 66. Example: a == 5 || a == 3 is 1 if a is either 5 or 3.
expr ? v1 : v2
    If expr evaluates to non-zero (true), return v1, otherwise return v2. See "Conditional operator" on page 58. Example: a = b ? 2 : 3 sets a to 2 if b is true, to 3 otherwise.
expr *= expr1
    Multiply expr by expr1 and store the result in expr. Example: a *= 7
expr /= expr1
    Divide expr by expr1 and store the result in expr. Example: a /= 78
expr %= expr1
    Calculate the remainder of expr divided by expr1 and store the result in expr. Example: a %= 6
expr += expr1
    Add expr1 to expr and store the result in expr. Example: a += 6
expr -= expr1
    Subtract expr1 from expr and store the result in expr. Example: a -= 76
expr <<= expr1
    Shift expr left by expr1 bits and store the result in expr. Example: a <<= 6
expr >>= expr1
    Shift expr right by expr1 bits and store the result in expr. Example: a >>= 7
expr &= expr1
    Bitwise AND expr with expr1 and store the result in expr. Example: a &= 32
expr ^= expr1
    Bitwise XOR expr with expr1 and store the result in expr. Example: a ^= 64
expr |= expr1
    Bitwise OR expr with expr1 and store the result in expr. Example: a |= 128
expr, expr1
    Evaluate expr, then expr1, and return the result of evaluating the last expression, in this case expr1. See page 69. Example: the result of a=7,b=8 is 8
;
    Null statement. Example: ;
1.12.2
Declarations
Declaration
    Meaning. Example: example

type id;
    Identifier will have the specified type within this scope. For a local (automatic) variable the initial value is indeterminate. For a global variable the initial value at program start is zero. Example: int a;
type * id;
    Identifier will be a pointer to objects of the given type. Add one asterisk for each level of indirection: a pointer to a pointer needs two asterisks, and so on. Example: int *pa; (pa is a pointer to int)
type id[int expr]
    Identifier will be an array of "int expr" elements of the given type. The expression must be either a compile time constant or an integer expression evaluated at run time. In the latter case this is a variable length array. Example: int *ptrTab[56*2]; (an array of 112 int pointers)
typedef old new
    Define a new type name for the old type. See "Typedef" on page 59. Example: typedef unsigned uint;
register id;
    Try to store the identifier in a machine register. If no type is given, older compilers assume int; C99 no longer allows the type to be omitted. See "Register" on page 59. Example: register f;
extern type id;
    The definition of the identifier is in another module. No space is reserved. Example: extern int frequency;
static type id
    Make the definition of identifier not accessible from other modules. Example: static int f;
struct id { declarations }
    Define a compound type composed of the list of fields enclosed within the curly braces. Example: struct coord { int x; int y; };
type id:n
    Within a structure, declare the field "id" as a sequence of n bits of type "type". See "Bit fields" on page 63. Example: unsigned n:4 (n is an unsigned int of 4 bits)
union id { declarations }
    Reserve storage for the biggest of the declared types and store all of them in the same place. See "Union" on page 59. Example: union dd { double d; int id[2]; };
enum id { enum list }
    Define an enumeration of comma-separated identifiers, assigning each of them an integer value. See "Enum" on page 60. Example: enum color {red,green,blue};
const type id;
    Declare that the given identifier can't be changed (assigned to) within this scope. See "Const" on page 60. Example: const int a;
type * restrict id
    This pointer has no other pointers that point to the same data. Example: char * restrict p;
volatile type identifier
    Declare that the given object changes in ways unknown to the implementation. The compiler will not keep this variable in a register, even if optimizations are turned on. Example: volatile int hardware_clock;
unsigned int-type
    When applied to integer types, do not use a sign bit. See "Unsigned" on page 63. Example: unsigned char a;
type id(args);
    Declare the prototype for the given function. The arguments are a comma-separated list. See "Prototypes" on page 60. Example: double sqrt(double);
type (*id)(arguments);
    Declare a function pointer called "id" with the given return type and argument list. Example: void (*fn)(int);
label:
    Declare a label. Example: lab1:
type fn(args) { ... statements ... }
    Definition of a function with return type "type" and arguments "args". Example: int add1(int x) { return x+1; }
inline
    A function specifier. If present, the compiler can take it as a request to generate the fastest function call possible, generally by replicating the function body at each call site. Example: inline double overPi(double a) {return a/3.14159;}
main
    The entry point of each program. There should be a single function called main in each conforming C program. There are two possible interfaces for main: without arguments, and with arguments. Example: int main(void); int main(int argc, char *argv[])
1.12.3
Pre-processor
Directive
    Meaning. Example: example

// commentary
    Double slashes introduce a comment that lasts until the end of the line. See "Comments" on page 65. Example: // comment
/* commentary */
    Slash star introduces a comment that lasts until the sequence star slash (*/) is seen. See "Comments" on page 65. Example: /* comment */
defined (id)
    If the given identifier is #defined, return 1, else return 0. Example: #if defined(max)
#define id text
    Replace all appearances of the given identifier (id here) by the corresponding text. See "Preprocessor commands" on page 176. Example: #define TAX 6
#define macro(a,b)
    Define a macro with n arguments. When the macro is used, the arguments are lexically replaced within its body. See page 177. Example: #define max(a,b) ((a)<(b)?(b):(a))
#ifdef id
    If the given identifier is defined (using #define), include the following lines, else skip them. See page 178. Example: #ifdef TAX
#ifndef id
    The contrary of the above: include the following lines only if the identifier is not defined. Example: #ifndef TAX
#if (expr)
    Evaluate the expression, and if the result is nonzero (true), include the following lines. Else skip all lines until an #else, #elif or #endif is found. Example: #if (TAX==6)
#else
    The else branch of an #if or #ifdef. Example: #else
#elif
    Abbreviation of #else #if. Example: #elif
#endif
    End an #if or #ifdef preprocessor directive. Example: #endif
#warning "text"
    Write the text of a warning message. This is an extension of lcc-win, but other compilers (for instance gcc) support it too. Example: #warning "MACHINE undefined"
#error "text"
    Write an error message. Example: #error "M undefined"
#file "foo.c"
    Set the file name. Example: #file "ff.c"
#line nn
    Set the line number to nn. Example: #line 56
#include <fns.h>
    Insert the contents of the given file from the standard include directory into the program text at this position. Example: #include <stdio.h>
#include "fns.h"
    Insert the contents of the named file, starting the search from the current directory. Example: #include "foo.h"
##
    Token concatenation. Example: a##b → ab
#token
    Make a string from a token. Only valid within macro definitions. Example: #foo → "foo"
#pragma
    Special compiler directives. Example: #pragma optimize(on)
_Pragma(string)
    Special compiler directives, written as an operator with a string argument. Example: _Pragma("optimize (on)")
#undef id
    Erase the given identifier from the preprocessor tables. Example: #undef TAX
\
    If a \ appears at the end of a line, just before the newline character, the preprocessor joins that line with the following one and eliminates the \ character.
__LINE__
    Replaced by the current line number. Example: printf("error line %d\n",__LINE__);
__FILE__
    Replaced by the current file name. Example: printf("error in %s\n", __FILE__);
__func__
    The name of the current function being compiled. Strictly speaking this is not a macro but a predefined identifier: a string that holds the function name. Example: printf("fn %s\n", __func__);
__STDC__
    Defined as 1. Example: #if __STDC__
__LCC__
    Defined as 1. This allows you to include or exclude code specific to lcc-win. Example: #if __LCC__
1.12.4
Control-flow
Syntax
    Description

if (expression) { block } else { block }
    If the given expression evaluates to something different from zero, execute the statements of the following block. Else, execute the statements of the compound statement following the else keyword. The else part is optional. Note that a single statement can replace a block.
while (expression) { ... statements ... }
    If the given expression evaluates to something different from zero, execute the statements in the block, and return to evaluate the controlling expression again. Else continue after the block. See "while" on page 25.
do { ... statements ... } while (condition);
    Execute the statements in the block, and afterwards test whether the condition is true. If it is, execute the statements again. See "do" on page 25.
for (init; test; incr) { ... statements ... }
    Execute the expressions in the init statement unconditionally. Then evaluate the test expression, and if it evaluates to true, execute the statements in the block following the for. At the end of each iteration, execute the incr statements and evaluate the test expression again. See "for" on page 23.
switch (expression) { case int-constant: statements ... break; default: statements }
    Evaluate the given expression and test whether the resulting value matches any of the integer constants given in the 'case' labels. If one matches, execute the statements in sequence beginning with that case label. If the value doesn't match any of the cases and a "default" case was specified, execute the statements of the default case. See "Switch statement" on page 65.
goto label;
    Transfer control unconditionally to the given label.
continue
    Within a for/do/while loop, continue with the next iteration of the loop, skipping all statements until the end of the loop body. See "Break and continue statements" on page 64.
break
    Stop the execution of the current do/for/while loop, or of the current switch statement.
return expression
    End the current function and return control to the calling one. The return value of the function (if any) can be specified in the expression following the return keyword. See page 62.
1.12.5
Extensions of lcc-win
Syntax
    Description

type operator token(args) { statements }
    Redefine one of the operators like +, * or others, so that this function is called instead of an error being issued. See page 206.
type & id = expr;
    Identifier will be a reference to a single object of the given type. References must be initialized in their declaration.
int fn(int a, int b=0)
    Default function arguments. If the argument is not given in a call, the compiler fills it in with the specified compile time constant.
int overloaded f(int)
int overloaded f(char*)
    Generic functions. These functions have several types of arguments but the same name.
2
A closer view
Let's look in depth at each of the terms described succinctly in the tables above. The tables give a compressed view of C; now let's see some of the details.
2.1
Identifiers.
An “identifier” is actually a name. It names either an action, a piece of data, an
enumeration of several possible states, etc. The C language uses the following rules
concerning names:
• The letters allowed are A-Z, a-z and the underscore character '_'.
• Digits are allowed, but no identifier starts with a digit.
• Lower case and upper case letters are considered different.
• lcc-win accepts names of up to 255 characters. The standard only guarantees that the first 31 characters of an external identifier, and the first 63 characters of an internal one, are significant. If you use overly long names, your code may not work in other environments.
Identifiers are the vocabulary of your software. When you create them, choose mnemonic names that describe the data stored at that location. Here are some rules that you may want to follow:
• Most of the time, construct identifiers from full words, joined either by underscores or implicitly separated by capitalization. For example we would use list_element or ListElement. A variant of this rule is "camel-casing": the first word is in lower case, and each following word starts with an upper-case letter. In this example we would have listElement.
• Use abbreviations sparingly, for words you use frequently within a package. As always, be consistent if you do so.
• Identifier names should grow longer as their scope grows wider. Identifiers local to a function can have much shorter names than those used throughout a package.
• Identifiers beginning with a double underscore ("__"), or with an underscore followed by an upper-case letter, are reserved for the implementation (the compiler and its libraries). Programmers should therefore not use them. To be on the safe side it is best to avoid all identifiers beginning with an underscore.1
2.1.1
Identifier scope and linkage
Until now we have used identifiers and scopes without defining them precisely. This is unavoidable at the beginning, since some things must be left unexplained at first, but now it is time to fill the gaps.
An identifier in C can denote:
• an object.
• a function
• a tag or a member of a structure, union or enum
• a typedef
• a label
For each entity that an identifier designates, the identifier can be used (is visible) only within a region of the program called its scope. There are four kinds of scopes in C.
The file scope is built from all identifiers declared outside any block or parameter list. It is the outermost scope, where global variables and functions are declared.
A function scope is given only to label identifiers: a label can be used anywhere within the function where it appears.
The block scope is built from all identifiers that are defined within a block. A block scope can nest other blocks. The parameters of a function definition belong to the outermost block of the function body.
The function prototype scope is the list of parameters in a function declaration that is not a definition. Identifiers declared within this scope are visible only within it. Let's see a concrete example:
#include <stdio.h>

static int Counter = 780;                 // file scope
extern void fn(int Counter);              // function prototype scope

void function(int newValue, int Counter)  // block scope
{
    double d = newValue;

    for (int i = 0; i < 10; i++) {
        if (i < newValue) {
            char msg[45];
            int Counter = 78;
            sprintf(msg, "i=%d\n", i*Counter);   // <----
        }
        if (i == 4)
            goto label;                   // labels have function scope
    }
label:
    return;
}
1Microsoft has developed a large set of rules, mostly very reasonable ones, here:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconNamingGuidelines.asp
At the point indicated by the arrow, the poor "Counter" identifier has had a busy life:
• It was bound to an integer object with file scope.
• Then it had another incarnation within the function prototype scope.
• Then it was bound to a parameter of the function "function".
• That definition was again "shadowed" by a new definition in an inner block, as a local variable.
The value of "Counter" at the arrow is 78. When that block ends, "Counter" refers again to the parameter of the function "function", with whatever value the caller passed. When the function definition ends, the file scope is again the current scope, and "Counter" designates the file-scope object again, initialized to 780.
The "linkage" of an identifier refers to its visibility from other modules. Basically, all identifiers that appear at global scope (file scope) and refer to an object or a function are visible from other modules. The exception is when you explicitly declare otherwise by using the "static" keyword.
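Problems can appear if you first declare an identifier as static and later declare or define it without the static keyword. For instance:
static void foo(void);
and several hundred lines below you write:
void foo(void) { ...
}
Which one should the compiler use? static or not static? That is the question. . .
lcc-win always chooses non-static, whereas Microsoft's compiler always chooses static. For functions, however, the standard does settle this case. A function declaration without a storage-class specifier takes the linkage of a prior visible declaration, so foo keeps internal (static) linkage. The behavior is undefined only when the same identifier ends up with both internal and external linkage in one translation unit. This happens, for instance, when an object is declared static int x; and later int x; at file scope, or when the non-static declaration of a function comes first and the static one later.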
2.2
Constants
2.2.1
Evaluation of constants
The expressions that can appear in the definition of a constant are evaluated in the same way as expressions during the execution of the program. For instance, this will put 1 into the integer variable d:
static int d = 1;
This will also put 1 into the variable d:
static int d = 60 || 1 +1/0;
Why?
The expression 60 || 1+1/0 is evaluated from left to right. It is a boolean expression whose value is 1 if the first operand is different from zero. The second operand is evaluated only if the first one is zero, and the result is then 1 or 0 depending on whether the second operand is nonzero. Since 60 is not zero, evaluation stops immediately without evaluating the second expression. This is fortunate, since the second one contains a division by zero...
Constant expressions
The standard defines constant expressions as follows:
A constant expression can be evaluated during translation rather than
runtime, and accordingly may be used in any place that a constant may
be.
Constant expressions can have values of the following types:
• Arithmetic. Any arithmetic operations are allowed. If floating point is used, the precision of the calculations must be at least the same as in the run time environment.
• A null pointer constant.
• The address of an object with global scope (static storage duration). Optionally an integer offset can be added to the address.
Since constant expressions are calculated during compilation, even inefficient algorithms are useful, because the execution time is not affected. For instance Hallvard B. Furuseth proposed2 a set of clever macros to calculate the base 2 logarithm of a number during compilation:
/*
* Return (v ? floor(log2(v)) : 0) when 0 <= v < 1<<[8, 16, 32, 64].
* Inefficient algorithm, intended for compile-time constants.
*/
#define LOG2_8BIT(v)
(8 - 90/(((v)/4+14)|1) - 2/((v)/2+1))
#define LOG2_16BIT(v) (8*((v)>255) + LOG2_8BIT((v) >>8*((v)>255)))
#define LOG2_32BIT(v) \
(16*((v)>65535L) + LOG2_16BIT((v)*1L >>16*((v)>65535L)))
#define LOG2_64BIT(v)\
(32*((v)/2L>>31 > 0) \
+ LOG2_32BIT((v)*1L >>16*((v)/2L>>31 > 0) \
>>16*((v)/2L>>31 > 0)))
Clever isn’t it?
So much clever that I have been unable to understand how they work3. I just
tested this with the following program:
2In a message to the comp.lang.c discussion group posted on June 28th 2006, 4:37 pm. You can
find the original message in
https://groups.google.com/group/comp.lang.c/msg/706324f25e4a60b0?hl=en&
3Thomas Richter explained them to me. He said (in comp.lang.c):
2.2. Constants
69
#include <math.h>
#include <stdio.h>
int main(void)
{
printf("LOG2_32BIT(35986)=%ld\n",LOG2_32BIT(35986));
printf("log2(35986.0)=%g\n",log2(35986.0));
}
OUTPUT:
LOG2_32BIT(35986)=15
log2(35986.0)=15.1351
What is also interesting is that lcc-win receives from the preprocessor the result
of the macro expansion. Here is it, for your amusement4:
#line 17 "tlog2.c"
int main(void)
{
printf("LOG2_32BIT(35986)=%ld\n",(16*((35986)>65535L) +
(8*(((35986)*1L >>16*((35986)>65535L))>255) +(8 - 90/
(((((35986)*1L >>16*((35986)>65535L)) >>8*(((35986)*
1L >>16*((35986)>65535L))>255))/4+14)|1) - 2/((((35986)*
1L >>16*((35986)>65535L)) >>8*(((35986)*1L >>16*((35986)
>65535L))>255))/2+1)))));
}
The compiler calculates all those operations during compilation, and outputs the 15,
that is stuffed into a register as an argument for the printf call. Instead of calling an
expensive floating point library function you get the result with no run time penalty.
2.2.2
Integer constants
An integer constant begins with a digit, but has no period or exponent part. It may
have a prefix that specifies its base and a suffix that specifies its type. A decimal
constant begins with a nonzero digit and consists of a sequence of decimal digits. An
octal constant consists of the prefix 0 optionally followed by a sequence of the digits
0 through 7 only. A hexadecimal constant consists of the prefix 0x or 0X followed by
a sequence of the decimal digits and the letters a (or A) through f (or F) with values
10 through 15 respectively. Here are various examples of integer constants:
LOG2_8BIT is a rational function approximation of the log: it is not identical to the log and gives
false results for x = 0, or for x > 255, but only then. That is, it matches the output of log(x) by
properly tweaking the coefficients.
LOG2_16 simply uses the functional equation of the log, namely
log(x * b) = log(b) + log(x)
and in this case, b = 256. The same goes for LOG2_32 and LOG2_64, which simply extend the game
to 64 bits by factoring more and more powers out.
4Of course that is a single huge line that I had to cut in several places to make it fit the page.
70
Chapter 2. A closer view
Constant       Description
12345          integer constant, decimal
0777           octal constant, 511 decimal
0xF98A         hexadecimal constant, 63882 decimal (its type is int, since the value fits in an int)
12345L         long integer constant
2634455LL      long long integer constant
2634455i64     long long integer constant, Microsoft notation
5488UL         unsigned long constant 5488
548ULL         unsigned long long constant 548
2.2.3
Floating constants
Floating constants are written either with a decimal point (1230.0) or in scientific
notation (1.23e3). The suffix ‘F’ (or ‘f’) makes them float constants instead of the
double constants that are assumed when there is no suffix.
A suffix of “l” or “L” means long double constant. A suffix of “q” or “Q” means a
qfloat. The default format (without any suffix) is double. This default is important
since it can be the source of bugs that are very difficult to find. For instance:
long double d = 1e800;
Under lcc-win the dynamic range of a long double is big enough to accept this value, but
since the programmer has forgotten the L, the constant is read as a double. A double
cannot represent 1e800, so the initialization does not store the intended value: d
typically receives infinity, with at most a warning from the compiler. Writing 1e800L
fixes the problem.
2.2.4
Character string constants
Character string constants are enclosed in double quotes. An "L" immediately
before the opening double quote makes them wide character strings.
Example:
L"abc"
This means that the compiler stores this character string as wide characters
(wchar_t, two bytes each under Windows) instead of one byte per ASCII character.
To include a double quote within a string, precede it with a backslash.
Example:
"The string \"the string\" is enclosed in quotes"
Note that strings and numbers are completely different data types. Even if a
string contains only digits, it will never be recognized as a number by the compiler:
"162" is a string, and to convert it to a number you must explicitly write code to do
the transformation.5
5Using operator overloading you can give new meanings to the normal arithmetic operators.
You could then "add" strings, which is a very bad way to express string concatenation. String
concatenation is not like addition, since it is not commutative: "abc"+"def" →
"abcdef", but "def"+"abc" → "defabc".
Character string constants that are too long to write in a single line can be entered
in two ways:
char *a = "This is a long string that at the end has a backslash \
that allows it to go on in the next line";
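Another way, available since the first ANSI C standard (C89), is to write adjacent
string literals, which the compiler joins into a single string:
char *a = "This is a long string written "
          "in two lines";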
Remember that character string constants must not be modified by the program:
doing so is undefined behavior. Lcc-win stores each distinct character string constant
only once, even if it appears several times in the program text. For instance if you write:
char *a = "abc";
char *b = "abc";
Both a and b will point to the SAME string, and if either is modified the other will
not retain the original value6.
2.2.5
Character abbreviations
Within a string constant, the following abbreviations are recognized:
Abbrev.    Meaning                            Value
\n         new line                           10
\r         carriage return                    13
\b         backspace                          8
\v         vertical tab                       11
\t         tab                                9
\f         form feed                          12
\e         escape (not standard C)            27
\a         bell                               7
\\         insert a backslash                 92
\"         insert a double quote              34
\x<hex>    insert at the current position     any value that fits
           the character whose code is        in a char
           given by the hexadecimal digits.
           Example: the string "AB\xA"
           is the same as "AB\n".
\<octal>   the same as the \x case above,     any value that fits
           but with the value written as      in a char
           one to three octal digits, i.e.
           a number in base 8. No letter is
           needed after the backslash: the
           octal digits start immediately
           after it.
           Example: "AB\012" is the same
           as "AB\n"
6Other compilers behave differently. GCC, for instance, places character string constants in
read-only memory, so on most systems a program that modifies one crashes.
2.3
Arrays
Here are various examples of using arrays.
int a[45];        // Array of 45 elements
a[0] = 23;        // Sets first element to 23
a[a[0]] = 56;     // Sets the 24th element to 56
a[23] += 56;      // Adds 56 to the 24th element
char letters[] = {'C', '-', '-'};
Note that the last array “letters” is NOT a zero terminated character string but an
array of 3 positions that is not terminated by a zero byte.
Multidimensional arrays are indexed like this:
int tab[2][3];
tab[1][2] = 7;
A table of 2 rows and 3 columns is declared. Then, we assign 7 to the second
row, third column. (Remember: array indexes start at zero.) Note that when you
index a two dimensional array with only one index you obtain the indicated row,
which converts to a pointer to the start of that row.
int *p = tab[1];
Now p contains the address of the start of the second row.
Arrays in C are stored in row-major order, i.e. the array is a contiguous piece
of memory and the rows of the array are stored one after the other. The individual
array members are accessed with a simple formula:
x[i][j] == *(&x[0][0] + i*n + j)
where n is the row size of the array x (its number of columns). It is evident from
this formula that the compiler treats a two dimensional array differently from a one
dimensional one, because it needs one more piece of information to access the two
dimensional one: the size of each row, “n” in this case.
How does the compiler know this value?
From the declaration of the array, of course. When the compiler parses a declaration
like
int tab[5][6];
the last number (6) is the size of each row, and it is all the information the compiler
needs when compiling each array access. Since arrays are passed to functions as
pointers, when you pass a two dimensional array you must pass this information
too, for instance by leaving the number of rows empty but giving the number of
columns, like this:
int tab[][6]
This is the standard way of passing two dimensional arrays to a function. For
example:
#include <stdio.h>
int tab[2][3] = {1,2,3,4,5,6};
// Note the declaration of the array parameter
void fn(int array[][3])
{
    printf("%d\n",array[1][1]);   // prints 5
}
int main(void)
{
    fn(tab);
}
Arrays can be fixed, i.e. their dimensions are determined at compile time, or they
can be dynamic, i.e. their dimensions are determined at run time.
For dynamic two dimensional arrays, we have to do a two stage process to allocate
the storage that we need for them, in contrast to one dimensional arrays where we need
just a single allocation.
For instance, here is the code we would write to allocate dynamically an array of
integers of 3 rows and 4 columns:
int ** result = malloc(3*sizeof(int *));
for (int i = 0; i<3;i++) {
result[i] = malloc(4*sizeof(int));
}
Of course in a real program we would always test the result of malloc for
failure.
We see that we first allocate an array of pointers, one for each row we will
need. Then, we fill each row with an allocation of space for the number of columns
in the array times the size of the object we are storing in the array, in this example
an integer.
It is important to understand the difference between dynamically allocated and
compile-time fixed arrays. The row major order formula does not work with arrays
built this way: each row is a separate allocation, and the rows need not be adjacent
in memory. The formula only works when the whole array is one contiguous block.
From the above discussion we also see that we always need an array of pointers
as big as the number of rows in the array, something we do not need in the case of
arrays with known dimensions.
If you want the best of the two alternatives you can allocate a single
block of memory for the two dimensional array, and instead of using the array notation
you apply the row major formula above yourself to access the array, eliminating
the extra array of pointers. You would allocate the two dimensional
array like this:
int *result = malloc(sizeof(int) * NbOfRows * NbOfColumns);
and you would access the array like this:
result[row*NbOfColumns+column];
Note that you have to keep the dimensions of the array separately.
The array of pointers can be cumbersome but it gives us more flexibility. We can
easily add a new row to the array, e.g. between rows 1 and 2. We just need to grow
the array of pointers, move the pointers after the first one up by one position, and
store the pointer to the new row in the free slot. We do not have to move the row data at all.
Not so with the shortcut that we described above. There, we have to move the
data itself to make space for an extra row. In the case of arrays defined at compile
time it is impossible to do anything since the dimensions of the array are “compiled
in” at each access by the compiler.
2.3.1
Variable length arrays.
These arrays are based on the evaluation of an expression that is computed when
the program is running, and not when the program is being compiled. Here is an
example of this construct:
void Function(int n)
{
    int table[n];
}
The array of integers called “table” has n elements. This “n” is passed to the function
as an argument, so its value can’t be known in advance. The compiler generates
code to allocate space for this array on the stack when execution reaches its
declaration. The storage used by the array is freed automatically when the function exits.
2.3.2
Array initialization
To fill an array you could theoretically write a function like this:
int array[10];
void arrayinit(void)
{
array[0] = 12;
array[1] = 13;
array[2] = 765;
// and so on until array[9]
}
To avoid unnecessary typing, and speed up the program, you can ask the compiler
to do the same operation at compile time by typing:
int array[10] = {12,13,765,123,5,0,0,21,78,1};
This has exactly the same result, but instead of doing the initialization at run time,
it will be done during the compilation, producing a bit image of the array that will
be loaded in the executable.
Obviously this is an improvement, but still, suppose you have an array of 20
positions, where all positions are zero except the 18th, which has the value 1.
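Obviously you can write:
int array[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0};
This is quite error prone, especially if instead of 20 positions you have 200. For
such situations, C99 provides designated initializers:
int array[20] = {[17] = 1};
All the elements that are not mentioned are set to zero. Note that the size 20 must be
written explicitly here: with empty brackets the compiler would size the array from the
largest index, giving only 18 elements. This is much shorter, allowing you to initialize
sparse arrays easily.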
2.3.3
Compound literals
You can use a construct similar to a cast expression to indicate the type of a composite
constant literal. For instance:
typedef struct tagPerson {
    char Name[75];
    int age;
} Person;
void process(Person *);
process(&(Person){"Mary Smith", 38});
This is one of the new features of C99. The literal should be enclosed in braces, and it
should match the expected structure. This is just “syntactic sugar” for the following:
Person __998815544ss = { "Mary Smith", 38};
process(&__998815544ss);
The advantage is that you are spared the task of inventing a name for the
temporary object, since the compiler does that for you. Internally, that code
represents exactly what happens inside lcc-win.
2.4
Function calls
sqrt( hypo(6.0,9.0) ); // Calls the function hypo with
// two arguments and then calls
// the function sqrt with the
// result of hypo
An argument may be an expression of any object type. In preparing for the call to
a function, the arguments are evaluated, and each parameter is assigned the value
of the corresponding argument. Some conversions may be applied to the results of
evaluating those expressions. For instance, the function ‘sqrt’ expects a double
precision number. In a call like c = sqrt(42); the compiler converts the integer 42
(the actual argument) into a double precision number before passing it to ‘sqrt’,
provided the prototype of sqrt from <math.h> is visible.
Sometimes the conversion cannot be done. A call of sqrt("Jane"); is an error
since there is no conversion possible from a character string into a double precision
number.
A function may change the values of its parameters, but these changes cannot
affect the values of the arguments. On the other hand, it is possible to pass a pointer
to an object, and the function may change the value of the object pointed to.
A parameter declared to have array or function type is converted to a parameter
with a pointer type.
The order of evaluation of the actual arguments, and sub expressions within the
actual arguments is unspecified. For instance:
fn( g(), h(), m());
Here the order of the calls to the functions g(), h() and m() is unspecified.
Calling a function through a pointer uses the same syntax as a normal function call:
you do not need to dereference the pointer explicitly. For instance
int main(void)
{
int (*fn)(int);
// fn is initialized somewhere here
(*fn)(7);
fn(7);
}
Both calls will work. If the called function has a prototype, the arguments are
implicitly converted to the types of the corresponding parameters when possible.
When this conversion fails, lcc-win issues an error and compilation fails. Other
compilers may behave differently. When a function has no prototype, or for the
arguments that match the “...” of a function with a variable argument list, the
default argument promotions apply: the integer promotions are applied to each
argument, and float arguments are passed as double.
2.4.1
Prototypes.
A prototype is a description of the return value and the types of the arguments
of a function. The general form specifies the return type, then the name of the
function. Then, enclosed in parentheses, comes a comma-separated list of parameters
with their respective types. If the function doesn’t have any parameters, you should
write ‘void’ instead of the parameter list. If the function doesn’t return any value
you should specify void as the return type. At each call, the compiler will check that
the types of the actual arguments to the function are correct.
The compiler cannot guarantee, however, that the prototypes are consistent across
different compilation units. For instance if in file1.c you declare:
int fn(void);
then, the call
fn();
will be accepted. If you then declare another prototype in file2.c
void fn(int);
and then you use:
fn(6);
the compiler cannot see the inconsistency, and the program will be in error, crashing
mysteriously at run time. This kind of error can be avoided if you always declare the
prototypes in a header file that will be included by all files that use that function.
Do not declare prototypes in a source file if the function is an external one.
2.4.2
Functions with variable number of arguments.
To use the extra arguments you should include <stdarg.h>. To access the additional
arguments, you first execute va_start, then, for each argument, you execute
va_arg. Note that if you have executed the macro va_start, you should always
execute the va_end macro before the function exits. Here is an example that will
add any number of integers passed to it. The first integer passed is the number of
integers that follow.
#include <stdarg.h>
int va_add(int numberOfArgs, ...)
{
va_list ap;
int n = numberOfArgs;
int sum = 0;
va_start(ap,numberOfArgs);
while (n--) {
sum += va_arg(ap,int);
}
va_end(ap);
return sum;
}
We would call this function with
va_add(4,987,876,567,9556);
or
va_add(2,456,789);
Implementation details
Under 32 bit systems (Linux or Windows) the variable arguments area is just a starting
point in the stack area. When you do a va_start(ap, last), the system makes the pointer
ap point to the first unnamed argument, just after the last named argument “last”.
Later, when you retrieve something from the variable argument list, this pointer
is incremented by the size of the argument just retrieved, rounded up to the size of a
stack slot, so that it points to the next one. This is quite simple and works in many systems.
Other systems, especially 64 bit Windows and 64 bit Linux, need a more complicated
scheme since the first arguments are not passed on the stack but in predetermined
registers. This forces the compiler to save those registers in a stack area, and
retrieve the arguments from there. Under 64 bit Linux the issue is further complicated
because integer arguments are passed in a different set of registers than floating
point arguments, so the compiler must keep track of separate positions in the integer
register save area, the floating point register save area and the stack.
2.4.3
stdcall
Normally, the compiler generates assembly code that pushes each argument onto the
stack, executes the “call” instruction, and then adds the size of the pushed arguments
to the stack pointer to return it to its previous position. The stdcall functions,
however, remove their arguments from the stack themselves as part of their final
return instruction, so this adjustment by the caller is not necessary.
The reason for this is a smaller code size, since the many instructions that adjust
the stack after each function call are not needed and are replaced by a single instruction
at the end of the called function.
Functions with this calling convention are internally “decorated” by the compiler,
which adds the stack size of their arguments to their name after an “@” sign. For instance a
function called fn with an integer argument will get called fn@4. The purpose of this
decoration is to make sure that a stdcall function is always called through its correct
declaration: if the caller did not see it, the names do not match and the program doesn’t link.
In 64 bit systems (64 bit windows, and non windows systems like AIX, or Linux)
lcc-win doesn’t use this calling convention. The symbol _stdcall is accepted but
ignored.
2.4.4
Inline
This asks the compiler to replicate the body of a function at each call site. For
instance:
static inline int f(int a) { return a+1;}
Then:
int a = f(b)+f(c);
will be equivalent to writing:
int a = (b+1)+(c+1);
Note that this expansion is realized in the lcc-win compiler only when optimizations
are ON. In a normal (debug) setting, the “inline” keyword is ignored. You can also
control this behavior by using the command line option “-fno-inline”.
2.5
Assignment.
An assignment expression has two parts: the left hand side of the equals sign, which
must be something that can be assigned to, and the right hand side, which can be any
expression other than void.
int a = 789; // "a" is initialized with 789
array[345] = array[123]+897; //An element of an array is assigned
Struct.field = sqrt(b+9.0); // A field of a structure is assigned
p->field = sqrt(b+9.0);
/* A field of a structure is assigned through a pointer. */
Within an assignment there is the concept of “L-value”, i.e. any assignable object.
You can’t, for instance, write:
5 = 8;
The constant 5 can’t be assigned to. It is not an “L-value”: the “L” comes from the
left hand side of the equals sign, of course.
In the same vein we speak of LHS and RHS as abbreviations for left hand side and
right hand side of the equals sign in an assignment.
The rules for type compatibility in assignment are also used when examining the
arguments to a function. When you have a function prototyped like this:
void fn(TYPE1 t1);
TYPE2 t2;
fn(t2);
The same rules apply as if you had written: t1 = t2;
2.6
The four operations
This should be simple, everyone should be able to understand what a*b represents.
There are some subtleties to remember however.
2.6.1
Integer division
When the types of both operands to a division are one of the integer types, the
division performed is “integer” division by truncating the result towards zero. Here
are some examples:
#include <stdio.h>
int main(void)
{
printf("5/6=%d\n",5/6);
printf("6/5=%d\n",6/5);
printf("-6/5=%d, (-6)/5=%d\n",-6/5,(-6)/5);
printf("(-23)/6=%d\n",(-23)/6);
}
The output of this program is:
5/6=0
6/5=1
-6/5=-1, (-6)/5=-1
(-23)/6=-3
2.6.2
Overflow
All four arithmetic operations can produce an overflow. For signed integer types,
the behavior is undefined: the standard considers it an error, and anything can happen.
Floating point types (double or float for instance) should set the overflow flag and
this flag can be queried by the program using the floating point exception functions
of <fenv.h>.
Most modern computers can distinguish between two types of overflow conditions:
1. A computation produces a carry, i.e. the result is larger than what the destination
can hold.
2. The sign of the result is not what you would expect from the signs of the
operands, a true overflow.
Unless the program handles them, both cases will produce incorrect results. Historically,
integer overflow has never been treated by the language with the attention it deserves.
Everything is done to make programs run fast, but much less is done to make
programs run correctly, giving as a result programs that can return wrong results at
an amazing speed.
lcc-win gives the user access to the overflow flag, in a similar (and efficient) way
as the programmer has access to the floating point exception flags, through the
built-in function
int _overflow(void);
This function returns 1 when the overflow flag is set, zero otherwise. To use this
feature, split your operations into small steps, and call this pseudo function when
needed. Instead of c = (b+a)/(b*b+56); write
c1 = b+a;
if (_overflow()) goto ovfl;
c2 = b*b;
if (_overflow()) goto ovfl;
c3 = c2+56;
if (_overflow()) goto ovfl;
c = c1/c3;
This can become VERY boring, so it is better to give a command line argument to
the compiler, that will generate the appropriate assembly to test each operation. The
operations monitored are the signed ones; unsigned operations wrap around.
2.6.3
Postfix
These expressions increment or decrement the expression at their left side returning
the old value. For instance:
array[234] = 678;
a = array[234]++;
In this code fragment, the variable a will get assigned 678 and the array element 234
will have a value of 679 after the expression is executed. In the code fragment:
array[234] = 678;
a = ++array[234];
The integer a and the array element at index 234 will both have the value
679.
When applied to pointers, these operators increment or decrement the pointer
to point to the next or previous element. Note that if the size of the object those
pointers point to is different than one, the address stored in the pointer changes by
that size (sizeof *p bytes), not by one.
NOTE: a variable must not be modified more than once between two sequence points
(roughly, within one expression that contains no &&, ||, ?: or comma operator). For
instance the expression:
i = i++;
has undefined behavior. It will not work the same from one compiler to the next, and
even with the same compiler the result may change depending on whether you turn
optimizations on or off. The same applies to the decrement operator:
j = i-- - i--;
is also undefined. Note that this holds even if the expression is much more complicated
than this:
x = MyTable[i--].Field->table[i++];
has undefined behavior too.
2.7
Conditional operator
The general form of this operator is:
expression1 ? expression2 : expression3
The first operand of the conditional expression (expression1) is evaluated first. The
second operand (expression2) is evaluated only if the first compares unequal to 0;
the third operand is evaluated only if the first compares equal to 0; the result of the
whole expression is the value of the second or third operand (whichever is evaluated),
converted to the type described below.
If both the second and the third operand have an arithmetic type, the usual
arithmetic conversions are applied to them, and the result has the resulting common type.
If both have the same structure type, the result has that type. If both
are void, the result is void. These expressions can be nested.
int a = (c == 66) ? 534 : 698;
the integer a will be assigned 534 if c is equal to 66, otherwise it will be assigned 698.
struct b *bb = (bstruct == NULL) ? NULL : bstruct->next;
If bstruct is not NULL, the pointer bb will receive the “next” field of the
structure, otherwise bb will be set to NULL.
2.8
Register
This keyword is a recommendation to the compiler to use a machine register for
storing the value of this variable. The compiler is free to follow this directive or not.
The type must be either an integer type or a pointer. If you use this declaration,
note that you aren’t allowed to use the address-of operator since registers do not
have addresses.
Registers are the highest part of your machine memory hierarchy. They are the
fastest storage available to the program by the circuit, and in a PC x86 architecture
there are just a few of them available at a time.
After registers there is the level 1 cache, level 2 cache, main memory, then the
disk, in order of decreasing access speed.
2.8.1
Should we use the register keyword?
The register keyword is no longer really necessary, thanks to the improvements in
compiler technology. In most cases, the compiler can figure out better than the user
which variables should be stored in registers and which variables should be stored in
memory. Lcc-win tries to honor the register keyword, and it will follow your advice,
but other compilers will ignore it and use their own scheme. In general you can’t
rely on the register keyword to keep a variable out of memory.
2.9
Sizeof
The result of sizeof is an unsigned integer constant (of type size_t) calculated at
compile time. For instance sizeof(int) will yield under lcc-win the constant 4. In the
case of a variable length array however, the compiler can’t know its size in advance,
and it will be forced to generate code that will evaluate the size of the array when the
program is running. For example:
void fn(int size)
{
    int tab[size];
}
Here the compiler can’t know the size of tab in advance.
The largest value of type size_t, and therefore an upper bound for the size of any
object, is given by the macro SIZE_MAX, defined in <stdint.h>. The 32 bit version of
lcc-win defines it as 4GB minus one (0xFFFFFFFF), but in 32 bit systems the actual
maximum will be much lower than that. In 64 bit operating systems, a 32 bit program
running in an emulation layer can have all the addressing space of 4GB.
2.10
Enum
An enumeration is a sequence of symbols that are assigned integer values by the
compiler. The symbols so defined are equivalent to integers, and can be used for
instance in switch statements. The compiler starts assigning values at zero, but you
can change the values using the equals sign.
An enumeration like enum{a,b,c}; will make a equal to 0, b equal to 1, and c equal to 2.
You can change this with enum {a=10,b=25,c=76};
2.10.1
Const.
Constant values can’t be modified. The following pair of declarations demonstrates
the difference between a “variable pointer to a constant value” and a “constant pointer
to a variable value”.
const int *ptr_to_constant;
int *const constant_ptr;
The contents of any object pointed to by ptr_to_constant shall not be modified
through that pointer, but ptr_to_constant itself may be changed to point to
another object. Similarly, the contents of the int pointed to by constant_ptr may be
modified, but constant_ptr itself shall always point to the same location.
Implementation details
Lcc-win considers that when you declare:
static const int myVar = 56;
it is allowed to replace every use of the variable myVar in your code with 56.
2.11
Goto
This statement transfers control to the given label. Many scientific papers have
been written about this feature, and many people will tell you that writing a goto
statement is a sin. I am agnostic. If you need to use it, use it. Only, do not abuse it.
Note that labels are always associated with statements. You can’t write:
if (condition) {
if (another_condition) {
goto lab;
}
lab:
// WRONG!
}
In this case the label is not associated with any statement, this is an error. You can
correct this by just adding an empty statement:
if (condition) {
if (another_condition) {
goto lab;
}
lab:
; // empty statement
}
Now, the label is associated with a statement.
A goto statement can jump in the middle of a block, skipping the initialization
of local variables. This is a very bad idea. For instance:
if (condition)
    goto label;
{
    int m = 0;
label:
    printf("%d\n", m);   // m was never initialized if we jumped here
}
In this case, the jump will skip the initialization of the variable m. A very bad idea,
since m can now contain any value.
2.12
Break and continue statements
The break and continue statements are used to break out of a loop or switch, or to
continue a loop at the test site. They can be explained in terms of the goto statement:
while (condition != 0) {
errno = 0;
doSomething();
if (errno != 0)
break;
doSomethingElse();
}
is equivalent to:
while (condition != 0) {
errno = 0;
doSomething();
if (errno != 0)
goto lab1;
doSomethingElse();
}
lab1:
The continue statement can be represented in a similar way:
while (condition != 0) {
doSomething();
if (condition == 25)
continue;
doSomethingElse();
}
is equivalent to:
restart:
while (condition != 0) {
doSomething();
if (condition == 25)
goto restart;
doSomethingElse();
}
The advantage of avoiding the goto statement is the absence of a label. Note that in
the case of the “for” statement, execution continues with the increment part.
Remember that the continue statement within a switch statement doesn’t mean
that execution will continue the switch but continue the next enclosing for, while, or
do statement.
2.13
Return
A return statement terminates the function that is currently executing, and returns
(hence the name) to the function that called the current function.
2.13.1
Two types of return statements
Since a function may or may not return a value, there are two types of return
statements:
return;
or
return expression;
A return with no value is used in functions that have the return type void, i.e.
they do not return any value. Functions that have a return type other than void must
use the second form of the return statement. It is a serious error to use an empty
return statement within a function that expects a returned value. The compiler will
warn about this.
The type of the return will be enforced by the compiler that will generate any
necessary conversion. For instance:
double fn(void)
{
int a;
// ...
return a;
}
The compiler will generate a conversion from integer to double to convert a to double
precision.
There is one exception to this rule. If control reaches the end of main() without
a return statement, the compiler automatically supplies a return value of zero (a rule
introduced by C99).
2.13.2
Returning a structure
When the return type of a function is a structure, and that structure is big enough to
make it impossible to return the value in the machine registers, the compiler passes
the address of a temporary storage to the function where the function writes its
result. This is done automatically and does not need any special intervention from
the user.
Other compilers may use different schemes for returning a structure. Under
Windows, lcc-win uses the guidelines proposed by Microsoft. Non Microsoft compilers
under Windows may use different schemes.
2.13.3
Never return a pointer to a local variable
Suppose a function like this:
double *fn(void)
{
double m;
// ...
return &m;
}
This is a serious error that the lcc-win compiler flags, but other compilers may be less explicit. The problem here is that the variable “m” lives in the activation record (the stack frame) of the procedure “fn”. Once the procedure returns, all storage associated with it is freed, including the storage of its local variables. The net effect of this return is to hand a bad address to the calling function: the address is bad since the storage is no longer valid. Any access to it can provoke a trap or, worse, can give wrong values.
2.13.4 Unsigned
The signed integer types (long long, long, int, short and, depending on the implementation, char) use the most significant bit as the sign bit. The unsigned qualifier tells the compiler to use that bit as part of the value instead, so an unsigned type with n bits holds the values from zero to 2^n - 1. For instance, with 16 bit shorts, a signed short goes from -32768 to 32767, while an unsigned short goes from zero to 65535 (2^16 - 1). See the standard include file <limits.h> for the ranges of the signed and unsigned integer types.
2.14 Null statements
A null statement is just a semicolon. It is used in two contexts:
1. As the empty body of an iterative statement (while, do, or for). For instance you can write:
while (*p++)
    ;   /* search for the end of the string */
2. When a label must appear just before a closing brace. Since labels must be attached to a statement, the null statement does that just fine.
2.15 Switch statement
The purpose of this statement is to dispatch to several code portions according to
the value in an integer expression. A simple example is:
enum animal {CAT,DOG,MOUSE};
enum animal pet = GetAnimalFromUser();
switch (pet) {
case CAT:
printf("This is a cat");
break;
case DOG:
printf("This is a dog");
break;
case MOUSE:
printf("This is a mouse");
break;
default:
printf("Unknown animal");
break;
}
We define an enumeration of symbols, and call another function that asks the user for an animal type and returns its code. We then dispatch on the value of pet.
In this case the integer expression that controls the switch is just a variable, but it could be any integer expression. Note that the parentheses around the switch expression are mandatory. The compiler generates code that evaluates the expression, and a series of jumps (gotos) to the corresponding portions of the switch. Each of those portions is introduced with a “case” keyword followed by an integer constant. Case labels must be constant expressions that the compiler can evaluate during compilation: variables or function calls are not allowed there.
Cases normally end with the keyword “break”, which indicates that this portion of the switch is finished; execution continues after the switch. A very important point here is that if you do not explicitly write the break keyword, execution will continue into the next case (it “falls through”). Sometimes this is what you want, but most often it is not. Beware. An example where falling through is useful is the following:
switch (ch) {
case 'a': case 'e': case 'i': case 'o': case 'u':
    vowelCount++;
    break;
}
Here five cases share the same action, and we use this feature to our advantage.
The reserved word “default” introduces the code for all other values that do not appear explicitly in the switch. It is good practice to add a default case to every switch statement and figure out what to do when the input doesn't match any of the expected values. If the value doesn't match any of the cases and there is no default, no code in the switch is executed and execution continues after the switch.
Conceptually, the switch statement above is equivalent to:
if (pet == CAT) {
printf("This is a cat");
}
else if (pet == DOG) {
printf("This is a dog");
}
else if (pet == MOUSE) {
printf("This is a mouse");
} else printf("Unknown animal");
The two forms produce the same result, but there are some differences:
• Switch expressions must be of integer type. The “if” form doesn't have this limitation.
• When there are many cases, the compiler can replace the search in a switch statement by a jump table or a binary search, avoiding a long chain of comparisons. This is difficult to do by hand with “if”s.
• Cases of a type other than integer, or ranges of values, can't be specified with the switch statement, unlike other languages such as Pascal that allow a range here.
Switch statements can be nested to any level (i.e. you can write a whole switch within a case), but this makes the code hard to read and is not recommended.
2.16 Logical operators
A logical expression consists of two expressions separated by one of the logical operators && (AND) or || (OR). Each operand is treated as a boolean value: zero is false and any other value is true. The result is the int 1 for true and 0 for false.
The AND operator evaluates from left to right. If any of the expressions is zero, the evaluation stops with a FALSE result and the rest of the expressions are not evaluated. The result of several AND expressions is true if and only if all the expressions evaluate to TRUE.
Example:
1 && 1 && 0 && 1 && 1
Here evaluation stops after the third expression yields false (zero). The fourth and
fifth expressions are not evaluated. The result of all the AND expressions is zero.
The OR operator evaluates from left to right. If any of the expressions yields TRUE, evaluation stops with a TRUE result and the rest of the expressions are not evaluated. The result of several OR expressions is true if and only if at least one of the expressions evaluates to TRUE.
If we have the expression:
result = expr1 && expr2;
this is equivalent to the following C code:
if (expr1 == 0)
result = 0;
else {
if (expr2 == 0)
result = 0;
else result = 1;
}
In a similar way, we can say that the expression
result = expr1 || expr2;
is equivalent to:
if (expr1 != 0)
result = 1;
else {
if (expr2 != 0)
result = 1;
else result = 0;
}
2.17 Bitwise operators
The operators & (bitwise AND), ^ (bitwise exclusive or), and | (bitwise or) perform boolean operations on the corresponding bits of their operands, which must be integers: long long, long, int, short, or char.
The operation of each of them is as follows:
1. The & (AND) operator yields a 1 bit if both bits are 1. Otherwise it yields a 0.
2. The ^ (exclusive or) operator yields 1 if one bit is 1 and the other is zero, i.e. it yields 1 if the bits are different. Otherwise it yields zero.
3. The | (or) operator yields 1 if either of the bits is a 1. Otherwise it yields a zero.
The following truth table summarizes these operators:
a  b   a&b  a^b  a|b
0  0    0    0    0
0  1    0    1    1
1  0    0    1    1
1  1    1    0    1
Note that these operators are normal operators, i.e. they always evaluate both operands, unlike && or ||, which use short-circuit evaluation. If we write:
0 && fn(67);
the function call will never be executed. If we write
0 & fn(67);
the function call will be executed even though the result (zero) is known from the start.
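2.18 Shift operators
Shifts are performed between two operands of integer type. The type of the result is the type of the left operand after integer promotion. Assuming 32 bit integers, the operation
int a = 4976;
int b = a << 3;
consists of:
0001 0011 0111 0000   (4976)
1001 1011 1000 0000   (39808)
This is the same as 4976 * 8 = 39808, since 8 is 1<<3. Shifting left by one position is equivalent to multiplying by two, as long as the result fits in the type; shifting a non-negative value right by one position is equivalent to dividing it by two. Shifting by a negative count, or by a count equal to or greater than the number of bits of the (promoted) left operand, is undefined behavior: do not expect to get zero. On Intel/AMD processors, for instance, the hardware uses only the low 5 bits of the count for 32 bit operands, so a shift by 32 may leave the value unchanged.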
This snippet:
#include <stdio.h>
int main(void)
{
    int a = 4977;
    int b = -4977;
    int i;
    for (i = 0; i < 5; i++) {
        printf("4977 << %d: %8d (0x%08x)\n", i, a << i, (unsigned)(a << i));
        printf("-4977<< %d: %8d (0x%08x)\n", i, b << i, (unsigned)(b << i));
    }
    return 0;
}
produces the following output:
4977 << 0:     4977 (0x00001371)
-4977<< 0:    -4977 (0xffffec8f)
4977 << 1:     9954 (0x000026e2)
-4977<< 1:    -9954 (0xffffd91e)
4977 << 2:    19908 (0x00004dc4)
-4977<< 2:   -19908 (0xffffb23c)
4977 << 3:    39816 (0x00009b88)
-4977<< 3:   -39816 (0xffff6478)
4977 << 4:    79632 (0x00013710)
-4977<< 4:   -79632 (0xfffec8f0)
In the last two lines the value has been shifted by a nibble (4 bits), that is, by one hexadecimal digit: compare 0x00013710 with the 0x00001371 of the first line.
The standard specifies that right-shifting a negative number gives an implementation-defined result, and left-shifting a negative number, as this program does with b, is undefined behavior. The output above is what the author's machine produced; on another machine or compiler you could get different results.
Right shifts work like left shifts, in the other direction. In the Intel/AMD family of machines there are different instructions for unsigned and signed right shifts: one fills the vacated bits with zeros (unsigned, or logical, right shift) and the other replicates the sign bit (arithmetic right shift).
2.19 Address-of operator
The unary operator & yields the machine address of its argument, which must be an addressable object (an lvalue). For instance, if you declare a variable as a “register” variable, you can't use this operator to get its address, because registers do not live in main memory. In a similar way, you can't take the address of a constant like &45, because the number 45 has no address.
The result of the operator & is a pointer to the type of its argument. If you take the address of a short variable, the result is of type “pointer to short”. If you take the address of a double, the result is a pointer to double, etc.
If you take the address of a local variable, the pointer you obtain is valid only until the function where you did this exits. Afterwards, the pointer points to an invalid address and will produce a machine fault when used, if you are lucky. If you are unlucky, the pointer will point to some random data and you will get strange results, which are much more difficult to track down.
In general, the pointer you obtain with this operator is valid only as long as the storage of the object it points to has not been released. If you obtain the address of an object that was allocated using the standard memory allocator malloc, this pointer will be valid until a call to free releases that storage. If you take the address of a static or global variable, the pointer will always be valid, since the storage for those objects is never released.
Note that if you are using the memory manager (gc), holding a pointer to an object prevents the object from being garbage collected, at least until that pointer goes out of scope.
2.20 Indirection
The * operator is the inverse of the address-of operator above. It expects a pointer and yields the object the pointer is pointing to. For instance, if you have a pointer pint that points to an integer, the operation *pint yields the integer value that pint is pointing to.
The result of this operator is invalid if the pointer it is dereferencing is not valid. In some cases, dereferencing an invalid pointer will provoke the dreaded window “This program has performed an invalid operation and will terminate” that Windows shows when a machine fault is detected. In other cases you will be unlucky, and the dereferencing will yield a nonsense result. For instance, this program will crash:
int main(void)
{
char *p;
*p = 0;
return 1;
}
We have never assigned to p the address of an object: p is an uninitialized pointer, and its value is indeterminate.
The debugger tells us that a machine exception has occurred, with the code 0xc0000005. This means that the CPU has detected an invalid memory reference and has called the exception mechanism of Windows, which notified the debugger of the problem. Note the value of the pointer in the output window:
0xfffa5a5a.
Lcc-win follows the philosophy that the sooner you see an error, the better. When
it allocates the stack frame of the function, it will write this value to all memory that
has not been explicitly initialized by the program. When you see this value in the
debugger you can be highly confident that this is an uninitialized pointer or variable.
This will not be done if you turn on optimizations. In that case the pointer will
contain whatever was in there when the compiler allocated the stack frame.
Many other compilers do not do this, and some programs run without crashing out of sheer luck. When such a program crashes under lcc-win, it looks to some users as if the compiler were buggy. I have received a lot of complaints because of this.
This kind of problem is one of the most common bugs in C. Forgetting to initialize a pointer is something that you can never afford to do. Another error is initializing a pointer only within a conditional statement:
char *BuggyFunction(int a)
{
char *result;
if (a > 34) {
result = malloc(a+34);
}
return result;
}
If the argument of this function is 34 or less, the pointer returned is an uninitialized pointer: it was never given a value, and using it is undefined behavior.
2.21 Sequential expressions
A comma expression consists of two expressions separated by a comma. The left
operand is fully evaluated first, and if it produces any value, that value will be
discarded. Then, the right operand is evaluated, and its result is the result of the
expression. For instance:
p = (fn(2,3),6);
The “p” variable will always receive the value 6, and the result of the function call
will be discarded.
Do not confuse this use of the comma with its other uses, for example as the separator between the arguments of a function call. The expression:
fn(cd=6,78);
is always treated as a function call with two arguments, cd=6 and 78, and not as a call with a single comma expression (to pass one, you would need extra parentheses: fn((cd=6,78))). Note too that the order of evaluation of function arguments is unspecified, whereas the comma operator always evaluates its operands from left to right.
2.22 Casts
A cast expression specifies the conversion of a value from one type to another. For
instance, a common need is to convert double precision numbers into integers. This
is specified like this:
double d;
(int)d
In this case, the cast needs to invoke run time code to make the actual transformation.
In other cases there is no code emitted at all. For instance in:
void *p;
(char *)p;
Transforming one type of pointer into another needs no code at all at run-time in
most implementations.
2.22.1 When to use casts
One case where casts are necessary is when passing arguments to a variadic function. Since the compiler cannot know the types the function expects for the variable arguments, you must convert each value to exactly the type the function expects (int to double, for example). For instance:
float f;
printf("%Lg\n",(long double)f);
The printf function expects a long double for the Lg format, so we must convert our float f into a long double. Without the cast, f would be promoted to double, which is what happens to a float passed to a variadic function (or to a function without a prototype). printf would then receive a double where it expects a long double: the behavior is undefined, and typically you get bad output or a crash.
Another use of casts is to avoid integer division when that is not desired:
int a,b;
double c;
c = a/b; // Invokes integer division.
c = a/(double)b; // Invokes floating point division.
In the first case, integer division truncates the result before converting it to double
precision. In the second case, double precision division is performed without any
truncation.
2.22.2 When not to use casts
Casts, like any other of the constructs above, can be misused. In general, they make it almost impossible to follow the type hierarchy automatically. C is weakly typed, and most of the “weakness” comes from cast expressions.
Many people add casts to get rid of compiler warnings. A cast essentially tells the compiler “I know what I am doing. Do not complain”. But this can make bugs more difficult to catch. For instance, lcc-win warns when a narrowing conversion is done, since the value being narrowed could be too large for the receiving type.
char c;
long m;
c = m; // Possible loss of data.
It is easy to get rid of that warning with a cast. But is this correct?
Some people in the C community say that casts are almost never necessary, are
a bad programming practice, etc. For instance instead of:
void *p;
int c = *((char *)p);
they would write:
void *p;
char *cp = p;
int c = *cp;
This is largely a matter of taste. Personally I would avoid the temporary variable, since it introduces a new name and complicates what is otherwise a simple expression.
2.23 Selection
A structure can have several different fields. The operators . and -> select from a
variable of structure type one of the fields it contains. For instance given:
struct example {
    int amount;
    double debt_ratio;
};
struct example Exm;
struct example *pExm;
you can select the field debt_ratio using Exm.debt_ratio. If you have a pointer to a structure instead of the structure itself, you use pExm->debt_ratio. This leads to an interesting question: why have two operators for selection?
The compiler could easily figure out that a pointer needs to be dereferenced, and could generate the right code if we always wrote a dot, as in many other languages. The distinction has historical reasons. In a discussion about this in comp.lang.c, Chris Torek, one of the maintainers of the gcc C library, wrote:
The "true need" for separate . and -> operators went away sometime around
1978 or 1979, around the time the original K&R white book came out. Before then,
Dennis’ early compilers accepted things like this:
struct { char a, b; };
int x 12345; /* yes, no "=" sign */
main() {
printf("%d is made up of the bytes %d and %d\n", x,
(x.a) & 0377, (x.b) & 0377);
}
(in fact, in an even-earlier version of the compiler, the syntax was struct ( rather
than struct {. The syntax above is what appeared in V6 Unix. I have read V6
code, but never used the V6 C compiler myself.)
Note that we have taken the "a" and "b" elements of a plain "int", not the
"struct" that contains them. The "." operator works on *any* lvalue, in this early
C, and all the structure member names must be unique - no other struct can have
members named "a" and "b".
We can (and people did) also write things like:
struct rkreg { unsigned rkcsr, rkdar, rkwc; };
/* read disk sector(s) */
0777440->rkdar = addr;
0777440->rkwc = -(bytecount / 2);
0777440->rkcsr = RK_READ | RK_GO;
Note that the -> operator works on *any* value, not just pointers.
Since this "early C" did not look at the left hand side of the . and -> operators,
it really did require different operators to achieve different effects. These odd aspects
of C were fixed even before the very first C book came out, but - as with the "wrong"
precedence for the bitwise & and | operators - the historical baggage went along for
the ride.
2.24 Predefined identifiers
A very practical predefined identifier is __func__, which gives, as a string, the name of the function in which it appears. Combined with the predefined preprocessor symbols __LINE__ and __FILE__, it allows a full description of the location of an error or a potential problem, for logging or debugging purposes.
An example of the use of these identifiers is the macro require, which tests certain entry conditions before the body of a function:
#define require(constraint) \
((constraint) ? 1 : ConstraintFailed(__func__,#constraint,NULL))
For instance, when we write require(Input >= 9) we obtain:
((Input >= 9) ? 1 : ConstraintFailed(__func__,"Input >= 9",NULL))
2.25 Precedence of the different operators
In their book “C: A Reference Manual”, Harbison and Steele propose the following table. A higher number means that the operator binds more tightly.
Tokens            Operator               Class     Precedence  Associates
names, literals   simple tokens          primary   16          n/a
a[k]              subscripting           postfix   16          left-to-right
f(...)            function call          postfix   16          left-to-right
.                 direct selection       postfix   16          left-to-right
->                indirect selection     postfix   16          left-to-right
++                increment              postfix   16          left-to-right
--                decrement              postfix   16          left-to-right
(type){init}      compound literal       postfix   16          left-to-right
++                increment              prefix    15          right-to-left
--                decrement              prefix    15          right-to-left
sizeof            size                   unary     15          right-to-left
~                 bitwise not            unary     15          right-to-left
!                 logical not            unary     15          right-to-left
-                 negation               unary     15          right-to-left
+                 plus                   unary     15          right-to-left
&                 address of             unary     15          right-to-left
*                 indirection            unary     15          right-to-left
(type name)       casts                  unary     14          right-to-left
* / %             multiplicative         binary    13          left-to-right
+ -               additive               binary    12          left-to-right
<< >>             left/right shift       binary    11          left-to-right
< > <= >=         relational             binary    10          left-to-right
== !=             equal/not equal        binary    9           left-to-right
&                 bitwise and            binary    8           left-to-right
^                 bitwise xor            binary    7           left-to-right
|                 bitwise or             binary    6           left-to-right
&&                logical and            binary    5           left-to-right
||                logical or             binary    4           left-to-right
? :               conditional            ternary   3           right-to-left
= += -=           assignment             binary    2           right-to-left
*= /= %=          assignment             binary    2           right-to-left
<<= >>=           assignment             binary    2           right-to-left
&= ^= |=          assignment             binary    2           right-to-left
,                 sequential evaluation  binary    1           left-to-right
2.26 The printf family
The functions fprintf, printf, sprintf and snprintf are a group of functions that output formatted text into a file (fprintf, printf) or into a character string (sprintf, snprintf). The snprintf function is like sprintf, but accepts a count of the maximum number of characters, terminating zero included, that may be written to the output string.
Figure 2.1: The parts of a printf specification
Table 2.2: Prototypes for the main functions of the printf family
Function   Prototype
fprintf    int fprintf(FILE *stream, const char *fmt, ...);
printf     int printf(const char *fmt, ...);
sprintf    int sprintf(char *outputstring, const char *fmt, ...);
snprintf   int snprintf(char *out, size_t maxchars, const char *fmt, ...);
The printf function is the same as fprintf with an implicit argument “stdout”, i.e. the standard output of the program, which in most cases is the console window.
fprintf(stdout,"hello\n"); <---> printf("hello\n");
All these functions return the number of characters written, or a negative value (for instance EOF, usually -1) if an error occurred during the output operation. For sprintf, the returned count does not include the terminating zero appended to the output string. snprintf returns the number of characters that would have been written if maxchars had been large enough, again not counting the terminating zero; a return value equal to or greater than maxchars therefore means that the output was truncated.
The “fmt” argument is a character string or “control string”. It contains two types of characters: normal characters, which are copied to the output without any change, and conversion specifications, which instruct the function how to format the next argument. In the example above, we have just the string “hello\n”, without any conversion specification, so that character string is simply copied to the destination.
There should always be at least as many arguments to these functions as there are conversion specifications. If you fail to do this with lcc-win, you will get a warning from the compiler. Other compilers can be less user friendly, so do not rely on that.
2.26.1 Conversions
A conversion specification begins with a percent sign (%) and is made of the following elements:
1. Zero or more flag characters (-, +, 0, #, ', or space), which modify the meaning of the operation.
2. An optional minimum field width. Note this well: the printf function will not truncate a field. The specified width is just that, a minimum.
For instance the output of this program:
#include <stdio.h>
int main(void)
{
    printf("%5s\n","1234567890");
}
is 1234567890 and NOT 12345, as one might expect. If you want to specify a maximum field width you should use the precision field, not the width field.
3. An optional precision field, made of a period followed by a number. For strings, this is where you specify the maximum field width.
4. An optional size flag, expressed as one of the letters ll, l, L, h, hh, j, q, t, z, or Z.
5. The type specification, a single character from the set a, A, c, d, e, E, f, F, g, G, i, n, o, p, s, u, x, X, and %.
Table 2.3: The conversion flags
- (minus)         The value is left justified. The pad character is space.
0                 Use zero as pad character instead of the default space. This is relevant only if a minimum field width is specified; otherwise there is no padding. If the data is wider than the field, no padding is added at all. When padding goes to the right of the data (with the - flag), the pad character is always space.
+                 Always write a sign, either + or -. A minus sign is always written for negative values, even if this flag is not given.
' (single quote)  Separate the digits of the formatted numbers in groups of three. For instance 123456 becomes 123,456. This is an lcc-win extension.
space             If the value is not negative, write a space where the sign would go, i.e. the + is replaced by a space.
#                 Use an alternative form: for o a leading zero is forced, for x and X a nonzero value is prefixed with 0x or 0X, for e, f and g the decimal point is always written, and for g trailing zeros are kept.
2.26.2 The minimum field width
If the data is shorter than the given width, pad characters are inserted to fill the field. If the data is longer than the field width, the field is simply made larger: nothing is truncated. The width is written in decimal without any leading zero, since a leading zero would be taken for the 0 flag.
2.26.3 The precision
For the floating point formats f, F, e and E, the precision indicates the number of digits to be printed after the decimal point (6 by default); for g and G it is the maximum number of significant digits. Used with the s format (strings), it indicates the maximum number of characters that should be used from the input string. If the precision is zero, a floating point number is rounded to an integer and is not followed by a decimal point (unless the # flag is given). Table 2.4 shows the different size specifications, together with the formats they can be used with. We can now turn to the final part, the conversions themselves.
2.26.4 The conversions
Table 2.5: The conversions
Conversion  Description
d, i   Signed integer conversion. The argument is by default of type int. If the h modifier is present, the argument should be a short; if the ll modifier is present, it should be a long long.
u      Unsigned decimal conversion. The argument should be of type unsigned int (default), unsigned short (h modifier) or unsigned long long (ll modifier).
o      Unsigned octal conversion. The argument is the same as for the u format.
x, X   Unsigned hexadecimal conversion. If x is used, the letters will be in lower case; if X is used, they will be in upper case. If the # flag is present, a nonzero number will be prefixed with 0x (0X with the X format).
c      The argument is printed as a character, or as a wide character if the l modifier is present.
s      The argument is printed as a string, and should be a pointer to byte sized characters (default) or to wide characters if the l modifier is present. If no precision is given, all characters up to the terminating zero are used. Otherwise, the conversion stops at the given precision.
p      The argument is a pointer (void *) and is printed in pointer format. Under lcc-win this is the same as the unsigned format (#u).
n      The argument is a pointer to int (default), pointer to short (h modifier) or pointer to long long (ll modifier). Unlike all other conversions, this one prints nothing: it stores the number of characters written so far at the address pointed to by its argument.
e, E   Signed decimal floating point conversion. The argument is of type double (default), long double (with the L modifier) or qfloat (with the q modifier). The result is displayed in scientific notation: a number with one digit before the decimal point, then the letter e (for the e format) or E (for the E format), then the exponent. If the precision is zero, no digits appear after the decimal point, and no point is shown. If the # flag is given, the point will be printed.
f, F   Signed decimal floating point conversion. The argument is of type double (default) or long double (with the L modifier). If the argument is infinite, inf is printed; if it is a NaN (not a number), the letters nan are written. The F format writes INF and NAN instead.
g, G   The same as the above, but with a more compact representation. Arguments should be floating point. Trailing zeros after the decimal point are removed, and if the number is an integer the point disappears. If the # flag is present, this stripping of zeros is not performed. The scientific notation (as in format e) is used if the exponent is less than -4 or greater than or equal to the precision, which defaults to 6.
%      How do you insert a % sign in your output? Well, by using this conversion: %%.
Table 2.4: The size specification
Modifier  Formats                   Description
l         d, i, o, u, x, X          The letter l with these formats means long or unsigned long.
l         n                         The letter l with the n format means long *.
l         c                         Used with the c (character) format, it means that the character is a wide character.
l         all others                No effect, it is ignored.
ll        d, i, o, u, x, X          The letters ll mean long long or unsigned long long.
ll        n                         With this format, ll means long long *.
h         d, i, o, u, x, X          With these formats, h indicates short or unsigned short.
h         n                         Means short *.
hh        d, i, o, u, x, X          Means signed char or unsigned char.
hh        n                         Means signed char *.
L         A, a, E, e, F, f, G, g    Means that the argument is a long double. Notice that the l modifier has no effect with these formats. It is uppercase L.
j         d, i, o, u, x, X          Means the argument is of type intmax_t (uintmax_t for the unsigned formats), i.e. the biggest integer type that an implementation offers. In the case of lcc-win this is long long.
q         f, g, e                   Means the argument is of type qfloat (350 bits precision). This is an extension of lcc-win.
t         d, i, o, u, x, X          Means the argument is ptrdiff_t; under lcc-win this is int in the 32 bits version and a 64 bit integer in the 64 bits version.
Z         e, E, g, G, f, F, A, a    Means the argument is a complex number. Output is in standard complex notation. If the alternative flag (#) is present, the output will have a lowercase 'i' instead of the standard “*I” suffix. Each of the other qualifiers that applies to the floating format will be applied to the real and to the imaginary parts of the number. Note that this is an lcc-win extension.
z         d, i, o, u, x, X          Means the argument is size_t; in lcc-win this is unsigned int in 32 bits and unsigned long long in 64 bits.
2.26.5 Scanning values
The scanf family of functions fulfills the inverse task of the printf family. They scan a character array and transform sequences of characters into values, like integers, strings or floating point values. The general format of these functions is:
scanf(format,p1,p2,p3,...); // Reads characters from stdin
fscanf(file,format,p1,p2,p3,...);
sscanf(string,format,p1,p2,p3,...);
where format is a character string like the ones we have seen in printf, and p1, p2, p3 are pointers to the locations where scanf will store the results of the conversions. For instance:
scanf("%d",&integer);
with the input line
123
will store the integer 123 at the location of the “integer” variable.
Table 2.6: scanf directives
Type         Format
char         c
short        hd
int          d or i
long         ld
long long    lld
float        f or e
double       lf or le
long double  Lf or Le
string       s
Some things to remember when using scanf:
1. You always have to provide a format string to direct scanf.
2. For each format directive you must supply a corresponding pointer to a location of a suitable type and size.
3. Leading blanks are skipped, except for the %c (character) directive.
4. The “%f” format means float; to read a double you should write “%lf”.
5. Most directives skip leading white space, so new lines between values are simply skipped over. A white space character in the format string, such as a space or \n, makes scanf skip any amount of white space (new lines included) in the input at that point.
6. Mixing calls to scanf with calls to getchar needs care: scanf leaves in the input every character it did not use, typically the new line that ended the line, and the next getchar will read it. Mixing printf and putchar, on the other hand, poses no problem.
7. When scanf fails, it stops reading at the first input character that doesn't correspond to the expected sequence. That character stays in the input.
If you are expecting a number and the user makes a mistake and types 465x67, scanf will leave the input pointing to the “x”, which must be removed by other means. Because of this problem, it is better to read one line of input and then use sscanf on the line buffer, rather than using scanf directly on user input. Also check the value returned by these functions: it is the number of items successfully converted and stored, or EOF if the input ended before the first conversion. Here are some common errors to avoid:
int integer;
short shortint;
char buffer[80];
char* str = buffer;
/* providing the variable instead of a pointer to it */
scanf("%d", integer);     /* wrong! */
scanf("%d", &integer);    /* right */
/* providing a pointer to the wrong type
   (or a wrong format to the right pointer) */
scanf("%d", &shortint);   /* wrong */
scanf("%hd", &shortint);  /* right */
/* providing a pointer to a string pointer
   instead of the string pointer itself
   (some people think "once &, always &") */
scanf("%s", &str);        /* wrong */
scanf("%s", str);         /* right (better still: "%79s", which cannot overflow buffer) */
Consider the following code:
#include <stdio.h>
int main(void)
{
int i;
char c;
scanf("%d",&i);
scanf("%c",&c);
printf("%d %c\n",i,c);
}
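Assume you type 45\n in response to the first scanf. The 45 is copied into the variable i. When the program reaches the next scanf, the \n still waiting in the input is immediately copied into the variable c, since %c does not skip white space. One fix is to put a blank before the %c directive: scanf(" %c",&c);. The blank tells scanf to skip any white space, including the leftover newline, before reading the character. Writing scanf("%d\n",&i); also consumes the newline, but then scanf keeps reading until a non-blank character is typed, which is confusing in an interactive program.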
2.27
Pointers
Pointers are one of the great ideas in C, but they are difficult to grasp at the beginning. All objects (integers, structures, any data) reside in RAM. Conceptually, memory is organized as a linear sequence of locations, numbered from 0 upwards. Pointers allow you to pass the location of the data instead of the data itself.
To make things explicit, suppose you have some structure like this:
#define MAXNAME 128
struct person {
    char Name[MAXNAME];
    int Age;
    bool Sex;
    double Weight;
};
Instead of passing all the data to a function that works with this data, you just pass
the address where it starts. What is this address? We can print it out. Consider this
simple program:
 1  #include <stdio.h>
 2  #include <stdbool.h>
 3  #define MAXNAME 128
 4  struct person {
 5      char Name[MAXNAME];
 6      int Age;
 7      bool Sex;
 8      double Weight;
 9  };
10  struct person Joe;
11  int main(void)
12  {
13      printf("%p + %zu\n", (void *)&Joe, sizeof(struct person));
14  }
The address-of operator in line 13 returns the index of the memory location where the “Joe” structure starts. On my machine the address printed is 0x402004 and the size is 144 (the exact way %p writes an address varies between compilers).
The memory location 0x402004 (4 202 500 in decimal) contains the start of this
data, that goes up to 0x402094 (4 202 644).
When we write a function that should work with the data stored in that structure,
we give it just the number 4 202 500. That means: "The data starts at 4 202 500".
No copying needed, very efficient.
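To make the idea concrete, here is a minimal sketch of two hypothetical functions that work on the struct person defined above. BirthdayByValue and BirthdayByAddress are illustrative names, not library calls. The first receives a copy of the whole structure. The second receives only the address where it starts.

```c
#include <stdbool.h>

#define MAXNAME 128

struct person {
    char Name[MAXNAME];
    int Age;
    bool Sex;
    double Weight;
};

/* Receives a copy of the whole structure (144 bytes on the machine
   described in the text): the change is lost when it returns. */
void BirthdayByValue(struct person p)
{
    p.Age++;
}

/* Receives only the start address of the structure: nothing is
   copied, and the change is seen by the caller. */
void BirthdayByAddress(struct person *p)
{
    p->Age++;
}
```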
A pointer, then, is a number that contains the machine address, i.e. the number of the memory location where the data starts. The integer that contains a memory location is not necessarily the same size as a normal “int”, which can be smaller or bigger than an address. In 64 bit systems, for instance, addresses can be 64 bits wide, but “int” can remain at 32 bits. In other systems (Win32 for instance) a pointer fits in an int.
Pointers must be initialized before use by making them point to an object. Before initialization they contain a NULL value if they are defined at global scope (or declared static), or an undefined value if they are local variables. Using an uninitialized pointer is always a very bad idea.
Memory locations depend on the operating system, the amount of memory installed, and how the operating system presents memory to the programmer. Make as few assumptions about memory locations as you can. For instance, the addresses we see now under 32-bit Windows could change radically in another context, where they become 64 bit addresses. In any case, Windows uses virtual memory, so those numbers are virtual addresses and not physical locations on the circuit board.
A pointer can store the start address of an object, but nothing guarantees that this object continues to exist. If the object disappears, the pointers to it now contain invalid addresses, and it is up to the programmer to take care of this. An object can disappear if, for instance, its address is passed to the “free” function to release the memory. An object can also disappear when its scope (i.e. the function where it was defined) ends. It is a beginner’s mistake to write:
int *fn(void)
{
    int a;
    return &a;   /* wrong: a ceases to exist when fn returns */
}
The lifetime of the variable “a” ends when the function “fn” exits. After that, the contents of the memory locations that held its local variables are undefined, and fn returns a pointer to memory that is recycled immediately. Of course the memory location itself will not disappear, but it will be reused for something else: maybe another function call, maybe another local variable, nobody knows.
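Two common ways to write fn correctly are sketched below, with hypothetical names. One lets the caller own the storage. The other allocates the object with malloc, so that it lives until free is called. (A static local variable would also survive the call, at the cost of being shared by every caller.)

```c
#include <stdlib.h>

/* Option 1: the caller owns the storage and passes its address.
   The object outlives the call, so the returned pointer stays valid. */
int *FillInt(int *dest, int value)
{
    *dest = value;
    return dest;
}

/* Option 2: allocate the object with malloc. It exists until the
   caller passes the pointer to free. Returns NULL if out of memory. */
int *NewInt(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = value;
    return p;
}
```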
A pointer that is not initialized, or that points to an object that has disappeared, is a “dangling” pointer, and it is the nightmare of any C programmer. The bugs produced by dangling pointers are very difficult to find, since their effects depend on whether writing through the pointer destroys some other object or not. These programs tend to work with small inputs, but crash mysteriously with big and complex inputs. The reason is that in a more complex environment memory is recycled more often, which means that the locations referenced by the dangling pointers are more likely to be in use by another object.
2.27.1
Operations with pointers
The most common use of a pointer is of course the “dereferencing” operation, i.e. the operator -> or the unary *. These operations read the contents of a pointer and use the memory address thus retrieved to fetch the data stored either at that place or at some displacement from that place. For instance, when we use a pointer to the “person” structure above, the expression:
struct person *pJoe = &Joe;
pJoe->Weight
means:
1. Fetch the contents of the “pJoe” pointer.
2. Using that address, add to it the offset of the Weight member: MAXNAME + sizeof(int) + sizeof(bool), plus any padding the compiler inserted to align the double.
3. Retrieve a “double” value from the updated memory location.
Operation 2) is equivalent to finding the offset of the desired field within a given structure. This is often required, and the language defines the macro “offsetof” in the “stddef.h” header file so that user programs can use it.
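Here is a small sketch of offsetof with the struct person above. The helper names are hypothetical. The exact offset depends on the padding the compiler chooses, so the code only relies on it being at least MAXNAME + sizeof(int) + sizeof(bool).

```c
#include <stdbool.h>
#include <stddef.h>   /* offsetof */

#define MAXNAME 128

struct person {
    char Name[MAXNAME];
    int Age;
    bool Sex;
    double Weight;
};

/* The displacement that pJoe->Weight adds to the address in pJoe. */
size_t WeightOffset(void)
{
    return offsetof(struct person, Weight);
}

/* Reads Weight "by hand": start address plus offset, then the bytes
   found there are read as a double. Same result as p->Weight. */
double WeightByHand(const struct person *p)
{
    const char *base = (const char *)p;
    return *(const double *)(base + offsetof(struct person, Weight));
}
```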
Pointers can be used to retrieve not only a portion of the object (operation ->) but also the whole object, using the “*” notation. In the example above, the expression “*pJoe” yields the whole structure as its result. This operation dereferences the pointer and retrieves the entire object it points to, making it the exact opposite of the “&” (address-of) operator.
Only two kinds of arithmetic operations are possible with machine addresses:
Addition or subtraction of a displacement, or subtraction of two machine addresses.
No other arithmetic operators can be applied to pointers.
2.27.2
Addition or subtraction of a displacement: pointer arithmetic
Adding a displacement (an integer) to a pointer means:
Using the address stored in that pointer, find the nth object after it. If we have a pointer to int and we add 5 to it, we find the 5th integer after the integer whose address was stored in the pointer.
Example:
int d[10];
int *pint = &d[2];
Suppose the array d starts at address 0x4202600. The size of each integer is 4, so d[2], the third integer of the array, starts at 0x4202608: this is the number stored in pint. If we want to find the fifth integer after 0x4202608 (pint + 5, i.e. d[7]), we add 20 (5*sizeof(int) = 20) and we obtain 0x420261C.
Increasing a pointer by one means adding to it the size of the object it points to. In the case of a pointer to int, we add 4 to 0x4202608 to obtain 0x420260C, the address of the next integer. This is very often written with the shorthand: pint++; or ++pint;.
This is shorthand for “Move the pointer to point to the next element”. Subtraction works exactly the same way: subtracting a displacement from a pointer makes it point to the nth element stored before the one it currently points to. If we have a pointer to int p with a value of 0x4202604, then p--; subtracts 4 from the value of the pointer, making it point to the integer stored at address 0x4202600, the previous one.
To be able to do pointer arithmetic, the compiler must know the underlying type of the objects. That is why you can’t do pointer arithmetic with void pointers: the compiler can’t know how much to add or subtract to get to the next element!
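The arithmetic above can be checked directly. In this minimal sketch, BytesBetween is a hypothetical helper. Converting both pointers to char * makes the byte distance visible, while the int * arithmetic itself counts whole integers.

```c
#include <stddef.h>

/* Returns the number of bytes between p and p + n. Pointer
   arithmetic scales n by sizeof(int) automatically, so the result
   is n * sizeof(int) (20 for n == 5 with 4-byte ints).
   p + n must stay inside the same array (or one past its end). */
ptrdiff_t BytesBetween(int *p, int n)
{
    int *q = p + n;                  /* the nth int after *p */
    return (char *)q - (char *)p;    /* same distance, in bytes */
}
```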
2.27.3
Subtraction
The subtraction of two pointers gives the distance that separates two objects of the same type. This distance is not expressed in memory locations but in units of the size of the objects. For instance, the distance between two consecutive objects is always one, not the number of memory locations that separate their start addresses. The subtraction is only defined when both pointers point into the same array (or one past its last element).
The type of the result is a signed integer, but exactly which one (int, long, etc.) is implementation defined. To make things portable, the standard defines a special typedef, ptrdiff_t, in stddef.h for this result type. Under 32-bit lcc-win this is an “int”, but under other versions of lcc-win (in 64 bit architectures for instance) it can be something else.
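Here is a sketch of pointer subtraction and of ptrdiff_t. IndexOf and PrintIndex are hypothetical names. C99 gives ptrdiff_t its own printf length modifier, t, so %td prints it portably.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns the position of *p inside the array that starts at a.
   Both pointers must point into the same array (or one past its
   end); the result counts elements, not bytes. */
ptrdiff_t IndexOf(const double *a, const double *p)
{
    return p - a;
}

/* Prints the same distance using the C99 %td directive. */
void PrintIndex(const double *a, const double *p)
{
    printf("index: %td\n", IndexOf(a, p));
}
```

Although each double occupies 8 bytes, IndexOf(v, &v[5]) is 5, not 40.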
2.27.4
Relational operators
Pointers can be compared for equality with the == operator. This operation finds out whether the contents of two pointers are the same, i.e. whether they point to the same object. The other relational operators (<, <=, >, >=) are allowed too, and their result tells which pointer appears before or after the other in the linear memory space. The C standard only defines these comparisons for pointers into the same object or array. Obviously, such comparisons fail if the memory space is not linear, as is the case in segmented architectures, where a pointer belongs to a memory region or segment and inter-segment comparisons are not meaningful.
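The usual place where < appears between pointers is a loop that walks an array up to its end address. This is a minimal sketch, and SumInts is a hypothetical name. The comparison is well defined because p and end both point into the same array, end being one past its last element.

```c
#include <stddef.h>

/* Hypothetical helper: sums the n ints of a by walking a pointer
   from the first element up to, but not including, the address
   one past the last element. */
long SumInts(const int *a, size_t n)
{
    const int *end = a + n;   /* one past the last element */
    long sum = 0;
    for (const int *p = a; p < end; p++)
        sum += *p;
    return sum;
}
```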
2.27.5
Null pointers
A pointer should either contain a valid address or be empty. The “empty” value of a pointer is the NULL pointer value, which is usually zero.
Dereferencing a NULL pointer is always an error. Under lcc-win, and under most other compilers too, it provokes an immediate trap: the address zero is deliberately left unmapped so that the program crashes at once. In other environments the address zero may exist, and a NULL pointer dereference will not provoke any trap: it just reads or writes whatever is stored at address zero.
2.27.6
Pointers and arrays
Contrary to popular folklore, pointers and arrays are NOT the same thing. In some circumstances the notation is equivalent. This leads to never-ending confusion, since the language lacks a correct array type. Consider these declarations:
int iArray[3];
int *pArray = iArray;
This can lead people to think that pointers and arrays are equivalent, but this is just a compiler trick: the operation actually performed is int *pArray = &iArray[0];.
Another syntax that leads to confusion is: char *msg = "Please enter a number";
Seeing this leads people to think that we can assign an entire array to a pointer, which is not what happens here. The assignment concerns only the pointer, which receives the address of the first element of the character array.
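sizeof makes the difference between an array and a pointer visible. This minimal sketch reuses the declarations above, and the helper function names are hypothetical. The array's size counts all its elements, while the pointer's size is only that of an address.

```c
#include <stddef.h>

int iArray[3];
int *pArray = iArray;                         /* really: &iArray[0] */

char msgArray[] = "Please enter a number";    /* an array: a copy of the characters */
char *msgPointer = "Please enter a number";   /* a pointer to the first character */

size_t ArrayBytes(void)      { return sizeof iArray; }      /* 3 * sizeof(int) */
size_t PointerBytes(void)    { return sizeof pArray; }      /* sizeof(int *) */
size_t MsgArrayBytes(void)   { return sizeof msgArray; }    /* 21 chars + zero */
size_t MsgPointerBytes(void) { return sizeof msgPointer; }  /* sizeof(char *) */
```

Note also that msgArray is a modifiable copy of the characters, while msgPointer points to the string literal itself, which must not be modified.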
2.27.7
Assigning a value to a pointer
The contents of a pointer are undefined until you initialize it. Before that, its contents can be anything; it is not possible to know what is in there until you make an assignment. A pointer that has not been initialized is a dangling pointer, i.e. a pointer that points to nowhere.
A pointer can be initialized by:
• Assigning it a special pointer value called NULL, i.e. empty.
• Assigning it the result of a function or expression that returns a pointer of the same type. In the frequencies example we initialize our infile pointer with the function fopen, which returns a pointer to a FILE.
• Assigning it a specific address. This happens in programs that need to access certain machine addresses, for instance to use them as input/output for special devices. In those cases you can initialize a pointer to a specific address. Note that this is not meaningful for normal programs under Windows, Linux, or the many other operating systems where addresses are virtual addresses. More on this later.
• Taking the address of some object, so that the pointer points to it. For instance:
int integer;
int *pinteger = &integer;
Here we make the pointer “pinteger” point to the int “integer” by taking the address of that integer with the & operator. This operator yields the machine address of its argument.
2.27.8
References
In lcc-win, pointers can be of two kinds. We have normal pointers, as described above, and “references”, i.e. compiler maintained pointers that behave very much like the objects themselves.
References are declared in a similar way as pointers are declared:
int a = 5;        // declares an integer a
int * pa = &a;    // declares a pointer to the integer a
int &ra = a;      // declares a reference to the integer a
Here we have an integer that within this scope will be called “a”. Its machine address will be stored in a pointer to this integer, called “pa”. This pointer can access the data of “a”, i.e. the value stored at that machine address, by using the “*” operator. When we want to access that data we write:
*pa = 8944;
This means: “store the value 8944 at the address contained in the pointer pa”.
We can also write:
int m = 698 + *pa;
This means: “add to 698 the contents of the integer whose machine address is contained in the pointer pa, and store the result of the addition in the integer m”.
We also have a “reference” to a, that in this scope will be called “ra”. Any access to this compiler maintained pointer is done as we would access the object itself; no special syntax is needed. For instance we can write:
ra = (ra+78) / 79;
Note that with references the “*” operator is not needed: the compiler applies it automatically for you.
2.27.9
Why pointers?
An obvious question arises now: why do we need references? Why can’t we just use the objects themselves? Why is all this pointer stuff necessary?
Well, this is a very good question. Many languages seem to do quite well without ever using pointers the way C does.
The main reason for these constructs is efficiency. Imagine you have a huge database table, and you want to pass it to a routine that will extract some information from it. The best way to pass that data is to pass just the address where it starts, without moving or copying the data itself. Passing an address means passing a 32-bit number (64-bit on 64-bit systems), a very small amount of data. If we passed the table itself, we would be forced to copy a huge amount of data into the called function, which would waste machine resources.
References are the best of both worlds. They must always point to some object: there is no such thing as an uninitialized reference. Once initialized, they can’t be made to point to any object other than the one they were initialized with, as normal pointers can. For instance, in the expressions above, the pointer pa is initialized to point to the integer “a”, but later in the program you are allowed to make “pa” point to another, completely unrelated integer. This is not possible with the reference “ra”: it will always refer to the integer “a”.
2.28
setjmp and longjmp
2.28.1
General usage
These two functions implement a jump across function calls to a defined place in your program. You define a place where it would be wise to come back to if an error appears in any of the procedures called below this one.
For instance, you may be preparing a buffer to send to the database, or performing some other lengthy operation that can fail. Memory can be exhausted, the disk can be full (yes, that can still happen, especially when a program gets stuck in an infinite write loop...), or the user can become fed up with waiting and close the window, etc.
For all those cases, you devise an exit with longjmp into a previously saved context. The classical example is given by Harbison and Steele:
#include <setjmp.h>
jmp_buf ErrorEnv;
int process(void);
int guard(void)
/* Return 0 if successful; else longjmp code */
{
    int status = setjmp(ErrorEnv);
    if (status != 0)
        return status; /* error */
    process();
    return 0;
}
int process(void)
{
    int error_code;
    if (error_happened) longjmp(ErrorEnv,error_code);
}
With all the respect I have for Harbison and Steele and their excellent book, this example shows how NOT to use setjmp/longjmp. The ErrorEnv global variable is left in an undefined state after guard() exits with zero. When you use this facility, utmost care must be exercised to avoid executing a longjmp to a function that has already exited. This will always lead to catastrophic consequences. After guard() exits with zero, the contents of the global ErrorEnv variable are a bomb that will explode your program if used. Moreover, the process() function is entirely tied to that variable and its validity: you can’t call process() from any other place. A better way could be:
#include <setjmp.h>
#include <string.h>   /* memcpy */
jmp_buf ErrorEnv;
int process(void);
int guard(void)
/* Return 0 if successful; else longjmp code */
{
    jmp_buf pushed_env;
    /* save the previous recovery point */
    memcpy(pushed_env, ErrorEnv, sizeof(jmp_buf));
    int status = setjmp(ErrorEnv);
    if (status == 0)
        process();
    /* restore it, whether or not a longjmp happened */
    memcpy(ErrorEnv, pushed_env, sizeof(jmp_buf));
    return status;
}
int process(void)
{
    int error_code = 0;
    if (error_code) longjmp(ErrorEnv, error_code);
    return 0;
}
This way, the contents of ErrorEnv are left as they were before. If you set up a recovery point in the first lines of the main() function:
int main(void)
{
    if (setjmp(ErrorEnv))
        // Reached only through a longjmp: do nothing else
        return EXIT_FAILURE; // Just a general failure code (stdlib.h)
    /* the rest of the program */
}
This way ErrorEnv can always be used without fearing a crash. Note that I used memcpy and not just the assignment:
pushed_env = ErrorEnv; /* wrong! */
since the standard specifies jmp_buf as an array type, and arrays can only be copied with memcpy or with a loop assigning each member individually.
Note that this style of programming is sensitive to global variables. Globals will not be restored to their former values: if any of the procedures called by process() modified global variables, they keep their modified values after the longjmp.
#include <setjmp.h>
#include <string.h>   /* memcpy */
jmp_buf ErrorEnv;
double global;
int process(void);
int guard(void)
/* Return 0 if successful; else longjmp code */
{
    jmp_buf pushed_env;
    memcpy(pushed_env, ErrorEnv, sizeof(jmp_buf));
    global = 78.9776;
    int status = setjmp(ErrorEnv);
    if (status == 0)
        process();
    memcpy(ErrorEnv, pushed_env, sizeof(jmp_buf));
    // Here “global” contains 23.87 whether or not the longjmp
    // was taken: longjmp does not restore it to 78.9776.
    return status;
}
int process(void)
{
    int error_code = 0;
    global = 23.87;
    if (error_code) longjmp(ErrorEnv, error_code);
    return 0;
}
And if you erase a file, longjmp will not undelete it. Do not think that longjmp is a time machine that will take you back to the past.
Yet another problem to watch is that if any global pointer pointed to an address that was later released, its contents will be wrong after the longjmp. Any memory that was allocated with malloc and not yet freed when the longjmp happens will never be released, so setjmp/longjmp can be the source of memory leaks. Within lcc-win there is an easy way out, since you can use the garbage collector instead of malloc/free. The garbage collector will detect any unused memory and release it during a collection.
2.28.2
Register variables and longjmp
When you compile with optimizations on, the use of setjmp and longjmp can produce
quite a few surprises. Consider this code:
#include <setjmp.h>
#include <stdio.h>
int main(void)
{
    jmp_buf jumper;
    int localVariable = 1;
    printf("1: %d\n",localVariable);
    if (setjmp(jumper) == 0) {
        // first return from setjmp
        localVariable++;
        printf("2: %d\n",localVariable);
        longjmp(jumper,1);
    }
    // second return from setjmp, through longjmp
    localVariable++;
    printf("3: %d\n",localVariable);
    return 0;
}
Our “localVariable” starts with the value 1. Then, before calling longjmp, it is incremented, so its value should be two. At exit, “localVariable” is incremented again and should be three. We would expect the output:
1: 1
2: 2
3: 3
And this is indeed the output we get if we compile without any optimizations. When
we turn optimizations on however, we get the output:
1: 1
2: 2
3: 2
Why? Because “localVariable” is stored in a register. When longjmp makes setjmp return a second time, the registers are restored to the values they had when setjmp was called. Since localVariable lives in a register, it goes back to the value 1, even though we incremented it before calling longjmp, and the final increment yields 2.
The only way to avoid this problem is to force the compiler to allocate localVariable in memory, using the “volatile” keyword. The declaration should look like this:
int volatile localVariable = 1;
This instructs the compiler to avoid any optimizations with this variable, i.e. it forces the variable to be allocated on the stack and not in a register. The C standard requires this. After the longjmp, the value of a local variable is indeterminate if it is not declared volatile and was modified between the call to setjmp and the longjmp. You can’t assume that such local variables behave normally when using setjmp/longjmp.
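Here is a minimal sketch of the fix, assuming nothing beyond <setjmp.h>. It performs the same round trip as the program above, but the counter is declared volatile. The code is wrapped in a function so that the final value can be checked instead of printed. The name VolatileCounterDemo is hypothetical.

```c
#include <setjmp.h>

/* Returns the counter's value after one round trip through
   setjmp/longjmp. Because the counter is volatile, the increment
   made before the jump is not lost, whatever the optimization level. */
int VolatileCounterDemo(void)
{
    jmp_buf jumper;
    volatile int counter = 1;

    if (setjmp(jumper) == 0) {
        /* first return from setjmp */
        counter++;               /* counter is now 2 */
        longjmp(jumper, 1);      /* setjmp "returns" again, with 1 */
    }
    /* second return from setjmp, through longjmp */
    counter++;                   /* counter is now 3 */
    return counter;
}
```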
The setjmp/longjmp functions have been used to implement larger exception handling frameworks. For an example of such a use, see “Exceptions and Assertions”, Chapter 4 of “C Interfaces and Implementations” by David Hanson.
2.29
Time and date functions
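The C library offers many functions for working with dates and times. The first of them is the time function, which returns the number of seconds that have passed since January 1st 1970, at midnight UTC. (This is how time_t is encoded under Windows and Unix; the C standard itself leaves the encoding unspecified.)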
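Several structures are defined that hold time information. The most important among them are the “tm” structure and the “timeb” structure.
struct tm {
    int tm_sec;    // seconds after the minute [0, 60]
    int tm_min;    // minutes after the hour [0, 59]
    int tm_hour;   // hours since midnight [0, 23]
    int tm_mday;   // day of the month [1, 31]
    int tm_mon;    // months since January [0, 11]
    int tm_year;   // years since 1900
    int tm_wday;   // days since Sunday [0, 6]
    int tm_yday;   // days since January 1st [0, 365]
    int tm_isdst;  // > 0 if daylight saving time is in effect,
                   // 0 if not, < 0 if unknown
};
Most fields are self-explanatory, but note that tm_mon counts from zero and tm_year counts the years since 1900. The structure “timeb” is defined in the header sys/timeb.h (in the include\sys directory), as follows: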
struct timeb {
    time_t time;
    unsigned short pad0;
    unsigned long lpad0;
    unsigned short millitm;   // Fraction of a second in ms
    unsigned short pad1;
    unsigned long lpad1;
    // Difference (minutes), moving westward, between
    // UTC and local time
    short timezone;
    unsigned short pad2;
    unsigned long lpad2;
    // Nonzero if daylight savings time is currently
    // in effect for the local time zone.
    short dstflag;
};
We show here a small program that displays the different time settings.
#include <time.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/timeb.h>
#include <string.h>
#ifdef _MSC_VER
/* Microsoft compilers call these functions differently */
#define strdate _strdate
#define strtime _strtime
#endif
int main(void)
{
    char tmpbuf[128], ampm[3];
    time_t ltime;
    struct timeb tstruct;
    /* noon, December 25th: tm_mon counts from 0, tm_year from 1900 */
    struct tm *today, *gmt, xmas = { 0, 0, 12, 25, 11, 113 };

    /* Display operating system-style date and time. */
    /* Windows specific functions */
#ifdef _WIN32
    strtime( tmpbuf );
    printf( "OS time:\t\t\t\t%s\n", tmpbuf );
    strdate( tmpbuf );
    printf( "OS date:\t\t\t\t%s\n", tmpbuf );
#endif
    /* Get UNIX-style time and display as number and string. */
    time( &ltime );
    printf( "Time in seconds since UTC 1/1/70:\t%lld\n", (long long)ltime );
    printf( "UNIX time and date:\t\t\t%s", ctime( &ltime ) );
    /* Display UTC. */
    gmt = gmtime( &ltime );
    printf( "Coordinated universal time:\t\t%s", asctime( gmt ) );
    /* Convert to time structure and adjust for PM if necessary. */
    ampm[1] = 'M'; ampm[2] = 0;
    today = localtime( &ltime );
    if( today->tm_hour >= 12 ) {
        ampm[0] = 'P';
        if( today->tm_hour > 12 )
            today->tm_hour -= 12;
    }
    else ampm[0] = 'A';
    if (today->tm_hour == 0)
        /* Adjust if midnight hour. */
        today->tm_hour = 12;
    /* asctime returns "Www Mmm dd hh:mm:ss yyyy\n": the time
       starts at offset 11 and is 8 characters long */
    printf( "12-hour time:\t\t\t\t%.8s %s\n",
            asctime( today ) + 11, ampm );
    /* Print additional time information. */
    ftime( &tstruct );
    printf( "Plus milliseconds:\t\t\t%u\n", tstruct.millitm );
    printf( "Zone difference in minutes from UTC:\t%d\n",
            tstruct.timezone );
    printf( "Daylight savings:\t\t\t%s\n",
            tstruct.dstflag ? "YES" : "NO" );
    /* Make time for noon on Christmas, 2013. */
    if (mktime( &xmas ) != (time_t)-1 )
        printf("Christmas\t\t\t\t%s\n", asctime( &xmas ) );
    /* Use time structure to build a customized time string. */
    today = localtime( &ltime );
    /* Use strftime to build a customized time string. */
    strftime( tmpbuf, sizeof tmpbuf,
              "Today is %A, day %d month of %B in the year %Y.\n",
              today );
    fputs( tmpbuf, stdout );
    return 0;
}
OUTPUT:
OS time:                                17:53:23
OS date:                                11/30/11
Time in seconds since UTC 1/1/70:       1322693603
UNIX time and date:                     Wed Nov 30 17:53:23 2011
Coordinated universal time:             Wed Nov 30 22:53:23 2011
12-hour time:                           05:53:23 PM
Plus milliseconds:                      6
Zone difference in minutes from UTC:    -60
Daylight savings:                       NO
Christmas                               Wed Dec 25 12:00:00 2013
Today is Wednesday, day 30 month of November in the year 2011.
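mktime does more than convert. It normalizes the structure it receives and fills in tm_wday and tm_yday, which is how the program above can print "Wed" for Christmas 2013 without computing the weekday itself. This is a minimal sketch, and DayOfWeek is a hypothetical name.

```c
#include <time.h>

/* Hypothetical helper: returns the day of the week (0 = Sunday) of
   a calendar date, or -1 if mktime cannot represent that date. */
int DayOfWeek(int year, int month, int day)
{
    struct tm date = {0};
    date.tm_year = year - 1900;  /* years since 1900 */
    date.tm_mon  = month - 1;    /* months since January */
    date.tm_mday = day;
    date.tm_hour = 12;           /* noon: far from any DST switch */
    date.tm_isdst = -1;          /* let mktime find out */
    if (mktime(&date) == (time_t)-1)
        return -1;
    return date.tm_wday;         /* filled in by mktime */
}
```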
You will need some conversion functions to convert between the C time and the Windows time format:
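#include <windows.h>
#include <time.h>
void UnixTimeToFileTime(time_t t, LPFILETIME pft)
{
    // A FILETIME counts 100-nanosecond intervals since January 1st 1601.
    // 116444736000000000 is the number of such intervals between
    // 1601-01-01 and 1970-01-01, where time_t starts counting.
    long long ll = (long long)t * 10000000LL + 116444736000000000LL;
    pft->dwLowDateTime = (DWORD)ll;
    pft->dwHighDateTime = (DWORD)(ll >> 32);
}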
Once the UNIX time is converted to a FILETIME structure, other Windows time formats can easily be obtained using Windows functions such as FileTimeToSystemTime() and FileTimeToDosDateTime().
void UnixTimeToSystemTime(time_t t, LPSYSTEMTIME pst)
{
FILETIME ft;
UnixTimeToFileTime(t, &ft);
FileTimeToSystemTime(&ft, pst);
}
3
Simple programs
To give you a more concrete example of how C works, here are a few examples of
very simple programs. The idea is to find a self-contained solution for a problem that
is simple to state and understand.
3.1
strchr
Find the first occurrence of a given character in a character string. Return a pointer to the character if found, NULL otherwise. This problem is solved in the standard library by the strchr function. Let’s write it.
The algorithm is very simple: We examine each character. If it is zero, this is the
end of the string, we are done and we return NULL to indicate that the character
is not there. If we find it, we stop searching and return the pointer to the character
position.
#include <stddef.h>   /* NULL */
char *FindCharInString(char *str, int ch)
{
    char c = (char)ch;   /* like strchr, compare as a char */
    while (*str != 0 && *str != c) {
        str++;
    }
    if (*str == c)
        return str;
    return NULL;
}
We loop through the characters in the string. The while condition requires that the character pointed to by our pointer “str” is not zero and is not the character we are looking for. While that holds, we continue with the next character by incrementing our pointer, i.e. making it point to the next char. When the while loop ends, we have either found the character or arrived at the end of the string. We discriminate between these two cases after the loop.
3.1.1
How can strchr fail?
We do not test for NULL: any NULL pointer passed to this function will provoke a trap. A way of making it more robust would be to return NULL if we receive a NULL pointer. This would tell the calling function that the character wasn’t found, which is always true if our pointer doesn’t point anywhere.
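Here is a minimal sketch of the more robust version described above. The name FindCharInStringSafe is hypothetical. It behaves like FindCharInString but returns NULL when it receives a NULL string.

```c
#include <stddef.h>

/* Like FindCharInString, but a NULL string means "no string": the
   character is reported as not found instead of provoking a trap. */
char *FindCharInStringSafe(char *str, int ch)
{
    char c = (char)ch;
    if (str == NULL)
        return NULL;
    while (*str != 0 && *str != c)
        str++;
    return (*str == c) ? str : NULL;
}
```

Searching for 0 still returns a pointer to the terminating zero, as the original does.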
A more serious problem happens when our string is missing the terminating zero byte. In that case the program will blindly loop through memory until it finds either the byte it is looking for or a zero byte somewhere. This is a much more serious problem, since if the search ends by finding a random matching byte somewhere, it will return an invalid pointer to the calling program!
This is really bad news, since the calling program may not use the result immediately. The result could be stored in a variable, for instance, and then used in another, completely unrelated section of the program. The program would crash without any hint of what is wrong and where the failure was. Note that this implementation of strchr correctly accepts a zero as the character to be searched. In this case it returns a pointer to the terminating zero byte.
3.2
strlen
Return the length of a given string not including the terminating zero.
3.2.1
A straightforward implementation
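This is solved by the strlen function. We just count the chars in the string, stopping when we find a zero byte. To avoid clashing with the library function, we call our version StringLength:
#include <stddef.h>   /* size_t */
size_t StringLength(const char *str)
{
    const char *p = str;
    while (*p != 0) {
        p++;
    }
    return p - str;
}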
We copy our pointer into a new one that will loop through the string. We test for
a zero byte in the while condition. Note the expression *p != 0. This means “Fetch
the value this pointer is pointing to (*p), and compare it to zero”. If the comparison
is true, then we increment the pointer to the next byte.
We return the number of characters between our pointer p and the saved pointer
to the start of the string. This pointer arithmetic is quite handy.
3.2.2
An implementation by D. E. Knuth
... one of the most common programming tasks is to search through a long string
of characters in order to find a particular byte value. For example strings are often
represented as a sequence of nonzero bytes terminated by 0. In order to locate the
end of a string quickly, we need a fast way to determine whether all eight bytes of a
given word x are nonzero (because they usually are).
I discovered the quote above in fascicle 1a, "Bitwise Tricks and Techniques", when reading D. E. Knuth’s pages:
http://www-cs-faculty.stanford.edu/~knuth/fasc1a.ps.gz.
In that document, Knuth explains many boolean tricks and gives their mathematical background too, something many other books fail to do. I adapted his algorithm to C. It took me a while because of a stupid problem, but now it seems to work OK.
The idea is to read 8 bytes at a time and use some boolean operations that can be done very fast in assembly to skip over most of the string, stopping only when a zero byte appears in one of the eight bytes read. Knuth’s strlen, then, looks like this:
#include <stdio.h>
#include <string.h>
#define H 0x8080808080808080ULL
#define L 0x0101010101010101ULL
size_t myStrlen(char *s)
{
    unsigned long long t;
    char *save = s;
    while (1) {
        // This supposes that the input string is aligned
        // or that the machine doesn’t trap when reading
        // an 8 byte integer at a random position, like the x86
        t = *(unsigned long long *)s;
        // Nonzero if and only if some byte of t is zero
        if (H & (t - L) & ~t)
            break;
        s += sizeof(long long);
    }
    // This loop will be executed at most 7 times
    while (*s) {
        s++;
    }
    return s - save;
}
#ifdef TEST
int main(int argc,char *argv[])
{
    char *str = "The lazy fox jumped over the slow dog";
    if (argc > 1) {
        str = argv[1];
    }
    printf("Strlen of '%s' is %zu (%zu)\n",
           str, strlen(str), myStrlen(str));
    return 0;
}
#endif
There are two issues with this code. The first is that it reads 8 bytes from an arbitrary location, which can make this code trap on machines that require aligned reads. The second is that it reads some bytes beyond the end of the string, unless the terminating zero happens to be the last byte of the last word read.
To solve the first problem, we should read the bytes of the string one at a time until our pointer is correctly aligned, i.e. until its value is a multiple of eight, returning at once if the string ends before that:
// uintptr_t is declared in <stdint.h>
while (((uintptr_t)s & 7) != 0) {
    if (*s == 0)
        return s - save;
    s++;
}
This will align our pointer before the main loop starts.
The second problem cannot be solved within the frame of the given algorithm: we can’t know that we have read beyond the end of the string until after we have done it. Once the reads are aligned, this causes no trouble on existing machines and runtimes, since an aligned 8-byte read can never cross a page boundary and therefore can never touch an unmapped page. The only problems that could arise come from debugging setups, where reading beyond the end of a string can be detected by special measures.
The two “magic constants” H and L can be obtained from standard values by
using:
#include <limits.h>
#define L (ULLONG_MAX / UCHAR_MAX)
#define H (L << (CHAR_BIT - 1))
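Putting the pieces together gives the sketch below. It is not the author's code, and it rests on a few assumptions. unsigned long long is taken to be 64 bits (eight 8-bit chars). The word is loaded with memcpy rather than a pointer cast, which sidesteps aliasing questions. The names alignedStrlen, LOW_BITS and HIGH_BITS are mine. The function still reads past the terminating zero inside the last word, as discussed above. Strictly speaking, the C standard does not allow this, and memory checking tools may flag it.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Magic constants derived from <limits.h>, as in the text:
   LOW_BITS = 0x0101010101010101, HIGH_BITS = 0x8080808080808080 */
#define LOW_BITS  (ULLONG_MAX / UCHAR_MAX)
#define HIGH_BITS (LOW_BITS << (CHAR_BIT - 1))

size_t alignedStrlen(const char *s)
{
    const char *save = s;
    unsigned long long t;

    /* Step byte by byte until s is 8-byte aligned,
       stopping early if the string ends first. */
    while (((uintptr_t)s & (sizeof t - 1)) != 0) {
        if (*s == 0)
            return (size_t)(s - save);
        s++;
    }
    /* Aligned 8-byte reads: they may read past the terminating
       zero, but never across a page boundary. */
    for (;;) {
        memcpy(&t, s, sizeof t);
        if (HIGH_BITS & (t - LOW_BITS) & ~t)
            break;                 /* some byte of t is zero */
        s += sizeof t;
    }
    while (*s)                     /* at most 7 iterations */
        s++;
    return (size_t)(s - save);
}
```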
3.2.3
How can strlen fail?
The same problems we discussed in the previous example apply, but in an attenuated form: only a wrong answer is returned, not an outright invalid pointer. The program stops only at a zero byte. If (for whatever reason) the passed string does not have a zero byte, this program will go on scanning memory until a zero byte is found by coincidence, or it will crash when attempting to reference nonexistent
memory.
3.3
ispowerOfTwo
Given a positive number, find out if it is a power of two.
Algorithm: A power of two has exactly one bit set in its binary representation. We count the bits that are set. If the count is different from one we return 0; if exactly one bit is set we return 1.
Implementation: We test the rightmost bit, and we use the shift operator to shift the bits right, shifting out the bit that we have tested. For instance, if we have the bit pattern 1 0 0 1, shifting it right by one gives 0 1 0 0: the rightmost bit has disappeared, and at the left a new bit has been shifted in, which is always zero because n is unsigned.
int ispowerOfTwo(unsigned int n)
{
unsigned int bitcount = 0;
while (n != 0) {
if (n & 1) {
bitcount++;
}
n = n >> 1;
}
if (bitcount == 1)
return 1;
return 0;
}
Our loop condition is that n must be different from zero, i.e. there must still be some bits to count. We test the rightmost bit with the bitwise AND operator (&): the number one has only one bit set, the rightmost one, so n & 1 is nonzero exactly when the rightmost bit of n is set. By the way, one is a power of two (2 to the power 0).
Note that the return statement could also have been written like this:
return bitcount == 1;
The intention of the program is clearer with the “if” statement.
3.3.1
How can this program fail?
The while loop has only one condition: that n is different from zero, i.e. that n has some bits set. Since we are shifting out the bits, and zero bits are always shifted in because n is unsigned, on a machine where unsigned int is 32 bits wide (like a PC) this program will stop after at most 32 iterations. Running some cases mentally (a good exercise) we see that for an input of zero we never enter the loop, bitcount is zero, and we return 0, the correct answer. For an input of 1 we make only one iteration of the loop. Since 1 & 1 is 1, bitcount is incremented, and the test makes the routine return 1, the correct answer. If n is three, we make two passes, and bitcount will be two. This is different from 1, and we return zero, the correct answer.
Anh Vu Tran anhvu.tran@ifrance.com made me discover an important bug. If you change the declaration of “n” from unsigned int to plain int, the above function will enter an infinite loop if n is negative.
Why?
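Right-shifting a negative signed number is implementation-defined, and most compilers preserve the sign: the sign bit is copied into the bit shifted in at the left. A negative n therefore eventually becomes a string of 1 bits (-1), and -1 >> 1 is still -1: n never becomes zero, hence an infinite loop.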
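If a caller really needs to pass an int, one way to avoid the infinite loop is to reject non-positive values before any shifting and convert the rest to unsigned int, where the shifts are well defined. A minimal sketch; the wrapper name ispowerOfTwoInt is hypothetical:

```c
/* The unsigned version from the text: counts the bits that are set. */
int ispowerOfTwo(unsigned int n)
{
    unsigned int bitcount = 0;
    while (n != 0) {
        if (n & 1)
            bitcount++;
        n = n >> 1;   /* unsigned shift: a zero bit always comes in at the left */
    }
    return bitcount == 1;
}

/* Hypothetical wrapper for int arguments. Zero and negative numbers are not
   powers of two, so they are rejected before any shifting happens; the
   remaining values convert to unsigned int without change. */
int ispowerOfTwoInt(int n)
{
    if (n <= 0)
        return 0;
    return ispowerOfTwo((unsigned int)n);
}
```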
3.3.2
Write ispowerOfTwo without any loops
After working hard to debug the above program, it is disappointing to find out that
it isn’t the best way of doing the required calculation. Here is an idea I got from
reading the discussions in comp.lang.c.
isPow2 = x && !( (x-1) & x );
How does this work?
Algorithm: If x is a power of two, it doesn’t have any bits in common with x-1, since it consists of a single bit set. Any positive power of two is a single bit in binary integer representation, and subtracting one clears that bit and sets all the bits below it.
For instance 32 is a power of two. In binary:
32     = 100000
32-1   = 011111
32&31  = 000000
This means that we test with the AND operator whether x-1 and x share any bits. If they share some bits, the AND of their bits yields some non-zero bits. The only case where this does not happen is when x is a power of two.
Of course, if x is zero (not a power of two) the test would succeed anyway, since 0 & (0-1) is zero, so we add an explicit test for zero with the logical AND operator: x && expression.
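The one-liner can be wrapped in a function and checked against the worked example. A sketch assuming an unsigned argument (for signed values the caveats of the previous section apply); the function name isPow2 is taken from the expression above:

```c
#include <stdbool.h>

/* True when exactly one bit of x is set: x must be nonzero, and
   x-1 (all bits below that single bit) must share no bit with x. */
bool isPow2(unsigned int x)
{
    return x != 0 && ((x - 1) & x) == 0;
}
```

For example, isPow2(32) is true because 32 & 31 is zero, while isPow2(96) is false because 96 & 95 is 64.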
Powers of two with negative exponents (0.5, 0.25, 0.125, etc.) share this same property in a binary fraction representation: 0.5 is 0.1, 0.25 is 0.01, 0.125 is 0.001, etc.
This snippet and several others are neatly explained in:
http://www.caam.rice.edu/~dougm/twiddle.
3.4
signum
This function should return -1 if its argument is less than zero, zero if it is equal to zero, and 1 if it is greater than zero. A straightforward implementation could look like this:
int signum1(double x)
{
if (x < 0)
return -1;
else if (x == 0)
return 0;
else return 1;
}
We can rewrite that in a more incomprehensible form:
int signum2(double x) { return (x<0)?-1:(x==0)?0:1;}
Note that the second form typically compiles to the same code as the first; only the readability of the source code suffers.
A more interesting implementation is this one:
int signum3(double x) { return (x > 0) - (x < 0); }
It uses the fact that a comparison yields 1 when it is true and 0 when it is false. For ordinary arguments all these forms return the same results, and with a modern compiler the differences in speed and size are usually negligible. The main difference is that the first form is immediately comprehensible.
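The forms do differ for one kind of argument: a NaN (not a number) compares false with everything, so signum1 (and signum2) return 1 for it, while signum3 returns 0. A small sketch showing this, assuming an implementation that supports NaN through the NAN macro of <math.h>:

```c
#include <math.h>   /* for the NAN macro */

/* The straightforward version: falls through to 1 when x is NaN,
   because both x < 0 and x == 0 are false. */
int signum1(double x)
{
    if (x < 0)
        return -1;
    else if (x == 0)
        return 0;
    else return 1;
}

/* The branch-free version: with NaN both comparisons yield 0, so 0 - 0. */
int signum3(double x) { return (x > 0) - (x < 0); }
```

Which answer is "right" for NaN depends on the application; the point is only that the forms are equivalent for ordinary numbers, not for every possible double.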
3.5
strlwr
Given a string containing upper case and lower case characters, transform it into a string with only lower case characters. Return a pointer to the start of the given string.
This is what the library function strlwr does. In general it is not a good idea to replace library functions, even when, like this one, they are not part of the standard library as defined by the C standard.
We make the transformation in place, i.e. we overwrite the characters of the given string. This assumes that the caller has copied the original string elsewhere if it is needed again.
#include <ctype.h> /* needed for tolower */
#include <stdio.h> /* needed for the NULL definition */
char *strTolower(char *str)
{
/* iterates through str */
unsigned char *p = (unsigned char *)str;
if (str == NULL)
return NULL;
while (*p) {
*p = (unsigned char)tolower(*p); /* convert the current character */
p++;
}
return str;
}
We include the standard header ctype.h, which contains the definition of several
character classification functions (or macros) like “isupper” that determines if a given
character is upper case, and many others like “isspace”, or “isdigit”. We need to
include the stdio.h header file too, since it contains the definition for NULL.
The first thing we do is to test if the given pointer is NULL. If it is, we return
NULL. Then, we start our loop that will span the entire string. The construction
while(*p) tests if the contents of the character pointer p is different than zero. If this
is the case, we transform it into a lower case one. We increment our pointer to point
to the next character, and we restart the loop. When the loop finishes because we
hit the zero byte that terminates the string, we stop and return the saved position
of the start of the string.
Note the cast that converts str from a char * into an unsigned char *. The reason is that plain char is signed in many implementations, so characters above 127 would be passed to tolower() as negative integers. The C standard requires the argument of tolower() to be representable as an unsigned char (or to be EOF); an implementation that uses that value to index a table would then index it with a negative offset, with bad consequences, as you may imagine.
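The same cast is needed whenever a plain char is passed to tolower. As a sketch of the alternative to the in-place transformation discussed above, here is a variant that leaves its argument untouched and writes the result into a buffer supplied by the caller. The name strTolowerCopy is hypothetical, and the buffer must hold at least strlen(src)+1 characters:

```c
#include <ctype.h>   /* tolower */
#include <stddef.h>  /* NULL */

/* Hypothetical variant: stores the lower case version of src in dst and
   returns dst; src is not modified. Returns NULL if either pointer is NULL. */
char *strTolowerCopy(char *dst, const char *src)
{
    char *d = dst;
    if (dst == NULL || src == NULL)
        return NULL;
    while (*src) {
        /* go through unsigned char: tolower needs a value that is
           representable as unsigned char (or EOF) */
        *d++ = (char)tolower((unsigned char)*src);
        src++;
    }
    *d = 0;   /* terminate the copy */
    return dst;
}
```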
3.5.1
How can this program fail?
Since we test for NULL, a NULL pointer can’t provoke a trap. Is this a good idea?
Well, it depends. This function will not trap with NULL pointers, but then the error will be detected later, when other operations are done with that pointer anyway. Maybe trapping when a NULL pointer is passed to us is not that bad, since it will uncover the error sooner rather than later. If the user of our function calls us to transform a string to lower case, it is very likely because he/she wants to use it later in a display, or otherwise. Avoiding a trap here only means that the trap will appear later, probably making the error more difficult to find.
Writing software means making this type of decision over and over again. Obviously this program will fail with any incorrect string, i.e. a string that is missing the final zero byte. The failure behavior of our program is quite awful in this case: it will start overwriting all bytes that happen to be in the range of uppercase characters until it hits a random zero byte. This means that if you pass a string that is not zero-terminated to this apparently harmless routine, you activate a randomly firing machine gun that will start destroying your program’s data in a random fashion. The absence of a zero byte in a string is fatal for any C program. In a tutorial this can’t be emphasized strongly enough!
3.6
paste
You have two text files, and you want to merge them into a single file in which each line holds the corresponding lines of both files, separated by a tab character. For instance if you have a file1 with this contents:
line 1
line 2
and another file2 with this contents:
line 10
line 11
you want to obtain a file with these two lines (the two columns are separated by a tab):
line 1	line 10
line 2	line 11
Note that both files can be the same.
A solution for this could be the following program:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/* We decide arbitrarily that lines longer than 32767 chars
will make this program fail. */
#define MAXLINELEN 32767
int main(int argc,char *argv[])
{
/* We need two FILE pointers, and two line buffers to hold
each line from each file. We receive in argc the number
of arguments passed + 1, and in the character array
argv[] the names of the two files */
FILE *f1,*f2;
char buf1[MAXLINELEN],buf2[MAXLINELEN];
/* We test immediately if the correct number of arguments
has been given. If not, we exit with a clear error message. */
if (argc < 3) {
fprintf(stderr,"Usage: paste file1 file2\n");
exit(EXIT_FAILURE);
}
/*
We open both files, taking care not to open the same file
twice. We test with strcmp if they are equal. */
f1 = fopen(argv[1],"r");
if (strcmp(argv[1],argv[2]))
f2 = fopen(argv[2],"r");
else
f2 = f1;
/*
We read line after line of the first file until we reach
the end of the first file. */
while(fgets(buf1,MAXLINELEN,f1)) {
char *p = strchr(buf1,'\n');
/*
the fgets function leaves a \n in the input. We erase it if
it is there. We use for this the strchr function, that
returns a pointer to the first occurrence of a character in
a string. If the character is not found it returns NULL,
so we test below before using that pointer */
if (p)
*p = 0;
/*
We output the first file line, separated from the next with
a single tabulation char. */
printf("%s\t",buf1);
/*
If there are still lines to be read from file 2, we read
them and we print them after doing the same treatment as
above. */
if (f2 != f1 && fgets(buf2,MAXLINELEN,f2)) {
p = strchr(buf2,'\n');
if (p)
*p = 0;
printf("%s\n",buf2);
}
/*
If we are the same file just print the same
line again. */
else printf("%s\n",buf1);
}
/*
End of the while loop. When we arrive here the first file
has been completely scanned. We close and shut down. */
fclose(f1);
if (f1 != f2)
fclose(f2);
return 0;
}
3.6.1
How can this program fail?
Well, there are obvious bugs in this program. Before reading the answer, try to spot them yourself. What is important here is that you learn how to spot bugs, and that is a matter of logical thinking and a bit of effort. The solution comes further below, but first try to find those bugs on your own. Before getting to them, however, look at these lines:
if (f2 != f1 && fgets(buf2,MAXLINELEN,f2)) {
...
}
else printf("%s\n",buf1);
If f1 is different from f2 (we have two different files) and file two is shorter than file one, that if condition becomes false once all the lines of file two have been read, and the else branch is executed, printing the corresponding line of file one twice.
To test this, we create two test files, file1 and file2. Their contents are:
File1:
File 1: line 1
File 1: line 2
File 1: line 3
File 1: line 4
File 1: line 5
File 1: line 6
File 1: line 7
File 1: line 8
File2:
File 2: line 1
File 2: line 2
File 2: line 3
File 2: line 4
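We call our paste program with that data and we obtain:
File 1: line 1	File 2: line 1
File 1: line 2	File 2: line 2
File 1: line 3	File 2: line 3
File 1: line 4	File 2: line 4
File 1: line 5	File 1: line 5
File 1: line 6	File 1: line 6
File 1: line 7	File 1: line 7
File 1: line 8	File 1: line 8
Line five of file one is read, and since file two is already finished, it is printed twice; the same happens with the remaining lines of file one.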
Is this a bug or a feature?
We received vague specifications: nothing was said about what the program should do with files of different sizes. This can be declared a feature, but of course it is better to be aware of it.
We see that to test hypotheses about the behavior of a program, there is nothing better than test data, i.e. data that is designed to exercise one part of the program logic.
In many real cases, the logic is surely not as easy to follow as in this example. Building test data can be done automatically. To build files one and two, this small program will do:
#include <stdio.h>
int main(void)
{
FILE *f = fopen("file1","w");
for (int i = 1; i <= 8; i++)
fprintf(f,"File 1: line %d\n",i);
fclose(f);
f = fopen("file2","w");
for (int i = 1; i <= 4; i++)
fprintf(f,"File 2: line %d\n",i);
fclose(f);
return 0;
}
This is a good example of throwaway software: software you write to be executed once. No error checking, small and simple, so that there is less chance for mistakes.
And now the answers to the other bugs mentioned above.
One of the first things to notice is that the program tests with strcmp whether the two files are the same. This means that when the user passes the command line:
paste File1 filE1
our program will believe that the files are different when in fact they are not, since Windows file names are not case sensitive. The right thing to do there is to compare the file names with stricmp (a non-standard function provided by Windows C libraries), which ignores the differences between uppercase and lowercase.
But an even greater problem is that we do not test for NULL when opening the files. If any of the files given in the command line doesn’t exist, fopen returns NULL, and the program will most likely crash when it passes that NULL pointer to fgets. Add the necessary tests before you use it.
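One way to add the missing tests is a small helper that reports the failure and lets main stop cleanly. A sketch; the name openForReading is hypothetical, and it uses the standard perror function to print the reason why the open failed:

```c
#include <stdio.h>

/* Hypothetical helper: opens name for reading. On failure it prints
   "name: reason" on stderr (via perror) and returns NULL. */
FILE *openForReading(const char *name)
{
    FILE *f = fopen(name, "r");
    if (f == NULL)
        perror(name);
    return f;
}
```

In main, each fopen call would then become, for instance, f1 = openForReading(argv[1]); followed by if (f1 == NULL) exit(EXIT_FAILURE);. If the second file fails to open, the first one should be closed with fclose before exiting.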
Another problem is that we test if we have the right number of arguments (i.e. at
least two file names) but if we have more arguments we simply ignore them. What
is the right behavior?
Obviously we could process (and paste) several files at once. Write the necessary changes in the code above. Note that if you want to make the program really general, you should take into account the fact that a file could be repeated several times in the input, i.e.
paste file1 file2 file1 file3
Besides, the separator character in our program is currently hardwired to the tab character. Making it an option would allow replacing the tab with a vertical bar, for instance.
But the problem with such an option is that it assumes that the output will be padded with blanks so that the vertical bars align. Explain why that option needs a complete rewrite of our program. What is the hidden assumption above that makes such a change impossible?
Another feature that paste.exe could have is that column headers are automatically underlined. Explain why adding such an option is falling into the featurism that pervades all modern software. Learn when to stop!
3.7
Using arrays and sorting
Suppose we want to display the frequencies of each letter in a given file. We want to know the number of ‘a’s, of ‘b’s, and so on. One way to do this is to make an array of 256 integers (one integer for each of the 256 possible character values) and to use each character read as an index into it. When we see a ‘b’, we take the value of the letter and use it to increment the corresponding position in the array. We can use the same skeleton as the program that we have just built for counting characters, modifying it slightly.
#include <stdio.h>
#include <stdlib.h>
int
Frequencies[256]; // Array of frequencies
int
main(int argc,char *argv[])
{
// Local variables declarations
int count=0;
FILE *infile;
int c;
if (argc < 2) {
printf("Usage: countchars <file name>\n");
exit(EXIT_FAILURE);
}
infile = fopen(argv[1],"rb");
if (infile == NULL) {
printf("File %s doesn’t exist\n",argv[1]);
exit(EXIT_FAILURE);
}
c = fgetc(infile);
while (c != EOF) {
count++;
Frequencies[c]++;
c = fgetc(infile);
}
fclose(infile);
printf("%d chars in file\n",count);
for (count=0; count<256;count++) {
if (Frequencies[count] != 0) {
printf("%3c (%4d) = %d\n", count,
count,
Frequencies[count]);
}
}
return 0;
}
We declare an array of 256 integers, numbered from zero to 255. Note that in C the index origin is always zero. This array is declared outside any function, so it has file scope: the identifier is associated with the integer array in the current translation unit (the current file and its includes) from the point of its declaration on. Since we haven’t specified otherwise, it also has external linkage: the identifier is exported from the current module and is visible from other modules. In another compilation unit we can then declare:
extern int Frequencies[];
and we can access this array. This can be good (it allows us to share data between modules), or it can be bad (it allows other modules to tamper with private data); it depends on the point of view and the application.
If we wanted to keep this array local to the current compilation unit we would
have written:
static int Frequencies[256];
The “static” keyword indicates to the compiler that this identifier should not be made
visible in another module.
The first thing our program does, is to open the file with the name passed as a
parameter. This is done using the fopen library function. If the file exists, and we are
able to read from it, the library function will return a pointer to a FILE structure,
defined in stdio.h. If the file can’t be opened, it returns NULL. We test for this
condition right after the fopen call.
We can read characters from a file using the fgetc function. That function updates
the current position, i.e. the position where the next character will be read.
But let’s come back to our task. We update the array at each character, within the while loop. We just use the value of the character to index the array, incrementing the corresponding position. That value is always an integer from 0 to 255, since fgetc returns the character read as an unsigned char converted to int, or EOF, which ends the loop.
Note the expression:
Frequencies[c]++
It means: Frequencies[c] = Frequencies[c]+1;
i.e. the integer at that array position is incremented, and not the variable c!
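The indexing step can be isolated in a function that counts the characters of a string instead of a file. A sketch; countChars is a hypothetical name, and the array size 256 assumes 8-bit characters, as the program above does. The cast to unsigned char guarantees an index between 0 and 255, just as fgetc does:

```c
/* Hypothetical helper: for each character of text, increments the
   counter stored at the index given by that character's value. */
void countChars(const char *text, int freq[256])
{
    const unsigned char *p = (const unsigned char *)text;  /* values 0..255 */
    while (*p) {
        freq[*p]++;   /* the array element is incremented, not p */
        p++;
    }
}
```

For example, after counting "abracadabra" into a zeroed array, freq['a'] is 5 and freq['b'] is 2.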
Then, after the while loop, we display the results. We only display frequencies that are different from zero, i.e. where at least one character with that value was read. We test this with the statement:
if (Frequencies[count] != 0) { ... statements ... }
The printf statement is quite complicated. It uses a new directive %c, meaning character, together with a width argument: %3c means a width of three output characters. We already knew the %d directive to print a number, but here it is augmented with a width directive too. Width directives are quite handy when building tables, to get the items of the table aligned with each other in the output.
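The effect of the width directives can be checked by formatting a row into a string with snprintf (C99) instead of printing it. A sketch with a hypothetical helper formatRow, which uses the same format as the program above without the final newline:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats one row of the frequency table into out.
   %3c prints the character right-aligned in a field of 3 characters,
   %4d prints its numeric value right-aligned in a field of 4.
   Returns the number of characters produced, as snprintf does. */
int formatRow(char *out, size_t size, int c, int freq)
{
    return snprintf(out, size, "%3c (%4d) = %d", c, c, freq);
}
```

For instance, formatRow(buf, sizeof buf, 'A', 1) stores "  A (  65) = 1" in buf: the letter is padded to three characters and the number to four.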
The first thing we do is to build a test file, to see if our program is working
correctly. We build a test file containing
ABCDEFGHIJK
And we call:
lcc frequencies.c
lcclnk frequencies.obj
frequencies fexample
and we obtain:
D:\lcc\examples>frequencies fexample
13 chars in file
    (  10) = 1
    (  13) = 1
  A (  65) = 1
  B (  66) = 1
  C (  67) = 1
  D (  68) = 1
  E (  69) = 1
  F (  70) = 1
  G (  71) = 1
  H (  72) = 1
  I (  73) = 1
  J (  74) = 1
  K (  75) = 1
We see that the characters \r (13) and new line (10) disturb our output. We aren’t
interested in those frequencies anyway, so we could just eliminate them when we
update our Frequencies table. We add the test:
if (c >= ' ')
Frequencies[c]++;
i.e. we ignore all characters whose value is less than that of the space character (32): \r, \n or whatever. Note that we ignore tabulations too, since their value is 9. The total count still includes every character. The output is now more readable:
H:\lcc\examples>frequencies fexample
13 chars in file
  A (  65) = 1
  B (  66) = 1
  C (  67) = 1
  D (  68) = 1
  E (  69) = 1
  F (  70) = 1
  G (  71) = 1
  H (  72) = 1
  I (  73) = 1
  J (  74) = 1
  K (  75) = 1
We now test our program on its own source. We call:
frequencies frequencies.c
and obtain:
758 chars in file
I have organized the data in a table to make it easier to read:
    (  32) = 57
  ! (  33) = 2
  " (  34) = 10
  # (  35) = 2
  % (  37) = 5
  ' (  39) = 3
  ( (  40) = 18
  ) (  41) = 18
  * (  42) = 2
  + (  43) = 6
  , (  44) = 7
  . (  46) = 2
  / (  47) = 2
  0 (  48) = 4
  1 (  49) = 4
  2 (  50) = 3
  3 (  51) = 1
  4 (  52) = 1
  5 (  53) = 2
  6 (  54) = 2
  : (  58) = 1
  ; (  59) = 19
  < (  60) = 5
  = (  61) = 11
  > (  62) = 4
  A (  65) = 1
  E (  69) = 2
  F (  70) = 7
  I (  73) = 1
  L (  76) = 3
  N (  78) = 1
  O (  79) = 1
  U (  85) = 2
  [ (  91) = 7
  \ (  92) = 4
  ] (  93) = 7
  a (  97) = 12
  b (  98) = 2
  c (  99) = 33
  d ( 100) = 8
  e ( 101) = 38
  f ( 102) = 23
  g ( 103) = 8
  h ( 104) = 6
  i ( 105) = 43
  l ( 108) = 14
  m ( 109) = 2
  n ( 110) = 43
  o ( 111) = 17
  p ( 112) = 5
  q ( 113) = 5
  r ( 114) = 23
  s ( 115) = 14
  t ( 116) = 29
  u ( 117) = 19
  v ( 118) = 3
  w ( 119) = 1
  x ( 120) = 3
  y ( 121) = 1
  { ( 123) = 6
  } ( 125) = 6
What is obviously missing is printing the table in sorted order, so that the most frequent characters are printed first. This would make it easier to find the most frequent characters in the table.
3.7.1
How to sort arrays
We have in the standard library the function “qsort”, which sorts an array. We study its prototype first, to see how we should use it:
void qsort(void *b,size_t n,size_t s,int(*f)(const void *,const void *));
Well, this is quite an impressive prototype really. But if we want to learn C, we will have to read this as if it were normal prose. So let’s begin, from left to right.
The function qsort doesn’t return an explicit result. It is a void function. Its
argument list, is the following:
Argument 1: is a void *.
Void *??? What is that? Well, in C you have void, that means none, and void *,
that means this is a pointer that can point to anything, i.e. a pointer to an untyped
value. We still haven’t really introduced pointers, but for the time being just be
happy with this explanation: qsort needs the start of the array that it will sort. This array can be composed of anything: integers, user defined structures, double precision numbers, whatever. This "whatever" is precisely the “void *”.
Argument 2 is a size_t.
This isn’t a known type, so it must be a type defined before in stdlib.h. By
looking at the headers, and following the embedded include directives, we find:
In lcc-win, “stdlib.h” includes “stddef.h”, which defines a “typedef” like this:
typedef unsigned int size_t;
This means that we define here a new type called “size_t”, that will actually be an unsigned integer. Other systems may use a different unsigned type; on 64-bit systems size_t is usually 64 bits wide. Typedefs allow us to augment the basic type system with our own types. Mmmm interesting. We will keep this for later use.
In this example, it means that the size_t n is the number of elements in the array.
Argument 3 is also a size_t.
This argument contains the size of each element of the array, i.e. the number of bytes that each element has. This tells qsort the number of bytes to skip at each increment or decrement of a position. If we pass to qsort an array of 56 double precision numbers, this argument will be sizeof(double), i.e. 8 on the usual implementations, and the preceding argument will be 56, i.e. the number of elements in the array.
Argument 4 is a function:
int (*f)(const void *,const void *);
Well this is quite hard really. We are in the first pages of this introduction and we already have to cope with gibberish like this? We have to use recursion now. We have again to start reading this from left to right, more or less. We have a function pointer (f) that points to a function that returns an int, and that takes as arguments two void *, i.e. two pointers to some unspecified objects, which can’t be changed within that function (const).
This is maybe quite difficult to write, but it is quite a powerful feature. In C, a pointer to a function can be passed as an argument to another function, which can then call the function it points to. This is how we tell qsort which function to call to compare two elements of the array.
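To see how the four arguments fit together, here is a minimal sketch that sorts an array of doubles. The names compareDoubles and sortDoubles are hypothetical; the comparison function follows the contract qsort expects (negative, zero or positive, depending on whether the first element should come before, together with, or after the second), and it uses the same comparison trick as signum3 above:

```c
#include <stdlib.h>   /* qsort, size_t */

/* Comparison function: qsort passes pointers to two elements. */
static int compareDoubles(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);   /* avoids converting x - y to int */
}

/* Hypothetical helper: sorts n doubles in ascending order. */
void sortDoubles(double *values, size_t n)
{
    /* start of the array, number of elements, size of one element,
       and the function used to compare two elements */
    qsort(values, n, sizeof values[0], compareDoubles);
}
```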
We have to use recursion now. We have again to start reading this from left to
right, more or less. We have a function pointer (f) that points to a function that