The Department of Computer Science & Engineering

CSE 305
Programming Languages
Lecture Notes
Stuart C. Shapiro
Fall, 2003

Statement Level Control Structures

According to a theorem proved by Corrado Böhm and Giuseppe Jacopini (CACM, 1966), any procedure can be written with the following control structures:

  1. Sequence: Do one action, and then another, and then another, etc.
  2. Selection: Do one action or another, based on some condition.
  3. Loop: Repeatedly do an action as long as some condition holds.
If these are the only control structures used, some code may need to be repeated. However, if the break (or exit) statement is added, code repetition is not needed.

Note that in the Preliminaries notes, I said that among the defining criteria of a programming language were facilities for sequence, selection, and loop.

Sequence is an often overlooked control structure because it is so pervasive.

A standard part of any imperative program is a sequence of statements, to be executed in sequential order. Functional programming languages get sequential execution from the left-to-right order of evaluating the arguments to a function. Logic programming languages also often have a left-to-right evaluation order.

Fortran indicates statement sequence by line sequence, but also allows a sequence of statements on one line separated by semi-colons (;).

The semi-colon (;) started out as a statement separator, but it turned out to be cleaner syntax to make it a statement terminator, as it is in most current languages.

A compound statement is a sequence of statements treated syntactically as one statement. The statements in a compound statement are usually surrounded by braces ({ ... }), but some languages use begin ... end or other bracketing keywords.

A block is a compound statement in which variables may be declared with scope limited to that compound statement.

The goto is the most basic control structure. It specifies immediate transfer to a given statement. In some languages that statement may be anywhere in the program; other languages have rules that restrict the goto target. The usual syntax is goto <label>, where <label> is a symbolic or numeric label of some statement. (We will assume symbolic labels.) Generally, a statement is labeled using the syntax <label>: <statement>.

Some languages allow a goto target that is the result of a run-time computation.

There are two major objections to the use of goto:

  1. It allows "spaghetti" code that is extremely difficult to understand.
  2. The use of labels makes program verification extremely difficult. Consider the difference between
         x = 3.0;
         y = z / x;
     and
               x = 3.0;
         here: y = z / x;
     The first obviously does not involve a divide by zero, but consider what you have to do to check that the second does not: any goto here elsewhere in the program can reach the division without first executing the assignment to x.

The use of the goto was challenged by Edsger Dijkstra in his famous letter to the editor, "Go To Statement Considered Harmful" (CACM, 1968). This launched the structured programming movement. Since the Böhm/Jacopini theorem proved that the goto is not needed, several subsequent languages downplayed, or even eliminated, the goto statement. Java, for example, does not have it.

The exit statement is an executable statement whose effect is to immediately continue execution at the statement immediately following the lexically innermost containing control structure. What control structures may be exited from is language-dependent.

The exit statement can eliminate the need for repeated code when using only the sequence, selection, and loop control structures. For example, consider the following pseudocode for reading and processing some input file, first using a goto:

loop: input := read(file);
      if (input == eof) goto out;
      process(input);
      goto loop;
out:  ...
Using a while, this may be rewritten as:
input := read(file);
while (input != eof) {
    process(input);
    input := read(file);
}
Note the repetition of input := read(file);. However, using exit, this may be rewritten as:
while (true) {
    input := read(file);
    if (input == eof) exit;
    process(input);
}

A single-level exit always exits from the lexically innermost control structure. A multi-level exit may exit from a more distant lexically containing control structure. In some languages, the multi-level exit takes a numerical parameter, i, and exits from the ith containing structure. In others, the multi-level exit takes a symbolic parameter, label, and exits from the control structure labeled label. This use of a label differs from, and is not as dangerous as, the statement label used as the target of a goto. Multi-level exit statements can be used to avoid code repetition that is needed if only the single-level exit is available.

In C-based languages, break is used as the exit statement. In C and C++, break is a single-level exit. In Java it is a multi-level exit, taking an optional symbolic label. Perl uses last as a multi-level exit with an optional symbolic label.
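The two uses of break described above can be sketched in Java as follows. This is only an illustration: the class BreakDemo, its method names, and the use of the string "eof" as an end-of-file marker are assumptions made for the example, not part of any library.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class BreakDemo {
    // Single-level exit: the "read, test, process" loop from the pseudocode.
    // The list stands in for a file, and "eof" is a hypothetical sentinel.
    static int countBeforeEof(List<String> inputs) {
        Iterator<String> file = inputs.iterator();
        int processed = 0;
        while (true) {
            String input = file.hasNext() ? file.next() : "eof";
            if (input.equals("eof")) break;  // exits the innermost loop only
            processed++;                     // stands in for process(input)
        }
        return processed;
    }

    // Multi-level exit: a labeled break leaves both loops at once.
    // Returns the row index of the first occurrence of target, or -1.
    static int rowOf(int[][] grid, int target) {
        int found = -1;
        search:
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                if (grid[r][c] == target) {
                    found = r;
                    break search;            // exits the loop labeled search
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(countBeforeEof(Arrays.asList("a", "b", "eof", "c")));
        System.out.println(rowOf(new int[][] {{1, 2}, {3, 4}}, 3));
    }
}
```

Without the label, the break in rowOf would leave only the inner loop, and a flag variable would have to be tested again in the outer loop.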

In Common Lisp (return-from <label> [<expression>]) is a multi-level exit. (return [<expression>]) is equivalent to (return-from nil [<expression>]), and serves as a single-level exit.

The if Statements
Single-Branch Selection
The single-branch selection statement is usually of the form if (<expression>) <statement>. If the <expression> evaluates to a true value the <statement> is executed. Otherwise it isn't. In either case, the next statement done is the statement following the if statement.

Perl has both an if (<expression>) <compound-statement> and an unless (<expression>) <compound-statement>. The latter executes the <compound-statement> if and only if the <expression> is false. In Perl, a <compound-statement>, surrounded by braces, is required, even if there is only one statement in it.

Common Lisp, like Perl, has both (when <expression> <form>) and (unless <expression> <form>) single-branch selection expressions.

Double-Branch Selection
The double-branch selection statement is usually of the form
if (<expression>) <then-statement> else <else-statement>
and is often referred to as the if-then-else statement. If the <expression> evaluates to a true value, <then-statement> is executed. Otherwise <else-statement> is. In either case, only one of the two statements is executed, and execution continues with the statement following the if statement. This is the selection referred to in the Böhm/Jacopini theorem. Of course the else statement could always be empty, reducing this to a single-branch selection statement.

The if-then-else statement gave rise to a famous case of syntactic ambiguity. When is the else statement executed in a case like

if (test1)
    if (test2)
        statement1
else statement2
The two possibilities are: if test1 is true and test2 is false; if test1 is false. In current languages, this is generally solved by a rule that matches the else with the nearest unmatched if. Thus, in the example above, statement2 would be done in the case that test1 is true and test2 is false. If the other case were wanted, brackets or a special keyword would need to be used. In Java, brackets would be used to turn the inner if statement into a one-statement compound statement:
if (test1) {
    if (test2)
        statement1
}
else statement2
Other languages end every if with a keyword such as end if, making the two cases
if (test1)
    if (test2)
        statement1
    else statement2
    end if
end if
and
if (test1)
    if (test2)
        statement1
    end if
else statement2
end if
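The nearest-unmatched-if rule can be checked directly in Java. The following is a small sketch; the class DanglingElseDemo and its method names are made up for the example, and the strings merely record which statement ran.

```java
final class DanglingElseDemo {
    // No braces: the else pairs with the nearest unmatched if (the inner one),
    // regardless of how the code is indented.
    static String nearestIf(boolean test1, boolean test2) {
        String result = "neither";
        if (test1)
            if (test2)
                result = "statement1";
            else
                result = "statement2";
        return result;
    }

    // Braces turn the inner if into a compound statement,
    // so the else now pairs with the outer if.
    static String outerIf(boolean test1, boolean test2) {
        String result = "neither";
        if (test1) {
            if (test2)
                result = "statement1";
        } else
            result = "statement2";
        return result;
    }

    public static void main(String[] args) {
        System.out.println(nearestIf(true, false));
        System.out.println(outerIf(true, false));
        System.out.println(outerIf(false, true));
    }
}
```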

Common Lisp's double-branch selection expression is

(if test
    then-expression
    else-expression)

Multi-Branch Selection
The multi-branch selection that is the most straightforward extension of the double-branch selection chooses one of multiple statements to execute based on the first of multiple tests to evaluate to true. In the C-based languages, it can be expressed as a nested set of (right-associative) if-then-elses:
if (test1) statement1
else if (test2) statement2
else if (test3) statement3
...
else if (testn) statementn
else default-statement
Notice that if we used indentation and brackets to indicate statement nesting, this would be
if (test1) {statement1}
    else {if (test2) {statement2}
             else {if (test3) {statement3}
                            else {if (testn) {statementn}
                                    else {default-statement}}...}}
To flatten this, even syntactically, some languages combine the else with the subsequent if giving an elseif keyword. The above would then look like
if (test1) statement1
elseif (test2) statement2
elseif (test3) statement3
...
elseif (testn) statementn
else default-statement
and would not involve nested selection statements.

Common Lisp's multiple-branch selection expression looks like this latter version:

(cond (test1 expression11 expression12 ...)
      (test2 expression21 expression22 ...)
      (test3 expression31 expression32 ...)
      (t default-expression1 default-expression2 ...))

The case Statement
The case statement, introduced by ALGOL W (1966), is a special purpose multiple-branch selection statement for use when all the tests test the same expression for equality to various values.

The general form of the case statement is captured in the version in Common Lisp:

(case keyform
  (keylist1 expression11 expression12 ...)
  (keylist2 expression21 expression22 ...)
  (keylist3 expression31 expression32 ...)
  (t default-expression1 default-expression2 ...))
Each keylist must be a list of literal values (they are not evaluated). To evaluate the case expression, the keyform is evaluated. If its value is listed in one of the keylists the expressions of that keylist are evaluated in order, and the value of the case expression is the value of the last such expression. If the value of the keyform is not listed in one of the keylists, and the optional t case is present, the default-expressions are evaluated, and the value of the last one of those is the value of the case expression. No key may appear in more than one keylist.

Several languages limit the case keys to be ordinal values. That way, the case statement may be compiled into a table of instructions indexed by the key.

The case statement of the C-based languages is

switch (keyform) {
  case key1: statement11 statement12 ...
  case key2: statement21 statement22 ...
  case key3: statement31 statement32 ...
  [default: default-statement1 default-statement2 ...]
}
These keys are limited to integers, and there can only be one key per case. To handle the situation of allowing the same set of statements to be executed for several different keys, the C-based languages specify that control flows from the last statement of the chosen case directly through to the first statement of the next listed case. If the programmer does not want this to happen, a break statement must be used. For example,
switch (keyform) {
  case 1:
  case 3:
  case 5:
  case 7:
  case 9: statement-odd;
          break;
  case 2:
  case 4:
  case 6:
  case 8: statement-even;
          break;
  default: statement-too-big;
}
Often, there is only one key in each case, and the break can easily be forgotten.
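To see what a forgotten break does, here is a Java sketch of the same switch written with and without the break statements. The class SwitchDemo and the strings it builds are assumptions made for illustration.

```java
final class SwitchDemo {
    // With breaks: exactly one group of statements runs for a given key.
    static String classify(int key) {
        String result;
        switch (key) {
            case 1: case 3: case 5: case 7: case 9:
                result = "odd";
                break;
            case 2: case 4: case 6: case 8:
                result = "even";
                break;
            default:
                result = "too-big";
        }
        return result;
    }

    // Breaks forgotten: control falls through every later case.
    static String classifyNoBreak(int key) {
        String result = "";
        switch (key) {
            case 1: case 3: case 5: case 7: case 9:
                result += "odd ";
            case 2: case 4: case 6: case 8:
                result += "even ";
            default:
                result += "too-big";
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify(3));
        System.out.println(classifyNoBreak(3));
    }
}
```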

Iterative Loops
Logically Controlled Loops
Pretest Loops
The pretest logically controlled loop, referred to as the while loop, is the loop considered in the proof of the Böhm/Jacopini theorem. The version used in the C-based languages is a typical example:
while (test) statement
The semantics of this (in HOSL) is:
loop: if not test goto out
      statement
      goto loop
out:  ...
Notice that test is evaluated each time around the loop, and that statement might never be executed.

Posttest Controlled Loops
The C-based languages also have a posttest logically controlled loop, called the do-while:
do statement while (test);
Its semantics is:
loop: statement
      if test goto loop
Notice that statement is always executed at least once, and, again test is evaluated each time around the loop.

A variant that several languages have is called the repeat-until loop and looks like:

repeat statement until test;
Its semantics is:
loop: statement
      if not test goto loop
The repeat-until is very similar to the do-while, but often easier to think about because of the opposite sense of its test.
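The difference between the pretest and posttest loops shows up when the test is false on entry. Here is a Java sketch that counts how often the body runs; the class LoopTestDemo and its methods are hypothetical. Java has no repeat-until, so it is simulated by negating the test of a do-while.

```java
final class LoopTestDemo {
    // Pretest: the body may run zero times.
    static int whileBodyRuns(int n) {
        int runs = 0;
        int i = n;
        while (i > 0) {
            runs++;
            i--;
        }
        return runs;
    }

    // Posttest: the body always runs at least once.
    static int doWhileBodyRuns(int n) {
        int runs = 0;
        int i = n;
        do {
            runs++;
            i--;
        } while (i > 0);
        return runs;
    }

    // repeat statement until (i <= 0), simulated with a negated do-while test.
    static int repeatUntilBodyRuns(int n) {
        int runs = 0;
        int i = n;
        do {
            runs++;
            i--;
        } while (!(i <= 0));
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(whileBodyRuns(0));
        System.out.println(doWhileBodyRuns(0));
        System.out.println(repeatUntilBodyRuns(0));
    }
}
```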

Loop Forever
The most flexible iterative loop is the one that loops forever, until an exit statement is executed within its body. The Common Lisp version is
(loop {expression})

In the C-based languages, it may be simulated by

while (true) statement
or
for (;;) statement
Recall the example of the exit statement above:
while (true) {
    input := read(file);
    if (input == eof) exit;
    process(input);
}

Counter-Controlled Loops
The oldest counter-controlled loop, Fortran's DO loop, illustrates all the issues:
       DO label variable = initial-expression, terminal-expression [, stepsize-expression]
label  last-statement
The semantics of this are [text, p. 331-332]
       init-value := initial-expression
       terminal-value := terminal-expression
       step-value := stepsize-expression
       variable := init-value
       iteration-count := max(int((terminal-value - init-value + step-value)
                                  / step-value),
                              0)
loop:  if iteration-count <= 0 goto out
       [loop body]
label: last-statement
       variable := variable + step-value
       iteration-count := iteration-count - 1
       goto loop
out:   next-statement
  • The loop parameter expressions are evaluated only once, so changing variables that are part of them doesn't affect the number of times the loop is executed.
  • The number of times the loop is executed is controlled by the iteration-count, not the value of the variable, so assigning to the loop variable inside the loop doesn't affect the number of times the loop is executed.
  • The loop label is available to be the target of gotos inside the loop body. The effect would be to skip the rest of the loop body (except the last-statement, which is why a do-nothing CONTINUE statement is commonly used as the last-statement) and continue with the next iteration.
  • The scope of the loop variable is not limited to the loop body, and it retains its last value when the loop is terminated.
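The DO loop semantics above can be simulated in Java to check the points in the list. This is a sketch only: the class FortranDoDemo and its methods are made up, integer loop parameters and a non-zero step are assumed, and the tampering assignment is there just to show that the iteration count is what controls the loop.

```java
final class FortranDoDemo {
    // iteration-count := max(int((terminal - init + step) / step), 0)
    // Java's integer division truncates toward zero, like int() here.
    static int iterationCount(int init, int terminal, int step) {
        return Math.max((terminal - init + step) / step, 0);
    }

    // Simulates the loop. If tamper is true, the body assigns to the loop
    // variable. Returns {number of body executions, final loop variable}.
    static int[] run(int init, int terminal, int step, boolean tamper) {
        int variable = init;
        int count = iterationCount(init, terminal, step);
        int bodyRuns = 0;
        while (count > 0) {
            bodyRuns++;                      // the loop body
            if (tamper) variable = 1000;     // assignment inside the body
            variable += step;
            count--;
        }
        return new int[] {bodyRuns, variable};
    }

    public static void main(String[] args) {
        System.out.println(iterationCount(1, 10, 3));
        System.out.println(run(1, 10, 3, true)[0]);
    }
}
```

For DO with parameters 1, 10, 3 the variable takes the values 1, 4, 7, 10, and it is left at 13 after the loop, which matches the last point in the list.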

The counter-controlled loop statement of the C-based languages is different in several respects from Fortran's DO loop. Their format is:

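for (init-expr1, ..., init-exprk;
      terminal-expr1, ..., terminal-exprn;
       step-expr1, ..., step-exprm)
    statement
The semantics is
{    init-expr1
     ...
     init-exprk
loop:
     terminal-expr1
     ...
     if not terminal-exprn goto out
     statement
bottom:
     step-expr1
     ...
     step-exprm
     goto loop
}
out: ...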
  • The loop parameter expressions are arbitrary expressions, except that, in Java, the terminal expressions must be Boolean expressions.
  • The terminal and step expressions are evaluated each time through the loop, so changing variables that are part of them does affect the number of times the loop is executed.
  • The number of times the loop is executed is controlled by the last terminal expression, which is evaluated each time through the loop, so assigning to any variables it uses inside the loop does affect the number of times the loop is executed.
  • If a continue statement is executed inside the loop, control transfers to the bottom label. The effect would be to skip the rest of the loop body and continue with the next iteration. (continue may also be used inside while and do-while loops.)
  • The scope of any variable declared inside the init expressions is limited to the loop parameter expressions and the loop body, so its last value is not available after the loop terminates.
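Two of the points above are easy to check in Java. In this sketch (the class ForLoopDemo and its methods are made up for the example), the first method changes a variable used in the terminal expression, and the second uses continue, which transfers to the step expression.

```java
final class ForLoopDemo {
    // The terminal expression i < limit is re-evaluated every iteration,
    // so shrinking limit inside the body cuts the loop short.
    static int shrinkingLimit() {
        int limit = 10;
        int runs = 0;
        for (int i = 0; i < limit; i++) {
            limit--;
            runs++;
        }
        return runs;
    }

    // continue skips the rest of the body but still executes i++.
    static int sumOddsBelow(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            if (i % 2 == 0) continue;
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(shrinkingLimit());
        System.out.println(sumOddsBelow(10));
    }
}
```

Contrast shrinkingLimit with the Fortran DO loop, where the iteration count is fixed before the first iteration.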

Just about every current language has a counter-controlled loop.

Loops Based on Data Types or Data Structures
Ada's for statement has the format [text p. 332]
for variable in [reverse] discrete_range loop
    ...
end loop;
where discrete_range is a subrange of an integer or enumeration type. For example,
for d in Mon .. Fri loop
    ...
end loop;
Thus, the loop variable ranges over all the values of the given subrange type.

Recall the discussion in my notes on Operational Semantics that such a loop over a subrange cannot be directly simulated by a while loop.

Several languages also have loops that range over all the elements of some data structure. We saw an example from Perl in my notes on Data Types.

Here's a simple example from Common Lisp:

cl-user(4): (dolist (d '(Sunday Monday Tuesday
                         Wednesday Thursday Friday Saturday))
	      (print d))


A generator is a function that, each time it is called, returns another member of some data structure (collection). There are generally three parts to a generator function:
  1. A function that takes the collection as argument, and sets up the generator. It may also return the generator function as its value.
  2. The generator function itself, that returns another element of the collection each time it is called.
  3. A way to tell that the collection has been exhausted. Either the generator returns some special value (such as nil), or there is a special function for the purpose.
Generators in Java are classes that implement the Iterator interface. Their three parts are the three methods
  1. iterator()
  2. next()
  3. hasNext()
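As a sketch of how the three parts fit together, here is a small Java generator. The class Countdown and its drain helper are hypothetical; only Iterable, Iterator, and NoSuchElementException come from the standard library.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// A "collection" of the integers start, start-1, ..., 1.
final class Countdown implements Iterable<Integer> {
    private final int start;

    Countdown(int start) {
        this.start = start;
    }

    // Part 1: sets up the generator and returns it.
    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int current = start;

            // Part 3: tells whether the collection has been exhausted.
            @Override
            public boolean hasNext() {
                return current > 0;
            }

            // Part 2: returns another element each time it is called.
            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current--;
            }
        };
    }

    // Uses the generator explicitly, collecting the elements in a string.
    static String drain(Countdown c) {
        StringBuilder sb = new StringBuilder();
        Iterator<Integer> it = c.iterator();
        while (it.hasNext()) {
            sb.append(it.next()).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(drain(new Countdown(3)));
    }
}
```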

Recursive Loops
Loops are one of the three classes of control structures of the Böhm/Jacopini theorem. The loops we have been considering have been iterative loops. Recursive loops are an alternative. For example, an iterative Lisp function to call the function visit on every member of a list is
(defun visitAll (list)
  (dolist (x list)
    (visit x)))
whereas a recursive version to do the same thing is
(defun visitAll (list)
  (unless (endp list)
    (visit (first list))
    (visitAll (rest list))))
Every iterative loop may be rewritten as a recursive loop, but some recursive loops may be rewritten as iterative loops only with the aid of an explicit stack.
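A binary tree traversal is a typical case where the iterative version needs an explicit stack. The following Java sketch sums the values in a tree both ways; the classes TreeSumDemo and Node are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TreeSumDemo {
    static final class Node {
        final int value;
        final Node left;
        final Node right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Recursive loop: the run-time stack remembers the pending subtrees.
    static int sumRecursive(Node n) {
        if (n == null) return 0;
        return n.value + sumRecursive(n.left) + sumRecursive(n.right);
    }

    // Iterative loop: an explicit stack holds the subtrees still to visit.
    static int sumIterative(Node root) {
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) stack.push(root);
        int sum = 0;
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            sum += n.value;
            if (n.left != null) stack.push(n.left);
            if (n.right != null) stack.push(n.right);
        }
        return sum;
    }

    public static void main(String[] args) {
        Node tree = new Node(1, new Node(2, null, null),
                             new Node(3, new Node(4, null, null), null));
        System.out.println(sumRecursive(tree));
        System.out.println(sumIterative(tree));
    }
}
```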

Guarded Commands
Dijkstra's guarded if:
if <test1> -> <statement1>
[] <test2> -> <statement2>
[] ...
[] <testn> -> <statementn>
fi
Evaluate the tests, and, nondeterministically, execute the statement of one of the tests that evaluates to true. If none of the tests evaluates to true, it is a run-time error.

Dijkstra's guarded loop:

do <test1> -> <statement1>
[] <test2> -> <statement2>
[] ...
[] <testn> -> <statementn>
od
Evaluate the tests, and, if any evaluate to true, nondeterministically execute the statement of one of the tests that evaluates to true, and then, do it all again. When none of the tests evaluates to true, the loop terminates.
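Java has no guarded commands, but their semantics can be simulated by collecting the guards that are true and picking one at random. The sketch below assumes that approach; the class GuardedCommandsDemo, its methods, and the use of java.util.Random to model nondeterminism are choices made for the example. The guarded if computes a maximum (both guards are true when x equals y), and the guarded loop computes a greatest common divisor of two positive integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class GuardedCommandsDemo {
    // if x >= y -> m := x
    // [] y >= x -> m := y
    // fi
    static int max(int x, int y, Random rng) {
        List<Integer> enabled = new ArrayList<>();
        if (x >= y) enabled.add(0);
        if (y >= x) enabled.add(1);
        if (enabled.isEmpty()) {
            // No guard true: a run-time error in a guarded if.
            throw new IllegalStateException("no guard is true");
        }
        int choice = enabled.get(rng.nextInt(enabled.size()));
        return (choice == 0) ? x : y;
    }

    // do x > y -> x := x - y
    // [] y > x -> y := y - x
    // od
    static int gcd(int x, int y, Random rng) {
        if (x <= 0 || y <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        while (true) {
            List<Integer> enabled = new ArrayList<>();
            if (x > y) enabled.add(0);
            if (y > x) enabled.add(1);
            if (enabled.isEmpty()) break;    // no guard true: loop terminates
            int choice = enabled.get(rng.nextInt(enabled.size()));
            if (choice == 0) {
                x = x - y;
            } else {
                y = y - x;
            }
        }
        return x;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        System.out.println(max(3, 7, rng));
        System.out.println(gcd(12, 18, rng));
    }
}
```

Whichever true guard is chosen, the result is the same, which is the property a correct guarded-command program must have.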


Copyright © 2003 by Stuart C. Shapiro. All rights reserved.
