UNIVERSITY AT BUFFALO, THE STATE UNIVERSITY OF NEW YORK
The Department of Computer Science & Engineering

STUART C. SHAPIRO: CSE 305

CSE 305
Programming Languages
Lecture Notes
Stuart C. Shapiro
Fall, 2003

Variables and Binding

A variable is a bundle of six attributes: name, address, value, type, scope, and lifetime.

An attribute may be bound to a variable (or other program entity) at various times. The text mentions: language design time; language implementation time; compile time; link time; load time; and run time.

We will just be concerned with a two-way distinction:

Static Binding: Static binding happens at compile time (including link and load time), and remains the same throughout the run of the program.
Dynamic Binding: Dynamic binding happens during run time, and may change during run time.

Name

Most variables have a name. Names were discussed in the previous web page. As an example of a variable without a name, consider this interaction with the Java BeanShell:

bsh % string = "This is a string.";
bsh % set = new HashSet();
bsh % set.add(string);
bsh % print(string);
This is a string.
bsh % print(set);
[This is a string.]

There are two named variables, string and set. The variable string is bound to a word of memory that contains a reference to a string object. This reference is string's value. The variable set is bound, as its value, to a reference to an instance of the HashSet class, and that instance includes a word of memory which contains a copy of the string reference bound, as a value, to string. That element of the HashSet is as much a variable as string is, but it doesn't have a name.

On the other hand, more than one variable might have the same name. Consider the Java code,

for (int i = 0; i < 10; i++) {
     a[i] = i;
}

for (int i = 0; i < 100; i++) {
     squares[i] = i*i;
}

The two for loops each have a variable named i, yet they are different variables.

Also, when a subroutine that has a local variable x calls itself recursively, each activation of the subroutine will have a separate variable named x. This issue of subroutine management will be discussed in more detail later.

The two variables named i in the for loops are statically bound to their names at compile time. The variables named x, in all recursive activations except the top one, are bound to their names dynamically, during run time.
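As a rough illustration of the recursive case, here is a minimal Java sketch. The class RecursionNames and the method countdown are made up for this example; they are not from the text. Each activation of countdown has its own variable named x, and the value each activation sees is unaffected by the deeper activations:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names here are not from the notes.
class RecursionNames {

    // Each activation of countdown gets its own local variable x.
    // The log records the value of x seen by each activation after
    // the recursive call below it has returned.
    static void countdown(int n, List<Integer> log) {
        int x = n * 10;            // this activation's own x
        if (n > 0) {
            countdown(n - 1, log); // deeper activations assign their own x
        }
        log.add(x);                // still this activation's value
    }

    static List<Integer> run() {
        List<Integer> log = new ArrayList<>();
        countdown(3, log);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [0, 10, 20, 30]
    }
}
```

If there were only one variable x shared by all the activations, every entry in the log would be the value assigned by the deepest call.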

When using an interpreted language, such as Lisp or BeanShell Java, variables such as string and set, above, may be created and bound to their names dynamically, at run time.

Scope

The scope of a variable is the amount of program within which the variable's name refers to it (as opposed to another variable of the same name). If the "amount" of program is determined spatially, that is static (sometimes called "lexical") scope. If it is determined temporally, that is dynamic scope.

To discuss scope, we need another three-way distinction:

Local variable: A local variable is a variable declared in the same program subunit (program, subprogram, function, etc.) in which it is used.
Nonlocal variable: A nonlocal variable is a variable not declared in the same program subunit in which it is used, but not available to every subunit in the program.
Global variable: A global variable is a variable not declared in the same program subunit in which it is used, and available to every subunit in the program.

Static and dynamic scope may be clearly compared in Common Lisp and Emacs-Lisp.

Here's an interaction with Common Lisp:

cl-user(1): (setf x 1)
1

cl-user(2): (defun outer (x)
	      (inner))
outer

cl-user(3): (defun inner ()
	      x)
inner

cl-user(4): (outer 2)
1

Common Lisp's variables are statically scoped (although variables may be declared to be dynamically scoped). Since x is not local to the function inner, it refers to the variable with the same name that has most recently been declared, looking upward through the spatial (textual) structure of the program. There, the most recent declaration of x is the global one implicitly declared in the setf expression. So the x of inner is in the scope of the global x, and they refer to the same variable. However, the x of the function outer is a formal parameter, and so is in a different scope, and so refers to a different variable.

Now here's what appears to be the same interaction with Emacs-Lisp:

(setf x 1)
1

(defun outer (x)
  (inner))
outer

(defun inner ()
  x)
inner

(outer 2)
2

(inner)
1

Emacs-Lisp is dynamically scoped (like pre-Common Lisp Lisps). Since x is not local to the function inner, it refers to the variable with the same name that has most recently been declared looking up the dynamic chain of function calls. When inner was called from outer, that would be outer's x, but when inner was called from the top-level, that would be the top-level, global x, the one assigned by the setf.
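Java is statically scoped, so it cannot show dynamic scope directly. The following is only a simulation, assuming that we keep the bindings of the name x on an explicit stack. The class DynamicScopeSketch and the stack xBindings are hypothetical names introduced for the sketch. It mimics the Emacs-Lisp interaction above:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical simulation: Java itself is statically scoped.
class DynamicScopeSketch {

    // Stack of bindings for the single name x; the top is the most recent.
    static final Deque<Integer> xBindings = new ArrayDeque<>();

    // inner has no x of its own, so it looks up the most recent binding
    // made by a call that is still active.
    static int inner() {
        return xBindings.peek();
    }

    // outer binds x for the duration of its own execution.
    static int outer(int x) {
        xBindings.push(x);
        try {
            return inner();
        } finally {
            xBindings.pop(); // the binding ends when outer returns
        }
    }

    public static void main(String[] args) {
        xBindings.push(1);            // plays the role of (setf x 1)
        System.out.println(outer(2)); // prints 2
        System.out.println(inner());  // prints 1
    }
}
```

The value that inner returns depends on who called it, which is exactly what makes dynamically scoped programs hard to read.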

The programming languages that descend from Algol 60 allow blocks where variables may be declared, giving them smaller scopes than subprograms (methods). The Java for loops shown above are excellent examples of this.

This discussion of scope applies to names other than variable names, for example, names of subprograms.

Summary
The static scope of a variable is the block in which it is declared, plus all spatially (lexically) enclosed blocks, except those where it is shadowed by a declaration of another variable with the same name. Some languages, in some circumstances, include the area of the block in which it is declared that occurs before the declaration; others don't. Java doesn't allow a nested block to declare a new local variable with the same name as a local variable (or parameter) that is still in scope, although a local variable may shadow a field of the enclosing class.
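A small Java sketch of these rules follows. The class ShadowDemo and its methods are invented for the illustration. It assumes only standard Java behavior: a local variable may shadow a field, a local variable may not be redeclared while another local of the same name is in scope, and a for loop index declared in the loop header goes out of scope when the loop ends.

```java
// Hypothetical demo class illustrating static (lexical) scope in Java.
class ShadowDemo {

    static int x = 1;       // class-level x

    static int readField() {
        return x;           // no local x here: refers to the field
    }

    static int readLocal() {
        int x = 2;          // a local x shadows the field in this block
        // int x = 3;       // would not compile: x is already a local here
        return x;
    }

    static int sumTwoLoops() {
        int total = 0;
        for (int i = 0; i < 3; i++) { total += i; } // this i ends with the loop
        for (int i = 0; i < 4; i++) { total += i; } // a different variable i
        return total;
    }

    public static void main(String[] args) {
        System.out.println(readField());   // prints 1
        System.out.println(readLocal());   // prints 2
        System.out.println(sumTwoLoops()); // prints 9
    }
}
```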

The dynamic scope of a variable is the block in which it is declared, plus all dynamically enclosed blocks, i.e., blocks that are executed while the block in which the variable is declared is still executing.

Dynamic scope is very difficult to understand and to check for program correctness, since it is extremely hard to tell, by looking at a program, where any given variable has gotten its value. Most programming languages use static scoping, although Perl, as well as Common Lisp, allows variables to be declared to use dynamic scoping.

In general, you should make the scope of any variable be the smallest that is needed. In particular: declare the for loop index in the for loop itself; and avoid global variables unless absolutely necessary.

See the text for more examples of static and dynamic scoping in block-structured languages.

Address

The address of a variable is also referred to as its l-value, as opposed to the value of the variable, which is referred to as its r-value. This is from considerations of an assignment statement: x = y, where the l-value is the address of x, the r-value is the value of y and the r-value is to be stored into the address at the l-value. Note that the computation of the l-value might be as complicated as the computation of the r-value, as in

a[<expression>] = <expression>;
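In Java, for instance, the l-value of an array element assignment is computed before the r-value. The sketch below, with made-up helper methods index and value, records the order in which the two expressions are evaluated:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; index() and value() exist only to leave a trace.
class LValueOrder {

    static final List<String> trace = new ArrayList<>();

    static int index() { trace.add("index"); return 1; }
    static int value() { trace.add("value"); return 42; }

    static int[] assign() {
        trace.clear();
        int[] a = new int[3];
        a[index()] = value();   // l-value a[1] is determined first, then the r-value 42
        return a;
    }

    public static void main(String[] args) {
        System.out.println(assign()[1]); // prints 42
        System.out.println(trace);       // prints [index, value]
    }
}
```

Other languages may leave this order unspecified, so code that depends on it is not portable between languages.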

Aliases:

Aliases are two variables that share the same address.

Fortran77 has several ways to create aliases. One is by the Equivalence statement:

      Program Alias
C     Test program for aliases

      Integer i,j
      Equivalence (i, j)

      i = 1
 10   Print *, '10: i = ', i, ', j = ', j

      j = 2
 20   Print *, '20: i = ', i, ', j = ', j

      End

-------------------------------------------------------
<cirrus:Programs:1:124> f77 -o alias.out alias.f
NOTICE: Invoking /opt/SUNWspro/bin/f90 -f77 -ftrap=%none -o alias.out alias.f
alias.f:
 MAIN alias:

<cirrus:Programs:1:125> alias.out
 10: i =   1, j =   1
 20: i =   2, j =   2

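The Equivalence statement is deprecated in modern Fortran: its use has been discouraged since Fortran90, and the Fortran 2018 standard classifies it as obsolescent.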

Deprecated
"A deprecated element or attribute is one that has been outdated by newer constructs... Deprecated elements may become obsolete in future versions" [http://www.w3.org/TR/REC-html40/conform.html]

C also lets you do this, if you know where to look:

/*
 * C Alias Program
 *
 */

#include <stdio.h>

int main() {
  int i, a[3] = {1,2,3}, j;

  i = 0;
  j = 6;

  printf("a = %d, %d, %d, %d, %d, \n", a[-2], a[-1], a[0], a[1], a[2]);

  return 0;
}

-------------------------------------------------------
 gcc -Wall alias.c -o alias.out

 alias.out
a = 0, 6, 1, 2, 3,

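This is not a "feature" of C, but results from it not doing range checking on arrays. Indexing outside an array's bounds is undefined behavior in C, so the output shown depends on how this particular compiler happened to lay out i, a, and j in memory; another compiler, or other optimization settings, may print something else.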

There are other ways to create aliases. We will discuss them in later sections of the course.

Clearly, aliasing can lead to programs that are hard to understand and to debug.
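One of those other ways, reference variables, can be sketched in Java. The class AliasDemo is a made-up name. Strictly speaking, a and b below are two distinct variables that hold the same reference; it is the element expressions a[0] and b[0] that name the same unnamed variable on the heap.

```java
// Hypothetical demo class: aliasing through references.
class AliasDemo {

    static int[] run() {
        int[] a = {1, 2, 3};
        int[] b = a;   // b holds the same reference as a: no copy is made
        b[0] = 99;     // a change made through b ...
        return a;      // ... is visible through a
    }

    public static void main(String[] args) {
        System.out.println(run()[0]); // prints 99
    }
}
```

A reader who sees only the uses of a has no way to tell, locally, that the assignment through b changed it.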

Storage Bindings and Lifetime

A variable may be bound to a fixed (static) address in RAM, to an address on the stack, or to an address on the heap. For example, Fortran77 and earlier Fortrans used neither a stack nor a heap, and so bound all variables to fixed addresses in RAM; the versions of the variable x of the recursive subroutine mentioned above are stored on the stack; the HashSet and its unnamed variables discussed above are stored on the heap.

Relevant terms:

Allocation: The process of taking a memory cell from the pool of available memory, in order to bind it to a variable. [Text, p. 202]
Deallocation: The process of returning a memory cell to the pool of available memory. [Text, p. 202]
Lifetime: The time during which a variable is bound to a memory cell. [Text, p. 202]

Variable categories by lifetime and memory location:

Static Variables

Static variables are bound to memory cells before execution begins, and remain so until program termination. That is, the lifetime of a static variable is the entire running time of the program.

Fortran77 and earlier versions of Fortran use only static variables. One implication is that recursion is not possible. The static lifetime of local variables can be demonstrated by a subroutine that keeps count of the number of times it has been called:

      Program Count

C     Demonstration of static variables in Fortran.

      print *, 'Starting Test Program'
      Call CountingRoutine()
      Call CountingRoutine()
      Call CountingRoutine()
      Call CountingRoutine()
      Call CountingRoutine()
      End

      Subroutine CountingRoutine ()
C     Keeps track of the number of times it has been called
C        and prints that count each time.
      Integer count
      Data count/0/
      count = count + 1
      Print *, 'count = ', count
      Return
      End

-------------------------------------------------------
<cirrus:Programs:1:152> f77 -o count.fout count.f
NOTICE: Invoking /opt/SUNWspro/bin/f90 -f77 -ftrap=%none -o count.fout count.f
count.f:
 MAIN count:
	countingroutine:

<cirrus:Programs:1:153> count.fout
 Starting Test Program
 count =   1
 count =   2
 count =   3
 count =   4
 count =   5

Notice that, even though count is a local variable of CountingRoutine, its lifetime exceeds the running time of CountingRoutine.

C can achieve this effect by declaring a variable to be static:

/*
 *  Count
 *  Stuart C. Shapiro
 *  
 *  This program demonstrates static variables
 *  with a function that counts the number of times that it is called.
 *  
 */

#include <stdio.h>

void counting_function() {
  /* Prints the number of times that it has been called. */

  static int count = 0;
  printf("count = %d\n", ++count);
}

int main() {
  /* Demonstrates counting_function by calling it 5 times. */
  counting_function(); counting_function(); counting_function();
  counting_function(); counting_function();
  return 0;
}

-------------------------------------------------------
<cirrus:Programs:1:140> gcc -Wall count.c

<cirrus:Programs:1:141> a.out
count = 1
count = 2
count = 3
count = 4
count = 5

This can also be done in Java:

/**
 * Counter.java
 *
 *
 * Created: Mon Sep 15 16:47:41 2003
 *
 * @author Stuart C. Shapiro
 */

public class Counter {
    public static int count;

    public Counter (){
    }

    /* Prints the number of times that it has been called.
     */
    public static void counting_function() {
	System.out.println("count = " + ++count);
    }

    /* Demonstrates counting_function by calling it 5 times. */
    public static void main (String[] args) {
	counting_function(); counting_function(); counting_function();
	counting_function(); counting_function();
    } // end of main ()

}// Counter

-------------------------------------------------------
<cirrus:Programs:1:142> javac Counter.java

<cirrus:Programs:1:143> java Counter
count = 1
count = 2
count = 3
count = 4
count = 5

Is the following statement true?

"when the static modifier appears in the declaration of a variable in a class definition in C++, Java, and C#, it has nothing to do with the lifetime of the variable. In this context, it means the variable is a class variable, rather than an instance variable." [Text, p. 203]

The Java Standard says,

"Preparation involves creating the static fields (class variables and constants) for a class or interface and initializing such fields to the default values (¤4.5.5). This does not require the execution of any source code" [Java Language Specification, Chapter 12.3.2]

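Notice that the class variable count was available for use without constructing an instance of the class Counter. This suggests that, in Java at least, a class variable is also static in the lifetime sense: it is created when its class is prepared, not when an instance is constructed. CLOS also has class variables, called "shared slots", but they cannot be accessed except via an instance of the class.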

Stack-Dynamic Variables

A stack-dynamic variable is one that is bound to an address on the stack, which is dynamically (during run-time) allocated for that purpose. It may also be unbound during run-time, and its memory cell deallocated by being popped off the stack.

In most current programming languages, the formal parameters and local variables of subroutines (functions, methods) are stack-dynamic variables. Memory cells are allocated for them when the subroutine begins execution, and are deallocated when the subroutine ends execution.

At any time during the run of the program, the stack contains the memory cells for all the subroutines currently executing, including all the invocations of recursive subroutines currently executing.

If subroutine A calls subroutine B, then B terminates, and A then calls C, the stack memory used by the formal parameters and local variables of C will be some or all of the memory cells just used by B, and may be more.

The address of a local variable of a subroutine will always be the same address relative to the beginning of the area of the stack that subroutine uses. That is how the compiler can compile code for the subroutine, even though it is not known until run-time what the actual addresses of the local variables will be.

Explicit Heap-Dynamic Variables

Explicit heap-dynamic variables are those nameless variables allocated on the heap for dynamic data structures or for instances of objects in OO programming languages. In Java or C++, they are allocated by the new operator. For example, in the Java BeanShell interaction shown above, the statement

set = new HashSet();

allocated memory cells on the heap to hold the instance variables of a HashSet, bound the nameless variables to those cells, and returned a reference (pointer) to them to be stored in the stack-dynamic variable set.

In C, allocation is done by the library function malloc(size), which takes an argument specifying the amount of memory required (in bytes), and returns a pointer to that area of heap memory.

Heap memory must be used for any dynamically allocated object or data structure that can be allocated in a subroutine and then have a pointer (reference) to it assigned to a variable which is outside the dynamic scope of the subroutine, so that its lifetime must extend to the time after the subroutine terminates and deallocates its stack memory. Consider the Java program,

import java.util.*;

public class HeapDemo {

    public static HashSet singleton(Object obj) {
	HashSet set;
	set = new HashSet();
	set.add(obj);
	return set;
    }

    public static void main (String[] args) {
	HashSet myset;
	myset = singleton("element");
	System.out.println(myset);
	myset = singleton("another");
	System.out.println(myset);
    }
}// HeapDemo

-------------------------------------------------------
<cirrus:Programs:2:101> javac HeapDemo.java

<cirrus:Programs:2:102> java HeapDemo
[element]
[another]

Although the memory for the HashSet is allocated in the singleton method, it cannot be allocated on the stack, because it must survive the termination of singleton.

Heap memory should be returned to the heap when it is no longer needed, as the HashSet [element] was no longer needed after myset was reassigned above. Otherwise, a program that runs long enough might use up the heap and terminate abnormally. In C and C++, heap memory must be explicitly deallocated with the function call free(p) or the operator delete p, respectively, where p is a pointer to the object or data structure whose memory is no longer needed.

Requiring the programmer to explicitly deallocate heap memory allows for mistakes of failing to deallocate storage that is no longer needed, resulting in programs that eventually use up their heaps, or of attempting to use pointers to memory that has already been deallocated (the "dangling pointer" problem, which is discussed again in Chapter 6).

A more reliable idea is for the programming system itself to deallocate unusable heap memory, a process called "automatic garbage collection", which will be discussed again in Chapter 6. Lisp was the first programming language to perform automatic garbage collection. Java also does it.

Implicit Heap-Dynamic Variables

An implicit heap-dynamic variable is like an explicit heap-dynamic variable, but is created without an explicit allocation operator. The text gives as an example the JavaScript statement, list = [10.2, 3.5], where the variable storing the two-element array (not the variable list) is an implicit heap-dynamic variable. I am a little skeptical about this category.

Type

The type of a variable specifies:

how the bits in its memory cell are to be interpreted;
the range of values the variable can have;
the operations that are defined for those values;
which operations are to be used for overloaded operators, such as +, with the variable as operand.

Static Typing

In static typing, the compiler determines the type of every variable and expression, and therefore the operation to use for overloaded operators, for example the operation denoted by + in the statement x = y + z. Once compiled, the information used by the compiler for typing needn't be retained. Often, the names of variables are not retained after compilation, so that symbolic debugging of a running program cannot be done.
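To make the point about + concrete, here is a minimal Java sketch with invented method names. In each method the Java compiler selects the operation denoted by + from the declared types of y and z alone, before the program runs:

```java
// Hypothetical demo class: one operator symbol, three compiled operations.
class PlusDemo {

    static int intSum(int y, int z) {
        return y + z;   // integer addition
    }

    static double realSum(double y, double z) {
        return y + z;   // floating-point addition
    }

    static String concat(String y, int z) {
        return y + z;   // string concatenation; z is converted to a string
    }

    public static void main(String[] args) {
        System.out.println(intSum(1, 2));    // prints 3
        System.out.println(realSum(1.5, 2)); // prints 3.5
        System.out.println(concat("1", 2));  // prints 12
    }
}
```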

Implicit Static Typing

In implicit static typing, the compiler determines the type of a variable by the variable's name. For example, in Perl,

a name starting with	@	is an array
a name starting with	$	is a scalar (a number or string)
a name starting with	%	is a hash

and in Fortran,

a name starting with	I, i, J, j, K, k, L, l, M, m, N, n	is an integer
a name starting with	anything else	is a real

Explicit Static Typing

In explicit static typing, the most common in modern languages, the type of each variable is given to the compiler by a declaration statement, which is not an executable statement---it is used by the compiler, rather than being converted into executable code.

In Fortran, an explicit type declaration can override the implicit naming convention, which could lead to confusion.

Inferred Static Typing

The ML compiler determines the type of each variable by inferring it from various evidence, though it can be declared also.

<cirrus:Programs:2:109> sml97
Standard ML of New Jersey, Version 110.0.3, January 30, 1998
val use = fn : string -> unit

- fun reciprocal(x) = 1.0 / x;
val reciprocal = fn : real -> real

- fun bad(y) = reciprocal(y) + 2 * y;
stdIn:7.14-7.28 Error: operator and operand don't agree [literal]
  operator domain: int * int
  operand:         int * real
  in expression:
    2 * y

The explanation of this error message is

2 is of type int.
Since * requires both its operands to be of the same type, y should be an int.
But y is the argument of the function reciprocal.
When reciprocal was defined, its formal parameter was used in an expression that requires it to be a real.
Therefore the actual argument of reciprocal, namely y, must be a real.
That is a type conflict, and a compiler error.

Dynamic Typing

One would expect that, in dynamic typing, a variable is bound to a type during run time. The text says, "the variable is bound to a type when it is assigned a value in an assignment statement" [p. 198].

However, it seems better to think of such languages as having typed values, rather than typed variables. For example, the ANSI Common Lisp standard says that "Objects, not variables, have types" [ANSI Common Lisp, Section 4.1]. Consider this example:

cl-user(1): (setf x 33.72
		  y 7.9)
7.9

cl-user(2): (print (gcd x y))
Error: `33.72' is not of the expected type `integer'
  [condition type: type-error]

In this example, gcd requires its arguments to be integers, and a type error results. Yet the type error is about 33.72, not about x.

Although Java uses static typing, it also has types associated with values:

bsh % list = new LinkedList();

bsh % list.add("A string");

bsh % list.add(new HashSet());

bsh % print(list.getFirst().getClass());
class java.lang.String

bsh % print(list.get(1).getClass());
class java.util.HashSet

Notice that in the Java expression referenceVariable.method(), the static type of referenceVariable must have the method() defined for it, or a compile-time error will be issued. However, the actual class of the dynamic value of referenceVariable will be used to choose the particular details of the method(), if that class is a subclass of the static class of referenceVariable.

For example, let ClassA be a class in which methodA() is defined, let ClassA1, ClassA2, and ClassA3 extend ClassA and specialize methodA() and let varA be a variable declared to be of type ClassA. The expression varA.methodA() is syntactically legal regardless of whether the current value of varA is an object of type ClassA, ClassA1, ClassA2, or ClassA3, but the value of varA.getClass() will determine the specific version of methodA() used. On the other hand, if varB is declared to be of some superclass of ClassA for which methodA() is not defined, varB.methodA() will produce a compiler error, unless a cast is used, such as ((ClassA2)varB).methodA().
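The example can be written out as a short Java sketch. The class and method names follow the paragraph above; the wrapper class DispatchDemo, the strings returned by methodA(), and the use of Object as the superclass that lacks methodA() are assumptions made for the illustration.

```java
// Hypothetical wrapper class for the demonstration.
class DispatchDemo {

    // varA has static type ClassA, so varA.methodA() always compiles;
    // the class of the value decides which version runs.
    static String call(ClassA varA) {
        return varA.methodA();
    }

    // varB has static type Object, which does not define methodA(),
    // so varB.methodA() would be a compile-time error without the cast.
    static String callViaCast(Object varB) {
        return ((ClassA2) varB).methodA();
    }

    public static void main(String[] args) {
        System.out.println(call(new ClassA()));         // prints ClassA
        System.out.println(call(new ClassA1()));        // prints ClassA1
        System.out.println(call(new ClassA2()));        // prints ClassA2
        System.out.println(callViaCast(new ClassA2())); // prints ClassA2
        // callViaCast(new ClassA1()) would compile, but would throw a
        // ClassCastException at run time.
    }
}

class ClassA {
    String methodA() { return "ClassA"; }
}

class ClassA1 extends ClassA {
    @Override
    String methodA() { return "ClassA1"; }
}

class ClassA2 extends ClassA {
    @Override
    String methodA() { return "ClassA2"; }
}
```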

In several languages with typed values, including Common Lisp, Java, and JavaScript, the programmer may write code to test the types of values and give reasonable error messages if they are not what was expected. Without such tests, a type error might only be caught many levels of function calls below where the error actually occurred. Static type-checking generally makes program debugging easier.
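A run-time type test of this kind might look as follows in Java. This is only a sketch: the class TypeCheckDemo, the methods lengthOf and describe, and the wording of the error message are all invented for the example.

```java
import java.util.HashSet;

// Hypothetical demo class: testing the type of a value at run time.
class TypeCheckDemo {

    // Tests the type of its argument's value before using it, and reports
    // a mismatch at this level rather than somewhere deeper in the calls.
    static int lengthOf(Object value) {
        if (!(value instanceof String)) {
            String actual = (value == null) ? "null" : value.getClass().getName();
            throw new IllegalArgumentException(
                "lengthOf expected a String but got " + actual);
        }
        return ((String) value).length();
    }

    // Turns either outcome into a printable string.
    static String describe(Object value) {
        try {
            return "length " + lengthOf(value);
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("abc"));
        // prints: length 3
        System.out.println(describe(new HashSet<String>()));
        // prints: lengthOf expected a String but got java.util.HashSet
    }
}
```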

Value

The value of a variable, also called its r-value, is the contents of the memory cell at the variable's address.

This might lead to confusion with pointer or reference variables.
For example, after the assignment set = new HashSet();, above, should we say that the value of set is a HashSet or a reference to a HashSet? The latter is the more careful way to speak; the former is more informal. We will discuss this more when we discuss pointer types in Chapter 6.

Now we will discuss when a variable is first bound to a value, and whether its value binding is allowed to change.

A formal parameter is initially bound to a value when its subprogram is called. We will discuss this in Chapter 9.

The remainder of this discussion will concern variables that are not parameters---global and local variables. When a variable is bound to an address (memory cell), its value might be whatever bit settings were left in that cell, interpreted according to the variable's type, for example

#include <stdio.h>

int main() {
  int x;
  double y;

  printf("x = %d y = %e\n", x, y);
  return 0;
}

-------------------------------------------------------
<cirrus:Programs:1:103> gcc -Wall leftover.c -o leftover.out

<cirrus:Programs:1:104> leftover.out
x = -4264396 y = 8.485876e-314

or it might be initialized, either to a default value, or to a value explicitly specified in a declaration.

Variable Initialization might be done at compile-time, or at run-time.

If variable initialization is done at compile-time, the initialization expression will usually be limited to what the compiler can evaluate, such as constants.

"FORTRAN also provides a nice feature ... of initially defining the values of a number of variables in a compact manner, using a DATA statement. The DATA statement is of the form: DATA listofvars₁/listofconsts₁/[[,]listv₂/listc₂/]... ... The DATA statement puts the constant values into the variables on the list at compile time" [S. L. Edgar, FORTRAN For The '90's (New York: Computer Science Press) 1992, p. 199-200. italics in the original]

If variable initialization is done at run-time, the initialization expression will usually be any expression that could be on the right-hand side of an assignment statement.
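We already saw that Java initializes class variables to their default values when the class is prepared, before any of its code runs; in our two-way distinction, that counts as compile time. It can also initialize variables during run time to the value of any expression. See the Java Language Specification, Section 14.4.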
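Both kinds of initialization can be seen in a small Java sketch. The class InitDemo and its field names are made up. The first three fields are never assigned and so keep their default values; the fourth has an initializer containing a method call, which can only be evaluated when the class is initialized at run time.

```java
// Hypothetical demo class: default values versus run-time initializers.
class InitDemo {

    static int count;       // default value 0
    static double ratio;    // default value 0.0
    static String label;    // default value null

    // Not a constant expression: Math.max is called at run time.
    static int computed = Math.max(3, 4) + 1;

    public static void main(String[] args) {
        System.out.println(count);    // prints 0
        System.out.println(ratio);    // prints 0.0
        System.out.println(label);    // prints null
        System.out.println(computed); // prints 5
    }
}
```

Note that Java gives default values only to fields and array elements; a local variable must be assigned before it is used, or the compiler rejects the program.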

Most variables are allowed to change their value binding. Named Constants are variables that are not allowed to do so.

Java:

bsh % final double pi = 3.14159;
bsh % pi = 3;
// Error: Typed variable: pi: Final variable, can't assign : at Line: 3 : in file: : pi = 3

Common Lisp:

cl-user(1): pi
3.141592653589793d0

cl-user(2): (defconstant mypi 3.14159)
mypi

cl-user(3): (setf mypi 3)
Error: Cannot change the value of mypi -- it is a constant.
  [condition type: program-error]

A named constant that gets its value binding at compile time is called a manifest constant.

A Literal Constant is sometimes treated just like a manifest constant whose name has a special syntax.

For each literal constant of a given value, Fortran allocates a memory cell with that value in it, and makes every occurrence of that literal a reference to that cell. For example, a program might have many occurrences of the literal constant 1. Fortran allocates a cell in RAM, stores the integer 1 in that cell, and makes every occurrence of 1 in the program a variable whose address is bound to that cell, thus saving memory. Old versions of Fortran could even change the value of such a variable dynamically. We will discuss how in Chapter 9.

A Macro is a source code expression that is textually replaced by (expanded into) other source code before execution, and often before compilation.

A macro may look like a named constant, but it is different.

For example, C's #define <identifier> <string> declares a macro expanded by a preprocessor that runs before the compiler.

Stuart C. Shapiro <shapiro@cse.buffalo.edu>
