The Department of Computer Science & Engineering
cse@buffalo

CSE 305
Programming Languages
Lecture Notes
Stuart C. Shapiro
Spring, 2005


Names

Names, or identifiers, are used for variables, subprograms (or methods), types, classes, etc.

Before discussing what a name looks like, we need to discuss what aren't names:

Comments
Fortran77, which is line oriented, uses a C or * in column 1 to indicate that the line is a comment.

Other languages typically have one comment symbol to indicate that the rest of the line is a comment, and a pair of brackets to indicate that the enclosing material is a comment. For example,
Language        Rest of Line Comment    Open Comment    Close Comment
bash            #                       none            none
C               //                      /*              */
C++             //                      /*              */
Fortran90,95    !                       none            none
Java            //                      /*              */
Common Lisp     ;                       #|              |#
Perl            #                       =               =cut
Prolog          %                       /*              */
Python          #                       none            none
(Perl's comment brackets must be at the beginning of a line where a statement would be legal.)
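
As a rough illustration of how an implementation tells a rest-of-line comment from ordinary text, the following sketch uses Python's standard tokenize module to pull the comments out of a small Python fragment. The helper name find_comments and the sample source are made up for this example.

```python
import io
import tokenize

def find_comments(source):
    """Return the text of every rest-of-line comment in Python source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.COMMENT]

# The second '#' sits inside a string literal, so it does not start a comment.
sample = 'x = 3  # set x\nlabel = "# not a comment"\n'
comments = find_comments(sample)
print(comments)
```

Only the first # is reported, which shows that the comment symbol is recognized by the tokenizer in context, not by a simple text search.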

There are also common conventions that IDEs and other tools are sometimes sensitive to. For example, in Java, the open comment bracket /** begins a Javadoc comment. And in Lisp

Line Boundaries
Many programming languages ignore line boundaries, except for the "rest of line" comment. They include C, C++, Java, Common Lisp, Perl, and Prolog.

Bash, Fortran, and Python will normally consider the end of a line to be a statement terminator. They each have a way to explicitly indicate continuation onto the next line.
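
A small Python sketch of line continuation follows. The backslash is Python's explicit continuation mark. Python also continues a statement implicitly while a bracket is still open. The variable names are arbitrary.

```python
# Explicit continuation: a backslash as the last character of the line.
total_a = 1 + 2 + \
          3 + 4

# Implicit continuation: the line end is ignored while a bracket is open.
total_b = (1 + 2 +
           3 + 4)

print(total_a, total_b)
```

Both statements span two physical lines but are each read as a single logical line.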

Whitespace
Whitespace includes spaces, and other characters that act as separators, such as newlines and tabs.

Fortran ignores spaces (blanks). For example, in the Do statement,

Do 50 n = 1, 9999
if the comma is omitted, the statement will be interpreted as the assignment statement
Do50n = 19999
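
The effect can be imitated with a toy Python sketch. It is not a Fortran parser: squeeze_blanks and looks_like_do_loop are hypothetical helpers, and the only rule modeled is that a DO loop needs a comma after the = once the blanks are gone.

```python
def squeeze_blanks(statement):
    """Remove every blank, imitating how Fortran disregards them."""
    return statement.replace(" ", "")

def looks_like_do_loop(statement):
    """Toy test: after squeezing, a DO loop has a comma to the right of '='."""
    squeezed = squeeze_blanks(statement).upper()
    if not squeezed.startswith("DO") or "=" not in squeezed:
        return False
    return "," in squeezed.split("=", 1)[1]

with_comma = "Do 50 n = 1, 9999"
without_comma = "Do 50 n = 1 9999"

print(squeeze_blanks(without_comma))
print(looks_like_do_loop(with_comma), looks_like_do_loop(without_comma))
```

Without the comma, nothing distinguishes the squeezed text from an assignment to a variable named Do50n.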

Bash uses spaces as separators, especially between a command and its arguments:

bash-2.02$ x=3

bash-2.02$ echo $x
3

bash-2.02$ x = 3
bash: x: command not found
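
The same word splitting can be approximated in Python with the standard shlex module, which follows POSIX shell quoting rules. It is only a stand-in for bash here, not bash itself.

```python
import shlex

# No spaces: the shell sees a single word, which it treats as an assignment.
assignment_words = shlex.split("x=3")

# With spaces: three words, so the first one, x, is taken as a command name.
command_words = shlex.split("x = 3")

print(assignment_words)
print(command_words)
```

The second split shows why bash reports that the command x was not found: = and 3 have become its arguments.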

Python uses indentation at the beginning of the line to indicate a block.

if expr:
   print("Block line 1.")
   print("Block line 2.")
else:
   print("In else block.")
print("Out of Block")
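
One way to see that the indentation is significant is to ask Python's own tokenizer what it produces for such a fragment. The sketch below, with the hypothetical helper block_markers, lists only the INDENT and DEDENT tokens. The fragment is held in a string and is tokenized, not run, so expr does not need a value.

```python
import io
import tokenize

source = (
    "if expr:\n"
    "   print('Block line 1.')\n"
    "   print('Block line 2.')\n"
    "else:\n"
    "   print('In else block.')\n"
    "print('Out of Block')\n"
)

def block_markers(text):
    """Return the INDENT/DEDENT token names in the order they are emitted."""
    tokens = tokenize.generate_tokens(io.StringIO(text).readline)
    return [tokenize.tok_name[tok.type] for tok in tokens
            if tok.type in (tokenize.INDENT, tokenize.DEDENT)]

markers = block_markers(source)
print(markers)
```

Each block contributes one INDENT and one DEDENT, which play the role that opening and closing braces play in C or Java.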

Tokens
A token is "a sequence of characters with a unit of meaning." [Wall, Christiansen & Orwant, Programming Perl, p. 49]

Better: A token is a terminal symbol of the programming language that the reader (parser) passes to the compiler or interpreter.

Punctuation
Punctuation marks, sometimes called separators, are non-whitespace characters that separate other tokens. They may include parentheses, brackets, and semicolons. For example, in the expression a[i], the brackets separate the tokens a and i and prevent the expression from looking like the identifier ai.

In Python, the commas in [1,2,3] separate the elements of the list.
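
The following sketch shows the tokens Python's standard tokenize module reports for these examples. The helper tokens_of is made up for this illustration, and it drops the end-of-line and end-of-input tokens to keep the output short.

```python
import io
import tokenize

SKIPPED = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}

def tokens_of(source):
    """Return (token type name, token text) pairs for a line of Python."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokens if tok.type not in SKIPPED]

print(tokens_of("a[i]\n"))     # the brackets keep a and i apart
print(tokens_of("ai\n"))       # a single identifier
print(tokens_of("[1,2,3]\n"))  # commas separate the list elements
```

Here a[i] yields four tokens while ai yields one, which is exactly the separating job the punctuation does.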

In bash, punctuation marks are called metacharacters:

metacharacter

A character that, when unquoted, separates words. One of the following:
| & ; ( ) < > space tab

[bash man page]
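
Python's standard shlex module can imitate this behavior when created with punctuation_chars=True, which makes it treat ( ) ; < > | & as word separators. The sketch below is an approximation of shell word splitting, not bash's actual lexer, and split_words is a hypothetical helper.

```python
import shlex

def split_words(command):
    """Split a command line into words, breaking at shell metacharacters."""
    lexer = shlex.shlex(command, punctuation_chars=True)
    return list(lexer)

words = split_words("ls|wc;echo hi")
print(words)
```

No whitespace is needed around | and ; because the metacharacters themselves end the neighboring words.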

Operators
Operators include the numeric, relational, boolean, and other operators of the language. For example, the 37 operators of Java are
=	>	<	!	~	?	:
==	<=	>=	!=	&&	||	++	--
+	-	*	/	&	|	^	%
<<	>>	>>>
+=	-=	*=	/=	&=	|=	^=	%=
<<=	>>=	>>>=
Operators usually separate other tokens. For example, in Java, x+y is the same as x + y. In Lisp, however, most of these symbols are ordinary characters, so that while (+ x y) is an expression that evaluates to the sum of x and y, (+xy) is a call, with no arguments, to the function whose name is +xy, and 3+5 is a variable.
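
Python behaves like Java in this respect, which can be checked with a short sketch using the standard ast module. The helper same_parse is made up for the example. It compares the parse trees of two source strings while ignoring source positions.

```python
import ast

def same_parse(first, second):
    """True if the two source strings parse to the same syntax tree."""
    return ast.dump(ast.parse(first)) == ast.dump(ast.parse(second))

print(same_parse("x+y", "x + y"))  # the operator separates x and y either way
print(same_parse("x+y", "xy"))     # without the operator, xy is one name
```

A Lisp reader would give a different answer for the first comparison, since x+y would be read as a single symbol.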

Numbers and other literals
Literals are tokens that the compiler recognizes as particular data values. They include numbers such as 5 and 78.34, but many languages have literals of other types, such as the Java boolean literals true and false, and Java's null.

There is generally an involved syntax for numeric literals, including optional signs, decimal points, exponentiation marks, and radix indicators. For example, in C++ and Java 0x57 is a hexadecimal integer equal to the decimal integer 87, and in Lisp, -3745e-2 is a floating point number equal to -37.45. Lisp also has literals of a ratio type, such as 3/5.
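
The values quoted above can be checked in Python, which accepts the same hexadecimal and exponent notations. Python has no ratio literal, so the sketch uses fractions.Fraction as a stand-in for the Lisp ratio 3/5.

```python
from fractions import Fraction

hex_literal = 0x57                 # hexadecimal literal in the source text
hex_parsed = int("0x57", 16)       # the same digits read from a string
float_parsed = float("-3745e-2")   # sign, digits, exponent mark, signed exponent
ratio = Fraction(3, 5)             # stand-in for the Lisp ratio literal 3/5

print(hex_literal, hex_parsed, float_parsed, ratio)
```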

Names (Identifiers)
Names, or identifiers, are used for variables, subprograms (or methods), types, classes, etc. Different languages have different rules for the formation of identifiers. In Java,
"An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. An identifier cannot have the same spelling (Unicode character sequence) as a keyword, boolean literal, or the null literal." [The Java Standard, Section 3.8.]
In Fortran77, a name may only be 1-6 letters and/or digits, the first of which must be a letter. Fortran90 allows names up to 31 characters long, and allows them to include the _ character.
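
The two Fortran rules translate directly into regular expressions. The sketch below is only a reading of the rules as stated here. The function names are made up, and both upper- and lower-case ASCII letters are accepted as an assumption.

```python
import re

# Fortran77: 1-6 letters and/or digits, the first of which is a letter.
F77_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{0,5}")
# Fortran90: up to 31 characters, with the underscore also allowed.
F90_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,30}")

def is_f77_name(text):
    return F77_NAME.fullmatch(text) is not None

def is_f90_name(text):
    return F90_NAME.fullmatch(text) is not None

for candidate in ["DO50N", "COUNTER", "9LIVES", "loop_counter"]:
    print(candidate, is_f77_name(candidate), is_f90_name(candidate))
```

COUNTER fails the Fortran77 test only because it has seven characters, and loop_counter fails it because of both its length and the underscore.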

Common Lisp allows names (symbols) to be of arbitrary length, and treats as a name any token that cannot be interpreted as a number. (See the Lisp Hyperspec Section 2.2 Reader Algorithm and Section 2.3.4 Symbols as Tokens.) So Lisp names include

1+     /5     ^/-     734ff     89..93 
Also, the symbols that are operators in other languages, such as + and >, are names in Common Lisp. In fact, Common Lisp treats any character preceded by the escape character \ as an alphabetic character. So the following are also Lisp names
ab\(c     quo\"te
and even several\ words\ strung\ together, which includes internal spaces. Even the newline character may be included in a Lisp name if preceded by an escape character.
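
The rule that any token that cannot be read as a number is a name can be sketched in Python. This is a simplified model, not the Hyperspec algorithm: it assumes base 10, covers only integers, ratios, and floats, and ignores escape characters. The name classify and the patterns are made up for the illustration.

```python
import re

# Simplified Common Lisp number syntax, assuming the read base is 10.
NUMBER_PATTERNS = [
    re.compile(r"[+-]?[0-9]+\.?"),                                   # integer
    re.compile(r"[+-]?[0-9]+/[0-9]+"),                               # ratio
    re.compile(r"[+-]?[0-9]*\.[0-9]+([esfdl][+-]?[0-9]+)?", re.I),   # float
    re.compile(r"[+-]?[0-9]+(\.[0-9]*)?[esfdl][+-]?[0-9]+", re.I),   # float
]

def classify(token):
    """Return 'number' if the token fits a number pattern, else 'symbol'."""
    if any(pattern.fullmatch(token) for pattern in NUMBER_PATTERNS):
        return "number"
    return "symbol"

for token in ["1+", "/5", "^/-", "734ff", "89..93", "3+5", "-3745e-2", "3/5"]:
    print(token, classify(token))
```

Under these assumptions the first six tokens come out as symbols and the last two as numbers. With a different read base the answer could change, for instance for 734ff in base 16.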

Common Lisp also includes escape brackets:

|several words strung together|
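is, in a reader that preserves case, the same name as
several\ words\ strung\ together
(A reader that upper-cases unescaped letters would read the second form as SEVERAL WORDS STRUNG TOGETHER.)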
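
How the two escape mechanisms produce a name can be sketched as a small Python function. It is a simplified model of the reader, not an implementation of it: read_symbol_name and its upcase parameter are made up, and upcase=True imitates a reader that upper-cases unescaped letters.

```python
def read_symbol_name(token, upcase=False):
    """Return the symbol name that the written token stands for."""
    name = []
    in_bars = False
    i = 0
    while i < len(token):
        ch = token[i]
        if ch == "\\":                  # single escape: keep the next character
            if i + 1 >= len(token):
                raise ValueError("escape character at end of token")
            name.append(token[i + 1])
            i += 2
            continue
        if ch == "|":                   # escape brackets toggle on and off
            in_bars = not in_bars
        elif in_bars or not upcase:
            name.append(ch)
        else:
            name.append(ch.upper())
        i += 1
    if in_bars:
        raise ValueError("unbalanced escape bracket")
    return "".join(name)

print(read_symbol_name(r"ab\(c"))
print(read_symbol_name("|several words strung together|"))
print(read_symbol_name(r"several\ words\ strung\ together"))
```

With the default case-preserving setting the last two calls return the same string. With upcase=True they differ, because only the unescaped letters are upper-cased.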

Common Lisp macro characters, when encountered by the reader, cause the reader to call a function that recursively reads the input file and returns an object, as if the reader had read that object in the first place.

Moreover, Common Lisp puts the attributes of characters in the control of the programmer. For example, the programmer could make ( and ) be considered simple alphabetic characters, and make [ and ] play the role that ( and ) normally do.

Languages also differ about the significance of upper- and lower-case letters. Most modern languages distinguish between them. So HashTable is a different name from Hashtable.

Prolog considers a name that starts with an upper-case letter to be a variable, while one that begins with a lower-case letter is considered to be a literal symbol.
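
The Prolog convention can be written as a short Python check. The function prolog_kind is hypothetical. It also treats a leading underscore as starting a variable, which Prolog allows, and it calls the literal symbols atoms, the usual Prolog term for them.

```python
def prolog_kind(name):
    """Classify a Prolog name by its first character."""
    if not name:
        raise ValueError("empty name")
    first = name[0]
    if first.isupper() or first == "_":
        return "variable"
    if first.islower():
        return "atom"
    return "other"

for name in ["Parent", "parent", "X", "_tmp"]:
    print(name, prolog_kind(name))
```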

In Perl, every variable name must start with a "funny character". The name of a scalar variable, such as one that stores a number or string, must start with a $, such as $x. The name of a variable whose value is an array must start with an @, such as @monthTable. The name of a variable whose value is a hash table, called simply a "hash", must start with a %, such as %addressBook.
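
Because the first character alone identifies the kind of variable, the classification amounts to a table lookup, as in this Python sketch. The names SIGILS and perl_variable_kind are made up, and only the three characters mentioned above are covered.

```python
# The three "funny characters" described above and what they mark.
SIGILS = {"$": "scalar", "@": "array", "%": "hash"}

def perl_variable_kind(name):
    """Return the kind of Perl variable a name denotes, judged by its first character."""
    return SIGILS.get(name[:1], "not a variable name")

for name in ["$x", "@monthTable", "%addressBook", "x"]:
    print(name, perl_variable_kind(name))
```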

Fortran allows lower-case letters in names, but considers them equivalent to the upper-case version.
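
The difference between case-sensitive and case-insensitive names can be modeled with a toy symbol table in Python. The class SymbolTable is hypothetical. It folds names to upper case when case_sensitive is False, which is one plausible way to make the two spellings equivalent.

```python
class SymbolTable:
    """Toy symbol table whose treatment of letter case is configurable."""

    def __init__(self, case_sensitive=True):
        self.case_sensitive = case_sensitive
        self._values = {}

    def _key(self, name):
        # Case-insensitive tables store every name in upper case.
        return name if self.case_sensitive else name.upper()

    def define(self, name, value):
        self._values[self._key(name)] = value

    def lookup(self, name):
        return self._values[self._key(name)]

    def __len__(self):
        return len(self._values)

java_like = SymbolTable(case_sensitive=True)
java_like.define("HashTable", 1)
java_like.define("Hashtable", 2)

fortran_like = SymbolTable(case_sensitive=False)
fortran_like.define("Count", 1)
fortran_like.define("COUNT", 2)

print(len(java_like), len(fortran_like), fortran_like.lookup("count"))
```

The case-sensitive table holds two distinct names, while in the case-insensitive one the second definition overwrites the first.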

Versions of Common Lisp before ACL version 6 differentiated upper-case from lower-case letters, but automatically upper-cased non-escaped lower-case letters.

Although Emacs-Lisp is not a version of Common Lisp, like current ACL, it differentiates upper- from lower-case letters, and does not change either to the other.

Keywords and Reserved Words
Keywords and reserved words are tokens that look like identifiers, but whose use is restricted. Sebesta distinguishes them by saying that a keyword is restricted in only certain contexts, whereas a reserved word may never be used as an identifier. However, what Java calls keywords would be reserved words by this definition. When starting with a new programming language, finding the list of keywords and reserved words and what their restrictions are is as important as finding out what the comment symbols are.
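
Python offers a convenient way to experiment with the distinction. In the sketch below, keyword.iskeyword comes from the standard keyword module, while can_be_assigned is a made-up helper that tries to compile an assignment to the given name. In recent Python versions match has a special meaning only in certain contexts, which is close to Sebesta's notion of a keyword, whereas class behaves like a reserved word.

```python
import keyword

def can_be_assigned(name):
    """True if 'name = 1' compiles, i.e. the name is usable as an identifier."""
    try:
        compile(f"{name} = 1", "<example>", "exec")
    except SyntaxError:
        return False
    return True

for word in ["class", "match", "total"]:
    print(word, word.isidentifier(), keyword.iskeyword(word), can_be_assigned(word))
```

All three words look like identifiers, but only class is rejected when used as one.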

The bash man page says,

Reserved words
are words that have a special meaning to the shell. The following words are recognized as reserved when unquoted and either the first word of a simple command ... or the third word of a case or for command:
! case do done elif else esac fi for function if in select then until while { } time [[ ]]
...
Note that unlike the metacharacters ( and ), { and } are reserved words ... Since they do not cause a word break, they must be separated from [other words] by whitespace.


Copyright © 2003-2005 by Stuart C. Shapiro. All rights reserved.

Last modified: Fri Feb 4 08:45:53 EST 2005
Stuart C. Shapiro <shapiro@cse.buffalo.edu>