CSE676: Knowledge Representation Fall, 1999
Course Notes
Stuart C. Shapiro
Department of Computer Science
State University of New York at Buffalo

Introduction
These notes will comment on and supplement the text. They will serve as an outline for the class meetings, and will be quite informal.
Artificial Intelligence (AI) is a field of computer science and engineering concerned with the computational understanding of what is commonly called intelligent behavior, and with the creation of artifacts that exhibit such behavior. [Shapiro, EofAI 2nd Ed.,1992, p. 54.]
See the story "An Approach to Serenity", and its accompanying questions for some motivation.
Brian C. Smith's knowledge representation hypothesis:

Any mechanically embodied intelligent process will be comprised of structural ingredients that a) we as external observers naturally take to represent a propositional account of the knowledge that the overall process exhibits, and b) independent of such external semantical attribution, play a formal but causal and essential role in engendering the behaviour that manifests that knowledge. [Smith in Brachman & Levesque, 1985, p. 33.]

Knowledge representation is the area of AI concerned with the formal symbolic languages used to represent the knowledge (data) used by intelligent systems, and the data structures used to implement those formal languages. However, one cannot study static representation formalisms and know anything about how useful they are. Instead, one must study how they are helpful for their intended use. In most cases, the intended use is to use explicitly stored knowledge to produce additional explicit knowledge. This is what reasoning is. Together, knowledge representation and reasoning can be seen to be both necessary and sufficient for producing general intelligence, [that is, KR&R is an] AI-complete area. Although they are bound up with each other, knowledge representation and reasoning can be teased apart, according to whether the particular study is more about the representation language/data structure, or about the active process of drawing conclusions. [Shapiro, EofAI 2nd Ed.,1992, p. 56.]
Discuss:

Formal Language: Syntax
Language for Representation: Semantics
Production of additional explicit knowledge: Inference
I.e. Logic?

Introduction, Part 2
Knowledge as "justified true belief" (see text pp. 7-8) vs. Belief.
Which is appropriate for a "Knowledge-based system"?
Knowledge vs. attribution of knowledge.
Belief vs. attribution of belief.
Does a "Knowledge base" contain knowledge of the world? Is that what a KR represents?
Procedural/Declarative controversy: Discuss CS notion of procedural vs. declarative representation (program vs. data?). Better distinction: What can the entity say?
Know that, e.g., "I know that Buffalo is west of Rochester."
vs. Know how, e.g., "I know how to type."
vs. Know who, e.g., "I know Bill Rapaport", "I know who Bill Clinton is".
The Tell/Ask interface: A KR&R system as a utility vs. constructing a computational cognitive agent.

Classical Logic
See the partial FISI Course Notes, (1994) Chapters 1, 2.1, 2.2, 3.1, 3.2, and 3.4
and/or
The IJCAI'95 tutorial Presentation, (1995) Chapters 1, 2.1, 2.2, 3.1, and 3.2.
and/or
the paper, Propositional, First-Order And Higher-Order Logics: Basic Definitions, Rules of Inference, Examples (1999).

Facing the Open World
Ray Reiter: The Closed World Assumption (CWA): "Anything you don't know is true is false."
Example (DBMS): "Is Mary Smith the manager of the Data Processing Division?" If the Database doesn't have that she is, then she isn't.
CWA justifies "negation by failure", such as in Prolog.
In our everyday lives, we are always learning new things. So CWA doesn't hold---the "Open World Assumption" (OWA).
If a KB system is allowed to mix Tell with Ask, it must make the OWA.
The OWA entails "Unknown" as a third possible "truth value".
What if new information contradicts old conclusions? (Non-monotonicity)
What if new information contradicts old information? (Belief Revision)
Classical Logic is Monotonic: If A |- C then A u {B} |- C.

Example:

Birds fly.
Canaries are birds.
Penguins are birds.
Tweety is a canary.
Opus is a penguin.

Does Tweety fly?
Does Opus fly?

What if then learn that penguins don't fly.
What does "Birds fly" mean, if you know that penguins are birds, but don't fly?

Default rule:

Bird(x): Flies(x)

Flies(x)

Non-Monotonic Logics (Chapter 4)
Default Logic, Chapter 4.2, and Circumscription, Chapter 4.3.1, do not give much advice to builders of computer reasoning systems, so we will not spend much time discussing them, just a few comments.
Default Logic: If we call the parts of a default rule

preconditions: justifications

consequent

then, to fire the rule, we must derive the preconditions and try but fail to derive the negations of the justifications. Remember, trying to derive a non-derivable may be non-terminating if we have a logical system that is only semi-decidable. If, nevertheless, we succeed in firing the rule, we must use some sort of TMS (see below) to prepare for the possibility that we might have to retract the conclusion.
Circumscription is an operation performed on the set of non-logical axioms to introduce additional non-logical axioms in order to formalize the CWA in the sense of being able to formally show that only the mentioned objects exist and/or only the objects that provably have some property have that property. The problem with trying to automate circumscription is that a choice must be made of which axioms or predicates are to be "circumscribed," and this choice remains an art form.

Modal Logic
[Additional reference: E. Davis, Representations of Commonsense Knowledge, San Mateo, CA: Morgan Kaufmann, 1990.]
Modal logics add sentential operators to the syntax and semantics of propositional and predicate logic.
In these notes, I'll use L and M as the two modal operators.
Syntactic Extensions:
Propositional Logic: If P is a well-formed proposition, so are L(P) and M(P).
Predicate Logic: If P is a well-formed formula, so are L(P) and M(P).
I will assume we are discussing modal predicate logic, but note that not too much will hang on that.
M(P) is often taken as an abbreviation of ¬L(¬P). Otherwise, the equivalence of these two will be incorporated in the logic some other way.
Common intensional semantics of L(P) and M(P) (choose one):

L(P) M(P)

Necessarily [P]. Possibly [P].

I know that [P]. I believe that [P] might be true.

I believe that [P]. I believe that [P] might be true.

[P] will always be true. [P] will be true at some time.

Modal logic often used when

Operators do not commute with quantifiers
e.g. L(ExSpy(x)) vs. ExLSpy(x)
Operators are referentially opaque
e.g. ¬L(Scott = Author-of(Waverly)) vs. ¬L(Scott = Scott)
No need for quantifying over sentences
e.g. Ax(Says(Bill, x) => L(x))
(If first two don't hold, something simpler might be used. If third doesn't hold, might need something more complicated.)
Extensional semantics of modal logics requires the notion of possible worlds connected by an "accessibility" relation.
L(P) is true in world w if and only if P is true in every world accessible from w.
M(P) is true in world w if and only if there is some world accessible from w in which P is true.
The Necessitation rule of inference of Modal Logic (Martins p. 58) is
If A is a theorem of classical logic, then we can infer L(A). Is this reasonable if L is Knowledge (Logical Omniscience)?
Different modal axioms are associated with different properties of the accessibility relation.
L(P) => P is valid if accessibility is reflexive. This is reasonable if L is knowledge, but not if L is belief.
L(P) => L(L(P)) is valid if accessibility is transitive. This is reasonable if L is temporal, but what if knowledge? belief?

Exercises:

(L(A) & L(A => B)) => L(B) is valid
(L(A) & (A => B)) => L(B) is not valid

Justification-Based TMSs
Whenever a rule of inference is used to infer C from P and Q, record that P and Q are the "justifications" of C, so that later one can find P and Q from C, or find C from P and/or Q.
If one ever has a contradiction, P and ~P, one can then trace back through the justifications of P and ~P to find the original premises, and delete one of them.
Similarly, if belief in some proposition P is removed, one can trace forward through the propositions P justifies and remove belief in all dependent consequents, at least those which do not have some other justification.
JTMSs are often separate facilities from the problem solvers they support, and often only record justifications among atomic propositions.
A Negative: The paths of justifications sometimes form loops that prevent belief revision from being performed.
Example of a loop from [E. Charniak, C. Riesbeck, & D. McDermott, Artificial Intelligence Programming, Hillsdale, NJ: Lawrence Erlbaum, 1980, p 197.]

KB:

all(x)(Man(x) => Person(x))
all(x)(Person(x) => Human(x))
all(x)(Human(x) => Person(x))
Man(Fred)

Dependency Network:

Man(Fred) ---->o<---- all(x)(Man(x) => Person(x) | | v Person(Fred) --->o<---- all(x)(Person(x) => Human(x)) ^ | | | | v o<----Human(Fred) ^ | | all(x)(Human(x) => Person(x))

Now if Man(Fred) is retracted, one justification of Person(Fred) goes away, but it has another justification!

A Positive: Justifications are useful for explanation.

Assumption-Based TMSs
ATMSs record with every consequent the original premises (assumptions) on which it depends. This eliminates the possibility of loops and the necessity of tracing through justifications to find the premises. Unfortunately, assumptions are not as useful as justifications for explanation.
A technique for keeping track of assumptions was borrowed from a technique used for Fitch-Style proofs in the Logic of Relevant Implication (see Chapter 3.2).

Semantic Networks
Raphael's SIR

Reference: Bertram Raphael, SIR: Semantic Information Retrieval. In Marvin Minsky, Ed. Semantic Information Processing, MIT Press, Cambridge, MA, 1968, 33-145. (Reprint (partial?) of 1964 Ph.D. dissertation.)
Raphael did not present a graphical representation of "the SIR model" [p. 54], but used Lisp property lists, and spoke about "type-1", "type-2", and type-3" links [pp. 57-58]:

A type-1 link was for a relation R where one and only one y could be in the R relation to any x. For example CHAIR: ((JRIGHT LAMP)...) means the lamp is just to the right of the chair.

A type-2 link was for cases where more than one y could be in the R relation to any x. For example, PERSON: ((SUBSET (BOY GIRL MIT-STUDENT))...) means that boys, girls and MIT students are subsets of the set of persons.

A type-3 link was for cases where "descriptive information" needed to be added. For example the representation of "A person has two hands" and "A finger is part of a person" was PERSON: ((SUBPART ((PLIST NAME HAND NUMBER 2) (PLIST NAME FINGER)))...)
[All examples from Raphael, 1968, pp. 57-58.]
A noteworthy "special feature" of SIR was the "exception principle":
"General information about `all the elements' of a set is considered to apply to particular elements only in the absence of more specific information about those elements. Thus it is not necessarily contradictory to learn that `mammals are land animals' and yet `a whale is a mammal which always lives in water.' In the program, this idea is implemented by always referring for desired information to the property-list of the individual concerned before looking at the descriptions of sets to which the individual belongs.
The justification for this departure from the no-exception principles of Aristotelian logic is that this precedence of specific facts over background knowledge seems to be the way people operate, and I wish the computer to communicate with people as naturally as possible.
The present program does not experience the uncomfortable feeling people frequently get when they must face facts like [these]. However, minor programming additions to the present system could require it to identify those instances in which specific information and general information differ; the program could then express its amusement at such paradoxes." [Raphael, 1968, p. 85, italics in original]
This is the first appearance I know of the "exception principle" in AI. It lead to default reasoning, concern with the logical principles underlying its procedural semantics lead to nonmonotonic logics, and it is an ancestor of similar operations in object-oriented programming.

Quillian's Semantic Memory

Reference: M. Ross Quillian, Semantic Memory. In Marvin Minsky, Ed. Semantic Information Processing, MIT Press, Cambridge, MA, 1968, 227-270. (Reprint (partial?) of 1966 Ph.D. dissertation.)
This work introduced several important notions:

"semantic networks"

The notion that the full meaning of a term is its place in the network as a whole, including how it is connected to every other term.

Use of subclass as a major categorizing principle.

Use of "associative links" of several different types (distinguished by shape) to structure memory, and to represent the information in NL sentences (dictionary entries).

The relevance of one term to another measured by closeness in the network.

Spreading, bidirectional activation in a network.

Collins & Quillian's Semantic Networks (ca. 1970, 1972)
Firmly established notion of inheritance hierarchies and their psychological validity.
Even though had problem with "fast negatives".

Concern with Foundations
What do semantic networks represent, meaning of a sentence or mind of a language user?
Need for syntax and semantics of networks.
Structural vs. Assertional information.

SNePS

Some SNePS References

SNePS Research Group home page
SNePS 2.5 User's Manual, PS version HTML version
SNePS Tutorial, PS version PDF version DVI version
Stuart C. Shapiro, SNePS: A Logic for Natural Language Understanding and Commonsense Reasoning talk presented to Master's Class, Graduate Program in Philosophy and Computers and Cognitive Science (PACCS), Binghamton U., Binghamton, NY, April 16, 1999.

Stuart C. Shapiro, SNePS: A Logic for Natural Language Understanding and Commonsense Reasoning. In Lucja Iwanska & Stuart C. Shapiro, Eds., Natural Language Processing and Knowledge Representation: Language for Knowledge and Knowledge for Language, AAAI Press/The MIT Press, Menlo Park, CA, forthcoming.
Stuart C. Shapiro and William J. Rapaport, The SNePS Family. Computers &Mathematics with Applications 23, 2-5 (January-March, 1992), 243-275. Reprinted in F. Lehmann, Ed. Semantic Networks in Artificial Intelligence. Pergamon Press, Oxford, 1992, 243-275.
Stuart C. Shapiro, Cables, Paths and "Subconscious" Reasoning in Propositional Semantic Networks. In J. Sowa, Ed. Principles of Semantic Networks: Explorations in the Representation of Knowledge. Morgan Kaufmann, San Mateo, CA, 1991, 137-156.
Stuart C. Shapiro, Belief spaces as sets of propositions. Journal of Experimental and Theoretical Artificial Intelligence 5, 2&3 (April-September 1993), 225-235.
SNeRG Bibliography

Conceptual Dependency
The following brief description of CD is taken from Stuart C. Shapiro and William J. Rapaport, Models and minds: knowledge representation for natural-language competence. In R. Cummins & J. Pollock, Eds. Philosophy and AI: Essays at the Interface. MIT Press, Cambridge, MA, 1991, 215--259.
Conceptual Dependency theory (Schank and Rieger 1974, Shank 1975, Schank and Riesbeck 1981; cf. Hardt 1987) uses a knowledge-representation formalism consisting of sentences, called ``conceptualizations'', which assert the occurrence of events or states, and six types of terms:

PPs---``real-world objects'',
ACTs---``real-world actions'',
PAs---``attributes of objects'',
AAs---``attributes of actions'',
Ts---``times'',
and LOCs---``locations''.
(The glosses of these types of terms are quoted from Schank & Rieger 1974: 378-379.)
The set of ACTs is closed and consists of the well-known primitive ACTs PTRANS (transfer of physical location), ATRANS (transfer of an abstract relationship), etc.
The syntax of an event conceptualization is a structure with six slots (or arguments), some of which are optional: actor, action, object, source, destination, and instrument.
A stative conceptualization is a structure with an object, a state, and a value.
Only certain types of terms can fill certain slots. For example, only a PP can be an actor, and only an ACT can be an action. Interestingly, conceptualizations, themselves, can be terms, although they are not one of the six official terms. For example, only a conceptualization can fill the instrument slot, and a conceptualization can fill the object slot if MLOC (mental location) fills the act slot.
A ``causation'' is another kind of conceptualization, consisting only of two slots, one containing a causing conceptualization and the other containing a caused conceptualization.
Although, from the glosses of PP and ACT, it would seem that the intended domain of interpretation is the real world, the domain also must contain theoretically postulated objects such as: the ``conscious processor'' of people, in which conceptualizations are located; conditional events; and even negated events, which haven't happened.

Frames
Proposed by Marvin Minsky
Motivated by vision problems
Answer to a challenge by Hubert Dreyfus (intentionally?)
Along with scripts, first structured KR theory
Based, at least in part, on Simula 67 classes---part of the development of OOP
Basic ideas:
Slots and fillers
Hierarchy
Inheritance
Defaults
Procedural Attachments---IfNeeded & IfAdded
Multiple Inheritance
Possibilities: E.g. Dogs are Domestic Mammals vs. Wild Birds
Problems: What to do when there's a conflict Esp if negated links allowed
Nixon Diamond:

Pacifist /\ / \ / \NOT / \ / \ Quaker Republican \ / \ / \ / \ / \/ Nixon

Frame systems usually do not allow negative links. See Brachman comment that elephants are grey by default, but albino elephants are allowed, so why not Clyde, the non-elephant elephant?
What is a hierarchy? If have classes and individuals and inheritance, must distinguish class properties from individual properties. An abstraction hierarchy doesn't have classes, but abstract individuals.
See text for examples of KEE.

The KL-ONE Family
TBox: Definitional, structural information
ABox: Assertional
TBox
Hierarchy of Concepts and Relations
Generic Concepts
Defined Concepts contain Necessary and Sufficient conditions
Primitive Concepts contain only Necessary conditions
Necessary: All x [C(x) -> P(x)]
Sufficient: All x [P(x) -> C(x)]

Individual Concepts are classes with a single element. Note: another way to have an inheritance hierarchy with only one relation.
A Generic Concept is defined as a subconcept of others with certain roles.
See examples in text.
ABox
Concepts provide Unary Predicates
Roles provide Binary Relations
Defining an arch (text p. 202)

(cdef Arch (and (atleast 1 lintel) (atmost 1 lintel) (all lintel Block) (atleast 2 upright) ; note typo in text (atmost 2 upright) ; note typo in text (all upright Block) (sd NotTouch (upright objects)) (sd Support (lintel supported) (upright supporter))))
Assume an Arch A1 with lintel B1 and uprights B2 and B3.
From semantics, p. 203 of text:

R[(sd s (pc1 ps1) ... (pcn psn))] = {x in D: Ey[y in R[s] & Az1, ..., zn[(<x, z1> in R[pc1] <-> <y, z1> in R[ps1]) & ... & (<x, zn> in R[pcn] <-> <y, zn> in R[psn])]]}
Instantiating to the Arch, we get

R[(sd Support (lintel supported) (upright supporter))] = {x in D: Ey[y in R[Support] & Az1, z2[(<x, z1> in R[lintel] <-> <y, z1> in R[supported]) & (<x, z2> in R[upright] <-> <y, zn> in R[supporter])]]}

"Understanding Subsumption and Taxonomy: A Framework for Progress"
by William A. Woods From John F. Sowa, Ed. Principles of Semantic Networks: Explorations in the Representation of Knowledge (San Mateo, CA: Morgan Kaufmann), 1991, 45-94.

Notation, Terminology & Ideas

[English description]: a concept. E.g., [person whose sons are professionals]

Structural subsumption: subsumption inferred from structure of concepts.

Classification: assimilating a new description into a taxonomy of existing concepts by directly linking it to its most specific subsumers and its most general subsumees.

Concepts identified with abstract description or abstract conceptual entities (intensional concepts), rather than predicates of first-order logic or classes.

Three aspects of descriptions---A description can be

satisfied by something (usual role of predicate),

satisfied in a situation, recognizing something as fitting the description,

used as a structured plan for creating an entity so described.

Atomic description: the name of an atomic category. E.g., [tree]. Treated as primitive.

Composite description: constructed from other conceptual descriptions using concept-forming operators. e = c1, ..., ck / (r1: v1), ..., (rn: vn): {p1, ..., pt}

ci: primary conceptual descriptions.
e can be described by ci.

(ri: vi): relational modifiers ("relation: value" pairs). Structural link labelled ri from e to a concept vi. Each ri is a concept that denotes a relation. ("in any network with self-reflexive properties ... it would be necessary for the link label (i.e., the relation), as well as the linked concepts, to be represented by conceptual nodes in the network." [p. 81] (Idea usually attributed to Shapiro, 1971, though may be closer to Winston, 1976.))
E(r,v)[r can be described by r1 & v can be described by vi & r holds between e and v]

pi: general conditions expressed as second-order predicates.
e satisfies pi.
E.g., [person] / ([like]: [golf]) = a person who likes golf
[golfer], [woman] = a woman golfer

...

Representing and Reasoning about Time
Reference: James F. Allen, Maintaining Knowledge about Temporal Intervals, Communications of the ACM 26, 11 (1983), 832-843. Reprinted in R. J. Brachman and H. J. Levesque, eds. Readings in Knowledge Representation, Morgan Kaufmann, Los Altos, CA, 1985, 509-521.

Issues

Representing time itself, rather than situations, events, or other categories of entities that relate to time.
Representing time-in-the-world accurately according to physics (Newtonian? Einsteinian? Quantum?), or representing human cognitive ideas of time?

Representing time as real numbers (dense, well-ordered), or as partially ordered entities.

Points vs. intervals (see Allen '83).

Relations between intervals (see Allen '83).

Granularity.

Time vs. tense.

Dealing with "now": movement; nests of "now".

Values: linear and cyclic; calendars.

Which comes first, times or events?

KIF: Knowledge Interchange Format
For all the details, see the KIF Home Page.

"Knowledge Interchange Format (KIF) is a computer-oriented language for the interchange of knowledge among disparate programs. It has declarative semantics (i.e. the meaning of expressions in the representation can be understood without appeal to an interpreter for manipulating those expressions); it is logically comprehensive (i.e. it provides for the expression of arbitrary sentences in the first-order predicate calculus); it provides for the representation of knowledge about the representation of knowledge; it provides for the representation of nonmonotonic reasoning rules; and it provides for the definition of objects, functions, and relations." [Abstract of the Reference Manual]

The Introduction to the Reference Manual says what KIF is and isn't.
A KIF knowledge base is a finite set (not sequence) of forms, each of which is either a sentence, a definition, or a rule.
KIF sentences are very similar to FOPC wffs.
KIF definitions can be either complete, giving necessary and sufficient conditions, or partial, giving only necessary conditions. Each defined constant gets a defining axiom, which is an analytic truth. Discuss this.
KIF rules may be nonmonotonic, but need not be. Rules are not sentences. See the discussion of this in the Reference Manual.
See the interesting discussion of metaknowledge.

Ontologies
You can browse the ontologies stored under the Stanford Knowledge Systems Laboratory Network Services.

Stuart C. Shapiro <shapiro@cse.buffalo.edu>