CSE 111, Fall 2000

Great Ideas in Computer Science

Lecture Notes #17

PROGRAMMING IN PASCAL:
TEXT PROCESSING

1. The Great Ideas of Computer Science So Far

1. Insight I: Can represent anything that a computer
can deal with using only 2 nouns: 0,1

2. Insight II: Can do anything that a computer can
                    do, using only 5 actions:
                    *    either Karel's 5 basic instructions
                         or the 5 instructions of a Turing
                         machine (to be discussed later)

3. Insight III: Can express all complex actions, using
                      only 3 rules of grammar:
                      *    sequence (s1;s2)
                      *    selection (choice): (if-then-else)
                      *    repetition (while-do)

4. Insight IIIA: Can name new instructions defined
using insights II & III.

5. The notion of an "algorithm":
    *    a procedure for solving a problem that is
         unambiguous and effective,
         where "effective" means: it is correct & it halts.

6. Algorithms can be implemented in computer
     programs:
    *    Input (including data) is given to the
         computer running the program that
         implements the algorithm,
         then the program processes the data,
         and the computer provides output

    *    I/P --> algorithm (program) --> O/P

7. Decision Trees:
* How to get a computer to help you solve a
problem

* Note that the programmer must know
(& give to the computer) all the information!
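
e.g.) Insights III & IIIA in Pascal. This is only a minimal sketch: the procedure name CountChars, its parameters, and the sample text are hypothetical, chosen for illustration. The body uses nothing but sequence, selection, and repetition, and the procedure declaration gives the new instruction a name.

```pascal
program ThreeRules;
{ Illustrative sketch only: CountChars and the sample text are
  hypothetical, not part of the lecture. }

{ Insight IIIA: a named instruction built from the three rules. }
procedure CountChars(const Line: string; var Blanks, Others: Integer);
var
  i: Integer;
begin
  Blanks := 0;                    { sequence: one statement after another }
  Others := 0;
  i := 1;
  while i <= Length(Line) do      { repetition: while-do }
  begin
    if Line[i] = ' ' then         { selection: if-then-else }
      Blanks := Blanks + 1
    else
      Others := Others + 1;
    i := i + 1
  end
end;

var
  b, o: Integer;
begin
  CountChars('great ideas in computer science', b, o);
  WriteLn('blanks: ', b, '   other characters: ', o)
end.
```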

2. Great Idea #8:

8. Computers can manipulate non-numeric data,
e.g., textual data.

e.g.) A word processor (such as MS Word) or a text
editor (such as Pico)

a)   Think of text as a single, very long string of
    alphabetical letters, numerals (0, ..., 9), the blank
    space, symbols, <return>, <tab>, etc.

b) Some word-processing operations:

    insert new text:
        s1 s2 --> s1 s3 s2
        e.g.)    'abde' --> 'abcde'

    delete text:
        s1 s2 s3 --> s1 s3
        e.g.)    'abxcd' --> 'abcd'

    move text (= delete from one place
                        & insert in another)
        s1...s2...s3...s4 --> s1...s3...s2...s4
        e.g.)    'acbd' --> 'abcd'

    copy text (=insert)
        s1...s2...s3 -->    s1...s2...s2...s3
        e.g.)    'abc' --> 'abbc'

    replace text (=delete old string & insert new one)
        s1 --> s2
        e.g.)    'axc' --> 'abc'

    concatenate text (=join, or insert):
        s1, s2 --> s1s2
        e.g.)    'ab', 'cd' --> 'abcd'

etc.
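
The operations in (b) can be tried out with Pascal's standard string routines. The sketch below assumes the Turbo Pascal / Free Pascal routines Insert, Delete, and Copy and the + operator on strings; the variable names s and piece are hypothetical. Each step reproduces one of the "e.g.)" lines above.

```pascal
program EditOps;
{ Illustrative sketch only; assumes the Turbo Pascal / Free Pascal
  string routines Insert, Delete, and Copy. }
var
  s, piece: string;
begin
  { insert new text: 'abde' --> 'abcde' }
  s := 'abde';
  Insert('c', s, 3);         { the new text starts at position 3 }
  WriteLn(s);

  { delete text: 'abxcd' --> 'abcd' }
  s := 'abxcd';
  Delete(s, 3, 1);           { remove 1 character starting at position 3 }
  WriteLn(s);

  { move text = delete from one place & insert in another:
    'acbd' --> 'abcd' }
  s := 'acbd';
  piece := Copy(s, 2, 1);    { remember the 'c' }
  Delete(s, 2, 1);           { s is now 'abd' }
  Insert(piece, s, 3);       { s is now 'abcd' }
  WriteLn(s);

  { copy text = insert a duplicate: 'abc' --> 'abbc' }
  s := 'abc';
  piece := Copy(s, 2, 1);    { the 'b' }
  Insert(piece, s, 2);
  WriteLn(s);

  { replace text = delete old string & insert new one:
    'axc' --> 'abc' }
  s := 'axc';
  Delete(s, 2, 1);           { s is now 'ac' }
  Insert('b', s, 2);
  WriteLn(s);

  { concatenate text: 'ab', 'cd' --> 'abcd' }
  s := 'ab' + 'cd';
  WriteLn(s)
end.
```

Each WriteLn should print the right-hand side of the corresponding example: abcde, abcd, abcd, abbc, abc, abcd.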

3. The Post Correspondence Problem

a)    This is a mathematical puzzle involving
        textual manipulation that is useful for
        exploring the limits of computation:

        *    The PCP is not computable
            (exactly what that means will be explained
            a bit later)

b) Given 2 strings (call them S, T)

i.e.) var S,T : varying [infinity] of char;

Given some string-manipulation operations:

            e.g.)
            OP1: replace S by Sb
                    (i.e., append b to the end of S)
                    replace T by Tabb
                    (i.e., append abb to the end of T)

Notation: OP1(S) = Sb
OP1(T) = Tabb

OP2(S) = Saa
OP2(T) = Ta

OP3(S) = Sbabab
OP3(T) = Tab

        Notes:
        1. There can be different operations, and
            even different numbers of operations;
            this is just an example.
        2. An open issue: How to code this in Pascal

Assume that initially, S & T are empty

i.e.) using Pascal notation to be introduced
soon:

S := '';
T := ''
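
One possible way to code the operations in Pascal, offered only as a sketch and not necessarily the encoding the course will adopt: store, for each operation, the text it appends to S and the text it appends to T. The names TOp, Ops, and Apply are hypothetical, and the built-in string type stands in for the unbounded "varying [infinity] of char".

```pascal
program PcpOps;
{ Illustrative sketch only: TOp, Ops, and Apply are hypothetical names.
  The built-in string type is a stand-in for an unbounded string. }
type
  TOp = record
    SSuffix, TSuffix: string;   { text the operation appends to S and to T }
  end;
var
  Ops: array[1..3] of TOp;
  S, T: string;

{ Apply operation number k to both strings at once. }
procedure Apply(k: Integer);
begin
  S := S + Ops[k].SSuffix;
  T := T + Ops[k].TSuffix
end;

begin
  { The three example operations from the notes. }
  Ops[1].SSuffix := 'b';      Ops[1].TSuffix := 'abb';
  Ops[2].SSuffix := 'aa';     Ops[2].TSuffix := 'a';
  Ops[3].SSuffix := 'babab';  Ops[3].TSuffix := 'ab';

  S := '';
  T := '';
  Apply(1);                   { now S is 'b' and T is 'abb' }
  WriteLn('S = ', S, '   T = ', T)
end.
```

Keeping the two suffixes together in one record reflects the fact that an operation always changes S and T in the same step.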

        Question: Is there a sequence of applications
                        of operations 1-3 above such that
                        eventually S=T?

Answer: Yes:

            S            T
        ------------------------
            ''           ''
OP2:        aa           a
OP1:        aab          aabb
OP3:        aabbabab     aabbab
OP1:        aabbababb    aabbababb
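
The table can be checked mechanically. The following sketch replays the sequence OP2, OP1, OP3, OP1 and compares the results; the array names SAdd, TAdd, and Steps are hypothetical, and typed array constants are assumed to be available (as in Turbo Pascal and Free Pascal).

```pascal
program PcpTrace;
{ Illustrative sketch only: SAdd, TAdd, and Steps are hypothetical names. }
const
  SAdd: array[1..3] of string = ('b', 'aa', 'babab');   { appended to S }
  TAdd: array[1..3] of string = ('abb', 'a', 'ab');     { appended to T }
  Steps: array[1..4] of Integer = (2, 1, 3, 1);         { OP2, OP1, OP3, OP1 }
var
  S, T: string;
  i, k: Integer;
begin
  S := '';
  T := '';
  for i := 1 to 4 do
  begin
    k := Steps[i];
    S := S + SAdd[k];
    T := T + TAdd[k];
    WriteLn('OP', k, ':  ', S, '   ', T)    { one row of the table }
  end;
  if S = T then
    WriteLn('S = T')
  else
    WriteLn('S <> T')
end.
```

After the fourth step both strings are aabbababb, so the last line printed is S = T.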

c) But now suppose:

OP1(S) = Saaba
OP1(T) = Taab

OP2(S) = Sabb
OP2(T) = Tbaab

OP3(S) = Sbbab
OP3(T) = Tabba

Now is there a sequence of operations 1-3 applied
to initially empty S & T such that eventually S=T?
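
One way to explore such a question is to let the computer try sequences of operations. The sketch below is a depth-limited search under stated assumptions: the names MaxDepth, IsPrefix, Search, and Path are hypothetical, and the limit of 10 operations is arbitrary. It abandons a sequence as soon as neither string is a prefix of the other, since such a pair can never become equal by appending more text.

```pascal
program PcpSearch;
{ Illustrative sketch only: MaxDepth, IsPrefix, Search, and Path are
  hypothetical names; the depth limit of 10 is an arbitrary assumption. }
const
  MaxDepth = 10;
  SAdd: array[1..3] of string = ('aaba', 'abb', 'bbab');   { appended to S }
  TAdd: array[1..3] of string = ('aab', 'baab', 'abba');   { appended to T }
var
  Path: array[1..MaxDepth] of Integer;   { operations applied so far }

{ True if A is a prefix of B. }
function IsPrefix(const A, B: string): Boolean;
begin
  IsPrefix := Copy(B, 1, Length(A)) = A
end;

{ S and T were built by the first Depth operations in Path.
  Returns True if they match or can be extended to a match. }
function Search(const S, T: string; Depth: Integer): Boolean;
var
  k, i: Integer;
  NewS, NewT: string;
begin
  Search := False;
  if (Depth > 0) and (S = T) then
  begin
    Write('Match with operations:');
    for i := 1 to Depth do
      Write(' OP', Path[i]);
    WriteLn('  giving ', S);
    Search := True;
    Exit
  end;
  if Depth = MaxDepth then
    Exit;                                { give up on this branch }
  for k := 1 to 3 do
  begin
    NewS := S + SAdd[k];
    NewT := T + TAdd[k];
    { Continue only while one string is a prefix of the other. }
    if IsPrefix(NewS, NewT) or IsPrefix(NewT, NewS) then
    begin
      Path[Depth + 1] := k;
      if Search(NewS, NewT, Depth + 1) then
      begin
        Search := True;
        Exit
      end
    end
  end
end;

begin
  if not Search('', '', 0) then
    WriteLn('No match found using at most ', MaxDepth, ' operations')
end.
```

Note that a search like this can confirm a match that it finds, but a "no match found" message only covers sequences of at most MaxDepth operations; since the PCP is not computable, no fixed limit is enough for every set of operations.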

Answer: Next time!