CSE 111, Fall 2000

Great Ideas in Computer Science

Lecture Notes #17

PROGRAMMING IN PASCAL:
TEXT PROCESSING

1.  The Great Ideas of Computer Science So Far

1.  Insight I:  Can represent anything that a computer
                    can deal with using only 2 nouns:  0,1

2.  Insight II:  Can do anything that a computer can
                    do, using only 5 actions:
                    *    either Karel's 5 basic instructions
                         or the 5 instructions of a Turing
                         machine (to be discussed later)

3.  Insight III:  Can express all complex actions, using
                      only 3 rules of grammar:
                      *    sequence (s1;s2)
                      *    selection (choice):  (if-then-else)
                      *    repetition (while-do)

4.  Insight IIIA:  Can name new instructions defined
                        using insights II & III.

5.  The notion of an "algorithm":
    *    a procedure for solving a problem, such that
         it is unambiguous and effective,
         where "effective" means:  it is correct
         & it halts.

6.  Algorithms can be implemented in computer
     programs:
    *    Input (including data) is given to the
          computer running the program that
          implements the algorithm,
         then the program processes the data,
         and the computer provides output

    *    I/P  -->  algorithm (program) --> O/P

7.  Decision Trees:
    *    How to get a computer to help you solve a
        problem

    *    Note that the programmer must know
        (& give to the computer) all the information!
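
Insights III and IIIA in the list above can be sketched in Pascal.
This is only an illustration:  the program name, the procedure
CountBlanks, and the sample text are made up for this example, and
it assumes a Pascal with a built-in "string" type (as in Turbo Pascal
or Free Pascal).

```pascal
program ThreeRules;
{ Illustration only: all names and the sample text are hypothetical. }

var
  line           : string;
  blanks, others : integer;

{ Insight IIIA: give a name to a new instruction that is built
  from the three rules of grammar. }
procedure CountBlanks(s : string; var nBlanks, nOthers : integer);
var
  k : integer;
begin
  nBlanks := 0;                    { sequence: s1; s2; s3 }
  nOthers := 0;
  k := 1;
  while k <= Length(s) do          { repetition: while-do }
  begin
    if s[k] = ' ' then             { selection: if-then-else }
      nBlanks := nBlanks + 1
    else
      nOthers := nOthers + 1;
    k := k + 1
  end
end;

begin
  line := 'great ideas in computer science';
  CountBlanks(line, blanks, others);
  writeln('blanks: ', blanks, '   other characters: ', others)
end.
```

Once CountBlanks has been defined, the main program can use it as if
it were a single built-in instruction.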
 

2.  Great Idea #8:

8.  Computers can manipulate non-numeric data.
    e.g., textual data.

e.g.)    A word processor (such as MS Word) or a text
          editor (such as Pico)

a)   Think of text as a single, very long string of
    alphabetical letters, numerals (0, ..., 9), the blank
    space, symbols, <return>, <tab>, etc.

b)    Some word-processing operations:

    insert new text:
        s1 s2 --> s1 s3 s2
        e.g.)    'abde' --> 'abcde'

    delete text:
        s1 s2 s3 --> s1 s3
        e.g.)    'abxcd' --> 'abcd'

    move text (= delete from one place
                        & insert in another)
        s1...s2...s3...s4 --> s1...s3...s2...s4
        e.g.)    'acbd' --> 'abcd'

    copy text (=insert)
        s1...s2...s3 -->    s1...s2...s2...s3
        e.g.)    'abc' --> 'abbc'

    replace text (=delete old string & insert new one)
        s1 --> s2
        e.g.)    'axc' --> 'abc'

    concatenate text (=join, or insert):
        s1, s2 --> s1s2
        e.g.)    'ab', 'cd' --> 'abcd'

    etc.
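
The operations above can be tried out in Pascal.  The following is a
sketch, not a definitive implementation:  it assumes a Turbo Pascal /
Free Pascal style "string" type with the built-in routines Insert,
Delete, Copy, and the + operator for concatenation.  The variable
names s and piece are made up for this example.

```pascal
program TextOps;
{ Sketch only: assumes the built-in string routines of
  Turbo Pascal / Free Pascal. }

var
  s, piece : string;

begin
  { insert new text: 'abde' --> 'abcde' }
  s := 'abde';
  Insert('c', s, 3);               { put 'c' in front of position 3 }
  writeln(s);

  { delete text: 'abxcd' --> 'abcd' }
  s := 'abxcd';
  Delete(s, 3, 1);                 { remove 1 character at position 3 }
  writeln(s);

  { move text = delete from one place, insert in another:
    'acbd' --> 'abcd' }
  s := 'acbd';
  piece := Copy(s, 2, 1);          { remember the 'c' }
  Delete(s, 2, 1);                 { s is now 'abd' }
  Insert(piece, s, 3);             { s is now 'abcd' }
  writeln(s);

  { copy text: 'abc' --> 'abbc' }
  s := 'abc';
  piece := Copy(s, 2, 1);          { the 'b' }
  Insert(piece, s, 2);
  writeln(s);

  { replace text = delete old string, insert new one:
    'axc' --> 'abc' }
  s := 'axc';
  Delete(s, 2, 1);
  Insert('b', s, 2);
  writeln(s);

  { concatenate text: 'ab', 'cd' --> 'abcd' }
  s := 'ab' + 'cd';
  writeln(s)
end.
```

Each writeln should show the right-hand side of the corresponding
example above.  Note that move, copy, and replace need nothing beyond
Copy, Delete, and Insert.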
 

3.  The Post Correspondence Problem

a)    This is a mathematical puzzle involving
        textual manipulation that is useful for
        exploring the limits of computation:

        *    The PCP is not computable
            (exactly what that means will be explained
            a bit later)

b)    Given 2 strings (call them S, T)

            i.e.) var S,T : varying [infinity] of char;

        Given some string-manipulation operations:

            e.g.)
            OP1:  replace S by Sb
                    (i.e., concatenate b onto the end of S)
                    replace T by Tabb

            Notation:  OP1(S) = Sb
                            OP1(T) = Tabb

            OP2(S) = Saa
            OP2(T) = Ta

            OP3(S) = Sbabab
            OP3(T) = Tab

        Notes:
        1.  There can be different operations, and
            even different numbers of operations;
            this is just an example.
        2.  An open issue:  How to code this in Pascal

        Assume that initially, S & T are empty

            i.e.)  using Pascal notation to be introduced
                    soon:

                    S := '';
                    T := ''

        Question:  Is there a sequence of one or more
                        applications of operations 1-3 above
                        such that eventually S=T?
                        (With no operations at all, S=T
                        trivially, since both are empty.)

        Answer:  Yes:

                S                T
                ---------        ---------
                ''               ''
        OP2:    aa               a
        OP1:    aab              aabb
        OP3:    aabbabab         aabbab
        OP1:    aabbababb        aabbababb
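
The sequence OP2, OP1, OP3, OP1 can be checked by machine.  This is
one possible sketch, not the only way to code it:  it uses the
"string" type of Turbo Pascal / Free Pascal in place of
"varying [infinity] of char", and ApplyOp is a made-up helper name.

```pascal
program PostExample;
{ Sketch only: S and T are ordinary strings here, so they are
  limited in length, unlike "varying [infinity] of char". }

var
  S, T : string;

{ ApplyOp (hypothetical name) performs operation number op:
  it appends that operation's pieces to both S and T. }
procedure ApplyOp(op : integer);
begin
  case op of
    1: begin S := S + 'b';     T := T + 'abb' end;
    2: begin S := S + 'aa';    T := T + 'a'   end;
    3: begin S := S + 'babab'; T := T + 'ab'  end
  end;
  writeln('OP', op, ':  ', S, '   ', T)
end;

begin
  S := '';
  T := '';
  ApplyOp(2);
  ApplyOp(1);
  ApplyOp(3);
  ApplyOp(1);
  if S = T then
    writeln('S = T')
  else
    writeln('S <> T')
end.
```

Each operation always changes S and T together; that is why a single
procedure call updates both strings.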

c)  But now suppose:

    OP1(S) = Saaba
    OP1(T) = Taab

    OP2(S) = Sabb
    OP2(T) = Tbaab

    OP3(S) = Sbbab
    OP3(T) = Tabba

Now is there a sequence of one or more operations 1-3
applied to initially empty S & T such that eventually S=T?
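
A question like this can be explored by machine with a depth-limited
search:  try every sequence of operations up to some maximum length,
giving up on a sequence as soon as neither string is an initial
segment of the other.  The following is only a sketch.  MaxDepth and
all the identifiers are assumptions made for this illustration, and
it relies on the typed constants and "string" type of Turbo Pascal /
Free Pascal.

```pascal
program PostSearch;
{ Sketch only: a depth-limited search.  MaxDepth and all
  identifiers are hypothetical choices for this illustration. }

const
  NumOps   = 3;
  MaxDepth = 8;

type
  PieceList = array[1..NumOps] of string;

const
  SPiece : PieceList = ('aaba', 'abb',  'bbab');   { added to S }
  TPiece : PieceList = ('aab',  'baab', 'abba');   { added to T }

var
  found : boolean;

{ True if one string is an initial segment of the other. }
function Compatible(S, T : string) : boolean;
begin
  if Length(S) <= Length(T) then
    Compatible := (Copy(T, 1, Length(S)) = S)
  else
    Compatible := (Copy(S, 1, Length(T)) = T)
end;

{ Try every operation from the current S and T; history records
  the operation numbers used so far. }
procedure Search(S, T, history : string; depth : integer);
var
  op : integer;
begin
  if (depth > 0) and (S = T) then
  begin
    writeln('S = T = ', S, ' using operations', history);
    found := true
  end
  else if (depth < MaxDepth) and Compatible(S, T) then
    for op := 1 to NumOps do
      if not found then
        Search(S + SPiece[op], T + TPiece[op],
               history + ' ' + Chr(Ord('0') + op), depth + 1)
end;

begin
  found := false;
  Search('', '', '', 0);
  if not found then
    writeln('No match using at most ', MaxDepth, ' operations')
end.
```

A search like this only examines sequences of at most MaxDepth
operations, so a "no match" report by itself says nothing about
longer sequences.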

Answer:  Next time!


Copyright © 2000 by William J. Rapaport (rapaport@cse.buffalo.edu)

file: 111F00/lecturenotes17.23oc00.html