CSE250 Final Exam Answer Key, Spring 2014
Edited for Spring 2022, adding AVL trees and changing C++ notation to Scala

(1) Chart of words and values for a=1, b=2, c=3, etc.

    Key    k    h(k) = k % 8
    ace    9        1
    bad    7        7
    bed   11        3
    bag   10        2
    ebb    9        1
    beg   14        6
    did   17        1

(a) Chaining (three snapshots: after "bed", after "ebb", and after all seven words):

    0 []-->            []-->                  []-->
    1 []-->[ace]       []-->[ace]-->[ebb]     []-->[ace]-->[ebb]-->[did]
    2 []-->            []-->[bag]             []-->[bag]
    3 []-->[bed]       []-->[bed]             []-->[bed]
    4 []-->            []-->                  []-->
    5 []-->            []-->                  []-->
    6 []-->            []-->                  []-->[beg]
    7 []-->[bad]       []-->[bad]             []-->[bad]

(b) Open address, Linear probing (after "bed", after "ebb", and after all seven):

    0 [   ]   [   ]   [   ]
    1 [ace]   [ace]   [ace]
    2 [   ]   [bag]   [bag]
    3 [bed]   [bed]   [bed]
    4 [   ]   [ebb]   [ebb]
    5 [   ]   [   ]   [did]
    6 [   ]   [   ]   [beg]
    7 [bad]   [bad]   [bad]

(c) Quadratic probing: same up to ebb (after "bag", after "ebb", and after "beg"):

    0 [   ]   [   ]   [   ]
    1 [ace]   [ace]   [ace]
    2 [bag]   [bag]   [bag]
    3 [bed]   [bed]   [bed]
    4 [   ]   [   ]   [   ]
    5 [   ]   [ebb]   [ebb]
    6 [   ]   [   ]   [beg]
    7 [bad]   [bad]   [bad]

    Now "did" is unplaceable!

(d) The tree is spindly:

    (ace)
        \
       (bad)
           \
          (bed)
          /   \
      (bag)   (ebb)
              /
          (beg)
              \
             (did)

(e) After inserting "bed":

    (ace)B
         \
        R(bad)
             \
            R(bed)

    Nil/black uncle, so rotate left.  Then inserting "bag" (a Red Aunt flip) and "ebb" gives:

         (bad)B
         /    \
     (ace)B  (bed)B
             /    \
         R(bag)  R(ebb)

    Now "beg" gives Red Aunt case, so flip:

         (bad)B
         /    \
     (ace)B  (bed)R
             /    \
         B(bag)  (ebb)B
                 /
             R(beg)

    Then "did" makes in-leaning kid with black uncle, so rotate twice:

          (bad)B
          /    \
      (ace)B  (bed)R
              /    \
          B(bag)  B(did)
                  /    \
              R(beg)  R(ebb)

(e with AVL) The insertion of "bed" causes the first rotation:

    (ace)                 (bad)
        \        ==>      /   \
       (bad)          (ace)   (bed)
           \                  /   \
          (bed)           (bag)   (ebb)

    Then at (bad) inserting "beg" makes the root imbalanced.  This is an "outer taller"
    case, so do 1 rotation:

           (bed)
          /     \
      (bad)     (ebb)
      /   \     /
  (ace)  (bag) (beg)

    The final insert "did" puts imbalance at "ebb" with "did" being "inner taller", so now
    need 2 rotations or 1 "hoist":

           (bed)
          /     \
      (bad)     (did)
      /   \     /   \
  (ace)  (bag) (beg) (ebb)

(f) Here $ marks the insertion point at the end of the last stored word overall.

    ()-->[ace bad $]

    "bed" goes at $:
    ()-->[ace bad bed $]

    "bag" causes split!
    ()-->[ace bad]
    ()-->[bed bag $]

    "ebb" goes at $:
    ()-->[ace bad]
    ()-->[bed bag ebb $]

    Now "beg" causes split:
    ()-->[ace bad]
    ()-->[bed bag]
    ()-->[ebb beg $]

    Final BALBOA after "did":
    ()-->[ace bad]
    ()-->[bed bag]
    ()-->[ebb beg did $]

(2) (a) No: If you use the indexing version, the same loop as for "at" loops thru about
    sqrt(n) nodes for each insert, giving overall time O(n\sqrt{n}).

    (b) Yes or no, with appropriate justification.  Let's read (c) first.

    (c) Yes: because the "pre-allocated" representation here guarantees that the vector
    will be disturbed only when it hits capacity and the node splits.  This causes
    O(\sqrt{n}) extra work, but only every \sqrt{n} inserts, so the total is still O(n).

    Now back to (b): it depends on whether we are assured of getting the standard behavior
    of push_back on the vector in the last node.  Since push_back is amortized with regard
    to the node vector itself, it still takes O(sqrt(n)) time to do the sqrt(n) inserts for
    each node, so the whole time stays O(n).  So "Yes" is the preferred answer.  But it is
    possible that the code for BALBOA::insert(...) only calls a vector::insert method which
    always resizes.  That would give sqrt(n) on each insert, making it "No" like (a).
    Credit given for both "yes" and "no" with appropriate justification, because it depends
    on whether /vector/ relocates on a call to /insert/ that happens to be at the end.
    Note that an insert in any other place /will/ always cause the vector to resize/relocate,
    so it's possibly an implementation-specific matter.  My preferred answer is "no", still
    O(n\sqrt{n}) time.
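    Here is a minimal Scala sketch of the point in (c), assuming pre-allocated fixed-capacity
    chunks; the names (ChunkedAppender, append, etc.) are made up for illustration and are
    not the actual BALBOA interface.  Appending into the last chunk moves no elements, and
    the only extra work is opening a fresh chunk of capacity c once every c appends, which
    is O(1) amortized and O(n) total:

    import scala.collection.mutable.ListBuffer
    import scala.reflect.ClassTag

    class ChunkedAppender[A: ClassTag](capacity: Int) {
      private val chunks = ListBuffer(new Array[A](capacity))   // each chunk pre-allocated
      private var lastUsed = 0        // number of filled slots in the last chunk

      def append(x: A): Unit = {
        if (lastUsed == capacity) {   // last chunk full: open a new pre-allocated chunk
          chunks += new Array[A](capacity)
          lastUsed = 0
        }
        chunks.last(lastUsed) = x     // O(1): no existing elements are moved or copied
        lastUsed += 1
      }

      def size: Int = (chunks.length - 1) * capacity + lastUsed
    }

    (The real BALBOA splits a full node rather than just opening a new one at the end, but
    the amortized accounting is the same: O(capacity) work at most once per capacity-many
    appends.)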
(d) It is impossible, because O(n) time here would imply the existence of an o(n log n)-time
    sorting algorithm based on comparisons, which is impossible.

(e) Yes: lim_{n->infty} f(n)^2/g(n)^2 = lim_n (f(n)/g(n))*(f(n)/g(n)) = 0*0 = 0.

(f) An AIOLI will stay "balanced" under similar conditions as a basic BST will stay
    balanced.  Then it gives runtime O(log n) for all of find, insert, and remove---again
    like a basic BST.  As explored in part 1(c) of Assignment 5, however, BALBOA needs
    settings giving O(sqrt(n)) time for inserts and removes.

(3) 1. (c) (best answer) Usually O(1) time.  The iterator already references the node so no
       "find(key)" call is needed, de-linking the node is O(1) time, and the
       successor-or-predecessor node to put in its place is /usually/ found in O(1) time.
       "Amortized" is not quite right because we don't know if the erasures will be in
       sequence.
    2. (c) (still best answer): No real change, except that the log-depth of the tree at
       least guarantees that finding the successor or predecessor will take O(log n) steps.
       The reason why an AVL tree does not disturb the "O(1) usually" time for finding the
       inorder predecessor or successor is that the inorder traversal takes 2n-2 total edge
       steps regardless of the kind of tree.
    3. (f) Guaranteed O(\sqrt{n}) time, no better, because the pop from the front always
       causes the first node's vector to re-form.
    4. (g) Never better than the guaranteed O(n) time, because inserting at the beginning
       always causes a vector to re-form itself, unlike with push_back.
    5. (a) Guaranteed O(1) time since "itr" is already on the node to erase.
    6. (d) Guaranteed O(log n) time since it uses a red-black tree.
    7. (a) Even if the BALBOA iterator class has 3 fields for node, local index, and a
       pointer to the BALBOA it's on, comparing those fields for equality is still O(1)
       time.
    8. (g) Each pop is guaranteed O(1) time, so n pops make O(n) time.
    9. (g) Each insert is O(log n') time with n <= n' <= n + (n/log n) < 2n, and since
       log(2n) = 1 + log(n), each insert is O(log n) time.  There are n/log(n) inserts, and
       iterating thru the source tree poses no time obstacle, so the overall time is O(n)
       guaranteed.
   10. (a) Since "c" is the same, we can make a new BALBOA simply by linking their nodes
       together, in O(1) time.  A bit of a trick question...(as I wrote in the original C++
       key, but even more so in Scala, where linked structures are less explicit).

    (The answers (b) and (e) were not used, but allowed partial credit with justifications.)

(4) (a) Users can be modeled by indexed lookup: a vector is best, with O(1) time lookup,
    while a linked list is O(n) time on average hence poor.  Since a tree-based map is
    guaranteed O(log n) time for lookup it is OK, while hash tables give O(1) time usually
    and so might be as good as a vector.

(b) No: indexed lookup is the one thing for which BALBOA's O(sqrt(n)) time is poor.
    Moreover, since we would be relying on an INV that the index equals the user ID, an
    erase or insert in the middle would throw everything off.

(c) For the cards with titles, a vector is now poor, since it doesn't support associative
    lookup (i.e., lookup by strings) in good time.  Likewise a linked list is poor, but a
    tree-based map with O(log n) time guaranteed and hash tables with O(1) usual time are
    competitively best.

(d) Reading the N bids into a linked list is an OK first step but not enough for the main
    task, because the bids are not sorted by user, so handling each user would take
    multiple wasted iterations thru all N.  Instead iterate once and "deal" bids to each
    user via indexed lookup on the user IDs.  Then they are grouped when you next iterate
    thru all the users.
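    Since the key's code for (e) just below is still in C++, here is a minimal Scala sketch
    of the same "dealing" loop.  The Bid and User members shown (userID, addBid) are
    assumptions for illustration, not the actual project classes, and it assumes the INV
    from (b) that user IDs run 0..U-1 and index uvec directly:

    case class Bid(userID: Int, cardTitle: String, price: Double)

    class User(val id: Int) {
      val bids = scala.collection.mutable.ArrayBuffer[Bid]()
      def addBid(b: Bid): Unit = bids += b
    }

    def dealBids(allBids: Seq[Bid], uvec: Vector[User]): Unit = {
      for (b <- allBids) uvec(b.userID).addBid(b)   // one pass; O(1) indexed lookup per bid
    }

    This is a single pass over the N bids, so O(N) time total.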
(e) The following code body assumes class Bid has size_t getUserID() and class User has
    void addBid(const Bid& b).  It can also use a const_iterator:
    (Spring 22 note: sorry, leaving this code snippet in C++)

    list<Bid>::iterator itr = bids.begin();
    while (itr != bids.end()) {
        size_t uid = (*itr).getUserID();
        uvec.at(uid).addBid(*itr);      //REQ: class Bid allows copying
        ++itr;
    }

    If you like crunchy one-liners, the first two lines of the while loop can be:

        uvec.at((*itr).getUserID()).addBid(*itr);

    However, one cannot also increment itr++ in the same line, because you wouldn't know
    which of the two dereferences *itr would be executed first---like with the two pops in
    Assignment 2.

(f) Here getUserID() is logically /const/ since it is a pure getter method.  The addBid
    method is logically a mutator of the User class.  [But, if the bids are stored by
    pointer, as vector<Bid>* mybidsp; then the body can add the bid without modifying the
    pointer itself, and "masquerade" as const.]

(g) Here is the strategy broken up into steps:
    1. Build uvec with U users, and build a set or hash "cont" with M cards.
    2. Deal the bids to the users as in (d) and (e), which is always O(N) time.
    3. For each user:
       3a) Iterate thru all that user's bids;       //will add up to O(N) time
       3b) Lookup the card in cont                  //O(1) with hash, O(log M) with set
       3c) Add the par price from the looked-up card and the bid price in the bid to
           running totals.                          //O(1) time after card lookup
       3d) Compute the P_u factor from the final running totals.   //O(1) time
    4. Make the Users into a heap according to the P_u factors from part 3.  *Or* sort them
       according to those factors all the way.
    5. Pop k times off the heap, *or* count off the first k sorted items.

    The total runtimes broken by steps depend on the choices in steps 1 and 4.  Every "O" is
    really a "Theta" as the runtimes are pretty much totally optimal:
    1:     O(U) + O(M) with hash, or O(U) + O(M*log(M)) with a red-black tree set.
    2 & 3: O(N) time overall with hash, *or* O(N*log(M)) with a set.
    4 & 5: O(U + k*log(U)) if using a heap, O(U*log(U) + k) if using sorting.
           [If you assume k <= U/log(U), these become O(U) versus O(U*log(U)).]
    Hence the total is O(U' + M' + N'), where U' can be U or U*log(U), M' can be M or
    M*log(M), and N' can be N or N*log(M) (note: log(N) is not involved).  Any answer with
    "log-linear" efficiency is full credit.

(5) I.
    def apply(i: Int): A = {
      var itr = begin()         // BALBOA#Iter has fields liat for the linked list and ind for the array
      var listitr = itr.liat    // listitr() and listitr.next() give the corresponding array chunk
      var sum = 0
      while (listitr.hasNext && sum < i) {
        sum += listitr.next().size
      }
      //now we have gone one past, so use prev
      listitr.at = listitr.at.prev    //would actually need friending to write this
      val ind = i - (sum - listitr().size)
      return listitr()(ind)
    }

    A version without having to use prev to back up:

    def apply(i: Int): A = {
      var itr = begin()         // BALBOA#Iter has fields liat for the linked list and ind for the array
      var listitr = itr.liat    // listitr() and listitr.next() give the corresponding array chunk
      var sum = 0
      while (listitr.hasNext && sum + listitr().size <= i) {
        sum += listitr().size   // count this chunk as passed...
        listitr.next()          // ...and advance to the next chunk
      }
      val ind = i - sum
      return listitr()(ind)
    }

    Running time: O(r) for the listitr part, and if r = sqrt(n), that's the overall time.
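    For reference, the same chunk walk can be written as a standalone Scala function over a
    plain List of Array chunks, so it can be tested outside the BALBOA/Iter interface;
    chunkApply is an illustrative name, and the List-of-Array representation is an
    assumption, not the project's node structure:

    def chunkApply[A](chunks: List[Array[A]], i: Int): A = {
      var rest = chunks
      var sum = 0                      // number of elements in the chunks already passed
      while (rest.nonEmpty && sum + rest.head.length <= i) {
        sum += rest.head.length        // element i lies beyond this chunk: skip it whole
        rest = rest.tail
      }
      if (rest.isEmpty) throw new IndexOutOfBoundsException(i.toString)
      rest.head(i - sum)               // local index within the chunk that contains i
    }

    As in the second version above, only whole chunks are skipped, so the walk costs O(r)
    for r chunks plus O(1) for the final local index.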
II.
    def merge(ba1: BALBOA[A], ba2: BALBOA[A]): BALBOA[A] = {
      var ba3 = new BALBOA[A](keyComp)
      var itr1 = ba1.begin
      var itr2 = ba2.begin
      while (itr1.hasNext && itr2.hasNext) {
        if (keyComp(itr1(), itr2()) < 0) {
          ba3.insert(itr1.next())
        } else {
          ba3.insert(itr2.next())
        }
      }
      //one of the two following while tests will fail immediately---that's OK!
      while (itr1.hasNext) { ba3.insert(itr1.next()) }
      while (itr2.hasNext) { ba3.insert(itr2.next()) }
      return ba3
    }

-------------C++ code, FYI.  Note: the tasks originally given were different: -----------
-------------Part I was to index in reverse order, and II merged ba1 into ba2 -----------
-------------Part I also asked for O(1) time when i is near one end of the BALBOA -------
-------------or the other.  Part II called ba1 "source" and ba2 "target". ---------------

(5) I. The following code assumes, as on the project, that each ChunkNode maintains a field
    "mySize" with its current number of elements.

    T& revat(size_t i) {
      ChunkNode* p = endNode->prev;
      if (p == NULL) {
        throw runtime_error("Container is empty.");   //or etc.
      }
      //else: p is a real node with data.
      size_t currentFloor = size() - p->mySize;
      while (i < currentFloor) {      //INV: since floor != 0 can't be first node
        p = p->prev;
        currentFloor -= p->mySize;
      }
      //now currentFloor <= i so we are in the correct node, could be firstNode.
      size_t localIndex = i - currentFloor;
      return p->elements->at(localIndex);
    }

    To make an "at(i)" function that works in O(1) time when i is near 0 or near the end,
    first compare i with size()/2, and call the original "at" or "revat" accordingly.
    Note that size() itself is always required to be O(1) time, which is achieved by
    maintaining a field with INV: equals the current size.  The return type needs to be T&
    not just T to allow for assignment too.

II. The following generic code uses no extra assumptions.

    template <typename T>
    void merge(BALBOA<T>& source, BALBOA<T>& target) {
      typename BALBOA<T>::iterator sitr = source.begin();
      while (sitr != source.end()) {
        target.insert(target.end(), *sitr);
        sitr = source.erase(sitr);      //the "sitr =" re-validates sitr
      }
    }

    That's all---the behavior of erase positions "sitr" on the next element already.  It
    was OK to do while (!(source.empty())) { ... } or just iterate thru source and empty
    it later by the "range version" source.erase(source.begin(), source.end()).

    The code can be made completely generic without even declaring the iterators:

    template <class F>    //say "class" because iterator functions are assumed
    void merge(F& source, F& target) {
      while (!(source.empty())) {
        target.insert(target.end(), *(source.begin()));
        source.erase(source.begin());
      }
    }
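    For comparison with the Scala merge in part II, here is the same two-finger pattern
    written over plain sorted Lists so it runs on its own; mergeSorted is an illustrative
    name, not part of the BALBOA API:

    def mergeSorted[A](xs: List[A], ys: List[A])(keyComp: (A, A) => Int): List[A] =
      (xs, ys) match {
        case (Nil, _) => ys                    // one side exhausted: take the rest
        case (_, Nil) => xs
        case (x :: xt, y :: yt) =>
          if (keyComp(x, y) < 0) x :: mergeSorted(xt, ys)(keyComp)
          else                   y :: mergeSorted(xs, yt)(keyComp)
      }

    For example, mergeSorted(List(1, 3, 5), List(2, 4))(_ - _) gives List(1, 2, 3, 4, 5).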