CSE 250 - Spring 2008 - Banner
   CSE 250 - Spring 2008 - Data Structures
CSE 250 - Spring 2008 - Navigation CSE 250 - Spring 2008 - Assignments: Project 2

Project 2
Last modified: March 07 2008 01:42:28 PM

Introduction

In this project, you will be implementing the creation of an index for a book. Also, you will analyze the performance of different trees in performing this task.


Project Description

You will create a program that reads in a plain text file that contains the contents of a book (i.e. it's a large file) and creates an index of all of the words in the book. Once again, you can safely ignore punctuation, and words like Cat and cat should be considered the same. At then end, you be able to produce, on demand, an index of the book that contains an alphabetical listing of all the words in the book as well as their page numbers. Your program should be able to take in multiple books and create one index. Since the file you are reading in is a plaintext file, you can create "pages" in it by using the *standard* that there are 66 lines on a page.

The data structre you will use to organize the words should be a tree. You are not expected to implement the trees, feel free to copy the tree code from the text or another source (with proper attribution). You will create your program so that it can use either an AVL tree or a traditional binary search tree.

You should create a user interface for your program. This interface should allow us to input a filename as well as print out the index. We should also be able to switch implementations of the underlying structure using the interface (from AVL to BST or reverse).

You will create a paper for this proejct as well in which you will analyze and compare performance of the AVL tree and the standard BST on several large files.

In your analysis, you should compare the running times (system time) of overall creation of the tree, average insertion time for each word, and time to find the min and max of each tree. You should also give the total number of nodes in each tree, the length of the shortest branch, and the height of the tree.

Some sample files to work with (You must analyze the four files listed here. Note that these might not be the best files for your initial testing because they are quite large.)

The index of all the books available in the gutenberg project: http://www.gutenberg.org/dirs/GUTINDEX.ALL

No one gets through War and Peace: http://www.gutenberg.org/dirs/etext01/wrnpc12.txt

Pick another book from the Gutenberg project to run you analysis on. You must identify which book you chose.

A dictionary file in alphabetical order: On nickelback/timberlake in ~adrienne/250/Project2/enable1.txt


Due dates

This project is due for all students by 11:59:59pm on Friday, March 7th Tuesday, March 11th. You should name your submission Project2.zip (therefore, you should zip all of your files up before submitting).

 

CSE 250 - Spring 2008 - Footer

 

 
Page Maintained by: Adrienne Decker