CSE 250 - Spring 2008 - Banner
   CSE 250 - Spring 2008 - Data Structures
CSE 250 - Spring 2008 - Navigation CSE 250 - Spring 2008 - Assignments: Project 3

Project 3
Last modified: April 04 2008 09:41:11 AM

Introduction

In this project, you will re-implementing the project you completed for Project 1 with some new underlying structures.


Project Description

First, you should ensure that your project 1 works correctly with an underlying linear data structure.

Then, you should pick two other data structures to use as the storage of the word frequency counts. You have a choice of any structure we have studied thus far including heaps. However, you can not use both a Binary Search Tree and an AVL Tree as your structures. You can use one of those and one of another structure we've studied. You should think about which structure would be the most interesting/effective for the processes that the project asked you to implement. You will justify your decision in a paper described below.

Your user interface for this project should include the ability to read in a filename from which to read the words as well as the ability to switch the underlying structure used for reading in the file. Also, an option should be given to print out the words and frequencies as well as an option to search if a word is present in the structure.

In this project, you should be able to read in multiple files into the same structure and still maintain the word count. There is no need to differentiate between which word the file was in, just how many times the word has been seen across all the files. Searching for the word should once again reorder the structure so that words that have been searched for most are the easiest to find, followed by the words with the greatest frequency and then alphabetical.

Paper:

In the paper, you should first discuss your original implementation from project 1 and if you had to change anything to make it work. Then, you should discuss the two structures you picked for this project and the reasons why you feel those structures would be preferrable to your original structure. This discussion should include a discussion of the big-oh running times of the basic insertion and search algorithms for all three structures. It should also demonstrate your knowledge of the insertion and searching strategies of each of these structures, including any balancing or restructuring that the structures go through while performing these operations.

The next part of the paper should perform a system time analysis on the performance of your structure analyzing the amount of time taken to insert a file into your structure. You should use files of significant size. If you are not using a binary search tree, enable1.txt is an option. Also, the books from the Gutenberg project are good choices. However, feel free to use any large file that you'd like. You should clearly state where your files came from in the paper. You should analyze two different files for the following:

  • total running time to insert the file
  • average search time for a significant portion of the words in the structure (pick some number of them and cleary state how many you chose, how you chose them, and the results of your analysis)
  • search time for the word that is farthest from the initial "starting point" of the structure (you should discuss how you determined this for each structure)

The last part of your paper should be an analysis of the data structures you chose and your pick as the best data structure of the three you picked for this task based on running time (big-oh), actual time analysis and ease of use/coding for the programmer. If you feel that perhaps another structure would actually have been even better than any of the three you chose, state that here. You need to make an argument for your choice and defend that argument. There is not really a correct answer, but rather convincing arguments for the use of a particular structure.


Due dates

This project is due for all students by 11:59:59pm on Friday, April 4th Monday, April 7th. You should name your submission Project3.zip (therefore, you should zip all of your files up before submitting). You must include a makefile with your project.

 

CSE 250 - Spring 2008 - Footer

 

 
Page Maintained by: Adrienne Decker