Data-Intensive Computing Research and Education
Project partially funded by National Science Foundation Grant NSF-DUE-CCLI-0920335



Data-intensive computing has been receiving much attention as a collective solution to the data deluge brought about by tremendous advances in distributed systems and Internet-based computing. Innovative programming models such as MapReduce, and peta-scale distributed file systems to support them, have fundamentally changed approaches to large-scale data storage and processing. These data-intensive computing approaches are expected to have a profound impact on any application domain that deals with large-scale data, from healthcare delivery to military intelligence. Given the omnipresent nature of large-scale data and its tremendous impact on a wide variety of application domains, it is imperative to ready our workforce to face the challenges in this area. This project aims to improve the big-data preparedness of a diverse STEM audience by defining a comprehensive framework for education and research in data-intensive computing.
Dr. Bina Ramamurthy is the director and principal investigator of this project.


CSE487/587 Data-Intensive Computing is a new course designed to address the big-data preparedness of our workforce.
It will be taught for the first time at the University at Buffalo in Fall 2010.


CSE487/587 Course Description

Date Topic Demo
8/30 Introduction to Data-intensive computing
9/1 Relevance of Cloud computing to Data-intensive computing TheCloud
9/3 Defining data-intensive computing DataInt
9/8 Searching with Lucene; project 1 discussion Lucene
9/10 Project 1 discussion (contd.): indexer demo; web crawler code analysis to understand its operation in aggregating content; bean shell demo of the ch. 2 material A simple UML tool
9/13 ET1: Web services; Let's discuss a design for Project 1 ET1:WS Pr1 Design
9/15 ET2: Multi-threading; Let's discuss a simple project structure; Project 1 groups; MapReduce/HDFS Mr.HDFS Demo
9/24 Running Hadoop MapReduce on EC2
9/29 More on MapReduce and HDFS MapRedEngine
Movie data for MRHDFS applications Data
10/1 Demo day: Data-intensive computing with Amazon EC2 AMI launch, connect, application deploy
10/4 Virtualization Virtual
10/6 Work on your project 1; attend Career Day
10/8 How do they do it? The Cloud Cos. (Fri: Demo day) More on virtualization (demo) + services
10/11 Project 2: Data Structures and Algorithms for Data-intensive Computing: Working with Amazon EC2 Prj2.pdf
10/13 Review for exam; Data-intensive computing @ Bloomberg: an informal chat with our alumni at Bloomberg
10/15 Midterm Review; ET: Infrastructure management review
10/18 Project 1 Due; Amazon AWS Explained EC2 services
10/20 More Amazon AWS; review for the exam; How to study? AWS
10/22 Exam 1 during class time
10/25,27 Cloud cost model; metrics Metrics
10/29 Let's discuss MapReduce & MR for K-means MR.Tutorial
11/1 Deployment on the cloud CloudDeploy
11/3 Microsoft Windows Azure Azure
11/5 Apache (Yahoo) Pig Pig
11/8 Applying data-intensive methods to Classification DAC
11/22 Hive on Hadoop from Facebook Hive
11/29 Data-intensive Computing Impact Area #1 Bioinformatics
12/1 Data-intensive Computing Impact Area #2 FinancialEng
Review for final exam FinalRev
Presentation Order PresOrder
12/6 Presentations PresDec6
12/8 Presentations PresDec8
12/6,8,10 Presentation of your project (choice of 1 or 2)


Project 1: Basic Operations in data-intensive computing