Data-intensive computing has received much attention as a collective solution to the data deluge brought about by tremendous advances in distributed systems and Internet-based computing. Innovative programming models such as MapReduce, and the peta-scale distributed file systems that support them, have fundamentally changed approaches to large-scale data storage and processing. These data-intensive computing approaches are expected to have a profound impact on any application domain that deals with large-scale data, from healthcare delivery to military intelligence. Given the omnipresent nature of large-scale data and its impact on a wide variety of application domains, it is imperative to prepare our workforce for the challenges in this area. This project aims to improve the big-data preparedness of a diverse STEM audience by defining a comprehensive framework for education and research in data-intensive computing.
Dr. Bina Ramamurthy is the director and principal investigator of this project.

CSE487/587 Data-Intensive Computing is a new course designed to address the big-data preparedness of our workforce.

CSE487/587 Course Description

Date	Topics	Lecture material	Demos/reading material
1/15	Data-intensive computing	What is it?
1/17	Two case studies: one from Fourth Paradigm text; another from Bioinformatics	DI:Climate DI: BioInf
1/21	Introduction to MapReduce framework and Hadoop Ecosystem	MR Parallel Processing
1/24	Attend Distinguished Speaker's talk	John H. Reppy, The University of Chicago, Diderot: A Parallel Domain-Specific Language for Image Analysis and Visualization	3:30-4:30 PM, 101 Davis
1/29	Hadoop infrastructure	HDFS Getting started with Hadoop Yahoo Hadoop	T. White's MR Prelim Prj1
1/31	Introduction to CCR Hadoop Cluster	Hands-on tutorial; please bring your laptops	CCRIntro HadoopHowTo
2/5	Inside MapReduce: programming in MapReduce	Dean & Ghemawat's paper; Lin and Dyer (LD) Ch.1 and Ch.2	Ubuntu, Helios, MR Perspective
2/7	MapReduce Algorithm design	Ch.3 (LD)	Complete data aggregation by Tue 2/12
2/13	Continue with Ch.3: Best practices for MR algorithm design	Design patterns for MR: Pairs and stripes	Demo of co-occurrence using pairs and stripes
2/15	MR best practices continued
2/18	Exam review; Let's apply the MR best practices to Text Retrieval	Exam review Inverted Index Ch.4 (LD)	See Yahoo tutorial chapter 4
2/21	MR.Graph algorithms	GraphAlg	Prj1Addendum
2/28	Prj2 discussion; continue MR.Graph and MR.RevIndex	InvertedIndexMore	Prj2
3/5	Midterm exam
3/7	MR.Graphs (continued); MR.Classification	MR.Classification
3/9-16	Spring Break
3/19	Graph processing; PageRank	Graph.PageRank (updated with PageRank details)
3/21	Introduction to Amazon AWS; esp. security model	AWS
3/26	Hive	MR.Hive
3/28	Fundamentals of Security	Security
3/28	Declarative script for MR: Pig	Pig
4/2	Pig continued	Developing Pig Script. Demo on aemr.
4/4	Design of Pig	DP.MR
4/9	NoSQL Database: HBase	Pig; Some old material HbaseIntro
4/11	HBase (contd.)	HBase
4/16	HBase (contd.)	Web services	HBase on EMR
4/18	KTAC; review for exam
4/23	KTAC; review for exam
4/25	Exam 2