Data-intensive
computing has been receiving much attention as a collective
solution to the data deluge brought about by tremendous
advances in distributed systems and Internet-based computing.
Innovative programming models such as MapReduce, together with
the peta-scale distributed file systems that support them, have
fundamentally changed approaches to large-scale data storage
and processing. These data-intensive computing approaches are
expected to have a profound impact on any application domain
that deals with large-scale data, from healthcare delivery to
military intelligence. A new forum called the big-data
computing group has been formed by a consortium of industrial
stakeholders and agencies, including NSF and the CRA (Computing
Research Association), to promote wider dissemination of
big-data solutions and to transform mainstream applications.
Given the omnipresent nature of large-scale data and the
tremendous impact it has on a wide variety of application
domains, it is imperative to prepare our workforce to face the
challenges in this area.
This project aims to improve
the big-data preparedness of a diverse STEM
audience by defining a comprehensive framework for education
and research in data-intensive computing.
Dr. Bina Ramamurthy is the
principal investigator of this project, and Dr. John
Benschoten and Dr. Vipin Chaudhary are the co-PIs.
A new certificate program in data-intensive computing
has been
approved by SUNY and is offered by the University.
The details are on the undergraduate
catalog page here.
Invited Presentations
International Conference on Advances and Emerging Trends in Computing Technologies (ICAET 2010), Chennai, India:
6/23/2010: Women-in-Computing Conference: Data-intensive Computing
6/24/2010: Cloud: The Next Generation Computer
CUBRC and General Dynamics, 11/4/2010: Cloud Computing: Capabilities and Limitations
Monroe Community Library System (Rochester, NY), 12/2/2010: Cloud Computing for Small Enterprises
WIPRO, Chennai, India, 6/28/2011: Cloud Computing: Concepts, Technologies, and Business Implications
Erie Community College, 12/5/2011: Computer Science Education Week: Data-intensive Computing on the Cloud: Concepts, Technology, and Applications
Niagara Community College, 2/7/2012: Abstract: Data-intensive Computing: Concepts, Technologies, and Applications
Metropolitan State University Symposium on Big Data Science and Engineering, 10/19/2012: Adopting Big Data Across Undergraduate Curriculum (CCR run)
Grants Related to Big Data
Participant in a grant awarded to CUBRC, Buffalo, NY (1 summer month + 1 graduate student)
Participant in a grant awarded to Industrial and Systems Engineering (1 academic month + 1 graduate student support)
Participant in two NY state-level (suny.edu) Innovative Instructional Technology Grants (nominal stipend)
Undergraduate Researchers:
Bich Vu: Working on security concepts
Andrew Small (Ingram Micro)
Xiang Lin (CSTEP student): Exploring Gutenberg: Text Analysis Using MapReduce
Austin Miller (Honors Thesis): A Methodology for Transforming Common Algorithms to the MapReduce Framework (Google)
Regina May (CSTEP researcher): Disaster Recovery on the Cloud (M&T Bank)
Mohit Bansal (Honors Thesis): Extracting Information from Large-scale Data Using Probabilistic Methods (J.P. Morgan). (Research presented at the Conference on Academic Excellence, April 2010, Buffalo, NY.)
Graduate Researchers:
Rikson Vareed: MasterSolver: Parallel Processing of Large-Scale Tree Graphs (deals with billion-node graph and tree processing)
Ying Yang: Analysis of Twitter data (ongoing)
Suhani Gupta: Evaluation of HBase for large-scale data (ongoing)
Eric Nagler: Lucene Indexing of Wikipedia Data Using MapReduce (CUBRC)
Abhishek Agarwal: MOPS: A Modified Priority Scheduler for Improved Resource Utilization, Cluster 2010
Hingsik Kim: Pop!World: An Evolutionary Biology Tool (deployed on the Google App Engine)
Amol Agarwal: Hosting Applications on the Cloud
Laboratory Facility
An excellent facility for running Hadoop MapReduce
jobs on a real cluster of hundreds of nodes is available at
the Center for Computational Research at Buffalo. See CCR Buffalo.
A five-node Hadoop-based system, NEXOS, was built from old
commodity machines as a student research project. (This
prototype was used in a demo at CCSCNE 2009, with remote access to
Buffalo from Plattsburgh, NY.)
We use the Amazon EC2 MapReduce workflow and also the Google
App Engine for the projects.
Students and users can also install and use a standalone
two-node cluster on their laptop or desktop.
Getting started:
Hadoop Distributed File System (HDFS) and MapReduce
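As a first exercise, the classic word-count job illustrates the MapReduce model. Below is a minimal sketch in the Hadoop Streaming style (mappers emit key/value pairs, the framework sorts by key, and reducers aggregate each key's values); this is an illustrative local simulation, not part of the course materials, and the function names are our own.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map step: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.strip().lower().split():
            yield (word, 1)

def reducer(pairs):
    """Reduce step: pairs arrive sorted by key (as Hadoop guarantees after
    the shuffle); sum the counts for each distinct word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

def word_count(lines):
    """Local simulation of the MapReduce pipeline: map, shuffle (sort by
    key), then reduce."""
    return dict(reducer(sorted(mapper(lines))))

if __name__ == "__main__":
    # With real Hadoop Streaming, the mapper and reducer would be separate
    # scripts reading sys.stdin and writing tab-separated pairs to stdout;
    # here we simply run the whole pipeline locally on stdin.
    for word, count in sorted(word_count(sys.stdin).items()):
        print(f"{word}\t{count}")
```

In a real deployment, the input would live in HDFS and Hadoop would run many mapper and reducer instances in parallel across the cluster; the sorted shuffle between them is what the `sorted()` call stands in for here.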
References and useful
links
Contact bina@buffalo.edu for more information.