UB - University at Buffalo, The State University of New York Computer Science and Engineering

CSE 726: Data Intensive Distributed Computing

This page refers to the Spring 2011 offering of CSE 726 only. The information on this page does not necessarily apply to every offering of CSE 726.

Spring 2011

14719

Design, implementation and deployment challenges of Data Intensive applications on clustered, grid, and cloud infrastructures.

Scientific applications and experiments in all areas of science are becoming increasingly complex and more demanding in their computational and data requirements. Large experiments and simulations in fields such as genome mapping, climate modeling, astrophysics, health sciences, and high-energy physics generate data volumes reaching thousands of terabytes per year. As scientific applications become more data intensive, managing data resources and the dataflow between storage and compute resources is becoming the main bottleneck. Analyzing, visualizing, and disseminating these large data sets has become a major challenge, and Data Intensive Computing is now considered the “Fourth Paradigm” of scientific discovery, after the empirical, theoretical, and computational branches of scientific thought. This seminar will discuss state-of-the-art research, development, and deployment efforts in running Data Intensive Computing workloads on clustered, grid, and cloud infrastructures. Each week we will read and discuss two papers in one of the following areas:

· Parallel Cluster File Systems
· Wide Area Distributed File Systems
· Wide Area Data Placement and Optimization
· Cloud and Cluster Scheduling
· MapReduce Improvements
· Scalable Data Placement
· Remote Data Access
· Global Scale Distributed Testbed Design

None presently required.

Ph.D.: This course does not fulfill core area or core course requirements.

M.S.: This course does not fulfill core area or core course requirements.
