CSE4/587 Spring 2018
Welcome to Spring 2018 and the Data-intensive Computing course. This course covers topics relevant to the emerging areas of Data Science and Data-intensive Computing. Data Science deals with data acquisition, cleaning, exploratory data analysis, statistical modeling, algorithmic data processing, knowledge extraction, prediction, and prescriptive analytics. Data-intensive computing deals with the computing aspects, such as the infrastructure, big-data architectures, data structures, and algorithms, that enable Data Science. We will cover both aspects in this course.
The main textbook for the course is:
F. Cady. The Data Science Handbook. Wiley. 2017. ISBN: 9781119092940.
We will be using many other references, online sources and textbooks throughout the semester. The details will be provided in the References tab.
A broad overview of the topics to be covered is given below.
Introduction to Data. Data Acquisition and cleaning. Exploratory Data Analysis (EDA) using the R language and Python. Data Visualization. Statistical modeling. Algorithms for big-data processing. Databases for small data and big data. Infrastructures for big data (Hadoop ecosystem). High-speed, scalable big-data processing (Spark ecosystem). Computing on the cloud. Basics of blockchain technology. Research issues in Data-intensive computing.
All concepts discussed during the lectures will be reinforced by three labs.