• From Robot Motion Planning to Modeling Structures and Motions of Biological Molecules (4 hours)

  • Primary contact: Amarda Shehu amarda@cs.gmu.edu


    Other Presenters: Juan Cortés and Nurit Haspel


    In the last two decades, great progress has been made in molecular modeling through robotics-inspired computational treatments of biological molecules. Deep mechanistic analogies between articulated robots and biomolecules have allowed robotics researchers to adapt methods originally developed for the robot motion planning problem to elucidate the relationship between macromolecular structure, dynamics, and function in computational structural biology.


    Tight coupling of robot-motion-planning approaches with computational physics and statistical mechanics has resulted in powerful methods capable of elucidating protein-ligand binding, the order of secondary structure formation in protein folding, kinetic and thermodynamic properties of folding and equilibrium fluctuations in proteins and RNA, loop motions in proteins, small-scale and large-scale motions in multimodal proteins transitioning between different stable structures, and more.


    The objective of this tutorial is to introduce the broad community of researchers and students at ACM BCB to robotics-inspired treatments and methodologies for modeling structures and motions of biomolecules. A comprehensive review of the current state of the art, ranging from the probabilistic roadmap approach to tree-based approaches, will be accompanied by detailed highlights and software demonstrations of representative recent robotics-inspired methods for peptides, proteins, and RNA.
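
    To give a concrete flavor of the roadmap idea, the toy Python sketch below builds a probabilistic roadmap over a two-dimensional configuration space. The feasibility test is a placeholder for the steric or energetic checks a molecular application would use, and a full PRM would additionally verify that the local path connecting two samples is itself feasible.

      import math
      import random

      def is_feasible(q):
          # Toy feasibility test: reject samples inside a disk "obstacle".
          # A molecular application would instead check steric clashes or
          # an energy threshold.
          return (q[0] - 0.5) ** 2 + (q[1] - 0.5) ** 2 > 0.04

      def dist(a, b):
          return math.hypot(a[0] - b[0], a[1] - b[1])

      def build_roadmap(n_samples=200, radius=0.2, seed=0):
          rng = random.Random(seed)
          nodes = []
          while len(nodes) < n_samples:
              q = (rng.random(), rng.random())  # sample a random configuration
              if is_feasible(q):                # keep only feasible samples
                  nodes.append(q)
          edges = {i: [] for i in range(len(nodes))}
          for i in range(len(nodes)):           # connect nearby feasible pairs
              for j in range(i + 1, len(nodes)):
                  if dist(nodes[i], nodes[j]) < radius:
                      edges[i].append(j)
                      edges[j].append(i)
          return nodes, edges

      nodes, edges = build_roadmap()
      print(len(nodes), "nodes,", sum(len(v) for v in edges.values()) // 2, "edges")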


  • Protein function prediction: formulation, methodology, evaluation, and challenges (2 hours)

  • Primary contact: Predrag Radivojac predrag@indiana.edu


    Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in the context of human disease because many conditions arise as a consequence of alterations of protein function. The recent availability of relatively inexpensive sequencing technology has resulted in thousands of complete or partially sequenced genomes with millions of functionally uncharacterized proteins. Such a large volume of data, combined with the lack of high-throughput experimental assays to functionally annotate proteins, contributes to the growing importance of automated function prediction. In this tutorial, I intend to discuss difficulties in experimentally characterizing protein function, motivate and precisely formulate the protein function prediction problem, and then present the main computational approaches published thus far. Such methods utilize a variety of data types, such as protein sequence, protein-protein interactions, gene expression, protein structure, etc. Function prediction will be discussed mainly at the level of the entire protein molecule and from different viewpoints (biochemical, biological, phenotypic). Some time will be devoted to discussing metrics for evaluating protein function prediction. Finally, I will introduce the CAFA challenge dedicated to evaluating protein function prediction and briefly discuss the next CAFA challenge, whose start is anticipated for the summer of 2013.
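
    As one concrete example of such a metric, the Python sketch below computes a protein-centric maximum F-measure (Fmax) of the kind popularized by CAFA. The thresholding and averaging details here are a simplified assumption for illustration, not necessarily the exact CAFA evaluation protocol.

      def fmax(predictions, truth, thresholds=(0.1, 0.3, 0.5, 0.7, 0.9)):
          # predictions: {protein: {GO term: confidence score}}
          # truth:       {protein: set of experimentally annotated GO terms}
          best = 0.0
          for t in thresholds:
              precisions, recalls = [], []
              for protein, true_terms in truth.items():
                  scores = predictions.get(protein, {})
                  pred = {g for g, s in scores.items() if s >= t}
                  if pred:  # precision averaged over predicted proteins only
                      precisions.append(len(pred & true_terms) / len(pred))
                  recalls.append(len(pred & true_terms) / len(true_terms))
              if precisions:
                  pr = sum(precisions) / len(precisions)
                  rc = sum(recalls) / len(recalls)
                  if pr + rc > 0:
                      best = max(best, 2 * pr * rc / (pr + rc))
          return best

      truth = {"P1": {"GO:0003677", "GO:0005634"}}
      preds = {"P1": {"GO:0003677": 0.8, "GO:0016301": 0.4}}
      print(round(fmax(preds, truth), 3))  # 0.667, reached at threshold 0.5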


  • Creating Bioinformatic Workflows within the BioExtract Server Leveraging iPlant Resources (2 hours)

  • Primary contact: Carol Lushbough Carol.Lushbough@usd.edu


    Other Presenter: Rio Dooley


    To help researchers handle the vast quantities of biological data generated by high-throughput experimental technologies, the BioExtract Server (bioextract.org) has leveraged iPlant Collaborative (www.iplantcollaborative.org) functionality to address big-data storage and analysis issues in bioinformatics. The BioExtract Server is a Web-based, workflow-enabling system that offers researchers a flexible environment for analyzing genomic data. It allows researchers to save a series of BioExtract Server tasks (e.g., query a data source, save a data extract, and execute an analytic tool) as a workflow, and to share their data extracts, analytic tools, and workflows with collaborators. The iPlant Collaborative is a community of researchers, educators, and students working to enrich science through the development of cyberinfrastructure (the physical computing resources, collaborative environment, virtual machine resources, and interoperable analysis software and data services that are essential components of modern biology). The iPlant Foundation API, developed through the iPlant Collaborative, is a hosted, Software-as-a-Service resource providing access to a collection of High Performance Computing (HPC) and cloud resources. Leveraging the iPlant Foundation API, the BioExtract Server gives researchers easy access to multiple high-performance computers and delivers computation and storage as dynamically allocated resources via the Internet.
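
    The Python sketch below conveys the general pattern of submitting a job to a hosted HPC web service of this kind. The endpoint URL, payload fields, token, and response shape are hypothetical placeholders and do not reproduce the actual iPlant Foundation API; consult the iPlant documentation for the real interface.

      import requests

      API_BASE = "https://hpc.example.org/api"  # hypothetical endpoint
      TOKEN = "YOUR_ACCESS_TOKEN"               # placeholder credential

      def submit_job(tool_name, input_url):
          # Hypothetical request/response shape, for illustration only.
          resp = requests.post(
              f"{API_BASE}/jobs",
              json={"software": tool_name, "inputs": {"query": input_url}},
              headers={"Authorization": f"Bearer {TOKEN}"},
              timeout=30,
          )
          resp.raise_for_status()
          return resp.json()["id"]

      # job_id = submit_job("blastn", "https://example.org/extract.fasta")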


  • Transcriptome Assembly and Analysis using RNA-Seq (2 hours)

  • Primary contact: Dongxiao Zhu dzhu@wayne.edu


    Other Presenters: Tin Nguyen and Nan Deng


    The ever-increasing accumulation of RNA-seq data demands easy-to-use, reliable, and scalable analysis software. RNA-seq data are frequently available in standard file formats, such as FASTQ, BED, WIG, and SAM/BAM, and RNA-seq software must be compatible with these formats. There have been major community efforts on algorithm and software development, and three major categories of tools are currently available: (1) transcriptome assembly tools, where ab initio assemblers depend on a reference genome and annotation while de novo assemblers do not; (2) transcriptome quantification tools, which use either read counts or exon coverage signals; and (3) transcriptome comparison tools. In this tutorial, we will provide a comprehensive review of the existing computational methods and tools in each category. We will also provide sample RNA-seq data analyses using selected tools from each category.
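
    As a small illustration of one of these formats, the following Python sketch reads FASTQ, the simplest of the file types listed above (four lines per record). Binary formats such as BAM would instead be handled with established tools and libraries.

      def read_fastq(path):
          # FASTQ records are four lines: @header, sequence, '+', qualities.
          with open(path) as fh:
              while True:
                  header = fh.readline().rstrip()
                  if not header:
                      break                      # end of file
                  seq = fh.readline().rstrip()   # nucleotide sequence
                  fh.readline()                  # '+' separator line
                  qual = fh.readline().rstrip()  # Phred quality string
                  yield header[1:], seq, qual

      # for name, seq, qual in read_fastq("reads.fastq"):
      #     print(name, len(seq))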


  • Hands-on Experience with the MIMIC II Database: An Open-Access Database for Knowledge Discovery and Reasoning in Critical Care (2 hours)

  • Primary contact: Mengling 'Mornin' Feng mfeng@mit.edu


    Other Presenters: Thomas Brennan, Leo Anthony Celi and Roger G. Mark


    Since 2003, our group has been building the Multi-parameter Intelligent Monitoring in Intensive Care II (MIMIC II) Database, which now holds data from about 40,000 ICU admissions. MIMIC II contains vital sign time series, lab results, imaging results, records of medication and fluid administration, staff notes, demographic data, and more. Multichannel waveform data is available for a subset of patients. MIMIC II has been freely shared with the research community via PhysioNet (http://physionet.org/mimic2), and we currently have over 600 users in 32 countries.


    This hands-on tutorial aims to introduce MIMIC II to the ACM-BCB community. MIMIC II is a valuable resource for research in:

    1. Computational Epidemiology/Evidence-based Medicine
    2. Knowledge Representation and Visualization
    3. Databases and Ontologies
    4. Text Mining and Natural Language Processing
    5. Machine Learning and Signal Processing Methods for Physiological Data

    In this tutorial, we will introduce "The Story Behind MIMIC II." We will also share how MIMIC II has allowed our group to develop predictive models with actionable outputs that may lead to measurable improvements in care processes and/or outcomes. The tutorial will end with "A Hands-on Tour of MIMIC II," where participants will learn to navigate the rich data of MIMIC II. Basic data analytic examples are included to facilitate understanding of the MIMIC II data schema.
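
    As a flavor of such basic analytic examples, the Python sketch below runs a simple aggregate query against a tiny in-memory mock table. The table and column names are simplified stand-ins, not the actual MIMIC II schema, which is documented on PhysioNet.

      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("""CREATE TABLE vitals
                      (icustay_id INTEGER, chart_time TEXT, heart_rate REAL)""")
      conn.executemany(
          "INSERT INTO vitals VALUES (?, ?, ?)",
          [(1, "2013-01-01 08:00", 82.0),
           (1, "2013-01-01 09:00", 91.0),
           (2, "2013-01-01 08:30", 110.0)],
      )

      # Mean heart rate per ICU stay: a typical first aggregate over vital signs.
      for stay, mean_hr in conn.execute(
              "SELECT icustay_id, AVG(heart_rate) FROM vitals GROUP BY icustay_id"):
          print(f"ICU stay {stay}: mean heart rate {mean_hr:.1f}")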


  • GPU Programming for Bioinformatics Applications (2 hours)

  • Primary contact: Soha Hassoun soha@cs.tufts.edu


    Other Presenter: Ehsan Ullah


    The use of Graphics Processing Units (GPUs) has recently emerged as a viable and effective option for compute-intensive scientific applications. The idea is simple: instead of using GPUs only to render graphics for gaming applications, a programmer can use them to solve computationally challenging problems. GPUs can deliver superb performance owing to hundreds of computing cores grouped into multiprocessors, all connected with impressive memory bandwidth. Speedups in execution time have been achieved across multiple scientific application domains, including 130x for iterative image reconstruction in computed tomography and 16x to 100x for applications in computational fluid dynamics.


    Do you have an application that you would like to speed up using GPUs? This tutorial presents the basic architectural features of GPUs and basic programming constructs for CUDA™, NVIDIA's parallel computing platform and programming model. A working example based on computing elementary pathways in biochemical networks will be presented to illustrate issues related to CPU-to-GPU data transfers, effective use of the memory hierarchy, and other GPU-specific optimization techniques. The tutorial caters to programmers who have a background in C/C++ and are interested in speeding up a parallelizable scientific application.
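
    The tutorial's working example is written in CUDA C; as a hedged, runnable preview of the same concepts (kernels, thread indexing, and explicit CPU-to-GPU transfers), the sketch below uses Numba's CUDA support in Python. It assumes a CUDA-capable GPU and the numba and numpy packages.

      import numpy as np
      from numba import cuda

      @cuda.jit
      def vector_add(a, b, out):
          i = cuda.grid(1)          # global thread index across all blocks
          if i < out.size:          # guard against out-of-range threads
              out[i] = a[i] + b[i]

      n = 1 << 20
      a = np.random.rand(n).astype(np.float32)
      b = np.random.rand(n).astype(np.float32)

      d_a = cuda.to_device(a)       # explicit CPU-to-GPU transfers
      d_b = cuda.to_device(b)
      d_out = cuda.device_array_like(a)

      threads_per_block = 256
      blocks = (n + threads_per_block - 1) // threads_per_block
      vector_add[blocks, threads_per_block](d_a, d_b, d_out)

      assert np.allclose(d_out.copy_to_host(), a + b)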