Fine-grained phenotypes, comorbidities and disease trajectories from data mining of electronic patient records

Søren Brunak,
Technical University of Denmark & University of Copenhagen

Electronic patient records remain a largely unexplored but potentially rich data source for discovering correlations between diseases, drugs and genetic information in individual patients. Such data make it possible to compute fine-grained disease co-occurrence statistics and to link comorbidities to the treatment history of the patients. A fundamental issue is to resolve whether specific adverse drug reactions (ADRs) stem from variation in the individual genome of a patient, from drug/environment cocktail effects, or both. Here it is essential to perform temporal analysis of the records to identify ADRs directly from the free-text narratives describing patient disease trajectories over time. We can then characterize the similarity of ADR profiles of approved drugs using drug-ADR networks and report on the relationship between the chemical similarity of drugs and their ADRs. Given the availability of longitudinal data covering long periods of time, we can extend the temporal analysis to become more life-course oriented. We describe how an unbiased national registry covering 6.2 million people from Denmark can be used to construct disease trajectories that describe the relative risk of diseases following one another over time. We show how one can "condense" millions of trajectories into a smaller set that reflects the most frequent and most populated ones. This set of trajectories then represents a temporal diseasome, as opposed to a static one computed from non-directional comorbidities.
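
As a rough illustration of the directed co-occurrence idea in the abstract, the following minimal Python sketch estimates, for each ordered pair of diagnoses, how much more often B is first recorded after A than B's overall prevalence would suggest. This is not the method used in the cited papers: the function name `directed_relative_risk`, the input layout and the toy codes "A", "B" and "C" are assumptions made for illustration, and the estimate ignores the matching on age, sex and follow-up time that a real registry analysis would need.

```python
from collections import defaultdict
from itertools import permutations


def directed_relative_risk(records):
    """Naive relative risk of diagnosis B first appearing after diagnosis A.

    records: dict mapping patient id -> list of (time, diagnosis_code).
    Returns a dict mapping (A, B) -> P(B after A | A) / P(B).
    Illustrative only; no adjustment for age, sex or follow-up time.
    """
    n_patients = len(records)
    has_code = defaultdict(int)   # patients with each diagnosis
    a_then_b = defaultdict(int)   # patients with first A before first B

    for events in records.values():
        # Keep only the earliest time each diagnosis was recorded.
        first = {}
        for time, code in events:
            if code not in first or time < first[code]:
                first[code] = time
        for code in first:
            has_code[code] += 1
        for a, b in permutations(first, 2):
            if first[a] < first[b]:
                a_then_b[(a, b)] += 1

    rr = {}
    for (a, b), count in a_then_b.items():
        p_b_given_a = count / has_code[a]
        p_b = has_code[b] / n_patients
        rr[(a, b)] = p_b_given_a / p_b
    return rr


if __name__ == "__main__":
    # Toy data: four hypothetical patients, integer time stamps.
    toy = {
        1: [(1, "A"), (2, "B")],
        2: [(1, "A"), (3, "B")],
        3: [(1, "A")],
        4: [(1, "C")],
    }
    print(round(directed_relative_risk(toy)[("A", "B")], 3))  # prints 1.333
```

Directed pairs with a relative risk well above 1 would be the candidate building blocks for longer trajectories. Chaining and condensing them, as the abstract describes, is beyond this sketch.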


  • Using electronic patient records to discover disease correlations and stratify patient cohorts. Roque FS et al., PLoS Comput Biol. 2011 Aug;7(8):e1002141.
  • Mining electronic health records: towards better research applications and clinical care. Jensen PB, Jensen LJ, and Brunak S, Nature Reviews Genetics, 13, 395-405, 2012.
  • A nondegenerate code of deleterious variants in Mendelian loci contributes to complex disease risk. Blair DR, Lyttle CS, Mortensen JM, Bearden CF, Jensen AB, Khiabanian H, Melamed R, Rabadan R, Bernstam EV, Brunak S, Jensen LJ, Nicolae D, Shah NH, Grossman RL, Cox NJ, White KP, Rzhetsky A. Cell, 155, 70-80, 2013.
  • Dose-specific adverse drug reaction identification in electronic patient records. Eriksson R, Werge T, Jensen LJ, Brunak S. Drug Safety, Mar 15, 2014.
  • Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients. Jensen AB, Moseley PL, Oprea TI, Ellesøe SG, Eriksson R, Schmock H, Jensen PB, Jensen LJ, Brunak S. Nature Comm, to appear 2014.

Keynote Address: Leslie Lenert, Medical University of South Carolina

"The Fractal-Like Architecture of the Learning Health System"

Dr. Leslie Lenert, MS, MD, FACMI, is the Chief Research Information Officer of the Medical University of South Carolina (MUSC) and the Chief Medical Information Officer of Health Sciences South Carolina (HSSC), a statewide research collaborative.

A recent report from the Institute of Medicine entitled "Best Care at Lower Cost" maps out an escape path for the United States from the paradox of its high-cost, low-value health system. That path runs through the transformation of the nation's healthcare business sector into a Learning Health System. At MUSC and HSSC, Dr. Lenert is working to create a Learning Health System that spans individual practices, research laboratories, clinics, and universities of the state of South Carolina. This talk will focus on the enterprise architecture of that system and how its structure is like a fractal, with parallels at each level of organization that make organizational learning, as well as scientific insights, possible from the data generated by the routine care of patients. Creating a Learning Health System for an academic health center requires creating linkages between genomic and proteomic databases, tissue repositories, electronic medical records systems and data warehouses, and mobile and web-based tools for capture of personal sensor data and patient-reported outcomes. In South Carolina, linkages across systems are facilitated by a statewide research master person index system that is available as a web service. Another critical component of South Carolina's architecture is the HSSC Clinical Data Warehouse, which holds transaction-level data on 3.2 million patients from the four largest health systems in the state and makes these data available in pseudo-anonymized form for research through an i2b2 database. Work with this database is supported by a statewide human subjects permissions system that facilitates Institutional Review Board (IRB) reliance. A unique governance mechanism for access to statewide data protects the interests of organizational data contributors and those of patients, creating a collaborative environment for learning.

Biography: Prior to moving to MUSC, Dr. Lenert was the Associate Chair for Quality and Innovation of the Department of Internal Medicine and Professor of Biomedical Informatics at the University of Utah. He has also held posts at the Centers for Disease Control and Prevention (CDC); the University of California, San Diego; and Stanford University School of Medicine. Dr. Lenert received his MD degree from the University of California, Los Angeles and an MS in Biomedical Informatics from Stanford University. He is a practicing primary care physician with a 20-year history of research and development work in informatics and predictive analytics. In the 1990s he was a pioneer in the development of web-based systems for patient use and online research studies. In response to the 9/11 attacks, Dr. Lenert led a team of engineers and computer scientists that developed the first wireless "location aware" EHR system for first responders, including the world's first WiFi pulse oximeter and electronic triage tag, obtaining more than $4 million in Federal funding. In 2007, Dr. Lenert became the founding Director of the National Center for Public Health Informatics at CDC. There, he managed the development of key national biodefense computer systems, including BioSense (which merged real-time emergency room data from hundreds of hospitals nationally) and the National Notifiable Diseases Surveillance System. He also led efforts to integrate public health data systems with the Nationwide Health Information Network.

In addition to his work on the Learning Health System, Dr. Lenert researches approaches to help make healthcare safer and more patient-centric through the application of predictive analytics and collaborative filtering in medicine. An internationally recognized expert in informatics, he is a fellow of the American College of Medical Informatics and sits on the editorial boards of three leading journals in the field.

Towards systems level analysis of tumor heterogeneity.

Teresa M. Przytycka obtained her Master's degree from Warsaw University, Poland, and her PhD from the University of British Columbia, Vancouver. She is currently a Senior Investigator at the National Center for Biotechnology Information, National Institutes of Health (NIH), where she heads the Algorithmic Methods in Computational and Systems Biology research section, and is an affiliate faculty member of the University of Maryland Institute for Advanced Computer Studies. Her group at NIH focuses on modeling dynamical changes of gene expression in response to perturbations and disease. Dr. Przytycka has received several awards, including the I.W. Killam Memorial Predoctoral Fellowship, the Sloan Foundation and the U.S. Department of Energy Postdoctoral Fellowship in Computational Biology, the Burroughs Wellcome Fellowship in Computational Biology and a K01 NIH research development award. She is a co-editor of a book on protein-protein interactions and an associate editor of several high-impact computational biology journals including PLoS Computational Biology, BMC Bioinformatics, BMC Algorithms for Molecular Biology, and IEEE Transactions on Computational Biology and Bioinformatics.

Uncovering and interpreting genotype-phenotype relationships is among the most challenging open problems in disease studies. In cancer, uncovering these relationships is further complicated by the heterogeneous nature of the disease. Over the years, we have developed several algorithms that help to analyze heterogeneous cancer data in the context of uncovering genotype-phenotype relations, identifying dysregulated pathways, and classifying cancers. These approaches span a large spectrum of algorithmic techniques, including optimization-based techniques and mixture models. Taken together, they help to leverage datasets collected through TCGA and other initiatives for a better understanding of cancer and cancer diversity.