Digging Deep Into Data

February 09, 2016
by Jill Max

Until recently, clinical and other types of data were siloed in data warehouses throughout the institution. Now, the implementation of a single unified EHR database offers new and varied opportunities to delve into data for research and analytics.

To help manage the vast amounts of data contained in the EHR and other databases, Yale utilizes Epic’s data warehouse, renamed Helix and customized with the capacity to contain all the clinical, research, financial, quality, and operational data across the Yale New Haven Health System (YNHHS) and the School of Medicine. Helix draws from such varied sources as the EHR; the Help Us Discover database with approximately 7,000 potential research subjects; patient satisfaction surveys; the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS); private databases; and the Social Security Death Index to provide clinicians and researchers with unprecedented access to information. “Having a unified database allows us to ask questions that are multidisciplinary, whereas before that was much harder to do,” said Medical Information Officer Prem Thomas, MD, who leads the YNHHS data warehouse development team.

In one initiative, a multidisciplinary group of hospitalists, hematologists, and informaticists studied the use of blood products across the health system. Variances in blood utilization were analyzed using data from Helix. When variances in certain cardiothoracic surgeries were noted to be high compared to benchmarks, analyses were conducted that led to changes in equipment. Ultimately this process translated to dramatic decreases in intraoperative blood loss and a corresponding decrease in the need for transfusions.

For large data sets such as genomics and medical device data, Helix uses Hadoop, an open-source software framework that allows for the compression, storage, and fast processing of extremely large data sets at a low cost. In an effort to reduce noise from monitoring devices in the hospital’s neonatal intensive care units, alarm data (some 40,000 messages per second) were streamed into Hadoop. After examining the alarm thresholds, the Information Technology Services (ITS) team was able to reduce noise by almost 50 percent. “Now we can open this up to all nursing units at Yale-New Haven and eventually Greenwich and Bridgeport Hospitals, because Hadoop can handle the volume of data streaming in,” said Charles Torre, Jr., ITS System Executive Director for Yale New Haven Health System. “It’s a patient and staff satisfier.” Torre and his team are starting to incorporate genomic data as well so that clinical researchers will be able to look at outcomes in the EHR and link them to genomic variants.

Recognizing the need to provide investigators with data warehousing support, Yale has developed a toolkit that includes such resources as i2b2 (Informatics for Integrating Biology and the Bedside), an open-source database structure originally developed at Harvard and used by Clinical and Translational Science Award (CTSA) institutions. This resource allows investigators to query for de-identified clinical cohorts for research. Other resources include the Shared Health Research Information Network (SHRINE), also used by CTSA sites, which expands this concept to other institutions and incorporates security measures; and Slicer Dicer, Epic’s data warehouse tool that works with Helix. Epic is also developing a platform that will allow institutions to opt in and share data on various disease registries and performance metrics, ultimately contributing to national and international benchmarking. With Epic’s patient base (the system is used with more than half of the U.S. population), this database will be very useful for investigators interested in extremely large cohorts and/or multicenter subject populations.

The EHR contains a wealth of data captured in clinical notes that is accessible only by reading them. Yale is working on implementing a natural language processing (NLP) engine to utilize this untapped source of potentially useful information. Data from written or dictated physician notes will be exported to Helix in real time. The data will be useful for qualifying diagnoses and accessing data for research, and could eventually be expanded to include pharmacy, diagnostic radiology, and lab notes.

It’s not just the quantity of data contained in these platforms that is relevant to clinical care and research. Data quality is also a concern. Thomas and his colleagues are responsible for ensuring that the underlying database structures, technologies, and data models provide the answers clinicians and researchers are seeking. For example, there are about 15,000 terms within the EHR for categorizing the various types of diabetes and their complications, so discerning which patients meet certain criteria is critical. Thomas thinks of his role collaborating with ITS and the Joint Data Analytics Team, which coordinates all clinical and research analytics, as being an “information gardener” responsible for pruning and checking the quality of data, often working at the interface between how data are stored and what they mean.

“All of the informatics tools we put in place are helping investigators leverage, manage, and access the vast amount of data that’s available, but that was previously scattered in different places,” said Allen Hsiao, MD, chief medical information officer for the School of Medicine and Yale New Haven Health System. “We’re excited to see the research that will take place as a result.”